A Study of the TextRank Algorithm in Python

TextRank (Mihalcea and Tarau, 2004) is a key phrase and sentence extraction algorithm based on PageRank. I've made an IPython notebook to demonstrate how to implement it using the networkx and NLTK packages, with matplotlib to visualize the graph.
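For reference, the heart of PageRank is a simple iterative update: each node's score is a damped sum of its neighbors' scores, spread over the neighbors' degrees. Below is a minimal sketch of that update for an undirected graph (damping factor d = 0.85, as in the paper), just to make the idea concrete; the notebook itself relies on networkx's built-in pagerank, which normalizes the scores differently but yields the same ranking.

def pagerank_sketch(neighbors, d=0.85, iterations=30):
    # neighbors maps each node to the set of nodes adjacent to it;
    # every node is assumed to have at least one neighbor.
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        scores = {
            node: (1 - d) + d * sum(scores[adj] / len(neighbors[adj])
                                    for adj in neighbors[node])
            for node in neighbors
        }
    return scores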

Setup

First, let's import NLTK, and make sure that we have the NLTK packages we need.

In [1]:
import nltk
In [2]:
# Download the following NLTK data packages. This may take a few seconds
# if you don't already have them installed.
packages = (
    "stopwords",  # for stopwords definition
    "punkt",  # for tokenizing sentences
    "maxent_treebank_pos_tagger",  # for part-of-speech (POS) tagging
)

for package in packages:
    nltk.download(package)
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/lukhnos/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /Users/lukhnos/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package maxent_treebank_pos_tagger to
[nltk_data]     /Users/lukhnos/nltk_data...
[nltk_data]   Unzipping taggers/maxent_treebank_pos_tagger.zip.
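(A note if you are on a newer version of NLTK: nltk.pos_tag now uses the averaged perceptron tagger by default, so if the maxent download above fails, downloading this package instead should do the trick.)

nltk.download('averaged_perceptron_tagger')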
In [3]:
# Let's see what NLTK can do for us here.
#
# First, tokenization: we want to break a text into tokens.
text = 'Python is a programming language.'
tokens = nltk.word_tokenize(text)
print(tokens)
['Python', 'is', 'a', 'programming', 'language', '.']
In [4]:
# Part-of-speech (POS) tagging: Identify the function of a word.
tagged_tokens = nltk.pos_tag(tokens)
print(tagged_tokens[:10])
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('programming', 'NN'), ('language', 'NN'), ('.', '.')]
In [5]:
# Getting n-grams: A unigram (1-gram) is just a word itself,
# whereas a bigram (2-gram) is a pair, trigram/3-gram a triple,
# and so on. Here we use NLTK to get bigrams. We show the difference
# between getting untagged and tagged bigrams.
untagged_bigrams = nltk.ngrams(tokens, 2)
print(list(untagged_bigrams)[:10])
[('Python', 'is'), ('is', 'a'), ('a', 'programming'), ('programming', 'language'), ('language', '.')]
In [6]:
tagged_bigrams = nltk.ngrams(tagged_tokens, 2)
print(list(tagged_bigrams)[:10])
[(('Python', 'NNP'), ('is', 'VBZ')), (('is', 'VBZ'), ('a', 'DT')), (('a', 'DT'), ('programming', 'NN')), (('programming', 'NN'), ('language', 'NN')), (('language', 'NN'), ('.', '.'))]
In [7]:
# Stemming: Get the "stem" of a word.
#
# This allows us to ignore word forms. It's related to, but not exactly the same as,
# lemmatization.
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer('english')
print(stemmer.stem('run'))
print(stemmer.stem('runs'))
print(stemmer.stem('running'))
run
run
run
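For contrast, a lemmatizer maps a word to its dictionary form rather than just chopping off suffixes. A quick illustration with NLTK's WordNet lemmatizer (this assumes the 'wordnet' corpus has been downloaded):

from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')  # one-time download for the lemmatizer
lemmatizer = WordNetLemmatizer()
print(stemmer.stem('better'))                   # the stemmer leaves 'better' as is
print(lemmatizer.lemmatize('better', pos='a'))  # the lemmatizer gives 'good'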
In [8]:
# Now let's extract some text. This is from Wikipedia:
# https://en.wikipedia.org/wiki/Python_(programming_language)
text = """
Python is a widely used general-purpose, high-level programming language.[19][20][21] Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java.[22][23] The language provides constructs intended to enable clear programs on both a small and large scale.[24]
Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles. It features a dynamic type system and automatic memory management and has a large and comprehensive standard library.[25]
Python interpreters are available for installation on many operating systems, allowing Python code execution on a wide variety of systems. Using third-party tools, such as Py2exe or Pyinstaller,[26] Python code can be packaged into stand-alone executable programs for some of the most popular operating systems, allowing for the distribution of Python-based software for use on those environments without requiring the installation of a Python interpreter.
CPython, the reference implementation of Python, is free and open-source software and has a community-based development model, as do nearly all of its alternative implementations. CPython is managed by the non-profit Python Software Foundation.
Python was conceived in the late 1980s[27] and its implementation was started in December 1989[28] by Guido van Rossum at CWI in the Netherlands as a successor to the ABC language (itself inspired by SETL)[29] capable of exception handling and interfacing with the Amoeba operating system.[5] Van Rossum is Python's principal author, and his continuing central role in deciding the direction of Python is reflected in the title given to him by the Python community, benevolent dictator for life (BDFL).
About the origin of Python, Van Rossum wrote in 1996:[30]
Over six years ago, in December 1989, I was looking for a "hobby" programming project that would keep me occupied during the week around Christmas. My office ... would be closed, but I had a home computer, and not much else on my hands. I decided to write an interpreter for the new scripting language I had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers. I chose Python as a working title for the project, being in a slightly irreverent mood (and a big fan of Monty Python's Flying Circus).
Python 2.0 was released on 16 October 2000, and included many major new features including a full garbage collector and support for Unicode. With this release the development process was changed and became more transparent and community-backed.[31]
Python 3.0 (also called Python 3000 or py3k), a major, backwards-incompatible release, was released on 3 December 2008[32] after a long period of testing. Many of its major features have been backported to the backwards-compatible Python 2.6 and 2.7.[33]
"""
In [9]:
# Let's tokenize the text.
tokens = nltk.word_tokenize(text)
print(tokens[:20])
['Python', 'is', 'a', 'widely', 'used', 'general-purpose', ',', 'high-level', 'programming', 'language', '.', '[', '19', ']', '[', '20', ']', '[', '21', ']']
In [10]:
# Now let's tag the tokens with a part-of-speech (POS) tagger. Note how the
# Wikipedia citation markers ('[', '19', ']') end up with odd tags; we'll
# filter those out later.
tagged_tokens = nltk.pos_tag(tokens)
print(tagged_tokens[:20])
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('widely', 'RB'), ('used', 'JJ'), ('general-purpose', 'JJ'), (',', ','), ('high-level', 'JJ'), ('programming', 'NN'), ('language', 'NN'), ('.', '.'), ('[', 'NN'), ('19', 'CD'), (']', 'CD'), ('[', 'CD'), ('20', 'CD'), (']', 'CD'), ('[', 'CD'), ('21', 'CD'), (']', 'CD')]
In [11]:
# Before we build the graph, we need some helper functions.
import re
def is_word(token):
    """
    A token is a "word" if it begins with a letter and is at least two
    characters long.

    This is for filtering out punctuation, numbers, and stray single letters.
    """
    return re.match(r'^[A-Za-z].+', token)

# We only take nouns and adjectives. See the paper for why this is recommended.
ACCEPTED_TAGS = {'NN', 'NNS', 'NNP', 'JJ'}

def is_good_token(tagged_token):
    """
    A tagged token is good if it starts with a letter and the POS tag is
    one of ACCEPTED_TAGS.
    """
    return is_word(tagged_token[0]) and tagged_token[1] in ACCEPTED_TAGS
    
def normalized_token(token):
    """
    Use stemmer to normalize the token.
    """
    return stemmer.stem(token.lower())
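
A quick sanity check on these helpers, using made-up inputs:

print(is_good_token(('Python', 'NNP')))  # truthy: starts with a letter, accepted tag
print(is_good_token(('19', 'CD')))       # falsy: not a word, tag not accepted
print(normalized_token('Languages'))     # 'languag'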
In [12]:
# Now let's build the graph.
import networkx
graph = networkx.Graph()

# Here, bigrams are "tagged bigrams".
bigrams = nltk.ngrams(tagged_tokens, 2)
for bg in bigrams:
    if all(is_good_token(t) for t in bg):
        normalized = [normalized_token(t[0]) for t in bg]
        graph.add_edge(*normalized)
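A side note: the paper reports results for co-occurrence windows between 2 and 10, and using bigrams as we do here corresponds to a window of 2. Here is a sketch of how one might generalize this; window_size is a hypothetical parameter, and unlike the strict bigram filter above, this version links good words even across intervening rejected tokens.

def add_cooccurrence_edges(graph, tagged_tokens, window_size=2):
    # Connect every pair of good tokens that co-occur within the window.
    for window in nltk.ngrams(tagged_tokens, window_size):
        good = [normalized_token(t[0]) for t in window if is_good_token(t)]
        for i, head in enumerate(good):
            for tail in good[i + 1:]:
                if head != tail:
                    graph.add_edge(head, tail)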
In [13]:
# We can visualize it with matplotlib.
%matplotlib inline
import matplotlib
matplotlib.rcParams['figure.figsize'] = (20.0, 16.0)
networkx.draw_networkx(graph)
In [14]:
# Let's do the PageRank.
pagerank = networkx.pagerank(graph)
In [15]:
# Then sort the nodes according to the rank.
ranked = sorted(pagerank.items(), key=lambda ns_pair: ns_pair[1], reverse=True)
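
The result is a list of (node, score) pairs in descending score order; a quick way to inspect the top entries:

for node, score in ranked[:5]:
    print(node, score)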
In [16]:
# How many are there?
len(ranked)
Out[16]:
94
In [17]:
# We keep only the top 20% of the ranked nodes.
selectivity = 0.20
keep_n = int(len(ranked) * selectivity)
In [18]:
# Now remove the lower-ranking nodes we don't need.
for node, _ in ranked[keep_n:]:
    graph.remove_node(node)
In [19]:
# Let's see how many are left.
len(graph)
Out[19]:
18
In [20]:
# Let's visualize it again.
networkx.draw_networkx(graph)
In [21]:
# Now let's recover the key phrases.
phrases = set()

# Using a "sliding window" of size 2, 3, 4:
for n in range(2, 5):
    
    # Get the 2-grams, 3-grams, 4-grams
    for ngram in nltk.ngrams(tokens, n):
        
        # For each n-gram, if all tokens are words, and if the normalized
        # head and tail are nodes connected by an edge in the graph, then
        # this n-gram is a key phrase. (graph.has_edge works on both
        # networkx 1.x and 2.x; the old graph.edge attribute is gone in 2.x.)
        if all(is_word(token) for token in ngram):
            head, tail = normalized_token(ngram[0]), normalized_token(ngram[-1])
            
            if graph.has_edge(head, tail):
                phrase = ' '.join(ngram)
                phrases.add(phrase)
In [22]:
# Finally, let's sort the phrases and print them out.
sorted_phrases = sorted(phrases, key=str.lower)
for p in sorted_phrases:
    print(p)
clear programs
code execution
executable programs
major new
programming language
Python code
Python Software
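
As mentioned at the top, TextRank also extracts sentences: the nodes become sentences, edges are weighted by content overlap, and PageRank runs on the weighted graph. Here is a rough, untested sketch that follows the similarity measure in the paper (word overlap normalized by the log of the sentence lengths), reusing the helpers defined above:

import math

sentences = nltk.sent_tokenize(text)

def similarity(s1, s2):
    # Shared normalized words, divided by the log of each sentence's length.
    w1 = {normalized_token(t) for t in nltk.word_tokenize(s1) if is_word(t)}
    w2 = {normalized_token(t) for t in nltk.word_tokenize(s2) if is_word(t)}
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

sentence_graph = networkx.Graph()
for i, s1 in enumerate(sentences):
    for s2 in sentences[i + 1:]:
        weight = similarity(s1, s2)
        if weight > 0.0:
            sentence_graph.add_edge(s1, s2, weight=weight)

sentence_rank = networkx.pagerank(sentence_graph, weight='weight')
for s in sorted(sentence_rank, key=sentence_rank.get, reverse=True)[:3]:
    print(s)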