A Study of the TextRank Algorithm in Python
TextRank is a graph-based algorithm for keyword and sentence extraction. It is similar in nature to Google's PageRank algorithm.
In this post we will walk through how to install TextRank and use it to extract keywords from Android reviews.
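To make the PageRank analogy concrete, here is a minimal sketch of the idea in plain Python. It is a simplification for illustration only: the tokenizer, the tiny stopword list, and the `window`, `damping`, and `iterations` parameters are assumptions of this sketch, and pytextrank itself works on spaCy's linguistic annotations and differs in detail.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50):
    """Rank words with a PageRank-style update over a co-occurrence graph."""
    # Crude tokenizer (assumption): lowercase alphabetic tokens minus a few stopwords.
    stopwords = {"the", "a", "an", "is", "it", "to", "on", "in", "of", "and", "this", "but"}
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]

    # Undirected graph: connect words that occur within `window` tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Repeatedly let every word share its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = textrank_keywords(
    "great launcher but the fm tuner is missing the original launcher had a real fm tuner"
)
for word, score in ranked[:3]:
    print(f"{score:.3f} {word}")
```

Words that co-occur with many other well-connected words end up with higher scores, which is the same intuition PageRank applies to web pages and links.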
- Python 3.5+
!pip install spacy
!pip install pytextrank
import pytextrank
import spacy
import pandas as pd
For this exercise I will be using a CSV file of Android reviews.
Let us read the CSV file using pandas read_csv():
df = pd.read_csv('data/sample_data.csv')
Let us take a peek at our data.
|0||0||4||anyone know how to get FM tuner on this launch...|
|1||1||2||Developers of this app need to work hard to fi...|
Let's get rid of the Unnamed: 0 column by setting index_col=0 when calling pd.read_csv:
df = pd.read_csv('data/sample_data.csv',index_col=0)
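The effect of index_col=0 can be seen on a small in-memory CSV. This is only a sketch: the two rows and the `rating` column name are made up for illustration and are not the tutorial's actual file. A header that starts with an empty field is what pandas labels "Unnamed: 0".

```python
import io

import pandas as pd

# Hypothetical two-row CSV whose first header field is empty,
# as happens when a DataFrame is saved together with its index.
csv_text = ",rating,review\n0,4,great launcher\n1,2,many issues\n"

# Default read: the unnamed first column becomes a regular column.
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column is used as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

With index_col=0 the first column is used as the row labels, so only the real data columns remain.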
Call pd.set_option('display.max_colwidth', -1) so that the data is not truncated in our Python notebook. In pandas 1.0 and later, pass None instead of -1.
|0||4||anyone know how to get FM tuner on this launcher? It is available in the dafault launcher but does not show up in app list to add to this one. Otherwise.. great launcher! All I can find on the store are apps for streaming stations but the original launcher did have a real FM tuner which is the only thing missing from this launcher.|
Let's try to find the keywords in a few of these reviews.
review1 = df.iloc[0]['review']
Before we do that, we need to load our spaCy model.
nlp = spacy.load('en_core_web_sm')
Let's initialize pytextrank now.
tr = pytextrank.TextRank(logger=None)
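Next, we need to add TextRank as a component in our spaCy pipeline. The call below uses the pytextrank 2.x API for spaCy 2.x; with pytextrank 3.x and spaCy 3.x the component is instead added by name, as nlp.add_pipe("textrank").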
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
Now we are ready to use our model. Let's load the text into our spaCy model.
doc = nlp(review1)
for phrase in doc._.phrases:
    print("%s %s %s"%(phrase.rank, phrase.count, phrase.text))
0.1643258973249535 1 app list
0.14870405163352085 1 fm tuner
0.10002872204845309 1 a real fm tuner
0.09741561461611117 1 stations
0.09562079838741741 1 the dafault launcher
0.094116179868447 1 the original launcher
0.07679311366536046 2 this launcher
0.07303293766844456 1 the only thing
0.06477630351859456 1 otherwise.. great launcher
0.053698883087075634 1 the store
0.03965858602000139 1 this one
0.0 3 anyone
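As we can see above, the first column is the TextRank score (phrase.rank), the second is the number of times the phrase occurs (phrase.count), and the third is the phrase text. The higher the rank, the better the quality of the extracted keyword.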
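Since a higher rank means a better keyword, one simple way to use these scores is to keep only the top few phrases above a cutoff. The sketch below works on a handful of (rank, count, text) rows copied from the output above; the `top_phrases` helper and its `limit` and `min_rank` values are hypothetical choices for illustration, not part of pytextrank.

```python
# (rank, count, text) rows copied from the output above.
phrases = [
    (0.1643258973249535, 1, "app list"),
    (0.14870405163352085, 1, "fm tuner"),
    (0.10002872204845309, 1, "a real fm tuner"),
    (0.09741561461611117, 1, "stations"),
    (0.0, 3, "anyone"),
]


def top_phrases(rows, limit=3, min_rank=0.05):
    """Return the text of the highest-ranked phrases at or above min_rank."""
    # Drop low-scoring phrases such as the pronoun "anyone" (rank 0.0).
    kept = [row for row in rows if row[0] >= min_rank]
    # Sort by rank, best first, and keep at most `limit` phrases.
    kept.sort(key=lambda row: row[0], reverse=True)
    return [text for _, _, text in kept[:limit]]


print(top_phrases(phrases))  # ['app list', 'fm tuner', 'a real fm tuner']
```

The same filter could be applied directly to doc._.phrases by reading phrase.rank and phrase.text instead of tuple fields.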
Let's do another example.
'Developers of this app need to work hard to fine tune. There are many issues in this app. I sent an email to developers but they don\'t bother to reply the email. I can not add system widgets to the screen. If added one, it only displays \\recover\\". Weather is nit displayed on home screen. Doesn\'t support built-in music player and it\'s control. Speed is not accurate. Please try to work on these issues if you really want to make this app the one of its kind."'
doc = nlp(df.iloc[1]['review'])
for phrase in doc._.phrases:
    print(phrase.rank, phrase.count, phrase.chunks)
0.11430978384935088 1 [system widgets]
0.11159252187593624 1 [home screen]
0.10530999092027488 1 [many issues]
0.0979183266371772 1 [fine tune]
0.08643261057360326 1 [nit]
0.08563916592311799 1 [Speed]
0.08201697027034136 2 [Developers, developers]
0.07255614913054882 1 [Weather]
0.06461967687026247 3 [this app, this app, this app]
0.06362587300087594 1 [built-in music player]
0.055491039197743064 2 [an email, the email]
0.05137598599688147 1 [these issues]
0.04561572496611145 1 [the screen]
0.033167906340332974 1 [control]
0.0175899386182573 1 [its kind]
0.0 8 [I, they, I, it, it, you, one, one]
Commonly encountered errors while installing spaCy
You might run into the following error while loading the spaCy model with spacy.load("en_core_web_sm"):
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Run the following to fix it:
!python3 -m spacy download en_core_web_sm
This tutorial only introduces the TextRank algorithm. In the next tutorial, I will go over how to improve its results.