A Study of the TextRank Algorithm in Python
TextRank is a graph-based algorithm for keyword and sentence extraction.
In this post we will go through how to install TextRank and use it to extract keywords from Android app reviews.
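To give a feel for what "graph-based" means here, the following is a minimal sketch of the idea, not the pytextrank implementation. It assumes a plain list of lowercase tokens, links words that occur near each other, and scores them with a PageRank-style iteration. The function name `textrank_keywords` and the `window`, `damping` and `iterations` parameters are hypothetical choices for illustration; real implementations also filter tokens by part of speech and rank whole phrases rather than single words.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50):
    """Rank words with a PageRank-style score on an undirected co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start from a uniform score and repeatedly redistribute it along the edges.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up token list for illustration only.
tokens = "fm tuner missing launcher original launcher real fm tuner".split()
ranked = textrank_keywords(tokens)
for word, score in ranked:
    print(f"{score:.3f} {word}")
```

Words that co-occur with many other well-connected words end up with higher scores, which is the intuition behind the ranks printed later in this post.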
- Python 3.5+
!pip install spacy
!pip install pytextrank
import pytextrank
import spacy
import pandas as pd
For this exercise I will be using a CSV file of Android app reviews.
Let's read the CSV file using pandas read_csv().
df = pd.read_csv('data/sample_data.csv')
Let's take a peek at our data.
|0||0||4||anyone know how to get FM tuner on this launch...|
|1||1||2||Developers of this app need to work hard to fi...|
Let's get rid of the Unnamed: 0 column by passing index_col=0 to pd.read_csv().
df = pd.read_csv('data/sample_data.csv',index_col=0)
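As a small illustration of why the Unnamed: 0 column shows up at all, here is a sketch that assumes the CSV was saved with its index, which leaves the first header cell empty. The inline CSV text and the `rating` column name are made up for this example; only the `review` column name comes from the code in this post.

```python
import io

import pandas as pd

# Made-up CSV text whose first header cell is empty, as happens when a
# DataFrame is saved together with its index.
csv_text = ",rating,review\n0,4,great launcher\n1,2,many issues\n"

# Without index_col, pandas names the empty header "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'rating', 'review']

# With index_col=0, the first column becomes the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['rating', 'review']
```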
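Set display.max_colwidth to -1 with pd.set_option('display.max_colwidth', -1) so that the data is not truncated in our Python notebook. On pandas 1.0 and later, pass None instead of -1.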
|0||4||anyone know how to get FM tuner on this launcher? It is available in the dafault launcher but does not show up in app list to add to this one. Otherwise.. great launcher! All I can find on the store are apps for streaming stations but the original launcher did have a real FM tuner which is the only thing missing from this launcher.|
Let's try to find the keywords in a few of these reviews, starting with the first one.
review1 = df.iloc[0]['review']
Before we do that, we need to load our spaCy model.
nlp = spacy.load('en_core_web_sm')
Let's initialize pytextrank now.
tr = pytextrank.TextRank(logger=None)
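Next we need to add TextRank as a component to our spaCy pipeline. The call below uses the pytextrank 2.x interface; with spaCy 3 and pytextrank 3 the component is added by name instead, with nlp.add_pipe("textrank").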
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
Now we are ready to use our model. Let's load the text into our spaCy model.
doc = nlp(review1)
for phrase in doc._.phrases:
    print("%s %s %s" % (phrase.rank, phrase.count, phrase.text))
0.1643258973249535 1 app list
0.14870405163352085 1 fm tuner
0.10002872204845309 1 a real fm tuner
0.09741561461611117 1 stations
0.09562079838741741 1 the dafault launcher
0.094116179868447 1 the original launcher
0.07679311366536046 2 this launcher
0.07303293766844456 1 the only thing
0.06477630351859456 1 otherwise.. great launcher
0.053698883087075634 1 the store
0.03965858602000139 1 this one
0.0 3 anyone
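As we can see above, the first column is the TextRank score of the phrase, the second is how many times the phrase occurs, and the third is the phrase text. The higher the rank, the better the quality of the extracted keyword.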
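Since the rank orders the phrases by quality, one simple way to use it is to keep only the top-ranked phrases. The sketch below works on a few (rank, count, text) tuples copied from the first review's output; the helper name `top_phrases` and the `min_rank` and `limit` values are hypothetical and would need tuning on real data.

```python
# (rank, count, text) tuples copied from the first review's output.
phrases = [
    (0.1643258973249535, 1, "app list"),
    (0.14870405163352085, 1, "fm tuner"),
    (0.10002872204845309, 1, "a real fm tuner"),
    (0.03965858602000139, 1, "this one"),
    (0.0, 3, "anyone"),
]


def top_phrases(phrases, min_rank=0.05, limit=3):
    """Return up to `limit` phrase texts whose rank is at least `min_rank`."""
    kept = [p for p in phrases if p[0] >= min_rank]
    kept.sort(key=lambda p: p[0], reverse=True)
    return [text for _, _, text in kept[:limit]]


print(top_phrases(phrases))  # ['app list', 'fm tuner', 'a real fm tuner']
```

Low-ranked entries such as "this one" and "anyone" drop out, which matches the intuition that a rank near zero marks a poor keyword.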
Let's do another example, this time with the second review.
'Developers of this app need to work hard to fine tune. There are many issues in this app. I sent an email to developers but they don\'t bother to reply the email. I can not add system widgets to the screen. If added one, it only displays \\recover\\". Weather is nit displayed on home screen. Doesn\'t support built-in music player and it\'s control. Speed is not accurate. Please try to work on these issues if you really want to make this app the one of its kind."'
doc = nlp(df.iloc[1]['review'])
for phrase in doc._.phrases:
    print(phrase.rank, phrase.count, phrase.chunks)
0.11430978384935088 1 [system widgets]
0.11159252187593624 1 [home screen]
0.10530999092027488 1 [many issues]
0.0979183266371772 1 [fine tune]
0.08643261057360326 1 [nit]
0.08563916592311799 1 [Speed]
0.08201697027034136 2 [Developers, developers]
0.07255614913054882 1 [Weather]
0.06461967687026247 3 [this app, this app, this app]
0.06362587300087594 1 [built-in music player]
0.055491039197743064 2 [an email, the email]
0.05137598599688147 1 [these issues]
0.04561572496611145 1 [the screen]
0.033167906340332974 1 [control]
0.0175899386182573 1 [its kind]
0.0 8 [I, they, I, it, it, you, one, one]
Commonly encountered errors while installing spaCy
You might run into the following error while loading the spaCy model with spacy.load("en_core_web_sm"):
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
To fix that, download the model:
!python3 -m spacy download en_core_web_sm
This tutorial just introduces the TextRank algorithm. In the next tutorial, I will go over how to improve its results.