During a presentation I gave at Distilled’s SearchLove Boston conference in early May, I advocated that people use the slope formula and Google Trends data to determine if interest keywords have grown over time or if they are slipping away into searcher oblivion.
The only problem is that Google doesn't provide an official API for Google Trends, so we need some Python wizardry to do this in bulk.
Make sure you have the pandas Python library installed. If you're running Windows, using a Python distribution like Anaconda will make this a whole lot easier.
Next, install the pytrends library using pip:
pip install pytrends
Awesome! You’re almost there.
Next, create a CSV file with a single column named Keywords (it's case sensitive). In this column, enter all the keywords whose slope you'd like to know.
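For example, a minimal keywords.csv (using a few made-up keywords, with the filename the script expects by default) would look like this:

```
Keywords
python tutorials
growth hacking
seo audit
```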
Download the following script to the same folder as your CSV file of keywords.
The Python Script:
# Requires pandas. For Windows users, I recommend installing the Anaconda Python distribution.
# Requires the pytrends library. To install, run "pip install pytrends".
from pytrends.pyGTrends import pyGTrends
import time
import os
from random import randint
import pandas as pd

# Add your Gmail username to the google_username variable and your Gmail password to the google_password variable.
google_username = ""
google_password = ""
connector = pyGTrends(google_username, google_password)

# This script downloads a series of CSV files from Google Trends. Specify a filepath for where you'd like these files to be stored in the below variable.
path = ""

# Specify the filename of a CSV with a list of keywords in the keywordcsv variable. The CSV should be one column, with the header equal to Keywords (case sensitive).
keywordcsv = "keywords.csv"
keywords = pd.read_csv(keywordcsv)

def slope_formula(xone, yone, xtwo, ytwo):
    # Rise over run: the change in average interest between two consecutive years.
    return (ytwo - yone) / (xtwo - xone)

# Download each keyword's report and calculate its slope:
keywordlist = pd.DataFrame(columns=["keyword", "slope"])

for index, row in keywords.iterrows():
    print("Downloading Keyword #" + str(index))
    payload = {'geo': 'US', 'q': [row[0]]}
    connector.request_report(payload)
    time.sleep(randint(5, 10))  # Pause between requests to avoid hammering Google Trends.
    connector.save_csv(path, str(index))
    csvname = str(index) + ".csv"
    trenddata = pd.read_csv(csvname, skiprows=4, names=['date', 'values'])
    keyword = trenddata['values'].loc[[0]][0]
    trenddata = trenddata.ix[1:]
    trenddata['keyword'] = keyword
    trenddata.rename(columns={'values': 'trends'}, inplace=True)
    trenddata['trends'] = pd.to_numeric(trenddata['trends'], errors='coerce')
    trenddata['date'] = trenddata['date'].str.extract('(^[0-9]{4}\-[0-9]{2}\-[0-9]{2}) \-.*')
    trenddata = trenddata.dropna()
    trenddata['date'] = pd.to_datetime(trenddata['date'])
    trenddata['year'] = pd.DatetimeIndex(trenddata['date']).year
    trenddata['month'] = pd.DatetimeIndex(trenddata['date']).month
    trenddata['day'] = pd.DatetimeIndex(trenddata['date']).day
    maxyear = trenddata['year'].max()
    grouped = trenddata.groupby(['year']).mean()
    slope = slope_formula(1, float(grouped.loc[grouped.index == maxyear - 2]['trends']),
                          2, float(grouped.loc[grouped.index == maxyear - 1]['trends']))
    keywordlist = keywordlist.append({'keyword': keyword, 'slope': slope}, ignore_index=True)
    os.remove(csvname)

# Output the slope values to a CSV.
keywordlist.to_csv("trends_slope.csv", sep=",", encoding="utf-8", index=False)
print("Slope calculation and CSV export complete.")
Add your Google username to the google_username variable and your Google password to the google_password variable.
Run the Python script:
python GoogleTrendsSlopeCalculator.py
It will take some time, depending on the number of keywords you’re examining. For each keyword, the script downloads an export from Google Trends. The files are stored temporarily and then deleted.
What’s the Deal with Slope?
Slope measures the change in y over the change in x on a graph, or rise over run. In our case, we're looking at the change in search interest over a change in time, so slope is a natural fit.
Looking at historic search interest allows us to account for typical seasonality. If Google Keyword Planner provided search volume data for more than a single year, we would use that data instead, but since it does not, we rely on Google Trends data.
For example, Keyword X has an average search interest of 49 this year, but the previous year it had an average search interest of 45.
The slope would be calculated as (49 - 45) / (2016 - 2015), which equals 4, indicating small but positive growth.
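As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python (the 45 and 49 are the made-up yearly averages from the example):

```python
def slope(xone, yone, xtwo, ytwo):
    # Rise over run: change in search interest divided by change in time.
    return (ytwo - yone) / (xtwo - xone)

# Keyword X: average interest of 45 in 2015 and 49 in 2016.
print(slope(2015, 45, 2016, 49))  # 4.0
```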
Google Trends Trend Lines Aren’t Straight!
If you're savvy (or remember elementary school math), you know that calculating slope from one year to another like we want would require us to have a straight line. Since our trend line will never be perfectly straight, we solve for this by averaging the search interest values from Google Trends for each year.
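Here's a minimal sketch of that averaging step, using a tiny made-up sample in place of a real Google Trends export:

```python
import pandas as pd

# Made-up weekly interest values standing in for a Google Trends export.
trenddata = pd.DataFrame({
    'date': pd.to_datetime(['2015-03-01', '2015-09-01', '2016-03-01', '2016-09-01']),
    'trends': [40, 50, 44, 54],
})

# Average the interest values within each year to get one point per year,
# then take rise over run between the two yearly averages.
yearly = trenddata.groupby(trenddata['date'].dt.year)['trends'].mean()
slope = (yearly[2016] - yearly[2015]) / (2016 - 2015)
print(slope)  # 4.0
```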
TL;DR
We can use the slope formula to determine whether a keyword's search interest has grown or declined. To run the slope calculation across a whole collection of keywords, you can use my nifty Python script.
Why go through the hassle of Gmail authentication? Why not scrape the graph directly?
Nevertheless, great and inspiring article.
Curious how you’d use this data in your keyword research process.
You authenticate so it allows you to download the CSV.
For how it’s being used in my process, check out the SlideShare embedded in the post, it explains 🙂
you got a list of cat ids?
how to modify to loop through cats with a list of keywords??
how long till you get locked out?
does it randomly mix browsers? I see a couple of other scripts on github, mixing up the browser.
Also, do you use RapidMiner? It's similar to KNIME; I find it easier and more intuitive. I did try KNIME, but the scraping add-ons were not working.
Any chance of a tutorial with KNIME or the process?
cheers. Lee
I’ve not pushed it to the point of getting locked out yet. If you hit a limit, I’d love to know. You can use categories by changing the payload variable like (untested):
payload = {'geo': 'US', 'q': [row[0]], 'cat': '0-71'}
It’s not mixing up the browser as far as I’m aware, but the trends downloading is being handled by a 3rd party library which I haven’t dissected.
KNIME is open source and Rapidminer is not, so KNIME is my preference 🙂
I believe pyGtrends doesn’t exist anymore. Instead I did: from pytrends.request import TrendReq.
I then updated connector to be: connector = TrendReq(google_username, google_password, hl='en-US', tz=360, custom_useragent=None)
However, this then leads to problems with connector not having the proper attributes. Any ideas?
Maybe try installing from github? https://github.com/GeneralMills/pytrends
Okay, I just looked at the documentation; ignore my previous comment. They changed some things. You'll have to adjust accordingly, unfortunately.
Hi! Actually, you can download the CSV report without being logged in. I made a crappy Python script with two functions (get_volume_keyword_year, get_volume_keyword_month). Just choose your keyword and that's it.
https://pastebin.com/raw/i89DsQXc
Sorry pastebin broke the indentation
I was looking for exactly that kind of really simple stuff!!
If anybody could elaborate at all on what “adjusting accordingly” means, I’d really appreciate it.
line 3, replace with: from pytrends.request import TrendReq
line 12, replace with: connector = TrendReq(google_username, google_password)
and it works 🙂
thanks for the script
Hey, I updated the script with your suggestions and it’s almost working – now I just get this error however:
Traceback (most recent call last):
File "GoogleTrendsSlopeCalculator.py", line 26, in <module>
connector.request_report(payload)
AttributeError: ‘TrendReq’ object has no attribute ‘request_report’
Might you know how to fix this?
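For anyone hitting that same error: request_report was removed when pytrends reworked its interface, and the replacement is build_payload followed by interest_over_time. Below is a hedged sketch of how the download-and-average step might look against the newer API. fetch_yearly_interest is my own helper name, the timeframe string is an assumption, and exact keyword arguments may vary by pytrends version:

```python
def fetch_yearly_interest(keyword, geo='US'):
    # Sketch against the newer pytrends API (build_payload / interest_over_time),
    # which replaced request_report. The import is deferred so this file loads
    # even on machines without pytrends installed.
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl='en-US', tz=360)
    pytrends.build_payload([keyword], geo=geo, timeframe='today 5-y')
    df = pytrends.interest_over_time()
    # Average the weekly interest values by year, mirroring the original script.
    return df[keyword].groupby(df.index.year).mean()

# Usage (performs a live request, so results will vary):
# yearly = fetch_yearly_interest("python tutorials")
# slope = (yearly.iloc[-1] - yearly.iloc[-2]) / 1
```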