During a presentation I gave at Distilled’s SearchLove Boston conference in early May, I advocated that people use the slope formula and Google Trends data to determine if interest keywords have grown over time or if they are slipping away into searcher oblivion.
The only problem is that Google doesn't provide an official API for Google Trends, so we need some Python wizardry to do this in bulk.
Make sure you have the pandas Python library installed. If you're running Windows, using a Python distribution like Anaconda will make this a whole lot easier.
Next, install the pytrends library using pip:
pip install pytrends
Awesome! You’re almost there.
Next, create a CSV file with a single column named Keywords (it's case sensitive). In this column, enter all the keywords whose slope you'd like to know.
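For example, a minimal keywords.csv (using a few made-up keywords, with the filename the script expects by default) would look like this:

```
Keywords
python tutorials
growth hacking
seo audit
```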
Download the following script to the same folder as your CSV file of keywords.
The Python Script:
# Requires pandas. For Windows users, I recommend installing the Anaconda Python distribution.
# Requires the pytrends library. To install, run "pip install pytrends".
from pytrends.pyGTrends import pyGTrends
import time
import os
from random import randint
import pandas as pd

# Add your Gmail username to the google_username variable and your Gmail password to the google_password variable.
google_username = ""
google_password = ""
connector = pyGTrends(google_username, google_password)

# This script downloads a series of CSV files from Google Trends. Specify a filepath for where you'd like these files to be stored in the below variable.
path = ""

# Specify the filename of a CSV with a list of keywords in the keywordcsv variable. The CSV should be one column, with the header equal to Keywords (case sensitive).
keywordcsv = "keywords.csv"
keywords = pd.read_csv(keywordcsv)

def slope_formula(xone, yone, xtwo, ytwo):
    # Rise over run: the change in average interest between two consecutive years.
    return (ytwo - yone) / (xtwo - xone)

# Download each keyword's report and calculate its slope:
keywordlist = pd.DataFrame(columns=["keyword", "slope"])

for index, row in keywords.iterrows():
    print("Downloading Keyword #" + str(index))
    payload = {'geo': 'US', 'q': [row[0]]}
    connector.request_report(payload)
    time.sleep(randint(5, 10))  # Pause between requests to avoid hammering Google Trends.
    connector.save_csv(path, str(index))
    csvname = str(index) + ".csv"
    trenddata = pd.read_csv(csvname, skiprows=4, names=['date', 'values'])
    keyword = trenddata['values'].loc[[0]][0]
    trenddata = trenddata.ix[1:]
    trenddata['keyword'] = keyword
    trenddata.rename(columns={'values': 'trends'}, inplace=True)
    trenddata['trends'] = pd.to_numeric(trenddata['trends'], errors='coerce')
    trenddata['date'] = trenddata['date'].str.extract('(^[0-9]{4}\-[0-9]{2}\-[0-9]{2}) \-.*')
    trenddata = trenddata.dropna()
    trenddata['date'] = pd.to_datetime(trenddata['date'])
    trenddata['year'] = pd.DatetimeIndex(trenddata['date']).year
    trenddata['month'] = pd.DatetimeIndex(trenddata['date']).month
    trenddata['day'] = pd.DatetimeIndex(trenddata['date']).day
    maxyear = trenddata['year'].max()
    grouped = trenddata.groupby(['year']).mean()
    slope = slope_formula(1, float(grouped.loc[grouped.index == maxyear - 2]['trends']),
                          2, float(grouped.loc[grouped.index == maxyear - 1]['trends']))
    keywordlist = keywordlist.append({'keyword': keyword, 'slope': slope}, ignore_index=True)
    os.remove(csvname)

# Output the slope values to a CSV.
keywordlist.to_csv("trends_slope.csv", sep=",", encoding="utf-8", index=False)
print("Slope calculation and CSV export complete.")
Add your Google username to the google_username variable and your Google password to the google_password variable.
Run the Python script:
python GoogleTrendsSlopeCalculator.py
It will take some time, depending on the number of keywords you’re examining. For each keyword, the script downloads an export from Google Trends. The files are stored temporarily and then deleted.
What’s the Deal with Slope?
Slope measures the change in y over the change in x on a graph, or rise over run. In our case, we're looking at the change in search interest over a change in time, so slope is a natural fit.
Looking at historic search interest allows us to account for typical seasonality. If Google Keyword Planner provided search volume data for more than a single year, we would use that data instead, but since it does not, we rely on Google Trends data.
For example, Keyword X has an average search interest of 49 this year, but the previous year it had an average search interest of 45.
The slope would be calculated as (49 - 45) / (2016 - 2015), which equals 4, indicating small but positive growth.
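As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python (the 45 and 49 are the made-up yearly averages from the example):

```python
def slope(xone, yone, xtwo, ytwo):
    # Rise over run: change in search interest divided by change in time.
    return (ytwo - yone) / (xtwo - xone)

# Keyword X: average interest of 45 in 2015 and 49 in 2016.
print(slope(2015, 45, 2016, 49))  # 4.0
```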
Google Trends Trend Lines Aren’t Straight!
If you're savvy (or remember elementary school math), you know that calculating slope from one year to another like we want would require us to have a straight line. Since our trend line will never be perfectly straight, we solve for this by averaging the search interest values from Google Trends for each year.
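Here's a minimal sketch of that averaging step, using a tiny made-up sample in place of a real Google Trends export:

```python
import pandas as pd

# Made-up weekly interest values standing in for a Google Trends export.
trenddata = pd.DataFrame({
    'date': pd.to_datetime(['2015-03-01', '2015-09-01', '2016-03-01', '2016-09-01']),
    'trends': [40, 50, 44, 54],
})

# Average the interest values within each year to get one point per year,
# then take rise over run between the two yearly averages.
yearly = trenddata.groupby(trenddata['date'].dt.year)['trends'].mean()
slope = (yearly[2016] - yearly[2015]) / (2016 - 2015)
print(slope)  # 4.0
```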
TL;DR
We can use the slope formula to determine whether a keyword's search interest has grown or declined. To run the slope calculation across a whole collection of keywords, you can use my nifty Python script.
Why go through the hassle of Gmail authentication? Why not scrape the graph directly?
Nevertheless, great and inspiring article.
Curious how you’d use this data in your keyword research process.
You authenticate so it allows you to download the CSV.
For how it’s being used in my process, check out the SlideShare embedded in the post, it explains 🙂
you got a list of cat ids?
how to modify to loop through cats with a list of keywords??
how long till you get locked out?
does it randomly mix browsers? I see a couple of other scripts on github, mixing up the browser.
Also, do you use RapidMiner? It's similar to KNIME; I find it easier and more intuitive. I did try KNIME, but the scraping add-ons were not working.
Any chance of a tutorial with KNIME or the process?
cheers. Lee
I’ve not pushed it to the point of getting locked out yet. If you hit a limit, I’d love to know. You can use categories by changing the payload variable like (untested):
payload = {'geo': 'US', 'q': [row[0]], 'cat': '0-71'}
It’s not mixing up the browser as far as I’m aware, but the trends downloading is being handled by a 3rd party library which I haven’t dissected.
KNIME is open source and Rapidminer is not, so KNIME is my preference 🙂
I believe pyGtrends doesn’t exist anymore. Instead I did: from pytrends.request import TrendReq.
I then updated connector to be: connector = TrendReq(google_username, google_password, hl='en-US', tz=360, custom_useragent=None)
However, this then leads to problems with connector not having the proper attributes. Any ideas?
Maybe try installing from github? https://github.com/GeneralMills/pytrends
Okay, I just looked at the documentation; ignore my previous comment. They changed some things. You'll have to adjust accordingly, unfortunately.
Hi! Actually, you can download the CSV report without being logged in. I made a crappy Python script with two functions (get_volume_keyword_year, get_volume_keyword_month). Just choose your keyword and that's it.
https://pastebin.com/raw/i89DsQXc
Sorry pastebin broke the indentation
I was looking for exactly that kind of really simple stuff!!
If anybody could elaborate at all on what “adjusting accordingly” means, I’d really appreciate it.
line 3, replace with: from pytrends.request import TrendReq
line 12, replace with: connector = TrendReq(google_username, google_password)
and it works 🙂
thanks for the script
Hey, I updated the script with your suggestions and it’s almost working – now I just get this error however:
Traceback (most recent call last):
File "GoogleTrendsSlopeCalculator.py", line 26, in <module>
connector.request_report(payload)
AttributeError: ‘TrendReq’ object has no attribute ‘request_report’
Might you know how to fix this?
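For anyone hitting that same error: request_report was removed when pytrends reworked its interface, and the replacement is build_payload followed by interest_over_time. Below is a hedged sketch of how the download-and-average step might look against the newer API. fetch_yearly_interest is my own helper name, the timeframe string is an assumption, and exact keyword arguments may vary by pytrends version:

```python
def fetch_yearly_interest(keyword, geo='US'):
    # Sketch against the newer pytrends API (build_payload / interest_over_time),
    # which replaced request_report. The import is deferred so this file loads
    # even on machines without pytrends installed.
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl='en-US', tz=360)
    pytrends.build_payload([keyword], geo=geo, timeframe='today 5-y')
    df = pytrends.interest_over_time()
    # Average the weekly interest values by year, mirroring the original script.
    return df[keyword].groupby(df.index.year).mean()

# Usage (performs a live request, so results will vary):
# yearly = fetch_yearly_interest("python tutorials")
# slope = (yearly.iloc[-1] - yearly.iloc[-2]) / 1
```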