in Programming, Search Engine Optimization

Calculate Keyword Growth Using Google Trends, Python, and the Slope Formula

During a presentation I gave at Distilled’s SearchLove Boston conference in early May, I advocated that people use the slope formula and Google Trends data to determine if interest keywords have grown over time or if they are slipping away into searcher oblivion.

The only problem with this, is that unfortunately Google doesn’t provide an official API for Google Trends, so we need some Python wizardry to do this in bulk.

Make sure you have the pandas Python library installed. If you’re running Windows, using a Python Distribution like Anaconda will make this whole lot easier.

Next, install the pytrends library using pip:

pip install pytrends

Awesome! You’re almost there.

Next, create a CSV file with a single column named, Keywords (it’s case sensitive). In this column, enter all the keywords for which you would like to know their slope.

Download the following script to the same folder as your csv file with the keywords.

The Python Script:

# Requires pandas. For Windows users, I recommend installing the Anaconda Python distirbution.
# Requires the pytrends library. To install, run "pip install pytrends".
from pytrends.pyGTrends import pyGTrends
import time
import os
from random import randint
import pandas as pd

# Add your Gmail username to the google_username variable and your Gmail password to the google_password variable.
google_username = ""
google_password = ""
connector = pyGTrends(google_username, google_password)

# This script downloads a series of CSV files from Google Trends. Please specify a filepath for where you'd like these files to be stored in the below variable.
path = ""

# Specify the filename of a CSV with a list of keywords in the variable, keyordcsv. The CSV should be one column, with header equal to Keywords (case sensitive).
keywordcsv = "keywords.csv"
keywords = pd.read_csv(keywordcsv)

# Downloads and Calculate Slope:
keywordlist = pd.DataFrame(columns=["keyword","slope"])
for index, row in keywords.iterrows():
    print("Downloading Keyword #" + str(index))
    payload = {'geo': 'US', 'q': [row[0]]}
    connector.request_report(payload)
    time.sleep(randint(5, 10))
    connector.save_csv(path, str(index))
    csvname = str(index)+".csv"
    trenddata = pd.read_csv(csvname, skiprows=4, names=['date', 'values'])
    keyword = trenddata['values'].loc[[0]][0]
    trenddata = trenddata.ix[1:]
    trenddata['keyword'] = keyword
    trenddata.rename(columns={'values': 'trends'}, inplace=True)
    trenddata['trends'] = pd.to_numeric(trenddata['trends'], errors='coerce')
    trenddata['date'] = trenddata['date'].str.extract('(^[0-9]{4}\-[0-9]{2}\-[0-9]{2}) \-.*')
    trenddata = trenddata.dropna()
    trenddata['date'] = pd.to_datetime(trenddata['date'])
    trenddata['year'] = pd.DatetimeIndex(trenddata['date']).year
    trenddata['month'] = pd.DatetimeIndex(trenddata['date']).month
    trenddata['day'] = pd.DatetimeIndex(trenddata['date']).day

    maxyear = trenddata['year'].max()
    grouped = trenddata.groupby(['year']).mean()

    def slope_formula(xone, yone, xtwo, ytwo):
        return (ytwo-yone)/(xtwo-xone)

    maxyear = trenddata['year'].max()
    grouped = trenddata.groupby(['year']).mean()
    slope = slope_formula(1,float(grouped.loc[grouped.index==maxyear-2]['trends']),
                          2,float(grouped.loc[grouped.index==maxyear-1]['trends']))
    keywordlist = keywordlist.append({'keyword':keyword,'slope':slope}, ignore_index=True)
    os.remove(csvname)

# Specify a csv filename to output the slope values.
keywordlist.to_csv("trends_slope.csv", sep=",", encoding="utf-8", index=False)

print("Slope calculation and CSV export complete.")

Add your Google username to the google_username variable and your Google password to the google_password variable.

Run the Python script:

python GoogleTrendsSlopeCalculator.py

It will take some time, depending on the number of keywords you’re examining. For each keyword, the script downloads an export from Google Trends. The files are stored temporarily and then deleted.

What’s the Deal with Slope?

Slope calculates the change in y or the change in x on a graph, or rise over run. In our case, we are looking at the change in search interest over a change in time, so we use slope.

slope formula

Looking at historic search interest, allows us to account for typical seasonality. If Google Keyword Planner provided search volume data for more than a single year, we would use that data instead, but since they do not, we rely on Google Trends data.

For example, Keyword X has an average search interest of 49 this year, but the previous year it had an average search interest of 45.

The slope would be calculated as 49-45/2016-2015, which would equal 4, indicating small, but positive growth.

Google Trends Trend Lines Aren’t Straight!

If you’re savvy (or remember elementary school math), you know that in order to calculate slope from one year to another like we want, it would require use to have a straight line. Since our trend line will never be perfectly straight, we solve for this by averaging the search interest values from Google Trends for each year.

sample google trends graph

TL;DR

We can use the slope formula to determine whether a keyword has grown or not in search interest. In order to accomplish the slope calculation for a collection of keywords, you can use my nifty Python script.

Write a Comment

Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

12 Comments

  1. Why go through the hassle of Gmail authentication. Why not scrape the graph directly ?

    Nevertheless, Great and inspiring article.

    Curious how you’d use this data in your keyword research process.

  2. you got a list of cat ids?

    how to modify to loop through cats with list of keyowrds??
    how long till you get locked out?
    does it randomly mix browsers? I see a couple of other scripts on github, mixing up the browser.

    Also do you use rapidminer? Its similar to Knime, I find it easier more intuative. I did try Knime, but the scraping addons where not working.

    Any chance of a tut with Knime or the process??

    cheers. Lee

    • I’ve not pushed it to the point of getting locked out yet. If you hit a limit, I’d love to know. You can use categories by changing the payload variable like (untested):

      payload = {'geo': 'US', 'q': [row[0]], 'cat': '0-71'}

      It’s not mixing up the browser as far as I’m aware, but the trends downloading is being handled by a 3rd party library which I haven’t dissected.

      KNIME is open source and Rapidminer is not, so KNIME is my preference 🙂

  3. I believe pyGtrends doesn’t exist anymore. Instead I did: from pytrends.request import TrendReq.

    I then updated connector to be: connector = TrendReq(google_username, google_password, hl=’en-US’, tz=360, custom_useragent=None)

    However, this then leads to problems with connector not having the proper attributes. Any ideas?

  4. line 3 replace : from pytrends.request import TrendReq
    line 12 replace : connector = TrendReq(google_username, google_password)

    and its work .. 🙂

    thanks for the script

    • Hey, I updated the script with your suggestions and it’s almost working – now I just get this error however:
      Traceback (most recent call last):
      File “GoogleTrendsSlopeCalculator.py”, line 26, in
      connector.request_report(payload)
      AttributeError: ‘TrendReq’ object has no attribute ‘request_report’

      Might you know how to fix this?