
Automatically Download Google Search Analytics Data Every Month [Updated for New API]

Google Webmaster Tools (recently renamed Google Search Console) is a treasure trove of data, especially its keyword information.

Unfortunately, Google only retains this data for about 90 days, which isn't long enough to be useful for any trending.

If you want to get more use out of Google Webmaster Tools data, you need to store it in a database yourself, which can be challenging without a developer background.

I’ve been storing GWMT data ever since (not provided) came into full swing, and I recommend everyone do so as well.

Here is the Python script I have been using to download this data on a monthly basis…

Note: This script has been updated to use the new Search Analytics API.
Click here to jump to the new script.

Click Here to Download my GWMT Python Script.

To use the script, you will probably want to get some hosting like Amazon AWS or Digital Ocean.

I am using and recommend Digital Ocean since it’s a little bit easier to use than AWS and only runs me $5 per month.

You’ll need to have Python and MySQL installed.

You need to install the MySQLdb Python library, which can be a wonky process. If your hosting runs a Debian-based distribution like Ubuntu, I recommend installing it with apt-get:
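sudo apt-get install python-mysqldb   # Debian/Ubuntu package for the MySQLdb library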

Then you need to install gwmt-dl. To install, SSH into your server and run:
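pip install gwmt-dl   # assuming the package is published on PyPI under this name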

Edit my script to include your MySQL database information and your GWMT login details (see annotations in the script comments on lines 13-21).
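For reference, the block you're editing looks roughly like this; the variable names below are illustrative, not the script's actual identifiers:

db_host = 'localhost'             # MySQL host
db_user = 'gwmt'                  # MySQL user
db_passwd = 'your-password'       # MySQL password
db_name = 'gwmt_data'             # database the script writes to
gwmt_email = 'you@example.com'    # GWMT account email
gwmt_password = 'your-password'   # GWMT account password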

To schedule the script to run monthly, open your crontab for editing:
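crontab -e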

and add this command to the bottom (substitute the path to wherever you saved the script):
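# minute  hour  day-of-month  month  day-of-week  command
0 0 1 * * python /path/to/gwmt_script.py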

Hit ‘Control + X’ and then ‘Y’ to save and exit the cron file.

It will automatically download your Webmaster Tools data at 12:00 AM on the 1st of every month.

You can use a tool like HeidiSQL (Windows) or Sequel Pro (OS X) to run queries and analyze your data.

[Screenshot: GWMT MySQL database structure]

You can even use these tools to export the data to Excel, if that's your cup of tea. I usually throw it into a tool like Tableau or TIBCO Spotfire since I have access to them.
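For example, here is a minimal sketch of the branded vs. non-branded impression split charted below, assuming a hypothetical gwmt_queries table with query and impressions columns (the script's actual schema may differ):

import pymysql  # pip install PyMySQL

# All names here are illustrative -- align them with the real schema.
BRAND_TERMS = ('yourbrand',)  # substrings that mark a query as branded

conn = pymysql.connect(host='localhost', user='gwmt',
                       password='your-password', db='gwmt_data')
with conn.cursor() as cur:
    cur.execute('SELECT query, impressions FROM gwmt_queries')
    branded = nonbranded = 0
    for query, impressions in cur:
        if any(term in query.lower() for term in BRAND_TERMS):
            branded += impressions
        else:
            nonbranded += impressions
conn.close()

print('Branded impressions: %d / Non-branded impressions: %d' % (branded, nonbranded))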

[Chart: branded vs. non-branded impressions]

Disclaimer:

Google announced that they would be discontinuing the original Google Webmaster Tools API on April 20, 2015, and the new API didn't have the ability to download the useful keyword information found in the Search Queries (recently renamed "Search Analytics") report.

Thankfully, they are still allowing you to download this data via a method in a Python script they had previously provided.

Since my script uses a similar method, it is still functional for the time being. I will provide an updated script if and when this one stops working.

Update:

Google is currently testing an updated API for accessing this data from the Search Console. I will release updated code once I am permitted to under the non-disclosure agreement.


Update – 8/5/2015: Search Analytics API

Here’s the updated code using the new v3 API for Google Search Console for Search Analytics. You’ll need to follow the instructions found here under “Run a sample query” to install the necessary libraries and generate a “client_secrets.json” file.
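The library-install step of those instructions boils down to something like this (the exact package list may have changed since this was written):

pip install --upgrade google-api-python-client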

Download the updated Google Search Console Search Analytics Python script.

You'll also have to modify your cron entry accordingly to account for the new command (and date range selection).

For example, I can run the script for my site using the following command:
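(The script name and argument order below are illustrative; check the usage string in the downloaded script for the exact invocation, and substitute your own property URL and date range.)

python search_analytics_dl.py 'https://www.example.com/' '2015-07-01' '2015-07-31'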


Comments

  1. I have a question. What are the benefits of downloading all the search data rather than just looking at the search data provided by Google Universal Analytics?

    • Hey David,

      The data in Google Analytics found under Acquisition->Search Engine Optimization->Queries is the same data found within Google Webmaster Tools. Whether viewed in Webmaster Tools or in Google Analytics, only 90 days of data is available.

      If you’re referring to the data found in Google Analytics under Acquisition->Campaigns->Organic Keywords, then that data is woefully incomplete ever since Google went forward and enabled encrypted search on all searches. You’re likely to find that >90% of your keyword data shows as “(not provided)”, rendering the data useless as a sample.

      Hope that helps!

      -Paul

      • I have a question about this statement. Using the Acquisition->Search Engine Optimization->Queries report in Google Analytics, I get slightly different results than the Keywords tool in Search Console, specifically for CTR.

        Any ideas why?

  2. This is exactly what I was looking for. Thanks for sharing the code. I also made some modifications:
    – I couldn't install MySQLdb, so I installed PyMySQL instead
    – the code can then be tweaked to use PyMySQL by replacing MySQLdb with pymysql (see the sketch below)
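    A minimal sketch of that swap, assuming the script does a plain "import MySQLdb": add these two lines above that import and PyMySQL masquerades as MySQLdb:

    import pymysql
    pymysql.install_as_MySQLdb()  # makes "import MySQLdb" resolve to PyMySQL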

    • Can you please post your solution code here? I ran into the same issue, since Python 3.x has problems with MySQLdb. I was able to successfully install PyMySQL; now the only missing piece is the tweaked file. It would really help a lot. Thanks in advance.

  3. Paul, this write-up is an answered prayer, thank you!

    Now, this might seem like a silly question…could this script be integrated into an AdWords script? E.g., grab keywords and bid down on avg position of X or lower?

    Excited to dive in this week…

  4. Total n00b question, but in the updated Python script for Search Analytics, at what point does the client_secrets.json come into play? I’ve read over the example in Google’s docs and yours as well, but don’t understand at what point the authorization is taking place :-/

    For instance, I’m assuming it’s somewhere here:
    service, flags = sample_tools.init(
        argv, 'webmasters', 'v3', __doc__, __file__, parents=[argparser],
        scope='https://www.googleapis.com/auth/webmasters.readonly')

    But that is *a lot* different from the other example found here: https://developers.google.com/webmaster-tools/v3/quickstart/quickstart-python
    # Copy your credentials from the console
    CLIENT_ID = 'YOUR_CLIENT_ID'
    CLIENT_SECRET = 'YOUR_CLIENT_SECRET'

    # Check https://developers.google.com/webmaster-tools/v3/ for all available scopes
    OAUTH_SCOPE = 'https://www.googleapis.com/auth/webmasters.readonly'

    # Redirect URI for installed apps
    REDIRECT_URI = 'urn:ietf:wg:oauth:2.0:oob'

    # Run through the OAuth flow and retrieve credentials
    flow = OAuth2WebServerFlow(CLIENT_ID, CLIENT_SECRET, OAUTH_SCOPE, REDIRECT_URI)
    authorize_url = flow.step1_get_authorize_url()
    print 'Go to the following link in your browser: ' + authorize_url
    code = raw_input('Enter verification code: ').strip()
    credentials = flow.step2_exchange(code)

    # Create an httplib2.Http object and authorize it with our credentials
    http = httplib2.Http()
    http = credentials.authorize(http)

    webmasters_service = build('webmasters', 'v3', http=http)

    • Nice Alex. Anyone needing to download to CSV can use your script. My script actually dumps to CSV and then writes to SQL, but I think people will appreciate not having to modify/write their own code 🙂

  5. Hey, thanks for providing the script. I got the updated version to work without much hassle, but am still facing a problem. For most URLs, I get the following error message:

    googleapiclient.errors.HttpError:

    It seems to be working for home pages only. For example, I can access the data for http://www.example.com/, but will receive the given error message for http://www.example.com/directory/index.html. Does anyone have an idea why that is and what to do about it?

  6. What about the ability to write "top queries" to one file/table and "top pages" to a separate file/table? How would your script need to be modified to accomplish that?