In this day and age, you can’t find a modern SEO guide that doesn’t recommend schema markup as part of your overall search strategy, and for good reason. Yet schema markup, which has been supported by all the major search engines (Google, Bing, Yahoo, DuckDuckGo, etc.) for the better part of seven years now, has abysmal adoption rates. Perceptions of it run the gamut from intimidating to superfluous, but in my conversations with the SEO community, it is most often avoided simply because of the time it takes to implement. Like writing meta titles and descriptions, schema markup is a highly contextual, backend aspect of SEO that isn’t an inherent ranking factor, so the returns of adopting it are outweighed by the time it takes to implement across hundreds or thousands of URLs. But what if there were a way to automate the process?
In this instance, I’ll be using VideoObject as the main focus for automating schema in the SEO process, though this framework can be reworked to automate the creation of schema for products, articles, or other properties. As in my other articles, I’ll be using Python as the main language for automating the process.
What you will need for this code to work for you:
- An Excel file full of YouTube videos and their corresponding links for your site
- Genson, a JSON schema building library
- Selenium WebDriver, for automating some browser activity used in the code
- BeautifulSoup, an HTML parser to scrape parts of YouTube
- A few other staples (pandas, Requests, python-dateutil, and the standard json library)
What You Need for VideoObject Schema
The minimum required properties for valid VideoObject schema markup are:
- description: A description detailing the content of the video
- name: The title of the video
- thumbnailUrl: A URL pointing to the video’s thumbnail image
- uploadDate: The date the video was published, in ISO 8601 format
Fortunately, this script hits all of these requirements in one go.
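For reference, here is the shape of the finished JSON-LD the script works toward, sketched as a Python dict with placeholder values (the script’s actual output is a json.dumps() of a structure like this):

```python
# Target VideoObject JSON-LD structure, with placeholder values
video_object = {
    "@context": "http://schema.org",
    "@type": "VideoObject",
    "name": "Video title",
    "description": "A description detailing the content of the video",
    "thumbnailUrl": "https://img.youtube.com/vi/VIDEO_ID/maxresdefault.jpg",
    "uploadDate": "2019-05-01T00:00:00",
    "embedUrl": "https://www.youtube.com/embed/VIDEO_ID",
}
```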
How to Run the VideoObject Generator
Load in Your Data
To start automating your VideoObject schema, format your file to look like the following.
Feel free to add a few additional pieces of information in this file for context (and to make the process easier to hand off to a tech team or other team members on the project). The main columns you will need are Video URL and Embed URL.
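As an example, with placeholder video IDs, the layout might look like this:

| Video URL | Embed URL |
| --- | --- |
| https://www.youtube.com/watch?v=VIDEO_ID_1 | https://www.youtube.com/embed/VIDEO_ID_1 |
| https://www.youtube.com/watch?v=VIDEO_ID_2 | https://www.youtube.com/embed/VIDEO_ID_2 |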
```python
from genson import SchemaBuilder
import dateutil.parser as parser
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
from selenium import webdriver

# Load in your file
xlsx = pd.ExcelFile('Meta_Data.xlsx')
video_data = pd.read_excel(xlsx, 'Video Data')
youtube = video_data['Video URL']
link = video_data['Embed URL']
```
Grabbing the Thumbnail Image
```python
def Thumbnail_Pull(url):
    # Run Chrome headless so no browser window opens
    options = webdriver.ChromeOptions()
    options.add_argument("headless")
    cpath = 'C:\\Webdriver\\chromedriver'
    driver = webdriver.Chrome(options=options, executable_path=cpath)
    # Load the page and capture the resolved (canonical) URL
    driver.get(url)
    url_main = driver.current_url
    driver.quit()
    # Strip the URL down to the bare video ID
    i_id = url_main.replace('https://www.youtube.com/watch?v=', '').replace('&feature=youtu.be', '')
    # YouTube serves each video's thumbnail at a predictable image URL
    thumbnail = 'https://img.youtube.com/vi/' + i_id + '/maxresdefault.jpg'
    return thumbnail
```
Using Selenium, the script takes the URL from your spreadsheet, loads the page, and captures the resolved source URL. This ensures that a consistent URL is used for the thumbnail (rather than a vanity URL, redirect, or other URL variation). From there, the code strips out the video ID and constructs a URL that returns the same thumbnail present on your YouTube videos.
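As a quick sanity check, you can run the function on a single share-style link (a hypothetical example) and confirm it resolves to the expected thumbnail URL:

```python
# Hypothetical example: a youtu.be share link resolves to the canonical
# watch URL, and the video ID maps to its thumbnail image
thumb = Thumbnail_Pull('https://youtu.be/VIDEO_ID')
print(thumb)  # https://img.youtube.com/vi/VIDEO_ID/maxresdefault.jpg
```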
And the Rest of the Data
```python
def YouTube_Data_Crawl(url):
    data = []
    r = requests.get(url)
    content = r.content
    soup = BeautifulSoup(content, 'lxml')
    # Retrieve the title
    try:
        title = soup.select('.watch-title')
        title = title[0].getText()
        title = title.strip().replace('\n', ' ')
    except IndexError:
        title = 'null'
    # Retrieve the description
    try:
        des = soup.find('p', {'id': 'eow-description'})
        description = des.text
        description = description.replace('\n', '-').replace('\r', ' ').replace('\t', ' ').replace('"', '')
    except (IndexError, AttributeError):
        description = 'null'
    # Retrieve the date and convert to ISO format
    try:
        dates = soup.find('div', {'id': 'watch-uploader-info'})
        date_fake = dates.text.replace('Published on ', '').replace('Uploaded on ', '')
        date = parser.parse(date_fake)
        date = date.isoformat()
    except (IndexError, ValueError, AttributeError):
        date = 'null'
    data.append((title, description, date))
    return data
```
The code requests the URL and uses BeautifulSoup to parse out the title and description of the video. It also grabs the published date and converts it into ISO 8601 format using date.isoformat(). All the data is then placed into a list and returned.
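Called on a single video (hypothetical URL), the return value is a one-element list of (title, description, date) tuples:

```python
# Hypothetical example showing the shape of the returned data
rows = YouTube_Data_Crawl('https://www.youtube.com/watch?v=VIDEO_ID')
print(rows)  # [('Video title', 'Video description', '2019-05-01T00:00:00')]
```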
Putting it All Together
```python
### Schema Build ###
def SchemaBuild(des, name, date, thumbnailURL, embedded):
    builder = SchemaBuilder()
    builder.add_schema({"@type": "VideoObject"})
    builder.add_schema({"description": des})
    builder.add_schema({"name": name})
    builder.add_schema({"thumbnailUrl": thumbnailURL})
    builder.add_schema({"uploadDate": date})
    builder.add_schema({"embedUrl": embedded})
    meta_data = builder.to_schema()
    # Swap Genson's default $schema key for the @context key
    # that schema.org markup expects
    meta_data['$schema'] = 'http://schema.org'
    meta_data["@context"] = meta_data['$schema']
    del meta_data['$schema']
    meta = json.dumps(meta_data)
    return meta
```
The final function uses Genson’s SchemaBuilder to construct the JSON metadata from the data collected in the previous steps. Genson’s default $schema key is then swapped out for the @context key that schema.org expects, and the result is serialized with json.dumps() into a fully usable JSON-LD string.
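A quick sketch of what calling it looks like, with hypothetical values:

```python
# Hypothetical values; the output is a JSON-LD string ready for a page
meta = SchemaBuild(
    des='A description detailing the content of the video',
    name='Video title',
    date='2019-05-01T00:00:00',
    thumbnailURL='https://img.youtube.com/vi/VIDEO_ID/maxresdefault.jpg',
    embedded='https://www.youtube.com/embed/VIDEO_ID',
)
# meta is a string along the lines of:
# {"@type": "VideoObject", "description": "...", "name": "Video title", ...,
#  "@context": "http://schema.org"}
```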
```python
frames = []
for video, embedded in zip(youtube, link):
    img = Thumbnail_Pull(url=video)
    vid_data = YouTube_Data_Crawl(url=video)
    df_main = pd.DataFrame(vid_data, columns=['Title', 'Description', 'Date'])
    title = df_main['Title'].iloc[0]
    description = df_main['Description'].iloc[0]
    date = df_main['Date'].iloc[0]
    metadata = SchemaBuild(des=description, name=title, thumbnailURL=img,
                           date=date, embedded=embedded)
    frames.append(metadata)

# Save the raw schema output, then merge it back into the original sheet
data_list = pd.DataFrame(frames, columns=['VideoObject Schema'])
data_list['Video'] = youtube
data_list.to_excel('schema_data_raw.xlsx')
video_data['Video Meta Data'] = data_list['VideoObject Schema']
video_data.to_excel('video_meta_data.xlsx')
```
The final product is added to your initial spreadsheet as a brand new column, lining up with its corresponding video.
From here, you can hard-code the JSON-LD directly into the pages you are optimizing or deploy it through a plugin in your CMS.
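If you go the hard-coding route, the JSON-LD belongs in a script tag of type application/ld+json. A minimal hypothetical helper (not part of the script above) to wrap the generated string:

```python
# Hypothetical helper: wrap the generated JSON-LD string in the
# <script> tag format that search engines read from the page
def to_script_tag(meta_json):
    return '<script type="application/ld+json">\n' + meta_json + '\n</script>'
```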
For the full script, see my GitHub repo and feel free to reach out to me directly with any questions on this script or process!