Technical SEO: Additional Tips

Learn some additional technical SEO tips: using crawl budget effectively, robots.txt, meta robots, hreflang, and structured data to optimize your website for search engines and users.

There are some additional factors that can also help technically optimize a website. Not all of them may apply to every website, so we only have to deal with the ones that apply to our project.

Crawl budget

Crawl budget refers to the number of pages on our site that search bots will crawl and index within a given timeframe. Since crawl budget is limited, we need to make sure all of our important content is being crawled and indexed efficiently.

Since Googlebot’s main priority is crawling and indexing, most webmasters won’t need to worry about their crawl budget [40]. Google will handle crawling their pages on its own. We only need to think about the crawl budget in the following cases:

  • If we have a new website with lots of pages (1,000+), as is the case with e-commerce websites.

  • If we have a large website with millions of pages.

  • If our website is updated frequently and it matters that the indexed version is fresh, as is the case with news websites.

We can monitor our website’s crawling activity by using Google Search Console’s Crawl Stats Report. It gives us stats on Googlebot’s crawling history of our website. Google offers a guide on how to use the Crawl Stats Report, and explicitly states that we shouldn’t need to use the report if our website has fewer than a thousand pages [41].

We can maximize our crawl budget by:

  • Improving our site’s speed, as advised by Google itself [42].

  • Removing or applying 301 redirects to duplicate content.

  • Keeping our sitemap updated and resubmitting the latest version to Google (a minimal sitemap snippet appears after this list).

  • Fixing broken links.

  • Fixing long redirect chains (with more than one redirect between starting URL and destination URL) since they negatively impact crawling and indexing.

  • Getting rid of outdated or low-quality content.

  • Using internal links, because Google prioritizes pages with more links pointing to them. Since backlinks aren’t completely in our control, we can fill the gaps with internal links.
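
To illustrate the sitemap point above, here is a minimal sketch of a sitemap.xml file in the standard sitemaps.org format; the URL and date are placeholders chosen for illustration.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; <lastmod> tells crawlers when the page last changed -->
  <url>
    <loc>https://www.yourdomain.com/important-page</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>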

The robots.txt file

Setting up the robots.txt file correctly is another plus for our website’s technical SEO. robots.txt, which implements the Robots Exclusion Protocol, is the first thing Googlebot will try to retrieve to check its permission to crawl the website’s pages.

A robots.txt file must be placed in the topmost directory of the website and has a URL similar to the one below:

http://www.yourdomain.com/robots.txt

The page appears similar to the one below:

[Image: Showcasing the robots.txt file]

Reasons for having robots.txt

If a website does not have a robots.txt file, all the pages will generally be crawled in the normal way [43]. So it’s easy to understand why most websites don’t need a robots.txt file. Without one, Google and other search engines will simply crawl and index the pages on our site that they can find.

That being said, there are multiple benefits of having a robots.txt file:

  • Block pages on our website: It allows us to block certain pages on our website that we don’t want indexed and available to the public. These may be login pages or a staging site that we don’t want showing up in search results. We can Disallow them in the robots.txt file (see the example file after this list).

  • Use crawl budget effectively: If there are low-quality pages that are eating away our crawl budget and preventing the important pages from getting indexed, we can Disallow the unimportant content in this file. This helps reserve our crawl budget for the pages that matter to our website.

  • Prevent crawl of duplicate pages: Using the file, we can prevent the crawling of duplicate content.

  • Prevent crawling of resources: We can Disallow resources, such as PDFs, images, and videos.

  • Include a sitemap in robots.txt: Including the path to our site’s sitemap in the robots.txt file, by adding the line "Sitemap: https://yourdomain.com/my_sitemap.xml" anywhere in the file, is another way to make it discoverable to search engines, aside from submitting it using the Sitemaps report.
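
Putting these points together, here is a minimal sketch of what such a robots.txt file might look like; the paths are hypothetical and the sitemap URL matches the placeholder used above.

User-agent: *
# Keep private and low-value pages out of the crawl
Disallow: /login/
Disallow: /staging/
# Skip resources we don't want crawled, such as PDF downloads
Disallow: /downloads/
# Make the sitemap discoverable
Sitemap: https://yourdomain.com/my_sitemap.xml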

Utilizing robots.txt correctly

If we choose to include a robots.txt file on our website, we should do so with care. Having it on our website without completely understanding how it works may end up unintentionally blocking important files, or worse, our entire site!

For example, the directive

User-agent: *
Disallow: /

with nothing after the / tells all crawlers to stay away from our entire site. Be careful!
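
To make the distinction explicit, the following sketch contrasts the two rules; a lone slash blocks every URL on the site, while an empty Disallow value blocks nothing.

# Blocks the entire site: every URL path begins with "/"
User-agent: *
Disallow: /

# Blocks nothing: an empty value places no restriction on crawling
User-agent: *
Disallow: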

Here are some tips:

Follow Google’s guide

Google provides instructions on how to create the robots.txt file. Add rules correctly to the file using the table provided in the robots.txt guide by Google [44].

New line for each directive

Insert each directive in a new line.

For example, instead of writing User-agent: * Disallow: /pmax Disallow: /pmin Disallow: /prefn1 on a single line, write the following:

User-agent: *
Disallow: /pmax
Disallow: /pmin
Disallow: /prefn1

Audit for errors

It’s easy to make mistakes in the robots.txt file, and just as hard to spot them. We can be left scratching our heads over why our site isn’t ranking even after months of effort, while the culprit is a tiny error in the robots.txt file that went unnoticed. That’s exactly why we need to use Google’s Robots Testing Tool to check that our robots.txt file is error-free.

The URL Inspection Tool in Google Search Console allows us to check individual URLs and see whether they’re blocked from Google by robots.txt.

Meta robots

Closely related to the robots.txt file is the meta robots tag. While the robots.txt file allows us to limit bot activity at the site level, meta robots allows us to control it at the page level. We have already covered the concept in the lesson “Meta Tags Optimization” of the chapter “On Page SEO.” By using specific directives in the tag, we can instruct crawlers not to index the page, not to follow links on it, or both. Google provides a guide on using the robots meta tag [45].
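
As a quick refresher, here is what the tag looks like in a page’s <head>; the combination of directives shown is just one possible choice.

<!-- Placed in the <head> of an individual page -->
<!-- noindex: don't show this page in search results -->
<!-- nofollow: don't follow the links on this page -->
<meta name="robots" content="noindex, nofollow">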

Hreflang

Do we have an international website that targets people who speak different languages and live in different countries? A multilingual website has versions of the same content in different languages. But how can we be sure that the Spanish version of a page appears to the person making the search query from a Spanish-speaking country?

A person searching for a brand in France on Google will see a URL of the form https://www.example.com/fr-FR as the first search result.

[Image: Showcasing the Google search interface in French]


But a person searching for the same brand in Germany will see a URL of the form https://www.example.com/de-DE as the first search result.

[Image: Showcasing the Google search interface in German]

Though Google itself states that its algorithms are advanced enough to detect the language used on the page, the same source also directs webmasters to mark up the different versions of their pages to help Google Search direct users to the correct version by language or region [46].

That’s where the hreflang tag helps us. It’s an HTML tag that helps search engines serve multilanguage, multi-region content to searchers. It also eliminates the possible problem of duplicate content. Even if we have similar content for US and UK English speakers, Google will know that the pages are tagged for different regions and are not duplicates.

How to set up the hreflang tag

Thanks to Aleyda Solis’s Hreflang Generator Tool, we don’t have to worry about coding the tag ourselves. Just fill in the form as shown below, entering the appropriate language and region, and press “GENERATE THE HREFLANG TAGS FOR THESE URLS”. Copy and paste the generated hreflang annotations into the <head> element of our pages’ HTML and we’re good to go.

[Image: Showcasing Aleyda Solis’s Hreflang Generator Tool]

The example in the image above has two variations of the page:

Page URL                    Description
https://example.com/es      Spanish homepage
https://example.com         Default homepage

example.com is the default page where we want all language traffic (other than Spanish) to be directed. Spanish speakers are to be directed to the page variation example.com/es. The complete code generated by the tool,

<link rel="alternate" href="https://example.com/es" hreflang="es" />
<link rel="alternate" href="https://example.com" hreflang="x-default" />

will be added to the HTML of both pages.

Similarly, if a page has four variations:

Page URL                      Description
https://example.com/es        Spanish homepage
https://example.com/en-gb     UK homepage
https://example.com/en-us     US homepage
https://example.com           Default homepage

Spanish-speaking searchers are directed to the Spanish homepage, UK searchers to the UK homepage, US searchers to the US homepage, and all other searchers (with languages and regions not listed in the hreflang tags for this set of page variations) to the default homepage. The following complete hreflang annotation code should appear in the HTML of each of the page variations:

<link rel="alternate" href="https://example.com/es" hreflang="es" />
<link rel="alternate" href="https://example.com/en-gb" hreflang="en-gb" />
<link rel="alternate" href="https://example.com/en-us" hreflang="en-us" />
<link rel="alternate" href="https://example.com" hreflang="x-default" />

Once we’ve set up the hreflang tags for our pages, we can debug them for the most common errors using Google’s International Targeting report.

Note: The <head> section of the HTML code isn’t the only place we can specify the hreflang annotations. Alternatively, we can specify them in the HTTP header of the page or in our website’s sitemap. It is enough to specify them at any one of these locations. Google provides complete guidelines for the code format used at each location [46].
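
As a rough sketch of those two alternatives for the Spanish/default pair above (the exact formats are described in Google’s guidelines [46]): the first line is an HTTP response header, and the second snippet is a sitemap entry, which requires the xmlns:xhtml="http://www.w3.org/1999/xhtml" namespace on the <urlset> element.

Link: <https://example.com/es>; rel="alternate"; hreflang="es", <https://example.com>; rel="alternate"; hreflang="x-default"

<url>
  <loc>https://example.com</loc>
  <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com"/>
</url>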


Structured data

Structured data uses code, called schema markup, to label the elements on our web pages, such as videos, images, products, and recipes, to help search bots understand the content.

This specialized code includes key details, such as title, description, length of a video, rating, price of a product and more. Structured data qualifies our content to appear as a rich result, with all these key details, making it more clickable to searchers.
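
For a sense of what this markup looks like, here is a minimal, hypothetical JSON-LD snippet for a product page; every value is a placeholder, and a real page would use its own details.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "description": "A short description of the product.",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "120"
  },
  "offers": {
    "@type": "Offer",
    "price": "29.99",
    "priceCurrency": "USD"
  }
}
</script>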

How to add structured data

If we don’t want to get into the technical details or hire a web developer to do it for us, Google has a handy Structured Data Markup Helper. As the first step, as shown in the picture below, we enter the page URL and pick the most fitting category for the page.

[Image: Showcasing Google’s Structured Data Markup Helper]

Next, the tool will display the page on the left and the tags on the right, as shown in the picture below. Click each element on the page and select the suitable tag from the drop-down menu that appears.

[Image: Showcasing Google’s Structured Data Markup Helper using educative.io]

As we tag the elements, the tags appear on the right, as shown in the picture above. Once all the elements are tagged, click “create HTML” in red at the top right of the page. Google will generate JSON-LD structured data for us.

[Image: Showcasing Google’s Structured Data Markup Helper using educative.io]

Add the generated code to the <head> section of our page’s HTML.

Test the code

After adding the structured data to our page’s HTML, don’t forget to validate it. Google recommends using its Rich Results Test to check how our structured data works and which rich results can be generated from the code embedded in our page. It also shows what the rich result will look like in search results. For generic code validation, use the Schema Markup Validator.

Test your knowledge

Choose the best option for the following question.

1. What is the primary purpose of the robots.txt file?

  A) To improve website loading speed
  B) To create structured data for search engines
  C) To instruct search engines how to crawl website pages
  D) To optimize the mobile-friendliness of a website
