Technical SEO—Additionals
Learn some additional technical SEO tips: using crawl budget effectively and applying robots.txt, meta robots, hreflang, and structured data to optimize our websites for search engines and users.
Some additional factors can also help optimize a website technically. Not all of them will apply to every website; we only have to deal with the ones that apply to our project.
Crawl budget
Crawl budget refers to the number of pages on our site that search bots will crawl and index within a given timeframe. Since crawl budget is limited, we need to know whether all of our important content is being crawled and indexed efficiently.
Since Googlebot’s main priority is crawling and indexing, most webmasters won’t need to worry about their crawl budget [40]. Google will handle crawling their pages on its own. We only need to think about the crawl budget in the following cases:
If we have a new website with lots of pages (1000+) as is the case with e-commerce websites.
If we have a large website with millions of pages.
If our website is updated frequently and it matters that the indexed version is fresh, as is the case with news websites.
We can monitor our website’s crawling activity by using Google Search Console’s Crawl Stats Report. It gives us stats on Googlebot’s crawling history of our website. Google offers a guide on how to use the Crawl Stats Report, and explicitly states that we shouldn’t need to use the report if our website has fewer than a thousand pages [41].
We can maximize our crawl budget by:
Improving our site’s speed, as advised by Google itself [42].
Removing or applying 301 redirects to duplicate content.
Keeping our sitemap updated and resubmitting the latest version to Google.
Fixing broken links.
Fixing long redirect chains (with more than one redirect between starting URL and destination URL) since they negatively impact crawling and indexing.
Getting rid of outdated or low-quality content.
Using internal links, because Google prioritizes pages with more links pointing towards them. Since backlinks aren’t completely in our control, we can fill the gaps with internal links.
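To make the redirect-chain tip concrete, here is a minimal Python sketch that flags chains with more than one redirect between the starting URL and the destination. It assumes we have already exported our redirects as a source-to-target mapping; the names `redirect_chain`, `find_long_chains`, and `EXAMPLE_REDIRECTS`, as well as the sample paths, are hypothetical and used only for illustration.

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow a source -> target redirect mapping from start_url.

    Returns every URL visited, starting URL first. Stops at a
    redirect loop or after max_hops redirects.
    """
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        next_url = redirects[chain[-1]]
        chain.append(next_url)
        if next_url in seen:  # redirect loop, stop following
            break
        seen.add(next_url)
    return chain


def find_long_chains(redirects):
    """Return chains with more than one redirect between start and destination."""
    long_chains = {}
    for url in redirects:
        chain = redirect_chain(url, redirects)
        if len(chain) - 1 > 1:  # number of hops is one less than URLs visited
            long_chains[url] = chain
    return long_chains


# Hypothetical redirect map: /old needs two hops to reach /newest.
EXAMPLE_REDIRECTS = {
    "/old": "/newer",
    "/newer": "/newest",
    "/promo": "/sale",
}

if __name__ == "__main__":
    for start, chain in find_long_chains(EXAMPLE_REDIRECTS).items():
        print(start, "->", " -> ".join(chain[1:]))
```

A flagged chain can usually be fixed by pointing the starting URL straight at the final destination.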
The robots.txt file
Setting up the robots.txt file correctly is another plus for our website’s technical SEO. The robots.txt file, which implements the Robots Exclusion Protocol, is the first thing Googlebot tries to retrieve to learn which of the website’s pages it may crawl.
A robots.txt file must be placed in the topmost directory of the website and has a URL similar to the one below:
http://www.yourdomain.com/robots.txt
The page appears similar to the one below:
Reasons for having robots.txt
If a website does not have a robots.txt file, all the pages will generally be crawled in the normal way [43]. So it’s easy to understand why most websites don’t need a robots.txt file: without one, Google and other search engines are free to crawl and index all the pages they find on our site.
That being said, there are multiple benefits of having a robots.txt file:
Block pages on our website: It allows us to keep crawlers away from pages that we don’t want showing up in search results, such as login pages or a staging site. We can Disallow them in the robots.txt file.
Use crawl budget effectively: If low-quality pages are eating away our crawl budget and preventing the important pages from getting indexed, we can Disallow the unimportant content in this file. This helps reserve our crawl budget for the pages that matter to our website.
Prevent crawl of duplicate pages: Using the file, we can prevent the crawling of duplicate content.
Prevent indexing resources: We can Disallow resources such as PDFs, images, and videos.
Include sitemap in robots.txt: Adding the line "Sitemap: https://yourdomain.com/my_sitemap.xml" anywhere in the robots.txt file is another way to make our sitemap discoverable to search engines, aside from submitting it using the Sitemaps report.
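As a rough illustration of the benefits above, the following Python sketch parses a small robots.txt with the standard-library `urllib.robotparser` module and checks which URLs a crawler may fetch. The file contents, the `/login` and `/staging/` paths, and the `parse_robots` helper are assumptions made for this example. Python’s parser only approximates how Googlebot interprets the file, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a login page and a staging area,
# and advertise the sitemap location.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /staging/
Sitemap: https://yourdomain.com/my_sitemap.xml
"""


def parse_robots(text):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


PARSER = parse_robots(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "https://yourdomain.com/login",
        "https://yourdomain.com/staging/home",
        "https://yourdomain.com/products",
    ):
        verdict = "allowed" if PARSER.can_fetch("Googlebot", url) else "blocked"
        print(url, verdict)
    print("Sitemaps:", PARSER.site_maps())
```

Keep in mind that Disallow controls crawling. To keep an individual page out of the index, the meta robots tag covered later in this lesson is the more direct tool.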
Utilizing robots.txt correctly
If we choose to include a robots.txt file on our website, we should do so with care. Adding it without completely understanding how it works may end up unintentionally blocking important files or, worse, our entire site!
For example, the rules
User-agent: *
Disallow: /
with nothing after the / sign, tell all crawlers not to crawl any page on our site. Be careful!
Here are some tips:
Follow Google’s guide
Google provides instructions on how to create the robots.txt file. Add rules correctly to the file using the table provided in the robots.txt guide by Google [44].
New line for each directive
Insert each directive on a new line.
For example, instead of User-agent: * Disallow: /pmax Disallow: /pmin Disallow: /prefn1 on a single line, write the following:
User-agent: *
Disallow: /pmax
Disallow: /pmin
Disallow: /prefn1
Audit for errors
It’s easy to make mistakes in the robots.txt file, and hard to spot them. We can be left scratching our heads over why our site isn’t ranking even after months of effort, and the culprit can be a tiny error in the robots.txt file that went unnoticed. That’s exactly why we need to use Google’s Robots Testing Tool to check if our robots.txt file is error-free.
The URL Inspection Tool in Google Search Console lets us check individual URLs and see whether each one is blocked to Google by robots.txt.
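Alongside Google’s tools, a quick local check can catch the most damaging mistakes before a robots.txt file goes live. The sketch below is only an approximation: it uses Python’s standard-library `urllib.robotparser`, which does not support the wildcard patterns that Googlebot understands, and the helper `blocked_urls`, the URL list, and both sample files are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that robots_txt blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Pages we always want crawled (hypothetical).
IMPORTANT_URLS = [
    "https://yourdomain.com/",
    "https://yourdomain.com/products/shoes",
]

# The rule we meant to write, and the same file with the path left off.
INTENDED = "User-agent: *\nDisallow: /pmax\n"
MISTAKE = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    print("Intended file blocks:", blocked_urls(INTENDED, IMPORTANT_URLS))
    print("Mistaken file blocks:", blocked_urls(MISTAKE, IMPORTANT_URLS))
```

If the returned list is not empty for our key pages, the file needs another look before it is published.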
Meta robots
Closely related to the robots.txt file is the meta robots tag. While the robots.txt file lets us limit bot activity at the site level, the meta robots tag lets us control it at the page level. We have already covered the concept in the lesson "Meta Tags Optimization" of the chapter "On Page SEO." By using specific directives in the tag, we can instruct crawlers not to index the page, not to follow links on it, or both. Google provides a guide on using the robots meta tag [45].
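To see which directives a page actually carries, we can read its meta robots tag programmatically. The following is a minimal sketch using Python’s standard-library `html.parser`; the class `MetaRobotsParser`, the helper `robots_directives`, and the sample page are assumptions for illustration, and the sketch only looks at the generic `robots` meta name, not crawler-specific ones.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks crawlers not to index it or follow its links.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```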
Hreflang
Do we have an international website that targets people that speak different languages and live in different countries? A multilingual website has versions of the same content in different languages. But how can we be sure that the Spanish version of the page appears to the person making the search query from a Spanish-speaking country?
A person searching for a brand in France on Google will see a URL of the form https://www.example.com/fr-FR as the first search result.
But a person searching for the same brand in Germany will see a URL of the form https://www.example.com/de-DE as the first search result.
Though Google itself states that its algorithms are advanced enough to detect the language used on the page, the same source also directs webmasters to mark up the different versions of their pages to help Google Search direct users to the correct version by language or region [46].
That’s where the hreflang tag helps us. It’s an HTML tag that helps search engines serve multilanguage, multi-region content to searchers. It also eliminates the possible problem of duplicate content. Even if we have similar content for US and UK English speakers, Google will know that the pages are tagged for different regions and are not duplicates.
How to set up the hreflang tag
Thanks to Aleyda Solis’s Hreflang Generator Tool, we don’t have to worry about coding the tag ourselves. Just fill in the form as shown below, entering the appropriate language and region, and press “GENERATE THE HREFLANG TAGS FOR THESE URLS”. Copy and paste the generated hreflang annotations into the <head> element of our pages’ HTML and we’re good to go.
The example in the image above has two variations of the page:
Page URL | Description |
https://example.com/es | Spanish homepage |
https://example.com | Default homepage |
example.com is the default page where we want all language traffic (other than Spanish) to be directed. Spanish speakers are to be directed to the page variation example.com/es. The complete code generated by the tool,
<link rel="alternate" href="https://example.com/es" hreflang="es" />
<link rel="alternate" href="https://example.com" hreflang="x-default" />
will be added to the HTML of both the pages.
Similarly, if a page has four variations:
Page URL | Description |
https://example.com/es | Spanish homepage |
https://example.com/en-gb | UK homepage |
https://example.com/en-us | US homepage |
https://example.com | Default homepage |
The Spanish-speaking searchers are directed to the Spanish homepage, while the UK searchers are directed to the UK homepage, US searchers to the US homepage and all other searchers (with languages and regions not listed in the hreflang tag for this set of page variations) to the default homepage. The following, complete, hreflang annotations code should appear in the HTML of each of the page variations:
<link rel="alternate" href="https://example.com/es" hreflang="es" />
<link rel="alternate" href="https://example.com/en-gb" hreflang="en-gb" />
<link rel="alternate" href="https://example.com/en-us" hreflang="en-us" />
<link rel="alternate" href="https://example.com" hreflang="x-default" />
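If we would rather generate these annotations ourselves than rely on a form, the pattern is simple enough to script. This is a minimal Python sketch, not the generator tool’s own logic: the function `hreflang_tags` and the `VARIATIONS` mapping are hypothetical names, and the mapping reuses the four page variations from the table above.

```python
from html import escape


def hreflang_tags(variations):
    """Build one <link rel="alternate"> tag per hreflang value -> URL pair."""
    return "\n".join(
        '<link rel="alternate" href="{}" hreflang="{}" />'.format(
            escape(url, quote=True), escape(code, quote=True)
        )
        for code, url in variations.items()
    )


# The four page variations from the table above.
VARIATIONS = {
    "es": "https://example.com/es",
    "en-gb": "https://example.com/en-gb",
    "en-us": "https://example.com/en-us",
    "x-default": "https://example.com",
}

if __name__ == "__main__":
    # The same full set of tags goes into the <head> of every variation.
    print(hreflang_tags(VARIATIONS))
```

Generating the full set from a single mapping makes it harder to forget one variation on one of the pages.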
Once we’ve set up the hreflang tags for our pages, we can debug them for the most common errors using Google’s International Targeting report.
Note: The <head> section of the HTML code isn’t the only place we can specify the hreflang tag. Alternatively, we can specify it in the HTTP header of the page or in our website’s sitemap. It is enough to specify the hreflang tag at any one of these locations. Google provides complete guidelines for the code format used at each location [46].
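For the HTTP header option, the same information travels in a Link header instead of link elements. The Python sketch below builds such a header value, assuming the comma-separated form `<url>; rel="alternate"; hreflang="code"` described in Google’s guidelines [46]. The function name `hreflang_link_header` and the sample mapping are hypothetical, and the exact format should be checked against the guide before use.

```python
def hreflang_link_header(variations):
    """Build a Link header line listing every hreflang value -> URL pair."""
    parts = [
        '<{}>; rel="alternate"; hreflang="{}"'.format(url, code)
        for code, url in variations.items()
    ]
    return "Link: " + ", ".join(parts)


# The two page variations from the first hreflang example.
HEADER_VARIATIONS = {
    "es": "https://example.com/es",
    "x-default": "https://example.com",
}

if __name__ == "__main__":
    print(hreflang_link_header(HEADER_VARIATIONS))
```

This route is mainly useful for non-HTML files such as PDFs, which have no <head> section to hold the tags.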
Structured data
Structured data uses code called schema to label the elements on our web pages, such as videos, images, products and recipes, to help search bots understand the content.
This specialized code includes key details, such as title, description, length of a video, rating, price of a product and more. Structured data qualifies our content to appear as a rich result, with all these key details, making it more clickable to searchers.
How to add structured data
If we don’t want to get into the technical details or hire a web developer to do it for us, Google has a handy Structured Data Markup Helper. As the first step, as shown in the picture below, we have to enter the page URL and pick the most fitting category for the page.
Next, the tool will display the page on the left and the tags on the right, as shown in the picture below. Click each element on the page and select the suitable tag from the drop down menu that appears.
As we tag the elements, the tags appear on the right as shown in the picture above. Once all the elements are tagged, click “create HTML” in red on the top right of the page. Google will generate JSON-LD structured data for us.
Add the generated code to the <head> section of our page’s HTML.
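For readers who prefer to write the markup by hand, JSON-LD is plain JSON wrapped in a script element. The Python sketch below assembles such a snippet for a product. The helper `json_ld_script` and every product value are invented for illustration, while the property names (`name`, `description`, `aggregateRating`, `offers`) come from the schema.org vocabulary. Treat it as a starting point to validate, not as the helper tool’s exact output.

```python
import json


def json_ld_script(data):
    """Wrap a dict of structured data in a JSON-LD script element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so the payload cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical product; all values are made up for this example.
PRODUCT = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Running Shoe",
    "description": "A lightweight running shoe for daily training.",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.5",
        "reviewCount": "27",
    },
    "offers": {
        "@type": "Offer",
        "price": "59.99",
        "priceCurrency": "USD",
    },
}

if __name__ == "__main__":
    print(json_ld_script(PRODUCT))
```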
Test the code
After adding the structured data to our page’s HTML, don’t forget to validate it. Google recommends using its Rich Results Test to check how our structured data works and which results can be generated using the code embedded in our page. It also shows what the rich result will look like in search results. For generic code validation, use Schema Markup Validator.
Test your knowledge
Choose the best option for each of the following questions.
What is the primary purpose of the robots.txt file?
To improve website loading speed
To create structured data for search engines
To instruct search engines how to crawl website pages
To optimize the mobile-friendliness of a website