What is a Sitemap?

Sitemaps are as old as the first search engines - Google introduced the Sitemaps protocol in 2005. But what are sitemaps? In essence, they are XML files that list all the URLs for a website or domain, together with additional information about each URL:

  • when the URL was last updated
  • how frequently the content under that URL changes
  • how important a URL is relative to all the other URLs on the domain

The information contained in a sitemap file allows search engines such as Google or Bing to crawl the website more efficiently and to find URLs that could otherwise be tricky for an automated crawler to discover. The sitemap file thus complements the robots.txt file, which informs search engine crawlers which parts of your website or domain are off limits.

A sample sitemap.xml file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
      <changefreq>weekly</changefreq>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
      <lastmod>2004-12-23</lastmod>
      <changefreq>weekly</changefreq>
   </url>
</urlset>

Benefits of Having a Sitemap

There are several benefits of having a sitemap available for your website:

  • they help crawlers index your pages faster
  • they are very useful for sites that have thousands of pages and/or a deep website architecture
  • they help search engines discover new pages quickly when your website frequently adds pages
  • they signal updates when you frequently change the content of existing pages
  • they aid discovery when your website suffers from weak internal linking and orphaned pages
  • they aid discovery when your website lacks a strong external link profile

Think about it this way: search engine crawlers arrive at your website with a "budget" of how many pages they will index, how many links to follow, and how much data to download. Through a sitemap, you have a way to indicate to the crawler how to best spend that budget: by looking at the "interesting" pages contained in the sitemap, rather than blindly crawling along every link on your site until it runs out of juice. So by including in your sitemap those pages that are highly SEO-relevant, you can help search engines index your site more intelligently. On the flip side, this also means that you should not include in the sitemap any pages that are irrelevant for SEO.

Let's look at an example:

Consider a website that has 1,000 pages. Of those, 400 pages are SEO-relevant with great content; the remaining 600 pages are non-canonical pages, duplicates, session- or parameter-based pages, or perhaps social sharing URLs. You decide to create a sitemap.xml that includes your prime 400 pages, asking the Google crawler to de-prioritize indexing of the other 600 pages. Now the Google algorithm decides that of the 400 pages it indexed, 300 are grade A+ and 100 are grade B. So your website is probably a great site to send users to.

Now imagine you didn't have a sitemap.xml file and Google had crawled all 1,000 pages, deeming 600 of them grade C, D, or F - that's 60% of your entire site! Suddenly your average grade isn't looking so hot anymore ...

Even though a sitemap.xml doesn't preclude a search engine from eventually indexing the entire site, the example highlights the importance of a properly configured sitemap.xml (inclusion definition) and robots.txt (exclusion definition) for optimizing your SEO ranking.
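
To make that complement concrete, here is a minimal, hypothetical robots.txt for a site like the one above. The disallowed paths are made up for illustration; the Sitemap directive is the standard way to point crawlers at your sitemap from robots.txt:

# robots.txt (paths are hypothetical, for illustration only)
User-agent: *
Disallow: /admin/
Disallow: /search
Sitemap: https://www.example.com/sitemap.xml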

Implementing a Dynamic Sitemap For Your Rails App

Like many things in the Ruby on Rails world, sitemap generation is a long-solved problem with a variety of Ruby gems readily available. For example, the sitemap_generator gem available on GitHub integrates nicely with your Rails app, can generate sitemap files automatically on a pre-defined schedule implemented with a cron job, and can automatically ping your favourite search engines to let them know that a new list of URLs is available for their crawlers.
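
For reference, a config/sitemap.rb for sitemap_generator might look roughly like this - a minimal sketch that assumes a Post model with a published scope (hypothetical for this app):

# config/sitemap.rb - minimal sketch for the sitemap_generator gem
SitemapGenerator::Sitemap.default_host = "https://www.example.com"

SitemapGenerator::Sitemap.create do
  # add(path, options) appends a <url> entry to the generated sitemap
  add '/contact', changefreq: 'monthly'
  add '/blog',    changefreq: 'weekly'

  # Post.published is an assumed scope returning the posts you want indexed
  Post.published.find_each do |post|
    add post_path(post), lastmod: post.updated_at
  end
end

Running rake sitemap:refresh then writes the sitemap files (by default under public/) and pings the configured search engines.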

In this article, however, we want to follow a different path: dynamically rendering a sitemap.xml file on the fly, directly from your Rails app, instead of generating it offline with a rake task (as sitemap_generator does). Thankfully, that task is surprisingly simple and takes only a few lines of code.

We first start by adding a route to our config/routes.rb file:

# config/routes.rb
Rails.application.routes.draw do
  # For details on the DSL available within this file, see https://guides.rubyonrails.org/routing.html
  # ... other routes
  get '/sitemap.xml', to: 'sitemap#index', as: 'sitemap'
  # ... more routes
end
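
If your app negotiates multiple response formats, it can be worth pinning the format explicitly on the route. This is optional and just a defensive variant of the route above:

# config/routes.rb - optional variant that always forces the XML format
get '/sitemap.xml', to: 'sitemap#index', as: 'sitemap', defaults: { format: :xml }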

Then, we add a SitemapController that will be responsible for handling this route:

# app/controllers/sitemap_controller.rb
class SitemapController < ApplicationController
  def index
    @host = "#{request.protocol}#{request.host}"
    @posts = Post.published
  end
end
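
Note that Post.published is simply whatever scope returns the posts you want indexed. If your Post model doesn't define one yet, a minimal sketch (assuming a published_at timestamp column) could look like this:

# app/models/post.rb - hypothetical scope, assuming a `published_at` column
class Post < ApplicationRecord
  # Treat a post as published once it has a publication timestamp
  scope :published, -> { where.not(published_at: nil) }
end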

Finally, we create an app/views/sitemap/index.xml.erb template file to dynamically render our sitemap.xml in a Sitemap protocol 0.9-compatible format:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc><%= @host %></loc>
   </url>
   <url>
      <loc><%= "#{@host}#{expertise_path}" %></loc>
   </url>
   <url>
      <loc><%= "#{@host}#{contact_path}" %></loc>
   </url>
   <url>
      <loc><%= "#{@host}#{blog_path}" %></loc>
   </url>
   <% @posts.each do |post| %>
     <url>
        <loc><%= "#{@host}#{post_path(post)}" %></loc>
        <lastmod><%= post.updated_at.strftime("%Y-%m-%d") %></lastmod>
     </url>
   <% end %>
</urlset>

We can test that everything is working correctly by starting up the Rails app in development mode and accessing http://localhost:3000/sitemap.xml in our browser.
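
For an automated check, a small request test works too. This is a minimal sketch assuming Rails' default minitest setup and the named sitemap route defined above:

# test/integration/sitemap_test.rb - minimal sketch, assuming Rails' default minitest setup
require "test_helper"

class SitemapTest < ActionDispatch::IntegrationTest
  test "sitemap renders a urlset" do
    # sitemap_path is generated by the `as: 'sitemap'` option on the route
    get sitemap_path
    assert_response :success
    assert_includes response.body, "<urlset"
    assert_includes response.body, "</urlset>"
  end
end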

Once everything is working as intended and deployed to production, you can submit your sitemap through the Google Search Console.

Closing Thoughts

It is worth noting that sitemaps complement, but do not replace, the crawlers that search engines already use to discover and index pages. Using a sitemap does not guarantee that URLs will be included in search indexes, nor does it impact the ranking in search results. For instance, Google's documentation specifies that:

"Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you'll never be penalized for having one."

So why implement a sitemap after all?

Well, since websites are discovered from link to link, it is important that other websites link to your site to signal its existence. If no website links to your new blog posts, a sitemap can really help search engines quickly discover new pages on your site.

Sitemaps can also be used to let search engines crawl pages that aren't linked anywhere a user would find them. Why would you ever do that? I don't know ;-)

How Can We Help?

If you have any questions about KUY.io or are looking for advice on optimizing your next project for SEO, please feel free to reach out to us.

👋 Cheers,

Nicolas Bettenburg, CEO and Founder of KUY.io