Rails on Heroku with S3 sitemap hosting - google didn't like it #402

Open
cameronmccord2 opened this issue Jun 2, 2022 · 2 comments

@cameronmccord2

We run Rails (6.0.3.7) on Heroku and host our sitemaps in our S3 bucket. The readme instructions worked great, except that Google wouldn't accept our sitemaps because they were on a different host than our website (OURBUCKET.s3.amazonaws.com vs www.shout.app). Our S3 bucket was fully verified in Google with an HTML verification file placed in the bucket's /sitemaps/ folder.

What fixed it was setting sitemaps_host to the same value as default_host and adding redirects for the sitemaps to our routes file, as shown below. This way every sitemap URL that Google sees points to our website and not to the S3 bucket.

# Sitemap Index
get "/sitemaps/sitemap.xml.gz", to: redirect("https://OURBUCKET.s3.amazonaws.com/sitemaps/sitemap.xml.gz")

# Each sub sitemap holds 50,000 urls so this is good for 500,000 urls
(1..10).each do |i|
  get "/sitemaps/sitemap#{i}.xml.gz", to: redirect("https://OURBUCKET.s3.amazonaws.com/sitemaps/sitemap#{i}.xml.gz")
end
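
As a rough check of how many numbered redirects the loop above needs, here is a minimal plain-Ruby sketch. It assumes the 50,000-URLs-per-file figure from the comment above; `sitemap_files_needed` and `redirect_table` are hypothetical helper names for illustration, not part of Rails or sitemap_generator.

```ruby
# Hypothetical helpers for illustration; not part of Rails or sitemap_generator.
URLS_PER_SITEMAP = 50_000 # per-file figure taken from the comment above

# Number of numbered sitemap files needed to hold url_count URLs
# (integer ceiling division).
def sitemap_files_needed(url_count)
  (url_count + URLS_PER_SITEMAP - 1) / URLS_PER_SITEMAP
end

# Map each local sitemap path to the S3 URL it should redirect to.
def redirect_table(bucket, url_count)
  base = "https://#{bucket}.s3.amazonaws.com/sitemaps"
  table = { "/sitemaps/sitemap.xml.gz" => "#{base}/sitemap.xml.gz" }
  (1..sitemap_files_needed(url_count)).each do |i|
    table["/sitemaps/sitemap#{i}.xml.gz"] = "#{base}/sitemap#{i}.xml.gz"
  end
  table
end
```

If the site outgrows the range in the routes file, Google would follow the index to a numbered sitemap that has no redirect, so it may be worth comparing `sitemap_files_needed` against the upper bound of the loop after each generation run.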

We also changed our robots.txt to point at our website's redirect instead of an S3 URL:

Sitemap: https://www.shout.app/sitemaps/sitemap.xml.gz
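
For anyone wanting to sanity-check their own robots.txt before resubmitting, a minimal sketch along these lines might help. It only compares hosts locally and is not how Google validates sitemaps; `sitemap_urls` and `off_host_sitemaps` are hypothetical helper names.

```ruby
require "uri"

# Hypothetical helpers for a local sanity check; not part of any gem.

# Collect the URLs from all "Sitemap:" lines in a robots.txt body.
def sitemap_urls(robots_txt)
  robots_txt.each_line.map { |line| line.strip[/\Asitemap:\s*(\S+)/i, 1] }.compact
end

# Return the Sitemap URLs whose host differs from the site's own host.
def off_host_sitemaps(robots_txt, site_host)
  sitemap_urls(robots_txt).reject { |url| URI.parse(url).host == site_host }
end
```

An empty result from `off_host_sitemaps(File.read("public/robots.txt"), "www.shout.app")` would mean every listed sitemap is on the site's own host (the file path here is an assumption about a typical Rails layout).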

The readme section https://github.com/kjvarga/sitemap_generator#an-example-of-using-an-adapter didn't work for us because it had us using the S3 URLs and not our site's URLs. If you'd like, I can submit a PR that updates that section of the readme or adds another section below it detailing this setup.

@cameronmccord2 cameronmccord2 changed the title Rails on Heroku with S3 sitemap hosting Rails on Heroku with S3 sitemap hosting - google didn't like it Jun 2, 2022
@kjvarga
Owner

kjvarga commented Aug 9, 2022

Thanks for the detailed report! This issue has come up a number of times, but usually users are able to resolve it after realizing they didn't follow all steps, like updating the robots.txt file. I'd like to have another user with the same issue confirm your fix before I update the documentation. I'll leave this issue open for a while to see if we get any +1s.

@alessandrostein

We found a workaround that works perfectly! (We use AWS S3 to store our sitemap files.)

# config/sitemap.rb

SitemapGenerator::Sitemap.create_index = true
SitemapGenerator::Sitemap.default_host = 'https://yourdomain.com'
SitemapGenerator::Sitemap.sitemaps_host = 'https://yourdomain.com'

# config/routes.rb

base_sitemap_url = "https://#{ENV['AWS_ASSETS_BUCKET']}.s3.amazonaws.com"
get 'sitemap.xml.gz', to: redirect("#{base_sitemap_url}/sitemap.xml.gz")
get 'sitemap:number.xml.gz',
    to: redirect("#{base_sitemap_url}/sitemap%{number}.xml.gz")

Just make sure your sitemap index is 100% correct!
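
To illustrate how the `:number` segment and the `%{number}` placeholder in the routes above fit together, here is a plain-Ruby sketch of the same mapping. Rails performs the matching and substitution itself; `sitemap_redirect_target` is a hypothetical helper, and the bucket name is a placeholder.

```ruby
# Plain-Ruby illustration only; Rails does this matching and substitution itself.
BASE_SITEMAP_URL = "https://example-bucket.s3.amazonaws.com" # placeholder bucket

# Return the S3 URL a sitemap request path would redirect to,
# or nil when the path is not a sitemap path.
def sitemap_redirect_target(path)
  match = path.match(%r{\A/sitemap(?<number>\d*)\.xml\.gz\z})
  return nil unless match

  # Same %{number} named-reference syntax as in the redirect string above.
  format("#{BASE_SITEMAP_URL}/sitemap%{number}.xml.gz", number: match[:number])
end
```

Because the number is captured from the request, this pattern covers any count of numbered sitemaps without a fixed range in the routes file.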
