I have about 200k URLs. I set the per-page limit to 5000 and `cache_page` to 86400 seconds (24 hours). The sitemap is generated dynamically through a sitemap index (The sitemap framework | Django documentation | Django). It has been super helpful.
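For reference, a minimal sketch of the kind of setup described above (the `BlogSitemap` import path, the `"blog"` section name, and the URL name are assumptions for illustration):

```python
# urls.py -- sketch of a cached sitemap-index setup, as described above.
from django.contrib.sitemaps import views as sitemap_views
from django.urls import path
from django.views.decorators.cache import cache_page

from blog.sitemaps import BlogSitemap  # assumed module path

sitemaps = {"blog": BlogSitemap}

urlpatterns = [
    # The index lists one <sitemap> entry per paginated section page.
    path(
        "sitemap.xml",
        cache_page(86400)(sitemap_views.index),
        {"sitemaps": sitemaps, "sitemap_url_name": "sitemaps"},
    ),
    # Each section page holds up to Sitemap.limit URLs (5,000 here),
    # selected with the ?p=N query parameter.
    path(
        "sitemap-<section>.xml",
        cache_page(86400)(sitemap_views.sitemap),
        {"sitemaps": sitemaps},
        name="sitemaps",
    ),
]
```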
example.com/sitemap.xml sometimes crashes the server. The AWS EC2 console shows the instance checks are OK, but I am unable to SSH into the server, and the web app is down.
I am still testing and not 100% sure about the root cause. It may be:
- an internal factor, due to EC2, nginx, Gunicorn, or Django
- an external factor, because of web crawlers (Googlebot)?
How do you ensure a sitemap performs well? What are the best practices if you have millions of URLs? Or is increasing the server specs the only solution?
Would like to hear your experience.
```python
from django.contrib.sitemaps import Sitemap
from blog.models import Entry


class BlogSitemap(Sitemap):
    changefreq = "never"
    priority = 0.5

    def items(self):
        return Entry.objects.filter(is_draft=False)

    def lastmod(self, obj):
        return obj.pub_date
```
Is there anything we can do on the queryset to improve performance, such as selecting only the fields the sitemap needs (URL, last modified)? Or, since querysets are lazy, is it already optimized?
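One low-risk tweak along those lines: laziness only defers execution; when the queryset finally runs, it still selects every column. `only()` restricts the `SELECT` to the columns the sitemap actually renders. A sketch, assuming the entry URL is built from a `slug` field (the field names are assumptions):

```python
from django.contrib.sitemaps import Sitemap

from blog.models import Entry


class BlogSitemap(Sitemap):
    changefreq = "never"
    priority = 0.5
    limit = 5000  # matches the per-page limit mentioned above

    def items(self):
        # only() keeps the SELECT down to the columns used by the sitemap;
        # order_by() gives stable, repeatable pagination across pages.
        return (
            Entry.objects.filter(is_draft=False)
            .only("slug", "pub_date")
            .order_by("pk")
        )

    def lastmod(self, obj):
        return obj.pub_date
```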
I am using `cache_page`; are there any tips that would make a difference?
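One tip that can make a difference: warm the cache on a schedule instead of letting the first crawler request after expiry pay the full generation cost. A hypothetical management command (the command name, URL paths, and page count are assumptions) that requests every sitemap page through Django's test client, populating the `cache_page` entries:

```python
# blog/management/commands/warm_sitemaps.py -- hypothetical cache-warming
# command; run it from cron so crawlers always hit a warm cache_page entry.
from django.core.management.base import BaseCommand
from django.test import Client


class Command(BaseCommand):
    help = "Pre-generate and cache every sitemap page."

    def handle(self, *args, **options):
        client = Client()
        index = client.get("/sitemap.xml")  # warms (and returns) the index
        self.stdout.write(f"index: HTTP {index.status_code}")
        # With ~200k URLs and limit=5000 there are ~40 section pages;
        # the built-in sitemap view paginates via the ?p=N parameter.
        for page in range(1, 41):  # page count assumed from 200k / 5000
            resp = client.get(f"/sitemap-blog.xml?p={page}")
            self.stdout.write(f"page {page}: HTTP {resp.status_code}")
```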
One technique is to create the sitemap as static `.gz` files. If we create the `.gz` files ourselves, how do we integrate that with the sitemap index?
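On the integration question: the index is just a small XML file whose `<loc>` entries point at the `.gz` URLs, so if you generate the files offline you can serve everything as static files and bypass Django entirely for sitemap traffic. A self-contained sketch, assuming this URL layout and the 5,000-per-page limit from the question (the function and constant names are made up for illustration):

```python
# Hypothetical offline generator: writes gzipped sitemap pages plus a plain
# sitemap.xml index that points at them. Serve the output as static files.
import gzip
from pathlib import Path

BASE_URL = "https://example.com"
PAGE_SIZE = 5000  # same per-page limit as in the question


def write_sitemaps(urls, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = [urls[i:i + PAGE_SIZE] for i in range(0, len(urls), PAGE_SIZE)]
    index_entries = []
    for n, page in enumerate(pages, start=1):
        body = "".join(f"<url><loc>{u}</loc></url>" for u in page)
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{body}</urlset>"
        )
        name = f"sitemap-{n}.xml.gz"
        # Compress once at generation time instead of on every request.
        with gzip.open(out_dir / name, "wt", encoding="utf-8") as f:
            f.write(xml)
        index_entries.append(f"<sitemap><loc>{BASE_URL}/{name}</loc></sitemap>")
    index = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        + "".join(index_entries)
        + "</sitemapindex>"
    )
    (out_dir / "sitemap.xml").write_text(index, encoding="utf-8")
    return len(pages)
```

Crawlers decompress `.gz` sitemaps transparently, so the index can reference the compressed files directly; a cron job can regenerate the files as often as the data changes.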
Is a compressed sitemap more performant than a cached sitemap?
What are the pros and cons?