Generate an llms.txt file for Django docs, i.e. make Django AI-friendly

LLMs are smart, but when you help them with the right context you get much better results.

Django's documentation is well indexed by LLMs, but it can be hard for them to manage multiple versions at the same time: they are often confused by old docs and generate code that does not use the latest version. The current Django docs are also a bit too big to be added to the context all the time (it would be costly).

https://llmstxt.org is an emerging standard to help LLMs with documentation.
[edit] There is an open discussion on Sphinx about supporting llms-full.txt as a build target: allow LLMs to consume the docs · Issue #13268 · sphinx-doc/sphinx · GitHub

There are already a few MCP servers able to read documentation from those files to help LLMs generate better quality output when you ask them to use a specific library or version. They can also leverage RAG techniques in order to only add documentation relevant to the user's context. For example, there is no need to add the template documentation when asking the LLM to generate a models.py file.
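To make the RAG idea concrete, here is a deliberately naive sketch of how a server could pick only relevant doc sections for a prompt. The section names and keywords are made up for illustration, not Django's real doc index, and a real server would use embeddings rather than substring matching:

```python
# Toy sketch: pick only the doc sections relevant to a prompt, so an
# MCP server doesn't stuff the whole manual into the context window.
# Section names and keywords are illustrative, not Django's real index.
SECTION_KEYWORDS = {
    "models": {"model", "field", "queryset", "migration"},
    "templates": {"template", "tag", "filter", "render"},
    "forms": {"form", "validation", "widget"},
}

def relevant_sections(prompt: str) -> list[str]:
    # Naive substring match; a real implementation would use
    # embeddings or a proper search index.
    text = prompt.lower()
    return [
        name
        for name, keywords in SECTION_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]

print(relevant_sections("generate a models.py file for a blog app"))
```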

A stretch goal could be a custom local Django MCP server that serves the correct documentation, can run manage.py commands, and can scaffold code using python manage.py startapp and startproject.
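As a rough idea of what one such tool could look like, here is a hypothetical sketch of the scaffolding part: building and running a `manage.py startapp` command. The function names and validation are my own assumptions, not an existing MCP server API; the argv is built in a separate pure function so it can be checked without running Django:

```python
import subprocess
import sys

# Hypothetical sketch of one tool a local Django MCP server could
# expose: scaffold an app via `manage.py startapp`. This is not an
# existing API, just an illustration of the stretch goal.
def startapp_command(app_name: str, manage_py: str = "manage.py") -> list[str]:
    # Reject names Django would refuse anyway (must be a valid
    # Python identifier).
    if not app_name.isidentifier():
        raise ValueError(f"invalid app name: {app_name}")
    return [sys.executable, manage_py, "startapp", app_name]

def run_startapp(app_name: str) -> int:
    # In a real MCP tool this would run inside the user's project
    # directory, with the result reported back to the agent.
    return subprocess.run(startapp_command(app_name), check=True).returncode
```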

Seeing how vibe coding is taking off, I think having a strong AI friendly strategy for a framework like Django will determine how it will be used in the next decade.

Would love to hear what people think of this, and maybe create an issue on Trac for somebody to tackle. This could also be a cool Google Summer of Code project


I’m not 100% versed with how llms.txt works (how it’s generated and used), but I can provide my thoughts about how I would see this generally in Django, without going into details regarding LLMs.

First off, I am automatically for any idea that will be a help to reading and understanding the documentation and/or help newer Django developers.

If that would be achieved by implementing your idea, great.

As for how to achieve this, I feel this is no small feat. Django docs are big (really big), but if someone feels they can accomplish this without being a blocker on anything else, I am all for this.

All in all, +0 from me.

P.S. I don’t think a Trac ticket should be created already. You did the correct thing by opening a discussion on the forums – that is where these sorts of discussions should be held. Opening this on Trac would most likely result in the ticket being closed and pointing you to open a discussion on the forums.

Is it llms.txt or llms-full.txt that you think Django should have? Personally I like the idea but I’d like to see:

  • Exactly which LLM tools use this right now. Currently this seems like one of those standards that might have a use at this specific point in time but be replaced by something else later on.
  • A prototype. It seems to me the harder work is writing the content rather than necessarily the tooling of converting the docs to HTML then back to Markdown. Assuming that’s the bit Sphinx would be responsible for.

For llms.txt and llms-full.txt the use cases are slightly different.

The full one is more about giving full context to an LLM, making it easier for robots to parse the full documentation or to do fine-tuning on the latest version of the docs. HubSpot, for example, releases its entire dev docs as llms-full.txt: https://developers.hubspot.com/docs/llms-full.txt

The llms.txt is more like a sitemap.xml, but specifically for LLM agents that want to find the relevant docs. You can see it as a more agent-friendly way to navigate the docs; that is what an MCP server would use to return relevant context. An example of a good one is the PydanticAI one, https://ai.pydantic.dev/llms.txt: you can see that it is small and reads more like a homepage with links to the relevant sections.
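For illustration, a Django llms.txt following the llmstxt.org format (an H1 title, a blockquote summary, then H2 sections of links) could look something like this; the page selection and descriptions are made up for the example, not an official index:

```markdown
# Django

> Django is a high-level Python web framework that encourages rapid
> development and clean, pragmatic design.

## Topics

- [Models](https://docs.djangoproject.com/en/stable/topics/db/models/): defining your data layer
- [Templates](https://docs.djangoproject.com/en/stable/topics/templates/): the Django template language
- [Forms](https://docs.djangoproject.com/en/stable/topics/forms/): form handling and validation
```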
The last piece of the puzzle is to provide a Markdown version of each HTML page so that crawlers don’t have to transform HTML back to raw text for training, for example. This saves resources and also improves accuracy, as the Markdown version can focus on the content and does not include the header/footer or any other irrelevant text.
Example:
Dependencies - PydanticAI
https://ai.pydantic.dev/dependencies/index.md
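As a small sketch, mapping a rendered HTML docs URL to a sibling .md file could follow the PydanticAI convention above. Django does not publish such files today, so this URL mapping is purely an assumption:

```python
from urllib.parse import urlparse, urlunparse

# Sketch: map a rendered HTML docs URL to a hypothetical sibling .md
# file, following the PydanticAI convention (directory pages get
# index.md). Django does not serve these files today.
def markdown_url(html_url: str) -> str:
    parts = urlparse(html_url)
    path = parts.path
    if path.endswith("/"):
        path += "index.md"
    elif path.endswith(".html"):
        path = path[: -len(".html")] + ".md"
    else:
        path += ".md"
    return urlunparse(parts._replace(path=path))

print(markdown_url("https://ai.pydantic.dev/dependencies/"))
# https://ai.pydantic.dev/dependencies/index.md
```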

In terms of tools, there are some MCP servers that can leverage llms-full.txt already. One is maintained by LangChain: GitHub - langchain-ai/mcpdoc: Expose llms-txt to IDEs for development
This can then be leveraged by any coding agent/IDE supporting MCP. Here is a non-exhaustive list of those that already support this: Claude Code, GitHub Copilot, Cline, Cursor, Windsurf, JetBrains PyCharm AI.

In terms of benefits, this means that if the file is available, developers can instantly leverage existing tools to search up-to-date, version-specific docs in their IDE. Those agents are also much better at referencing the original source, as it is not compressed at training time, and they can link people to the correct place in the docs. You can see it as a smart, LLM-powered search within your IDE.


I found this website listing the sites that have an llms.txt file:
https://llmstxt.site. There are some big ones adopting this standard.


Thank you @Benoit, I’ve seen lots of projects creating llms.txt but the LangChain MCP LLMS-TXT Documentation Server is the first time I’m aware of a tool explicitly using that, beyond people just manually pasting / linking to those files in chat interfaces.

Do you have thoughts on whether this can be prototyped outside of Django itself, and how we’d test that the file works correctly? You said earlier:

Seeing how vibe coding is taking off, I think having a strong AI friendly strategy for a framework like Django will determine how it will be used in the next decade.

As far as I understand, Django’s already in a pretty strong place, so I’d be interested to know what would be the signs that an llms.txt file delivers any extra value. With Django having been around for 20 years, and there being many StackOverflow answers and published books that AI companies have taken the liberty to train on (…), LLMs are pretty good with Django as it is. So maybe it’s more in the agent capabilities that would be improved?

I think the main issue with projects like Django that have been around for a long time is that the code snippets and documentation used for training are usually outdated.
I have seen this myself a lot when using AI to write Django: it suggests things that have been deprecated for a long time.
Having an easy way to reference the exact Django version of the documentation for your project is very valuable.
Django is in a pretty strong position because of the opinionated aspect of the framework, with mostly one way of doing things.
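To illustrate version pinning, here is a small sketch that derives the versioned docs URL from an installed Django version string. The `en/<major>.<minor>/` path pattern matches how docs.djangoproject.com is actually laid out; the helper itself is hypothetical:

```python
# Sketch: pin documentation lookups to the project's installed Django
# version, so an agent reads the "5.0" docs instead of whatever mix of
# versions it memorised at training time.
def docs_url(django_version: str, page: str = "") -> str:
    # docs.djangoproject.com publishes docs per feature release,
    # e.g. /en/5.0/, so only major.minor matters here.
    major, minor = django_version.split(".")[:2]
    return f"https://docs.djangoproject.com/en/{major}.{minor}/{page}"

print(docs_url("5.0.6", "topics/db/models/"))
# https://docs.djangoproject.com/en/5.0/topics/db/models/
```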


Another implementation that I find pretty interesting and very dev/LLM-friendly is how ElevenLabs handles it for their developer docs. They have a “copy page to clipboard” button that copies the content of the associated Markdown file.

The MD file https://elevenlabs.io/docs/conversational-ai/guides/simulate-conversations.md has a very good structure, with a mix of Markdown and XML tags for steps and code. Here is an example of how it looks in the LLM-friendly MD file:

## Prerequisites

* An agent configured in ElevenLabs Conversational AI ([create one here](/docs/conversational-ai/quickstart))
* Your ElevenLabs API key, which you can [create in the dashboard](https://elevenlabs.io/app/settings/api-keys)

## Implementing a Simulation Testing Workflow

<Steps>
  <Step title="Identify initial evaluation parameters">
    Search through your agent's conversation history and find instances where your agent has underperformed. Use those conversations to create various prompts for a simulated user who will interact with your agent. Additionally, define any extra evaluation criteria not already specified in your agent configuration to test outcomes you may want for a specific simulated user.
  </Step>

  <Step title="Simulate the conversation via the SDK">
    Create a request to the simulation endpoint using the ElevenLabs SDK.

    <CodeGroup>
      ```python title="Python"
      from dotenv import load_dotenv
      from elevenlabs import (
          ElevenLabs,
          ConversationSimulationSpecification,