Discouraging "the voice from nowhere" (~LLMs) in documentation

Django’s documentation is one of its greatest value propositions. While the sentiment may not be universal, I hear it even from first-time bug reporters, who embellish reports with comments such as, “love the framework, been using it since 0.01, great docs!”[1]

If we start merging documentation written in the voice of an LLM, I think we’re going to do serious damage to that value proposition. I’d like to add a sentence or two about this to the contributing docs.


Consider this example. I’ve rewritten the subject matter to avoid singling anyone out.

Running your Django site

It’s important that you run your Django site correctly. When your site runs
correctly, your project uptime will improve, leading to fewer frustrated users
and service requests that can distract you from your other priorities.

This intentionally ridiculous example is both overly vague and overly precise at the same time, drilling down into an irrelevant detail about project uptime.

Pretend LLMs don’t exist. I’m suggesting we wouldn’t merge this paragraph into Django regardless of whether it was drafted by an LLM. The reason it’s not appropriate for Django’s documentation is that it is not written in the voice of an experienced developer addressing a peer.[2] Instead, it’s written in a voice from nowhere.

As a reader, the “voice from nowhere” immediately degrades my trust in the documentation. I wrinkle my nose and brace myself for an onslaught of more lousy text.

Reviewers are already starting to discourage this. In some cases, a shorthand is used: “it’s not appropriate to use an LLM to generate this doc”. I’ve also seen the more laborious alternative of just iterating a half-dozen times and leaving scores of comments. In my experience, contributors don’t get the drift and just have their LLM plow through all the feedback. I can only speculate why reviewers (myself included) take the “long route”. It might be to avoid getting into a debate about whether it’s okay to use an LLM.

I’d like to propose that we bracket the whole thing about LLMs and just say that Django’s documentation needs to studiously avoid the “voice from nowhere”. That should give reviewers more power to steer contributors in a better direction without getting into debates about LLMs.


My proposal is to add something like this in the coding style doc beneath the existing “Writing style” section:

Voice

Django’s documentation is written in the voice of an experienced user of the
framework addressing a peer developer, even if that developer is new to Django
or new to web development. Although it is not the only possible source of poor
text, generative AI tools are liable to produce text written in the “voice from
nowhere”: for example, dwelling on irrelevant details, using bullet points for
the sake of having bullet points, or presenting overly digested, zippy advice.
This damages the contract between reader and author that the author has
exercised editorial control over which details matter for the purpose at hand,
owing to the author’s own experience with the framework.

If a reviewer asks you to avoid using an LLM to generate documentation, it may
be a shorthand for expressing the more fundamental value that we must keep the
voice of Django’s documentation consistent. We cannot pollute it with “the
voice from nowhere.”


  1. It didn’t occur to me in time for this year’s Django Developer’s Survey, but I wonder about adding “Documentation” to the pick list for “Favorite components” next year.

  2. My notion of peers includes beginners. That’s another amazing thing about Django’s docs: they’re beginner-friendly!

Thanks for writing this up, Jacob. I think having some docs around the voice of the documentation would be very helpful. I’ve run into the case in the past where the things I wrote were edited to the point where I was thinking, “why don’t you just rewrite this and save us both some time?” Having guidelines might have helped me.

I’m not sure we want to go down the path of empowering reviewers to say, “Don’t use an LLM to generate this,” and then expecting the PR author to figure out what that actually means. Ideally we’d use this as an opportunity to explicitly guide contributors on how to write it more appropriately (or how to better direct the LLM). I think a canned response for reviews would be useful.

I’m a bit confused. Is the example you gave based on an actual contribution, or is it hypothetical?

If it’s hypothetical, I’m struggling to see the concrete problem. It feels like we’re discussing something that could become an issue, rather than something that is an issue today.

The difficulty with hypotheticals is that we can easily imagine many potential problems and start optimizing for all of them, which risks adding complexity without clear benefit.

Without real examples from the docs or recent contributions, it’s hard to tell whether this is something the current review process isn’t already handling well.

Also, the example itself feels unusually vague—more like a placeholder than something that would realistically be accepted into the docs. That makes it harder for me to relate it to actual contributions.

Thanks for opening this subject, Jacob. I am all on board for defining the “voice” of the docs, and I like your proposed text. Django’s docs do generally fit that voice, though some of the older parts are a bit informal or use dated references (like assuming the user knows PHP). We have been improving them as they get touched.

I agree with @doom’s sentiment that it’s a bit hard to judge based on the one anonymized example you’ve given. That said, I have generally experienced this vague word-salad verbose voice from LLMs in other contexts, so I’m familiar with the problem.

I think the part I bolded is key here. My interpretation is that PRs are being created with LLM-generated docs that have no place being merged. Then the reviewer has to deal with communicating how to write docs properly while not being sure how much of an attempt was made in the first place.

It’s based on multiple actual proposed contributions that reached various resolutions (closing, iterating and closing, iterating and merging). I’m not going to link to any, to avoid public shaming.

I’m increasingly seeing that writing style, too. And speaking as a human whose own writing tends toward excess verbosity[1]—and who often learns of the (implicit) Django house style[2] only during reviews—I’d welcome an expanded style guide. Specifying the expected voice seems like a great start.

This sounds exactly right to me:

I’m struggling a little with the rest of the suggested text. I’m not familiar with the phrase “voice from nowhere.”[3] And while (I think) I get what it’s trying to say, I’m not sure it provides useful guidance—particularly to a non-English speaker (or an LLM).

The specific examples (“dwelling on irrelevant details,” etc.) all made sense to me. Is there a way to structure them as concrete, actionable guidelines? This isn’t exactly right, but maybe something like:

  • Be concise. Avoid fluff.
  • Avoid generic platitudes and management-speak.[4] Remember you are speaking directly to another developer.
  • Stay focused on Django and the developer’s interaction with it. Don’t speculate or offer advice on the developer’s other priorities and responsibilities.
  • When in doubt, condense. A short, well-written sentence communicates more effectively than a list of bullet points padded with irrelevant details.

And then reviewers could link to those guidelines rather than getting into debates about appropriate LLM use.


  1. and is riddled with em dashes and parenthetical asides (and footnotes!)

  2. E.g.: The passive voice is preferred in Django documentation. Django documentation uses the Oxford comma. Etc.

  3. TIL “psychedelic noise rock” 🎶

  4. like “actionable” 😀

This all sounds good. I do wonder if there is value in taking the resulting guide (whatever that may be) and using it with an LLM to see what it comes back with. That would effectively preempt what these existing PR creators have been doing, and will do with the new guide when it appears.

Proposed edit:

Be concise. Avoid fluff.

+100 to favoring concrete details over vaguer notions. I’ll incorporate your ideas in a PR for this later today.

(I was reasoning backward from what I was feeling: “it doesn’t sound like someone who deploys websites is writing this text about deploying websites” → “it sounds like it was written by a voice with no particular experience”. But we’ve fleshed out this notion fully enough on this forum thread for anyone who’s interested. It doesn’t need to live in the docs.)

Thanks all, gave it a shot – Feedback welcome!
