[GSoC 2024] Configurable Content Type Parsing

Currently, this is a placeholder, as I update this with my proposal for the idea. Sorry for the inconvenience, as I try to update this as soon as possible!

[GSoC 2024] Configurable Content Type Parsing

Introduction / Motivation

This project is motivated by the effort to modernize the HttpRequest object. One of the proposed changes is adding
a content-aware request.data property. Currently, request.POST already parses form data; this project aims
to:

  • add a request.data property that will store this data from now on (request.POST will alias it, to ensure backward compatibility)
  • provide a general class (ContentParser/AbstractContentParser) that can be extended. This class will implement (at least) the following methods:
    • can_parse() -> bool, which returns whether the parser can parse the request.body.
    • parse(), which returns the parsed body; the result is stored in request.data.

Implementation Ideas

request.data

The first task to tackle would be to introduce request.data and alias it with request.POST. There has already been some work
towards this, notably by Carlton Gibson.

I would begin by finalising this work and submitting a patch to Django. The work seems nearly complete - the only thing I would
want to check is whether all tests still pass, since it has been quite a while since it was last updated.
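
A minimal sketch of the aliasing idea described above, written against a stand-in class rather than Django's real request object. SketchRequest, its constructor arguments and the two content types it handles are assumptions made for illustration only; the existing patch may be structured quite differently.

```python
import json
from functools import cached_property
from urllib.parse import parse_qs


class SketchRequest:
    """Hypothetical stand-in for a request object; not Django's HttpRequest."""

    def __init__(self, body: bytes, content_type: str):
        self.body = body
        self.content_type = content_type

    @cached_property
    def data(self):
        # Parse the body once, based on the declared content type.
        if self.content_type == "application/json":
            return json.loads(self.body)
        if self.content_type == "application/x-www-form-urlencoded":
            return parse_qs(self.body.decode())
        return {}

    @property
    def POST(self):
        # Backward-compatible alias: code that reads request.POST
        # receives the same object as request.data.
        return self.data


form = SketchRequest(b"name=django&tag=web", "application/x-www-form-urlencoded")
api = SketchRequest(b'{"name": "django"}', "application/json")
```

Because data is cached, both names return the same parsed object, so the body is never processed twice for one request.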

ContentParser

We wish to provide a general class that users can extend to write their own parsers for whatever data they wish to parse.
This will be done with the help of the general ContentParser class, which can be extended. A general idea of the class
could be like this:

class ContentParser:
    def __init__(self, content_type=None):
        """
        Set the content type that this parser will be parsing.
        """
        self.content_type = content_type

    def can_parse(self, content_type):
        """
        Check if the content type of the request matches the content type that the parser accepts.
        """
        return self.content_type == content_type

    def parse(self, body):
        """
        Parse the data using custom logic. Subclasses are expected to override this.
        """
        return None

We will be providing a JSONContentParser out of the box, as JSON is one of the most commonly used content types.
Based on the above-mentioned class, this would be a potential approach:

import json

class JSONContentParser(ContentParser):
    def __init__(self):
        # This parser only handles JSON request bodies.
        super().__init__(content_type="application/json")

    def parse(self, body):
        # json.loads accepts both str and bytes, so request.body can be passed directly.
        return json.loads(body)
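
To show how the two methods could work together, here is a hedged sketch of the dispatch step: try each parser in order and use the first one whose can_parse() accepts the request's content type. The parse_body helper is hypothetical, and stripping parameters such as "; charset=utf-8" before comparing is an added assumption that goes beyond the exact equality check in the class above.

```python
import json


class ContentParser:
    def __init__(self, content_type=None):
        self.content_type = content_type

    def can_parse(self, content_type):
        # Assumption: ignore parameters such as "; charset=utf-8".
        media_type = content_type.split(";", 1)[0].strip().lower()
        return media_type == self.content_type

    def parse(self, body):
        raise NotImplementedError


class JSONContentParser(ContentParser):
    def __init__(self):
        super().__init__("application/json")

    def parse(self, body):
        return json.loads(body)


def parse_body(body, content_type, parsers):
    """Hypothetical helper: return the result of the first matching parser."""
    for parser in parsers:
        if parser.can_parse(content_type):
            return parser.parse(body)
    raise ValueError(f"No parser available for {content_type!r}")


result = parse_body(
    b'{"id": 1}', "application/json; charset=utf-8", [JSONContentParser()]
)
```

Raising an error when nothing matches is only one option; falling back to an empty result would be another, and the choice affects how views handle unsupported content types.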

Extending this for Custom Parsers

The main goal here is to allow users to write their own content parsers. We need a way to let them add these
to their project. One idea that comes to mind is having a list in settings.py, similar to the way INSTALLED_APPS is
implemented.

CONTENT_PARSERS = [
  "django.http.content_parsers.JSONContentParser",
]

Whenever a new parser is made by the user, or by the Django community, it can be appended to this list.
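
A rough sketch of how such a list of dotted paths could be turned into parser instances. Django ships django.utils.module_loading.import_string for this kind of lookup; the version below uses importlib so that it runs without Django. The load_class and load_parsers helpers are hypothetical, and a standard-library class stands in for a real parser path.

```python
from importlib import import_module


def load_class(dotted_path):
    """Import "package.module.ClassName" and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), class_name)


def load_parsers(dotted_paths):
    # Instantiate each configured class once, preserving the list order.
    return [load_class(path)() for path in dotted_paths]


# Stand-in setting: a stdlib class is used so the sketch runs without Django.
CONTENT_PARSERS = ["json.JSONDecoder"]
parsers = load_parsers(CONTENT_PARSERS)
```

Keeping the list order matters if more than one parser can accept the same content type, since the first match would win.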

Footnote

The entire proposal, including information about me, can be found here.

Pinging @carltongibson for suggestions (based on previous proposals and work in the similar area). Please do let me know if I have incorrectly pinged you, or if I am supposed to ping someone else.

Hi @anirudhprabhakaran3.

This has progressed since last year.

@smithdc1 has a new PR, for the request parsing, which is here:

That’s slightly on hold, because it should tie-in with modernising the request object proposal, originally from @adamchainz, but we need to type up a DEP for that. (That’s on my active list, but I didn’t get to it yet. :person_juggling:)

In terms of a GSoC project, it would have to be in terms of picking up the bits and pieces and continuing to work on them. There’s certainly a good amount to do I think, but whether it fits together well enough as a GSoC project I can’t say. I guess a really quality proposal would demonstrate that it does: all the issues and outstanding points are in eyesight, a question really of doing the work to tidy it back into a plan.

Hint: try to follow the history back to the beginning and draw up the requirements fresh: David has done a lot of the work, so what’s outstanding, and what are the issues? What’s the Backwards Compatibility and migration story? What’s the plan if we can’t proceed with the request object modernisation? If you can get 100% clear on all that, then you’d be able to write a convincing proposal - but it’s going to need to be sophisticated. (That’s a challenge, but maybe one you’re up for.)

I hope that helps.

Hello!

Thank you very much for the prompt reply. I was going through the discussions from the links you mentioned, and it does seem that writing a DEP would be the ideal first step to kickoff this process. If possible, I would love to help/contribute to drafting the DEP!

Could a starting point to this be looking into this draft DEP?

Although it might be a little late for GSoC, I still do plan on writing a proposal, although I honestly doubt how effective it would be. Although I’ve used Django many times, understanding the codebase and contributing back to it, especially to a critical component like the request object, is a bit overwhelming.

Even if not selected as a GSoC project, I would still like to work on this problem - I hope that is okay! I’ll be going through more documentation, and I hope it would be fine if I post any doubts I have here?

Thank you, and have a nice day!

Hello!

Considering the last discussion, I have updated my proposal. I shall copy-paste the content here for ease of view. Please do feel free to leave comments on this! Irrespective of GSoC, I do wish to work on this task, so please do leave comments regardless. Thank you!

Also pinging @adamchainz @felixxm for comments since I noticed your comments on previous interactions on this topic. I apologise if the ping was unnecessary!


[GSoC 2024] Configurable Content Type Parsing

Introduction / Motivation

This project is motivated by the effort to modernize the HttpRequest object. One of the proposed changes is adding
a content-aware request.data property. Currently, request.POST already parses form data; this project aims
to:

  • add a request.data property that will store this data from now on (request.POST will alias it, to ensure backward compatibility)
  • provide a general class (ContentParser/AbstractContentParser) that can be extended. This class will implement (at least) the following methods:
    • can_parse() -> bool, which returns whether the parser can parse the request.body.
    • parse(), which returns the parsed body; the result is stored in request.data.

Previous Discussions

There has been quite some discourse on this topic. There is a draft DEP (that can be found here) that introduces the idea for content parsing. This mainly focuses on content parsing that other (extremely successful) extensions to Django have done - one of the most famous being django-rest-framework.

On another path, we had Adam Johnson make this proposal to update the request object variables to make them more pythonic and informative about their actual function, thereby removing any ambiguity. After discussion, the original issue was marked as wontfix. This was due to concerns about:

  • Documentation. As pointed out by Carlton Gibson, the naming change alone requires documentation updates in a lot of places.
  • Backward compatibility. This would definitely cause some problems in keeping older APIs working, especially the ones that depend on request.POST.
  • The original proposal just wanted to change nomenclature, and nothing more. Having such a disruptive change for limited functionality improvement was not considered optimal, and it was suggested to club this with the content parsing to provide a new set of APIs with new functionalities. Discussion can be found here.

As mentioned above, the final idea that emerged was to merge these two concerns into one overarching choice to improve the request object itself - provide new APIs that have content parsing in them.

There has been work done in these two directions:

  • David Smith’s patch tackles the content parsing part, by adding the BaseParser and JSONParser. This also abstracts out FormParser and MultiPartParser.
  • Abhinav Yadav’s patch tackles the issue of renaming the variables (like request.GET -> request.query_params, etc.).
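
If a rename such as request.GET -> request.query_params were paired with a deprecation path, the old name could stay available as a warning alias. The sketch below only illustrates that pattern on a stand-in class; it is not taken from either patch, and whether the old names would be deprecated at all is an open question.

```python
import warnings


class SketchRequest:
    """Hypothetical stand-in; not Django's HttpRequest."""

    def __init__(self, query_params):
        self.query_params = query_params

    @property
    def GET(self):
        # The old name keeps working but points callers to the new one.
        warnings.warn(
            "request.GET is deprecated; use request.query_params instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.query_params


request = SketchRequest({"page": "2"})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_value = request.GET["page"]
```

Using stacklevel=2 makes the warning point at the calling code rather than at the property itself, which is what someone migrating a project would want to see.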

These MRs are currently blocked, pending a vote by the Technical Board and a DEP being proposed. Relevant thread.

Work Proposed

As we note, there is quite some work that has happened in this field. However, there are still some bits and pieces left to make this change more cohesive and effective, without it being disruptive. The following points outline what I wish to work on during the GSoC period:

  • Assist Carlton in writing the DEP for the same. I think the draft DEP linked above would be a good starting point for us to begin from, since it is very coherent about the changes needed. However, the main focus there was for JSON parsing (bringing functionality present in other plugins like Django REST Framework), and not a generic approach. We should modify and update that to include more generic content parsing. This should also include the proposed changes for updating the request variables.
  • The MR also has to be updated to add configurable content parsers. After discussion, it was decided to not use a setting variable for this. If custom parsers are required throughout the project, users can add their parser in the middleware. The ideal approach would be to add this on a per-view basis. I think having a decorator would be pretty good, as this seems to be the way that most view-specific functionality is implemented in Django.
  • Finish the work on the MRs and get them merged after Technical Board approval. Ideally, the approach would be to create a release candidate (RC) branch, which we ask a few volunteers to use to see what breaks. (I myself would like to volunteer - there are a couple of projects that I am working on where I would love to try it out, including one that is deployed in a production environment.) Collecting feedback from these volunteers would be the best way to proceed.
  • Update documentation. As shown by Carlton, there are a lot of places where the docs need to be updated to cover all the new features that have been introduced. As pointed out, this will also require release notes.
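
The per-view decorator idea from the list above could look roughly like this. Everything here is hypothetical: the use_parsers name, the request.parsers attribute and the stand-in JSONParser and SketchRequest classes are assumptions for illustration, not the API from the existing patch.

```python
import json
from functools import wraps


class JSONParser:
    """Hypothetical stand-in parser, not the class from the patch."""

    media_type = "application/json"

    def parse(self, body):
        return json.loads(body)


def use_parsers(*parser_classes):
    """Hypothetical decorator: attach parser instances to the request for one view."""

    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            request.parsers = [cls() for cls in parser_classes]
            return view(request, *args, **kwargs)

        return wrapper

    return decorator


class SketchRequest:
    """Hypothetical request stand-in with just enough state for the sketch."""

    def __init__(self, body, content_type):
        self.body = body
        self.content_type = content_type
        self.parsers = []


@use_parsers(JSONParser)
def create_item(request):
    parser = next(p for p in request.parsers if p.media_type == request.content_type)
    return parser.parse(request.body)


item = create_item(SketchRequest(b'{"name": "widget"}', "application/json"))
```

A decorator keeps the configuration next to the view that needs it, which matches how other view-specific behaviour is usually expressed in Django.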

Timeline

The project is earmarked as a 350 hour project, which I feel is justified considering the amount of work that is left to be done regarding the same. A rough breakdown of how I plan to finish this project is mentioned below. Please do note the footnote as well.

  • Community Bonding (May 1 - May 26)

    • Work on the DEP with Carlton
    • Get opinions from the wider community about the implementation details.
    • By the end of the community bonding period, hopefully we can get the DEP approved and work on the code.
  • May 27 - June 10

    • Polish David’s MR and complete it. This includes getting community opinions and fixing the various TODO tasks mentioned in the PR.
      • Finalising the function signature of parse, along with the return type. One idea was for it to always return both post and file data, but this has to be confirmed and implemented.
      • request.POST and request.data can be called on the same request. Ideally, we do not want there to be different types of processing of request.body for different variables.
      • Proper exception raising also has to be considered.
      • For requests, we must decide how to set and get associated parsers. Currently we have a setter on the request object. There was also a question raised about whether the data property should be set on HttpRequest or on [WSGI | ASGI]Request.
  • June 10 - July 8

    • Write code for allowing adding custom parsers. Users will ideally subclass the BaseParser class and make their own parser class. They will also have to implement the parse function to parse the request.body. They can add this to the request either by adding it in the middleware, or by adding it to a view using a decorator.
  • July 8 - July 12

    • Document all the new changes that have been introduced till now.
    • Prepare documentation and reports for midterm evaluation.
  • July 12 - July 29

    • Polish Abhinav Yadav’s MR and complete it. The major concerns regarding this MR are:
      • to ensure that no other part of the request cycle breaks due to the nomenclature changes.
      • to ensure that documentation is properly updated to reflect these changes.
      • to ensure that (wherever applicable) warnings (ideally deprecation warnings, if we are planning to remove the old APIs) are displayed in an informative manner.
    • By the end of this phase, we should have a release candidate (RC) ready for beta testing among volunteers and among the wider Django community.
  • July 29 - August 12

    • Work with Adam to introduce these changes into django-upgrade. I am giving myself a couple of weeks for this, as I am not fully up to date with this project, and I might have some difficulties that I have not considered right now.
  • August 12 - August 26

    • This time will mainly be used for iterative testing and improvement based on usage by volunteers. There might be some corners that we missed out, or some bugs we introduced that broke earlier functionality, which will need correction.
  • August 26 - September 2

    • Bug fixes
    • Complete documentation
    • (Hopefully) Approval and merging into the main codebase, with release notes provided for anyone migrating to this version!
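
For the custom-parser phase of the timeline (June 10 - July 8), a user-defined parser added project-wide through middleware might look like the sketch below. BaseParser here is a hypothetical stand-in for the class in the patch, whose final signature is still open; the (data, files) return shape follows the idea mentioned in the timeline and is likewise an assumption, as are the can_handle name and the request.parsers attribute.

```python
import csv
import io


class BaseParser:
    """Hypothetical stand-in for the base class; the real signature is undecided."""

    media_type = None

    def can_handle(self, media_type):
        return media_type == self.media_type

    def parse(self, body):
        raise NotImplementedError


class CSVParser(BaseParser):
    media_type = "text/csv"

    def parse(self, body):
        rows = list(csv.DictReader(io.StringIO(body.decode("utf-8"))))
        # Assumed return shape: (data, files), with no files for CSV input.
        return rows, {}


class CSVParserMiddleware:
    """Project-wide registration: add the parser to every request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        request.parsers = [*getattr(request, "parsers", []), CSVParser()]
        return self.get_response(request)


class SketchRequest:
    """Hypothetical request stand-in carrying a small CSV body."""

    body = b"name,qty\nwidget,3\n"
    content_type = "text/csv"


def view(request):
    parser = next(p for p in request.parsers if p.can_handle(request.content_type))
    data, files = parser.parse(request.body)
    return data


rows = CSVParserMiddleware(view)(SketchRequest())
```

The middleware follows Django's usual get_response calling convention, so the same parser could instead be attached per view with a decorator if project-wide registration is too broad.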

Hi @anirudhprabhakaran3, I’m one of the org admins for Django’s GSoC participation this year. Just wanted to confirm we’ve received your final proposal – thank you for taking it through :slight_smile:

There’s nothing further for you to do as far as the GSoC submission. We’ll be reviewing all proposals internally and confirming availability of mentors. Google will announce results on May 1st at 18:00 UTC.
