This is currently a placeholder while I write up my proposal for the idea. Sorry for the inconvenience; I will update it as soon as possible!
[GSoC 2024] Configurable Content Type Parsing
Introduction / Motivation
This project is motivated by an attempt to modernize the HttpRequest object. One of the proposed changes is adding a content-aware request.data property. Currently, request.POST already parses form data. This project aims to:
- add a request.data property that will store this data from now on (request.POST will alias it, to ensure backward compatibility)
- provide a general class (ContentParser/AbstractContentParser) that can be extended. This class will implement (at least) the following functions:
  - can_parse() -> bool, which returns whether the parser can parse the request.body.
  - parse(), which returns the parsed body and stores it in request.data.
Implementation Ideas
request.data
The first task would be to introduce request.data and alias it with request.POST. There has already been some work towards this, notably by Carlton Gibson. I would start by finalising that work and submitting a patch to Django. It seems nearly complete; the main thing I want to check is whether all tests still pass, since it has not been updated for quite a while.
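As a rough illustration of the aliasing idea (this is not the code from the existing patch), the sketch below uses a stand-in Request class, which is hypothetical and is not Django's HttpRequest. The body is parsed once, cached, and exposed under both names, so request.POST keeps working while request.data becomes the primary attribute. Only form-encoded bodies are handled here.

```python
from urllib.parse import parse_qs


class Request:
    """Hypothetical stand-in for the request object, for illustration only."""

    def __init__(self, body, content_type):
        self.body = body
        self.content_type = content_type
        self._data = None  # filled in lazily on first access

    @property
    def data(self):
        # Parse the body only once and cache the result.
        if self._data is None:
            if self.content_type == "application/x-www-form-urlencoded":
                self._data = parse_qs(self.body.decode())
            else:
                self._data = {}
        return self._data

    @property
    def POST(self):
        # Backward-compatible alias: returns the same object as request.data.
        return self.data


request = Request(b"name=django&tag=a&tag=b", "application/x-www-form-urlencoded")
print(request.data)
print(request.POST is request.data)
```

Because both names return the same cached object, the body is never processed twice in different ways.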
ContentParser
We wish to provide a general class that users can extend to write their own parsers for whatever data they wish to parse. This will be the ContentParser class. A general idea of the class could look like this:
class ContentParser:
    def __init__(self, content_type=None):
        """
        Set the content type that this parser will be parsing.
        """
        self.content_type = content_type

    def can_parse(self, content_type):
        """
        Check if the content type of the request matches the content type that the parser accepts.
        """
        return self.content_type == content_type

    def parse(self, body):
        """
        Parse the data using custom logic. Subclasses are expected to override this.
        """
        return None
We will provide a JSONContentParser out of the box, as JSON is one of the most commonly used content types. Based on the class above, a potential approach would be:
import json


class JSONContentParser(ContentParser):
    def __init__(self):
        # This parser only handles JSON request bodies.
        super().__init__(content_type="application/json")

    def parse(self, data):
        return json.loads(data)
Extending this for Custom Parsers
The main goal is to allow users to write their own content parsers, so we need a way for them to add parsers to their project. One idea that comes to mind is a list in settings.py, similar to the way INSTALLED_APPS is implemented.
CONTENT_PARSERS = [
    "django.http.content_parsers.JSONContentParser",
]
Whenever a new parser is made by the user, or by the Django community, it can be appended to this list.
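One possible way such a list could be consumed is sketched below. It assumes the dotted paths have already been resolved into parser instances (in Django this would typically be done with django.utils.module_loading.import_string). The parse_body helper is a hypothetical name, not an existing Django function, and the parser classes are simplified variants of the ones above that declare the content type as a class attribute.

```python
import json


class ContentParser:
    content_type = None

    def can_parse(self, content_type):
        # Compare only the media type, ignoring parameters like "; charset=utf-8".
        media_type = (content_type or "").split(";")[0].strip().lower()
        return media_type == self.content_type

    def parse(self, body):
        raise NotImplementedError


class JSONContentParser(ContentParser):
    content_type = "application/json"

    def parse(self, body):
        return json.loads(body)


def parse_body(parsers, content_type, body):
    """Hypothetical helper: use the first parser that accepts the content type."""
    for parser in parsers:
        if parser.can_parse(content_type):
            return parser.parse(body)
    raise ValueError(f"No parser configured for {content_type!r}")


parsers = [JSONContentParser()]
print(parse_body(parsers, "application/json; charset=utf-8", b'{"id": 1}'))
```

The order of the list matters in this sketch: the first parser whose can_parse() returns True wins.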
Footnote
The entire proposal, including information about me, can be found here.
Pinging @carltongibson for suggestions (based on previous proposals and work in the similar area). Please do let me know if I have incorrectly pinged you, or if I am supposed to ping someone else.
This has progressed since last year.
@smithdc1 has a new PR, for the request parsing, which is here:
That's slightly on hold, because it should tie in with the modernising the request object proposal, originally from @adamchainz, but we need to type up a DEP for that. (That's on my active list, but I didn't get to it yet.)
In terms of a GSoC project, it would have to be in terms of picking up the bits and pieces and continuing to work on them. There's certainly a good amount to do I think, but whether it fits together well enough as a GSoC project I can't say. I guess a really quality proposal would demonstrate that it does: all the issues and outstanding points are in eyesight, a question really of doing the work to tidy it back into a plan.
Hint: try to follow the history back to the beginning and draw up the requirements fresh: David has done a lot of the work, so what's outstanding, and what are the issues? What's the Backwards Compatibility and migration story? What's the plan if we can't proceed with the request object modernisation? If you can get 100% clear on all that, then you'd be able to write a convincing proposal - but it's going to need to be sophisticated. (That's a challenge, but maybe one you're up for.)
I hope that helps.
Hello!
Thank you very much for the prompt reply. I was going through the discussions from the links you mentioned, and it does seem that writing a DEP would be the ideal first step to kick off this process. If possible, I would love to help with drafting the DEP!
Could a starting point to this be looking into this draft DEP?
It might be a little late for GSoC, but I still plan on writing a proposal, though I honestly doubt how effective it would be. Although I've used Django many times, understanding the codebase and contributing back to it, especially to a critical component like the request object, is a bit overwhelming.
Even if this is not selected as a GSoC project, I would still like to work on this problem - I hope that is okay! I'll be going through more documentation, and I hope it is fine if I post any doubts I have here.
Thank you, and have a nice day!
Hello!
Considering the last discussion, I have updated my proposal. I shall copy-paste the content here for ease of view. Please do feel free to leave comments on this! Irrespective of GSoC, I do wish to work on this task, so please do leave comments regardless. Thank you!
Also pinging @adamchainz @felixxm for comments since I noticed your comments on previous interactions on this topic. I apologise if the ping was unnecessary!
[GSoC 2024] Configurable Content Type Parsing
Introduction / Motivation
This project is motivated by an attempt to modernize the HttpRequest object. One of the proposed changes is adding a content-aware request.data property. Currently, request.POST already parses form data. This project aims to:
- add a request.data property that will store this data from now on (request.POST will alias it, to ensure backward compatibility)
- provide a general class (ContentParser/AbstractContentParser) that can be extended. This class will implement (at least) the following functions:
  - can_parse() -> bool, which returns whether the parser can parse the request.body.
  - parse(), which returns the parsed body and stores it in request.data.
Previous Discussions
There has been quite some discourse on this topic. There is a draft DEP (that can be found here) that introduces the idea for content parsing. This mainly focuses on content parsing that other (extremely successful) extensions to Django have done - one of the most famous being django-rest-framework.
On another path, Adam Johnson made this proposal to update the request object variables to make them more Pythonic and more informative about their actual function, thereby removing ambiguity. After discussion, the original issue was marked as wontfix. This was due to concerns about:
- Documentation. As pointed out by Carlton Gibson, the naming change alone requires documentation updates in a lot of places.
- Backward compatibility. This would definitely cause some problems in keeping older APIs working, especially the ones that depend on request.POST.
- The original proposal just wanted to change nomenclature, and nothing more. Having such a disruptive change for limited functionality improvement was not considered optimal, and it was suggested to combine this with the content parsing work to provide a new set of APIs with new functionality. Discussion can be found here.
As mentioned above, the final idea that emerged was to merge these two concerns into one overarching effort to improve the request object itself: provide new APIs that have content parsing built in.
There has been work done in these two directions:
- David Smith's patch tackles the content parsing part, by adding the BaseParser and JSONParser. This also abstracts out FormParser and MultiPartParser.
- Abhinav Yadav's patch tackles the issue of renaming the variables (like request.GET -> request.query_params, etc.).
These MRs are currently blocked, pending a vote by the technical board and a DEP to be proposed. Relevant thread.
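To illustrate the renaming direction, here is a minimal sketch of how an old name could be kept as an alias of the new one. It is not taken from either patch: the Request class is a hypothetical stand-in, and emitting a DeprecationWarning on the old name is an assumption, since whether the old APIs get deprecated at all is still undecided.

```python
import warnings


class Request:
    """Hypothetical stand-in for the request object, for illustration only."""

    def __init__(self, query_params):
        self.query_params = query_params  # new, more descriptive name

    @property
    def GET(self):
        # Old name kept as an alias. The warning is an assumption, not a decision.
        warnings.warn(
            "request.GET is an alias of request.query_params.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.query_params


request = Request({"page": "2"})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = request.GET

print(value is request.query_params)
print(caught[0].category.__name__)
```

Existing code that reads request.GET keeps working, and the warning gives projects a signal to migrate at their own pace.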
Work Proposed
As we note, there is quite some work that has happened in this field. However, there are still some bits and pieces left to make this change more cohesive and effective, without it being disruptive. The following points outline what I wish to work on during the GSoC period:
- Assist Carlton in writing the DEP for the same. I think the draft DEP linked above would be a good starting point for us to begin from, since it is very coherent about the changes needed. However, the main focus there was for JSON parsing (bringing functionality present in other plugins like Django REST Framework), and not a generic approach. We should modify and update that to include more generic content parsing. This should also include the proposed changes for updating the request variables.
- The MR also has to be updated to add configurable content parsers. After discussion, it was decided to not use a setting variable for this. If custom parsers are required throughout the project, users can add their parser in the middleware. The ideal approach would be to add this on a per-view basis. I think having a decorator would be pretty good, as this seems to be the way that most view-specific functionality is implemented in Django.
- Finish the work on the MRs and get them merged after Technical Board approval. Ideally, the approach would be to create a release candidate (RC) branch, which we ask a few volunteers to use to check what breaks. (I myself would like to volunteer - there are a couple of projects that I am working on where I would love to try it out, including one that is deployed in a production environment.) Collecting feedback from these volunteers would be the best way to move forward.
- Update documentation. As shown by Carlton, there are a lot of places where the docs need to be updated to cover all the new features that have been introduced. As pointed out, this will also require release notes.
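The per-view decorator idea from the list above could look roughly like the sketch below. Everything here is hypothetical: use_parsers is an illustrative name, the Request class is a stand-in rather than Django's request object, and JSONParser is a simplified version of the class in David's patch.

```python
import functools
import json


class JSONParser:
    """Simplified stand-in for a parser class."""

    content_type = "application/json"

    def parse(self, body):
        return json.loads(body)


class Request:
    """Hypothetical stand-in for the request object, for illustration only."""

    def __init__(self, body, content_type):
        self.body = body
        self.content_type = content_type
        self.parsers = []  # parsers attached to this request

    @property
    def data(self):
        # Use the first attached parser that matches the content type.
        for parser in self.parsers:
            if parser.content_type == self.content_type:
                return parser.parse(self.body)
        raise ValueError(f"Unsupported content type: {self.content_type!r}")


def use_parsers(*parser_classes):
    """Hypothetical decorator: attach the given parsers to the request for one view."""

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            request.parsers = [cls() for cls in parser_classes]
            return view(request, *args, **kwargs)

        return wrapper

    return decorator


@use_parsers(JSONParser)
def create_item(request):
    return request.data["name"]


print(create_item(Request(b'{"name": "widget"}', "application/json")))
```

A middleware could set request.parsers in the same way when a custom parser is needed throughout the project.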
Timeline
The project is earmarked as a 350 hour project, which I feel is justified considering the amount of work that is left to be done regarding the same. A rough breakdown of how I plan to finish this project is mentioned below. Please do note the footnote as well.
- Community Bonding (May 1 - May 26)
  - Work on the DEP with Carlton.
  - Get opinions from the wider community about the implementation details.
  - By the end of the community bonding period, hopefully we can get the DEP approved and start work on the code.
- May 27 - June 10
  - Polish David's MR and complete it. This includes getting community opinions and fixing the various TODO tasks mentioned in the PR.
    - Finalising the function signature of parse, along with the return type. One idea was for it to always return both post and file data, but this has to be confirmed and implemented.
    - request.POST and request.data can be called on the same request. Ideally, we do not want there to be different types of processing of request.body for different variables.
    - Proper exception raising also has to be considered.
    - For requests, we must decide how to set and get associated parsers. Currently we have a setter on the request object. There was also a question raised whether the data property should be set on the HttpRequest or the [WSGI | ASGI]Request.
- June 10 - July 8
  - Write code to allow adding custom parsers. Users will ideally subclass the BaseParser class and make their own parser class. They will also have to implement the parse function to parse the request.body. They can add this to the request either by adding it in the middleware, or by adding it to a view using a decorator.
- July 8 - July 12
  - Document all the new changes that have been introduced till now.
  - Prepare documentation and reports for midterm evaluation.
- July 12 - July 29
  - Polish Abhinav Yadav's MR and complete it. The major concerns regarding this MR are:
    - to ensure that no other part of the request cycle breaks due to the nomenclature changes.
    - to ensure that documentation is properly updated to reflect these changes.
    - to ensure that (wherever applicable) warnings (ideally deprecation warnings, if we are planning to remove the old APIs) are displayed in an informative manner.
  - By the end of this phase, we should have a release candidate (RC) ready for beta testing among volunteers and among the wider Django community.
- July 29 - August 12
  - Work with Adam to introduce these changes into django-upgrade. I am giving myself a couple of weeks for this, as I am not fully up to date with this project, and I might run into difficulties that I have not considered yet.
- August 12 - August 26
  - This time will mainly be used for iterative testing and improvement based on usage by volunteers. There might be some corner cases that we missed, or some bugs we introduced that broke earlier functionality, which will need correction.
- August 26 - September 2
  - Bug fixes
  - Complete documentation
  - (Hopefully) Approval and merging into the main codebase, with release notes provided for anyone migrating to this version!
Hi @anirudhprabhakaran3, I'm one of the org admins for Django's GSoC participation this year. Just wanted to confirm we've received your final proposal - thank you for taking it through.
There's nothing further for you to do as far as the GSoC submission. We'll be reviewing all proposals internally and confirming availability of mentors. Google will announce results on May 1st at 18:00 UTC.