GSOC '23 Discussion for Configurable Content Type Parsing

Hello everyone,
I want to contribute by modernising the request object and adding configurable content parsers as a 2023 GSoC project. Based on my past experience of contributing to the framework and my research on this topic, I am creating this discussion to make sure I am on the right track and to get some clarity on the scope of the project.
I went through the discussions from the mailing list and the original ticket #21442 over the past decade to trace the objective of this project and have come up with the following points:

  • according to @adamchainz request.GET and request.POST are misleadingly named, and therefore should be renamed to request.query_params and request.data respectively.
  • the initial review from the community was that this would be too big a change, and hence the PR addressing it was closed.
  • after a board vote, it was decided to add extra features to the request object through the data attribute to essentially “pay” for such a massive change to the API.

I have been reading the code of the http module, especially request.py, trying to understand how these requirements can be met, and have also been studying Tom Christie’s original work in DRF to see the behaviour there.

From a rough overview of the problem, the logic for the multiple parsers will probably live in a separate file such as parsers.py, and the appropriate parser will be chosen according to the Content-Type header of the request. The user will also be able to specify the desired parsers in a view.

The last work done on this was by @carltongibson, who opened a PR to add the data attribute and reroute POST calls on the request to the data attribute.
The next goals on the checklist are to add support for JSON parsing and then multipart parsing. My question is: will this be implemented by adding different _parse() methods, with a check on the content type that then calls the appropriate method?

I would appreciate some guidance as to where I should dedicate my time to have a better understanding of the project so I can better come up with a solution.

I also have a question about the scope of adding can_accept() and can_parse() methods to the API: what will these two methods achieve?

Hi @SirAbhi13.

It looks to me like you have a firm grasp.

I want to initially provide JSON handling as well as the form data that we currently have. The idea with the list of parsers is that we could handle any content type a user wants to provide a parser for. Django will provide form data and JSON parsing, but others that might be used could be MessagePack, Protobuf, XML, yaml, … — it goes on. Also there are specialisations, like application/vnd.github+json and so on. All Django needs to provide is the API for a parser — likely a can_accept() and parse() :thinking:. The idea would be to have a list and use the first parser that said yes to can_accept — allowing a specialist parser a first shot, before falling back to a more generic parser (for the +json type examples).
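The first-match idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Django's API: the class names, the `select_parser()` helper and the exact `can_accept()` / `parse()` signatures are all hypothetical, and the form parser uses the standard library's `parse_qs` rather than anything from Django.

```python
import json
from urllib.parse import parse_qs


class FormParser:
    """Hypothetical parser for URL-encoded form bodies."""

    def can_accept(self, media_type):
        return media_type == "application/x-www-form-urlencoded"

    def parse(self, body):
        return parse_qs(body.decode("utf-8"))


class JSONParser:
    """Hypothetical generic JSON parser."""

    def can_accept(self, media_type):
        # Accept plain JSON and any type using the "+json" suffix.
        return media_type == "application/json" or media_type.endswith("+json")

    def parse(self, body):
        return json.loads(body)


class GitHubJSONParser(JSONParser):
    """Hypothetical specialist parser for one vendor media type."""

    def can_accept(self, media_type):
        return media_type == "application/vnd.github+json"


def select_parser(parsers, content_type):
    """Return the first parser that accepts the content type, or None."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for parser in parsers:
        if parser.can_accept(media_type):
            return parser
    return None


# Order matters: the specialist is listed before the generic JSON parser.
parsers = [GitHubJSONParser(), JSONParser(), FormParser()]
```

With this ordering, `select_parser(parsers, "application/vnd.github+json")` returns the specialist, while other `+json` types fall through to the generic `JSONParser`. A content type nobody accepts returns `None`, which is where a caller could decide to raise something like an unsupported media type error.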

One area to look at is updating Adam’s original PR — we’ll add data rather than form_data but other than that the other changes should go in. We’ll want to take the chance to update the request object at the same time as adding the new content type aware data parsing.

I hope that helps.

Kind Regards,

Carlton

Thank you Carlton for your feedback and explanation :slightly_smiling_face:!

I will ponder more about the solution to your raised demands and also shortly open a PR for the pre-gsoc items on the checklist.

I will keep updating this thread with my ideas.

Hi @carltongibson,
I have opened a PR against the branch 5.0/add-request.data: "Added json parsing support to request object".

I would really appreciate it if you could take a look and share your thoughts. After that I can add the logic for JSON data in multipart requests; I was wondering whether I should add another class in parsers.py or just modify multipartparser.py.

I had a few questions and would appreciate some clarity on these:

  1. In an effort to modernise the request object, will we also add parser logic for the already existing elif branches or will we leave it as it is? I think streamlining the _load_post_and_files() method is the way to go.

  2. Where do we want the user to customise the parsing behaviour for incoming requests in Django? I’m torn between just providing an API that users can override for their own needs (e.g. write a parser and then add it to the parsers list in their view) and creating a middleware.

I have started drafting my proposal and look forward to sharing the first draft for an early review.

Hi @SirAbhi13 — I looked over your draft PR.

One thing I note is that we need to test against the existing behaviour for .POST. If we just add JSON parsing, that would change. So an important part of the project is making sure we don’t break old code. In general, more emphasis on testing different test cases beyond the simplest ones goes a long way.

See the discussion here about raising UnsupportedMediaType https://groups.google.com/g/django-developers/c/O8WfDpplXuI/m/Aj3iIFu8AwAJ .

Given that the .data attribute would be a new feature of the request object, I assume we don’t have any backward compatibility concerns to worry about, as long as we document the behaviour of .data properly and leave .POST unchanged.
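To make that compatibility contract concrete, here is a toy sketch, assuming a simplified stand-in class rather than Django's HttpRequest. `CompatRequest` is a hypothetical name, multipart handling is left out, and plain dicts stand in for QueryDict. The point is only that `.POST` keeps ignoring JSON bodies while `.data` parses them.

```python
import json
from urllib.parse import parse_qs


class CompatRequest:
    """Toy stand-in for a request object (hypothetical, not Django's)."""

    def __init__(self, body, content_type):
        self.body = body
        # Keep only the media type, without parameters like charset.
        self.content_type = content_type.split(";", 1)[0].strip().lower()

    @property
    def POST(self):
        # Legacy behaviour stays as it is: only form-encoded bodies are
        # parsed here; a JSON body yields an empty mapping.
        if self.content_type == "application/x-www-form-urlencoded":
            return parse_qs(self.body.decode("utf-8"))
        return {}

    @property
    def data(self):
        # New attribute: content type aware, falling back to form data.
        if self.content_type == "application/json":
            return json.loads(self.body)
        return self.POST


json_request = CompatRequest(b'{"name": "django"}', "application/json")
form_request = CompatRequest(b"name=django", "application/x-www-form-urlencoded")
```

A regression test for old code would then assert that `json_request.POST` is still empty, while `json_request.data` holds the decoded JSON and `form_request.data` matches `form_request.POST`.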

I’m going to cc @udbhavsomani here too, since you’re both working on the same idea, and these points apply to him too. (This is a big project so it’s not necessarily an issue to have two prospects — but all depends on your proposals, and how they are reviewed, and how many slots we’re allocated: it’s not just up to me.)

Hi @carltongibson,
I am attaching my draft proposal for this project and would appreciate it if you could review the proposed implementation: "Proposal for adding configurable content type parsing and modernising the request object".

Some notes on this proposal:

  • I have continued the method I created in my PR for adding json parsing to django in this proposal.
  • I wanted to ask: when we rename the attributes to modernise the request object, will we still maintain backward compatibility (meaning, will we still provide access to the previous attributes, e.g. request.FILES, which would be rerouted to request.files internally)?
  • I plan to let users modify the request.parser_classes attribute in the same way they modify the request.upload_handlers attribute. I think this is the best solution, as it mimics existing Django behaviour and will be easy to adopt. I would like to know your thoughts on this approach, because you mentioned providing the ability to change the attribute through middleware.
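A minimal sketch of the parser_classes idea from the last point, assuming it copies the guard that upload_handlers uses (the setter refuses changes once the body has been processed). `ConfigurableRequest`, the `_data_parsed` flag and the error message are hypothetical, and strings stand in for real parser classes.

```python
class ConfigurableRequest:
    """Toy request with a guarded parser_classes attribute (hypothetical)."""

    def __init__(self):
        # Placeholder names; real code would hold parser classes here.
        self._parser_classes = ["FormParser", "JSONParser"]
        self._data_parsed = False

    @property
    def parser_classes(self):
        return self._parser_classes

    @parser_classes.setter
    def parser_classes(self, value):
        # Changing parsers after parsing would silently have no effect,
        # so fail loudly instead.
        if self._data_parsed:
            raise AttributeError(
                "You cannot set the parsers after the body has been parsed."
            )
        self._parser_classes = list(value)

    @property
    def data(self):
        # Stand-in for real parsing: only records that parsing happened.
        self._data_parsed = True
        return {}


request = ConfigurableRequest()
request.parser_classes = ["JSONParser"]  # allowed: nothing parsed yet
request.data  # first access triggers parsing
try:
    request.parser_classes = ["FormParser"]
    late_change_error = None
except AttributeError as exc:
    late_change_error = str(exc)
```

A view or a middleware could both use the same setter, since either would run before the first access to `data`.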

Hi @SirAbhi13 — sorry for the slow reply. Life.

Do submit your proposal. It looks OK. I don’t want to get too into the details at this stage: we can work those out during the project phase. The goal here is for you to demonstrate a clear grasp of the project.

Kind Regards,

Carlton