[GSOC 2023] Add Configurable Content Type Parsing and Modernise Request Object

Hello Everyone,

Thank you for selecting my proposal and providing me with this opportunity to work with you @nessita @smithdc1 @adamchainz !

I am starting this thread to monitor my progress as per the suggestion of Natalia. I will also try to write blogs detailing the work and progress I have made every 2 weeks.

Work Done so Far:
As per discussions with Carlton in this thread, I was able to complete the pre-GSoC target of this project, which was adding a JSON parser. The PR was made in Carltonā€™s fork of django. Link to PR

Current:
Moving forward I would like to open a WIP PR in the main Django repo, working on the latest branch. Part 1 of my project is already done, so I will get to work on the second part (adding the custom parser API).

Due to unforeseen circumstances, I wasnā€™t able to get a proper review of my proposal and the proposed implementation during the application period. It would be helpful if the mentors can give their thoughts and if any changes are required. Gist of my proposal

Iā€™m excited to continue working on this project and contribute to Djangoā€™s development :rocket:.

Warm Regards,
Abhinav

4 Likes

Hi @SirAbhi13 :wave:

Congratulations again on having your proposal accepted this year. :tada:

Thank you for your links to previous work, they are very useful. Iā€™ll have a read in the coming days. In the meantime, I agree it would be good to move that PR from Carltonā€™s fork to the main Django repo to improve visibility. (This PR is the first step in your project?)

I think at this stage you are several steps ahead of us. It would therefore be useful if you could help me catch-up to where you are with your thinking. I think if you could articulate your plan on what you see are the key steps that would be helpful, this could be a summary of your milestones in your proposal, assuming thatā€™s still your plan.

Iā€™d also be interested in understanding if there are the any dependencies between changes (e.g. do we need custom parsing before renaming attribute names?), what design decisions need to be made, what support / reviews youā€™re expecting and so on. The aim here is for you to help manage us to some extent, tell us when youā€™re aiming for / need reviews for to keep the project on plan.

Welcome aboard! :ship:

1 Like

Welcome @SirAbhi13 :slight_smile: Iā€™m looking forward to seeing how you progress with the work.

Some notes on your proposal.

** 3.1.1 Rename HttpRequest.POST to HttpRequest.data and route HttpRequest.POST API calls to HttpRequest.data.**

I donā€™t think we can route POST to data - it needs to be left as-is. It could lead to security problems in many cases if code that expects only form data starts also accepting JSON, or whatever other parsers are configured, as well.

The idea of request.data is that itā€™s a new content-type aware attribute.

3.1.4 Remodel Form Parsing logic

Move the form parsing logic from the HttpRequest class to the parsers.py module and also model it after the new API design.

I am not sure this is entirely feasible. The new parsers API should be for setting request.data only. But multipart form parsing necessitates parsing both form data (request.POST) and files (request.FILES). We should not combine these into a single dictionary.

Previously, Django had request.REQUEST which merged request.GET and request.POST, leading to confusion and security problems. This was deprecated in 1.7: Django 1.7 release notes | Django documentation | Django . We donā€™t want to repeat that kind of mistake with the new design.


Also from your proposal:

If time permits, I would like to contribute to django-upgrade project to see if I can write code to help people migrate their code to the new design.

Absolutely. It will be a good stretch goal if you can get to that point.

2 Likes

Hi @SirAbhi13 hope youā€™ve had a good week.

Just wondering if youā€™ve any thoughts on the above?

Do let us know if you need any support.

P.s. looking forward to reading your first blog post!

1 Like

Hi Adam, sorry for the late response.
Here are some of my thoughts on the points you raised.

request.POST will still be a separate property that will exist in the request object. Whenever reqest.POST is called, the content-type header of that request object will be checked, if it is a request containing form data, the request.data QueryDict will be returned wherever the request.POST API call is made. If the content-type is something else, an empty QueryDict will be returned. I think this check will prevent any wrong data going at places where only form data is expected. Please do correct me if this line of thinking is wrong. As far as I can see, this will mimic the current behavior of the _load_post_and_files() method in request.py

I was of the opinion since we will create a parsers.py file anyway, it would be better IMO to move all the request parsing logic in one place. This will include the file parsing logic in the multipart form parsing too. I didnā€™t intend to combine both into a single dictionary. This was just a suggestion and I am open to suggestions on this issue for the best course of action. I understand it will be easier to leave the existing design as it is. I will have to think if we donā€™t change it, how will the multipart parsing will be handled along with the new design.

Thank you for bringing this to my atttention! I will surely keep this in mind and move forward without bringing regressions in the new design.

Hi David, I hope you have had a great week too!

I will give a detailed response to your questions on this weekend. I have had my mid-sem exams this week and wasnā€™t able to do much. I will start working on my project from this Saturday and will keep everyone in the loop of my next steps.

Will write my blog by this weekend and will share it here!

Hi @SirAbhi13! How are things going?

We were wondering if you had any chance to write a blog post or similar with details of your work/investigation so far. Are you block in any way? How can we help?

2 Likes

Hi all,
I have written an introductory blog post about getting selected in to GSoCā€™23. Please have a look and let me know your thoughts! I will write more blogs after each month to recap what work has been done so far.

The way I have planned out this project is to add the implementation of the custom parser and write all the supporting methods needed for it, including the JSONParser(), followed by the tests of these additions.
The key steps are:
Add the logic for the Configurable Parse API, this will include makina parsers.py file, and adding support methods for this logic to work in request.py file (examples are initialisation of parser classes, checking of content-type, etc)

The JSONParser will be added with the above implementation itself.

After at least the skeleton of this is implemented, i will start writing tests.

This will be the task at hand till the first evaluation.

After first evaluation, the goal will be to polish the changes and do a sanity check of the implementation (which will be done before too, but a final check before moving on with the final part of the project).

These changes are not dependent on each other and can be done in any order, but IMO, it would be better to implement the custom parsing part before renaming attribute names.

Regarding the design, I am confident in the methodology I have proposed in my proposal. The support I need which I can think of is perhaps design choices which can be myopic and your large experience and familiarity with Django ecosystem and of the codebase itself can help me come up with a better solution. I might also need little help getting 100% test coverage of the implementation of the custom parser.

Hi @nessita, everything is on track!

I have been pondering and reading up on tests and small changes that will be needed to complete this project.

I will open a draft PR shortly with initial changes and will commit the changes for Custom Parser bit by bit.

1 Like

Hi all,

I managed to have a brief chat this week with Carlton and Adam about this at DjangoCon Europe. My two key take aways were.

  1. We must not change any of the existing behaviour. Must make sure we have good test coverage. New behaviour should be accessed via the new API.

  2. Multipart parsing. Should be able to submit say text, json and a file as separate parts and they must all get parsed to their correct type.

1 Like

It was good to chat to you all earlier.

One additional thought Iā€™ve had this evening is about performance. This code will get used a lot therefore we need to be mindful of any impact to performance. Thereā€™s the django-asv repo where we could review/add benchmarks to test for any regressions (or improvements).

Thereā€™s some benchmarks already in the suite. Iā€™m not sure if this is enough.

1 Like

@SirAbhi13 Hello!

(Below a little context for the rest of the community) During our call last week we agreed that this project would be roughly progressing as follow:

  • Step 0, preparation: audit tests for HttpRequest to get familiar with whatā€™s there. Identify missing tests, specifically for the handling/values of request.POST in non trivial scenarios like multipart requests with mixed content types. In the call, at least 2 scenarios with missing tests were spotted so were agreed to expect at least a handful of new tests. This would ideally be an isolated and landable PR.

  • Step 1, the bulk of the feature: alias POST to data following previous discussions and history (branches from Carlton, multiple tickets, etc). Support parsing of content type application/json (only in data, not in POST).

  • Step 2, modernization of request: as discussed in tickets and forum posts (@smithdc1 @adamchainz Iā€™m not 100% sure about this really being step 2, please correct me if I remember this wrong)

Future steps, depending on progress and availability: integrate with upgrade tools, generalize the content type parsing logic to allow for custom parsers.

We were wondering if you had any time to progress on step 0, we discussed a rough/approximate ETA of completing that in a week. I recall you mentioned you had exams starting on the 14th, but hopefully you were able to make some progress on step 0 before that?

If you are blocked in any way, please reach out in Discord or in this forum.

Good luck with the exams! Natalia.

2 Likes

Hello @SirAbhi13!

Hope all is well with you and the exams. We just wanted to check in and see how things have been going. Have you had a chance to catch up on the latest comments (in this post and on your draft PR)?

Iā€™ll be out for a couple of weeks but David and Adam will be around. David has left actionable feedback on your PR, and weā€™re keen to know if youā€™ve made any progress. If you need any support, please reach out, the sooner the better.

Thanks, Natalia.

1 Like

HI @nessita, sorry for the delayed reply.

Just one exam left on Monday (26th june), and I will be back on track with where I left off the work.

I was able to check the comments left by David, and had some thoughts on the next steps of action.
In the coming days (from 27th of june to 29th june), I plan to read the all the tests related to our scope and detail the current behaviour completely (The comment left by David was to the same effect o how json data was handled in original module). I was able to make progress on this after our call and had also read the test coverage report.

I will update my progress as soon as Iā€™m done with step 0 from my side, and can share my findings then. Then I will start work tests for step 1 and along with writing appropriate logic for step 1.

Hi all,
I made progress in tackling the problem where POST was behaving differently when multipart/form-data was sent, the PR is

They way I tackled this problem was when POST will be called now, if there is a multipart/form-data request, there will be a duplicate QueryDict object, where the dictionary object of a key will be transformed back to the string variant which was expected in the previous behaviour.

The issue Iā€™m having is that this change break some tests in csrf_tests and some other modules and Iā€™m unable to pinpoint the cause, so I if any pointers about this way of solution can be provided, it would be helpful.

Meanwhile Iā€™m trying to pinpoint the error by understanding error stack in jenkin logs to figure out what went wrong, the most common message Iā€™m seeing is AttributeError: This QueryDict instance is immutable and AttributeError: 'TestingHttpRequest' object has no attribute '_read_started'. The failed tests are admin_tests and csrf_tests.

Apart from thinking about this problem, I have started coding the User interface for setting the parsers for a request.

Hi everyone,
In the latest commit I have now added the logic to allow users to add their own parsers to get a desirable output.
I would appreciate if the mentors can go over the design which was used to implement it and if there are any gaps. I havenā€™t added tests yet because the desired code might not be complete yet if there are any gaps.

I have followed the same design methodology used for changing/adding upload handlers of the request object.

We would also need to discuss the csrf policy and how we can handle it because currently as far as I understand a view function would need a @csrf_exempt decorator to allow changes to parser_classes attribute of the request object.

The issue of tests not passing due to the previous commit where I added logic to preserve the behaviour of request.POST for multipart/form-data is still there and I would appreciate any insight or thoughts to unblock me on the design flaw.

Regards,
Abhinav.

Just a quick follow up here to say that this isnā€™t being actively developed under GSoC anymore so if anyone else wants to pick it up that would be amazing.

The expected next step is to increase test coverage for existing behavior as outlined above.