Hi everyone, I want to contribute to this project as part of GSOC 2023, I’ve created this discussion to get some more details about the project and get going in the right direction!
On 1. and 2. request.POST will become an alias for the new request.data, so that existing code will continue to work. It will be scoped for only POST requests, and only form data encoded parsing (maybe) — so that its behaviour is unchanged.
On 3. The idea is any content type a user wants to provide a parser for. Django will provide form data and JSON parsing, but others that might be used could be MessagePack, Protobuf, XML, yaml, … — it goes on. Also there are specialisations, like application/vnd.github+json and so on. All Django needs to provide is the API for a parser — likely a can_accept() and parse() . The idea would be to have a list and use the first parser that said yes to can_accept — allowing a specialist parser a first shot, before falling back to a more generic parser (for the +json type examples).
I want to pick up my WIP PR shortly, and aim for it to be complete before the start of the GSoC season — help there could include test cases for the (single and multipart) JSON parsing, which is next on my list. A GSoC project would be based around generalising the Parser bit, which I’m not aiming for in the first phase.
Super! If you’d like to join in, I’m happy to have the help.
First of is straight application/json parsing. (So a request.data gives parsed Python object from a JSON body.)
Then it’s multipart examples with JSON parts.
With those bootstrapped we can look at more examples. If you wanted to make a PR against my PR branch (if that makes sense ) that would be awesome. We can discuss details there.
Hey @carltongibson!
I set up the repo locally on my machine and ran the tests.py. I tried writing a test function for JSON acceptance, but it seems like request.data returns an empty querydict when content-type is changed to application/json. The data is there in the request.body from where I can load and match it.
Here is the test function that I tried.
def test_json_data(self):
"""
JSON from various methods.
"""
for method in ["GET", "POST", "PUT", "DELETE"]:
with self.subTest(method=method):
payload = FakePayload("{\"key\": \"value\"}")
request = WSGIRequest(
{
"REQUEST_METHOD": method,
"CONTENT_LENGTH": len(payload),
"CONTENT_TYPE": "application/json",
"wsgi.input": payload,
}
)
print(request.data) # <QueryDict: {}>
self.assertEqual(json.loads(request.body), {"key": "value"}) # OK
Any help on what is happening and what am I missing out on?
Yes, that looks right, and you’ve discovered the current behaviour.
The task then is to make that pass.
If you look in django/http/request.py for the HttpRequest._load_post_and_files() method, you’ll see a branches where different content types are handled. Current behaviour there is to drop out the bottom, in the else branch. This is the bit that we’ll eventually make pluggable, but, for now we can add a branch for application/json.
This test case passes with the given output. Is it correct though? As in, is this the output that should be acceptable?
There is this function:
def test_body_after_POST_multipart_form_data(self):
"""
Reading body after parsing multipart/form-data is not allowed
"""
# Because multipart is used for large amounts of data i.e. file uploads,
# we don't want the data held in memory twice, and we don't want to
# silence the error by setting body = '' either.
payload = FakePayload(
"\r\n".join(
[
"--boundary",
'Content-Disposition: form-data; name="name"',
"",
"value",
"--boundary--",
]
)
)
request = WSGIRequest(
{
"REQUEST_METHOD": "POST",
"CONTENT_TYPE": "multipart/form-data; boundary=boundary",
"CONTENT_LENGTH": len(payload),
"wsgi.input": payload,
}
)
self.assertEqual(request.POST, {"name": ["value"]})
with self.assertRaises(RawPostDataException):
request.body
Please help me understand what this test really does?
# in django/http/request.py
def body(self):
if not hasattr(self, "_body"):
if self._read_started:
raise RawPostDataException(
"You cannot access body after reading from request's data stream"
)
...
Since multipart parser reads from request._stream (via read()) — this enforces (basically) that you either use the parsed POST method or you handle this yourself via body but not really both.
Hi @carltongibson
I was working on the multipart/form-data with json parts and I am kind of stuck at the multiparser bit.
I was looking at the code in django/http/multipartparser.py for the parse function.
Under this loop (which, imo, runs for every part of the multipart request): for item_type, meta_data, field_stream in Parser(stream, self._boundary):
I tried to add this code for json parsing, but got stuck on how to access the data from the stream:
try:
sub_content_type = meta_data["content-type"][0]
sub_content_type = sub_content_type.strip()
if sub_content_type == 'application/json':
pass # access data by using the exhaust() method maybe?
except (KeyError, IndexError, AttributeError):
continue
Is this the correct approach?
And for the gsoc configurable type parsing project, we would require a similar, new class with multiple parsers and use this multipartparser as the default one?
For single part parsing we just use the list of parsers. Then there’s a multipart parser that uses the same list for each part (or so I imagine… but it’s not settled yet: part of the project is determining what’s needed.)
It would be really kind of you to please review it. I have made the test cases for multipart+JSON type request objects as well as modified the existing parser bit which now returns (list of) dict(s) for the given JSON key.
OK, great. I shall take another look next week now. Good effort!
The other part of this project is bringing in the effort to modernise the request object. (We want to do both in the scope of the same release, 5.0) That should be a case of updating @adamchainz’ original PR. If you wanted to browse that and have a think about it, that would be worth your time.
Check-out the django-upgrade project. Creating updaters to help people migrate their code would be a nice touch, and interesting (from the Python ast point of view).
Hi @carltongibson!
I went through a bunch of threads (github + mailing list + tracker). I am in the process of drafting my GSoC proposal for the same.
Just wanted your inputs for the same:
Adding the request.query_params property and refining the request object further.
Create a new parser class that will check for the content-type of the request object and try to return parsed python object(s).
This can be activated by adding it as a middleware? can be there by default as well.
A fallback to the default parser if this middleware class does not return any compatible parser.
Adding this migration to the django-upgrade project
These are the things I’m currently looking to get done during the project term. Please add on to anything that I might have missed.