DRF: premature business logic validations

s-maibuecher · June 15, 2022, 1:56pm

We are using DRF and have a new requirement: bundle all input validation errors and all possible business logic validation errors into one single response when the client submits a form.
That goes against the default DRF workflow, which returns the response as soon as the input has syntax errors and skips any further validation (fail fast).
Our idea is to write a custom_exception_handler which waits for a raised ValidationError, runs all (possible) business validations and enriches the response. But this handler will grow with each endpoint that needs business logic validation.

Are there further disadvantages besides having so much logic in one single handler? Or is there a better way?

Thanks for answering.

KenWhitesell · June 15, 2022, 2:38pm

If you’ve got a fundamental syntax error in the submitted JSON, how can you validate anything else?

If you’re missing a comma between elements, or a terminating quote for a string, or a close bracket for a list, etc, in what sense can you appropriately interpret anything else?

You’re just as likely to trigger an erroneous error as an appropriate one.

It doesn’t seem to me that doing this provides any real value.

(And, as a general issue of security, providing any sort of feedback for truly invalid data creates additional surface area as the target for an attack. Different types of invalid data can be fired at the endpoint in attempts to acquire information about internal data structures, leading to targeted attacks on those structures. In general, it’s why a production Django deployment produces vastly different responses for 500 errors in test and production.)

s-maibuecher · June 15, 2022, 3:22pm

Yes, possible validations. Several business logic validations are run, and many of them need only a subset of the input values.

Product management doesn’t want the user to have to submit the form multiple times. They want to give the user as much validation information as possible.

And also yes, security. But it is an internal enterprise application provided only to colleagues (until we want to build a mobile version, I guess). Good point.

Would you try to reject this requirement anyway?

KenWhitesell · June 15, 2022, 4:00pm

Except, in the case of syntactically invalid JSON, you don’t know what you’re validating. For example, if you’re missing a close-brace ( } ) that is supposed to terminate a dict, you have no way to determine that the following key should be validated on the parent object instead of the current dict. Any conclusions you come to based on that information are not valid.

There are very good reasons why syntactically invalid JSON is, and should be, summarily rejected without any further processing.
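To illustrate the close-brace point, here is a minimal sketch using Python's standard json module. The payload and its field names are made up for the illustration. One missing brace can be repaired in two equally plausible ways, and each repair puts repair_service on a different object:

```python
import json

# Hypothetical payload with exactly one close-brace missing.
broken = '{"address": {"postal_code": "12345", "repair_service": "heating"}'

# Two equally plausible repairs, each adding a single "}".
repair_a = '{"address": {"postal_code": "12345"}, "repair_service": "heating"}'
repair_b = broken + "}"


def is_valid_json(raw):
    """Return True if raw parses as JSON, False otherwise."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return False
    return True


# Top-level keys differ depending on which repair is chosen.
parent_keys_a = sorted(json.loads(repair_a))
parent_keys_b = sorted(json.loads(repair_b))
```

The parser cannot know which repair the sender intended, so it rejects the broken document as a whole.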

If they’re manually creating JSON data for submission to a site, you’ve got a different problem…

Depending upon how you categorize the data, somewhere between 20% and 50% of all security breaches are the result of internal actions. To say that an application is safe because it’s available only to employees ignores that fact. (Of course, if the data being leaked isn’t sensitive or particularly valuable, then it may not matter - but only an appropriate risk assessment can determine that.)

Would I still try to reject this requirement? As a general principle, absolutely. Without qualification or reservation. As highlighted above, it potentially provides as much misleading information as information of value - which can only create more confusion as the original problem is solved and an error now shows up in an area once thought to be safe because no problem was identified in that location of the data.

s-maibuecher · June 15, 2022, 10:02pm

Thank you. I’ll discuss that.

s-maibuecher · June 28, 2022, 7:20am

Hi @KenWhitesell,
we’ve discussed this topic. Should I open a new thread or is it okay to reply to this one?

I’m not sure I understood all the technical details of your answer regarding security, or whether the problem remains with the following solution, but a colleague had an idea:
What if you manage several serializers for an endpoint? Here, main_serializer is a quite simple serializer that only handles the field validation of all fields:

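from rest_framework import serializers

# Tblnetze is our Django model, imported from the app's models module.
class MainSerializer(serializers.ModelSerializer):
    class Meta:
        model = Tblnetze
        fields = "__all__"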

And then you can hand over more serializers, which contain the business logic validations and only the subset of fields they need. That way all required fields have gone through the security checks, or am I wrong?

class BL1Serializer(serializers.ModelSerializer):

    class Meta:
        model = Tblnetze
        fields = [
            "a",
            "b",
            "c",
        ]

    def validate(self, attrs):
        data = super().validate(attrs)
        # ... business logic validation
        return data

And here the view:

    def post(self, request, format=None):

        main_serializer = MainSerializer(data=request.data)

        serializers_list = [
            main_serializer,
            BL1Serializer,
            BL2Serializer,
            # ...
        ]

        # here all serializers.is_valid() methods are called and exceptions are collected
        all_exceptions = validate_all(serializers_list, request.data)

        if not all_exceptions:
            # all validations went through, save object
            main_serializer.save()
            return Response(main_serializer.data, status=status.HTTP_201_CREATED)
        else:
            multi_exception_response = get_response_from_exc_list(all_exceptions)
            return multi_exception_response
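The validate_all helper is not shown above. A minimal sketch of what it might look like follows, assuming it relies only on is_valid() and .errors, which DRF serializers provide. It returns error dicts rather than exception objects, and StubSerializer and its subclasses are hypothetical stand-ins so the sketch runs without Django:

```python
def validate_all(serializers_list, data):
    """Call is_valid() on every entry and return the error dicts of those that fail.

    Entries may be serializer classes (instantiated here with ``data=data``)
    or ready-made instances such as ``main_serializer``.
    """
    all_errors = []
    for entry in serializers_list:
        serializer = entry(data=data) if isinstance(entry, type) else entry
        if not serializer.is_valid():  # no raise_exception, so nothing is thrown
            all_errors.append(dict(serializer.errors))
    return all_errors


class StubSerializer:
    """Stand-in for a DRF serializer so this sketch runs without Django."""

    required_fields = ()

    def __init__(self, data):
        self.initial_data = data
        self.errors = {}

    def is_valid(self):
        self.errors = {
            name: ["This field is required."]
            for name in self.required_fields
            if name not in self.initial_data
        }
        return not self.errors


class StubMain(StubSerializer):
    required_fields = ("a", "b", "c", "d")


class StubBL1(StubSerializer):
    required_fields = ("a", "b", "c")


payload = {"a": 1, "b": 2}
main_serializer = StubMain(data=payload)
all_errors = validate_all([main_serializer, StubBL1], payload)
```

Passing main_serializer as an instance means is_valid() has already been called on that same object by the time the view calls save() on it.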

What do you think? (Thank you very much for your help!)

KenWhitesell · June 28, 2022, 10:12am

I’m sorry, I’m not understanding what you’re asking for here - or how this relates to the original topic (if it’s related to the original topic).

The original post was talking about the idea of trying to process syntactically invalid JSON, which is always a bad idea.

Here you’re talking about accepting syntactically valid JSON but dynamically selecting the serializer - a completely different topic.

Which primarily leads me to think that I may have misunderstood the intent of the original question - in that you were not referring to syntactically invalid JSON but instead referring to the situation where the JSON doesn’t match the serializer being used.

In either case, can you possibly post a minimal example of JSON showing the issue you’re trying to address? That may help.

s-maibuecher · June 28, 2022, 10:27pm

Yes, of course.
Background is a form in our client where the user wants to create a new instance, let’s say an appointment, to schedule a repair service at an address.
The form contains an address and a type of repair service and creates a new appointment. Now the user submits a numeric 123 instead of a string for the first name, which is syntactically incorrect.
Usually you validate input with a serializer, and it now responds with a 400, ValueError at first_name. The user then corrects the input, submits the syntactically correct values, and now gets a 409 with a business logic error which says that this combination of postal code and type of repair service is not available.
The user corrects the postal code or the type of repair service and submits the form again; hopefully all validations pass this time. That makes at least 3 form submissions.

What we want is the following:
The user should not have to click “Save” twice to see both of the errors above. Instead, both errors should be reported after the first submit (as far as that is possible, see below).
So our solution will create a response with all aggregated errors:

{
    "errors" : [
        {"type": "validation_error", "field": "first_name"},
        {"type": "business_logic_error", "field": ["repair_service", "postal_code"]}
    ]
}

Of course, if the user had also submitted a syntactically incorrect postal code, the business logic validation would not have been possible and therefore would not be called, because it depends on the postal code. So the response will only be:

{
    "errors" : [
        {"type": "validation_error", "field": "first_name"},
        {"type": "validation_error", "field": "postal_code"}
    ]
}

In that case the user cannot avoid submitting the form at least twice.
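The aggregation rule described above can be sketched without any framework. This is only an illustration of the idea, not our actual implementation: collect_errors, FIELD_CHECKS, AVAILABLE and BUSINESS_RULES are hypothetical names, and the checks themselves are placeholders. A business rule runs only if none of the fields it depends on has already failed field validation:

```python
def collect_errors(data, field_checks, business_rules):
    """Return the aggregated error list for one form submission.

    field_checks:   {field_name: predicate(value) -> bool}
    business_rules: list of (field_names, predicate(data) -> bool)
    """
    errors = []
    bad_fields = set()
    for field, is_ok in field_checks.items():
        if not is_ok(data.get(field)):
            bad_fields.add(field)
            errors.append({"type": "validation_error", "field": field})
    for fields, is_ok in business_rules:
        # Skip a rule when any field it depends on already failed.
        if bad_fields.isdisjoint(fields) and not is_ok(data):
            errors.append({"type": "business_logic_error", "field": list(fields)})
    return errors


# Placeholder field checks for the appointment form.
FIELD_CHECKS = {
    "first_name": lambda v: isinstance(v, str),
    "postal_code": lambda v: isinstance(v, str) and v.isdigit(),
    "repair_service": lambda v: isinstance(v, str),
}

# Placeholder rule: the service must be offered in the given postal code.
AVAILABLE = {("heating", "12345")}
BUSINESS_RULES = [
    (
        ("repair_service", "postal_code"),
        lambda d: (d["repair_service"], d["postal_code"]) in AVAILABLE,
    ),
]

first_case = collect_errors(
    {"first_name": 123, "postal_code": "99999", "repair_service": "heating"},
    FIELD_CHECKS,
    BUSINESS_RULES,
)
second_case = collect_errors(
    {"first_name": 123, "postal_code": 99999, "repair_service": "heating"},
    FIELD_CHECKS,
    BUSINESS_RULES,
)
```

With these placeholder checks, first_case has the shape of the first response above and second_case has the shape of the second one.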

Is this explanation more insightful? Is our approach correct?

KenWhitesell · June 28, 2022, 11:19pm

Actually, no - this is syntactically correct. It’s semantically incorrect.

Syntactic correctness means that the data is valid JSON. Semantic correctness means that the JSON lines up with whatever structure you are expecting it to have.

JSON itself is not typed. There’s nothing within a JSON object to specify that a given value is supposed to be either an integer or a string. (Notwithstanding the possibility that someone could have “123” as their legal name.)

That’s an indication of a semantic error, not a syntax error. The serializer is validating the semantics of the JSON - the syntactical validation is occurring at an earlier stage.
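As a rough illustration of the two stages, using only Python's standard json module (classify_submission is a hypothetical helper, and the isinstance test stands in for what a serializer field would check):

```python
import json


def classify_submission(raw):
    """Classify a raw request body as 'syntax_error', 'semantic_error' or 'ok'."""
    try:
        payload = json.loads(raw)  # stage 1: is it JSON at all?
    except json.JSONDecodeError:
        return "syntax_error"
    # Stage 2: does the parsed JSON have the structure we expect?
    if not isinstance(payload, dict):
        return "semantic_error"
    if not isinstance(payload.get("first_name"), str):
        return "semantic_error"
    return "ok"


number_as_name = classify_submission('{"first_name": 123}')
string_as_name = classify_submission('{"first_name": "123"}')
unterminated = classify_submission('{"first_name": 123')
```

The body with 123 as first_name passes stage 1 and only fails at stage 2, which is why it counts as a semantic error.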

So yes, this is a different issue than what I was addressing, and to the extent of my knowledge of DRF, I see nothing fundamentally wrong with your approach.

s-maibuecher · June 29, 2022, 3:48pm

Ok, I understand, sorry. But I learned a lot. Thank you!
