Reliability of MD5 database function

I have a model that represents a program written by the user in some programming language and the results of executing it against some test cases.

The model looks like this:

class UserProgram(models.Model):
     code = models.TextField(blank=True)
     execution_results = models.JSONField(blank=True, null=True)

The user first types some code, which is autosaved by the frontend into the code field, then at some point they click a “Run” button, which causes a task to be scheduled which runs the current code inside of a sandbox, which returns a JSON object which details the results of the execution (the fields of this dict aren’t relevant to my question), which is ultimately saved into the execution_results field.

I want to be able to check whether the execution results object refers to the latest version of the code, i.e. the current value of the code field. This is used to check if, so to speak, the saved results are up to date or whether the user has made modifications to the code since the last time it was run which haven’t been run yet.

One solution could be to just add a field named code inside of execution_results which just contains the code the results refer to, and then I could compare that value to the field code.

However, that would take up too much space. What I thought is I can store the MD5 of the code inside of execution_results and use that for comparison.

So, each time I get the results object from the sandbox, I do this before saving the object to the model’s field:

execution_results["code_md5"] = hashlib.md5(code.encode("utf-8")).hexdigest()

Then, in order to get the user programs which either haven’t been run or whose last execution refers to an out-of-date code, I run this query:

from django.db.models.functions import MD5

UserProgram.objects.filter(
            (
                Q(execution_results__isnull=True)
                # answer was updated since the last time it was run without running again
                | ~Q(execution_results__code_md5=MD5(F("code")))
            )
)

Which generates this query:

SELECT
    "userprogram"."id",
    # other fields ...
FROM
    "userprogram"
WHERE
    (
        (
            "userprogram"."execution_results" IS NULL
            OR NOT (
                (
                    CASE
                        WHEN JSON_TYPE(
                            "userprogram"."execution_results",
                            $."code_md5"
                        ) IN ('null', 'true', 'false') THEN JSON_TYPE(
                            "userprogram"."execution_results",
                            $."code_md5"
                        )
                        ELSE JSON_EXTRACT(
                            "userprogram"."execution_results",
                            $."code_md5"
                        )
                    END
                ) = (
                    MD5("userprogram"."code")
                )
                AND "userprogram"."execution_results" IS NOT NULL
            )
        )
    )

Since I’m not very familiar with db functions, much less the MD5 one, my question is: is this method reliable? Is it reliable across all the major db backends? Do I have a guarantee, or any way of guaranteeing, that the MD5 function I used in Python code to hash the code before saving the results object and the one my db will use for the query are the same?

Yes, the MD5 hash function is a “well-defined” standard algorithm. If two different systems create two different results for the same input, then one of them hasn’t implemented MD5 correctly.

See MD5 - Wikipedia (among any number of other sources…)

1 Like

Thank you.

This approach seems to work in local using sqlite3, but I just tested it on postgres and I’m getting this error:

HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Full stacktrace:

2022-11-09T14:25:37.629415978Z app[web.1]: Internal Server Error: /courses/7/events/JVyAoVn/
2022-11-09T14:25:37.629459481Z app[web.1]: Traceback (most recent call last):
2022-11-09T14:25:37.629466482Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
2022-11-09T14:25:37.629480583Z app[web.1]:     return self.cursor.execute(sql, params)
2022-11-09T14:25:37.629484783Z app[web.1]: psycopg2.errors.UndefinedFunction: operator does not exist: jsonb = text
2022-11-09T14:25:37.629488784Z app[web.1]: LINE 1: ...icipationslot"."execution_results" -> 'code_md5') = (MD5("co...
2022-11-09T14:25:37.629493584Z app[web.1]:                                                              ^
2022-11-09T14:25:37.629497385Z app[web.1]: HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
2022-11-09T14:25:37.629501685Z app[web.1]: 
2022-11-09T14:25:37.629505285Z app[web.1]: 
2022-11-09T14:25:37.629515186Z app[web.1]: The above exception was the direct cause of the following exception:
2022-11-09T14:25:37.629519386Z app[web.1]: 
2022-11-09T14:25:37.629523087Z app[web.1]: Traceback (most recent call last):
2022-11-09T14:25:37.629526887Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
2022-11-09T14:25:37.629530987Z app[web.1]:     response = get_response(request)
2022-11-09T14:25:37.629534688Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
2022-11-09T14:25:37.629549489Z app[web.1]:     response = wrapped_callback(request, *callback_args, **callback_kwargs)
2022-11-09T14:25:37.629559390Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/contextlib.py", line 75, in inner
2022-11-09T14:25:37.629564090Z app[web.1]:     return func(*args, **kwds)
2022-11-09T14:25:37.629567990Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
2022-11-09T14:25:37.629571991Z app[web.1]:     return view_func(*args, **kwargs)
2022-11-09T14:25:37.629575891Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/viewsets.py", line 125, in view
2022-11-09T14:25:37.629579891Z app[web.1]:     return self.dispatch(request, *args, **kwargs)
2022-11-09T14:25:37.629595493Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/views.py", line 509, in dispatch
2022-11-09T14:25:37.629599493Z app[web.1]:     response = self.handle_exception(exc)
2022-11-09T14:25:37.629602893Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/views.py", line 469, in handle_exception
2022-11-09T14:25:37.629606494Z app[web.1]:     self.raise_uncaught_exception(exc)
2022-11-09T14:25:37.629609794Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
2022-11-09T14:25:37.629613294Z app[web.1]:     raise exc
2022-11-09T14:25:37.629616694Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
2022-11-09T14:25:37.629620295Z app[web.1]:     response = handler(request, *args, **kwargs)
2022-11-09T14:25:37.629624795Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/mixins.py", line 82, in partial_update
2022-11-09T14:25:37.629628495Z app[web.1]:     return self.update(request, *args, **kwargs)
2022-11-09T14:25:37.629631896Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/mixins.py", line 68, in update
2022-11-09T14:25:37.629635396Z app[web.1]:     self.perform_update(serializer)
2022-11-09T14:25:37.629639496Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/mixins.py", line 78, in perform_update
2022-11-09T14:25:37.629643097Z app[web.1]:     serializer.save()
2022-11-09T14:25:37.629646497Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/serializers.py", line 207, in save
2022-11-09T14:25:37.629650097Z app[web.1]:     self.instance = self.update(self.instance, validated_data)
2022-11-09T14:25:37.629653497Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/rest_framework/serializers.py", line 1006, in update
2022-11-09T14:25:37.629657398Z app[web.1]:     instance.save()
2022-11-09T14:25:37.629660698Z app[web.1]:   File "/app/courses/models.py", line 893, in save
2022-11-09T14:25:37.629664298Z app[web.1]:     return super().save(*args, **kwargs)
2022-11-09T14:25:37.629667599Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/contextlib.py", line 75, in inner
2022-11-09T14:25:37.629671199Z app[web.1]:     return func(*args, **kwds)
2022-11-09T14:25:37.629674499Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 175, in save
2022-11-09T14:25:37.629677999Z app[web.1]:     self._run_hooked_methods(AFTER_UPDATE, **kwargs)
2022-11-09T14:25:37.629681400Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 289, in _run_hooked_methods
2022-11-09T14:25:37.629685100Z app[web.1]:     method.run(self)
2022-11-09T14:25:37.629688400Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django_lifecycle/mixins.py", line 31, in run
2022-11-09T14:25:37.629692001Z app[web.1]:     self.method(instance)
2022-11-09T14:25:37.629695301Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django_lifecycle/decorators.py", line 119, in func
2022-11-09T14:25:37.629706802Z app[web.1]:     hooked_method(*args, **kwargs)
2022-11-09T14:25:37.629710402Z app[web.1]:   File "/app/courses/models.py", line 882, in on_close
2022-11-09T14:25:37.629713902Z app[web.1]:     if slots_to_run.exists():
2022-11-09T14:25:37.629717303Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/models/query.py", line 892, in exists
2022-11-09T14:25:37.629720903Z app[web.1]:     return self.query.has_results(using=self.db)
2022-11-09T14:25:37.629724303Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/models/sql/query.py", line 589, in has_results
2022-11-09T14:25:37.629728004Z app[web.1]:     return compiler.has_results()
2022-11-09T14:25:37.629731304Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1329, in has_results
2022-11-09T14:25:37.629734804Z app[web.1]:     return bool(self.execute_sql(SINGLE))
2022-11-09T14:25:37.629738604Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1361, in execute_sql
2022-11-09T14:25:37.629742305Z app[web.1]:     cursor.execute(sql, params)
2022-11-09T14:25:37.629745605Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/backends/utils.py", line 67, in execute
2022-11-09T14:25:37.629749105Z app[web.1]:     return self._execute_with_wrappers(
2022-11-09T14:25:37.629752406Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
2022-11-09T14:25:37.629756006Z app[web.1]:     return executor(sql, params, many, context)
2022-11-09T14:25:37.629759506Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
2022-11-09T14:25:37.629763206Z app[web.1]:     return self.cursor.execute(sql, params)
2022-11-09T14:25:37.629766507Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/utils.py", line 91, in __exit__
2022-11-09T14:25:37.629770007Z app[web.1]:     raise dj_exc_value.with_traceback(traceback) from exc_value
2022-11-09T14:25:37.629773407Z app[web.1]:   File "/app/.heroku/python/lib/python3.8/site-packages/django/db/backends/utils.py", line 89, in _execute
2022-11-09T14:25:37.629776908Z app[web.1]:     return self.cursor.execute(sql, params)
2022-11-09T14:25:37.629780208Z app[web.1]: django.db.utils.ProgrammingError: operator does not exist: jsonb = text
2022-11-09T14:25:37.629783608Z app[web.1]: LINE 1: ...icipationslot"."execution_results" -> 'code_md5') = (MD5("co...
2022-11-09T14:25:37.629787408Z app[web.1]:                                                              ^
2022-11-09T14:25:37.629790809Z app[web.1]: HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

This seems to be an issue with the query generated by Django, but I couldn’t find any “gotcha” about the MD5 function in postgres. Do you have a clue what could be causing this?

Apparently, all I needed was a cast, just like the hint suggested.

Got it working with this code; I’m putting it out here just in case someone bumps into the same problem in the future:

UserProgram.objects.annotate(
            # explicit cast to text needed for postgres
            code_md5_as_text=Cast("execution_results__code_md5", models.TextField())
        ).filter(
            (
                Q(execution_results__isnull=True)
                # answer was updated since the last time it was run without running again
                | ~Q(code_md5_as_text=MD5(F("answer_text")))
            )
        )