"Works Locally, Fails in Prod": Battling Session Bloat (4096 B Limit) & Silent Data Corruption in Django

Hi everyone,

I’m refactoring a legacy Django application (Django + uWSGI + PostgreSQL) handling complex multi-step questionnaires. We hit a critical stability wall that only occurs in Production, while the Local environment (runserver) works perfectly.

The Context & Environment:

  • Local: Works fine. Likely because test data is small, keeping the session cookie under the browser’s limit.

  • Production: Fails catastrophically. Real-world data volume (50+ answers) pushes the session payload over the edge.

The Symptoms:

  1. Infinite Loops: Users fill in Page 1, click “Next”, and the form loops back to Page 1.

  2. The Error: Sentry logs the message: Cookie "sessionid" is invalid because its size is too big. Max size is 4096 B. (We use signed_cookies.)

  3. Worker Churn: uWSGI workers constantly restart (OOM/Timeouts) due to massive session serialization.

  4. Data Integrity: NULL values found in NOT NULL DB columns.

Root Cause Analysis:

  1. Session Abuse: The code hydrated all answers into a dict and dumped them into request.session. In Prod, this payload exceeds 4 KB.

  2. Forms.py Anti-patterns:

    • N+1 Queries: save() looped through fields doing Question.objects.get() for each.

    • Silent Failures: The save() method wrapped logic in try...except: pass. If the DB rejected data (IntegrityError), it failed silently.
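To get a feel for why small test data stays under the limit while 50+ answers do not, here is a rough, standard-library-only estimate of a signed cookie's size. It is only an approximation of what a signed-cookie session does (JSON, optional compression, base64, HMAC signature), not Django's exact format; the function names, the secret, and the fake answer data are all made up for illustration.

```python
import base64
import hashlib
import hmac
import json
import zlib

COOKIE_LIMIT = 4096  # bytes; the limit quoted in the error message


def estimate_signed_cookie_size(session_data, secret=b"not-a-real-secret"):
    """Roughly estimate the size of a signed, cookie-stored session payload.

    Assumption: JSON + zlib + URL-safe base64 + HMAC-SHA256 signature.
    The real cookie also carries extra fields, so treat this as a lower bound.
    """
    raw = json.dumps(session_data, separators=(",", ":")).encode("utf-8")
    compressed = zlib.compress(raw)
    # Keep the compressed form only when it is actually smaller.
    body = compressed if len(compressed) < len(raw) else raw
    encoded = base64.urlsafe_b64encode(body).rstrip(b"=")
    signature = base64.urlsafe_b64encode(
        hmac.new(secret, encoded, hashlib.sha256).digest()
    ).rstrip(b"=")
    return len(encoded) + 1 + len(signature)  # +1 for the separator


def fake_answers(count):
    """Hypothetical questionnaire answers: 128 characters of free text each."""
    return {
        f"question_{i}": hashlib.sha512(str(i).encode()).hexdigest()
        for i in range(count)
    }


small = estimate_signed_cookie_size(fake_answers(5))   # local-style test data
large = estimate_signed_cookie_size(fake_answers(50))  # production-style data

print(f"5 answers:  {small} bytes (limit {COOKIE_LIMIT})")
print(f"50 answers: {large} bytes (limit {COOKIE_LIMIT})")
```

With these made-up answers the 5-answer payload fits comfortably and the 50-answer payload does not, which matches the "works locally, fails in prod" pattern.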

The Fix: We refactored to a stateless architecture:

  1. Stop Session Hydration: Forms now fetch existing answers directly from the DB via pre-fetching.

  2. Atomic Transactions: Wrapped save() in transaction.atomic().

  3. Removed Silent Fail: Replaced pass with explicit ValidationError raising.
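The effect of points 2 and 3 can be sketched without a Django project. The snippet below is a stand-in that uses the standard-library sqlite3 module rather than the Django ORM: the connection's context manager plays the role of transaction.atomic(), and the error is allowed to propagate instead of being swallowed by except: pass. The table and function names are hypothetical.

```python
import sqlite3


def save_answers(conn, answers):
    """Save all answers in one transaction; let database errors propagate."""
    # Used as a context manager, the connection commits on success and
    # rolls back if the block raises, so a failed save leaves no partial rows.
    with conn:
        conn.executemany(
            "INSERT INTO answer (question_id, value) VALUES (?, ?)",
            answers,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answer (question_id INTEGER NOT NULL, value TEXT NOT NULL)"
)

error = None
try:
    # The second row violates NOT NULL, so the whole batch is rejected.
    save_answers(conn, [(1, "yes"), (2, None)])
except sqlite3.IntegrityError as exc:
    # Surfaced to the caller instead of being silently discarded.
    error = str(exc)

saved_rows = conn.execute("SELECT COUNT(*) FROM answer").fetchone()[0]
print("error:", error)
print("rows saved:", saved_rows)
```

The first, valid row is rolled back together with the invalid one, so the caller sees an explicit error and the table is never left half-written.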

Question: Has anyone else seen signed_cookies act as a “silent killer” that only manifests when data volume scales up in Production?

Welcome @rhemzypm !

What session backend are you using? (Hopefully you’re not using the cookie-based backend. If you are, using something else would be my first recommendation. That change alone is likely to resolve these issues.)
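For reference, switching backends is normally a settings change. A minimal sketch, assuming the database-backed backend is acceptable for this project (cached_db is another built-in option):

```python
# settings.py (fragment)

# Store session data server-side in the database. The cookie then carries
# only the session key, so the payload size no longer counts against the
# browser's cookie limit.
# Requires "django.contrib.sessions" in INSTALLED_APPS and a run of
# `manage.py migrate` to create the session table.
SESSION_ENGINE = "django.contrib.sessions.backends.db"
```

Existing cookie-based sessions are not carried over by this change, so users would most likely start with a fresh session afterwards.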

Yeah, unfortunately the developer who built this before me used the cookie-based backend :). Is the issue I’m experiencing caused solely by that?


The symptoms you describe are consistent with that, yes. It would be the first thing I would change.
