Performance and Timeout Issues on My Website During API Requests in Django

I am currently facing a persistent and highly disruptive issue with my website that is built using Django, and it has become increasingly difficult to maintain reliable performance for users. The core problem revolves around API requests that fetch dynamic data for my site’s core functionality. While the application works well in a development environment, in production, API calls occasionally time out or take an unusually long time to complete, resulting in delayed page rendering, broken sections, and a degraded user experience. The site relies heavily on real-time data for menus, pricing, and availability, and these delays directly impact the accuracy and responsiveness of the interface.

The problem is particularly pronounced when multiple users access the site simultaneously. Under moderate traffic, certain API calls either fail to complete or return incomplete data, causing exceptions in Django views that rely on this data. Even though the backend processes complete successfully under local testing, in the production environment deployed on a cloud server, the same requests frequently result in Timeout or ConnectionError exceptions. Logging shows that the requests reach external services, but responses are either delayed beyond the configured timeout or partially received, causing downstream code to break. This behavior is inconsistent and difficult to reproduce locally, making debugging very challenging.

One complicating factor appears to be database queries triggered by these API requests. Some endpoints fetch related objects and perform complex joins to aggregate menu items, user preferences, and pricing information. While query optimization and select_related/prefetch_related strategies have been applied, response times are still significantly longer in production. Profiling indicates that certain queries, particularly those joining multiple tables for frequently accessed menu items, can take several seconds under load. Although caching layers such as Redis and Django’s per-view caching are in place, the caching does not fully mitigate the delays, suggesting that either cache misses or high concurrency levels are contributing to the problem.

Another aspect of the issue is related to middleware and request handling. The site uses custom middleware for logging, authentication, and request throttling. While these components function correctly individually, during high concurrency periods, they introduce additional latency, compounding the existing delays from database and API calls. Furthermore, some middleware functions attempt to modify response headers based on the API results, but when the API call fails or times out, exceptions propagate, leading to incomplete responses or HTTP 500 errors being returned to users. This makes it extremely difficult to guarantee reliability and stability of the web pages under load.
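For reference, the header-setting middleware follows roughly this shape (simplified; fetch_api_status stands in for the real API client, and the try/except shown is the kind of guard I suspect is missing):

```python
class ApiHeaderMiddleware:
    """Simplified sketch: sets a response header from an external API result.

    Without the except clause below, a timeout in fetch_api_status()
    propagates and the user sees an HTTP 500 instead of a degraded page.
    """

    def __init__(self, get_response, fetch_api_status):
        self.get_response = get_response
        self.fetch_api_status = fetch_api_status  # placeholder for the real client

    def __call__(self, request):
        response = self.get_response(request)
        try:
            response["X-Api-Status"] = self.fetch_api_status()
        except (TimeoutError, ConnectionError):
            # Degrade gracefully: serve the page without the live value
            response["X-Api-Status"] = "unavailable"
        return response
```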

I have also noticed that deployment environment differences may exacerbate the problem. In production, the server runs behind Nginx with Gunicorn workers, whereas in development, Django’s built-in server is used. Profiling and logs indicate that under heavy load, worker processes occasionally get blocked waiting for external API responses or database connections, causing other requests to queue. This leads to cascading delays where even unrelated endpoints experience slower response times, creating a systemic performance issue rather than isolated slow API calls. Adjusting worker counts and timeout settings helps slightly, but does not fully resolve the problem.
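As an illustration of the knobs involved, the Gunicorn side currently looks something like this (module path and all values are placeholders I am still tuning, not recommendations); threaded workers mean a slow external call blocks one thread rather than a whole worker process:

```shell
# Illustrative values only -- measure and tune for the actual workload.
# gthread workers share a process between threads, so one blocked
# external call ties up a thread, not the entire worker.
gunicorn myproject.wsgi:application \
    --worker-class gthread \
    --workers 4 \
    --threads 4 \
    --timeout 30
```

On the Nginx side, proxy_read_timeout and proxy_connect_timeout are kept roughly consistent with Gunicorn’s --timeout so the two layers don’t cut each other off mid-request.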

I am seeking guidance from the Django community on best practices to ensure that my website can handle API requests reliably without timing out, even under concurrent load. I would greatly appreciate advice on optimizing database queries, using asynchronous request handling, improving middleware performance, or implementing robust retry and fallback mechanisms for external API calls. Additionally, any guidance on proper deployment configurations for Gunicorn, Nginx, and caching strategies to stabilize request handling under high concurrency would be extremely valuable. My ultimate goal is to ensure that my website delivers accurate and timely content to users consistently, even when dealing with dynamic API-driven data and complex database queries. Very sorry for the long post!

The first suggestion I would make here would be to not limit your research to only Django-related topics.

This is the type of situation where you want to look at it from a system perspective. Look at everything - memory usage, CPU utilization, configuration settings of your database, etc.

Don’t ignore any aspect of the environment.

You will want to create an architectural match of your production environment to use as a testing target and as a basis of comparison for changes you may make.

You’ll also want to be in a position where you can test by removing specific operations to try to identify the specific area causing the issues.

You might also want to become familiar with tools suitable for stress-testing your environment, such as JMeter. (There are others, but it’s the one I’m most familiar with.)

This behavior is inconsistent and difficult to reproduce locally, making debugging very challenging.

Are you using an exception tracking tool like Sentry? It’s very helpful for debugging, although it won’t always capture all the data required to reproduce issues.
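For reference, the setup is only a few lines in settings.py - a minimal sketch (the DSN is a placeholder, and the sample rate is just an example to tune):

```python
# settings.py -- minimal Sentry setup (DSN is a placeholder).
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="https://<your-key>@sentry.example.invalid/1",
    integrations=[DjangoIntegration()],
    # Sample a fraction of transactions for performance tracing;
    # 1.0 traces every request, which is usually too noisy in production.
    traces_sample_rate=0.1,
)
```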

One complicating factor appears to be database queries triggered by these API requests

Are you using any observability tooling? You should be able to see the latency of all of your requests, and then start digging into those bottlenecks. OpenTelemetry is an open-source example, but there are many SaaS providers (including Sentry).

Profiling and logs indicate that under heavy load, worker processes occasionally get blocked waiting for external API responses or database connections, causing other requests to queue

Traditionally you find your latency bottlenecks to improve your throughput on a single worker. If your single-worker throughput is not capable of handling your expected load, then you need to scale horizontally. You can use a load testing tool like Locust to test your system.
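A minimal Locust file for this kind of test might look like the following (endpoint paths and task weights are placeholders, not from your application):

```python
# locustfile.py -- simulated users hitting the menu and pricing endpoints.
from locust import HttpUser, task, between

class MenuUser(HttpUser):
    wait_time = between(1, 3)  # seconds each simulated user pauses between tasks

    @task(3)  # weighted: menu fetches run 3x as often as pricing fetches
    def view_menu(self):
        self.client.get("/api/menu/")  # placeholder endpoint

    @task(1)
    def view_pricing(self):
        self.client.get("/api/pricing/")  # placeholder endpoint
```

Run it with `locust -f locustfile.py --host=https://your-site`, then ramp the user count up while watching where response times start to degrade.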

when the API call fails or times out, exceptions propagate, leading to incomplete responses or HTTP 500 errors being returned to users. This makes it extremely difficult to guarantee reliability and stability of the web pages under load.

This sounds like an architecture problem. It’s hard to give specific advice without understanding your application, but it might mean that a simple server-side request/response architecture won’t meet your expected stability/UX. From a backend-only perspective, if it makes sense, one general way to solve this problem is to move the API calls out of the request cycle. You might be able to put the API calls in an external “cron job” (with retries, etc.) which stores the results in the database, and then query the database at request time instead of making the API calls. The downside of this approach is that the data is potentially stale until the job next runs, but there are always going to be tradeoffs in a data sync problem.
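A stdlib-only sketch of that decoupling, where the fetch function, retry policy, and in-memory store are all stand-ins for the real API client and database:

```python
import time

def fetch_with_retries(fetch, retries=3, delay=1.0):
    """Run the external API call outside the request cycle, retrying on failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_error

def sync_job(fetch, store):
    """Scheduled job (cron, Celery beat, etc.): refresh the stored API data."""
    store["menu"] = fetch_with_retries(fetch)
    store["synced_at"] = time.time()

def menu_view_data(store):
    """At request time, read the stored copy instead of calling the API."""
    return store.get("menu", [])  # possibly stale, but always fast
```

Because the slow, failure-prone call now happens in the background job, a timeout there never turns into an HTTP 500 for a user - at worst they see slightly stale data.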

I would look at the EXPLAIN output for those queries, either from django-debug-toolbar’s SQL panel or by calling .explain() on the querysets in a shell. You’ll see what costs you might be paying needlessly.
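In a Django shell that looks roughly like this (model and field names here are placeholders); on PostgreSQL the analyze option also shows actual execution timings rather than just the plan:

```python
>>> from myapp.models import MenuItem  # placeholder model
>>> qs = MenuItem.objects.select_related("category").filter(available=True)
>>> print(qs.explain())              # the planner's query plan
>>> print(qs.explain(analyze=True))  # PostgreSQL only: runs the query, real timings
```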

Creating an architectural match of the production environment for testing is something I haven’t done properly yet. Most of my testing has been either local or partial staging, which doesn’t fully replicate Gunicorn, Nginx, database load, or concurrency conditions. I can see how without that parity, I’m essentially debugging blind.

The suggestion to isolate and remove specific operations is also valuable. I’ll start by:

  • Temporarily disabling external API calls to measure baseline response times.

  • Profiling database-heavy endpoints independently.

  • Disabling non-essential middleware to see how much latency it introduces.

  • Monitoring worker utilization and database connection pools under load.