Django Kubernetes Scaling issue


I’m running a Django (DRF) + Gunicorn + PostgreSQL + django-rq project on Kubernetes (OVH public cloud).
I configured it with sync workers (16-core CPU → 2×16+1 workers / 30 GB RAM)

gunicorn api.wsgi --workers=35 --timeout=900 --limit-request-line=0 --bind
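For reference, the worker count follows Gunicorn's rule of thumb for sync workers, (2 × CPU cores) + 1. A trivial sketch of that formula (the helper function is just for illustration):

```python
import multiprocessing

def recommended_workers(cores: int) -> int:
    """Gunicorn's rule of thumb for sync workers: (2 x cores) + 1."""
    return 2 * cores + 1

# On a 16-core node:
print(recommended_workers(16))  # 33

# Or derived from the machine the code runs on:
print(recommended_workers(multiprocessing.cpu_count()))
```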

I have a question and an issue.

  • Horizontal scaling and performance:

I measured the average response time of one of my API endpoints.
With one Django replica on node1 and 30 concurrent requests, the average response time is 386 ms.
With two Django replicas (the second on another node/machine) and 60 concurrent requests, the average response time rises to 500 ms.
I assumed that, up to a certain point, scaling the Django replicas would allow me to increase the number of concurrent requests without any performance drop.

Am I wrong ?
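To be concrete about the measurement, this is roughly how such an average can be obtained with the standard library (a simplified sketch; the endpoint URL and request counts are placeholders, not my actual benchmark tool):

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def timed_get(url: str) -> float:
    """Return the response time of a single GET request, in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start

def average_response_time(url: str, concurrency: int, total: int) -> float:
    """Fire `total` requests with `concurrency` in flight; average the latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_get, [url] * total))
    return sum(latencies) / len(latencies)

# Hypothetical usage:
# print(average_response_time("http://api.example.com/endpoint", 30, 300))
```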

  • Horizontal scaling and lost requests/exceptions:

The other issue I have is that when running 2 Django replicas I lose some requests.

When running one Django replica on node1 and 30 concurrent requests, there is no issue.
When running 2 Django replicas (one on node1 and one on node2) and 30 concurrent requests, I sometimes get errors like these:

django.db.utils.OperationalError: could not translate host name "pgbouncer" to address: Try again
redis.exceptions.ConnectionError: Error -3 connecting to redis:6379. Try again.
requests.exceptions.ConnectionError: HTTPConnectionPool

As if running two replicas made Django sometimes lose its connection to the other pods (Redis / PostgreSQL or other services we use).
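For what it's worth, error -3 is `EAI_AGAIN` ("Try again"), a *transient* DNS resolution failure: the pod momentarily fails to resolve the service name through cluster DNS rather than being refused a connection. A small sketch of what that looks like at the socket level, with a retry loop (the hostname, port, and retry policy here are illustrative assumptions):

```python
import socket
import time

def resolve_with_retry(host: str, port: int, retries: int = 3, delay: float = 0.5):
    """Resolve a hostname, retrying on transient DNS failures (EAI_AGAIN, errno -3)."""
    for attempt in range(retries):
        try:
            return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror as exc:
            # EAI_AGAIN (-3) is the "Try again" error from the tracebacks above;
            # any other gaierror is treated as permanent and re-raised.
            if exc.errno != socket.EAI_AGAIN or attempt == retries - 1:
                raise
            time.sleep(delay)

# Hypothetical usage inside the cluster:
# resolve_with_retry("pgbouncer", 6432)
```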

I have read and tried a lot of things but I cannot figure out why these two things happen,
nor what to change/try/modify to be able to scale the Django pods and increase the number of concurrent requests supported.

In both cases, CPU and memory are not saturated.

So if someone can explain the scaling behaviour, and maybe knows why I get these errors when running two replicas, or what I could try …?

Many thanks in advance