Hello,
I’m running a Django (DRF) + gunicorn + PostgreSQL + django-rq project on Kubernetes (OVH Public Cloud).
I configured it with sync workers (16-core CPU → 2×16+1 workers / 30 GB RAM):
gunicorn api.wsgi --workers=35 --timeout=900 --limit-request-line=0 --bind 0.0.0.0:8000
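For context, the worker count comes from the usual 2×cores+1 sizing rule in the gunicorn docs; the arithmetic for a 16-core node:

```shell
# gunicorn's suggested sync worker count: 2 * cores + 1
CORES=16                      # cores on the node (nproc would report this)
WORKERS=$((2 * CORES + 1))
echo "$WORKERS"               # 33
```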
I have a question and an issue.
- Horizontal scaling and performance:
I measured the average response time of one of my API endpoints.
With one Django replica on node1 and 30 concurrent requests, the average response time is 386 ms.
With two Django replicas (the second on another node/machine) and 60 concurrent requests, the average response time rises to 500 ms.
I assumed that, up to a certain point, scaling the Django replicas would let me increase the number of concurrent requests without any performance drop.
Am I wrong?
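For reference, this is roughly how I measure it (a simplified sketch; the URL here is a stand-in local server so the snippet runs on its own, in reality I hit the real endpoint):

```python
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in for the real API endpoint so the sketch is self-contained.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

def timed_request(_):
    # Return the wall-clock latency of one request, in milliseconds.
    start = time.perf_counter()
    urlopen(url).read()
    return (time.perf_counter() - start) * 1000

CONCURRENCY = 30
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    timings = list(pool.map(timed_request, range(CONCURRENCY)))

print(f"avg response time: {statistics.mean(timings):.1f} ms")
server.shutdown()
```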
- Horizontal scaling and lost requests/exceptions:
The other issue is that when running 2 Django replicas, I lose some requests.
With one Django replica on node1 and 30 concurrent requests, there is no issue.
With 2 Django replicas (one on node1 and one on node2) and 30 concurrent requests, I sometimes get errors like these:
django.db.utils.OperationalError: could not translate host name "pgbouncer" to address: Try again
redis.exceptions.ConnectionError: Error -3 connecting to redis:6379. Try again.
requests.exceptions.ConnectionError: HTTPConnectionPool
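For what it’s worth, the -3 in the Redis traceback seems to be socket.EAI_AGAIN, which (if I read it right) means getaddrinfo() failed transiently, i.e. a DNS lookup timeout rather than the service actually being down:

```python
import socket

# "Error -3" in the redis traceback matches socket.EAI_AGAIN:
# a temporary failure in name resolution (DNS), not a refused
# connection to the service itself.
print(socket.EAI_AGAIN)  # -3 on Linux
```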
As if running two replicas sometimes made Django lose the connection to the other pods (Redis / PostgreSQL or other services we use).
I have read and tried a lot of things, but I cannot figure out why these two things happen, nor what to change/try/modify to be able to scale the Django pods and increase the number of concurrent requests supported.
In both cases the CPU/memory is not saturated.
So if someone can explain the scaling issue, or maybe knows why I get errors when running two replicas, or what I could try…?
Many thanks in advance