[Django 4.2] APIView perform_authentication sometimes takes over 29s

We’re running a production Django 4.2 WSGI Server hosted on Heroku. One of our endpoints take at most 2s to respond. However, we have started noticing couple of requests take more than 30s, hence resolving to 503. We did a little bit of profiling and observed perform_authentication method in APIView took 29.27s when the issue occurs.

class DebugAPIView(APIView):
    def initial(self, request, *args, **kwargs):
        logger.info("Starting APIView initial method")
        start_time = time.time()
        result = super().initial(request, *args, **kwargs)
        logger.info(f"APIView initial method took {time.time() - start_time} seconds")
        return result
        
    def perform_authentication(self, request):
        start_time = time.time()
        result = super().perform_authentication(request)
        logger.info(f"Authentication took {time.time() - start_time} seconds")
        return result
        
    def check_permissions(self, request):
        start_time = time.time()
        result = super().check_permissions(request)
        logger.info(f"Permission checks took {time.time() - start_time} seconds")
        return result
        
    def check_throttles(self, request):
        start_time = time.time()
        result = super().check_throttles(request)
        logger.info(f"Throttle checks took {time.time() - start_time} seconds")
        return result
        
    def dispatch(self, request, *args, **kwargs):
        logger.info("Starting APIView dispatch method")
        start_time = time.time()
        result = super().dispatch(request, *args, **kwargs)
        logger.info(f"APIView dispatch method took {time.time() - start_time} seconds")
        return result
Authentication took 29.27 seconds.

Then, we dug a bit deeper and profiled on get_user under AuthenticationMiddleware because this method was invoked whenever request.user lazy object is called.

def get_user(request):
    log_request(request, "Starting get_user...")
    if not hasattr(request, "_cached_user"):
        log_request(request, "request._cached_user is not set. Setting it...")
        get_user_start_time = time.time()
        request._cached_user = debug_get_user(request)
        get_user_duration = time.time() - get_user_start_time
        log_request(request, f"debug_get_user took {get_user_duration:.2f} seconds.")

    log_request(request, f"End of get_user. Returning {request._cached_user}")
    return request._cached_user

We saw the End of get_user and see that debug_get_user took line:

debug_get_user took 0.03 seconds 

Has anyone encountered this issue before?

Hello there! Here are some feedbacks:

That’s quite a high time already.

Well, that’s most likely related not to DRF itself, but what’s it’s trying to do perform a database call, are you only facing this issue accessing the application hosted on heroku or is that also happening locally? If only on Heroku, where is the database being hosted?
One thing that might help into further dig what might be the culprit for this slowdown is using django-debug-toolbar, I wouldn’t do this on a production environment though.

Thank you for your response.

That’s quite a high time already.

Yes we have a pretty complicated process for this endpoint, which is quite tough to reduce latency further

are you only facing this issue accessing the application hosted on heroku or is that also happening locally?

We saw this on production server hosted in Heroku. We’re not able to reproduce this locally unfortunately.

If only on Heroku, where is the database being hosted?

Our database is hosted in AWS RDS. We have performance insights and saw very fair load, not to the point of it screaming.

One thing that might help into further dig what might be the culprit for this slowdown is using django-debug-toolbar

We actually have Scout APM enabled for Django and saw in most of the slow to 503 cases, it stuck in “View” for a very long time. Essentially, all the middlewares have been completed.