Why is the YOLO model loaded on every call, and how can I solve this?

Django is fundamentally built around the “request / response” cycle. Objects are created when the request is received and discarded once the response is returned. Anything you create inside a view, such as a YOLO model, is therefore created again on every request.

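In a production-quality deployment of Django, you also have multiple processes running. There’s no such thing as “sharing an object between processes in memory”, so each process holds its own copy of anything it loads. Additionally, the process manager may restart any individual process (for example, after a crash, a timeout, or a configured number of requests), and whatever that process had loaded is lost with it.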

If you want “persistent entities” between requests in a production Django environment, you want them outside the Django process.
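One way to move the model out of the Django processes is a small, separately started service that loads the model once at startup and answers prediction requests; the Django view then only makes an HTTP call. The sketch uses only the standard library; `load_model`, `make_server`, `request_prediction`, port 8001 and the `/predict` path are all hypothetical choices, and the handler ignores the request path entirely:

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def load_model():
    """Hypothetical stand-in for an expensive load such as YOLO("yolov8n.pt")."""
    return lambda image: [{"label": "person", "source": image}]


MODEL = load_model()  # loaded once, when this service process starts


class PredictHandler(BaseHTTPRequestHandler):
    """Answers every POST with detections from the already-loaded MODEL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"detections": MODEL(payload.get("image"))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(host="127.0.0.1", port=8001):
    return ThreadingHTTPServer((host, port), PredictHandler)


# Django side: the view forwards the work instead of loading the model itself.
def request_prediction(image, url="http://127.0.0.1:8001/predict"):
    data = json.dumps({"image": image}).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    make_server().serve_forever()  # run this as its own long-lived process
```

A real deployment would more likely use an established model-serving tool or a task queue, but the shape is the same: one long-lived process owns the model, and the Django processes talk to it.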

That’s why, for example, Celery workers and Celery Beat each run as separate processes.
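The Celery arrangement follows the same idea: a long-running worker process loads the model once and then handles many jobs pulled from a broker. A minimal standard-library sketch of such a worker loop, where `queue.Queue` stands in for the broker and `load_model`, `run_worker` and the `STOP` sentinel are hypothetical names. Because the loop only calls `get()` and `put()`, the same function could be given `multiprocessing.Queue` objects and started with `multiprocessing.Process` to run in a genuinely separate process:

```python
import queue

STOP = None  # sentinel telling the worker to exit (None survives pickling as itself)


def load_model():
    """Hypothetical stand-in for an expensive load such as YOLO("yolov8n.pt")."""
    return lambda image: f"detections for {image}"


def run_worker(jobs, results):
    """Load the model once, then process (job_id, image) jobs until STOP arrives."""
    model = load_model()  # paid once per worker, not once per job
    handled = 0
    while True:
        job = jobs.get()
        if job is STOP:
            break
        job_id, image = job
        results.put((job_id, model(image)))
        handled += 1
    return handled


if __name__ == "__main__":
    jobs, results = queue.Queue(), queue.Queue()
    for i, name in enumerate(["a.jpg", "b.jpg"]):
        jobs.put((i, name))
    jobs.put(STOP)
    print(run_worker(jobs, results))  # 2
```

With Celery itself, one common way to get the same effect is to load the model in a handler connected to the `worker_process_init` signal, or lazily on first use inside the task module, rather than inside the task function.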