How to use gevent spawn in a view function and share a single DB connection

Hi, I’ve come up with a solution that works for me. I don’t know if it’s OK to use, but it has held up so far.
I have multiple queries in a view and wanted to “parallelize” them using gevent spawn.
But I started seeing in the logs that multiple new DB connections were being created.
Django stores its DB connection in a thread-local variable, and gevent gives each greenlet its own thread-local storage, so Django thinks it’s running in a new thread and creates a new connection.
So I made a little decorator that passes the DB and cache connections into the new greenlet.
I can pass them safely because psycopg connections are thread-safe. But I have to create a new instance of the database wrapper for each greenlet, db_wrapper = db.__class__(db.settings_dict, alias=db.alias), because a wrapper can’t be used on a thread other than the one it was created in.
It looks like a hack, but it may be useful as a starting point for your own tests and experiments.
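
To make the problem concrete, here is a minimal sketch of what happens without the decorator (this assumes gevent’s monkey-patching is active, and just prints object ids for illustration):

import gevent
from django.db import connections

def show_wrapper():
    # In a fresh greenlet, this lazily creates a brand-new DatabaseWrapper,
    # which will open its own DB connection on first query.
    print(id(connections['default']))

gevent.joinall([gevent.spawn(show_wrapper) for _ in range(3)])
# Prints three different ids: one wrapper (and eventually one connection) per greenlet.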

So now I can wrap a function with the decorator:

get_filtered_options = gevent_django_db(view.get_filtered_options)

then call them like this (note that each call now returns a spawned greenlet, not the result; see the sketch after these calls for collecting the values):

available_brands = get_filtered_options("brands", "brand_id")
available_colors = get_filtered_options("colors", "color_recognized")
et_values = get_filtered_options("et_length", ignore=["etmin", "etmax"], order_by=["et_length"])
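
The actual values then presumably get collected with joinall and .get(), something like this (a sketch under that assumption):

greenlets = [available_brands, available_colors, et_values]
gevent.joinall(greenlets)
available_brands, available_colors, et_values = [g.get() for g in greenlets]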

The code is below:

import gevent
from functools import wraps

from django.core.cache import caches
from django.db import connections

def gevent_django_db(func):
    """
    Decorator to run a function in a gevent greenlet while sharing the existing
    Django DB connection (avoids creating a new one in the greenlet).
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Capture the caller's DB wrapper and cache client before spawning.
        db = connections['default']
        cache_conn = caches['default']

        def runner(*args, **kwargs):
            # Create a new wrapper instance for this greenlet that shares the
            # caller's (thread-safe) psycopg connection.
            db_wrapper = db.__class__(db.settings_dict, alias=db.alias)
            db_wrapper.connection = db.connection

            # Swap the wrapper into this greenlet's thread-local storage.
            # Reading connections['default'] here lazily creates a fresh,
            # unused wrapper, which gets restored below.
            original_conn = connections['default']
            connections['default'] = db_wrapper

            # Reuse the caller's cache client too (item assignment on
            # caches needs Django 4.0+).
            caches['default'] = cache_conn
            try:
                # Call the original function
                return func(*args, **kwargs)
            finally:
                # Always restore the original connection
                connections['default'] = original_conn

        # Spawn the greenlet and return it
        return gevent.spawn(runner, *args, **kwargs)

    return wrapper

I’m curious - do you have any numbers showing an actual benefit to doing this?

Even psycopg3 doesn’t parallelize queries on a single connection.

Quoting directly from: Concurrent operations - psycopg 3.3.0.dev2 documentation

Query execution and results retrieval on a connection is serialized: only one cursor at time will be able to run a query on the same connection (the Connection object will coordinate different cursors’ access).

and

However, if you expect to execute massively parallel operations on the database, it might be useful to use more than one connection at time, rather than many cursors on the same connection (or a mix of both).

(This basic idea is also true for psycopg2.)
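
For illustration, here is a rough sketch of what the docs suggest instead: one connection per concurrent task, e.g. via a pool. The DSN and queries are placeholders, and under gevent you would also need psycogreen’s patch_psycopg() so the driver yields while waiting:

import gevent
from psycogreen.gevent import patch_psycopg
from psycopg2.pool import ThreadedConnectionPool

patch_psycopg()  # make psycopg2 cooperative under gevent

# Placeholder DSN; maxconn bounds how many queries can truly overlap.
pool = ThreadedConnectionPool(minconn=1, maxconn=4, dsn="dbname=app user=app")

def run_query(sql):
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        pool.putconn(conn)

jobs = [gevent.spawn(run_query, "SELECT pg_sleep(1)") for _ in range(3)]
gevent.joinall(jobs)
results = [job.get() for job in jobs]  # runs in roughly 1 s, not 3 s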

As a result, I’d be very surprised if you saw any improvement in the runtime for these three “concurrent” queries. In fact, I’d expect that you’d find the actual results to be worse.

Yes, sir. There were 11 sequential queries taking 3 s to complete.
With gevent “parallelizing” them, it takes about 800 ms: roughly the time of the longest query plus a few extra milliseconds.

What is this function?

Is it a function that returns an evaluated queryset?

Yes, it is.
get_filtered_options returns a list of values, either from the cache or from list(qs).
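
In case it helps anyone reading later, here is a purely hypothetical sketch of what such a function might look like, consistent with the calls above. The model, cache key, and filtering logic are all made up, and the real implementation surely differs:

from django.core.cache import cache
from myapp.models import Product  # hypothetical model

def get_filtered_options(name, field=None, ignore=None, order_by=None):
    # Hypothetical sketch: return cached values if available, otherwise
    # evaluate the queryset eagerly so the DB work runs inside the greenlet.
    cache_key = "filtered-options:%s" % name
    values = cache.get(cache_key)
    if values is None:
        qs = Product.objects.values_list(field or name, flat=True).distinct()
        if order_by:
            qs = qs.order_by(*order_by)
        # `ignore` handling elided; list() forces evaluation here.
        values = list(qs)
        cache.set(cache_key, values)
    return values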