After this code, the memory usage increases:
models.prefetch_related_objects(chunk, *prefetch_fields)
Django does not seem to release the cache. How can I release the cache explicitly?
Hi, please show your models, templates, and views so we can understand your problem.
Until then, try something like this with cache.clear():
from django.core.cache import cache
# Your code
models.prefetch_related_objects(chunk, *prefetch_fields)
# Clear the cache explicitly
cache.clear()
I tried cache.clear(), but that didn't work.
The purpose of my code is to export a large amount of data from the database (PostgreSQL). The problem is that memory usage grows with the size of the data being exported, and even after the export completes, the memory is not released. Each exported row is large and contains large arrays.
I tried to release the cache for the model, but neither of these worked:
cache.clear()
models._prefetched_objects_cache = {}
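(From what I can tell, Django keeps prefetch results in a _prefetched_objects_cache dict on each model instance, not anywhere on the models module, so a per-instance reset would presumably have to look more like the sketch below. Here chunk is the list of instances from the code further down, and this is an untested sketch rather than a confirmed fix.)
# Untested sketch: reset the per-instance prefetch cache for each object in a chunk.
for obj in chunk:
    if hasattr(obj, "_prefetched_objects_cache"):
        obj._prefetched_objects_cache = {}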
Code snippets:
from django.db import models
...
dump(export.zip_handle, export.filename, chunk_count, _iter_data(
    progress=progress,
    counts_offset=counts_offset,
    counts_total=counts_total,
    qs=export.qs,
))
...
def _iter_data(
    *,
    progress: models.TaskProgress,
    counts_offset: int,
    counts_total: int,
    qs: models.QuerySet,
) -> Iterable[list[models.Model]]:
    """
    Iterates over the given queryset in efficient chunks.
    """
    paginator = Paginator(qs, settings.SAFE_PAGE_SIZE)
    # Fetch chunked.
    n = 0
    for page_number in paginator.page_range:
        page = paginator.page(page_number)
        yield page.object_list
        # Report progress.
        n += len(page.object_list)
        progress.set(int((counts_offset + n) / counts_total * 100))
def dump(self, zip_handle: AESZipFile, filename: str, chunk_count: int,
         chunks: Iterable[Sequence[models.T]]) -> None:
    def do_dump() -> Iterable[Iterable[dict[str, Any]]]:
        serializer = self.get_serializer()
        prefetch_fields = serializer.get_prefetch_fields(key_only=True)
        for chunk in chunks:
            models.prefetch_related_objects(chunk, *prefetch_fields)
            yield map(serializer.to_representation, chunk)
            # Tried to release the cache for the model, but failed
            models._prefetched_objects_cache = {}
    self.encode(zip_handle, filename, chunk_count, do_dump())
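A related observation (not a confirmed cause): after the yield, do_dump still holds the previous chunk in its local name until the next page has been fetched from chunks, so two pages plus their prefetch caches can briefly be alive at the same time. Dropping the reference at the end of the loop body would avoid that; roughly, as a sketch:
for chunk in chunks:
    models.prefetch_related_objects(chunk, *prefetch_fields)
    yield map(serializer.to_representation, chunk)
    # Drop the reference before the next page is fetched, so the previous
    # page and its prefetch caches can be garbage collected first.
    del chunk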
def encode(self, zip_handle: AESZipFile, filename: str, chunk_count: int,
           chunks: Iterable[Iterable[dict[str, Any]]]) -> None:
    path = pathlib.Path(filename)
    index_digits = len(str(chunk_count))
    csv_fields = self._csv_fields
    dumps = json.dumps
    index = 1
    for chunk in chunks:
        chunk_filename = filename if index == 1 else f"{path.stem}_{index:0{index_digits}}{path.suffix}"
        with zip_handle.open(chunk_filename, "w", force_zip64=True) as handle:
            with storage.text_wrapper(handle, newline="") as text_handle:
                writer = csv.writer(text_handle)
                writer.writerow(item[0] for item in csv_fields)
                writer.writerows(
                    (row[field_name] if is_raw else dumps(row[field_name]) for field_name, is_raw in csv_fields)
                    for row in chunk
                )
        index += 1
        del chunk
    del chunks
Where are your models?
Perhaps you could use a smaller chunk size to limit the number of objects held in memory at any one time.
Something like this:
SAFE_PAGE_SIZE = 100  # Play with this value

def _iter_data(
    *,
    progress: models.TaskProgress,
    counts_offset: int,
    counts_total: int,
    qs: models.QuerySet,
) -> Iterable[list[models.Model]]:
    paginator = Paginator(qs, SAFE_PAGE_SIZE)
    n = 0
    for page_number in paginator.page_range:
        page = paginator.page(page_number)
        yield page.object_list
        n += len(page.object_list)
        progress.set(int((counts_offset + n) / counts_total * 100))
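If the goal is to cap how many objects are alive at any one time, another option in the same spirit is to stream rows from the database cursor and batch them manually instead of paginating. This is an untested sketch (the _iter_data_streaming name is just for illustration, and it assumes your Django version and connection allow QuerySet.iterator() here):
from itertools import islice

def _iter_data_streaming(*, qs: models.QuerySet,
                         batch_size: int = SAFE_PAGE_SIZE) -> Iterable[list[models.Model]]:
    # iterator() streams rows from the cursor instead of caching a whole
    # page of results on the queryset.
    rows = qs.iterator(chunk_size=batch_size)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        yield batch
Progress reporting would need to be added back, and the manual prefetch_related_objects call should still work on each yielded batch, since it accepts a plain list of instances.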