Using Cursor Pagination for JSON Data (e.g., S3 File Content)

Hi everyone,

I’m working on a use case where I need to paginate large files stored in S3 (e.g., CSV, TXT, Parquet, or TSV). These files are parsed into JSON and served through a DRF API.

The S3 file content is parsed into JSON sorted by a "UID" field, so every row contains a "UID", like this:

// example.csv response
"data": [
   { 
      "UID": 1, "image_path": "...", "label": "Disgust"
   },
   { 
      "UID": 2, "image_path": "...", "label": "Disgust"
   }   ...
 ]

For performance and memory efficiency, CursorPagination is ideal and works well with querysets in DRF. However, DRF’s built-in CursorPagination relies on the queryset’s order_by and doesn’t support plain in-memory JSON data (I tried and got errors). In contrast, PageNumberPagination works for both querysets and lists.

My Django project includes 10 APIs: 9 use database-backed querysets (where CursorPagination fits best), but 1 paginates parsed JSON from S3. I want a single, consistent pagination approach across all APIs.

To achieve this, I’ve implemented a custom Cursor pagination class that supports both DRF CursorPagination for querysets and a cursor-like approach for JSON data.
Here’s the code:

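import base64

from rest_framework.pagination import CursorPagination
from rest_framework.response import Response

# custom_response, DATA_SUCCESS and DATA_NONE are project-level helpers (imports not shown).

class DRFCursorPagination(CursorPagination):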
    """
    Supports both (JSON custom Cursor pagination) and (DRF default Cursor pagination).

    Encodes/decodes cursors based on UIDs for navigation.
    Provides paginated responses with next/previous cursor links.
    """

    page_size = 10
    # Ordering the records
    ordering = 'id'

    # DRF Cursor Pagination response
    def get_paginated_response(self, data: list) -> Response:
        return custom_response(
            success=True,
            message=DATA_SUCCESS if data else DATA_NONE,
            data=data,
            status=200,
            pagination={
                "next": self.get_next_link(),
                "previous": self.get_previous_link(),
            },
        )

    # Json Cursor Pagination
    def json_encode_cursor(self, uid):
        # Encodes a UID into a base64 cursor string.
        return base64.urlsafe_b64encode(str(uid).encode()).decode()

    def json_decode_cursor(self, cursor):
        # Decodes a base64 cursor string back into a UID (integer).
        return int(base64.urlsafe_b64decode(cursor).decode())

    def paginate_json(self, data: list[dict], request):
        # Applies cursor-based pagination to a json list.
        current_cursor = request.query_params.get("cursor")
        prev_cursor, next_cursor = None, None
        start_index = 0

        if current_cursor:  # If a cursor is provided, find the starting index for pagination.
            cursor_uid = self.json_decode_cursor(current_cursor)
            # The decoded UID is used directly as a list index, which is only
            # correct when UIDs are contiguous and start at 1. Clamp at 0 so a
            # negative cursor cannot slice from the end of the list.
            start_index = max(cursor_uid, 0)
            if start_index > 0:  # Generate the previous cursor, clamped at 0.
                prev_cursor = self.json_encode_cursor(max(start_index - self.page_size, 0))

        # slicing or paginating the data.
        page = data[start_index:start_index + self.page_size]

        # Set the next cursor only if more rows exist after this page.
        if page and start_index + len(page) < len(data):
            next_cursor = self.json_encode_cursor(page[-1]["UID"])

        return page, prev_cursor, next_cursor

    def get_json_cursor_paginated_response(self, data, prev_cursor, next_cursor):
        # formats and returns a paginated response
        return custom_response(
            success=True,
            message=DATA_SUCCESS if data else DATA_NONE,
            data=data,
            status=200,
            pagination={
                "next": f"?cursor={next_cursor}" if next_cursor else None,
                "previous": f"?cursor={prev_cursor}" if prev_cursor else None,
            },
        )
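
For comparison with paginate_json above, here is a minimal sketch of a variant that treats the cursor as "the last UID the client has seen" and finds the start position with a binary search, so it does not rely on UIDs being contiguous or starting at 1. It is plain Python with no DRF dependency. The names encode_cursor, decode_cursor and paginate_by_uid are hypothetical, and the sketch assumes the rows are sorted ascending by unique integer UIDs.

```python
import base64
from bisect import bisect_right


def encode_cursor(uid: int) -> str:
    """Encode a UID as a URL-safe base64 cursor string."""
    return base64.urlsafe_b64encode(str(uid).encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Decode a cursor back into a UID; raise ValueError if it is malformed."""
    try:
        # Bad padding, non-UTF-8 bytes and non-numeric text all raise
        # subclasses of ValueError, so one except clause covers them.
        return int(base64.urlsafe_b64decode(cursor).decode())
    except ValueError:
        raise ValueError("invalid cursor") from None


def paginate_by_uid(rows, cursor=None, page_size=10):
    """Return (page, prev_cursor, next_cursor) for rows sorted by "UID".

    Assumption: UIDs are unique integers in ascending order.
    """
    uids = [row["UID"] for row in rows]
    # First row whose UID is strictly greater than the cursor's UID.
    start = bisect_right(uids, decode_cursor(cursor)) if cursor else 0
    page = rows[start:start + page_size]
    end = start + len(page)

    # Only advertise a next page when rows remain after this one.
    next_cursor = encode_cursor(page[-1]["UID"]) if end < len(rows) else None

    prev_cursor = None
    if start > 0:
        # The previous page starts page_size rows earlier (clamped at 0).
        # A cursor of "first UID on that page minus one" resolves back to it.
        prev_start = max(start - page_size, 0)
        prev_cursor = encode_cursor(uids[prev_start] - 1)

    return page, prev_cursor, next_cursor
```

Building the uids list is O(n) per request. On Python 3.10+ bisect_right accepts a key argument, which would avoid the copy. In a DRF view, a ValueError from decode_cursor could be turned into a 400-style error rather than a 500.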

Note: I’m aware that AWS S3 Select or querying at the S3 level could be used to filter or limit data more efficiently. However, my goal here is to understand whether this can be handled purely within Django REST Framework’s pagination system, so that all endpoints follow the same approach.

I’m looking for advice or feedback on the approach:

  • Has anyone implemented cursor-like pagination for plain JSON lists before?

  • Any improvements you’d suggest? Would it be appropriate to integrate this pattern more tightly with DRF, or is there a better practice?

Thanks in advance,
Raza