API Optimization

Hello developers,

I am developing a threat intelligence project and I am running into some performance problems. I am sharing the relevant code below and would appreciate your optimization suggestions. I am currently using pagination.

views.py

from django_filters.rest_framework import DjangoFilterBackend
from rest_framework import viewsets

from .filters import DataLeakFilter, PasswordEntryFilter
from .models import DataLeak, PasswordEntry
from .serializers import DataLeakSerializer, PasswordEntrySerializer

class PasswordEntryViewSet(viewsets.ModelViewSet):
    """
    ViewSet for managing PasswordEntry records.
    """
    queryset = PasswordEntry.objects.all().order_by('-id')
    serializer_class = PasswordEntrySerializer
    filter_backends = [DjangoFilterBackend]
    filterset_class = PasswordEntryFilter

class DataLeakViewSet(viewsets.ModelViewSet):
    """
    ViewSet for managing DataLeak records.
    """
    queryset = DataLeak.objects.all().order_by('-id')
    serializer_class = DataLeakSerializer
    filter_backends = [DjangoFilterBackend]
    filterset_class = DataLeakFilter
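
For reference, here is roughly what switching these viewsets to cursor (keyset) pagination could look like; a sketch assuming DRF's built-in CursorPagination, with a placeholder class name and page size. On very large tables, page-number/offset pagination gets slower the deeper the page, because the database still has to walk and discard all of the skipped rows.

from rest_framework.pagination import CursorPagination

class LargeTableCursorPagination(CursorPagination):
    # Pages via a WHERE clause on the ordering column instead of OFFSET,
    # so deep pages stay roughly as cheap as the first one.
    page_size = 100
    ordering = '-id'  # should be a unique, indexed column

class PasswordEntryViewSet(viewsets.ModelViewSet):
    queryset = PasswordEntry.objects.all().order_by('-id')
    serializer_class = PasswordEntrySerializer
    filter_backends = [DjangoFilterBackend]
    filterset_class = PasswordEntryFilter
    pagination_class = LargeTableCursorPagination

The trade-off is that clients get next/previous cursors rather than arbitrary page numbers, which is usually acceptable for feeds like this.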

serializers.py

from rest_framework import serializers

from .models import DataLeak, PasswordEntry

class PasswordEntrySerializer(serializers.ModelSerializer):
    victim_comment = serializers.CharField(
        source='victim.comment', read_only=True
    )

    class Meta:
        model = PasswordEntry
        fields = '__all__'


class DataLeakSerializer(serializers.ModelSerializer):
    class Meta:
        model = DataLeak
        fields = '__all__'
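
One thing the serializer above implies: PasswordEntrySerializer reads victim.comment, so serializing a page of PasswordEntry rows triggers one extra query per row unless the related Victim is joined up front. A small sketch of eager-loading it in the viewset (same viewset as in views.py above):

class PasswordEntryViewSet(viewsets.ModelViewSet):
    serializer_class = PasswordEntrySerializer
    filter_backends = [DjangoFilterBackend]
    filterset_class = PasswordEntryFilter
    # select_related('victim') pulls the Victim row in the same query,
    # so 'victim.comment' does not hit the database once per entry.
    queryset = (
        PasswordEntry.objects
        .select_related('victim')
        .order_by('-id')
    )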

filters.py


import django_filters

from .models import DataLeak, PasswordEntry

class PasswordEntryFilter(django_filters.FilterSet):
    url = django_filters.CharFilter(field_name='url', lookup_expr='icontains')
    username = django_filters.CharFilter(field_name='username', lookup_expr='icontains')

    class Meta:
        model = PasswordEntry
        fields = ['url', 'username']

class DataLeakFilter(django_filters.FilterSet):
    # The model field is 'email'; 'mail' remains the public query parameter.
    mail = django_filters.CharFilter(field_name='email', lookup_expr='icontains')
    domain = django_filters.CharFilter(field_name='domain', lookup_expr='icontains')

    class Meta:
        model = DataLeak
        fields = ['mail', 'domain']
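
A caveat on these filters: icontains becomes LIKE '%term%', which the plain B-tree indexes in models.py cannot serve, so every filtered request scans the table. If the project runs on PostgreSQL, one common option is a trigram GIN index; a sketch as a migration, assuming PostgreSQL with the pg_trgm extension available (the app label and dependency below are placeholders):

from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.operations import TrigramExtension
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('intel', '0001_initial'),  # placeholder app label / migration name
    ]

    operations = [
        # Enables pg_trgm (requires sufficient database privileges).
        TrigramExtension(),
        # GIN trigram index so icontains on username can use an index.
        migrations.AddIndex(
            model_name='passwordentry',
            index=GinIndex(
                fields=['username'],
                name='passwordentry_username_trgm',
                opclasses=['gin_trgm_ops'],
            ),
        ),
    ]

The same idea applies to the url, email and domain columns if those filters are hit often.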

models.py

import uuid

from django.db import models

class Victim(models.Model):
    victim_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    comment = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.comment

class PasswordEntry(models.Model):
    url = models.URLField()
    username = models.CharField(max_length=255)  # indexed via Meta.indexes below
    password = models.CharField(max_length=100)
    file_name = models.CharField(max_length=255, null=True, blank=True)  # covered by the ('file_name', 'uploaded_at') index below
    uploaded_at = models.DateTimeField(null=True, blank=True)
    
    victim = models.ForeignKey('Victim', on_delete=models.CASCADE, related_name='combined_entries', null=True, blank=True)
    application = models.CharField(max_length=255, null=True, blank=True)

    class Meta:
        indexes = [
            models.Index(fields=['username']),
            models.Index(fields=['file_name', 'uploaded_at']),
        ]

    def __str__(self):
        return f'{self.username} - {self.url}'

class DataLeak(models.Model):
    email = models.CharField(max_length=255)
    domain = models.CharField(max_length=255)
    password = models.CharField(max_length=255)
    file_name = models.CharField(max_length=255, blank=True)
    uploaded_at = models.DateField(null=True, blank=True)  # blank without null would break saves with an empty date

    class Meta:
        unique_together = ('email', 'domain', 'password')
        indexes = [
            # unique_together above already creates a composite unique index
            # led by 'email', so a separate 'email' index is redundant here.
            models.Index(fields=['domain']),
            models.Index(fields=['uploaded_at']),
        ]

    def __str__(self):
        return f'{self.email}@{self.domain}'

What are you optimizing? What is the target? What is the current state?

I want to optimize query performance. I will be working with very large data sets, on the order of 1 billion rows.

If you’re really going to work with tables containing more than 1,000,000,000 rows, then your biggest constraints are going to come from the database itself. You’ll at least need to be ready to partition these tables, and possibly shard them across multiple servers.
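
For illustration only, declarative range partitioning of, say, the DataLeak table might look roughly like the raw SQL below, wrapped in a Django migration (PostgreSQL syntax; the table names, app label and date ranges are placeholders, and on a partitioned table every unique constraint, including the primary key, must include the partition key). This is a sketch of the concept, not something to copy as-is:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('intel', '0002_previous_migration'),  # placeholder
    ]

    operations = [
        migrations.RunSQL(
            sql="""
                -- Parent table partitioned by upload date.
                CREATE TABLE intel_dataleak_partitioned (
                    id bigserial,
                    email varchar(255) NOT NULL,
                    domain varchar(255) NOT NULL,
                    password varchar(255) NOT NULL,
                    file_name varchar(255) NOT NULL,
                    uploaded_at date NOT NULL,
                    PRIMARY KEY (id, uploaded_at)
                ) PARTITION BY RANGE (uploaded_at);

                -- One partition per year (or per month, quarter, ...).
                CREATE TABLE intel_dataleak_2024
                    PARTITION OF intel_dataleak_partitioned
                    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
            """,
            reverse_sql="DROP TABLE intel_dataleak_partitioned;",
        ),
    ]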

Optimizing a database at that scale is not a “cookbook type” solution. It’s not a case of “do this” then “do that” and everything’s fine.

You’ve also got ancillary issues around backups and restores (Disaster Recovery and Business Continuity) that are complicated by data of this size. Even planning a migration raises a bunch of issues.

My suggestion would be to engage the services of a database consultant team, one that knows what they’re doing at that scale.

Thank you. I am working with competent people on the database side. I wanted to ask whether we need any revision in the code itself in terms of performance.