I’m building a CapCut-related resource website using Django, where users can browse tutorials, upload templates, and access video editing tips. The backend handles user authentication, file uploads, and API integrations to fetch CapCut project data. However, I’m encountering several technical issues:
File Upload Issues: Users can upload CapCut templates (large .json or .zip files), but I’m intermittently getting Request Data Too Large errors. I’ve already adjusted DATA_UPLOAD_MAX_MEMORY_SIZE and FILE_UPLOAD_MAX_MEMORY_SIZE, but the issue persists for larger files.
Asynchronous API Calls: The site makes frequent calls to external APIs to fetch CapCut project data, but the requests block the main thread, slowing down the site. I’ve tried using Django Channels, but the implementation feels overly complex. Is there a simpler way to handle asynchronous API requests?
Dynamic Content Loading: I want to display CapCut project previews using AJAX, but the CSRF token validation fails for some users, even though the token is correctly included in the headers.
Database Optimization: The site’s database (PostgreSQL) stores CapCut template metadata, but complex queries (e.g., filtering by category or popularity) are causing significant performance drops. I’ve indexed the necessary fields, but the issue remains during peak traffic.
Here’s my tech stack:
Django 4.2
PostgreSQL for the database
Hosted on AWS EC2 with Nginx and Gunicorn
Steps I’ve Taken:
Increased server memory and fine-tuned database connections in settings.
Added caching with Redis, which helped slightly but didn’t fully solve the issue.
Verified CSRF middleware configurations, but the intermittent validation failure remains unresolved.
I’d greatly appreciate any advice or best practices for handling these issues in Django, especially for file uploads and external API integrations.
What are you considering “larger” files? Transferring GBs of data through HTTP POSTs is typically a very bad idea. You’re a lot better off writing some JavaScript to send the data in chunks and managing the transmission yourself. (Channels can be very useful for this as this avoids all the overhead of initiating requests for each chunk - but it’s not necessary.)
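Server-side, the receiving end of a chunked upload doesn’t need anything exotic; here’s a rough sketch of one way it could look (the URL scheme, upload_id handling, and storage layout are all illustrative, not a drop-in implementation):

    import os

    from django.conf import settings
    from django.http import JsonResponse
    from django.views.decorators.http import require_POST


    @require_POST
    def upload_chunk(request, upload_id, chunk_index):
        # One chunk of the file arrives in each request body; upload_id and
        # chunk_index would come from the URLconf and must be validated
        # (e.g. restrict upload_id to a known pattern to avoid path tricks).
        # Each chunk should stay well below the request-size limits.
        target_dir = os.path.join(settings.MEDIA_ROOT, "uploads", str(upload_id))
        os.makedirs(target_dir, exist_ok=True)
        chunk_path = os.path.join(target_dir, f"{int(chunk_index):06d}.part")
        with open(chunk_path, "wb") as f:
            f.write(request.body)
        # Once the client reports the final chunk, the parts can be
        # concatenated and validated before creating the model instance.
        return JsonResponse({"received": int(chunk_index)})

The client-side JavaScript would slice the File object, POST each piece to an endpoint like this, and then signal completion so the parts can be assembled.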
It all depends upon what needs to be done with the data being fetched. Are you just updating models? Are you including it as data within a template being rendered?
In addition to Channels, Django does now support asynchronous views. As a third option, you could also use Celery. (My choice would depend upon the answers to the questions above.)
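Just as a point of reference, a native async view in 4.2 looks roughly like this (using httpx as the HTTP client is my assumption; the URL and response fields are placeholders):

    import httpx

    from django.http import JsonResponse


    async def project_preview(request, project_id):
        # Runs as a native async view (Django 4.2); the worker can yield while
        # waiting on the network, provided the site is served via ASGI.
        async with httpx.AsyncClient(timeout=10) as client:
            resp = await client.get(f"https://api.example.com/projects/{project_id}")
            resp.raise_for_status()
            data = resp.json()
        return JsonResponse({"name": data.get("name"), "status": data.get("status")})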
If I remember correctly, there are at least four different CSRF-related error messages that can be issued; each one is worded slightly differently and indicates the specific cause of the problem. You want to identify the specific error being received and address it appropriately.
This is also where the specifics of the code really matter - especially the client-side JavaScript. For diagnosing this, depending upon which error is being received, I’d look at the requests themselves being issued to ensure that both the token and the cookie are present and correct.
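On the server side, one way to capture the exact rejection reason is to point the CSRF_FAILURE_VIEW setting at a small view that logs it; a rough sketch (the module path and logger name are yours to choose):

    # e.g. CSRF_FAILURE_VIEW = "yourproject.views.csrf_failure"  (path is a placeholder)
    import logging

    from django.http import HttpResponseForbidden

    logger = logging.getLogger("csrf_debug")


    def csrf_failure(request, reason=""):
        # "reason" is the exact message Django attaches to the rejection
        # (missing cookie, token mismatch, referer problems, etc.).
        logger.warning(
            "CSRF failure: %s | path=%s | csrf cookie present=%s",
            reason,
            request.path,
            "csrftoken" in request.COOKIES,
        )
        return HttpResponseForbidden("CSRF verification failed.")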
Again, the specifics matter here. What are you considering “complex” queries? What are you considering a “significant performance drop”? Have you used explain and explain analyze on those queries to identify what they are doing and where the issue may be?
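If it’s easier than going through psql, the ORM can produce the plan directly; a quick sketch (the model and field names are placeholders):

    # In `python manage.py shell`
    from myapp.models import SomeModel

    qs = SomeModel.objects.filter(some_field="value").order_by("-some_other_field")[:20]
    # On PostgreSQL, extra keyword arguments are passed through to EXPLAIN,
    # so this prints the plan with actual row counts and timings.
    print(qs.explain(analyze=True))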
You’ve mentioned you increased the server memory, but did you also adjust the database memory allocation? What size server are you talking about? How large is the database? Are there any other applications running on that server? Have you put an upper limit on Redis’ memory allocation?
Thanks for your detailed response and suggestions! I’ll address your points one by one:
File Uploads:
Larger files: I consider “larger” files to be in the 50MB-100MB range. Ideally, I want to support uploads up to 250MB. Chunking the uploads with JavaScript sounds like a viable approach, especially considering Channels can help manage the chunked transfer. I’ll explore this option further.
Asynchronous API Calls:
Data usage: The fetched CapCut project data is used to populate the Django models and potentially for template rendering as well.
Asynchronous options: Thanks for mentioning Django’s built-in asynchronous views as an alternative to Channels. I’ll definitely research both Channels and asynchronous views to see which best suits my needs. Celery is also an option I’ll keep in mind for more complex background tasks.
CSRF Token Validation:
Error identification: I’ll investigate the specific CSRF error messages being encountered by users. Debugging the client-side JavaScript for CSRF token inclusion sounds like a good next step.
Database Optimization:
Complex queries: By “complex” queries, I mean those involving filtering by multiple categories, sorting by various criteria (e.g., popularity, recent uploads), and potentially combining those filters. A “significant performance drop” translates to response times exceeding 5 seconds during peak traffic.
Optimization techniques: I’ve used EXPLAIN and EXPLAIN ANALYZE to identify bottlenecks in the queries. Based on those findings, I’ve created indexes on relevant fields, but the performance improvements haven’t been substantial.
Server and database details:
I’ve increased the server’s memory allocation to 8GB.
Database memory allocation is currently set to 4GB. I can definitely adjust this based on your recommendations.
The server is an AWS EC2 t3.xlarge instance.
The database size is currently around 10GB.
No other applications are running on this server.
Redis memory allocation is capped at 2GB.
I hope this additional information helps narrow down the potential causes of these issues. I’m open to any further suggestions or best practices you might have for optimizing file uploads, asynchronous API calls, CSRF token handling, and database queries in Django.
For us to be able to provide further and more detailed suggestions, we’d need to see the models, queries, and the output from an explain analyze from a sample query showing a 5-second response time.
It might also be worthwhile to use Django Debug Toolbar to definitively identify where the 5-second times are coming from. It’s always possible that the slowdown is caused by multiple factors and that there’s no single issue to resolve.
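If it helps, the toolbar only needs a few lines of wiring (a sketch of the usual setup, for your development or staging settings only, assuming the package is installed):

    # settings.py (development only); adjust to your settings layout
    INSTALLED_APPS += ["debug_toolbar"]
    MIDDLEWARE = ["debug_toolbar.middleware.DebugToolbarMiddleware"] + MIDDLEWARE
    INTERNAL_IPS = ["127.0.0.1"]

    # urls.py
    from django.urls import include, path

    urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]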
Thank you for the quick response! I appreciate the suggestion to use Django Debug Toolbar to pinpoint the performance bottlenecks. I’ll integrate that and analyze where the delays are coming from.
In the meantime, here’s more context and the details you requested:
Models and Queries
Here’s an example of the model for storing CapCut template metadata:
from django.db import models


class Template(models.Model):
    # Metadata for an uploaded CapCut template
    name = models.CharField(max_length=255)
    category = models.CharField(max_length=100)
    popularity = models.IntegerField(default=0)
    uploaded_at = models.DateTimeField(auto_now_add=True)
    file = models.FileField(upload_to="templates/")
    description = models.TextField(blank=True)

    class Meta:
        indexes = [
            models.Index(fields=["category"]),
            models.Index(fields=["popularity"]),
        ]
An example query causing issues during peak traffic:

    Template.objects.filter(category="Video Editing").order_by("-popularity")[:20]
During peak traffic, the query execution time spikes to around 5-6 seconds.
Steps Taken
Indexes: I’ve added indexes on category and popularity (see Meta section above).
Caching: Frequently accessed query results are cached using Redis for 1 minute (a simplified sketch of the pattern follows this list), but performance issues still arise with less commonly accessed categories or when the Redis cache is bypassed.
Database Connections: I’ve increased the CONN_MAX_AGE to 600 and added connection pooling with pgbouncer, which helped a bit but didn’t eliminate the delays.
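For reference, the per-category caching is essentially this pattern (simplified; the key format and the helper function are illustrative, not the exact production code):

    from django.core.cache import cache

    from .models import Template  # wherever the model lives


    def popular_templates(category):
        # Cache the top-20 list per category for 60 seconds.
        key = f"templates:popular:{category}"
        return cache.get_or_set(
            key,
            lambda: list(
                Template.objects.filter(category=category).order_by("-popularity")[:20]
            ),
            timeout=60,
        )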
Other Issues (Context for Debugging)
CSRF Token Issue: For AJAX requests, I include the CSRF token in the request headers. The issue seems to occur more frequently during high traffic.
Asynchronous API Calls: For fetching CapCut project data, I tried offloading to Django Channels but found the setup overly complex. Would integrating something like Celery with Redis be more appropriate for this use case?
First, a personal request: Please stop quoting your entire message in your replies. It’s making this thread more difficult for me to read.
What you quoted is the output of an explain, but not an explain analyze. An explain analyze includes the actual time required for the query to execute:
polls=# explain analyze select * from polls_question order by "id" desc;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18.00..18.42 rows=170 width=434) (actual time=0.022..0.023 rows=1 loops=1)
   Sort Key: id DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Seq Scan on polls_question  (cost=0.00..11.70 rows=170 width=434) (actual time=0.011..0.012 rows=1 loops=1)
 Planning Time: 0.089 ms
 Execution Time: 0.036 ms
(6 rows)
Regardless, just looking at the cost metric of your query, the query itself should be very fast.
Whatever is taking 5 seconds, it’s not the query, unless there’s something very wrong with your database.
That’s just the Django side of things. You’ll also need to ensure that whatever webserver / proxy server(s) you are using allow POST bodies of that size (with Nginx, that’s the client_max_body_size directive, which defaults to 1 MB).
No, because you don’t have any kind of direct connection between Celery and the browser.
The purpose of using Channels here would be to optimize the upload of the files by reducing the overhead of issuing a separate HTTP request for each piece when “chunking” the data instead of trying to upload it all at once. But you don’t need to use Channels for that.