Best server for production

Hi guys, I am having an internal debate on what server to deploy my Django app on. Originally, I was going for Heroku, but the app requires tenants to upload files of somewhat significant size. In that case, Heroku does not seem like the first choice. Would someone have a recommendation on what to use for this specific case?
Thanks!

While I don’t have a specific recommendation regarding a server, I will point out that where Django stores uploaded files is a separate setting that you can customize. It’s probably not trivial to store files on a separate server, but it appears as if it could be done.

See Upload Handlers and Managing Files for some ideas to get started.
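For instance, a minimal sketch of the relevant pieces (the paths, model name, and field names here are just placeholders, not anything specific to your project):

```python
# settings.py -- where uploaded files land is a Django setting, independent of the host
MEDIA_ROOT = "/srv/myapp/media"   # hypothetical path; could be a mounted volume on another disk/server
MEDIA_URL = "/media/"

# models.py -- illustrative model; upload_to is resolved relative to MEDIA_ROOT
from django.db import models


class TenantUpload(models.Model):
    file = models.FileField(upload_to="tenant_uploads/%Y/%m/")
    uploaded_at = models.DateTimeField(auto_now_add=True)
```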

Ken

If I were in your shoes, I would probably consider AWS S3 for the uploaded files, coupled with django-storages. You could create an S3 bucket to store the uploads, and django-storages would handle a lot transparently. In this mode, you’d get the deployment ease of Heroku with some capabilities from AWS.
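As a rough sketch of what that setup tends to look like (the bucket and region are placeholders, and on newer Django versions the `STORAGES` setting replaces `DEFAULT_FILE_STORAGE`, so check the django-storages docs for your versions):

```python
# settings.py -- hand default file storage to S3 via django-storages + boto3
# (add "storages" to INSTALLED_APPS and install django-storages[boto3])

DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-app-uploads"   # placeholder bucket name
AWS_S3_REGION_NAME = "us-east-1"             # placeholder region
# Credentials are usually picked up from the environment or an IAM role,
# so there's no need to hard-code access keys in settings.
```

With that in place, every `FileField` upload goes to the bucket instead of the dyno's ephemeral filesystem.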

Note that this is just a suggestion, and I don’t know what your budget permits, so I would recommend running the numbers on storage and transfer costs between Heroku and AWS before going down this road.

Thank you for your response. I should have been more precise: the upload automatically replaces the existing data, therefore the schemas can never get too big. The way I got the app to work is a little bit contradictory to some common practices I have seen, meaning that the data gets uploaded, but before it populates the schemas, it goes through an ETL that changes the data considerably. In this regard, I know that Heroku does not support large uploads, so I was wondering if you knew of any specific server that would fit this model. So far, I am leaning toward AWS EC2, but it would be nice to have confirmation, you know! Anyways, thank you for your help!

Hey, thank you for your response, Matt. I appreciate your YouTube videos, they have been super instructive for my needs; it feels like I am writing to my professor! As I mentioned in my other response, my apologies for not giving enough details about my situation to expect a specific answer! I thought about S3 buckets, but I needed a huge Python ETL to transform my data, and since I am not super experienced, it was simpler to handle the transformation before the data populates the database. (I know this is not common practice, and it probably makes some people cringe,) but I got it to work well, and I am now concerned about what server would work well with this kind of model. In other words, a user uploads a fairly big CSV file that gets transformed and then replaces the existing data in that schema. In that regard, would you recommend any server? Thanks again!


Part of the question is, what do you consider huge? Just what size are these upload files you’re trying to process?

The reason I’m asking is that if your files are too big, you may have other issues associated with the upload to deal with. (e.g. file upload size limits and timeout issues.)

Four files total. Two of them are insignificant, one will be between 500 and 20,000 rows, and the last one I imagine can go up to 1 million rows.

Ok, when you’re considering things like this, that’s not ‘huge’. I might consider the last one ‘large’, depending upon the size of each row.

Generally speaking, if I can comfortably fit all the data in memory, I don’t sweat too much about it. If each row is 100 bytes and you have 1 million rows, that’s 100 MB for that file. Yes, you might have to make some settings changes in your web server to allow someone to upload a file of that size, and there might even need to be a Django setting to adjust as well. But depending upon a couple different factors, I’d be tempted not to write that out at all. Accept the data as being uploaded and then process it in memory before writing it out to your database. (Or, if you’re concerned about the process being interrupted and needing to restart, I’d also think about writing it out with each line of the file being a row in a table for later processing - a lot of it depends upon exactly what you need to do with this data and just how intricate the transformations are.)
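To make that concrete, here's a rough sketch of the "accept the upload and process it in memory" idea. The form field name, the `Record` model, the column names, and the `transform()` step are all placeholders standing in for your real ETL, not anything from your project:

```python
# settings.py -- files larger than this threshold get spooled to a temp file instead of RAM;
# the web server in front (nginx, etc.) has its own request-body limit to raise separately
FILE_UPLOAD_MAX_MEMORY_SIZE = 200 * 1024 * 1024

# views.py -- read the uploaded CSV straight from the request, transform, then bulk insert
import csv
import io

from django.http import HttpResponse

from myapp.models import Record   # hypothetical model


def transform(row):
    # placeholder for the real ETL step: reshape one CSV row into model field values
    return {"name": row["name"], "value": int(row["value"])}


def upload_csv(request):
    uploaded = request.FILES["data_file"]   # form field name is illustrative
    reader = csv.DictReader(io.TextIOWrapper(uploaded.file, encoding="utf-8"))

    rows = [Record(**transform(row)) for row in reader]
    Record.objects.all().delete()                      # "the upload replaces the existing data"
    Record.objects.bulk_create(rows, batch_size=5000)

    return HttpResponse(f"Loaded {len(rows)} rows")
```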

Sorry, I am a noob :sweat_smile: each file has between 3 and 7 columns. So you think it’s no problem? Once the files are done with the ETL and stored in the schemas of the client, there is pretty much zero calculation happening; the goal is to offer automated data visualizations from the raw data input. So you think I am good with AWS EC2?

Certainly as a starting point. Some of these types of decisions may come down to being budget, billing, and funding decisions as well. (And there are what, about a dozen different EC2 options for server configurations? You will want to make the right choice there.)
Most any server environment I’m aware of is going to be able to handle this from a capacity perspective. So I guess my point is that that shouldn’t be driving your decision. Once you’ve done your development and testing, you’ll have a better idea of exactly how well this will run in your testing environment, and then you can make a final decision for deployment.
(But I would encourage you to have as nearly a full-scale test run before making that decision. There’s always the possibility that you may encounter an issue that can be better addressed before deployment rather than after.)

Thank you for your help, that goes a long way! I don’t want to abuse your time, but I have one question… I have done a lot of testing and the app functions well on localhost with various upload sizes, multiple schemas, all the login stuff and all. What do you mean by “I would encourage you to have as nearly a full-scale test run”?

You mentioned that you have one file that may have 1 million rows. Have you tested your app with a 1-million row upload? That’s what I mean by full scale. Sometimes, things might work well when you’re talking in the small to medium size range that end up not working well at full scale. So if you haven’t tested with the largest file that you expect to receive, I’d suggest you do so, to have a feel for how long it’s going to run and what that might end up doing to the rest of the environment while it’s running.
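If you don't have a real file of that size handy, it's easy enough to fabricate one just to exercise the upload path (the column names here are made up; match them to your actual schema):

```python
# make_test_csv.py -- write a synthetic 1-million-row CSV for a full-scale upload test
import csv
import random

with open("full_scale_test.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "category", "value"])   # made-up columns
    for i in range(1_000_000):
        writer.writerow([i, f"cat-{i % 50}", random.random()])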

For example, since my development environment is generally larger than most of the servers I deploy to, I create local virtual machines using either VirtualBox or VMWare that are typically of the scale of those servers. (If you’re working with docker images, I know you can do it there as well - I just haven’t used it that way.) I’ll then run my full application within that environment to try to get a feel as to how well it’s going to perform in the real environment. No, it’s not perfect, and there are sometimes some real surprises, but it does tend to work out well enough for a first shot.
We also try to have at least one test environment that is a true replica of our production environment, which we call our “staging” server; that’s our final test platform before going live.
In the case of an AWS ec2 server, it’s easy enough to spin one up of the size you’re expecting to use just to see how it’s going to perform.
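For what it's worth, a throwaway instance sized like the planned production box is only a few lines with boto3 (the AMI, region, and key pair names below are placeholders):

```python
# spin_up_test_box.py -- launch a short-lived EC2 instance for a full-scale test run
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder AMI ID (e.g. an Ubuntu image in your region)
    InstanceType="t3.medium",          # pick the size you expect to deploy on
    KeyName="my-keypair",              # placeholder key pair name
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```

Just remember to terminate it when the test is done, or it keeps billing.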

Thank you so much! I have run tests with uploads of up to 5 million rows, and it still works super well; it just takes about 1.5 minutes to load up. I’ll set the app up in a virtual machine as a final test. Thank you for all your help!

Something you might have to watch out for is how long your platform of choice will go before a request will time out. It sounds like you’re planning to do all the processing during the view request. In the worst case scenario that you presented, that’s 90 seconds. I think that Heroku times out at 30 seconds.

If you want to run in this way and don’t want to use a more complicated background worker setup (and I wouldn’t blame you, since that brings in a lot of extra complexity), then that might dictate your choice of operating environment. You would probably need a destination where you can control the timeouts of your web server, and that starts to push you towards virtual machines like EC2 where you have full control.
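For example, on a box you control and with gunicorn as the app server, the worker timeout is a one-line setting in its Python config file (the numbers here are illustrative, and nginx or any proxy in front has its own limits to raise as well):

```python
# gunicorn.conf.py -- give long-running upload/ETL requests room to finish
bind = "0.0.0.0:8000"
workers = 3
timeout = 300   # seconds; comfortably above the ~90 s worst case mentioned earlier
```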

Considering virtual machines, I’ve had good luck with Digital Ocean. In my experience, a DO droplet is quick to set up. The downside of this approach is that you’re suddenly maintaining all your own infrastructure. That’s quite an extra chunk of work to handle.

Thank you very much for your advice; I think EC2 seems to be a good option.