Hello, folk.
I know that a large codebase/monolith architecture is not good nowadays, especially in the Python world where the language is dynamic.
Anyway, we must support old big projects.
My question is how are you dealing with such a problem?
What problem? You’ve only described a meme. People pushing microservices, serverless, etc, etc, etc, are always extra-keen for developers to forget all the lessons they’ve learned over the last 15 years and buy into their auto-scaling nonsense without thinking… But thinking that <insert tech> is automatically bad for <generic problem> isn’t great either. Assess your problem.
Anyway. I develop a Django booking system that is directly a customer-facing set of websites (over multiple domains), physical operations management, accounting software, REST access for third party software, email ingress for booking systems stuck in the 1980s, physical operation control (eg cameras and barriers) and… Well. You get the point. Apart from a splash of aiohttp to manage inbound notifications from cameras, it’s all Django.
And again, this is absolutely to do with our problem. Booking systems are natural monoliths.
But yes, we’re doing more stuff over API than we were 10 years ago. More stuff in frontend frameworks (esp Vue currently but it’s JS so it’ll probably be completely different in 2y time). SSGs to manage static content.
But I don’t foresee Django going away. Having a single central logical data controller makes sense for us.
There’s definitely a lot of organizations using Django for large projects … Washington Post comes to mind.
I suspect some frustrations against monoliths are not against monoliths, but rather against legacy codebases with a lot of technical debt. But a monolith != legacy codebase with technical debt …
I mean at work (Eventbrite) we have a few million lines of Django code, part-monolith and part-services. Both architectural styles have their issues, and both have their advantages. I don’t think separate services are totally worth it unless you have enough people to map teams to those services (some version of Conway’s Law).
We’ve got a large DRF API in production. It isn’t a large Django code base, per se, but has 60,000 endpoints, which means there are 60,000 models. We have a generic serializer and view built on top of these models, and have come up with some interesting work-arounds to make it fast. I’ll be presenting this case study at DjangoCon US in a few weeks: https://2019.djangocon.us/talks/awesome-automated-apis-with-automagic/
Instagram currently features the world’s largest deployment of the Django web framework
cf: Web Service Efficiency at Instagram with Python (Instagram Engineering)
It’s from 2016 but I think it is still true.
I work for a utility company on a complex Django monolith with well over a thousand modules. We do have other services that are more focused, but most things run within the main system. The code base is only a few years old and there is a lot of active development of new features.
My take on microservices versus monoliths is that it’s important to distinguish whether you’re doing it for infrastructure reasons or modelling reasons.
An example of an infrastructure reason might be if some parts of your system have different scaling needs. In that case, splitting them into separate services might be a good approach.
Modelling reasons are more about reducing the cognitive load on developers. How can we avoid systems becoming more and more complicated to think about as they grow in size? The key is to modularise systems so they can be thought about in smaller chunks. And to do this, you need boundaries between the modules.
Sometimes I think that the potential benefit of microservices is that you are able to impose boundaries at the infrastructure level. That’s ok if you get the boundaries right first time, but they’re much harder to change later. Until your system has stabilised, I prefer to impose them at the software level instead. This makes refactoring easier, as you discover better ways to modularise the system. Also, communication between different parts of the system is much easier (calling a function, rather than a network API).
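To make the "calling a function, rather than a network API" point concrete, here is a minimal sketch of a software-level boundary inside a monolith. All the names (`Billing`, `InProcessBilling`, `monthly_statement`) are hypothetical and invented for illustration; they are not from any project mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Quote:
    customer_id: int
    amount_pence: int


class Billing(Protocol):
    """Hypothetical public surface: the only thing other modules depend on."""

    def quote_for(self, customer_id: int, units: int) -> Quote: ...


class InProcessBilling:
    """Monolith implementation: a plain function call, no network hop."""

    def __init__(self, pence_per_unit: int) -> None:
        self._pence_per_unit = pence_per_unit

    def quote_for(self, customer_id: int, units: int) -> Quote:
        return Quote(customer_id=customer_id, amount_pence=units * self._pence_per_unit)


def monthly_statement(billing: Billing, customer_id: int, units: int) -> str:
    # The caller only knows the Billing interface, so the implementation could
    # later be swapped for an HTTP client without changing this function.
    quote = billing.quote_for(customer_id, units)
    return f"Customer {quote.customer_id} owes {quote.amount_pence / 100:.2f}"
```

If the boundary turns out to be in the wrong place, moving it is an ordinary refactor rather than a change to a deployed network contract.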
We use a tool called Import-Linter (which I maintain) to enforce these boundaries cheaply at the software level. As the system continues to mature, I imagine some parts of it will be broken out into separate services, but for now, our monolith feels like the right choice.
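As a rough illustration of what enforcing an import boundary can look like, here is a small standard-library sketch. It is not Import-Linter's implementation or configuration format, and the package names (`shop.billing`, `shop.web`) and the `FORBIDDEN` rule table are hypothetical.

```python
import ast

# Hypothetical rule set: code under the key package must not import
# from any of the packages listed as its value.
FORBIDDEN = {
    "shop.billing": ("shop.web",),
}


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Records the module after `from`, not the names imported from it.
            found.add(node.module)
    return found


def _within(module: str, package: str) -> bool:
    """True if `module` is `package` itself or lives somewhere beneath it."""
    return module == package or module.startswith(package + ".")


def violations(module_name: str, source: str) -> list[str]:
    """List the forbidden imports made by `module_name`, sorted for stable output."""
    banned = [
        target
        for package, targets in FORBIDDEN.items()
        if _within(module_name, package)
        for target in targets
    ]
    return sorted(
        imported
        for imported in imported_modules(source)
        if any(_within(imported, target) for target in banned)
    )
```

A check like this could run in CI over every module in the project, failing the build whenever `violations` returns a non-empty list. Relative imports are ignored in this sketch to keep it short.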
We used Django for a Google-like search engine called Zaoree. It was my first big project. I don’t work with Zaoree anymore, but Django is totally compatible with large projects.
The largest Django codebase I worked on was my previous employer YPlan. Several hundred thousand lines of code, deployed as a monolith. The monolith let us move quicker since all developers were able to run the full stack on their machine and understand any bit of it easily.
Perhaps if the team had grown larger it would have made sense to split it, but it was a ticket booking system similar to what @oliwarner described, which naturally needed a lot of integration, so maybe it wouldn’t have been easy.
Are there any public repositories that one can have a look at, to understand the design patterns at large scale? There are Instagram engineering blogs but they are limited in number.
Hiya @hardikbansal,
Not necessarily a large proprietary project, but even better is this repo describing the needs of a professional, scalable, greenfields starting point for a django project:
Many good professional practices are demonstrated.
Quote (Sep-2019):
- Renders Django projects with 100% starting test coverage
- Twitter Bootstrap v4 (maintained Foundation fork also available)
- 12-Factor based settings via django-environ
- Secure by default. We believe in SSL.
- Optimized development and production settings
- Registration via django-allauth
- Comes with custom user model ready to go
- Optional custom static build using Gulp and livereload
- Send emails via Anymail (using Mailgun by default, but switchable)
- Media storage using Amazon S3 or Google Cloud Storage
- Docker support using docker-compose for development and production (using Traefik with LetsEncrypt support)
- Procfile for deploying to Heroku
- Instructions for deploying to PythonAnywhere
- Run tests with unittest or pytest
- Customizable PostgreSQL version
Optional Integrations
These features can be enabled during initial project setup.
- Serve static files from Amazon S3, Google Cloud Storage or Whitenoise
- Configuration for Celery and Flower (the latter in Docker setup only)
- Integration with MailHog for local email testing
- Integration with Sentry for error logging
It’s an informative starting point for professional software engineers starting a Django project. This also goes along well with Danny and Audrey’s book.
It’d be nice if there was some sort of listing on this as I often wonder myself. Top ones are Instagram, Bitbucket, Disqus, Pinterest (used to be on Django, not anymore). Here in Boston Klaviyo is huge, PathAI uses it. I also found out last night that Cox Media uses Django and I suspect many other large behemoths do as well but likely as part of larger web offerings.
To me the more interesting question is how do you architect Django once it reaches massive size. Would love to see folks sharing more on this.
Sentry is written in Django and uses a “single app” approach. I haven’t looked through the codebase recently but when I did in the past I found it educational. Source starts here: https://github.com/getsentry/sentry/tree/master/src/sentry
Hey @hardikbansal
Mozilla have quite a few open-source Django projects, for example their addons.mozilla.org:
Some time ago I compiled a list of popular OSS Django projects here (as part of a guide for the Telegram Django community):
It depends. I don’t mind monolith architecture, but there can be a tipping point. Having worked on both monoliths with 100+ apps and with clients who have 100+ microservices, I’ll take a monolith any day of the week over the alternative.
If you are stuck with a monolith, I’d consider app-by-app which ones you can split out into their own repositories and make them installable. This may both speed up and improve your overall time to test each one.
I don’t want to see 100+ apps in my project any more than I want to see 100+ application repos. They both have their advantages and disadvantages, but one makes sharing code for new projects much easier.
That’s still true, I guess. At least, they are still using it, and they may still be the biggest app in terms of users.
And we’re back! As we mentioned in the first part of our blog post series, Instagram Server is a Python monolith with several million lines of code and a few thousand Django endpoints.
I’ve seen OpenEDX mentioned a couple of times lately. Their “platform” is a huge monolith and they’ve recently been featured on Django Chat where they talk about it.