Hi folks!
I am running an application that manages quite a number of different types of documents (like chapters, statute sections, court decisions, flashcards) and their containers (books, statutes, reporters, decks).
So far I’ve been using Typesense for indexing and searching, but I think the overall considerations apply to all similar Search-as-a-Service offerings such as Algolia or Meilisearch. The experience is not bad, but even without managing the service it adds overhead such as:
- Synchronizing tables and indexes as only Algolia offers an official Django integration
- Serializing data as they all require JSON documents (more or less)
- Maintaining the integrations
- Cost
Recently, Postgres and especially pg_vector got some attention – actually so much that even Heroku included it. With that added to full-text search, I am wondering, whether I could actually abandon Typesense and rely entirely on Postgres. There are some advantages that I see in it:
- No synchronisation required as the search is performed on the actual tables
- Works with the Django ORM
- Simpler integration of object-based permissions (we use django-guardian)
- Less maintenance and complexity
- Vector-based search is possible with pg_vector
I wonder though, whether there might be difficulties that I underestimate:
- Does full-text search work well with long documents: Some of mine might exceed 90.000 characters
- Is there a good way to search multiple tables (of different documents) at once? Maybe async views with async DB requests?
- Will it work well if on tables with hundred thousands of rows?
All your insight is greatly appreciated!