ParadeDB introduced a PostgreSQL extension promising Elasticsearch-quality results within PostgreSQL. It looks pretty impressive and is built on top of Tantivy, a Rust-based alternative to Apache Lucene.
As someone who has run into the limitations and bugs in PostgreSQL’s tsvector-based full-text search, this looks incredibly promising… and perhaps an extension to Django’s ORM could be in order. My team is going to test this with a multi-TB dataset to see how it performs. Here are some of the key takeaways:
- 100% Postgres native, with zero dependencies on an external search engine
- Built on top of Tantivy, a Rust-based alternative to the Apache Lucene search library
- Query times over 1M rows are 20x faster compared to tsquery and ts_rank, Postgres’ built-in full-text search and sort functions
- Support for fuzzy search, aggregations, highlighting, and relevance tuning
- Relevance scoring uses BM25, the same algorithm used by Elasticsearch
- Real-time search — new data is immediately searchable without manual reindexing
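For anyone who hasn't looked at BM25 before, here is a rough Python sketch of the scoring idea behind that bullet. It is the textbook Okapi BM25 formula with the Lucene-style IDF and the commonly used defaults k1=1.2 and b=0.75, not ParadeDB's or Tantivy's actual code. The function name `bm25_scores` and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    Hypothetical helper for illustration; assumes `docs` is a non-empty
    list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, invented for this example.
docs = [
    "ergonomic mechanical keyboard".split(),
    "wireless mouse".split(),
    "keyboard keyboard cover for a compact keyboard".split(),
]
scores = bm25_scores(["keyboard"], docs)
```

The point to notice is that repeated terms help with diminishing returns and longer documents are penalized, which is the main difference from a raw term-count ranking.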
Having the syntax in line with SQL looks fantastic too:
SELECT *
FROM my_table
WHERE my_table @@@ '"my query string"';

SELECT *
FROM my_table
WHERE my_table @@@ 'description:keyboard^2 OR electronics:::fuzzy_fields=description&distance=2';
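On the `distance=2` part of the second query: I'm assuming it means a maximum Levenshtein edit distance, which is how fuzzy matching usually works in Lucene-style engines, but I haven't confirmed that against the ParadeDB docs. Under that assumption, a small Python sketch of what "within distance 2" means (the helper names `edit_distance` and `fuzzy_match` are mine, not part of any library):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term, candidate, distance=2):
    """Hypothetical check: is candidate within `distance` edits of term?"""
    return edit_distance(term, candidate) <= distance
```

So a typo like "keybord" would still match "keyboard" at this setting, while an unrelated word like "mouse" would not.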
Has anyone else played with this yet?