Elasticsearch Quality Inside PostgreSQL? ParadeDB Introduces pg_bm25

ParadeDB has introduced pg_bm25, a PostgreSQL extension that promises Elasticsearch-quality search results within PostgreSQL. It looks pretty impressive and is built on top of Tantivy, a Rust-based alternative to Apache Lucene.

As someone who has run into the limitations and bugs in PostgreSQL’s tsvector-based full-text search, this looks incredibly promising… and perhaps an extension to Django’s ORM could be in order. My team is going to test this with a multi-TB dataset to see how it performs. Here are some of the key takeaways:

  • 100% Postgres native, with zero dependencies on an external search engine
  • Built on top of Tantivy, a Rust-based alternative to the Apache Lucene search library
  • Query times over 1M rows are 20x faster compared to tsquery and ts_rank, Postgres’ built-in full-text search and sort functions
  • Support for fuzzy search, aggregations, highlighting, and relevance tuning
  • Relevance scoring uses BM25, the same algorithm used by Elasticsearch
  • Real-time search — new data is immediately searchable without manual reindexing
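For anyone unfamiliar with BM25, here is a rough, self-contained Python sketch of the scoring formula. It is only an illustration of the textbook algorithm, not how pg_bm25 or Tantivy implement it: the whitespace tokenizer, the function name `bm25_scores`, and the defaults `k1=1.2` and `b=0.75` are my own assumptions, and the IDF uses the smoothed `log(1 + ...)` variant so scores never go negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query.

    Illustrative only: naive lowercase/whitespace tokenization,
    assumed defaults for k1 and b.
    """
    if not docs:
        return []

    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = query.lower().split()
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "mechanical keyboard with rgb",
        "wireless mouse",
        "keyboard keyboard keyboard cover",
    ]
    for doc, score in zip(docs, bm25_scores("keyboard", docs)):
        print(f"{score:.3f}  {doc}")
```

The point is that repeated matches raise the score with diminishing returns, and long documents are penalized relative to the average length, which is a big part of why BM25 ranking tends to feel better than a plain term-count ranking.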

Having the syntax in line with SQL looks fantastic too:

SELECT *
FROM my_table
WHERE my_table @@@ '"my query string"';

SELECT *
FROM my_table
WHERE my_table @@@ 'description:keyboard^2 OR electronics:::fuzzy_fields=description&distance=2';
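From application code, the query string would presumably be passed as a bound parameter rather than interpolated into the SQL. A minimal sketch of that idea follows; `build_search_sql` is a hypothetical helper of mine, not part of pg_bm25 or Django, and it only assumes the `@@@` syntax shown above plus the `%s` placeholder style used by psycopg and Django's raw cursors.

```python
import re

# Conservative pattern for unquoted SQL identifiers (an assumption for this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_search_sql(table, query_string):
    """Build a parameterized search statement using the @@@ operator.

    Hypothetical helper: the table name is validated because identifiers
    cannot be bound as parameters; the query string is left as a %s
    placeholder for the database driver to bind.
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    sql = f"SELECT * FROM {table} WHERE {table} @@@ %s"
    return sql, (query_string,)


if __name__ == "__main__":
    sql, params = build_search_sql("my_table", "description:keyboard^2")
    print(sql)
    print(params)
```

The returned pair could then be handed to something like `cursor.execute(sql, params)`. I have not tried this against a real pg_bm25 install yet, so treat it as a starting point only.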

Has anyone else played with this yet?


I will definitely play with it in the coming weeks.
Thanks for your post.

Thanks for sharing, looks very interesting.

Hello! I’m one of the makers of ParadeDB. It’s super cool to see our work featured on the Django forum. If you have any thoughts or feedback as you play with it, please let us know. Django and Postgres are a terrific combination for any developer, and we’re committed to making pg_bm25 and ParadeDB something truly magical for the Django community. :)
