How to manage a table created outside of Django

Hello,
I am dabbling with a cutting-edge Postgres extension that does not have an existing Django client.
For now, I am manually creating the table and writing raw SQL for CRUD.

It seems to me that the proper way to do this is to mimic the Django ORM. However, I am quite a noob, so I don't know where to organise my raw SQL or how to manage the manually created table.

Can anyone point me in the right direction to pull this off?

Also, the extension is pgvectorscale. I believe there must be some people working with it right now, so please share an implementation or guide.

I’m not sure I understand the first part of this question, but for the second part, if you want to use the ORM with this table, you can create a Django model that maps to an existing table by using the db_table Meta option in your model definition. And assuming that you do not want Django to manage the structure of your table, you would also want to use the managed = False option.

I don’t know anything about pgvectorscale, but as an extension / enhancement to pgvector, you might want to see pgvector · PyPI as a starting point. It may give you some ideas for additional extensions in that area.


Hello,
Thank you for the quick answer,
The issue is that pgvectorscale has a new way of indexing vectors which pgvector does not have, so I am not quite sure whether setting up a pgvector model will mess up the database.

My first question is about where the raw SQL should live: should I make a new module for all the raw SQL parts, put it in a Manager for a Django model, or write functions inside the model definition?

The test table looks like this:

CREATE TABLE IF NOT EXISTS document_embedding  (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    metadata JSONB,
    contents TEXT,
    embedding VECTOR(1536)
);

I suspect the VECTOR column type is equivalent to pgvector's VectorField, but I am not quite sure whether these people have come up with some fancy new field implementation. They don't even have documentation about what is going on.

In short, I think it is safest to do everything in raw SQL to avoid problems.

If you have managed = False, it can’t mess up the database. It’s not going to change the database structure at all. The only real question is whether or not the syntax for queries and updates is different between the two.

(What I was actually suggesting is that you could use it as a starting point to modify it for pgvectorscale. Again, it’s solely a question of whether or not you want to use this with the ORM.)

Yes, that is true, which circles back around to the other question:

That’s a style choice, and a decision I would make based upon the number of different queries being created and the number of different places each query is being used.

As a starting point, I’d have the queries in the views in which they are being used - unless you know that you are going to use the same query in multiple views. In that case, I’d probably make it a separate function - but still keep it in views.py.
(I’m a believer in keeping things simple until I have a demonstrable need for doing it otherwise. If you’re not using the ORM for these queries, I’m not sure I see a value in putting any of this with a model or manager.)


Thank you, I will do it this way.

Man, I am in the same situation. Not started yet, but I just came across this blog post where they manually created a table in a migration file: Getting Sensor Data Into TimescaleDB via Django

Apart from that, what is your solution so far?

Thanks :pray:

For those who are facing this problem, I have tested and concluded that:
=> The vector field of pgvector is the same as that of pgvectorscale
=> The insert step is the same as with pgvector; you can insert via the Django ORM
=> The query part needs raw SQL for the extra parameters
=> The index needs to be manually created
Check my repo if you want to see the raw SQL part