I have an custom model field in which i have implemented cryptography. I get prefect results when I don’t use lookups for filtering data i.e. it calls get_prep_value()
But when I filter data using lookups like startswith
or endswith
. it does not calls get_prep_value()
.How to solve this problem?
I think we would need to see the actual code - both the model and the queries. There’s no way to tell anything without seeing what you’re trying to do.
Code for custom model
from django.db import models
from .modelfields import CharField
# Create your models here.
class Post(models.Model):
title = CharField(max_length=100)
body = models.TextField()
Code for custom model fields
from django.db import models
from .cryptography import encrypt,decrypt
class CharField(models.CharField):
def get_internal_type(self)-> str:
return "TextField"
def to_python(self, value: Any) -> Any:
print("to_python",value)
return self.clean(super().to_python(value),None)
def get_prep_value(self, value: Any) -> Any:
print("get_prep_value" , value)
return encrypt(super().get_prep_value(value))
def from_db_value(self, value:Any, expression:Any, connection:Any)-> Any:
return decrypt(value).decode()
def clean(self, value, model_instance):
"""
Convert the value's type and run validation. Validation errors
from to_python() and validate() are propagated. Return the correct
value if no error is raised.
"""
self.validate(value, model_instance)
self.run_validators(value)
return value
Code for custom Query
Post.objects.filter(title="ShahProgrammer")
#sql
'SELECT "CryptographicFields_post"."id", "CryptographicFields_post"."title", "CryptographicFields_post"."body" FROM "CryptographicFields_post" WHERE "CryptographicFields_post"."title" = 9319c221fb87f14a26a104c1c2db'
Post.objects.filter(title__startswith="Shah")
#sql
'SELECT "CryptographicFields_post"."id", "CryptographicFields_post"."title", "CryptographicFields_post"."body" FROM "CryptographicFields_post" WHERE "CryptographicFields_post"."title" LIKE Shah% ESCAPE \'\\\''
Unless I’m really misunderstanding something here, what you’re trying to do doesn’t make sense.
If the data is stored encrypted in the table, you’re never going to be able to perform a database filter with an unencrypted value, and “startswith”, etc, aren’t going to give you valid comparisons. (The encrypted value for “ShahProgrammer” is not going to start with the same encrypted value for “Shah”.)
If the data is stored encrypted in the database, those types of comparisons can only be done with unencrypted values, which means that either the database itself needs to do the decryption on its side, or you need to read the data, decrypt it, and do the comparison within your code.
I am encrypting data with AES so, the encrypted value of the ShahProgramer will start with same encrypted value for Shah.I am encrypting & decrypting data in my custom model fields. Database does not do any of encrypting & decrypting. I hope you understand my problem. Here I am providing an encrypted value for Shahprogrammer & Shah
- ShahProgrammer = 9319c221fb87f14a26a104c1c2db
- Shah = 9319c221
I’m sorry, that doesn’t resemble the behavior of any validated AES implementation I’ve ever seen, nor am I able to replicate that using any of a variety of different length strings.
Look at it this way, if the encryption were stable on a prefix of a string, it would be ridiculously easy to crack. All you would need to do is try one character at a time until you found the matching character, and keep adding characters until you found the complete string - that’s not encryption and not secure under any meaningful definition.
But, leaving that aside for the moment, I understand the issue. But in the general case, it doesn’t make sense to do it that way.
It is secure I am using AES CFB. AES will generate the same encrypted string until the key & iv changes. Here is the code for encyption. I am using pycryptodome for AES
from django.conf import settings
from .exceptions import LengthError
from Crypto.Cipher import AES
def get_key(settings)->str:
try:
cipher_key=settings.ENCRYPTION_KEY
except AttributeError:
cipher_key=settings.SECRET_KEY
finally:
if len(cipher_key) < 50:
raise LengthError(len(cipher_key))
else:
return cipher_key
def type_check(string)->bytearray:
if isinstance(string,bytearray):
return string
elif isinstance(string,bytes):
return bytearray(string)
else:
return bytearray(str(string),"UTF-8")
def to_hex(string)->hex:
return bytearray(string).hex()
def from_hex(hexstring)->bytearray:
return bytearray.fromhex(hexstring)
def encrypt(string)->hex:
print('encrypt called')
return to_hex(AES.new(type_check(get_key(settings)[:32]),AES.MODE_CFB,type_check(get_key(settings)[-16:])).encrypt(type_check(string)))
def decrypt(hexstring)->bytearray:
print('decrypt called')
return bytearray(AES.new(type_check(get_key(settings)[:32]),AES.MODE_CFB,type_check(get_key(settings)[-16:])).decrypt(type_check(from_hex(hexstring))))
Also why would this not make sense. In Django documentation it is written that
Since using a database requires conversion in both ways, if you override from_db_value() you also have to override get_prep_value() to convert Python objects back to query values.
Yes, that would be valid on an exact match, but any of the other “conditionals” such as __lt, __startswith, __iexact, etc, are not valid with encrypted data.
But, the issue here is that the from_db_value and get_prep_value methods are methods on model fields, not on constants in expressions. You’re not assigning these values to fields, you’re passing them as parameters for queries on the database.
Can you explain me in detail what do you mean by But, the issue here is that the from_db_value and get_prep_value methods are methods on model fields, not on constants in expressions. You’re not assigning these values to fields, you’re passing them as parameters for queries on the database.
My concern is that Django should call the get_prep_value()
function as it used for preparing data for the query but why Django fails to do it when used lookups for filtering data
Ok, I was wrong in why it works this way - my assumption was wrong with how this works.
The issue here is not “field assignment vs non-field assignment”, it’s an issue of whether or not the supplied parameter is transformed before being submitted.
For example, if you have the clause .filter(field_name="some value")
, the value “some value” is passed through to the query unchanged, and it does call get_prep_value on the constant being supplied.
However, if you use .filter(field_name__startswith="some")
, the value passed to the query is “some%”. Or, if you were to use .filter(field_name__iexact="some")
, the query being generated passes UPPER("some value")
in the where clause.
So, it appears that in those cases where the value is not transformed in any way, get_prep_value is called. But if the parameter is transformed, it’s not.
That does raise the question as to whether or not it should - there may be some aspects to this that might make it problematic, I wouldn’t know.
I’ve tried looking through the old tickets to see if there’s one addressing this issue, I haven’t found anything.
This definitely bears further investigation.
I think I found the issue that is causing this. I was checking the code of Django on GitHub & it calls the get_prep_value()
of the class with is registered with it not the class from where it was called
Github Link for lookups