Hi,
I was just reading the `get_or_create` documentation and I want to make sure I understand it correctly. I think I do, but I just wanted to consult you guys (I'm looking at you, Ken, lol) to make sure my plans for a solution are not missing something.
We have a database load script that loads data from Excel files that researchers submit to us. It's been working great for a while, but in one recent data submission, a researcher used an Excel formula to calculate a value for our `infusion_rate` field in our `Animal` model (a `FloatField`).
A curator loaded some of the data prematurely (before resolving issues from one of the sheets in the file), and that's a supported MO. In practice, up until now, all they should need to do is re-run the load script, since all of the load operations use `get_or_create` (so that existing records are not duplicated).

However, every row on that one sheet with the infusion rate column generates an `IntegrityError`, because the value saved in the database and the one computed by the Excel formula differ just slightly, e.g.:

- 0.11583011583011583 (from the Excel formula)
- 0.115830115830116 (from the database)
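To illustrate what I mean (field names here are made up, and I'm assuming a unique constraint that covers the fields we pass to `get_or_create`), the failure looks roughly like this:

```python
# Rough sketch of the failure mode; model/field names are hypothetical and
# the real unique constraint on Animal may span different fields.
from myapp.models import Animal  # hypothetical import path

excel_value = 0.11583011583011583  # recomputed by the Excel formula

# The stored row holds 0.115830115830116, so the exact-match get() inside
# get_or_create() finds nothing, and the subsequent INSERT collides with
# the existing row, raising IntegrityError.
animal, created = Animal.objects.get_or_create(
    name="mouse_42",           # hypothetical key field
    infusion_rate=excel_value,
)
```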
For this field, I could change it to a `DecimalField` and set a fixed number of decimal places, but the loading code I'm dealing with handles multiple fields, and the values in some fields cannot be handled using static decimal places applied to the fields. These are fields that need a scientific mechanism like "significant figures" (where, using a rough example, the values 1230000000000.0 and 0.0000000000000123 each have 3 significant figures).
From what I understood from the documentation, the way I could apply significant figures (e.g. 3 significant figures) to a `get_or_create` would be to use a filter that does something like `infusion_rate__gte=0.115, infusion_rate__lte=0.116`, and then use the `defaults` argument to `get_or_create` to supply the value in the event of creation, e.g. `get_or_create(**kwargs, defaults={"infusion_rate": 0.11583011583011583})`.
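In code, that idea would look roughly like this (bounds hard-coded for the 3-significant-figure example; model and field names other than `infusion_rate` are just illustrative):

```python
from myapp.models import Animal  # hypothetical import path

excel_value = 0.11583011583011583

animal, created = Animal.objects.get_or_create(
    name="mouse_42",                          # hypothetical key field(s)
    infusion_rate__gte=0.115,                 # lower bound at 3 sig figs
    infusion_rate__lte=0.116,                 # upper bound at 3 sig figs
    defaults={"infusion_rate": excel_value},
)
# get_or_create() drops keyword arguments containing "__" when it builds the
# object to create, so defaults has to carry the concrete value.
```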
Note, the above is a rough example. I know that without the decimal places set, the above code could lead to `IntegrityError`s. I'm just intending to confirm that this is how to account for precision issues in a `get_or_create` for fields to which a decimal place limit cannot be applied. I intend to refine the code in the example above to implement a more comprehensive solution.
I already have code for handling significant figures that I could use to pre-process the values before the `get_or_create`, such that it would round to X significant digits before a value is ever loaded, in which case, my understanding is that I wouldn't need the filter before the `get_or_create`. But is there any way to apply that sort of rounding to a `FloatField` as a setting itself (because my search came up dry)? Unfortunately, that pre-processing code would sit on top of the model definition (i.e. in our loading code). It would be nice if there were a more integrated way to support it. I suppose I could write a clean method that checks the significant figures, though.
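For context, here's a rough sketch of the kind of rounding helper and `clean()` override I mean (not my actual code; the helper name and the stripped-down model are just illustrative):

```python
import math

from django.db import models


def round_to_sig_figs(value, sig_figs=3):
    """Round a float to the given number of significant figures (sketch)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - exponent)


class Animal(models.Model):
    infusion_rate = models.FloatField(null=True)

    def clean(self):
        # Normalize to 3 significant figures. Note that clean() only runs
        # when full_clean() is called (e.g. via ModelForm validation); it is
        # not invoked automatically by save() or get_or_create().
        if self.infusion_rate is not None:
            self.infusion_rate = round_to_sig_figs(self.infusion_rate, 3)


# e.g. round_to_sig_figs(0.11583011583011583) == 0.116
#      round_to_sig_figs(0.0000000000000123) == 1.23e-14
```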
I found some geographic database functions that have support for significant digits (aka significant figures), and I found a database `Round` function that uses the term "precision" (which is an overloaded term: the geographic docs use it for sig. dig., but `Round` uses it for decimal places). None of that appears to be generically applicable to `FloatField`s for significant figures.
Anything I’m missing?