How to use `get_or_create` when there are `FloatField` precision issues

Hi,

I was just reading the get_or_create documentation and I want to make sure I understand it correctly. I think I do, but I just wanted to consult you guys (I’m looking at you Ken, lol) to make sure my plans for a solution are not missing something.

We have a database load script that loads data from excel files that researchers submit to us. It’s been working great for awhile, but in one recent data submission, a researcher used an excel formula to calculate a value for our infusion_rate field in our Animal model (a FloatField).

A curator loaded some of the data prematurely (before resolving issues from one of the sheets in the file) and that’s a supported MO. In practice, up until now, all they should need to do is re-run the load script, since all of the load operations use get_or_create (so that existing records are not duplicated).

However, every row on that one sheet with the infusion rate column generates an IntegrityError because the value saved in the database and the one computed by the excel formula differ just slightly, e.g.:

  • 0.11583011583011583 (from the excel formula)
  • 0.115830115830116 (from the database)

For this field, I could change it to a DecimalField and set a number of decimal places for the FloatField, but the loading code I’m dealing with handles multiple fields, and the values in some fields cannot be solved using static decimal places applied to the fields. These are fields that need a scientific mechanism like “significant figures” (where [using a rough example], the values 1230000000000.0 and 0.0000000000000123 each have 3 significant figures).

From what I understood from the documentation, the way I could apply significant figures (e.g. 3 significant figures) to a get_or_create would be to use a filter that does something like infusion_rate__gte=0.115, infusion_rate__lte=0.116, and then use the defaults argument to get_or_create to supply the value in the event of creation, e.g. get_or_create(**kwargs, defaults={"infusion_rate": 0.11583011583011583}).

Note, the above is a rough example. I know that without the decimal places set, the above code could lead to IntegrityErrors. I’m just intending to confirm that this is how to account for precision issues in a get_or_create for fields to which a decimal place limit cannot be applied. I intend to refine the code in the example above to implement a more comprehensive solution.

I already have code for handling significant figures that I could use to pre-process the values before the get_or_create such that it would round to X significant digits before a value is ever loaded, in which case, my understanding is that I wouldn’t need the filter before the get_or_create, but is there any way to apply that sort of rounding to a FloatField as a setting itself (because my search came up dry)? Unfortunately, that code would sit on top of the model definition (i.e. in our loading code). It would be nice if there was a more integrated way to support it. I suppose I could write a clean method that checks the significant figures though.

I found some geographic database functions that have support for significant digits (aka significant figures), and I found a database Round function that uses the term “precision” (which is an overloaded term: the geographic docs use it for sig. dig., but Round uses it for decimal places). None of that appears to be generically applicable to FloatFields for significant figures.

Anything I’m missing?

<conjecture>

In Django? You could try creating a subclass of FloatField with a set of custom functions for get_prep_value, get_db_prep_value, to_python, etc, as necessary to do the rounding.
(I have no idea whether this would be helpful if you’re using something like “bulk_create” or “bulk_update” - I don’t know what the level of interaction would be in that situation.)

In the database itself? You could create a set of triggers to edit the fields (both ways - in and out).

Or, depending upon your needs for using this data beyond simply storing it, you could try using a decimal floating point format. (See pgDecimal2: decimal64 and decimal128 data types / PostgreSQL Extension Network for a PostgreSQL implementation.)

Or, you could implement your own decimal floating point format in Django and store it in any way you find appropriate.
</conjecture>

(Sorry, I don’t have any current knowledge in this area. The last time I worked with a decimal floating point format was probably 1985?)

1 Like