I’ve been trying to build a system to collect information on the strength of connections in arbitrary user-defined Bayesian networks. (I’m not a Bayesian network expert, so I’m just building what’s been described to me.) I’ve called the connections ‘factors’, and there are two classes: ‘independent factors’, which we want people to assess, and ‘dependent factors’, for which there is a set of prescribed states. I want to describe the whole problem so that if I’ve got the modelling wrong, or there’s a better way to model this, you can let me know.
So the models are:
- Nexus, a locus that brings different Factors together
- States, for some of the factors.
- We collect Values as evaluations of factors for all state combinations: a full Cartesian product of the states, i.e. if F1 -> (S1,S2) and F2 -> (S3,S4), then
F1.F2 = (S1,S3), (S1,S4), (S2,S3), (S2,S4)
- if F3, F4 and F5 are our independent factors, we get a table to complete like this:
__ | (S1,S3) | (S1,S4) | (S2,S3) | (S2,S4) |
F3 |         |         |         |         |
F4 |         |         |         |         |
F5 |         |         |         |         |
But this table could have any number of dimensions, provided a normal human being could cope with filling it out, and I want it to be fairly reusable.
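To make the shape of that table concrete, here is a minimal plain-Python sketch (no Django) of the state space for any number of dependent factors. The dictionary `dep_factor_states`, the list `ind_factors` and the function `required_cells` are hypothetical stand-ins for the real models. It relies on `itertools.product` accepting any number of iterables and yielding flat tuples.

```python
from itertools import product

# Hypothetical example data: each dependent factor maps to its prescribed states.
dep_factor_states = {
    "F1": ["S1", "S2"],
    "F2": ["S3", "S4"],
}
ind_factors = ["F3", "F4", "F5"]


def enumerate_state_space(factor_states):
    """Return every combination holding one state per dependent factor.

    product(*iterables) yields flat tuples, so the result stays flat
    however many dependent factors there are.
    """
    return list(product(*factor_states.values()))


def required_cells(ind_factors, factor_states):
    """Pair every independent factor with every state combination."""
    combos = enumerate_state_space(factor_states)
    return [(factor, combo) for factor in ind_factors for combo in combos]


state_space = enumerate_state_space(dep_factor_states)
cells = required_cells(ind_factors, dep_factor_states)
print(state_space)  # the four column headings of the table
print(len(cells))   # one cell per (independent factor, combination)
```

Adding a third dependent factor only adds another entry to the dictionary; the combinations stay flat tuples, so no flattening step is needed afterwards.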
So, the Value model has one field for “Independent factor” as a ForeignKey, and one for a “state_space” as ManyToMany against States, dynamically generated from the Factor set for the nexus. So in the view, for a nexus, we “init_value_set” one entry at a time, to create a set for each user that populates the space required - one instance of each required combination of factors and states.
I think you can see the problem in the Nexus model functions below: to make the sets of objects I need (init_value_set) and to check whether the value set is still valid (clean_value_set), which it wouldn’t be if the Nexus changes, I am doing a lot in Python and making a lot of database hits. This works, but it’s definitely slow.
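One possible way to cut the number of queries is to load a user's existing values once and do the comparison in memory with sets. The sketch below is plain Python under stated assumptions: `existing` is a hypothetical dictionary standing in for whatever a single query (for example one using `prefetch_related('state_space')`) would return, and `required_keys` and `diff_value_set` are illustrative names, not part of the original code.

```python
from itertools import product


def required_keys(ind_factors, dep_factor_states):
    """Build the set of (independent factor, frozenset of state pks) needed."""
    combos = product(*dep_factor_states.values())
    return {(factor, frozenset(combo)) for combo in combos for factor in ind_factors}


def diff_value_set(existing, ind_factors, dep_factor_states):
    """Compare stored keys with required keys.

    `existing` maps a Value pk to its (ind_factor, frozenset of state pks) key.
    Returns (keys still to create, pks of Values that are stale or duplicated).
    """
    required = required_keys(ind_factors, dep_factor_states)
    seen = set()
    stale = []
    for pk, key in sorted(existing.items()):
        if key in required and key not in seen:
            seen.add(key)
        else:
            # Wrong number of states, two states from one factor, or a duplicate.
            stale.append(pk)
    return required - seen, stale


# Hypothetical data: state pks 1 and 2 belong to F1, pks 3 and 4 belong to F2.
dep_factor_states = {"F1": [1, 2], "F2": [3, 4]}
existing = {
    10: ("F3", frozenset({1, 3})),  # valid
    11: ("F3", frozenset({1, 2})),  # invalid: both states come from F1
    12: ("F3", frozenset({1, 3})),  # duplicate of Value 10
}
missing, stale = diff_value_set(existing, ["F3"], dep_factor_states)
print(sorted(stale))
print(len(missing))
```

This covers both jobs in one pass: anything in `stale` is what clean_value_set would delete, and anything in `missing` is what init_value_set would create. In Django the missing rows could probably be created with `bulk_create`, though the many-to-many rows would still need to be set separately.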
I want to get the average of different users’ assessments of the same nexus/factor/state set, and I thought I’d be able to do something like:
avg_set = Value.objects.values('nexus', 'factor', 'state_space').annotate(avg_val=Avg('value'))
I expected this to divide the total number of Value objects evenly by the number of users (so if there are 96 Value objects produced for 3 users, there should be 32 averages), but alas, it doesn’t work, because state_space is a ManyToManyField.
If I could convert the state_space to a unique string (by concatenating the State pks comma-separated for example), the naive aggregation would work. I looked into sub-query approaches but sadly blew my cerebral cortex.
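Here is a minimal plain-Python sketch of that canonical-key idea, using hypothetical rows in place of real Value objects. Grouping on a many-to-many field in `values()` yields one row per related State rather than one per combination, which is why the key has to describe the whole set. The names `state_key` and `average_by_combination` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def state_key(state_pks):
    """Canonical string for a state combination: sorted pks, comma-separated."""
    return ",".join(str(pk) for pk in sorted(state_pks))


def average_by_combination(rows):
    """Average `value` over users for each (nexus, factor, state combination).

    `rows` is an iterable of dicts with keys nexus, factor, states and value.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["nexus"], row["factor"], state_key(row["states"]))
        groups[key].append(row["value"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical assessments: two users rate the same cell, one rates another.
rows = [
    {"user": "a", "nexus": 1, "factor": 3, "states": [1, 3], "value": 0.25},
    {"user": "b", "nexus": 1, "factor": 3, "states": [3, 1], "value": 0.75},
    {"user": "a", "nexus": 1, "factor": 3, "states": [1, 4], "value": 1.0},
]
averages = average_by_combination(rows)
print(averages)
```

Sorting the pks before joining means the order in which states were attached does not matter. If the same string were stored on Value in a denormalised field (a hypothetical `state_key` CharField, filled in whenever state_space is set), the naive `values(...).annotate(Avg('value'))` query could group on that field instead of on the many-to-many.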

    from collections import Counter
    from itertools import product

    from django.db import models
    from django.db.models import Count

    # BaseModelWithHistory, Factor, Value and flatten() are defined elsewhere.


    class Nexus(BaseModelWithHistory):
        name = models.CharField(max_length=80)
        factors = models.ManyToManyField(Factor)

        def get_ind_factors(self):
            return self.factors.filter(state__isnull=True)

        def get_dep_factors(self):
            return self.factors.filter(state__isnull=False).distinct()

        def enumerate_state_space(self):
            """provides a nested tuple where the later applied factors are less-nested"""
            states = None
            for factor in self.get_dep_factors():
                if states is None:  # first factor; avoids evaluating the queryset
                    states = factor.state_set.all()
                else:
                    states = product(states, factor.state_set.all())
            return states  # nested tuple

        def init_value_set(self, question, user):
            self.clean_value_set()
            for ind_factor in self.get_ind_factors():
                for state_settings in self.enumerate_state_space():
                    ss = [s for s in flatten(state_settings)]
                    # check for an instance of each state combination
                    # first using state_space__in to match included states and
                    # total counts of states that tally
                    # state_spaces should not be set manually, so selections of
                    # dual states should not occur.
                    if not Value.objects.filter(
                        question=question, user=user, nexus=self,
                        ind_factor=ind_factor, state_space__in=ss,
                    ).annotate(
                        num_states=Count('state_space')
                    ).filter(num_states=len(ss)).exists():
                        value = Value.objects.create(
                            question=question, user=user, nexus=self,
                            ind_factor=ind_factor)
                        value.state_space.set([s.pk for s in ss])

        def clean_value_set(self):
            # count the dependent factors once rather than once per value
            n_dep = self.get_dep_factors().count()
            for value in self.value_set.all():
                n_states = value.state_space.count()
                if n_states != n_dep:
                    print('deleting', value, "because", n_states, "should be", n_dep)
                    value.delete()
                else:
                    fs = value.state_space.values_list('factor', flat=True)
                    c = Counter(fs)
                    # most_common(1) returns [(factor, count)]; compare the count
                    if c and c.most_common(1)[0][1] > 1:
                        print('deleting', value, "because", list(fs))
                        value.delete()

You made it to the end of my long and weird problems! Thanks!