# Modelling network connections; aggregating on a matched set of ManyToMany relationships.

I’ve been trying to build a system to collect information on the strength of connections in arbitrary user-defined Bayes networks. (I’m not a Bayesian-network expert, so I’m just building what’s been described to me.) I call the connections ‘factors’, and there are two classes: ‘independent factors’, which we want people to assess, and ‘dependent factors’, which have a set of prescribed states. I want to describe the whole problem so that if I’ve got the modelling wrong, or there’s a better way to model this, you can let me know.

So the models are:

• Nexus, a locus that brings different Factors together
• Factors
• States, for some of the factors.
• We collect Values as evaluations of factors for all state combos - a full Cartesian product of state combinations, i.e. if F1 -> (S1,S2) & F2 -> (S3,S4), then
F1.F2 = (S1,S3), (S1,S4), (S2,S3), (S2,S4)
• if F3, F4 and F5 are our independent factors, we get a table to complete like this:

`__| (S1,S3) | (S1,S4) | (S2,S3) | (S2,S4) |`
`F3|_________|_________|_________|_________|`
`F4|_________|_________|_________|_________|`
`F5|_________|_________|_________|_________|`

But this table could have any number of dimensions, provided a normal human being could cope with filling it out, and I want it to be fairly reusable.
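The column headings above are just the Cartesian product of each dependent factor's states, which `itertools.product` enumerates directly. A minimal sketch, with plain strings standing in for State objects:

```python
from itertools import product

# Plain strings stand in for State objects; each list is one
# dependent factor's state set.
f1_states = ["S1", "S2"]
f2_states = ["S3", "S4"]

# product(*iterables) yields flat tuples, one per table column
columns = list(product(f1_states, f2_states))
print(columns)
# [('S1', 'S3'), ('S1', 'S4'), ('S2', 'S3'), ('S2', 'S4')]
```

With three independent factors F3–F5 that's one cell per (factor, column) pair, i.e. 3 × 4 = 12 Values per user here. Note that passing all the state sets to `product(...)` in one call yields flat tuples, avoiding the nested-tuple-plus-flatten dance.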

So, the Value model has a ForeignKey field for the independent factor, and a “state_space” ManyToManyField against States, dynamically generated from the Factor set for the nexus. In the view, for a nexus, we “init_value_set” one entry at a time, creating a set for each user that populates the space required - one instance of each required combination of factors and states.

I think you can see the problem in the Nexus model functions below - to make the sets of objects I need (init_value_set) and to check whether the value_set is still valid, which it wouldn’t be if the Nexus changes (clean_value_set), I am doing a lot in Python with a lot of database hits. This works, but it’s definitely slow.
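One way to cut the per-combination queries down is to fetch the existing (ind_factor, state-combo) keys in one pass, take a set difference against the required keys, and only create what's missing. A pure-Python sketch of the idea, with ints standing in for pks (the actual ORM fetch and creation are left as comments since they depend on your models):

```python
from itertools import product

# ints stand in for Factor/State pks
ind_factors = [3, 4, 5]                       # F3, F4, F5
state_combos = list(product([1, 2], [3, 4]))  # (S1|S2) x (S3|S4)

# every (ind_factor, state-combo) cell the table requires
required = {(f, combo) for f in ind_factors for combo in state_combos}

# keys already in the DB -- in Django this could be built from ONE
# query, e.g. Value.objects.filter(...).prefetch_related('state_space')
existing = {(3, (1, 3)), (4, (2, 4))}

missing = required - existing
print(len(missing))  # 12 required cells, 2 exist -> 10 to create
# Then create only the missing Values; the M2M links still need a
# state_space.set() per row, but the existence checks collapse to
# a single query instead of one per combination.
```

This replaces the `.exists()` query inside the double loop with one prefetch up front, which is usually where most of the round-trips go.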

I want to get average for different users assessment of the same nexus/factor/state_set but I thought I’d be able to do something like:

`avg_set = Value.objects.values('nexus', 'ind_factor', 'state_space').annotate(avg_val=Avg('value'))`

This should divide the total Value objects evenly by the number of users (so if there are 96 Value objects produced for 3 users, there should be 32 averages), but alas… it doesn’t work, because state_space is a ManyToManyField: values('state_space') produces one row per related State, not one per state combination.

If I could convert the state_space to a unique string (by concatenating the State pks comma-separated, for example), the naive aggregation would work. I looked into subquery approaches but sadly blew my cerebral cortex.
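Failing a database-side string key, the regrouping can be done in Python after a single query: pull each Value's row with its state pks, build the comma-separated key, and average per (nexus, ind_factor, key). A sketch on plain dicts shaped like a `.values()` result (the field names mirror the models, but the rows here are made up):

```python
from collections import defaultdict

# rows as a Value queryset's .values(...) might return them;
# 'states' would come from prefetching the state_space pks
rows = [
    {"nexus": 1, "ind_factor": 3, "states": [1, 3], "value": 2.0, "user": 1},
    {"nexus": 1, "ind_factor": 3, "states": [1, 3], "value": 4.0, "user": 2},
    {"nexus": 1, "ind_factor": 3, "states": [1, 4], "value": 9.0, "user": 1},
]

groups = defaultdict(list)
for row in rows:
    # sorted() makes the key independent of state ordering
    key = (row["nexus"], row["ind_factor"],
           ",".join(str(pk) for pk in sorted(row["states"])))
    groups[key].append(row["value"])

averages = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(averages)
# {(1, 3, '1,3'): 3.0, (1, 3, '1,4'): 9.0}
```

If you're on Postgres, the same key can be built server-side with `StringAgg` from `django.contrib.postgres.aggregates` so the averaging stays in the database; alternatively, a denormalized CharField on Value that stores the key at save time would make your original `values().annotate(Avg(...))` query work as-is.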

```python
# models.py
from collections import Counter
from itertools import product

from django.db import models
from django.db.models import Count

# flatten() is a project helper that un-nests the tuples
# produced by enumerate_state_space()


class Nexus(BaseModelWithHistory):
    name = models.CharField(max_length=80)
    factors = models.ManyToManyField(Factor)

    def get_ind_factors(self):
        return self.factors.filter(state__isnull=True)

    def get_dep_factors(self):
        return self.factors.filter(state__isnull=False).distinct()

    def enumerate_state_space(self):
        """Provides a nested tuple where the later-applied factors are less nested."""
        states = None
        for factor in self.get_dep_factors():
            if states is None:  # a truthiness check would evaluate the queryset
                states = factor.state_set.all()
            else:
                states = product(states, factor.state_set.all())
        return states  # nested tuples

    def init_value_set(self, question, user):
        self.clean_value_set()
        for ind_factor in self.get_ind_factors():
            for state_settings in self.enumerate_state_space():
                ss = list(flatten(state_settings))
                # Check for an instance of each state combination:
                # state_space__in matches the included states, and the
                # annotated count checks that the totals tally.
                # state_spaces should not be set manually, so selections
                # of duplicate states should not occur.
                exists = (Value.objects
                          .filter(question=question,
                                  user=user,
                                  nexus=self,
                                  ind_factor=ind_factor,
                                  state_space__in=ss)
                          .annotate(num_states=Count('state_space'))
                          .filter(num_states=len(ss))
                          .exists())
                if not exists:
                    value = Value.objects.create(question=question,
                                                 user=user,
                                                 nexus=self,
                                                 ind_factor=ind_factor)
                    value.state_space.set(ss)  # set() accepts instances or pks

    def clean_value_set(self):
        n_dep = self.get_dep_factors().count()
        for value in self.value_set.all():
            if value.state_space.count() != n_dep:
                print('deleting', value, 'because', value.state_space.count(),
                      'should be', n_dep)
                value.delete()
            else:
                fs = value.state_space.values_list('factor', flat=True)
                c = Counter(fs)
                # most_common(1) returns [(factor, count)]; compare the count
                if c.most_common(1)[0][1] > 1:
                    print('deleting', value, 'because', list(fs))
                    value.delete()
```

You made it to the end of my long and weird problems! Thanks!