Modelling network connections; aggregating on a matched set of ManyToMany relationships.

I’ve been trying to build a system to collect information on the strength of connections in arbitrary user-defined Bayesian networks. (I’m not a Bayesian network expert, so I’m just building what’s been described to me.) I call the connections ‘factors’, and there are two classes: ‘independent factors’, which we want people to assess, and ‘dependent factors’, for which there is a set of prescribed states. I want to describe the whole problem so that if I’ve got the modelling wrong, or there’s a better way to model this, you can let me know.

So the models are:

  • Nexus, a locus that brings different Factors together
  • Factors
  • States, for some of the factors.
  • Values, which we collect as evaluations of factors for every state combination: a full Cartesian product of the dependent factors’ states, i.e. if F1 -> (S1,S2) and F2 -> (S3,S4), then
    F1.F2 = (S1,S3), (S1,S4), (S2,S3), (S2,S4)
  • If F3, F4 and F5 are our independent factors, we get a table to complete like this:

__ | (S1,S3) | (S1,S4) | (S2,S3) | (S2,S4) |
F3 | _______ | _______ | _______ | _______ |
F4 | _______ | _______ | _______ | _______ |
F5 | _______ | _______ | _______ | _______ |

But this table could have any number of dimensions, provided a normal human being could cope with filling it out, and I want it to be fairly reusable.
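
As a rough sketch of the combinations described above: itertools.product accepts any number of iterables, so passing every dependent factor's states in one call gives flat tuples however many factors there are. The factor and state names here are placeholders, not real model instances.

```python
from itertools import product

# Placeholder data: each dependent factor maps to its prescribed states.
states_by_factor = {
    "F1": ["S1", "S2"],
    "F2": ["S3", "S4"],
}


def state_combinations(states_by_factor):
    """Return every combination holding one state per factor, as flat tuples."""
    return list(product(*states_by_factor.values()))


combos = state_combinations(states_by_factor)
print(combos)
# [('S1', 'S3'), ('S1', 'S4'), ('S2', 'S3'), ('S2', 'S4')]
```

Because the tuples come out flat, a third or fourth factor only makes each tuple longer rather than more deeply nested.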

So, the Value model has one field for the “independent factor” as a ForeignKey, and one for a “state_space” as a ManyToManyField against States, dynamically generated from the Factor set for the nexus. In the view, for a nexus, we call “init_value_set”, which creates one entry at a time to build a set for each user that populates the required space: one instance of each required combination of factor and states.

I think you can see the problem in the Nexus model functions below: to make the sets of objects I need (init_value_set) and to check whether the value set is still valid, which it wouldn’t be if the Nexus changes (clean_value_set), I am doing a lot in Python with a lot of database hits. This works, but it’s definitely slow.
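
One possible way to cut the per-combination queries, sketched in plain Python: load the existing combinations once, then use a set difference to find what still needs creating. The function name and the integer ids are hypothetical, and how `existing` gets loaded from the database is left as an assumption.

```python
from itertools import product


def missing_combinations(ind_factor_ids, state_combos, existing):
    """Return (ind_factor_id, frozenset_of_state_ids) pairs not yet stored.

    `existing` is a set of such pairs, assumed to have been loaded up front.
    """
    wanted = {
        (factor_id, frozenset(combo))
        for factor_id, combo in product(ind_factor_ids, state_combos)
    }
    return wanted - existing


# Hypothetical ids: independent factors 3, 4, 5 and states 1 to 4.
state_combos = [(1, 3), (1, 4), (2, 3), (2, 4)]
existing = {(3, frozenset({1, 3})), (4, frozenset({2, 4}))}
todo = missing_combinations([3, 4, 5], state_combos, existing)
print(len(todo))  # 12 wanted, 2 already present
```

Using a frozenset for the states makes the comparison independent of the order in which the states were attached.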

I want to get the average of different users’ assessments of the same nexus/factor/state set, and I thought I’d be able to do something like:

avg_set = Value.objects.values('nexus','factor','state_space').annotate(avg_val=Avg('value'))

This should divide the total number of Value objects evenly by the number of users (so if there are 96 Value objects produced for 3 users, there should be 32 averages), but alas, it doesn’t seem to work because state_space is a ManyToManyField.

If I could convert the state_space to a unique string (by concatenating the State pks comma-separated for example), the naive aggregation would work. I looked into sub-query approaches but sadly blew my cerebral cortex.
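
A minimal sketch of the string-key idea, done in Python rather than in SQL. It assumes the input is one row per (Value, State) pair, which is the shape you get when a ManyToManyField is included in values(); the tuple layout, the function name and the ids are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_state_set(rows):
    """Average values per (nexus, factor, state set).

    Each row is (value_id, nexus_id, factor_id, state_id, value): one row per
    state attached to a Value object (an assumption about the input shape).
    """
    # First pass: collect the states belonging to each Value object.
    per_value = {}
    for value_id, nexus_id, factor_id, state_id, value in rows:
        entry = per_value.setdefault(value_id, (nexus_id, factor_id, set(), value))
        entry[2].add(state_id)

    # Second pass: group values under a key built from the sorted state ids.
    groups = defaultdict(list)
    for nexus_id, factor_id, state_ids, value in per_value.values():
        state_key = ",".join(str(pk) for pk in sorted(state_ids))
        groups[(nexus_id, factor_id, state_key)].append(value)

    return {key: mean(values) for key, values in groups.items()}


rows = [
    (1, 10, 3, 1, 0.25), (1, 10, 3, 3, 0.25),  # user A, states {1, 3}
    (2, 10, 3, 1, 0.75), (2, 10, 3, 3, 0.75),  # user B, states {1, 3}
    (3, 10, 3, 2, 0.9), (3, 10, 3, 4, 0.9),    # user A, states {2, 4}
]
averages = average_by_state_set(rows)
print(averages[(10, 3, "1,3")])
```

Sorting the state ids before joining them is what makes the key unique for a given set of states, regardless of row order.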

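# models.py

from collections import Counter
from itertools import product

from django.db import models
from django.db.models import Count

# Factor, State, Value, BaseModelWithHistory and flatten() are not shown here.
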
class Nexus(BaseModelWithHistory):
    name = models.CharField(max_length=80)
    factors = models.ManyToManyField(Factor)

    def get_ind_factors(self):
        return self.factors.filter(state__isnull=True)

    def get_dep_factors(self):
        return self.factors.filter(state__isnull=False).distinct()

    def enumerate_state_space(self):
        """Provides a nested tuple where the later applied factors are less nested."""
        states = None
        for factor in self.get_dep_factors():
            if states is None:
                states = factor.state_set.all()
            else:
                states = product(states, factor.state_set.all())
        return states   # nested tuples, or None if there are no dependent factors

    def init_value_set(self, question, user):
        self.clean_value_set()
        for ind_factor in self.get_ind_factors():
            for state_settings in self.enumerate_state_space():
                ss = list(flatten(state_settings))
                # check for an instance of each state combination
                # first using state_space__in to match included states and
                # total counts of states that tally
                # state_spaces should not be set manually, so selections of dual states should not occur.
                if not Value.objects.filter(question=question,
                                            user=user,
                                            nexus=self,
                                            ind_factor=ind_factor,
                                            state_space__in=(ss)).annotate(
                    num_states=Count('state_space')).filter(
                    num_states=len(ss)).exists():
                    value = Value.objects.create(question=question, user=user, nexus=self, ind_factor=ind_factor)
                    value.state_space.set([s.pk for s in ss])

    def clean_value_set(self):
        num_dep_factors = len(self.get_dep_factors())
        for value in self.value_set.all():
            num_states = len(value.state_space.all())
            if num_states != num_dep_factors:
                # wrong number of states for the current set of dependent factors
                print('deleting', value, "because", num_states, "should be", num_dep_factors)
                value.delete()
            else:
                fs = value.state_space.all().values_list('factor')
                c = Counter(fs)
                # print(c)
                # get the count of the most common factor; more than one state
                # for the same factor means the state space is invalid
                if c and c.most_common(1)[0][1] > 1:
                    print('deleting', value, "because", fs)
                    value.delete()
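
The validity rule that clean_value_set applies can be sketched as a pure function, which is easier to test away from the database. The function name and the plain-id inputs are hypothetical. Note that this version is slightly stricter than the code above: it also requires the states' factors to be exactly the nexus's dependent factors.

```python
from collections import Counter


def is_valid_state_space(state_factor_ids, dep_factor_ids):
    """True if the states cover each dependent factor exactly once.

    `state_factor_ids` lists the factor id of every state in one state_space;
    `dep_factor_ids` lists the nexus's dependent factor ids.
    """
    counts = Counter(state_factor_ids)
    covers_all = set(counts) == set(dep_factor_ids)
    no_duplicates = all(n == 1 for n in counts.values())
    return covers_all and no_duplicates


print(is_valid_state_space([1, 2], [1, 2]))  # one state per factor
print(is_valid_state_space([1, 1], [1, 2]))  # two states for factor 1
```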

You made it to the end of my long and weird problems! Thanks!