Creating a Chain Event Graph

Example 1: Using a Stratified Dataset

This example builds a Chain Event Graph (CEG) from a discrete dataset showing results from a medical experiment. The dataset is symmetric, being rectangular (every row has a value for every variable). CEGs built from such datasets are known as stratified in the literature.

The Agglomerative Hierarchical Clustering (AHC) algorithm determines the stages by maximising the log marginal likelihood score of the staged tree/CEG model. The package functions under a Bayesian framework, and priors can be supplied to the AHC algorithm to override the default settings.
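As a rough illustration of the score being maximised, the sketch below computes a Dirichlet-multinomial log marginal likelihood for the edge counts of one stage, and the change in score when two candidate stages are merged. The function names, prior values and counts are invented for illustration, and this is the textbook conjugate form rather than cegpy's actual implementation.

```python
from math import lgamma


def log_marginal_likelihood(prior, counts):
    """Dirichlet-multinomial log marginal likelihood of one stage.

    prior  -- Dirichlet hyperparameters, one per outgoing edge
    counts -- observed counts along the same edges
    """
    posterior = [a + n for a, n in zip(prior, counts)]
    score = lgamma(sum(prior)) - lgamma(sum(posterior))
    score += sum(lgamma(p) - lgamma(a) for p, a in zip(posterior, prior))
    return score


def merge_gain(prior_a, counts_a, prior_b, counts_b):
    """Change in total score if stages a and b were merged into one."""
    merged_prior = [x + y for x, y in zip(prior_a, prior_b)]
    merged_counts = [x + y for x, y in zip(counts_a, counts_b)]
    return (
        log_marginal_likelihood(merged_prior, merged_counts)
        - log_marginal_likelihood(prior_a, counts_a)
        - log_marginal_likelihood(prior_b, counts_b)
    )


# Hypothetical edge counts for three nodes with two outgoing edges each.
similar = merge_gain([1, 1], [30, 10], [1, 1], [28, 12])
different = merge_gain([1, 1], [30, 10], [1, 1], [5, 35])
print(similar > different)
```

An agglomerative search of this kind would repeatedly merge the pair with the largest positive gain and stop once no merge improves the score; here the two nodes with similar counts are the better candidates.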

The example medical.xlsx dataset contains 4 categorical variables: Classification, Group, Difficulty, and Response.

Each individual is given a binary classification: Blast or Non-blast. Each group is rated on their experience level: Experienced, Inexperienced, or Novice. The classification task they are given has a difficulty rating of Easy or Hard. Finally, their response is shown: Blast or Non-blast.

First, the data source is loaded. A staged tree object is then created from it, and the AHC transitions are calculated.

from cegpy import StagedTree, ChainEventGraph
import pandas as pd

dataframe = pd.read_excel("medical.xlsx")
dataframe

	Classification	Group	Difficulty	Response
0	Blast	Experienced	Easy	Blast
1	Non-blast	Experienced	Easy	Non-blast
2	Non-blast	Experienced	Hard	Blast
3	Non-blast	Experienced	Hard	Non-blast
4	Blast	Experienced	Easy	Blast
...	...	...	...	...
10979	Blast	Novice	Easy	Non-blast
10980	Blast	Novice	Easy	Blast
10981	Non-blast	Novice	Easy	Blast
10982	Blast	Novice	Easy	Non-blast
10983	Non-blast	Novice	Hard	Non-blast

10984 rows × 4 columns

# Descriptive statistics for the dataset 
dataframe.describe()

	Classification	Group	Difficulty	Response
count	10984	10984	10984	10984
unique	2	3	2	2
top	Non-blast	Novice	Easy	Blast
freq	5493	7389	5494	5863
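To make the link between the rectangular data and the event tree concrete, the sketch below counts how often each root-to-node path occurs, using the first five rows printed above. In a stratified dataset every row contributes a path of the same length, so the unique row above implies at most 2 × 3 × 2 × 2 = 24 root-to-leaf paths. The helper name count_paths is made up for this illustration and is not part of cegpy.

```python
from collections import Counter

# First five rows of the medical dataset, as printed above.
rows = [
    ("Blast", "Experienced", "Easy", "Blast"),
    ("Non-blast", "Experienced", "Easy", "Non-blast"),
    ("Non-blast", "Experienced", "Hard", "Blast"),
    ("Non-blast", "Experienced", "Hard", "Non-blast"),
    ("Blast", "Experienced", "Easy", "Blast"),
]


def count_paths(rows):
    """Count every prefix of every row: one count per event tree edge."""
    counts = Counter()
    for row in rows:
        for depth in range(1, len(row) + 1):
            counts[row[:depth]] += 1
    return counts


path_counts = count_paths(rows)
for path, n in sorted(path_counts.items()):
    print(" -> ".join(path), n)
```

Each key is a path from the root to some node, and its count is the number of individuals who passed along the final edge of that path.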

The AHC algorithm is executed on the event tree, and the nodes are assigned a colour if they are found to be in the same stage as each other. Note that the calculate_AHC_transitions method is only available from the StagedTree class and not the EventTree class.

Effectively, nodes in the same stage share the same parameter set; in other words, the immediate future of these nodes is identical. Note that singleton stages are not coloured in the staged tree and its corresponding CEG to prevent visual cluttering.
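A small sketch of what sharing a parameter set means in practice: if two nodes are placed in the same stage, their outgoing edge counts are pooled and a single set of edge probabilities is estimated for both. The counts and the uniform Dirichlet prior below are hypothetical, and the posterior-mean estimate is a standard conjugate calculation assumed for illustration, not taken from cegpy's source.

```python
from fractions import Fraction


def pooled_edge_probabilities(prior, node_counts):
    """Posterior mean edge probabilities shared by all nodes in one stage.

    prior       -- Dirichlet hyperparameters for the stage, keyed by edge label
    node_counts -- list of per-node count dicts, keyed by edge label
    """
    totals = dict(prior)
    for counts in node_counts:
        for label, n in counts.items():
            totals[label] += n
    denominator = sum(totals.values())
    return {label: Fraction(value, denominator) for label, value in totals.items()}


# Two hypothetical nodes whose outgoing edges are both labelled Easy / Hard.
stage_prior = {"Easy": 1, "Hard": 1}
node_a = {"Easy": 40, "Hard": 58}
node_b = {"Easy": 42, "Hard": 58}

shared = pooled_edge_probabilities(stage_prior, [node_a, node_b])
print(shared)
```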

When the CEG is created, equivalent nodes (precisely, those whose complete future is identical) in a stage will be combined to compress the graph.

staged_tree = StagedTree(dataframe)
staged_tree.calculate_AHC_transitions();

Once the AHC algorithm has been run to identify the stages, a CEG can be created by passing the StagedTree object into the ChainEventGraph class. When the ChainEventGraph is created, it automatically generates the CEG from the StagedTree object. The process of generation compares nodes that are in the same stage to determine if they are logically compatible with one another. Once the graph has been constructed, and nodes combined, the probabilities of passing down any given edge are displayed.
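The merging step can be sketched as a bottom-up pass over a staged tree: two nodes end up in the same position when they are in the same stage and each of their outgoing edges leads to the same downstream position. The tiny tree, the stage labels and the function name below are all hypothetical; this illustrates the idea described above and is not cegpy's internal algorithm.

```python
# A hypothetical staged tree: node -> {edge label: child node}.
# Leaves have no outgoing edges and all collapse into a single sink.
tree = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"x": "s3", "y": "s4"},
    "s2": {"x": "s5", "y": "s6"},
    "s3": {"p": "l1", "q": "l2"},
    "s4": {"p": "l3", "q": "l4"},
    "s5": {"p": "l5", "q": "l6"},
    "s6": {"p": "l7", "q": "l8"},
}
# Stage membership as it might come out of a clustering step (made up).
stage = {"s0": 0, "s1": 1, "s2": 1, "s3": 2, "s4": 3, "s5": 2, "s6": 3}


def assign_positions(tree, stage, root="s0"):
    """Map each node to a position id; equal ids mean the nodes are merged."""
    signatures = {}  # (stage, downstream positions) -> position id
    position = {}

    def visit(node):
        if node not in tree:  # leaf: everything ends at the sink
            position[node] = "sink"
            return "sink"
        children = tuple(
            (label, visit(child)) for label, child in sorted(tree[node].items())
        )
        key = (stage[node], children)
        # Position ids are arbitrary labels, numbered in order of discovery.
        position[node] = signatures.setdefault(key, f"pos{len(signatures)}")
        return position[node]

    visit(root)
    return position


positions = assign_positions(tree, stage)
print(positions)
```

In this toy tree s3 and s5 merge, as do s4 and s6, which in turn makes the complete futures of s1 and s2 identical so that they merge as well.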

Like the StagedTree, the graph can be displayed using the create_figure method as shown below.

from IPython.display import Image

chain_event_graph = ChainEventGraph(staged_tree)
chain_event_graph.create_figure()

[Figure: Chain Event Graph for the medical dataset]

The tree has now been compressed into a Chain Event Graph. The graph represents the system encoded in the data. All paths start at the root node w₀ (which represents an individual entering the system) and terminate at the sink node w_∞ (which represents the point at which an individual exits the system).
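Because every path runs from the root to the sink, the probability of a complete path is the product of the edge probabilities along it, and these path probabilities sum to one. The sketch below checks this on a small hand-made graph; the node names, edge labels and probabilities are invented and do not come from the medical dataset.

```python
from fractions import Fraction as F

# Hypothetical CEG: node -> list of (edge label, probability, next node).
graph = {
    "w0": [("Blast", F(1, 2), "w1"), ("Non-blast", F(1, 2), "w2")],
    "w1": [("Easy", F(3, 5), "w3"), ("Hard", F(2, 5), "w3")],
    "w2": [("Easy", F(1, 4), "w3"), ("Hard", F(3, 4), "w4")],
    "w3": [("Blast", F(7, 10), "w_inf"), ("Non-blast", F(3, 10), "w_inf")],
    "w4": [("Blast", F(1, 5), "w_inf"), ("Non-blast", F(4, 5), "w_inf")],
}


def root_to_sink_paths(graph, node="w0", sink="w_inf"):
    """Yield (edge labels, probability) for every path from node to the sink."""
    if node == sink:
        yield (), F(1)
        return
    for label, prob, nxt in graph[node]:
        for labels, rest in root_to_sink_paths(graph, nxt, sink):
            yield (label,) + labels, prob * rest


paths = list(root_to_sink_paths(graph))
total = sum(prob for _, prob in paths)
print(len(paths), total)
```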

Example 2: Chain Event Graph from Non-Stratified Dataset

This example builds a Chain Event Graph (CEG) from an asymmetric dataset. Put simply, a dataset is asymmetric when the event tree describing it is not symmetric around its root. The class of CEGs built from asymmetric event trees is said to be non-stratified. Note that, technically, a CEG is also said to be non-stratified when the order of events along its different paths is not the same, even though its event tree might be symmetric. Whilst such processes can also be easily modelled with the cegpy package, this example focuses on non-stratified CEGs that are built from asymmetric event trees/datasets.

Asymmetry in a dataset arises when it has structural zeros or structural missing values in certain rows; in other words, the sample space of a variable is different or empty respectively, depending on its ancestral variables. So logically, certain values of the variable will never be observed for certain configurations of its ancestral variables, irrespective of the sample size.

In this example, we consider the falls.xlsx dataset. Here, by interventional design, individuals who are not assessed are not offered referral or treatment. Every non-assessed individual in the dataset would therefore go down the ‘Not Referred & Not Treated’ path with probability 1. Such an edge carries no information, so we choose to condense the tree and remove it. The zero observations for non-assessed individuals in the categories ‘Referred & Treated’ and ‘Not Referred & Treated’ are both structural zeros.

from cegpy import EventTree
import pandas as pd

dataframe = pd.read_excel("falls.xlsx")
dataframe

	HousingAssessment	Risk	Treatment	Fall
0	Community Not Assessed	Low Risk	NaN	Fall
1	Community Not Assessed	High Risk	NaN	Fall
2	Community Not Assessed	Low Risk	NaN	Don't Fall
3	Community Not Assessed	Low Risk	NaN	Don't Fall
4	Community Not Assessed	Low Risk	NaN	Fall
...	...	...	...	...
49995	Community Not Assessed	Low Risk	NaN	Don't Fall
49996	Community Not Assessed	Low Risk	NaN	Don't Fall
49997	Community Not Assessed	Low Risk	NaN	Don't Fall
49998	Community Not Assessed	Low Risk	NaN	Fall
49999	Community Not Assessed	Low Risk	NaN	Fall

50000 rows × 4 columns

Note: When looking at the description of the dataset, the total count in the Treatment column is not equal to the counts for the other columns. This is the giveaway that the dataset is non-stratified. Extreme care must be taken to ensure that the dataset really is non-stratified, and doesn’t simply have sampling zeros or sampling missing values. The package has no way of distinguishing these on its own unless the user specifies them.

dataframe.describe()

	HousingAssessment	Risk	Treatment	Fall
count	50000	50000	3250	50000
unique	4	2	3	2
top	Community Not Assessed	Low Risk	Not Referred & Not Treated	Don't Fall
freq	45211	42505	1768	34737
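One informal check, sketched below on a handful of made-up rows, is whether the missing values in a column line up exactly with particular values of an earlier variable. If Treatment is missing for every row with a given HousingAssessment value and for no other row, a structural explanation is plausible; scattered gaps point towards sampling missing values instead. The rows, the 'Community Assessed' label and the helper name are assumptions for illustration, and a check like this can only support, not replace, knowledge of how the data were collected.

```python
from collections import defaultdict

columns = ("HousingAssessment", "Risk", "Treatment", "Fall")
# Made-up rows in the shape of the falls dataset; None stands for NaN.
rows = [
    ("Community Not Assessed", "Low Risk", None, "Fall"),
    ("Community Not Assessed", "High Risk", None, "Fall"),
    ("Community Not Assessed", "Low Risk", None, "Don't Fall"),
    ("Community Assessed", "High Risk", "Referred & Treated", "Don't Fall"),
    ("Community Assessed", "Low Risk", "Not Referred & Not Treated", "Fall"),
]


def missing_by_parent(rows, parent, column):
    """For each parent value, return (rows with column missing, total rows)."""
    p, c = columns.index(parent), columns.index(column)
    summary = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = summary[row[p]]
        entry[0] += int(row[c] is None)
        entry[1] += 1
    return {value: tuple(entry) for value, entry in summary.items()}


summary = missing_by_parent(rows, "HousingAssessment", "Treatment")
# Missingness is "all or nothing" within every parent value.
aligned = all(missing in (0, total) for missing, total in summary.values())
print(summary, aligned)
```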

The end result of this is that in the EventTree shown below, paths such as S₀ -> S₂ -> S₇ -> S₁₈ skip the Treatment variable.

event_tree = EventTree(dataframe)
event_tree.create_figure()

[Figure: event tree for the falls dataset]
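The shorter paths can be reproduced with a small sketch: when a row has a structural missing value, that variable simply contributes no edge to the row's path. The first row below is taken from the printed output, the second uses an assumed 'Community Assessed' label, and row_to_path is a made-up helper rather than a cegpy function.

```python
import math


def is_missing(value):
    """True for None and for float NaN, the usual missing-value markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_to_path(row):
    """Drop missing entries so the path skips the corresponding variable."""
    return tuple(value for value in row if not is_missing(value))


rows = [
    ("Community Not Assessed", "Low Risk", float("nan"), "Fall"),
    ("Community Assessed", "High Risk", "Referred & Treated", "Don't Fall"),
]
paths = [row_to_path(row) for row in rows]
for path in paths:
    print(len(path), " -> ".join(path))
```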

As in the stratified medical example, after initial checks on the dataset, and confirmation that the EventTree looks as expected, the next step is to identify the stages. For this, we use the StagedTree class, which first creates the EventTree internally, ready for the user to run a clustering algorithm on it. In this example we use the .calculate_AHC_transitions() method, which executes the agglomerative hierarchical clustering (AHC) algorithm on the EventTree. The package functions under a Bayesian framework and priors can be supplied to the AHC algorithm to override the default settings.
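To give a feel for what a prior over an asymmetric tree can look like, the sketch below fixes an equivalent sample size at the root and splits it evenly along the outgoing edges of each node. This is one convention found in the CEG literature and is assumed here purely for illustration; whether it matches the package defaults should be checked against the cegpy documentation. The tree and the function name are hypothetical.

```python
from fractions import Fraction

# Hypothetical asymmetric event tree: node -> list of child nodes.
tree = {
    "s0": ["s1", "s2"],
    "s1": ["s3", "s4", "s5"],  # e.g. assessed: three treatment edges
    "s2": ["s6", "s7"],        # e.g. not assessed: straight to the outcome
}


def spread_prior(tree, root, equivalent_sample_size):
    """Return {(parent, child): hyperparameter} by splitting mass evenly."""
    edge_prior = {}
    stack = [(root, Fraction(equivalent_sample_size))]
    while stack:
        node, mass = stack.pop()
        children = tree.get(node, [])
        for child in children:
            share = mass / len(children)
            edge_prior[(node, child)] = share
            stack.append((child, share))
    return edge_prior


edge_prior = spread_prior(tree, "s0", 3)
for edge, value in sorted(edge_prior.items()):
    print(edge, value)
```

Because the two branches have different numbers of outgoing edges, the edges below s1 and s2 receive different hyperparameters even though both nodes start with the same mass.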

The resultant CEG has been reduced from the tree representation to a more compact graph.

from cegpy import ChainEventGraph, StagedTree

st = StagedTree(dataframe)
st.calculate_AHC_transitions()

ceg = ChainEventGraph(st)
ceg.create_figure()

[Figure: Chain Event Graph for the falls dataset]

As a CEG is a probabilistic model of a series of events, it may be desirable to view a CEG sub-graph when some or all of the variables are known. This can be especially true for graphs with lots of variables, which can balloon in size. In cegpy, this is done by using the ChainEventGraphReducer, which is covered on the next page.
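The idea of restricting a graph to what is already known can be sketched without the library: list the root-to-sink paths and keep only those consistent with the observed edge labels. The paths, the 'Community Assessed' label and the helper name below are hypothetical, and this is not the ChainEventGraphReducer API.

```python
# Hypothetical root-to-sink paths of a small CEG, as tuples of edge labels.
paths = [
    ("Community Assessed", "High Risk", "Referred & Treated", "Fall"),
    ("Community Assessed", "High Risk", "Referred & Treated", "Don't Fall"),
    ("Community Assessed", "Low Risk", "Not Referred & Not Treated", "Don't Fall"),
    ("Community Not Assessed", "High Risk", "Fall"),
    ("Community Not Assessed", "Low Risk", "Don't Fall"),
]


def consistent_paths(paths, observed):
    """Keep the paths that pass through every observed edge label."""
    return [path for path in paths if all(label in path for label in observed)]


# Suppose it is known that the individual was high risk and fell.
reduced = consistent_paths(paths, {"High Risk", "Fall"})
for path in reduced:
    print(" -> ".join(path))
```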
