StagedTree

class cegpy.StagedTree(dataframe, sampling_zero_paths=None, var_order=None, struct_missing_label=None, missing_label=None, complete_case=False)

Bases: EventTree

Representation of a Staged Tree.

A staged tree is a tree in which each node is a situation and each edge is a transition from one situation to another. Each situation is assigned a ‘stage’, which groups it with other situations that have the same outgoing edges with equivalent probabilities of occurring.

The class is an extension of EventTree.

Parameters:
  • dataframe (pandas.DataFrame) – Required - DataFrame containing variables as column headers, with event name strings in each cell. These event names will be used to create the edges of the event tree. Counts of each event will be extracted and attached to each edge.

  • sampling_zero_paths (List[Tuple[str]] or None) –

    Optional - Paths to sampling zeros.

    Format is as follows: [('edge_1',), ('edge_1', 'edge_2'), …]

    If no paths are specified, default setting is that no sampling zero paths are created.

  • var_order (List[str] or None) – Optional - Specifies the ordering of variables to be adopted in the event tree. Default var_order is obtained from the order of columns in dataframe. String labels in the list should match the column names in dataframe.

  • struct_missing_label (str or None) – Optional - Label in the dataframe for observations which are structurally missing, e.g. post-operative health status is irrelevant for a dead patient. Label example: “struct”.

  • missing_label (str or None) – Optional - Label in the dataframe for observations whose values are missing but not structurally missing, e.g. missing height for some individuals in the sample. Label example: “miss”. Whatever label is provided will be renamed in the event tree to “missing”.

  • complete_case (bool) – Optional - If True, all entries (rows) with non-structural missing values are removed. Default setting: False.
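
As a rough illustration of how these parameters fit together, the sketch below prepares a small dataset and wraps the constructor call in a helper. The column names, event labels and the helper build_staged_tree are invented for this example; pandas and cegpy are imported inside the helper, so they are only needed when it is actually called.

```python
# Hypothetical data: one list of event labels per variable (column).
records = {
    "Treatment": ["drug", "drug", "drug", "placebo", "placebo", "placebo"],
    "Outcome": ["recovered", "recovered", "recovered", "recovered", "died", "died"],
}

# Variable order for the tree; here it simply follows the column order.
var_order = list(records)

# ("drug", "died") never occurs in the data above but is logically possible,
# so it is declared as a sampling zero. Each path is a tuple of edge labels.
sampling_zero_paths = [("drug", "died")]


def build_staged_tree(records, var_order=None, sampling_zero_paths=None):
    """Build a StagedTree from a dict of columns (illustrative helper)."""
    # Imported lazily so the data preparation above has no dependencies.
    import pandas as pd
    from cegpy import StagedTree

    dataframe = pd.DataFrame(records)
    return StagedTree(
        dataframe=dataframe,
        var_order=var_order,
        sampling_zero_paths=sampling_zero_paths,
    )
```

With cegpy installed, build_staged_tree(records, var_order, sampling_zero_paths) would be expected to return a tree whose edges carry the observed counts, with a zero count on the declared sampling-zero path.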

property prior: Dict[Tuple[str], List[Fraction]]

A mapping of priors keyed by edge. Keys are 3-tuples of the form: (src, dst, edge_label).

Returns:

A mapping edge -> priors.

Return type:

Dict[Tuple[str], List[Fraction]]

property prior_list: List[List[Fraction]]

Returns:

Priors in the form of a list of lists.

Return type:

List[List[Fraction]]

property posterior: Dict[Tuple[str], List[Fraction]]

Posteriors along each edge, calculated by adding edge count to the prior for each edge. Keys are 3-tuples of the form: (src, dst, edge_label).

Returns:

Mapping of edge -> edge_count + prior

Return type:

Dict[Tuple[str], List[Fraction]]
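
The "edge count plus prior" rule can be checked by hand for a single node. The snippet below is only a worked example of that arithmetic using the standard library; the prior values and counts are made up, and it does not reproduce cegpy's internal code.

```python
from fractions import Fraction

# Hypothetical prior and observed counts for the two edges leaving one node,
# written in the per-node list form used by prior_list and edge_countset.
prior = [Fraction(1, 2), Fraction(1, 2)]
edge_counts = [3, 1]

# Posterior for each edge = prior for that edge + observed count on that edge.
posterior = [p + n for p, n in zip(prior, edge_counts)]

print(posterior)
```

Here the two edges end up with posterior values 7/2 and 3/2.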

property posterior_list: List[List[Fraction]]

Returns:

Posteriors in the form of a list of lists.

Return type:

List[List[Fraction]]

property alpha: float

The equivalent sample size set for the root node which is then uniformly propagated through the tree.

Returns:

The value of Alpha.

Return type:

float
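
One way to picture "uniformly propagated" is that the equivalent sample size at a node is split evenly over its outgoing edges, and each share becomes the sample size of the child it leads to. The sketch below implements that reading on a hypothetical tree; the node names, the children mapping and the function propagate_alpha are assumptions for illustration, not part of cegpy.

```python
from fractions import Fraction

# Hypothetical tree: each node maps to the list of its children (leaves map to []).
children = {
    "s0": ["s1", "s2", "s3"],
    "s1": ["s4", "s5"],
    "s2": ["s6", "s7"],
    "s3": [],
    "s4": [],
    "s5": [],
    "s6": [],
    "s7": [],
}


def propagate_alpha(children, root, alpha):
    """Split alpha evenly over outgoing edges, passing each share on to the child."""
    node_mass = {root: Fraction(alpha)}
    edge_prior = {}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children[node]
        if not kids:
            continue  # leaves have no outgoing edges
        share = node_mass[node] / len(kids)
        for child in kids:
            edge_prior[(node, child)] = share
            node_mass[child] = share
            stack.append(child)
    return edge_prior


edge_prior = propagate_alpha(children, "s0", alpha=3)
```

With alpha = 3, each of the three root edges receives 1, and each edge below s1 and s2 receives 1/2, so the total mass reaching the leaves is still 3.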

property hyperstage: List[List[str]]

Indication of which nodes are allowed to be in the same stage. Each list is a list of node names e.g. “s0”.

Returns:

The List of all hyperstages.

Return type:

List[List[str]]
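
To make the list-of-lists shape concrete, the sketch below builds a hyperstage by grouping situations that share the same outgoing edge labels. That grouping rule is an assumption chosen for illustration; the text above only says that each inner list names nodes that are allowed to share a stage. The node names and labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical situations mapped to the labels of their outgoing edges.
outgoing_labels = {
    "s0": ["drug", "placebo"],
    "s1": ["recovered", "died"],
    "s2": ["recovered", "died"],
    "s3": ["relapse", "no relapse"],
}


def group_by_outgoing_labels(outgoing_labels):
    """Group node names whose outgoing edge labels are identical as a set."""
    groups = defaultdict(list)
    for node, labels in outgoing_labels.items():
        groups[tuple(sorted(labels))].append(node)
    return list(groups.values())


hyperstage = group_by_outgoing_labels(outgoing_labels)
print(hyperstage)
```

A list in this form could then be passed as the hyperstage argument of calculate_AHC_transitions, restricting merges to nodes within the same inner list.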

property edge_countset: List[List]

Edge counts emanating from each node of the tree, indexed the same as situations.

Returns:

Edge counts emanating from each node of the tree.

Return type:

List[List]

property ahc_output: Dict

A dictionary holding a list of lists of all the situations that were merged, together with the log likelihood.

Returns:

The output from the AHC algorithm.

Return type:

Dict

calculate_AHC_transitions(prior=None, alpha=None, hyperstage=None, colour_list=None) → Dict

Implementation of the Bayesian Agglomerative Hierarchical Clustering (AHC) algorithm. It returns a list of lists of the situations that have been merged together, and the log likelihood of the final model.

Parameters:
  • prior (Dict[Tuple[str], List[Fraction]]) – Optional - A mapping of priors keyed by edge. Keys are 3-tuples of the form: (src, dst, edge_label).

  • alpha (float) – Optional - The equivalent sample size set for the root node which is then uniformly propagated through the tree.

  • hyperstage (List[List[str]]) – Optional - Indication of which nodes are allowed to be in the same stage. Each list is a list of node names e.g. “s0”.

  • colour_list (List[str]) – Optional - a list of hex colours to be used for stages. Otherwise, colours evenly spaced around the colour spectrum are used.

Returns:

The output from the AHC algorithm, specified above.

Return type:

Dict
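
The following sketch shows one way a caller might prepare a colour_list and run the algorithm on an existing tree. The helpers evenly_spaced_hex_colours and run_ahc are hypothetical names introduced here, the saturation and brightness values are arbitrary, and the list presumably needs at least as many colours as there are stages to colour.

```python
import colorsys


def evenly_spaced_hex_colours(n):
    """Return n hex colour strings with hues evenly spaced around the colour wheel."""
    colours = []
    for i in range(n):
        red, green, blue = colorsys.hsv_to_rgb(i / n, 0.6, 0.95)
        colours.append(
            "#{:02x}{:02x}{:02x}".format(
                round(red * 255), round(green * 255), round(blue * 255)
            )
        )
    return colours


def run_ahc(staged_tree, alpha=None, n_colours=8):
    """Run AHC on an existing StagedTree with a custom colour list (illustrative)."""
    colour_list = evenly_spaced_hex_colours(n_colours)
    # Only documented keyword arguments are passed; prior and hyperstage
    # are left at their defaults.
    return staged_tree.calculate_AHC_transitions(alpha=alpha, colour_list=colour_list)
```

Calling run_ahc(tree) on a constructed StagedTree would return the output dictionary described above, which is also available afterwards through the ahc_output property.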

dot_graph(edge_info: str = 'count', staged: bool = True)

Returns a Dot graph representation of the staged tree.

Parameters:
  • edge_info (str) – Optional - Chooses which summary measure to be displayed on edges. Defaults to “count”. Options: [“count”, “prior”, “posterior”, “probability”]

  • staged (bool) – Optional - If True, returns the coloured staged tree; if False, returns the underlying event tree.

Returns:

A graphviz Dot representation of the graph.

Return type:

pydotplus.Dot

create_figure(filename: Optional[str] = None, edge_info: str = 'count', staged: bool = True) → Optional[Image]

Draws the coloured staged tree or the underlying event tree for the process described by the dataset.

Parameters:
  • filename (str) –

    Optional - When provided, the figure is saved to this filename, relative to the current working directory. e.g. if filename = “output/staged_tree.svg”, the file will be saved to: cwd/output/staged_tree.svg. If the function is called inside an interactive notebook, the image is displayed in the notebook, even if filename is omitted.

    Supports any filetype that graphviz supports. e.g: “staged_tree.png” or “staged_tree.svg” etc.

  • edge_info (str) – Optional - Chooses which summary measure is displayed on edges. Defaults to “count”. In event trees, only “count” can be displayed, so this can be omitted.

  • staged (bool) – Optional - If True, draws the coloured staged tree; if False, draws the underlying event tree.

Returns:

The event tree Image object.

Return type:

IPython.display.Image or None
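
A possible usage pattern is sketched below: it saves both the coloured staged tree and the underlying event tree for the same object. The helper names, the file-name scheme and the up-front check of edge_info are assumptions made for this example, and it assumes calculate_AHC_transitions has already been run so that stages exist.

```python
# Options documented for the edge_info parameter of dot_graph.
EDGE_INFO_OPTIONS = ("count", "prior", "posterior", "probability")


def check_edge_info(edge_info):
    """Return edge_info unchanged if it is a documented option, else raise."""
    if edge_info not in EDGE_INFO_OPTIONS:
        raise ValueError(
            f"edge_info must be one of {EDGE_INFO_OPTIONS}, got {edge_info!r}"
        )
    return edge_info


def save_staged_and_event_tree(staged_tree, stem, edge_info="posterior"):
    """Save the coloured staged tree and the underlying event tree as SVG files."""
    edge_info = check_edge_info(edge_info)
    # Coloured staged tree with the chosen summary measure on the edges.
    staged_tree.create_figure(f"{stem}_staged.svg", edge_info=edge_info, staged=True)
    # Underlying event tree, left at the default edge_info of "count".
    staged_tree.create_figure(f"{stem}_event.svg", staged=False)
```

For example, save_staged_and_event_tree(tree, "output/medical") would be expected to write output/medical_staged.svg and output/medical_event.svg relative to the current working directory.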