EventTree

class cegpy.EventTree(dataframe: DataFrame, sampling_zero_paths=None, var_order=None, struct_missing_label=None, missing_label=None, complete_case=False)

Bases: MultiDiGraph

This class extends the NetworkX MultiDiGraph class to allow the creation of event tree representations of data.

Parameters:
  • dataframe (pandas.DataFrame) – Required - DataFrame containing variables as column headers, with event name strings in each cell. These event names will be used to create the edges of the event tree. Counts of each event will be extracted and attached to each edge.

  • sampling_zero_paths (List[Tuple[str]] or None) –

    Optional - Paths to sampling zeros.

    Format is as follows: [('edge_1',), ('edge_1', 'edge_2'), …]

    If no paths are specified, no sampling zero paths are created.

  • var_order (List[str] or None) – Optional - Specifies the ordering of variables to be adopted in the event tree. Default var_order is obtained from the order of columns in dataframe. String labels in the list should match the column names in dataframe.

  • struct_missing_label (str or None) – Optional - Label in the dataframe for observations which are structurally missing, e.g. post-operative health status is irrelevant for a dead patient. Label example: “struct”.

  • missing_label (str or None) – Optional - Label in the dataframe for observations with missing values that are not structurally missing, e.g. missing height for some individuals in the sample. Label example: “miss”. Whatever label is provided will be renamed to “missing” in the event tree.

  • complete_case (bool) – Optional - If True, all entries (rows) with non-structural missing values are removed. Default setting: False.
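
The constructor parameters above could be combined as in the following minimal sketch. The helper `build_event_tree`, the column names, and the data rows are hypothetical and invented for illustration; only the `EventTree` signature comes from this page. The imports are deferred into the function so the snippet can be loaded even where pandas and cegpy are not installed.

```python
def build_event_tree(rows, columns):
    """Build an EventTree from row tuples (hypothetical helper, not part of cegpy)."""
    # Deferred imports: pandas and cegpy are only needed when the helper is called.
    import pandas as pd
    from cegpy import EventTree

    dataframe = pd.DataFrame(rows, columns=columns)
    return EventTree(
        dataframe,
        var_order=list(columns),        # here identical to the default column order
        struct_missing_label="struct",  # label used below for structural missingness
        missing_label="miss",           # label used below for ordinary missing values
        complete_case=False,            # keep rows that contain "miss"
    )


# Made-up illustrative data: one row per observed individual.
COLUMNS = ["Operation", "Survival", "PostOpHealth"]
ROWS = [
    ("surgery", "alive", "good"),
    ("surgery", "alive", "poor"),
    ("surgery", "dead", "struct"),  # health status is irrelevant for a dead patient
    ("none", "alive", "miss"),      # value was not recorded
]
```

Assuming both packages are available, `tree = build_event_tree(ROWS, COLUMNS)` would return an object exposing the properties and methods documented on this page.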

property root: str
Returns:

The name of the root node of the event tree, currently hard-coded to ‘s0’.

Return type:

str

property variables: List
Returns:

The column headers of the dataset.

Return type:

List[str]

property sampling_zeros: Optional[List[Tuple[str]]]

Setting this property applies sampling zero paths to the tree. If the new value differs from the previous one, the event tree is regenerated.

Returns:

Sampling zero paths provided by the user.

Return type:

List[Tuple[str]] or None
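
Because each path must be a tuple, a single-edge path needs a trailing comma: `("edge_1")` is just a string. The sketch below checks a value against the documented format before it is assigned. `check_sampling_zero_paths` is a hypothetical helper rather than cegpy's own validation, and rejecting empty tuples is an assumption.

```python
from typing import List, Optional, Tuple


def check_sampling_zero_paths(paths: Optional[List[Tuple[str, ...]]]) -> bool:
    """Return True if `paths` is None or a list of non-empty tuples of strings."""
    if paths is None:
        return True
    if not isinstance(paths, list):
        return False
    return all(
        isinstance(path, tuple)
        and len(path) > 0  # assumption: an empty path is not meaningful
        and all(isinstance(label, str) for label in path)
        for path in paths
    )
```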

property situations: List[str]
Returns:

The situations of the tree (non-leaf nodes).

Return type:

List[str]

property leaves: List[str]
Returns:

The leaves of the tree.

Return type:

List[str]
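
The distinction between situations and leaves can be sketched with plain edge tuples: a node is a situation if at least one edge leaves it, and a leaf otherwise. This is an illustrative sketch using only the standard library, not cegpy's implementation; `split_nodes` and the node names in the test data are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source_node, destination_node, edge_label)


def split_nodes(edges: List[Edge]) -> Tuple[List[str], List[str]]:
    """Split the nodes of a tree into (situations, leaves), in first-seen order."""
    nodes: List[str] = []
    seen = set()
    sources = set()
    for src, dst, _label in edges:
        sources.add(src)  # any node with an outgoing edge is a situation
        for node in (src, dst):
            if node not in seen:
                seen.add(node)
                nodes.append(node)
    situations = [node for node in nodes if node in sources]
    leaves = [node for node in nodes if node not in sources]
    return situations, leaves
```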

property edge_counts: Dict

The counts along all edges in the tree, where each edge is a tuple of the form (“source_node”, “destination_node”, “edge_label”).

Returns:

A mapping of edges to their counts.

Return type:

Dict[Tuple[str, str, str], int]
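
To make the shape of this mapping concrete, the sketch below derives edge counts from rows of event labels using only the standard library. It is not cegpy's implementation: `count_edges` is a hypothetical helper, nodes are numbered in order of first appearance (cegpy's numbering may differ), and missing labels and sampling zeros are ignored.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def count_edges(rows: Sequence[Sequence[str]]) -> Dict[Tuple[str, str, str], int]:
    """Count how many rows pass along each (source, destination, label) edge."""
    # Each distinct prefix of a row identifies one node; the empty prefix is the root.
    node_of_path: Dict[Tuple[str, ...], str] = {(): "s0"}
    counts: Counter = Counter()
    for row in rows:
        for depth in range(1, len(row) + 1):
            path = tuple(row[:depth])
            if path not in node_of_path:
                node_of_path[path] = f"s{len(node_of_path)}"
            edge = (node_of_path[path[:-1]], node_of_path[path], path[-1])
            counts[edge] += 1
    return dict(counts)
```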

property categories_per_variable: Dict

The number of unique categories/levels for each variable (column headings in dataframe).

Returns:

A mapping of variables to the number of unique categories/levels.

Return type:

Dict[str, int]
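
As a rough illustration of this mapping, the following standard-library sketch counts the distinct labels found in each column. The function is a hypothetical stand-in and counts every label it sees; how cegpy treats missing or structurally missing labels in this count is not stated here.

```python
from typing import Dict, Sequence


def count_categories(columns: Sequence[str], rows: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each column name to the number of distinct labels observed in it."""
    levels = {name: set() for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            levels[name].add(value)
    return {name: len(values) for name, values in levels.items()}
```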

dot_graph(edge_info: str = 'count') → Dot

Returns a Dot graph representation of the event tree.

Parameters:
  • edge_info (str) – Optional - Chooses which summary measure is displayed on edges. In event trees, only “count” can be displayed, so this can be omitted.

Returns:

A graphviz Dot representation of the graph.

Return type:

pydotplus.Dot
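
For readers unfamiliar with the DOT format, the sketch below renders an edge-count mapping as DOT source text, with each edge labelled by its event name and count. It only illustrates the idea: cegpy returns a `pydotplus.Dot` object instead of a string, and the helper name, label layout, and `rankdir` setting are assumptions.

```python
from typing import Dict, Tuple


def edges_to_dot(edge_counts: Dict[Tuple[str, str, str], int]) -> str:
    """Render (source, destination, label) -> count as DOT source text."""
    lines = ["digraph event_tree {", "    rankdir=LR;"]
    for (src, dst, label), count in edge_counts.items():
        # "\\n" emits a literal backslash-n, which Graphviz reads as a line break.
        lines.append(f'    "{src}" -> "{dst}" [label="{label}\\n{count}"];')
    lines.append("}")
    return "\n".join(lines)
```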

create_figure(filename=None, edge_info: str = 'count') → Optional[Image]

Creates a figure of the event tree built from the dataframe.

Parameters:
  • filename (str) – Optional - When provided, the figure is saved to this path, relative to the current working directory; e.g. if filename = “output/event_tree.svg”, the file is saved to cwd/output/event_tree.svg. If the function is called inside an interactive notebook, the image is displayed in the notebook, even if filename is omitted. Supports any file type that graphviz supports, e.g. “event_tree.png” or “event_tree.svg”.

  • edge_info (str) – Optional - Chooses which summary measure is displayed on edges. In event trees, only “count” can be displayed, so this can be omitted.

Returns:

The event tree Image object.

Return type:

IPython.display.Image or None
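
The filename handling described above can be previewed with the standard library. The sketch shows where a relative filename would resolve and which file type its extension implies. `resolve_output_path` is a hypothetical helper, and whether cegpy creates missing directories such as `output/` is not stated here.

```python
from pathlib import Path
from typing import Tuple


def resolve_output_path(filename: str) -> Tuple[Path, str]:
    """Return the path relative to the current working directory and its file type."""
    path = Path.cwd() / filename
    file_type = path.suffix.lstrip(".")  # e.g. "svg" or "png"
    return path, file_type
```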