hbutils.algorithm.topological

Topological sorting utilities for integer identifiers and arbitrary objects.

This module implements Kahn’s algorithm for topological sorting and provides two public helper functions:

  • topoids() - Topological sorting for graphs with integer node identifiers.

  • topo() - Topological sorting for graphs composed of arbitrary Python objects.

The implementation supports deterministic output ordering when sort=True by using a heap-based queue; otherwise a FIFO queue is used and the order may vary due to set iteration in adjacency storage.

Note

Duplicate edges are removed before processing. Nodes or items are considered unique by their integer ID (for topoids()) or by their identifier function (for topo()).

Example:

>>> # Integer-based nodes
>>> topoids(3, [(0, 1), (2, 1)])
[0, 2, 1]

>>> # Object-based nodes with explicit identifiers
>>> items = ["a", "b", "c"]
>>> edges = [("a", "c"), ("b", "c")]
>>> topo(items, edges, identifier=str, sort=True)
['a', 'b', 'c']

__all__

hbutils.algorithm.topological.__all__ = ['topoids', 'topo']

Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.

topoids

hbutils.algorithm.topological.topoids(n: int, edges: Collection[Tuple[int, int]], sort: bool = False) List[int][source]

Perform a topological sort on integer-identified nodes.

This function performs a topological sort on a directed acyclic graph (DAG) represented by integer node IDs and edges. It uses Kahn’s algorithm for topological sorting. When sort=True, a heap is used to always pick the smallest available node ID, yielding deterministic ordering; otherwise a FIFO queue is used and order may depend on adjacency iteration.

Parameters:
  • n (int) – Count of nodes in the graph (nodes are numbered from 0 to n - 1).

  • edges (Collection[Tuple[int, int]]) – Collection of directed edges, where each tuple (x, y) means node x must appear before node y in the final sequence.

  • sort (bool) – Whether to enforce sorted output order by smallest available ID. When True, the time complexity increases by an extra \(O\left(\log{N}\right)\) due to heap maintenance. Defaults to False.

Returns:

Sorted sequence of node IDs in topological order.

Return type:

List[int]

Raises:
  • ArithmeticError – If the graph contains a cycle or not all nodes are reachable.

  • AssertionError – If edge endpoints are not valid node IDs (not in range [0, n)).

Example:

>>> topoids(3, [])
[0, 1, 2]
>>> topoids(3, [(0, 1), (2, 1)])
[0, 2, 1]
>>> topoids(4, [(0, 2), (0, 1), (2, 3), (1, 3)])
[0, 1, 2, 3]
>>> topoids(4, [(0, 2), (0, 1), (2, 3), (1, 3)], sort=True)
[0, 1, 2, 3]

Warning

The output order is not deterministic when sort=False due to unordered set iteration in adjacency lists.

topo

hbutils.algorithm.topological.topo(items: Iterable[_ElementType], edges: Collection[Tuple[_ElementType, _ElementType]], identifier: Callable[[_ElementType], _IdType] | None = None, sort: bool = False) List[_ElementType][source]

Perform a topological sort on arbitrary objects.

This function performs topological sorting on arbitrary objects by mapping each item to an integer ID and delegating sorting to topoids(). Items with the same identifier are treated as a single node; only the first occurrence is kept in the output mapping.

Parameters:
  • items (Iterable[_ElementType]) – Iterable of items to be sorted.

  • edges (Collection[Tuple[_ElementType, _ElementType]]) – Collection of directed edges, where each tuple (x, y) means item x must appear before item y in the final sequence.

  • identifier (Callable[[_ElementType], _IdType], optional) – Identifier function for the items. It must return a hashable value that uniquely identifies each item. Defaults to None, which uses Python’s built-in id() function (object identity).

  • sort (bool) – Whether to enforce sorted output order by the integer IDs that are assigned based on first appearance. Defaults to False.

Returns:

Sorted sequence of unique items in topological order.

Return type:

List[_ElementType]

Raises:
  • ArithmeticError – If the graph contains a cycle or not all nodes are reachable.

  • KeyError – If an edge references an item whose identifier is not present in the items iterable.

Example:

>>> items = ["a", "b", "c"]
>>> edges = [("a", "c"), ("b", "c")]
>>> topo(items, edges, identifier=str, sort=True)
['a', 'b', 'c']
>>> topo(items, [("a", "b"), ("b", "a")], identifier=str)
Traceback (most recent call last):
    ...
ArithmeticError: ('Invalid topological graph, for some items not accessible - (\'a\', \'b\').', ('a', 'b'))

Note

Duplicate items are removed based on their identifiers; only the first occurrence is retained in the output.