hbutils.encoding.int_hash_val

Hash function validation utilities for integer-based hashing.

This module provides a comprehensive validation suite for integer hash functions that accept str, bytes, or bytearray inputs and return integer hash values. The validation suite focuses on common properties expected from robust hash functions, including determinism, type consistency, avalanche effect, uniform distribution, collision resistance, empty input handling, and performance characteristics.

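The module contains the following main components:

  • Per-property validators – int_hash_val_determinism, int_hash_val_type_consistency, int_hash_val_avalanche_effect, int_hash_val_uniform_distribution, int_hash_val_collision_resistance, int_hash_val_empty_input, and int_hash_val_performance

  • int_hash_val_comprehensive – Runs the full suite of seven property checks and returns a combined report

  • Result classes – One *ValidationResult class per validator, plus ComprehensiveValidationResult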

Note

The validation suite assumes a 32-bit output when computing the avalanche effect score. For hash functions with different bit widths, interpret the avalanche metrics accordingly.
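
As a rough illustration of what "interpret accordingly" can mean, the sketch below recomputes the avalanche percentage against the real output width of a hash. The helper name `avalanche_percentage` is hypothetical, and the sketch assumes that `avg_bit_changes` counts flipped bits across the full output, which this page does not confirm.

```python
def avalanche_percentage(avg_bit_changes: float, output_bits: int) -> float:
    """Share of output bits flipped, relative to the hash's real width.

    Hypothetical helper, not part of hbutils.
    """
    if output_bits <= 0:
        raise ValueError("output_bits must be positive")
    return avg_bit_changes / output_bits * 100.0


# 16 flipped bits on average look ideal for a 32-bit hash ...
print(avalanche_percentage(16.0, 32))  # 50.0
# ... but cover only a quarter of the output of a 64-bit hash.
print(avalanche_percentage(16.0, 64))  # 25.0
```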

Example:

>>> from hbutils.encoding import int_hash_val_comprehensive
>>>
>>> print(int_hash_val_comprehensive('xs'))  # validate existing hash functions
╔══════════════════════════════════════════════════════════════════════════════════════════════╗
║                          COMPREHENSIVE HASH FUNCTION VALIDATION REPORT                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ Function Name:     xs               │ Overall Status:    PASS                                ║
║ Properties Tested: 7                │ Properties Passed: 7    (100.0%)                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                    PROPERTY STATUS                                           ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Determinism                                   │ PASS                                       ║
║ ✓ Type Consistency                              │ PASS                                       ║
║ ✓ Avalanche Effect                              │ PASS       | Avalanche Effect:       42.9% ║
║ ✓ Uniform Distribution                          │ PASS       | Uniformity Score:       0.996 ║
║ ✓ Collision Resistance                          │ PASS       | Collision Rate:        0.0000 ║
║ ✓ Empty Input                                   │ PASS                                       ║
║ ✓ Performance                                   │ PASS       | Avg Throughput:      4.6 MB/s ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                   RECOMMENDATIONS                                            ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Hash function meets all validation criteria - suitable for production use                  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════╝
DETAILED ANALYSIS:
• All validation tests passed successfully
• Hash function demonstrates good cryptographic properties
• Suitable for general-purpose hashing applications

__all__

hbutils.encoding.int_hash_val.__all__ = ['int_hash_val_determinism', 'int_hash_val_type_consistency', 'int_hash_val_avalanche_effect', 'int_hash_val_uniform_distribution', 'int_hash_val_collision_resistance', 'int_hash_val_empty_input', 'int_hash_val_performance', 'int_hash_val_comprehensive', 'DeterminismValidationResult', 'TypeConsistencyValidationResult', 'AvalancheEffectValidationResult', 'UniformDistributionValidationResult', 'CollisionResistanceValidationResult', 'EmptyInputValidationResult', 'PerformanceValidationResult', 'ComprehensiveValidationResult']

DeterminismValidationResult

class hbutils.encoding.int_hash_val.DeterminismValidationResult(passed: bool, failed_cases: List[str], total_tested: int, failed_count: int)

Results from determinism validation test.

Parameters:
  • passed (bool) – Whether the determinism test passed

  • failed_cases (List[str]) – List of test cases that failed determinism check

  • total_tested (int) – Total number of test cases evaluated

  • failed_count (int) – Number of test cases that failed

TypeConsistencyValidationResult

class hbutils.encoding.int_hash_val.TypeConsistencyValidationResult(passed: bool, failed_cases: List[str], total_tested: int, failed_count: int, consistent_hashes: Dict[str, int])

Results from type consistency validation test.

Parameters:
  • passed (bool) – Whether the type consistency test passed

  • failed_cases (List[str]) – List of test cases that failed type consistency check

  • total_tested (int) – Total number of test cases evaluated

  • failed_count (int) – Number of test cases that failed

  • consistent_hashes (Dict[str, int]) – Dictionary mapping test strings to their consistent hash values

AvalancheEffectValidationResult

class hbutils.encoding.int_hash_val.AvalancheEffectValidationResult(passed: bool, avg_bit_changes: float, change_percentage: float, total_comparisons: int, bit_changes_list: List[int], min_changes: int, max_changes: int)

Results from avalanche effect validation test.

Parameters:
  • passed (bool) – Whether the avalanche effect test passed

  • avg_bit_changes (float) – Average number of bits changed across all comparisons

  • change_percentage (float) – Percentage of bits changed (avg_bit_changes / total_bits * 100)

  • total_comparisons (int) – Total number of hash comparisons performed

  • bit_changes_list (List[int]) – List of bit changes for each comparison

  • min_changes (int) – Minimum number of bits changed in any comparison

  • max_changes (int) – Maximum number of bits changed in any comparison

UniformDistributionValidationResult

class hbutils.encoding.int_hash_val.UniformDistributionValidationResult(passed: bool, uniformity_score: float, bucket_stats: Dict[str, Any], sample_count: int, buckets: List[int])

Results from uniform distribution validation test.

Parameters:
  • passed (bool) – Whether the uniform distribution test passed

  • uniformity_score (float) – Score indicating distribution uniformity (0-1, higher is better)

  • bucket_stats (Dict[str, Any]) – Statistics about bucket distribution

  • sample_count (int) – Number of samples used in the test

  • buckets (List[int]) – List of counts for each bucket

CollisionResistanceValidationResult

class hbutils.encoding.int_hash_val.CollisionResistanceValidationResult(passed: bool, collision_count: int, collision_rate: float, sample_size: int, unique_hashes: int, collision_pairs: List[Tuple[str, int]])

Results from collision resistance validation test.

Parameters:
  • passed (bool) – Whether the collision resistance test passed

  • collision_count (int) – Number of collisions detected

  • collision_rate (float) – Rate of collisions (collision_count / sample_size)

  • sample_size (int) – Total number of samples tested

  • unique_hashes (int) – Number of unique hash values generated

  • collision_pairs (List[Tuple[str, int]]) – List of (input, hash) tuples that collided

EmptyInputValidationResult

class hbutils.encoding.int_hash_val.EmptyInputValidationResult(passed: bool, hash_results: List[int], consistent_empty_hash: bool, error_cases: List[Tuple[str, str]], empty_hash_value: int | None)

Results from empty input validation test.

Parameters:
  • passed (bool) – Whether the empty input test passed

  • hash_results (List[int]) – List of hash values for empty inputs

  • consistent_empty_hash (bool) – Whether all empty inputs produced the same hash

  • error_cases (List[Tuple[str, str]]) – List of (input_type, error_message) tuples for failed cases

  • empty_hash_value (Union[int, None]) – The consistent hash value for empty inputs, or None if inconsistent

PerformanceValidationResult

class hbutils.encoding.int_hash_val.PerformanceValidationResult(passed: bool, performance_data: Dict[int, Dict[str, float]], tested_sizes: List[int], completed_sizes: List[int])

Results from performance validation test.

Parameters:
  • passed (bool) – Whether the performance test passed (completed without errors)

  • performance_data (Dict[int, Dict[str, float]]) – Dictionary mapping data sizes to performance metrics

  • tested_sizes (List[int]) – List of data sizes that were tested

  • completed_sizes (List[int]) – List of data sizes that completed successfully

ComprehensiveValidationResult

class hbutils.encoding.int_hash_val.ComprehensiveValidationResult(passed: bool, not_passed_properties: List[str], hash_function_name: str, total_properties_tested: int, properties_passed: int, determinism: DeterminismValidationResult, type_consistency: TypeConsistencyValidationResult, avalanche_effect: AvalancheEffectValidationResult, uniform_distribution: UniformDistributionValidationResult, collision_resistance: CollisionResistanceValidationResult, empty_input: EmptyInputValidationResult, performance: PerformanceValidationResult)

Results from comprehensive validation test.

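Parameters:
  • passed (bool) – Whether all validation tests passed

  • not_passed_properties (List[str]) – Names of the properties that did not pass

  • hash_function_name (str) – Name of the hash function that was validated

  • total_properties_tested (int) – Number of properties tested

  • properties_passed (int) – Number of properties that passed

  • determinism (DeterminismValidationResult) – Results from determinism validation test

  • type_consistency (TypeConsistencyValidationResult) – Results from type consistency validation test

  • avalanche_effect (AvalancheEffectValidationResult) – Results from avalanche effect validation test

  • uniform_distribution (UniformDistributionValidationResult) – Results from uniform distribution validation test

  • collision_resistance (CollisionResistanceValidationResult) – Results from collision resistance validation test

  • empty_input (EmptyInputValidationResult) – Results from empty input validation test

  • performance (PerformanceValidationResult) – Results from performance validation test

__str__() → str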

Generate a formatted string representation of the validation results.

Returns:

Formatted validation report

Return type:

str

int_hash_val_determinism

hbutils.encoding.int_hash_val.int_hash_val_determinism(hash_func: str | Callable[[str | bytes | bytearray], int], test_data: List[str | bytes | bytearray]) → DeterminismValidationResult

Validate determinism: same input produces same output.

Tests whether the hash function consistently produces the same output for identical inputs across multiple invocations.

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • test_data (List[Union[str, bytes, bytearray]]) – List of test inputs to validate determinism

Returns:

Determinism validation results

Return type:

DeterminismValidationResult

Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_determinism
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_determinism(simple_hash, ["test", b"data"])
>>> result.passed
True
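
The idea behind the determinism check can be sketched in a few lines: hash each input several times and report any input whose result varies. This is an illustrative approximation, not the library's implementation; `crc32_hash`, `find_nondeterministic`, and the `repeats` parameter are hypothetical names.

```python
import zlib
from typing import Callable, List, Union

HashInput = Union[str, bytes, bytearray]


def crc32_hash(data: HashInput) -> int:
    """Stand-in 32-bit hash for illustration (CRC-32 from the standard library)."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return zlib.crc32(bytes(data)) & 0xFFFFFFFF


def find_nondeterministic(hash_func: Callable[[HashInput], int],
                          test_data: List[HashInput],
                          repeats: int = 5) -> List[str]:
    """Return repr() of every input whose hash varied across repeated calls."""
    failed = []
    for item in test_data:
        first = hash_func(item)
        # Re-hash the same input and compare against the first result.
        if any(hash_func(item) != first for _ in range(repeats - 1)):
            failed.append(repr(item))
    return failed


failed_cases = find_nondeterministic(crc32_hash, ["test", b"data", bytearray(b"x")])
print(failed_cases)  # []
```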

int_hash_val_type_consistency

hbutils.encoding.int_hash_val.int_hash_val_type_consistency(hash_func: str | Callable[[str | bytes | bytearray], int]) → TypeConsistencyValidationResult

Validate type consistency: same content in different types should produce same hash.

Tests whether the hash function produces identical hash values for the same content when provided as string, bytes, or bytearray types.

Parameters:

hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

Returns:

Type consistency validation results

Return type:

TypeConsistencyValidationResult

Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_type_consistency
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_type_consistency(simple_hash)
>>> result.passed
True
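
A minimal sketch of a type consistency check follows, assuming that str inputs are compared against their UTF-8 encoding (the encoding used by the suite is not stated on this page). The names `crc32_hash` and `check_type_consistency` are hypothetical.

```python
import zlib
from typing import Callable, Dict, List, Union

HashInput = Union[str, bytes, bytearray]


def crc32_hash(data: HashInput) -> int:
    """Stand-in 32-bit hash for illustration (CRC-32 from the standard library)."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return zlib.crc32(bytes(data)) & 0xFFFFFFFF


def check_type_consistency(hash_func: Callable[[HashInput], int],
                           samples: List[str]) -> Dict[str, bool]:
    """Map each sample to True when str, bytes, and bytearray hash identically."""
    results = {}
    for text in samples:
        raw = text.encode("utf-8")  # assumption: UTF-8 is the reference encoding
        hashes = {hash_func(text), hash_func(raw), hash_func(bytearray(raw))}
        results[text] = len(hashes) == 1
    return results


report = check_type_consistency(crc32_hash, ["", "hello", "héllo"])
print(all(report.values()))  # True
```

A hash function that branches on the input type without normalizing to bytes first would show up here as a `False` entry.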

int_hash_val_avalanche_effect

hbutils.encoding.int_hash_val.int_hash_val_avalanche_effect(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 100) → AvalancheEffectValidationResult

Validate avalanche effect: small input changes cause significant output changes.

Tests the avalanche effect property where a small change in input (single bit/character) should result in approximately 50% of the output bits changing. This is a key property of good hash functions.

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • sample_size (int) – Number of random samples to test, defaults to 100

Returns:

Avalanche effect validation results

Return type:

AvalancheEffectValidationResult

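Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_avalanche_effect
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF  # hash() rejects bytearray
>>> result = int_hash_val_avalanche_effect(simple_hash, sample_size=50)
>>> result.change_percentage > 40.0
True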
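
To make the metric concrete, the sketch below counts flipped output bits for pairs of near-identical inputs and derives summary values with the same names as the result fields. It is an approximation under stated assumptions: the input pairs are hand-picked rather than randomly generated, and `crc32_hash`, `flipped_bits`, and `avalanche_stats` are hypothetical names.

```python
import zlib
from typing import Callable, Dict, List, Tuple


def crc32_hash(data: str) -> int:
    """Stand-in 32-bit hash for illustration (CRC-32 from the standard library)."""
    return zlib.crc32(data.encode("utf-8")) & 0xFFFFFFFF


def flipped_bits(a: int, b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")


def avalanche_stats(hash_func: Callable[[str], int],
                    pairs: List[Tuple[str, str]],
                    output_bits: int = 32) -> Dict[str, float]:
    """Summarize bit changes between the hashes of each (original, modified) pair."""
    changes = [flipped_bits(hash_func(x), hash_func(y)) for x, y in pairs]
    avg = sum(changes) / len(changes)
    return {
        "avg_bit_changes": avg,
        "change_percentage": avg / output_bits * 100.0,
        "min_changes": min(changes),
        "max_changes": max(changes),
    }


# Each pair differs in a single character.
pairs = [("hello", "hellp"), ("abc", "abd"), ("test1", "test2")]
stats = avalanche_stats(crc32_hash, pairs)
print(stats)
```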

int_hash_val_uniform_distribution

hbutils.encoding.int_hash_val.int_hash_val_uniform_distribution(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 10000) → UniformDistributionValidationResult

Validate uniform distribution of hash outputs.

Tests whether the hash function produces uniformly distributed output values across the hash space. Divides the hash space into buckets and checks if hash values are evenly distributed.

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • sample_size (int) – Number of random samples to generate and hash, defaults to 10000

Returns:

Uniform distribution validation results

Return type:

UniformDistributionValidationResult

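Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_uniform_distribution
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF  # hash() rejects bytearray
>>> result = int_hash_val_uniform_distribution(simple_hash, sample_size=1000)
>>> result.uniformity_score > 0.95
True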
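
The bucket approach described above can be sketched as follows. The bucket count and the scoring formula (one minus the coefficient of variation, clamped to the range 0 to 1) are assumptions chosen for illustration; this page does not state how the library computes `uniformity_score`. All function names are hypothetical.

```python
import zlib
from typing import Callable, List


def crc32_hash(data: str) -> int:
    """Stand-in 32-bit hash for illustration (CRC-32 from the standard library)."""
    return zlib.crc32(data.encode("utf-8")) & 0xFFFFFFFF


def bucket_counts(hash_func: Callable[[str], int], inputs: List[str],
                  num_buckets: int = 16, output_bits: int = 32) -> List[int]:
    """Split the hash space into equal ranges and count the hashes in each."""
    counts = [0] * num_buckets
    space = 1 << output_bits
    for item in inputs:
        counts[hash_func(item) * num_buckets // space] += 1
    return counts


def uniformity_score(counts: List[int]) -> float:
    """Illustrative score: 1 - (std / mean), clamped to [0, 1]."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return max(0.0, 1.0 - variance ** 0.5 / mean)


counts = bucket_counts(crc32_hash, [f"key-{i}" for i in range(10000)])
print(sum(counts))  # 10000
print(uniformity_score(counts))
```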

int_hash_val_collision_resistance

hbutils.encoding.int_hash_val.int_hash_val_collision_resistance(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 100000) → CollisionResistanceValidationResult

Validate collision resistance.

Tests the hash function’s resistance to collisions by generating many random inputs and checking for duplicate hash values. A good hash function should have a very low collision rate.

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • sample_size (int) – Number of random samples to test, defaults to 100000

Returns:

Collision resistance validation results

Return type:

CollisionResistanceValidationResult

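Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_collision_resistance
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF  # hash() rejects bytearray
>>> result = int_hash_val_collision_resistance(simple_hash, sample_size=10000)
>>> result.collision_rate < 0.001
True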
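
A collision count over distinct inputs can be sketched as below. Counting an input as a collision when its hash value has already been seen is an assumption about how `collision_count` is defined; `weak_hash` is a deliberately poor, hypothetical hash used only to make collisions visible.

```python
from typing import Callable, Dict, List


def weak_hash(data: str) -> int:
    """Deliberately poor hash for illustration: only four possible values."""
    return len(data) % 4


def collision_summary(hash_func: Callable[[str], int], inputs: List[str]) -> Dict[str, object]:
    """Count inputs whose hash value was already produced by an earlier input."""
    seen = set()
    collision_pairs = []
    for item in inputs:
        value = hash_func(item)
        if value in seen:
            collision_pairs.append((item, value))
        else:
            seen.add(value)
    sample_size = len(inputs)
    return {
        "collision_count": len(collision_pairs),
        "collision_rate": len(collision_pairs) / sample_size if sample_size else 0.0,
        "sample_size": sample_size,
        "unique_hashes": len(seen),
        "collision_pairs": collision_pairs,
    }


summary = collision_summary(weak_hash, ["a", "bb", "ccc", "dddd", "eeeee"])
print(summary["collision_count"])  # 1
print(summary["collision_rate"])   # 0.2
print(summary["collision_pairs"])  # [('eeeee', 1)]
```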

int_hash_val_empty_input

hbutils.encoding.int_hash_val.int_hash_val_empty_input(hash_func: str | Callable[[str | bytes | bytearray], int]) → EmptyInputValidationResult

Validate empty input handling.

Tests whether the hash function correctly handles empty inputs of different types (empty string, empty bytes, empty bytearray) and produces consistent results.

Parameters:

hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

Returns:

Empty input validation results

Return type:

EmptyInputValidationResult

Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_empty_input
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_empty_input(simple_hash)
>>> result.consistent_empty_hash
True
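
The sketch below shows one way such a check could be structured, and why the example above converts its input to bytes: Python's built-in `hash()` raises `TypeError` for `bytearray`. The function names are hypothetical, and the returned tuple only loosely mirrors the result fields.

```python
import zlib
from typing import Callable, List, Tuple


def crc32_hash(data) -> int:
    """Stand-in 32-bit hash for illustration (CRC-32 from the standard library)."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return zlib.crc32(bytes(data)) & 0xFFFFFFFF


def check_empty_inputs(hash_func: Callable) -> Tuple[List[int], bool, List[Tuple[str, str]]]:
    """Hash the three empty inputs; collect hash values and any errors raised."""
    hash_results = []
    error_cases = []
    for type_name, value in (("str", ""), ("bytes", b""), ("bytearray", bytearray())):
        try:
            hash_results.append(hash_func(value))
        except Exception as exc:
            error_cases.append((type_name, str(exc)))
    consistent = not error_cases and len(set(hash_results)) == 1
    return hash_results, consistent, error_cases


print(check_empty_inputs(crc32_hash))  # ([0, 0, 0], True, [])

# The built-in hash() cannot handle bytearray, so it fails this check.
_, builtin_consistent, builtin_errors = check_empty_inputs(hash)
print(builtin_consistent)    # False
print(builtin_errors[0][0])  # bytearray
```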

int_hash_val_performance

hbutils.encoding.int_hash_val.int_hash_val_performance(hash_func: str | Callable[[str | bytes | bytearray], int], data_sizes: List[int] | None = None) → PerformanceValidationResult

Validate performance characteristics.

Measures the hash function’s performance across different input sizes, calculating average hashing time and throughput in MB/s.

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • data_sizes (List[int], optional) – List of data sizes (in bytes) to test, defaults to [100, 1000, 10000, 100000]

Returns:

Performance validation results

Return type:

PerformanceValidationResult

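Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_performance
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF  # hash() rejects bytearray
>>> result = int_hash_val_performance(simple_hash, data_sizes=[100, 1000])
>>> result.passed
True
>>> 100 in result.performance_data
True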
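
A throughput measurement of this kind can be approximated with `time.perf_counter`. The sketch below is illustrative only: the metric keys `avg_time` and `throughput_mbps`, the `repeats` parameter, the zero-filled payload, and the choice of 1 MB = 1024 * 1024 bytes are all assumptions, not details taken from the library.

```python
import time
import zlib
from typing import Callable, Dict, List


def crc32_hash(data: bytes) -> int:
    """Stand-in 32-bit hash for illustration (CRC-32 from the standard library)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def measure_throughput(hash_func: Callable[[bytes], int],
                       data_sizes: List[int],
                       repeats: int = 20) -> Dict[int, Dict[str, float]]:
    """Time repeated hashing of a payload of each size and derive MB/s."""
    results = {}
    for size in data_sizes:
        payload = bytes(size)  # zero-filled payload of the requested size
        start = time.perf_counter()
        for _ in range(repeats):
            hash_func(payload)
        avg_time = (time.perf_counter() - start) / repeats
        megabytes = size / (1024 * 1024)
        throughput = megabytes / avg_time if avg_time > 0 else float("inf")
        results[size] = {"avg_time": avg_time, "throughput_mbps": throughput}
    return results


perf = measure_throughput(crc32_hash, [100, 1000])
for size, metrics in perf.items():
    print(size, f"{metrics['throughput_mbps']:.1f} MB/s")
```

Timings from a loop this short are noisy, so the numbers are only useful for comparing orders of magnitude.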

int_hash_val_comprehensive

hbutils.encoding.int_hash_val.int_hash_val_comprehensive(hash_func: str | Callable[[str | bytes | bytearray], int]) → ComprehensiveValidationResult

Comprehensive validation of hash function properties.

Runs a complete suite of validation tests on the hash function, including: determinism, type consistency, avalanche effect, uniform distribution, collision resistance, empty input handling, and performance characteristics.

Parameters:

hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

Returns:

Comprehensive validation results

Return type:

ComprehensiveValidationResult

Example:

>>> from hbutils.encoding.int_hash_val import int_hash_val_comprehensive
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_comprehensive(simple_hash)
>>> result.hash_function_name
'simple_hash'
>>> result.total_properties_tested
7
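
The summary fields of the comprehensive result can be thought of as an aggregation over per-property pass flags. The sketch below shows that aggregation in isolation; `Summary` and `summarize` are hypothetical stand-ins, and the rule that the overall status passes only when every property passes is an assumption.

```python
from typing import Dict, List, NamedTuple


class Summary(NamedTuple):
    """Hypothetical stand-in for the summary fields of the comprehensive result."""
    passed: bool
    not_passed_properties: List[str]
    total_properties_tested: int
    properties_passed: int
    pass_percentage: float


def summarize(property_results: Dict[str, bool]) -> Summary:
    """Aggregate per-property pass flags into overall counts and status."""
    not_passed = [name for name, ok in property_results.items() if not ok]
    total = len(property_results)
    passed_count = total - len(not_passed)
    percentage = passed_count / total * 100.0 if total else 0.0
    return Summary(not not_passed, not_passed, total, passed_count, percentage)


summary = summarize({
    "determinism": True,
    "avalanche_effect": False,
    "empty_input": True,
    "performance": True,
})
print(summary.passed)                 # False
print(summary.not_passed_properties)  # ['avalanche_effect']
print(f"{summary.properties_passed} ({summary.pass_percentage:.1f}%)")  # 3 (75.0%)
```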