hbutils.encoding.int_hash_val

This module provides comprehensive validation utilities for hash functions, including tests for:

  • Determinism: Ensures the same input always yields the same output

  • Type consistency: Checks that identical content hashes the same whether passed as str, bytes, or bytearray

  • Avalanche effect: Measures how strongly a small input change (such as a single flipped bit) alters the output bits

  • Uniform distribution: Analyzes how evenly hash values spread across the hash space

  • Collision resistance: Measures how often distinct inputs produce the same hash value

  • Empty input handling: Validates behavior with empty inputs

  • Performance characteristics: Measures hashing speed and throughput

The module is designed to validate integer-based hash functions that accept string, bytes, or bytearray inputs and return integer hash values.

Example::
>>> from hbutils.encoding import int_hash_val_comprehensive
>>>
>>> print(int_hash_val_comprehensive('xs'))  # validate a hash function referenced by its string name
╔══════════════════════════════════════════════════════════════════════════════════════════════╗
║                          COMPREHENSIVE HASH FUNCTION VALIDATION REPORT                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ Function Name:     xs               │ Overall Status:    PASS                                ║
║ Properties Tested: 7                │ Properties Passed: 7    (100.0%)                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                    PROPERTY STATUS                                           ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Determinism                                   │ PASS                                       ║
║ ✓ Type Consistency                              │ PASS                                       ║
║ ✓ Avalanche Effect                              │ PASS       | Avalanche Effect:       42.9% ║
║ ✓ Uniform Distribution                          │ PASS       | Uniformity Score:       0.996 ║
║ ✓ Collision Resistance                          │ PASS       | Collision Rate:        0.0000 ║
║ ✓ Empty Input                                   │ PASS                                       ║
║ ✓ Performance                                   │ PASS       | Avg Throughput:      4.6 MB/s ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                   RECOMMENDATIONS                                            ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Hash function meets all validation criteria - suitable for production use                  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════╝
DETAILED ANALYSIS:
• All validation tests passed successfully
• Hash function demonstrates good cryptographic properties
• Suitable for general-purpose hashing applications
>>>
>>> def basic_good_hash(data) -> int:
...     # Convert all input types to bytes
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     elif isinstance(data, bytearray):
...         data = bytes(data)
...     hash_val = 0x811c9dc5  # FNV offset basis (32-bit)
...     for byte in data:
...         # Simple polynomial hash variant
...         hash_val = ((hash_val * 33) ^ byte) & 0xffffffff
...         # Add some bit mixing
...         hash_val ^= hash_val >> 16
...         hash_val = (hash_val * 0x85ebca6b) & 0xffffffff
...         hash_val ^= hash_val >> 13
...     return hash_val & 0xffffffff
...
>>> print(int_hash_val_comprehensive(basic_good_hash))
╔══════════════════════════════════════════════════════════════════════════════════════════════╗
║                          COMPREHENSIVE HASH FUNCTION VALIDATION REPORT                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ Function Name:     basic_good_hash  │ Overall Status:    PASS                                ║
║ Properties Tested: 7                │ Properties Passed: 7    (100.0%)                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                    PROPERTY STATUS                                           ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Determinism                                   │ PASS                                       ║
║ ✓ Type Consistency                              │ PASS                                       ║
║ ✓ Avalanche Effect                              │ PASS       | Avalanche Effect:       50.5% ║
║ ✓ Uniform Distribution                          │ PASS       | Uniformity Score:       0.996 ║
║ ✓ Collision Resistance                          │ PASS       | Collision Rate:        0.0000 ║
║ ✓ Empty Input                                   │ PASS                                       ║
║ ✓ Performance                                   │ PASS       | Avg Throughput:      2.5 MB/s ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                   RECOMMENDATIONS                                            ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Hash function meets all validation criteria - suitable for production use                  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════╝
DETAILED ANALYSIS:
• All validation tests passed successfully
• Hash function demonstrates good cryptographic properties
• Suitable for general-purpose hashing applications

int_hash_val_determinism

hbutils.encoding.int_hash_val.int_hash_val_determinism(hash_func: str | Callable[[str | bytes | bytearray], int], test_data: List[str | bytes | bytearray]) → DeterminismValidationResult

Validate determinism: the same input always produces the same output.

Tests whether the hash function consistently produces the same output for identical inputs across multiple invocations.
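
The check can be pictured as hashing each input several times and comparing every result with the first one. This is a minimal sketch, not the library's implementation: `check_determinism`, the repeat count, the BLAKE2b-based `demo_hash`, and the deliberately broken `drifting_hash` are all hypothetical.

```python
import hashlib
import itertools
from typing import Callable, List, Union

Data = Union[str, bytes, bytearray]


def demo_hash(data: Data) -> int:
    """Hypothetical example hash: 32-bit BLAKE2b digest of the input bytes."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(bytes(data), digest_size=4).digest(), "big")


_calls = itertools.count()


def drifting_hash(data: Data) -> int:
    """Hypothetical broken hash: ignores the input and returns a new value per call."""
    return next(_calls)


def check_determinism(hash_func: Callable[[Data], int],
                      test_data: List[Data], rounds: int = 3) -> List[str]:
    """Return repr() of every input whose hash differs across repeated calls."""
    failed = []
    for item in test_data:
        first = hash_func(item)
        # Hash the same input again and compare each result with the first one
        if any(hash_func(item) != first for _ in range(rounds)):
            failed.append(repr(item))
    return failed


print(check_determinism(demo_hash, ["test", b"data", bytearray(b"xyz")]))  # []
print(check_determinism(drifting_hash, ["test", b"data"]))  # ["'test'", "b'data'"]
```

Returning the failing inputs rather than a single boolean mirrors the `failed_cases` field of DeterminismValidationResult.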

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • test_data (List[Union[str, bytes, bytearray]]) – List of test inputs to validate determinism

Returns:

Determinism validation results

Return type:

DeterminismValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_determinism
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_determinism(simple_hash, ["test", b"data"])
>>> result.passed
True

int_hash_val_type_consistency

hbutils.encoding.int_hash_val.int_hash_val_type_consistency(hash_func: str | Callable[[str | bytes | bytearray], int]) → TypeConsistencyValidationResult

Validate type consistency: the same content supplied as different types should produce the same hash.

Tests whether the hash function produces identical hash values for the same content when provided as string, bytes, or bytearray types.
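
One way to express this property is to hash each sample as str, as its encoded bytes, and as a bytearray, then check that only one distinct value comes back. A minimal sketch, assuming text is compared through its UTF-8 encoding; `check_type_consistency`, `demo_hash`, and the counter-example `repr_hash` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List, Union

Data = Union[str, bytes, bytearray]


def demo_hash(data: Data) -> int:
    """Hypothetical hash that normalises every input type to bytes first."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # assumption: text is compared via its UTF-8 bytes
    return int.from_bytes(hashlib.blake2b(bytes(data), digest_size=4).digest(), "big")


def repr_hash(data: Data) -> int:
    """Hypothetical counter-example: hashing repr() lets the input type leak in."""
    return demo_hash(repr(data))


def check_type_consistency(hash_func: Callable[[Data], int],
                           samples: List[str]) -> Dict[str, bool]:
    """Map each sample to True if its str, bytes and bytearray forms hash alike."""
    results = {}
    for text in samples:
        raw = text.encode("utf-8")
        values = {hash_func(text), hash_func(raw), hash_func(bytearray(raw))}
        results[text] = len(values) == 1  # a single distinct value means consistent
    return results


print(check_type_consistency(demo_hash, ["hello", "héllo", ""]))
# {'hello': True, 'héllo': True, '': True}
print(check_type_consistency(repr_hash, ["hello"]))
```

`repr_hash` fails because `repr("hello")`, `repr(b"hello")`, and `repr(bytearray(b"hello"))` are three different strings, which is exactly the kind of type leak this test is meant to catch.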

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

Returns:

Type consistency validation results

Return type:

TypeConsistencyValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_type_consistency
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_type_consistency(simple_hash)
>>> result.passed
True

int_hash_val_avalanche_effect

hbutils.encoding.int_hash_val.int_hash_val_avalanche_effect(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 100) → AvalancheEffectValidationResult

Validate avalanche effect: small input changes cause significant output changes.

Tests the avalanche effect property where a small change in input (single bit/character) should result in approximately 50% of the output bits changing. This is a key property of good hash functions.
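
The measurement can be sketched as: flip one random input bit, XOR the two hash values, and count the set bits of the XOR (its popcount). Averaging that count and dividing by the output width gives a percentage with the same shape as `change_percentage`. This sketch assumes 32-bit outputs and random 16-byte inputs; `bit_changes`, `avalanche_percentage`, `demo_hash`, and `weak_hash` are hypothetical, and the library may sample inputs differently.

```python
import hashlib
import random
from typing import Callable

HASH_BITS = 32  # assumption: the hash under test returns 32-bit integers


def demo_hash(data: bytes) -> int:
    """Hypothetical example hash: 32-bit BLAKE2b digest."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")


def weak_hash(data: bytes) -> int:
    """Hypothetical poor hash: a plain byte sum, so one flipped bit moves few output bits."""
    return sum(data) & 0xFFFFFFFF


def bit_changes(hash_func: Callable[[bytes], int], data: bytes, bit_index: int) -> int:
    """Flip one input bit and count how many output bits differ."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    diff = hash_func(data) ^ hash_func(bytes(flipped))
    return bin(diff).count("1")  # popcount of the XOR = number of changed output bits


def avalanche_percentage(hash_func: Callable[[bytes], int],
                         sample_size: int = 100, seed: int = 0) -> float:
    """Average share of output bits that flip, as a percentage (ideal: about 50%)."""
    rng = random.Random(seed)
    changes = []
    for _ in range(sample_size):
        data = bytes(rng.getrandbits(8) for _ in range(16))  # random 16-byte input
        changes.append(bit_changes(hash_func, data, rng.randrange(len(data) * 8)))
    avg_bit_changes = sum(changes) / len(changes)
    return avg_bit_changes / HASH_BITS * 100


print(f"{avalanche_percentage(demo_hash):.1f}%")  # close to 50%
print(f"{avalanche_percentage(weak_hash):.1f}%")  # far below 50%
```

A byte sum fails because flipping bit k only adds or subtracts 2**k, which usually disturbs one or two output bits plus a short carry chain.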

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • sample_size (int) – Number of random samples to test, defaults to 100

Returns:

Avalanche effect validation results

Return type:

AvalancheEffectValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_avalanche_effect
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_avalanche_effect(simple_hash, sample_size=50)
>>> result.change_percentage > 40.0
True

int_hash_val_uniform_distribution

hbutils.encoding.int_hash_val.int_hash_val_uniform_distribution(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 10000) → UniformDistributionValidationResult

Validate uniform distribution of hash outputs.

Tests whether the hash function produces uniformly distributed output values across the hash space. Divides the hash space into buckets and checks if hash values are evenly distributed.
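
The bucketing idea can be sketched as below. The documentation does not say how `uniformity_score` is computed, so this sketch swaps in Pearson's chi-square statistic as the measure of evenness; for a flat distribution the statistic divided by (buckets - 1) sits near 1. The 32-bit hash space, the 64 buckets, and the names `bucket_counts`, `chi_square`, and `demo_hash` are assumptions for illustration.

```python
import hashlib
import random
from typing import Callable, List

HASH_BITS = 32    # assumption: 32-bit hash outputs
NUM_BUCKETS = 64  # assumption: bucket count picked for illustration


def demo_hash(data: bytes) -> int:
    """Hypothetical example hash: 32-bit BLAKE2b digest."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")


def bucket_counts(hash_func: Callable[[bytes], int],
                  sample_size: int, seed: int = 0) -> List[int]:
    """Split the hash space into equal-width buckets and count hits per bucket."""
    rng = random.Random(seed)
    width = (1 << HASH_BITS) // NUM_BUCKETS
    buckets = [0] * NUM_BUCKETS
    for _ in range(sample_size):
        data = rng.getrandbits(64).to_bytes(8, "big")  # random 8-byte input
        buckets[hash_func(data) // width] += 1
    return buckets


def chi_square(buckets: List[int]) -> float:
    """Pearson's chi-square statistic against a perfectly flat distribution."""
    expected = sum(buckets) / len(buckets)
    return sum((count - expected) ** 2 / expected for count in buckets)


good = bucket_counts(demo_hash, 10000)
skewed = bucket_counts(lambda d: sum(d), 10000)  # a byte sum never leaves bucket 0
print(chi_square(good) / (NUM_BUCKETS - 1))    # near 1.0 for an even spread
print(chi_square(skewed) / (NUM_BUCKETS - 1))  # 10000.0
```

Dividing by the degrees of freedom makes results comparable across bucket counts; a normalised 0-to-1 score like the documented `uniformity_score` would need a further mapping that this sketch does not attempt.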

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • sample_size (int) – Number of random samples to generate and hash, defaults to 10000

Returns:

Uniform distribution validation results

Return type:

UniformDistributionValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_uniform_distribution
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_uniform_distribution(simple_hash, sample_size=1000)
>>> result.uniformity_score > 0.95
True

int_hash_val_collision_resistance

hbutils.encoding.int_hash_val.int_hash_val_collision_resistance(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 100000) → CollisionResistanceValidationResult

Validate collision resistance.

Tests the hash function’s resistance to collisions by generating many random inputs and checking for duplicate hash values. A good hash function should have a very low collision rate.
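
How low is "very low"? For an ideal b-bit hash, the birthday bound predicts about n(n-1)/2 / 2**b colliding pairs among n distinct inputs, roughly 1.16 for the default 100000 samples on a 32-bit hash. The sketch below counts repeats over distinct inputs and reports (collision_count, collision_rate, unique_hashes), echoing fields of CollisionResistanceValidationResult; counting a collision as "an input whose value was already seen" is an assumption, and `collision_stats`, `expected_collisions`, and `demo_hash` are hypothetical.

```python
import hashlib
from typing import Callable, Tuple


def demo_hash(data: bytes) -> int:
    """Hypothetical example hash: 32-bit BLAKE2b digest."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")


def collision_stats(hash_func: Callable[[bytes], int],
                    sample_size: int) -> Tuple[int, float, int]:
    """Return (collision_count, collision_rate, unique_hashes) over distinct inputs."""
    seen = set()
    collisions = 0
    for i in range(sample_size):
        value = hash_func(f"sample-{i}".encode("utf-8"))  # every input is distinct
        if value in seen:
            collisions += 1  # a different input already produced this value
        else:
            seen.add(value)
    return collisions, collisions / sample_size, len(seen)


def expected_collisions(sample_size: int, hash_bits: int) -> float:
    """Birthday-bound estimate for an ideal hash: n(n-1)/2 pairs, each 2**-bits likely."""
    return sample_size * (sample_size - 1) / 2 / 2 ** hash_bits


print(expected_collisions(100_000, 32))  # about 1.16
print(collision_stats(demo_hash, 10_000))
print(collision_stats(len, 1_000))       # (997, 0.997, 3)
```

Using `len` as a stand-in "hash" shows the failure mode: the 1000 inputs have only three distinct lengths, so 997 of them collide.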

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • sample_size (int) – Number of random samples to test, defaults to 100000

Returns:

Collision resistance validation results

Return type:

CollisionResistanceValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_collision_resistance
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_collision_resistance(simple_hash, sample_size=10000)
>>> result.collision_rate < 0.001
True

int_hash_val_empty_input

hbutils.encoding.int_hash_val.int_hash_val_empty_input(hash_func: str | Callable[[str | bytes | bytearray], int]) → EmptyInputValidationResult

Validate empty input handling.

Tests whether the hash function correctly handles empty inputs of different types (empty string, empty bytes, empty bytearray) and produces consistent results.
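
A sketch of this check: hash `""`, `b""`, and `bytearray()`, record any exception as an (input_type, error_message) pair instead of aborting, and report a common value only when all three agree. The return tuple loosely follows EmptyInputValidationResult (hash_results, consistent_empty_hash, error_cases, empty_hash_value); `check_empty_input`, `demo_hash`, and `strict_hash` are hypothetical.

```python
import hashlib
from typing import Callable, List, Optional, Tuple


def demo_hash(data) -> int:
    """Hypothetical hash that normalises str/bytearray to bytes before hashing."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(bytes(data), digest_size=4).digest(), "big")


def strict_hash(data) -> int:
    """Hypothetical counter-example that rejects empty input."""
    if not data:
        raise ValueError("empty input")
    return demo_hash(data)


def check_empty_input(hash_func: Callable[..., int]
                      ) -> Tuple[List[int], bool, List[Tuple[str, str]], Optional[int]]:
    """Hash '', b'' and bytearray(); return (hashes, consistent, errors, common value)."""
    hashes, errors = [], []
    for empty in ("", b"", bytearray()):
        try:
            hashes.append(hash_func(empty))
        except Exception as exc:  # record (type name, message) instead of aborting
            errors.append((type(empty).__name__, str(exc)))
    consistent = not errors and len(set(hashes)) == 1
    return hashes, consistent, errors, (hashes[0] if consistent else None)


print(check_empty_input(demo_hash)[1:3])    # (True, [])
print(check_empty_input(strict_hash)[2])
# [('str', 'empty input'), ('bytes', 'empty input'), ('bytearray', 'empty input')]
```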

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

Returns:

Empty input validation results

Return type:

EmptyInputValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_empty_input
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_empty_input(simple_hash)
>>> result.consistent_empty_hash
True

int_hash_val_performance

hbutils.encoding.int_hash_val.int_hash_val_performance(hash_func: str | Callable[[str | bytes | bytearray], int], data_sizes: List[int] | None = None) → PerformanceValidationResult

Validate performance characteristics.

Measures the hash function’s performance across different input sizes, calculating average hashing time and throughput in MB/s.
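
Throughput is input size divided by average time per call. The sketch below times repeated calls with `time.perf_counter` and returns a mapping from data size to metrics, similar in shape to `performance_data`. The metric keys, the zero-filled payload, the repeat count, and the 1 MB = 10**6 bytes convention are assumptions; `measure_throughput` and `demo_hash` are hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional


def demo_hash(data: bytes) -> int:
    """Hypothetical example hash: 32-bit BLAKE2b digest."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")


def measure_throughput(hash_func: Callable[[bytes], int],
                       data_sizes: Optional[List[int]] = None,
                       repeats: int = 20) -> Dict[int, Dict[str, float]]:
    """Time hash_func on inputs of each size; report mean seconds and MB/s."""
    if data_sizes is None:
        data_sizes = [100, 1000, 10000, 100000]  # the documented default sizes
    results = {}
    for size in data_sizes:
        payload = bytes(size)  # assumption: zero-filled input of `size` bytes
        start = time.perf_counter()
        for _ in range(repeats):
            hash_func(payload)
        avg_time = (time.perf_counter() - start) / repeats
        # assumption: 1 MB = 10**6 bytes; guard against a zero timer reading
        throughput = size / avg_time / 1e6 if avg_time > 0 else float("inf")
        results[size] = {"avg_time_s": avg_time, "throughput_mb_s": throughput}
    return results


for size, metrics in measure_throughput(demo_hash, [100, 1000]).items():
    print(size, f"{metrics['throughput_mb_s']:.1f} MB/s")
```

Timings vary by machine and run, so the numbers are only comparable between hash functions measured in the same session.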

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

  • data_sizes (List[int], optional) – List of data sizes (in bytes) to test, defaults to [100, 1000, 10000, 100000]

Returns:

Performance validation results

Return type:

PerformanceValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_performance
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_performance(simple_hash, data_sizes=[100, 1000])
>>> result.passed
True
>>> 100 in result.performance_data
True

int_hash_val_comprehensive

hbutils.encoding.int_hash_val.int_hash_val_comprehensive(hash_func: str | Callable[[str | bytes | bytearray], int]) → ComprehensiveValidationResult

Comprehensive validation of hash function properties.

Runs a complete suite of validation tests on the hash function, including: determinism, type consistency, avalanche effect, uniform distribution, collision resistance, empty input handling, and performance characteristics.
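
The summary counters reported by the suite (properties tested, properties passed, the list of properties that did not pass) can be derived from per-property pass/fail flags. This is a minimal sketch assuming the overall status is a pass only when every property passes; `SuiteSummary` and `summarize` are hypothetical stand-ins, not the library's ComprehensiveValidationResult.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SuiteSummary:
    """Hypothetical stand-in for the summary fields of a comprehensive result."""
    passed: bool
    not_passed_properties: List[str]
    total_properties_tested: int
    properties_passed: int

    @property
    def pass_rate(self) -> float:
        return 100.0 * self.properties_passed / self.total_properties_tested


def summarize(property_results: Dict[str, bool]) -> SuiteSummary:
    """Fold per-property pass/fail flags into one summary (assumes all must pass)."""
    failed = [name for name, ok in property_results.items() if not ok]
    return SuiteSummary(
        passed=not failed,
        not_passed_properties=failed,
        total_properties_tested=len(property_results),
        properties_passed=len(property_results) - len(failed),
    )


summary = summarize({
    "determinism": True, "type_consistency": True, "avalanche_effect": False,
    "uniform_distribution": True, "collision_resistance": True,
    "empty_input": True, "performance": True,
})
print(summary.passed, summary.not_passed_properties)  # False ['avalanche_effect']
print(f"{summary.properties_passed}/{summary.total_properties_tested} "
      f"({summary.pass_rate:.1f}%)")  # 6/7 (85.7%)
```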

Parameters:
  • hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.

Returns:

Comprehensive validation results

Return type:

ComprehensiveValidationResult

Example::
>>> from hbutils.encoding.int_hash_val import int_hash_val_comprehensive
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_comprehensive(simple_hash)
>>> result.hash_function_name
'simple_hash'
>>> result.total_properties_tested
7

DeterminismValidationResult

class hbutils.encoding.int_hash_val.DeterminismValidationResult(passed: bool, failed_cases: List[str], total_tested: int, failed_count: int)

Results from determinism validation test.

Parameters:
  • passed (bool) – Whether the determinism test passed

  • failed_cases (List[str]) – List of test cases that failed determinism check

  • total_tested (int) – Total number of test cases evaluated

  • failed_count (int) – Number of test cases that failed

TypeConsistencyValidationResult

class hbutils.encoding.int_hash_val.TypeConsistencyValidationResult(passed: bool, failed_cases: List[str], total_tested: int, failed_count: int, consistent_hashes: Dict[str, int])

Results from type consistency validation test.

Parameters:
  • passed (bool) – Whether the type consistency test passed

  • failed_cases (List[str]) – List of test cases that failed type consistency check

  • total_tested (int) – Total number of test cases evaluated

  • failed_count (int) – Number of test cases that failed

  • consistent_hashes (Dict[str, int]) – Dictionary mapping test strings to their consistent hash values

AvalancheEffectValidationResult

class hbutils.encoding.int_hash_val.AvalancheEffectValidationResult(passed: bool, avg_bit_changes: float, change_percentage: float, total_comparisons: int, bit_changes_list: List[int], min_changes: int, max_changes: int)

Results from avalanche effect validation test.

Parameters:
  • passed (bool) – Whether the avalanche effect test passed

  • avg_bit_changes (float) – Average number of bits changed across all comparisons

  • change_percentage (float) – Percentage of bits changed (avg_bit_changes / total_bits * 100)

  • total_comparisons (int) – Total number of hash comparisons performed

  • bit_changes_list (List[int]) – List of bit changes for each comparison

  • min_changes (int) – Minimum number of bits changed in any comparison

  • max_changes (int) – Maximum number of bits changed in any comparison

UniformDistributionValidationResult

class hbutils.encoding.int_hash_val.UniformDistributionValidationResult(passed: bool, uniformity_score: float, bucket_stats: Dict[str, Any], sample_count: int, buckets: List[int])

Results from uniform distribution validation test.

Parameters:
  • passed (bool) – Whether the uniform distribution test passed

  • uniformity_score (float) – Score indicating distribution uniformity (0-1, higher is better)

  • bucket_stats (Dict[str, Any]) – Statistics about bucket distribution

  • sample_count (int) – Number of samples used in the test

  • buckets (List[int]) – List of counts for each bucket

CollisionResistanceValidationResult

class hbutils.encoding.int_hash_val.CollisionResistanceValidationResult(passed: bool, collision_count: int, collision_rate: float, sample_size: int, unique_hashes: int, collision_pairs: List[Tuple[str, int]])

Results from collision resistance validation test.

Parameters:
  • passed (bool) – Whether the collision resistance test passed

  • collision_count (int) – Number of collisions detected

  • collision_rate (float) – Rate of collisions (collision_count / sample_size)

  • sample_size (int) – Total number of samples tested

  • unique_hashes (int) – Number of unique hash values generated

  • collision_pairs (List[Tuple[str, int]]) – List of (input, hash) tuples that collided

EmptyInputValidationResult

class hbutils.encoding.int_hash_val.EmptyInputValidationResult(passed: bool, hash_results: List[int], consistent_empty_hash: bool, error_cases: List[Tuple[str, str]], empty_hash_value: int | None)

Results from empty input validation test.

Parameters:
  • passed (bool) – Whether the empty input test passed

  • hash_results (List[int]) – List of hash values for empty inputs

  • consistent_empty_hash (bool) – Whether all empty inputs produced the same hash

  • error_cases (List[Tuple[str, str]]) – List of (input_type, error_message) tuples for failed cases

  • empty_hash_value (Union[int, None]) – The consistent hash value for empty inputs, or None if inconsistent

PerformanceValidationResult

class hbutils.encoding.int_hash_val.PerformanceValidationResult(passed: bool, performance_data: Dict[int, Dict[str, float]], tested_sizes: List[int], completed_sizes: List[int])

Results from performance validation test.

Parameters:
  • passed (bool) – Whether the performance test passed (completed without errors)

  • performance_data (Dict[int, Dict[str, float]]) – Dictionary mapping data sizes to performance metrics

  • tested_sizes (List[int]) – List of data sizes that were tested

  • completed_sizes (List[int]) – List of data sizes that completed successfully

ComprehensiveValidationResult

class hbutils.encoding.int_hash_val.ComprehensiveValidationResult(passed: bool, not_passed_properties: List[str], hash_function_name: str, total_properties_tested: int, properties_passed: int, determinism: DeterminismValidationResult, type_consistency: TypeConsistencyValidationResult, avalanche_effect: AvalancheEffectValidationResult, uniform_distribution: UniformDistributionValidationResult, collision_resistance: CollisionResistanceValidationResult, empty_input: EmptyInputValidationResult, performance: PerformanceValidationResult)

Results from comprehensive validation test.

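Parameters:
  • passed (bool) – Overall validation status of the hash function

  • not_passed_properties (List[str]) – Names of the properties that did not pass

  • hash_function_name (str) – Name of the validated hash function

  • total_properties_tested (int) – Number of properties tested

  • properties_passed (int) – Number of properties that passed

  • determinism (DeterminismValidationResult) – Result of the determinism test

  • type_consistency (TypeConsistencyValidationResult) – Result of the type consistency test

  • avalanche_effect (AvalancheEffectValidationResult) – Result of the avalanche effect test

  • uniform_distribution (UniformDistributionValidationResult) – Result of the uniform distribution test

  • collision_resistance (CollisionResistanceValidationResult) – Result of the collision resistance test

  • empty_input (EmptyInputValidationResult) – Result of the empty input test

  • performance (PerformanceValidationResult) – Result of the performance test

__str__() → str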

Generate a formatted string representation of the validation results.

Returns:

Formatted validation report

Return type:

str