hbutils.encoding.int_hash_val
Hash function validation utilities for integer-based hashing.
This module provides a comprehensive validation suite for integer hash functions
that accept str, bytes, or bytearray inputs and return
integer hash values. The validation suite focuses on common properties expected
from robust hash functions, including determinism, type consistency, avalanche
effect, uniform distribution, collision resistance, empty input handling, and
performance characteristics.
The module contains the following main components:
int_hash_val_determinism() - Determinism validation
int_hash_val_type_consistency() - Type consistency validation
int_hash_val_avalanche_effect() - Avalanche effect validation
int_hash_val_uniform_distribution() - Uniform distribution validation
int_hash_val_collision_resistance() - Collision resistance validation
int_hash_val_empty_input() - Empty input handling validation
int_hash_val_performance() - Performance validation
int_hash_val_comprehensive() - Full validation suite
ComprehensiveValidationResult - Aggregated results report
Note
The validation suite assumes a 32-bit output when computing the avalanche effect score. For hash functions with different bit widths, interpret the avalanche metrics accordingly.
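As a rough illustration of this note, the sketch below recomputes the avalanche percentage for a different output width. It uses only the relation documented for change_percentage (avg_bit_changes / total_bits * 100); the helper name avalanche_percentage is hypothetical and not part of hbutils.

```python
def avalanche_percentage(avg_bit_changes: float, total_bits: int) -> float:
    """Percentage of output bits flipped, for a hash of ``total_bits`` width."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return avg_bit_changes / total_bits * 100


# An average of 16 flipped bits is the ideal 50% for a 32-bit hash,
# but corresponds to only 25% of a 64-bit output.
pct_32 = avalanche_percentage(16.0, 32)
pct_64 = avalanche_percentage(16.0, 64)
```

For a hash wider than 32 bits, the reported avg_bit_changes is therefore the more meaningful figure to rescale.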
Example:
>>> from hbutils.encoding import int_hash_val_comprehensive
>>>
>>> print(int_hash_val_comprehensive('xs')) # validate existing hash functions
╔══════════════════════════════════════════════════════════════════════════════════════════════╗
║ COMPREHENSIVE HASH FUNCTION VALIDATION REPORT ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ Function Name: xs │ Overall Status: PASS ║
║ Properties Tested: 7 │ Properties Passed: 7 (100.0%) ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ PROPERTY STATUS ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Determinism │ PASS ║
║ ✓ Type Consistency │ PASS ║
║ ✓ Avalanche Effect │ PASS | Avalanche Effect: 42.9% ║
║ ✓ Uniform Distribution │ PASS | Uniformity Score: 0.996 ║
║ ✓ Collision Resistance │ PASS | Collision Rate: 0.0000 ║
║ ✓ Empty Input │ PASS ║
║ ✓ Performance │ PASS | Avg Throughput: 4.6 MB/s ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ RECOMMENDATIONS ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ ✓ Hash function meets all validation criteria - suitable for production use ║
╚══════════════════════════════════════════════════════════════════════════════════════════════╝
DETAILED ANALYSIS:
• All validation tests passed successfully
• Hash function demonstrates good cryptographic properties
• Suitable for general-purpose hashing applications
__all__
- hbutils.encoding.int_hash_val.__all__ = ['int_hash_val_determinism', 'int_hash_val_type_consistency', 'int_hash_val_avalanche_effect', 'int_hash_val_uniform_distribution', 'int_hash_val_collision_resistance', 'int_hash_val_empty_input', 'int_hash_val_performance', 'int_hash_val_comprehensive', 'DeterminismValidationResult', 'TypeConsistencyValidationResult', 'AvalancheEffectValidationResult', 'UniformDistributionValidationResult', 'CollisionResistanceValidationResult', 'EmptyInputValidationResult', 'PerformanceValidationResult', 'ComprehensiveValidationResult']
DeterminismValidationResult
- class hbutils.encoding.int_hash_val.DeterminismValidationResult(passed: bool, failed_cases: List[str], total_tested: int, failed_count: int)
Results from determinism validation test.
- Parameters:
passed (bool) – Whether the determinism test passed
failed_cases (List[str]) – List of test cases that failed determinism check
total_tested (int) – Total number of test cases evaluated
failed_count (int) – Number of test cases that failed
TypeConsistencyValidationResult
- class hbutils.encoding.int_hash_val.TypeConsistencyValidationResult(passed: bool, failed_cases: List[str], total_tested: int, failed_count: int, consistent_hashes: Dict[str, int])
Results from type consistency validation test.
- Parameters:
passed (bool) – Whether the type consistency test passed
failed_cases (List[str]) – List of test cases that failed type consistency check
total_tested (int) – Total number of test cases evaluated
failed_count (int) – Number of test cases that failed
consistent_hashes (Dict[str, int]) – Dictionary mapping test strings to their consistent hash values
AvalancheEffectValidationResult
- class hbutils.encoding.int_hash_val.AvalancheEffectValidationResult(passed: bool, avg_bit_changes: float, change_percentage: float, total_comparisons: int, bit_changes_list: List[int], min_changes: int, max_changes: int)
Results from avalanche effect validation test.
- Parameters:
passed (bool) – Whether the avalanche effect test passed
avg_bit_changes (float) – Average number of bits changed across all comparisons
change_percentage (float) – Percentage of bits changed (avg_bit_changes / total_bits * 100)
total_comparisons (int) – Total number of hash comparisons performed
bit_changes_list (List[int]) – List of bit changes for each comparison
min_changes (int) – Minimum number of bits changed in any comparison
max_changes (int) – Maximum number of bits changed in any comparison
UniformDistributionValidationResult
- class hbutils.encoding.int_hash_val.UniformDistributionValidationResult(passed: bool, uniformity_score: float, bucket_stats: Dict[str, Any], sample_count: int, buckets: List[int])
Results from uniform distribution validation test.
- Parameters:
passed (bool) – Whether the uniform distribution test passed
uniformity_score (float) – Score indicating distribution uniformity (0-1, higher is better)
bucket_stats (Dict[str, Any]) – Statistics about bucket distribution
sample_count (int) – Number of samples used in the test
buckets (List[int]) – List of counts for each bucket
CollisionResistanceValidationResult
- class hbutils.encoding.int_hash_val.CollisionResistanceValidationResult(passed: bool, collision_count: int, collision_rate: float, sample_size: int, unique_hashes: int, collision_pairs: List[Tuple[str, int]])
Results from collision resistance validation test.
- Parameters:
passed (bool) – Whether the collision resistance test passed
collision_count (int) – Number of collisions detected
collision_rate (float) – Rate of collisions (collision_count / sample_size)
sample_size (int) – Total number of samples tested
unique_hashes (int) – Number of unique hash values generated
collision_pairs (List[Tuple[str, int]]) – List of (input, hash) tuples that collided
EmptyInputValidationResult
- class hbutils.encoding.int_hash_val.EmptyInputValidationResult(passed: bool, hash_results: List[int], consistent_empty_hash: bool, error_cases: List[Tuple[str, str]], empty_hash_value: int | None)
Results from empty input validation test.
- Parameters:
passed (bool) – Whether the empty input test passed
hash_results (List[int]) – List of hash values for empty inputs
consistent_empty_hash (bool) – Whether all empty inputs produced the same hash
error_cases (List[Tuple[str, str]]) – List of (input_type, error_message) tuples for failed cases
empty_hash_value (Union[int, None]) – The consistent hash value for empty inputs, or None if inconsistent
PerformanceValidationResult
- class hbutils.encoding.int_hash_val.PerformanceValidationResult(passed: bool, performance_data: Dict[int, Dict[str, float]], tested_sizes: List[int], completed_sizes: List[int])
Results from performance validation test.
- Parameters:
passed (bool) – Whether the performance test passed (completed without errors)
performance_data (Dict[int, Dict[str, float]]) – Dictionary mapping data sizes to performance metrics
tested_sizes (List[int]) – List of data sizes that were tested
completed_sizes (List[int]) – List of data sizes that completed successfully
ComprehensiveValidationResult
- class hbutils.encoding.int_hash_val.ComprehensiveValidationResult(passed: bool, not_passed_properties: List[str], hash_function_name: str, total_properties_tested: int, properties_passed: int, determinism: DeterminismValidationResult, type_consistency: TypeConsistencyValidationResult, avalanche_effect: AvalancheEffectValidationResult, uniform_distribution: UniformDistributionValidationResult, collision_resistance: CollisionResistanceValidationResult, empty_input: EmptyInputValidationResult, performance: PerformanceValidationResult)
Results from comprehensive validation test.
- Parameters:
passed (bool) – Whether all validation tests passed
not_passed_properties (List[str]) – List of property names that failed validation
hash_function_name (str) – Name of the hash function being validated
total_properties_tested (int) – Total number of properties tested
properties_passed (int) – Number of properties that passed validation
determinism (DeterminismValidationResult) – Results from determinism validation
type_consistency (TypeConsistencyValidationResult) – Results from type consistency validation
avalanche_effect (AvalancheEffectValidationResult) – Results from avalanche effect validation
uniform_distribution (UniformDistributionValidationResult) – Results from uniform distribution validation
collision_resistance (CollisionResistanceValidationResult) – Results from collision resistance validation
empty_input (EmptyInputValidationResult) – Results from empty input validation
performance (PerformanceValidationResult) – Results from performance validation
int_hash_val_determinism
- hbutils.encoding.int_hash_val.int_hash_val_determinism(hash_func: str | Callable[[str | bytes | bytearray], int], test_data: List[str | bytes | bytearray]) → DeterminismValidationResult
Validate determinism: same input produces same output.
Tests whether the hash function consistently produces the same output for identical inputs across multiple invocations.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
test_data (List[Union[str, bytes, bytearray]]) – List of test inputs to validate determinism
- Returns:
Determinism validation results
- Return type:
DeterminismValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_determinism
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_determinism(simple_hash, ["test", b"data"])
>>> result.passed
True
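To make the property concrete, here is a minimal standalone sketch of a determinism check. It is not the hbutils implementation: the names crc32_hash and find_nondeterministic and the repeats parameter are hypothetical. Note also that Python's built-in hash() of str and bytes objects is salted per interpreter process unless PYTHONHASHSEED is set, so a function built on it is deterministic only within a single run; the sketch uses zlib.crc32 instead.

```python
import zlib
from typing import Callable, List, Union

HashInput = Union[str, bytes, bytearray]


def crc32_hash(data: HashInput) -> int:
    """Example 32-bit hash: CRC-32 over the UTF-8 bytes of the input."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return zlib.crc32(bytes(data)) & 0xFFFFFFFF


def find_nondeterministic(hash_func: Callable[[HashInput], int],
                          test_data: List[HashInput],
                          repeats: int = 5) -> List[HashInput]:
    """Return the inputs whose hash differs between repeated calls."""
    failed = []
    for item in test_data:
        first = hash_func(item)
        # Re-hash the same input and compare against the first result.
        if any(hash_func(item) != first for _ in range(repeats - 1)):
            failed.append(item)
    return failed


failed_cases = find_nondeterministic(crc32_hash, ["test", b"data", bytearray(b"x")])
```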
int_hash_val_type_consistency
- hbutils.encoding.int_hash_val.int_hash_val_type_consistency(hash_func: str | Callable[[str | bytes | bytearray], int]) → TypeConsistencyValidationResult
Validate type consistency: same content in different types should produce same hash.
Tests whether the hash function produces identical hash values for the same content when provided as string, bytes, or bytearray types.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
- Returns:
Type consistency validation results
- Return type:
TypeConsistencyValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_type_consistency
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_type_consistency(simple_hash)
>>> result.passed
True
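The following sketch contrasts a hash that satisfies this property with one that does not. It is an illustration only: the function names are hypothetical, and it assumes that "same content" means the UTF-8 encoding of the string, as in the example above.

```python
import zlib
from typing import Callable, Union

HashInput = Union[str, bytes, bytearray]


def normalizing_hash(data: HashInput) -> int:
    """Encode str as UTF-8 first so every input type hashes the same bytes."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return zlib.adler32(bytes(data)) & 0xFFFFFFFF


def type_sensitive_hash(data: HashInput) -> int:
    """Deliberately flawed: mixes the type name into the hashed bytes."""
    raw = data.encode("utf-8") if isinstance(data, str) else bytes(data)
    return zlib.adler32(type(data).__name__.encode("ascii") + raw) & 0xFFFFFFFF


def is_type_consistent(hash_func: Callable[[HashInput], int], text: str) -> bool:
    """Check that the str, bytes and bytearray forms of ``text`` hash equally."""
    raw = text.encode("utf-8")
    return hash_func(text) == hash_func(raw) == hash_func(bytearray(raw))


ok = is_type_consistent(normalizing_hash, "hello")
bad = is_type_consistent(type_sensitive_hash, "hello")
```

Normalizing every input to bytes before hashing is the simplest way to pass this kind of check.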
int_hash_val_avalanche_effect
- hbutils.encoding.int_hash_val.int_hash_val_avalanche_effect(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 100) → AvalancheEffectValidationResult
Validate avalanche effect: small input changes cause significant output changes.
Tests the avalanche effect property where a small change in input (single bit/character) should result in approximately 50% of the output bits changing. This is a key property of good hash functions.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
sample_size (int) – Number of random samples to test, defaults to 100
- Returns:
Avalanche effect validation results
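- Return type:
AvalancheEffectValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_avalanche_effect
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_avalanche_effect(simple_hash, sample_size=50)
>>> result.change_percentage > 40.0
True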
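One way to picture the measurement is sketched below: flip a single input bit, XOR the two hash values, and count the differing bits. This is a standalone approximation under stated assumptions, not the library's algorithm. The helper names, the 16-byte random inputs, the fixed seed, and the SHA-256-derived 32-bit hash are all choices made for this example.

```python
import hashlib
import random


def sha256_32(data: bytes) -> int:
    """32-bit hash taken from the first four bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of ``data`` with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def average_bit_changes(hash_func, sample_size: int = 200, seed: int = 0) -> float:
    """Average Hamming distance between hashes of inputs differing in one bit."""
    rng = random.Random(seed)
    total = 0
    for _ in range(sample_size):
        data = bytes(rng.getrandbits(8) for _ in range(16))
        flipped = flip_bit(data, rng.randrange(len(data) * 8))
        # XOR exposes the differing output bits; count the ones.
        total += bin(hash_func(data) ^ hash_func(flipped)).count("1")
    return total / sample_size


avg_changes = average_bit_changes(sha256_32)
change_percentage = avg_changes / 32 * 100  # 32-bit output assumed
```

For a well-mixed 32-bit hash the average should sit near 16 bits, which corresponds to the roughly 50% figure described above.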
int_hash_val_uniform_distribution
- hbutils.encoding.int_hash_val.int_hash_val_uniform_distribution(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 10000) → UniformDistributionValidationResult
Validate uniform distribution of hash outputs.
Tests whether the hash function produces uniformly distributed output values across the hash space. Divides the hash space into buckets and checks if hash values are evenly distributed.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
sample_size (int) – Number of random samples to generate and hash, defaults to 10000
- Returns:
Uniform distribution validation results
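- Return type:
UniformDistributionValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_uniform_distribution
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_uniform_distribution(simple_hash, sample_size=1000)
>>> result.uniformity_score > 0.95
True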
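The bucketing idea can be sketched as follows. How hbutils derives uniformity_score is not documented here, so this example does not try to reproduce it; instead it reports a Pearson chi-square statistic as a stand-in. The bucket count of 16, the 32-bit hash space, and both helper names are assumptions made for illustration.

```python
import zlib
from typing import List


def bucket_counts(hashes: List[int], num_buckets: int = 16, bits: int = 32) -> List[int]:
    """Count how many hash values fall into each equal-width slice of the hash space."""
    width = (1 << bits) // num_buckets
    counts = [0] * num_buckets
    for h in hashes:
        # Clamp so the top of the range lands in the last bucket.
        counts[min(h // width, num_buckets - 1)] += 1
    return counts


def chi_square(counts: List[int]) -> float:
    """Pearson chi-square statistic against a perfectly even distribution."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


hashes = [zlib.crc32(str(i).encode("ascii")) & 0xFFFFFFFF for i in range(10000)]
counts = bucket_counts(hashes)
statistic = chi_square(counts)
```

A statistic of zero means every bucket holds exactly the expected count; larger values indicate a more uneven spread.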
int_hash_val_collision_resistance
- hbutils.encoding.int_hash_val.int_hash_val_collision_resistance(hash_func: str | Callable[[str | bytes | bytearray], int], sample_size: int = 100000) → CollisionResistanceValidationResult
Validate collision resistance.
Tests the hash function’s resistance to collisions by generating many random inputs and checking for duplicate hash values. A good hash function should have a very low collision rate.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
sample_size (int) – Number of random samples to test, defaults to 100000
- Returns:
Collision resistance validation results
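- Return type:
CollisionResistanceValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_collision_resistance
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_collision_resistance(simple_hash, sample_size=10000)
>>> result.collision_rate < 0.001
True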
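When reading the collision rate, it helps to know how many collisions even an ideal hash would produce. The sketch below applies the standard birthday-bound approximation, n(n-1)/2 colliding pairs divided by the size of the hash space. The function name expected_collisions is hypothetical, and the 32-bit width is an assumption carried over from the module note on avalanche scoring.

```python
def expected_collisions(sample_size: int, bits: int = 32) -> float:
    """Birthday-bound estimate of colliding pairs for an ideal ``bits``-wide hash."""
    pairs = sample_size * (sample_size - 1) / 2
    return pairs / 2 ** bits


at_default = expected_collisions(100_000)  # roughly 1.16 pairs
at_small = expected_collisions(10_000)     # roughly 0.012 pairs
```

So with the default sample_size of 100000, a perfectly random 32-bit hash is still expected to show about one collision; a nonzero count at that size is not by itself a sign of a weak function.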
int_hash_val_empty_input
- hbutils.encoding.int_hash_val.int_hash_val_empty_input(hash_func: str | Callable[[str | bytes | bytearray], int]) → EmptyInputValidationResult
Validate empty input handling.
Tests whether the hash function correctly handles empty inputs of different types (empty string, empty bytes, empty bytearray) and produces consistent results.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
- Returns:
Empty input validation results
- Return type:
EmptyInputValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_empty_input
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_empty_input(simple_hash)
>>> result.consistent_empty_hash
True
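As an illustration of what consistent empty-input handling looks like, the sketch below implements 32-bit FNV-1a and hashes the three empty forms. FNV-1a is used here only as a convenient, well-known example; nothing in this page says hbutils uses it. With no bytes to process, the function returns its offset basis for every empty input.

```python
from typing import Union

FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(data: Union[str, bytes, bytearray]) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of the input."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    value = FNV_OFFSET_BASIS
    for byte in bytes(data):
        # XOR the byte in, then multiply by the prime, keeping 32 bits.
        value = ((value ^ byte) * FNV_PRIME) & 0xFFFFFFFF
    return value


empty_hashes = [fnv1a_32(""), fnv1a_32(b""), fnv1a_32(bytearray())]
consistent_empty_hash = len(set(empty_hashes)) == 1
```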
int_hash_val_performance
- hbutils.encoding.int_hash_val.int_hash_val_performance(hash_func: str | Callable[[str | bytes | bytearray], int], data_sizes: List[int] | None = None) → PerformanceValidationResult
Validate performance characteristics.
Measures the hash function’s performance across different input sizes, calculating average hashing time and throughput in MB/s.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
data_sizes (List[int], optional) – List of data sizes (in bytes) to test, defaults to [100, 1000, 10000, 100000]
- Returns:
Performance validation results
- Return type:
PerformanceValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_performance
>>> def simple_hash(data):
...     return hash(data) & 0xFFFFFFFF
>>> result = int_hash_val_performance(simple_hash, data_sizes=[100, 1000])
>>> result.passed
True
>>> 100 in result.performance_data
True
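A throughput measurement of this kind might look like the sketch below. It is a simplified stand-in, not the library's benchmark: the function name measure_throughput, the iterations parameter, the metric keys avg_time and throughput_mbps, and the convention that 1 MB is 10^6 bytes are all assumptions for this example.

```python
import time
import zlib
from typing import Callable, Dict, List


def measure_throughput(hash_func: Callable[[bytes], int],
                       data_sizes: List[int],
                       iterations: int = 20) -> Dict[int, Dict[str, float]]:
    """Time ``hash_func`` on payloads of each size; report seconds and MB/s."""
    results = {}
    for size in data_sizes:
        payload = bytes(size)  # zero-filled buffer of the requested length
        start = time.perf_counter()
        for _ in range(iterations):
            hash_func(payload)
        elapsed = time.perf_counter() - start
        avg_time = elapsed / iterations
        # Guard against a zero reading from a coarse timer.
        throughput = size / avg_time / 1e6 if avg_time > 0 else float("inf")
        results[size] = {"avg_time": avg_time, "throughput_mbps": throughput}
    return results


performance_data = measure_throughput(zlib.crc32, [100, 1000, 10000])
```

Absolute numbers from such a loop depend heavily on the machine and interpreter, so they are best compared between hash functions measured in the same run.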
int_hash_val_comprehensive
- hbutils.encoding.int_hash_val.int_hash_val_comprehensive(hash_func: str | Callable[[str | bytes | bytearray], int]) → ComprehensiveValidationResult
Comprehensive validation of hash function properties.
Runs a complete suite of validation tests on the hash function, including: determinism, type consistency, avalanche effect, uniform distribution, collision resistance, empty input handling, and performance characteristics.
- Parameters:
hash_func (_HashFuncTyping) – The hash function to validate. Should accept str, bytes, or bytearray and return an integer hash value. Can be a string name or callable.
- Returns:
Comprehensive validation results
- Return type:
ComprehensiveValidationResult
- Example:
>>> from hbutils.encoding.int_hash_val import int_hash_val_comprehensive
>>> def simple_hash(data):
...     if isinstance(data, str):
...         data = data.encode('utf-8')
...     return hash(bytes(data)) & 0xFFFFFFFF
>>> result = int_hash_val_comprehensive(simple_hash)
>>> result.hash_function_name
'simple_hash'
>>> result.total_properties_tested
7