Source code for hbutils.encoding.int_hash_val

"""
This module provides comprehensive validation utilities for hash functions, including tests for:

- Determinism: Ensures consistent output for the same input
- Type consistency: Validates consistent hashing across different input types
- Avalanche effect: Measures how small input changes affect output
- Uniform distribution: Analyzes hash value distribution patterns
- Collision resistance: Tests for hash collisions
- Empty input handling: Validates behavior with empty inputs
- Performance characteristics: Measures hashing speed and throughput

The module is designed to validate integer-based hash functions that accept
string, bytes, or bytearray inputs and return integer hash values.

Example::
    >>> from hbutils.encoding import int_hash_val_comprehensive
    >>>
    >>> print(int_hash_val_comprehensive('xs'))  # validate existing hash functions
    ╔══════════════════════════════════════════════════════════════════════════════════════════════╗
    ║                          COMPREHENSIVE HASH FUNCTION VALIDATION REPORT                       ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║ Function Name:     xs               │ Overall Status:    PASS                                ║
    ║ Properties Tested: 7                │ Properties Passed: 7    (100.0%)                       ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║                                    PROPERTY STATUS                                           ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║ ✓ Determinism                                   │ PASS                                       ║
    ║ ✓ Type Consistency                              │ PASS                                       ║
    ║ ✓ Avalanche Effect                              │ PASS       | Avalanche Effect:       42.9% ║
    ║ ✓ Uniform Distribution                          │ PASS       | Uniformity Score:       0.996 ║
    ║ ✓ Collision Resistance                          │ PASS       | Collision Rate:        0.0000 ║
    ║ ✓ Empty Input                                   │ PASS                                       ║
    ║ ✓ Performance                                   │ PASS       | Avg Throughput:      4.6 MB/s ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║                                   RECOMMENDATIONS                                            ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║ ✓ Hash function meets all validation criteria - suitable for production use                  ║
    ╚══════════════════════════════════════════════════════════════════════════════════════════════╝
    DETAILED ANALYSIS:
    • All validation tests passed successfully
    • Hash function demonstrates good cryptographic properties
    • Suitable for general-purpose hashing applications
    >>>
    >>> def basic_good_hash(data) -> int:
    ...     # Convert all input types to bytes
    ...     if isinstance(data, str):
    ...         data = data.encode('utf-8')
    ...     elif isinstance(data, bytearray):
    ...         data = bytes(data)
    ...     hash_val = 0x811c9dc5  # FNV offset basis (32-bit)
    ...     for byte in data:
    ...         # Simple polynomial hash variant
    ...         hash_val = ((hash_val * 33) ^ byte) & 0xffffffff
    ...         # Add some bit mixing
    ...         hash_val ^= hash_val >> 16
    ...         hash_val = (hash_val * 0x85ebca6b) & 0xffffffff
    ...         hash_val ^= hash_val >> 13
    ...     return hash_val & 0xffffffff
    ...
    >>> print(int_hash_val_comprehensive(basic_good_hash))
    ╔══════════════════════════════════════════════════════════════════════════════════════════════╗
    ║                          COMPREHENSIVE HASH FUNCTION VALIDATION REPORT                       ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║ Function Name:     basic_good_hash  │ Overall Status:    PASS                                ║
    ║ Properties Tested: 7                │ Properties Passed: 7    (100.0%)                       ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║                                    PROPERTY STATUS                                           ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║ ✓ Determinism                                   │ PASS                                       ║
    ║ ✓ Type Consistency                              │ PASS                                       ║
    ║ ✓ Avalanche Effect                              │ PASS       | Avalanche Effect:       50.5% ║
    ║ ✓ Uniform Distribution                          │ PASS       | Uniformity Score:       0.996 ║
    ║ ✓ Collision Resistance                          │ PASS       | Collision Rate:        0.0000 ║
    ║ ✓ Empty Input                                   │ PASS                                       ║
    ║ ✓ Performance                                   │ PASS       | Avg Throughput:      2.5 MB/s ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║                                   RECOMMENDATIONS                                            ║
    ╠══════════════════════════════════════════════════════════════════════════════════════════════╣
    ║ ✓ Hash function meets all validation criteria - suitable for production use                  ║
    ╚══════════════════════════════════════════════════════════════════════════════════════════════╝
    DETAILED ANALYSIS:
    • All validation tests passed successfully
    • Hash function demonstrates good cryptographic properties
    • Suitable for general-purpose hashing applications
"""

import logging
import os
import random
import string
import time
from dataclasses import dataclass
from typing import Union, List, Dict, Any, Tuple

from .int_hash import _IntHashTyping, _INT_HASH_FUNCS

__all__ = [
    'int_hash_val_determinism',
    'int_hash_val_type_consistency',
    'int_hash_val_avalanche_effect',
    'int_hash_val_uniform_distribution',
    'int_hash_val_collision_resistance',
    'int_hash_val_empty_input',
    'int_hash_val_performance',
    'int_hash_val_comprehensive',
    'DeterminismValidationResult',
    'TypeConsistencyValidationResult',
    'AvalancheEffectValidationResult',
    'UniformDistributionValidationResult',
    'CollisionResistanceValidationResult',
    'EmptyInputValidationResult',
    'PerformanceValidationResult',
    'ComprehensiveValidationResult',
]

_HashFuncTyping = Union[str, _IntHashTyping]



[docs]
@dataclass
class DeterminismValidationResult:
    """
    Results from determinism validation test.

    :param passed: Whether the determinism test passed
    :type passed: bool
    :param failed_cases: List of test cases that failed determinism check
    :type failed_cases: List[str]
    :param total_tested: Total number of test cases evaluated
    :type total_tested: int
    :param failed_count: Number of test cases that failed
    :type failed_count: int
    """
    passed: bool
    failed_cases: List[str]
    total_tested: int
    failed_count: int




[docs]
@dataclass
class TypeConsistencyValidationResult:
    """
    Results from type consistency validation test.

    :param passed: Whether the type consistency test passed
    :type passed: bool
    :param failed_cases: List of test cases that failed type consistency check
    :type failed_cases: List[str]
    :param total_tested: Total number of test cases evaluated
    :type total_tested: int
    :param failed_count: Number of test cases that failed
    :type failed_count: int
    :param consistent_hashes: Dictionary mapping test strings to their consistent hash values
    :type consistent_hashes: Dict[str, int]
    """
    passed: bool
    failed_cases: List[str]
    total_tested: int
    failed_count: int
    consistent_hashes: Dict[str, int]




[docs]
@dataclass
class AvalancheEffectValidationResult:
    """
    Results from avalanche effect validation test.

    :param passed: Whether the avalanche effect test passed
    :type passed: bool
    :param avg_bit_changes: Average number of bits changed across all comparisons
    :type avg_bit_changes: float
    :param change_percentage: Percentage of bits changed (avg_bit_changes / total_bits * 100)
    :type change_percentage: float
    :param total_comparisons: Total number of hash comparisons performed
    :type total_comparisons: int
    :param bit_changes_list: List of bit changes for each comparison
    :type bit_changes_list: List[int]
    :param min_changes: Minimum number of bits changed in any comparison
    :type min_changes: int
    :param max_changes: Maximum number of bits changed in any comparison
    :type max_changes: int
    """
    passed: bool
    avg_bit_changes: float
    change_percentage: float
    total_comparisons: int
    bit_changes_list: List[int]
    min_changes: int
    max_changes: int




[docs]
@dataclass
class UniformDistributionValidationResult:
    """
    Results from uniform distribution validation test.

    :param passed: Whether the uniform distribution test passed
    :type passed: bool
    :param uniformity_score: Score indicating distribution uniformity (0-1, higher is better)
    :type uniformity_score: float
    :param bucket_stats: Statistics about bucket distribution
    :type bucket_stats: Dict[str, Any]
    :param sample_count: Number of samples used in the test
    :type sample_count: int
    :param buckets: List of counts for each bucket
    :type buckets: List[int]
    """
    passed: bool
    uniformity_score: float
    bucket_stats: Dict[str, Any]
    sample_count: int
    buckets: List[int]




[docs]
@dataclass
class CollisionResistanceValidationResult:
    """
    Results from collision resistance validation test.

    :param passed: Whether the collision resistance test passed
    :type passed: bool
    :param collision_count: Number of collisions detected
    :type collision_count: int
    :param collision_rate: Rate of collisions (collision_count / sample_size)
    :type collision_rate: float
    :param sample_size: Total number of samples tested
    :type sample_size: int
    :param unique_hashes: Number of unique hash values generated
    :type unique_hashes: int
    :param collision_pairs: List of (input, hash) tuples that collided
    :type collision_pairs: List[Tuple[str, int]]
    """
    passed: bool
    collision_count: int
    collision_rate: float
    sample_size: int
    unique_hashes: int
    collision_pairs: List[Tuple[str, int]]




[docs]
@dataclass
class EmptyInputValidationResult:
    """
    Results from empty input validation test.

    :param passed: Whether the empty input test passed
    :type passed: bool
    :param hash_results: List of hash values for empty inputs
    :type hash_results: List[int]
    :param consistent_empty_hash: Whether all empty inputs produced the same hash
    :type consistent_empty_hash: bool
    :param error_cases: List of (input_type, error_message) tuples for failed cases
    :type error_cases: List[Tuple[str, str]]
    :param empty_hash_value: The consistent hash value for empty inputs, or None if inconsistent
    :type empty_hash_value: Union[int, None]
    """
    passed: bool
    hash_results: List[int]
    consistent_empty_hash: bool
    error_cases: List[Tuple[str, str]]
    empty_hash_value: Union[int, None]




[docs]
@dataclass
class PerformanceValidationResult:
    """
    Results from performance validation test.

    :param passed: Whether the performance test passed (completed without errors)
    :type passed: bool
    :param performance_data: Dictionary mapping data sizes to performance metrics
    :type performance_data: Dict[int, Dict[str, float]]
    :param tested_sizes: List of data sizes that were tested
    :type tested_sizes: List[int]
    :param completed_sizes: List of data sizes that completed successfully
    :type completed_sizes: List[int]
    """
    passed: bool
    performance_data: Dict[int, Dict[str, float]]
    tested_sizes: List[int]
    completed_sizes: List[int]




[docs]
@dataclass
class ComprehensiveValidationResult:
    """
    Results from comprehensive validation test.

    :param passed: Whether all validation tests passed
    :type passed: bool
    :param not_passed_properties: List of property names that failed validation
    :type not_passed_properties: List[str]
    :param hash_function_name: Name of the hash function being validated
    :type hash_function_name: str
    :param total_properties_tested: Total number of properties tested
    :type total_properties_tested: int
    :param properties_passed: Number of properties that passed validation
    :type properties_passed: int
    :param determinism: Results from determinism validation
    :type determinism: DeterminismValidationResult
    :param type_consistency: Results from type consistency validation
    :type type_consistency: TypeConsistencyValidationResult
    :param avalanche_effect: Results from avalanche effect validation
    :type avalanche_effect: AvalancheEffectValidationResult
    :param uniform_distribution: Results from uniform distribution validation
    :type uniform_distribution: UniformDistributionValidationResult
    :param collision_resistance: Results from collision resistance validation
    :type collision_resistance: CollisionResistanceValidationResult
    :param empty_input: Results from empty input validation
    :type empty_input: EmptyInputValidationResult
    :param performance: Results from performance validation
    :type performance: PerformanceValidationResult
    """
    passed: bool
    not_passed_properties: List[str]
    hash_function_name: str
    total_properties_tested: int
    properties_passed: int
    determinism: DeterminismValidationResult
    type_consistency: TypeConsistencyValidationResult
    avalanche_effect: AvalancheEffectValidationResult
    uniform_distribution: UniformDistributionValidationResult
    collision_resistance: CollisionResistanceValidationResult
    empty_input: EmptyInputValidationResult
    performance: PerformanceValidationResult


[docs]
    def __str__(self) -> str:
        """
        Generate a formatted string representation of the validation results.

        :return: Formatted validation report
        :rtype: str
        """
        overall_status = "PASS" if self.passed else "FAIL"
        success_rate = (
                self.properties_passed / self.total_properties_tested * 100) if self.total_properties_tested > 0 else 0

        # Property status summary
        properties = [
            ("Determinism", self.determinism),
            ("Type Consistency", self.type_consistency),
            ("Avalanche Effect", self.avalanche_effect),
            ("Uniform Distribution", self.uniform_distribution),
            ("Collision Resistance", self.collision_resistance),
            ("Empty Input", self.empty_input),
            ("Performance", self.performance),
        ]

        prop_lines = []
        for prop_name, prop in properties:
            status_symbol = "✓" if prop.passed else "✗"
            status_text = "PASS" if prop.passed else "FAIL"
            if isinstance(prop, AvalancheEffectValidationResult):
                prop_lines.append(
                    f"║ {status_symbol} {prop_name:<45} │ {status_text:<10} | Avalanche Effect: {self.avalanche_effect.change_percentage:>10.1f}% ║")
            elif isinstance(prop, UniformDistributionValidationResult):
                prop_lines.append(
                    f"║ {status_symbol} {prop_name:<45} │ {status_text:<10} | Uniformity Score: {self.uniform_distribution.uniformity_score:>11.3f} ║")
            elif isinstance(prop, CollisionResistanceValidationResult):
                prop_lines.append(
                    f"║ {status_symbol} {prop_name:<45} │ {status_text:<10} | Collision Rate: {self.collision_resistance.collision_rate:>13.4f} ║")
            elif isinstance(prop, PerformanceValidationResult):
                avg_throughput = sum(self.performance.performance_data[size]['throughput_mbps']
                                     for size in self.performance.completed_sizes) / len(
                    self.performance.completed_sizes)
                prop_lines.append(
                    f"║ {status_symbol} {prop_name:<45} │ {status_text:<10} | Avg Throughput: {avg_throughput:>8.1f} MB/s ║")
            else:
                prop_lines.append(f"║ {status_symbol} {prop_name:<45} │ {status_text:<42} ║")

        # Key metrics summary
        metrics_lines = [
            f"║ Avalanche Effect:     {self.avalanche_effect.change_percentage:>6.1f}% │ Uniformity Score:    {self.uniform_distribution.uniformity_score:>6.3f} ║",
            f"║ Collision Rate:       {self.collision_resistance.collision_rate:>6.4f} │ Empty Hash Consistent: {str(self.empty_input.consistent_empty_hash):>5} ║",
        ]

        return f"""
╔══════════════════════════════════════════════════════════════════════════════════════════════╗
║                          COMPREHENSIVE HASH FUNCTION VALIDATION REPORT                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ Function Name:     {self.hash_function_name:<16} │ Overall Status:    {overall_status:<33}   ║
║ Properties Tested: {self.total_properties_tested:<16} │ Properties Passed: {self.properties_passed:<4} ({success_rate:>5.1f}%)                       ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                    PROPERTY STATUS                                           ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
{os.linesep.join(prop_lines)}
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║                                   RECOMMENDATIONS                                            ║
╠══════════════════════════════════════════════════════════════════════════════════════════════╣
║ {self._get_recommendation():<92} ║
╚══════════════════════════════════════════════════════════════════════════════════════════════╝

{self._get_detailed_analysis()}""".strip()


    def _get_recommendation(self) -> str:
        """
        Generate a recommendation based on validation results.

        :return: Recommendation string
        :rtype: str
        """
        if self.passed:
            return "✓ Hash function meets all validation criteria - suitable for production use"
        elif self.determinism.passed and self.collision_resistance.passed:
            return "⚠ Hash function has minor issues - review failed properties before use"
        else:
            return "✗ Hash function has significant issues - not recommended for use"

    def _get_detailed_analysis(self) -> str:
        """
        Generate detailed analysis of validation results.

        :return: Detailed analysis string
        :rtype: str
        """
        analysis = []

        if not self.determinism.passed:
            analysis.append("• CRITICAL: Determinism failure - hash function produces inconsistent results")

        if not self.type_consistency.passed:
            analysis.append("• WARNING: Type inconsistency - different input types produce different hashes")

        if not self.avalanche_effect.passed:
            analysis.append(
                f"• WARNING: Poor avalanche effect ({self.avalanche_effect.change_percentage:.1f}%) - weak cryptographic properties")

        if not self.uniform_distribution.passed:
            analysis.append(
                f"• WARNING: Poor distribution uniformity ({self.uniform_distribution.uniformity_score:.3f}) - potential bias")

        if not self.collision_resistance.passed:
            analysis.append(
                f"• CRITICAL: High collision rate ({self.collision_resistance.collision_rate:.4f}) - security risk")

        if not self.empty_input.passed:
            analysis.append("• WARNING: Inconsistent empty input handling - may cause unexpected behavior")

        if not self.performance.passed:
            analysis.append("• INFO: Performance test issues - check implementation efficiency")

        if not analysis:
            analysis.append("• All validation tests passed successfully")
            analysis.append("• Hash function demonstrates good cryptographic properties")
            analysis.append("• Suitable for general-purpose hashing applications")

        return os.linesep.join(["DETAILED ANALYSIS:", *analysis])



def _norm_func(hash_func: _HashFuncTyping) -> Tuple[str, _IntHashTyping]:
    """
    Normalize hash function input to a tuple of (name, function).

    Converts string hash function names to their corresponding function objects
    from the registry, or extracts the function name from callable objects.

    :param hash_func: Either a string name of a registered hash function or a callable hash function
    :type hash_func: Union[str, _IntHashTyping]

    :return: A tuple containing (function_name, function_object)
    :rtype: Tuple[str, _IntHashTyping]

    Example::
        >>> def my_hash(data):
        ...     return hash(data)
        >>> name, func = _norm_func(my_hash)
        >>> name
        'my_hash'
        >>> name, func = _norm_func('xxhash32')  # If registered
        >>> name
        'xxhash32'
    """
    if isinstance(hash_func, str):
        return hash_func, _INT_HASH_FUNCS[hash_func]
    else:
        return hash_func.__name__, hash_func



[docs]
def int_hash_val_determinism(hash_func: _HashFuncTyping,
                             test_data: List[Union[str, bytes, bytearray]]) -> DeterminismValidationResult:
    """
    Validate determinism: same input produces same output.

    Tests whether the hash function consistently produces the same output
    for identical inputs across multiple invocations.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping
    :param test_data: List of test inputs to validate determinism
    :type test_data: List[Union[str, bytes, bytearray]]

    :return: Determinism validation results
    :rtype: DeterminismValidationResult

    Example::
        >>> def simple_hash(data):
        ...     return hash(data) & 0xFFFFFFFF
        >>> result = int_hash_val_determinism(simple_hash, ["test", b"data"])
        >>> result.passed
        True
    """
    _, hash_func = _norm_func(hash_func)
    logging.info("Starting determinism validation")

    passed = True
    failed_cases = []

    for data in test_data:
        hash_results = [hash_func(data) for _ in range(10)]
        is_deterministic = len(set(hash_results)) == 1

        if not is_deterministic:
            logging.warning(f"Determinism validation failed for input: {repr(data)[:50]}")
            passed = False
            failed_cases.append(repr(data)[:50])
        else:
            logging.debug(f"Determinism OK: {repr(data)[:50]} -> {hash_results[0]:08x}")

    results = DeterminismValidationResult(
        passed=passed,
        failed_cases=failed_cases,
        total_tested=len(test_data),
        failed_count=len(failed_cases)
    )

    if passed:
        logging.info(f"Determinism validation passed for all {len(test_data)} test cases")
    else:
        logging.warning(f"Determinism validation failed for {len(failed_cases)}/{len(test_data)} cases")

    return results




[docs]
def int_hash_val_type_consistency(hash_func: _HashFuncTyping) -> TypeConsistencyValidationResult:
    """
    Validate type consistency: same content in different types should produce same hash.

    Tests whether the hash function produces identical hash values for the same
    content when provided as string, bytes, or bytearray types.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping

    :return: Type consistency validation results
    :rtype: TypeConsistencyValidationResult

    Example::
        >>> def simple_hash(data):
        ...     if isinstance(data, str):
        ...         data = data.encode('utf-8')
        ...     return hash(bytes(data)) & 0xFFFFFFFF
        >>> result = int_hash_val_type_consistency(simple_hash)
        >>> result.passed
        True
    """
    _, hash_func = _norm_func(hash_func)
    logging.info("Starting type consistency validation")

    test_strings = ["hello", "world", "test", ""]
    passed = True
    failed_cases = []
    consistent_hashes = {}

    for s in test_strings:
        try:
            str_hash = hash_func(s)
            bytes_hash = hash_func(s.encode('utf-8'))
            bytearray_hash = hash_func(bytearray(s.encode('utf-8')))

            if str_hash != bytes_hash or bytes_hash != bytearray_hash:
                logging.warning(f"Type consistency failed for: {repr(s)}")
                logging.warning(f"  str: {str_hash:08x}, bytes: {bytes_hash:08x}, bytearray: {bytearray_hash:08x}")
                passed = False
                failed_cases.append(s)
            else:
                logging.debug(f"Type consistency OK: {repr(s)[:30]} -> {str_hash:08x}")
                consistent_hashes[s] = str_hash

        except Exception as e:
            logging.warning(f"Exception during type consistency test for {repr(s)}: {e}")
            passed = False
            failed_cases.append(s)

    results = TypeConsistencyValidationResult(
        passed=passed,
        failed_cases=failed_cases,
        total_tested=len(test_strings),
        failed_count=len(failed_cases),
        consistent_hashes=consistent_hashes
    )

    if passed:
        logging.info(f"Type consistency validation passed for all {len(test_strings)} test cases")
    else:
        logging.warning(f"Type consistency validation failed for {len(failed_cases)}/{len(test_strings)} cases")

    return results




[docs]
def int_hash_val_avalanche_effect(hash_func: _HashFuncTyping,
                                  sample_size: int = 100) -> AvalancheEffectValidationResult:
    """
    Validate avalanche effect: small input changes cause significant output changes.

    Tests the avalanche effect property where a small change in input (single bit/character)
    should result in approximately 50% of the output bits changing. This is a key property
    of good hash functions.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping
    :param sample_size: Number of random samples to test, defaults to 100
    :type sample_size: int

    :return: Avalanche effect validation results
    :rtype: AvalancheEffectValidationResult

    Example::
        >>> def simple_hash(data):
        ...     return hash(data) & 0xFFFFFFFF
        >>> result = int_hash_val_avalanche_effect(simple_hash, sample_size=50)
        >>> result.change_percentage > 40.0
        True
    """
    _, hash_func = _norm_func(hash_func)
    logging.info(f"Starting avalanche effect validation with {sample_size} samples")

    def hamming_distance(a: int, b: int) -> int:
        """
        Calculate Hamming distance between two integers.

        :param a: First integer
        :type a: int
        :param b: Second integer
        :type b: int

        :return: Number of differing bits
        :rtype: int
        """
        return bin(a ^ b).count('1')

    total_bit_changes = 0
    total_comparisons = 0
    bit_changes_list = []

    for i in range(sample_size):
        # Generate random string
        original = ''.join(random.choices(string.ascii_letters + string.digits, k=20))

        # Randomly modify one character
        pos = random.randint(0, len(original) - 1)
        chars = list(original)
        chars[pos] = random.choice(string.ascii_letters + string.digits)
        modified = ''.join(chars)

        if original != modified:
            try:
                hash1 = hash_func(original)
                hash2 = hash_func(modified)

                bit_changes = hamming_distance(hash1, hash2)
                bit_changes_list.append(bit_changes)
                total_bit_changes += bit_changes
                total_comparisons += 1

                logging.debug(f"Sample {i}: {bit_changes}/32 bits changed")

            except Exception as e:
                logging.warning(f"Exception during avalanche test sample {i}: {e}")

    if total_comparisons == 0:
        logging.warning("No valid comparisons for avalanche effect test")
        return AvalancheEffectValidationResult(
            passed=False,
            avg_bit_changes=0.0,
            change_percentage=0.0,
            total_comparisons=0,
            bit_changes_list=[],
            min_changes=0,
            max_changes=0
        )

    avg_bit_changes = total_bit_changes / total_comparisons
    change_percentage = avg_bit_changes / 32 * 100  # Assuming 32-bit output

    # Good avalanche effect should change ~50% of bits
    passed = change_percentage > 40.0

    results = AvalancheEffectValidationResult(
        passed=passed,
        avg_bit_changes=avg_bit_changes,
        change_percentage=change_percentage,
        total_comparisons=total_comparisons,
        bit_changes_list=bit_changes_list,
        min_changes=min(bit_changes_list) if bit_changes_list else 0,
        max_changes=max(bit_changes_list) if bit_changes_list else 0
    )

    logging.info(f"Average bit changes: {avg_bit_changes:.2f}/32 ({change_percentage:.1f}%)")

    if passed:
        logging.info(f"Avalanche effect validation passed: {change_percentage:.1f}%")
    else:
        logging.warning(f"Avalanche effect validation failed: {change_percentage:.1f}% (expected >40%)")

    return results




[docs]
def int_hash_val_uniform_distribution(hash_func: _HashFuncTyping,
                                      sample_size: int = 10000) -> UniformDistributionValidationResult:
    """
    Validate uniform distribution of hash outputs.

    Tests whether the hash function produces uniformly distributed output values
    across the hash space. Divides the hash space into buckets and checks if
    hash values are evenly distributed.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping
    :param sample_size: Number of random samples to generate and hash, defaults to 10000
    :type sample_size: int

    :return: Uniform distribution validation results
    :rtype: UniformDistributionValidationResult

    Example::
        >>> def simple_hash(data):
        ...     return hash(data) & 0xFFFFFFFF
        >>> result = int_hash_val_uniform_distribution(simple_hash, sample_size=1000)
        >>> result.uniformity_score > 0.95
        True
    """
    _, hash_func = _norm_func(hash_func)
    logging.info(f"Starting uniform distribution validation with {sample_size} samples")

    # Generate random samples
    hashes = []
    for i in range(sample_size):
        data = ''.join(random.choices(string.ascii_letters + string.digits,
                                      k=random.randint(1, 50)))
        try:
            hashes.append(hash_func(data))
        except Exception as e:
            logging.warning(f"Exception during distribution test sample {i}: {e}")

    if not hashes:
        logging.warning("No valid hash samples for distribution test")
        return UniformDistributionValidationResult(
            passed=False,
            uniformity_score=0.0,
            bucket_stats={},
            sample_count=0,
            buckets=[]
        )

    # Calculate distribution statistics
    bucket_count = 256  # Divide hash values into 256 buckets
    buckets = [0] * bucket_count

    for h in hashes:
        bucket = h % bucket_count
        buckets[bucket] += 1

    # Calculate uniformity metrics
    expected = len(hashes) / bucket_count
    max_bucket = max(buckets)
    min_bucket = min(buckets)
    uniformity_score = 1 - (max_bucket - min_bucket) / len(hashes)

    # Good distribution should have uniformity > 0.95
    passed = uniformity_score > 0.95

    results = UniformDistributionValidationResult(
        passed=passed,
        uniformity_score=uniformity_score,
        bucket_stats={
            'max_bucket': max_bucket,
            'min_bucket': min_bucket,
            'expected': expected,
            'bucket_count': bucket_count
        },
        sample_count=len(hashes),
        buckets=buckets
    )

    logging.info(f"Max bucket: {max_bucket}, Min bucket: {min_bucket}, Expected: {expected:.1f}")
    logging.info(f"Uniformity score: {uniformity_score:.3f}")

    if passed:
        logging.info(f"Uniform distribution validation passed: {uniformity_score:.3f}")
    else:
        logging.warning(f"Uniform distribution validation failed: {uniformity_score:.3f} (expected >0.95)")

    return results




[docs]
def int_hash_val_collision_resistance(hash_func: _HashFuncTyping,
                                      sample_size: int = 100000) -> CollisionResistanceValidationResult:
    """
    Validate collision resistance.

    Tests the hash function's resistance to collisions by generating many random
    inputs and checking for duplicate hash values. A good hash function should have
    a very low collision rate.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping
    :param sample_size: Number of random samples to test, defaults to 100000
    :type sample_size: int

    :return: Collision resistance validation results
    :rtype: CollisionResistanceValidationResult

    Example::
        >>> def simple_hash(data):
        ...     return hash(data) & 0xFFFFFFFF
        >>> result = int_hash_val_collision_resistance(simple_hash, sample_size=10000)
        >>> result.collision_rate < 0.001
        True
    """
    _, hash_func = _norm_func(hash_func)
    logging.info(f"Starting collision resistance validation with {sample_size} samples")

    hashes = set()
    collisions = 0
    collision_pairs = []

    for i in range(sample_size):
        data = f"test_string_{i}_{random.randint(0, 1000000)}"
        try:
            h = hash_func(data)

            if h in hashes:
                collisions += 1
                collision_pairs.append((data, h))
                logging.debug(f"Collision detected: {data} -> {h:08x}")
            hashes.add(h)

        except Exception as e:
            logging.warning(f"Exception during collision test sample {i}: {e}")

    collision_rate = collisions / sample_size if sample_size > 0 else 1.0

    # Good collision resistance should have rate < 0.001
    passed = collision_rate < 0.001

    results = CollisionResistanceValidationResult(
        passed=passed,
        collision_count=collisions,
        collision_rate=collision_rate,
        sample_size=sample_size,
        unique_hashes=len(hashes),
        collision_pairs=collision_pairs[:10]  # Store first 10 collisions for analysis
    )

    logging.info(f"Samples: {sample_size}, Collisions: {collisions}, Rate: {collision_rate:.6f}")

    if passed:
        logging.info(f"Collision resistance validation passed: {collision_rate:.6f}")
    else:
        logging.warning(f"Collision resistance validation failed: {collision_rate:.6f} (expected <0.001)")

    return results




[docs]
def int_hash_val_empty_input(hash_func: _HashFuncTyping) -> EmptyInputValidationResult:
    """
    Validate empty input handling.

    Tests whether the hash function correctly handles empty inputs of different types
    (empty string, empty bytes, empty bytearray) and produces consistent results.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping

    :return: Empty input validation results
    :rtype: EmptyInputValidationResult

    Example::
        >>> def simple_hash(data):
        ...     if isinstance(data, str):
        ...         data = data.encode('utf-8')
        ...     return hash(bytes(data)) & 0xFFFFFFFF
        >>> result = int_hash_val_empty_input(simple_hash)
        >>> result.consistent_empty_hash
        True
    """
    _, hash_func = _norm_func(hash_func)
    logging.info("Starting empty input validation")

    empty_inputs = [
        ("str", ""),
        ("bytes", b""),
        ("bytearray", bytearray())
    ]

    results_list = []
    passed = True
    error_cases = []

    for input_type, empty in empty_inputs:
        try:
            result = hash_func(empty)
            results_list.append(result)
            logging.debug(f"{input_type}() -> {result:08x}")
        except Exception as e:
            logging.warning(f"{input_type}() -> Error: {e}")
            passed = False
            error_cases.append((input_type, str(e)))

    # Check if all empty inputs produce same result
    if results_list and len(set(results_list)) != 1:
        logging.warning(f"Different empty inputs produce different hash values - "
                        f"{list(zip(empty_inputs, results_list))!r}")
        passed = False

    results = EmptyInputValidationResult(
        passed=passed,
        hash_results=results_list,
        consistent_empty_hash=len(set(results_list)) == 1 if results_list else False,
        error_cases=error_cases,
        empty_hash_value=results_list[0] if results_list else None
    )

    if passed and results_list:
        logging.info(f"Empty input validation passed, consistent hash: {results_list[0]:08x}")
    else:
        logging.warning("Empty input validation failed")

    return results




[docs]
def int_hash_val_performance(hash_func: _HashFuncTyping, data_sizes: List[int] = None) -> PerformanceValidationResult:
    """
    Validate performance characteristics.

    Measures the hash function's performance across different input sizes,
    calculating average hashing time and throughput in MB/s.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping
    :param data_sizes: List of data sizes (in bytes) to test, defaults to [100, 1000, 10000, 100000]
    :type data_sizes: List[int], optional

    :return: Performance validation results
    :rtype: PerformanceValidationResult

    Example::
        >>> def simple_hash(data):
        ...     return hash(data) & 0xFFFFFFFF
        >>> result = int_hash_val_performance(simple_hash, data_sizes=[100, 1000])
        >>> result.passed
        True
        >>> 100 in result.performance_data
        True
    """
    _, hash_func = _norm_func(hash_func)
    if data_sizes is None:
        data_sizes = [100, 1000, 10000, 100000]

    logging.info(f"Starting performance validation with data sizes: {data_sizes}")

    passed = True
    performance_data = {}

    for size in data_sizes:
        logging.debug(f"Testing performance for size: {size} bytes")

        # Generate test data
        test_data = 'a' * size

        # Warmup
        try:
            for _ in range(10):
                hash_func(test_data)
        except Exception as e:
            logging.warning(f"Warmup failed for size {size}: {e}")
            passed = False
            continue

        # Actual test
        iterations = max(1, 10000 // size)  # Adjust iterations based on data size

        try:
            start_time = time.perf_counter()
            for _ in range(iterations):
                hash_func(test_data)
            end_time = time.perf_counter()

            total_time = end_time - start_time
            avg_time = total_time / iterations
            throughput = size / avg_time / (1024 * 1024)  # MB/s

            performance_data[size] = {
                'avg_time_seconds': avg_time,
                'avg_time_microseconds': avg_time * 1000000,
                'throughput_mbps': throughput,
                'iterations': iterations,
                'total_time': total_time
            }

            logging.info(f"Size: {size:6d} bytes, Avg time: {avg_time * 1000000:8.2f} μs, "
                         f"Throughput: {throughput:8.2f} MB/s")

        except Exception as e:
            logging.warning(f"Performance test failed for size {size}: {e}")
            passed = False

    # Performance is considered "passed" if all tests completed without errors
    # Individual performance metrics should be evaluated by the caller

    results = PerformanceValidationResult(
        passed=passed,
        performance_data=performance_data,
        tested_sizes=data_sizes,
        completed_sizes=list(performance_data.keys())
    )

    if passed:
        logging.info(f"Performance validation completed for {len(performance_data)} sizes")
    else:
        logging.warning("Performance validation encountered errors")

    return results




[docs]
def int_hash_val_comprehensive(hash_func: _HashFuncTyping) -> ComprehensiveValidationResult:
    """
    Comprehensive validation of hash function properties.

    Runs a complete suite of validation tests on the hash function, including:
    determinism, type consistency, avalanche effect, uniform distribution,
    collision resistance, empty input handling, and performance characteristics.

    :param hash_func: The hash function to validate. Should accept str, bytes, or bytearray
                      and return an integer hash value. Can be a string name or callable.
    :type hash_func: _HashFuncTyping

    :return: Comprehensive validation results
    :rtype: ComprehensiveValidationResult

    Example::
        >>> def simple_hash(data):
        ...     if isinstance(data, str):
        ...         data = data.encode('utf-8')
        ...     return hash(bytes(data)) & 0xFFFFFFFF
        >>> result = int_hash_val_comprehensive(simple_hash)
        >>> result.hash_function_name
        'simple_hash'
        >>> result.total_properties_tested
        7
    """
    hash_name, hash_func = _norm_func(hash_func)
    logging.info(f"Starting comprehensive validation for hash function: {hash_name}")

    # Test data for various validations
    test_data = [
        "hello",
        "world",
        "test_string",
        "",
        "a" * 1000,
        b"binary_data",
        bytearray(b"bytearray_data")
    ]

    not_passed_properties = []

    # Run all validation tests
    try:
        determinism_result = int_hash_val_determinism(hash_func, test_data)
        if not determinism_result.passed:
            not_passed_properties.append('determinism')
    except Exception as e:
        logging.warning(f"Exception during determinism validation: {e}")
        determinism_result = DeterminismValidationResult(False, [], 0, 0)
        not_passed_properties.append('determinism')

    try:
        type_consistency_result = int_hash_val_type_consistency(hash_func)
        if not type_consistency_result.passed:
            not_passed_properties.append('type_consistency')
    except Exception as e:
        logging.warning(f"Exception during type_consistency validation: {e}")
        type_consistency_result = TypeConsistencyValidationResult(False, [], 0, 0, {})
        not_passed_properties.append('type_consistency')

    try:
        avalanche_effect_result = int_hash_val_avalanche_effect(hash_func)
        if not avalanche_effect_result.passed:
            not_passed_properties.append('avalanche_effect')
    except Exception as e:
        logging.warning(f"Exception during avalanche_effect validation: {e}")
        avalanche_effect_result = AvalancheEffectValidationResult(False, 0.0, 0.0, 0, [], 0, 0)
        not_passed_properties.append('avalanche_effect')

    try:
        uniform_distribution_result = int_hash_val_uniform_distribution(hash_func)
        if not uniform_distribution_result.passed:
            not_passed_properties.append('uniform_distribution')
    except Exception as e:
        logging.warning(f"Exception during uniform_distribution validation: {e}")
        uniform_distribution_result = UniformDistributionValidationResult(False, 0.0, {}, 0, [])
        not_passed_properties.append('uniform_distribution')

    try:
        collision_resistance_result = int_hash_val_collision_resistance(hash_func)
        if not collision_resistance_result.passed:
            not_passed_properties.append('collision_resistance')
    except Exception as e:
        logging.warning(f"Exception during collision_resistance validation: {e}")
        collision_resistance_result = CollisionResistanceValidationResult(False, 0, 0.0, 0, 0, [])
        not_passed_properties.append('collision_resistance')

    try:
        empty_input_result = int_hash_val_empty_input(hash_func)
        if not empty_input_result.passed:
            not_passed_properties.append('empty_input')
    except Exception as e:
        logging.warning(f"Exception during empty_input validation: {e}")
        empty_input_result = EmptyInputValidationResult(False, [], False, [], None)
        not_passed_properties.append('empty_input')

    try:
        performance_result = int_hash_val_performance(hash_func)
        if not performance_result.passed:
            not_passed_properties.append('performance')
    except Exception as e:
        logging.warning(f"Exception during performance validation: {e}")
        performance_result = PerformanceValidationResult(False, {}, [], [])
        not_passed_properties.append('performance')

    # Overall pass/fail determination
    overall_passed = len(not_passed_properties) == 0
    total_properties = 7
    properties_passed = total_properties - len(not_passed_properties)

    comprehensive_results = ComprehensiveValidationResult(
        passed=overall_passed,
        not_passed_properties=not_passed_properties,
        hash_function_name=hash_name,
        total_properties_tested=total_properties,
        properties_passed=properties_passed,
        determinism=determinism_result,
        type_consistency=type_consistency_result,
        avalanche_effect=avalanche_effect_result,
        uniform_distribution=uniform_distribution_result,
        collision_resistance=collision_resistance_result,
        empty_input=empty_input_result,
        performance=performance_result
    )

    if overall_passed:
        logging.info(f"Comprehensive validation PASSED for {hash_name} - all {total_properties} properties validated")
    else:
        logging.warning(f"Comprehensive validation FAILED for {hash_name} - "
                        f"{len(not_passed_properties)} properties failed: {not_passed_properties}")

    return comprehensive_results