hbutils.encoding.decode

Overview:

Functions for decoding binary data easily. This module provides utilities that decode binary data to strings by detecting the appropriate encoding automatically, with support for preferred encodings and fallback mechanisms.

auto_decode

hbutils.encoding.decode.auto_decode(data: bytes, encoding: str | None = None, prefers: List[str] | None = None) → str

Automatically decode binary data to a string, detecting the encoding along the way.

This function attempts to decode binary data using multiple strategies:

  1. If an encoding is explicitly specified, use it directly.
  2. Otherwise, try the preferred encodings in order.
  3. Fall back to the system default encoding.
  4. Use the chardet library to detect the encoding.

The function will try each encoding until one succeeds, keeping track of the best partial match in case all attempts fail.
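The strategy described above can be sketched with the standard library alone. This is a minimal illustration, not the hbutils implementation: the name sketch_auto_decode, the _DEFAULT_PREFERS constant, the use of sys.getdefaultencoding() as the "system default", and the rule for choosing the "best" error (largest start offset) are all assumptions made for the example. chardet is treated as optional and is consulted eagerly here for brevity.

```python
import sys
from typing import List, Optional

# Hypothetical constant mirroring the defaults documented for ``prefers``.
_DEFAULT_PREFERS = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'big5']


def sketch_auto_decode(data: bytes, encoding: Optional[str] = None,
                       prefers: Optional[List[str]] = None) -> str:
    """Illustrative fallback decoder; not the actual hbutils code."""
    # Strategy 1: an explicit encoding is used directly, with no fallback.
    if encoding is not None:
        return data.decode(encoding)

    # Strategies 2 and 3: preferred encodings first, then the system default
    # (assumed here to be sys.getdefaultencoding()).
    candidates = list(_DEFAULT_PREFERS if prefers is None else prefers)
    candidates.append(sys.getdefaultencoding())

    # Strategy 4: add chardet's guess as a last candidate, if it is installed.
    try:
        import chardet
    except ImportError:
        chardet = None
    if chardet is not None:
        guess = chardet.detect(data)['encoding']
        if guess:
            candidates.append(guess)

    best_error = None
    for name in candidates:
        try:
            return data.decode(name)
        except UnicodeDecodeError as err:
            # Remember the failure that got furthest into the data.
            if best_error is None or err.start > best_error.start:
                best_error = err
        except LookupError:
            continue  # codec name unknown to Python; skip it

    # The system default codec always exists, so at least one
    # UnicodeDecodeError has been recorded by the time we get here.
    raise best_error
```

Under these assumptions, sketch_auto_decode('晚上好'.encode('gbk')) fails on utf-8, succeeds on gbk, and returns '晚上好', while passing encoding='ascii' with non-ASCII bytes raises immediately without trying anything else.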

Parameters:
  • data (bytes) – Original binary data to be decoded.

  • encoding (Optional[str]) – Encoding to use. Defaults to None, in which case the encoding is detected automatically.

  • prefers (Optional[List[str]]) – Preferred encodings to try first. If None, uses default preferred encodings (utf-8, gbk, gb2312, gb18030, big5).

Returns:

Decoded string.

Return type:

str

Raises:

UnicodeDecodeError – If all decoding attempts fail. The error raised is the one from the attempt that decoded the furthest into the data.
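When handling this exception, the standard attributes of UnicodeDecodeError show how far decoding got. The snippet below is a small illustration that uses plain bytes.decode rather than auto_decode, so it makes no claim about which codec auto_decode would report; the sample bytes are made up for the example.

```python
data = b'abc\xff'  # three ASCII bytes followed by a byte invalid in UTF-8

try:
    data.decode('utf-8')
except UnicodeDecodeError as err:
    failure = err  # keep a reference; ``err`` is cleared after the block

print(failure.encoding)       # codec that was being tried
print(failure.start)          # index of the first undecodable byte
print(failure.reason)         # human-readable explanation
print(data[:failure.start])   # prefix that decoded cleanly
```

Here failure.start is 3, so data[:failure.start] is b'abc', the portion that was decoded successfully before the failure.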

Examples:

>>> auto_decode(b'kdsfjldsjflkdsmgds')
'kdsfjldsjflkdsmgds'
>>> auto_decode(b'\xd0\x94\xd0\xbe\xd0\xb1\xd1\x80\xd1\x8b\xd0\xb9 \xd0'
...             b'\xb2\xd0\xb5\xd1\x87\xd0\xb5\xd1\x80')
'Добрый вечер'
>>> auto_decode(b'\xa4\xb3\xa4\xf3\xa4\xd0\xa4\xf3\xa4\xcf')
'こんばんは'
>>> auto_decode(b'\xcd\xed\xc9\xcf\xba\xc3')
'晚上好'