hbutils.encoding.decode

Automatic binary decoding utilities.

This module provides helpers for decoding binary data into text by attempting multiple encodings. The primary entry point is auto_decode(): when an explicit encoding is provided, it is used directly; otherwise the function tries a list of preferred encodings, then the system default encoding, and finally an encoding detected by chardet.

The module contains the following main components:

  • auto_decode() - Automatically decode bytes into text using multiple strategies

Note

Decoding results depend on the provided byte stream and the available encodings in the runtime environment.

Example:

>>> from hbutils.encoding.decode import auto_decode
>>> auto_decode(b'kdsfjldsjflkdsmgds')
'kdsfjldsjflkdsmgds'
>>> auto_decode(b'\xd0\x94\xd0\xbe\xd0\xb1\xd1\x80\xd1\x8b\xd0\xb9')
'Добрый'

__all__

hbutils.encoding.decode.__all__ = ['auto_decode']

auto_decode

hbutils.encoding.decode.auto_decode(data: bytes, encoding: str | None = None, prefers: List[str] | None = None) -> str

Automatically decode binary data into text.

This function attempts to decode binary data using multiple strategies:

  1. If an encoding is explicitly specified, it is used directly.

  2. Otherwise, preferred encodings are tried in order.

  3. The system default encoding is tried next.

  4. Finally, the chardet library is used to detect a likely encoding.

Each candidate encoding is tried in turn until one succeeds. If every attempt fails, the function raises the UnicodeDecodeError from the attempt that decoded the most data before failing.
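The strategy above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: sketch_auto_decode and DEFAULT_PREFERS are hypothetical names, sys.getdefaultencoding() is assumed to stand in for "the system default encoding", and the chardet step is skipped when chardet is not installed.

```python
import sys
from typing import List, Optional

try:
    import chardet  # third-party; optional in this sketch
except ImportError:
    chardet = None

# Default preference list as stated in the parameter documentation.
DEFAULT_PREFERS = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'big5']


def sketch_auto_decode(data: bytes, encoding: Optional[str] = None,
                       prefers: Optional[List[str]] = None) -> str:
    """Hypothetical re-creation of the documented fallback order."""
    # Step 1: an explicit encoding is used directly, with no fallback.
    if encoding is not None:
        return data.decode(encoding)

    # Steps 2-3: preferred encodings, then the (assumed) system default.
    candidates = list(DEFAULT_PREFERS if prefers is None else prefers)
    candidates.append(sys.getdefaultencoding())

    # Step 4: a detected encoding, ignored when detection yields None.
    if chardet is not None:
        detected = chardet.detect(data).get('encoding')
        if detected:
            candidates.append(detected)

    best_error = None
    for name in candidates:
        try:
            return data.decode(name)
        except UnicodeDecodeError as err:
            # Keep the error from the attempt that got furthest into the data.
            if best_error is None or err.start > best_error.start:
                best_error = err
    raise best_error
```

An unknown encoding name is not caught in this sketch, so the resulting LookupError propagates to the caller, matching the documented behavior.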

Parameters:
  • data (bytes) – Original binary data to be decoded.

  • encoding (Optional[str]) – Encoding to use explicitly. If None, the encoding will be automatically detected using the described strategy.

  • prefers (Optional[List[str]]) – Preferred encodings to try first. If None, the default preferred encodings (utf-8, gbk, gb2312, gb18030, big5) are used.

Returns:

Decoded string.

Return type:

str

Raises:
  • UnicodeDecodeError – If all decoding attempts fail.

  • LookupError – If any attempted encoding is unknown.
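Both exception types can be observed with plain bytes.decode(), which is assumed here to be what each decoding attempt ultimately relies on. The sketch below shows the start attribute that records how far a failed attempt got, and that an unknown codec name produces a LookupError rather than a UnicodeDecodeError. The sample bytes and variable names are illustrative only.

```python
data = b'ab\xc3\xa9\xff'  # "abé" encoded as UTF-8, followed by an invalid byte

errors = []
for name in ('ascii', 'utf-8'):
    try:
        data.decode(name)
    except UnicodeDecodeError as err:
        errors.append(err)

# The attempt that got furthest is the one with the largest start offset:
# ascii stops at offset 2, utf-8 at offset 4.
furthest = max(errors, key=lambda e: e.start)
print(furthest.encoding, furthest.start)  # prints: utf-8 4

# An unknown codec name fails with LookupError, not UnicodeDecodeError.
try:
    data.decode('no-such-encoding')
except LookupError as exc:
    unknown = exc
```

A caller that passes user-supplied encoding names may therefore want to handle LookupError separately from UnicodeDecodeError.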

Note

The detection step uses chardet.detect(), which may return None as the detected encoding; such values are ignored.

Examples:

>>> auto_decode(b'kdsfjldsjflkdsmgds')
'kdsfjldsjflkdsmgds'
>>> auto_decode(b'\xd0\x94\xd0\xbe\xd0\xb1\xd1\x80\xd1\x8b\xd0\xb9 \xd0'
...             b'\xb2\xd0\xb5\xd1\x87\xd0\xb5\xd1\x80')
'Добрый вечер'
>>> auto_decode(b'\xa4\xb3\xa4\xf3\xa4\xd0\xa4\xf3\xa4\xcf')
'こんばんは'
>>> auto_decode(b'\xcd\xed\xc9\xcf\xba\xc3')
'晚上好'
'晚上好'
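Because the first encoding that decodes without error wins, the order of prefers can change the result: some encodings accept almost any byte sequence and will return wrong text rather than fail. The following standard-library sketch illustrates this with a hypothetical helper, first_successful, which is not part of hbutils.

```python
from typing import List, Tuple


def first_successful(data: bytes, encodings: List[str]) -> Tuple[str, str]:
    """Return (encoding, text) for the first encoding that decodes data."""
    for name in encodings:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError('no encoding in the list could decode the data')


data = 'Добрый'.encode('utf-8')

# latin-1 maps every byte to a character, so it never fails and wins if listed first.
print(first_successful(data, ['latin-1', 'utf-8'])[0])  # prints: latin-1
print(first_successful(data, ['utf-8', 'latin-1'])[0])  # prints: utf-8
```

Listing stricter encodings such as utf-8 ahead of permissive ones is a reasonable rule of thumb when supplying a custom prefers list.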