hbutils.encoding.decode
Automatic binary decoding utilities.
This module provides helpers for decoding binary data into text by attempting
multiple encodings. The primary entry point is auto_decode(), which
tries an explicit encoding if provided, then a list of preferred encodings,
and finally uses system defaults and the chardet detector.
The module contains the following main components:
auto_decode() - Automatically decode bytes into text using multiple strategies
Note
Decoding results depend on the provided byte stream and the available encodings in the runtime environment.
Example:
>>> from hbutils.encoding.decode import auto_decode
>>> auto_decode(b'kdsfjldsjflkdsmgds')
'kdsfjldsjflkdsmgds'
>>> auto_decode(b'\xd0\x94\xd0\xbe\xd0\xb1\xd1\x80\xd1\x8b\xd0\xb9')
'Добрый'
__all__
- hbutils.encoding.decode.__all__ = ['auto_decode']
auto_decode
- hbutils.encoding.decode.auto_decode(data: bytes, encoding: str | None = None, prefers: List[str] | None = None) -> str
Automatically decode binary data into text.
This function attempts to decode binary data using multiple strategies:
- If an encoding is explicitly specified, it is used directly.
- Otherwise, preferred encodings are tried in order.
- The system default encoding is tried next.
- Finally, the chardet library is used to detect a likely encoding.
When no explicit encoding is given, the function tries each candidate encoding until one succeeds. If all attempts fail, the UnicodeDecodeError from the attempt that decoded furthest before failing is raised.
- Parameters:
data (bytes) – Original binary data to be decoded.
encoding (Optional[str]) – Encoding to use explicitly. If None, the encoding will be automatically detected using the described strategy.
prefers (Optional[List[str]]) – Preferred encodings to try first. If None, the default preferred encodings (utf-8, gbk, gb2312, gb18030, big5) are used.
- Returns:
Decoded string.
- Return type:
str
- Raises:
UnicodeDecodeError – If all decoding attempts fail.
LookupError – If any attempted encoding is unknown.
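The failure behavior listed under Raises can be illustrated with a small stand-alone sketch. It is not the hbutils implementation: the helper name decode_first_match is hypothetical, and the rule for picking which error to re-raise (the one with the largest err.start) is an assumption based on the description above.

```python
from typing import Iterable, Optional


def decode_first_match(data: bytes, encodings: Iterable[str]) -> str:
    """Hypothetical helper: return the first successful decode of ``data``."""
    furthest: Optional[UnicodeDecodeError] = None
    for name in encodings:
        try:
            # An unknown codec name raises LookupError here; it is not caught.
            return data.decode(name)
        except UnicodeDecodeError as err:
            # err.start is the index of the first byte that failed to decode,
            # so a larger value means this codec got further into the data.
            if furthest is None or err.start > furthest.start:
                furthest = err
    if furthest is None:
        # Assumption of this sketch: an empty candidate list is a caller error.
        raise ValueError("no encodings were supplied")
    raise furthest
```

With the candidates 'ascii' and 'utf-8' and the input b'caf\xc3\xa9\xff', the ASCII attempt stops at index 3 and the UTF-8 attempt at index 5, so the UTF-8 error is the one re-raised.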
Note
The detection step uses chardet.detect(), which may return None as the detected encoding; such values are ignored.
Examples:
>>> auto_decode(b'kdsfjldsjflkdsmgds')
'kdsfjldsjflkdsmgds'
>>> auto_decode(b'\xd0\x94\xd0\xbe\xd0\xb1\xd1\x80\xd1\x8b\xd0\xb9 \xd0'
...             b'\xb2\xd0\xb5\xd1\x87\xd0\xb5\xd1\x80')
'Добрый вечер'
>>> auto_decode(b'\xa4\xb3\xa4\xf3\xa4\xd0\xa4\xf3\xa4\xcf')
'こんばんは'
>>> auto_decode(b'\xcd\xed\xc9\xcf\xba\xc3')
'晚上好'
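The candidate ordering described for auto_decode(), including the rule that a None detection result is ignored, might be sketched as follows. This is an illustration under assumptions, not the library's code: candidate_encodings and DEFAULT_PREFERS are hypothetical names, sys.getdefaultencoding() is assumed to be what "system default encoding" refers to, and the detector is passed in as a callable so the sketch runs without chardet installed.

```python
import sys
from typing import Callable, List, Optional

# Default preference list as documented for the ``prefers`` parameter.
DEFAULT_PREFERS = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'big5']


def candidate_encodings(
    data: bytes,
    prefers: Optional[List[str]] = None,
    detect: Optional[Callable[[bytes], dict]] = None,
) -> List[str]:
    """Hypothetical helper: list encodings in the order they would be tried."""
    names: List[Optional[str]] = list(DEFAULT_PREFERS if prefers is None else prefers)
    # Assumption: the "system default" is the interpreter's default encoding.
    names.append(sys.getdefaultencoding())
    if detect is not None:
        # A chardet-style detector returns a dict whose 'encoding' may be None.
        names.append(detect(data).get('encoding'))

    seen = set()
    ordered: List[str] = []
    for name in names:
        if not name:
            continue  # ignore None (or empty) detection results
        key = name.lower()
        if key not in seen:  # keep the first occurrence, drop duplicates
            seen.add(key)
            ordered.append(name)
    return ordered
```

If chardet is available, passing detect=chardet.detect supplies the final candidate. Keeping the detector injectable is a choice made for this sketch so the ordering logic can be exercised on its own.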