hbutils.encoding.decode
- Overview:
Utilities for decoding binary data to strings. The module detects an appropriate encoding automatically, with support for preferred encodings and fallback mechanisms.
auto_decode
- hbutils.encoding.decode.auto_decode(data: bytes, encoding: str | None = None, prefers: List[str] | None = None) → str
Automatically decode binary data to a string, detecting the encoding when none is given.
This function attempts to decode binary data using multiple strategies:
1. If an encoding is explicitly specified, use it directly.
2. Otherwise, try the preferred encodings in order.
3. Fall back to the system default encoding.
4. Use the chardet library to detect the encoding.
The function will try each encoding until one succeeds, keeping track of the best partial match in case all attempts fail.
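The try-in-order strategy described above can be sketched with the standard library alone. This is a minimal illustration, not the hbutils implementation: the name `sketch_auto_decode` is hypothetical, the chardet detection step is omitted, and using the error's `start` offset as the measure of the "best partial match" is an assumption.

```python
import sys
from typing import List, Optional

# Assumed defaults, copied from the list given in this documentation.
_DEFAULT_PREFERS = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'big5']


def sketch_auto_decode(data: bytes, encoding: Optional[str] = None,
                       prefers: Optional[List[str]] = None) -> str:
    """Hypothetical sketch of the documented fallback order (no chardet step)."""
    # Strategy 1: an explicit encoding is used directly.
    if encoding is not None:
        return data.decode(encoding)

    # Strategies 2 and 3: preferred encodings first, then the system default.
    candidates = list(prefers if prefers is not None else _DEFAULT_PREFERS)
    candidates.append(sys.getdefaultencoding())

    best_error = None
    for candidate in candidates:
        try:
            return data.decode(candidate)
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode,
            # so a larger value means this attempt got further into the data.
            if best_error is None or err.start > best_error.start:
                best_error = err

    # Every candidate failed: re-raise the error that got furthest.
    raise best_error
```

For example, `sketch_auto_decode('你好'.encode('gbk'))` fails under utf-8 and then succeeds under gbk. Encoding order matters in a scheme like this, because a permissive encoding placed early can "succeed" with the wrong characters.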
- Parameters:
data (bytes) – Original binary data to be decoded.
encoding (Optional[str]) – Encoding to use. Defaults to None, which means the encoding is detected automatically.
prefers (Optional[List[str]]) – Preferred encodings to try first. If None, the default preferred encodings are used (utf-8, gbk, gb2312, gb18030, big5).
- Returns:
Decoded string.
- Return type:
str
- Raises:
UnicodeDecodeError – If all decoding attempts fail. The error raised is the one from the attempt that decoded furthest into the data.
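A caller that catches this error can use the standard `UnicodeDecodeError` attributes (`object`, `encoding`, `start`) to recover the part of the input that did decode. The helper below is a hypothetical sketch that uses plain `bytes.decode` to produce the error; it assumes nothing about hbutils beyond the exception type.

```python
def decodable_prefix(err: UnicodeDecodeError) -> str:
    """Return the text that decoded cleanly before the failing byte.

    Hypothetical helper: relies only on the standard attributes of
    UnicodeDecodeError (object, encoding, start).
    """
    # err.object is the original bytes; err.start is the failing offset.
    return err.object[:err.start].decode(err.encoding)


data = b'abc\xffdef'
try:
    data.decode('utf-8')
except UnicodeDecodeError as err:
    failed_at = err.start           # offset of the first undecodable byte
    prefix = decodable_prefix(err)  # text before that offset
```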
Examples:
>>> auto_decode(b'kdsfjldsjflkdsmgds')
'kdsfjldsjflkdsmgds'
>>> auto_decode(b'\xd0\x94\xd0\xbe\xd0\xb1\xd1\x80\xd1\x8b\xd0\xb9 \xd0'
...             b'\xb2\xd0\xb5\xd1\x87\xd0\xb5\xd1\x80')
'Добрый вечер'
>>> auto_decode(b'\xa4\xb3\xa4\xf3\xa4\xd0\xa4\xf3\xa4\xcf')
'こんばんは'
>>> auto_decode(b'\xcd\xed\xc9\xcf\xba\xc3')
'晚上好'