[Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible
Arnout Vandecappelle
arnout at mind.be
Fri Feb 8 22:02:50 UTC 2019
On 08/02/2019 22:22, Yann E. MORIN wrote:
> Arnout, All,
>
> On 2019-02-08 21:42 +0100, Arnout Vandecappelle spake thusly:
>> On 08/02/2019 18:25, Yann E. MORIN wrote:
>>> On 2019-02-08 00:40 +0100, Arnout Vandecappelle spake thusly:
>>>> On 07/01/2019 23:05, Yann E. MORIN wrote:
>>>>> +def str_decode(s):
>>>>> +    try:
>>>>> +        return s.decode()
>>>>> +    except UnicodeDecodeError:
>>>>> +        return repr(s)
>>>>
>>>> I think s.decode(errors='replace') is exactly what we want: it prints the
>>>> question mark character for things that can't be represented, just like ls does.
> [--SNIP--]
>>> >>> lines[0].decode(errors='replace')
>>> u'\ufffd\n'
>>> >>> print('{}'.format(lines[0].decode(errors='replace')))
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
>>
>> Meh, Python2 unicode handling always confuses the hell out of me...
>>
>> So, to do it well, in python3 you need to do:
>> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(),errors='replace'))
>>
>> while in python2 the proper thing to do is
>>
>> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(), \
>> errors='replace').encode(sys.getfilesystemencoding(),errors='replace'))
>>
>> (sys.getfilesystemencoding() makes sure we use the user's encoding so stuff that
>> can be printed gets properly printed).
>>
>> I couldn't find a way to do the right thing both in python2 and python3...
>
> At which point, my proposal is much simpler, and more understandable,
> don't you think?
Absolutely. Well, it's imperfect because it prints the ugly b'....' in case
there is a non-decodable character, but it's good enough.
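
For reference, a minimal sketch of the two behaviours side by side. It assumes
Python 3 only, so it does not address the python2 case discussed above.
str_decode() is the helper from the patch; decode_replace() is a hypothetical
name for the errors='replace' variant, and it hard-codes UTF-8 instead of
sys.getfilesystemencoding() just to keep the example deterministic.

```python
def str_decode(s):
    """Helper from the patch: decode if possible, else fall back to repr()."""
    try:
        return s.decode()
    except UnicodeDecodeError:
        return repr(s)


def decode_replace(s, encoding='utf-8'):
    """Hypothetical alternative: undecodable bytes become U+FFFD."""
    return s.decode(encoding, errors='replace')


GOOD = b'\xc5\x93'     # valid UTF-8 encoding of U+0153
BAD = b'\xc5\x93\xff'  # same, followed by a byte that is invalid in UTF-8

if __name__ == '__main__':
    # ascii() keeps the output printable whatever the terminal encoding is.
    for name in (GOOD, BAD):
        print(ascii(str_decode(name)), ascii(decode_replace(name)))
```

With the repr() fallback the whole name turns into the b'....' form as soon as
one byte is bad, whereas errors='replace' keeps the decodable part readable.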
Regards,
Arnout