[Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible

Fri Feb 8 22:02:50 UTC 2019

On 08/02/2019 22:22, Yann E. MORIN wrote:
> Arnout, All,
> 
> On 2019-02-08 21:42 +0100, Arnout Vandecappelle spake thusly:
>> On 08/02/2019 18:25, Yann E. MORIN wrote:
>>> On 2019-02-08 00:40 +0100, Arnout Vandecappelle spake thusly:
>>>> On 07/01/2019 23:05, Yann E. MORIN wrote:
>>>>> +def str_decode(s):
>>>>> +    try:
>>>>> +        return s.decode()
>>>>> +    except UnicodeDecodeError:
>>>>> +        return repr(s)
>>>>
>>>>  I think s.decode(errors='replace') is exactly what we want: it prints the
>>>> question mark character for things that can't be represented, just like ls does.
> [--SNIP--]
>>>     >>> lines[0].decode(errors='replace')
>>>     u'\ufffd\n'
>>>     >>> print('{}'.format(lines[0].decode(errors='replace')))
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
>>
>>  Meh, Python2 unicode handling always confuses the hell out of me...
>>
>>  So, to do it well, in python3 you need to do:
>> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(), errors='replace'))
>>
>> while in python2 the proper thing to do is
>>
>> print(b'\xc5\x93\xff'.decode(sys.getfilesystemencoding(), \
>> 	errors='replace').encode(sys.getfilesystemencoding(), errors='replace'))
>>
>> (sys.getfilesystemencoding() makes sure we use the user's encoding so stuff that
>> can be printed gets properly printed).
>>
>>  I couldn't find a way to do the right thing both in python2 and python3...
> 
> At which point, my proposal is much simpler, and more understandable,
> don't you think?

 Absolutely. Well, it's imperfect because it prints the ugly b'....' whenever
there is a non-decodable character, but it's good enough.
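
 For reference, a minimal sketch of the two behaviours being compared, assuming
Python 3 (where bytes.decode() defaults to UTF-8). str_decode() is the helper
from the patch; decode_replace() is a hypothetical name used here only to
illustrate the errors='replace' variant, it is not part of the patch:

```python
def str_decode(s):
    """Helper from the patch: decode if possible, else fall back to repr()."""
    try:
        return s.decode()
    except UnicodeDecodeError:
        return repr(s)


def decode_replace(s):
    """Hypothetical alternative: undecodable bytes become U+FFFD."""
    return s.decode('utf-8', errors='replace')


if __name__ == '__main__':
    good = b'foo.patch'
    bad = b'\xc5\x93\xff'  # valid UTF-8 sequence followed by an invalid byte
    # ascii() keeps the demo output printable whatever the terminal encoding.
    print(ascii(str_decode(good)))
    print(ascii(str_decode(bad)))
    print(ascii(decode_replace(bad)))
```

 With the fallback, the whole string degrades to its b'....' form as soon as
one byte fails to decode, whereas errors='replace' keeps the decodable part
but then needs the extra encode step under python2 to print safely.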

 Regards,
 Arnout