[Buildroot] [PATCH v2 1/1] utils/getdeveloperlib.py: explicitly set devs document encoding

James Knight james.d.knight at live.com
Sun Sep 19 02:37:51 UTC 2021


Peter,

On Sat, Sep 18, 2021 at 5:17 PM Peter Korsgaard <peter at korsgaard.com> wrote:
> Hmm, this doesn't quite seem to work when stdout is not a UTF-8 console
> ...
>
> ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery at gmail.com>
> Jan Kundrát <jan.kundrat at cesnet.cz>
>
> LANG=C ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery at gmail.com>
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 9: ordinal not in range(128)
>
> ./utils/get-developers -p libyang | cat
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 9: ordinal not in range(128)
>
> Reverting 9f127cc420884ad fixes it:
>
> ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery at gmail.com>
> Jan Kundrát <jan.kundrat at cesnet.cz>
>
> LANG=C ./utils/get-developers -p libyang
> Heiko Thiery <heiko.thiery at gmail.com>
> Jan Kundrát <jan.kundrat at cesnet.cz>
>
> ./utils/get-developers -p libyang | cat
> Heiko Thiery <heiko.thiery at gmail.com>
> Jan Kundrát <jan.kundrat at cesnet.cz>
>
> Any idea about how to fix this, or should it just be reverted?

I have no problem with reverting if this is causing issues.

From my (limited) understanding of encodings, Python and shells, I
think I may understand the issue here (feel free to correct me if I am
wrong on any of this). I assume that the Python interpreter being used
here is a Python 2.x version. In the example above, the name "Kundrát"
includes a non-ASCII character which cannot be rendered on an
ASCII-only terminal. For the second command (with "LANG=C" set
explicitly), the Python interpreter assumes an ASCII terminal, attempts
to convert a Unicode string to ASCII and raises the observed exception.
This did not fail before this commit because the Python interpreter
was handling the name as a byte string (i.e. not a Unicode string): it
would just write the raw bytes to the output stream, and the UTF-8
console would render them as expected. If those raw bytes are instead
written to an ASCII-only (or other non-UTF-8) terminal, the rendered
output may not be what is expected (e.g. the two UTF-8 bytes of "á"
showing up as two unrelated characters).
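A minimal sketch of the text vs. byte string distinction described
above. It is written for Python 3, where "str" and "bytes" play the
roles of Python 2's "unicode" and byte "str"; only the name comes from
the report, the variable names are illustrative.

```python
# Python 3 sketch: text (str) vs. raw bytes for the name from the report.
name = "Kundr\u00e1t"  # "Kundrát"; U+00E1 is a non-ASCII character

# Encoding the text to ASCII fails, much like Python 2 did when printing
# a unicode object to a stream it believed to be ASCII.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)  # 'ascii' codec can't encode character '\xe1' in position 5: ...

# The UTF-8 byte form of the same name: b"Kundr\xc3\xa1t". A byte string
# like this is written out unchanged; the terminal decides how to show it.
raw = name.encode("utf-8")

# On a terminal expecting a single-byte charset such as Latin-1, the two
# bytes of "á" are shown as two unrelated characters.
mojibake = raw.decode("latin-1")  # "KundrÃ¡t"
```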

I cannot say I understand the output of the third command (I could not
reproduce it with the environments I have set up). My guess is that
when the output goes to a pipe, Python cannot detect a terminal
encoding and falls back to ASCII-only output.
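One way to reproduce the failure without changing the locale is to
write the same line through a text stream whose encoding is forced to
ASCII, which, to my understanding, is what a Python 2 interpreter ends
up with under "LANG=C" and when stdout is not a terminal. This is a
Python 3 sketch using in-memory buffers; the "render" helper is
hypothetical and not part of getdeveloperlib.py.

```python
import io

LINE = "Jan Kundr\u00e1t <jan.kundrat at cesnet.cz>\n"

def render(line, encoding):
    """Hypothetical helper: write `line` through a text stream with the
    given encoding and return the bytes that would reach the terminal."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding)
    stream.write(line)  # the text is encoded here; ASCII fails on "á"
    stream.flush()
    return buf.getvalue()

# An ASCII stream rejects the name, as in the reported traceback.
try:
    render(LINE, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Forcing UTF-8 (what PYTHONIOENCODING=utf-8 does for the real stdout)
# lets the same line through unchanged.
utf8_bytes = render(LINE, "utf-8")
```

The error position is 9 here because "Jan " precedes the name, which
matches the "position 9" in the reported message.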

My initial impression is that the behaviour may actually be better
with this commit, since the exception helps guarantee that the output
entries can be rendered on the active terminal. However, if a user is
piping the output to another command that understands UTF-8, it would
be better for the Python interpreter not to throw an exception here and
to forward the raw byte strings instead (and I imagine having to force
"PYTHONIOENCODING" in that case would get annoying over time). So maybe
reverting is the best option here.
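For comparison, the "forward the raw bytes" behaviour could be kept
even with Unicode strings by encoding to UTF-8 explicitly and writing to
the binary layer under the text stream. This is only a Python 3 sketch
of that idea; "write_utf8_line" is a hypothetical helper, not the actual
getdeveloperlib.py code.

```python
import io

def write_utf8_line(stream, text):
    """Hypothetical helper: write `text` plus a newline to `stream` as
    UTF-8 bytes, bypassing the stream's own (possibly ASCII) encoding,
    roughly like printing raw byte strings did before the commit."""
    data = (text + "\n").encode("utf-8")
    stream.flush()               # keep ordering with text already written
    stream.buffer.write(data)    # the binary layer under a text stream
    stream.buffer.flush()

# Example: the target stream is ASCII-only, yet the UTF-8 bytes get through.
demo_buf = io.BytesIO()
ascii_out = io.TextIOWrapper(demo_buf, encoding="ascii")
ascii_out.write("Developer: ")
write_utf8_line(ascii_out, "Jan Kundr\u00e1t <jan.kundrat at cesnet.cz>")
```

With the real stdout this would be called as write_utf8_line(sys.stdout,
entry). The trade-off is the one described above: nothing fails on an
ASCII terminal, but the bytes may render as garbage there.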

