[Buildroot] [PATCH] support/check-uniq-files: support weird locales and filenames

Arnout Vandecappelle arnout at mind.be
Sat Mar 31 13:37:11 UTC 2018



On 31-03-18 14:52, Yann E. MORIN wrote:
> Currently, when a filename contains characters not representable in the
> user's locale, we fail hard, especially when the host python is python3.
> 
> This is because python2 and python3 handle encoding/decoding strings
> differently, with python3 presumably doing the right thing, but it
> breaks on some systems, while python2 presumably does the wrong thing,
> but it works everywhere. (Just joking, obviously...)
> 
> Part of the issue is that the csv reader in python2 is broken with
> UTF-8.
> 
> We fix the issue by ditching the csv reader, and simply reading the file
> in binary mode, manually partitioning the lines on the first comma.
> 
> Then, we use the binary-encoded (really, un-encoded) package names and
> filenames as values and keys, respectively.
> 
> Finally, for each filename or package name we need to print, we try to
> decode it with the default encoding for the user's settings, but catch
> any decoding exception and fall back to dumping the raw, binary values
> in that case.
> 
> Thanks a lot to Arnout for the live help doing this patch. :-)
> 
> Reported-by: Jaap Crezee <jaap at jcz.nl>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998 at free.fr>
> Cc: Arnout Vandecappelle <arnout at mind.be>
> Cc: Jaap Crezee <jaap at jcz.nl>


 Applied to master, thanks. But I couldn't resist extending the commit log a
little more - it really was too short :-P

 Regards,
 Arnout

> ---
>  support/scripts/check-uniq-files | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
> index be808cce03..f110176274 100755
> --- a/support/scripts/check-uniq-files
> +++ b/support/scripts/check-uniq-files
> @@ -26,16 +26,23 @@ def main():
>          return False
>  
>      file_to_pkg = defaultdict(list)
> -    with open(args.packages_file_list[0], 'r') as pkg_file_list:
> -        r = csv.reader(pkg_file_list, delimiter=',')
> -        for row in r:
> -            pkg = row[0]
> -            file = row[1]
> +    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> +        for line in pkg_file_list.readlines():
> +            pkg, _, file = line.rstrip(b'\n').partition(b',')
>              file_to_pkg[file].append(pkg)
>  
>      for file in file_to_pkg:
>          if len(file_to_pkg[file]) > 1:
> -            sys.stderr.write(warn.format(args.type, file, file_to_pkg[file]))
> +            # If possible, try to decode the binary strings with
> +            # the default user's locale
> +            try:
> +                sys.stderr.write(warn.format(args.type, file.decode(),
> +                                             [p.decode() for p in file_to_pkg[file]]))
> +            except UnicodeDecodeError:
> +                # ... but fallback to just dumping them raw if they
> +                # contain non-representable chars
> +                sys.stderr.write(warn.format(args.type, file,
> +                                             file_to_pkg[file]))
>  
>  
>  if __name__ == "__main__":
> 
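
For readers following along, here is a minimal standalone sketch of the approach in the patch above: split each raw line on its first comma while staying in bytes, and only decode when printing. The function names `group_packages_by_file` and `printable` and the sample data are hypothetical, not part of the script. Unlike the patch, which dumps the raw values for the whole message when decoding fails, this sketch falls back per string to the `repr()` of the bytes. It assumes python3.

```python
from collections import defaultdict


def group_packages_by_file(lines):
    """Map each raw (bytes) filename to the list of packages that ship it."""
    file_to_pkg = defaultdict(list)
    for line in lines:
        # Split on the first comma only, so commas inside a filename survive.
        pkg, _, filename = line.rstrip(b'\n').partition(b',')
        file_to_pkg[filename].append(pkg)
    return file_to_pkg


def printable(raw):
    """Hypothetical helper: decode bytes for display, else show them raw."""
    try:
        return raw.decode()
    except UnicodeDecodeError:
        # Not decodable with the default codec: fall back to the bytes repr.
        return repr(raw)


# Made-up sample input, in the "package,filename" form the script reads.
sample = [
    b'pkg-a,./usr/bin/foo\n',
    b'pkg-b,./usr/bin/foo\n',
    b'pkg-c,./usr/share/bad-\xff-name,with-comma\n',
]

groups = group_packages_by_file(sample)
duplicates = {f: p for f, p in groups.items() if len(p) > 1}
for f, pkgs in duplicates.items():
    print(printable(f), [printable(p) for p in pkgs])
```

Keeping keys and values as bytes means no decoding can fail while the file is being read; any failure is confined to the point where a message is printed.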

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF


