[Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files

Yann E. MORIN yann.morin.1998 at free.fr
Tue Jan 8 16:29:32 UTC 2019


Thomas DS, All,

On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998 at free.fr>) escribió:
[--SNIP--]
> > Furthermore, that format is not easily extensible.
> >
> > So, introduce a parser, in python, that abstracts the format of these
> > files, and returns a dictionaries. Using dictionaries makes it easy for
> returns a dictionary.

Actually, a call to the function will return a list of dictionarries,
one for each file in the list. They are yielded, so returned one by one
to iterate over easily, but many dictionaries are returned.

So:   s/a /an iterator to a list of /

(I believe this to be an iterator, right?)

> > +def parse_pkg_file_list(path):
> > +    with open(path, 'rb') as f:
> 
> Why exactly do you open as binary?

Because it contains filenames, and filenames are a sequence of bytes,
not encioded in any way. Filenames can contain 0xbd (œ in iso8859-15)
or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
contain both. It is not a hypotetical situation, as already encountered
it more than once.

There is no way we can know the encoding of a filename, so we treat them
for what they are: sequence of bytes.

> IIRC this will cause the need for decoding the strings on Python 3,
> which you don't seem to do. This was also the reason why the orginal
> check-uniq-files needed 'b' markers in front of some strings like
> b','.

Exactly, and hence the reason that check-uniq-files will try to decide
those strings.

There might be a gap in the transition, though, as size-stat may need to
represent those strings when generating the graphs, or generating the
CVS files... Damned...

> > +        for rec in f.readlines():
> > +            l = rec.split(',0')
> 
> This looks wrong to me, there is no text ',0' in the input.
> I think you meant rec.split(',', 1), no, like in the original code?

Yeah, I borked a rebase at some point...

> > +            d = {
> > +                  'file': l[0],
> > +                  'pkg':  l[1],
> 
> This seems also wrong, 0 and 1 are swapped.

Yeah, I borked a rebase at some point.

> Example input is:
> 
> busybox,./usr/bin/eject
> busybox,./usr/bin/basename
> busybox,./usr/bin/hexedit
> busybox,./usr/bin/readlink
> busybox,./usr/bin/cmp
> 
> so I would expect 'pkg' to be l[0] and 'file' to be l[1].
> 
> Note that 'l' could perhaps become a more meaningful variable name. I
> also think that one of the python rules is that variable names should
> be minimum 3 characters.

So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
at some point, where I renamed 'l' and it ended up in a later commit.

I'll fix that here.

> > -    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > -        for line in pkg_file_list.readlines():
> > -            pkg, _, file = line.rstrip(b'\n').partition(b',')
> Note that the rstrip of newlines is no longer present in parse_pkg_file_list.

Good catch. I'll fix.

> > +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > +    for record in parse_pkg_file_list(fname):
> > +        # remove the initial './' in each file path
> > +        fpath = record['file'].strip()[2:]
> 
> The stripping here and the rstrip in check-uniq-files could perhaps
> all be moved to parse_pkg_file_list.

So, I am not sure about this one. I'm not even sure why we don't keep
the '/' either. After all, they are target files, so they should be
representable as they appear on the target, i.e. with a leading '/'

In any case, I wanted the introduction of the parser to be a drop-in
replacement for the existing ad-hoc parsers, as much as possible.

The seamantic of stripping the leading './' can be commonalised (or
fixed, or dropped) in a later patch, when this series is already big
enough as it is, and already touching (pun!) touchy (re-pun!) topics.

Regards,
Yann E. MORIN.

> > +        fullpath = os.path.join(builddir, "target", fpath)
> > +        add_file(filesdict, fpath, fullpath, record['pkg'])
> >      return filesdict
> >
> 
> /Thomas

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'



More information about the buildroot mailing list