[Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
Yann E. MORIN
yann.morin.1998 at free.fr
Tue Jan 8 16:29:32 UTC 2019
Thomas DS, All,
On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998 at free.fr>) escribió:
[--SNIP--]
> > Furthermore, that format is not easily extensible.
> >
> > So, introduce a parser, in python, that abstracts the format of these
> > files, and returns a dictionaries. Using dictionaries makes it easy for
> returns a dictionary.
Actually, a call to the function will return a list of dictionarries,
one for each file in the list. They are yielded, so returned one by one
to iterate over easily, but many dictionaries are returned.
So: s/a /an iterator to a list of /
(I believe this to be an iterator, right?)
> > +def parse_pkg_file_list(path):
> > + with open(path, 'rb') as f:
>
> Why exactly do you open as binary?
Because it contains filenames, and filenames are a sequence of bytes,
not encioded in any way. Filenames can contain 0xbd (œ in iso8859-15)
or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
contain both. It is not a hypotetical situation, as already encountered
it more than once.
There is no way we can know the encoding of a filename, so we treat them
for what they are: sequence of bytes.
> IIRC this will cause the need for decoding the strings on Python 3,
> which you don't seem to do. This was also the reason why the orginal
> check-uniq-files needed 'b' markers in front of some strings like
> b','.
Exactly, and hence the reason that check-uniq-files will try to decide
those strings.
There might be a gap in the transition, though, as size-stat may need to
represent those strings when generating the graphs, or generating the
CVS files... Damned...
> > + for rec in f.readlines():
> > + l = rec.split(',0')
>
> This looks wrong to me, there is no text ',0' in the input.
> I think you meant rec.split(',', 1), no, like in the original code?
Yeah, I borked a rebase at some point...
> > + d = {
> > + 'file': l[0],
> > + 'pkg': l[1],
>
> This seems also wrong, 0 and 1 are swapped.
Yeah, I borked a rebase at some point.
> Example input is:
>
> busybox,./usr/bin/eject
> busybox,./usr/bin/basename
> busybox,./usr/bin/hexedit
> busybox,./usr/bin/readlink
> busybox,./usr/bin/cmp
>
> so I would expect 'pkg' to be l[0] and 'file' to be l[1].
>
> Note that 'l' could perhaps become a more meaningful variable name. I
> also think that one of the python rules is that variable names should
> be minimum 3 characters.
So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
at some point, where I renamed 'l' and it ended up in a later commit.
I'll fix that here.
> > - with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > - for line in pkg_file_list.readlines():
> > - pkg, _, file = line.rstrip(b'\n').partition(b',')
> Note that the rstrip of newlines is no longer present in parse_pkg_file_list.
Good catch. I'll fix.
> > + fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > + for record in parse_pkg_file_list(fname):
> > + # remove the initial './' in each file path
> > + fpath = record['file'].strip()[2:]
>
> The stripping here and the rstrip in check-uniq-files could perhaps
> all be moved to parse_pkg_file_list.
So, I am not sure about this one. I'm not even sure why we don't keep
the '/' either. After all, they are target files, so they should be
representable as they appear on the target, i.e. with a leading '/'
In any case, I wanted the introduction of the parser to be a drop-in
replacement for the existing ad-hoc parsers, as much as possible.
The seamantic of stripping the leading './' can be commonalised (or
fixed, or dropped) in a later patch, when this series is already big
enough as it is, and already touching (pun!) touchy (re-pun!) topics.
Regards,
Yann E. MORIN.
> > + fullpath = os.path.join(builddir, "target", fpath)
> > + add_file(filesdict, fpath, fullpath, record['pkg'])
> > return filesdict
> >
>
> /Thomas
--
.-----------------.--------------------.------------------.--------------------.
| Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
| +33 223 225 172 `------------.-------: X AGAINST | \e/ There is no |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
'------------------------------^-------^------------------^--------------------'
More information about the buildroot
mailing list