[Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2)
Yann E. MORIN
yann.morin.1998 at free.fr
Mon Jan 7 22:05:35 UTC 2019
Hello All!
Currently, the instrumentation steps that we run after a package is
installed get confused about which files that package may have been
responsible for.
The first problem is that all .la files are tweaked after a package is
installed, and thus those files are all then newer than the built
stampfile of that package, and consequently all .la files are accounted
to that package.
The second problem is that, during development and after a user
requested a package reinstall (but not a rebuild!), the built
stampfile is much older, and thus all files that have been installed
since the package was last built are accounted to that package.
Those two problems are caused by 7fb6e782542f, when we switched away
from an md5 comparison between the state before and after the
installation, to a time-based comparison against the built stampfile.
Furthermore, during development, the list of installed files can get
out of sync with what is really installed. For example, if a user were
to modify the source of a package, and trigger a re-configure, rebuild,
or re-install, then we'd remove the list of previously installed files
before generating the list of currently installed files. If files
installed in the previous installation are no longer installed, they are
still present in the target (or staging or host), but no longer
accounted to the package that installed them.
Additionally, when two or more packages install the same file and it has
the same content, we don't care much about which actually installed it,
as they would all have installed the exact same file. The size could be
assigned to any of those packages, and the licensing terms of any of
those packages may be applied to that file. This case is most prominent
with the fftw family of packages (soon to come) that install the same
headers and the same utilities.
Finally, there is one prominent file that gets _updated_ (and not
replaced) by many packages: the info page index, which packages update
when they install their own info pages. We currently report that file,
when in fact it does not end up in target, and thus we don't care about
how its content came to be. More generally, we don't care about any file
that we eventually remove as part of our target-finalize cleanups.
This series is thus an attempt at fixing all those issues.
First and foremost, the series addresses the limitation that causes the
first two problems: we do not have a way to know when the install steps
were started (or any other step, for that matter, but we're currently
only interested in the install steps). So, the first few patches make it
so that we can introduce a new timestamp file at the beginning of each
step.
Then, with the information about the beginning of the install step, we
can now limit the .la files tweaking to just those files that were
actually installed by a package. And then we use that same stamp file to
limit the listing of installed files accountable to the current package.
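
To illustrate the difference between the two reference points, here is a
minimal sketch in Python (not the code from the series, which lives in
package/pkg-generic.mk). The stampfile name .stamp_install_start and the
helper names are made up for the example; timestamps are set explicitly so
the outcome does not depend on how fast the machine is.

```python
import os
import tempfile


def files_newer_than(root, stamp):
    """Return the files under root (as relative paths) modified after stamp."""
    ref = os.stat(stamp).st_mtime_ns
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat: a symlink counts by its own mtime, not its target's
            if os.lstat(path).st_mtime_ns > ref:
                found.append(os.path.relpath(path, root))
    return sorted(found)


def touch(path, when):
    """Create an empty file with a fixed mtime (seconds since the epoch)."""
    with open(path, "w"):
        pass
    os.utime(path, (when, when))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "target")
        libdir = os.path.join(root, "usr", "lib")
        os.makedirs(libdir)
        built = os.path.join(tmp, ".stamp_built")
        start = os.path.join(tmp, ".stamp_install_start")  # hypothetical name
        touch(built, 1000)                                # package was built
        touch(os.path.join(libdir, "libother.la"), 2000)  # another package installs
        touch(start, 3000)                                # our reinstall begins
        touch(os.path.join(libdir, "libmine.la"), 4000)   # our own file
        return files_newer_than(root, built), files_newer_than(root, start)


since_built, since_start = demo()
print(since_built)  # compared to the built stamp: both files are blamed on us
print(since_start)  # compared to the start-of-install stamp: only our own file
```
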
Then the series addresses the same-identical-file-from-many-packages
issue. To do so, it partially restores the md5sum of the files, but this
is limited to only those files actually touched during the install of the
current package (see above), and is only run at the end of the install,
not before. Thus, this is much faster than the original situation
that did the md5 of all files before and after, because it now acts on
cache-hot files only.
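
As a rough sketch of that idea (again in Python for illustration only, with
made-up function names), hashing is restricted to the list of files that
were just installed, so nothing else in the tree is ever read:

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=65536):
    """Return the hex md5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def hash_just_installed(root, relpaths):
    """Hash only the given files; symlinks and non-regular files are skipped."""
    sums = {}
    for rel in relpaths:
        path = os.path.join(root, rel)
        if os.path.isfile(path) and not os.path.islink(path):
            sums[rel] = md5_of(path)
    return sums


def demo():
    with tempfile.TemporaryDirectory() as root:
        contents = (("a.h", b"same\n"), ("b.h", b"same\n"), ("c.h", b"other\n"))
        for name, data in contents:
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)
        # Pretend only a.h and c.h were touched by this install: b.h is
        # never opened, however many such files the tree already contains.
        return hash_just_installed(root, ["a.h", "c.h"])


print(demo())
```

The cost then scales with the size of what the current package installed,
not with the size of target/, staging/ or host/.
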
That part is split in two: first, the format of the packages-file-list
files is modified to be more resilient to weird filenames, which then
allows us to expand it with arbitrarily more fields. A python helper is
provided to abstract the new format, and the consumers of those files
are updated to use the helper (with one script being rewritten in
python). Then we make use of this new format to store the md5 of the
files contents, which we eventually use to decide whether to report the
file or not.
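
To give an idea of what "resilient to weird filenames" can look like, here
is a sketch of one possible record layout. This is an assumption made for
the example, not the format actually introduced by the series (see the
patches and brpkgutil.py for that): fixed fields come first, separated by
commas; the filename comes last; each record ends with a NUL byte, the one
byte a path cannot contain.

```python
import os
from collections import namedtuple

# Hypothetical record layout, for illustration only.
Record = namedtuple("Record", ["pkg", "md5", "path"])


def dump_records(records):
    """Serialise records as b'pkg,md5,path\\0' entries."""
    out = bytearray()
    for rec in records:
        out += rec.pkg.encode() + b"," + rec.md5.encode() + b","
        out += os.fsencode(rec.path) + b"\0"
    return bytes(out)


def parse_records(data):
    """Yield Record tuples; commas and newlines in the path are harmless."""
    for chunk in data.split(b"\0"):
        if not chunk:
            continue  # nothing after the final NUL terminator
        # Split on the first two commas only: the rest is the filename.
        pkg, md5, path = chunk.split(b",", 2)
        yield Record(pkg.decode(), md5.decode(), os.fsdecode(path))


records = [
    Record("pkg-a", "d41d8cd98f00b204e9800998ecf8427e", "./usr/bin/tool"),
    Record("pkg-b", "d41d8cd98f00b204e9800998ecf8427e", "./usr/share/odd,name\nhere"),
]
assert list(parse_records(dump_records(records))) == records
```

Adding a field later only means one more comma-separated item before the
filename, which is the kind of extensibility the new format is after.
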
Now, files that are missing from the destination directory are no longer
eligible for being reported as being touched by more than one package.
And finally, now that we have a dependable check for uniqueness, we can
add an option in the menuconfig to turn the current warning into a hard
error when uniqueness is not met.
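
Put together, the resulting decision logic can be sketched as follows. This
is an illustration only, not the actual check-uniq-files script; the
function names, the strict flag and the exception are made up for the
example.

```python
import os
import tempfile
from collections import defaultdict


class NonUniqueFiles(Exception):
    """Raised in strict mode when packages disagree on a file."""


def find_conflicts(records, root):
    """records: iterable of (pkg, md5, relpath) tuples."""
    owners_by_path = defaultdict(dict)  # relpath -> {pkg: md5}
    for pkg, md5, rel in records:
        owners_by_path[rel][pkg] = md5
    conflicts = {}
    for rel, owners in owners_by_path.items():
        if len(owners) < 2:
            continue  # only one package touched it
        if len(set(owners.values())) == 1:
            continue  # every package installed the very same content
        if not os.path.lexists(os.path.join(root, rel)):
            continue  # removed since (e.g. by a finalize cleanup)
        conflicts[rel] = sorted(owners)
    return conflicts


def check_unique(records, root, strict=False):
    conflicts = find_conflicts(records, root)
    for rel, pkgs in sorted(conflicts.items()):
        print("Warning: %s is touched by more than one package: %s"
              % (rel, ", ".join(pkgs)))
    if strict and conflicts:
        raise NonUniqueFiles(", ".join(sorted(conflicts)))
    return conflicts


def demo(strict=False):
    records = [
        ("pkg-a", "111", "usr/include/common.h"),  # same content: ignored
        ("pkg-b", "111", "usr/include/common.h"),
        ("pkg-a", "aaa", "usr/share/info/dir"),    # gone from root: ignored
        ("pkg-b", "bbb", "usr/share/info/dir"),
        ("pkg-a", "aaa", "etc/conf"),              # real conflict
        ("pkg-b", "bbb", "etc/conf"),
    ]
    with tempfile.TemporaryDirectory() as root:
        for rel in ("usr/include/common.h", "etc/conf"):
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w"):
                pass
        return check_unique(records, root, strict=strict)


print(demo())
```
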
Since this is a time-sensitive topic, here are a few timings before and
after this series, over 6 runs on an idle machine, with a configuration:
- prebuilt glibc toolchain
- 233 packages, most pretty small and building fast
- target/: 215MiB, 14922 files, directories, symlinks...
- staging/: 625MiB, 29029 files, directories, symlinks...
- host/: 2.1GiB, 44129 files, directories, symlinks...
(minutes:seconds)   best                                    worst   mean
before:             36:20   36:22   36:23   36:24   36:27   36:28   36:24
after:              36:29   36:31   36:32   36:33   36:35   36:37   36:33
So, this is a 9s overhead over 2184s (36:24, before), i.e. a mere 0.4%
increase in time over the full build, or just about a 38ms overhead per
package on average. This overhead is real, but is still very far from
the huge one that was chopped off by 7fb6e782542f.
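
For the record, the figures above can be re-derived from the six runs with
a few lines of Python. The per-package figure comes out at about 38ms when
the unrounded means are used (the rounded 9s would give closer to 39ms):

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


before = ["36:20", "36:22", "36:23", "36:24", "36:27", "36:28"]
after = ["36:29", "36:31", "36:32", "36:33", "36:35", "36:37"]
packages = 233

mean_before = sum(map(to_seconds, before)) / len(before)
mean_after = sum(map(to_seconds, after)) / len(after)
overhead = mean_after - mean_before

print("mean before: %.1fs" % mean_before)
print("mean after:  %.1fs" % mean_after)
print("overhead:    %.1fs (%.1f%%)" % (overhead, 100 * overhead / mean_before))
print("per package: %.0fms" % (1000 * overhead / packages))
```
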
Additionally, the time for re-installing the last package no longer
suffers from the large number or size of files already present.
Best result of three builds (to be cache-hot), for one target package
with a staging install, and for one host package:
          skeleton-init-common-reinstall   host-patchelf-reinstall
before:   8.258s                           4.951s
after:    4.514s                           5.034s
delta:    -3.744s                          +0.083s
So, basically, what this means is that, during development, reinstalling
an already-built package is faster. This is because, even though we spend
(a little tiny wee bit) more time when listing files due to the md5sum
(and really, that's just a few additional milliseconds per package),
we get repaid a hundred-fold because the list is now accurate: we
can limit ourselves to tweaking only the corresponding .la files, and
also limit check-bin-arch to only those files that are actually of interest.
The host packages are still slightly impacted as we can see for
host-patchelf, because the check-bin-arch does not apply to them, so the
gain from running check-bin-arch only on just-installed files can't
apply to host packages. Still, the impact is minor.
I'd like to particularly thank Nicolas Cavallari for their valuable
input about the issues they encountered with the previous and current
situations. Many thanks! :-)
Regards,
Yann E. MORIN.
The following changes since commit 8e928a8389d88e0f64f04ee1b3aa4985dcfd373f
Makefile, manual, website: Bump copyright year (2019-01-06 21:30:34 +0100)
are available in the git repository at:
git://git.buildroot.org/~ymorin/git/buildroot.git
for you to fetch changes up to c7478b1fd1c92508f346f1a8626374d742c9c327
core: add optional failure when 2+ packages touch the same file (2019-01-07 23:04:09 +0100)
----------------------------------------------------------------
Yann E. MORIN (19):
infra/pkg-generic: display MESSAGE before running PRE_HOOKS
infra/pkg-generic: create $(@D) before running PRE_HOOKS
infra/pkg-generic: introduce new stampfile at the beginning of all steps
infra/pkg-generic: use \0 to separate .la files as they are found
infra/pkg-generic: tweak only .la files installed by the current package
infra/pkg-generic: only list files installed by the current package
infra/pkg-generic: offload same-package filtering to check-uniq-file
support/check-uniq-files: decode as many strings as possible
support: add parser in python for packages-file-list files
support: rewrite check-bin-arch in python
support: introduce new format for packages-file-list files
infra/pkg-generic: store md5 of just-installed files
support/check-uniq-file: invert condition logic
support/check-uniq-files: don't report files of the same content
support/check-uniq-files: use argparse to enfore required options
core: check unique files in the corresponding finalize step
core: check for unique target files after all our cleanups
core: ignore non-unique files that have disapeared
core: add optional failure when 2+ packages touch the same file
Config.in | 8 ++
Makefile | 22 ++++-
package/pkg-generic.mk | 41 +++++---
support/scripts/brpkgutil.py | 38 ++++++++
support/scripts/check-bin-arch | 205 +++++++++++++++++++++------------------
support/scripts/check-uniq-files | 69 +++++++------
support/scripts/size-stats | 14 +--
7 files changed, 255 insertions(+), 142 deletions(-)
--
.-----------------.--------------------.------------------.--------------------.
| Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
| +33 223 225 172 `------------.-------: X AGAINST | \e/ There is no |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
'------------------------------^-------^------------------^--------------------'