[Buildroot] [PATCHv3 2/5] pkg-generic: add step_pkg_size global instrumentation hook
Yann E. MORIN
yann.morin.1998 at free.fr
Sun Feb 15 16:59:51 UTC 2015
Thomas, All,
On 2015-02-05 22:19 +0100, Thomas Petazzoni spake thusly:
> This patch adds a global instrumentation hook that collects the list
> of files installed in $(TARGET_DIR) by each package, and stores this
> list into a file called $(BUILD_DIR)/packages-file-list.txt. It can
> later be used to determine the size contribution of each package to
> the target root filesystem.
>
> Note that in order to detect if a file installed by one package is
> later overridden by another package, we calculate the md5 of installed
> files and compare them at each installation of a new package.
>
> This commit also adds a Config.in option to enable the collection of
> this data, as calculating the md5 of all installed files at the
> beginning and end of the installation of each package can be
> considered a time-consuming process which maybe some users will not be
> willing to suffer from.
Well, I'd like to challenge that assertion, so I did a pretty "big" build:
Kodi on the RPi, with all Kodi addons enabled, plus a few additional
useful packages (connman, dropbear and the like).
The config has about 148 packages (make show-targets |wc -w), and takes
roughly 1h and 20min on my machine.
Because I did not time md5sum after each package was installed, I simply
timed the md5sum at the end, on a completely populated target/. That
gives a pretty good upper-bound of the overhead each package would incur
(i.e. the very first packages would be so much faster as there are far
fewer files installed).
$ du -hs target/
247M target/
$ find target -type f |wc -l
5150
$ tar cf - target/ |wc -c
242923520
$ tar cf - target/ |time md5sum
1d393aaf76ef6a7a462519f4b8b861e7 -
0.36user 0.03system 0:00.41elapsed 96%CPU (0avgtext+0avgdata 748maxresident)k
0inputs+0outputs (0major+233minor)pagefaults 0swaps
$ date '+%s.%N'; \
find target -type f -print0 2>/dev/null \
| xargs -0 md5sum >/dev/null 2>&1; \
date '+%s.%N'
1424018960.577764719
1424018960.994814894
So, the overhead of md5sum-ing each file independently (on a cache-hot
target/) is a bit under 0.5s (only slightly more than md5sum-ing the
whole tarball).
Yes, 0.5s. Half-a-second. ;-)
That would give an upper bound on the overhead for the whole build of
about two and a half minutes (148 packages * 2 passes * 0.5s = 148s),
out of a 1h 20min build.
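Spelling that arithmetic out, as a small sketch: the variable names are only illustrative, and the figures are the ones measured above (two date stamps, 148 packages, a 1h 20min build).

```shell
#!/bin/sh
# Figures copied from the measurements above; variable names are illustrative.
t_start=1424018960.577764719   # date '+%s.%N' before the md5sum pass
t_end=1424018960.994814894     # date '+%s.%N' after the md5sum pass
packages=148                   # make show-targets | wc -w
passes=2                       # one pass before and one after install-target
per_pass=0.5                   # rounded-up cost of one pass, in seconds
build=4800                     # 1h 20min, in seconds

# Duration of one full per-file md5sum pass over target/.
measured=$(LC_ALL=C awk -v a="$t_start" -v b="$t_end" \
    'BEGIN { printf "%.3f", b - a }')

# Worst case: every package pays for two passes over a fully populated target/.
upper=$(LC_ALL=C awk -v n="$packages" -v p="$passes" -v t="$per_pass" \
    'BEGIN { printf "%d", n * p * t }')

# Share of the total build time, in percent.
percent=$(LC_ALL=C awk -v u="$upper" -v b="$build" \
    'BEGIN { printf "%.1f", 100 * u / b }')

echo "one pass: ${measured}s"
echo "upper bound: ${upper}s, about ${percent}% of the build"
```

This is a pessimistic bound, since early packages run against a nearly empty target/.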
Yes, md5 is a very fast hash. For reference, hashing a 512MiB blob takes
less than a second.
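That figure is easy to check locally. A minimal sketch, assuming a zero-filled blob streamed from /dev/zero is representative enough, and that `head -c` and the `%N` format of GNU date are available; the result will of course vary from machine to machine:

```shell
#!/bin/sh
# Stream a 512MiB zero-filled blob into md5sum and time the whole pipeline.
size=$((512 * 1024 * 1024))   # 512MiB in bytes

start=$(date '+%s.%N')        # %N (nanoseconds) is a GNU date extension
hash=$(head -c "$size" /dev/zero | md5sum | cut -d' ' -f1)
end=$(date '+%s.%N')

echo "md5: ${hash}"
LC_ALL=C awk -v a="$start" -v b="$end" \
    'BEGIN { printf "elapsed: %.2fs\n", b - a }'
```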
I believe this overhead is negligible and we should unconditionally
enable that feature.
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni at free-electrons.com>
> ---
> Config.in | 9 +++++++++
> package/pkg-generic.mk | 36 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/Config.in b/Config.in
> index f5b6c73..58a5085 100644
> --- a/Config.in
> +++ b/Config.in
> @@ -613,6 +613,15 @@ config BR2_COMPILER_PARANOID_UNSAFE_PATH
> toolchain (through gcc and binutils patches) and external
> toolchain backends (through the external toolchain wrapper).
>
> +config BR2_COLLECT_FILE_SIZE_STATS
> + bool "collect statistics about installed file size"
> + help
> + Enable this option to let Buildroot collect data about the
> + installed files. When this option is enabled, you will be
> + able to use the 'size-stats' make target, which will
> + generate a graph and CSV files giving statistics about the
> + installed size of each file and each package.
> +
> endmenu
>
> endmenu
> diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
> index 1b09955..db35a87 100644
> --- a/package/pkg-generic.mk
> +++ b/package/pkg-generic.mk
> @@ -55,6 +55,42 @@ define step_time
> endef
> GLOBAL_INSTRUMENTATION_HOOKS += step_time
>
> +# Hooks to collect statistics about installed files
> +ifeq ($(BR2_COLLECT_FILE_SIZE_STATS),y)
> +
> +# This hook will be called before the target installation of a
> +# package. We store in a file named $(1).filelist_before the list of
> +# files currently installed in the target. Note that the MD5 is also
> +# stored, in order to identify if the files are overwritten.
> +define step_pkg_size_start
> + (cd $(TARGET_DIR) ; find . -type f -print0 | xargs -0 md5sum) | sort > \
> + $(BUILD_DIR)/$(1).filelist_before
Why don't you store that in the package's $(@D) ?
I don't really care, but if we're going to use $(BUILD_DIR) to store
temporary files, it might be time we introduce a better location
(probably something like BR2_TMP_DIR=$(BUILD_DIR)/.tmp/ ).
We already have some temporary stuff written in there, and I find it
ugly (yes, I added some myself!).
Note: not related to your changes, of course, just prompted by them.
> +endef
> +
> +# This hook will be called after the target installation of a
> +# package. We store in a file named $(1).filelist_after the list
> +# of files (and their MD5) currently installed in the target. We then
> +# do a diff with the $(1).filelist_before to compute the list of
> +# files installed by this package.
> +define step_pkg_size_end
> + (cd $(TARGET_DIR); find . -type f -print0 | xargs -0 md5sum) | sort > \
> + $(BUILD_DIR)/$(1).filelist_after
> + comm -13 $(BUILD_DIR)/$(1).filelist_before $(BUILD_DIR)/$(1).filelist_after | \
> + while read hash file ; do \
> + echo "$(1),$${file}" >> $(BUILD_DIR)/packages-file-list.txt ; \
> + done
> + $(RM) -f $(BUILD_DIR)/$(1).filelist_before \
> + $(BUILD_DIR)/$(1).filelist_after
> +endef
> +
> +define step_pkg_size
> + $(if $(filter install-target,$(2)),\
> + $(if $(filter start,$(1)),$(call step_pkg_size_start,$(3))) \
> + $(if $(filter end,$(1)),$(call step_pkg_size_end,$(3))))
> +endef
When I introduced the instrumentation hooks, I did not envision they
would be used like that, directly as Makefile code.
What I expected is we would be using scripts (python, shell, whatever!)
somewhere in support/ , that would do their own filtering.
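To illustrate, here is a rough sketch of what such a script could look like. It is not part of the patch. It assumes the script is called like the existing user-supplied instrumentation scripts (three arguments: start or end, the step name, the package name) and that TARGET_DIR and BUILD_DIR are exported in its environment. The function names are hypothetical, and `-print0` / `xargs -0 -r` assume GNU findutils.

```shell
#!/bin/sh
# Hypothetical support/ script: same before/after md5 comparison as the
# Makefile hook, but with the step filtering done by the script itself.
# Assumes TARGET_DIR and BUILD_DIR are exported by the caller.

list_files() {
    # One "md5  ./path" line per regular file, sorted so comm(1) can compare.
    (cd "${TARGET_DIR}" && find . -type f -print0 | xargs -0 -r md5sum) |
        LC_ALL=C sort
}

pkg_size_hook() {
    when="$1"
    step="$2"
    pkg="$3"

    # Only the target installation step is of interest.
    [ "${step}" = "install-target" ] || return 0

    before="${BUILD_DIR}/${pkg}.filelist_before"
    after="${BUILD_DIR}/${pkg}.filelist_after"

    case "${when}" in
    start)
        list_files > "${before}"
        ;;
    end)
        list_files > "${after}"
        # Lines only in "after" are files added or changed by this package.
        LC_ALL=C comm -13 "${before}" "${after}" |
            while read -r hash file; do
                echo "${pkg},${file}"
            done >> "${BUILD_DIR}/packages-file-list.txt"
        rm -f "${before}" "${after}"
        ;;
    esac
}

pkg_size_hook "$@"
```

The filtering on the step name then lives in the script rather than in nested $(if $(filter ...)) calls in the Makefile.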
It's pretty fascinating how we all differ in reasoning! :-)
Regards,
Yann E. MORIN.
> +GLOBAL_INSTRUMENTATION_HOOKS += step_pkg_size
> +endif
> +
> # User-supplied script
> ifneq ($(BR2_INSTRUMENTATION_SCRIPTS),)
> define step_user
> --
> 2.1.0
>
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot
--
.-----------------.--------------------.------------------.--------------------.
| Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
| +33 223 225 172 `------------.-------: X AGAINST | \e/ There is no |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
'------------------------------^-------^------------------^--------------------'