[Buildroot] [PATCH] graph-depends: optimize remove_transitive_deps()
Samuel Martin
s.martin49 at gmail.com
Tue Dec 15 21:23:24 UTC 2015
Hi Thomas, Yann, all,
On Tue, Dec 15, 2015 at 9:37 PM, Yann E. MORIN <yann.morin.1998 at free.fr> wrote:
> Thomas, All,
>
> On 2015-12-15 11:21 +0100, Thomas Petazzoni spake thusly:
>> For large configurations, the execution time of
>> remove_transitive_deps() becomes really high, due to the number of
>> nested loops + the is_dep() function being recursive.
>>
>> For an allyespackageconfig, the remove_extra_deps() function takes 334
>> seconds to execute, and the overall time to generate the .dot file is
>> 6 minutes and 39 seconds. Here is a timing of the different
>> graph-depends steps and the overall execution time:
>>
>> Getting dependencies: 42.5735 seconds
>> Turn deps into a dict: 0.0023 seconds
>> Remove extra deps: 334.1542 seconds
>> Get version: 22.4919 seconds
>> Generate .dot: 0.0197 seconds
>>
>> real 6m39.289s
>> user 6m16.644s
>> sys 0m8.792s
>>
>> By adding a very simple cache for the results of is_dep(), we bring
>> down the execution time of the "Remove extra deps" step from 334
>> seconds to just 4 seconds, reducing the overall execution time to 1
>> minutes and 10 seconds:
>>
>> Getting dependencies: 42.9546 seconds
>> Turn deps into a dict: 0.0025 seconds
>> Remove extra deps: 4.9643 seconds
>> Get version: 22.1865 seconds
>> Generate .dot: 0.0207 seconds
>>
>> real 1m10.201s
>> user 0m47.716s
>> sys 0m7.948s
Impressive! :-)
>
> Wee! :-)
>
> Still, somme comments, see below...
>
> I did use the Python profiler to investigate:
> python -m cProfile -s cumulative support/scripts/graph-depends >foo.list
>
> Unfortunately, it is not possible to dump a text version to a file... :-(
> Or I'm too dumb for that. :-]
>
>> Signed-off-by: Thomas Petazzoni <thomas.petazzoni at free-electrons.com>
>> ---
>> support/scripts/graph-depends | 27 +++++++++++++++++++++++++++
>> 1 file changed, 27 insertions(+)
>>
>> diff --git a/support/scripts/graph-depends b/support/scripts/graph-depends
>> index 5f77038..d39306e 100755
>> --- a/support/scripts/graph-depends
>> +++ b/support/scripts/graph-depends
>> @@ -237,16 +237,43 @@ for dep in dependencies:
>> dict_deps[dep[0]] = []
>> dict_deps[dep[0]].append(dep[1])
>>
>> +# Basic cache for the results of the is_dep() function, in order to
>> +# optimize the execution time. The cache is a dict of dict of boolean
>> +# values. The key to the primary dict is "pkg", and the key of the
>> +# sub-dicts is "pkg2".
>> +is_dep_cache = {}
>> +
>> +def is_dep_cache_insert(pkg, pkg2, val):
>> + if is_dep_cache.has_key(pkg):
>
> if pkg in is_dep_cache
>
>> + is_dep_cache[pkg].update({pkg2: val})
>> + else:
>> + is_dep_cache[pkg] = {pkg2: val}
>> +
>> +# Returns a tuple (r, val), where r is a boolean that indicates if a
>> +# value was found in the cache or not, and val being the value found
>> +# (only when r is True).
>> +def is_dep_cache_lookup(pkg, pkg2):
>> + if is_dep_cache.has_key(pkg):
>> + if is_dep_cache[pkg].has_key(pkg2):
>> + return (True, is_dep_cache[pkg][pkg2])
>> + return (False, False)
>
> I think using exceptions is better:
>
> return is_dep_cache[pkg][pkg2]
>
> Don't catch any exception, it'll be propagated up as usual, see below...
Well, since this patch is about performance optimization, exception
should be avoided because they are expensive [1].
But in the end, the choice will also depends on the maintainability of
one or the other option.
>
>> # This function return True if pkg is a dependency (direct or
>> # transitive) of pkg2, dependencies being listed in the deps
>> # dictionary. Returns False otherwise.
>> def is_dep(pkg,pkg2,deps):
>> + (r, val) = is_dep_cache_lookup(pkg, pkg2)
>> + if r:
>> + return val
>> if pkg2 in deps:
>> for p in deps[pkg2]:
>> if pkg == p:
>> + is_dep_cache_insert(pkg, pkg2, True)
>> return True
>> if is_dep(pkg,p,deps):
>> + is_dep_cache_insert(pkg, pkg2, True)
>> return True
>> + is_dep_cache_insert(pkg, pkg2, False)
>> return False
>
> Here, I would have dome a bit differently:
>
> - keep is_dep() untouched
> - rename is_dep() to is_dep_uncached()
> - add is_dep() as such:
>
> def is_dep(pkg,pkg2,deps):
> try:
> return is_dep_cache_lookup(pkg,pkg2)
> except KeyError:
> val = is_dep_uncached(pkg, pkg2, deps)
> is_dep_cache_insert(pkg, pkg2, val)
> return val
>
> Not really tested, but should work I hope! ;-)
>
> Regards,
> Yann E. MORIN.
>
>> # This function eliminates transitive dependencies; for example, given
>> --
>> 2.6.4
>>
>
> --
> .-----------------.--------------------.------------------.--------------------.
> | Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
> | +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
> | +33 223 225 172 `------------.-------: X AGAINST | \e/ There is no |
> | http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
> '------------------------------^-------^------------------^--------------------'
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot
Regards,
--
Samuel
More information about the buildroot
mailing list