[Buildroot] [PATCH v2 1/2] support/scripts/pkg-stats: URL checking support
Ricardo Martincoski
ricardo.martincoski at gmail.com
Sat Sep 29 05:40:43 UTC 2018
Hello,
In overall looks good. A few issues to fix below.
On Fri, Sep 21, 2018 at 12:35 PM, Matt Weber wrote:
[snip]
> INFRA_RE = re.compile("\$\(eval \$\(([a-z-]*)-package\)\)")
>
[snip]
> + def set_url(self):
> + """
> + Fills in the .url field
> + """
> + in_help_section = False
> + self.url_status = "No Config.in"
> + for filename in os.listdir(os.path.dirname(self.path)):
> + if fnmatch.fnmatch(filename, 'Config.*'):
> + fp = open(os.path.join(os.path.dirname(self.path), filename), "r")
> + for config_line in fp:
> + if config_line.strip() == "help":
> + in_help_section = True
Please take at look at HelpText @ utils/checkpackagelib/lib_config.py
A Config.in can have many help sections (one per option) so using a single flag
to know if we are at a help section is not enough. When any string from
entries_that_should_not_be_indented appears at the beginning of a line, the help
section is over.
But maybe we don't even need to check this, if the regexp is more strict.
> + if in_help_section and re.match("(.*)https?://", config_line):
For package/libcoap/Config.in
the regexp will find 'CoAP, see <http://coap.technology>.'
And for package/lttng-babeltrace/Config.in
'Trace Format (see <http://diamon.org/ctf/>). Babeltrace'
... so a better regexp would be:
re.match("\s*https?://", config_line):
Also it is good practice to compile the regexp just like it is done for
INFRA_RE. Because this script have a small number of regexps and the Python
interpreter do some caching (I don't know the internals of this), probably it
won't make a difference in performance. But this code does potentially 2000+
regexp compile() while only one is needed.
> + self.url = ''.join(config_line.split())
join() is not needed. Actually we want to match only 'upstream URLs' that are
always the only text in a line, so an even better regexp would be:
re.match("\s*https?://\S*$", config_line):
or
re.match("\s*https?://\S*\s*$", config_line):
if you want the regexp work even when there are trailing white spaces.
... so no need to join() or split(), just strip().
Notice I didn't tested any code I suggested.
[snip]
> +def check_url_status(pkg):
> + if pkg.url_status != "Missing" and pkg.url_status != "No Config.in":
> + try:
> + url_status_code = requests.head(pkg.url, timeout=5).status_code
This timeout is a bit strict in my opinion, but that is not that important in
this first patch, using serial requests.
It will became important in the next patch, to avoid false timeouts for slow
hosts, or hosts with a high load, or limited internet connection, or slow
servers ... so in the next patch I will suggest to use 30 seconds.
> + if url_status_code >= 400:
> + pkg.url_status = "Invalid(%s)" % str(url_status_code)
> + except requests.exceptions.RequestException as e:
> + return
Please fix the 3 warnings from flake8.
Regards,
Ricardo
More information about the buildroot
mailing list