[Buildroot] [PATCH v5 10/11] autobuild-run: kill all children on SIGTERM
Thomas De Schampheleire
patrickdepinguin at gmail.com
Fri Dec 12 20:04:55 UTC 2014
From: Thomas De Schampheleire <thomas.de.schampheleire at gmail.com>
autobuild-run spawns the main build process through the timeout
command. To do its job correctly, timeout runs its children in a
process group of its own, separate from the process group of
autobuild-run itself.
Thus, when autobuild-run is killed and the signal handler kills the
entire process group, the build processes run through timeout remain
alive.
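
As an illustration of the problem (not part of the patch), the sketch
below starts a child that moves into its own process group, which is
what timeout does by default. The helper name spawn_in_own_group and the
use of preexec_fn=os.setpgrp as a stand-in for the real timeout command
are assumptions made for this example. A killpg() on the parent's group
would not reach such a child, so it has to be signalled by PID.

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group():
    """Start a sleeping child that leaves the parent's process group.

    Hypothetical helper: os.setpgrp() in the child stands in for what
    the timeout command does before running the build.
    """
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        preexec_fn=os.setpgrp)


child = spawn_in_own_group()

# Compare the process group of the parent with that of the child.
parent_pgid = os.getpgid(0)
child_pgid = os.getpgid(child.pid)
same_group = (child_pgid == parent_pgid)

# os.killpg(parent_pgid, ...) would miss the child, so signal it by PID.
os.kill(child.pid, signal.SIGTERM)
ret = child.wait()  # negative signal number when killed by a signal
```
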
To handle this, record the PIDs of the timeout processes in an array
shared between the main autobuild-run process and its instances. The
signal handler will iterate over all active processes in this array, and
kill them explicitly.
If a new timeout process were started after the signal handler is
invoked but before the entire process tree is killed, that process could
remain alive too. To prevent this, the signal handler now starts by
terminating all instances.
Lastly, the signal handler would also be called for all instances, which
is not intended. Prevent that by uninstalling the signal handler as the
first step of the handler itself.
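
The bookkeeping described above could be sketched roughly as follows.
This is a standalone approximation, not the code from the patch: the
names kill_recorded and demo are hypothetical, and a plain sleeping
Python child stands in for the timeout/make process. It records a PID in
a shared multiprocessing.Array, signals every non-zero slot, and ignores
ESRCH for processes that have already exited.

```python
import errno
import multiprocessing
import os
import signal
import subprocess
import sys


def kill_recorded(pids, sig=signal.SIGTERM):
    """Signal every non-zero PID in pids; return how many were signalled."""
    signalled = 0
    for pid in pids:
        if pid == 0:
            # Slot unused, or the build in this slot already finished.
            continue
        try:
            os.kill(pid, sig)
            signalled += 1
        except OSError as e:
            # A process that is already gone is fine; re-raise the rest.
            if e.errno != errno.ESRCH:
                raise
    return signalled


def demo():
    # One zero-initialised integer slot per instance (3 is arbitrary).
    buildpid = multiprocessing.Array('i', 3)

    # Stand-in for the build process of instance 1.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    buildpid[1] = child.pid

    count = kill_recorded(buildpid)
    ret = child.wait()
    buildpid[1] = 0  # mark the slot as free again
    return count, ret, buildpid[:]
```

Using 0 as the "no process" marker works because 0 is never the PID of
a child process, and a freshly created Array of integers starts out
zeroed.
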
Signed-off-by: Thomas De Schampheleire <thomas.de.schampheleire at gmail.com>
---
scripts/autobuild-run | 39 ++++++++++++++++++++++++++++++++++++---
1 file changed, 36 insertions(+), 3 deletions(-)
diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 237d443..3c448bd 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -97,9 +97,10 @@ import urllib2
import csv
from random import randint
import subprocess
-from multiprocessing import Process
+import multiprocessing
import signal
import os
+import errno
import shutil
from time import localtime, strftime
import sys
@@ -444,11 +445,16 @@ def do_build(**kwargs):
srcdir = os.path.join(idir, "buildroot")
f = open(os.path.join(outputdir, "logfile"), "w+")
log_write(log, "INFO: build started")
+
cmd = ["timeout", str(MAX_DURATION), "make", "O=%s" % outputdir,
"-C", srcdir, "BR2_DL_DIR=%s" % dldir,
"BR2_JLEVEL=%s" % kwargs['njobs']] \
+ kwargs['make_opts'].split()
- ret = subprocess.call(cmd, stdout=f, stderr=f)
+ sub = subprocess.Popen(cmd, stdout=f, stderr=f)
+ kwargs['buildpid'][kwargs['instance']] = sub.pid
+ ret = sub.wait()
+ kwargs['buildpid'][kwargs['instance']] = 0
+
# 124 is a special error code that indicates we have reached the
# timeout
if ret == 124:
@@ -692,8 +698,32 @@ def main():
print "WARN: tarballs of results will be kept locally only"
def sigterm_handler(signum, frame):
+ """Kill all children"""
+
+ # uninstall signal handler to prevent being called for all subprocesses
+ signal.signal(signal.SIGTERM, signal.SIG_DFL)
+
+ # stop all instances to prevent new children from being spawned
+ for p in processes:
+ p.terminate()
+
+ # kill build processes started with timeout (that puts its children
+ # explicitly in a separate process group)
+ for pid in buildpid:
+ if pid == 0:
+ continue
+ try:
+ os.kill(pid, signal.SIGTERM)
+ except OSError as e:
+ if e.errno != errno.ESRCH: # No such process, ignore
+ raise
+
+ # kill any remaining children in our process group
os.killpg(os.getpgid(os.getpid()), signal.SIGTERM)
+
sys.exit(1)
+
+ buildpid = multiprocessing.Array('i', int(args['--ninstances']))
processes = []
for i in range(0, int(args['--ninstances'])):
p = Process(target=run_instance, kwargs=dict(
@@ -704,11 +734,14 @@ def main():
http_password = args['--http-password'],
submitter = args['--submitter'],
make_opts = args['--make-opts'],
- upload = upload
+ upload = upload,
+ buildpid = buildpid
))
p.start()
processes.append(p)
+
signal.signal(signal.SIGTERM, sigterm_handler)
+
for p in processes:
p.join()
--
1.8.5.1