refactor: fix error handling; fix log.Origin; only one trial

A bit of refactoring to fix logging and error reporting, and simplify the code.
This transmogrifies some of the things committed in
0ffb41440661631fa1d520c152be4cf8ebd4c46b "Add JUnit XML reports; refactor test
reporting", which did not fully match the code structuring ideas used in
osmo-gsm-tester. Also solve some problems present from the start of the code
base.

Though this is a bit of a code bomb, it would take a lot of time to separate
this into smaller bits: these changes are closely related and resulted
incrementally from testing error handling and logging details. I hope it's ok.

Things changed / problems fixed:

Allow only a single trial to be run per cmdline invocation: unbloat trial and
suite invocation in osmo-gsm-tester.py.

There is a SuiteDefinition, intended to be immutable, and a mutable SuiteRun.
SuiteDefinition had a list of tests, which was modified by the SuiteRun to
record test results. Instead, have only the test basenames in the
SuiteDefinition and create a new set of Test() instances for each SuiteRun, to
ensure that no state leaks between separate suite runs.

State leaking across runs can be seen in
http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/453/
where an earlier sms test for sysmo succeeds, but its state gets overwritten by
the later sms test for trx that fails. The end result is that both tests
failed, although the first run was successful.

Fix a problem with Origin: log.Origin allowed to be __enter__ed more than once,
skipping the second entry. The problem there is that we'd still __exit__ twice
or more, popping the Origin off the stack even though it should still remain.
We could count __enter__ recurrences, but instead, completely disallow entering
a second time.

A code path should have one 'with' statement per object, at pivotal points like
run_suites or run_tests. Individual utility functions should not do 'with' on a
central object. The structure needed is, in pseudo code:

  try:
    with trial:
      try:
        with suite_run:
	  try:
	    with test:
	      test_actions()

The 'with' needs to be inside the 'try', so that the exception can be handled
in __exit__ before it reaches the exception logging.

To clarify this, like test exceptions caught in Test.run(), also move suite
exception handling from Trial into SuiteRun.run_tests(). There are 'with self'
in Test.run() and SuiteRun.run_tests(), which are well placed, because these
are pivotal points in the main code path.

Log output: clearly separate logging of distinct suites and test scripts, by
adding more large_separator() calls at the start of each test. Place these
separator calls in more logical places. Add separator size and spacing args.

Log output: print tracebacks only once, for the test script where they happen.

Have less state that duplicates other state: drop SuiteRun.test_failed_ctr and
suite.test_skipped_ctr, instead add SuiteRun.count_test_results().

For test failure reporting, store the traceback text in a separate member var.

In the text report, apply above changes and unclutter to achieve a brief and
easy to read result overview: print less filler characters, drop the starting
times, drop the tracebacks. This can be found in the individual test logs.
Because the tracebacks are no longer in the text report, the suite_test.py can
just print the reports and expect that output instead of asserting individual
contents.

In the text report, print duration in precision of .1 seconds.

Add origin information and a traceback text to the junit XML result to give
more context when browsing the result XML. For 'AssertionError', add the source
line of where the assertion hit.

Drop the explicit Failure exception. We don't need one specific exception to
mark a failure, instead any arbitrary exception is treated as a failure. Use
the exception's class name as fail_type.

Though my original idea was to use raising exceptions as the only way to cause
a test failure, I'm keeping the set_fail() function as an alternative way,
because it allows test specific cleanup and may come in handy later. To have
both ways integrate seamlessly, shift some result setting into 'finally'
clauses and make sure higher levels (suite, trial) count the contained items'
stati.

Minor tweak: write the 'pass' and 'skip' reports in lower case so that the
'FAIL' stands out.

Minor tweak: pass the return code that the program exit should return further
outward, so that the exit(1) call does not cause a SystemExit exception to be
logged.

The aims of this patch are:
- Logs are readable so that it is clear which logging belongs to which test and
  suite.
- The logging origins are correct (vs. parents gone missing as previously)
- A single test error does not cause following tests or suites to be skipped.
- An exception "above" Exception, i.e. SystemExit and the like, *does*
  immediately abort all tests and suites, and the results for tests that were
  not run are reported as "unknown" (rather than skipped on purpose):
  - Raising a SystemExit aborts all.
  - Hitting ctrl-c aborts all.
- The resulting summary in the log is brief and readable.

Change-Id: Ibf0846d457cab26f54c25e6906a8bb304724e2d8
diff --git a/src/osmo_gsm_tester/suite.py b/src/osmo_gsm_tester/suite.py
index fb6cf36..0dd8790 100644
--- a/src/osmo_gsm_tester/suite.py
+++ b/src/osmo_gsm_tester/suite.py
@@ -31,16 +31,6 @@
 class Timeout(Exception):
     pass
 
-class Failure(Exception):
-    '''Test failure exception, provided to be raised by tests. fail_type is
-       usually a keyword used to quickly identify the type of failure that
-       occurred. fail_msg is a more extensive text containing information about
-       the issue.'''
-
-    def __init__(self, fail_type, fail_msg):
-        self.fail_type = fail_type
-        self.fail_msg = fail_msg
-
 class SuiteDefinition(log.Origin):
     '''A test suite reserves resources for a number of tests.
        Each test requires a specific number of modems, BTSs etc., which are
@@ -65,43 +55,32 @@
         self.read_conf()
 
     def read_conf(self):
-        with self:
-            self.dbg('reading %s' % SuiteDefinition.CONF_FILENAME)
-            if not os.path.isdir(self.suite_dir):
-                raise RuntimeError('No such directory: %r' % self.suite_dir)
-            self.conf = config.read(os.path.join(self.suite_dir,
-                                                 SuiteDefinition.CONF_FILENAME),
-                                    SuiteDefinition.CONF_SCHEMA)
-            self.load_tests()
+        self.dbg('reading %s' % SuiteDefinition.CONF_FILENAME)
+        if not os.path.isdir(self.suite_dir):
+            raise RuntimeError('No such directory: %r' % self.suite_dir)
+        self.conf = config.read(os.path.join(self.suite_dir,
+                                             SuiteDefinition.CONF_FILENAME),
+                                SuiteDefinition.CONF_SCHEMA)
+        self.load_test_basenames()
 
-    def load_tests(self):
-        with self:
-            self.tests = []
-            for basename in sorted(os.listdir(self.suite_dir)):
-                if not basename.endswith('.py'):
-                    continue
-                self.tests.append(Test(self, basename))
+    def load_test_basenames(self):
+        self.test_basenames = []
+        for basename in sorted(os.listdir(self.suite_dir)):
+            if not basename.endswith('.py'):
+                continue
+            self.test_basenames.append(basename)
 
-    def add_test(self, test):
-        with self:
-            if not isinstance(test, Test):
-                raise ValueError('add_test(): pass a Test() instance, not %s' % type(test))
-            if test.suite is None:
-                test.suite = self
-            if test.suite is not self:
-                raise ValueError('add_test(): test already belongs to another suite')
-            self.tests.append(test)
 
 class Test(log.Origin):
     UNKNOWN = 'UNKNOWN'
-    SKIP = 'SKIP'
-    PASS = 'PASS'
+    SKIP = 'skip'
+    PASS = 'pass'
     FAIL = 'FAIL'
 
-    def __init__(self, suite, test_basename):
-        self.suite = suite
+    def __init__(self, suite_run, test_basename):
+        self.suite_run = suite_run
         self.basename = test_basename
-        self.path = os.path.join(self.suite.suite_dir, self.basename)
+        self.path = os.path.join(self.suite_run.definition.suite_dir, self.basename)
         super().__init__(self.path)
         self.set_name(self.basename)
         self.set_log_category(log.C_TST)
@@ -111,38 +90,38 @@
         self.fail_type = None
         self.fail_message = None
 
-    def run(self, suite_run):
-        assert self.suite is suite_run.definition
+    def run(self):
         try:
             with self:
+                log.large_separator(self.suite_run.trial.name(), self.suite_run.name(), self.name(), sublevel=3)
                 self.status = Test.UNKNOWN
                 self.start_timestamp = time.time()
-                test.setup(suite_run, self, ofono_client, sys.modules[__name__], event_loop)
-                self.log('START')
+                test.setup(self.suite_run, self, ofono_client, sys.modules[__name__], event_loop)
                 with self.redirect_stdout():
-                    util.run_python_file('%s.%s' % (self.suite.name(), self.name()),
+                    util.run_python_file('%s.%s' % (self.suite_run.definition.name(), self.basename),
                                          self.path)
                 if self.status == Test.UNKNOWN:
                      self.set_pass()
         except Exception as e:
-            self.log_exn()
-            if isinstance(e, Failure):
-                ftype = e.fail_type
-                fmsg =  e.fail_msg + '\n' + traceback.format_exc().rstrip()
+            if hasattr(e, 'msg'):
+                msg = e.msg
             else:
-                ftype = type(e).__name__
-                fmsg = repr(e) + '\n' + traceback.format_exc().rstrip()
-                if isinstance(e, resource.NoResourceExn):
-                    fmsg += suite_run.resource_status_str()
-
-            self.set_fail(ftype, fmsg, False)
-
-        finally:
-            if self.status == Test.PASS or self.status == Test.SKIP:
-                self.log(self.status)
-            else:
-                self.log('%s (%s)' % (self.status, self.fail_type))
-        return self.status
+                msg = str(e)
+            if isinstance(e, AssertionError):
+                # AssertionError lacks further information on what was
+                # asserted. Find the line where the code asserted:
+                msg += log.get_src_from_tb(sys.exc_info()[2])
+            # add source file information to failure report
+            if hasattr(e, 'origins'):
+                msg += ' [%s]' % e.origins
+            tb_str = traceback.format_exc()
+            if isinstance(e, resource.NoResourceExn):
+                tb_str += self.suite_run.resource_status_str()
+            self.set_fail(type(e).__name__, msg, tb_str)
+        except BaseException as e:
+            # when the program is aborted by a signal (like Ctrl-C), escalate to abort all.
+            self.err('TEST RUN ABORTED: %s' % type(e).__name__)
+            raise
 
     def name(self):
         l = log.get_line_for_src(self.path)
@@ -150,17 +129,26 @@
             return '%s:%s' % (self._name, l)
         return super().name()
 
-    def set_fail(self, fail_type, fail_message, tb=True):
+    def set_fail(self, fail_type, fail_message, tb_str=None):
         self.status = Test.FAIL
         self.duration = time.time() - self.start_timestamp
         self.fail_type = fail_type
         self.fail_message = fail_message
-        if tb:
-            self.fail_message += '\n' + ''.join(traceback.format_stack()[:-1]).rstrip()
+
+        if tb_str is None:
+            # populate an exception-less call to set_fail() with traceback info
+            tb_str = ''.join(traceback.format_stack()[:-1])
+
+        self.fail_tb = tb_str
+        self.err('%s: %s' % (self.fail_type, self.fail_message))
+        if self.fail_tb:
+            self.trace(self.fail_tb)
+        self.log('Test FAILED (%.1f sec)' % self.duration)
 
     def set_pass(self):
         self.status = Test.PASS
         self.duration = time.time() - self.start_timestamp
+        self.log('Test passed (%.1f sec)' % self.duration)
 
     def set_skip(self):
         self.status = Test.SKIP
@@ -172,6 +160,7 @@
     FAIL = 'FAIL'
 
     trial = None
+    status = None
     resources_pool = None
     reserved_resources = None
     objects_to_clean_up = None
@@ -179,13 +168,20 @@
     _config = None
     _processes = None
 
-    def __init__(self, current_trial, suite_scenario_str, suite_definition, scenarios=[]):
-        self.trial = current_trial
+    def __init__(self, trial, suite_scenario_str, suite_definition, scenarios=[]):
+        self.trial = trial
         self.definition = suite_definition
         self.scenarios = scenarios
         self.set_name(suite_scenario_str)
         self.set_log_category(log.C_TST)
         self.resources_pool = resource.ResourcesPool()
+        self.status = SuiteRun.UNKNOWN
+        self.load_tests()
+
+    def load_tests(self):
+        self.tests = []
+        for test_basename in self.definition.test_basenames:
+            self.tests.append(Test(self, test_basename))
 
     def register_for_cleanup(self, *obj):
         assert all([hasattr(o, 'cleanup') for o in obj])
@@ -198,11 +194,8 @@
             obj.cleanup()
 
     def mark_start(self):
-        self.tests = []
         self.start_timestamp = time.time()
         self.duration = 0
-        self.test_failed_ctr = 0
-        self.test_skipped_ctr = 0
         self.status = SuiteRun.UNKNOWN
 
     def combined(self, conf_name):
@@ -233,27 +226,27 @@
         if self.reserved_resources:
             raise RuntimeError('Attempt to reserve resources twice for a SuiteRun')
         self.log('reserving resources in', self.resources_pool.state_dir, '...')
-        with self:
-            self.reserved_resources = self.resources_pool.reserve(self, self.resource_requirements())
+        self.reserved_resources = self.resources_pool.reserve(self, self.resource_requirements())
 
     def run_tests(self, names=None):
-        self.log('Suite run start')
         try:
-            self.mark_start()
-            event_loop.register_poll_func(self.poll)
-            if not self.reserved_resources:
-                self.reserve_resources()
-            for test in self.definition.tests:
-                if names and not test.name() in names:
-                    test.set_skip()
-                    self.test_skipped_ctr += 1
-                    self.tests.append(test)
-                    continue
-                with self:
-                    st = test.run(self)
-                    if st == Test.FAIL:
-                        self.test_failed_ctr += 1
-                    self.tests.append(test)
+            with self:
+                log.large_separator(self.trial.name(), self.name(), sublevel=2)
+                self.mark_start()
+                event_loop.register_poll_func(self.poll)
+                if not self.reserved_resources:
+                    self.reserve_resources()
+                for test in self.tests:
+                    if names and not test.name() in names:
+                        test.set_skip()
+                        continue
+                    test.run()
+        except Exception:
+            self.log_exn()
+        except BaseException as e:
+            # when the program is aborted by a signal (like Ctrl-C), escalate to abort all.
+            self.err('SUITE RUN ABORTED: %s' % type(e).__name__)
+            raise
         finally:
             # if sys.exit() called from signal handler (e.g. SIGINT), SystemExit
             # base exception is raised. Make sure to stop processes in this
@@ -261,14 +254,33 @@
             self.stop_processes()
             self.objects_cleanup()
             self.free_resources()
-        event_loop.unregister_poll_func(self.poll)
-        self.duration = time.time() - self.start_timestamp
-        if self.test_failed_ctr:
-            self.status = SuiteRun.FAIL
-        else:
-            self.status = SuiteRun.PASS
-        self.log(self.status)
-        return self.status
+            event_loop.unregister_poll_func(self.poll)
+            self.duration = time.time() - self.start_timestamp
+
+            passed, skipped, failed = self.count_test_results()
+            # if no tests ran, count it as failure
+            if passed and not failed:
+                self.status = SuiteRun.PASS
+            else:
+                self.status = SuiteRun.FAIL
+
+            log.large_separator(self.trial.name(), self.name(), self.status, sublevel=2, space_above=False)
+
+    def passed(self):
+        return self.status == SuiteRun.PASS
+
+    def count_test_results(self):
+        passed = 0
+        skipped = 0
+        failed = 0
+        for test in self.tests:
+            if test.status == Test.PASS:
+                passed += 1
+            elif test.status == Test.FAIL:
+                failed += 1
+            else:
+                skipped += 1
+        return (passed, skipped, failed)
 
     def remember_to_stop(self, process):
         if self._processes is None: