[PATCH 2/2] KVM: selftests: Create KVM selftest runner
Sean Christopherson
seanjc at google.com
Wed Apr 30 14:31:05 PDT 2025
This is awesome!
I have lots of ideas, but not all of them need to be addressed for the initial
commit. Some of them I would consider blockers, but I also don't want to end up
with massive scope creep that causes this to stall out for (more) months on end.
On Fri, Feb 21, 2025, Vipin Sharma wrote:
> Create KVM selftest runner to run selftests and provide various options
> for execution.
>
> Provide following features in the runner:
> 1. --timeout/-t: Max time each test should finish in before killing it.
The help for this needs to specify the units. I assume it's seconds?
> 2. --jobs/-j: Run this many tests in parallel.
> 3. --tests: Provide space-separated paths of tests to execute.
> 4. --test_dirs: Directories to search for test files and run them.
> 5. --output/-o: Create the folder with given name and dump output of
> each test in a hierarchical way.
> 6. Add summary at the end.
>
> Runner needs testcase files which are provided in the previous patch.
> Following are the examples to start the runner (cwd is
> tools/testing/selftests/kvm)
The runner definitely needs a command line option to specify the path to the
test executables. Defaulting to in-tree builds is totally fine, but we should
also support out-of-tree builds (or copying to a remote host, etc.).
The default testcases will have relative paths, e.g. $arch/$test, so the user
will still need to maintain the same directory structure as in-tree builds, but
IMO that's totally fine.
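Something along these lines is what I'm thinking (untested, and the option name
is just a strawman, not something from this patch; ditto the timeout help text,
which also spells out the units):

import argparse

parser = argparse.ArgumentParser(prog="runner")
parser.add_argument("-t", "--timeout", type=int, default=None,
                    help="Per-test timeout, in seconds (default: no timeout)")
parser.add_argument("--build-dir", default=".",
                    help="Directory containing the compiled test executables; "
                         "defaults to the in-tree build, but allows out-of-tree "
                         "builds or binaries copied to a remote host")
args = parser.parse_args()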
> - Basic run:
> python3 runner --test_dirs testcases
>
> - Run specific test
> python3 runner --tests ./testcases/dirty_log_perf_test/default.test
>
> - Run tests in parallel
> python3 runner --test_dirs testcases -j 10
>
> - Run 5 tests in parallel, with a timeout of 10 seconds each, and
>   dump output in the "result" directory
> python3 runner --test_dirs testcases -j 5 -t 10 --output result
>
> Sample output from the above command:
>
> python3_binary runner --test_dirs testcases -j 5 -t 10 --output result
>
> 2025-02-21 16:45:46,774 | 16809 | INFO | [Passed] testcases/guest_print_test/default.test
> 2025-02-21 16:45:47,040 | 16809 | INFO | [Passed] testcases/kvm_create_max_vcpus/default.test
> 2025-02-21 16:45:49,244 | 16809 | INFO | [Passed] testcases/dirty_log_perf_test/default.test
Printing the timestamps to the console isn't terribly interesting, and IMO isn't
at all worth the noise.
The PID is nice, but it needs to be printed _before_ the test finishes, and it
needs to track the PID of the test. If getting that working is non-trivial,
definitely punt it for the initial commit.
And presumably INFO is the level of logging. That needs to go.
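E.g. something like this (rough sketch only, not wired up to the patch's
Command class; the shlex/Popen usage is just an assumption about how the test
command gets launched):

import shlex
import subprocess

def launch(test_command, test_path, stdout, stderr):
    # Print the test's PID as soon as it is spawned, sans timestamp/log-level.
    proc = subprocess.Popen(shlex.split(test_command), stdout=stdout, stderr=stderr)
    print(f"[Running] pid={proc.pid} {test_path}")
    return proc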
> ...
> 2025-02-21 16:46:07,225 | 16809 | INFO | [Passed] testcases/x86_64/pmu_event_filter_test/default.test
> 2025-02-21 16:46:08,020 | 16809 | INFO | [Passed] testcases/x86_64/vmx_preemption_timer_test/default.test
> 2025-02-21 16:46:09,734 | 16809 | INFO | [Timed out] testcases/x86_64/pmu_counters_test/default.test
I would really like to have terminal colored+bolded output for tests that fail
(or timeout) or are skipped.
I think we should also provide controls for the verbosity of the output. E.g. to
skip printing tests that pass entirely. My vote would be for a collection of
boolean knobs, i.e. not a log_level or whatever, because inevitably we'll end up
with output that isn't strictly "increasing".
Adding a param to disable printing of passed tests is presumably trivial, so maybe
do that for the initial commit, and then we can work on the fancier stuff?
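E.g. a minimal sketch of the coloring plus the "don't print passes" knob (plain
ANSI escapes; the flag name is made up):

BOLD_RED = "\033[1;31m"
BOLD_YELLOW = "\033[1;33m"
RESET = "\033[0m"

def report(status, test_path, print_passed=True):
    if status == "Passed":
        if print_passed:
            print(f"[Passed] {test_path}")
    elif status == "Skipped":
        print(f"{BOLD_YELLOW}[Skipped]{RESET} {test_path}")
    else:
        # Failed and Timed out get the loud treatment.
        print(f"{BOLD_RED}[{status}]{RESET} {test_path}")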
> 2025-02-21 16:46:10,202 | 16809 | INFO | [Passed] testcases/hardware_disable_test/default.test
> 2025-02-21 16:46:10,203 | 16809 | INFO | Tests ran: 85 tests
It would be very nice to have a summary of things printed out periodically. E.g.
if my normal run has a few failures, but the current run has already failed a
decent number of tests, then I'd probably kill the run and start debugging.
Alternatively, and maybe even better, would be to make the runner mildly interactive,
i.e. to accept rudimentary commands while tests are running. Then the user can
query the state of things while the runner is still doing its thing. E.g. bind
a few keys to print the various statuses.
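Rough idea of the key-grabbing approach, assuming the main loop can afford to
poll stdin between scheduling tests (Linux-only, the user has to hit Enter, and
print_summary() is a stand-in for whatever the runner already uses):

import select
import sys

def poll_for_status_request(print_summary, wait=1.0):
    # Wait at most 'wait' seconds for the user to type something.
    ready, _, _ = select.select([sys.stdin], [], [], wait)
    if ready and sys.stdin.readline().strip() == "s":
        print_summary()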
> 2025-02-21 16:46:10,204 | 16809 | INFO | Passed: 61
> 2025-02-21 16:46:10,204 | 16809 | INFO | Failed: 4
> 2025-02-21 16:46:10,204 | 16809 | INFO | Skipped: 17
> 2025-02-21 16:46:10,204 | 16809 | INFO | Timed out: 3
> 2025-02-21 16:46:10,204 | 16809 | INFO | No run: 0
A not-quite-mandatory, but very-nice-to-have feature would be the ability to
display which tests Passed/Failed/Skipped/Timed Out/Incomplete, with command line
knobs for each. My vote is for everything but Passed on-by-default, though it's
easy enough to put a light wrapper around this (which I'll do no matter what), so
my preference for the default doesn't matter all that much.
That could tie into the above idea of grabbing keys to print such information on-demand.
Also CTRL-C handling needs much more graceful output. Ideally, no stack traces
whatsoever, and instead a summary like the above, but with information about which
tests didn't complete.
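E.g. catch the KeyboardInterrupt at the top level and reuse the summary code
(run_all_tests(), print_summary(), and the status names here are placeholders):

import sys

def main():
    try:
        run_all_tests()
    except KeyboardInterrupt:
        print("\nInterrupted, partial results:")
        print_summary(statuses=["Passed", "Failed", "Skipped", "Timed out", "No run"])
        sys.exit(130)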
> Output dumped in result directory
>
> $ tree result/
> result/
The runner should have an (on-by-default?) option to abort if the output directory
already exists, e.g. so that users don't clobber previous runs. And/or an option
to append a timestamp, e.g. $result.yyyy.mm.dd.MM.SS, so that all users don't end
up writing the same wrapper to generate a timestamp.
Having a no-timestamp + overwrite mode is also useful, e.g. when I'm running in
a more "interactive" mode where I'm doing initial testing of something, and I
don't care about preserving the output of previous runs.
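E.g. something like the below (flag names invented, and the timestamp format
only approximates the $result.yyyy.mm.dd.MM.SS idea above):

import os
import shutil
import time

def prepare_output_dir(path, append_timestamp=False, overwrite=False):
    if append_timestamp:
        path = path + "." + time.strftime("%Y.%m.%d.%H.%M.%S")
    if os.path.exists(path):
        if not overwrite:
            raise SystemExit(f"'{path}' already exists, refusing to clobber it")
        shutil.rmtree(path)
    os.makedirs(path)
    return path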
> ├── log
> └── testcases
>     ├── access_tracking_perf_test
>     │   └── default.test
>     │       ├── stderr
>     │       └── stdout
>     ├── coalesced_io_test
>     │   └── default.test
>     │       ├── stderr
>     │       └── stdout
> ...
>
> The result/log file will have the status of each test, like the one printed
> on the console.  Each stderr and stdout file will have data based on the
> execution.
>
> The runner is implemented in Python and needs at least version 3.6.
>
> Signed-off-by: Vipin Sharma <vipinsh at google.com>
> ---
...
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 550b7c2b4a0c..a23fd4b2cb5f 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -11,3 +11,4 @@
> !Makefile
> !Makefile.kvm
> !*.test
> +!*.py
Sort this alphabetically as well.
> +def fetch_test_files(args):
> +    exclude_dirs = ["aarch64", "x86_64", "riscv", "s390x"]
These are now:
arm64, x86, riscv, s390
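i.e. (assuming nothing else in that list needs to change):

exclude_dirs = ["arm64", "x86", "riscv", "s390"]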
> +    def __init__(self, test_path, output_dir=None, timeout=None,):
> +        test_command = pathlib.Path(test_path).read_text().strip()
> +        if not test_command:
> +            raise ValueError("Empty test command in " + test_path)
> +
> +        if output_dir is not None:
> +            output_dir = os.path.join(output_dir, test_path)
This doesn't do the right thing if test_path is absolute (or maybe it's if the
output_dir is in a completely different hierarchy?)
I was able to fudge around this with
diff --git a/tools/testing/selftests/kvm/runner/selftest.py b/tools/testing/selftests/kvm/runner/selftest.py
index cdf5d1085c08..3bce023693cb 100644
--- a/tools/testing/selftests/kvm/runner/selftest.py
+++ b/tools/testing/selftests/kvm/runner/selftest.py
@@ -30,7 +30,8 @@ class Selftest:
             raise ValueError("Empty test command in " + test_path)
 
         if output_dir is not None:
-            output_dir = os.path.join(output_dir, test_path)
+            dirpath, filename = os.path.split(test_path)
+            output_dir = os.path.join(output_dir, os.path.basename(dirpath), filename)
         self.test_path = test_path
         self.command = command.Command(test_command, timeout, output_dir)
         self.status = SelftestStatus.NO_RUN
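For reference, assuming the problem case really is an absolute test_path, this
is just os.path.join() doing what it's documented to do and discarding
everything before an absolute component:

import os

# The absolute path "wins" and the output escapes the result directory.
print(os.path.join("result", "/abs/testcases/foo/default.test"))
# -> /abs/testcases/foo/default.test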
Lastly, please don't wrap aggressively (off-list, you mentioned it was due to a
formatter?). E.g.
diff --git a/tools/testing/selftests/kvm/runner/test_runner.py b/tools/testing/selftests/kvm/runner/test_runner.py
index b9d34c20bf88..5a568e155477 100644
--- a/tools/testing/selftests/kvm/runner/test_runner.py
+++ b/tools/testing/selftests/kvm/runner/test_runner.py
@@ -12,8 +12,7 @@ class TestRunner:
         self.tests = []
 
         for test_file in test_files:
-            self.tests.append(selftest.Selftest(
-                test_file, output_dir, timeout))
+            self.tests.append(selftest.Selftest(test_file, output_dir, timeout))
 
     def _run(self, test):
         test.run()