# Tests: Generation, Structure, and Usage

This document describes the current state of knime2py test generation: what the generator does to your copied KNIME projects, how the generated tests work, and how to run and tune them.
## 1) What the generator does

`test_gen.cli` is a small utility that:

- Cleans a copied KNIME project under `tests/data/<NAME>` (or a `--path` you give it), so only the files needed for reproducible export remain:
  - Inside each node directory: keep only `settings.xml`; delete everything else (including hidden files).
  - In the project root: keep only `workflow.knime` and non-hidden directories (node dirs); delete all other files and delete hidden directories.
  - Safety checks: refuses to run on the filesystem root; requires the presence of `workflow.knime`.
- Writes a pytest into `tests/test_<slug>.py` that:
  - Calls the knime2py CLI to export the workflow to `tests/data/!output` (provided by a fixture).
  - Runs the generated `*_workbook.py`.
  - Compares produced CSV(s) to reference CSV(s) using relative tolerance for numbers, trimming for strings, and strict header and shape checks.
- Overwrites by default. The generated test file is replaced unless you pass `--no-overwrite`.
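The per-node cleaning rule above can be sketched as a dry run (illustrative only; `clean_node_dir` is a hypothetical helper, not the actual `test_gen.cli` code, which also prunes the project root and applies the safety checks):

```python
from pathlib import Path


def clean_node_dir(node_dir: Path) -> list[str]:
    """Report what a node directory would lose, keeping only settings.xml.

    Dry-run sketch of the cleaning rule: the real generator deletes these
    entries, including hidden files.
    """
    doomed = []
    for entry in sorted(node_dir.iterdir()):
        if entry.name != "settings.xml":
            doomed.append(entry.name)  # the real tool would delete here
    return doomed
```

For example, a node directory containing `settings.xml`, `.DS_Store`, and `port_1.zip` would keep only `settings.xml`.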
## 2) Directory layout assumptions

```
repo/
├─ src/
│  └─ knime2py/...
├─ tests/
│  ├─ data/
│  │  ├─ <WORKFLOW_NAME>/      # a *copy* of your KNIME project (cleaned by the generator)
│  │  │  └─ workflow.knime
│  │  ├─ data/
│  │  │  └─ <WORKFLOW_NAME>/
│  │  │     ├─ output.csv      # reference tables
│  │  │     ├─ foo_output.csv
│  │  │     └─ bar_output.csv
│  │  └─ !output/              # produced outputs (managed by fixture)
│  ├─ support/
│  │  └─ csv_compare.py        # shared CSV comparison helper
│  ├─ conftest.py              # provides output_dir fixture
│  └─ test_<slug>.py           # generated test(s)
└─ ...
```

- The reference CSVs live under `tests/data/data/<WORKFLOW_NAME>/` and can include multiple files; the generator looks for all files matching `*_output.csv`.
- The test writes produced outputs into `tests/data/!output/`.
## 3) The comparison helper

All generated tests import a single helper module: `tests/support/csv_compare.py`.

It provides:

- `compare_csv(got_path, exp_path, rtol=RTOL)` — compares two CSVs.
- `RTOL` — default relative tolerance (defaults to `1e-3`, i.e. 0.1%). Individual tests can still override this constant locally.
- `ZERO_TOL` — small zero tolerance (defaults to `1e-6`). Any finite numeric value with absolute magnitude `< ZERO_TOL` is treated as `0.0` before comparison, in both tables.

What “equal” means:

- Headers must match exactly after trimming.
- Shape must match: same row count; each corresponding row has the same number of columns.
- Numeric cells are compared using `math.isclose(a, b, rel_tol=RTOL, abs_tol=0)` after mapping finite near-zero values to `0.0` using `ZERO_TOL`. `NaN` equals `NaN`; `+∞` and `−∞` must match exactly.
- Non-numeric cells must match exactly after trimming.

On failure, the helper prints up to the first 25 mismatches and shows both file paths. Relative errors are reported using the same denominator as `math.isclose` (i.e. `max(|a|, |b|)`), applied after the zero-normalization.
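The cell-equality rules above can be condensed into a single predicate (a sketch of the semantics using the stated defaults, not the actual `csv_compare` code):

```python
import math

RTOL = 1e-3      # default relative tolerance
ZERO_TOL = 1e-6  # finite values below this magnitude are snapped to 0.0


def _to_float(s):
    try:
        return float(s)
    except (TypeError, ValueError):
        return None


def cells_equal(a: str, b: str, rtol: float = RTOL) -> bool:
    """Compare two CSV cells using the rules described above."""
    fa, fb = _to_float(a), _to_float(b)
    if fa is not None and fb is not None:
        # NaN equals NaN; ±inf must match exactly.
        if math.isnan(fa) or math.isnan(fb):
            return math.isnan(fa) and math.isnan(fb)
        if math.isinf(fa) or math.isinf(fb):
            return fa == fb
        # Snap finite near-zero values to 0.0, then compare relatively.
        if abs(fa) < ZERO_TOL:
            fa = 0.0
        if abs(fb) < ZERO_TOL:
            fb = 0.0
        return math.isclose(fa, fb, rel_tol=rtol, abs_tol=0.0)
    # Non-numeric cells: exact match after trimming.
    return str(a).strip() == str(b).strip()
```

For instance, `cells_equal("1.0000", "1.0005")` passes at the default 0.1% tolerance, while `cells_equal("1.00", "1.01")` does not; `"1e-9"` and `"-1e-9"` compare equal because both snap to zero.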
## 4) Multiple outputs per workflow

The generated test supports multiple output tables:

- It enumerates all reference files under `tests/data/data/<WORKFLOW_NAME>/**` whose basename matches `*_output.csv`.
- For each reference file `X_output.csv`, it expects a produced file with the same basename in `tests/data/!output/`.
- Each pair is compared independently via `csv_compare.compare_csv(...)`.

This lets a single workflow test validate several outputs.
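The discovery-and-pairing step can be sketched like this (`pair_outputs` is a hypothetical name; the generated tests may enumerate the files differently):

```python
from pathlib import Path


def pair_outputs(ref_dir: Path, out_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each reference *_output.csv with the produced file of the same basename.

    ref_dir is tests/data/data/<WORKFLOW_NAME>; out_dir is tests/data/!output.
    Each (got, expected) pair would then go to csv_compare.compare_csv(...).
    """
    return [
        (out_dir / ref.name, ref)
        for ref in sorted(ref_dir.rglob("*_output.csv"))
    ]
```

A produced file missing from `out_dir` would surface as a failed comparison for that pair, so every reference table must have a counterpart.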
## 5) The generated test: how it runs

At a high level, each generated test:

- Uses the `output_dir` fixture (from `conftest.py`) to get a fresh `tests/data/!output` directory.
- Invokes the knime2py CLI with `PYTHONPATH` pointing at your repo’s `src/` so the CLI resolves local code:

  ```
  python -m knime2py <workflow_dir> --out tests/data/!output --graph off --workbook py
  ```

- Finds the generated `*_workbook.py` in `!output` and executes it (cwd = `!output`) so relative paths resolve.
- Compares `!output/<name>_output.csv` files against references under `tests/data/data/<WORKFLOW_NAME>/`.
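Assembling that invocation can be sketched as follows (`build_export_invocation` is a hypothetical helper that mirrors the command shown above; it is not part of the generated tests):

```python
import os
import sys
from pathlib import Path


def build_export_invocation(repo: Path, workflow_dir: Path):
    """Return (cmd, env) for the export step each generated test performs.

    env prepends <repo>/src to PYTHONPATH so the local knime2py package
    resolves; cmd mirrors the CLI command shown above.
    """
    out_dir = repo / "tests" / "data" / "!output"
    cmd = [
        sys.executable, "-m", "knime2py",
        str(workflow_dir),
        "--out", str(out_dir),
        "--graph", "off",
        "--workbook", "py",
    ]
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(
        [str(repo / "src"), env.get("PYTHONPATH", "")]
    ).rstrip(os.pathsep)
    return cmd, env
```

The pair would then be handed to something like `subprocess.run(cmd, env=env, check=True)` before executing the emitted `*_workbook.py`.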
## 6) Tolerances

- Default `RTOL = 1e-3` (0.1%). Individual tests can override this constant locally (e.g., `RTOL = 0.1`).
- Default `ZERO_TOL = 1e-6`. This only affects values that are very close to zero; it avoids meaningless relative errors caused by tiny near-zero values.

Use a larger `RTOL` when you expect minor, benign numeric drift (e.g., pandas versions, BLAS differences). Use a smaller value when you want tighter verification.
## 7) Generator usage

From the repo root:

```
# Generate a test for a workflow copied under tests/data/<NAME>/
python -m test_gen.cli <NAME>

# Or point at an explicit KNIME project directory copy
python -m test_gen.cli --path /abs/path/to/knime_project_copy

# See actions without writing
python -m test_gen.cli <NAME> --dry-run -v

# Keep an existing test file (do not overwrite)
python -m test_gen.cli <NAME> --no-overwrite
```

Defaults:

- `--data-dir` defaults to `<repo>/tests/data`.
- `--tests-dir` defaults to `<repo>/tests`.
- Overwrite is enabled by default; add `--no-overwrite` to preserve an existing test file.

Slug rules: the test filename is `tests/test_<slug>.py`, where `<slug>` keeps alphanumerics and converts other characters to `_`, collapsing runs.
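The slug rule can be expressed as a one-line substitution (an assumed reconstruction; the strip of leading/trailing underscores is a guess, so check `test_gen.cli` for the exact behavior):

```python
import re


def slugify(name: str) -> str:
    """Keep alphanumerics; collapse each run of other characters into one '_'.

    Assumed sketch of the generator's slug rule; the .strip('_') is a guess.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")
```

Under these assumptions, a workflow copied to `tests/data/My Workflow (v2)/` would produce `tests/test_My_Workflow_v2.py`.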
## 8) Running tests

Typical invocations:

```
# Run all tests
pytest -q

# Only knime2py roundtrip tests
pytest -q -k roundtrip

# Show detailed failure output
pytest -vv
```