Provenance
SemSynth writes provenance for every report run using makeprov. For each dataset, files land under output/<dataset>/prov/ and capture inputs, outputs, and processing steps.
# Peek at the generated provenance files for Heart Disease
from pathlib import Path
prov_dir = Path("../output/Heart Disease/prov")
sorted(p.name for p in prov_dir.glob("*") if p.is_file())[:5]
['report.json']
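To look inside a provenance file rather than just list it, a minimal sketch along these lines may help. It assumes only that report.json is valid JSON; the helper name summarize_json is hypothetical, and nothing here relies on the schema makeprov actually writes.

```python
import json
from pathlib import Path


def summarize_json(path: Path) -> dict:
    """Return the top-level JSON type and, for objects, the sorted keys."""
    data = json.loads(path.read_text(encoding="utf-8"))
    summary = {"type": type(data).__name__}
    if isinstance(data, dict):
        summary["keys"] = sorted(data)
    elif isinstance(data, list):
        summary["length"] = len(data)
    return summary


if __name__ == "__main__":
    # Same path as in the listing above; skipped quietly if the file is absent.
    report = Path("../output/Heart Disease/prov/report.json")
    if report.is_file():
        print(summarize_json(report))
```

The summary is deliberately schema-agnostic, so it should keep working if the provenance layout changes between makeprov versions.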
How it is wired
The search and report CLI commands in semsynth/__main__.py are decorated with @rule(merge=True), so makeprov tracks their inputs/outputs automatically.
The reporting pipeline (semsynth/pipeline.py) marks major artifacts (UMAPs, metrics, reports) as provenance outputs, and ProvenanceConfig.prov_dir is set per dataset before processing starts.
Default settings live in prov-config.toml (base IRI and output directory). You can override them via CLI flags or config.
Browser catalog
The generated output/index.html includes a SPARQL/YASGUI panel wired to the static catalog and provenance files. Open it in a browser to explore datasets, mappings, and provenance artifacts without running a server.
Handy commands
Regenerate a missing file:
python -m semsynth --build output/<dataset>/index.html
Inspect the DAG without executing:
python -m semsynth --conf @prov-config.toml --dry-run output/<dataset>/index.html
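Dataset names such as Heart Disease contain spaces, so the target path needs quoting when the command is typed into a shell. The sketch below builds the dry-run invocation as an argument list using only the flags shown above; the helper name dry_run_command and the output_root parameter are hypothetical.

```python
import shlex
from pathlib import Path


def dry_run_command(dataset: str, output_root: str = "output") -> list[str]:
    """Build the dry-run argv for one dataset's index.html target."""
    target = Path(output_root) / dataset / "index.html"
    return [
        "python", "-m", "semsynth",
        "--conf", "@prov-config.toml",
        "--dry-run", target.as_posix(),
    ]


if __name__ == "__main__":
    # shlex.join quotes arguments that contain spaces for POSIX shells.
    print(shlex.join(dry_run_command("Heart Disease")))
```

Passing the list directly to subprocess.run avoids shell quoting altogether; shlex.join is only needed when the command is printed for copy and paste.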