Snakemake integration

makeprov can read Snakemake’s execution metadata without depending on Snakemake’s Python API. The optional submodule shells out to the snakemake CLI, collects the job DAG (--d3dag) and the detailed summary table (--detailed-summary), and builds a provenance graph that can be bundled into Snakemake HTML reports.
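
As a rough illustration of the CLI-only approach, the sketch below parses the two text outputs without importing Snakemake. The function names are hypothetical and are not makeprov's API. The sample strings are made up, and the assumed formats (a JSON object with "nodes" and "links" for --d3dag, a tab-separated table with a header row for --detailed-summary) should be checked against the Snakemake version in use.

```python
import csv
import io
import json


def parse_d3dag(text: str) -> dict:
    """Parse the JSON document printed by `snakemake --d3dag`."""
    return json.loads(text)


def parse_detailed_summary(text: str) -> list[dict]:
    """Parse the table printed by `snakemake --detailed-summary`.

    Assumption: the table is tab-separated and its first line is a header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Made-up sample output, shaped like the assumed formats.
SAMPLE_DAG = '{"nodes": [{"id": 0, "value": {"rule": "concat"}}], "links": []}'
SAMPLE_SUMMARY = "output_file\trule\tstatus\nresults/concatenated.txt\tconcat\tok\n"

dag = parse_d3dag(SAMPLE_DAG)
rows = parse_detailed_summary(SAMPLE_SUMMARY)
print(dag["nodes"][0]["value"]["rule"], rows[0]["output_file"])
```

Working from captured text keeps the bridge independent of Snakemake's internal Python objects, at the cost of relying on the CLI output format staying stable.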

Installation

Install the optional extra so the Snakemake CLI is available alongside makeprov:

pip install "makeprov[snakemake]"

CLI entry point

The bridge is exposed as a module entry point. All arguments after -- are passed through to Snakemake unchanged.

python -m makeprov.snakemake \
    --prov-path prov/run \
    --out-fmt json \
    --forceall-dag \
    -- \
    --snakefile Snakefile --nolock
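
The pass-through convention can be sketched as a split of the argument list at the first bare --. The helper name split_passthrough is hypothetical; how makeprov implements the split internally is not shown in this document.

```python
def split_passthrough(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv at the first bare "--" into (own_args, passthrough_args)."""
    if "--" not in argv:
        # No separator: everything belongs to the bridge itself.
        return list(argv), []
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]


argv = [
    "--prov-path", "prov/run",
    "--out-fmt", "json",
    "--forceall-dag",
    "--",
    "--snakefile", "Snakefile", "--nolock",
]
own_args, snakemake_args = split_passthrough(argv)
print(own_args)
print(snakemake_args)
```

Only the first separator is consumed, so any later -- would be forwarded to Snakemake as part of the pass-through arguments.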

Key flags:

  • --prov-dir / --prov-path mirror the configuration fields from makeprov.config.

  • --out-fmt accepts json (default) or trig.

  • --context embeds the JSON-LD context inline, which is useful when generating standalone files for reports.

  • --frame chooses between the provenance (default) and results JSON-LD frames.

  • --forceall-dag adds --forceall to the Snakemake --d3dag invocation so edges between already up-to-date jobs remain part of the provenance graph.
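
To make the effect of --forceall-dag concrete, here is a minimal sketch of how the two Snakemake command lines could be assembled. The function name and the argument ordering are assumptions for illustration; only the flags --d3dag, --detailed-summary and --forceall come from the description above.

```python
def build_snakemake_commands(
    passthrough: list[str], forceall_dag: bool = False
) -> tuple[list[str], list[str]]:
    """Return (dag_cmd, summary_cmd) as argument lists for subprocess."""
    dag_cmd = ["snakemake", "--d3dag"]
    if forceall_dag:
        # Only the DAG query is forced; the summary call is left untouched.
        dag_cmd.append("--forceall")
    dag_cmd += passthrough
    summary_cmd = ["snakemake", "--detailed-summary", *passthrough]
    return dag_cmd, summary_cmd


dag_cmd, summary_cmd = build_snakemake_commands(
    ["--snakefile", "Snakefile", "--nolock"], forceall_dag=True
)
print(" ".join(dag_cmd))
print(" ".join(summary_cmd))
```

Each list can be handed to subprocess.run(cmd, capture_output=True, text=True, check=True) to collect the output as text.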

The command honours -c/--conf arguments in the same way as the core CLI, allowing TOML configuration snippets or files to override defaults.

Example Snakefile

report: "report/workflow.rst"

rule all:
    input:
        report("results/word_count.txt", category="Results"),
        report("prov/snakemake.json", category="Provenance")

rule concat:
    input:
        "data/a.txt",
        "data/b.txt"
    output:
        "results/concatenated.txt"
    shell:
        "cat {input} > {output}"

rule count_words:
    input:
        "results/concatenated.txt"
    output:
        "results/word_count.txt"
    shell:
        "wc -w {input} > {output}"

rule provenance:
    input:
        "results/word_count.txt"
    output:
        "prov/snakemake.json"
    shell:
        (
            "python -m makeprov.snakemake "
            "--prov-path prov/snakemake "
            "--out-fmt json --context --forceall-dag "
            "-- "
            "--snakefile {workflow.snakefile} --nolock {input} "
        )

Once the workflow has run (snakemake --cores 1), running snakemake --report report.html bundles the PROV file into the generated HTML report under the “Provenance” category.

Output structure

The command produces a single makeprov.prov.Prov document. Each job becomes a prov:Activity, and its inputs and outputs become prov:Entity nodes. Job-to-job edges are recorded with prov:wasInformedBy whenever Snakemake’s D3 DAG provides the necessary IDs. File metadata such as hashes, MIME types and timestamps is captured from the filesystem when available.
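
The mapping described above can be sketched with plain dictionaries. This is not the makeprov.prov.Prov model: the function names, the job: identifier prefix and the property names rule, mime_type, sha256 and modified are hypothetical. The sketch also assumes that each D3 DAG link has the upstream job in "u" and the downstream job in "v".

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def dag_to_activities(dag: dict) -> list[dict]:
    """Turn D3-style nodes and links into prov:Activity records."""
    activities = {}
    for node in dag.get("nodes", []):
        job_id = node["id"]
        activities[job_id] = {
            "@id": f"job:{job_id}",
            "@type": "prov:Activity",
            "rule": node.get("value", {}).get("rule"),
            "prov:wasInformedBy": [],
        }
    for link in dag.get("links", []):
        # Assumption: "u" is the upstream job, "v" the downstream job.
        upstream, downstream = link.get("u"), link.get("v")
        # Record the edge only when both IDs resolve to known jobs.
        if upstream in activities and downstream in activities:
            activities[downstream]["prov:wasInformedBy"].append(
                {"@id": f"job:{upstream}"}
            )
    return list(activities.values())


def file_entity(path: str) -> dict:
    """Describe a file as a prov:Entity, adding metadata only when available."""
    entity = {"@id": path, "@type": "prov:Entity"}
    mime, _ = mimetypes.guess_type(path)
    if mime:
        entity["mime_type"] = mime
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            entity["sha256"] = hashlib.sha256(handle.read()).hexdigest()
        mtime = os.path.getmtime(path)
        entity["modified"] = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
    return entity


sample_dag = {
    "nodes": [
        {"id": 0, "value": {"rule": "concat"}},
        {"id": 1, "value": {"rule": "count_words"}},
    ],
    "links": [{"u": 0, "v": 1}, {"u": 0, "v": 7}],  # job 7 is unknown
}
activities = dag_to_activities(sample_dag)
print(activities[1]["prov:wasInformedBy"])
```

The link to the unknown job 7 is dropped silently, which mirrors the "whenever the necessary IDs are provided" behaviour: a missing ID costs an edge but does not abort the export.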

Use the standard makeprov serialization helpers to post-process the output:

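from makeprov import snakemake

# dag_json and summary_rows hold the data collected from
# `snakemake --d3dag` and `snakemake --detailed-summary`;
# config is the makeprov configuration in use.
prov = snakemake.build_prov_from_snakemake(dag_json, summary_rows, config=config)
prov.write("prov/snakemake", fmt="trig")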