Usage guide

This guide walks through the typical workflow for defining rules, wiring them into the command-line interface, and inspecting the provenance artifacts produced by makeprov.

Defining rules

Rules are simple Python callables annotated with makeprov.paths.InPath for dependencies and makeprov.paths.OutPath for outputs. The makeprov.core.rule() decorator handles dependency inference, timestamp checks, and provenance writing.

from makeprov import InPath, OutPath, rule

@rule()
def uppercase(src: InPath, dest: OutPath):
    """Convert a text file to uppercase."""
    dest.write_text(src.read_text().upper())

Invoke the function directly to perform the work and produce provenance metadata in the configured output directory.
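
The skip-if-up-to-date logic such a decorator applies can be illustrated with a small, library-independent sketch. Note that needs_rebuild and its signature are hypothetical and not part of makeprov's API:

```python
import os

def needs_rebuild(inputs, outputs):
    """Rebuild if any output is missing or older than the newest input."""
    if not all(os.path.exists(o) for o in outputs):
        return True
    newest_input = max(os.path.getmtime(i) for i in inputs)
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return newest_input > oldest_output
```

This mirrors the classic make convention: work is skipped only when every output exists and is at least as new as every input.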

Building dependency graphs

When you provide default values for OutPath parameters, makeprov registers the rule as part of a build graph. You can then ask the system to build a target and its prerequisites:

from makeprov import build

# Builds the dependency chain ending at data/output.txt
build("data/output.txt")

Use makeprov.core.build_all() to trigger every terminal target in the graph, which is convenient for CI pipelines.
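
Conceptually, building a target means running its prerequisites first. A minimal sketch of that resolution, with a hand-written registry standing in for makeprov's internal graph (all names here are hypothetical):

```python
# Hypothetical registry: target path -> (rule name, prerequisite targets).
RULES = {
    "data/clean.txt": ("clean", ["data/raw.txt"]),
    "data/output.txt": ("uppercase", ["data/clean.txt"]),
}

def plan(target, order=None):
    """Depth-first walk: schedule prerequisites before the target itself."""
    order = [] if order is None else order
    rule, deps = RULES.get(target, (None, []))
    for dep in deps:
        plan(dep, order)
    if rule and rule not in order:
        order.append(rule)
    return order
```

Asking for data/output.txt schedules clean before uppercase, because the uppercase rule's input is itself a rule output.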

Parameterized targets

Default InPath or OutPath arguments can contain str.format-style placeholders. The decorator stores the associated templates and uses the parse library to extract parameters from requested targets:

@rule()
def align(
    sample: int | None = None,
    read1: InPath = InPath("reads/{sample:d}_R1.fq"),
    bam: OutPath = OutPath("results/{sample:d}.bam"),
):
    bam.write_text(read1.read_text())

build("results/42.bam")  # calls align(sample=42)
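
The extraction step can be approximated without the parse dependency: convert each {name:d} placeholder into a named regex group and match the requested target. extract_params is a hypothetical helper for illustration, not makeprov's actual implementation:

```python
import re

def extract_params(template, target):
    """Turn "{name:d}" placeholders into named integer groups, then match."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+):d\}", template):
        regex += re.escape(template[pos:m.start()]) + f"(?P<{m.group(1)}>\\d+)"
        pos = m.end()
    regex += re.escape(template[pos:])
    match = re.fullmatch(regex, target)
    return {k: int(v) for k, v in match.groupdict().items()} if match else None
```

A request for results/42.bam yields {"sample": 42}, which is then passed to the rule as a keyword argument; targets that do not match any template yield no parameters.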

Phony/meta rules

Pass phony=True to makeprov.core.rule() to register orchestration or reporting helpers that do not produce outputs or should always run regardless of timestamps. These rules still participate in the CLI via makeprov.core.COMMANDS.

Command-line entry point

The makeprov.config.main() helper exposes decorated rules as CLI subcommands using defopt. Any --conf options you pass are applied before the rules run, making it easy to tailor provenance behavior per invocation.

python -m makeprov --conf @config/provenance.toml uppercase data/input.txt data/output.txt

Combine --verbose flags to increase logging during command execution, or use --explain / --to-dot to inspect dependency resolution without executing rules:

python -m makeprov -vv uppercase data/input.txt data/output.txt
python -m makeprov --explain data/output.txt
python -m makeprov --to-dot data/output.txt

Streaming input and output

All path marker classes accept the hyphen (-) to represent standard streams. This makes it simple to incorporate your rules into shell pipelines without creating temporary files:

from makeprov import InPath, OutPath, rule

@rule()
def word_count(src: InPath = InPath("-"), dest: OutPath = OutPath("-")):
    """Count words from stdin and write the result to stdout."""
    content = src.read_text()
    dest.write_text(str(len(content.split())))
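
The hyphen convention itself is easy to emulate. A sketch of how a path marker might resolve "-" to the standard streams (open_in and open_out are hypothetical helpers, not makeprov APIs):

```python
import sys

def open_in(path):
    """Treat "-" as stdin; anything else is opened as a regular file."""
    return sys.stdin if path == "-" else open(path, "r")

def open_out(path):
    """Treat "-" as stdout; anything else is opened for writing."""
    return sys.stdout if path == "-" else open(path, "w")
```

With this convention a rule behaves like a conventional Unix filter when given hyphens and like a file-to-file build step otherwise.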

Tracking outputs within directories

Use makeprov.paths.OutDir when a rule produces multiple files under a common directory. The makeprov.paths.OutDir.file() helper returns makeprov.paths.OutPath instances rooted in that directory while recording them for provenance collection. makeprov.paths.InDir offers the same tracked-directory behavior for inputs.

from makeprov import InDir, OutDir, rule

@rule()
def write_assets(bundle: OutDir = OutDir("assets/v1/")):
    readme = bundle.file("README.txt")
    logo = bundle.file("logo.txt")

    readme.write_text("asset bundle\n")
    logo.write_text("v1 logo\n")

When the rule finishes, the provenance record includes both assets/v1/README.txt and assets/v1/logo.txt even though only the directory was declared as a parameter.
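
The tracking behavior can be sketched in a few lines of plain Python. TrackedDir is illustrative only; the real OutDir presumably also participates in provenance serialization:

```python
from pathlib import Path

class TrackedDir:
    """Sketch of OutDir-style tracking: hand out child paths, remember them."""

    def __init__(self, root):
        self.root = Path(root)
        self.recorded = []  # every path handed out via file()

    def file(self, name):
        path = self.root / name
        self.recorded.append(path)
        return path
```

Because every child path flows through file(), the rule never has to declare its outputs twice: the directory object already knows what was produced.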

Merging provenance across nested rules

Pass merge=True to makeprov.core.rule() to accumulate provenance from any rules invoked within the decorated function. This produces a single provenance document spanning the entire call tree, which is especially helpful for orchestration functions.

from makeprov import InDir, InPath, OutDir, OutPath, rule

@rule()
def render_fragment(name: str, dest: OutPath = OutPath("site/fragments/{name}.txt")):
    dest.write_text(f"fragment: {name}\n")

@rule(merge=True)
def build_site(
    sample: int,
    source_dir: InDir = InDir("content/{sample:d}/"),
    out: OutDir = OutDir("site/{sample:d}/"),
):
    index = out.file("index.html")
    report = out.file("report.md")
    logo = out.file("assets/logo.txt")

    render_fragment("logo", dest=logo)
    report.write_text(source_dir.file("main.txt").read_text())
    index.write_text("<html><body>see report.md</body></html>\n")

Invoking build("site/1/") runs the fragment rule, writes directory outputs, and emits a single merged provenance dataset for the entire workflow.
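
One way such merging can work is a stack of buffers in which every recorded event propagates to all enclosing scopes. This is an illustrative sketch, not makeprov's actual mechanism:

```python
from contextlib import contextmanager

_stack = []  # active provenance buffers, innermost last

def record(event):
    """Append an event to every open buffer, so outer scopes see nested work."""
    for buf in _stack:
        buf.append(event)

@contextmanager
def collecting():
    """Open a provenance buffer for the duration of the with-block."""
    buf = []
    _stack.append(buf)
    try:
        yield buf
    finally:
        _stack.pop()
```

An outer orchestration scope then ends up holding both its own events and those of every nested rule, while each nested scope keeps only its own slice.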

Scoped spans and explicit outputs

Use makeprov.span() to bracket arbitrary work in its own provenance buffer. A span exposes the merged makeprov.prov.Prov via span.prov and, when nested, can write its record to a specific path, which makes per-model or per-shard provenance a one-liner:

from makeprov import span, rule, OutPath

@rule()
def train_model(out: OutPath = OutPath("models/a.txt")):
    out.write_text("ok")

with span("model-a", prov_path="prov/models/a") as sp:
    train_model()
    assert sp.prov.name == "model-a"

Controlling provenance framing

Prov objects can be serialized directly via makeprov.prov.Prov.to_jsonld() and makeprov.prov.Prov.to_graph(), which mirror the makeprov.rdfmixin.RDFMixin API. By default the provenance graph is stored in the default RDF graph, but setting frame = "results" in the configuration moves it into a dedicated named graph while leaving result entities in the default graph. In JSON-LD this produces a nested structure under "provenance" keyed by the graph identifier, while TriG output adds a named provenance context alongside the default graph.
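
Under frame = "results", the framed JSON-LD might look roughly like the following. The "provenance" key follows the description above, but the graph identifier, entity IDs, and surrounding shape are illustrative assumptions, not the library's actual output:

```python
import json

# Hypothetical shape: result entities stay in the top-level default graph,
# while provenance statements move under "provenance", keyed by graph id.
framed = {
    "@graph": [
        {"@id": "data/output.txt", "@type": "prov:Entity"},
    ],
    "provenance": {
        "urn:makeprov:run-1": {
            "@graph": [
                {"@id": "urn:makeprov:activity:uppercase", "@type": "prov:Activity"},
            ],
        },
    },
}
print(json.dumps(framed, indent=2))
```

Keeping results in the default graph means downstream consumers can query them without knowing about provenance, while tools that care can address the named graph explicitly.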