makeprov

Track file provenance in Python workflows using PROV semantics

Functions

build(target[, _seen, session])

Recursively build a target and its prerequisites.

build_all(*[, session])

Build all concrete targets that have no dependents.

dry_run_build(target, *[, session])

Log the steps required to build target without executing rules.

explain(target, *[, session])

Log the rule used for each target in build order.

list_rules(*[, session])

Return registered rule names in alphabetical order.

list_targets(*[, session])

Return concrete targets produced by non-pattern rules.

main([subcommands, conf_obj, ...])

Entry point for running registered CLI subcommands.

needs_update(outputs, deps)

Determine whether outputs are stale relative to dependencies.

new_session()

Create a fresh session with isolated registries and buffers.

plan(target, *[, session])

Return the execution order for building a target.

resolve_target(target, *[, session])

Resolve a target to its registered rule and parameters.

root_targets(*[, session])

Return concrete targets that are not dependencies of other rules.

rule(*[, name, phony, base_iri, prov_dir, ...])

Decorate a function as a build rule with automatic provenance.

to_dot(target, *[, session])

Render the dependency graph for target in DOT format.

Classes

CachedDownload(url, cache_path, *[, ...])

Input wrapper that lazily downloads and records source metadata.

Config()

Base configuration container with TOML application helpers.

InDir(*paths)

Input directory that tracks files declared within it.

InPath(*paths)

Marker for input paths where "-" maps to stdin.

OutDir(*paths)

Output directory that tracks files declared within it.

OutPath(*paths)

Marker for output paths where "-" maps to stdout.

ProvPath(*paths)

Filesystem path with first-class support for stream placeholders.

ProvenanceConfig([base_iri, prov_dir, ...])

Runtime configuration for provenance generation.

RDFMixin()

Provide JSON-LD serialization helpers for dataclasses.

Session([rules_by_target, rules_by_name, ...])

In-memory registries and buffers for a makeprov run.

span(label[, prov_path, frame, context, session])

Scope provenance buffering to a context or decorator.

class makeprov.CachedDownload(url, cache_path, *, headers=None, transform='prov:wasDerivedFrom')

Bases: InPath

Input wrapper that lazily downloads and records source metadata.

open(mode='r', *args, **kwargs)

Open the path for reading, honoring stdin streams.

Parameters:
  • mode (str) – File mode; defaults to read.

  • *args – Additional positional arguments forwarded to Path.open.

  • **kwargs – Additional keyword arguments forwarded to Path.open.

Returns:

Readable file-like object.

Return type:

IOBase

Examples

InPath("example.txt").open().read()
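
The lazy-download behaviour can be illustrated with a standard-library sketch. LazyDownload, local_path, and the metadata dictionary are hypothetical names, not the CachedDownload implementation. The sketch assumes the remote file is fetched only when the cache path does not exist yet, and it records only the source URL and byte count; the headers and transform parameters are not modelled.

```python
import urllib.request
from pathlib import Path


class LazyDownload:
    """Hypothetical stand-in for CachedDownload: fetch once, then reuse the cache."""

    def __init__(self, url: str, cache_path: str) -> None:
        self.url = url
        self.cache_path = Path(cache_path)
        self.metadata: dict = {}

    def local_path(self) -> Path:
        # Download only when the cached copy is missing (lazy evaluation).
        if not self.cache_path.exists():
            with urllib.request.urlopen(self.url) as response:
                data = response.read()
            self.cache_path.parent.mkdir(parents=True, exist_ok=True)
            self.cache_path.write_bytes(data)
            # Record where the cached bytes came from (assumed metadata fields).
            self.metadata = {"source": self.url, "bytes": len(data)}
        return self.cache_path

    def open(self, mode: str = "r", **kwargs):
        # Reading always goes through the local cache.
        return self.local_path().open(mode, **kwargs)
```

Because urllib understands file:// URLs, the sketch can be tried offline with Path("input.txt").resolve().as_uri() as the URL; a second open() reuses the cached copy even if the source disappears.
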

class makeprov.Config

Bases: object

Base configuration container with TOML application helpers.

apply(toml_ref)
Return type:

Config

clone_with(**kwargs)
Return type:

Config

classmethod get()
Return type:

Config

classmethod set(config)
Return type:

Config

class makeprov.InDir(*paths: str | bytes | ProvPath)

Bases: InPath

Input directory that tracks files declared within it.

The file() helper produces InPath instances rooted in the directory while recording them for provenance collection.

property children: tuple[InPath, ...]

file(name)
Return type:

InPath
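
The pattern described above, where file() both builds a child path and remembers it, can be sketched with pathlib alone. TrackedDir is a hypothetical name; makeprov's InDir and OutDir additionally return InPath or OutPath markers and feed the recorded children into provenance collection, which this sketch omits.

```python
from pathlib import Path
from typing import List, Tuple


class TrackedDir:
    """Hypothetical directory wrapper that records every file requested through it."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self._children: List[Path] = []

    def file(self, name: str) -> Path:
        # Build the child path and remember it for later collection.
        child = self.root / name
        self._children.append(child)
        return child

    @property
    def children(self) -> Tuple[Path, ...]:
        # Expose an immutable snapshot, mirroring the tuple-typed `children` property.
        return tuple(self._children)


reports = TrackedDir("results")
reports.file("summary.csv")
reports.file("figures/plot.png")
print(reports.children)
```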

class makeprov.InPath(*paths: str | bytes | ProvPath)

Bases: ProvPath

Marker for input paths where "-" maps to stdin.

Examples

from makeprov.paths import InPath

src = InPath("data/input.txt")
with src.open() as handle:
    _ = handle.read()

open(mode='r', *args, **kwargs)

Open the path for reading, honoring stdin streams.

Parameters:
  • mode (str) – File mode; defaults to read.

  • *args – Additional positional arguments forwarded to Path.open.

  • **kwargs – Additional keyword arguments forwarded to Path.open.

Returns:

Readable file-like object.

Return type:

IOBase

Examples

InPath("example.txt").open().read()

class makeprov.OutDir(*paths: str | bytes | ProvPath)

Bases: OutPath

Output directory that tracks files declared within it.

The file() helper produces OutPath instances rooted in the directory while recording them for provenance collection.

property children: tuple[OutPath, ...]

file(name)
Return type:

OutPath

class makeprov.OutPath(*paths: str | bytes | ProvPath)

Bases: ProvPath

Marker for output paths where "-" maps to stdout.

Examples

from makeprov.paths import OutPath

dest = OutPath("data/output.txt")
dest.write_text("generated")

as_inpath()

Convert an output marker into an input marker.

Returns:

A new instance pointing to the same filesystem location.

Return type:

InPath

Raises:

ValueError – If the current path represents a stream.

Examples

from makeprov.paths import OutPath

OutPath("data/output.txt").as_inpath()

open(mode='w', *args, **kwargs)

Open the path for writing, creating parent directories when needed.

Parameters:
  • mode (str) – File mode; defaults to write.

  • *args – Additional positional arguments forwarded to Path.open.

  • **kwargs – Additional keyword arguments forwarded to Path.open.

Returns:

Writable file-like object.

Return type:

IOBase

Examples

with OutPath("output.txt").open("w") as handle:
    handle.write("hello")
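
Two OutPath behaviours documented above, creating parent directories before writing and refusing to convert a "-" stream into an input, can be sketched with pathlib. open_output and as_input are hypothetical helpers, not makeprov functions; the real OutPath.open also maps "-" to stdout, which this sketch leaves out.

```python
from pathlib import Path


def open_output(path, mode: str = "w", **kwargs):
    """Open `path` for writing after creating any missing parent directories."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target.open(mode, **kwargs)


def as_input(path) -> Path:
    """Reinterpret an output location as an input; a stream has no file to reread."""
    if str(path) == "-":
        raise ValueError("a '-' stream cannot be converted to an input path")
    return Path(path)
```
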

class makeprov.ProvPath(*paths: str | bytes | ProvPath)

Bases: PosixPath

Filesystem path with first-class support for stream placeholders.

Hyphen ("-") paths are treated as stdin/stdout streams but still behave like pathlib.Path instances for all other operations.

Examples

from makeprov.paths import ProvPath

p = ProvPath("-")
assert p.is_stream
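
One way to implement the "-" convention is to dispatch on the open mode: reading modes yield stdin, writing modes yield stdout, and any other path falls through to Path.open. This is a hypothetical sketch (open_stream_aware is not a makeprov function). It wraps the shared streams in contextlib.nullcontext so that a with block does not close sys.stdout.

```python
import sys
from contextlib import nullcontext
from pathlib import Path


def open_stream_aware(path, mode: str = "r", **kwargs):
    """Open `path`, treating "-" as stdin (read modes) or stdout (write modes)."""
    if str(path) == "-":
        writing = any(flag in mode for flag in "wax")
        stream = sys.stdout if writing else sys.stdin
        if "b" in mode:
            # Binary modes use the byte buffer underneath the text stream.
            stream = stream.buffer
        # nullcontext lets callers use `with` without closing the shared stream.
        return nullcontext(stream)
    return Path(path).open(mode, **kwargs)


with open_stream_aware("-", "w") as handle:
    handle.write("written to stdout\n")
```
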

property is_stream: bool

open(mode='r', *args, **kwargs)

Open the path while respecting stream semantics.

Parameters:
  • mode (str) – File open mode, passed through to Path.open when not operating on a stream.

  • *args – Additional positional arguments forwarded to Path.open.

  • **kwargs – Additional keyword arguments forwarded to Path.open.

Returns:

A file-like object for the requested mode.

Return type:

IOBase

Examples

from makeprov.paths import ProvPath

with ProvPath("output.txt").open("w") as handle:
    handle.write("hello")

property stream_name: str | None

class makeprov.ProvenanceConfig(base_iri=None, prov_dir='prov', prov_path=None, force=False, merge=True, dry_run=False, out_fmt='json', frame='provenance', context=False, context_url='https://w3id.org/makeprov/context')

Bases: Config

Runtime configuration for provenance generation.

base_iri: str | None = None
context: bool = False
context_url: str = 'https://w3id.org/makeprov/context'
dry_run: bool = False
force: bool = False
frame: Frame = 'provenance'
merge: bool = True
out_fmt: ProvFormat = 'json'
prov_dir: str = 'prov'
prov_path: str | None = None

class makeprov.RDFMixin

Bases: object

Provide JSON-LD serialization helpers for dataclasses.

The mixin preserves unknown fields when round-tripping JSON-LD documents and offers convenient conversion to rdflib graphs.

Examples

from dataclasses import dataclass

from makeprov import RDFMixin

@dataclass
class Person(RDFMixin):
    id: str
    type: str = "ex:Person"
    name: str | None = None

person = Person(id="ex:alice", name="Alice")
jsonld = person.to_jsonld()

classmethod fields_subclass_first()

Return dataclass fields with subclass members ordered first.

Examples

from dataclasses import dataclass

from makeprov import RDFMixin

@dataclass
class Thing(RDFMixin):
    id: str

Thing.fields_subclass_first()

classmethod from_jsonld(data)

Deserialize a JSON-LD mapping into the dataclass instance.

Parameters:

data (dict) – Parsed JSON-LD object including optional @context.

Returns:

An instance of cls populated from data.

Return type:

RDFMixin

Examples

person = Person.from_jsonld({"id": "ex:alice", "name": "Alice"})

to_graph()

Convert this object to an rdflib.Graph from JSON-LD.

Returns:

Graph containing triples representing the instance.

Return type:

rdflib.Graph

Raises:

RuntimeError – If rdflib is not installed.

Examples

graph = Person(id="ex:alice").to_graph()

to_jsonld(with_context=True, include_extra=True)

Serialize the object to a JSON-LD-compatible mapping.

Parameters:
  • with_context (bool) – Whether to include the @context section.

  • include_extra (bool) – Whether to emit unknown fields captured during deserialization.

Returns:

JSON-LD representation of the object.

Return type:

dict

Examples

person = Person(id="ex:alice", name="Alice")
payload = person.to_jsonld()
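
The round-tripping of unknown fields described for RDFMixin can be sketched with dataclasses alone. JsonLdRecord, its extra field, and the CONTEXT mapping are hypothetical; the actual mixin's storage of unknown keys and its context handling may differ, and rdflib conversion is omitted.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class JsonLdRecord:
    """Hypothetical base class: known keys map to fields, the rest go to `extra`."""

    id: str
    type: str = "ex:Thing"
    extra: Dict[str, Any] = field(default_factory=dict, repr=False)

    # Plain class attribute (no annotation), so dataclasses does not treat it as a field.
    CONTEXT = {"ex": "http://example.org/"}

    @classmethod
    def from_jsonld(cls, data: Dict[str, Any]) -> "JsonLdRecord":
        known = {f.name for f in fields(cls)} - {"extra"}
        remaining = {k: v for k, v in data.items() if k != "@context"}
        kwargs = {k: remaining.pop(k) for k in list(remaining) if k in known}
        # Whatever no field consumed is kept verbatim for round-tripping.
        return cls(**kwargs, extra=remaining)

    def to_jsonld(self, with_context: bool = True, include_extra: bool = True) -> Dict[str, Any]:
        doc: Dict[str, Any] = {"@context": self.CONTEXT} if with_context else {}
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name != "extra" and value is not None:
                doc[f.name] = value
        if include_extra:
            doc.update(self.extra)
        return doc


@dataclass
class Person(JsonLdRecord):
    type: str = "ex:Person"
    name: Optional[str] = None


alice = Person.from_jsonld({"id": "ex:alice", "name": "Alice", "ex:age": 30})
print(alice.to_jsonld(with_context=False))
```

Here "ex:age" has no matching field, so it survives in extra and is emitted again by to_jsonld unless include_extra=False.
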

class makeprov.Session(rules_by_target=<factory>, rules_by_name=<factory>, pattern_rules=<factory>, commands=<factory>, prov_buffers=<factory>)

Bases: object

In-memory registries and buffers for a makeprov run.

commands: set[Callable]
pattern_rules: list[Rule]
prov_buffers: list[list[Prov]]
rules_by_name: dict[str, Rule]
rules_by_target: dict[str, Rule]
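
The <factory> defaults in the Session signature are how dataclasses display fields created with default_factory, so every Session starts with its own empty containers. The sketch below shows why that isolates registrations; SketchSession and register are hypothetical names that mirror only two of the documented fields.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchSession:
    """Hypothetical container mirroring two of Session's documented fields."""

    rules_by_name: Dict[str, Callable] = field(default_factory=dict)
    prov_buffers: List[list] = field(default_factory=list)


def register(func: Callable, session: SketchSession) -> Callable:
    # Registration touches only the session passed in, never a shared global.
    session.rules_by_name[func.__name__] = func
    return func


first, second = SketchSession(), SketchSession()


def clean() -> None:
    pass


register(clean, first)
print(sorted(first.rules_by_name), sorted(second.rules_by_name))
```

A plain mutable default such as rules_by_name: dict = {} would be rejected by dataclasses, precisely because one dict would then be shared by every instance.
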

makeprov.build(target, _seen=None, *, session=None, **kwargs)

Recursively build a target and its prerequisites.

Parameters:
  • target (OutPath) – Path to the output to build. Paths may be concrete or match templated outputs registered with rule().

  • _seen (set[str] | None) – Internal set to detect graph cycles.

makeprov.build_all(*, session=None)

Build all concrete targets that have no dependents.

makeprov.dry_run_build(target, *, session=None)

Log the steps required to build target without executing rules.

Return type:

None

makeprov.explain(target, *, session=None)

Log the rule used for each target in build order.

Return type:

None

makeprov.list_rules(*, session=None)

Return registered rule names in alphabetical order.

Return type:

list[str]

makeprov.list_targets(*, session=None)

Return concrete targets produced by non-pattern rules.

Return type:

list[str]

makeprov.main(subcommands=None, conf_obj=None, argparse_kwargs={}, *, session=None, **kwargs)

Entry point for running registered CLI subcommands.

makeprov.needs_update(outputs, deps)

Determine whether outputs are stale relative to dependencies.

Parameters:
  • outputs (Iterable[str | Path]) – Output files expected to exist after a rule runs.

  • deps (Iterable[str | Path]) – Dependency files; if any dependency is newer than an output, that output is stale and must be rebuilt.

Returns:

True if any output is missing or older than a dependency. When deps is empty, the result is False so that rules without dependencies are not rebuilt needlessly.

Return type:

bool
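
The staleness rule stated above can be expressed directly in terms of modification times. This sketch follows the documented contract but is not makeprov's code: outputs_are_stale is a hypothetical name, the empty-dependency check runs first (one reading of the documented behaviour), and every dependency is assumed to exist, so a missing dependency raises FileNotFoundError here.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def outputs_are_stale(outputs: Iterable[PathLike], deps: Iterable[PathLike]) -> bool:
    """Return True if any output is missing or older than the newest dependency."""
    dep_paths = [Path(d) for d in deps]
    if not dep_paths:
        # No dependencies: treat the outputs as up to date.
        return False
    # "Older than a dependency" is equivalent to "older than the newest dependency".
    newest_dep = max(p.stat().st_mtime for p in dep_paths)
    for out in map(Path, outputs):
        if not out.exists() or out.stat().st_mtime < newest_dep:
            return True
    return False
```
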

Examples

from makeprov.core import needs_update

def regenerate():
    """Hypothetical placeholder for the step that rebuilds data/output.txt."""

if needs_update(["data/output.txt"], ["data/input.txt"]):
    regenerate()

makeprov.new_session()

Create a fresh session with isolated registries and buffers.

Return type:

Session

makeprov.plan(target, *, session=None)

Return the execution order for building a target.

The plan is derived using resolve_target() for each dependency, ensuring concrete and templated rules are treated uniformly.

Return type:

list[tuple[str, Rule, dict[str, Any]]]
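
One common way to derive such an order is a depth-first post-order over the dependency graph: each prerequisite is scheduled before the target that needs it, and reaching a node that is still being visited signals a cycle. The sketch uses a plain mapping from target to prerequisites instead of registered rules; plan_order is a hypothetical name, and files absent from the mapping are treated as plain inputs that need no rule.

```python
from typing import Dict, List, Sequence, Set


def plan_order(target: str, rules: Dict[str, Sequence[str]]) -> List[str]:
    """Return rule-produced targets so that prerequisites come before dependents."""
    order: List[str] = []
    done: Set[str] = set()
    visiting: Set[str] = set()

    def visit(node: str) -> None:
        if node not in rules or node in done:
            return  # plain input file, or already scheduled
        if node in visiting:
            raise ValueError(f"dependency cycle through {node!r}")
        visiting.add(node)
        for prerequisite in rules[node]:
            visit(prerequisite)
        visiting.discard(node)
        done.add(node)
        order.append(node)  # post-order: after all of its prerequisites

    visit(target)
    return order


rules = {
    "report.txt": ["a.txt", "b.txt"],
    "a.txt": ["raw.csv"],
    "b.txt": ["a.txt"],
}
print(plan_order("report.txt", rules))
```
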

makeprov.resolve_target(target, *, session=None)

Resolve a target to its registered rule and parameters.

Concrete targets are looked up directly in the session's rules_by_target registry. If no concrete rule matches, pattern rules are tried in registration order by matching the target against their parse templates.

Return type:

tuple[Rule, dict[str, Any]]
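
The lookup order described above (exact match first, then pattern rules in registration order) can be sketched without the parse library by translating a str.format template into a regular expression. match_template and resolve are hypothetical helpers: only the d (integer) format spec is converted, every other placeholder matches a single path segment as a string, and repeated placeholder names are not supported.

```python
import re
from string import Formatter
from typing import Any, Dict, List, Optional, Tuple


def match_template(template: str, target: str) -> Optional[Dict[str, Any]]:
    """Match `target` against a str.format-style template and return its fields."""
    pattern: List[str] = []
    converters: Dict[str, Any] = {}
    for literal, name, spec, _conversion in Formatter().parse(template):
        pattern.append(re.escape(literal))
        if name is None:
            continue
        if spec == "d":
            pattern.append(rf"(?P<{name}>-?\d+)")
            converters[name] = int
        else:
            pattern.append(rf"(?P<{name}>[^/]+)")  # one path segment
            converters[name] = str
    match = re.fullmatch("".join(pattern), target)
    if match is None:
        return None
    return {key: converters[key](value) for key, value in match.groupdict().items()}


def resolve(
    target: str, concrete: Dict[str, str], patterns: List[Tuple[str, str]]
) -> Tuple[str, Dict[str, Any]]:
    """Return (rule name, parameters), preferring exact targets over patterns."""
    if target in concrete:
        return concrete[target], {}
    for template, rule_name in patterns:  # registration order
        params = match_template(template, target)
        if params is not None:
            return rule_name, params
    raise KeyError(f"no rule produces {target!r}")
```
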

makeprov.root_targets(*, session=None)

Return concrete targets that are not dependencies of other rules.

Return type:

list[str]

makeprov.rule(*, name=None, phony=False, base_iri=None, prov_dir=None, prov_path=None, force=None, dry_run=None, out_fmt=None, frame=None, config=None, context=None, merge=None, session=None)

Decorate a function as a build rule with automatic provenance.

Parameters:
  • name (str | None) – Logical name for the rule; defaults to the function name.

  • phony (bool) – When True, do not require an OutPath parameter and always execute the wrapped function regardless of timestamps. Useful for meta-rules such as aggregators or reporting commands.

  • base_iri (str | None) – Base IRI for provenance identifiers; overrides global configuration when provided.

  • prov_dir (str | None) – Directory where provenance documents are saved.

  • prov_path (str | None) – Explicit path for the provenance file; overrides prov_dir when set.

  • force (bool | None) – When True, always run the rule regardless of timestamps.

  • dry_run (bool | None) – When True, log activity without executing the wrapped function.

  • out_fmt (Optional[Literal['json', 'trig']]) – Output format for provenance files ("json" or "trig").

  • frame (Optional[Literal['provenance', 'results']]) – Which structure becomes the primary subject of the JSON-LD document or the TriG named graph: "provenance" or "results".

  • config (ProvenanceConfig | None) – Configuration object to use instead of the process-wide ProvenanceConfig instance.

  • context (bool | None) – Whether to embed JSON-LD context in output when writing provenance.

  • merge (bool | None) – When True, buffer provenance for this rule and any nested rule calls, emitting a single merged document. Defaults to the configured merge behavior.

  • session (Session | None) – Registry and buffer container to use instead of the process-wide default session. Passing a dedicated session isolates rules, commands, and provenance buffers from other runs.

Returns:

A decorator that wraps the target function and registers it as a rule when its outputs can be discovered from annotations. Parameters whose InPath or OutPath defaults contain str.format-style placeholders (e.g. "data/{sample:d}.txt") make the rule a pattern rule, which is resolved dynamically for matching targets.

Return type:

Callable

Examples

Annotate parameters with InPath and OutPath to let the decorator infer dependencies:

from makeprov import InPath, OutPath, rule

@rule()
def uppercase(src: InPath, dst: OutPath):
    dst.write_text(src.read_text().upper())

uppercase("data/input.txt", "data/output.txt")
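
Inferring dependencies from annotations can be done with inspect.signature: parameters annotated with an input or output marker are classified once at decoration time, and each call binds its arguments to find the concrete paths. The sketch illustrates that mechanism only; InFile, OutFile, sketch_rule, and CALL_LOG are hypothetical, and nothing here checks timestamps or writes provenance.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List

CALL_LOG: List[Dict[str, Any]] = []


class InFile(str):
    """Hypothetical input marker used only in annotations."""


class OutFile(str):
    """Hypothetical output marker used only in annotations."""


def sketch_rule(func: Callable) -> Callable:
    sig = inspect.signature(func)
    # Classify parameters once, when the decorator is applied.
    inputs = [n for n, p in sig.parameters.items() if p.annotation is InFile]
    outputs = [n for n, p in sig.parameters.items() if p.annotation is OutFile]

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # parameters left at their defaults are recorded too
        result = func(*bound.args, **bound.kwargs)
        CALL_LOG.append({
            "rule": func.__name__,
            "inputs": [str(bound.arguments[n]) for n in inputs],
            "outputs": [str(bound.arguments[n]) for n in outputs],
        })
        return result

    return wrapper


@sketch_rule
def uppercase(src: InFile, dst: OutFile, suffix: str = "") -> str:
    return f"{src} -> {dst}{suffix}"


uppercase("data/input.txt", "data/output.txt")
print(CALL_LOG)
```

Because the check compares annotation objects with `is`, a module using `from __future__ import annotations` would need typing.get_type_hints to resolve the string annotations first.
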

class makeprov.span(label, prov_path=None, *, frame=None, context=None, session=None)

Bases: ContextDecorator

Scope provenance buffering to a context or decorator.

Starting a span begins a provenance buffer; exiting flushes it to disk or merges it into the parent buffer. This avoids manual buffer start/flush orchestration around backend calls.

makeprov.to_dot(target, *, session=None)

Render the dependency graph for target in DOT format.

Return type:

str
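
DOT output for a dependency graph is a digraph with one edge per prerequisite. The sketch renders a plain mapping from each target to its prerequisites; mapping_to_dot is a hypothetical name, and the edge direction (prerequisite to target) and quoting style are assumptions that may differ from what to_dot emits.

```python
from typing import Dict, Sequence


def mapping_to_dot(rules: Dict[str, Sequence[str]], name: str = "build") -> str:
    """Render a target -> prerequisites mapping as a Graphviz digraph."""

    def quote(node: str) -> str:
        # DOT quoted IDs escape embedded double quotes with a backslash.
        return '"' + node.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for target, prerequisites in rules.items():
        for prerequisite in prerequisites:
            lines.append(f"    {quote(prerequisite)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


print(mapping_to_dot({"report.txt": ["a.txt", "b.txt"], "a.txt": ["raw.csv"]}))
```

The resulting text can be piped to Graphviz, for example dot -Tsvg, to draw the graph.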

Modules

config

core

paths

prov

rdfmixin

snakemake