Provenance output
Every rule decorated with makeprov.core.rule() can emit provenance records
that describe the activity, inputs, outputs, environment, and results. The
library supports JSON-LD and TriG outputs, aligning with the W3C PROV data
model.
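To make "a rule that emits provenance" concrete, here is a toy decorator in plain Python that records what a wrapped function did. It is not makeprov.core.rule() and does not reproduce its internals. The name `record_activity`, its parameters, and the record layout are hypothetical. The keys reuse real PROV-O terms (`prov:used`, `prov:generated`, `prov:startedAtTime`, `prov:endedAtTime`), and the activity identifier imitates the `script.py#uppercase-20240101T120000` pattern used in this section.

```python
import functools
from datetime import datetime, timezone

# Hypothetical stand-in for a provenance-emitting rule decorator.
# This is NOT the makeprov.core.rule() implementation, only an illustration.
def record_activity(script, inputs, outputs, log):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.now(timezone.utc)
            result = func(*args, **kwargs)
            ended = datetime.now(timezone.utc)
            # Identifier shaped like "script.py#uppercase-20240101T120000".
            activity_id = f"{script}#{func.__name__}-{started:%Y%m%dT%H%M%S}"
            # Append one PROV-style activity record per call.
            log.append({
                "@id": activity_id,
                "@type": "prov:Activity",
                "prov:startedAtTime": started.isoformat(),
                "prov:endedAtTime": ended.isoformat(),
                "prov:used": [{"@id": path} for path in inputs],
                "prov:generated": [{"@id": path} for path in outputs],
            })
            return result
        return wrapper
    return decorator

records = []

@record_activity("script.py", inputs=["data/in.txt"], outputs=["data/out.txt"], log=records)
def uppercase(text):
    return text.upper()

uppercase("hello")  # returns "HELLO" and appends one activity record
```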
JSON-LD structure
When out_fmt is set to "json", the resulting file contains a top-level
"provenance" array with entries for activities, agents, entities, and optional
result graphs. Set context=True to embed the JSON-LD context alongside the
graph data. With context=False, the @context field instead holds the URL of
the published context (configurable via context_url), so consumers can reuse
a shared, versioned context document.
{
  "@context": { "prov": "http://www.w3.org/ns/prov#" },
  "provenance": [
    {"@id": "script.py#uppercase-20240101T120000", "@type": "prov:Activity"}
  ]
}
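The difference between the two context modes can be sketched with plain dictionaries. The helper `build_document`, the minimal inline context, and `DEFAULT_CONTEXT_URL` are assumptions for illustration. In particular, the URL below is a placeholder on example.org, not the library's published context location. JSON-LD itself permits `@context` to be either an embedded object or a string URL, which is what the two branches produce.

```python
import json

# Minimal inline context; the context the library embeds may be larger.
INLINE_CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}
# Placeholder only: not the library's actual published context URL.
DEFAULT_CONTEXT_URL = "https://example.org/makeprov/context.jsonld"

def build_document(nodes, context=True, context_url=DEFAULT_CONTEXT_URL):
    """Wrap nodes in a JSON-LD document with a top-level "provenance" array."""
    # Embed a copy of the context object, or reference a shared one by URL.
    ctx = dict(INLINE_CONTEXT) if context else context_url
    return {"@context": ctx, "provenance": list(nodes)}

activity = {"@id": "script.py#uppercase-20240101T120000", "@type": "prov:Activity"}
embedded = build_document([activity], context=True)
referenced = build_document([activity], context=False)
print(json.dumps(referenced, indent=2))
```

Referencing a context by URL keeps each file small and lets many documents share one versioned vocabulary, at the cost of consumers needing to fetch (or cache) that document to expand terms.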
Relationship fields (for example wasGeneratedBy or wasAssociatedWith)
accept either a full provenance node or a JSON-LD reference of the form
{"@id": ...}. Callers can therefore mix embedded nodes with references to
nodes that are external or were declared earlier, both when constructing and
when deserializing ActivityNode, GraphEntity, and FileEntity instances.
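Code that consumes such documents usually has to treat both shapes uniformly. The helper below is a hypothetical sketch, not part of makeprov: it reduces a relationship value (a full node, an `{"@id": ...}` reference, or a list mixing both) to the identifiers it points at. The agent IRI is a made-up example.

```python
from typing import Any

def referenced_ids(value: Any) -> list[str]:
    """Return the @id values a relationship field points at.

    Hypothetical helper: accepts a full node, a {"@id": ...} reference,
    or a list mixing both forms.
    """
    items = value if isinstance(value, list) else [value]
    ids = []
    for item in items:
        if not isinstance(item, dict) or "@id" not in item:
            raise ValueError(f"expected a node or an @id reference, got {item!r}")
        ids.append(item["@id"])
    return ids

# Reference to an agent declared elsewhere (made-up IRI).
agent_ref = {"@id": "https://example.org/agents/pipeline"}
# Full activity node embedded directly in the entity below.
activity = {
    "@id": "script.py#uppercase-20240101T120000",
    "@type": "prov:Activity",
    "wasAssociatedWith": agent_ref,
}
entity = {"@id": "data/out.txt", "wasGeneratedBy": activity}

print(referenced_ids(entity["wasGeneratedBy"]))  # ['script.py#uppercase-20240101T120000']
```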
TriG datasets
TriG output writes a dataset that combines the default graph for provenance with
named graphs for any makeprov.rdfmixin.RDFMixin results returned by a
rule. This is useful when merging provenance with domain-specific RDF data.
output = prov.write("prov/uppercase", fmt="trig")
print(output)
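To show the layout of such a dataset, the sketch below hand-serializes a default provenance graph plus one named result graph as TriG text using only the standard library. The function `to_trig`, the example.org IRIs, and the choice to link the named graph to its activity with prov:wasGeneratedBy are assumptions for illustration; makeprov's own serializer and naming scheme may differ, and real code would normally delegate serialization (including literal escaping) to an RDF library.

```python
# Hypothetical sketch: terms are pre-formatted N-Triples-style strings
# (IRIs in <...>, literals in quotes); no escaping is performed here.
def to_trig(default_triples, named_graphs):
    """Serialize a default graph and a dict of {graph_iri: triples} as TriG."""
    lines = ["{"]  # the default graph holds the provenance records
    lines += [f"    {s} {p} {o} ." for s, p, o in default_triples]
    lines.append("}")
    for graph_iri, triples in named_graphs.items():
        lines.append(f"{graph_iri} {{")  # one named graph per result object
        lines += [f"    {s} {p} {o} ." for s, p, o in triples]
        lines.append("}")
    return "\n".join(lines) + "\n"

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
activity = "<https://example.org/run/uppercase-20240101T120000>"
result_graph = "<https://example.org/graph/uppercase-result>"

provenance = [
    (activity, RDF_TYPE, f"<{PROV}Activity>"),
    (result_graph, f"<{PROV}wasGeneratedBy>", activity),
]
results = {
    result_graph: [("<https://example.org/doc/1>", "<https://example.org/title>", '"HELLO"')],
}
print(to_trig(provenance, results))
```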
Environment capture
If the calling project is installed as a Python distribution, the library
captures the package name, version, and dependencies as part of the
provenance. This information is attached as a collection entity linked to the
generating activity. The environment identifier is derived from a hash of the
package metadata (name, version, dependencies) rather than the run timestamp,
so multiple runs with the same environment share a stable env-<hash> IRI.
When provenance documents are merged, the environment entity is deduplicated so
only one copy is emitted in the combined graph.
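The stable identifier and the deduplication on merge fit together: if the ID is a pure function of the metadata, identical environments produce the same @id, and a merge only has to keep the first node with a given @id. The sketch below assumes SHA-256 over canonical JSON, sorted dependencies, and a 12-character digest prefix; makeprov's actual hash, input fields, and IRI format are not documented here. The project name, version, and dependency strings are invented. In a real project such values could be read with `importlib.metadata` (for example `version()` and `requires()`).

```python
import hashlib
import json

def environment_id(name, version, dependencies, digest_len=12):
    """Derive a stable env-<hash> identifier from package metadata.

    Assumptions: dependencies are sorted so their order does not matter,
    and the run timestamp is deliberately excluded from the hash input.
    """
    payload = json.dumps(
        {"name": name, "version": version, "dependencies": sorted(dependencies)},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"env-{digest[:digest_len]}"

def merge_documents(*docs):
    """Concatenate "provenance" arrays, keeping only the first node per @id."""
    seen, merged = set(), []
    for doc in docs:
        for node in doc["provenance"]:
            if node["@id"] not in seen:
                seen.add(node["@id"])
                merged.append(node)
    return {"provenance": merged}

deps = ["rdflib>=7", "click"]
env_a = environment_id("myproject", "1.2.0", deps)
env_b = environment_id("myproject", "1.2.0", list(reversed(deps)))
run1 = {"provenance": [{"@id": "run-1", "@type": "prov:Activity"},
                       {"@id": env_a, "@type": "prov:Collection"}]}
run2 = {"provenance": [{"@id": "run-2", "@type": "prov:Activity"},
                       {"@id": env_b, "@type": "prov:Collection"}]}
merged = merge_documents(run1, run2)
print([node["@id"] for node in merged["provenance"]])  # environment appears once
```

This sketch deduplicates every repeated @id, which covers the environment entity; the text above only guarantees that behavior for the environment entity itself.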
Result graphs
Return instances of RDFMixin from your rules to
include domain-specific RDF alongside provenance. Each result object is embedded
in a named graph so consumers can query both provenance and outputs together.
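Because outputs and provenance land in one dataset, a consumer can join them, for example to fetch everything a particular run produced. The sketch below represents the dataset as plain (subject, predicate, object, graph) quads, with `None` standing for the default graph. The quad layout, the example.org IRIs, and the assumption that each named graph is linked to its activity through prov:wasGeneratedBy in the default graph are illustrative rather than guaranteed by makeprov; in practice a SPARQL query over the loaded dataset would do the same job.

```python
from collections import defaultdict

PROV_GENERATED_BY = "http://www.w3.org/ns/prov#wasGeneratedBy"

def outputs_of(quads, activity):
    """Return {graph_iri: [(s, p, o), ...]} for named graphs generated by activity."""
    # Provenance lives in the default graph (graph is None in this sketch).
    generated = {
        s for s, p, o, g in quads
        if g is None and p == PROV_GENERATED_BY and o == activity
    }
    # Collect the triples stored inside each matching named graph.
    result = defaultdict(list)
    for s, p, o, g in quads:
        if g in generated:
            result[g].append((s, p, o))
    return dict(result)

run = "https://example.org/run/uppercase-20240101T120000"
graph = "https://example.org/graph/uppercase-result"
quads = [
    (graph, PROV_GENERATED_BY, run, None),
    ("https://example.org/doc/1", "https://example.org/title", "HELLO", graph),
    ("https://example.org/doc/2", "https://example.org/title", "other",
     "https://example.org/graph/unrelated"),
]
print(outputs_of(quads, run))
```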