semsynth.downstream_fidelity

Downstream fidelity comparison between real and synthetic data.

Functions

auto_formula(df, meta, cfg)

compute_downstream(df_real, df_synth[, ...])

dataclass([cls, init, repr, eq, order, ...])

Add dunder methods based on the fields defined in the class.

get_column_name(entry, *[, extra_keys])

Return the first non-empty column name found in a JSON-LD column entry.

modeling_role(node, *[, default])

Return a simplified role label for modeling contexts.

recompute_downstream([metrics, real, synth, ...])

Recompute downstream metrics from existing real/synthetic CSVs.

rule(*[, name, phony, base_iri, prov_dir, ...])

Decorate a function as a build rule with automatic provenance.

Classes

Any(*args, **kwargs)

Special type indicating an unconstrained type.

CategoricalDtype([categories, ordered])

Type for categorical data with the categories and orderedness.

Config()

Base configuration container with TOML application helpers.

DownstreamConfig([m, burnin, ...])

InPath(*paths)

Marker for input paths where "-" maps to stdin.

LogisticRegression([penalty, C, l1_ratio, ...])

Logistic Regression (aka logit, MaxEnt) classifier.

OutPath(*paths)

Marker for output paths where "-" maps to stdout.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

SimpleImputer(*[, missing_values, strategy, ...])

Univariate imputer for completing missing values with simple strategies.

class semsynth.downstream_fidelity.DownstreamConfig(m: int = 20, burnin: int = 5, max_interactions: int = 5, cv: int = 5)

Bases: Config

burnin: int = 5
cv: int = 5
m: int = 20
max_interactions: int = 5
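
The defaults listed above can be pictured with a stand-in dataclass. This is a minimal sketch, not the library's class: `DownstreamConfigSketch` is a hypothetical name, it does not inherit from `Config`, and it mirrors only the four documented fields and their defaults. The meaning of each field is not stated on this page, so the sketch does not try to interpret them.

```python
from dataclasses import dataclass, replace


@dataclass
class DownstreamConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    m: int = 20
    burnin: int = 5
    max_interactions: int = 5
    cv: int = 5


# All defaults, matching the documented class signature.
default_cfg = DownstreamConfigSketch()

# Override selected fields; the remaining fields keep their defaults.
custom_cfg = replace(default_cfg, m=10, cv=3)
```

With the real class, the equivalent would presumably be `DownstreamConfig(m=10, cv=3)`, passed to `compute_downstream` through its keyword-only `cfg` argument.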
semsynth.downstream_fidelity.auto_formula(df: DataFrame, meta: Mapping[str, Any], cfg: DownstreamConfig) -> str
semsynth.downstream_fidelity.compute_downstream(df_real: DataFrame, df_synth: DataFrame, meta: Mapping[str, Any] | None = None, *, cfg: DownstreamConfig | None = None, target: str | None = None) -> Mapping[str, Any]
semsynth.downstream_fidelity.recompute_downstream(metrics: OutPath = OutPath('{metrics}'), *, real: InPath | None = None, synth: InPath | None = None, meta: InPath | None = None, target: str | None = None, verbose: bool = False) -> None

Recompute downstream metrics from existing real/synthetic CSVs.
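
`recompute_downstream` takes its paths as `InPath` and `OutPath` markers, which this page describes as mapping "-" to stdin and stdout respectively. The sketch below illustrates that convention only. The helpers `open_input` and `open_output` are hypothetical names introduced for illustration and are not part of semsynth; how the library actually resolves these markers is not documented here.

```python
import sys
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def open_input(path: str) -> Iterator[TextIO]:
    """Hypothetical helper: yield stdin for "-", otherwise open the file for reading."""
    if path == "-":
        # The process-wide stream is handed out as-is and never closed here.
        yield sys.stdin
    else:
        with open(path, "r", encoding="utf-8", newline="") as handle:
            yield handle


@contextmanager
def open_output(path: str) -> Iterator[TextIO]:
    """Hypothetical helper: yield stdout for "-", otherwise open the file for writing."""
    if path == "-":
        # The process-wide stream is handed out as-is and never closed here.
        yield sys.stdout
    else:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            yield handle
```

Under this convention a command can sit in a shell pipeline without temporary files: passing "-" for an input reads the CSV from the previous stage, and passing "-" for the output writes the result to the next one.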