semsynth.privacy_metrics

Functions

dataclass([cls, init, repr, eq, order, ...])

Add dunder methods based on the fields defined in the class.

ensure_torch_rmsnorm()

Ensure torch.nn exposes RMSNorm on versions that predate it.

summarize_privacy_synthcity(df_real, ...[, eps])

Summarize privacy metrics using SynthCity on aligned real/synthetic dataframes.

Classes

DatasetPrivacySummary(n_real, n_synth, ...)

class semsynth.privacy_metrics.DatasetPrivacySummary(n_real: int, n_synth: int, used_columns: List[str], qi_columns: List[str], sensitive_columns: List[str], exact_overlap_rate: float, near_duplicate_rate_eps: float, nn_distance_stats: Dict[str, float], k_min: Optional[int], k_pct_lt5: Optional[float], k_map: Optional[int], rare_qi_reproduction_rate: Optional[float], t_closeness: Dict[str, Dict[str, float]], identifiability_score: Optional[float] = None, delta_presence: Optional[float] = None)

Bases: object

delta_presence: float | None = None
exact_overlap_rate: float
identifiability_score: float | None = None
k_map: int | None
k_min: int | None
k_pct_lt5: float | None
n_real: int
n_synth: int
near_duplicate_rate_eps: float
nn_distance_stats: Dict[str, float]
qi_columns: List[str]
rare_qi_reproduction_rate: float | None
sensitive_columns: List[str]
t_closeness: Dict[str, Dict[str, float]]
used_columns: List[str]
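
Because the fields above are ordinary dataclass fields, a summary can be built and serialized with the standard dataclasses helpers. The sketch below uses a local stand-in class (the hypothetical name PrivacySummarySketch) that mirrors the documented fields, so it runs without semsynth installed. All values are made up for illustration, and the keys used inside nn_distance_stats and t_closeness are assumptions, not documented names.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class PrivacySummarySketch:
    """Local stand-in mirroring the documented fields; not the library class."""

    n_real: int
    n_synth: int
    used_columns: List[str]
    qi_columns: List[str]
    sensitive_columns: List[str]
    exact_overlap_rate: float
    near_duplicate_rate_eps: float
    nn_distance_stats: Dict[str, float]
    k_min: Optional[int]
    k_pct_lt5: Optional[float]
    k_map: Optional[int]
    rare_qi_reproduction_rate: Optional[float]
    t_closeness: Dict[str, Dict[str, float]]
    identifiability_score: Optional[float] = None
    delta_presence: Optional[float] = None


# Illustrative values only; the dictionary keys are assumptions.
summary = PrivacySummarySketch(
    n_real=1000,
    n_synth=1000,
    used_columns=["age", "zip", "diagnosis"],
    qi_columns=["age", "zip"],
    sensitive_columns=["diagnosis"],
    exact_overlap_rate=0.0,
    near_duplicate_rate_eps=0.01,
    nn_distance_stats={"mean": 0.42},
    k_min=None,
    k_pct_lt5=None,
    k_map=None,
    rare_qi_reproduction_rate=None,
    t_closeness={"diagnosis": {"max": 0.12}},
)

# asdict() recursively converts the dataclass into plain dicts and lists,
# which makes the result suitable for json.dumps or logging.
record = asdict(summary)
```

The two trailing fields are the only ones with defaults, so they can be omitted at construction time and come back as None.
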
semsynth.privacy_metrics.summarize_privacy_synthcity(df_real: pd.DataFrame, df_synth: pd.DataFrame, meta: pd.DataFrame, *, eps: float = 0.1) → DatasetPrivacySummary

Summarize privacy metrics using SynthCity on aligned real/synthetic dataframes.

Parameters:
  • df_real – Real dataframe.

  • df_synth – Synthetic dataframe.

  • meta – Metadata with variable roles and types.

  • eps – Unused hook for future custom thresholds; kept for API stability.

Returns:

A DatasetPrivacySummary containing overlap rates, nearest-neighbor distance statistics, k-map and t-closeness results, and, when available, identifiability and delta-presence scores.