semsynth.privacy_metrics
Functions
- Add dunder methods based on the fields defined in the class.
- Ensure torch.nn exposes RMSNorm on versions that predate it.
- Summarize privacy metrics using SynthCity on aligned real/synthetic dataframes.
Classes
- class semsynth.privacy_metrics.DatasetPrivacySummary(n_real: 'int', n_synth: 'int', used_columns: 'List[str]', qi_columns: 'List[str]', sensitive_columns: 'List[str]', exact_overlap_rate: 'float', near_duplicate_rate_eps: 'float', nn_distance_stats: 'Dict[str, float]', k_min: 'Optional[int]', k_pct_lt5: 'Optional[float]', k_map: 'Optional[int]', rare_qi_reproduction_rate: 'Optional[float]', t_closeness: 'Dict[str, Dict[str, float]]', identifiability_score: 'Optional[float]' = None, delta_presence: 'Optional[float]' = None)
Bases: object
- delta_presence: float | None = None
- exact_overlap_rate: float
- identifiability_score: float | None = None
- k_map: int | None
- k_min: int | None
- k_pct_lt5: float | None
- n_real: int
- n_synth: int
- near_duplicate_rate_eps: float
- nn_distance_stats: Dict[str, float]
- qi_columns: List[str]
- rare_qi_reproduction_rate: float | None
- sensitive_columns: List[str]
- t_closeness: Dict[str, Dict[str, float]]
- used_columns: List[str]
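As a rough illustration of how these fields fit together, the sketch below declares a local stand-in dataclass with the same field names and types as the signature above, so it runs without semsynth installed. `PrivacySummarySketch` is a hypothetical name, all values are invented for illustration, and the keys used inside `nn_distance_stats` and `t_closeness` are assumptions, since this page does not list them.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class PrivacySummarySketch:
    """Hypothetical stand-in mirroring the documented fields of DatasetPrivacySummary."""

    n_real: int
    n_synth: int
    used_columns: List[str]
    qi_columns: List[str]
    sensitive_columns: List[str]
    exact_overlap_rate: float
    near_duplicate_rate_eps: float
    nn_distance_stats: Dict[str, float]
    k_min: Optional[int]
    k_pct_lt5: Optional[float]
    k_map: Optional[int]
    rare_qi_reproduction_rate: Optional[float]
    t_closeness: Dict[str, Dict[str, float]]
    # The last two fields default to None, as in the documented signature.
    identifiability_score: Optional[float] = None
    delta_presence: Optional[float] = None


# Invented example values; the dictionary keys below are assumptions.
summary = PrivacySummarySketch(
    n_real=1000,
    n_synth=1000,
    used_columns=["age", "zip", "diagnosis"],
    qi_columns=["age", "zip"],
    sensitive_columns=["diagnosis"],
    exact_overlap_rate=0.01,
    near_duplicate_rate_eps=0.03,
    nn_distance_stats={"mean": 0.42, "min": 0.0},
    k_min=3,
    k_pct_lt5=0.12,
    k_map=None,
    rare_qi_reproduction_rate=0.05,
    t_closeness={"diagnosis": {"max": 0.2}},
)

# Convert to a plain dict (e.g. for JSON export) and list the metrics left unset.
record = asdict(summary)
missing = sorted(name for name, value in record.items() if value is None)
print(missing)
```

Because several fields are typed as optional, downstream code should probably check for `None` before comparing or plotting those metrics.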
- semsynth.privacy_metrics.summarize_privacy_synthcity(df_real: pd.DataFrame, df_synth: pd.DataFrame, meta: pd.DataFrame, *, eps: float = 0.1) -> DatasetPrivacySummary
Summarize privacy metrics using SynthCity on aligned real/synthetic dataframes.
- Parameters:
df_real – Real dataframe.
df_synth – Synthetic dataframe.
meta – Metadata with variable roles and types.
eps – Unused hook for future custom thresholds; kept for API stability.
- Returns:
DatasetPrivacySummary with overlap, neighbor, k-map, t-closeness, and optional identifiability/delta-presence scores.
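To give a feel for what two of the returned quantities might measure, here is a minimal pure-Python sketch of an exact-overlap rate and a smallest quasi-identifier group size. These definitions are assumptions based on the common meaning of the terms, not the semsynth or SynthCity implementation; the helper names, column names, and rows are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, object]


def exact_overlap_rate_sketch(real: List[Row], synth: List[Row], columns: Sequence[str]) -> float:
    """Assumed definition: share of synthetic rows that exactly match some real row."""
    if not synth:
        return 0.0
    real_keys = {tuple(r[c] for c in columns) for r in real}
    hits = sum(tuple(s[c] for c in columns) in real_keys for s in synth)
    return hits / len(synth)


def k_min_sketch(rows: List[Row], qi_columns: Sequence[str]) -> int:
    """Assumed definition: size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        return 0
    counts = Counter(tuple(r[c] for c in qi_columns) for r in rows)
    return min(counts.values())


# Invented toy data for illustration only.
real_rows = [
    {"age": 30, "zip": "A", "dx": "flu"},
    {"age": 30, "zip": "A", "dx": "cold"},
    {"age": 41, "zip": "B", "dx": "flu"},
    {"age": 41, "zip": "B", "dx": "cold"},
]
synth_rows = [
    {"age": 30, "zip": "A", "dx": "flu"},   # exact copy of a real row
    {"age": 52, "zip": "C", "dx": "cold"},  # not present in the real data
]

overlap = exact_overlap_rate_sketch(real_rows, synth_rows, ["age", "zip", "dx"])
smallest_group = k_min_sketch(real_rows, ["age", "zip"])
print(overlap, smallest_group)
```

Under these assumed definitions, a higher overlap rate or a smaller minimum group size would both point toward higher disclosure risk.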