Data Report — Hepatitis

Source: UCI dataset 46

SemMap JSON-LD: dataset.semmap.json · RDFa HTML

Overview

Metric       Value
Dataset      Hepatitis
Source       UCI dataset 46
Rows         80
Columns      20
Discrete     14
Continuous   6
SemMap       JSON-LD, HTML
Missingness  Not modeled

Variables and summary

Continuous variables are summarized as mean ± SD [min, Q1, median, Q3, max]; discrete variables as level: count (share of rows).

variable         inferred    dist
Age              continuous  40.6625 ± 11.2800 [20, 32, 38.5, 49.25, 72]
Sex              discrete    1: 69 (86.25%)
Steroid          discrete    1: 38 (47.50%)
Antivirals       discrete    1: 21 (26.25%)
Fatigue          discrete    1: 52 (65.00%)
Malaise          discrete    1: 31 (38.75%)
Anorexia         discrete    1: 12 (15.00%)
Liver Big        discrete    1: 13 (16.25%)
Liver Firm       discrete    1: 38 (47.50%)
Spleen Palpable  discrete    1: 15 (18.75%)
Spiders          discrete    1: 25 (31.25%)
Ascites          discrete    1: 12 (15.00%)
Varices          discrete    1: 10 (12.50%)
Bilirubin        continuous  1.2212 ± 0.8752 [0.3, 0.7, 1, 1.3, 4.8]
Alk Phosphate    continuous  102.9125 ± 53.6848 [26, 68.25, 85, 133.5, 280]
Sgot             continuous  82.0250 ± 71.6000 [14, 30.75, 56.5, 102.75, 420]
Albumin          continuous  3.8438 ± 0.5763 [2.1, 3.5, 4, 4.2, 5]
Protime          continuous  62.5125 ± 23.4278 [0, 46, 62, 77.25, 100]
Histology        discrete    1: 47 (58.75%)
Class            discrete    1: 13 (16.25%)
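
The summary column can be reproduced with the standard library alone. The sketch below is illustrative, not the report's own code: the function names are hypothetical, and it assumes the ± value is a sample standard deviation and that quartiles use linear interpolation between order statistics, neither of which the report states.

```python
import statistics


def summarize_continuous(values):
    """Mean, SD and five-number summary of a continuous column.

    Assumptions (not stated in the report): sample standard deviation
    and quartiles by linear interpolation ("inclusive" method).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "five": [min(values), q1, median, q3, max(values)],
    }


def level_share(values, level):
    """Count of rows equal to `level` and its share of all rows in percent."""
    count = sum(1 for v in values if v == level)
    return count, round(100 * count / len(values), 2)
```

For example, a column of 80 rows with 69 rows at level 1 gives `level_share(column, 1) == (69, 86.25)`, matching the Sex row.
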

Fidelity summary

model       backend    disc_jsd_mean  disc_jsd_median  cont_ks_mean  cont_w1_mean  downstream_sign_match
metasyn     metasyn    0.1294         0.1524           0.1977        8.8017        0.64
clg_mi2     pybnesian  0.1362         0.1411           0.2325        11.5322       -
semi_mi5    pybnesian  0.1273         0.1378           0.1912        9.5664        -
ctgan_fast  synthcity  0.261          0.205            0.8877        46.3285       -
tvae_quick  synthcity  0.1585         0.1439           0.3241        13.0605       -
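
The three fidelity measures in this table (Jensen-Shannon divergence for discrete variables, the two-sample Kolmogorov-Smirnov statistic and the 1-Wasserstein distance for continuous ones) can be sketched in plain Python. This is a minimal illustration under assumptions: the function names are hypothetical, the logarithm base is taken as 2, and JSD is computed as the divergence rather than its square root. The report does not say which variant it uses.

```python
import math
from bisect import bisect_right
from collections import Counter


def jsd(real, synth, base=2.0):
    """Jensen-Shannon divergence between two categorical samples."""
    levels = set(real) | set(synth)
    p_counts, q_counts = Counter(real), Counter(synth)
    p = {k: p_counts[k] / len(real) for k in levels}
    q = {k: q_counts[k] / len(synth) for k in levels}
    m = {k: 0.5 * (p[k] + q[k]) for k in levels}

    def kl(a, b):
        # Terms with a[k] == 0 contribute nothing to the sum.
        return sum(a[k] * math.log(a[k] / b[k], base) for k in levels if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ks_statistic(real, synth):
    """Largest gap between the two empirical CDFs."""
    xs, ys = sorted(real), sorted(synth)
    gap = 0.0
    for v in set(xs) | set(ys):
        f = bisect_right(xs, v) / len(xs)
        g = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(f - g))
    return gap


def wasserstein_1(real, synth):
    """Area between the two empirical CDFs (1-D Wasserstein-1 distance)."""
    xs, ys = sorted(real), sorted(synth)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f = bisect_right(xs, left) / len(xs)
        g = bisect_right(ys, left) / len(ys)
        total += abs(f - g) * (right - left)
    return total
```

KS and JSD are bounded by 1, whereas W1 is expressed in the units of the variable, so W1 values are comparable within a variable but not across variables.
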

Privacy summary

metric                     metasyn  clg_mi2    semi_mi5   ctgan_fast  tvae_quick
backend                    metasyn  pybnesian  pybnesian  synthcity   synthcity
n_real                     80       80         80         80          80
n_synth                    155      155        155        155         155
exact_overlap_rate         0        0          0          0           0
near_duplicate_rate_eps    0.8625   0.9375     0.8625     0.3875      0.925
nn_distance_mean           0.1294   0.0909     0.1206     0.2962      0.1192
k_min                      1        1          1          1           1
k_pct_lt5                  1        1          1          1           1
k_map                      6        8          4          6           1
rare_qi_reproduction_rate  0        0          0          0           0
identifiability_score      -        -          -          -           -
delta_presence             1.4      1.8182     1.5217     5           6
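
The distance-based and k-anonymity figures above can be approximated as follows. This is a sketch under stated assumptions, not the report's implementation: the function names are hypothetical, rows are taken to be numeric tuples already scaled to comparable ranges, distance is Euclidean, and rates are computed per real row. The last choice is a guess based on the reported near-duplicate rates all being multiples of 1/80 (n_real); the report does not give the definitions or the value of eps.

```python
import math
from collections import Counter


def nearest_distances(real, synth):
    """For each real row, the Euclidean distance to its closest synthetic row."""
    return [min(math.dist(r, s) for s in synth) for r in real]


def distance_privacy(real, synth, eps):
    """Exact-overlap rate, near-duplicate rate within eps, mean NN distance."""
    distances = nearest_distances(real, synth)
    synth_rows = {tuple(s) for s in synth}
    return {
        "exact_overlap_rate": sum(tuple(r) in synth_rows for r in real) / len(real),
        "near_duplicate_rate_eps": sum(d <= eps for d in distances) / len(distances),
        "nn_distance_mean": sum(distances) / len(distances),
    }


def k_anonymity(qi_rows, threshold=5):
    """Smallest equivalence-class size over the quasi-identifier columns,
    and the share of rows that sit in a class smaller than `threshold`."""
    sizes = Counter(tuple(r) for r in qi_rows)
    k_min = min(sizes.values())
    share_small = sum(n for n in sizes.values() if n < threshold) / len(qi_rows)
    return k_min, share_small
```

Under this reading, k_min = 1 with k_pct_lt5 = 1 means every row falls in a quasi-identifier class of fewer than five rows, which is unsurprising for 80 rows and 20 columns.
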

Models

[Figure not reproduced: UMAP panel, real data]

Model: metasyn (metasyn)

Per-variable fidelity

variable         type        KS      W1      JSD
Age              continuous  0.1605  2.7707  -
Sex              discrete    -       -       0.1834
Steroid          discrete    -       -       0.2074
Antivirals       discrete    -       -       0.0446
Fatigue          discrete    -       -       0.164
Malaise          discrete    -       -       0.1496
Anorexia         discrete    -       -       0.1744
Liver Big        discrete    -       -       0.0445
Liver Firm       discrete    -       -       0.0395
Spleen Palpable  discrete    -       -       0.1067

Downstream metrics

metric           value
sign_match_rate  0.64
formula          Class ~ Age + Sex + Steroid + Antivirals + Fatigue + Malaise + Anorexia + Liver_Big + Liver_Firm + Spleen_Palpable + Spiders + Ascites + Varices + Bilirubin + Alk_Phosphate + Sgot + Albumin + Protime + Histology + Age:Sex + Sex:Steroid + Steroid:Antivirals + Antivirals:Fatigue + Fatigue:Malaise

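A sign-match rate of this kind is usually the share of model coefficients whose sign agrees when the same formula is fitted to the real and to the synthetic data. The sketch below shows only that comparison step; the function name and the coefficient dictionaries are hypothetical, and the report does not say which estimator was fitted or whether the intercept is counted. The formula has 24 terms, so 0.64 would be consistent with 16 of 25 coefficients agreeing if an intercept is included.

```python
def sign_match_rate(real_coefs, synth_coefs):
    """Share of shared terms whose coefficients have the same sign.

    Both arguments map term names (e.g. "Age", "Age:Sex") to fitted
    coefficients; terms missing from either fit are ignored.
    """
    shared = [term for term in real_coefs if term in synth_coefs]
    if not shared:
        raise ValueError("no terms in common")

    def sign(x):
        return (x > 0) - (x < 0)

    matches = sum(sign(real_coefs[t]) == sign(synth_coefs[t]) for t in shared)
    return matches / len(shared)
```

With illustrative inputs `{"Age": 0.4, "Sex": -1.2, "Age:Sex": 0.1}` and `{"Age": 0.9, "Sex": 0.3, "Age:Sex": 0.2}`, two of three signs agree and the rate is about 0.67.
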
Privacy metrics

metric                     value
n_real                     80
n_synth                    155
exact_overlap_rate         0
near_duplicate_rate_eps    0.8625
nn_distance_mean           0.1294
k_min                      1
k_pct_lt5                  1
k_map                      6
rare_qi_reproduction_rate  0
delta_presence             1.4

variable         distribution
Age              core.lognormal
Sex              core.multinoulli
Steroid          core.multinoulli
Antivirals       core.multinoulli
Fatigue          core.multinoulli
Malaise          core.multinoulli
Anorexia         core.multinoulli
Liver Big        core.multinoulli
Liver Firm       core.multinoulli
Spleen Palpable  core.multinoulli
Spiders          core.multinoulli
Ascites          core.multinoulli
Varices          core.multinoulli
Bilirubin        core.lognormal
Alk Phosphate    core.lognormal
Sgot             core.truncated_normal
Albumin          core.normal
Protime          core.truncated_normal
Histology        core.multinoulli
Class            core.multinoulli

Model: clg_mi2 (pybnesian)

Per-variable fidelity

variable         type        KS      W1      JSD
Age              continuous  0.225   4.5188  -
Sex              discrete    -       -       0.1899
Steroid          discrete    -       -       0.1858
Antivirals       discrete    -       -       0.0506
Fatigue          discrete    -       -       0.1253
Malaise          discrete    -       -       0.2236
Anorexia         discrete    -       -       0.1569
Liver Big        discrete    -       -       0.1112
Liver Firm       discrete    -       -       0.0045
Spleen Palpable  discrete    -       -       0.1197

Privacy metrics

metric                     value
n_real                     80
n_synth                    155
exact_overlap_rate         0
near_duplicate_rate_eps    0.9375
nn_distance_mean           0.0909
k_min                      1
k_pct_lt5                  1
k_map                      8
rare_qi_reproduction_rate  0
delta_presence             1.8182

Model: semi_mi5 (pybnesian)

Per-variable fidelity

variable         type        KS      W1      JSD
Age              continuous  0.2113  4.0159  -
Sex              discrete    -       -       0.1495
Steroid          discrete    -       -       0.2074
Antivirals       discrete    -       -       0.0016
Fatigue          discrete    -       -       0.1253
Malaise          discrete    -       -       0.2236
Anorexia         discrete    -       -       0.1569
Liver Big        discrete    -       -       0.0075
Liver Firm       discrete    -       -       0.045
Spleen Palpable  discrete    -       -       0.1261

Privacy metrics

metric                     value
n_real                     80
n_synth                    155
exact_overlap_rate         0
near_duplicate_rate_eps    0.8625
nn_distance_mean           0.1206
k_min                      1
k_pct_lt5                  1
k_map                      4
rare_qi_reproduction_rate  0
delta_presence             1.5217

Model: ctgan_fast (synthcity)

Per-variable fidelity

variable         type        KS      W1       JSD
Age              continuous  0.8556  22.7668  -
Sex              discrete    -       -        0.061
Steroid          discrete    -       -        0.1655
Antivirals       discrete    -       -        0.2442
Fatigue          discrete    -       -        0.0141
Malaise          discrete    -       -        0.0833
Anorexia         discrete    -       -        0.2806
Liver Big        discrete    -       -        0.2349
Liver Firm       discrete    -       -        0.4988
Spleen Palpable  discrete    -       -        0.175

Privacy metrics

metric                     value
n_real                     80
n_synth                    155
exact_overlap_rate         0
near_duplicate_rate_eps    0.3875
nn_distance_mean           0.2962
k_min                      1
k_pct_lt5                  1
k_map                      6
rare_qi_reproduction_rate  0
delta_presence             5

Model: tvae_quick (synthcity)

Per-variable fidelity

variable         type        KS      W1      JSD
Age              continuous  0.2371  4.4207  -
Sex              discrete    -       -       0.063
Steroid          discrete    -       -       0.229
Antivirals       discrete    -       -       0.1162
Fatigue          discrete    -       -       0.1309
Malaise          discrete    -       -       0.2714
Anorexia         discrete    -       -       0.2442
Liver Big        discrete    -       -       0.1499
Liver Firm       discrete    -       -       0.0604
Spleen Palpable  discrete    -       -       0.2211

Privacy metrics

metric                     value
n_real                     80
n_synth                    155
exact_overlap_rate         0
near_duplicate_rate_eps    0.925
nn_distance_mean           0.1192
k_min                      1
k_pct_lt5                  1
k_map                      1
rare_qi_reproduction_rate  0
delta_presence             6