Data Report — Fertility

Documentation: Season in which the analysis was performed. 1) winter, 2) spring, 3) summer, 4) fall. (-1, -0.33, 0.33, 1)

Age at the time of analysis. 18-36 (0, 1)

Childhood diseases (i.e., chicken pox, measles, mumps, polio) 1) yes, 2) no. (0, 1)

Accident or serious trauma 1) yes, 2) no. (0, 1)

Surgical intervention 1) yes, 2) no. (0, 1)

High fevers in the last year 1) less than three months ago, 2) more than three months ago, 3) no. (-1, 0, 1)

Frequency of alcohol consumption 1) several times a day, 2) every day, 3) several times a week, 4) once a week, 5) hardly ever or never (0, 1)

Smoking habit 1) never, 2) occasional, 3) daily. (-1, 0, 1)

Number of hours spent sitting per day. 1-16 (0, 1)

Output: Diagnosis normal (N), altered (O)
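The coded values are easy to misread when inspecting raw or synthetic rows, so a small decoder can help. This is a sketch under stated assumptions, not part of the dataset's documentation: the codes are assumed to follow the listed category order, alcohol category k is assumed to be stored as 0.2 × k (consistent with the values 0.2 to 1 in the Variables and summary table), and age and sitting hours are assumed to be divided by their maxima, 36 and 16 (consistent with the observed minima 0.5 and 0.06). All names are hypothetical.

```python
from __future__ import annotations

# Hypothetical decoder for the coded values documented above.
# ASSUMPTIONS (not stated in the report): codes follow the listed category
# order, alcohol category k is stored as 0.2 * k, and age / sitting hours
# were divided by their maxima (36 and 16).

SEASON = {-1.0: "winter", -0.33: "spring", 0.33: "summer", 1.0: "fall"}
HIGH_FEVERS = {
    -1.0: "less than three months ago",
    0.0: "more than three months ago",
    1.0: "no",
}
ALCOHOL = {
    0.2: "several times a day",
    0.4: "every day",
    0.6: "several times a week",
    0.8: "once a week",
    1.0: "hardly ever or never",
}
SMOKING = {-1.0: "never", 0.0: "occasional", 1.0: "daily"}


def lookup(table: dict[float, str], code: float, tol: float = 0.05) -> str:
    """Return the label whose code is nearest to `code`, tolerating float noise."""
    nearest = min(table, key=lambda k: abs(k - code))
    if abs(nearest - code) > tol:
        raise ValueError(f"unexpected code {code!r}")
    return table[nearest]


def decode_age(code: float) -> int:
    # Assumed scaling: code = age / 36, so 0.5 -> 18 and 1.0 -> 36.
    return round(code * 36)


def decode_hours_sitting(code: float) -> int:
    # Assumed scaling: code = hours / 16, so 0.06 -> about 1 hour.
    return round(code * 16)
```

For example, `lookup(SEASON, -0.33)` returns `"spring"` and `decode_age(0.5)` returns `18`. Nearest-code matching is used because values such as -0.33 are rounded in the data.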

Citation: Gil, D., Girela, J. L., De Juan, J., Gomez-Torres, M. J., & Johnsson, M. (2012). Predicting seminal quality with artificial intelligence methods. Expert Systems with Applications. https://www.semanticscholar.org/paper/Predicting-seminal-quality-with-artificial-methods-Gil-Girela/92759c5ee08b9e6e7b17d1ccd48a7f8c02aba893

Source: UCI dataset 244

SemMap JSON-LD: dataset.semmap.json · RDFa HTML

Overview

Metric       Value
Dataset      Fertility
Source       UCI dataset 244
Rows         100
Columns      10
Discrete     7
Continuous   3
SemMap       SemMap JSON-LD, SemMap HTML
Missingness  Not modeled

Variables and summary

variable               inferred    distribution
season                 continuous  -0.0789 ± 0.7967 [-1, -1, -0.33, 1, 1]
age                    continuous  0.6690 ± 0.1213 [0.5, 0.56, 0.67, 0.75, 1]
child_diseases         discrete    1: 87 (87.00%)
accident               discrete    1: 44 (44.00%)
surgical_intervention  discrete    1: 51 (51.00%)
high_fevers            discrete    0: 63 (63.00%)
                                   1: 28 (28.00%)
                                   -1: 9 (9.00%)
alcohol                discrete    1: 40 (40.00%)
                                   0.8: 39 (39.00%)
                                   0.6: 19 (19.00%)
                                   0.2: 1 (1.00%)
                                   0.4: 1 (1.00%)
smoking                discrete    -1: 56 (56.00%)
                                   0: 23 (23.00%)
                                   1: 21 (21.00%)
hrs_sitting            continuous  0.4068 ± 0.1864 [0.06, 0.25, 0.38, 0.5, 1]
diagnosis              discrete    N: 88 (88.00%)
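The distribution column appears to use two formats: for continuous variables, mean ± standard deviation followed by what looks like [min, Q1, median, Q3, max]; for discrete variables, value: count (percent). A minimal sketch that produces those formats follows. The use of the sample standard deviation and inclusive (linear-interpolation) quartiles is an assumption, since the report does not state its conventions, and the function names are hypothetical.

```python
import statistics
from collections import Counter


def summarize_continuous(values: list[float]) -> str:
    """Format as 'mean ± sd [min, q1, median, q3, max]'.

    ASSUMPTION: sample standard deviation and inclusive quartiles.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    five = [min(values), q1, median, q3, max(values)]
    # Round to two decimals and drop trailing zeros, e.g. 1.0 -> "1".
    bracket = ", ".join(f"{round(x, 2):g}" for x in five)
    return f"{mean:.4f} ± {sd:.4f} [{bracket}]"


def summarize_discrete(values: list) -> list[str]:
    """Format each category as 'value: count (percent%)', most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [f"{v}: {c} ({100 * c / total:.2f}%)" for v, c in counts.most_common()]
```

With 100 rows, each count equals its percentage, as in the table above.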

Fidelity summary

model       backend    disc JSD mean  disc JSD median  cont KS mean  cont W1 mean  downstream sign match
metasyn     metasyn    0.058          0.0683           0.2033        0.1181        0.25
clg_mi2     pybnesian  0.0647         0.0362           0.1833        0.1201        -
semi_mi5    pybnesian  0.0647         0.0362           0.1833        0.1201        -
ctgan_fast  synthcity  0.3398         0.3911           0.6367        0.2003        -
tvae_quick  synthcity  0.1168         0.1052           0.2933        0.1255        -
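KS and W1 are reported for continuous variables and JSD for discrete ones. Minimal standard-library versions of the three scores are sketched below to make the tables easier to read; they are assumed definitions, not the backends' code. In particular, JSD is computed here with base-2 logarithms (so it lies between 0 and 1), and the report does not say whether it uses that base or the square-root "distance" form returned by `scipy.spatial.distance.jensenshannon`. For KS and W1, `scipy.stats.ks_2samp(...).statistic` and `scipy.stats.wasserstein_distance` compute the same quantities.

```python
import math
from bisect import bisect_right
from collections import Counter


def ecdf(sorted_values: list[float], x: float) -> float:
    """Fraction of sample values <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(real: list[float], synth: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_real(x) - F_synth(x)|."""
    a, b = sorted(real), sorted(synth)
    # Both ECDFs are step functions, so the maximum occurs at a sample point.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


def wasserstein_1(real: list[float], synth: list[float]) -> float:
    """1-D Wasserstein-1 distance: area between the two ECDFs."""
    a, b = sorted(real), sorted(synth)
    grid = sorted(set(a) | set(b))
    # Between consecutive grid points both ECDFs are constant.
    return sum(
        (hi - lo) * abs(ecdf(a, lo) - ecdf(b, lo))
        for lo, hi in zip(grid, grid[1:])
    )


def js_divergence(real: list, synth: list) -> float:
    """Jensen-Shannon divergence (base 2) between category frequencies."""
    p, q = Counter(real), Counter(synth)
    jsd = 0.0
    for cat in set(p) | set(q):
        pi, qi = p[cat] / len(real), q[cat] / len(synth)
        mi = (pi + qi) / 2
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd
```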

Privacy summary

model       backend    n real  n synth  exact overlap rate  near duplicate rate eps  nn distance mean  k min  k pct lt5  k map  rare qi reproduction rate  identifiability score  delta presence
metasyn     metasyn    100     100      0                   0.21                     0.4309            1      1          3      0                          -                      2.6667
clg_mi2     pybnesian  100     100      0                   0.22                     0.4959            1      1          2      0                          -                      2.5
semi_mi5    pybnesian  100     100      0                   0.22                     0.4959            1      1          2      0                          -                      2.5
ctgan_fast  synthcity  100     100      0                   0.12                     0.3655            1      1          1      0                          -                      36
tvae_quick  synthcity  100     100      0                   0.08                     0.3762            1      1          1      0                          -                      7
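The privacy columns are not defined in the report. A minimal sketch of plausible definitions for several of them follows, assuming Euclidean distance on the encoded values: exact overlap as the share of synthetic rows identical to some real row, near duplicates as synthetic rows within an unstated distance eps of a real row, nn distance mean as the mean distance from each synthetic row to its nearest real row, and k min / k pct lt5 as k-anonymity statistics over an assumed set of quasi-identifier columns. The function names, the distance metric, and the quasi-identifier choice are all hypothetical.

```python
import math
import statistics
from collections import Counter

Row = tuple[float, ...]


def nn_distances(real: list[Row], synth: list[Row]) -> list[float]:
    """ASSUMPTION: Euclidean distance from each synthetic row to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synth]


def nn_distance_mean(real: list[Row], synth: list[Row]) -> float:
    return statistics.mean(nn_distances(real, synth))


def exact_overlap_rate(real: list[Row], synth: list[Row]) -> float:
    """Share of synthetic rows that copy some real row exactly."""
    real_set = set(real)
    return sum(s in real_set for s in synth) / len(synth)


def near_duplicate_rate(real: list[Row], synth: list[Row], eps: float) -> float:
    """Share of synthetic rows within distance eps of some real row."""
    return sum(d <= eps for d in nn_distances(real, synth)) / len(synth)


def k_anonymity(real: list[Row], qi_idx: list[int]) -> tuple[int, float]:
    """Return (k_min, share of rows whose quasi-identifier group has < 5 rows).

    ASSUMPTION: `qi_idx` lists the quasi-identifier column positions.
    """
    keys = [tuple(row[i] for i in qi_idx) for row in real]
    sizes = Counter(keys)
    k_min = min(sizes.values())
    pct_lt5 = sum(sizes[k] < 5 for k in keys) / len(keys)
    return k_min, pct_lt5
```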

Models

[Figure: UMAP projection of the real data]

Model: metasyn (metasyn)

Per-variable fidelity
variable               type        KS    W1      JSD
season                 continuous  0.31  0.2876  -
age                    continuous  0.15  0.0261  -
child_diseases         discrete    -     -       0.0245
accident               discrete    -     -       0.0694
surgical_intervention  discrete    -     -       0.0683
high_fevers            discrete    -     -       0.069
alcohol                discrete    -     -       0.0812
smoking                discrete    -     -       0.0523
hrs_sitting            continuous  0.15  0.0405  -
diagnosis              discrete    -     -       0.0416
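The Fidelity summary columns appear to be plain aggregates of these per-variable scores: the mean and median of the seven discrete JSD values, and the means of the three continuous KS and W1 values. Applied to the metasyn table, this arithmetic reproduces the metasyn summary row (0.058, 0.0683, 0.2033, 0.1181), and the same arithmetic matches the other models' rows. The helper name is hypothetical; the input values are copied from the table.

```python
import statistics

# Per-variable scores for metasyn, copied from the table above.
CONTINUOUS = {  # variable: (KS, W1)
    "season": (0.31, 0.2876),
    "age": (0.15, 0.0261),
    "hrs_sitting": (0.15, 0.0405),
}
DISCRETE_JSD = {
    "child_diseases": 0.0245,
    "accident": 0.0694,
    "surgical_intervention": 0.0683,
    "high_fevers": 0.069,
    "alcohol": 0.0812,
    "smoking": 0.0523,
    "diagnosis": 0.0416,
}


def fidelity_summary(continuous: dict, discrete_jsd: dict) -> dict[str, float]:
    """Aggregate per-variable scores into the summary-table columns (4 d.p.)."""
    jsd = list(discrete_jsd.values())
    return {
        "disc_jsd_mean": round(statistics.mean(jsd), 4),
        "disc_jsd_median": round(statistics.median(jsd), 4),
        "cont_ks_mean": round(statistics.mean(ks for ks, _ in continuous.values()), 4),
        "cont_w1_mean": round(statistics.mean(w1 for _, w1 in continuous.values()), 4),
    }
```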
Downstream metrics
metric value
sign_match_rate 0.25
formula diagnosis ~ Q('season') + Q('age') + Q('child_diseases') + Q('accident') + Q('surgical_intervention') + Q('hrs_sitting') + Q('season'):Q('age') + Q('age'):Q('child_diseases') + Q('child_diseases'):Q('accident') + Q('accident'):Q('surgical_intervention') + Q('surgical_intervention'):Q('hrs_sitting')
skipped_reason
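The formula is written in patsy syntax: `Q('name')` quotes a column name and `a:b` denotes an interaction term. Its terms are the six main effects plus one interaction for each adjacent pair in the predictor list, which the hypothetical `build_formula` helper regenerates exactly. The metric `sign_match_rate` is assumed here to be the share of shared coefficients whose sign agrees between a model fit on real data and one fit on synthetic data; the fitting step itself (model family, coding of N/O) is not documented and is left out.

```python
PREDICTORS = ["season", "age", "child_diseases", "accident",
              "surgical_intervention", "hrs_sitting"]


def build_formula(outcome: str, predictors: list[str]) -> str:
    """Main effects plus interactions between adjacent predictors, patsy-quoted."""
    q = [f"Q('{p}')" for p in predictors]
    interactions = [f"{a}:{b}" for a, b in zip(q, q[1:])]
    return f"{outcome} ~ " + " + ".join(q + interactions)


def _sign(x: float) -> int:
    # -1, 0 or +1; an exact zero only matches another exact zero.
    return (x > 0) - (x < 0)


def sign_match_rate(real_coefs: dict[str, float], synth_coefs: dict[str, float]) -> float:
    """ASSUMED definition: share of shared terms whose coefficient signs agree."""
    shared = real_coefs.keys() & synth_coefs.keys()
    if not shared:
        raise ValueError("no common terms")
    same = sum(_sign(real_coefs[t]) == _sign(synth_coefs[t]) for t in shared)
    return same / len(shared)
```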
Privacy metrics
metric value
n_real 100
n_synth 100
exact_overlap_rate 0
near_duplicate_rate_eps 0.21
nn_distance_mean 0.4309
k_min 1
k_pct_lt5 1
k_map 3
rare_qi_reproduction_rate 0
delta_presence 2.6667
Fitted distributions
variable               distribution
season                 core.uniform
age                    core.truncated_normal
child_diseases         core.multinoulli
accident               core.multinoulli
surgical_intervention  core.multinoulli
high_fevers            core.multinoulli
alcohol                core.multinoulli
smoking                core.multinoulli
hrs_sitting            core.normal
diagnosis              core.multinoulli
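metasyn assigns one distribution family per variable. As a rough illustration of what fitting the two simplest families involves (not metasyn's own API), a multinoulli can be estimated from category frequencies and a uniform from the observed minimum and maximum, which are the maximum-likelihood estimates for those families; synthetic values are then drawn from the fitted parameters. The helper names are hypothetical.

```python
import random
from collections import Counter


def fit_multinoulli(values: list) -> dict:
    """Category probabilities estimated as observed frequencies."""
    counts = Counter(values)
    return {cat: c / len(values) for cat, c in counts.items()}


def sample_multinoulli(probs: dict, n: int, rng: random.Random) -> list:
    """Draw n categories independently with the fitted probabilities."""
    cats = list(probs)
    return rng.choices(cats, weights=[probs[c] for c in cats], k=n)


def fit_uniform(values: list[float]) -> tuple[float, float]:
    """Uniform bounds estimated as the observed min and max."""
    return min(values), max(values)


def sample_uniform(bounds: tuple[float, float], n: int, rng: random.Random) -> list[float]:
    """Draw n values uniformly between the fitted bounds."""
    lo, hi = bounds
    return [rng.uniform(lo, hi) for _ in range(n)]
```

For instance, fitting `diagnosis` this way on 88 N and 12 O rows gives probabilities 0.88 and 0.12.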

Model: clg_mi2 (pybnesian)

Per-variable fidelity
variable               type        KS    W1      JSD
season                 continuous  0.27  0.3014  -
age                    continuous  0.15  0.026   -
child_diseases         discrete    -     -       0.0362
accident               discrete    -     -       0
surgical_intervention  discrete    -     -       0.0597
high_fevers            discrete    -     -       0.0286
alcohol                discrete    -     -       0.1875
smoking                discrete    -     -       0.1279
hrs_sitting            continuous  0.13  0.0329  -
diagnosis              discrete    -     -       0.0133
Privacy metrics
metric value
n_real 100
n_synth 100
exact_overlap_rate 0
near_duplicate_rate_eps 0.22
nn_distance_mean 0.4959
k_min 1
k_pct_lt5 1
k_map 2
rare_qi_reproduction_rate 0
delta_presence 2.5

Model: semi_mi5 (pybnesian)

Per-variable fidelity
variable               type        KS    W1      JSD
season                 continuous  0.27  0.3014  -
age                    continuous  0.15  0.026   -
child_diseases         discrete    -     -       0.0362
accident               discrete    -     -       0
surgical_intervention  discrete    -     -       0.0597
high_fevers            discrete    -     -       0.0286
alcohol                discrete    -     -       0.1875
smoking                discrete    -     -       0.1279
hrs_sitting            continuous  0.13  0.0329  -
diagnosis              discrete    -     -       0.0133
Privacy metrics
metric value
n_real 100
n_synth 100
exact_overlap_rate 0
near_duplicate_rate_eps 0.22
nn_distance_mean 0.4959
k_min 1
k_pct_lt5 1
k_map 2
rare_qi_reproduction_rate 0
delta_presence 2.5

Model: ctgan_fast (synthcity)

Per-variable fidelity
variable               type        KS    W1      JSD
season                 continuous  0.12  0.1127  -
age                    continuous  0.93  0.169   -
child_diseases         discrete    -     -       0.1621
accident               discrete    -     -       0.4065
surgical_intervention  discrete    -     -       0.3911
high_fevers            discrete    -     -       0.4932
alcohol                discrete    -     -       0.266
smoking                discrete    -     -       0.4694
hrs_sitting            continuous  0.86  0.3192  -
diagnosis              discrete    -     -       0.1901
Privacy metrics
metric value
n_real 100
n_synth 100
exact_overlap_rate 0
near_duplicate_rate_eps 0.12
nn_distance_mean 0.3655
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 36

Model: tvae_quick (synthcity)

Per-variable fidelity
variable               type        KS    W1      JSD
season                 continuous  0.21  0.2272  -
age                    continuous  0.29  0.0491  -
child_diseases         discrete    -     -       0.1867
accident               discrete    -     -       0.1052
surgical_intervention  discrete    -     -       0.0856
high_fevers            discrete    -     -       0.1405
alcohol                discrete    -     -       0.138
smoking                discrete    -     -       0.072
hrs_sitting            continuous  0.38  0.1002  -
diagnosis              discrete    -     -       0.0898
Privacy metrics
metric value
n_real 100
n_synth 100
exact_overlap_rate 0
near_duplicate_rate_eps 0.08
nn_distance_mean 0.3762
k_min 1
k_pct_lt5 1
k_map 1
rare_qi_reproduction_rate 0
delta_presence 7