semsynth.semmap

Functions

get_column_name(entry, *[, extra_keys])

Return the first non-empty column name found in a JSON-LD column entry.

modeling_role(node, *[, default])

Return a simplified role label for modeling contexts.

normalize_role(raw)

Normalize a raw role string into a canonical privacy role label.

normalize_variable_descriptors(variables)

Normalize a collection of raw variable dictionaries into descriptors.

raw_role(node)

Extract a role value from a Column or JSON-LD mapping without normalization.

Classes

CodeBook([hasTopConcept, source])

CodeConcept([notation, prefLabel, ...])

Column(name[, titles, description, ...])

ColumnProperty([summaryStatistics, ...])

DatasetSchema(columns)

Metadata(datasetSchema[, summaryStatistics, ...])

PintType([units, subdtype])

A Pint duck-typed class, suitable for holding a quantity (with unit specified) dtype.

RDFMixin()

Provide JSON-LD serialization helpers for dataclasses.

SemMapFrameAccessor(df)

DataFrame-level accessor for dataset metadata and Parquet round-trip.

SemMapSeriesAccessor(s)

Series-level accessor to attach metadata semantics.

SkosMappings(*[, exactMatch, closeMatch, ...])

StatisticalDataType(value)

SummaryStatistics([statisticalDataType, ...])

Unit(*[, exactMatch, closeMatch, ...])

VariableDescriptor(name[, description, ...])

Lightweight container for harmonized column metadata.

class semsynth.semmap.CodeBook(hasTopConcept: 'Optional[List[CodeConcept]]' = None, source: 'Optional[str]' = None)

Bases: RDFMixin

hasTopConcept: List[CodeConcept] | None = None
source: str | None = None
class semsynth.semmap.CodeConcept(notation: 'Optional[str]' = None, prefLabel: 'Optional[str]' = None, *, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)

Bases: SkosMappings

notation: str | None = None
prefLabel: str | None = None
class semsynth.semmap.Column(name: 'str', titles: 'str | list[str] | None' = None, description: 'Optional[str]' = None, identifier: 'Optional[str]' = None, about: 'Optional[str]' = None, hadRole: 'Optional[str]' = None, defaultValue: 'Optional[Any]' = None, columnProperty: 'Optional[ColumnProperty]' = None, summaryStatistics: 'Optional[SummaryStatistics]' = None)

Bases: RDFMixin

about: str | None = None
columnProperty: ColumnProperty | None = None
defaultValue: Any | None = None
description: str | None = None
hadRole: str | None = None
identifier: str | None = None
name: str
summaryStatistics: SummaryStatistics | None = None
titles: str | list[str] | None = None
class semsynth.semmap.ColumnProperty(summaryStatistics: 'Optional[SummaryStatistics]' = None, unitText: 'Optional[str]' = None, hasUnit: 'Optional[Unit]' = None, source: 'Optional[str]' = None, hasCodeBook: 'Optional[CodeBook]' = None, hasVariable: 'str | CodeConcept | None' = None, *, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)

Bases: SkosMappings, RDFMixin

hasCodeBook: CodeBook | None = None
hasUnit: Unit | None = None
hasVariable: str | CodeConcept | None = None
source: str | None = None
summaryStatistics: SummaryStatistics | None = None
unitText: str | None = None
class semsynth.semmap.DatasetSchema(columns: 'List[Column]')

Bases: RDFMixin

columns: List[Column]
class semsynth.semmap.Metadata(datasetSchema: 'DatasetSchema', summaryStatistics: 'Optional[SummaryStatistics]' = None, title: 'Optional[str]' = None, description: 'Optional[str]' = None, abstract: 'Optional[str]' = None, purpose: 'Optional[str]' = None, landingPage: 'Optional[str]' = None, tableOfContents: 'Optional[str]' = None, citation: 'Optional[Any]' = None, provider: 'Optional[str]' = None, identifier: 'Optional[Any]' = None, funding: 'Optional[Any]' = None, populationType: 'Optional[Any]' = None, accessRights: 'Optional[Any]' = None)

Bases: RDFMixin

abstract: str | None = None
accessRights: Any | None = None
citation: Any | None = None
datasetSchema: DatasetSchema
description: str | None = None
classmethod from_dcat_dsv(payload: Mapping[str, Any]) → Metadata

Build a Metadata instance from a DCAT/DSV JSON-LD payload.

Parameters:

payload – Raw JSON-LD mapping created by semsynth.dataproviders.uciml.

Returns:

Parsed metadata with dataset- and column-level attributes populated.

funding: Any | None = None
identifier: Any | None = None
landingPage: str | None = None
populationType: Any | None = None
provider: str | None = None
purpose: str | None = None
summaryStatistics: SummaryStatistics | None = None
tableOfContents: str | None = None
title: str | None = None
to_jsonld(with_context: bool = False) → Dict[str, Any] | None

Serialize the object to a JSON-LD-compatible mapping.

Parameters:
  • with_context (bool) – Whether to include the @context section.

  • include_extra (bool) – Whether to emit unknown fields captured during deserialization.

Returns:

JSON-LD representation of the object.

Return type:

dict

Examples

metadata = Metadata(datasetSchema=DatasetSchema(columns=[Column(name="age")]))
payload = metadata.to_jsonld()
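RDFMixin's serialization can be pictured as a recursive walk over dataclass fields. The sketch below is a hypothetical stand-in, not the library's code: it assumes fields set to None are omitted, enum members are emitted by value, and with_context=True adds an @context taken from a made-up HYPOTHETICAL_CONTEXT mapping. Column and DatasetSchema are reduced to minimal stand-ins.

```python
from dataclasses import dataclass, fields, is_dataclass
from enum import Enum
from typing import Any, Dict, List, Optional

# Hypothetical context; the prefixes SemMap actually emits may differ.
HYPOTHETICAL_CONTEXT = {"dsv": "http://example.org/dsv#"}


def _to_jsonld_value(value: Any) -> Any:
    """Recursively convert nested dataclasses, enums and lists."""
    if is_dataclass(value) and not isinstance(value, type):
        return to_jsonld_sketch(value)
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, list):
        return [_to_jsonld_value(v) for v in value]
    return value


def to_jsonld_sketch(obj: Any, with_context: bool = False) -> Dict[str, Any]:
    """Serialize a dataclass instance, skipping fields that are None."""
    out: Dict[str, Any] = {}
    if with_context:
        out["@context"] = dict(HYPOTHETICAL_CONTEXT)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is not None:
            out[f.name] = _to_jsonld_value(value)
    return out


# Minimal stand-ins using field names from this reference.
@dataclass
class Column:
    name: str
    description: Optional[str] = None


@dataclass
class DatasetSchema:
    columns: List[Column]


schema = DatasetSchema(columns=[Column(name="age", description="Age in years")])
payload = to_jsonld_sketch(schema)
# payload == {"columns": [{"name": "age", "description": "Age in years"}]}
```

Only the top-level call receives @context here; nested nodes are emitted without one, which is the usual JSON-LD convention.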
to_privacy_frame(inferred: Mapping[str, str]) → DataFrame

Build a privacy metadata dataframe from SemMap content.

The privacy metrics expect roles such as qi (quasi-identifier) and sensitive, plus a coarse type for each variable (numeric, categorical or datetime). This helper normalizes SemMap roles to those labels and infers each variable's type from its statisticalDataType or codebook, falling back to the provided inferred mapping when semantics are absent.

Parameters:

inferred – Mapping of column names to inferred types (discrete or continuous), used as a fallback when semantics are missing.

Returns:

Dataframe with variable, role and type columns.
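The type-resolution order described above (SemMap semantics first, then the inferred fallback) can be sketched as follows. This is a hypothetical illustration: the table mapping dsv data types to coarse labels, treating any codebook as categorical, the discrete/continuous fallback, the default role "qi", and returning row dicts instead of a DataFrame are all assumptions. Datetime detection is omitted.

```python
from typing import Any, Dict, List, Mapping, Optional

# Assumed mapping from dsv statistical data types to coarse privacy types.
STAT_TYPE_TO_COARSE = {
    "dsv:NominalDataType": "categorical",
    "dsv:OrdinalDataType": "categorical",
    "dsv:IntervalDataType": "numeric",
    "dsv:RatioDataType": "numeric",
    "dsv:NumericalDataType": "numeric",
}

# Assumed translation of the inferred discrete/continuous labels.
INFERRED_TO_COARSE = {"discrete": "categorical", "continuous": "numeric"}


def coarse_type(stat_type: Optional[str], has_codebook: bool, inferred: Optional[str]) -> str:
    """Prefer SemMap semantics, then the inferred label, then 'categorical'."""
    if stat_type in STAT_TYPE_TO_COARSE:
        return STAT_TYPE_TO_COARSE[stat_type]
    if has_codebook:
        return "categorical"
    return INFERRED_TO_COARSE.get(inferred or "", "categorical")


def privacy_rows(columns: List[Mapping[str, Any]], inferred: Mapping[str, str]) -> List[Dict[str, str]]:
    """Build variable/role/type rows for JSON-LD-style column mappings."""
    rows: List[Dict[str, str]] = []
    for col in columns:
        name = col["name"]
        rows.append({
            "variable": name,
            "role": col.get("role") or "qi",  # assumed default role
            "type": coarse_type(
                col.get("statisticalDataType"),
                col.get("hasCodeBook") is not None,
                inferred.get(name),
            ),
        })
    return rows


cols = [
    {"name": "age", "statisticalDataType": "dsv:RatioDataType"},
    {"name": "zip", "role": "qi", "hasCodeBook": {"source": "ex:zips"}},
    {"name": "diagnosis", "role": "sensitive"},
]
rows = privacy_rows(cols, {"diagnosis": "discrete"})
```

The resulting rows can be passed to pandas.DataFrame(rows) to obtain a frame with variable, role and type columns.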

update_completeness_from_missingness(df: DataFrame, missingness_model: Any | None) → None

Refresh completeness and missing-value annotations based on fitted models.

Parameters:
  • df – Dataframe used to compute dataset completeness.

  • missingness_model – Optional DataFrameMissingnessModel instance containing per-column missingness probabilities.
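A minimal sketch of the completeness arithmetic. It assumes column completeness is the fraction of non-missing values and dataset completeness is the fraction of non-missing cells across all columns. The library may define dataset completeness differently (for example, as the share of fully complete rows) and may use the missingness model's probabilities rather than observed counts. Missing values are represented as None.

```python
from typing import Dict, Optional, Sequence, Tuple


def completeness(columns: Dict[str, Sequence[Optional[object]]]) -> Tuple[Dict[str, float], float]:
    """Return per-column completeness and overall cell-level completeness."""
    per_column: Dict[str, float] = {}
    present_total = 0
    cells_total = 0
    for name, values in columns.items():
        present = sum(v is not None for v in values)
        # An empty column is treated as complete (assumption).
        per_column[name] = present / len(values) if values else 1.0
        present_total += present
        cells_total += len(values)
    dataset = present_total / cells_total if cells_total else 1.0
    return per_column, dataset


per_col, overall = completeness({
    "age": [34, None, 51, 29],
    "zip": ["1011", "1012", None, None],
})
# per_col == {"age": 0.75, "zip": 0.5}; overall == 0.625 (5 of 8 cells present)
```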

class semsynth.semmap.SemMapFrameAccessor(df: DataFrame)

Bases: object

DataFrame-level accessor for dataset metadata and Parquet round-trip.

from_jsonld(metadata: str | dict[str, Any], *, with_context: bool = False, convert_pint: bool = True) → SemMapFrameAccessor

Attach dataset metadata and column schema from a JSON-LD string or mapping.

static read_parquet(path: str, *, convert_pint: bool = True, **pq_kwargs) → DataFrame

Read a Parquet file and restore SemMap semantics and pint units.

to_jsonld(with_context: bool = False) → Dict[str, Any] | None
to_parquet(path: str, *, with_context: bool = False, index: bool = False, **pq_kwargs) → None

Write a Parquet file with semantics stored in the Arrow schema-level and field-level metadata.

class semsynth.semmap.SemMapSeriesAccessor(s: Series)

Bases: object

Series-level accessor to attach metadata semantics.

from_jsonld(metadata: Dict[str, Any], convert_pint: bool = True) → SemMapSeriesAccessor

Attach column semantics from a JSON-LD payload.

Parameters:
  • metadata – Column-level JSON-LD mapping.

  • convert_pint – Whether to attempt pint dtype conversion.

Returns:

Accessor instance for chaining.

set_categorical(name: str, label: str, *, codes: dict[int | str, str], scheme_source_iri: str | None = None, source_iri: str | None = None) → SemMapSeriesAccessor

Attach categorical variable metadata for integer-coded or string-valued categories.
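The codes argument maps each stored code to a human-readable label. One plausible way to turn it into the CodeBook/CodeConcept structure documented in this reference is sketched here with minimal stand-in dataclasses. Storing each code's notation as str(code) and using scheme_source_iri as the codebook's source are assumptions, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class CodeConcept:  # stand-in with a subset of the documented fields
    notation: Optional[str] = None
    prefLabel: Optional[str] = None


@dataclass
class CodeBook:  # stand-in mirroring the documented fields
    hasTopConcept: Optional[List[CodeConcept]] = None
    source: Optional[str] = None


def codebook_from_codes(
    codes: Dict[Union[int, str], str],
    scheme_source_iri: Optional[str] = None,
) -> CodeBook:
    """Build one CodeConcept per code, keeping the dict's insertion order."""
    concepts = [
        CodeConcept(notation=str(code), prefLabel=label)
        for code, label in codes.items()
    ]
    return CodeBook(hasTopConcept=concepts, source=scheme_source_iri)


book = codebook_from_codes({0: "never", 1: "former", 2: "current"})
```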

set_numeric(name: str, label: str, *, unit_text: str | None = None, ucum_code: str | None = None, qudt_unit_iri: str | None = None, source_iri: str | None = None, convert_to_pint: bool = True) → SemMapSeriesAccessor

Attach numeric variable metadata and (optionally) convert dtype to pint.
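The keyword arguments of set_numeric line up with fields documented for ColumnProperty (unitText, source) and Unit (ucumCode and the SKOS exactMatch list). The sketch below shows one plausible mapping as a plain JSON-LD-style dict. Placing qudt_unit_iri under the unit's exactMatch and source_iri under source are assumptions, and the optional pint dtype conversion is omitted.

```python
from typing import Any, Dict, Optional


def numeric_column_property(
    unit_text: Optional[str] = None,
    ucum_code: Optional[str] = None,
    qudt_unit_iri: Optional[str] = None,
    source_iri: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a ColumnProperty-like mapping, omitting unset parts."""
    prop: Dict[str, Any] = {}
    if unit_text is not None:
        prop["unitText"] = unit_text
    unit: Dict[str, Any] = {}
    if ucum_code is not None:
        unit["ucumCode"] = ucum_code
    if qudt_unit_iri is not None:
        unit["exactMatch"] = [qudt_unit_iri]  # assumed placement
    if unit:
        prop["hasUnit"] = unit
    if source_iri is not None:
        prop["source"] = source_iri
    return prop


prop = numeric_column_property(unit_text="kilogram", ucum_code="kg")
# prop == {"unitText": "kilogram", "hasUnit": {"ucumCode": "kg"}}
```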

to_jsonld(with_context: bool = False) → Dict[str, Any] | None
class semsynth.semmap.SkosMappings(*, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)

Bases: RDFMixin

broadMatch: List[str] | None = None
closeMatch: List[str] | None = None
exactMatch: List[str] | None = None
narrowMatch: List[str] | None = None
relatedMatch: List[str] | None = None
class semsynth.semmap.StatisticalDataType(value)

Bases: str, Enum

Interval = 'dsv:IntervalDataType'
Nominal = 'dsv:NominalDataType'
Numerical = 'dsv:NumericalDataType'
Ordinal = 'dsv:OrdinalDataType'
Ratio = 'dsv:RatioDataType'
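Because StatisticalDataType subclasses both str and Enum, members compare equal to their dsv CURIE strings and can be looked up by value from raw JSON-LD. The stand-in below copies the members listed for this class to demonstrate this; with the real class you would import it from semsynth.semmap instead.

```python
from enum import Enum


class StatisticalDataType(str, Enum):
    Interval = "dsv:IntervalDataType"
    Nominal = "dsv:NominalDataType"
    Numerical = "dsv:NumericalDataType"
    Ordinal = "dsv:OrdinalDataType"
    Ratio = "dsv:RatioDataType"


# Look up a member from a raw JSON-LD value.
kind = StatisticalDataType("dsv:OrdinalDataType")
assert kind is StatisticalDataType.Ordinal
# The str mixin lets members stand in wherever a plain string is expected.
assert kind == "dsv:OrdinalDataType"
```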
class semsynth.semmap.SummaryStatistics(statisticalDataType: 'Optional[StatisticalDataType]' = None, columnCompleteness: 'Optional[float]' = None, datasetCompleteness: 'Optional[float]' = None, numberOfRows: 'Optional[int]' = None, numberOfColumns: 'Optional[int]' = None, missingValueFormat: 'Optional[str]' = None, meanValue: 'Optional[float]' = None, medianValue: 'Optional[float]' = None, minimum: 'Optional[float]' = None, maximum: 'Optional[float]' = None)

Bases: RDFMixin

columnCompleteness: float | None = None
datasetCompleteness: float | None = None
maximum: float | None = None
meanValue: float | None = None
medianValue: float | None = None
minimum: float | None = None
missingValueFormat: str | None = None
numberOfColumns: int | None = None
numberOfRows: int | None = None
statisticalDataType: StatisticalDataType | None = None
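A sketch of how several of these fields could be filled from raw column values with the standard library. The stand-in dataclass mirrors a subset of the attributes listed above; treating None as missing and computing columnCompleteness as the non-missing fraction are assumptions.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SummaryStatistics:  # stand-in with a subset of the documented fields
    columnCompleteness: Optional[float] = None
    numberOfRows: Optional[int] = None
    meanValue: Optional[float] = None
    medianValue: Optional[float] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def summarize(values: List[Optional[float]]) -> SummaryStatistics:
    """Compute numeric summaries over the non-missing values."""
    present = [v for v in values if v is not None]
    stats = SummaryStatistics(numberOfRows=len(values))
    if values:
        stats.columnCompleteness = len(present) / len(values)
    if present:
        stats.meanValue = statistics.fmean(present)
        stats.medianValue = float(statistics.median(present))
        stats.minimum = float(min(present))
        stats.maximum = float(max(present))
    return stats


s = summarize([2.0, None, 4.0, 9.0])
# completeness 0.75, mean 5.0, median 4.0, minimum 2.0, maximum 9.0
```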
class semsynth.semmap.Unit(*, exactMatch: List[str] | None = None, closeMatch: List[str] | None = None, broadMatch: List[str] | None = None, narrowMatch: List[str] | None = None, relatedMatch: List[str] | None = None)

Bases: SkosMappings

ucumCode: str | None = None
class semsynth.semmap.VariableDescriptor(name: str, description: str | None = None, role: str | None = None, unit: str | None = None)

Bases: object

Lightweight container for harmonized column metadata.

as_dict() → Dict[str, str | None]

Convert the descriptor to a plain mapping.

description: str | None = None
name: str
role: str | None = None
unit: str | None = None
semsynth.semmap.get_column_name(entry: Mapping[str, Any], *, extra_keys: Sequence[str] = ()) → str | None

Return the first non-empty column name found in a JSON-LD column entry.
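A sketch of the lookup, assuming candidate keys are tried in a fixed order (name, then titles, then any extra_keys) and that a list-valued titles entry contributes its first non-empty string. The DEFAULT_KEYS order is a hypothetical choice; the real function may check other keys.

```python
from typing import Any, Mapping, Optional, Sequence

DEFAULT_KEYS = ("name", "titles")  # assumed order


def _first_text(value: Any) -> Optional[str]:
    """Return the first non-empty string in a string or list of strings."""
    if isinstance(value, str):
        return value.strip() or None
    if isinstance(value, (list, tuple)):
        for item in value:
            text = _first_text(item)
            if text:
                return text
    return None


def get_column_name_sketch(
    entry: Mapping[str, Any], *, extra_keys: Sequence[str] = ()
) -> Optional[str]:
    for key in (*DEFAULT_KEYS, *extra_keys):
        text = _first_text(entry.get(key))
        if text:
            return text
    return None


name = get_column_name_sketch({"name": "", "titles": ["", "Age"]})
# name == "Age"
```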

semsynth.semmap.modeling_role(node: Mapping[str, Any] | Column | None, *, default: str = 'predictor') → str

Return a simplified role label for modeling contexts.
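modeling_role collapses richer roles into a small vocabulary for model fitting and falls back to default ("predictor"). The hypothetical sketch below handles only JSON-LD mappings with a hadRole key; the TARGET_ROLES and IGNORED_ROLES sets and the "target" and "ignore" labels are assumptions.

```python
from typing import Any, Mapping, Optional

TARGET_ROLES = {"target", "outcome", "label"}  # assumption
IGNORED_ROLES = {"identifier", "id"}           # assumption


def modeling_role_sketch(
    node: Optional[Mapping[str, Any]], *, default: str = "predictor"
) -> str:
    raw = (node or {}).get("hadRole")
    if not isinstance(raw, str) or not raw.strip():
        return default
    role = raw.strip().lower()
    if role in TARGET_ROLES:
        return "target"
    if role in IGNORED_ROLES:
        return "ignore"
    return default


role = modeling_role_sketch({"hadRole": "Outcome"})
# role == "target"
```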

semsynth.semmap.normalize_role(raw: str | None) → str

Normalize a raw role string into a canonical privacy role label.
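to_privacy_frame expects canonical roles such as qi and sensitive. A hypothetical normalizer is sketched below; the ROLE_SYNONYMS table, stripping a namespace prefix, and the fallback label for missing or unknown roles ("qi") are all assumptions.

```python
from typing import Optional

# Assumed synonym table; the library's canonical labels may differ.
ROLE_SYNONYMS = {
    "qi": "qi",
    "quasi-identifier": "qi",
    "quasi_identifier": "qi",
    "sensitive": "sensitive",
    "sensitive attribute": "sensitive",
    "identifier": "identifier",
    "id": "identifier",
}
FALLBACK_ROLE = "qi"  # assumption


def normalize_role_sketch(raw: Optional[str]) -> str:
    if raw is None:
        return FALLBACK_ROLE
    key = raw.strip().lower()
    # Drop a namespace prefix such as "ex:" before the lookup (assumption).
    key = key.split(":", 1)[-1]
    return ROLE_SYNONYMS.get(key, FALLBACK_ROLE)


label = normalize_role_sketch("ex:Sensitive")
# label == "sensitive"
```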

semsynth.semmap.normalize_variable_descriptors(variables: Iterable[Mapping[str, Any]]) → List[VariableDescriptor]

Normalize a collection of raw variable dictionaries into descriptors.
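A hypothetical sketch of the normalization, using a stand-in VariableDescriptor with the documented fields and as_dict. It assumes entries without a usable name are skipped, string fields are whitespace-stripped with empty strings becoming None, and values are read from name, description, role and unit keys; the real function may accept other key spellings.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, List, Mapping, Optional


@dataclass
class VariableDescriptor:  # stand-in mirroring the documented fields
    name: str
    description: Optional[str] = None
    role: Optional[str] = None
    unit: Optional[str] = None

    def as_dict(self) -> Dict[str, Optional[str]]:
        return asdict(self)


def _clean(value: Any) -> Optional[str]:
    """Strip strings and turn empty values into None."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def normalize_descriptors(variables: Iterable[Mapping[str, Any]]) -> List[VariableDescriptor]:
    out: List[VariableDescriptor] = []
    for raw in variables:
        name = _clean(raw.get("name"))
        if name is None:
            continue  # assumption: unnamed entries are dropped
        out.append(VariableDescriptor(
            name=name,
            description=_clean(raw.get("description")),
            role=_clean(raw.get("role")),
            unit=_clean(raw.get("unit")),
        ))
    return out


descs = normalize_descriptors([{"name": " age ", "unit": "year"}, {"description": "no name"}])
# [d.as_dict() for d in descs] == [{"name": "age", "description": None, "role": None, "unit": "year"}]
```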

semsynth.semmap.raw_role(node: Mapping[str, Any] | Column | None) → str | None

Extract a role value from a Column or JSON-LD mapping without normalization.
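raw_role returns the role exactly as stored. The sketch below assumes the value lives in the hadRole attribute of a Column (as documented for that class) and under a hadRole or prov:hadRole key in JSON-LD mappings; the prefixed key and unwrapping {"@id": ...} node references are assumptions. Column is a minimal stand-in.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Union


@dataclass
class Column:  # minimal stand-in for semsynth.semmap.Column
    name: str
    hadRole: Optional[str] = None


def raw_role_sketch(node: Union[Mapping[str, Any], Column, None]) -> Optional[str]:
    if node is None:
        return None
    if isinstance(node, Column):
        return node.hadRole
    for key in ("hadRole", "prov:hadRole"):  # assumed key spellings
        value = node.get(key)
        if isinstance(value, Mapping):
            value = value.get("@id")  # unwrap a JSON-LD node reference
        if value is not None:
            return str(value)
    return None


role = raw_role_sketch({"prov:hadRole": {"@id": "ex:Sensitive"}})
# role == "ex:Sensitive" (no normalization applied)
```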