semsynth.semmap

Functions

`dataclass`([cls, init, repr, eq, order, ...])	Add dunder methods based on the fields defined in the class.
`get_column_name`(entry, *[, extra_keys])	Return the first non-empty column name found in a JSON-LD column entry.
`modeling_role`(node, *[, default])	Return a simplified role label for modeling contexts.
`normalize_role`(raw)	Normalize a raw role string into a canonical privacy role label.
`normalize_variable_descriptors`(variables)	Normalize a collection of raw variable dictionaries into descriptors.
`raw_role`(node)	Extract a role value from a Column or JSON-LD mapping without normalization.

Classes

`Any`(*args, **kwargs)	Special type indicating an unconstrained type.
`CodeBook`([hasTopConcept, source])
`CodeConcept`([notation, prefLabel, ...])
`Column`(name[, titles, description, ...])
`ColumnProperty`([summaryStatistics, ...])
`DatasetSchema`(columns)
`Enum`(value)	Create a collection of name/value pairs.
`Metadata`(datasetSchema[, summaryStatistics, ...])
`PintType`([units, subdtype])	A Pint duck-typed class, suitable for holding a quantity (with unit specified) dtype.
`RDFMixin`()	Provide JSON-LD serialization helpers for dataclasses.
`SemMapFrameAccessor`(df)	DataFrame-level accessor for dataset metadata and Parquet round-trip.
`SemMapSeriesAccessor`(s)	Series-level accessor to attach metadata semantics.
`SkosMappings`(*[, exactMatch, closeMatch, ...])
`StatisticalDataType`(value)
`SummaryStatistics`([statisticalDataType, ...])
`Unit`(*[, exactMatch, closeMatch, ...])
`VariableDescriptor`(name[, description, ...])	Lightweight container for harmonized column metadata.

class semsynth.semmap.CodeBook(hasTopConcept: 'Optional[List[CodeConcept]]' = None, source: 'Optional[str]' = None)

Bases: RDFMixin

hasTopConcept: List[CodeConcept] | None = None

source: str | None = None

class semsynth.semmap.CodeConcept(notation: 'Optional[str]' = None, prefLabel: 'Optional[str]' = None, *, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)

Bases: SkosMappings

notation: str | None = None

prefLabel: str | None = None

class semsynth.semmap.Column(name: 'str', titles: 'str | list[str] | None' = None, description: 'Optional[str]' = None, identifier: 'Optional[str]' = None, about: 'Optional[str]' = None, hadRole: 'Optional[str]' = None, defaultValue: 'Optional[Any]' = None, columnProperty: 'Optional[ColumnProperty]' = None, summaryStatistics: 'Optional[SummaryStatistics]' = None)

Bases: RDFMixin

about: str | None = None

columnProperty: ColumnProperty | None = None

defaultValue: Any | None = None

description: str | None = None

hadRole: str | None = None

identifier: str | None = None

name: str

summaryStatistics: SummaryStatistics | None = None

titles: str | list[str] | None = None

class semsynth.semmap.ColumnProperty(summaryStatistics: 'Optional[SummaryStatistics]' = None, unitText: 'Optional[str]' = None, hasUnit: 'Optional[Unit]' = None, source: 'Optional[str]' = None, hasCodeBook: 'Optional[CodeBook]' = None, hasVariable: 'str | CodeConcept | None' = None, *, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)

Bases: SkosMappings, RDFMixin

hasCodeBook: CodeBook | None = None

hasUnit: Unit | None = None

hasVariable: str | CodeConcept | None = None

source: str | None = None

summaryStatistics: SummaryStatistics | None = None

unitText: str | None = None

class semsynth.semmap.DatasetSchema(columns: 'List[Column]')

Bases: RDFMixin

columns: List[Column]
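The classes above nest: a DatasetSchema holds Column objects, and a Column may carry a ColumnProperty whose CodeBook lists CodeConcept entries. The sketch below mirrors a subset of those fields with stand-in dataclasses to show how a coded column might be assembled. It does not import semsynth, it omits RDFMixin and the SKOS match fields, and the column names, codes and labels are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


# Stand-ins that copy a subset of the documented fields; not the real classes.
@dataclass
class CodeConcept:
    notation: Optional[str] = None
    prefLabel: Optional[str] = None


@dataclass
class CodeBook:
    hasTopConcept: Optional[List[CodeConcept]] = None
    source: Optional[str] = None


@dataclass
class ColumnProperty:
    unitText: Optional[str] = None
    hasCodeBook: Optional[CodeBook] = None


@dataclass
class Column:
    name: str
    description: Optional[str] = None
    hadRole: Optional[str] = None
    columnProperty: Optional[ColumnProperty] = None


@dataclass
class DatasetSchema:
    columns: List[Column]


# Hypothetical example data: an integer-coded "sex" column.
sex_codes = CodeBook(
    hasTopConcept=[
        CodeConcept(notation="0", prefLabel="female"),
        CodeConcept(notation="1", prefLabel="male"),
    ]
)
schema = DatasetSchema(
    columns=[
        Column(name="age", description="Age in years"),
        Column(name="sex", columnProperty=ColumnProperty(hasCodeBook=sex_codes)),
    ]
)

# Walk the nested structure to build a code -> label lookup.
labels = {
    concept.notation: concept.prefLabel
    for concept in schema.columns[1].columnProperty.hasCodeBook.hasTopConcept
}
print(labels["1"])
```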

class semsynth.semmap.Metadata(datasetSchema: 'DatasetSchema', summaryStatistics: 'Optional[SummaryStatistics]' = None, title: 'Optional[str]' = None, description: 'Optional[str]' = None, abstract: 'Optional[str]' = None, purpose: 'Optional[str]' = None, landingPage: 'Optional[str]' = None, tableOfContents: 'Optional[str]' = None, citation: 'Optional[Any]' = None, provider: 'Optional[str]' = None, identifier: 'Optional[Any]' = None, funding: 'Optional[Any]' = None, populationType: 'Optional[Any]' = None, accessRights: 'Optional[Any]' = None)

Bases: RDFMixin

abstract: str | None = None

accessRights: Any | None = None

citation: Any | None = None

datasetSchema: DatasetSchema

description: str | None = None

classmethod from_dcat_dsv(payload: Mapping[str, Any]) → Metadata

Build a Metadata instance from a DCAT/DSV JSON-LD payload.

Parameters:

payload – Raw JSON-LD mapping created by semsynth.dataproviders.uciml.

Returns:

Parsed metadata with dataset- and column-level attributes populated.

funding: Any | None = None

identifier: Any | None = None

landingPage: str | None = None

populationType: Any | None = None

provider: str | None = None

purpose: str | None = None

summaryStatistics: SummaryStatistics | None = None

tableOfContents: str | None = None

title: str | None = None

to_jsonld(with_context: bool = False) → Dict[str, Any] | None

Serialize the object to a JSON-LD-compatible mapping.

Parameters:

with_context (bool) – Whether to include the @context section.
include_extra (bool) – Whether to emit unknown fields captured during deserialization.

Returns:

JSON-LD representation of the object.

Return type:

dict

Examples

person = Person(id="ex:alice", name="Alice")
payload = person.to_jsonld()
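The Person class in the example is not defined in this reference. A minimal, self-contained sketch of the same call pattern follows. JsonLdSketchMixin is a hypothetical stand-in for RDFMixin, the choice to skip fields that are None is an assumption, and EXAMPLE_CONTEXT is a placeholder rather than the context semsynth actually emits.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional

# Placeholder context for illustration only; the real @context is not shown here.
EXAMPLE_CONTEXT = {"schema": "https://schema.org/"}


class JsonLdSketchMixin:
    """Hypothetical stand-in for RDFMixin (assumed behavior, not the real code)."""

    def to_jsonld(self, with_context: bool = False) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        if with_context:
            out["@context"] = EXAMPLE_CONTEXT
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:  # assumption: unset optional fields are skipped
                out[f.name] = value
        return out


@dataclass
class Person(JsonLdSketchMixin):
    id: str
    name: Optional[str] = None


person = Person(id="ex:alice", name="Alice")
payload = person.to_jsonld()
payload_with_context = person.to_jsonld(with_context=True)
```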

to_privacy_frame(inferred: Mapping[str, str]) → DataFrame

Build a privacy metadata dataframe from SemMap content.

The privacy metrics expect roles such as qi and sensitive and a coarse type mapping (numeric/categorical/datetime). This helper normalizes SemMap roles to those expectations and uses statisticalDataType or codebooks to infer the variable types, falling back to the provided inferred mapping when semantics are absent.

Parameters:

inferred – Mapping of column names to inferred types (discrete or continuous) used as fallback when semantics are missing.

Returns:

Dataframe with variable, role and type columns.
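The fallback order described for to_privacy_frame can be sketched without semsynth or pandas. Everything below is an assumption made for illustration: the grouping of statisticalDataType values into numeric and categorical, the translation of discrete/continuous into coarse types, and the example columns. The real method returns a DataFrame; this sketch builds the equivalent rows as plain dictionaries.

```python
from typing import Dict, List, Mapping, Optional

# Assumed lookup tables, for illustration only.
NUMERIC = {"dsv:IntervalDataType", "dsv:NumericalDataType", "dsv:RatioDataType"}
CATEGORICAL = {"dsv:NominalDataType", "dsv:OrdinalDataType"}
INFERRED_TO_COARSE = {"continuous": "numeric", "discrete": "categorical"}


def coarse_type(
    stat_type: Optional[str], has_codebook: bool, inferred: Optional[str]
) -> Optional[str]:
    """Prefer declared semantics; fall back to the inferred type."""
    if stat_type in NUMERIC:
        return "numeric"
    if stat_type in CATEGORICAL or has_codebook:
        return "categorical"
    return INFERRED_TO_COARSE.get(inferred) if inferred else None


# Hypothetical columns: (name, role, statisticalDataType, has codebook).
columns = [
    ("age", "qi", "dsv:RatioDataType", False),
    ("diagnosis", "sensitive", None, True),
    ("income", "qi", None, False),  # no semantics, so the fallback applies
]
inferred: Mapping[str, str] = {"age": "continuous", "income": "continuous"}

rows: List[Dict[str, Optional[str]]] = [
    {
        "variable": name,
        "role": role,
        "type": coarse_type(stat_type, has_codebook, inferred.get(name)),
    }
    for name, role, stat_type, has_codebook in columns
]
```

Passing rows to pandas.DataFrame would give a frame with the variable, role and type columns named in the Returns entry.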

update_completeness_from_missingness(df: DataFrame, missingness_model: Any | None) → None

Refresh completeness and missing-value annotations based on fitted models.

Parameters:

df – Dataframe used to compute dataset completeness.
missingness_model – Optional DataFrameMissingnessModel instance containing per-column missingness probabilities.
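This reference does not define how completeness is computed. One plausible reading, sketched below, treats completeness as the share of non-missing values and, when a per-column missingness probability is available, uses one minus that probability instead. The function name, the table layout and the numbers are hypothetical, and DataFrameMissingnessModel is represented only by a plain dictionary of probabilities.

```python
from typing import Dict, List, Optional

# Hypothetical table: column name -> values, with None marking a missing cell.
table: Dict[str, List[Optional[object]]] = {
    "age": [34, None, 51, 29],
    "sex": ["f", "m", None, None],
}

# Stand-in for fitted per-column missingness probabilities (assumed shape).
missing_probs: Dict[str, float] = {"age": 0.25}


def column_completeness(
    values: List[Optional[object]], missing_prob: Optional[float] = None
) -> Optional[float]:
    """Return the completeness of one column, or None for an empty column."""
    if missing_prob is not None:
        return 1.0 - missing_prob  # assumption: the fitted model takes priority
    if not values:
        return None
    return sum(v is not None for v in values) / len(values)


per_column = {
    name: column_completeness(values, missing_probs.get(name))
    for name, values in table.items()
}

# Dataset completeness as the share of non-missing cells over all cells.
total_cells = sum(len(values) for values in table.values())
present_cells = sum(v is not None for values in table.values() for v in values)
dataset_completeness = present_cells / total_cells
```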

class semsynth.semmap.SemMapFrameAccessor(df: DataFrame)

Bases: object

DataFrame-level accessor for dataset metadata and Parquet round-trip.

from_jsonld(metadata: str | dict[str, Any], *, with_context: bool = False, convert_pint: bool = True) → SemMapFrameAccessor: Attach dataset metadata and column schema from a JSON object.

static read_parquet(path: str, *, convert_pint: bool = True, **pq_kwargs) → DataFrame: Read Parquet and restore semantics + pint units.

to_jsonld(with_context: bool = False) → Dict[str, Any] | None

to_parquet(path: str, *, with_context: bool = False, index: bool = False, **pq_kwargs) → None: Write Parquet with semantics stored in Arrow schema and fields.

class semsynth.semmap.SemMapSeriesAccessor(s: Series)

Bases: object

Series-level accessor to attach metadata semantics.

from_jsonld(metadata: Dict[str, Any], convert_pint: bool = True) → SemMapSeriesAccessor

Attach column semantics from a JSON-LD payload.

Parameters:

metadata – Column-level JSON-LD mapping.
convert_pint – Whether to attempt pint dtype conversion.

Returns:

Accessor instance for chaining.

set_categorical(name: str, label: str, *, codes: dict[int | str, str], scheme_source_iri: str | None = None, source_iri: str | None = None) → SemMapSeriesAccessor: Attach categorical variable metadata (integer-coded or strings).

set_numeric(name: str, label: str, *, unit_text: str | None = None, ucum_code: str | None = None, qudt_unit_iri: str | None = None, source_iri: str | None = None, convert_to_pint: bool = True) → SemMapSeriesAccessor: Attach numeric variable metadata and (optionally) convert dtype to pint.

to_jsonld(with_context: bool = False) → Dict[str, Any] | None

class semsynth.semmap.SkosMappings(*, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)

Bases: RDFMixin

broadMatch: List[str] | None = None

closeMatch: List[str] | None = None

exactMatch: List[str] | None = None

narrowMatch: List[str] | None = None

relatedMatch: List[str] | None = None

class semsynth.semmap.StatisticalDataType(value)

Bases: str, Enum

Interval = 'dsv:IntervalDataType'

Nominal = 'dsv:NominalDataType'

Numerical = 'dsv:NumericalDataType'

Ordinal = 'dsv:OrdinalDataType'

Ratio = 'dsv:RatioDataType'
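Because the bases are str and Enum, members behave like their string values. The sketch below redefines the enum locally with the values listed above (it does not import semsynth) to show lookup by value, comparison with plain strings, and JSON serialization.

```python
import json
from enum import Enum


# Local copy of the documented members, so the example runs without semsynth.
class StatisticalDataType(str, Enum):
    Interval = "dsv:IntervalDataType"
    Nominal = "dsv:NominalDataType"
    Numerical = "dsv:NumericalDataType"
    Ordinal = "dsv:OrdinalDataType"
    Ratio = "dsv:RatioDataType"


# Look a member up from the string found in a JSON-LD payload.
parsed = StatisticalDataType("dsv:OrdinalDataType")

# str-mixin members compare equal to their raw string values.
is_ratio = StatisticalDataType.Ratio == "dsv:RatioDataType"

# json.dumps serializes a str-mixin member as its string value.
encoded = json.dumps({"statisticalDataType": StatisticalDataType.Nominal})
```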

class semsynth.semmap.SummaryStatistics(statisticalDataType: 'Optional[StatisticalDataType]' = None, columnCompleteness: 'Optional[float]' = None, datasetCompleteness: 'Optional[float]' = None, numberOfRows: 'Optional[int]' = None, numberOfColumns: 'Optional[int]' = None, missingValueFormat: 'Optional[str]' = None, meanValue: 'Optional[float]' = None, medianValue: 'Optional[float]' = None, minimum: 'Optional[float]' = None, maximum: 'Optional[float]' = None)

Bases: RDFMixin

columnCompleteness: float | None = None

datasetCompleteness: float | None = None

maximum: float | None = None

meanValue: float | None = None

medianValue: float | None = None

minimum: float | None = None

missingValueFormat: str | None = None

numberOfColumns: int | None = None

numberOfRows: int | None = None

statisticalDataType: StatisticalDataType | None = None

class semsynth.semmap.Unit(*[, exactMatch, closeMatch, ...])

Bases: SkosMappings

ucumCode: str | None = None

class semsynth.semmap.VariableDescriptor(name: str, description: str | None = None, role: str | None = None, unit: str | None = None)

Bases: object

Lightweight container for harmonized column metadata.

as_dict() → Dict[str, str | None]: Convert the descriptor to a plain mapping.

description: str | None = None

name: str

role: str | None = None

unit: str | None = None
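A stand-in with the same four fields shows what as_dict() might return. Whether the real VariableDescriptor is a dataclass, and whether as_dict() keeps None values, are assumptions; the example values are made up.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


# Stand-in mirroring the documented fields; not the real class.
@dataclass
class VariableDescriptor:
    name: str
    description: Optional[str] = None
    role: Optional[str] = None
    unit: Optional[str] = None

    def as_dict(self) -> Dict[str, Optional[str]]:
        # Assumption: unset fields are kept as None rather than dropped.
        return asdict(self)


descriptor = VariableDescriptor(name="age", description="Age in years", unit="years")
record = descriptor.as_dict()
```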

semsynth.semmap.get_column_name(entry: Mapping[str, Any], *, extra_keys: Sequence[str] = ()) → str | None: Return the first non-empty column name found in a JSON-LD column entry.
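The lookup that get_column_name describes, returning the first non-empty name, can be sketched as follows. The function first_column_name and the key order in DEFAULT_KEYS are hypothetical; the keys the real function checks are not listed in this reference.

```python
from typing import Any, Mapping, Optional, Sequence

# Assumed default keys, for illustration only.
DEFAULT_KEYS = ("name", "titles")


def first_column_name(
    entry: Mapping[str, Any], *, extra_keys: Sequence[str] = ()
) -> Optional[str]:
    """Return the first non-empty string found under the candidate keys."""
    for key in (*DEFAULT_KEYS, *extra_keys):
        value = entry.get(key)
        if isinstance(value, (list, tuple)):
            # A list of titles: take its first non-empty item, if any.
            value = next((item for item in value if item), None)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


from_titles = first_column_name({"name": "", "titles": ["", "Age"]})
from_extra = first_column_name({"label": "bmi"}, extra_keys=("label",))
missing = first_column_name({})
```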

semsynth.semmap.modeling_role(node: Mapping[str, Any] | Column | None, *, default: str = 'predictor') → str: Return a simplified role label for modeling contexts.

semsynth.semmap.normalize_role(raw: str | None) → str: Normalize a raw role string into a canonical privacy role label.

semsynth.semmap.normalize_variable_descriptors(variables: Iterable[Mapping[str, Any]]) → List[VariableDescriptor]: Normalize a collection of raw variable dictionaries into descriptors.

semsynth.semmap.raw_role(node: Mapping[str, Any] | Column | None) → str | None: Extract a role value from a Column or JSON-LD mapping without normalization.