semsynth.semmap
Functions
- Add dunder methods based on the fields defined in the class.
- get_column_name: Return the first non-empty column name found in a JSON-LD column entry.
- modeling_role: Return a simplified role label for modeling contexts.
- normalize_role: Normalize a raw role string into a canonical privacy role label.
- normalize_variable_descriptors: Normalize a collection of raw variable dictionaries into descriptors.
- Extract a role value from a Column or JSON-LD mapping without normalization.
Classes
- Special type indicating an unconstrained type.
- Create a collection of name/value pairs.
- A Pint duck-typed class, suitable for holding a quantity (with unit specified) dtype.
- Provide JSON-LD serialization helpers for dataclasses.
- SemMapFrameAccessor: DataFrame-level accessor for dataset metadata and Parquet round-trip.
- SemMapSeriesAccessor: Series-level accessor to attach metadata semantics.
- VariableDescriptor: Lightweight container for harmonized column metadata.
- class semsynth.semmap.CodeBook(hasTopConcept: 'Optional[List[CodeConcept]]' = None, source: 'Optional[str]' = None)
Bases: RDFMixin
- hasTopConcept: List[CodeConcept] | None = None
- source: str | None = None
- class semsynth.semmap.CodeConcept(notation: 'Optional[str]' = None, prefLabel: 'Optional[str]' = None, *, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)
Bases: SkosMappings
- notation: str | None = None
- prefLabel: str | None = None
- class semsynth.semmap.Column(name: 'str', titles: 'str | list[str] | None' = None, description: 'Optional[str]' = None, identifier: 'Optional[str]' = None, about: 'Optional[str]' = None, hadRole: 'Optional[str]' = None, defaultValue: 'Optional[Any]' = None, columnProperty: 'Optional[ColumnProperty]' = None, summaryStatistics: 'Optional[SummaryStatistics]' = None)
Bases: RDFMixin
- about: str | None = None
- columnProperty: ColumnProperty | None = None
- defaultValue: Any | None = None
- description: str | None = None
- hadRole: str | None = None
- identifier: str | None = None
- name: str
- summaryStatistics: SummaryStatistics | None = None
- titles: str | list[str] | None = None
- class semsynth.semmap.ColumnProperty(summaryStatistics: 'Optional[SummaryStatistics]' = None, unitText: 'Optional[str]' = None, hasUnit: 'Optional[Unit]' = None, source: 'Optional[str]' = None, hasCodeBook: 'Optional[CodeBook]' = None, hasVariable: 'str | CodeConcept | None' = None, *, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)
Bases: SkosMappings, RDFMixin
- hasVariable: str | CodeConcept | None = None
- source: str | None = None
- summaryStatistics: SummaryStatistics | None = None
- unitText: str | None = None
- class semsynth.semmap.DatasetSchema(columns: 'List[Column]')
Bases: RDFMixin
- class semsynth.semmap.Metadata(datasetSchema: 'DatasetSchema', summaryStatistics: 'Optional[SummaryStatistics]' = None, title: 'Optional[str]' = None, description: 'Optional[str]' = None, abstract: 'Optional[str]' = None, purpose: 'Optional[str]' = None, landingPage: 'Optional[str]' = None, tableOfContents: 'Optional[str]' = None, citation: 'Optional[Any]' = None, provider: 'Optional[str]' = None, identifier: 'Optional[Any]' = None, funding: 'Optional[Any]' = None, populationType: 'Optional[Any]' = None, accessRights: 'Optional[Any]' = None)
Bases: RDFMixin
- abstract: str | None = None
- accessRights: Any | None = None
- citation: Any | None = None
- datasetSchema: DatasetSchema
- description: str | None = None
- classmethod from_dcat_dsv(payload: Mapping[str, Any]) Metadata
Build a Metadata instance from a DCAT/DSV JSON-LD payload.
- Parameters:
payload – Raw JSON-LD mapping created by semsynth.dataproviders.uciml.
- Returns:
Parsed metadata with dataset- and column-level attributes populated.
- funding: Any | None = None
- identifier: Any | None = None
- landingPage: str | None = None
- populationType: Any | None = None
- provider: str | None = None
- purpose: str | None = None
- summaryStatistics: SummaryStatistics | None = None
- tableOfContents: str | None = None
- title: str | None = None
- to_jsonld(with_context: bool = False) Dict[str, Any] | None
Serialize the object to a JSON-LD-compatible mapping.
- Parameters:
with_context (bool) – Whether to include the @context section.
include_extra (bool) – Whether to emit unknown fields captured during deserialization.
- Returns:
JSON-LD representation of the object.
- Return type:
dict
Examples
person = Person(id="ex:alice", name="Alice")
payload = person.to_jsonld()
- to_privacy_frame(inferred: Mapping[str, str]) DataFrame
Build a privacy metadata dataframe from SemMap content.
The privacy metrics expect roles such as qi and sensitive and a coarse type mapping (numeric/categorical/datetime). This helper normalizes SemMap roles to those expectations and uses statisticalDataType or codebooks to infer the variable types, falling back to the provided inferred mapping when semantics are absent.
- Parameters:
inferred – Mapping of column names to inferred types (discrete or continuous) used as fallback when semantics are missing.
- Returns:
Dataframe with variable, role and type columns.
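The fallback order described above can be sketched in plain Python. This is not the library's implementation: the helper names (`coarse_type`, `privacy_rows`), the two lookup tables, and the choice to map discrete to categorical and continuous to numeric are all assumptions made for illustration. Roles are passed through unchanged, codebook and datetime handling are omitted, and the rows are plain dictionaries rather than a DataFrame.

```python
from typing import Dict, List, Mapping, Optional

# Assumed coarse mapping from DSV statistical data types to privacy types.
DSV_TO_COARSE = {
    "dsv:NominalDataType": "categorical",
    "dsv:OrdinalDataType": "categorical",
    "dsv:IntervalDataType": "numeric",
    "dsv:RatioDataType": "numeric",
    "dsv:NumericalDataType": "numeric",
}

# Assumed mapping for the fallback labels named in the docstring.
INFERRED_TO_COARSE = {"discrete": "categorical", "continuous": "numeric"}


def coarse_type(
    column: str,
    statistical_type: Optional[str],
    inferred: Mapping[str, str],
) -> Optional[str]:
    """Prefer the declared statistical type, else fall back to the inferred label."""
    if statistical_type in DSV_TO_COARSE:
        return DSV_TO_COARSE[statistical_type]
    return INFERRED_TO_COARSE.get(inferred.get(column, ""))


def privacy_rows(
    columns: Mapping[str, Mapping[str, Optional[str]]],
    inferred: Mapping[str, str],
) -> List[Dict[str, Optional[str]]]:
    """Build one variable/role/type row per column (hypothetical helper)."""
    return [
        {
            "variable": name,
            "role": meta.get("role"),
            "type": coarse_type(name, meta.get("statisticalDataType"), inferred),
        }
        for name, meta in columns.items()
    ]


# Illustrative input: one column with declared semantics, one without.
columns = {
    "age": {"role": "qi", "statisticalDataType": "dsv:RatioDataType"},
    "diagnosis": {"role": "sensitive", "statisticalDataType": None},
}
inferred = {"age": "continuous", "diagnosis": "discrete"}
rows = privacy_rows(columns, inferred)
```

Here the age column takes its type from the declared statistical data type, while diagnosis has no declared type and is resolved through the inferred mapping.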
- update_completeness_from_missingness(df: DataFrame, missingness_model: Any | None) None
Refresh completeness and missing-value annotations based on fitted models.
- Parameters:
df – Dataframe used to compute dataset completeness.
missingness_model – Optional DataFrameMissingnessModel instance containing per-column missingness probabilities.
- class semsynth.semmap.SemMapFrameAccessor(df: DataFrame)
Bases: object
DataFrame-level accessor for dataset metadata and Parquet round-trip.
- from_jsonld(metadata: str | dict[str, Any], *, with_context: bool = False, convert_pint: bool = True) SemMapFrameAccessor
Attach dataset metadata and column schema from a JSON object.
- static read_parquet(path: str, *, convert_pint: bool = True, **pq_kwargs) DataFrame
Read Parquet and restore semantics + pint units.
- to_jsonld(with_context: bool = False) Dict[str, Any] | None
- to_parquet(path: str, *, with_context: bool = False, index: bool = False, **pq_kwargs) None
Write Parquet with semantics stored in Arrow schema and fields.
- class semsynth.semmap.SemMapSeriesAccessor(s: Series)
Bases: object
Series-level accessor to attach metadata semantics.
- from_jsonld(metadata: Dict[str, Any], convert_pint: bool = True) SemMapSeriesAccessor
Attach column semantics from a JSON-LD payload.
- Parameters:
metadata – Column-level JSON-LD mapping.
convert_pint – Whether to attempt pint dtype conversion.
- Returns:
Accessor instance for chaining.
- set_categorical(name: str, label: str, *, codes: dict[int | str, str], scheme_source_iri: str | None = None, source_iri: str | None = None) SemMapSeriesAccessor
Attach categorical variable metadata (integer-coded or strings).
- set_numeric(name: str, label: str, *, unit_text: str | None = None, ucum_code: str | None = None, qudt_unit_iri: str | None = None, source_iri: str | None = None, convert_to_pint: bool = True) SemMapSeriesAccessor
Attach numeric variable metadata and (optionally) convert dtype to pint.
- to_jsonld(with_context: bool = False) Dict[str, Any] | None
- class semsynth.semmap.SkosMappings(*, exactMatch: 'Optional[List[str]]' = None, closeMatch: 'Optional[List[str]]' = None, broadMatch: 'Optional[List[str]]' = None, narrowMatch: 'Optional[List[str]]' = None, relatedMatch: 'Optional[List[str]]' = None)
Bases: RDFMixin
- broadMatch: List[str] | None = None
- closeMatch: List[str] | None = None
- exactMatch: List[str] | None = None
- narrowMatch: List[str] | None = None
- class semsynth.semmap.StatisticalDataType(value)
Bases: str, Enum
- Interval = 'dsv:IntervalDataType'
- Nominal = 'dsv:NominalDataType'
- Numerical = 'dsv:NumericalDataType'
- Ordinal = 'dsv:OrdinalDataType'
- Ratio = 'dsv:RatioDataType'
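Because the class derives from both str and Enum, its members should behave like ordinary strings when compared or serialized. The sketch below redefines a stand-in enum with the member names and values listed above, so it runs without semsynth installed. It illustrates standard Python enum behavior, not anything specific to the library.

```python
from enum import Enum


class StatisticalDataType(str, Enum):
    """Stand-in mirroring the members listed above (not imported from semsynth)."""

    Interval = "dsv:IntervalDataType"
    Nominal = "dsv:NominalDataType"
    Numerical = "dsv:NumericalDataType"
    Ordinal = "dsv:OrdinalDataType"
    Ratio = "dsv:RatioDataType"


# Lookup by value recovers the member from a serialized string.
parsed = StatisticalDataType("dsv:OrdinalDataType")

# Thanks to the str mixin, a member compares equal to its string value.
equals_plain_string = parsed == "dsv:OrdinalDataType"

# .value is the plain string to write into a JSON-LD payload.
serialized = parsed.value
```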
- class semsynth.semmap.SummaryStatistics(statisticalDataType: 'Optional[StatisticalDataType]' = None, columnCompleteness: 'Optional[float]' = None, datasetCompleteness: 'Optional[float]' = None, numberOfRows: 'Optional[int]' = None, numberOfColumns: 'Optional[int]' = None, missingValueFormat: 'Optional[str]' = None, meanValue: 'Optional[float]' = None, medianValue: 'Optional[float]' = None, minimum: 'Optional[float]' = None, maximum: 'Optional[float]' = None)
Bases: RDFMixin
- columnCompleteness: float | None = None
- datasetCompleteness: float | None = None
- maximum: float | None = None
- meanValue: float | None = None
- medianValue: float | None = None
- minimum: float | None = None
- missingValueFormat: str | None = None
- numberOfColumns: int | None = None
- numberOfRows: int | None = None
- statisticalDataType: StatisticalDataType | None = None
- class semsynth.semmap.Unit(*, exactMatch: List[str] | None = None, closeMatch: List[str] | None = None, broadMatch: List[str] | None = None, narrowMatch: List[str] | None = None, relatedMatch: List[str] | None = None)
Bases: SkosMappings
- ucumCode: str | None = None
- class semsynth.semmap.VariableDescriptor(name: str, description: str | None = None, role: str | None = None, unit: str | None = None)
Bases: object
Lightweight container for harmonized column metadata.
- as_dict() Dict[str, str | None]
Convert the descriptor to a plain mapping.
- description: str | None = None
- name: str
- role: str | None = None
- unit: str | None = None
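As a rough illustration of the shape of this container, the stand-in below reproduces the four documented fields as a plain dataclass and implements as_dict with dataclasses.asdict. Whether the real class is built this way is an assumption. The field values ("age", "years") are made up for the example.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class VariableDescriptor:
    """Stand-in with the documented fields (not imported from semsynth)."""

    name: str
    description: Optional[str] = None
    role: Optional[str] = None
    unit: Optional[str] = None

    def as_dict(self) -> Dict[str, Optional[str]]:
        # Assumed behavior: a flat mapping of field name to value.
        return asdict(self)


descriptor = VariableDescriptor(name="age", role="predictor", unit="years")
row = descriptor.as_dict()
```

Fields that were never set stay in the mapping with a None value, which keeps every row the same shape when descriptors are collected into a table.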
- semsynth.semmap.get_column_name(entry: Mapping[str, Any], *, extra_keys: Sequence[str] = ()) str | None
Return the first non-empty column name found in a JSON-LD column entry.
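One way such a lookup could work is sketched below. The function name first_non_empty_name is hypothetical, and the default keys it probes ("name", then "titles") are an assumption borrowed from the Column fields above. The keys the real function checks are not documented here.

```python
from typing import Any, Mapping, Optional, Sequence


def first_non_empty_name(
    entry: Mapping[str, Any],
    *,
    extra_keys: Sequence[str] = (),
) -> Optional[str]:
    """Return the first non-blank string found under the candidate keys."""
    # Assumed candidate keys; extra_keys are tried after the defaults.
    for key in ("name", "titles", *extra_keys):
        value = entry.get(key)
        # A list of titles contributes its first non-blank string, if any.
        if isinstance(value, (list, tuple)):
            value = next(
                (v for v in value if isinstance(v, str) and v.strip()),
                None,
            )
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


from_titles = first_non_empty_name({"name": "", "titles": ["", "Age"]})
from_extra = first_non_empty_name({"label": "bmi"}, extra_keys=("label",))
missing = first_non_empty_name({})
```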
- semsynth.semmap.modeling_role(node: Mapping[str, Any] | Column | None, *, default: str = 'predictor') str
Return a simplified role label for modeling contexts.
- semsynth.semmap.normalize_role(raw: str | None) str
Normalize a raw role string into a canonical privacy role label.
- semsynth.semmap.normalize_variable_descriptors(variables: Iterable[Mapping[str, Any]]) List[VariableDescriptor]
Normalize a collection of raw variable dictionaries into descriptors.