semsynth.metadata

Functions

dataclass([cls, init, repr, eq, order, ...])

Add dunder methods based on the fields defined in the class.

field(*[, default, default_factory, init, ...])

Return an object to identify dataclass fields.

get_uciml_variable_descriptions(dataset_id)

Best-effort retrieval of UCI ML variable descriptions.

normalize_variable_descriptors(variables)

Normalize a collection of raw variable dictionaries into descriptors.

Classes

Any(*args, **kwargs)

Special type indicating an unconstrained type.

DCATDataset(identifier, title, description, ...)

DCATDistribution(title[, access_url, ...])

Path(*args, **kwargs)

PurePath subclass that can make system calls.

RDFMixin()

Provide JSON-LD serialization helpers for dataclasses.

class semsynth.metadata.DCATDataset(identifier: str, title: str, description: str, publisher: Optional[str] = None, keywords: List[str] = <factory>, issued: Optional[str] = None, modified: Optional[str] = None, language: Optional[str] = None, distributions: List[DCATDistribution] = <factory>)

Bases: RDFMixin

description: str
distributions: List[DCATDistribution]
identifier: str
issued: str | None = None
keywords: List[str]
language: str | None = None
modified: str | None = None
publisher: str | None = None
title: str
class semsynth.metadata.DCATDistribution(title: str, access_url: Optional[str] = None, download_url: Optional[str] = None, media_type: Optional[str] = None)

Bases: RDFMixin

access_url: str | None = None
download_url: str | None = None
media_type: str | None = None
title: str
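
The field layout of the two DCAT classes can be illustrated with a short sketch. The dataclasses below are stand-ins that only mirror the documented fields and defaults so the example runs without semsynth installed; the real classes also inherit from RDFMixin, whose methods are not shown here. All values (titles, URL, keywords) are placeholders, not taken from the library.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


# Stand-ins mirroring the documented fields. The real classes live in
# semsynth.metadata and additionally inherit from RDFMixin.
@dataclass
class DCATDistribution:
    title: str
    access_url: Optional[str] = None
    download_url: Optional[str] = None
    media_type: Optional[str] = None


@dataclass
class DCATDataset:
    identifier: str
    title: str
    description: str
    publisher: Optional[str] = None
    # <factory> in the signature means a fresh list per instance.
    keywords: List[str] = field(default_factory=list)
    issued: Optional[str] = None
    modified: Optional[str] = None
    language: Optional[str] = None
    distributions: List[DCATDistribution] = field(default_factory=list)


# Placeholder values for illustration only.
csv_file = DCATDistribution(
    title="CSV export",
    download_url="https://example.org/data/example.csv",
    media_type="text/csv",
)
dataset = DCATDataset(
    identifier="example-dataset",
    title="Example dataset",
    description="A placeholder record used to show the field layout.",
    keywords=["example", "tabular"],
    distributions=[csv_file],
)

# asdict() converts nested dataclasses into nested dictionaries.
record = asdict(dataset)
```

Only identifier, title, and description are required for a dataset, and only title for a distribution; every other field falls back to None or an empty list.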
semsynth.metadata.get_uciml_variable_descriptions(dataset_id: int) → Dict[str, str]

Best-effort retrieval of UCI ML variable descriptions.

The lookup first checks for a cached UCI ML payload under uciml-cache/{dataset_id}.json. If no cached metadata is found or the payload does not contain variable descriptions, it falls back to the ucimlrepo metadata API.

Parameters:

dataset_id – UCI ML dataset identifier used for cache lookup and API fallback.

Returns:

Mapping of column name to free-text description. Missing or empty descriptions are filtered out.
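
The cache-first lookup described above can be sketched roughly as follows. This is not the semsynth implementation: the names lookup_descriptions, _from_cache, and _clean are hypothetical, the cached payload shape (a "variables" list of objects with "name" and "description" keys) is an assumption, and the remote fallback is passed in as a callable so that no particular ucimlrepo call is presumed. Swallowing errors and returning an empty mapping is one reading of "best-effort".

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional


def _clean(descriptions: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Drop missing or blank descriptions and strip surrounding whitespace."""
    return {
        name: text.strip()
        for name, text in descriptions.items()
        if isinstance(text, str) and text.strip()
    }


def _from_cache(dataset_id: int, cache_dir: Path) -> Dict[str, str]:
    """Read descriptions from the cached payload; return {} if unusable."""
    path = cache_dir / f"{dataset_id}.json"
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}
    # ASSUMPTION: payload looks like
    # {"variables": [{"name": ..., "description": ...}, ...]}
    variables = payload.get("variables", []) if isinstance(payload, dict) else []
    raw = {
        v["name"]: v.get("description")
        for v in variables
        if isinstance(v, dict) and v.get("name")
    }
    return _clean(raw)


def lookup_descriptions(
    dataset_id: int,
    fetch_remote: Callable[[int], Dict[str, Optional[str]]],
    cache_dir: Path = Path("uciml-cache"),
) -> Dict[str, str]:
    """Try the local cache first, then the (hypothetical) remote fetcher."""
    cached = _from_cache(dataset_id, cache_dir)
    if cached:
        return cached
    try:
        return _clean(fetch_remote(dataset_id))
    except Exception:
        # Best-effort: a failed fallback yields an empty mapping.
        return {}
```

Passing the fallback as a parameter keeps the sketch testable offline; a caller would supply a function that wraps whatever metadata API is actually in use.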