semsynth.metadata
Functions
- Add dunder methods based on the fields defined in the class.
- Return an object to identify dataclass fields.
- Best-effort retrieval of UCI ML variable descriptions.
- Normalize a collection of raw variable dictionaries into descriptors.
Classes
- Special type indicating an unconstrained type.
- PurePath subclass that can make system calls.
- Provide JSON-LD serialization helpers for dataclasses.
- class semsynth.metadata.DCATDataset(identifier: 'str', title: 'str', description: 'str', publisher: 'Optional[str]' = None, keywords: 'List[str]' = <factory>, issued: 'Optional[str]' = None, modified: 'Optional[str]' = None, language: 'Optional[str]' = None, distributions: 'List[DCATDistribution]' = <factory>)
Bases: RDFMixin
- description: str
- distributions: List[DCATDistribution]
- identifier: str
- issued: str | None = None
- keywords: List[str]
- language: str | None = None
- modified: str | None = None
- publisher: str | None = None
- title: str
- class semsynth.metadata.DCATDistribution(title: 'str', access_url: 'Optional[str]' = None, download_url: 'Optional[str]' = None, media_type: 'Optional[str]' = None)
Bases: RDFMixin
- access_url: str | None = None
- download_url: str | None = None
- media_type: str | None = None
- title: str
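The two signatures above can be tried out with a small sketch. The classes below are local stand-ins that only mirror the documented fields and defaults; the real ones live in semsynth.metadata and also inherit from RDFMixin, whose JSON-LD helpers are not documented here and are therefore left out. All field values are made up for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


# Local stand-ins mirroring the documented signatures (assumption: the real
# classes behave like ordinary dataclasses for construction and defaults).
@dataclass
class DCATDistribution:
    title: str
    access_url: Optional[str] = None
    download_url: Optional[str] = None
    media_type: Optional[str] = None


@dataclass
class DCATDataset:
    identifier: str
    title: str
    description: str
    publisher: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    issued: Optional[str] = None
    modified: Optional[str] = None
    language: Optional[str] = None
    distributions: List[DCATDistribution] = field(default_factory=list)


# Only the three required fields must be given; the rest fall back to defaults.
dataset = DCATDataset(
    identifier="example-dataset",
    title="Example dataset",
    description="Illustrative record; all values are made up.",
    keywords=["example"],
    distributions=[DCATDistribution(title="CSV export", media_type="text/csv")],
)

# asdict() recurses into nested dataclasses and returns plain dicts and lists.
record = asdict(dataset)
```

The `<factory>` defaults shown in the signature correspond to `field(default_factory=list)`, so each instance gets its own empty list rather than a shared one.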
- semsynth.metadata.get_uciml_variable_descriptions(dataset_id: int) -> Dict[str, str]
Best-effort retrieval of UCI ML variable descriptions.
The lookup first checks for a cached UCI ML payload under
uciml-cache/{dataset_id}.json. If no cached metadata is found or the payload does not contain variable descriptions, it falls back to the ucimlrepo metadata API.
- Parameters:
dataset_id – UCI ML dataset identifier used for cache lookup and API fallback.
- Returns:
Mapping of column name to free-text description. Missing or empty descriptions are filtered out.
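
The cache-first behaviour described above can be sketched roughly as follows. This is not the library's implementation: `lookup_descriptions`, `drop_empty`, and `fake_remote` are hypothetical names, the JSON layout of the cached payload is an assumption, and the remote call is replaced by a stub so the example runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional


def drop_empty(raw: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only entries whose description is a non-blank string."""
    return {name: text for name, text in raw.items() if text and text.strip()}


def lookup_descriptions(
    dataset_id: int,
    cache_dir: Path,
    fetch_remote: Callable[[int], Dict[str, Optional[str]]],
) -> Dict[str, str]:
    """Cache-first lookup with a remote fallback (illustrative only)."""
    cache_file = cache_dir / f"{dataset_id}.json"
    if cache_file.exists():
        payload = json.loads(cache_file.read_text(encoding="utf-8"))
        # Assumed layout: {"variables": [{"name": ..., "description": ...}]}
        cached = drop_empty(
            {
                var["name"]: var.get("description")
                for var in payload.get("variables", [])
                if "name" in var
            }
        )
        if cached:
            return cached
    # No cache file, or no usable descriptions in it: ask the fallback.
    return drop_empty(fetch_remote(dataset_id))


def fake_remote(dataset_id: int) -> Dict[str, Optional[str]]:
    # Hypothetical stand-in for the real metadata API call.
    return {"target": "Remote description", "unused": None}


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    cached_payload = {
        "variables": [
            {"name": "age", "description": "Age in years"},
            {"name": "sex", "description": ""},
        ]
    }
    (cache_dir / "1.json").write_text(json.dumps(cached_payload), encoding="utf-8")

    from_cache = lookup_descriptions(1, cache_dir, fake_remote)
    from_remote = lookup_descriptions(2, cache_dir, fake_remote)
```

In this sketch dataset 1 is answered from the cache file and dataset 2, which has no cache entry, goes through the stub. Empty and missing descriptions are dropped in both paths, matching the documented return value.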