semsynth.umap_utils

Utilities for building and rendering UMAP embeddings lazily.

Functions

build_umap(df, discrete_cols, ...[, ...])

Fit a UMAP model on a sample of the dataset.

pick_color_labels(series)

Convert a categorical series into numeric labels for coloring.

plot_umap(embedding, outfile, title[, ...])

Render the embedding to disk and return axis limits.

transform_with_umap(art, df)

Project new data with the fitted UMAP model.

Classes

UMAPArtifacts(preproc, umap_model, ...)

Artifacts generated when fitting a UMAP projection.

class semsynth.umap_utils.UMAPArtifacts(preproc: Any, umap_model: Any, sample_idx: np.ndarray, embedding: np.ndarray, label_mapping: Dict[Any, int] | None, color_labels: 'np.ndarray' | None)

Bases: object

Artifacts generated when fitting a UMAP projection.

color_labels: 'np.ndarray' | None
embedding: np.ndarray
label_mapping: Dict[Any, int] | None
preproc: Any
sample_idx: np.ndarray
umap_model: Any

semsynth.umap_utils.build_umap(df: pd.DataFrame, discrete_cols: List[str], continuous_cols: List[str], color_series: 'pd.Series' | None, rng: np.random.Generator, random_state: int = 42, max_sample: int = 1000, n_neighbors: int = 30, min_dist: float = 0.1, n_components: int = 2) -> UMAPArtifacts

Fit a UMAP model on a sample of the dataset.
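
The signature suggests that at most max_sample rows are used for fitting and that their positions are kept in UMAPArtifacts.sample_idx. The following is a minimal sketch of such a subsampling step under stated assumptions: sample_indices_sketch is a hypothetical helper, not part of semsynth, and the standard-library random.Random stands in for the np.random.Generator that build_umap accepts. How the library actually draws its sample is not documented on this page.

```python
import random
from typing import List


def sample_indices_sketch(n_rows: int, max_sample: int, seed: int = 42) -> List[int]:
    """Hypothetical helper: choose row positions to fit on, capped at max_sample."""
    rng = random.Random(seed)  # stand-in for np.random.Generator (assumption)
    k = min(n_rows, max_sample)
    # Sample without replacement, then sort so rows keep their original order.
    return sorted(rng.sample(range(n_rows), k))


idx = sample_indices_sketch(n_rows=5000, max_sample=1000)
print(len(idx))  # 1000

small = sample_indices_sketch(n_rows=10, max_sample=1000)
print(small)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

When the dataset has fewer rows than the cap, every row is kept, so the sketch degrades to using the full dataset.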

semsynth.umap_utils.pick_color_labels(series: 'pd.Series' | None) -> Tuple['np.ndarray' | None, Dict[Any, int] | None]

Convert a categorical series into numeric labels for coloring.
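
The return type pairs an array of numeric labels with a category-to-integer mapping, and both are None when no series is given. The idea can be sketched as follows, assuming plain Python sequences in place of pd.Series and np.ndarray to stay dependency-free. encode_labels_sketch is a hypothetical name, and assigning codes in order of first appearance is an assumption; the library may order categories or treat missing values differently.

```python
from typing import Any, Dict, Hashable, List, Optional, Sequence, Tuple


def encode_labels_sketch(
    values: Optional[Sequence[Hashable]],
) -> Tuple[Optional[List[int]], Optional[Dict[Any, int]]]:
    """Hypothetical stand-in: map each distinct category to an integer code."""
    if values is None:
        # Mirror the documented Optional return: no series, no labels.
        return None, None
    mapping: Dict[Any, int] = {}
    labels: List[int] = []
    for value in values:
        # Assumption: codes are assigned in order of first appearance.
        if value not in mapping:
            mapping[value] = len(mapping)
        labels.append(mapping[value])
    return labels, mapping


labels, mapping = encode_labels_sketch(["a", "b", "a", "c"])
print(labels)   # [0, 1, 0, 2]
print(mapping)  # {'a': 0, 'b': 1, 'c': 2}
```

Keeping the mapping alongside the labels lets a caller translate integer codes back to category names, for example when building a plot legend.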

semsynth.umap_utils.plot_umap(embedding: np.ndarray, outfile: str, title: str, color_labels: 'np.ndarray' | None = None, lims: Tuple[Tuple[float, float], Tuple[float, float]] | None = None) -> Tuple[Tuple[float, float], Tuple[float, float]]

Render the embedding to disk and return axis limits.
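
Because the function both accepts lims and returns axis limits in the same ((xmin, xmax), (ymin, ymax)) shape, the limits from one plot can presumably be passed to a later call so that several embeddings share the same axes. The sketch below shows one way such limits could be derived from 2-D points. axis_limits_sketch and its pad parameter are hypothetical; the padding rule the library uses, if any, is not documented on this page.

```python
from typing import Sequence, Tuple

Limits = Tuple[Tuple[float, float], Tuple[float, float]]


def axis_limits_sketch(points: Sequence[Sequence[float]], pad: float = 0.05) -> Limits:
    """Hypothetical helper: bounding box of 2-D points, widened by a relative margin."""

    def span(values: Sequence[float]) -> Tuple[float, float]:
        lo, hi = min(values), max(values)
        margin = (hi - lo) * pad  # assumption: margin is a fraction of the range
        return (lo - margin, hi + margin)

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return span(xs), span(ys)


lims = axis_limits_sketch([(0.0, 0.0), (10.0, 20.0)], pad=0.25)
print(lims)  # ((-2.5, 12.5), (-5.0, 25.0))
```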

semsynth.umap_utils.transform_with_umap(art: UMAPArtifacts, df: pd.DataFrame) -> np.ndarray

Project new data with the fitted UMAP model.