semsynth.umap_utils
Utilities for building and rendering UMAP embeddings lazily.
Functions
| build_umap | Fit a UMAP model on a sample of the dataset. |
| dataclass | Add dunder methods based on the fields defined in the class. |
| pick_color_labels | Convert a categorical series into numeric labels for coloring. |
| plot_umap | Render the embedding to disk and return axis limits. |
| transform_with_umap | Project new data with the fitted UMAP model. |
Classes
| Any | Special type indicating an unconstrained type. |
| Path | PurePath subclass that can make system calls. |
| UMAPArtifacts | Artifacts generated when fitting a UMAP projection. |
- class semsynth.umap_utils.UMAPArtifacts(preproc: Any, umap_model: Any, sample_idx: np.ndarray, embedding: np.ndarray, label_mapping: Dict[Any, int] | None, color_labels: 'np.ndarray' | None)
Bases: object
Artifacts generated when fitting a UMAP projection.
- color_labels: 'np.ndarray' | None
- embedding: np.ndarray
- label_mapping: Dict[Any, int] | None
- preproc: Any
- sample_idx: np.ndarray
- umap_model: Any
- semsynth.umap_utils.build_umap(df: pd.DataFrame, discrete_cols: List[str], continuous_cols: List[str], color_series: 'pd.Series' | None, rng: np.random.Generator, random_state: int = 42, max_sample: int = 1000, n_neighbors: int = 30, min_dist: float = 0.1, n_components: int = 2) -> UMAPArtifacts
Fit a UMAP model on a sample of the dataset.
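The `rng` and `max_sample` parameters, together with the `sample_idx` field of `UMAPArtifacts`, suggest that rows are subsampled before fitting. The following is a minimal sketch of one way such a step could work, not the module's actual implementation. The helper name `sample_indices_sketch` is hypothetical, and sampling without replacement is an assumption.

```python
import numpy as np


def sample_indices_sketch(
    n_rows: int, rng: np.random.Generator, max_sample: int = 1000
) -> np.ndarray:
    """Return row positions to fit on, capped at ``max_sample`` (hypothetical helper)."""
    if n_rows <= max_sample:
        # Small datasets are used in full.
        return np.arange(n_rows)
    # Assumption: draw without replacement, then sort to keep the original row order.
    return np.sort(rng.choice(n_rows, size=max_sample, replace=False))


rng = np.random.default_rng(42)
idx = sample_indices_sketch(5000, rng, max_sample=1000)
```

Under this assumption, the sampled rows would then be preprocessed and passed to a UMAP estimator configured with `n_neighbors`, `min_dist`, `n_components`, and `random_state`.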
- semsynth.umap_utils.pick_color_labels(series: 'pd.Series' | None) -> Tuple['np.ndarray' | None, Dict[Any, int] | None]
Convert a categorical series into numeric labels for coloring.
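To illustrate the return shape (a label array plus a category-to-integer mapping, or `None` for both), here is a minimal sketch built on `pandas.factorize`. The name `pick_color_labels_sketch` is hypothetical, and the real function may order categories or treat missing values differently.

```python
from typing import Any, Dict, Optional, Tuple

import numpy as np
import pandas as pd


def pick_color_labels_sketch(
    series: Optional[pd.Series],
) -> Tuple[Optional[np.ndarray], Optional[Dict[Any, int]]]:
    """Map each distinct category to an integer code (hypothetical helper)."""
    if series is None:
        # No coloring requested: mirror the optional return types.
        return None, None
    # factorize assigns codes in order of first appearance; missing values get -1.
    codes, uniques = pd.factorize(series)
    mapping = {category: code for code, category in enumerate(uniques)}
    return np.asarray(codes), mapping


labels, mapping = pick_color_labels_sketch(pd.Series(["a", "b", "a"]))
```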
- semsynth.umap_utils.plot_umap(embedding: np.ndarray, outfile: str, title: str, color_labels: 'np.ndarray' | None = None, lims: Tuple[Tuple[float, float], Tuple[float, float]] | None = None) -> Tuple[Tuple[float, float], Tuple[float, float]]
Render the embedding to disk and return axis limits.
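Because the function both accepts and returns axis limits, a plausible use is to draw several embeddings on identical axes. The sketch below shows that pattern with matplotlib. The name `plot_embedding_sketch`, the marker size, and the colormap are assumptions, not details taken from the module.

```python
from typing import Optional, Tuple

import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

Limits = Tuple[Tuple[float, float], Tuple[float, float]]


def plot_embedding_sketch(
    embedding: np.ndarray,
    outfile: str,
    title: str,
    color_labels: Optional[np.ndarray] = None,
    lims: Optional[Limits] = None,
) -> Limits:
    """Scatter a 2-D embedding, save it, and return the axis limits used."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # Only pass color arguments when labels are available.
    color_kwargs = {} if color_labels is None else {"c": color_labels, "cmap": "tab10"}
    ax.scatter(embedding[:, 0], embedding[:, 1], s=8, **color_kwargs)
    ax.set_title(title)
    if lims is not None:
        # Reuse limits from an earlier plot so the figures are comparable.
        ax.set_xlim(*lims[0])
        ax.set_ylim(*lims[1])
    x0, x1 = ax.get_xlim()
    y0, y1 = ax.get_ylim()
    fig.savefig(outfile, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return (float(x0), float(x1)), (float(y0), float(y1))
```

Calling the sketch once without `lims` and passing the returned value to later calls keeps every figure on the same scale.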
- semsynth.umap_utils.transform_with_umap(art: UMAPArtifacts, df: pd.DataFrame) -> np.ndarray
Project new data with the fitted UMAP model.
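Projecting new data presumably reuses the `preproc` and `umap_model` objects stored in the artifacts. The sketch below shows that fit-once, project-later pattern. It assumes both objects expose a scikit-learn-style `transform` method. The stand-in classes `_Scale` and `_FirstTwo` and the names `ArtifactsSketch` and `transform_sketch` are hypothetical and exist only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import Any

import numpy as np


@dataclass
class ArtifactsSketch:
    """Trimmed stand-in for UMAPArtifacts holding only what projection needs."""
    preproc: Any
    umap_model: Any


def transform_sketch(art: ArtifactsSketch, data: Any) -> np.ndarray:
    """Apply the stored preprocessing, then the stored projection model."""
    features = art.preproc.transform(data)
    return np.asarray(art.umap_model.transform(features))


class _Scale:
    """Toy preprocessor: divides every value by 10."""
    def transform(self, X: Any) -> np.ndarray:
        return np.asarray(X, dtype=float) / 10.0


class _FirstTwo:
    """Toy 'model': keeps the first two columns instead of running UMAP."""
    def transform(self, X: np.ndarray) -> np.ndarray:
        return X[:, :2]


art = ArtifactsSketch(preproc=_Scale(), umap_model=_FirstTwo())
projected = transform_sketch(art, [[10, 20, 30], [40, 50, 60]])
```

With real artifacts, the same preprocessing fitted in `build_umap` would be applied to the new frame, so new points land in the coordinate system of the original embedding.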