{ "cells": [ { "cell_type": "markdown", "id": "34001c4b-fd7b-49a9-8389-35c0c6d6ffba", "metadata": {}, "source": [ "## Centroid Datamap\n", "Temporal Mapper constructs a graph that has no inherent visualization. Moreover, if your data has $d$ semantic dimensions, then the graph 'naturally' lives in $d+1$ dimensions once time is included.\n", "\n", "A centroid datamap of a `TemporalMapper` is a 2D plot in which each vertex is drawn at the centroid of its constituent points. " ] }, { "cell_type": "markdown", "id": "1bead087-1e56-4d8f-9221-d496f6f26c83", "metadata": {}, "source": [ "Let's demonstrate a centroid datamap by fitting a `TemporalMapper` to a small dataset of 10,000 arXiv machine learning papers. The papers' titles and abstracts were concatenated and embedded using the sentence transformer [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2), and then reduced to 2D with UMAP." ] }, { "cell_type": "code", "execution_count": 1, "id": "52fe5ea8-32bd-4750-997c-90e7a32adc57", "metadata": { "execution": { "iopub.execute_input": "2026-03-09T20:52:12.473194Z", "iopub.status.busy": "2026-03-09T20:52:12.473078Z", "iopub.status.idle": "2026-03-09T20:52:52.715745Z", "shell.execute_reply": "2026-03-09T20:52:52.714849Z", "shell.execute_reply.started": "2026-03-09T20:52:12.473180Z" } }, "outputs": [], "source": [ "import temporalmapper as tm\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import requests, io\n", "from sklearn.cluster import DBSCAN\n", "from fast_hdbscan import HDBSCAN\n", "import datamapplot as dmp" ] }, { "cell_type": "code", "execution_count": 2, "id": "cc6ea0fb-62a7-434d-baf7-c3e97e5adffd", "metadata": { "execution": { "iopub.execute_input": "2026-03-09T20:52:52.716598Z", "iopub.status.busy": "2026-03-09T20:52:52.716232Z", "iopub.status.idle": "2026-03-09T20:52:53.738737Z", "shell.execute_reply": "2026-03-09T20:52:53.738058Z", "shell.execute_reply.started": 
"2026-03-09T20:52:52.716580Z" } }, "outputs": [ { "data": { "text/html": [ "
| | title | abstract | id | created | authors | arxiv | doi |
|---|---|---|---|---|---|---|---|
| 0 | automated rating of recorded classroom present... | effective presentation skills can help to succ... | 1801.00453 | 2018-01-01 | [akzharkyn izbassarova, aidana irmanova, a. p.... | cs.ai | 10.1109/icacci.2017.8125872 |
| 1 | accelerating deep learning with memcomputing | restricted boltzmann machines (rbms) and their... | 1801.00512 | 2018-01-01 | [haik manukian, fabio l. traversa, massimilian... | cs.ai | |
| 2 | accelerating deep learning with memcomputing | restricted boltzmann machines (rbms) and their... | 1801.00512 | 2018-01-01 | [haik manukian, fabio l. traversa, massimilian... | cs.lg | |
| 3 | accurate reconstruction of image stimuli from ... | in neuroscience, all kinds of computation mode... | 1801.00602 | 2018-01-02 | [kai qiao, chi zhang, linyuan wang, bin yan, j... | cs.ai | |
| 4 | deep learning: a critical appraisal | although deep learning has historical roots go... | 1801.00631 | 2018-01-02 | [gary marcus] | cs.lg | |
TemporalMapper(clusterer=HDBSCAN(min_cluster_size=20),
               data=array([[ 4.8223996, -1.5714529],
                           [ 1.2089204, -3.8459172],
                           [ 1.2019355, -3.8558466],
                           ...,
                           [11.055405 , 10.082938 ],
                           [11.048662 , 10.091122 ],
                           [11.049046 , 10.090183 ]], shape=(10000, 2), dtype=float32),
               n_slices=12, slice_method='data',
               time=array([[  0],
                           [  0],
                           [  0],
                           ...,
                           [480],
                           [480],
                           [480]], shape=(10000, 1)))

| Parameter | Value |
|---|---|
| time | array([[ 0],...pe=(10000, 1)) |
| data | array([[ 4.82...dtype=float32) |
| clusterer | HDBSCAN(min_cluster_size=20) |
| n_slices | 12 |
| n_neighbors | 5 |
| overlap | 0.5 |
| inclusion_threshold | 0.01 |
| slice_method | 'data' |
| density_based | True |
| kernel | <function squ...x798f7d54b240> |
| kernel_params | None |
| verbose | False |

HDBSCAN(min_cluster_size=20)

| Parameter | Value |
|---|---|
| min_cluster_size | 20 |
| min_samples | None |
| cluster_selection_method | 'eom' |
| allow_single_cluster | False |
| max_cluster_size | inf |
| cluster_selection_epsilon | 0.0 |
| cluster_selection_persistence | 0.0 |
| semi_supervised | False |
| ss_algorithm | 'bc' |
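
The centroid placement described at the top of this notebook can be illustrated with a minimal NumPy sketch. It assumes each vertex corresponds to a group of 2D points sharing one integer label, with `-1` marking noise as HDBSCAN-style clusterers do. The helper `vertex_centroids` and the toy arrays are hypothetical and are not part of the `temporalmapper` API.

```python
import numpy as np


def vertex_centroids(points, labels):
    """Map each non-noise label to the mean position of its points.

    Hypothetical helper for illustration only; not a temporalmapper function.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    centroids = {}
    for label in np.unique(labels):
        if label < 0:
            # Skip noise points (label -1 by HDBSCAN convention).
            continue
        # The centroid is the coordinate-wise mean of the member points.
        centroids[int(label)] = points[labels == label].mean(axis=0)
    return centroids


# Toy 2D embedding: two clusters and one noise point (made-up values).
points = np.array(
    [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0], [50.0, 50.0]]
)
labels = np.array([0, 0, 1, 1, -1])

centroids = vertex_centroids(points, labels)
for label, xy in centroids.items():
    print(label, xy)
```

Each vertex would then be drawn at its centroid in the 2D embedding, so the graph layout inherits the geometry of the underlying datamap.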