Centroid Datamap

Temporal Mapper constructs a graph which does not have an inherent visualization. Moreover, if your data has \(d\) semantic dimensions, then the graph ‘naturally’ lives in \(d+1\) dimensions when including time.

A centroid datamap of a TemporalMapper is a 2D plot in which each vertex of the graph is drawn at the centroid of its constituent points.
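
As a rough illustration of what "the centroid of its constituent points" means, the sketch below computes per-cluster centroids for a toy 2D point set with NumPy. The arrays `points` and `labels` and the helper `cluster_centroids` are hypothetical names invented for this example; this is not TemporalMapper's internal code.

```python
import numpy as np

def cluster_centroids(points, labels):
    """Return {label: centroid} for each cluster label, skipping noise (-1)."""
    centroids = {}
    for label in np.unique(labels):
        if label == -1:  # DBSCAN/HDBSCAN convention for noise points
            continue
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroids[int(label)] = points[labels == label].mean(axis=0)
    return centroids

# Toy data: two clusters and one noise point.
points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0],
                   [10.0, 10.0], [12.0, 10.0], [50.0, 50.0]])
labels = np.array([0, 0, 0, 1, 1, -1])

centroids = cluster_centroids(points, labels)
for label, centre in centroids.items():
    print(label, centre)
```

In a centroid datamap, each vertex of the graph would be placed at a position computed along these lines from the points it contains.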

Let’s demonstrate a centroid datamap by fitting a TemporalMapper to a small dataset of 10,000 arXiv machine learning papers. The papers’ titles and abstracts were concatenated, embedded using the sentence transformer all-mpnet-base-v2, and then reduced to 2D with UMAP.

[1]:
import temporalmapper as tm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests, io
from sklearn.cluster import DBSCAN
from fast_hdbscan import HDBSCAN
import datamapplot as dmp
[2]:
response = requests.get(
    'https://github.com/TutteInstitute/temporal-mapper/raw/refs/heads/docs/docs/data/ai_arxiv_coordinates.npy'
)
map_data = np.load(io.BytesIO(response.content))

response = requests.get(
    'https://github.com/TutteInstitute/temporal-mapper/raw/refs/heads/docs/docs/data/ai_arxiv_data.feather'
)
df = pd.read_feather(io.BytesIO(response.content))

df.head()
[2]:
|   | title | abstract | id | created | authors | arxiv | doi |
|---|-------|----------|----|---------|---------|-------|-----|
| 0 | automated rating of recorded classroom present... | effective presentation skills can help to succ... | 1801.00453 | 2018-01-01 | [akzharkyn izbassarova, aidana irmanova, a. p.... | cs.ai | 10.1109/icacci.2017.8125872 |
| 1 | accelerating deep learning with memcomputing | restricted boltzmann machines (rbms) and their... | 1801.00512 | 2018-01-01 | [haik manukian, fabio l. traversa, massimilian... | cs.ai | |
| 2 | accelerating deep learning with memcomputing | restricted boltzmann machines (rbms) and their... | 1801.00512 | 2018-01-01 | [haik manukian, fabio l. traversa, massimilian... | cs.lg | |
| 3 | accurate reconstruction of image stimuli from ... | in neuroscience, all kinds of computation mode... | 1801.00602 | 2018-01-02 | [kai qiao, chi zhang, linyuan wang, bin yan, j... | cs.ai | |
| 4 | deep learning: a critical appraisal | although deep learning has historical roots go... | 1801.00631 | 2018-01-02 | [gary marcus] | cs.lg | |
[3]:
# Compute a time column T: the number of days since 2018-01-01.
d0 = pd.Timestamp('2018-01-01')
df["date"] = pd.to_datetime(df["created"])
df["T"] = (df["date"] - d0).dt.days

# Reshape time into a column vector so it can be stacked with the 2D coordinates.
time = df["T"].to_numpy().reshape(-1, 1)
X = np.concatenate([map_data, time], axis=1)
[18]:
clusterer = HDBSCAN(
    cluster_selection_method='eom',
    min_cluster_size=20,
)
mapper = tm.TemporalMapper(
    time,
    map_data,
    clusterer = clusterer,
    slice_method = 'data',
    n_slices = 12,
    kernel=tm.kernels.square
)
mapper.fit()
[18]:
TemporalMapper(clusterer=HDBSCAN(min_cluster_size=20),
               data=array([[ 4.8223996, -1.5714529],
       [ 1.2089204, -3.8459172],
       [ 1.2019355, -3.8558466],
       ...,
       [11.055405 , 10.082938 ],
       [11.048662 , 10.091122 ],
       [11.049046 , 10.090183 ]], shape=(10000, 2), dtype=float32),
               n_slices=12, slice_method='data',
               time=array([[  0],
       [  0],
       [  0],
       ...,
       [480],
       [480],
       [480]], shape=(10000, 1)))

Now that we’ve fit a TemporalMapper, we can plot it with centroid_datamap from temporalmapper.plotting:

[19]:
tm.plotting.centroid_datamap(mapper)
[19]:
<Axes: >
_images/centroid-datamap_7_1.png

The optional ax parameter adds a centroid datamap to an existing matplotlib Axes. In particular, centroid_datamap is designed to be drawn on top of a DataMapPlot plot of the underlying points:

[20]:
import datamapplot as dmp

fig, ax = dmp.create_plot(
    mapper.data,
    df['arxiv'].to_list(),
)
tm.plotting.centroid_datamap(mapper, ax=ax, bundle=True)
fig.set_figwidth(10)
fig.set_figheight(8)
# Raw string so that the backslash in \chi is not treated as an escape sequence.
ax.set_title(r"Topics in ar$\chi$iv AI papers, 2018-2019")
plt.show()
_images/centroid-datamap_9_0.png

By default, the centroid datamap matches colours with the temporal plot, which makes the two easier to relate when viewed side by side. To convey time, vertices from earlier slices of the graph have their colour desaturated. The effect is subtle, so to make the ordering of the vertices more obvious you can pass node_colouring = 'override'. This overrides the datamapplot colours and instead colours vertices from dark to light as you move from the beginning to the end of the time range.

[21]:
fig, (ax1, ax2) = plt.subplots(1,2)

tm.plotting.centroid_datamap(
    mapper,
    ax=ax1,
    node_colouring = 'desaturate',
    bundle=False
)
ax1.set_title("Default node colouring")
tm.plotting.centroid_datamap(
    mapper,
    ax=ax2,
    node_colouring = 'override',
    bundle=False,
)
ax2.set_title("node_colouring = 'override'")
fig.set_figwidth(10)
fig.set_figheight(5)
plt.show()
_images/centroid-datamap_11_0.png
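
One way to picture the difference between the two colouring modes is the sketch below, which uses only the Python standard library. It is an illustration under assumptions, not the library's implementation: the helper names `desaturate_by_time` and `grey_by_time`, the linear scaling, the `min_saturation` floor, and the greyscale ramp are all made up for this example.

```python
import colorsys

def desaturate_by_time(rgb, t, t_min, t_max, min_saturation=0.3):
    """Scale the saturation of an RGB colour so earlier times look greyer."""
    frac = (t - t_min) / (t_max - t_min)  # 0 at the start, 1 at the end
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # Hypothetical rule: keep at least min_saturation of the original saturation.
    s *= min_saturation + (1.0 - min_saturation) * frac
    return colorsys.hsv_to_rgb(h, s, v)

def grey_by_time(t, t_min, t_max):
    """Ignore the semantic colour: dark for early times, light for late ones."""
    frac = (t - t_min) / (t_max - t_min)
    return (frac, frac, frac)

red = (1.0, 0.0, 0.0)
early = desaturate_by_time(red, t=0, t_min=0, t_max=480)
late = desaturate_by_time(red, t=480, t_min=0, t_max=480)
print("desaturate:", early, late)
print("override:  ", grey_by_time(0, 0, 480), grey_by_time(480, 0, 480))
```

In the first mode the hue still identifies the topic and only the saturation carries time. In the second mode the colour carries time alone, which is why it reads more clearly as an ordering.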

The full set of parameters to customize temporalmapper.plotting.centroid_datamap is:

| Name | Type | Default | Description |
|------|------|---------|-------------|
| mapper | temporalmapper.TemporalMapper | | The temporal mapper object to plot. |
| ax | matplotlib.axes.Axes | None | Matplotlib Axes to draw on. |
| node_colouring | 'desaturate' or 'override' | 'desaturate' | Determines how temporal information is incorporated into the colour. The desaturate option keeps the semantic colouring from datamapplot but desaturates points further back in time. The override option ignores semantic colouring and colours points based only on their time value. |
| vertices | list | None | List of nodes in mapper.graph to include in the plot. |
| edge_labels | dict | None | Dictionary of labels where edge_labels[e] is a string used to label edge e. |
| edge_scaling | float | 1 | Scales the thickness of edges; larger values produce thicker edges. |
| bundle | bool | True | If True, bundle the edges of the graph using datashader’s hammer_bundle function. |
| node_kwargs | dict | {} | Keyword arguments passed to networkx.draw_networkx_nodes(). |
| edge_kwargs | dict | {} | Keyword arguments passed to networkx.draw_networkx_edges(). |