Interactive Growth Map

A growth map of a Mapper graph is a visualization that displays each topic's relative size and growth. It is inspired by a similar visualization often used to display stock market data.

Let’s demonstrate a growth map by fitting a TemporalMapper to a small dataset of 10,000 arXiv machine learning papers. The papers’ titles and abstracts were concatenated, embedded using the sentence transformer all-mpnet-base-v2, and then reduced to 2D with UMAP.

[1]:
import temporalmapper as tm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests, io
from sklearn.cluster import DBSCAN
from fast_hdbscan import HDBSCAN
import datamapplot as dmp
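# Download the precomputed 2D UMAP coordinates.
response = requests.get(
    'https://github.com/TutteInstitute/temporal-mapper/raw/refs/heads/docs/docs/data/ai_arxiv_coordinates.npy'
)
response.raise_for_status()  # fail early if the download did not succeed
map_data = np.load(io.BytesIO(response.content))

# Download the paper metadata (titles, abstracts, creation dates).
response = requests.get(
    'https://github.com/TutteInstitute/temporal-mapper/raw/refs/heads/docs/docs/data/ai_arxiv_data.feather'
)
response.raise_for_status()
df = pd.read_feather(io.BytesIO(response.content))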

df.head()
[1]:
|   | title | abstract | id | created | authors | arxiv | doi |
|---|-------|----------|----|---------|---------|-------|-----|
| 0 | automated rating of recorded classroom present... | effective presentation skills can help to succ... | 1801.00453 | 2018-01-01 | [akzharkyn izbassarova, aidana irmanova, a. p.... | cs.ai | 10.1109/icacci.2017.8125872 |
| 1 | accelerating deep learning with memcomputing | restricted boltzmann machines (rbms) and their... | 1801.00512 | 2018-01-01 | [haik manukian, fabio l. traversa, massimilian... | cs.ai | |
| 2 | accelerating deep learning with memcomputing | restricted boltzmann machines (rbms) and their... | 1801.00512 | 2018-01-01 | [haik manukian, fabio l. traversa, massimilian... | cs.lg | |
| 3 | accurate reconstruction of image stimuli from ... | in neuroscience, all kinds of computation mode... | 1801.00602 | 2018-01-02 | [kai qiao, chi zhang, linyuan wang, bin yan, j... | cs.ai | |
| 4 | deep learning: a critical appraisal | although deep learning has historical roots go... | 1801.00631 | 2018-01-02 | [gary marcus] | cs.lg | |
[3]:
# Compute a time column T: the number of days since 1 January 2018.
def date_to_T(date):
    d0 = pd.Timestamp('2018-01-01')
    delta = date-d0
    return delta.days

df["date"] = pd.to_datetime(df["created"])
df["T"] = df["date"].apply(date_to_T)
# Reshape to a column vector so it can be stacked next to the 2D coordinates.
time = df["T"].to_numpy().reshape(-1, 1)

clusterer = HDBSCAN(
    cluster_selection_method='eom',
    min_cluster_size=20,
)
mapper = tm.TemporalMapper(
    clusterer = clusterer,
    slice_method = 'data',
    n_slices = 8,
    kernel=tm.kernels.square
)
# Stack the 2D coordinates and the time column; time is the last column of X.
X = np.concatenate([map_data, time], axis=1)
mapper.fit(X)
[3]:
TemporalMapper(clusterer=HDBSCAN(min_cluster_size=20),
               data=array([[ 4.82239962, -1.57145286],
       [ 1.20892036, -3.84591722],
       [ 1.20193553, -3.85584664],
       ...,
       [11.05540466, 10.08293819],
       [ 8.87438393, -1.76646364],
       [11.04866219, 10.09112167]], shape=(10000, 2)),
               n_slices=8, slice_method='data',
               time=array([  0.,   0.,   0., ..., 480., 480., 480.], shape=(10000,)))
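
The array passed to fit above has one row per paper: two map coordinates followed by the time value in days. The following is a minimal sketch of that layout using NumPy only. The coordinates and dates are made-up stand-ins for map_data and df["created"], not values from the dataset.

```python
import numpy as np

# Hypothetical stand-ins for the real inputs: four 2D map coordinates
# and the creation date of each point.
coords = np.array([[4.8, -1.6], [1.2, -3.8], [11.1, 10.1], [8.9, -1.8]])
created = np.array(
    ["2018-01-01", "2018-01-02", "2018-02-01", "2019-01-01"],
    dtype="datetime64[D]",
)

# Whole days elapsed since the reference date (1 January 2018).
origin = np.datetime64("2018-01-01", "D")
days = (created - origin).astype(np.int64)

# Append time as the last column, giving an (n_points, 3) array.
X = np.column_stack([coords, days])
```

Subtracting the reference date from the whole column at once avoids a per-row function call, which may matter for datasets much larger than this one.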

Now that we’ve fit a Mapper graph, we can use tm.plotting.growth_map to generate a growth map. The size of each square indicates the number of data points in the corresponding topic, and its colour represents the growth of that topic.

[4]:
tm.plotting.growth_map(mapper)
_images/growth-map_5_0.png
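
To make the two quantities concrete, here is a small sketch of how a size and a growth value could be computed per topic. The growth formula used by growth_map is not stated on this page, so the sketch assumes growth means the relative change in point count between two consecutive time slices. The topic labels and the helper size_and_growth are hypothetical and not part of temporalmapper.

```python
from collections import Counter

# Hypothetical topic labels for the points in two consecutive time slices.
previous_slice = ["nlp", "nlp", "vision", "vision", "vision", "rl"]
current_slice = ["nlp", "nlp", "nlp", "nlp", "vision", "vision", "rl"]


def size_and_growth(previous, current):
    """Return {topic: (size, growth)} for every topic present in `current`.

    size   = number of points in the current slice
    growth = relative change in size versus the previous slice
             (None when the topic did not exist in the previous slice)

    This growth definition is an assumption made for illustration.
    """
    before, now = Counter(previous), Counter(current)
    summary = {}
    for topic, size in now.items():
        old = before.get(topic, 0)
        growth = (size - old) / old if old else None
        summary[topic] = (size, growth)
    return summary


summary = size_and_growth(previous_slice, current_slice)
```

In this toy data the "nlp" topic doubles from 2 to 4 points, so it would get a large square and a strongly positive growth colour, while "vision" shrinks from 3 to 2 points.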

By default, growth_map displays topics across the entire time range, but we can pass the index parameter to show only the topics in a single time slice.

[6]:
tm.plotting.growth_map(mapper, index=3)
_images/growth-map_7_0.png