{ "cells": [ { "cell_type": "markdown", "id": "b0a922e2-9e00-4c60-a09a-36c3774ac637", "metadata": {}, "source": [ "# Temporal Topic Modelling with Temporal Mapper and Toponymy\n", "In this notebook, we will go through an example of how to use [Temporal Mapper](https://github.com/TutteInstitute/temporal-mapper) and \n", "[Toponymy](https://github.com/TutteInstitute/toponymy) together to create a temporal topic model of a corpus of documents. Be warned that this is an experimental, evolving workflow, so what you're about to see is not pretty.\n", "\n", "The dataset we will use is the [United Nations General Debate Corpus](https://www.kaggle.com/datasets/unitednations/un-general-debates)\n", "which consists of transcripts of the United Nations general debate from 1970 to 2015. I've preprocessed the dataset by chunking the speeches\n", "and then embedding the chunks with a sentence-transformer and reducing them to 2D using UMAP. Let's fetch the dataset from the HuggingFace Hub: " ] }, { "cell_type": "code", "execution_count": 1, "id": "d40829de-437b-497c-9b35-a06d539a6081", "metadata": { "execution": { "iopub.execute_input": "2026-03-12T18:06:24.572759Z", "iopub.status.busy": "2026-03-12T18:06:24.572638Z", "iopub.status.idle": "2026-03-12T18:06:27.690448Z", "shell.execute_reply": "2026-03-12T18:06:27.689748Z", "shell.execute_reply.started": "2026-03-12T18:06:24.572746Z" } }, "outputs": [ { "data": { "text/html": [ "
|  | session | year | country | text | chunk | information_weight | embedding | reduced |
|---|---|---|---|---|---|---|---|---|
| 0 | 44 | 1989 | MDV | It is indeed a pleasure for me and the member... | It is indeed a pleasure for me and the member... | 29.816833 | [-0.009967008, 0.028972907, 0.014457686, 0.022... | [9.491389, 7.566777] |
| 0 | 44 | 1989 | MDV | It is indeed a pleasure for me and the member... | Developments in southern Africa, and more part... | 23.011437 | [0.050711717, 0.09013895, 0.0096756825, -0.016... | [6.820655, 4.6092267] |
| 0 | 44 | 1989 | MDV | It is indeed a pleasure for me and the member... | Positive strides have been taken towards the s... | 21.294486 | [0.07367871, 0.045660958, 0.020714706, -0.0277... | [1.6624191, 2.9213235] |
| 0 | 44 | 1989 | MDV | It is indeed a pleasure for me and the member... | The process of reunification of peoples should... | 21.153001 | [0.057730194, 0.083791696, 0.012951973, -0.019... | [-0.5313732, 0.14173953] |
| 1 | 44 | 1989 | FIN | \\nMay I begin by congratulating you. Sir, on ... | In the process of preparing both the developme... | 25.820817 | [-0.012931386, -0.017169893, 0.012649347, -0.0... | [3.1583965, 11.189846] |