{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Tutorial\n", "========\n", "\n", "Čech persistent homology\n", "------------------------\n", "\n", "We start by calculating Čech persistent homology of a point cloud in 4-dimensional Euclidean space. First we import the relevant python packages and generate the sample data used for the examples below. The first data we use are 80 points on the Clifford torus. In order to calculate the persistent homology of a point cloud with an ambient Čech complex, we create a `Persistence` object. By default the plot_persistence method takes as input a\n", "dowker matrix X and returns the persistence diagram of the associated dowker nerve. Alternatively plot_persistence can be used with `dissimilarity='ambient'` to calculate ambient Čech complex. Information about the sizes of the nerves can be obtained by the `cardinality_information` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='ambient')\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parameter `coeff_field` is the characteristic p of the coefficient field Z/pZ for computing homology. See [gudhi](http://gudhi.gforge.inria.fr) for more information. The `max_simplex_size` parameter helps by giving an error if the resulting nerve is very large and calculations will take a long time. Then the options are to either set `max_simplex_size` higher and expect some waiting time or to change the interleaving parameters so that calculations finish after reasonably short times.\n", "\n", "Two important optional parameters for Persistence objects are the maximal homology dimension `dimension` and the number `n_samples` of vertices in the nerve. We use a farthest point sampling to reduce the number of vertices." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dimension=2, \n", " n_samples=40, \n", " dissimilarity='ambient')\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After plotting the persistence diagram the Persistence object contains the variable `homology`, a numpy array with three columns.\n", "The first column gives homological dimension.\n", "The second column gives birth.\n", "The third column gives death.\n", "\n", "\n", "Intrinsic Čech complex\n", "----------------------\n", "Instead of the ambient Čech complex we can compute the intrinsic Čech complex. For the intrinsic Čech complex we can specify all metrics accepted by `scipy.spatial.distance.cdist` or if the metric is precomputed we can specify `'metric'`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='euclidean')\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alpha complexes\n", "\n", "For eucliden point clouds in low dimension the alpha complex is the most efficient complex computing Čech homology. This complex can be used by setting `dissimilarity='alpha'`. On a technical note, we have actually implemented the Delaunay-Čech complex." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='alpha')\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sparse Dowker Nerves\n", "--------------------\n", "\n", "The sparse version of persistent homology from Example 5.6 in Sparse Filtered Nerves is interleaved with Čech persistent homology. The easiest way to use it is to specify the additive and multiplicative interleaving constants." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='ambient', \n", " additive_interleaving=0.1,\n", " multiplicative_interleaving=1.5)\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that there is a dashed interleaving line in addition to the top dashed line indicating infinity. The interleaving guarantees that all points above this interleaving line will be present in the original persistence diagram as well. \n", "\n", "We can also sparsify the intrinsic Čech complex." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='euclidean', \n", " additive_interleaving=0.2,\n", " multiplicative_interleaving=1.5)\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For data of high dimension we are able to reduce the size of the nerve drastically by using a multiplicative interleaving. We illustrate this on a dataset consisting of random data in 16 dimensional euclidean space." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "import numpy as np\n", "import urllib.request\n", "from io import BytesIO\n", "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "\n", "# Download data and transform to numpy array\n", "url = ('https://raw.githubusercontent.com/' +\n", " 'n-otter/PH-roadmap/master/data_sets/' +\n", " 'roadmap_datasets_point_cloud/random_' +\n", " 'point_cloud_50_16_.txt')\n", "\n", "response = urllib.request.urlopen(url)\n", "data = response.read() \n", "text = data.decode('utf-8')\n", "coords = np.genfromtxt(BytesIO(data), delimiter=\" \")\n", "\n", "# calculate and plot persistent homology\n", "pers = Persistence(dimension=7,\n", " dissimilarity='ambient', \n", " multiplicative_interleaving=2.5)\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subspace sparsification\n", "-----------------------\n", "In the introduction of Sparse Filtered Nerves we mention the additive interleaving of subspaces with the Hausdorff distance. This additive part can be specified using either the `resolution` or the `n_samples` parameters. If there is no other interleaving, the `resolution` parameter is the intersect of the interleaving line with the y-axis. The parameter `n_samples` specifies how many points to sample in a farthest point sampling. It is convenient, because computing time is low for few points. It can also be used to find an appropriate value for the `resolution` parameter. For a general alpha-interleaving the total interleaving is `alpha(t + resolution)`. Here we use the clifford torus with 2000 points and specify `n_samples` to be 50." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(2000)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='euclidean', \n", " multiplicative_interleaving=1.2, \n", " n_samples=50)\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "General interleavings\n", "---------------------\n", "We can also specify interleavings with an arbitrary translation function α(t). Note that there are no automated checks of the correctness of α(t) so it is the users responsability to make sure that it is indeed a translation function. The following example gives the same result as the example with multiplicative interleaving of 2.5 and an additive interleaving of 0.5." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(2000)\n", "# specify translation function\n", "def translation(time):\n", " return 2.5 * time + 0.5\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='euclidean', \n", " translation_function=translation,\n", " n_samples=50)\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More complex translation functions can also be specified." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(2000)\n", "# specify translation function\n", "def translation(time):\n", " return pow(time, 3) + 0.3\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='euclidean', \n", " translation_function=translation,\n", " n_samples=50)\n", "pers.plot_persistence(X=coords, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dowker dissimilarities\n", "------------------------------\n", "The package `dowker_homology` can compute persistent homology for Dowker dissimilarities defined on finite sets. \n", "\n", "Below we compute persistent homology of the dowker nerve of the function X × G → [0, ∞] specifying the distance between a point x in the Clifford torus X and a point g in a grid G containing X." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.spatial.distance import cdist\n", "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "grid = np.stack(np.meshgrid(\n", " np.linspace(-1, 1, 10), \n", " np.linspace(-1, 1, 10), \n", " np.linspace(-1, 1, 10), \n", " np.linspace(-1, 1, 10)), -1).reshape(-1, 4)\n", "dowker_dissimilarity = cdist(coords, grid)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='dowker', \n", " additive_interleaving=0.2,\n", " multiplicative_interleaving=1.5)\n", "pers.plot_persistence(X=dowker_dissimilarity, \n", " alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspired by the [persnet software](https://github.com/fmemoli/PersNet), we compute Dowker homology of cyclic networks. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from networkx import cycle_graph\n", "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "graph = cycle_graph(100)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='dowker', \n", " additive_interleaving=2.0,\n", " multiplicative_interleaving=2.0)\n", "pers.plot_persistence(X=graph, alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We next compute the intrinsic Čech complex directly from the distance matrix. In order to take advantage of the triangle inequality, we use dissimilarity='metric'." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.spatial.distance import cdist\n", "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "# generate data\n", "coords = clifford_torus(80)\n", "# calculate and plot persistent homology\n", "pers = Persistence(dissimilarity='metric', \n", " additive_interleaving=0.2,\n", " multiplicative_interleaving=1.5)\n", "pers.plot_persistence(X=cdist(coords, coords), \n", " alpha=0.3, show=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternative restrictions and truncations\n", "----------------------------------------------\n", "So far we have not talked about how we truncate and sparsify the Dowker dissimilarity. 
Usually the default methods `truncation_method='dowker'` and `restriction_method='dowker'` work best. The other methods are provided for reference, to allow comparison of the different strategies and examples in Sparse Dowker Nerves and Sparse Filtered Nerves.\n", "\n", "We have implemented the *truncation method* described by Cavanna, Jahanseir and Sheehy in A Geometric Perspective on Sparse Filtrations and in Definition 6 of Sparse Dowker Nerves, which we term `'Sheehy'`, as well as the canonical truncation method described in Sparse Dowker Nerves, here called `'Canonical'`.\n", "\n", "The two *restriction methods* implemented here come from Sparse Dowker Nerves and are termed `'Sheehy'` for the method described in Proposition 1 and `'Parent'` for the method described in Definition 8. Note that the `'Sheehy'` restriction is only compatible with the `'Sheehy'` truncation.\n", "\n", "Below we show a short simulation study comparing all the different restriction and truncation methods." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from scipy.spatial.distance import cdist\n", "from dowker_homology.datasets import clifford_torus\n", "from dowker_homology.persistence import Persistence\n", "\n", "# generate data\n", "coords = clifford_torus(40)\n", "dists = cdist(coords, coords, metric='euclidean')\n", "\n", "# set parameters\n", "params = {'dimension': 1, \n", " 'multiplicative_interleaving': 2.5, \n", " 'dissimilarity': 'metric'}\n", "\n", "# initiate persistence objects\n", "pers_no_no = Persistence(truncation_method='no_truncation', \n", " restriction_method='no_restriction', \n", " **params)\n", "pers_no_dowker = Persistence(truncation_method='no_truncation', \n", " restriction_method='dowker', \n", " **params)\n", "pers_no_parent = Persistence(truncation_method='no_truncation', \n", " restriction_method='Parent', \n", " **params)\n", "\n", "pers_dowker_no = Persistence(truncation_method='dowker', \n", " restriction_method='no_restriction', \n", " **params)\n", "pers_dowker_dowker = Persistence(truncation_method='dowker',\n", " restriction_method='dowker', \n", " **params)\n", "pers_dowker_parent = Persistence(truncation_method='dowker', \n", " restriction_method='Parent', \n", " **params)\n", "\n", "pers_canonical_no = Persistence(truncation_method='Canonical', \n", " restriction_method='no_restriction',\n", " **params)\n", "pers_canonical_dowker = Persistence(truncation_method='Canonical',\n", " restriction_method='dowker', \n", " **params)\n", "pers_canonical_parent = Persistence(truncation_method='Canonical', \n", " restriction_method='Parent', \n", " **params)\n", "\n", "pers_sheehy_no = Persistence(truncation_method='Sheehy',\n", " restriction_method='no_restriction', \n", " **params)\n", "pers_sheehy_dowker = Persistence(truncation_method='Sheehy', \n", " restriction_method='dowker',\n", " **params)\n", "pers_sheehy_parent = Persistence(truncation_method='Sheehy', \n", " restriction_method='Parent', \n", " **params)\n", "pers_sheehy_sheehy = Persistence(truncation_method='Sheehy', \n", " restriction_method='Sheehy', \n", " **params)\n", "\n", "# calculate persistent homology\n", "pers_no_no.persistent_homology(X=dists)\n", "pers_no_dowker.persistent_homology(X=dists)\n", "pers_no_parent.persistent_homology(X=dists)\n", "\n", "pers_dowker_no.persistent_homology(X=dists)\n", "pers_dowker_dowker.persistent_homology(X=dists)\n", "pers_dowker_parent.persistent_homology(X=dists)\n", "\n",
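"# persistent homology for the 'Canonical' and 'Sheehy' truncation variants\n",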
"pers_canonical_no.persistent_homology(X=dists)\n", "pers_canonical_dowker.persistent_homology(X=dists)\n", "pers_canonical_parent.persistent_homology(X=dists)\n", "\n", "pers_sheehy_no.persistent_homology(X=dists)\n", "pers_sheehy_dowker.persistent_homology(X=dists)\n", "pers_sheehy_parent.persistent_homology(X=dists)\n", "pers_sheehy_sheehy.persistent_homology(X=dists)\n", "\n", "# setup the summary function\n", "def summarize_one(pers):\n", " cardinality = pers.cardinality_information(verbose = False)\n", " return pd.DataFrame(\n", " {'truncation': [pers.truncation_method],\n", " 'restriction': [pers.restriction_method],\n", " 'unreduced': [cardinality.get('unreduced_cardinality', np.nan)], \n", " 'reduced': [cardinality.get('reduced_cardinality', np.nan)]}, \n", " columns = ['truncation', 'restriction', 'unreduced', 'reduced'])\n", "\n", "def summarize_sizes(*args):\n", " return pd.concat([summarize_one(arg) for arg in args])\n", "\n", "# sizes\n", "summarize_sizes(pers_no_no, \n", " pers_no_dowker,\n", " pers_no_parent,\n", " pers_dowker_no,\n", " pers_dowker_dowker,\n", " pers_dowker_parent,\n", " pers_canonical_no,\n", " pers_canonical_dowker,\n", " pers_canonical_parent,\n", " pers_sheehy_no,\n", " pers_sheehy_dowker,\n", " pers_sheehy_parent,\n", " pers_sheehy_sheehy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Persistence in pipelines\n", "----------------------------\n", "We also provide a seperate class `PersistenceTransformer` that can be used to integrate persistence in a `sklearn` pipline. The main method `fit` returns the persistence diagrams that can be used for example in the `persim` package. The example below shows how a `PersistenceTransformer` can be used to predict the parameters of a linked twist map. Note that this is a toy example and larger datasets are needed to get reasonable accuracies." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.base import BaseEstimator\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import accuracy_score\n", "from persim import PersImage\n", "from dowker_homology.persistence import PersistenceTransformer\n", "from dowker_homology.datasets import linked_twist_map\n", "\n", "# generate data\n", "np.random.seed(1)\n", "N = 100\n", "k = 5\n", "r_values = np.repeat([2.0, 3.5, 4.0, 4.1, 4.3], k)\n", "labels = r_values.astype('str')\n", "data = [linked_twist_map(N=N, r=r) for r in r_values]\n", "\n", "# train test split\n", "(X_train, X_test, y_train, y_test) = train_test_split(\n", " data, labels, test_size=0.33, random_state=42)\n", "\n", "# some pipeline steps\n", "class PrepareDiagrams(BaseEstimator):\n", " def __init__(self):\n", " pass\n", " def fit(self, X, y=None):\n", " pass\n", " def transform(self, X, y=None):\n", " dgm_result_list = list()\n", " for dgms in X:\n", " dgm_result = list()\n", " for dgm in dgms:\n", " dgm = dgm[np.isfinite(dgm[:, 1])]\n", " dgm_result.append(dgm)\n", " dgm_result_list.append(dgm_result)\n", " return dgm_result_list\n", " def fit_transform(self, X, y=None):\n", " return self.transform(X=X, y=y)\n", " \n", "class PersistenceImage(BaseEstimator):\n", " def __init__(self, spread=1, pixels=[10, 10]):\n", " self.spread = spread\n", " self.pixels = pixels\n", " self.pim0 = PersImage(spread=self.spread, \n", " pixels=self.pixels, \n", " verbose=False)\n", " self.pim1 = PersImage(spread=self.spread, \n", " pixels=self.pixels, \n", " verbose=False)\n", " def fit(self, X, y=None):\n", " pass\n", " def transform(self, X, y=None):\n", " img_0d = self.pim0.transform([dgm[0] for dgm in X])\n", " img_1d = self.pim1.transform([dgm[1] for dgm in X])\n", " img = [np.hstack((im0, im1)) for im0, im1 in zip(img_0d, img_1d)]\n", " image_dta = np.vstack([np.hstack([im.ravel() for im in image]) \n", " for image in img])\n", " self.img_0d = img_0d\n", " self.img_1d = img_1d \n", " return image_dta\n", " def fit_transform(self, X, y=None):\n", " return self.transform(X=X, y=y)\n", "\n", "# define pipeline\n", "pers = PersistenceTransformer(\n", " dissimilarity='euclidean', \n", " multiplicative_interleaving=1.5)\n", "prep = PrepareDiagrams()\n", "pim = PersistenceImage(spread=1, pixels=[10,10])\n", "rf = RandomForestClassifier(n_estimators=100)\n", "pipe = Pipeline(steps=[('pers', pers), \n", " ('prep', prep), \n", " ('pim', pim), \n", " ('rf', rf)])\n", "\n", "# fit model\n", "pipe.fit(X_train, y_train)\n", "\n", "# predictions\n", "y_pred = pipe.predict(X_test)\n", "\n", "# accuracy\n", "acc = accuracy_score(y_test, y_pred) \n", "print('Accuracy: {0:.2f}'.format(acc))" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:dowker]", "language": "python", "name": "conda-env-dowker-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }