1309 lines
1.8 MiB
Plaintext
1309 lines
1.8 MiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# PCA and cluster quality analysis"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Install interactive matplotlib for Jupyter with ipympl\n",
|
|||
|
"\n",
|
|||
|
"Run this in Jupyter:\n",
|
|||
|
"\n",
|
|||
|
"```\n",
|
|||
|
"!pip install ipympl\n",
|
|||
|
"```\n",
|
|||
|
"\n",
|
|||
|
"And then **restart the Jupyter Lab server**."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"# %matplotlib notebook\n",
|
|||
|
"%matplotlib ipympl\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import seaborn as sns"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## A 'single cell seq' type dataset\n",
|
|||
|
"\n",
|
|||
|
"We are going to skip a bunch of biology and wet lab realities and generate a toy dataset with many characteristics of a real single cell sequencing experiment. In these types of experiments, RNA is extracted from many individual cells and the number of copies (also known as number of \"reads\") of RNA molecules for particular genes is quantified. Here we generate a dataset in which we quantify the number of reads for 200 genes from many cells which belong to some number of cell types. Typically in such a 'single cell seq' dataset, the exact number of cell types are not known in advance nor is the the correspondence between cells and cell types. Typically we need to figure this out through data analysis. We are going to use PCA and clustering to perform these jobs."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/var/folders/8r/wczrm19j599_qsgyft707_wh0000gr/T/ipykernel_82795/2160947091.py:16: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)\n",
|
|||
|
" result[i,j] = value\n",
|
|||
|
"/var/folders/8r/wczrm19j599_qsgyft707_wh0000gr/T/ipykernel_82795/2160947091.py:17: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)\n",
|
|||
|
" result[j,i] = value\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "22c11184370a4846808438842d01b162",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOz9eXid9X3nD7/u++y79n2xvK9gY2xjEQJpWJKWkDad0CkJTfvLJMykU4YZ0mbSPNOSNEMnzHOldEibKZn8ShqyPJ1m0kAzoYE0rDYYjCVbsixvWqx9Ozr7di/PH8f3zdHRkXS02Jas7+u6uADpXr7n1tH5vvXZ3pKu6zoCgUAgEAgEgnWDfK0XIBAIBAKBQCC4uggBKBAIBAKBQLDOEAJQIBAIBAKBYJ0hBKBAIBAIBALBOkMIQIFAIBAIBIJ1hhCAAoFAIBAIBOsMIQAFAoFAIBAI1hlCAAoEAoFAIBCsM4QAFAgEAoFAIFhnCAEoEAgEAoFAsM4QAlAgEAgEAoFgnSEEoEAgEAgEAsE6QwhAgUAgEAgEgnWGEIACgUAgEAgE6wwhAAUCgUAgEAjWGUIACgQCgUAgEKwzhAAUCAQCgUAgWGcIASgQCAQCgUCwzhACUCAQCAQCgWCdIQSgQCAQCAQCwTpDCECBQCAQCASCdYYQgAKBQCAQCATrDCEABQKBQCAQCNYZQgAKBAKBQCAQrDOEABQIBAKBQCBYZwgBKBAIBAKBQLDOEAJQIBAIBAKBYJ0hBKBAIBAIBALBOkMIQIFAIBAIBIJ1hhCAAoFAIBAIBOsMIQAFAoFAIBAI1hlCAAoEAoFAIBCsM4QAFAgEAoFAIFhnCAEoEAgEAoFAsM4QAlAgEAgEAoFgnSEEoEAgEAgEAsE6QwhAgUAgEAgEgnWGEIACgUAgEAgE6wwhAAUCgUAgEAjWGUIACgQCgUAgEKwzhAAUCAQCgUAgWGcIASgQCAQCgUCwzhACUCAQCAQCgWCdIQSgQCAQCAQCwTpDCECBQCAQCASCdYYQgAKBQCAQCATrDCEABQKBQCAQCNYZQgAKBAKBQCAQrDOEABQIBAKBQCBYZwgBKBAIBAKBQLDOEAJQIBAIBAKBYJ0hBKBAIBAIBALBOkMIQIFAIBAIBIJ1hhCAAoFAIBAIBOsMIQAFAoFAIBAI1hlCAAoEAoFAIBCsM4QAFAgEAoFAIFhnCAEoEAgEAoFAsM4QAlAgEAgEAoFgnSEEoEAgEAgEAsE6QwhAgUAgEAgEgnWGEIACgUAgEAgE6wwhAAUCgUAgEAjWGUIACgQCgUAgEKwzhAAUCAQCgUAgWGcIASgQCAQCgUCwzhACUCAQCAQCgWCdIQSgQCAQCAQCwTpDCECBQCAQCASCdYYQgAKBQCAQCATrDCEABQKBQCAQCNYZQgAKBAKBQCAQrDOEABQIBAKBQCBYZwgBKBAIBAKBQLDOEAJQIBAIBAKBYJ0hBKBAIBAIBALBOkMIQIFAIBAIBIJ1hhCAAoFAIBAIBOsMIQAFAoFAIBAI1hlCAAoEAoFAIBCsM4QAFAgEAoFAIFhnCAEoEAgEAoFAsM4QAlAgEAgEAoFgnSEEoEAgEAgEAsE6w3qtFyAQCAS6rqOqKgAWiwVJkq7xigQCgeD6RghAgUBwTdE0jUwmQyKRQNd1ZFnGZrNhsViwWq3IsiwEoUAgEKwwkq7r+rVehEAgWH8YUT9FUUwRaHwcaZoGgCRJpiC0Wq1YLBYhCAUCgWAFEAJQIBBcdXRdJ5PJmGlfSZJIp9PmfxvHGP8IQSgQCAQrixCAAoHgqqJpGul0Gk3TTPGm6/osAZhPviAcHBzE7XZTWVmJ1WoVglAgEAgWgagBFAgEVwUj5WukevOFmiEE50KSJPN4i8VCJBJBkiQ0TSOVSpFMJpFlGVmWhSAUCASCBRACUCAQXHE0TUNRFDPluxKizBCEFosFeC9CqKoqqqqSSqXMlLEQhAKBQDATIQAFAsEVw0jXGlG/3CjeSl3fwLi2LMvm93IbTXIFoyEGrVbriq9JIBAI1gJCAAoEgiuCrusoioKiKAArLrQWutZcglBRFDKZzCxBaIhCIQgFAsF6QAhAgUCw4hhRv9dff52bbroJt9t9Re6zmB62xQhCYw6hkTIWCASC6w0hAAUCwYqRP9svFotdsXutVA3hXIIQmFU/KAShQCC4XhACUCAQrAj5s/1yR7wshKqqDA4O4nA4KCkpwWot7qNpJadYzSUIM5mMOaJGCEKBQHC9IASgQCBYNoVm+8HCo10AYrEYbW1tqKpqjnTx+XyUlpZSWlpKIBAwO31zudJ1eoUEoZHaNiKEkiQJQSgQCNYkQgAKBIIls9zZfkNDQ3R2dtLQ0EBLSwuSJJFMJpmeniYYDNLV1UU6nSYQCJiC0O/3zxBlV4vckTPGvRcShEaXsUAgEKw2hAAUCARLYq6Uby7GoOZ8FEXh9OnTjI+Pc+ONN1JZWUkmk0HTNFwuFy6Xi9raWnRdJ5FIEAwGCQaDDAwMoKoqJSUlpug0oo5Xm/kEYTqdNqOHhbqMBQKB4FojBKBAIFg0RtQvP+WbTyFhFolEaGtrw263c+utt+J0OueM5EmShNvtxu12U19fj67rxGIxgsEgly5dYmxsjPHxcUpKSswIodfrvSYiayFBqGkawWCQ2traGV3GQhAKBIJrgRCAAoGgaPJn+y3kqpEbAdR1nUuXLtHd3c2GDRvYvHnzosWPJEl4vV68Xi/xeBxZlqmpqTEjhD09PciyPEMQut3uay4IdV0nlUpx+vRpysvLZ0QI82sIhSAUCARXAyEABQJBURjRLEPQFTsw2UgVd3R0MD09zU033UR5efmy12Pc3+fz4fP5aGpqQtM0IpEIwWCQ8fFxzp8/j9VqNcVgSUkJLpfrqous3GeV2+FsNM/MZVsnBKFAILhSCAEoEAjmJTeVuVDKNx9ZlolEIpw6dQqPx8Ott96K3W5f0bXl3y8QCBAIBNiwYQOaphEKhQgGg4yMjNDd3Y3dbjcFYWlpKU6nc8XWU+xajeeXGyEEzC7o+cbOCEEoEAhWAiEABQLBnOi6TigUIp1O4/P5FiX+jMhfd3c3W7ZsYcOGDVfVCg6yAsoQepCtXTQE4eDgIGfOnMHpdM4QhCspUIslVxBaLBZzBqGROs4VhEb9oNVqXdTPQyAQCHIRAlAgEBTEiPoNDQ0RiUTYu3dv0eemUilOnTqFoihs27aNDRs2XJE1LnYMjMVioaysjLKyMiDbjWyMnOnr66OzsxOPxzMjZWyz2VZ0zcUIttyUcb4gTCaT5jGGIDQihEIQCgSCYhECUCAQzCDfzk2W5UUJrcnJSU6ePElpaSkej+eK+QAX6zIyH1arlYqKCioqKgDIZDKmILx48SKxWAyv1ztDEBbrUpLPctYqBKFAIFhphAAUCAQmhWb7FSsANU3jwoUL9Pb2sm3bNhobGzl69GhR5xbbUFJovSuJzWajsrKSyspKIBvJNAThuXPnSCaTRbmUXGmKFYT5MwiFIBQIBAZCAAoEAmDu2X7FRNqSySTt7e2k02luueUWfD5f0eculashZBwOB9XV1VRXVwPZ12mMnMl1KTHGzgQCgXmHUl+pNc8lCI2mkmQyaYp5IQgFAgEIASgQrHtyZ/sVsnOTZbmgm4fB2NgYp06doqqqiv37989IkV5JAWis/WridDqpra2d4VJiRAiHhoZQFGWGbZ3ROHO115ofUTUEoaqqqKo659gZIQgFgvWDEIACwTpG0zQURVnQzq2QeNE0je7ubgYGBti1axd1dXWzjlnrEcCF7m+4lNTV1aHrOvF43IwQ9vf3o+u6GR10uVxXXbDmrtUQfDC3IDRSxrk+xtf6OQsEgiuDEIACwTokd7afruvzbvSFRFw8HqetrQ2A1tZWPB5P0eeuJNdKUBVCkiQ8Hg8ej4eGhgZ0XScajc5wKQE4deqUKQo9Hs81cykpJAgVRSGTyZjfL+RjLAShQHB9IASgQLDOyG/0WGhTz08BDw8P09nZSX19Pdu2bVuw5q1YkbYUW7jVTL5LSSwW49ixY/j9fiYnJ7lw4QIWi2XGDMJ
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Generate some toy data with known characteristics.\n",
|
|||
|
"#\n",
|
|||
|
"# NOTE: Don't read this code upon first run-through of the notebook. This\n",
|
|||
|
"# would spoil the rest of the work we do below to \"discover\" what was done\n",
|
|||
|
"# here in the data generation step.\n",
|
|||
|
"\n",
|
|||
|
"n_samples_per_cluster = 100\n",
|
|||
|
"n_dim = 200\n",
|
|||
|
"n_clusters = 4\n",
|
|||
|
"\n",
|
|||
|
"def gen_covar_mat(ndim):\n",
|
|||
|
" result = np.zeros((ndim,ndim))\n",
|
|||
|
" for i in range(ndim):\n",
|
|||
|
" for j in range(i,ndim):\n",
|
|||
|
" value = np.random.randn(1,1)[0]\n",
|
|||
|
" result[i,j] = value\n",
|
|||
|
" result[j,i] = value\n",
|
|||
|
" return result\n",
|
|||
|
"\n",
|
|||
|
"np.random.seed(1)\n",
|
|||
|
"\n",
|
|||
|
"X_train = []\n",
|
|||
|
"for i in range(n_clusters):\n",
|
|||
|
"\n",
|
|||
|
" center = 10*np.random.randn(n_dim) + np.array([10]*n_dim)\n",
|
|||
|
" \n",
|
|||
|
" C = gen_covar_mat(n_dim)\n",
|
|||
|
" stretched_gaussian = np.dot(np.random.randn(n_samples_per_cluster, n_dim), C) + center\n",
|
|||
|
"\n",
|
|||
|
" big_positive = (100*stretched_gaussian + np.array([1000]*n_dim))\n",
|
|||
|
" \n",
|
|||
|
" big_positive = np.clip(big_positive,0, np.inf)\n",
|
|||
|
" big_positive = np.round(big_positive) \n",
|
|||
|
" X_train.append( big_positive )\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"# concatenate the datasets into the final training set\n",
|
|||
|
"X_train = np.vstack(X_train)\n",
|
|||
|
"\n",
|
|||
|
"# shuffle the rows\n",
|
|||
|
"idxs = np.arange(len(X_train))\n",
|
|||
|
"idxs = np.random.shuffle(idxs)\n",
|
|||
|
"X_train = X_train[idxs][0]\n",
|
|||
|
"\n",
|
|||
|
"x = np.linspace(-20., 30.)\n",
|
|||
|
"y = np.linspace(-20., 40.)\n",
|
|||
|
"X, Y = np.meshgrid(x, y)\n",
|
|||
|
"XX = np.array([X.ravel(), Y.ravel()]).T\n",
|
|||
|
"\n",
|
|||
|
"# plt.figure()\n",
|
|||
|
"# plt.scatter(X_train[:, 0], X_train[:, 1], .8)\n",
|
|||
|
"# plt.xlabel(\"num reads (gene 0)\")\n",
|
|||
|
"# plt.ylabel(\"num reads (gene 1)\");\n",
|
|||
|
"\n",
|
|||
|
"fig = plt.figure()\n",
|
|||
|
"\n",
|
|||
|
"ax = fig.add_subplot(111, projection='3d')\n",
|
|||
|
"\n",
|
|||
|
"ax.scatter(X_train[:, 0], X_train[:, 1], X_train[:, 2], '.')\n",
|
|||
|
"ax.set_xlabel('gene 0')\n",
|
|||
|
"ax.set_ylabel('gene 1')\n",
|
|||
|
"ax.set_zlabel('gene 2');"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"\n",
|
|||
|
"data = {}\n",
|
|||
|
"for i in range(n_dim):\n",
|
|||
|
" name = 'gene %d'%i\n",
|
|||
|
" data[name] = X_train[:,i].astype(np.int64)\n",
|
|||
|
"\n",
|
|||
|
"df = pd.DataFrame(data=data)\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Now let's have a first look at this data."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>gene 0</th>\n",
|
|||
|
" <th>gene 1</th>\n",
|
|||
|
" <th>gene 2</th>\n",
|
|||
|
" <th>gene 3</th>\n",
|
|||
|
" <th>gene 4</th>\n",
|
|||
|
" <th>gene 5</th>\n",
|
|||
|
" <th>gene 6</th>\n",
|
|||
|
" <th>gene 7</th>\n",
|
|||
|
" <th>gene 8</th>\n",
|
|||
|
" <th>gene 9</th>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <th>gene 190</th>\n",
|
|||
|
" <th>gene 191</th>\n",
|
|||
|
" <th>gene 192</th>\n",
|
|||
|
" <th>gene 193</th>\n",
|
|||
|
" <th>gene 194</th>\n",
|
|||
|
" <th>gene 195</th>\n",
|
|||
|
" <th>gene 196</th>\n",
|
|||
|
" <th>gene 197</th>\n",
|
|||
|
" <th>gene 198</th>\n",
|
|||
|
" <th>gene 199</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>6329</td>\n",
|
|||
|
" <td>2843</td>\n",
|
|||
|
" <td>211</td>\n",
|
|||
|
" <td>1152</td>\n",
|
|||
|
" <td>3990</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>5046</td>\n",
|
|||
|
" <td>1014</td>\n",
|
|||
|
" <td>2576</td>\n",
|
|||
|
" <td>554</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>2261</td>\n",
|
|||
|
" <td>2760</td>\n",
|
|||
|
" <td>6334</td>\n",
|
|||
|
" <td>4104</td>\n",
|
|||
|
" <td>3510</td>\n",
|
|||
|
" <td>194</td>\n",
|
|||
|
" <td>2165</td>\n",
|
|||
|
" <td>544</td>\n",
|
|||
|
" <td>2967</td>\n",
|
|||
|
" <td>4166</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>4196</td>\n",
|
|||
|
" <td>2854</td>\n",
|
|||
|
" <td>2430</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>6139</td>\n",
|
|||
|
" <td>1301</td>\n",
|
|||
|
" <td>4882</td>\n",
|
|||
|
" <td>1231</td>\n",
|
|||
|
" <td>393</td>\n",
|
|||
|
" <td>1589</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>3136</td>\n",
|
|||
|
" <td>414</td>\n",
|
|||
|
" <td>4671</td>\n",
|
|||
|
" <td>2862</td>\n",
|
|||
|
" <td>3311</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>4260</td>\n",
|
|||
|
" <td>3011</td>\n",
|
|||
|
" <td>3655</td>\n",
|
|||
|
" <td>2244</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1083</td>\n",
|
|||
|
" <td>3509</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1371</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2117</td>\n",
|
|||
|
" <td>2769</td>\n",
|
|||
|
" <td>3874</td>\n",
|
|||
|
" <td>790</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>2525</td>\n",
|
|||
|
" <td>1306</td>\n",
|
|||
|
" <td>3889</td>\n",
|
|||
|
" <td>3890</td>\n",
|
|||
|
" <td>2942</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1147</td>\n",
|
|||
|
" <td>1706</td>\n",
|
|||
|
" <td>2621</td>\n",
|
|||
|
" <td>2170</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>5216</td>\n",
|
|||
|
" <td>215</td>\n",
|
|||
|
" <td>2992</td>\n",
|
|||
|
" <td>729</td>\n",
|
|||
|
" <td>2003</td>\n",
|
|||
|
" <td>348</td>\n",
|
|||
|
" <td>1528</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1487</td>\n",
|
|||
|
" <td>696</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>4419</td>\n",
|
|||
|
" <td>620</td>\n",
|
|||
|
" <td>4851</td>\n",
|
|||
|
" <td>4084</td>\n",
|
|||
|
" <td>4514</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>233</td>\n",
|
|||
|
" <td>3176</td>\n",
|
|||
|
" <td>1183</td>\n",
|
|||
|
" <td>2661</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>1370</td>\n",
|
|||
|
" <td>95</td>\n",
|
|||
|
" <td>1812</td>\n",
|
|||
|
" <td>172</td>\n",
|
|||
|
" <td>3696</td>\n",
|
|||
|
" <td>3285</td>\n",
|
|||
|
" <td>4202</td>\n",
|
|||
|
" <td>1321</td>\n",
|
|||
|
" <td>4099</td>\n",
|
|||
|
" <td>1066</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>2787</td>\n",
|
|||
|
" <td>566</td>\n",
|
|||
|
" <td>4824</td>\n",
|
|||
|
" <td>4440</td>\n",
|
|||
|
" <td>4346</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2130</td>\n",
|
|||
|
" <td>1498</td>\n",
|
|||
|
" <td>2840</td>\n",
|
|||
|
" <td>3426</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>...</th>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>395</th>\n",
|
|||
|
" <td>1120</td>\n",
|
|||
|
" <td>2184</td>\n",
|
|||
|
" <td>3273</td>\n",
|
|||
|
" <td>4078</td>\n",
|
|||
|
" <td>1391</td>\n",
|
|||
|
" <td>4020</td>\n",
|
|||
|
" <td>926</td>\n",
|
|||
|
" <td>1008</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2771</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1411</td>\n",
|
|||
|
" <td>2680</td>\n",
|
|||
|
" <td>3576</td>\n",
|
|||
|
" <td>1719</td>\n",
|
|||
|
" <td>3002</td>\n",
|
|||
|
" <td>3094</td>\n",
|
|||
|
" <td>2685</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1499</td>\n",
|
|||
|
" <td>3172</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>396</th>\n",
|
|||
|
" <td>200</td>\n",
|
|||
|
" <td>2231</td>\n",
|
|||
|
" <td>3760</td>\n",
|
|||
|
" <td>2140</td>\n",
|
|||
|
" <td>639</td>\n",
|
|||
|
" <td>2361</td>\n",
|
|||
|
" <td>2165</td>\n",
|
|||
|
" <td>1072</td>\n",
|
|||
|
" <td>659</td>\n",
|
|||
|
" <td>2392</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>3884</td>\n",
|
|||
|
" <td>3814</td>\n",
|
|||
|
" <td>4327</td>\n",
|
|||
|
" <td>1997</td>\n",
|
|||
|
" <td>71</td>\n",
|
|||
|
" <td>1551</td>\n",
|
|||
|
" <td>4396</td>\n",
|
|||
|
" <td>954</td>\n",
|
|||
|
" <td>3456</td>\n",
|
|||
|
" <td>2964</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>397</th>\n",
|
|||
|
" <td>2819</td>\n",
|
|||
|
" <td>1761</td>\n",
|
|||
|
" <td>3465</td>\n",
|
|||
|
" <td>5406</td>\n",
|
|||
|
" <td>2306</td>\n",
|
|||
|
" <td>5221</td>\n",
|
|||
|
" <td>3042</td>\n",
|
|||
|
" <td>586</td>\n",
|
|||
|
" <td>465</td>\n",
|
|||
|
" <td>2791</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>366</td>\n",
|
|||
|
" <td>2801</td>\n",
|
|||
|
" <td>4754</td>\n",
|
|||
|
" <td>4591</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1058</td>\n",
|
|||
|
" <td>1972</td>\n",
|
|||
|
" <td>2719</td>\n",
|
|||
|
" <td>1035</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>398</th>\n",
|
|||
|
" <td>733</td>\n",
|
|||
|
" <td>2635</td>\n",
|
|||
|
" <td>3357</td>\n",
|
|||
|
" <td>5571</td>\n",
|
|||
|
" <td>567</td>\n",
|
|||
|
" <td>4203</td>\n",
|
|||
|
" <td>2995</td>\n",
|
|||
|
" <td>256</td>\n",
|
|||
|
" <td>974</td>\n",
|
|||
|
" <td>5646</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>1461</td>\n",
|
|||
|
" <td>2347</td>\n",
|
|||
|
" <td>1631</td>\n",
|
|||
|
" <td>3641</td>\n",
|
|||
|
" <td>160</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>1409</td>\n",
|
|||
|
" <td>4058</td>\n",
|
|||
|
" <td>835</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>399</th>\n",
|
|||
|
" <td>1384</td>\n",
|
|||
|
" <td>1007</td>\n",
|
|||
|
" <td>3057</td>\n",
|
|||
|
" <td>2834</td>\n",
|
|||
|
" <td>2532</td>\n",
|
|||
|
" <td>2526</td>\n",
|
|||
|
" <td>2162</td>\n",
|
|||
|
" <td>2050</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>2044</td>\n",
|
|||
|
" <td>...</td>\n",
|
|||
|
" <td>0</td>\n",
|
|||
|
" <td>3139</td>\n",
|
|||
|
" <td>2886</td>\n",
|
|||
|
" <td>5599</td>\n",
|
|||
|
" <td>1274</td>\n",
|
|||
|
" <td>2532</td>\n",
|
|||
|
" <td>2884</td>\n",
|
|||
|
" <td>3167</td>\n",
|
|||
|
" <td>1446</td>\n",
|
|||
|
" <td>1671</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"<p>400 rows × 200 columns</p>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" gene 0 gene 1 gene 2 gene 3 gene 4 gene 5 gene 6 gene 7 gene 8 \\\n",
|
|||
|
"0 6329 2843 211 1152 3990 0 5046 1014 2576 \n",
|
|||
|
"1 4196 2854 2430 0 6139 1301 4882 1231 393 \n",
|
|||
|
"2 0 1083 3509 0 1371 0 2117 2769 3874 \n",
|
|||
|
"3 5216 215 2992 729 2003 348 1528 0 1487 \n",
|
|||
|
"4 1370 95 1812 172 3696 3285 4202 1321 4099 \n",
|
|||
|
".. ... ... ... ... ... ... ... ... ... \n",
|
|||
|
"395 1120 2184 3273 4078 1391 4020 926 1008 0 \n",
|
|||
|
"396 200 2231 3760 2140 639 2361 2165 1072 659 \n",
|
|||
|
"397 2819 1761 3465 5406 2306 5221 3042 586 465 \n",
|
|||
|
"398 733 2635 3357 5571 567 4203 2995 256 974 \n",
|
|||
|
"399 1384 1007 3057 2834 2532 2526 2162 2050 0 \n",
|
|||
|
"\n",
|
|||
|
" gene 9 ... gene 190 gene 191 gene 192 gene 193 gene 194 gene 195 \\\n",
|
|||
|
"0 554 ... 2261 2760 6334 4104 3510 194 \n",
|
|||
|
"1 1589 ... 3136 414 4671 2862 3311 0 \n",
|
|||
|
"2 790 ... 2525 1306 3889 3890 2942 0 \n",
|
|||
|
"3 696 ... 4419 620 4851 4084 4514 0 \n",
|
|||
|
"4 1066 ... 2787 566 4824 4440 4346 0 \n",
|
|||
|
".. ... ... ... ... ... ... ... ... \n",
|
|||
|
"395 2771 ... 1411 2680 3576 1719 3002 3094 \n",
|
|||
|
"396 2392 ... 3884 3814 4327 1997 71 1551 \n",
|
|||
|
"397 2791 ... 366 2801 4754 4591 0 0 \n",
|
|||
|
"398 5646 ... 1461 2347 1631 3641 160 0 \n",
|
|||
|
"399 2044 ... 0 3139 2886 5599 1274 2532 \n",
|
|||
|
"\n",
|
|||
|
" gene 196 gene 197 gene 198 gene 199 \n",
|
|||
|
"0 2165 544 2967 4166 \n",
|
|||
|
"1 4260 3011 3655 2244 \n",
|
|||
|
"2 1147 1706 2621 2170 \n",
|
|||
|
"3 233 3176 1183 2661 \n",
|
|||
|
"4 2130 1498 2840 3426 \n",
|
|||
|
".. ... ... ... ... \n",
|
|||
|
"395 2685 0 1499 3172 \n",
|
|||
|
"396 4396 954 3456 2964 \n",
|
|||
|
"397 1058 1972 2719 1035 \n",
|
|||
|
"398 0 1409 4058 835 \n",
|
|||
|
"399 2884 3167 1446 1671 \n",
|
|||
|
"\n",
|
|||
|
"[400 rows x 200 columns]"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# first impressions of the data\n",
|
|||
|
"\n",
|
|||
|
"So, how does our data look? At first glance it looks... like a bunch of random numbers with no real structure! But could there be some structure? For example, above we learned that although many cells have been sequenced, we expect these are from only a very limited number of cell types.\n",
|
|||
|
"\n",
|
|||
|
"How can we figure out something about these cell types?\n",
|
|||
|
"\n",
|
|||
|
"Let's make use of principal component analysis (PCA) and clustering from scikit-learn as some of the first tools in our toolkit."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Converting to plain numpy\n",
|
|||
|
"\n",
|
|||
|
"While Pandas is very convenient for many things, scikit learn uses plain numpy arrays and generally works best when the datatype is a floating point number rather than an integer. Let's do this conversion now and call our data `X`.\n",
|
|||
|
"\n",
|
|||
|
"We need to make a copy of this so that it is \"C contiguous\" (this is the default numpy layout of the data but when creating a view of other data, the internal layout can be different)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"X = df.to_numpy(dtype=np.float64).copy()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## PCA\n",
|
|||
|
"\n",
|
|||
|
"Let's first run PCA on our data."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn.decomposition import PCA\n",
|
|||
|
"pca = PCA().fit(X)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The results of our analysis are stored in the variable `pca`. We can use this to project our original high dimensional data into its principle components and plot just the first dimensions in this \"principle component space\"."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"projected = pca.transform(X)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "386262ffe49340a08a864a56cd073fe8",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABceElEQVR4nO3de3yU9Z33//ckkBgijIEgSSSQyCpoOYhIIYBgaitQkQXtLYjLrb2VaimeKQa9tx62SnBb670qapGt3a4C3QXcX6tQ7AMrxgBGBA0qHjAIQihEw0QhEkyu3x9xxjlc15wy5+v1fDx4tMxcM3PNSK685/P9fj9fh2EYhgAAAGAbWck+AQAAACQWARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEAACwGQIgAACAzRAAAQAAbIYACAAAYDMEQAAAAJvpluwTSGcdHR06ePCgevbsKYfDkezTAQAAYTAMQ1988YVKSkqUlWXPWhgBsAsOHjyo0tLSZJ8GAACIwv79+9W/f/9kn0ZSEAC7oGfPnpI6/wH16tUryWcDAADC0dLSotLSUs/vcTsiAHaBe9i3V69eBEAAANKMnadv2XPgGwAAwMYIgAAAADZDAAQAALAZAiAAAIDNEAABAABshgAIAABgMwRAAAAAmyEAAgAA2AwBEAAAwGYIgAAAADZDAAQAL42uVtXuaVKjqzXZpwIAccNewADwjdV1+7R4bb06DCnLIS25fJhmjR6Q7NMCgJijAggA6qz8ucOfJHUY0l1rd1EJBJCRCIAAIKmh6Zgn/Lm1G4b2Nh1PzgkBQBwRAAFAUnlhvrIcvrdlOxwqK+yRnBMCgDgiAAKApGJnnpZcPkzZjs4UmO1w6MHLh6rYmZfkMwOA2GMRCAB8Y9boAZp4dl/tbTqussIehD8AGYsACABeip15BD8AGY8hYAAAAJshAAIAANgMARAAAMBmCIAAAAA2QwAEgDhgT2EAqYxVwABSXqOrVQ1Nx1RemJ+SK3T9z489hQGkOgIggJSW6mHK//zunDpES9fvDthTeOLZfVMyvAKwJ4aAAaSsRlerJ1xJ34apVBlWNTs/7/Dnxp7CAFINARBAympoOhbXMNXVeXpm59dhSH5bCrOnMICUwxAwgJRVXpivLId8QlaswlQshpatzu/GSWfqib/tUYfYUxhAaqICCCBlFTvztOTyYcp2dNbU3GFKUpcqd7EaWjY7vxkjS/TEK53hz+GQFk0dnFJzFgFAogIIIMXNGj1AE8/uq71Nx1VW2EObPzii8dWbulS5Cza0HGmlzvv8euRkaeayWs9zG4b00Pr3NX1ECRVAACmFCiAASfHvW9eV5y925qliUB9Jiknlzj10660rQ8vu8zvW1s4CEABpgQoggLi3WonV88eqcuceur1r7S61G0bM5unFc86ilPr9EAGkDwIgYHNW8+Fi1bculs8fy4DlP7Qci/car2AppX4/RADphQAI2Fws58PF+/ljHbCKnXkxr6TFI1hGE6KpFgIIhgAI2Fy8hy1j/fzxCFix5g6W7nmPXQ1hkYZoqoUAQmERCGBzVq1WYhWszJ5/0ZTBamg6FvWCE/eiC/c5xnsBSzRW1+3T+OpNmrN8m8ZXb9Lqun1RP1ewRSv+7z3Vd08BkBochmEYoQ+DmZaWFjmdTrlcLvXq1SvZpwN0SaOrNa5VNffzv33gqGe7tFhUp+JV7erKEGqjq9XTqsYt2+FQTVVl1J/t6rp9AUPfkgLee2nvHpqzfFvA41fOG+tZSQ3YHb+/GQIG8I14zIfzf35JuvrprTFbcBKvBSxdDZXxmFfpP/QtySdkut/72vkVcR3SB5AZGAIGkDCx3ts31PNFMzQciyHUWPcZdPMe+rZ678fbOuI6pA8gM1ABBJAw5YX5ckjyzi1dCUbBFpiEU8UzG+aNRfUunu1g3PJzsk1v75GTlRYLZQAkFwEQQMJs/uCIz98dUpfbuJgFLcl8xxDvoWGrgFj/qSvgdaIJqfEOYcfa2k1vP97WISn+Q/oA0hsBEEBCuIdWvYtrDoc08ey+ET2Hf8XOLGjV7mkKWsWzGuYdUtRTSzfsDnjdRVMHRxWm4hnC4t2+B0BmIwACSAizodUOQ2EPrQYb0vUPWqHCkdUwb93e5oDbJWn4GaeF9R7DEasGzYkYZgaQuQiAABKiKxWrSFf7hgpHrW1fBzwm2+HQ6LKCuFTV3KGv/lOXlm6IXQsc5voBiBYBEEBCdKViFc3CDO9w1CMnS8fa2tXoatWv/vK+1rx5IOD4GSNLNKK0IOZVNe/KpbdYtaxhrh+AaBAAASRMtBWrrlQPN+3+u1bUNJgO7Xp7fsdBLZw8OKZVNf/Kpb9Y7rkMAJEgAAJIqGgqVtFUD1fX7VPVGt9FJ8F4hzHvljDu14+GWeXSG4s2ACQLARBAWoikMme24jiULEmfHTuhRlerNn9wJOydQIIt6jCrXLqxaANAMhEAAaSNcKuHoSpv/tzNqRc8t0PuDTzcDw82Vy9Us2mzyuWiqYM1/IzT4rJoI1YrjAFkPgIggIxjVXlzh7QhRT31xt5mlRX2UGtbh25etcNzrFluNJurF+7K5Hiv1PWsMD7g0tL1sVthDCCzEQABZBx35c0d0ByS5k0s14/Hl3sC2IjSAkkybRrtz2yuXiQrk+O1UjfeK4wBZK6sZJ8AAESq0dWq2j1NanS1Bj3O8ApGg/qeKkkBj3NXC705JM9tVnP1zB6XyEUd4a4wBgAzVAABpJVQ8+6kwEUghqSqNfVyfDMs7P04qxXGoYZtg+1DXLunKe7z8FhhDKArCIAA0ka48+7MwpGhbyuC/o+zmqfnH+D8F1n4P27zB0c0vnpTQubhscIYQFcQAAGkjXDn3QULR96Pe/OTZl06/NuwZxaYQi2ycD8u0u3qvJ87mmqh6QrjKYM1vH98VhgDyCwEQABpI9wdQfzDUZa+qQD6Pd/PntuhT4+26oaJg0xfL9QiiyFFPXWsrV3lhfkRb1cXzlB2KBPP7qtHZo+Qq/WknHnddUFZb4IfgLAQAAHERTx60kWyI4jZ8KxZmFvy4m7JkG6Y5BsCw1lkMePxWhnqDHB3ThkS9nZ10VQL/ZmFU9q/AAgXARBAzMWiumUlkr563sO6s0YPUI+cbN20cmfAcUvX79b080pCziP0590s+qEN7+vOqUP00Pr3TcOpdyCOtFrozyqc0v4FQLgIgABiKhbVrVCC9dULVnm8oKy3HA7f9jCS1CFFNI/Q7HZ36Fs7v0LH2zp8wql/IP7pReZDzj1ywuvMFSycRhIkAdgXfQABxFSw6la8PbV5j8ZVb9Kc5ds0vnqTVtft87m/2JmnqqlDAh5nNVR7/YRyz0Uy2+HQ4qlDtHLeWK2bPy6gB6Ak/fKF9zRzWa32fX7Mp/LnH4ifeHmP6fkfb+sI632a9SB0y5Jo/wIgpIQGwM2bN+uyyy5TSUmJHA6Hnn/+eZ/7DcPQvffeq5KSEuXl5emiiy7SO++843PMiRMndNNNN6mwsFD5+fmaPn26Pv30U59jmpubNXfuXDmdTjmdTs2dO1dHjx71OWbfvn267LLLlJ+fr8LCQt18881qa2uLx9sGbCVZDZKfemWPlry4O6DVy1v7m32aP98wcZB+dtEgz56/ZvMIV9ft0/jqTfrtqw2SQ/rJhWeqpqpSN0wapIpBfTS
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plt.figure()\n",
|
|||
|
"\n",
|
|||
|
"ax = fig.add_subplot(111)\n",
|
|||
|
"\n",
|
|||
|
"ax.plot(projected[:,0], projected[:,1], '.')\n",
|
|||
|
"ax.set_xlabel('PC1')\n",
|
|||
|
"ax.set_ylabel('PC2');"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "dc2a8088d78a470e9b299a352f315dec",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOz9eXhceXnvi37WUHOVSqV5sGxLnqfu9tC25U6HJkADgQZCEhL63D6QA4R9OSGQwMlJTs69pzeXdHYmwtmEAJk2BELITrJDDhmaJglDz4MtybYsy7Isa56rNNS8pvtHeS2XZM2qksut3+d59HRbtbR+a60a1rfe4ftKlmVZCAQCgUAgEAi2DfLdPgCBQCAQCAQCwdYiBKBAIBAIBALBNkMIQIFAIBAIBIJthhCAAoFAIBAIBNsMIQAFAoFAIBAIthlCAAoEAoFAIBBsM4QAFAgEAoFAINhmCAEoEAgEAoFAsM0QAlAgEAgEAoFgmyEEoEAgEAgEAsE2QwhAgUAgEAgEgm2GEIACgUAgEAgE2wwhAAUCgUAgEAi2GUIACgQCgUAgEGwzhAAUCAQCgUAg2GYIASgQCAQCgUCwzRACUCAQCAQCgWCbIQSgQCAQCAQCwTZDCECBQCAQCASCbYYQgAKBQCAQCATbDCEABQKBQCAQCLYZQgAKBAKBQCAQbDOEABQIBAKBQCDYZggBKBAIBAKBQLDNEAJQIBAIBAKBYJshBKBAIBAIBALBNkMIQIFAIBAIBIJthhCAAoFAIBAIBNsMIQAFAoFAIBAIthlCAAoEAoFAIBBsM4QAFAgEAoFAINhmCAEoEAgEAoFAsM0QAlAgEAgEAoFgmyEEoEAgEAgEAsE2QwhAgUAgEAgEgm2GEIACgUAgEAgE2wwhAAUCgUAgEAi2GUIACgQCgUAgEGwzhAAUCAQCgUAg2GYIASgQCAQCgUCwzRACUCAQCAQCgWCbIQSgQCAQCAQCwTZDCECBQCAQCASCbYYQgAKBQCAQCATbDCEABQKBQCAQCLYZQgAKBAKBQCAQbDOEABQIBAKBQCDYZggBKBAIBAKBQLDNEAJQIBAIBAKBYJshBKBAIBAIBALBNkMIQIFAIBAIBIJthhCAAoFAIBAIBNsMIQAFAoFAIBAIthlCAAoEAoFAIBBsM4QAFAgEAoFAINhmCAEoEAgEAoFAsM0QAlAgEAgEAoFgmyEEoEAgEAgEAsE2QwhAgUAgEAgEgm2GEIACgUAgEAgE2wwhAAUCgUAgEAi2GUIACgQCgUAgEGwzhAAUCAQCgUAg2GYIASgQCAQCgUCwzRACUCAQCAQCgWCbIQSgQCAQCAQCwTZDCECBQCAQCASCbYYQgAKBQCAQCATbDCEABQKBQCAQCLYZQgAKBAKBQCAQbDOEABQIBAKBQCDYZggBKBAIBAKBQLDNEAJQIBAIBAKBYJshBKBAIBAIBALBNkMIQIFAIBAIBIJthhCAAoFAIBAIBNsMIQAFAoFAIBAIthlCAAoEAoFAIBBsM4QAFAgEAoFAINhmCAEoEAgEAoFAsM0QAlAgEAgEAoFgmyEEoEAgEAgEAsE2QwhAgUAgEAgEgm2GEIACgUAgEAgE2wz1bh+AQCAQWJaFYRgAKIqCJEl3+YgEAoHg9Y2IAAoEgruKaZpks1lefPFFJiYmmJubI5lMkslkMAwDy7Lu9iEKBALB6w4RARQIBHcFO+qn6zqmaTI3N4dpmliWRTqdBkCSJGRZxuVyoaoqiqIgy7KIEAoEAsEmEQJQIBBsOZZloWmak/a1RZ0kSSiKgqIoWJbl/AhBKBAIBIVFCECBQLCl2Clf0zQd8WZZlvNfG1sQAisKQkVRUFVVCEKBQCBYB0IACgSCLcFO+WqahmVZSwq1ler9lhOEpmmSyWRIp9PIsowsy0IQCgQCwSoIASgQCIqOaZroun5HyjefxRHA1cgXhIAjCA3DwDAMMpmMkzIWglAgEAgWIgSgQCAoGnaEzo76LRZt+WxWlNn7lmXZWTu/0SS/xtAWg6qqrnhMAoFA8HpFCECBQFAULMtC13V0XQfujNgtZr0RwNVYThDquo6maXcIQlsUCkEoEAi2A0IACgSCgmNH/QzDWCDCVqLQAnCp/a9VELpcLqcbeS3HLhAIBPcaQgAKBIKCsdjbb731dltp+rySIHz++ee5//778fl8C6KDQhAKBILXC0IACgSCgrCct99audtp13xBmMlknOPXNI1sNgtwR0OJEIQCgeBeRQhAgUCwaZby9lsvxU4BrxdbDOZHCO3UtqZpzjZCEAoEgnsRIQAFAsGGWYu331opJQG41DnY9YE2axGEdpexQCAQlBpCAAoEgg2x2ZTvcvu8V1hJEGazWSeCuFSXsUAgENxthAAUCATrxo76bSblu5hSigBuhLUKwsUpYyEIBQLB3UAIQIFAsGYWe/sVcqpGqQnAzR5LviC092XXSi43pUQIQoFAsFUIASgQCNaEHc0yTRNY3dh5vZSaACwk+TOMQQhCgUBw9xECUCAQrIidyhweHmZycpIjR44IUbJJVhKEmUxmRdsZce0FAkEhEAJQIBAsS36jh6ZpJBKJogmQ13MEcDXyBaGiKI4ptWVZDAwMMDc3x/79+5Fl2ZlSoqpqQVPwAoFgeyEEoEAgWJL8cW52N2uxR7VtVwG4mPz0ul13KcsylmWRTqedbWxBaEcIhSAUCARrRQhAgUCwgOXGuW3FrF7B0uSPrFscIRSCUCAQbAQhAAUCgcNK3n7FFIBbITDXQymJpqWuSX6EcCVBuNiDUAhCgUBgIwSgQCAAVvf2k2XZ6QAuFqUiAEuN1UTbcoLQbipJp9POWDshCAUCAQgBKBBse/K9/VYa57YVKWAhAO9kI9dksUWPLQgNw8AwjGVtZ4QgFAi2D0IACgTbGNM00XV9TePctkKglZIALJVjsSxr06LMFoR2HeFygtBOGefPMRaCUCB4fSIEoECwDckfU2YLjLWkGYuVAp6cnGRqaopoNEoikaCiooJIJILX6y3Ketud5QShrutomuY8vtQcYyEIBYLXB0IACgTbjMWNHmu9qds2JIXENE2uX79Of38/oVAIv9+P1+tleHiYq1ev4vV6iUQizo/b7S7o+vcChYgArsZ6BKHtQ2injAUCwb2JEIACwTZisbffeoRFoVPA6XSajo4ONE2jtbWV69ev4/V6aWlpoaWlBV3XmZmZIRaL0d/fT2dnJ8Fg0BGD5eXlqKr4CCsGKwnC2dlZBgcHOXTokBCEAsE9jPj0FAi2Act5+62HQgrAyclJLl68SE1NDYcPH15yxJmqqlRVVVFVVQVANpt1BGFPTw/pdJpQKOQIwnA47IxWez2xFRHA1VgsCGdnZ5EkyYkQwtJj64QgFAhKFyEABYLXOSt5+62HQtjAmKZJT08PAwMDHD58mMbGRuex1QSm2+2mpqaGmpoaIBdBjMVixGIxurq60DSNcDjsCMJQKLRhAXK3BVcpY5qmIwYXRwg1TSObzTqCUQhCgaB0EQJQIHgdY5om2Wx2w1G/fDYbAUylUnR0dKDrOq2trQSDwU3t3+v1Ul9fT319PZZlkUqliEajxGIxBgcHMU2T8vJyIpEIFRUVBAKBe1LYlUIEMJ+ljmeplLFdbmBHCBcLQrvLWCAQ3B2EABQIXofYKV+7y7cQ/m6bEYB2yre2ttapHVuKje5fkiT8fj9+v58dO3ZgWRbxeNyJEPb19SHL8oKGEp/Pd88IkFI6Tvv1tBJ2w0j+39iCcKkIYX6XsUAg2BqEABQIXmesx9tvPdhdwOuJSOWnfI8cOUJDQ8Oy2xby5i9JEqFQiFAoxM6dOzFNk7m5OWKxGOPj41y7dg232+1EByORCB6Pp2DrF5JS8SO02UhEci2CUJblO5pKhCAUCIqHEIACweuEjXj7rQd7X2sVAKulfJfaf7HEjizLlJeXU15eTnNzM4ZhMDs766SLr1y5gt/vd6KDttAtFUpJCBXKmNoWhPZ1XkoQLq4hLKXrIBDc6wgBKBC8DrBr4HRdx+VyFcWwN18ArsbExASXLl1aNeW7eP/FnjVsoygKFRUVVFRUsGf
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plt.figure()\n",
|
|||
|
"\n",
|
|||
|
"ax = fig.add_subplot(111, projection='3d')\n",
|
|||
|
"\n",
|
|||
|
"ax.scatter(projected[:,0], projected[:,1], projected[:,2], '.')\n",
|
|||
|
"ax.set_xlabel('PC1')\n",
|
|||
|
"ax.set_ylabel('PC2')\n",
|
|||
|
"ax.set_zlabel('PC3');"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Structure in the PCA space\n",
|
|||
|
"\n",
|
|||
|
"Now, what do you notice about the data in this PCA space?\n",
|
|||
|
"\n",
|
|||
|
"Now, instead of looking like a structure-free blob, we seem to have some structure. What kind of structure do we have?\n",
|
|||
|
"\n",
|
|||
|
"\"N clusters\", I hope you are thinking. Biologically speaking, we are now guessing that there were this many cell types in our original sample.\n",
|
|||
|
"\n",
|
|||
|
"## PCA explained variance\n",
|
|||
|
"\n",
|
|||
|
"One of the first questions about PCA is how much of the variance in our data are \"explained\" by the first N components of the projected data. Let's plot this."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "f148fc18ce874be9ba25ea842ce96850",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABLoklEQVR4nO3deVyVZf7/8fcBAcEEVxAUgUzNcsulRDOX0rI9m7Lsqy1mOeWWTaU1pTVNODaVTWZqpdVkSk3aNGWL5a71CxVzKzVFj9oxwxRUEBCu3x/GySPbAQ/cZ3k9Hw++yXXuc87n5obvec91X4vNGGMEAACAgBFkdQEAAACoWQRAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMARAAACAAFPL6gJ8WVFRkX7++WfVrVtXNpvN6nIAAIAbjDE6evSo4uLiFBQUmH1hBMCz8PPPPys+Pt7qMgAAQBXs3btXzZo1s7oMSxAAz0LdunUlnfoFioyMtLgaAADgjuzsbMXHxzs/xwMRAfAsFN/2jYyMJAACAOBjAnn4VmDe+AYAAAhgBEAAAIAAQwAEAAAIMARAAACAAEMABAAACDAEQAAAgABDAAQAAAgwBEAAAIAAQwAEAAAIMH4TAFesWKHrrrtOcXFxstls+uijjyp8zvLly9W5c2fVrl1b5557rmbMmFH9hQIAAFjMbwLg8ePH1aFDB02bNs2t4zMyMnT11VerZ8+eSk9P1+OPP67Ro0frww8/rOZKAQBAWRxZuVqzM1Pf7z2sNTsz5cjKtbokv+Q3ewEPGDBAAwYMcPv4GTNmqHnz5po6daokqU2bNlq7dq3++c9/6uabb66mKgEA8E+OrFxlZB5XndBgHc8vVFKjOpLk0nbmf8885pONDr25KkNF5o/XDbJJKQPbaVDX5laclt/ymwBYWd9884369+/v0nbllVfqzTffVEFBgUJCQko8Jy8vT3l5ec7vs7Ozq71OAABqUnGQKy/AuRPcbL//1+jsFBnp8QWbdVmrxoqNCj/LV0OxgA2ABw4cUExMjEtbTEyMTp48qczMTMXGxpZ4TkpKip5++umaKhEAgLNyZq9cRb1zpwe5sw1wZxv8TldojHZn5hAAPShgA6Ak2Ww2l++NMaW2F5swYYLGjRvn/D47O1vx8fHVVyAAAGWoKNzNXpVRoleumDvhzpMB7mwF22xKbBRhdRl+JWADYJMmTXTgwAGXtoMHD6pWrVpq2LBhqc8JCwtTWFhYTZQHAAhgZxPu3OFN4a4iwTabnhvYlt4/DwvYAJicnKz//e9/Lm1ffvmlunTpUur4PwAAPKGscFf83037s/SPz36scrjzFrbf/48p5zxKOyZI0r2XJemadrHKyS9SYqMIwl818JsAeOzYMf3000/O7zMyMrRhwwY1aNBAzZs314QJE7R//3698847kqQRI0Zo2rRpGjdunIYPH65vvvlGb775pubNm2fVKQAA/ERZIc+Xwl15Ac7d4CZJuzNzFBEapJz8ohL/Le0YAl/N8JsAuHbtWvXp08f5ffFYvTvvvFNvvfWWHA6H7Ha78/GkpCQtWrRIDz30kF599VXFxcXpX//6F0vAAADKVd23Z2tCeeGuOMjd3SNJUukBrjLBzZ0wR+CreTZjyuucRXmys7MVFRWlrKwsRUZGWl0OAMDDzlwSxdfC3Zm9cuUFuEDqgePz2496AAEAqIqy1r07c0kUb8h8lQ13FYU5fw96KBsBEAAQEEq7devuunc1Gf6CbTY9elVrtW9Wr9xbr4Q7nA0CIADA75wZ9tyZfFETIa+8cFeZW7CEO5wtAiAAwOedHvhK25asJrh7e5bwBm9AAAQA+AQrl1bx9Ng7wGoEQACA17Fi9m15a9udviQKY+/gDwiAAADLlXULtyZm35a37t2ZYY9wB39BAAQA1KjK9O55KvxVNLOWkIdAQwAEAFQrK3r32E8WKB8BEADgMZVZfqU6e/cIfED5CIAAgCqr7uVXWFoFqB4EQABApRSHvupcfqUqs28BuI8ACAAoV3X18pXXu8fEDKB6EQABAE41sf4evXuA9QiAAAA5snJdwp6nZui6M0GD4AfUPAIgAASgM2/rvrEywyXwVTX8sfwK4BsIgAAQIDw9eSPIJj121fksvwL4IAIgAPip6pq8cfoYPsIe4JsIgADgRzzVy1eZGboAfA8BEAD8wJmTOKrKJmk4M3QBv0cABAAfVNEkjsoq67YuwQ/wTwRAAPARnrq9y+QNAARAAPBynrq9y+QNAMUIgADgZc68vfv6yoxKvwa9fADKQwAEAC/gydm7w+nlA1ABAiAAWIjbuwCsQAAEAAsUB7+qzN7l9i6As0UABIBqVnx7N6lRHUnS7FUZVRrXx+1dAJ5CAASAanLm7V2bVKW1+ri9C8DTCIAA4GFl3d51J/xxexdATSAAAoAHnD6Ld/KiHyvd08ftXQA1iQAIAGfhbGfxEvwAWIEACABVUJVZvLbf/48xjOsDYC0CIABUQlWDX3EvnyTtzsxhXB8ASxEAAaACVR3fV1YvH8EPgNUIgABQhqqM7zt9Fi+9fAC8FQEQAM5wtrd5CX0AvB0BEAB+Vxz8KrNLB8EPgC8iAAIIaM7xffuOKOWzbW4/j1m8AHwZARBAQGJ8H4BARgAEEFC4zQsABEAAAYLgBwB/IAAC8Funr9+XsuhHt5/H+D4A/o4ACMDvML4PAMpHAATgN1i/DwDcQwAE4NOquk0bwQ9AICMAAvBZqWl2TViwye3bvBLj+wBAIgAC8FHf7z2s8Qs2ybgR/hjfBwCuCIAAfEpllnPhNi8AlI4ACMAnVCb4cZsXAMpHAATgtSqzjh+3eQHAfQRAAF6nsuv4BUla+EB3dYivX+21AYA/IAAC8CqpaXaN/3CT28u5BNmklIHtCH8AUAkEQABe4/u9h90Of0zwAICqIwACsJwjK1ezVuzSnNW7KzyWCR4AcPa8IgDm5+crIyNDLVq0UK1aXlESgGrmnOCx74hSPttW7rFM8AAAz7I0beXk5GjUqFF6++23JUnbt2/Xueeeq9GjRysuLk7jx4+3sjwA1aAyEzy4zQsA1SPIyjefMGGCvv/+ey1btky1a9d2tl9xxRVKTU21sDIA1SE1za7uKUv0+sqKw1+QpI8e7K7Hr76A8AcAHmZpD+BHH32k1NRUdevWTTabzdl+wQUXaOfOnRZWBsDTnFu3uXEsM3sBoHpZGgB//fVXRUdHl2g/fvy4SyAE4LscWbl6c2WG3ljFDh4A4C0sDYBdu3bVp59+qlGjRkmSM/S9/vrrSk5OtrI0AGeBHTwAwLtZGgBTUlJ01VVXaevWrTp58qRefvllbdmyRd98842WL19uZWkAqig1za4JCzYxwQMAvJilk0C6d++u1atXKycnRy1atNCXX36pmJgYffPNN+rcubOVpQGoguJxfkzwAADvZmkAlKR27drp7bff1ubNm7V161a9++67ateuXZVea/r06UpKSlLt2rXVuXNnrVy5stzj586dqw4dOigiIkKxsbG6++67dejQoSq9NxDIHFm5+vunW3XDq2tkKgp/NinlZiZ4AICVLA2AixYt0hdffFGi/YsvvtBnn31WqddKTU3V2LFj9cQTTyg9PV09e/bUgAEDZLfbSz1
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"plt.figure()\n",
|
|||
|
"\n",
|
|||
|
"plt.plot(np.cumsum(pca.explained_variance_ratio_),'.-')\n",
|
|||
|
"plt.xlabel('number of components')\n",
|
|||
|
"plt.ylabel('cumulative explained variance');"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## mini batch K-Means\n",
|
|||
|
"\n",
|
|||
|
"Given that it looks like our data may have some clusters, let's try find these clusters using mini batch K-Means."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn.cluster import MiniBatchKMeans\n",
|
|||
|
"from sklearn.metrics.pairwise import pairwise_distances_argmin"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"mbk = MiniBatchKMeans(n_clusters=n_clusters, batch_size=6, random_state=42, n_init='auto').fit(X);"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## plotting the clustering results in the original \"number of reads\" space\n",
|
|||
|
"\n",
|
|||
|
"Let's first plot the our raw read data in a scatter plot like above, but colored according to our cluster label. We will also plot our cluster centers here."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"mbk_means_cluster_centers = mbk.cluster_centers_\n",
|
|||
|
"mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "c80c775094004e72af6d933d88397c2d",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABrvElEQVR4nO3de3gU1f0/8PfmShKSDSTkBhEidwg3QUkwChYJUClafwUUjbFQlIpgCghF+SrlG0GsorZUUKqAeMEvVawXCkTbRhBCEBIkCAiCQiAhBJJNQmICm/n9gTtmc93LzO6ZmffreXgesju7e2Z2duez55zP55gkSZJARERERIbh4+0GEBEREZFnMQAkIiIiMhgGgEREREQGwwCQiIiIyGAYABIREREZDANAIiIiIoNhAEhERERkMAwAiYiIiAyGASARERGRwTAAJCIiIjIYBoBEREREBsMAkIiIiMhgGAASERERGQwDQCIiIiKDYQBIREREZDAMAImIiIgMhgEgERERkcEwACQiIiIyGAaARERERAbDAJCIiIjIYBgAEhERERkMA0AiIiIig2EASERERGQwDACJiIiIDIYBIBEREZHBMAAkIiIiMhgGgEREREQGwwCQiIiIyGAYABIREREZDANAIiIiIoNhAEhERERkMAwAiYiIiAyGASARERGRwTAAJCIiIjIYBoBEREREBsMAkIiIiMhgGAASERERGQwDQCIiIiKDYQBIREREZDAMAImIiIgMhgEgERERkcEwACQiIiIyGAaARERERAbDAJCIiIjIYBgAEhERERkMA0AiIiIig2EASERERGQwDACJiIiIDIYBIBEREZHBMAAkIiIiMhgGgEREREQGwwCQiIiIyGAYABIREREZDANAIiIiIoNhAEhERERkMAwAiYiIiAzGz9sN0LL6+nqcO3cOoaGhMJlM3m4OEREROUCSJFRWViIuLg4+PsbsC2MA6IZz584hPj7e280gIiIiF5w5cwZdunTxdjO8ggGgG0JDQwFcO4HCwsK83BoiIiJyREVFBeLj4+XruBExAHSDbdg3LCyMASAREZHGGHn6ljEHvomIiIgMjAEgERERkcEwACQiIiIyGAaARERERAbDAJCIiIjIYBgAEhERERkMA0AiIiIig2EASERERGQwDACJiIiIDIYBIBEREZHBMAAkIiIiMhiuBUxuu1BZi/mbD+JgYTkGdQnH85MGoVNooLebRURERC1gDyC5bf7mg9h1ohTl1Vew60Qp5m8+6O0mERERUSvYA0huO1hYDmu9BACw1kv4urDcuw3SGfawEhGR0tgDSG4b1CUcvj4mAICvjwkDu4R7t0E6wx5WIiJSGgNActvzkwYhpUckOgT7I6VHJJ6fNMjbTdIV9rASEZHSOARMbusUGogN027ydjN0a1CXcOw6UQprvcQeViIiUgR7AIkExx5WIiJSGnsAiQTHHlYiIlIaewCJiIiIDEa4ALBbt24wmUxN/s2aNQsAIEkSlixZgri4OAQFBWHUqFE4fPiw3XPU1tZi9uzZiIyMREhICCZOnIjCwkK7bcrKypCWlgaz2Qyz2Yy0tDSUl5d7ajeJiIiIvEa4AHDfvn0oKiqS/2VlZQEAJk2aBAB47rnnsHLlSqxatQr79u1DTEwMxowZg8rKSvk5MjIysGXLFmzatAm7du1CVVUVJkyYAKvVKm8zdepU5OfnY9u2bdi2bRvy8/ORlpbm2Z0lIiIi8gKTJEmStxvRmoyMDHzyySc4fvw4ACAuLg4ZGRlYuHAhgGu9fdHR0VixYgUefvhhWCwWdOrUCRs3bsSUKVMAAOfOnUN8fDy2bt2KsWPH4siRI+jXrx9ycnIwfPhwAEBOTg6Sk5Nx9OhR9O7d26G2VVRUwGw2w2KxICwsTIW9JyIiIqXx+i1gD2BDdXV1eOuttzBt2jSYTCacOnUKxcXFSE1NlbcJDAzEyJEjsXv3bgDA/v37ceXKFbtt4uLikJiYKG+zZ88emM1mOfgDgKSkJJjNZnmb5tTW1qKiosLuHxEREZHWCB0AfvjhhygvL8eDDz4IACguLgYAREdH220XHR0t31dcXIyAgAB06NCh1W2ioqKavF5UVJS8TXOWL18uzxk0m82Ij493ed+IiIiIvEXoAPD111/H+PHjERcXZ3e7yWSy+1uSpCa3NdZ4m+a2b+t5Fi1aBIvFIv87c+aMI7tBREREJBRhA8AffvgBn332GX73u9/Jt8XExABAk166kpISuVcwJiYGdXV1KCsra3Wb8+fPN3nNCxcuNOldbCgwMBBhYWF2/4iIiIi0RtgAcN26dYiKisIdd9wh35aQkICYmBg5Mxi4Nk8wOzsbI0aMAAAMHToU/v7+dtsUFRWhoKBA3iY5ORkWiwW5ubnyNnv37oXFYpG3ISIiItIrIVcCqa+vx7p165Ceng4/v5+baDKZkJGRgWXLlqFnz57o2bMnli1bhuDgYEydOhUAYDabMX36dMybNw8RERHo2LEj5s+fjwEDBuD2228HAPTt2xfjxo3DjBkz8OqrrwIAHnroIUyYMMHhDGAiIiIirRIyAPzss89w+vRpTJs2rcl9CxYsQE1NDR555BGUlZVh+PDh2LFjB0JDQ+VtXnzxRfj5+WHy5MmoqanB6NGjsX79evj6+srbvP3225gzZ46cLTxx4kSsWrVK/Z0jIiIi8jLh6wCKjHWEiDzrQmUt5m8+iIOF5RjUJRzPTxqETqGB3m4WEWkMr98CzwEkImps/uaD2HWiFOXVV7DrRCnmbz7o7SYREWmSkEPARETNOVhYDmv9tUELa72ErwvLvdsg8jj2AhMpgz2ARKQZg7qEw9fnWq1OXx8TBnYJ926DyOPYC0ykDAaARKQZz08ahJQekegQ7I+UHpF4ftIgbzeJPIy9wETK4BAwEWlGp9BAbJh2k7ebQV40qEs4dp0ohbVeYi8wkRvYA0hERJrBXmAiZbAHkIiINIO9wETKYABIRKQyJTNXmQVLRErgEDARkcqUzFxlFiwRKYEBIBGRypTMXGUWLBEpgQEgEZHKlKxfyFqIRKQEBoBEJKwLlbVIfyMXg5fuQPobubhQWevtJrlEycxVZsESkRJMkiRJ3m6EVnExaSJ1pb+Ra1fzLaVHJDNAichtvH6zB5CIBMb5bkRE6mAASETC4nw349LL8D+RqBgAEpGwON/NuFjuhkhdLARNRMLiqg/GxeF/InWxB5CIiITD4X8idTEAJCIi4XD4n0hdHAImIiLhcPifSF0MAImISBMuVNZi/uaDOFhYjkFdwvH8pEHoFBro7WYRaRKHgImISBOYGUykHAaARESkCcwMJlIOA0AiItIEZgYTKYcBIJGOcTUF0hNmBhMpxyRJkuTtRmgVF5Mm0aW/kYtdJ0phrZfg62NCSo9IZlYSkeHx+s0eQCJd45wpIiJqDgNAIh3jnCkiImoOA0AiHeOcKSIiag4LQRPpGFdTICKi5rAHkIiIiMhg2AOoM1wqiYjUxO8YIn1gD6DOcKkkIlITv2OI9IE9gDrDsh9EZKNGbx2/Y4j0gT2AOsOyH0Rko0ZvHb9jiPRByADw7NmzuP/++xEREYHg4GAMHjwY+/fvl++XJAlLlixBXFwcgoKCMGrUKBw+fNjuOWprazF79mxERkYiJCQEEydORGFhod02ZWVlSEtLg9lshtlsRlpaGsrLyz2xi6ph2Q8i7VNqCT81euv4HUOkD8ItBVdWVoYhQ4bgtttuw+9//3tERUXhu+++Q7du3dC9e3cAwIoVK/DMM89g/fr16NWrFzIzM/HFF1/g2LFjCA0NBQD8/ve/x8cff4z169cjIiIC8+bNw6VLl7B//374+voCAMaPH4/CwkK89tprAICHHnoI3bp1w8cff+xQW7mUDBGpQakl/LgUIFHzeP0WMAD84x//iC+//BI7d+5s9n5JkhAXF4eMjAwsXLgQwLXevujoaKxYsQIPP/wwLBYLOnXqhI0bN2LKlCkAgHPnziE+Ph5bt27F2LFjceTIEfTr1w85OTkYPnw4ACAnJwfJyck4evQoevfu3WZbeQIRkRoGL92B8uor8t8
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots(nrows=1, ncols=1)\n",
|
|||
|
"\n",
|
|||
|
"x_gene_idx = 0\n",
|
|||
|
"y_gene_idx = 1\n",
|
|||
|
"\n",
|
|||
|
"for k in range(n_clusters):\n",
|
|||
|
" my_members = mbk_means_labels == k\n",
|
|||
|
" cluster_center = mbk_means_cluster_centers[k]\n",
|
|||
|
" line, = ax.plot(X[my_members, x_gene_idx], X[my_members, 1], '.', markersize=5)\n",
|
|||
|
" ax.plot(cluster_center[x_gene_idx], cluster_center[1], 'o', markersize=10, markeredgecolor='black', markerfacecolor=line.get_color())\n",
|
|||
|
"ax.set_xlabel('gene %d' % x_gene_idx)\n",
|
|||
|
"ax.set_ylabel('gene %d' % y_gene_idx);"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## plotting the clustering results in PCA space\n",
|
|||
|
"\n",
|
|||
|
"Hmm, the plot above was not too informative. It does not seem to show obvious clusters in the data, and the points look very interwoven with others, at least for these two genes.\n",
|
|||
|
"\n",
|
|||
|
"Let's re-plot our cluster assignments, but this time using the projection into PCA space."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "3eccfa2e278b4351a8c264255a5a8c9f",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABFS0lEQVR4nO3deXyU5b3///ckhOwZSICEYDRQFqXBpdCyWAu2NnGJHruIlh6KfSDVIioCtfV4fmo9smgtaEGktZwWWisuiL+KyNJTBS0ggtADghUFZA2UQDaWEML1/YMz00wyk8xkZu65Z+7X8/HI4yEz18xcMybXvO/Pdd3X7TLGGAEAAMAxkmLdAQAAAFiLAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMMQAAEAAByGAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMMQAAEAAByGAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMMQAAEAAByGAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMMQAAEAAByGAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMMQAAEAAByGAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMMQAAEAAByGAAgAAOAwBEAAAACHIQACAAA4DAEQAADAYQiAAAAADkMABAAAcBgCIAAAgMN0iHUH4tm5c+d08OBBZWdny+Vyxbo7AAAgCMYY1dbWqrCwUElJzqyFEQDDcPDgQRUVFcW6GwAAoB327dunCy64INbdiAkCYBiys7Mlnf8FysnJiXFvAABAMGpqalRUVOT9HnciAmAYPNO+OTk5BEAAAOKMk5dvOXPiGwAAwMEIgAAAAA5DAAQAAHAYAiAAAIDDEAABAAAchgAIAADgMARAAAAAhyEAAgAAOAwBEAAAwGEIgAAAAA5DAAQAAHAYAiAAAIDDEAABoImVe1bq1qW3auWelbHuCgBEDQEQAJqYv22+tldu1/xt82PdFQCIGgIgADQxtmSs+uf119iSsbHuCgBETYdYdwAA7KS0uFSlxaWx7gYARBUVQAAAAIchAAIAADgMARAAAMBhCIAAAAAOQwAEAABwGAIgAACAwxAAAQAAHIYACAAA4DAEQACIAq4pDMDOCIAAbM/uYcpf/7imMAA7IwACsD27hyl//eOawgDsjAAIwPbsHqb89a+0uFQvlb/EdYUB2JLLGGNi3Yl4VVNTI7fbrerqauXk5MS6OwBCtHLPSs3fNl9jS8YS1AAH4fubCiAAB4vW1LLd1ywCAAEQQNyJVMCK1tSy3dcsAgABEEDciVTAitY6PbuvWQQAAiAASdGftozk89s9YHECCAC7IwACkBT9actIPr9TAxZrCwFECgEQgKToV9XsXrWLB6wtBBApbAMTBk4jB2ClULatYYsbIDC+v6kAAkDURHrKNpSpb6qFAFpDAARguUgHI7uujbMihAV670y5A2gNARCA5SIdjKIRtCIRKq0IYYHeu1NPlAEQHAIgAMtFOhhFI2hFIlRaEcIG5Q9SWnKaBuUPitprAEg8HWLdAQDOU1pcGtFQFOj5wjkRYmzJWO9j7Wzj4Y063XhaGw9vjHVXAMQRKoAALGXler22qniB+hJPZ9Cy1g9AexAAAVjKyrNT2wpHgfoST2fQstYPQHsQAAFYKpyKVajVw7bCUaC+UFUDkOjYCDoMbCQJWOvWpbdqe+V29c/rr5fKX4p1d9olnqaXgUTF9zcVQABxJFKVuV988At9+Y9f1oT/mWDJesSmlct4ml4GkLgIgADiRnvXuzWfOn75Hy/rdONprdm/xpIw1jT0Mb0MwA7YBgZAwvJU3Grqa7S/br/mb5uv0uJSjew3Ui//42UN7j5Y/zz1z6iHsaZbykR6CxwAaA/WAIaBNQSAvXnWDF6QdYFyUnNCWnfHWj0gcfH9zRQwgATmmW69f+D9bU4dN58mDnatnl2vQwwArSEAAkhYoawZbB74gl2rZ6eTOgijAIJFAAQAtQx8wYbHWJ/UwRnGANqDNYBhYA0BYG9OWMfXdG/E5iebAPCP728qgADiULBTnc0rYok4Rdq0Asll4QAEiwAIIO4EO9XZfHrWiilSq0MmoQ9AexAAAcSdYNfdNQ9H4a7XCybcsQ4PQDwgAAKIO+2tenkeJynoKl2oJ1mEEjITcUoaQHwgAAJwHE+Qe+i9h0JaR9g83PkLcOFsPRMqz+v/4oNfECQBhIQACCAq7FzdGlsyVmnJaTrdeDqkdYTNw124AW5Q/iClJadpUP6gdj3e8/ov/+Nlpp0BhIQACCAq7LwWrrS4VFO/OtUb7IwxOnr0qPbs2aOjR4+q6e5YrVX0wp3u3Xh4o043ntbGwxvb9T48rz+y38iY7kUIIP5YGgDXrFmjG2+8UYWFhXK5XHr99dd97jfG6NFHH1VhYaHS09M1YsQIffTRRz5t6uvrdc8996hLly7KzMzUTTfdpP379/u0OX78uEaPHi232y23263Ro0erqqrKp83evXt14403KjMzU126dNG9996rM2fORONtA44Uqw2Sg608lhaX6tdf/bV2/P871LdfX3Xt2lU9e/ZU165d1bdfXz3zzDMtxg1/z+EJh229rr9AHO5n5Hn9n3z5J5wJDCAklgbAEydO6LLLLtOcOXP83v/kk09q5syZmjNnjj744AMVFBTom9/8pmpra71tJk6cqCVLlmjRokV67733VFdXp/LycjU2NnrbjBo1Slu2bNHy5cu1fPlybdmyRaNHj/be39jYqBtuuEEnTpzQe++9p0WLFmnx4sWaPHly9N484DCx2J5k5Z6Veui9h4KqPK5YsUJFFxZp0uRJOpp7VEXji1T8k2IVjS/S0dyjun/y/erao6umLZgW1Gu3VfH0F/Yi+RnZecodgA2ZGJFklixZ4v33uXPnTEFBgZkxY4b3ttOnTxu3223mzZtnjDGmqqrKpKSkmEWLFnnbHDhwwCQlJZnly5cbY4zZvn27kWTWr1/vbbNu3TojyXz88cfGGGOWLVtmkpKSzIEDB7xtXnzxRZOammqqq6uDfg/V1dVGUkiPARA9I98YaUp+X2IG/WGQWbF7hTHGmBW7V5iRb4z0/tsYY6b+fqpxJbtMzmU5pt/T/UzJ70ta/PR7up/JvizbuJJd3vGlNZ7XeXLDky1ezwqe9z7yjZGWvi4Qj/j+NsY2awB3796tiooKlZb+60g4NTVVw4cP19q1ayVJmzZtUkNDg0+bwsJClZSUeNusW7dObrdbgwcP9rYZMmSI3G63T5uSkhIVFhZ625SVlam+vl6bNm0K2Mf6+nrV1NT4/ACwD0+VbepXpwY8UaOqqkqPjH9EWSVZKrq3SCmdUvw+V0qnFF1474XKLsnWd2/5btDTwRsPb4zJ2sdYX5MYQHyxTQCsqKiQJOXn5/vcnp+f772voqJCHTt2VOfOnVtt061btxbP361bN582zV+nc+fO6tixo7eNP9OnT/euK3S73SoqKgrxXQKIJn9Tqs2D0YIFC9RY36jCHxbKlexq9flcyS51/2F3nTx5UgsXLgyqD7EKYlwRBEAobBMAPVwu3wHZGNPituaat/HXvj1tmnvwwQdVXV3t/dm3b1+r/QIQe02DkTFGc56dI/cgd8DKX3MpnVKUMzBHs+fM9jk7OJjXa4o1egDsxDYBsKCgQJJaVOCOHDnirdYVFBTozJkzOn78eKttDh8+3OL5//nPf/q0af46x48fV0NDQ4vKYFOpqanKycnx+QEQPyorK/Xpzk+VPTA7pMdlD8rWpzs/1bFjx9r92nbeFgeA89gmAPbs2VMFBQVatWqV97YzZ85o9erVGjZsmCRp4MCBSklJ8Wlz6NAhbdu2zdtm6NChqq6u1oYNG7xt3n//fVVXV/u02bZtmw4dOuRts3LlSqWmpmrgwIFRfZ8AIqtpZa2
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots(nrows=1, ncols=1)\n",
|
|||
|
"\n",
|
|||
|
"projected_centers = pca.transform(mbk.cluster_centers_)\n",
|
|||
|
"\n",
|
|||
|
"for k in range(n_clusters):\n",
|
|||
|
" my_members = mbk_means_labels == k\n",
|
|||
|
" projected_cluster_center = projected_centers[k]\n",
|
|||
|
" line, = ax.plot(projected[my_members, 0], projected[my_members, 1], '.',\n",
|
|||
|
" markersize=1.8)\n",
|
|||
|
" ax.plot(projected_cluster_center[0], projected_cluster_center[1], 'o',\n",
|
|||
|
" markersize=10, markeredgecolor='black', markerfacecolor=line.get_color())\n",
|
|||
|
"ax.set_xlabel('PC 1')\n",
|
|||
|
"ax.set_ylabel('PC 2');"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## silhouette analysis with K-means\n",
|
|||
|
"\n",
|
|||
|
"OK, so that last plot (in PCA space) is looking better. The automatically detected clusters seem to agree with the idea we had from just looking at the data. Let's now use a silhouette analysis as one way to check whether this was a particularly good number of clusters for this data.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"## Selecting the number of clusters with silhouette analysis on KMeans clustering\n",
|
|||
|
"\n",
|
|||
|
"(This section modified from https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html )\n",
|
|||
|
"\n",
|
|||
|
"Silhouette analysis can be used to study the separation distance between the\n",
|
|||
|
"resulting clusters. The silhouette plot displays a measure of how close each\n",
|
|||
|
"point in one cluster is to points in the neighboring clusters and thus provides\n",
|
|||
|
"a way to assess parameters like number of clusters visually. This measure has a\n",
|
|||
|
"range of [-1, 1].\n",
|
|||
|
"\n",
|
|||
|
"Silhouette coefficients (as these values are referred to as) near +1 indicate\n",
|
|||
|
"that the sample is far away from the neighboring clusters. A value of 0\n",
|
|||
|
"indicates that the sample is on or very close to the decision boundary between\n",
|
|||
|
"two neighboring clusters and negative values indicate that those samples might\n",
|
|||
|
"have been assigned to the wrong cluster.\n",
|
|||
|
"\n",
|
|||
|
"In this example, the silhouette analysis is used to inform us about an optimal value for\n",
|
|||
|
"the best number of clusters.\n",
|
|||
|
"\n",
|
|||
|
"Also from the thickness of the silhouette plot the cluster size can be\n",
|
|||
|
"visualized."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn import cluster\n",
|
|||
|
"from scipy.spatial import distance\n",
|
|||
|
"import sklearn.datasets\n",
|
|||
|
"from sklearn.preprocessing import StandardScaler\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"\n",
|
|||
|
"# from https://stats.stackexchange.com/a/251169/24333\n",
|
|||
|
"def compute_bic(kmeans,X):\n",
|
|||
|
" \"\"\"\n",
|
|||
|
" Computes the BIC metric for a given clusters\n",
|
|||
|
"\n",
|
|||
|
" Parameters:\n",
|
|||
|
" -----------------------------------------\n",
|
|||
|
" kmeans: List of clustering object from scikit learn\n",
|
|||
|
"\n",
|
|||
|
" X : multidimension np array of data points\n",
|
|||
|
"\n",
|
|||
|
" Returns:\n",
|
|||
|
" -----------------------------------------\n",
|
|||
|
" BIC value\n",
|
|||
|
" \"\"\"\n",
|
|||
|
" # assign centers and labels\n",
|
|||
|
" centers = [kmeans.cluster_centers_]\n",
|
|||
|
" labels = kmeans.labels_\n",
|
|||
|
" #number of clusters\n",
|
|||
|
" m = kmeans.n_clusters\n",
|
|||
|
" # size of the clusters\n",
|
|||
|
" n = np.bincount(labels)\n",
|
|||
|
" #size of data set\n",
|
|||
|
" N, d = X.shape\n",
|
|||
|
"\n",
|
|||
|
" #compute variance for all clusters beforehand\n",
|
|||
|
" cl_var = (1.0 / (N - m) / d) * sum([sum(distance.cdist(X[np.where(labels == i)], [centers[0][i]], \n",
|
|||
|
" 'euclidean')**2) for i in range(m)])\n",
|
|||
|
"\n",
|
|||
|
" const_term = 0.5 * m * np.log(N) * (d+1)\n",
|
|||
|
"\n",
|
|||
|
" # Note: our implementation of BIC has an inverted sign compared to that in the\n",
|
|||
|
" # e.g. wikipedia article, and thus the best fit has the highest value.\n",
|
|||
|
" \n",
|
|||
|
" BIC = np.sum([n[i] * np.log(n[i]) -\n",
|
|||
|
" n[i] * np.log(N) -\n",
|
|||
|
" ((n[i] * d) / 2) * np.log(2*np.pi*cl_var) -\n",
|
|||
|
" ((n[i] - 1) * d/ 2) for i in range(m)]) - const_term\n",
|
|||
|
"\n",
|
|||
|
" return(BIC)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from sklearn.metrics import silhouette_samples, silhouette_score\n",
|
|||
|
"from sklearn.cluster import KMeans\n",
|
|||
|
"import matplotlib.cm as cm"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"For n_clusters = 2 The average silhouette_score is : 0.08926589407761779\n",
|
|||
|
"For n_clusters = 3 The average silhouette_score is : 0.12854877265032805\n",
|
|||
|
"For n_clusters = 4 The average silhouette_score is : 0.16714540077634665\n",
|
|||
|
"For n_clusters = 5 The average silhouette_score is : 0.12535810422902316\n",
|
|||
|
"For n_clusters = 6 The average silhouette_score is : 0.09312017317314801\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "173fe63e421b47ea96f98a272563a1b3",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzddXgU19cH8O+sZzfuSgRCkOASrEAIJFhxaItL8ZbiLeXXIqVAsWLFSnH3FkqDFS1BCgSKpViAQIy4brK79/0jb6ZZNm6bZM8nT55nd+bOnbOzO7NnZ+beyzHGGAghhBBCiMEQ6DsAQgghhBBSvigBJIQQQggxMJQAEkIIIYQYGEoACSGEEEIMDCWAhBBCCCEGhhJAQgghhBADQwkgIYQQQoiBoQSQEEIIIcTAUAJICCGEEGJgKAEkpWbbtm3gOI7/z6l9+/b89OHDh/PTQ0NDtZa5cOFC+QZNKpwLFy5ofSZCQ0PLbd1z587l1+vm5lZu662I9Pk+EELKHiWAJFeMMezZswf+/v6wtbWFWCyGubk5PDw80KFDB0ybNg2XLl3Sd5gV2vDhw/kvz/bt2+dahr5kCclbzn1j27Zt+g6nwlGpVDhy5AgmTZoEHx8fuLi4QCqVwsTEBE2aNMG8efOQmJio7zBJBSXSdwCkYhoyZAh2796tNS0hIQEJCQl48eIFzp8/j4SEBLRt25af36xZMyxdurS8QyWk1Pj7+8PY2BgAYGZmpudoCMnfu3fv0LdvX53pGRkZuH37Nm7fvo0dO3bg8uXLcHR01EOEpCKjBJDo+OOPP7SSPx8fH3Ts2BFSqRSvX79GSEgIgoKCdJarW7cu6tatW56hElKqWrVqhVatWuk7DKJH6enpEAqFEIvF+g6l0KRSKfz9/dGkSROkpqZiz549CAsLAwA8f/4cM2fOxK5du/QcJalwGCHvmTJlCgPAADBPT0+mVqt1ysTGxrLr169rTdu6dSu/3PsfrXbt2vHThw0bxk9/8eKF1jLnz59nBw4cYM2aNWMymYxZWVmxYcOGsZiYmFxjPXPmDOvTpw9zdHRkYrGYmZqasubNm7NFixaxxMRErbK5rSsnV1dXft6cOXN01nX79m02fPhw5u7uzqRSKTM2NmZNmzZly5cvZ2lpaXluh9z+z58/X2CZnNupKOsvSFpaGvv6669ZQEAAc3d3Z6ampkwkEjErKyv2wQcfsDVr1rDMzMwCt93OnTtZkyZN8n2fIiMj2fTp05mvry+rVq0aMzY2ZmKxmNna2rJOnTqxnTt3Mo1Go7XM+9vmxYsXjDHGfH19+WlDhw7VeV0rV67k59va2rKMjAzGGGP37t1jgwYNYq6urkwikTCZTMZcXFyYr68v++qrr1hYWBhfx5w5c/g6XF1dteoPDQ1lY8aMYTVq1GAymYxJpVLm6OjIWrVqxaZMmcIePnxY6PcgL++/9qdPn7JVq1axunXrMolEwhwcHNgXX3xRpPf7fRkZGWzTpk3Mz8+PWVtbM7FYzGxsbFirVq3YkiVL8owl+31gLO/9mbHS2YY568/t//163759y7788ktWr149ZmxszKRSKfP09GRTpkxh4eHhOtvg/fhv377NunTpwszNzbVe66VLl1ivXr3444tCoWCurq6sc+fObM6cOSw+Pr7Y70NpiI6OZtOmTWNRUVFa09+9e8dsbW3512hpaamnCElFRgkg0fH555/zBw4rKysWEhJSqOVKIwH09/fP9YDfunVrnfVNnTo13y8JT09P9vLlyzzXVZQEcM2aNUwoFOa5rmbNmvFfBmWRABZl/QWJjo4ucN0dO3ZkKpUqz23XunXrQr1PN2/eLHBdI0aM0Fomr8Tj8OHD/DQjIyOd19uqVSt+/tSpUxljjD148IDJ5fJ81//HH3/wdeSVvERGRjIbG5t861m/fn2htn9+3n/teW3ngQMHFqv+6Oho1qRJk0IlVqWdABZlGxYlAbxy5QqztLTMs6ytrS27c+eOVow562/UqJHOZ+TFixfs7Nmz+e5zANijR48Ktd1zbpPC/L+/TYujb9++fH3GxsYlro9UPXQJmOho2LAh/zgmJga1atVC/fr10axZMzRr1gwdO3aEh4dHmaz79OnTaNmyJfz8/HDixAkEBwcDAP766y8EBQWhZcuWAIAdO3ZgxYoV/HL169dHjx49EBoait27d4MxhidPnmDAgAG4du1aiWL666+/MGnSJDDGAABt2rRBx44dER8fj+3btyMuLg43b97E+PHjsWfPHv5eyP379+Pvv/8GAHh4eGD8+PF8ndWrV8fSpUvx7NkzbNiwgZ/+9ddfw8LCAgDg7e1drPUXhOM41KhRAz4+PnB0dISFhQUyMzPx+PFjHDx4ECqVCmfPnsXhw4cxYMCAPLdJYd4ngUCAunXrolmzZrCzs4O5uTnS09Nx584dHD9+HIwxbN26FePGjUPz5s3zjbtnz56oVq0aXr16hbS0NOzevRsTJkwAALx580brtoTslubbt29HamoqAMDZ2RmDBw+GQqFAWFgY7t+/X+jPxuHDhxEdHQ0AsLCwwIgRI2BlZYW3b9/i8ePHuHz5cqHqKaq//voLAQEBaNasGfbs2YPnz58DAPbu3YslS5bAycmpSPUNGTIEt27d4p/XrVsXXbp0gUgkwt9//41nz56Vavw5FWUbjh8/Ht27d8eMGTP4aR999BGaNm0K4L/7MxMSEtC7d2/ExsYCyNrPBgwYALFYjAMHDiAkJARRUVHo06cPHj16BKlUqhPXnTt3IBaLMXz4cFSvXh0PHjyAWCzGpk2boFarAQC1atVC//79IRKJ8OrVKwQHB+P27dtls6FKyb///ss/btasmR4jIRWWfvNPUhFlZGSwBg0a5PsL1dfXlz1+/FhrudI4A9iiRQv+8mNMTIzWL/DVq1fzy+WMz93dXeuS2Pz587XqvHLlSq7rKuwZwN69e/PTAwICtC5ZBgYG8vM4jmOvX7/m5w0bNoyf165du1y3dX5nWUq6/oJERkayX3/9la1bt44tW7aMLV26lHl7e/P1jRw5ki9b3Pcp28uXL9mhQ4fY2rVr+XU5OTnxy8yfP79Q22ThwoVaZ26y/fjjj/z0pk2b8tMnTZrET1+0aJFOXLGxsSw2NpZ/ntfZqxUrVvDTx44dq1NPcnIyi4iIKGCLF+z9196vXz9+XnBwsNa83377rUh13717V2v5Dz/8UOdS/7Nnz/KMpaRnAIuzDXOuf+vWrTrLrFq1SutMX86zwnFxcUwmk/Hzd+/enWv8ANjJkyd16u7Rowc/f+/evTrzw8PDWUpKis703Pz1119s6dKlhf7PeVa6OJYsWaJ1XDh79myJ6iNVE50BJDrEYjEuXryIBQsWYNu2bXj37p1OmfPnz8Pf3x/379+HiYlJqa171KhREImyPpaWlpawtrZGZGQkACAuLg4AkJKSgrt37/LL9O/fHzKZjH8+bNgwfPvtt/zzq1evonXr1sWO6a+//uIfnzp1CgJB7r0nMcZw7do19OvXr9jrKo/1p6WlYcKECdixYwc0Gk2e5bJvIs9NYd4nIOsM8rBhw/D777/nG1N+68pp9OjRmD9/Pn8W8datW2jSpAkOHjzIlxkxYgT/+IMPPsDq1asBAP/73/9w/PhxeHl5wcvLCz4+Pvjggw8gFAoLXG/r1q3BcRwYY9i0aRNu3ryJOnXqwMvLC02bNoWvry/s7OwK9RqKYuzYsfxjLy8vrXk5t3Nh5PwcAcA333zDv4fZyurMPlA22zDna4qKioK5uXmeZa9evYqBAwfqTG/QoAG6dOmiM/2DDz7Ab7/9BiDrjPLGjRtRs2ZNeHl5oXXr1mjevLlOf6d5Kc/GRd99953W8e/HH3+En59fuaybVC6UAJJcmZmZYenSpfjhhx/w4MEDXLt2DefPn8exY8eQlpYGAHj16hWOHDmCYcOGldp6XV1dtZ7nvGSTnazEx8drlbG1tdV6/v6XSF5flOz/L6lmUyqVuZbLvrxUGNmXuEpTaa9/1qxZhepTLa/tARTufQKyEsWCkr+C1pWTtbU1PvroI2zfvh0AsHnzZtjZ2fG
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "7f19c64ecf564d1dac4e6cbafc31633a",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3hT1f/A8ffN7t6b0hYoZe+NyhIQQXAA+kMRcDFURMSBfJUhgoIibsABiIgMQQUVAQUcTJmyKqvQAqWF7tKV5Pz+KA1Nd0vblOa8nifPk9yc3PvJTe7NJ+eeoQghBJIkSZIkSZLdUNk6AEmSJEmSJKl6yQRQkiRJkiTJzsgEUJIkSZIkyc7IBFCSJEmSJMnOyARQkiRJkiTJzsgEUJIkSZIkyc7IBFCSJEmSJMnOyARQkiRJkiTJzsgEUJIkSZIkyc7IBFCqNEuWLEFRFMstv+7du1uWjxw50rI8KirK6jXbtm2r3qClGmfbtm1W34moqKhq2/a0adMs2w0NDa227dZEtvwcJEmqejIBlIokhOCbb76hT58++Pr6otVqcXd3p169evTs2ZMXXniBP/74w9Zh1mgjR460/Hh27969yDLyR1aSipf/2FiyZImtw6mRVq9ezbBhw2jatCne3t5otVqcnZ1p3LgxTz75JIcOHbJ1iFINpbF1AFLNNHz4cJYvX261LDk5meTkZM6ePcvWrVtJTk7mjjvusDzfvn175s6dW92hSlKl6dOnD87OzgC4ubnZOBpJKt3SpUv56aefrJYZjUZOnDjBiRMnWLp0KatWreLee++1TYBSjSUTQKmQX375xSr569ixI3feeSd6vZ7o6GgiIyPZuXNnodc1bdqUpk2bVmeoklSpunTpQpcuXWwdhmRDmZmZqNVqtFqtrUMpE0dHR7p3707z5s3x9fXFaDTy999/s2XLFgBycnKYPHmyTAClwoQkFfD8888LQAAiPDxcmEymQmUSEhLE7t27rZYtXrzY8rqCX61u3bpZlo8YMcKy/OzZs1av2bp1q1i1apVo3769MBgMwsvLS4wYMUJcvXq1yFg3b94s7r//fhEYGCi0Wq1wdXUVHTp0ELNnzxYpKSlWZYvaVn4hISGW56ZOnVpoW/v37xcjR44UYWFhQq/XC2dnZ9GuXTvx7rvvioyMjGL3Q1G3rVu3llom/34qz/ZLk5GRIV599VXRt29fERYWJlxdXYVGoxFeXl7i9ttvFx9++KHIyckpdd8tW7ZMtG3btsTP6fLly2LSpEmiR48eom7dusLZ2VlotVrh6+srevfuLZYtWybMZrPVawrum7NnzwohhOjRo4dl2aOPPlrofc2fP9/yvK+vr8jOzhZCCHH48GHx8MMPi5CQEKHT6YTBYBDBwcGiR48e4pVXXhExMTGWdUydOtWyjpCQEKv1R0VFiaeeeko0aNBAGAwGodfrRWBgoOjSpYt4/vnnxbFjx8r8GRSn4Hs/deqUeP/990XTpk2FTqcTAQEB4rnnnivX511Qdna2WLRokejVq5fw9vYWWq1W+Pj4iC5duog5c+YUG0ve5yBE8cezEJWzD/Ovv6hbwfVevHhRvPzyy6J58+bC2dlZ6PV6ER4eLp5//nlx6dKlQvugYPz79+8X/fr1E+7u7lbv9Y8//hD33nuv5fzi5OQkQkJCxF133SWmTp0qkpKSKvw5VLXevXtb3qPBYLB1OFINJBNAqZBnn33WcuLw8vISkZGRZXpdZSSAffr0KfKE37Vr10LbmzhxYok/EuHh4eLcuXPFbqs8CeCHH34o1Gp1sdtq37695cegKhLA8my/NPHx8aVu+8477xRGo7HYfde1a9cyfU579+4tdVujRo2yek1xicd3331nWebg4FDo/Xbp0sXy/MSJE4UQQhw9elQ4OjqWuP1ffvnFso7ikpfLly8LHx+fEtfz6aeflmn/l6Tgey9uPw8bNqxC64+Pjxdt27YtU2JV2QlgefZheRLAv/76S3h6ehZb1tfXVxw4cMAqxvzrb926daHvyNmzZ8WWLVtKPOYAcfz48TLt9/z7pCy3gvu0PJKTk8XGjRuFr6+vZX1t27at8Pqk2kteApYKadWqleX+1atXadSoES1atKB9+/a0b9+eO++8k3r16lXJtjdt2kTnzp3p1asXGzZs4ODBgwD8/fff7Ny5k86dOwPw1VdfMW/ePMvrWrRowcCBA4mKimL58uUIITh58iRDhw5l165dNxXT33//zfjx4xFCAHDbbbdx5513kpSUxNKlS0lMTGTv3r2MHTuWb775xtIWcuXKlfzzzz8A1KtXj7Fjx1rWWb9+febOncvp06dZsGCBZfmrr76Kh4cHAM2aNavQ9kujKAoNGjSgY8eOBAYG4uHhQU5ODidOnGD16tUYjUa2bNnCd999x9ChQ4vdJ2X5nFQqFU2bNqV9+/b4+fnh7u5OZmYmBw4cYP369QghWLx4MWPGjKFDhw4lxj1o0CDq1q3L+fPnycjIYPny5YwbNw6ACxcuWDVLyOtpvnTpUq5duwZAnTp1eOSRR3ByciImJoYjR46U+bvx3XffER8fD4CHhwejRo3Cy8uLixcvcuLECf78888yrae8/v77b/r27Uv79u355ptvOHPmDAArVqxgzpw5BAUFlWt9w4cPZ9++fZbHTZs2pV+/fmg0Gv755x9Onz5dqfHnV559OHbsWAYMGMCLL75oWfbggw/Srl074Eb7zOTkZO677z4SEhKA3ONs6NChaLVaVq1aRWRkJHFxcdx///0cP34cvV5fKK4DBw6g1WoZOXIk9evX5+jRo2i1WhYtWoTJZAKgUaNGDBkyBI1Gw/nz5zl48CD79++vmh1VQaGhoZw7d67Qcnd3d95//30bRCTVeLbNP6WaKDs7W7Rs2bLEf6g9evQQJ06csHpdZdQAdurUyXL58erVq1b/wD/44APL6/LHFxYWZnVJbMaMGVbr/Ouvv4rcVllrAO+77z7L8r59+1pdsty4caPlOUVRRHR0tOW5ESNGWJ7r1q1bkfu6pFqWm91+aS5fvix++OEH8cknn4h33nlHzJ07VzRr1syyvscee8xStqKfU55z586JNWvWiI8++siyraCgIMtrZsyYUaZ9MmvWLKuamzzvvfeeZXm7du0sy8ePH29ZPnv27EJxJSQkiISEBMvj4mqv5s2bZ1k+evToQutJS0sTsbGxpezx0hV874MHD7Y8d/DgQavnfvzxx3Kt+9ChQ1avv+eeewpd6j99+nSxsdxsDWBF9mH+7S9evLjQa95//32rmr78tcKJiYnCYDBYnl++fHmR8QPi559/LrTugQMHWp5fsWJFoecvXbok0tPTCy0vyt9//y3mzp1b5lv+Wumyyn/+yn9u3Lt3b7nXJdkHWQMoFaLVatm+fTszZ85kyZIlXLlypVCZrVu30qdPH44cOYKLi0ulbfvxxx9Ho8n9Wnp6euLt7c3ly5cBSExMBCA9Pd1qaIMhQ4ZgMBgsj0eMGMHrr79uebxjxw66du1a4Zj+/vtvy/1ff/0Vlaro0ZOEEOzatYvBgwdXeFvVsf2MjAzGjRvHV199hdlsLrZcTExMsc+V5XOC3BrkESNGFOqlWJ5t5ffkk08yY8YMSy3ivn37aNu2LatXr7aUGTVqlOX+7bffzgcffADA//73P9avX09ERAQRERF07NiR22+/HbVaXep2u3btiqIoCCFYtGgRe/fupUmTJkRERNCuXTt69OiBn59fmd5DeYwePdpyPyIiwuq5/Pu5LPJ/jwBee+01y2eYp6pq9qFq9mH+9xQXF4e7u3uxZXfs2MGwYcMKLW/ZsiX9+vUrtPz222/nxx9/BHJrlBcuXEjDhg2JiIiga9eudOjQodB4p8Wpjs5FU6ZMITExkStXrvDbb7+xf/9+zp49S9euXfn8888ZPnx4lW5fuvXIBFAqkpubG3PnzuXtt9/m6NGj7Nq1i61bt/L999+TkZEBwPnz51m7di0jRoyotO2GhIRYPc5/ySYvWUlKSrIq4+vra/W44I9IcT+U4vol1TxZWVlFlsu7vFQWeZe4KlNlb3/y5MllGlOtuP0
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "302bdeebba95487098ce6eec725390e6",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3wT9f/A8Vd2994FSoEyy95D9gYZiugPQcABAoKIOBCRIYKCIioi4AAUZINfQa2AAoJMmSKIiGzogO7dJJ/fH6GhabpXSvt5Ph559HL53N0719zlnbvPUAghBJIkSZIkSVKlobR1AJIkSZIkSVLZkgmgJEmSJElSJSMTQEmSJEmSpEpGJoCSJEmSJEmVjEwAJUmSJEmSKhmZAEqSJEmSJFUyMgGUJEmSJEmqZGQCKEmSJEmSVMnIBFCSJEmSJKmSkQmgVGJWrVqFQqEwP7Lq3Lmzef6oUaPM869cuWKxzN69e8s2aKnc2bt3r8Vn4sqVK2W27VmzZpm3W7169TLbbnlky/+DJEmlTyaAUo6EEHz77bf07NkTHx8fNBoNbm5u1KhRg65du/Lyyy/z22+/2TrMcm3UqFHmL8/OnTvnWEZ+yUpS7rIeG6tWrbJ1OA+ERYsWWey37D/GJSmT2tYBSOXTiBEjWLt2rcW8uLg44uLiuHz5Mnv27CEuLo6OHTuaX2/ZsiULFy4s61AlqcT07NkTJycnAFxdXW0cjSQVzj///MObb75p6zCkB4RMACUrP/30k0Xy17p1a7p3745Op+P69etcuHCBQ4cOWS3XoEEDGjRoUJahSlKJateuHe3atbN1GJINpaamolKp0Gg0tg6lUIxGI6NHjyYlJcXWoUgPCHkLWLKya9cu83RISAgHDx5k7ty5zJgxgxUrVrBv3z4iIiIYM2aMxXJ51QEsjE2bNtGqVSvs7e3x8vJi1KhRREdH51h29+7dPProowQGBqLVanF1daV169a8++67JCQkWJTNr75h9erVza/NmjXLalsnT55k9OjR1KhRAzs7O5ydnWnZsiWLFi0iNTXVaj+sXr3aPG/fvn1W21YoFHTp0sViG8HBwTnWlSzM9vOTmprK9OnT6d27NzVq1MDV1RWNRoOXlxcdO3ZkyZIl6PX6fPfdmjVraNGiRZ7/p8jISF555RW6du1KUFAQzs7OaLVafH196dmzJ2vWrEEIUaC4u3btat7+yJEjrV7/6KOPzK/7+vqSkZEBwJ9//snw4cOpXr06Op0Oe3t7qlWrRteuXZk2bRo3b940ryOvOoBXr15l7NixhISEYG9vj52dHYGBgbRv354pU6Zw/vz5Ar2PvGSvEnDp0iU+/vhjQkND0el0BAQEMHny5EL9v7PLyMjg888/p3v37nh7e6PVavHx8aF9+/YFvoKfW51eKJl9mLn+rEaPHp3rem/fvs3rr79Oo0aNcHZ2xs7Ojtq1azNlyhTCw8Pzjf/kyZP07dsXd3d37O3tzZ+J/fv3M3jwYPP5xcnJierVq9OnTx9mzZpFXFxcgfZXWVi0aBEHDx5ErVbTv39/W4cjPQiEJGUzceJEAQhAeHp6igsXLhRouZUrV5qXy/7R6tSpk3n+yJEjzfMvX75ssUzPnj0tnmc+2rdvb7W9KVOm5Fg28xESEiKuXr2a67b27Nljsb6goCDzazNnzrR47ZNPPhEqlSrXbbVs2VLExsbmuB9yeuzZsyffMln3U2G2n5+oqKh8t929e3eh1+tz3Xft27cv0P/p2LFj+W5r9OjRFstk3zeXL18WQgixZcsW8zx7e3ur99uuXTvz61OmTBFCCPHXX38JBweHPLf/008/mdcxc+ZM8/ygoCDz/IiICOHt7Z3nej777LMC7f+8ZH/vue3nYcOGFWn9UVFRonnz5rm+h6zvObf/gxC5H89ClMw+zLr+/OI8cOCA8PDwyLWsj4+POHnypEWMWdfftGlTq8/I5cuXxe7du/M85gBx/vz5Au33rPukII/s+zQ/58+fF3Z2dgIQM2bMsNqeJOVE3gKWrDRp0sQ8fffuXerWrUujRo1o2bIlLVu2pHv37tSoUaNUtr1z507atm1Lt27d2LFjB6dOnQLg999/59ChQ7Rt2xaAr7/+mkWLFpmXa9SoEQMGDODKlSusXbsWIQQXL15k6NChHD58uFgx/f7770yaNMl8papDhw50796d2NhYVq9eTUxMDMeOHWPcuHF8++235rqQGzZs4I8//gCgRo0ajBs3zrzOmjVrsnDhQi5dusSyZcvM89944w3c3d0BCA0NLdL286NQKKhVqxatW7cmICAAd3d3MjIy+Pvvv9m0aRN6vZ7du3ezZcsWhg4dmus+Kcj/SalU0qBBA1q2bImvry9ubm6kpqZy8uRJtm/fjhCClStX8vzzz9OqVas84x44cCDVqlXj2rVrpKSksHbtWsaPHw/AzZs3LaolZF6VWr16NcnJyQBUqVKF4cOH4+joyI0bNzh79myBPxtbtmwhKioKAHd3d0aPHo2npye3bt3i77//Zv/+/QVaT2H9/vvv9OrVi5YtW/Ltt9/y33//AbBu3ToWLFhAYGBgodY3YsQIjh8/bn7eoEED+vTpg1qt5o8//uDSpUslGn9WhdmH48aNo3///rzyyivmeY8//jgtWrQA7tfPjIuLY/DgweYrzzVq1GDo0KFoNBo2btzIhQsXiIyM5JFHHuH8+fPodDqruE6ePIlGo2HUqFHUrFmTv/76C41Gw4oVKzAYDADUrVuXxx57DLVazbVr1zh16hQnTpwonR1VSAaDgdGjR5Oamkrjxo2ZMWMG77zzjq3Dkh4Ets0/pfIoPT1dNG7cOM9fqF26dBF///23xXIlcQWwTZs2IiMjQwghxN27dy1+gX/88cfm5bLGFxwcLFJSUsyvzZkzx2KdBw4cyHFbBb0COHjwYPP8Xr16CaPRaH4tLCzM/JpCoRDXr183vzZy5Ejza506dcpxX+d1laW4289PRESE+N///ieWLl0q3n//fbFw4UIRGhpqXt/TTz9tLlvU/1Omq1evis2bN4slS5aYtxUYGGheZs6cOQXaJ/PmzbO4cpPpww8/NM9v0aKFef6kSZPM8+fPn28VV3R0tIiOjjY/z+3q1aJFi8zzx44da7WexMREER4ens8ez1/29z5kyBDza6dOnbJ47fvvvy/Uuk+fPm2x/MMPP2z+H2a6dOlSrrEU9wpgUfZh1u2vXLnSapmPPvrI4kpf1qvCMTEx5qtigFi7dm2O8QPixx9/tFr3gAEDzK+vW7fO6vXbt2+LpKQkq/k5+f3338XChQsL/Mh6VTo/7777rgCERqMxX+mUVwClgpBXACUrGo2Gffv2MXfuXFatWsWdO3esyuzZs4eePXty9uxZnJ2dS2zbzzzzDGq16WPp4eGBl5cXERERAMTExACQlJTE6dOnzcs89thj2NnZmZ+PHDmSt956y/z84MGDtG/fvsgx/f777+bpn3/+GaUy56qzQggOHz7MkCFDirytsth+SkoK48eP5+uvv8ZoNOZa7saNG7m+VpD/E5iuII8cOZIffvghz5jy2lZWzz33HHPmzDFfRTx+/DjNmzdn06ZN5jKjR482Tz/00EN8/PHHALz55pts376dOnXqUKdOHVq3bs1DDz2ESqXKd7vt27dHoVAghGDFihUcO3aM+vXrU6dOHVq0aEGXLl3w9fUt0HsojLFjx5qn69SpY/Fa1v1cEFk/RwAzZsww/w8zldaVfSidfZj1PUVGRuLm5pZr2YMHDzJs2DCr+Y0bN6ZPnz5W8x966CG+//57wHRFefny5dSuXZs6derQvn17WrVqVeC6zqXVuOj8+fPMnDkTMH2+s969kaT8yARQypGrqysLFy7kvffe46+//uLw4cPs2bOH7777ztzK7Nq1a2zdujXHCvlFFRQUZPE86y2bzGQlNjbWooyPj4/F8+xfIrl9UYpsjQ/S0tJyLJdbA5ScZN7iKkklvf1p06YVqE+13PYHFOz/BKZEMb/kL79tZeXl5cXjjz9ubmDzxRdf4Ovra779q9Pp+L//+z9z+SFDhjB16lQ++eQT0tL
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "394e1cf36c024a75b3fdb43f8217bcfb",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3xT1fvA8U/apHvQvSilrDLLpgwVyih7CqgMAZH9le1ARYYICoIgKIgoICh7/ASRKUuhgEwZVhDKLgW6oHTn/P6IvTSkIy1t03HevPIivTk598lN7s2Te89QCSEEkiRJkiRJUqlhZuoAJEmSJEmSpMIlE0BJkiRJkqRSRiaAkiRJkiRJpYxMACVJkiRJkkoZmQBKkiRJkiSVMjIBlCRJkiRJKmVkAihJkiRJklTKyARQkiRJkiSplJEJoCRJkiRJUikjE0Ap36xYsQKVSqXcMmrRooWyfODAgcry8PBwveccOHCgcIOWipwDBw7ofSbCw8MLbd1Tp05V1lu+fPlCW29RZMr3QZKkgicTQClTQgh++uknQkJCcHd3R6PRUKZMGSpUqEDLli2ZMGEChw4dMnWYRdrAgQOVL88WLVpkWkZ+yUpS1jLuGytWrDB1OEVSxuNMVrcZM2aYOkypCFKbOgCpaOrfvz8//vij3rLY2FhiY2O5du0a+/fvJzY2lpdeekl5vGHDhsyZM6ewQ5WkfBMSEoKdnR0Ajo6OJo5GkiSp4MgEUDLw66+/6iV/QUFBtG7dGktLS27evElYWBhHjx41eF6NGjWoUaNGYYYqSfmqadOmNG3a1NRhSCaUmJiIubk5Go3G1KHk2vDhw6lYsaLB8hdeeMEE0UhFnpCkZ4wbN04AAhCVK1cWaWlpBmWioqLEsWPH9JYtX75ced6zH63mzZsrywcMGKAsv3btmt5z9u/fL9avXy8aNmworKyshIuLixgwYIB4+PBhprHu2bNH9OjRQ3h7ewuNRiMcHBxEo0aNxKxZs0RcXJxe2czWlZGfn5/y2JQpUwzWderUKTFw4EDh7+8vLC0thZ2dnWjQoIGYO3euSEhIyHI7ZHbbv39/jmUybqfcrD8nCQkJ4v333xdt27YV/v7+wsHBQajVauHi4iJefPFFsXDhQpGSkpLjtlu1apWoX79+tu/TvXv3xMSJE0VwcLAoV66csLOzExqNRri7u4s2bdqIVatWCa1Wq/ecZ7fNtWvXhBBCBAcHK8tef/11g9c1f/585XF3d3eRnJwshBDi3Llzom/fvsLPz09YWFgIKysr4evrK4KDg8V7770nbt26pdQxZcoUpQ4/Pz+9+sPDw8XQoUNFpUqVhJWVlbC0tBTe3t6iadOmYty4ceLixYtGvwdZefa1X7lyRSxYsEDUqFFDWFhYCC8vLzFmzJhcvd/PSk5OFkuXLhWtWrUSrq6uQqPRCDc3N9G0aVMxe/bsLGNJfx+EyHp/FiJ/tmHG+jO7PVvvnTt3xLvvvitq1aol7OzshKWlpahcubIYN26cuHv3rsE2eDb+U6dOifbt24syZcrovdZDhw6Jbt26KccXW1tb4efnJ9q1ayemTJkiYmJi8vw+5JcBAwZkeUyTpOzIBFAy8NZbbykHFBcXFxEWFmbU8/IjAQwJCcn0gN+sWTOD9Y0fPz7bL4nKlSuL69evZ7mu3CSACxcuFObm5lmuq2HDhsqXQUEkgLlZf07u37+f47pbt24tUlNTs9x2zZo1M+p9OnHiRI7rGjRokN5zsko8Nm3apCyztrY2eL1NmzZVHh8/frwQQogLFy4IGxubbNf/66+/KnVklbzcu3dPuLm5ZVvP4sWLjdr+2Xn2tWe1nfv06ZOn+u/fvy/q169vVGKV3wlgbrZhbhLA33//XTg7O2dZ1t3dXZw+fVovxoz1161b1+Azcu3aNbF3795s9zlAXLp0yajtnnGbGHN7dptmJ2MCWK5cOWFlZSWsra1FQECAGD16tN4xUJIykpeAJQN16tRR7j98+JCqVasSGBhIw4YNadiwIa1bt6ZChQoFsu7du3fTpEkTWrVqxfbt2zlz5gwAf/zxB0ePHqVJkyYA/PDDD8ybN095XmBgIF26dCE8PJwff/wRIQSXL1+md+/ehIaGPldMf/zxB6NHj0YIAegup7Ru3ZqYmBhWrlxJdHQ0J06cYMSIEfz0009KW8h169bx559/AlChQgVGjBih1FmxYkXmzJnDv//+y5IlS5Tl77//Pk5OTgDUrFkzT+vPiUqlolKlSgQFBeHt7Y2TkxMpKSn8/fffbNiwgdTUVPbu3cumTZvo3bt3ltvEmPfJzMyMGjVq0LBhQzw8PChTpgyJiYmcPn2abdu2IYRg+fLlDB8+nEaNGmUbd9euXSlXrhw3btwgISGBH3/8kZEjRwJw+/ZtvWYJ6T3NV65cyZMnTwAoW7Ys/fr1w9bWllu3bnH+/HmjPxubNm3i/v37ADg5OTFo0CBcXFy4c+cOf//9N4cPHzaqntz6448/aNu2LQ0bNuSnn37i6tWrAKxZs4bZs2fj4+OTq/r69+/PyZMnlb9r1KhB+/btUavV/Pnnn/z777/5Gn9GudmGI0aMoFOnTrz99tvKsldeeYUGDRoAT9tnxsbG0r17d6KiogDdfta7d280Gg3r168nLCyMyMhIevTowaVLl7C0tDSI6/Tp02g0GgYOHEjFihW5cOECGo2GpUuXkpaWBkDVqlXp1asXarWaGzducObMGU6dOlUwG+o53LhxQ7kfFhZGWFgYK1euZMeOHbJpg2RAJoCSgf79+/Pll19y9uxZAIQQnD17lrNnz7Js2TIAgoODWbx4MQEBAfm67saNG3Po0CHUajXjxo3D3d1dOQj/+eefSmKRMfnz9/fn2LFjWFlZAVClShU++ugjAI4dO8Yff/xBs2bN8hzT3LlzleSrbdu2/Prrr8owN+3ataNdu3YArF27ltmzZyttIc+fP68kgL6+vkycOFGv3okTJ3LgwAG9BHDIkCEGw4/kdv1ly5bN9vW4uLhw+fJlIiMjCQ0N5fbt2zx58oR69erx119/cf78eQB27dqVZQJo7PtUr149zp8/z40bNzhx4gQRERFoNBpefPFFTp48ye3bt5V15ZQAmpubM3z4cN5//30Ali1bpiSAGzZsULZRgwYNqFWrFqBrz5Vu1KhRvPfee3p1RkdHZ7vOdBnr6d27N3PnztV7PD4+nsePHxtVV2707NmTDRs2KPfTf5wJITh16lSuEsBz586xc+dO5e/OnTuzefNm1OqnXwPpCWZByM02fOWVVwD0EsB27drpDSEFugQ/Pal0d3fn1KlTSnI4fvx4vLy8SExM5Nq1a2zatIk+ffpkGtv//d//0b59+yzjnTJlCq+++qre4xERETg4OOT4ukG/c5Ex0n/8GcvFxYU2bdpQoUIFhBDs3r1bSfRjY2N59dVXuXz5cqYJsFR6yQRQMqDRaDh48CAzZsxgxYoVPHjwwKDM/v37CQkJ4fz589jb2+fbugcPHqx8ITk7O+Pq6sq9e/eAp1/W8fHxSnIK0KtXLyX5AxgwYICSAAIcOXLkuRLAP/74Q7m/a9cuzMwyHz1JCEFoaCg9e/bM87oKY/0JCQmMHDmSH374Aa1Wm2W5W7duZfmYMe8T6M4gDxgwgF9++SXbmLJbV0ZDhgxh+vTpylnEkydPUr9+fSVJAhg0aJBy/8UXX+TLL78E4MMPP2Tbtm0EBAQQEBBAUFAQL774Iubm5jmut1mzZqhUKoQQLF26lBMnTlC9enUCAgJo0KABwcHBeHh4GPUacmPYsGHK/Wd/bBmbvKbL+DkCmDx5sl7yBxTYmX0omG2Y8TVFRkZSpkyZLMseOXIk0wSwdu3aBskf6D47P//8M6A7o/zNN99QpUoVAgICaNasGY0aNTIY7zQrBdm5aPLkySxbtkzvvfzkk0/0RnK4efMme/fupWPHjgUSg1Q8yQRQypSjoyNz5szhs88+48KFC4SGhrJ//362bt1KQkICoLvcsHnzZgYMGJBv6/Xz89P7O+Mv1vRkJSYmRq+Mu7u73t/Pfolk9UWZfsYoXVJSUqbl0i8
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "883843ccb1e94eb3865afc2d12dde899",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3wT5R/A8U/SpHvQQSdQCpQyyt6gQNlLloDKEFABQUWWCqIyfggIiuAEHGzZoIJsBASZMmVY2buU0UFLV5rn98fZ0DTdTZuO5/165dXr5bm7by65yzd3z1AJIQSSJEmSJElSiaG2dACSJEmSJElSwZIJoCRJkiRJUgkjE0BJkiRJkqQSRiaAkiRJkiRJJYxMACVJkiRJkkoYmQBKkiRJkiSVMDIBlCRJkiRJKmFkAihJkiRJklTCyARQkiRJkiSphJEJoGQ2ixcvRqVSGR6ptWzZ0jB/0KBBhvnXrl0zWmbv3r0FG7RU6Ozdu9foM3Ht2rUC2/bkyZMN2y1fvnyBbbcwsuT7IElS/pMJoJQuIQQ//fQT7dq1w9PTE61WS6lSpahQoQKtWrVi7Nix/PHHH5YOs1AbNGiQ4cuzZcuW6ZaRX7KSlLHUx8bixYstHU6hFhUVxbRp02jYsCGurq7Y2Njg5+dHy5YtmTp1qqXDkwohjaUDkAqnAQMGsGLFCqN5UVFRREVFcfXqVfbs2UNUVBTNmzc3PN+gQQNmz55d0KFKktm0a9cOR0dHAFxcXCwcjSRlz7Fjx3juuee4d++e0fw7d+5w584dDhw4wEcffWSh6KTCSiaAkomtW7caJX+NGjWiTZs22NjYcPPmTUJDQzl06JDJctWrV6d69eoFGaokmVXTpk1p2rSppcOQLCg+Ph4rKyu0Wq2lQ8mWW7du0bFjRx4+fAiAp6cnPXr0oEyZMsTFxXHz5k1OnTpl2SClwklIUhqjR48WgABEYGCgSE5ONinz6NEjceTIEaN5ixYtMiyX9qPVokULw/yBAwca5l+9etVomT179og1a9aIBg0aCFtbW+Hu7i4GDhwoHj58mG6sO3fuFD179hS+vr5Cq9UKZ2dn0bBhQzFjxgwRHR1tVDa9baXm7+9veG7SpEkm2zpx4oQYNGiQCAgIEDY2NsLR0VHUr19ffPbZZyIuLi7D/ZDeY8+ePVmWSb2fcrL9rMTFxYn3339ftG/fXgQEBAhnZ2eh0WiEu7u7ePbZZ8WXX34pkpKSstx3y5YtE/Xq1cv0fbp3754YN26cCAkJEeXKlROOjo5Cq9UKT09P0bZtW7Fs2TKh1+uNlkm7b65evSqEECIkJMQw7+WXXzZ5XXPnzjU87+npKRITE4UQQpw5c0b069dP+Pv7C2tra2FrayvKli0rQkJCxPjx48WtW7cM65g0aZJhHf7+/kbrv3btmhg6dKioVKmSsLW1FTY2NsLX11c0bdpUjB49Wpw/fz7b70FG0r72S5cuiXnz5onq1asLa2tr4ePjI95+++0cvd9pJSYmioULF4rWrVsLDw8PodVqRenSpUXTpk3FrFmzMowl5X0QIuPjWQjz7MPU60/vkXa9d+7cEe+9956oUaOGcHR0FDY2NiIwMFCMHj1a3L1712QfpI3/xIkTomPHjqJUqVJGr/WPP/4Q3bt3N5xfHBwchL+/v+jQoYOYNGmSiIyMzPX7YC4DBw40vJaQkJA8fTakkkUmgJKJt956y3BCcXd3F6GhodlazhwJYLt27dI94Tdr1sxke2PGjMn0SyIwMFBcv349w23lJAH88ssvhZWVVYbbatCggeHLID8SwJxsPyv379/Pcttt2rQROp0uw33XrFmzbL1Px44dy3JbgwcPNlomo8Rj/fr1hnl2dnYmr7dp06aG58eMGSOEEOLcuXPC3t4+0+1v3brVsI6Mkpd79+6J0qVLZ7qeb7/9Nlv7PzNpX3tG+7lv3765Wv/9+/dFvXr1spVYmTsBzMk+zEkCeODAAeHm5pZhWU9PT3Hy5EmjGFOvv06dOiafkatXr4pdu3ZleswB4sKFC9na76n3SXYeafdpRuLi4oSNjY1huenTp4t27dqJ0qVLC3t7e1G3bl3x5ZdfpvsjXpLkLWDJRO3atQ3TDx8+pEqVKtSsWZMGDRrQoEED2rRpQ4UKFfJl2zt27KBJkya0bt2azZs3G25d/Pnnnxw6dIgmTZoAsHTpUubMmWNYrmbNmnTt2pVr166xYsUKhBBcvHiRPn36cPjw4TzF9OeffzJy5EiEEAA888wztGnThsjISJYsWUJERATHjh1j+PDh/PTTT4a6kKtXr+avv/4CoEKFCgwfPtywzooVKzJ79mwuX77M/PnzDfPff/99XF1dAQgODs7V9rOiUqmoVKkSjRo1wtfXF1dXV5KSkvjnn39Yu3YtOp2OXbt2sX79evr06ZPhPsnO+6RWq6levToNGjTAy8uLUqVKER8fz8mTJ9m0aRNCCBYtWsTrr79Ow4YNM427W7dulCtXjhs3bhAXF8eKFSsYMWIEALdv3zaqlpDS0nzJkiU8efIEgDJlytC/f38cHBy4desWZ8+ezfZnY/369dy/fx8AV1dXBg8ejLu7O3fu3OGff/5h//792VpPTv3555+0b9+eBg0a8NNPP3HlyhUAVq5cyaxZs/Dz88vR+gYMGMDx48cN/1evXp2OHTui0Wj466+/uHz5slnjTy0n+3D48OF06dKFd955xzDvhRdeoH79+sDT+plRUVH06NGDR48eAcpx1qdPH7RaLWvWrCE0NJTw8HB69uzJhQsXsLGxMYnr5MmTaLVaBg0aRMWKFTl37hxarZaFCxeSnJwMQJUqVejduzcajYYbN25w6tQpTpw4kT87KgeOHz9OQkKC4f/333/f6PkTJ05w4sQJdu/ezfr161GrZbtPKRXL5p9SYZSYmChq1aqV6S/UkJAQ8c8//xgtZ44rgI0bNzbcfnz48KHRL/AvvvjCsFzq+AICAoxue0ydOtVonQcOHEh3W9m9AtijRw/D/Pbt2xvdsty2bZvhOZVKJW7evGl4LvWtmRYtWqS7rzO7ypLX7Wfl3r174pdffhHffPON+PTTT8Xs2bNFcHCwYX2vvPKKoWxu36cU169fF+vWrRNfffWVYVt+fn6GZaZOnZqtfTJ9+nSjKzcpPv/8c8P8+vXrG+aPHDnSMH/GjBkmcT169Eg8evTI8H9GV6/mzJljmD9s2DCT9cTExIiwsLAs9njW0r72Xr16GZ47deqU0XO//vprjtZ9+vRpo+Wfe+45k1v9ly9fzjCWvF4BzM0+TL39RYsWmSwzb948oyt9qa8KR0RECFtbW8PzK1asSDd+QGzZssVk3V27djU8v3LlSpPn7969K2JjY03mp+fPP/8Us2fPzvYj9VXpzKxdu9bk3NymTRsxadIk0bBhQ6P5CxYsyNY6pZJDXgGUTGi1Wvbt28e0adNYvHgxDx48MCmzZ88e2rVrx9mzZ3FycjLbtl999VU0GuVj6ebmhoeHh6FlW0REBACxsbGcPn3asEzv3r2xtbU1/D9w4ECjFm8HDx6kWbNmuY7pzz//NExv3749w1/RQggOHz5Mr169cr2tgth+XFwcI0aMYOnSpej1+gzL3bp1K8PnsvM+gXIFeeDAgfz222+ZxpTZtlIbMmQIU6dONVxFPH78OPXq1WPt2rWGMoMHDzZMP/vss3zxxRcAfPDBB2zatImgoCCCgoJo1KgRzz77LFZWVllut1mzZqhUKoQQLFy4kGPHjlGtWjWCgoKoX78+ISEheHl5Zes15MSwYcMM00FBQUbPpd7P2ZH6cwTw4YcfGt7DFPl1ZR/yZx+mfk3h4eGUKlUqw7IHDx6kb9++JvNr1apFx44dTeY/++yz/Prrr4ByRXnBggVUrlyZoKAgmjVrRsOGDU36O81IfjUuSkxMNPq/Xr167NixA5VKxYQJEwgICODu3bsArFixgqFDh5o9BqnokgmglC4XFxdmz57NJ598wrlz5zh8+DB79uzh559/Ji4uDoAbN26wYcMGBg4caLbt+vv7G/2f+pZNSrISGRlpVMbT09Po/7RfIhl9UYr
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Modified from https://scikit-learn.org/0.21/auto_examples/cluster/plot_kmeans_silhouette_analysis.html\n",
|
|||
|
"range_n_clusters = [2, 3, 4, 5, 6]\n",
|
|||
|
"\n",
|
|||
|
"bic_score = []\n",
|
|||
|
"for n_clusters in range_n_clusters:\n",
|
|||
|
" # Create a subplot with 1 row and 2 columns\n",
|
|||
|
" fig, (ax1, ax2) = plt.subplots(1, 2)\n",
|
|||
|
"\n",
|
|||
|
" # The 1st subplot is the silhouette plot\n",
|
|||
|
" # The silhouette coefficient can range from -1, 1 but in this example all\n",
|
|||
|
" # lie within [-0.1, 1]\n",
|
|||
|
" ax1.set_xlim([-0.1, 1])\n",
|
|||
|
" # The (n_clusters+1)*10 is for inserting blank space between silhouette\n",
|
|||
|
" # plots of individual clusters, to demarcate them clearly.\n",
|
|||
|
" ax1.set_ylim([0, len(X) + (n_clusters + 1) * 10])\n",
|
|||
|
"\n",
|
|||
|
" # Initialize the clusterer with n_clusters value and a random generator\n",
|
|||
|
" # seed of 10 for reproducibility.\n",
|
|||
|
" clusterer = KMeans(n_clusters=n_clusters, random_state=10, n_init='auto')\n",
|
|||
|
" cluster_labels = clusterer.fit_predict(X)\n",
|
|||
|
" \n",
|
|||
|
" \n",
|
|||
|
" this_bic = compute_bic(clusterer,X)\n",
|
|||
|
" bic_score.append(this_bic)\n",
|
|||
|
"\n",
|
|||
|
" # The silhouette_score gives the average value for all the samples.\n",
|
|||
|
" # This gives a perspective into the density and separation of the formed\n",
|
|||
|
" # clusters\n",
|
|||
|
" silhouette_avg = silhouette_score(X, cluster_labels)\n",
|
|||
|
" print(\"For n_clusters =\", n_clusters,\n",
|
|||
|
" \"The average silhouette_score is :\", silhouette_avg)\n",
|
|||
|
"\n",
|
|||
|
" # Compute the silhouette scores for each sample\n",
|
|||
|
" sample_silhouette_values = silhouette_samples(X, cluster_labels)\n",
|
|||
|
"\n",
|
|||
|
" y_lower = 10\n",
|
|||
|
" for i in range(n_clusters):\n",
|
|||
|
" # Aggregate the silhouette scores for samples belonging to\n",
|
|||
|
" # cluster i, and sort them\n",
|
|||
|
" ith_cluster_silhouette_values = \\\n",
|
|||
|
" sample_silhouette_values[cluster_labels == i]\n",
|
|||
|
"\n",
|
|||
|
" ith_cluster_silhouette_values.sort()\n",
|
|||
|
"\n",
|
|||
|
" size_cluster_i = ith_cluster_silhouette_values.shape[0]\n",
|
|||
|
" y_upper = y_lower + size_cluster_i\n",
|
|||
|
"\n",
|
|||
|
" color = cm.nipy_spectral(float(i) / n_clusters)\n",
|
|||
|
" ax1.fill_betweenx(np.arange(y_lower, y_upper),\n",
|
|||
|
" 0, ith_cluster_silhouette_values,\n",
|
|||
|
" facecolor=color, edgecolor=color, alpha=0.7)\n",
|
|||
|
"\n",
|
|||
|
" # Label the silhouette plots with their cluster numbers at the middle\n",
|
|||
|
" ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))\n",
|
|||
|
"\n",
|
|||
|
" # Compute the new y_lower for next plot\n",
|
|||
|
" y_lower = y_upper + 10 # 10 for the 0 samples\n",
|
|||
|
"\n",
|
|||
|
" ax1.set_title(\"The silhouette plot for the various clusters.\")\n",
|
|||
|
" ax1.set_xlabel(\"The silhouette coefficient values\")\n",
|
|||
|
" ax1.set_ylabel(\"Cluster label\")\n",
|
|||
|
"\n",
|
|||
|
" # The vertical line for average silhouette score of all the values\n",
|
|||
|
" ax1.axvline(x=silhouette_avg, color=\"red\", linestyle=\"--\")\n",
|
|||
|
"\n",
|
|||
|
" ax1.set_yticks([]) # Clear the yaxis labels / ticks\n",
|
|||
|
" ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])\n",
|
|||
|
"\n",
|
|||
|
" # 2nd Plot showing the actual clusters formed\n",
|
|||
|
" colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)\n",
|
|||
|
" ax2.scatter(X[:, 0], X[:, 1], marker='.', s=30, lw=0, alpha=0.7,\n",
|
|||
|
" c=colors, edgecolor='k')\n",
|
|||
|
"\n",
|
|||
|
" # Labeling the clusters\n",
|
|||
|
" centers = clusterer.cluster_centers_\n",
|
|||
|
" # Draw white circles at cluster centers\n",
|
|||
|
" ax2.scatter(centers[:, 0], centers[:, 1], marker='o',\n",
|
|||
|
" c=\"white\", alpha=1, s=200, edgecolor='k')\n",
|
|||
|
"\n",
|
|||
|
" for i, c in enumerate(centers):\n",
|
|||
|
" ax2.scatter(c[0], c[1], marker='$%d$' % i, alpha=1,\n",
|
|||
|
" s=50, edgecolor='k')\n",
|
|||
|
"\n",
|
|||
|
" ax2.set_title(\"The visualization of the clustered data.\")\n",
|
|||
|
" ax2.set_xlabel(\"Feature space for the 1st feature\")\n",
|
|||
|
" ax2.set_ylabel(\"Feature space for the 2nd feature\")\n",
|
|||
|
"\n",
|
|||
|
" plt.suptitle((\"Silhouette analysis, n_clusters = %d\" % n_clusters),\n",
|
|||
|
" fontsize=14, fontweight='bold')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## BIC - Bayesian information criterion\n",
|
|||
|
"\n",
|
|||
|
"https://en.wikipedia.org/wiki/Bayesian_information_criterion\n",
|
|||
|
"\n",
|
|||
|
"Note: our implementation of BIC has an inverted sign compared to that in the wikipedia article, and thus the best fit has the highest value."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "96872e64011b4661b8d6bfffa5a679b0",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABpRElEQVR4nO3deVzUdf4H8NfAwAzggChyKXF4IZ4wJGEemYaolVeK+osu111qXRWsELUVy0TDjt0tdWtda7ddVMT7KMmCPNhSRBDxVsQQREQBD46Z+fz+ICZHDkFHvwzzej4e89jl+31/v/P+zpecF9/j85UJIQSIiIiIyGxYSN0AERERET1aDIBEREREZoYBkIiIiMjMMAASERERmRkGQCIiIiIzwwBIREREZGYYAImIiIjMDAMgERERkZlhACQiIiIyMwyARERERGaGAZCIiIjIzDAAEhEREZkZBkAiIiIiM8MASERERGRmGACJiIiIzAwDIBEREZGZYQAkIiIiMjMMgERERERmhgGQiIiIyMwwABIRERGZGQZAIiIiIjPDAEhERERkZhgAiYiIiMwMAyARERGRmWEAJCIiIjIzDIBEREREZoYBkIiIiMjMMAASERERmRkGQCIiIiIzwwBIREREZGYYAImIiIjMDAMgERERkZlhACQiIiIyMwyARERERGaGAZCIiIjIzDAAEhEREZkZBkAiIiIiM8MASERERGRmGACJiIiIzAwDIBEREZGZYQAkIiIiMjMMgERERERmhgGQiIiIyMwwABIRERGZGQZAIiIiIjPDAEhERERkZhgAiYiIiMwMAyARERGRmWEAJCIiIjIzDIBEREREZoYBkIiIiMjMMAASERERmRkGQCIiIiIzwwBIREREZGYYAImIiIjMDAMgERERkZmRS92AKdPpdLh06RJUKhVkMpnU7RAREVETCCFQXl4Od3d3WFiY57EwBsAHcOnSJXh4eEjdBhEREd2HixcvolOnTlK3IQkGwAegUqkA1PwC2dvbS9wNERERNUVZWRk8PDz03+PmiAHwAdSe9rW3t2cAJCIiMjHmfPmWeZ74JiIiIjJjDIBEREREZoYBkIiIiMjMMAASERERmRkGQCIiIiIzwwBIREREZGYYAImIiIjMDAMgERERkZnhQNBERHfR6gR+Pl+CovIKOKuU6O/dDpYW5jtgLBG1PgyARER3+Ca7AIu25aCgtEI/zc1BiYXP+SG0l5uEnRERGQ9PARMR/eqb7AK8/vVhg/AHAIWlFXj968P4JrtAos6IiIyLAZCICDWnfRdty4GoZ17ttEXbcqDV1VdBRGRaGACJiAD8fL6kzpG/OwkABaUV+Pl8yaNriojoIeE1gEREAIrKGw5/d/rzlmwM6NweXVxU6ObcBt1cVHC0s37I3RERGRcDIBERAGeVskl1p4tu4HTRDYNpTm0U6OrcBt1c2qCriwrdXFTo6tyGwZCIWiwGQCIiAP2928HNQdngaWAZgPZtrPHWiO44d+UmTl0ux6nLN5B//TaKb1Si+EYl0s5dNVjGqY0C3VxqjhJ2+fVoYTeXNmhry2BIRNJiACQiAmBpIUNYoAc+2XO6zrzaEQAXj+1VZyiYm5UanCm6gVOXy3G69n/vCoYHzhoGww6q2iOGKnT9NSB2dWYwJKJHhwGQiAjA7SotNmbkAwBsrS1xq0qrn+fayDiAdgo5+nq0RV+PtgbTb/waDE/XEwyvlFfiSnn9wbCbSxt0df4tGHZzVsHB1sr4G0xEZo0BkIgIwEfJJ5FXcgtuDkrsmjUIxwvKH+hJIG0UcvTzaIt+DQTDmkBYEw7vDob7zxgGQ2eVAl1/DYb6o4YMhkT0AGRCCA5qdZ/Kysrg4OCA0tJS2NvbS90OEd2nIxevY/yK/dAJ4J+vBOJpX5dH3sONSs0dgbDm+sLTl8txqZGhaZxVijrXF3Z1UcHBhsGQqDH8/uYRQCIyc1UaHaI3ZEEngLH93CUJf0DNEUP/xxzh/5ijwfTyiupfTyX/dp1hbTAsKq9EUXkl9p0pNlimNhjeeX0hgyER3YkBkIjM2sqUszh5uRzt7Kzx5+d6St1OHSqlVYPB8HTRDZz5NRie+jUYFjQSDF3s6x4x7OLMYEhkjhgAichsnbpcjk9/qLnrd+FzfmhnQuP2qZRWCHjMEQENBEP9aeQ7guHlskpcLqvE3tP1B8Oaawzb1Fxv6KKCvZLBkKi1YgAkIrOk1QlEJ2WhWiswzNcZz/d1l7olo2goGJZVVOP05Rs4U1QTDGvvSi4sazgYutor77j5pCYUdnVpw2BI1AowABKRWfrqQC4y8q6jjUKOxeN6QSZr3l2+psZeaQW1pyPUnvUHw9+OGP4WDGtfDQXDO68vZDAkMi0MgERkdi6W3EL8tycBADGjfOHmYCNxR9JpKBiW3q7GmV/DYG0wPHW5HJfLKhsMhm4Oyjp3JHdxZjAkaokYAInIrAghELPxKG5XaxHk3Q5THn9M6pZaJAcbK6g920Ht2c5gem0wrD2NXDum4eWyShSUVqCgtP5g2NVFhW7Ov11f2NW5DVQMhkSSYQAkIrOSmP4L9p0phkJugaUT+sCimQM8m7sGg+Gtapy5Ynh94ekiw2D446krBsu4OyjR5ddgWDtsTRcjBUOtTuDn8yUPNJg3UWvGAEhEZqOovAKLt+cAACKf6QZvJzuJO2o9HGwbDoaniwyvLzx1uRxF5ZW4VFqBSw0Ew661p5F/fSxeVxcV2iia9pX1TXYBFm3LQcEdg2i7NfI4PyJzxCeBPACOJE5kWl7/Oh27sgvRq6M9Nr/xJOSWFlK3ZLbuDIY1A1zXhMOi8soGl+nY1ubXawx/O418dzD8JrsAr399GHd/sdUe+1v5YgBDIPH7GzwCSERmYtfRAuzKLoTcQoYPJvRl+JOYg60VAr3aIdDL8Ijh9VtVOF1keBr51OUbuFJeifzrt5F//TZS7zpi2LGtDbq6tEHnDnbYkJ5fJ/wBgEBNCFy0LQfP+LnydDCZPQZAImr1Sm9V450txwAAEUM6w8/dPP/iNwVtba3xuFc7PH6PYFj7WLw7g2HKySsNrLWGAFBQWoGfz5cguHP7h7gVRC0fAyARtXqLd+Sg+EYlfDrYYcbTXaRuh+5DQ8Hw2s3fguHuY4X48a47kOsz/V8H4efugC7ObdClQ82NJ12c28DNQdnqx4MkqsUASESt2r7TxUhM/wUyGfDBhD5QWllK3RIZkaOdNfp7t0N/73bo3KFNkwLgjUotfj5fgp/PlxhMt7O2ROdfQ2Fn59+CoWc7W14yQK0OAyARtVq3qjSYuzELAPDSE551rjej1qW/dzu4OShRWFpR73WAMgAu9kqsfDEA54tv4kzRjZrXlRu4cPUWblZpkfVLKbJ+KTVYzspSBq/2dvpA2MW5DTp3qHnZWPMPCjJNDIBE1Got//YUfrl2Gx3b2uCtUF+p26GHzNJChoXP+eH1rw9DBhiEwNoTu7HP+8H/MUf43/Ws5CqNDheuGobCM0U3cPbKDVRU63C66AZOF90wWEYm++3O5DtPJXdxboO2ttYPdVuJHpRRj2nv2LEDQUFBsLGxgZOTE8aPH28w/+DBgxg2bBjatm0LR0dHhISE4MiRIwY169evR79+/WBrawtPT0/Ex8fXeZ/U1FSo1WoolUr4+Phg1apVdWqSkpLg5+cHhUIBPz8/bNq0qU7NihUr4O3tDaVSCbVajb179z7YB0BELcbhvGtYc+A8AOD9cb2aPIYcmbbQXm5Y+WIAXB2UBtNdHZSNDgFjLbdAVxcVRvZ2w5+GdcVfJvtjx8xByFkUir1vD8WaVx/HgtE9MPlxDwR6OqKtrRWEAH65VnPzyT/2ncfcjUfxwqo09Hs3GYGLkxH29zTM33QUa/afx97TV1BQehsceY1aCqP9i5iUlITp06djyZIlePrppyGEwNGjR/Xzy8vLMWLECIwZMwYrVqyARqPBwoULMWLECPzyyy+wsrLCrl278H//93/429/+hpCQEBw/fhy/+93vYGNjgxkzZgAAzp8/j1GjRmH69On4+uuvsX//frzxxhvo0KEDJkyYAAB
|
|||
|
"text/html": [
|
|||
|
"\n",
|
|||
|
" <div style=\"display: inline-block;\">\n",
|
|||
|
" <div class=\"jupyter-widgets widget-label\" style=\"text-align: center;\">\n",
|
|||
|
" Figure\n",
|
|||
|
" </div>\n",
|
|||
|
" <img src='
|
|||
|
" </div>\n",
|
|||
|
" "
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig, ax = plt.subplots(nrows=1,ncols=1)\n",
|
|||
|
"ax.plot(range_n_clusters,bic_score,'o-')\n",
|
|||
|
"ax.set_xlabel(\"num clusters (k)\")\n",
|
|||
|
"ax.set_ylabel(\"BIC\");"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3 (ipykernel)",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.11.10"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 4
|
|||
|
}
|