{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "3325951a33e219b4018ec2a3d92da174", "grade": false, "grade_id": "cell-5e22290104c842b6", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "outputs": [], "source": [ "# You must run this cell, but you can ignore its contents.\n", "\n", "import hashlib\n", "\n", "def ads_hash(ty):\n", " \"\"\"Return a unique string for input\"\"\"\n", " ty_str = str(ty).encode()\n", " m = hashlib.sha256()\n", " m.update(ty_str)\n", " return m.hexdigest()[:10]" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "a5067aaf2a99b7decbc8c17d9ebd280e", "grade": false, "grade_id": "cell-9f70118929cfc07a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Bayesian Statistics: Interactive Bayesian updating\n", "\n", "From https://github.com/NuclearTalent/Bayes2019/blob/master/topics/basics-of-bayesian-statistics/Bayesian_updating_coinflip_interactive.ipynb" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "ca64a684f3ba03b196e9255ca7592b89", "grade": false, "grade_id": "cell-f4d886aaea957b85", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "source": [ "## Python/Jupyter set up" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "143499ed0f4a1d70035ac46a55a0f3ab", "grade": false, "grade_id": "cell-8a69051f2683cc79", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "\n", "import scipy.stats as stats\n", "from scipy.stats import norm, uniform\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "import ipywidgets as widgets\n", "from ipywidgets import HBox, VBox, Layout, Tab, Label, Checkbox, Button\n", "from ipywidgets import FloatSlider, IntSlider, Play, Dropdown, HTMLMath \n", "\n", "from IPython.display import display\n", "\n", "import seaborn as sns\n", "sns.set()\n", "sns.set_context(\"talk\")\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "ee94303da91208f884b725bcdd5becf6", "grade": false, "grade_id": "cell-ef87f464e37e4cca", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "source": [ "## Bayesian updating examples\n", "\n", "$ \\newcommand{\\thetavec}{\\boldsymbol{\\theta}} \\newcommand{\\pr}{\\textrm{p}}$\n", "Recall Bayes' theorem with $\\thetavec$ the vector of parameters we seek and information $I$ is kept implicit.\n", "\n", "$$\n", " \\overbrace{\\pr(\\thetavec \\mid \\textrm{data},I)}^{\\textrm{posterior}} =\n", " \\frac{\\color{red}{\\overbrace{\\pr(\\textrm{data} \\mid \\thetavec,I)}^{\\textrm{likelihood}}} \\times\n", " \\color{blue}{\\overbrace{\\pr(\\thetavec \\mid I)}^{\\textrm{prior}}}}\n", " {\\color{darkgreen}{\\underbrace{\\pr(\\textrm{data} \\mid I)}_{\\textrm{evidence}}}}\n", "$$\n", "\n", "If we view the prior as the initial information we have about $\\thetavec$, summarized as a probability density function, then Bayes' theorem tells us how to update that information after observing some data: this is the posterior pdf. Here we will give some examples of how this plays out when tossing coins.\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "bc0be5ceec85b10c29143e1ad87f5ceb", "grade": false, "grade_id": "cell-4f86b9c7f450dfdc", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "source": [ "### Determining the bias of a coin" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "0e80c93cb3e2872a51c58fa678e7990f", "grade": false, "grade_id": "cell-f277b01772407ee0", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "source": [ "The idea here is that we are observing successive flips of a coin, which is a proxy for any process that has a binary outcome. There is a definite true probability for getting heads, which we'll label $p_h$, but we don't know what it is. We start with a preconceived notion of the probability expressed in terms of a prior pdf for $p_h$, i.e., $\\pr(p_h)$. With each flip of the coin, we have more information, so our goal is to update our expectation of $p_h$, meaning we want the posterior $\\pr(p_h \\mid \\mbox{ num tosses, num heads})$. " ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c28b79b238bba84152b9e54b41ed109a", "grade": false, "grade_id": "cell-404c8f0599a0960c", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "source": [ "#### Main code for coin-flipping UI" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "code_folding": [], "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6d07721a47fc47504f2987268924f940", "grade": false, "grade_id": "cell-87f9a4135ca5602b", "locked": true, "schema_version": 3, "solution": false, "task": false }, "tags": [] }, "outputs": [], "source": [ "# Initial values (can be changed by widgets)\n", "n_trials_max = 5000 # maximum number of coin tosses\n", "prob_heads = 0.4 # p_h, the true probability of a heads\n", "x = np.linspace(0, 1, 301) # mesh for posterior plots (enough so smooth)\n", "\n", "class Data():\n", " \"\"\"Class to hold the array of heads and tails (1s and 0s) outcomes.\"\"\"\n", " def __init__(self, prob_heads=0.5, n_trials_max=5000):\n", " self._data = self.generate_data(prob_heads, n_trials_max)\n", " \n", " def generate_data(self, prob_heads, n_trials_max):\n", " \"\"\"Generate an array of heads or tails, 1 or 0, for n_trials_max\n", " independent tosses according to the Bernoulli distribution.\"\"\"\n", " self._data = stats.bernoulli.rvs(prob_heads, size=n_trials_max)\n", " \n", " def heads_in_data_to_N(self, N):\n", " \"\"\"Count how many heads in the first N elements of the data.\"\"\"\n", " return self._data[:N].sum()\n", "\n", "coin_data = Data(prob_heads, n_trials_max) \n", " \n", "def update_plot(N=0, jump=1, recalculate_data=True, \n", " prob_heads=0.5, n_trials_max=5000,\n", " alpha_1=1., beta_1=1.,\n", " alpha_2=30., beta_2=30.,\n", " alpha_3=0.2, beta_3=0.3\n", " ):\n", " \"\"\"\n", " Make a new plot based on the current widget settings for the input\n", " parameters.\n", " \"\"\" \n", " \n", " font_size = 18\n", " plt.rcParams.update({'font.size': font_size})\n", " \n", " fig = plt.figure(figsize=(12,5))\n", " ax = fig.add_subplot(1, 1, 1)\n", "\n", " if recalculate_data:\n", " coin_data.generate_data(prob_heads, n_trials_max)\n", " recalculate_data_w.value = False\n", "\n", " heads = coin_data.heads_in_data_to_N(N) # add up the 1s (= # of heads)\n", " # update using the conjugate prior, which is a beta pdf\n", " y_1 = stats.beta.pdf(x, alpha_1 + heads, beta_1 + N - heads) \n", " y_2 = stats.beta.pdf(x, alpha_2 + heads, beta_2 + N - heads) \n", " y_3 = stats.beta.pdf(x, alpha_3 + heads, beta_3 + N - heads) \n", "\n", " # default y_3 distribution has two high max at endpoints for plot\n", " y_max = np.max([y_1.max(), y_2.max()]) \n", " \n", " line1, = ax.plot(x, y_1, label=\"uniform prior\", color=\"blue\")\n", " ax.fill_between(x, 0, y_1, color=\"blue\", alpha=0.1)\n", " line2, = ax.plot(x, y_2, label=\"informative prior\", color=\"red\")\n", " ax.fill_between(x, 0, y_2, color=\"red\", alpha=0.1)\n", " line3, = ax.plot(x, y_3, label=\"anti prior\", color=\"green\")\n", " ax.fill_between(x, 0, y_3, color=\"green\", alpha=0.1)\n", " \n", " ax.set_xlabel(\"$p_h$, probability of heads\") \n", " ax.set_yticks([]) # turn off the plotting of ticks on the y-axis\n", " ax.axvline(prob_heads, 0, 1.1*y_max, color=\"k\", linestyle=\"--\", lw=2)\n", " ax.annotate(f'observe {N:d} tosses,\\n {heads:d} heads', \n", " xy=(0.05,0.85), xycoords='axes fraction', \n", " horizontalalignment='left',verticalalignment='top')\n", " leg = ax.legend(loc='upper right')\n", " leg.get_frame().set_alpha(0.4)\n", " ax.autoscale(tight=True)\n", "\n", " \n", "################### begin: text for help tabs ##################\n", "# In HTML (could move this to an external file!)\n", "overview_text = \\\n", " r\"\"\"
Here we explore Bayesian updating for a coin flip. There is help \n", " available under the other tabs.
\n", "Recall Bayes' theorem with $\\thetavec$ the vector of parameters \n", " we seek and information $I$ is kept implicit.
\n", "\n", " $$\n", " \\newcommand{\\thetavec}{\\boldsymbol{\\theta}}\n", " \\overbrace{p(\\thetavec \\mid \\textrm{data},I)}^{\\textrm{posterior}} =\n", " \\frac{\\color{red}{\\overbrace{p(\\textrm{data} \n", " \\mid \\thetavec,I)}^{\\textrm{likelihood}}} \\times\n", " \\color{blue}{\\overbrace{p(\\thetavec \\mid I)}^{\\textrm{prior}}}}\n", " {\\color{darkgreen}{\\underbrace{p(\\textrm{data} \n", " \\mid I)}_{\\textrm{evidence}}}}\n", " $$\n", "\n", "If we view the prior as the initial information we have about \n", " $\\thetavec$, summarized as a probability density function, \n", " then Bayes' theorem tells us how to update that \n", " information after observing some data: this is the posterior pdf. \n", " Here we will look at an example of how this plays out in practice:\n", " flipping a (biased) coin.
\n", "\n", "The idea here is that we are observing successive flips of a coin, \n", " which is a proxy for any process that has a binary outcome. \n", " There is a definite true probability for getting heads, \n", " which we'll label $p_h$, but we don't know what it is. \n", " We start with a preconceived notion of the probability expressed \n", " in terms of a prior pdf for $p_h$, i.e., $p(p_h)$. \n", " With each flip of the coin, we have more information, so our goal is \n", " to update our expectation of $p_h$, meaning we want the \n", " posterior $p(p_h\\mid \\mbox{# tosses, # heads})$.
\n", "\n", " \"\"\"\n", "\n", "toss_coin_text = \\\n", " r\"\"\"\n", " The graph shows three posteriors that result from three choices for\n", " the prior (see the \"Priors\" tab for details) for the number of coin\n", " tosses and observed heads shown at the upper left. The true probability\n", " of a heads, $p_h$, is indicated by a dashed vertical line.\n", "