pm21-dragon/exercises/source/exercise-13/3 BEST exercise.ipynb

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# You must run this cell, but you can ignore its contents.\n",
"\n",
"import hashlib\n",
"\n",
"def ads_hash(ty):\n",
" \"\"\"Return a unique string for input\"\"\"\n",
" ty_str = str(ty).encode()\n",
" m = hashlib.sha256()\n",
" m.update(ty_str)\n",
" return m.hexdigest()[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The questions here are in reference to the notebook from the lecture named `BEST.ipynb`. You will need to run that notebook to answer the following questions. You will need PyMC installed to run that notebook. You may be able to install it within Jupyter with `!pip install pymc`."
]
},
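{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick, optional check that the required packages are importable before running `BEST.ipynb` (ArviZ, which PyMC uses for its summary tables and plots, is normally installed along with it):\n",
"\n",
"```python\n",
"import pymc as pm\n",
"import arviz as az\n",
"\n",
"print(pm.__version__, az.__version__)\n",
"```"
]
},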
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problems\n",
"\n",
"### Q1 Run the notebook above with the following samples. What is the mean value of the posterior estimate of the difference of means? Put this in the variable `posterior_diff_means`.\n",
"\n",
"```\n",
"drug = (101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106,\n",
" 109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104,\n",
" 96,103,124,101,101,100,101,101,104,100,101)\n",
"placebo = (99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100,\n",
" 104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100,\n",
" 101,100,99,101,100,102,99,100,99)\n",
"```\n",
"\n",
"Note: you may also try running this at [the online version of the BEST test](http://www.sumsar.net/best_online/)."
]
},
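{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, below is a minimal sketch of how a two-group BEST-style model can be set up in PyMC for these samples, along the lines of the standard PyMC BEST example. The helper name `fit_best`, the prior choices, and the sampler settings are assumptions and may differ from the lecture's `BEST.ipynb`; use that notebook for the graded answer. The posterior mean of the \"difference of means\" row in the printed summary is the value asked for here.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pymc as pm\n",
"import arviz as az\n",
"\n",
"def fit_best(y1, y2, random_seed=None):\n",
"    \"\"\"Fit an illustrative two-group BEST-style model and return the trace.\"\"\"\n",
"    y1 = np.asarray(y1, dtype=float)\n",
"    y2 = np.asarray(y2, dtype=float)\n",
"    y_all = np.concatenate([y1, y2])\n",
"    mu_m, mu_s = y_all.mean(), y_all.std() * 2\n",
"\n",
"    with pm.Model():\n",
"        # Broad priors on the group means and standard deviations.\n",
"        group1_mean = pm.Normal(\"group1_mean\", mu=mu_m, sigma=mu_s)\n",
"        group2_mean = pm.Normal(\"group2_mean\", mu=mu_m, sigma=mu_s)\n",
"        group1_std = pm.Uniform(\"group1_std\", lower=0.1, upper=10 * y_all.std())\n",
"        group2_std = pm.Uniform(\"group2_std\", lower=0.1, upper=10 * y_all.std())\n",
"        # Student-t likelihood with a shared normality parameter nu.\n",
"        nu = pm.Exponential(\"nu_minus_one\", 1 / 29.0) + 1\n",
"        pm.StudentT(\"group1\", nu=nu, mu=group1_mean, sigma=group1_std, observed=y1)\n",
"        pm.StudentT(\"group2\", nu=nu, mu=group2_mean, sigma=group2_std, observed=y2)\n",
"        # The quantity the questions ask about.\n",
"        pm.Deterministic(\"difference of means\", group1_mean - group2_mean)\n",
"        return pm.sample(random_seed=random_seed)\n",
"\n",
"drug = (101,100,102,104,102,97,105,105,98,101,100,123,105,103,100,95,102,106,\n",
"        109,102,82,102,100,102,102,101,102,102,103,103,97,97,103,101,97,104,\n",
"        96,103,124,101,101,100,101,101,104,100,101)\n",
"placebo = (99,101,100,101,102,100,97,101,104,101,102,102,100,105,88,101,100,\n",
"           104,100,100,100,101,102,103,97,101,101,100,101,99,101,100,100,\n",
"           101,100,99,101,100,102,99,100,99)\n",
"\n",
"trace = fit_best(drug, placebo)\n",
"print(az.summary(trace, var_names=[\"difference of means\"]))\n",
"```"
]
},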
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "cell-4583b093ba19414e",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"'4a44dc1536'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type your answer here and then run this and the following cell.\n",
"posterior_diff_means = 1.017\n",
"ads_hash(int(np.round(posterior_diff_means*10)))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "cell-16393e9287358f37",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# This is a test of the above, do not change this code.\n",
"assert(ads_hash(int(np.round(posterior_diff_means*10)))=='4a44dc1536')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q2 Note the bounds of the 94% highest posterior density interval (HDI) for the difference of means between group 1 and group 2. Read these values from the summary table of the trace looking at the \"difference of means\" variable.\n",
"\n",
"Put these into the variables `diff_means_hdi_lower` and `diff_means_hdi_upper`\n",
"\n",
"(94% was chosen by the authors of the software as a friendly reminder that this value is an arbitrary choice - rather than perhaps 95% which my not remind you of this choice.)\n",
"\n",
"Also, due to the use of (pseudo) random numbers, your results may not exactly meet the auto-test check here. If that's the case, rerun the example with the BEST model and try again."
]
},
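{
"cell_type": "markdown",
"metadata": {},
"source": [
"ArviZ reports the 94% HDI in the `hdi_3%` and `hdi_97%` columns of the summary table (its default is `hdi_prob=0.94`). Assuming a `trace` like the one produced by the sketch after Q1, or by the lecture's `BEST.ipynb`, the bounds can be read off programmatically; this is only an illustrative sketch:\n",
"\n",
"```python\n",
"import arviz as az\n",
"\n",
"# Summarise only the \"difference of means\" variable; the 94% HDI bounds\n",
"# are in the hdi_3% and hdi_97% columns of the resulting DataFrame.\n",
"summary = az.summary(trace, var_names=[\"difference of means\"])\n",
"diff_means_hdi_lower = float(summary[\"hdi_3%\"].iloc[0])\n",
"diff_means_hdi_upper = float(summary[\"hdi_97%\"].iloc[0])\n",
"print(diff_means_hdi_lower, diff_means_hdi_upper)\n",
"```\n",
"\n",
"If a run lands just outside the auto-test's rounding boundary, passing a fixed `random_seed` to `pm.sample` (and/or drawing more samples) makes reruns reproducible."
]
},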
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "cell-f10804d9e06925c6",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"('6b86b273ff', '4e07408562')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type your answer here and then run this and the following cell.\n",
"diff_means_hdi_lower, diff_means_hdi_upper = 0.220, 1.915\n",
"diff_means_hdi_lower, diff_means_hdi_upper = 0.125, 1.827\n",
"ads_hash(int(np.round(diff_means_hdi_lower*10))), ads_hash(int(np.round(diff_means_hdi_upper+1.0)))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "cell-c13982956b19759a",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"assert ads_hash(int(np.round(diff_means_hdi_lower*10)))=='6b86b273ff'\n",
"assert ads_hash(int(np.round(diff_means_hdi_upper+1.0))) == '4e07408562'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q3 Re-run the notebook with the following samples. What is the mean value for the posterior estimate of the difference of means for this smaller sample size? Put this in the variable `posterior_diff_means_small_samples`.¶\n",
"\n",
"```\n",
"drug = (103,99,103,105,102,101)\n",
"placebo = (101,100,102,105,102,96)\n",
"```\n",
"\n",
"Again, due to the use of (pseudo) random numbers, your results may not exactly meet the auto-test check here. If that's the case, rerun the example with the BEST model and try again."
]
},
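{
"cell_type": "markdown",
"metadata": {},
"source": [
"Continuing with the illustrative `fit_best` helper sketched after Q1 (not part of the lecture notebook), the smaller samples can be fitted the same way; the seed value below is arbitrary and only makes reruns reproducible. The same printed summary also contains the HDI bounds needed for Q4.\n",
"\n",
"```python\n",
"import arviz as az\n",
"\n",
"drug_small = (103, 99, 103, 105, 102, 101)\n",
"placebo_small = (101, 100, 102, 105, 102, 96)\n",
"\n",
"# Refit the same sketch model on the smaller samples with a fixed (arbitrary) seed.\n",
"trace_small = fit_best(drug_small, placebo_small, random_seed=123)\n",
"print(az.summary(trace_small, var_names=[\"difference of means\"]))\n",
"```"
]
},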
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "cell-e349ae3da9999c30",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"'4a44dc1536'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type your answer here and then run this and the following cell.\n",
"posterior_diff_means_small_samples = 1.043\n",
"ads_hash(int(np.round(posterior_diff_means_small_samples*10)))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "cell-25d34717f657c572",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# This is a test of the above, do not change this code.\n",
"assert(ads_hash(int(np.round(posterior_diff_means_small_samples*10)))=='4a44dc1536')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q4 Note the bounds of the 94% highest posterior density interval (HDI) for the difference of means between group 1 and group 2 for this smaller data set. Read these values from the summary table of the trace looking at the \"difference of means\" variable.\n",
"\n",
"Put these into the variables `diff_means_hdi_lower_small_samples` and `diff_means_hdi_upper_small_samples`\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "cell-6cb965cd2f3f109a",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"('37aa1ccf80', 'e7f6c01177')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Type your answer here and then run this and the following cell.\n",
"(diff_means_hdi_lower_small_samples,diff_means_hdi_upper_small_samples)=-2.600, 4.521\n",
"ads_hash(int(np.round(diff_means_hdi_lower_small_samples*2))), ads_hash(int(np.round(diff_means_hdi_upper_small_samples+1)))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "cell-b0b09eb0a16dfb2b",
"locked": true,
"points": 1,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# This is a test of the above, do not change this code.\n",
"assert ads_hash(int(np.round(diff_means_hdi_lower_small_samples*2))) == '37aa1ccf80'\n",
"assert ads_hash(int(np.round(diff_means_hdi_upper_small_samples+1))) == 'e7f6c01177'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q5 The posterior estimate of the difference in means between the two datasets is similar. But one of the datasets has much more data than the other. Explain why the 94% HDI exclude a zero difference result in one dataset but not the other."
]
},
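{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical check before answering, compare the widths of the two intervals using the HDI bounds recorded in the Q2 and Q4 answers of this solution:\n",
"\n",
"```python\n",
"# HDI bounds taken from the Q2 and Q4 answers above.\n",
"width_large = 1.827 - 0.125       # about 1.7; the interval excludes zero\n",
"width_small = 4.521 - (-2.600)    # about 7.1; the interval straddles zero\n",
"print(width_large, width_small)\n",
"```"
]
},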
{
"cell_type": "markdown",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "cell-cde0c771165a691c",
"locked": false,
"points": 1,
"schema_version": 3,
"solution": true,
"task": false
}
},
"source": [
"### (Your answer here)\n",
"\n",
"The width of the HDI is much smaller in the dataset with more data -- this reflects that we are more certain about our estimates because we have more data. With the narrower posterior we can confidently exclude zero effect."
]
}
],
"metadata": {
"anaconda-cloud": {},
"celltoolbar": "Create Assignment",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
},
"latex_envs": {
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 0
}
},
"nbformat": 4,
"nbformat_minor": 4
}