pm21-dragon/lectures/lecture-04/1 - For-loops-dicts-files.ipynb

2024-11-08 02:55:51 -05:00
{
"cells": [
{
"cell_type": "markdown",
"id": "a795ba59-d48d-4f8d-934e-2cbc558071e4",
"metadata": {},
"source": [
"# `exercise-04` review\n",
"\n",
"* I check your answers based on the file name. Please keep the file names exactly as specified, e.g. `my_name.py`.\n",
"\n",
"* Example answers:\n",
"\n",
"```python\n",
"print(\"Paolo\")\n",
"\n",
"\n",
"a = 1001\n",
"b = 22\n",
"def onethousandandone_times_twentytwo(a,b):\n",
" print(a*b)\n",
"```\n",
"\n",
"vs\n",
"\n",
"```python\n",
"# Create a Python script called `my_name.py` which does two things:\n",
"\n",
"# 1) prints your name\n",
"\n",
"print(\"Paolo\")\n",
"\n",
"# 2) computes the value of 1001 * 22 and then prints this\n",
"\n",
"result = 1001*22\n",
"print(result)\n",
"```\n",
"\n",
"vs\n",
"\n",
"```python\n",
"print (\"Paolo\")\n",
"value1 = 1001*22\n",
"print (value1)\n",
"```\n",
"\n",
"A correct answer should produce this output:\n",
"\n",
"```\n",
"astraw@computer$ python my_name.py\n",
"Paolo\n",
"22022\n",
"astraw@computer$\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba980f4b-8e46-41d7-940c-90910d939b81",
"metadata": {},
"outputs": [],
"source": [
"# What is wrong with this code?\n",
"print(Andrew)\n",
"print(1001 * 22)"
]
},
{
"cell_type": "markdown",
"id": "4df1eac5-e457-4d81-a1bf-fd151265ea9c",
"metadata": {
"tags": []
},
"source": [
"# For loops, iterators, dictionaries, more operators, files"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e19d742e-a9b2-4583-baf7-f3277157beb6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# We run this for use below\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "4a1b23a3-eca2-475c-928d-74a9340c8948",
"metadata": {},
"source": [
"## Control flow with `for` using `range` to produce an iterable"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b083f041-81b4-4036-a074-777cca57b714",
"metadata": {},
"outputs": [],
"source": [
"for x in range(10):\n",
" print(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d9b3d49-bfb1-4710-ad2d-2017c8f53f57",
"metadata": {},
"outputs": [],
"source": [
"for x in range(0, 10):\n",
" print(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a9fa26d-b496-4027-8284-296eeb03c458",
"metadata": {},
"outputs": [],
"source": [
"for y in range(0, 1000, 100):\n",
" print(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f255422b-b9f2-4cac-8416-e6e2873c0efd",
"metadata": {},
"outputs": [],
"source": [
"myiter = range(0, 1000, 100)\n",
"print('myiter:', myiter)\n",
"print(type(myiter))\n",
"for y in myiter:\n",
" print(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8cde809-3a7c-4d47-95e0-ae92f4fbc32f",
"metadata": {},
"outputs": [],
"source": [
"for y in range(10):\n",
" print(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "618eae3e-271e-4aa9-b960-2406e66f5dff",
"metadata": {},
"outputs": [],
"source": [
"for y in range(4,10):\n",
" print(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1d9ce42-4b17-464b-890b-58fc3074451c",
"metadata": {},
"outputs": [],
"source": [
"for y in range(4, 10, 2):\n",
" print(y)"
]
},
{
"cell_type": "markdown",
"id": "ce930921-7f89-40c3-8ae1-86fa4bf76950",
"metadata": {},
"source": [
"Note the symmetry between `range(start, stop, step)` and slices `[start:stop:step]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21919ffa-13fa-44bb-b1d9-263e176c497c",
"metadata": {},
"outputs": [],
"source": [
"my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7418236e-f87c-4caf-9ae8-6e4f50fcda59",
"metadata": {},
"outputs": [],
"source": [
"my_list[:10]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72a30f71-a29c-4b5e-b18d-828c85598502",
"metadata": {},
"outputs": [],
"source": [
"my_list[4:10]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84bcbdbf-4bc2-4099-ba18-3d1f88df47f0",
"metadata": {},
"outputs": [],
"source": [
"my_list[4:10:2]"
]
},
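{
"cell_type": "markdown",
"id": "added-range-slice-md",
"metadata": {},
"source": [
"To check this symmetry directly, the short sketch below compares `list(range(start, stop, step))` with the slice `[start:stop:step]` of a list holding the numbers 0 to 9. It is an illustrative addition, and the name `numbers` is only an example. Each list element equals its own index, so the slice picks out exactly the values that the matching `range` produces."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-range-slice-code",
"metadata": {},
"outputs": [],
"source": [
"numbers = list(range(10))\n",
"\n",
"# Each comparison uses the same start, stop and step values.\n",
"print(list(range(4, 10, 2)), numbers[4:10:2])\n",
"print(list(range(4, 10)) == numbers[4:10])\n",
"print(list(range(4, 10, 2)) == numbers[4:10:2])"
]
},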
{
"cell_type": "markdown",
"id": "5f170ad1-3f66-4dce-b838-08e9e6962fe4",
"metadata": {},
"source": [
"## Control flow with `for` using a list as an iterable"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e8ad5d5-8310-44f7-b3a3-03efedd846c6",
"metadata": {},
"outputs": [],
"source": [
"my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
"\n",
"for y in my_list:\n",
" print(y)\n",
"print(\"end\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d72823c-57da-443c-85b7-8beef3d785d8",
"metadata": {},
"outputs": [],
"source": [
"my_list = [[5,5,6], [6,6,7]]\n",
"\n",
"for y in my_list:\n",
" print('y:',y)\n",
" for z in y:\n",
" print(z)\n",
"print(\"end\")"
]
},
{
"cell_type": "markdown",
"id": "8063e2bc-23a3-4364-bfa2-21d6da515555",
"metadata": {},
"source": [
"# Iterables and iterators\n",
"\n",
"We have now seen a couple of examples of *iterables*: things we can loop over.\n",
"\n",
"Being iterable is not a type in Python but rather a behavior that some types have. Namely, you can iterate over them, which means you can use them as the source of data in a `for` loop. Behind the scenes, the `for` loop asks the iterable for an *iterator*, an object that hands out the items one at a time. Therefore, the items do not all need to be stored in memory at once: they can be constructed one at a time, as `range` does.\n",
"\n",
"Iterators can run forever or they can stop at a certain point.\n",
"\n",
"We can create a list from all values in an iterable in a couple of different ways.\n",
"\n",
"The first you should be able to do by yourself already:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "04ae4602-76fe-4382-a9bc-946999dc9f49",
"metadata": {},
"outputs": [],
"source": [
"my_list = []\n",
"for x in range(10):\n",
" my_list.append(x)\n",
"my_list"
]
},
{
"cell_type": "markdown",
"id": "7784431a-461e-4d0c-b3eb-21ab73d8da44",
"metadata": {},
"source": [
"The second approach to creating a list from all values in an iterable relies on the `list()` function, which is the *constructor* of a list. This constructor function iterates over the iterable and creates a list with its contents:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67b35f94-b268-488b-a8ee-c0cb12bc708c",
"metadata": {},
"outputs": [],
"source": [
"my_list = list(range(10))\n",
"my_list"
]
},
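{
"cell_type": "markdown",
"id": "added-iter-next-md",
"metadata": {},
"source": [
"A `for` loop uses an iterator behind the scenes. We can see that iterator by calling the built-in functions `iter()` and `next()` ourselves. This is a small illustrative sketch, not something you normally need to write, and the names `it`, `first`, etc. are only examples.\n",
"\n",
"- `iter()` gets an iterator from the iterable `range(3)`.\n",
"- Each call to `next()` hands out the next item.\n",
"- Once the items are used up, `next()` raises `StopIteration`. This is how a `for` loop knows when to stop."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-iter-next-code",
"metadata": {},
"outputs": [],
"source": [
"numbers = range(3)\n",
"it = iter(numbers)  # ask the iterable for an iterator\n",
"first = next(it)    # 0\n",
"second = next(it)   # 1\n",
"third = next(it)    # 2\n",
"print(first, second, third)\n",
"\n",
"# The iterator is now used up, so one more next() raises StopIteration.\n",
"exhausted = False\n",
"try:\n",
"    next(it)\n",
"except StopIteration:\n",
"    exhausted = True\n",
"print('exhausted:', exhausted)\n",
"\n",
"# The range itself can be looped over again; this creates a fresh iterator.\n",
"print(list(numbers))"
]
},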
{
"cell_type": "code",
"execution_count": null,
"id": "fb690611-4886-47fd-810a-048487ffa4bd",
"metadata": {},
"outputs": [],
"source": [
"my_list = []\n",
"x = \"my super important data\"\n",
"# Note that we overwrite x here!\n",
"for x in range(2):\n",
" my_list.append(x)\n",
"my_list"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a7785b5-9a4d-438e-9cd6-b5fe38afa003",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "markdown",
"id": "8c93a418-83fd-459f-a277-d28384d7a0cf",
"metadata": {},
"source": [
"`continue` and `break` work in for loops, too."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13d2a909-7a05-43f6-92b9-f9c09e51563f",
"metadata": {},
"outputs": [],
"source": [
"my_list = []\n",
"for x in range(100):\n",
" if x > 5:\n",
" if x < 10:\n",
" continue\n",
" if x >= 20:\n",
" break\n",
" my_list.append(x)\n",
"my_list"
]
},
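{
"cell_type": "markdown",
"id": "added-break-search-md",
"metadata": {},
"source": [
"A common use of `break` is to stop a loop as soon as you have found what you are looking for. As an illustration, the loop below finds the first multiple of 7 that is greater than 50 (the task is made up for this example).\n",
"\n",
"- `continue` skips numbers that are not multiples of 7.\n",
"- `break` ends the loop at the first match, so the remaining numbers up to 99 are never checked."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-break-search-code",
"metadata": {},
"outputs": [],
"source": [
"found = None\n",
"for n in range(100):\n",
"    if n % 7 != 0:\n",
"        continue  # not a multiple of 7: skip to the next n\n",
"    if n > 50:\n",
"        found = n\n",
"        break  # first match found: stop the loop\n",
"print(found)"
]
},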
{
"cell_type": "markdown",
"id": "6545f21f-1f66-41ec-9573-eabd95b0f0e8",
"metadata": {},
"source": [
"# Methods\n",
"\n",
"Methods are functions that belong to a specific type. You already know a few of them, which so far we have used without discussing them much. These include `list.append` and `str.format`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09c591ee-a2a6-475c-bfbd-b2fa52a3accc",
"metadata": {},
"outputs": [],
"source": [
"my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
"my_list.append(10)\n",
"my_list"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82a767b3-d398-4f7e-b389-0b143511737c",
"metadata": {},
"outputs": [],
"source": [
"my_str = \"Hello, my name is {}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c459558-d7ce-4459-9195-1d197a1a23b0",
"metadata": {},
"outputs": [],
"source": [
"my_str"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d987a1f3-4d33-405d-ac3e-2eb5b44f95c2",
"metadata": {},
"outputs": [],
"source": [
"my_str.format(\"Andrew\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3dcd5aba-8877-4ca5-9725-445870073529",
"metadata": {},
"outputs": [],
"source": [
"my_str"
]
},
{
"cell_type": "markdown",
"id": "d866caa5-8d01-4c36-9a0c-817cf5bb637f",
"metadata": {},
"source": [
"Later, we will learn how to define our own methods. For now, it's just important that you know a method is like a function. Both can be *called* with input arguments, they return an output value, and they can have \"side effects\": changes to their inputs or to something else."
]
},
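{
"cell_type": "markdown",
"id": "added-side-effects-md",
"metadata": {},
"source": [
"Here is a small illustration of the difference between a return value and a side effect, using the two methods mentioned above. The variable names are only examples.\n",
"\n",
"- `list.append` changes the list it is called on and returns `None`.\n",
"- `str.format` leaves the original string unchanged and returns a new string."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-side-effects-code",
"metadata": {},
"outputs": [],
"source": [
"numbers = [1, 2]\n",
"result = numbers.append(3)  # side effect: numbers is changed\n",
"print(result)               # None: append returns nothing useful\n",
"print(numbers)              # [1, 2, 3]\n",
"\n",
"template = 'Hello, my name is {}'\n",
"greeting = template.format('Andrew')  # returns a new string\n",
"print(greeting)                       # Hello, my name is Andrew\n",
"print(template)                       # unchanged: Hello, my name is {}"
]
},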
{
"cell_type": "markdown",
"id": "1e24fa7e-e8a7-42d9-9b74-7b82f18067b0",
"metadata": {},
"source": [
"# Modules\n",
"\n",
"We have also used a number of modules without discussing this aspect much. There are built-in modules, which come with Python as part of \"the standard library\", and there are modules which have to be installed separately. Matplotlib, for example, is a set of modules (a \"library\") which we use a lot and which is not part of the Python language itself.\n",
"\n",
"Modules are a data type in Python like any other. They can contain functions, which have names like `module_name.function_name`. This is a very minor point: the `.` makes a function in a module \"look like\" a method, but it is actually a normal function.\n",
"\n",
"Here we *import* the `random` module from the standard library."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc64c0d3-8bee-4d5f-ace8-1fcd05b2ca85",
"metadata": {},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "39943249-daa9-49c3-bc95-d537ca76e025",
"metadata": {},
"outputs": [],
"source": [
"x = [1,2,3,4,5,'asdf','dalkfj']\n",
"random.choice(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "885b9d19-4d21-400a-81f4-669460baa6f2",
"metadata": {},
"outputs": [],
"source": [
"random.choice(x)"
]
},
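{
"cell_type": "markdown",
"id": "added-random-seed-md",
"metadata": {},
"source": [
"Two small additions about the `random` module, as a sketch. First, a module is an object like any other, so it has a type. Second, `random.seed()` resets the random number generator to a known state, which makes the \"random\" choices repeatable. This can be handy when you want to reproduce a result. The seed value 42 is arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-random-seed-code",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"print(type(random))  # a module is an object with type 'module'\n",
"\n",
"options = [1, 2, 3, 4, 5, 'asdf', 'dalkfj']\n",
"\n",
"random.seed(42)\n",
"first_run = []\n",
"for i in range(5):\n",
"    first_run.append(random.choice(options))\n",
"\n",
"random.seed(42)  # reset to the same state\n",
"second_run = []\n",
"for i in range(5):\n",
"    second_run.append(random.choice(options))\n",
"\n",
"print(first_run == second_run)  # True: same seed, same choices"
]
},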
{
"cell_type": "markdown",
"id": "5f424016-3f5a-4d51-ac50-bfbc1971f42d",
"metadata": {},
"source": [
"As mentioned, there are modules which are not part of the Python language itself. In fact there are approximately zillions of libraries for doing many, many different things, and this is one of the reasons Python is so useful and so popular. There can be a positive feedback loop between language popularity and the availability of libraries, and Python has benefitted a lot from this, especially in the data science area.\n",
"\n",
"One place that distributes many Python modules is [PyPI, the Python Package Index](https://pypi.org/); another is [Anaconda](https://www.anaconda.com).\n",
"\n",
"As an example, let's return to our previous use of matplotlib. See the [matplotlib gallery](https://matplotlib.org/stable/gallery/index.html) for many example plots. Here is a minimal use of matplotlib to draw a simple plot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09231cae-e75e-463d-8b9b-b2c873d482d5",
"metadata": {},
"outputs": [],
"source": [
"# Below, we will use matplotlib, so we need to import it here.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"x=[1,2,3,4,5,6,7,8,9,10]\n",
"y=[0,4,0,3,3,0,3,4,5,2]\n",
"\n",
"plt.plot(x,y)"
]
},
{
"cell_type": "markdown",
"id": "7a417b0c-a52b-4a87-888e-322150bade72",
"metadata": {},
"source": [
"To start with, there are a few simple things you can do to improve your plot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "822633f1-a12b-4d5e-a577-937111d10914",
"metadata": {},
"outputs": [],
"source": [
"# matplotlib.pyplot was already imported above as plt.\n",
"\n",
"x=[1,2,3,4,5,6,7,8,9,10]\n",
"y1=[0,4,0,3,3,0,3,4,5,2]\n",
"plt.plot(x, y1, label=\"y1\")\n",
"plt.plot(x, x, label=\"x\")\n",
"y2=[3,2,4,4,2,4,4,2,4,2]\n",
"plt.plot(x, y2, label=\"y2\")\n",
"plt.legend()\n",
"plt.xlabel('x (unit 1)')\n",
"plt.ylabel('y (unit 2)')"
]
},
{
"cell_type": "markdown",
"id": "8bb1996b-1984-4b72-80a4-5d51f2c53dae",
"metadata": {
"tags": []
},
"source": [
"# Example: compute the Fibonacci sequence using *recursion*"
]
},
{
"cell_type": "markdown",
"id": "5cf12416-91df-412f-948c-b65fc37dc0c1",
"metadata": {
"tags": []
},
"source": [
"The Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ... Each number is the sum of the two numbers before it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2ea9d4a7-57e1-4a48-8314-8271e04baa35",
"metadata": {},
"outputs": [],
"source": [
"def fib(n):\n",
" \"\"\"Return the Fibonacci sequence up to position n.\n",
" \n",
" n is an integer\"\"\"\n",
" # Check that our assumptions are true\n",
" assert type(n)==int\n",
" assert n>0\n",
" \n",
" # special cases for short lists\n",
" if n == 1:\n",
" return [1]\n",
" if n == 2:\n",
" return [1,1]\n",
" \n",
" seq = fib(n-1)\n",
" a = seq[-2]\n",
" b = seq[-1]\n",
" seq.append( a+b )\n",
" return seq\n",
"\n",
"fib(3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "884683c3-fc5b-4a7e-b9ea-3c5cf44d7e0d",
"metadata": {},
"outputs": [],
"source": [
"fib(4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c028087-c2d7-4db5-883d-28f8a7bedee7",
"metadata": {},
"outputs": [],
"source": [
"fib(10)"
]
},
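{
"cell_type": "markdown",
"id": "added-fib-loop-md",
"metadata": {},
"source": [
"For comparison, here is a sketch of the same computation using a `for` loop instead of recursion. The name `fib_loop` is just chosen for this example. Instead of calling itself, the function starts from `[1, 1]` and appends the sum of the last two numbers `n - 2` times. The final slice handles the case `n == 1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-fib-loop-code",
"metadata": {},
"outputs": [],
"source": [
"def fib_loop(n):\n",
"    \"\"\"Return the first n Fibonacci numbers, computed with a for loop.\"\"\"\n",
"    assert type(n) == int\n",
"    assert n > 0\n",
"    seq = [1, 1]\n",
"    for i in range(n - 2):  # range of a negative number is empty\n",
"        seq.append(seq[-2] + seq[-1])\n",
"    return seq[:n]  # for n == 1, keep only [1]\n",
"\n",
"fib_loop(10)"
]
},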
{
"cell_type": "markdown",
"id": "073bfda5-113a-4d20-9862-97c81f8b0854",
"metadata": {},
"source": [
"## More strings\n",
"\n",
"[`str`](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)\n",
"\n",
"Useful function for strings:\n",
"\n",
"- `len`\n",
"\n",
"Useful methods:\n",
"\n",
"- `strip`\n",
"- `split`\n",
"- `startswith`\n",
"- `endswith`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e2a51f4-ba55-4da6-8c95-11d5d3b1c4ed",
"metadata": {},
"outputs": [],
"source": [
"len(\"my string\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51b609ce-a438-41dc-a708-3c29e616bd37",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"\" my string \".strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "006f32e1-db2a-4358-a7e4-f78d23f1a382",
"metadata": {},
"outputs": [],
"source": [
"len(\" my string \")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c8ae22e-f44b-4465-a908-d116347a8343",
"metadata": {},
"outputs": [],
"source": [
"len(\" my string \".strip())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "014ef6e5-1a9c-42fc-a143-840fe2a0a992",
"metadata": {},
"outputs": [],
"source": [
"a=\" my string \"\n",
"a.strip()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "967cb647-c24a-462c-88b9-a7b018ec448e",
"metadata": {},
"outputs": [],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6993859c-1479-4fe8-8402-20b1281e077c",
"metadata": {},
"outputs": [],
"source": [
"\"a,b,c,def\".split(\",\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71684557-3b94-46e1-a0da-edbaf5752204",
"metadata": {},
"outputs": [],
"source": [
"\"hello world\".startswith(\"hello\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "323f78f6-cad4-4522-a291-c3faa0d92edd",
"metadata": {},
"outputs": [],
"source": [
"\"hello world\".endswith(\"world\")"
]
},
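{
"cell_type": "markdown",
"id": "added-string-parse-md",
"metadata": {},
"source": [
"These methods are often combined. As an illustration, here is how one might clean up a line of text such as a row from a comma-separated file (the input line is made up):\n",
"\n",
"1. Remove the surrounding whitespace.\n",
"2. Split the line into fields.\n",
"3. Clean up each field.\n",
"\n",
"Note that `split()` without an argument splits on any run of whitespace."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-string-parse-code",
"metadata": {},
"outputs": [],
"source": [
"line = '  Paolo, 22022 \\n'\n",
"\n",
"fields = line.strip().split(',')  # ['Paolo', ' 22022']\n",
"name = fields[0].strip()\n",
"value = int(fields[1].strip())\n",
"print(name, value)\n",
"\n",
"words = 'a  b   c'.split()  # no argument: split on whitespace\n",
"print(words)  # ['a', 'b', 'c']"
]
},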
{
"cell_type": "markdown",
"id": "3a9120ae-28a2-4d0b-8f90-7238f8dea8a2",
"metadata": {},
"source": [
"## Dictionaries - Python's `dict` type\n",
"\n",
"A `dict` maps *keys* to *values*. It can be constructed with either `{}` or `dict()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13a7d7fc-704e-4ea6-b17e-353ac1b8ed5a",
"metadata": {},
"outputs": [],
"source": [
"x = {'key1': 'value1',\n",
" 'key2': 'value2',\n",
" 'key3': 'value3',\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5aa8b60-b195-400c-9cdd-99608c9b5b38",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ee1492a-acc6-41a1-9751-66319d1585f9",
"metadata": {},
"outputs": [],
"source": [
"x['key1']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5abd5123-ff6c-468b-87c2-5d1a88749b8d",
"metadata": {},
"outputs": [],
"source": [
"key = \"key3\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f48ab396-0797-484e-b812-f8ea25ec411a",
"metadata": {},
"outputs": [],
"source": [
"x[key]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d620659-6db4-4c50-b573-a6c0d9f4c94d",
"metadata": {},
"outputs": [],
"source": [
"x[key1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3025c22b-fb7c-4964-a0a0-f1d2a68ac910",
"metadata": {},
"outputs": [],
"source": [
"x = dict( (('key1', 'value1'), ['key2', 'value2'], ('key3', 'value3')) )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3aee4625-a694-48ec-9627-89de9ad343df",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b95d11fc-96c9-42fb-9ae5-79470f1d5a78",
"metadata": {},
"outputs": [],
"source": [
"type(x)"
]
},
{
"cell_type": "markdown",
"id": "bff2b811-cfab-484c-9d12-35ae7308c42a",
"metadata": {},
"source": [
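"Keys in a `dict` can be any value that is *hashable*. Immutable built-in values such as numbers, strings and tuples (of hashable items) are hashable; mutable containers such as lists and dicts are not."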
]
},
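{
"cell_type": "markdown",
"id": "added-hashable-md",
"metadata": {},
"source": [
"Being *hashable* means that the built-in `hash()` function can compute an integer from the value; a `dict` uses this to find keys quickly. The sketch below tries `hash()` on a few values.\n",
"\n",
"- The numbers printed for strings can differ each time Python starts, so no expected output is given here.\n",
"- For a list, `hash()` raises a `TypeError`. This is the same reason a list cannot be used as a key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-hashable-code",
"metadata": {},
"outputs": [],
"source": [
"print(hash(1), hash('key1'), hash((1, 2, 3)))\n",
"\n",
"error_message = None\n",
"try:\n",
"    hash([1, 2, 3])\n",
"except TypeError as err:\n",
"    error_message = str(err)\n",
"print(error_message)  # explains that a list is unhashable"
]
},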
{
"cell_type": "code",
"execution_count": null,
"id": "98c0c33d-c4cf-4912-87f7-37ff05f90217",
"metadata": {},
"outputs": [],
"source": [
"x={1:'value1', 2:'value2'}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2b9029e-1ba7-43fe-8447-39082e0972dd",
"metadata": {},
"outputs": [],
"source": [
"x[1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "34fc4694-1e93-4bb7-bc45-a9bfeb58cb9e",
"metadata": {},
"outputs": [],
"source": [
"x={(1,2,3): \"456\"}\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7130ab9e-83d7-4c88-a29b-d51db4090fc5",
"metadata": {},
"outputs": [],
"source": [
"x[(1,2,3)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83a55976-33e5-4496-a069-d50d9b9079e3",
"metadata": {},
"outputs": [],
"source": [
"x={[1,2,3]: \"456\"}\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6081d5f-f775-4cb1-a126-c1517a712f0d",
"metadata": {},
"outputs": [],
"source": [
"x = {'key1':1, 'key2':2, 'key3':123456, 'key4': [1,2,3], 'key5': {}, 1234: 4321, (1,2,3): '9845712345'}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d453c135-5195-4fe3-8366-4799c0ea35e9",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "markdown",
"id": "378490b3-1a1c-40eb-9fdd-49f373cfd311",
"metadata": {},
"source": [
"Just like we can iterate over items in a list, we can iterate over the keys in a dict:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4dc3830e-d4ca-41e8-8e7d-98cb583e1f61",
"metadata": {},
"outputs": [],
"source": [
"for key in x:\n",
" print(key)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19560e57-1815-48a8-95e8-f640d151e12b",
"metadata": {},
"outputs": [],
"source": [
"for key in x:\n",
" value = x[key]\n",
" print(f\"key: {key}, value: {value}\")"
]
},
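{
"cell_type": "markdown",
"id": "added-dict-items-md",
"metadata": {},
"source": [
"Looking up `x[key]` inside the loop works, but dictionaries also have an `items()` method that yields each key together with its value. With two loop variables, the first receives the key and the second the value. The sketch below uses a separate small dictionary (the name `colors` is only an example) so that `x` is left unchanged for the cells that follow."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-dict-items-code",
"metadata": {},
"outputs": [],
"source": [
"colors = {'key1': 'red', 'key2': 'green'}\n",
"\n",
"for key, value in colors.items():\n",
"    print(f'key: {key}, value: {value}')\n",
"\n",
"pairs = list(colors.items())\n",
"print(pairs)  # [('key1', 'red'), ('key2', 'green')]"
]
},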
{
"cell_type": "code",
"execution_count": null,
"id": "83e6c07c-dec3-4c86-9c0e-11954dce7b32",
"metadata": {},
"outputs": [],
"source": [
"x['key5']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84b5bc6d-4a1f-4f64-a1f6-09ef906f838a",
"metadata": {},
"outputs": [],
"source": [
"x['key does not exist']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "937a9759-bccb-4edc-85d1-4007a2e6bf07",
"metadata": {},
"outputs": [],
"source": [
"x['my new key'] = 9843059"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71d14521-f025-440d-80e4-e2ac8f4f30d8",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27f83073-2f88-4d7b-8ba8-5fc124cc78e6",
"metadata": {},
"outputs": [],
"source": [
"x['key5']['hello'] = 'world'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9305e9cf-b800-436f-8261-d0264c965e91",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9b2afb6-9aba-41ed-8894-737eca4da4fa",
"metadata": {},
"outputs": [],
"source": [
"tmp = x['key5']\n",
"tmp['hello'] = 'world 2'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3933cbbd-fa69-4729-8a18-0a62d40a2d45",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7216abd8-d3af-4b2e-bac4-712c34cce195",
"metadata": {},
"outputs": [],
"source": [
"x['key4'].append( 4 )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "746ed972-e427-4fa1-9fcf-99def813343c",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de23284e-f7ce-442d-8e17-e9b2346256b8",
"metadata": {},
"outputs": [],
"source": [
"'key1' in x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f86ace1-eb0c-4e9d-83d3-03e7b0610479",
"metadata": {},
"outputs": [],
"source": [
"1 in x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1cc91eb-e228-48de-b797-ed1efa675d51",
"metadata": {},
"outputs": [],
"source": [
"1234 in x"
]
},
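{
"cell_type": "markdown",
"id": "added-dict-get-md",
"metadata": {},
"source": [
"Looking up a missing key with square brackets raises a `KeyError`, as we saw above. When a missing key is expected, the `get()` method is an alternative. Instead of raising an error, it returns `None`, or a default value you pass as the second argument. The sketch below uses a made-up dictionary and also shows the equivalent logic written with `in`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-dict-get-code",
"metadata": {},
"outputs": [],
"source": [
"settings = {'color': 'red', 'size': 3}\n",
"\n",
"print(settings.get('color'))           # red\n",
"print(settings.get('shape'))           # None\n",
"print(settings.get('shape', 'round'))  # round\n",
"\n",
"# The same logic written with the `in` operator:\n",
"if 'shape' in settings:\n",
"    shape = settings['shape']\n",
"else:\n",
"    shape = 'round'\n",
"print(shape)  # round"
]
},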
{
"cell_type": "markdown",
"id": "6a2d6058-e5db-478c-9b18-ed4380a0d943",
"metadata": {},
"source": [
"## More about functions: keyword arguments"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01cd5398-943c-49ee-a0bb-95f48dd52bff",
"metadata": {},
"outputs": [],
"source": [
"def my_function(x, z=1):\n",
" return x+z*z"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3419f92-8017-4656-b4cc-c9b29102e46b",
"metadata": {},
"outputs": [],
"source": [
"my_function(9)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a7c0dc4-5af9-493f-a7d1-b0f97d0e72fb",
"metadata": {},
"outputs": [],
"source": [
"my_function(9,11)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c76e2e4c-9005-4cee-a27b-2e9decf954c4",
"metadata": {},
"outputs": [],
"source": [
"my_function(9,z=11)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "debac7a1-0865-4c35-878a-e40626e7f485",
"metadata": {},
"outputs": [],
"source": [
"my_function(x=9,z=11)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f952135a-9211-484d-8159-0e8971cf3b23",
"metadata": {},
"outputs": [],
"source": [
"my_function(z=11)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d5d297c-711b-4d54-bae4-923d95aa18f9",
"metadata": {},
"outputs": [],
"source": [
"my_function(z=11,x=9)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf7db77e-fd81-4035-8bd8-66219ed478d9",
"metadata": {},
"outputs": [],
"source": [
"my_function(z=11,9)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21f8f437-b093-47f0-a00a-e80fda8ade30",
"metadata": {},
"outputs": [],
"source": [
"def my_function2(x, y, z=1, qq=0):\n",
" return x+z+qq+y"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa8fe1f7-22aa-46f7-8f6a-4de9b354f7c7",
"metadata": {},
"outputs": [],
"source": [
"my_function2(0,1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d019400e-2aa2-4b42-85c6-27dea98ae5b3",
"metadata": {},
"outputs": [],
"source": [
"my_function2(0,1,qq=-32)"
]
},
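{
"cell_type": "markdown",
"id": "added-builtin-kwargs-md",
"metadata": {},
"source": [
"Keyword arguments with default values are common in Python's own built-in functions too:\n",
"\n",
"- `sorted()` has a `reverse` keyword argument that defaults to `False`.\n",
"- `print()` has a `sep` keyword argument that sets the text printed between the values (a space by default).\n",
"\n",
"A short sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-builtin-kwargs-code",
"metadata": {},
"outputs": [],
"source": [
"numbers = [3, 1, 2]\n",
"print(sorted(numbers))                # [1, 2, 3]\n",
"print(sorted(numbers, reverse=True))  # [3, 2, 1]\n",
"\n",
"print('a', 'b', 'c')           # a b c\n",
"print('a', 'b', 'c', sep='-')  # a-b-c"
]
},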
{
"cell_type": "markdown",
"id": "f6593546-0c45-4708-918a-6c215a1fb3a2",
"metadata": {},
"source": [
"## The `+` operator on various data types"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d6f8126-f40b-4733-9fe1-fd86af8a0d1b",
"metadata": {},
"outputs": [],
"source": [
"1+1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ea7cec7-9343-4f9a-b615-2755747ed983",
"metadata": {},
"outputs": [],
"source": [
"1 + 2.3"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24a67ab4-d73f-48d0-be71-2eff2964ac48",
"metadata": {},
"outputs": [],
"source": [
"\"1\"+1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2ac0055-6908-43f4-b8c4-e749191641af",
"metadata": {},
"outputs": [],
"source": [
"1+\"1\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f37d2ace-962a-4c02-90f2-52bffd60592d",
"metadata": {},
"outputs": [],
"source": [
"\"1\"+\"1\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81dc1b8a-0b49-47af-afb0-b5681dd9c045",
"metadata": {},
"outputs": [],
"source": [
"\"1 x\" + \"1 y\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bab16714-5a6b-4e4f-8ed4-8a16498923a6",
"metadata": {},
"outputs": [],
"source": [
"[1]+1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00d9ea96-be24-43a0-b2bd-6bcedf2c05ad",
"metadata": {},
"outputs": [],
"source": [
"[1] + [1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3bc11485-00de-40c3-8af6-3e063d2de489",
"metadata": {},
"outputs": [],
"source": [
"x=[1]\n",
"y=[1]\n",
"z=x+y\n",
"z"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a100456-56a3-4134-ab9f-6017ec27f7b3",
"metadata": {},
"outputs": [],
"source": [
"list.__add__([1], [1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9533ce06-1ee9-4c98-9d7b-188774fb2c26",
"metadata": {},
"outputs": [],
"source": [
"x=[1]\n",
"x.append(1)\n",
"x"
]
},
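{
"cell_type": "markdown",
"id": "added-plus-vs-append-md",
"metadata": {},
"source": [
"Note the difference between the examples above:\n",
"\n",
"- `x + y` builds a *new* list and leaves `x` and `y` unchanged.\n",
"- `x.append(1)` changes `x` itself.\n",
"\n",
"A quick check (the variable names are just for this example):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-plus-vs-append-code",
"metadata": {},
"outputs": [],
"source": [
"a = [1]\n",
"b = [1]\n",
"c = a + b  # new list; a and b are not modified\n",
"print(a, b, c)  # [1] [1] [1, 1]\n",
"\n",
"a.append(1)  # modifies a in place and returns None\n",
"print(a)  # [1, 1]\n",
"print(c is a)  # False: c and a are different list objects"
]
},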
{
"cell_type": "code",
"execution_count": null,
"id": "d524a21a-7538-45be-8f92-7b7299bfd2cc",
"metadata": {},
"outputs": [],
"source": [
"int.__add__(1, 3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f8b6fbe-f757-490d-8689-93adf78ee211",
"metadata": {},
"outputs": [],
"source": [
"1 + 3"
]
},
{
"cell_type": "markdown",
"id": "0e52366a-b96e-4602-8f50-0e2b1932adc2",
"metadata": {},
"source": [
"Note: \"joining\" or \"combining\" one sequence with another is called *concatenating* the sequences. It works with lists of any length, including empty ones:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "568bdadd-2505-4972-9955-968fd8b959dd",
"metadata": {},
"outputs": [],
"source": [
"[1,2,3] + [4,5,6]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5863946d-499d-406c-ac15-6e9d627d5d32",
"metadata": {},
"outputs": [],
"source": [
"[1,2,3] + []"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91fcf9e5-a3a6-413f-a5aa-990ab97fd9b9",
"metadata": {},
"outputs": [],
"source": [
"[] + [1,2,3]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6fdaf75-7185-4b9f-9c5d-178242b662e2",
"metadata": {},
"outputs": [],
"source": [
"(1,) + (1,)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc30aba1-031e-4bec-9af0-910df0e8c74d",
"metadata": {},
"outputs": [],
"source": [
"(1,) + 1"
]
},
{
"cell_type": "markdown",
"id": "0bac25ee-d7ee-4501-bc94-9a372f889a19",
"metadata": {},
"source": [
"## The `*` operator on various data types"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c99b7fb4-8d1a-464a-b2b6-b71c78053a6e",
"metadata": {},
"outputs": [],
"source": [
"1*5"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14bc9a15-79c8-4840-a623-ff4267b874bf",
"metadata": {},
"outputs": [],
"source": [
"\"1xz\"*5"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1ff7075-8a59-490a-9047-c2283a6f648e",
"metadata": {},
"outputs": [],
"source": [
"\"1xz\"*\"blah\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "342b200b-8dae-4793-9140-597b88f142f2",
"metadata": {},
"outputs": [],
"source": [
"[1,2,3]*5"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a5d7d47-2c74-4c88-87e0-cb9d18f70ea4",
"metadata": {},
"outputs": [],
"source": [
"5 * [1,2,3]"
]
},
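{
"cell_type": "markdown",
"id": "added-star-nested-md",
"metadata": {},
"source": [
"One caveat with `*` on lists: the repetition copies *references* to the items, not the items themselves. For numbers this makes no difference. If the item is itself a list, however, every \"copy\" is the same list object. The sketch below shows the pitfall and one way around it using a `for` loop."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-star-nested-code",
"metadata": {},
"outputs": [],
"source": [
"row = [0] * 3      # [0, 0, 0]\n",
"grid = [row] * 2   # two references to the SAME row list\n",
"grid[0][0] = 1\n",
"print(grid)        # [[1, 0, 0], [1, 0, 0]]\n",
"\n",
"grid2 = []\n",
"for i in range(2):\n",
"    grid2.append([0] * 3)  # a new row list each time\n",
"grid2[0][0] = 1\n",
"print(grid2)       # [[1, 0, 0], [0, 0, 0]]"
]
},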
{
"cell_type": "markdown",
"id": "16e4e558-5d84-44df-94d7-02d638a88ed5",
"metadata": {},
"source": [
"## Special method: `object.__add__(other)`\n",
"\n",
"Many of the bits of Python we have already been using are defined as \"special methods\". The names of these methods start and end with a double underscore `__`. They are not usually called directly; rather, Python calls these methods \"secretly\" to achieve some task. As we saw above, the `+` operator is implemented with the `__add__` special method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d201089e-a0bc-44ae-b786-ca6c5b372c4e",
"metadata": {},
"outputs": [],
"source": [
"six = 6\n",
"six.__add__(4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd47841c-f7ea-4d41-bf8d-702c81677ad1",
"metadata": {},
"outputs": [],
"source": [
"int.__add__(6,4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f658de87-ace5-4863-b401-8c77e953af67",
"metadata": {},
"outputs": [],
"source": [
"six+4"
]
},
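{
"cell_type": "markdown",
"id": "added-radd-md",
"metadata": {},
"source": [
"A slightly more detailed picture, for the curious. For `a + b`, Python first tries `a.__add__(b)`. If that method does not know how to handle `b`, it returns the special value `NotImplemented`. Python then tries the \"reflected\" method `b.__radd__(a)`. This is how `1 + 2.3` from above works, even though `int.__add__` does not handle floats:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-radd-code",
"metadata": {},
"outputs": [],
"source": [
"print(int.__add__(1, 2.3))     # NotImplemented: int does not handle floats\n",
"print(float.__radd__(2.3, 1))  # the float result of 1 + 2.3\n",
"print(1 + 2.3)                 # Python combines the two steps for us"
]
},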
{
"cell_type": "markdown",
"id": "de7adb09-9380-47f3-8640-a839355322d0",
"metadata": {},
"source": [
"## Special method: `object.__getitem__(index)`\n",
"\n",
"The special method `object.__getitem__(index)` is how Python implements `object[index]`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ba584d9-4454-4f92-8161-91db385765e1",
"metadata": {},
"outputs": [],
"source": [
"x={0:1}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48747698-be87-43c4-9b62-c4f2c01d3358",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b329eca-9cc2-49b5-9f38-984e2a5d6082",
"metadata": {},
"outputs": [],
"source": [
"x[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d8c44c3-83a8-44fb-a2aa-662cec019329",
"metadata": {},
"outputs": [],
"source": [
"x.__getitem__(0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "843c9276-15fa-4c37-8957-e65095bfd1ad",
"metadata": {},
"outputs": [],
"source": [
"x={1:\"value1\",2:43}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad181878-37d7-453b-9d82-0e0e87fa262e",
"metadata": {},
"outputs": [],
"source": [
"x[1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "078a64da-72e6-482a-bd10-6214b13b0259",
"metadata": {},
"outputs": [],
"source": [
"x.__getitem__(1)"
]
},
{
"cell_type": "markdown",
"id": "7700f265-438e-47f0-8374-bfaa04c37839",
"metadata": {},
"source": [
"## Special method: `sequence.__len__()`\n",
"\n",
"Another special method is `__len__`, which returns the number of items in a container such as a sequence or a `dict`. The built-in function `len(x)` calls `x.__len__()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f43db53-58aa-4fb7-844a-2a887f573a5d",
"metadata": {},
"outputs": [],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbc1f087-1bd2-45c8-9769-b634ed0bf454",
"metadata": {},
"outputs": [],
"source": [
"len(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "340077e2-4976-4c35-ae40-875fc852bd7b",
"metadata": {},
"outputs": [],
"source": [
"x.__len__()"
]
},
{
"cell_type": "markdown",
"id": "10e6adc3-9886-46b5-9648-7ce6cc848e5c",
"metadata": {},
"source": [
"## Special methods: `object.__str__()` (and `object.__repr__()`)\n",
"\n",
"Another special method is `__str__`, which returns a string representation of the object. (`__repr__` does something very similar, but its output is meant to be unambiguous and can often be used to \"reproduce\" the original value. It is hence a little more exact, if less \"nice\" or \"pretty\".)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9af0f5db-21ac-4034-95ce-0f87d4f707fc",
"metadata": {},
"outputs": [],
"source": [
"str(0.4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a1fc7fb-d30b-4777-868f-6048d35a481c",
"metadata": {},
"outputs": [],
"source": [
"x = 0.4\n",
"x.__str__()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "39c639fd-ef4f-4dee-9d0f-e7c1afa2e561",
"metadata": {},
"outputs": [],
"source": [
"x={1:\"value1\",2:43}\n",
"x.__str__()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9378b32d-b53d-400a-b961-e71e856a190d",
"metadata": {},
"outputs": [],
"source": [
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "059a009a-b3f0-420e-b5f4-b8c38f0f1baa",
"metadata": {},
"outputs": [],
"source": [
"print(\"hello\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15846c60-857d-4d5c-a8af-359165665d99",
"metadata": {},
"outputs": [],
"source": [
"\"hello\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80199a3b-8e26-4ca2-b172-6acfb4c2cd58",
"metadata": {},
"outputs": [],
"source": [
"f\"my value is: {x}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d68b71b-21dd-4190-ba3e-266003959887",
"metadata": {},
"outputs": [],
"source": [
"one = 1\n",
"one.__str__()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7a3c69e-437f-47dd-ac3c-e831af5a9efd",
"metadata": {},
"outputs": [],
"source": [
"f\"my value is: {1}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "742638d8-641e-4d44-9633-9984bb4032bd",
"metadata": {},
"outputs": [],
"source": [
"repr(1/9)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49372fac-b68d-4937-8836-e143fbf35cc8",
"metadata": {},
"outputs": [],
"source": [
"x.__repr__()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc6ed135-8ee7-4aad-8e4b-331c042e16a6",
"metadata": {},
"outputs": [],
"source": [
"\"hello\".__repr__()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49c71868-c7b0-4ad3-bc95-0956dcce9883",
"metadata": {},
"outputs": [],
"source": [
"\"hello\".__str__()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8879819f-72ac-4c78-af40-a11c1e0466c6",
"metadata": {},
"outputs": [],
"source": [
"print(\"hello\".__str__())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef18f2c3-690e-4633-a3ae-198800f37408",
"metadata": {},
"outputs": [],
"source": [
"print(\"hello\".__repr__())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f22f4d4-8ef1-4edb-9a78-0adff34c77d2",
"metadata": {},
"outputs": [],
"source": [
"print.__str__()"
]
},
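{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e01",
"metadata": {},
"source": [
"To see how `str()`, `repr()` and `print()` use these special methods on a type of our own, here is a minimal sketch with a hypothetical `Point` class (not part of the lecture code). It defines both `__str__` and `__repr__`:\n",
"\n",
"```python\n",
"class Point:\n",
"    # Hypothetical class, used only to illustrate __str__ and __repr__\n",
"    def __init__(self, x, y):\n",
"        self.x = x\n",
"        self.y = y\n",
"\n",
"    def __str__(self):\n",
"        # the 'nice' text, used by str() and print()\n",
"        return f'({self.x}, {self.y})'\n",
"\n",
"    def __repr__(self):\n",
"        # more exact text that looks like the code which would recreate the object\n",
"        return f'Point({self.x!r}, {self.y!r})'\n",
"\n",
"p = Point(1, 2.5)\n",
"print(str(p))   # (1, 2.5)\n",
"print(repr(p))  # Point(1, 2.5)\n",
"print(p)        # print() uses __str__: (1, 2.5)\n",
"print([p])      # a list shows the repr() of its items: [Point(1, 2.5)]\n",
"```\n",
"\n",
"If a class defines only `__repr__`, Python falls back to it for `str()` as well."
]
},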
{
"cell_type": "markdown",
"id": "981f1a73-7c79-4a0e-b91e-63a3d348a503",
"metadata": {},
"source": [
"# Abstract *interfaces* in Python\n",
"\n",
"`for` loops iterate over \"iterables\". You can construct a `list` from any iterable (and a `dict` from an iterable of key-value pairs).\n",
"\n",
"Functions and methods are \"callable\": they can be called with parentheses, e.g. `f(x)`.\n",
"\n",
"Getting items with square brackets (e.g. `x[0]`) works by calling the `__getitem__` method (so, `x.__getitem__(0)`). Any type can define how this works for that type."
]
},
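{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e02",
"metadata": {},
"source": [
"These interfaces are made of special methods, so our own types can take part too. Here is a minimal sketch using a hypothetical `Squares` class. `len()` calls `__len__`, square brackets call `__getitem__`, and defining `__call__` makes an object callable. The `for` loop works here because a type with no `__iter__` method gets a fallback: Python calls `__getitem__` with 0, 1, 2, ... until `IndexError` is raised. Most real iterables define `__iter__` instead.\n",
"\n",
"```python\n",
"class Squares:\n",
"    # Hypothetical sequence of the first n square numbers: 0, 1, 4, ...\n",
"    def __init__(self, n):\n",
"        self.n = n\n",
"\n",
"    def __len__(self):\n",
"        # called by len(s)\n",
"        return self.n\n",
"\n",
"    def __getitem__(self, index):\n",
"        # called by s[index]; raising IndexError ends a for loop\n",
"        if index < 0 or index >= self.n:\n",
"            raise IndexError('index out of range')\n",
"        return index * index\n",
"\n",
"    def __call__(self, value):\n",
"        # makes instances callable: s(value)\n",
"        return value * value\n",
"\n",
"s = Squares(4)\n",
"print(len(s))   # 4\n",
"print(s[3])     # 9\n",
"for sq in s:    # prints 0, 1, 4, 9 (one per line)\n",
"    print(sq)\n",
"print(list(s))  # [0, 1, 4, 9]\n",
"print(s(5))     # 25\n",
"print(callable(s), callable(print), callable(3))  # True True False\n",
"```"
]
},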
{
"cell_type": "markdown",
"id": "5aa6d28b-16cb-4d24-8a94-ec39f011d192",
"metadata": {},
"source": [
"## More on iterators\n",
"\n",
"There are a couple of very handy built-in functions which take iterables and return a new iterator:\n",
"\n",
"- `enumerate(items)` - returns an iterator that pairs each item with its index. Each iteration produces a tuple `(index, item)`.\n",
"- `zip(a_items, b_items)` - returns an iterator combining two other iterables. Each iteration produces a tuple `(a_item, b_item)`, and iteration stops as soon as the shorter input is exhausted."
]
},
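{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e03",
"metadata": {},
"source": [
"Here is a short sketch, with made-up example lists, of what these iterators produce. Wrapping an iterator in `list()` consumes it and shows every tuple at once. The optional `start` argument of `enumerate` changes the first index. The last lines show that an iterator can only be consumed once:\n",
"\n",
"```python\n",
"letters = ['a', 'b', 'c']\n",
"numbers = [10, 20, 30, 40]\n",
"\n",
"# enumerate produces (index, item) tuples; start= changes the first index\n",
"print(list(enumerate(letters)))           # [(0, 'a'), (1, 'b'), (2, 'c')]\n",
"print(list(enumerate(letters, start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]\n",
"\n",
"# zip produces (a_item, b_item) tuples and stops at the shorter input\n",
"print(list(zip(letters, numbers)))        # [('a', 10), ('b', 20), ('c', 30)]\n",
"\n",
"# an iterator is used up as it is consumed\n",
"it = zip(letters, numbers)\n",
"print(next(it))  # ('a', 10)\n",
"print(list(it))  # [('b', 20), ('c', 30)]\n",
"print(list(it))  # []\n",
"```"
]
},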
{
"cell_type": "code",
"execution_count": null,
"id": "6efaf701-1c93-4029-9c66-26f4154a42ad",
"metadata": {},
"outputs": [],
"source": [
"my_list = ['abc', 'def', 'ghi']\n",
"my_iterator = enumerate(my_list)\n",
"for x in my_iterator:\n",
" idx, item = x\n",
" print(f\"{idx}: {item}\")"
]
},
{
"cell_type": "markdown",
"id": "3b727d87-2494-4834-8031-ef4c8cb936c4",
"metadata": {},
"source": [
"Usually, the iterator is not stored in a variable first; it is created directly in the `for` statement:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d85af85e-6cd1-4004-9b25-e45e66e16e74",
"metadata": {},
"outputs": [],
"source": [
"my_list = ['abc', 'def', 'ghi']\n",
"for x in enumerate(my_list):\n",
" idx, item = x\n",
" print(f\"{idx}: {item}\")"
]
},
{
"cell_type": "markdown",
"id": "2ba404f2-d30f-44ed-b6ce-e174153e1340",
"metadata": {},
"source": [
"We can also unpack each tuple directly into two variables in the `for` statement, which removes the temporary variable `x` as well:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0bf3490-b600-4d44-b6c3-ff298237c3a8",
"metadata": {},
"outputs": [],
"source": [
"my_list = ['abc', 'def', 'ghi']\n",
"for idx, item in enumerate(my_list):\n",
" print(f\"{idx}: {item}\")"
]
},
{
"cell_type": "markdown",
"id": "61709401-22b2-4419-b36f-e9b79e5909f4",
"metadata": {},
"source": [
"Now, for `zip`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3506da24-eb14-4e94-951e-70b599388fb7",
"metadata": {},
"outputs": [],
"source": [
"my_list = ['abc', 'def', 'ghi']\n",
"list2 = ['red', 'green', 'blue']\n",
"my_iterator = zip(my_list, list2)\n",
"for x in my_iterator:\n",
" (item, color) = x\n",
" print(f\"{item} {color}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fce72964-cc76-44b6-95f3-732c07f032c8",
"metadata": {},
"outputs": [],
"source": [
"my_list = ['abc', 'def', 'ghi']\n",
"for (item, color) in zip(my_list, ['red', 'green', 'blue']):\n",
" print(f\"{item} {color}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48c0294b-0d17-464f-bd2c-c4c8000f277f",
"metadata": {},
"outputs": [],
"source": [
"my_list = ['abc', 'def', 'ghi']\n",
"for item, number in zip(my_list, range(3,6)):\n",
" print(f\"{item} {number}\")"
]
},
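{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e04",
"metadata": {},
"source": [
"`zip` produces tuples of two items, so it fits the earlier remark that a `dict` can be constructed from an iterable of key-value pairs. Here is a small sketch that reuses the lists from above:\n",
"\n",
"```python\n",
"my_list = ['abc', 'def', 'ghi']\n",
"colors = ['red', 'green', 'blue']\n",
"\n",
"# dict() accepts an iterable of (key, value) pairs, which is what zip produces\n",
"color_of = dict(zip(my_list, colors))\n",
"print(color_of)  # {'abc': 'red', 'def': 'green', 'ghi': 'blue'}\n",
"\n",
"# the same result with an explicit loop\n",
"color_of_loop = {}\n",
"for item, color in zip(my_list, colors):\n",
"    color_of_loop[item] = color\n",
"print(color_of_loop == color_of)  # True\n",
"```"
]
},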
{
"cell_type": "markdown",
"id": "f3610ab7-a3fb-49b9-b147-4ae6570f2ed9",
"metadata": {},
"source": [
"# Data Frames\n",
"\n",
"We are going to look at data in *tables* where each *row* of the table contains measurements or values about a single thing and each *column* holds one kind of measurement. Such tables are very common in data science."
]
},
{
"cell_type": "markdown",
"id": "e7f6488f-694b-4ee6-8f28-f0f7b1ec691a",
"metadata": {},
"source": [
"(Loading the iris data is hidden in this cell. You can ignore this.)\n",
"<!--\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.datasets import load_iris\n",
"iris = load_iris()\n",
"df= pd.DataFrame(data= np.c_[iris['data'], iris['target']],\n",
" columns= iris['feature_names'] + ['target'])\n",
"df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)\n",
"df = df.drop('target',axis=1)\n",
"def to_dict(df):\n",
" result = {}\n",
" for column_name in df.columns:\n",
" result[column_name] = []\n",
" for i,row in df.iterrows():\n",
" for column_name in df.columns:\n",
" result[column_name].append( row[column_name] )\n",
" return result\n",
"iris_dict = to_dict(df)\n",
"print(iris_dict)\n",
"-->"
]
},
{
"cell_type": "markdown",
"id": "9b60c8c3-2b6f-4ca4-b4b7-ec085fb1609b",
"metadata": {},
"source": [
"Here is a sample of rows from the data we will be looking at, the very famous [Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set). The number at the start of each row is that row's index in the full data set.\n",
"\n",
"<table class=\"dataframe\" border=\"1\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>5.5</td>\n",
" <td>2.4</td>\n",
" <td>3.7</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>6.2</td>\n",
" <td>2.9</td>\n",
" <td>4.3</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>4.9</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>5.4</td>\n",
" <td>3.4</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>5.2</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.1</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"id": "bbf97605-aa5d-4f7a-8015-fd9e543c80c0",
"metadata": {},
"source": [
"For now, the data are given as a `dict` with a particular layout: each key is a column name, and each value is a list holding that column's entry for every row. Later we will read this from a file."
]
},
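{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e05",
"metadata": {},
"source": [
"Here is a minimal sketch of this column-wise layout, using a hypothetical three-row table (its values are the first row of each species in the data below). Getting a column is a single dictionary lookup. Getting a row means taking the same index from every column:\n",
"\n",
"```python\n",
"# Hypothetical three-row table stored column-wise: one list per column\n",
"tiny = {\n",
"    'petal length (cm)': [1.4, 4.7, 6.0],\n",
"    'species': ['setosa', 'versicolor', 'virginica'],\n",
"}\n",
"\n",
"# one column is just a dictionary lookup\n",
"print(tiny['species'])  # ['setosa', 'versicolor', 'virginica']\n",
"\n",
"# one row means taking the same index from every column\n",
"row_index = 1\n",
"row = {}\n",
"for column_name in tiny:\n",
"    row[column_name] = tiny[column_name][row_index]\n",
"print(row)  # {'petal length (cm)': 4.7, 'species': 'versicolor'}\n",
"```"
]
},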
{
"cell_type": "code",
"execution_count": null,
"id": "e212cde2-364f-4055-b5f3-9e58db575544",
"metadata": {},
"outputs": [],
"source": [
"iris_dataset = {'sepal length (cm)': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, \n",
" 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, \n",
" 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0,\n",
" 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6,\n",
" 5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2,\n",
" 5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1,\n",
" 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0,\n",
" 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, \n",
" 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, \n",
" 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, \n",
" 7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, \n",
" 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9], \n",
" 'sepal width (cm)': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0],\n",
" 'petal length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1],\n",
" 'petal width (cm)': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8],\n",
" 'species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa',\n",
"             'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa',\n",
"             'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa',\n",
"             'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa',\n",
"             'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa',\n",
"             'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor',\n",
"             'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor',\n",
"             'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor',\n",
"             'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor',\n",
"             'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor',\n",
"             'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica',\n",
"             'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica',\n",
"             'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica',\n",
"             'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica',\n",
"             'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica']}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "378a0233-8278-41a7-b889-336a897cd7ca",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(iris_dataset['sepal width (cm)'], iris_dataset['petal width (cm)'],'o');\n",
"plt.xlabel('sepal width (cm)')\n",
"plt.ylabel('petal width (cm)')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc40c86c-9fb8-45f2-8159-bd3f3580d902",
"metadata": {},
"outputs": [],
"source": [
"for column_name in iris_dataset:\n",
" print(column_name)"
]
},
{
"cell_type": "markdown",
"id": "f4bc81fb-b355-441a-a95e-32c31ac4a08d",
"metadata": {},
"source": [
"Let's double check that every column (the value of each key) has the same number of rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8eecef02-329d-42c8-a357-fc5735cd787e",
"metadata": {},
"outputs": [],
"source": [
"for column_name in iris_dataset:\n",
" column = iris_dataset[column_name]\n",
" print(\"'{}': {} rows\".format(column_name, len(column)))"
]
},
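{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e06",
"metadata": {},
"source": [
"With printed lengths, we still have to read the output to spot a problem. Here is a sketch of a hypothetical helper, `check_same_length`, that uses `assert` to stop with an error when the columns disagree. It is shown on two made-up datasets:\n",
"\n",
"```python\n",
"def check_same_length(dataset):\n",
"    # Hypothetical helper: return the number of rows shared by all columns,\n",
"    # or fail with an AssertionError if the columns disagree\n",
"    n_rows = None\n",
"    for column_name in dataset:\n",
"        n = len(dataset[column_name])\n",
"        if n_rows is None:\n",
"            n_rows = n  # the first column sets the expected length\n",
"        assert n == n_rows, f'{column_name!r} has {n} rows, expected {n_rows}'\n",
"    return n_rows\n",
"\n",
"good = {'a': [1, 2, 3], 'b': ['x', 'y', 'z']}\n",
"bad = {'a': [1, 2, 3], 'b': ['x', 'y']}\n",
"print(check_same_length(good))  # 3\n",
"try:\n",
"    check_same_length(bad)\n",
"except AssertionError as err:\n",
"    print('problem:', err)  # problem: 'b' has 2 rows, expected 3\n",
"```\n",
"\n",
"Applied to the notebook's data, `check_same_length(iris_dataset)` would return the common number of rows."
]
},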
{
"cell_type": "markdown",
"id": "1fc044e6-6f03-45da-855b-bad03037676b",
"metadata": {},
"source": [
"Now let's compute the average value for each measurement across all of our data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ab6b4c2-bad4-4d48-afa3-8232555b2b56",
"metadata": {},
"outputs": [],
"source": [
"def compute_average(my_list):\n",
" assert type(my_list)==list\n",
" accum = 0.0\n",
" for item in my_list:\n",
" accum = accum + item\n",
" average = accum / len(my_list)\n",
" return average"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9184f364-7476-40ac-a645-2f68b9bec876",
"metadata": {},
"outputs": [],
"source": [
"compute_average([4, 6])"
]
},
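{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e07",
"metadata": {},
"source": [
"`compute_average` spells out the accumulation loop. As a cross-check, the built-in `sum()` does the same accumulation, and the standard library module `statistics` provides `statistics.mean`. Here is a sketch on a made-up list, chosen so that the floating-point results are exact:\n",
"\n",
"```python\n",
"import statistics\n",
"\n",
"values = [1.5, 2.5, 3.5]\n",
"\n",
"# sum() performs the same accumulation as the loop in compute_average\n",
"print(sum(values) / len(values))  # 2.5\n",
"\n",
"# the standard library's statistics module also provides a mean\n",
"print(statistics.mean(values))    # 2.5\n",
"\n",
"# averaging an empty list fails: len([]) is 0\n",
"try:\n",
"    sum([]) / len([])\n",
"except ZeroDivisionError:\n",
"    print('cannot average an empty list')\n",
"```"
]
},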
{
"cell_type": "code",
"execution_count": null,
"id": "aef55a09-4de7-4e83-97b7-187ded8e4212",
"metadata": {},
"outputs": [],
"source": [
"for column_name in iris_dataset:\n",
" if column_name == 'species':\n",
" continue\n",
" average = compute_average(iris_dataset[column_name])\n",
" print(\"'{}' average: {}\".format(column_name, average))"
]
},
{
"cell_type": "markdown",
"id": "db2e40ad-aecf-4235-8dd3-632f8331bd04",
"metadata": {},
"source": [
"Let's see what species we have in our data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a27b2cf-a6df-4f34-aa57-28dde7c7bc03",
"metadata": {},
"outputs": [],
"source": [
"known_species = {}\n",
"count = 0\n",
"for row_species in iris_dataset['species']:\n",
" known_species[row_species] = None\n",
" count = count + 1\n",
"\n",
"print(count)\n",
"for species in known_species:\n",
" print(species)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d01d6f-230f-45d5-8ddd-caa674f38463",
"metadata": {},
"outputs": [],
"source": [
"known_species"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0f4e745-9e3d-45dc-bc4f-4efc22692c60",
"metadata": {},
"outputs": [],
"source": [
"known_species = {}\n",
"for row_species in iris_dataset['species']:\n",
" if row_species in known_species:\n",
" known_species[row_species] += 1\n",
" else:\n",
" known_species[row_species] = 1\n",
"\n",
"print(known_species)"
]
},
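{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e08",
"metadata": {},
"source": [
"The `if`/`else` in the counting loop is a very common pattern. Here is a sketch of two shorter alternatives from the standard library, on a made-up species list. `dict.get(key, default)` returns `default` when the key is missing. `collections.Counter` counts the items of an iterable in one step:\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"species_column = ['setosa', 'versicolor', 'setosa', 'virginica', 'setosa']\n",
"\n",
"# dict.get(name, 0) gives 0 the first time a name is seen\n",
"counts = {}\n",
"for name in species_column:\n",
"    counts[name] = counts.get(name, 0) + 1\n",
"print(counts)  # {'setosa': 3, 'versicolor': 1, 'virginica': 1}\n",
"\n",
"# Counter does the same counting in one step\n",
"print(Counter(species_column))  # Counter({'setosa': 3, 'versicolor': 1, 'virginica': 1})\n",
"```"
]
},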
{
"cell_type": "markdown",
"id": "ffbb912e-2baf-44f5-adb0-0ce1ffc9cfe3",
"metadata": {},
"source": [
"Next, we want to calculate values for each species separately, not across all rows. This is a little tricky, because we first need to work out which rows belong to which species, so that is our first step."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "500a55c9-dbe8-4436-b4a8-ac380fcbe987",
"metadata": {},
"outputs": [],
"source": [
"rows_for_species = {'setosa':[], 'versicolor':[], 'virginica':[]}\n",
"for species_name in rows_for_species:\n",
" # print(species_name)\n",
" row_index = 0\n",
" for row_species in iris_dataset['species']:\n",
" # print(row_index, row_species)\n",
" if row_species == species_name:\n",
" rows_for_species[species_name].append(row_index)\n",
" row_index = row_index + 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21139fb9-edb2-49a0-8f33-19f1bac1b098",
"metadata": {},
"outputs": [],
"source": [
"rows_for_species"
]
},
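{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e09",
"metadata": {},
"source": [
"The cell above loops over the whole `species` column once per species and counts `row_index` by hand. Here is a sketch of a one-pass alternative using `enumerate`, on a made-up species list. It also discovers the species names instead of listing them in advance:\n",
"\n",
"```python\n",
"species_column = ['setosa', 'virginica', 'setosa', 'versicolor', 'virginica']\n",
"\n",
"# enumerate supplies the row index, so no manual counter is needed\n",
"rows_for_species = {}\n",
"for row_index, row_species in enumerate(species_column):\n",
"    if row_species not in rows_for_species:\n",
"        rows_for_species[row_species] = []  # first time we see this species\n",
"    rows_for_species[row_species].append(row_index)\n",
"print(rows_for_species)  # {'setosa': [0, 2], 'virginica': [1, 4], 'versicolor': [3]}\n",
"```"
]
},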
{
"cell_type": "markdown",
"id": "823f9530-91de-4850-a93d-5b3c3896f62b",
"metadata": {},
"source": [
"Let's check that this worked by building, for each species, a list of that species' values in each column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5be44260-8e23-438e-a39d-335f36d21e93",
"metadata": {},
"outputs": [],
"source": [
"for species_name in rows_for_species:\n",
" # get a list of row numbers for `species_name`\n",
" species_indexes = rows_for_species[species_name]\n",
" # iterate over columns in dataset\n",
" for column_name in iris_dataset:\n",
" # get all data for this column (get all data for this measurement type, e.g. sepal width)\n",
" all_rows_for_this_column = iris_dataset[column_name]\n",
" \n",
" # accumulate measurements in a list **for this species**\n",
" this_species_values = []\n",
" for species_index in species_indexes:\n",
" # take only the rows corresponding to this species\n",
" row_value = all_rows_for_this_column[species_index]\n",
" this_species_values.append(row_value)\n",
" print(f\"{species_name} -> {column_name}: {this_species_values}\")\n",
" print()"
]
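},
{
"cell_type": "markdown",
"id": "5f0c2a9e-3b1d-4e7a-8c6f-1a2b3c4d5e10",
"metadata": {},
"source": [
"With the row indices for each species, the per-species lists can be averaged directly. Here is a minimal sketch on a hypothetical four-row dataset that uses the same layout as `iris_dataset`; its values are chosen so that the averages are exact:\n",
"\n",
"```python\n",
"# Hypothetical four-row dataset in the same column-wise layout as iris_dataset\n",
"dataset = {\n",
"    'petal width (cm)': [0.25, 1.5, 0.75, 2.5],\n",
"    'species': ['setosa', 'versicolor', 'setosa', 'virginica'],\n",
"}\n",
"rows_for_species = {'setosa': [0, 2], 'versicolor': [1], 'virginica': [3]}\n",
"\n",
"averages = {}\n",
"for species_name in rows_for_species:\n",
"    averages[species_name] = {}\n",
"    for column_name in dataset:\n",
"        if column_name == 'species':\n",
"            continue  # skip the non-numeric column\n",
"        column = dataset[column_name]\n",
"        # collect only this species' rows, then average them\n",
"        values = []\n",
"        for row_index in rows_for_species[species_name]:\n",
"            values.append(column[row_index])\n",
"        averages[species_name][column_name] = sum(values) / len(values)\n",
"\n",
"print(averages)\n",
"# {'setosa': {'petal width (cm)': 0.5}, 'versicolor': {'petal width (cm)': 1.5}, 'virginica': {'petal width (cm)': 2.5}}\n",
"```"
]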
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}