diff --git a/lectures/lecture-07/1 - numpy.ipynb b/lectures/lecture-07/1 - numpy.ipynb new file mode 100644 index 0000000..11dd5df --- /dev/null +++ b/lectures/lecture-07/1 - numpy.ipynb @@ -0,0 +1,1568 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Alle Teilnehmer:innen biologischer Profilmodule werden zur Studienleistung angemeldet.\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Numpy introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# numpy basics and creating arrays\n", + "\n", + "Numpy is a widely used library for handling arrays of data, especially numerical data. It would not be an exageration to say it is fundamental to the Python data science ecosystem.\n", + "\n", + "The most important part of numpy is the numpy `array` type. A numpy `array` is conceptually similar to a Python `list` or `tuple` but each element has the same data type and the array has a fixed size.\n", + "\n", + "Typically `numpy` is imported as `np`, a conventional shorthand that saves a bit of typing." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can create an array from any sequence type, such as lists and tuples:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.array([1,2,3,4])\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can crate an array of `n` elements (from `0` to `n-1`) with the `arange` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(4,10)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(0,10,1)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(4,10,2)\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can create an array of `n` equally spaced elements from `start` to `stop` with `np.linspace`. For example, here `start` is `100`, `stop` is `120` and `n` is 11." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.linspace(100, 120, 11)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also create arrays of zero or one with a given shape:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.zeros((3,))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.zeros((3,5)) # shape parameter - (number of rows, number of columns) in this case" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.ones((5,3))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## array shape\n", + "\n", + "In addition to 1 dimensional numpy arrays which are very similar to lists or tuples, numpy arrays may also be 2 or more dimensions. The `shape` attribute of a numpy array may be used to get or set its number of dimensions and size." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(12)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(12)\n", + "x.shape = (3,4)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.shape = (3,2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `ndim` attribute is the dimensionality of the array (and, thus, equal to length of the array `shape`):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.ndim" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(12)\n", + "x.shape = (3,2,2)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.ndim" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## array operations\n", + "\n", + "Numpy arrays support mathematical operations with other numpy arrays and with single numbers (\"scalars\").\n", + "\n", + "With scalars, the scalar is first converted to an array with the same shape as the numpy array and then an element-wise operation is performed.\n", + "\n", + "With other arrays of the same size, an element-wise operation is performed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = 2 * np.arange(10)\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = x + y\n", + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x + 3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x + 3.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x/5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "1/5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "4*x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27])/3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## array dtype\n", + "\n", + "Just like lists or tuples, every element in a numpy array has a data type. As mentioned above, however, every element in a numpy array has the same data type, and thus we can refer to the \"datatype of the array\". This can be set when the array is created with the `dtype` keyword argument and read from the `dtype` attribute:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.dtype" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10, dtype=np.float64)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.dtype" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## array indexing and slicing\n", + "\n", + "Numpy arrays can be indexed and sliced just like other Python sequence types such as lists, tuples, and strings.\n", + "\n", + "Just like with python lists, the indexes or slices can be read and written. In other words, numpy arrays are *mutable*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[4]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[:4]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[4:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tmp = slice(4)\n", + "x[tmp]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tmp = slice(4, 7)\n", + "x[tmp]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[4:7]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tmp = slice(2, 7, 2)\n", + "x[tmp]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tmp = slice(2, None, 2)\n", + "x[tmp]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[2::2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[4::3]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[4:7]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[2::2.2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.arange(0, 10, 2.2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[::2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because numpy arrays can have 2 or more, dimensions, we can also index and slice them in higher dimensions. For two dimensional arrays, the first index is always the row index and the second index is always the column index." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(12)\n", + "x.shape = (3,4)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[1:, :]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[:, 1:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[1:, 2:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[1:2:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[1:, 2:] = 99\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.dtype" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[1:, 2:] = 99.5\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[1:, 2:] = 99.9\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[:, :] = 99.9\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References to arrays\n", + "\n", + "Remember that variable assignment in Python does not create a new object but only creates a variable which points to an existing object. This is very important with numpy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(20)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Here we create a variable which references the first 10 elements of `x`.\n", + "y = x[:10]\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[4] = 9999" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Now we assign all the elements of `y` to have the value of 123.\n", + "# We do this by creating a slice into the array `y` and assigning to it.\n", + "y[:] = 123\n", + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# How does this affect the original array `x`?\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y[-1] = 999" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y[::2] = -1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[:3] = -100" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = y.copy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y[:2] = -9999" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = y[:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y[:2] = 444444" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = np.arange(12)\n", + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "zref = z[::3]\n", + "zref[:] = 999" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "zrefref = zref[::2]\n", + "zrefref[:] = -1000" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Array *slices* - a key difference between a numpy array and a Python list\n", + "\n", + "With a numpy array, a slice is created by `[:]` (e.g. `my_array[:]`). With a plain Python list, `[:]` will return a copy of the list.\n", + "\n", + "For both numpy arrays and Python lists, the `.copy()` method will make a copy, so this is preferred if you want to be sure you are making a copy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# First with a list\n", + "a = [1,2,3]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b = a[:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a[0] = 100" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Now with a numpy array\n", + "a = np.array([1,2,3])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b = a[:]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a[0] = 100" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Efficient data processing with numpy\n", + "\n", + "Because operations on numpy arrays happen for all elements with a single Python expression, these can operations can be performed very fast and efficiently by the computer. For example, if `x` is a numpy array with 10,000 elements, we can avoid a Python for loop with 10,000 iterations by performing our work with numpy.\n", + "\n", + "Below we use the Jupyter \"magic command `%timeit`\" to measure how long a single expression takes, in this case performing an element-wise multiplication." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10000, dtype=np.float64)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x*x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%timeit x*x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = x*x\n", + "len(y)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert y[2] == 4" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's do the same as above with a Python `list`. We need to crease a list_mul function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def list_mul(a,b):\n", + " \"\"\"element-wise product of `a` and `b`\"\"\"\n", + " n = len(a)\n", + " assert n==len(b) \n", + " c = []\n", + " for i in range(n):\n", + " c.append(a[i] * b[i])\n", + " return c" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now convert `x` to a list from a numpy array." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = list(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "type(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[:10]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%timeit list_mul(x,x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = list_mul(x,x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert y[3] == 9" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Elementwise numpy operations\n", + "\n", + "Above you have already seen element-wise multiplication, which multiplies every element of two inputs. Similarly, other operations operate element wise on a single input array." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.sqrt( np.array([1, 4, 9]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.linspace( 0, 2*np.pi, 30) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.cos( np.linspace( 0, 2*np.pi, 30) )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## More numpy operations\n", + "\n", + "In addition to elementwise operations such as `np.cos(x)` or `x * y` where `x` and `y` are same-shaped arrays, numpy can also perform operations on entire arrays.\n", + "\n", + "Take for example the `mean()` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(10)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.mean(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also do the mean on a 2D array, either for the entire array or row-wise or column-wise:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(30)\n", + "x.shape = (5,6)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.mean(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# take the mean across the rows, (i.e. mean of each column), which is axis 0.\n", + "np.mean(x,axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# take the mean across the columns, which is axis 1.\n", + "np.mean(x,axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In addition to `mean()`, numpy provides `std()`, `sum()`, and more. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.std(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.sum(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.max(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.mean()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.mean(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## argmin and argmax\n", + "\n", + "Important in many scientific computing applications are `argmin` and `argmax` functions. These return the index of the smallest or largest value, respectively." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.array([0, 10, -10, 4, 3, 2, 100, 2, 2, -1])\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "min_idx = np.argmin(x)\n", + "min_idx" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[min_idx]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.min(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "max_idx = np.argmax(x)\n", + "max_idx" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x[max_idx]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.array([0, 100, 0, 4, 3, 2, 100, 2, 2, -1])\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.argmax(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Because of its speed, numpy makes it possible to use Python for scientific computing.\n", + "\n", + "You can read more about numpy at its [User Guide](https://numpy.org/doc/1.26/user/index.html) and its [Reference Guide](https://numpy.org/doc/1.26/reference/index.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Live coding example: calculate distance between 2D points." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = (10, 20)\n", + "b = (13, 24)\n", + "\n", + "plt.plot([a[0]], [a[1]], 'o', label='a')\n", + "plt.plot([b[0]], [b[1]], 'o', label='b')\n", + "plt.plot([5, 15, 15, 5, 5], [15, 15, 25, 25, 15], 'k-')\n", + "plt.legend();" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compute_distance(a,b)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert compute_distance(a,b)==5.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "See also https://twitter.com/MIT_CSAIL/status/1459932891765297153?s=20\n", + "\n", + "![FEA8OpTWYAAxNjx.png](FEA8OpTWYAAxNjx.png)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/lectures/lecture-07/2 - optimization.ipynb b/lectures/lecture-07/2 - optimization.ipynb new file mode 100644 index 0000000..7dbc4cc --- /dev/null +++ b/lectures/lecture-07/2 - optimization.ipynb @@ -0,0 +1,271 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Optimization\n", + "\n", + "Let's consider the following code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# position of flower in 2D\n", + "flower = np.array([29.1, 19.1])\n", + "\n", + "def make_bee_track(t):\n", + " \"\"\"Find bee position at time t\"\"\"\n", + " pos0 = (13.21, 4.56)\n", + " velocity = (3.1, 0.25)\n", + " pos_x = pos0[0] + t*velocity[0]\n", + " pos_y = pos0[1] + t*velocity[1]\n", + " return np.array([pos_x,pos_y])\n", + "\n", + "t = np.linspace(0,15,10)\n", + "bee_track = make_bee_track(t)\n", + "\n", + "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(4,3))\n", + "ax.plot( [flower[0]], [flower[1]], 'or', label='flower' )\n", + "ax.plot( bee_track[0], bee_track[1], '.-k', label='bee')\n", + "ax.axis('equal')\n", + "ax.legend()\n", + "ax.set_xlabel('x')\n", + "ax.set_ylabel('y')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above code, we *parameterized* the bee trajectory by the variable `t` in the function `make_bee_track()`. This means we could get a new point on the track by choosing a new value of `t`. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(make_bee_track(0.0))\n", + "print(make_bee_track(0.1))\n", + "print(make_bee_track(0.2))\n", + "print(make_bee_track(1.0))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's measure the distance between the bee and the flower." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def compute_distance(a,b):\n", + " a = np.array(a)\n", + " b = np.array(b)\n", + " return np.sqrt(np.sum((a-b)**2))\n", + "\n", + "n_time_steps = bee_track.shape[1]\n", + "distance = np.zeros(n_time_steps)\n", + "for i in range(n_time_steps):\n", + " bee_pos = bee_track[:,i]\n", + " distance[i] = compute_distance(bee_pos, flower)\n", + "display(distance)\n", + "\n", + "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(4,3))\n", + "ax.plot( t, distance )\n", + "ax.set_xlabel('t')\n", + "ax.set_ylabel('distance')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Given the plot of distance versus t above, we can see the distance is minimized when t is near 5. What is the bee's position when t is 5?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(make_bee_track(5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can check back to the xy plot to see, indeed, this point is pretty close to the flower.\n", + "\n", + "What if we want to know, however, exactly the closest point? There are several ways to find this. Here we are going to use a \"brute force\" approach which will work on many different problems. Specifically, we will use [scipy.optimize.minimize_scalar](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize_scalar.html). The overall idea of this kind of *numerical optimization* is that we find the best fitting parameter to minimize our \"error\".\n", + "\n", + "In this example, we are not so much concerned with the exact algorithm being used, but with the way we call this algorithm.\n", + "\n", + "Let's create a function which uses the `flower` location to compute distance:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def calc_distance_func(t):\n", + " # assume variable 'flower' in our global scope\n", + " x1, y1 = flower\n", + " # calculate bee position (also assuming 'make_bee_track' in scope)\n", + " x2, y2 = make_bee_track(t)\n", + " dist = compute_distance((x1,y1), (x2,y2))\n", + " print(f't: {t} -> dist: {dist}')\n", + " return dist" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "calc_distance_func(0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "calc_distance_func(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import scipy.optimize\n", + "result = scipy.optimize.minimize_scalar(calc_distance_func)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "scipy.optimize.minimize_scalar?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "type(result)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result.x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(calc_distance_func(5))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "calc_distance_func(5.468493134264144)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Where is the bee for this value of `t`?\n", + "make_bee_track(5.468493134264144)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(4,3))\n", + "ax.plot( [flower[0]], [flower[1]], 'or', label='flower' )\n", + "ax.plot( bee_track[0], bee_track[1], '.-k', label='bee')\n", + "\n", + "x,y = make_bee_track(5.468493134264144)\n", + "ax.plot( [x], [y] ,'x', label='closest')\n", + "ax.axis('equal')\n", + "ax.legend()\n", + "ax.set_xlabel('x')\n", + "ax.set_ylabel('y')\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/lectures/lecture-07/FEA8OpTWYAAxNjx.png b/lectures/lecture-07/FEA8OpTWYAAxNjx.png new file mode 100644 index 0000000..d17688f Binary files /dev/null and b/lectures/lecture-07/FEA8OpTWYAAxNjx.png differ