
Numpy introduction¶

numpy basics and creating arrays¶

Numpy is a widely used library for handling arrays of data, especially numerical data. It would not be an exaggeration to say it is fundamental to the Python data science ecosystem.

The most important part of numpy is the numpy array type. A numpy array is conceptually similar to a Python list or tuple but each element has the same data type and the array has a fixed size.
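As a small illustration of what "same data type" and "fixed size" mean in practice, here is a minimal sketch. The variable names are made up for this example and do not come from the notebook.

```python
import numpy as np

# A list may mix types freely; an array converts everything to one common type.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)

print(mixed_array)        # all elements are stored as floats
print(mixed_array.dtype)  # float64

# A list can grow in place, but an array has no append method;
# np.append returns a new, larger array instead.
bigger = np.append(mixed_array, 4.0)
print(mixed_array.size, bigger.size)
```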

Typically numpy is imported as np, a conventional shorthand that saves a bit of typing.

In [4]:

import numpy as np

We can create an array from any sequence type, such as lists and tuples:

In [5]:

x = np.array([1,2,3,4])
x

Out[5]:

array([1, 2, 3, 4])

We can create an array of n integers (from 0 to n-1) with the arange function. Optional start and step arguments are also accepted, as the later cells show.

In [6]:

x = np.arange(10)
x

Out[6]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:

x = np.arange(4,10)
x

Out[7]:

array([4, 5, 6, 7, 8, 9])

In [8]:

x = np.arange(0,10,1)
x

Out[8]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:

x = np.arange(4,10,2)
x

Out[9]:

array([4, 6, 8])

We can create an array of n equally spaced elements from start to stop with np.linspace. For example, here start is 100, stop is 120 and n is 11.

In [12]:

np.linspace(100, 120, 11)

Out[12]:

array([100., 102., 104., 106., 108., 110., 112., 114., 116., 118., 120.])
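One difference from arange is worth spelling out: linspace includes the stop value, while arange excludes it. The sketch below reuses the numbers from the example above; the variable names are illustrative only.

```python
import numpy as np

start, stop, n = 100, 120, 11

# linspace includes the stop value; the spacing is (stop - start) / (n - 1).
points = np.linspace(start, stop, n)
step = (stop - start) / (n - 1)

# arange excludes the stop value, so it yields one element fewer here.
stepped = np.arange(start, stop, step)

print(step)                       # 2.0
print(len(points), len(stepped))  # 11 10
```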

We can also create arrays filled with zeros or ones with a given shape:

In [13]:

np.zeros((3,))

Out[13]:

array([0., 0., 0.])

In [14]:

np.zeros((3,5)) # shape parameter - (number of rows, number of columns) in this case

Out[14]:

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [15]:

np.ones((5,3))

Out[15]:

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
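Two related constructors may be handy when a value other than zero or one is needed, or when a new array should match an existing one. This is a brief sketch; the values and names are arbitrary.

```python
import numpy as np

# np.full fills a new array of the given shape with any constant value.
sevens = np.full((2, 3), 7.0)

# np.zeros_like copies the shape and dtype of an existing array.
template = np.arange(6)
blank = np.zeros_like(template)

print(sevens)
print(blank, blank.dtype)
```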

array shape¶

In addition to 1-dimensional numpy arrays, which are very similar to lists or tuples, numpy arrays may also have 2 or more dimensions. The shape attribute of a numpy array may be used to get or set its number of dimensions and size.

In [16]:

x = np.arange(12)
x

Out[16]:

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [17]:

x.shape

Out[17]:

(12,)

In [18]:

x = np.arange(12)
x.shape = (3,4)
x

Out[18]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [19]:

x.shape

Out[19]:

(3, 4)

In [20]:

x.shape = (3,2)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[20], line 1
----> 1 x.shape = (3,2)

ValueError: cannot reshape array of size 12 into shape (3,2)
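An alternative to assigning to the shape attribute is the reshape method, which returns a reshaped array instead of modifying the original's shape. A minimal sketch, assuming the same 12-element array as above:

```python
import numpy as np

x = np.arange(12)

# reshape returns a reshaped array and leaves the shape of x untouched.
grid = x.reshape(3, 4)

# -1 asks numpy to infer that dimension from the total size (12 / 3 = 4).
inferred = x.reshape(3, -1)

print(x.shape, grid.shape, inferred.shape)  # (12,) (3, 4) (3, 4)

# An incompatible shape raises ValueError, as with assigning to x.shape.
try:
    x.reshape(3, 2)
except ValueError as err:
    print("reshape failed:", err)
```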

The ndim attribute is the dimensionality of the array (and, thus, equal to length of the array's shape attribute):

In [21]:

x.ndim

Out[21]:

2

In [23]:

x = np.arange(12)
x.shape = (3,2,2)
x

Out[23]:

array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]]])

In [24]:

x.ndim

Out[24]:

3

array operations¶

Numpy arrays support mathematical operations with other numpy arrays and with single numbers ("scalars").

With scalars, numpy behaves as if the scalar were first expanded to an array with the same shape as the numpy array (this is called broadcasting) and then performs an element-wise operation.

With other arrays of the same shape, an element-wise operation is performed.

In [25]:

x = np.arange(10)
x

Out[25]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [26]:

y = 2 * np.arange(10)
y

Out[26]:

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [27]:

z = x + y
z

Out[27]:

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [28]:

x + 3

Out[28]:

array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [29]:

x + 3.0

Out[29]:

array([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])

In [30]:

x

Out[30]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:

x/5

Out[31]:

array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])

In [32]:

1/5

Out[32]:

0.2

In [33]:

x

Out[33]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [34]:

4*x

Out[34]:

array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36])

In [35]:

np.array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])/3

Out[35]:

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
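The examples above combine arrays of equal length. As a hedged sketch of what happens otherwise, the following shows that two one-dimensional arrays of different lengths cannot be combined, whereas a scalar always can. The names a, b and mismatch_raised are illustrative.

```python
import numpy as np

a = np.arange(10)
b = np.arange(5)

# A scalar combines with an array of any shape.
print(a * 2)

# Two arrays of different lengths (10 and 5) cannot be combined element-wise.
try:
    a + b
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("cannot add:", err)
```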

array dtype¶

Just like lists or tuples, every element in a numpy array has a data type. As mentioned above, however, every element in a numpy array has the same data type, and thus we can refer to the "datatype of the array". This can be set when the array is created with the dtype keyword argument and read from the dtype attribute:

In [36]:

x = np.arange(10)
x

Out[36]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [37]:

x.dtype

Out[37]:

dtype('int64')

In [38]:

x = np.arange(10, dtype=np.float64)
x

Out[38]:

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [39]:

x.dtype

Out[39]:

dtype('float64')
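The dtype of an existing array can also be converted after creation. The sketch below uses the astype method, which returns a new array and leaves the original untouched; the variable names are placeholders.

```python
import numpy as np

ints = np.arange(5)               # integer dtype by default
floats = ints.astype(np.float64)  # new array; ints is unchanged
small = ints.astype(np.uint8)     # 1 byte per element

print(ints.dtype, floats.dtype, small.dtype)
print(floats.itemsize, small.itemsize)  # bytes per element: 8 1
```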

array indexing and slicing¶

Numpy arrays can be indexed and sliced just like other Python sequence types such as lists, tuples, and strings.

Just like with Python lists, the indices or slices can be read and written. In other words, numpy arrays are mutable.

In [40]:

x = np.arange(10)
x

Out[40]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [41]:

x[4]

Out[41]:

4

In [42]:

x[:4]

Out[42]:

array([0, 1, 2, 3])

In [56]:

x[4:]

Out[56]:

array([4, 5, 6, 7, 8, 9])

In [54]:

tmp = slice(4)
x[tmp]

Out[54]:

array([0, 1, 2, 3])

In [57]:

tmp = slice(4, 7)
x[tmp]

Out[57]:

array([4, 5, 6])

In [58]:

x[4:7]

Out[58]:

array([4, 5, 6])

In [59]:

tmp = slice(2, 7, 2)
x[tmp]

Out[59]:

array([2, 4, 6])

In [60]:

tmp = slice(2, None, 2)
x[tmp]

Out[60]:

array([2, 4, 6, 8])

In [61]:

x[2::2]

Out[61]:

array([2, 4, 6, 8])

In [62]:

x[4::3]

Out[62]:

array([4, 7])

In [63]:

x[4:7]

Out[63]:

array([4, 5, 6])

In [64]:

x[2::2.2]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[64], line 1
----> 1 x[2::2.2]

TypeError: slice indices must be integers or None or have an __index__ method

In [65]:

np.arange(0, 10, 2.2)

Out[65]:

array([0. , 2.2, 4.4, 6.6, 8.8])

In [66]:

x[::2]

Out[66]:

array([0, 2, 4, 6, 8])
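Negative indices and negative steps work as they do for Python lists. A short sketch on the same 10-element array:

```python
import numpy as np

x = np.arange(10)

print(x[-1])      # last element: 9
print(x[-3:])     # last three elements: [7 8 9]
print(x[::-1])    # all elements in reverse order
print(x[8:2:-2])  # from index 8 down to (not including) 2: [8 6 4]
```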

Because numpy arrays can have 2 or more dimensions, we can also index and slice them in higher dimensions. For two dimensional arrays, the first index is always the row index and the second index is always the column index.

In [67]:

x = np.arange(12)
x.shape = (3,4)
x

Out[67]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [68]:

x[1:, :]

Out[68]:

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [69]:

x[:, 1:]

Out[69]:

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11]])

In [70]:

x[1:, 2:]

Out[70]:

array([[ 6,  7],
       [10, 11]])

In [71]:

x[1:2:]

Out[71]:

array([[4, 5, 6, 7]])

In [72]:

x

Out[72]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [73]:

x[1:, 2:] = 99
x

Out[73]:

array([[ 0,  1,  2,  3],
       [ 4,  5, 99, 99],
       [ 8,  9, 99, 99]])

In [74]:

x.dtype

Out[74]:

dtype('int64')

In [75]:

x[1:, 2:] = 99.5
x

Out[75]:

array([[ 0,  1,  2,  3],
       [ 4,  5, 99, 99],
       [ 8,  9, 99, 99]])

In [76]:

x[1:, 2:] = 99.9
x

Out[76]:

array([[ 0,  1,  2,  3],
       [ 4,  5, 99, 99],
       [ 8,  9, 99, 99]])

In [77]:

x[:, :] = 99.9
x

Out[77]:

array([[99, 99, 99, 99],
       [99, 99, 99, 99],
       [99, 99, 99, 99]])
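The last few cells show that assigning 99.5 or 99.9 into an integer array stores 99: the array keeps its integer dtype, so the fractional part is dropped rather than rounded. A minimal sketch of this behaviour and one way around it, converting to a float dtype first (xf is a name chosen for this example):

```python
import numpy as np

x = np.arange(4)   # integer array
x[:] = 99.9        # fractional part is discarded, not rounded
print(x)           # [99 99 99 99]

# Converting to a float dtype first preserves the value.
xf = np.arange(4).astype(np.float64)
xf[:] = 99.9
print(xf)          # [99.9 99.9 99.9 99.9]
```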

References to arrays¶

Remember that variable assignment in Python does not create a new object but only creates a variable which points to an existing object. This is very important with numpy.

In [78]:

x = np.arange(20)
x

Out[78]:

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [79]:

# Here we create a variable which references the first 10 elements of `x`.
y = x[:10]
y

Out[79]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [80]:

x[4] = 9999

In [81]:

x

Out[81]:

array([   0,    1,    2,    3, 9999,    5,    6,    7,    8,    9,   10,
         11,   12,   13,   14,   15,   16,   17,   18,   19])

In [82]:

y

Out[82]:

array([   0,    1,    2,    3, 9999,    5,    6,    7,    8,    9])

In [83]:

# Now we assign all the elements of `y` to have the value of 123.
# We do this by creating a slice into the array `y` and assigning to it.
y[:] = 123
y

Out[83]:

array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123])

In [84]:

# How does this affect the original array `x`?
x

Out[84]:

array([123, 123, 123, 123, 123, 123, 123, 123, 123, 123,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19])

In [85]:

y[-1] = 999

In [86]:

x

Out[86]:

array([123, 123, 123, 123, 123, 123, 123, 123, 123, 999,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19])

In [87]:

y[::2] = -1

In [88]:

x

Out[88]:

array([ -1, 123,  -1, 123,  -1, 123,  -1, 123,  -1, 999,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19])

In [89]:

x[:3] = -100

In [90]:

x

Out[90]:

array([-100, -100, -100,  123,   -1,  123,   -1,  123,   -1,  999,   10,
         11,   12,   13,   14,   15,   16,   17,   18,   19])

In [91]:

y

Out[91]:

array([-100, -100, -100,  123,   -1,  123,   -1,  123,   -1,  999])

In [92]:

z = y.copy()

In [93]:

y

Out[93]:

array([-100, -100, -100,  123,   -1,  123,   -1,  123,   -1,  999])

In [94]:

z

Out[94]:

array([-100, -100, -100,  123,   -1,  123,   -1,  123,   -1,  999])

In [95]:

y[:2] = -9999

In [96]:

y

Out[96]:

array([-9999, -9999,  -100,   123,    -1,   123,    -1,   123,    -1,
         999])

In [97]:

z

Out[97]:

array([-100, -100, -100,  123,   -1,  123,   -1,  123,   -1,  999])

In [98]:

z = y[:]

In [99]:

y

Out[99]:

array([-9999, -9999,  -100,   123,    -1,   123,    -1,   123,    -1,
         999])

In [100]:

z

Out[100]:

array([-9999, -9999,  -100,   123,    -1,   123,    -1,   123,    -1,
         999])

In [101]:

y[:2] = 444444

In [102]:

y

Out[102]:

array([444444, 444444,   -100,    123,     -1,    123,     -1,    123,
           -1,    999])

In [103]:

z

Out[103]:

array([444444, 444444,   -100,    123,     -1,    123,     -1,    123,
           -1,    999])

In [104]:

z = np.arange(12)
z

Out[104]:

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [105]:

zref = z[::3]
zref[:] = 999

In [106]:

z

Out[106]:

array([999,   1,   2, 999,   4,   5, 999,   7,   8, 999,  10,  11])

In [107]:

zrefref = zref[::2]
zrefref[:] = -1000

In [108]:

z

Out[108]:

array([-1000,     1,     2,   999,     4,     5, -1000,     7,     8,
         999,    10,    11])
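When it is unclear whether two variables refer to the same underlying data, numpy can be asked directly. The sketch below uses np.shares_memory and the base attribute; the names view and copy are chosen for this example.

```python
import numpy as np

z = np.arange(12)
view = z[::3]          # a slice: shares data with z
copy = z[::3].copy()   # an independent copy

print(np.shares_memory(z, view))  # True
print(np.shares_memory(z, copy))  # False
print(view.base is z)             # True: the view is built on z

copy[:] = -1   # does not touch z
view[0] = 999  # writes through to z[0]
print(z)
```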

Array slices - a key difference between a numpy array and a Python list¶

With a numpy array, slicing with [:] (e.g. my_array[:]) creates a view that shares its data with the original array. With a plain Python list, [:] returns a copy of the list.

For both numpy arrays and Python lists, the .copy() method will make a copy, so this is preferred if you want to be sure you are making a copy.

In [109]:

# First with a list
a = [1,2,3]

In [110]:

b = a[:]

In [111]:

a[0] = 100

In [112]:

b

Out[112]:

[1, 2, 3]

In [113]:

a

Out[113]:

[100, 2, 3]

In [114]:

# Now with a numpy array
a = np.array([1,2,3])

In [115]:

b = a[:]

In [116]:

a[0] = 100

In [117]:

b

Out[117]:

array([100,   2,   3])

In [118]:

a

Out[118]:

array([100,   2,   3])

Efficient data processing with numpy¶

Because operations on numpy arrays happen for all elements with a single Python expression, these operations can be performed very fast and efficiently by the computer. For example, if x is a numpy array with 10,000 elements, we can avoid a Python for loop with 10,000 iterations by performing our work with numpy.

Below we use the Jupyter "magic command" %timeit to measure how long a single expression takes, in this case performing an element-wise multiplication.

In [119]:

x = np.arange(10000, dtype=np.float64)

In [120]:

x*x

Out[120]:

array([0.0000000e+00, 1.0000000e+00, 4.0000000e+00, ..., 9.9940009e+07,
       9.9960004e+07, 9.9980001e+07])

In [121]:

%timeit x*x

1.6 μs ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [122]:

y = x*x
len(y)

Out[122]:

10000

In [124]:

assert y[2] == 4
assert y[3] == 9

Now let's do the same as above with a Python list. We need to create a list_mul function.

In [125]:

def list_mul(a,b):
    """element-wise product of `a` and `b`"""
    n = len(a)
    assert n==len(b) 
    c = []
    for i in range(n):
        c.append(a[i] * b[i])
    return c

Now convert x to a list from a numpy array.

In [126]:

x = list(x)

In [127]:

type(x)

Out[127]:

list

In [128]:

x[:10]

Out[128]:

[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

In [129]:

%timeit list_mul(x,x)

483 μs ± 24.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [130]:

y = list_mul(x,x)

In [131]:

assert y[3] == 9

Elementwise numpy operations¶

Above you have already seen element-wise multiplication, which multiplies the corresponding elements of two inputs. Similarly, other operations operate element-wise on a single input array.

In [132]:

np.sqrt( np.array([1, 4, 9]))

Out[132]:

array([1., 2., 3.])

In [133]:

np.linspace( 0, 2*np.pi, 30)

Out[133]:

array([0.        , 0.21666156, 0.43332312, 0.64998469, 0.86664625,
       1.08330781, 1.29996937, 1.51663094, 1.7332925 , 1.94995406,
       2.16661562, 2.38327719, 2.59993875, 2.81660031, 3.03326187,
       3.24992343, 3.466585  , 3.68324656, 3.89990812, 4.11656968,
       4.33323125, 4.54989281, 4.76655437, 4.98321593, 5.1998775 ,
       5.41653906, 5.63320062, 5.84986218, 6.06652374, 6.28318531])

In [134]:

np.cos( np.linspace( 0, 2*np.pi, 30) )

Out[134]:

array([ 1.        ,  0.97662056,  0.90757542,  0.79609307,  0.64738628,
        0.46840844,  0.26752834,  0.05413891, -0.161782  , -0.37013816,
       -0.56118707, -0.72599549, -0.85685718, -0.94765317, -0.99413796,
       -0.99413796, -0.94765317, -0.85685718, -0.72599549, -0.56118707,
       -0.37013816, -0.161782  ,  0.05413891,  0.26752834,  0.46840844,
        0.64738628,  0.79609307,  0.90757542,  0.97662056,  1.        ])
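Element-wise functions can be combined freely with element-wise arithmetic. As a quick sketch, the identity sin²θ + cos²θ = 1 can be checked for every element at once; theta and identity are names picked for this example.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 30)

# Each function is applied to every element; the results combine element-wise.
identity = np.sin(theta) ** 2 + np.cos(theta) ** 2

# Floating-point rounding means the values are close to, not exactly, 1.
print(np.allclose(identity, 1.0))  # True
```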

More numpy operations¶

In addition to elementwise operations such as np.cos(x) or x * y where x and y are same-shaped arrays, numpy can also perform operations on entire arrays.

Take for example the mean() function.

In [135]:

x = np.arange(10)
x

Out[135]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [136]:

np.mean(x)

Out[136]:

4.5

We can also do the mean on a 2D array, either for the entire array or row-wise or column-wise:

In [137]:

x = np.arange(30)
x.shape = (5,6)
x

Out[137]:

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])

In [138]:

np.mean(x)

Out[138]:

14.5

In [139]:

# take the mean across the rows, (i.e. mean of each column), which is axis 0.
np.mean(x,axis=0)

Out[139]:

array([12., 13., 14., 15., 16., 17.])

In [140]:

# take the mean across the columns (i.e. mean of each row), which is axis 1.
np.mean(x,axis=1)

Out[140]:

array([ 2.5,  8.5, 14.5, 20.5, 26.5])
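A common use of per-axis means is to subtract them from the data. The sketch below assumes the same 5x6 array and relies on the keepdims argument together with broadcasting between shapes (5, 6) and (5, 1), which this notebook does not otherwise cover.

```python
import numpy as np

x = np.arange(30).reshape(5, 6)

col_means = np.mean(x, axis=0)  # shape (6,): one mean per column
row_means = np.mean(x, axis=1)  # shape (5,): one mean per row

# keepdims=True keeps the reduced axis with length 1, so the result
# lines up with x for element-wise subtraction.
centered_rows = x - np.mean(x, axis=1, keepdims=True)

print(col_means.shape, row_means.shape)
print(centered_rows.mean(axis=1))  # each row now averages to 0
```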

In addition to mean(), numpy provides std(), sum(), and more.

In [141]:

np.std(x)

Out[141]:

8.65544144839919

In [142]:

np.sum(x)

Out[142]:

435

In [143]:

np.max(x)

Out[143]:

29

In [144]:

x.mean()

Out[144]:

14.5

In [145]:

np.mean(x)

Out[145]:

14.5

argmin and argmax¶

Important in many scientific computing applications are argmin and argmax functions. These return the index of the smallest or largest value, respectively.

In [146]:

x = np.array([0, 10, -10, 4, 3, 2, 100, 2, 2, -1])
x

Out[146]:

array([  0,  10, -10,   4,   3,   2, 100,   2,   2,  -1])

In [147]:

min_idx = np.argmin(x)
min_idx

Out[147]:

2

In [148]:

x[min_idx]

Out[148]:

-10

In [149]:

np.min(x)

Out[149]:

-10

In [150]:

x

Out[150]:

array([  0,  10, -10,   4,   3,   2, 100,   2,   2,  -1])

In [151]:

max_idx = np.argmax(x)
max_idx

Out[151]:

6

In [152]:

x[max_idx]

Out[152]:

100

In [153]:

x = np.array([0, 100, 0, 4, 3, 2, 100, 2, 2, -1])
x

Out[153]:

array([  0, 100,   0,   4,   3,   2, 100,   2,   2,  -1])

In [154]:

np.argmax(x)

Out[154]:

1
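For arrays with more than one dimension, argmax by default returns an index into the flattened array. A minimal sketch of how to turn that into a (row, column) pair with np.unravel_index, using a small made-up array:

```python
import numpy as np

x = np.array([[0, 10, -10],
              [4, 100, 2]])

flat_idx = np.argmax(x)  # index into the flattened array
row, col = np.unravel_index(flat_idx, x.shape)
print(flat_idx, row, col)   # 4 1 1

# With axis given, argmax returns one index per column (axis=0) or row (axis=1).
print(np.argmax(x, axis=1))  # [1 1]
```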

Because of its speed, numpy makes it possible to use Python for scientific computing.¶

You can read more about numpy at its User Guide and its Reference Guide.

Live coding example: calculate distance between 2D points.¶

In [155]:

import matplotlib.pyplot as plt

In [156]:

a = (10, 20)
b = (13, 24)

plt.plot([a[0]], [a[1]], 'o', label='a')
plt.plot([b[0]], [b[1]], 'o', label='b')
plt.plot([5, 15, 15, 5, 5], [15, 15, 25, 25, 15], 'k-')
plt.legend();

In [157]:

def compute_distance(a,b):
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return np.sqrt(dx*dx + dy*dy)

In [158]:

compute_distance(a,b)

Out[158]:

5.0

In [159]:

assert compute_distance(a,b)==5.0
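The same calculation extends to many point pairs at once. The following is a sketch only: compute_distances and the sample points are hypothetical and assume one (x, y) pair per row.

```python
import numpy as np

# Hypothetical batch of points, one (x, y) pair per row.
starts = np.array([[10.0, 20.0], [0.0, 0.0], [1.0, 1.0]])
ends = np.array([[13.0, 24.0], [6.0, 8.0], [1.0, 1.0]])

def compute_distances(p, q):
    """Euclidean distance between corresponding rows of p and q."""
    diff = p - q
    return np.sqrt(np.sum(diff * diff, axis=1))

distances = compute_distances(starts, ends)
print(distances)  # distances of 5, 10 and 0
```

Summing over axis 1 adds the squared x and y differences within each row, so the result has one distance per point pair.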
