"All participants in the biology profile modules will be registered for the coursework assessment (Studienleistung)."
Numpy introduction¶
numpy basics and creating arrays¶
Numpy is a widely used library for handling arrays of data, especially numerical data. It would not be an exaggeration to say it is fundamental to the Python data science ecosystem.
The most important part of numpy is the numpy array type. A numpy array is conceptually similar to a Python list or tuple, but every element has the same data type and the array has a fixed size.
Typically numpy is imported as np, a conventional shorthand that saves a bit of typing.
import numpy as np
We can create an array from any sequence type, such as lists and tuples:
x = np.array([1,2,3,4])
x
We can create an array of n elements (from 0 to n-1) with the arange function.
x = np.arange(10)
x
x = np.arange(4,10)
x
x = np.arange(0,10,1)
x
x = np.arange(4,10,2)
x
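The arange calls above follow the same half-open rule as Python's built-in range: start is included and stop is excluded. The following small check uses only numpy and the standard library and compares the two side by side:

```python
import numpy as np

# Each tuple holds the (start, stop, step) arguments used in the cells above.
for args in [(10,), (4, 10), (0, 10, 1), (4, 10, 2)]:
    a = np.arange(*args)
    r = list(range(*args))
    # tolist() converts the array to plain Python ints so it can be compared to a list.
    print(args, a.tolist(), a.tolist() == r)
```

Unlike range, np.arange also accepts a non-integer step. That convenience comes up again later in the slicing section.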
We can create an array of n equally spaced elements from start to stop (both included by default) with np.linspace. For example, here start is 100, stop is 120 and n is 11.
np.linspace(100, 120, 11)
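Because both endpoints are included, n points divide the interval into n - 1 gaps. The following check uses only the documented endpoint parameter of np.linspace to confirm the spacing of the example above:

```python
import numpy as np

start, stop, n = 100, 120, 11
pts = np.linspace(start, stop, n)

# With the default endpoint=True the spacing is (stop - start) / (n - 1).
step = (stop - start) / (n - 1)
print(step)  # 2.0
print(np.allclose(np.diff(pts), step))  # True

# With endpoint=False, stop is excluded and the spacing becomes (stop - start) / n.
pts_open = np.linspace(start, stop, n, endpoint=False)
print(pts_open[-1] < stop)  # True
```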
We can also create arrays of zeros or ones with a given shape:
np.zeros((3,))
np.zeros((3,5)) # shape parameter - (number of rows, number of columns) in this case
np.ones((5,3))
array shape¶
In addition to 1-dimensional numpy arrays, which are very similar to lists or tuples, numpy arrays may also have 2 or more dimensions. The shape attribute of a numpy array may be used to get or set its number of dimensions and the size of each dimension.
x = np.arange(12)
x
x.shape
x = np.arange(12)
x.shape = (3,4)
x
x.shape
x.shape = (3,2)  # fails with a ValueError: 12 elements cannot fill a 3 x 2 shape
The ndim attribute is the dimensionality of the array (and, thus, equal to the length of the array's shape attribute):
x.ndim
x = np.arange(12)
x.shape = (3,2,2)
x
x.ndim
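Assigning to x.shape changes the array in place. Another approach is the reshape method, which returns a new array object and leaves the original shape alone. The sketch below sticks to standard numpy calls and also shows the -1 placeholder for an inferred dimension:

```python
import numpy as np

x = np.arange(12)
# reshape returns a new array object with the requested shape; x keeps shape (12,).
y = x.reshape(3, 4)
print(x.shape, y.shape)  # (12,) (3, 4)

# One dimension may be given as -1; numpy infers it from the total size (12 / 3 = 4).
z = x.reshape(3, -1)
print(z.shape)  # (3, 4)

# ndim is always the length of the shape tuple.
print(y.ndim == len(y.shape))  # True

# The product of the new shape must equal x.size, otherwise a ValueError is raised.
try:
    x.reshape(3, 2)
except ValueError as err:
    print("cannot reshape:", err)
```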
array operations¶
Numpy arrays support mathematical operations with other numpy arrays and with single numbers ("scalars").
With scalars, the scalar is conceptually expanded ("broadcast") to an array with the same shape as the numpy array, and then an element-wise operation is performed.
With other arrays of the same shape, an element-wise operation is performed.
x = np.arange(10)
x
y = 2 * np.arange(10)
y
z = x + y
z
x + 3
x + 3.0
x
x/5
1/5
x
4*x
np.array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27])/3
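The scalar rule described above can be checked directly. In the following example, np.full builds the expanded array explicitly for comparison. numpy itself does not materialize that array; its broadcasting rules give the same result without the extra memory. Arrays whose shapes do not match cannot be combined:

```python
import numpy as np

x = np.arange(10)
# Adding a scalar gives the same result as adding an array full of that scalar.
explicit = x + np.full(x.shape, 3)
implicit = x + 3
print(np.array_equal(explicit, implicit))  # True

# Same-shaped arrays are combined element by element.
y = 2 * np.arange(10)
print((x + y)[:4])  # [0 3 6 9]

# Incompatible shapes raise a ValueError instead of silently truncating.
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("shape mismatch:", err)
```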
array dtype¶
Just like the elements of lists or tuples, every element in a numpy array has a data type. As mentioned above, however, every element in a numpy array has the same data type, so we can refer to the "data type of the array". It can be set when the array is created with the dtype keyword argument and read from the dtype attribute:
x = np.arange(10)
x
x.dtype
x = np.arange(10, dtype=np.float64)
x
x.dtype
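An existing array can also be converted to another dtype with the astype method, which returns a new array. The following example converts integers to float64 and back. Float-to-integer conversion truncates toward zero:

```python
import numpy as np

x = np.arange(5)
# astype returns a new array with the requested dtype; x keeps its integer dtype.
xf = x.astype(np.float64)
print(x.dtype, xf.dtype)

# Converting floats to integers drops the fractional part (truncation toward zero).
v = np.array([1.7, -1.7, 2.5])
print(v.astype(np.int64))
```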
array indexing and slicing¶
Numpy arrays can be indexed and sliced just like other Python sequence types such as lists, tuples, and strings.
Just like with Python lists, individual elements and slices can be both read and written. In other words, numpy arrays are mutable.
x = np.arange(10)
x
x[4]
x[:4]
x[4:]
tmp = slice(4)
x[tmp]
tmp = slice(4, 7)
x[tmp]
x[4:7]
tmp = slice(2, 7, 2)
x[tmp]
tmp = slice(2, None, 2)
x[tmp]
x[2::2]
x[4::3]
x[4:7]
x[2::2.2]  # raises an error: slice steps must be integers (unlike the step of np.arange below)
np.arange(0, 10, 2.2)
x[::2]
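The start:stop:step syntax and the slice(...) objects used above are interchangeable. The following snippet uses the built-in slice.indices method to show how Python resolves a None stop against the length of x. It also shows a negative step:

```python
import numpy as np

x = np.arange(10)

# start:stop:step is shorthand for slice(start, stop, step).
print(np.array_equal(x[slice(2, None, 2)], x[2::2]))  # True
print(np.array_equal(x[slice(4, 7)], x[4:7]))         # True

# slice.indices(n) returns the concrete (start, stop, step) for length n.
print(slice(2, None, 2).indices(len(x)))  # (2, 10, 2)

# A negative step walks the array backwards.
print(x[::-1])  # [9 8 7 6 5 4 3 2 1 0]
```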
Because numpy arrays can have 2 or more dimensions, we can also index and slice them in higher dimensions. For two-dimensional arrays, the first index is always the row index and the second index is always the column index.
x = np.arange(12)
x.shape = (3,4)
x
x[1:, :]
x[:, 1:]
x[1:, 2:]
x[1:2:]
x
x[1:, 2:] = 99
x
x.dtype
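x[1:, 2:] = 99.5  # x has an integer dtype, so 99.5 is truncated to 99
x
x[1:, 2:] = 99.9  # also truncated to 99, not rounded to 100
x
x[:, :] = 99.9  # every element becomes 99
x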
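The truncation above happens because the values are cast to the array's existing integer dtype. The comparison below starts from a float64 array instead, which keeps the fractional part:

```python
import numpy as np

# Integer array: assigned floats are truncated to fit the int dtype.
x = np.arange(12).reshape(3, 4)
x[1:, 2:] = 99.9
print(x[1, 2])  # 99

# Float array: the same assignment keeps 99.9.
xf = np.arange(12, dtype=np.float64).reshape(3, 4)
xf[1:, 2:] = 99.9
print(xf[1, 2])  # 99.9
```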
References to arrays¶
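Remember that variable assignment in Python does not create a new object; it only creates a variable that points to an existing object. This is very important with numpy, because slicing an array does not copy the data either: it returns a view that shares memory with the original array.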
x = np.arange(20)
x
# Here we create a variable which references the first 10 elements of `x`.
y = x[:10]
y
x[4] = 9999
x
y
# Now we assign all the elements of `y` to have the value of 123.
# We do this by creating a slice into the array `y` and assigning to it.
y[:] = 123
y
# How does this affect the original array `x`?
x
y[-1] = 999
x
y[::2] = -1
x
x[:3] = -100
x
y
z = y.copy()
y
z
y[:2] = -9999
y
z
z = y[:]
y
z
y[:2] = 444444
y
z
z = np.arange(12)
z
zref = z[::3]
zref[:] = 999
z
zrefref = zref[::2]
zrefref[:] = -1000
z
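The last example chains two views: zref picks every third element of z, and zrefref picks every second element of zref. Together they select z at indices 0 and 6. The following example repeats that chain without the intermediate 999 assignment and checks which elements changed. It uses np.shares_memory and the .base attribute, both part of numpy's public API:

```python
import numpy as np

z = np.arange(12)
zref = z[::3]        # view of z at indices 0, 3, 6, 9
zrefref = zref[::2]  # view of zref at its indices 0 and 2, i.e. z at 0 and 6
zrefref[:] = -1000
print(z)

# A view records the array it was sliced from in its .base attribute.
print(zref.base is z)                # True
print(np.shares_memory(zrefref, z))  # True
```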
Array slices - a key difference between a numpy array and a Python list¶
With a numpy array, [:] (e.g. my_array[:]) creates a view: a new array object that shares its data with the original. With a plain Python list, [:] returns a copy of the list.
For both numpy arrays and Python lists, the .copy() method makes a copy, so it is preferred if you want to be sure you are making a copy.
# First with a list
a = [1,2,3]
b = a[:]
a[0] = 100
b
a
# Now with a numpy array
a = np.array([1,2,3])
b = a[:]
a[0] = 100
b
a
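If you are unsure whether two arrays share data, np.shares_memory can check it. The following example contrasts a numpy slice with .copy() and with list slicing:

```python
import numpy as np

a = np.array([1, 2, 3])
view = a[:]      # numpy slice: a view on the same data
copy = a.copy()  # .copy(): new, independent data

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

# For a plain list, [:] already builds a new list object.
lst = [1, 2, 3]
print(lst[:] is lst)  # False
```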
Efficient data processing with numpy¶
Because a single Python expression operates on all elements of a numpy array, and the per-element loop runs inside numpy's compiled code rather than in the Python interpreter, these operations can be performed very quickly and efficiently. For example, if x is a numpy array with 10,000 elements, we can avoid a Python for loop with 10,000 iterations by performing our work with numpy.
Below we use the Jupyter "magic command" %timeit to measure how long a single expression takes, in this case an element-wise multiplication.
x = np.arange(10000, dtype=np.float64)
x*x
%timeit x*x
y = x*x
len(y)
assert y[2] == 4
assert y[3] == 9
Now let's do the same as above with a Python list. We need to create a list_mul function.
def list_mul(a,b):
    """element-wise product of `a` and `b`"""
    n = len(a)
    assert n==len(b)
    c = []
    for i in range(n):
        c.append(a[i] * b[i])
    return c
Now convert x from a numpy array to a list.
x = list(x)
type(x)
x[:10]
%timeit list_mul(x,x)
y = list_mul(x,x)
assert y[3] == 9
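%timeit only works inside IPython/Jupyter. In a plain Python script, the standard-library timeit module can be used instead. The following example makes the same comparison. The list-comprehension version of list_mul is a hypothetical compact variant of the function above, and tolist() gives plain Python floats rather than numpy scalars. Exact timings depend on the machine, so none are shown:

```python
import timeit

import numpy as np

def list_mul(a, b):
    """Element-wise product of two equal-length lists (compact variant)."""
    assert len(a) == len(b)
    return [ai * bi for ai, bi in zip(a, b)]

x_arr = np.arange(10000, dtype=np.float64)
x_list = x_arr.tolist()  # plain Python floats

# timeit.timeit calls the function `number` times and returns the total seconds.
t_numpy = timeit.timeit(lambda: x_arr * x_arr, number=100)
t_list = timeit.timeit(lambda: list_mul(x_list, x_list), number=100)
print(f"numpy: {t_numpy:.4f} s, list: {t_list:.4f} s (100 runs each)")

# Both approaches compute the same values.
print(np.array_equal(x_arr * x_arr, np.array(list_mul(x_list, x_list))))  # True
```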
Elementwise numpy operations¶
Above you have already seen element-wise multiplication, which multiplies corresponding elements of two inputs. Similarly, many functions such as np.sqrt and np.cos operate element-wise on a single input array.
np.sqrt( np.array([1, 4, 9]))
np.linspace( 0, 2*np.pi, 30)
np.cos( np.linspace( 0, 2*np.pi, 30) )
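For comparison, the following example computes the same cosine values with an explicit Python loop using the standard math module. It also shows that element-wise functions preserve the input's shape, including for 2D arrays:

```python
import math

import numpy as np

t = np.linspace(0, 2 * np.pi, 30)

# np.cos handles all 30 elements in one call.
c_numpy = np.cos(t)
# The pure-Python equivalent visits one element at a time.
c_loop = [math.cos(v) for v in t]

print(c_numpy.shape)                 # (30,)
print(np.allclose(c_numpy, c_loop))  # True

# The output has the same shape as the input.
m = np.array([[1, 4], [9, 16]])
print(np.sqrt(m))
```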
More numpy operations¶
In addition to element-wise operations such as np.cos(x) or x * y, where x and y are same-shaped arrays, numpy can also perform operations that summarize an entire array into fewer values. Take for example the mean() function.
x = np.arange(10)
x
np.mean(x)
We can also do the mean on a 2D array, either for the entire array or row-wise or column-wise:
x = np.arange(30)
x.shape = (5,6)
x
np.mean(x)
# take the mean across the rows (i.e. the mean of each column), which is axis 0.
np.mean(x,axis=0)
# take the mean across the columns (i.e. the mean of each row), which is axis 1.
np.mean(x,axis=1)
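A useful way to remember the axis argument is that the axis you name is the one that disappears from the result's shape. The following example checks this on the same 5 x 6 array:

```python
import numpy as np

x = np.arange(30).reshape(5, 6)

col_means = np.mean(x, axis=0)  # axis 0 (rows) is collapsed: one value per column
row_means = np.mean(x, axis=1)  # axis 1 (columns) is collapsed: one value per row

print(col_means.shape, row_means.shape)  # (6,) (5,)
print(col_means)
print(row_means)

# Averaging the column means gives the overall mean again.
print(np.isclose(col_means.mean(), np.mean(x)))  # True
```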
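In addition to mean(), numpy provides std(), sum(), max(), and more. Most of these are also available as array methods, so x.mean() gives the same result as np.mean(x).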
np.std(x)
np.sum(x)
np.max(x)
x.mean()
np.mean(x)
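By default np.std computes the population standard deviation (ddof=0): the square root of the mean squared deviation from the mean. The following example checks that formula by hand. It also shows ddof=1, which divides by N - 1 and gives the sample standard deviation:

```python
import numpy as np

x = np.arange(30).reshape(5, 6)

# Function and method forms agree.
print(x.mean() == np.mean(x), x.max() == np.max(x))  # True True

# Population standard deviation, written out explicitly.
manual_std = np.sqrt(np.mean((x - x.mean()) ** 2))
print(np.isclose(np.std(x), manual_std))  # True

# Sample standard deviation divides the sum of squared deviations by N - 1.
sample_std = np.std(x, ddof=1)
manual_sample = np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1))
print(np.isclose(sample_std, manual_sample))  # True
```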
argmin and argmax¶
The argmin and argmax functions are important in many scientific computing applications. They return the index of the smallest or largest value, respectively. If that value occurs more than once, the index of the first occurrence is returned.
x = np.array([0, 10, -10, 4, 3, 2, 100, 2, 2, -1])
x
min_idx = np.argmin(x)
min_idx
x[min_idx]
np.min(x)
x
max_idx = np.argmax(x)
max_idx
x[max_idx]
x = np.array([0, 100, 0, 4, 3, 2, 100, 2, 2, -1])
x
np.argmax(x)
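Because argmax reports only the first maximum, the following example uses np.flatnonzero to find every position of the maximum value. On a 2D array without an axis argument, argmax returns an index into the flattened array. np.unravel_index converts that index back to (row, column):

```python
import numpy as np

x = np.array([0, 100, 0, 4, 3, 2, 100, 2, 2, -1])
print(np.argmax(x))                  # 1, the first of the two 100s
print(np.flatnonzero(x == x.max()))  # [1 6]

# 2D case: the flat index 3 corresponds to row 1, column 0 in a 2 x 3 array.
m = np.array([[3, 8, 1],
              [9, 2, 7]])
flat = np.argmax(m)
row, col = np.unravel_index(flat, m.shape)
print(flat, row, col)
```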
Because of its speed, numpy makes it possible to use Python for scientific computing.¶
You can read more about numpy at its User Guide and its Reference Guide.
Live coding example: calculate distance between 2D points.¶
import matplotlib.pyplot as plt
a = (10, 20)
b = (13, 24)
plt.plot([a[0]], [a[1]], 'o', label='a')
plt.plot([b[0]], [b[1]], 'o', label='b')
plt.plot([5, 15, 15, 5, 5], [15, 15, 25, 25, 15], 'k-')
plt.legend();
def compute_distance(a,b):
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return np.sqrt(dx*dx + dy*dy)
compute_distance(a,b)
assert compute_distance(a,b)==5.0
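compute_distance handles one pair of points at a time. The following example shows how the same calculation can be done without a Python loop, using np.hypot and np.linalg.norm. The points array is made-up example data, not from the original:

```python
import numpy as np

a = (10, 20)
b = (13, 24)

# np.hypot(dx, dy) computes sqrt(dx**2 + dy**2) element-wise.
print(np.hypot(b[0] - a[0], b[1] - a[1]))  # 5.0

# Hypothetical example data: each row of `points` is one (x, y) pair.
points = np.array([[13, 24], [10, 20], [16, 28]])
diffs = points - np.array(a)           # shape (3, 2): broadcast subtraction of `a`
dists = np.linalg.norm(diffs, axis=1)  # Euclidean length of each row
print(dists)
```

The broadcasting here follows the same rule as the scalar case earlier: `a` is subtracted from every row of `points`.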