exercise-04 review¶
I check your answers based on file name. Please keep the file names exactly as specified, e.g. my_name.py.
Example answers:
print("Paolo")
a = 1001
b = 22
def onethousandandone_times_twentytwo(a,b):
    print(a*b)
vs
# Create a Python script called `my_name.py` which does two things:
# 1) prints your name
print("Paolo")
# 2) computes the value of 1001 * 22 and then prints this
result = 1001*22
print(result)
vs
print ("Paolo")
value1 = 1001*22
print (value1)
The correct answer should look like this:
astraw@computer$ python my_name.py
Paolo
22022
astraw@computer$
# What is wrong with this code?
print(Andrew)
print(1001 * 22)
For loops, iterators, Dictionaries, more operators, files¶
# We run this for use below
import matplotlib.pyplot as plt
Control flow with for using range to produce an iterator¶
for x in range(10):
    print(x)
for x in range(0, 10):
    print(x)
for y in range(0, 1000, 100):
    print(y)
myiter = range(0, 1000, 100)
print('myiter:', myiter)
print(type(myiter))
for y in myiter:
    print(y)
for y in range(10):
    print(y)
for y in range(4,10):
    print(y)
for y in range(4, 10, 2):
    print(y)
Note the symmetry between range() and slices.
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_list[:10]
my_list[4:10]
my_list[4:10:2]
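Because my_list here holds exactly the numbers 0 to 9, each slice gives the same values as the matching range() call. A small check of that symmetry (the built-in slice() object is standard Python but is not used elsewhere in these notes):

```python
# range(start, stop, step) and list[start:stop:step] use start, stop (exclusive)
# and step in the same way.
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

from_range = list(range(4, 10, 2))
from_slice = my_list[4:10:2]
print(from_range)  # [4, 6, 8]
print(from_slice)  # [4, 6, 8]

# Like a range, a slice can be stored in a variable and reused.
every_other = slice(4, 10, 2)
print(my_list[every_other])  # [4, 6, 8]
```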
Control flow with for using a list as an iterator¶
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for y in my_list:
    print(y)
print("end")
my_list = [[5,5,6], [6,6,7]]
for y in my_list:
    print('y:',y)
    for z in y:
        print(z)
print("end")
Iterators¶
We have now seen a couple of examples of iterators.
An iterator is not a type in Python but rather a behavior that some types have: you can iterate over them, which means you can use them as the source of data in a for loop. (Strictly speaking, things you can loop over, such as lists and range objects, are called "iterables"; calling iter() on an iterable gives an iterator that hands out the items one at a time.) The items of an iterator do not all need to be stored in memory at once; they can be produced one at a time.
Iterators can be infinite, or they can end at a certain point.
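To make "one item at a time" concrete, here is a small sketch using the built-in iter() and next() functions, plus itertools.count(), an infinite iterator from the standard library (none of these appear elsewhere in these notes):

```python
import itertools

# A list is an iterable; iter() gives us an iterator over it.
numbers = [10, 20, 30]
it = iter(numbers)
print(next(it))  # 10
print(next(it))  # 20
print(next(it))  # 30
# Calling next(it) once more raises StopIteration: this iterator has ended.

# itertools.count(5, 5) yields 5, 10, 15, ... forever.
# itertools.islice takes only the first 4 values, so we never run out of memory.
first_values = list(itertools.islice(itertools.count(5, 5), 4))
print(first_values)  # [5, 10, 15, 20]
```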
We can create a list from all values in an iterator in a couple of different ways.
The first you should be able to do by yourself already:
my_list = []
for x in range(10):
    my_list.append(x)
my_list
The second approach to creating a list from all values in an iterator relies on the list() function, which is the constructor of a list. This constructor function iterates over the iterator and creates a list with its contents:
my_list = list(range(10))
my_list
my_list = []
x = "my super important data"
# Note that we overwrite x here!
for x in range(2):
    my_list.append(x)
my_list
x
continue and break work in for loops, too.
my_list = []
for x in range(100):
    if x > 5:
        if x < 10:
            continue
    if x >= 20:
        break
    my_list.append(x)
my_list
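Assuming the indentation shown above, continue skips the values 6 to 9 and break leaves the loop at 20. So the result should equal two ranges joined together, which we can check directly:

```python
my_list = []
for x in range(100):
    if x > 5:
        if x < 10:
            continue  # skip 6, 7, 8, 9 and go on with the next x
    if x >= 20:
        break  # leave the loop entirely once x reaches 20
    my_list.append(x)

expected = list(range(0, 6)) + list(range(10, 20))
print(my_list == expected)  # True
```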
Methods¶
Methods are a way of giving a type additional functions that are specific to that type. You already know a few of them, which so far we have used without discussing them much. These include list.append and str.format.
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_list.append(10)
my_list
my_str = "Hello, my name is {}"
my_str
my_str.format("Andrew")
my_str
Later, we will learn how to define our own methods. For now, it's just important that you know a method is like a function. Both can be called with input arguments, they return an output value, and they can have "side effects" -- changes to their inputs or something else.
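The two methods mentioned above show both sides of this: list.append works through a side effect, while str.format returns a new value and leaves its input alone. A short sketch:

```python
# list.append changes the list in place (a side effect) and returns None.
my_list = [1, 2]
result = my_list.append(3)
print(my_list)  # [1, 2, 3]
print(result)   # None

# str.format leaves the original string unchanged and returns a new string.
my_str = "Hello, my name is {}"
greeting = my_str.format("Andrew")
print(my_str)    # Hello, my name is {}
print(greeting)  # Hello, my name is Andrew

# Calling a method on an object is the same as calling the function stored
# on its type, with the object passed as the first argument.
print(str.format(my_str, "Andrew") == my_str.format("Andrew"))  # True
```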
Modules¶
We have also used a number of modules without discussing this aspect much. There are built-in modules -- they come with Python as part of "the standard library" -- and there are modules which have to be installed separately. Matplotlib, for example, is a set of modules (a "library") that we use a lot and that is not part of the Python language itself.
Modules are objects in Python like any other. They can contain functions, which we access with names like module_name.function_name. This is a very minor point, but the . makes a function in a module "look like" a method, even though it is actually a normal function.
Here we import the random module from the standard library.
import random
x = [1,2,3,4,5,'asdf','dalkfj']
random.choice(x)
random.choice(x)
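To see that a module really is just another object, and that random.choice is an ordinary function stored on it, we can inspect it. The sketch below also uses random.seed(), which is not used elsewhere in these notes, to make the "random" choices repeatable:

```python
import random
import types

# A module is an object like any other; its type is the module type.
print(type(random))                          # <class 'module'>
print(isinstance(random, types.ModuleType))  # True

# random.choice is a normal function stored as an attribute of the module.
print(callable(random.choice))               # True

# Seeding the generator with the same value repeats the same choices.
x = [1, 2, 3, 4, 5, 'asdf', 'dalkfj']
random.seed(42)
first = random.choice(x)
random.seed(42)
second = random.choice(x)
print(first == second)                       # True
```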
As mentioned, there are modules which are not part of the Python language itself. In fact there are approximately zillions of libraries for doing many, many different things, and this is one of the reasons Python is so useful and so popular. There can be a positive feedback loop between language popularity and the availability of libraries, and Python has benefitted a lot from this - especially in the data science area.
One place that distributes many Python modules is PyPI, the Python Package Index; another is Anaconda.
As an example, let's return to our previous use of matplotlib. Check, for example, the matplotlib gallery for example plots. Here is a minimal use of matplotlib to draw a simple plot:
# Below, we will use matplotlib, so we need to import it here.
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7,8,9,10]
y=[0,4,0,3,3,0,3,4,5,2]
plt.plot(x,y)
To start with, there are a few simple things you can do to improve your plot:
# matplotlib.pyplot was already imported above as plt.
x=[1,2,3,4,5,6,7,8,9,10]
y1=[0,4,0,3,3,0,3,4,5,2]
plt.plot(x, y1, label="y1")
plt.plot(x, x, label="x")
y2=[3,2,4,4,2,4,4,2,4,2]
plt.plot(x, y2, label="y2")
plt.legend()
plt.xlabel('x (unit 1)')
plt.ylabel('y (unit 2)')
Example: compute the Fibonacci sequence using recursion¶
1, 1, 2, 3, 5, 8, 13
def fib(n):
    """Return the Fibonacci sequence up to position n.
    n is an integer"""
    # Check that our assumptions are true
    assert type(n)==int
    assert n>0

    # special cases for short lists
    if n == 1:
        return [1]
    if n == 2:
        return [1,1]

    seq = fib(n-1)
    a = seq[-2]
    b = seq[-1]
    seq.append( a+b )
    return seq
fib(3)
fib(4)
fib(10)
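The recursive fib(n) calls itself n-1 times, and Python limits how deep recursion can go (the default limit reported by sys.getrecursionlimit() is typically 1000), so a very large n would fail with RecursionError. For comparison, here is a sketch of an iterative version; the name fib_iterative is our own, not part of the original exercise:

```python
def fib_iterative(n):
    """Return the first n Fibonacci numbers as a list, without recursion."""
    assert type(n) == int
    assert n > 0
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-2] + seq[-1])  # next number is the sum of the last two
    return seq[:n]  # the slice handles n == 1, where only [1] is wanted

print(fib_iterative(1))   # [1]
print(fib_iterative(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```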
len("my string")
" my string ".strip()
len(" my string ")
len(" my string ".strip())
a=" my string "
a.strip()
a
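Notice that a.strip() returned a new string but a itself kept its spaces: strings are immutable, so string methods always return new strings. A common pattern combines strip() with the split() method to clean up comma-separated text (the input line here is made up for illustration):

```python
a = " my string "
stripped = a.strip()
print(repr(a))         # ' my string '  (unchanged)
print(repr(stripped))  # 'my string'

# Split on commas, then strip the spaces around each piece.
line = " red , green,blue "
parts = []
for part in line.split(","):
    parts.append(part.strip())
print(parts)  # ['red', 'green', 'blue']
```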
"a,b,c,def".split(",")
"hello world".startswith("hello")
"hello world".endswith("world")
Dictionaries - Python's dict type¶
A dict is constructed with either {} or dict().
x = {'key1': 'value1',
'key2': 'value2',
'key3': 'value3',
}
x
x['key1']
key = "key3"
x[key]
x[key1]
x = dict( (('key1', 'value1'), ['key2', 'value2'], ('key3', 'value3')) )
x
type(x)
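Keys in a dict can be any value that is hashable. Numbers, strings, and tuples of hashable values are hashable; lists are not, so they cannot be used as keys.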
x={1:'value1', 2:'value2'}
x[1]
x={(1,2,3): "456"}
x
x[(1,2,3)]
x={[1,2,3]: "456"}
x
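The error above comes from Python trying to hash the list key. We can call the built-in hash() directly to see which values are allowed, and convert a list to a tuple when we need it as a key:

```python
# Tuples of hashable items can be hashed; lists cannot, because they can change.
print(hash((1, 2, 3)) == hash((1, 2, 3)))  # True

try:
    hash([1, 2, 3])
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'list'

# A common workaround: convert the list to a tuple before using it as a key.
key = tuple([1, 2, 3])
x = {key: "456"}
print(x[(1, 2, 3)])  # 456
```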
x = {'key1':1, 'key2':2, 'key3':123456, 'key4': [1,2,3], 'key5': {}, 1234: 4321, (1,2,3): '9845712345'}
x
Just like we can iterate over items in a list, we can iterate over the keys in a dict:
for key in x:
    print(key)
for key in x:
    value = x[key]
    print(f"key: {key}, value: {value}")
x['key5']
x['key does not exist']
x['my new key'] = 9843059
x
x['key5']['hello'] = 'world'
x
tmp = x['key5']
tmp['hello'] = 'world 2'
x
x['key4'].append( 4 )
x
'key1' in x
1 in x
1234 in x
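Note that in checks keys only: 1 in x above is False even though 1 is one of the values. Two dict methods not otherwise covered here help with lookups and loops: dict.get returns a default instead of raising KeyError, and dict.items yields (key, value) pairs. A sketch with a smaller dict:

```python
x = {'key1': 1, 'key2': 2, 1234: 4321}

# `in` checks keys, never values.
print(1 in x)           # False: 1 is a value here, not a key
print(1 in x.values())  # True

# dict.get returns None (or a default we choose) for a missing key.
print(x.get('key does not exist'))     # None
print(x.get('key does not exist', 0))  # 0

# dict.items() yields (key, value) tuples, which we can unpack in the loop.
for key, value in x.items():
    print(f"key: {key}, value: {value}")
```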
More about functions: keyword arguments¶
def my_function(x, z=1):
    return x+z*z
my_function(9)
my_function(9,11)
my_function(9,z=11)
my_function(x=9,z=11)
my_function(z=11)
my_function(z=11,x=9)
my_function(z=11,9)
def my_function2(x, y, z=1, qq=0):
    return x+z+qq+y
my_function2(0,1)
my_function2(0,1,qq=-32)
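Working out these calls by hand shows how defaults and keyword arguments combine. Keyword arguments may appear in any order, but only after all positional arguments (which is why my_function(z=11,9) above is a syntax error):

```python
def my_function2(x, y, z=1, qq=0):
    return x + z + qq + y

# x=0, y=1; z and qq keep their defaults 1 and 0.
print(my_function2(0, 1))              # 0 + 1 + 0 + 1 = 2
# Only qq is overridden.
print(my_function2(0, 1, qq=-32))      # 0 + 1 - 32 + 1 = -30
# Keyword arguments in any order, after the positional ones.
print(my_function2(0, 1, qq=-32, z=5)) # 0 + 5 - 32 + 1 = -26
```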
The + operator on various data types¶
1+1
1 + 2.3
"1"+1
1+"1"
"1"+"1"
"1 x" + "1 y"
[1]+1
[1] + [1]
x=[1]
y=[1]
z=x+y
z
list.__add__([1], [1])
x=[1]
x.append(1)
x
int.__add__(1, 3)
1 + 3
Note: "joining" or "combining" one sequence to another is called concatenating the sequences. It works with lists of any length:
[1,2,3] + [4,5,6]
[1,2,3] + []
[] + [1,2,3]
(1,) + (1,)
(1,) + 1
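Concatenation with + always builds a new sequence and leaves both operands untouched, whereas append modifies the existing list. A short check:

```python
x = [1]
y = [1]
z = x + y      # builds a brand new list
print(z)       # [1, 1]
print(x, y)    # [1] [1]  (the originals are unchanged)
print(z is x)  # False

# append, by contrast, modifies the existing list in place.
x.append(1)
print(x)       # [1, 1]

# Tuples can only be concatenated with other tuples.
print((1,) + (1,))  # (1, 1)
```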
The * operator on various data types¶
1*5
"1xz"*5
"1xz"*"blah"
[1,2,3]*5
5 * [1,2,3]
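One thing to watch out for: repeating a list with * copies references to its items, not the items themselves. For numbers this does not matter, but for nested lists all "copies" are the same inner list:

```python
# Both rows are the very same inner list object.
shared = [[0] * 3] * 2
shared[0][0] = 99
print(shared)  # [[99, 0, 0], [99, 0, 0]]  (both rows changed)

# Building each row separately gives independent inner lists.
grid = []
for _ in range(2):
    grid.append([0] * 3)
grid[0][0] = 99
print(grid)    # [[99, 0, 0], [0, 0, 0]]
```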
Special method: object.__add__(other)¶
Many of the bits of Python we have already been using are defined as "special methods". The names of these methods start and end with a double underscore, __. They are not usually called directly; rather, Python calls these methods "secretly" to achieve some task. As we saw above, addition is implemented with the __add__ special method:
six = 6
six.__add__(4)
int.__add__(6,4)
six+4
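When __add__ does not know how to handle the other operand, it returns the special value NotImplemented, and Python then tries the right operand's __radd__ method. This is why 6 + 4.0 works while 1 + "1" raises TypeError. A sketch:

```python
# int.__add__ only handles ints; for anything else it returns NotImplemented.
print((6).__add__(4))    # 10
print((6).__add__(4.0))  # NotImplemented
print((1).__add__("1"))  # NotImplemented

# Python then tries the right operand's __radd__.
# float knows how to add an int, so 6 + 4.0 still works:
print((4.0).__radd__(6))  # 10.0
print(6 + 4.0)            # 10.0

# str provides no __radd__ that accepts an int, so this fails.
try:
    1 + "1"
except TypeError as err:
    print("TypeError:", err)
```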
Special method: object.__getitem__(index)¶
The special method object.__getitem__(index) is how Python implements object[index].
x={0:1}
x
x[0]
x.__getitem__(0)
x={1:"value1",2:43}
x[1]
x.__getitem__(1)
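The same method handles both indexing and slicing of a list: for a slice, __getitem__ simply receives a slice object instead of an integer. For a dict, it looks up a key instead of a position:

```python
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Indexing calls list.__getitem__ with an integer ...
print(my_list[3] == my_list.__getitem__(3))  # True
# ... and slicing calls it with a slice object.
print(my_list[4:10:2] == my_list.__getitem__(slice(4, 10, 2)))  # True

# For a dict, __getitem__ looks up a key rather than a position.
x = {1: "value1", 2: 43}
print(x.__getitem__(2))  # 43
```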
Special method: sequence.__len__()¶
Another special method is __len__, which returns the length of a sequence or other collection. For a dict such as x below, this is the number of keys.
x
len(x)
x.__len__()
Special methods: object.__str__() (and object.__repr__())¶
Another special method is __str__, which returns a string representation of the object. (__repr__ does something very similar, but its output can often be used to "reproduce" the original object, so it is a little more exact, if less "nice" or "pretty".)
str(0.4)
x = 0.4
x.__str__()
x={1:"value1",2:43}
x.__str__()
print(x)
print("hello")
"hello"
f"my value is: {x}"
one = 1
one.__str__()
f"my value is: {1}"
repr(1/9)
x.__repr__()
"hello".__repr__()
"hello".__str__()
print("hello".__str__())
print("hello".__repr__())
print.__str__()
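The "reproduce" property of repr can be checked with the built-in eval(), which evaluates a string as Python code. This holds for many built-in types such as floats and strings, but not for every object (print's repr above, for instance, cannot be evaluated back):

```python
s = "hello"
print(str(s))   # hello
print(repr(s))  # 'hello'  (quotes included, like Python source code)

# Evaluating the repr gives back an equal object for these types.
value = 1 / 9
print(eval(repr(value)) == value)  # True
print(eval(repr(s)) == s)          # True

# str() and repr() simply call the special methods.
print(str(s) == s.__str__(), repr(s) == s.__repr__())  # True True
```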
Abstract interfaces in python¶
for loops iterate over "iterables". You can construct a list from any iterable (and a dict from an iterable of key-value pairs).
Functions and methods are "callable".
Getting items with square brackets (e.g. x[0]) works by calling the __getitem__ method (so, x.__getitem__(0)). Any type can define how this works for that type.
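As an illustration of "any type can define how this works", here is a sketch of a small class that defines __len__ and __getitem__. Defining classes has not been covered yet, and the class name Squares is purely hypothetical. Because its __getitem__ accepts 0, 1, 2, ... and raises IndexError at the end, Python can even loop over it and build a list from it:

```python
class Squares:
    """Hypothetical sequence-like type: item i is i * i, for 0 <= i < n."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Called by len(obj)
        return self.n

    def __getitem__(self, index):
        # Called by obj[index] (integer indexes only in this sketch).
        # Raising IndexError tells Python the sequence has ended.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index

sq = Squares(5)
print(len(sq))   # 5
print(sq[3])     # 9
print(list(sq))  # [0, 1, 4, 9, 16]
```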
More on iterators¶
There are a couple of very handy functions which take iterables and return a new iterator:
- enumerate(items): returns an iterator over the items together with their indexes. Each iteration produces a tuple (index, item).
- zip(a_items, b_items): returns an iterator combining two other iterables. Each iteration produces a tuple (a_item, b_item).
my_list = ['abc', 'def', 'ghi']
my_iterator = enumerate(my_list)
for x in my_iterator:
    idx, item = x
    print(f"{idx}: {item}")
Usually, the temporary iterator would be implicit:
my_list = ['abc', 'def', 'ghi']
for x in enumerate(my_list):
    idx, item = x
    print(f"{idx}: {item}")
We can unpack the tuple directly into two loop variables, eliminating another temporary variable:
my_list = ['abc', 'def', 'ghi']
for idx, item in enumerate(my_list):
    print(f"{idx}: {item}")
Now, for zip:
my_list = ['abc', 'def', 'ghi']
list2 = ['red', 'green', 'blue']
my_iterator = zip(my_list, list2)
for x in my_iterator:
    (item, color) = x
    print(f"{item} {color}")
my_list = ['abc', 'def', 'ghi']
for (item, color) in zip(my_list, ['red', 'green', 'blue']):
    print(f"{item} {color}")
my_list = ['abc', 'def', 'ghi']
for item, number in zip(my_list, range(3,6)):
    print(f"{item} {number}")
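Two details worth knowing: enumerate accepts an optional start value for the index, and zip stops as soon as the shortest of its inputs runs out (so unequal lengths silently drop items):

```python
my_list = ['abc', 'def', 'ghi']

# Count from 1 instead of 0.
for idx, item in enumerate(my_list, start=1):
    print(f"{idx}: {item}")  # 1: abc, then 2: def, then 3: ghi

# Only two colors, so only two pairs come out.
pairs = list(zip(my_list, ['red', 'green']))
print(pairs)  # [('abc', 'red'), ('def', 'green')]
```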
Data Frames¶
We are going to look at data in tables where each row contains the measurements or values for a single thing and each column holds one type of measurement. Such tables are very common in data science.
Here is an example of the data we will be looking at. It is a subsampling of the very famous Iris data set.
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 81 | 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 97 | 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 37 | 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 31 | 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 28 | 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 141 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
For now, the data are given as a dict. This dict is organized in a special way: each key is a column name, and each value is a list holding that column's entry for every row. Later we will read this from a file.
iris_dataset = {'sepal length (cm)': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8,
4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1,
4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0,
5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6,
5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2,
5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1,
6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0,
5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7,
5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7,
7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7,
7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7,
6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9],
'sepal width (cm)': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0], 'petal length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1], 'petal width (cm)': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 
1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8], 'species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 
'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica']}
plt.plot(iris_dataset['sepal width (cm)'], iris_dataset['petal width (cm)'],'o');
plt.xlabel('sepal width (cm)')
plt.ylabel('petal width (cm)')
for column_name in iris_dataset:
    print(column_name)
Let's double check that every column (the value of each key) has the same number of rows.
for column_name in iris_dataset:
    column = iris_dataset[column_name]
    print("'{}': {} rows".format(column_name, len(column)))
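Instead of reading the printed counts, we could let Python do the check with assert. The sketch below uses a tiny made-up table in the same column-oriented layout as iris_dataset; the function name check_same_length is hypothetical:

```python
# Tiny hypothetical table with the same layout as iris_dataset.
table = {
    'sepal width (cm)': [3.5, 3.0, 3.2],
    'petal width (cm)': [0.2, 0.2, 0.2],
    'species': ['setosa', 'setosa', 'setosa'],
}

def check_same_length(dataset):
    """Raise AssertionError unless every column has the same number of rows."""
    lengths = []
    for column_name in dataset:
        lengths.append(len(dataset[column_name]))
    # A set keeps only distinct values: exactly one distinct length means all agree.
    assert len(set(lengths)) == 1, f"column lengths differ: {lengths}"
    return lengths[0]

print(check_same_length(table))  # 3
```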
Now let's compute the average value for each measurement across all of our data.
def compute_average(my_list):
    assert type(my_list)==list
    accum = 0.0
    for item in my_list:
        accum = accum + item
    average = accum / len(my_list)
    return average
compute_average([4, 6])
for column_name in iris_dataset:
    if column_name == 'species':
        continue
    average = compute_average(iris_dataset[column_name])
    print("'{}' average: {}".format(column_name, average))
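Writing compute_average ourselves is good practice, but Python already provides the pieces: the built-in sum() replaces the accumulation loop, and the standard-library statistics module has a mean() function. For comparison:

```python
import statistics

values = [4, 6]
# Built-in sum() does the accumulation for us.
print(sum(values) / len(values))      # 5.0
# statistics.mean computes the same average.
print(statistics.mean(values) == 5.0) # True
```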
Let's see what species we have in our data.
known_species = {}
count = 0
for row_species in iris_dataset['species']:
    known_species[row_species] = None
    count = count + 1
print(count)
for species in known_species:
    print(species)
known_species
known_species = {}
for row_species in iris_dataset['species']:
    if row_species in known_species:
        known_species[row_species] += 1
    else:
        known_species[row_species] = 1
print(known_species)
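The if/else in this counting loop can be shortened with dict.get and a default of 0, and the standard library's collections.Counter does the whole job in one call. A sketch on a short made-up species list:

```python
from collections import Counter

species_column = ['setosa', 'setosa', 'versicolor', 'virginica', 'virginica', 'virginica']

# Same counting loop as above, using dict.get with a default of 0.
counts = {}
for row_species in species_column:
    counts[row_species] = counts.get(row_species, 0) + 1
print(counts)  # {'setosa': 2, 'versicolor': 1, 'virginica': 3}

# collections.Counter counts the items for us.
print(dict(Counter(species_column)) == counts)  # True
```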
Now, we will want to calculate values for each species, not across all measurements. This is going to be a little tricky, because we first need to determine which rows belong to which species. As our first step, we will figure this out.
rows_for_species = {'setosa':[], 'versicolor':[], 'virginica':[]}
for species_name in rows_for_species:
    # print(species_name)
    row_index = 0
    for row_species in iris_dataset['species']:
        # print(row_index, row_species)
        if row_species == species_name:
            rows_for_species[species_name].append(row_index)
        row_index = row_index + 1
rows_for_species
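The manual row_index counter is exactly what enumerate provides. Here is a sketch that uses it and also avoids hard-coding the species names, building everything in a single pass over a short made-up species column:

```python
species_column = ['setosa', 'versicolor', 'setosa', 'virginica']  # hypothetical data

rows_for_species = {}
for row_index, row_species in enumerate(species_column):
    # Create an empty list the first time we see a species.
    if row_species not in rows_for_species:
        rows_for_species[row_species] = []
    rows_for_species[row_species].append(row_index)
print(rows_for_species)  # {'setosa': [0, 2], 'versicolor': [1], 'virginica': [3]}
```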
Let's check whether this worked by building, for each species, a list of that species' values in each column.
for species_name in rows_for_species:
    # get a list of row numbers for `species_name`
    species_indexes = rows_for_species[species_name]
    # iterate over columns in dataset
    for column_name in iris_dataset:
        # get all data for this column (get all data for this measurement type, e.g. sepal width)
        all_rows_for_this_column = iris_dataset[column_name]
        # accumulate measurements in a list **for this species**
        this_species_values = []
        for species_index in species_indexes:
            # take only the rows corresponding to this species
            row_value = all_rows_for_this_column[species_index]
            this_species_values.append(row_value)
        print(f"{species_name} -> {column_name}: {this_species_values}")
    print()
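With the per-species lists in hand, the per-species averages we set out to compute follow directly. The sketch below runs the whole pipeline on a miniature made-up dataset in the same layout as iris_dataset and stores results in a dict keyed by (species, column); that layout and the name averages are our own choices, not from the original:

```python
# Hypothetical miniature data set in the same layout as iris_dataset.
dataset = {
    'petal width (cm)': [0.2, 0.4, 1.3, 1.5, 2.0],
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica'],
}

# Step 1: group row indexes by species.
rows_for_species = {}
for row_index, row_species in enumerate(dataset['species']):
    if row_species not in rows_for_species:
        rows_for_species[row_species] = []
    rows_for_species[row_species].append(row_index)

# Step 2: average each numeric column separately for each species.
averages = {}
for species_name, species_indexes in rows_for_species.items():
    for column_name in dataset:
        if column_name == 'species':
            continue  # not a numeric column
        column = dataset[column_name]
        values = []
        for i in species_indexes:
            values.append(column[i])
        averages[(species_name, column_name)] = sum(values) / len(values)
        print(f"{species_name} -> {column_name}: {averages[(species_name, column_name)]}")
```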