pm21-dragon/lectures/lecture-04/1 - For-loops-dicts-files.ipynb
2024-11-08 08:55:51 +01:00

<html> <head> </head>

exercise-04 review

  • I check your answers based on the file name. Please keep the file names exactly as specified, i.e. my_name.py.

  • Example answers:

print("Paolo")


a = 1001
b = 22
def onethousandandone_times_twentytwo(a,b):
    print(a*b)

vs

# Create a Python script called `my_name.py` which does two things:

# 1) prints your name

print("Paolo")

# 2) computes the value of 1001 * 22 and then prints this

result = 1001*22
print(result)

vs

print ("Paolo")
value1 = 1001*22
print (value1)

Correct answer should look like this:

astraw@computer$ python my_name.py
Paolo
22022
astraw@computer$
In [ ]:
# What is wrong with this code?
print(Andrew)
print(1001 * 22)

For loops, iterators, Dictionaries, more operators, files

In [ ]:
# We run this for use below
import matplotlib.pyplot as plt

Control flow with for using range to produce an iterator

In [ ]:
for x in range(10):
    print(x)
In [ ]:
for x in range(0, 10):
    print(x)
In [ ]:
for y in range(0, 1000, 100):
    print(y)
In [ ]:
myiter = range(0, 1000, 100)
print('myiter:', myiter)
print(type(myiter))
for y in myiter:
    print(y)
In [ ]:
for y in range(10):
    print(y)
In [ ]:
for y in range(4,10):
    print(y)
In [ ]:
for y in range(4, 10, 2):
    print(y)

Note the symmetry between range() and slices: both take start, stop, and step values.

In [ ]:
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [ ]:
my_list[:10]
In [ ]:
my_list[4:10]
In [ ]:
my_list[4:10:2]
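
The symmetry can be checked directly. In a list whose items equal their own indexes, slicing with start:stop:step selects the same values that range(start, stop, step) produces. A minimal sketch (the names from_range and from_slice are only for illustration):

```python
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# range(start, stop, step) and my_list[start:stop:step] use the same three numbers.
from_range = list(range(4, 10, 2))
from_slice = my_list[4:10:2]

print(from_range)
print(from_slice)
print(from_range == from_slice)
```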

Control flow with for using a list as an iterator

In [ ]:
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for y in my_list:
    print(y)
print("end")
In [ ]:
my_list = [[5,5,6], [6,6,7]]

for y in my_list:
    print('y:',y)
    for z in y:
        print(z)
print("end")

iterators

We have now seen a couple of examples of iterators.

An iterator is not a single type in Python but rather a behavior that some types have: you can iterate over them, which means you can use them as the source of data in a for loop. The items do not all need to be stored in memory at once; they can be constructed one at a time.

Iterators could run infinitely or they can end at a certain point.
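
As a sketch of an iterator that never ends on its own, the example below uses itertools.count from the standard library. This module is not otherwise used in this lecture, so treat it as an illustration only. Because the iterator is infinite, the loop has to be stopped with break.

```python
import itertools

# itertools.count(0) yields 0, 1, 2, ... and never ends on its own.
squares_seen = []
for n in itertools.count(0):
    if n >= 5:
        break  # we must stop the loop ourselves
    squares_seen.append(n * n)

print(squares_seen)
```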

We can create a list from all values in an iterator in a couple of different ways.

The first you should be able to do by yourself already:

In [ ]:
my_list = []
for x in range(10):
    my_list.append(x)
my_list

The second approach of creating a list from all values in an iterator relies on the list() function, which is the constructor of a list. This constructor function will iterate over the iterator and create a list with its contents:

In [ ]:
my_list = list(range(10))
my_list
In [ ]:
my_list = []
x = "my super important data"
# Note that we overwrite x here!
for x in range(2):
    my_list.append(x)
my_list
In [ ]:
x

continue and break work in for loops, too.

In [ ]:
my_list = []
for x in range(100):
    if x > 5:
        if x < 10:
            continue
    if x >= 20:
        break
    my_list.append(x)
my_list
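
The nested if in the cell above can also be written as one chained comparison. The sketch below should build the same list: continue skips the rest of the loop body for the current x, while break leaves the loop completely.

```python
my_list = []
for x in range(100):
    if 5 < x < 10:
        continue  # skip 6, 7, 8, 9 and go on to the next x
    if x >= 20:
        break     # leave the loop entirely
    my_list.append(x)

print(my_list)
```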

Methods

Methods are functions that belong to a specific type. You already know a few of them, which we have used so far without much discussion, such as list.append and str.format.

In [ ]:
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_list.append(10)
my_list
In [ ]:
my_str = "Hello, my name is {}"
In [ ]:
my_str
In [ ]:
my_str.format("Andrew")
In [ ]:
my_str

Later, we will learn how to define our own methods. For now, it's just important that you know a method is like a function. Both can be called with input arguments, they return an output value, and they can have "side effects" -- changes to their inputs or something else.
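
To make "return value" versus "side effect" concrete, here is a small sketch using the two methods mentioned above (the variable names are only for illustration). list.append changes the list it is called on and returns None, whereas str.format leaves the original string alone and returns a new one.

```python
numbers = [1, 2, 3]
returned = numbers.append(4)   # side effect: numbers itself is changed
print(numbers)                 # the list now has four items
print(returned)                # append returns None

template = "Hello, my name is {}"
greeting = template.format("Andrew")  # no side effect: a new string is returned
print(template)                # unchanged
print(greeting)
```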

Modules

We have also used a number of modules without much discussion. There are built-in modules, which come with Python as part of "the standard library", and there are modules which have to be installed separately. Matplotlib, for example, is a set of modules (a "library") which we use a lot and which is not part of the Python language itself.

Modules are objects in Python like any other value, and they have their own data type. They can contain functions, which are called as module_name.function_name. A minor point: the . makes a function in a module look like a method call, but it is a normal function.

Here we import the random module from the standard library.

In [ ]:
import random
In [ ]:
x = [1,2,3,4,5,'asdf','dalkfj']
random.choice(x)
In [ ]:
random.choice(x)
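
The difference between a function in a module and a method can be sketched as follows. The math module is used here only as another standard-library example. math.sqrt is a normal function that is looked up inside the module, while split is a method that is looked up on a string value.

```python
import math

print(type(math))        # a module is an object with its own type
print(math.sqrt(16.0))   # a normal function, looked up inside the module

# A method, by contrast, is looked up on a value of some type.
words = "a,b,c".split(",")
print(words)
```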

As mentioned, there are modules which are not part of the Python language itself. In fact there are approximately zillions of libraries for doing many, many different things, and this is one of the reasons Python is so useful and so popular. There can be a positive feedback loop between language popularity and the availability of libraries, and Python has benefitted a lot from this - especially in the data science area.

Two places that distribute many Python modules are PyPI (the Python Package Index) and Anaconda.

As an example, let's return to our previous use of matplotlib. The matplotlib gallery has many example plots. Here is a simple use of matplotlib to draw a basic plot:

In [ ]:
# Below, we will use matplotlib, so we need to import it here.
import matplotlib.pyplot as plt

x=[1,2,3,4,5,6,7,8,9,10]
y=[0,4,0,3,3,0,3,4,5,2]

plt.plot(x,y)

To start with, there are a few simple things you can do to improve your plot:

In [ ]:
# matplotlib.pyplot was already imported above as plt, so we can use it directly.

x=[1,2,3,4,5,6,7,8,9,10]
y1=[0,4,0,3,3,0,3,4,5,2]
plt.plot(x, y1, label="y1")
plt.plot(x, x, label="x")
y2=[3,2,4,4,2,4,4,2,4,2]
plt.plot(x, y2, label="y2")
plt.legend()
plt.xlabel('x (unit 1)')
plt.ylabel('y (unit 2)')

Example: compute the Fibonacci sequence using recursion

1, 1, 2, 3, 5, 8, 13

In [ ]:
def fib(n):
    """Return the Fibonacci sequence up to position n.
    
    n is an integer"""
    # Check that our assumptions are true
    assert type(n)==int
    assert n>0
    
    # special cases for short lists
    if n == 1:
        return [1]
    if n == 2:
        return [1,1]
    
    seq = fib(n-1)
    a = seq[-2]
    b = seq[-1]
    seq.append( a+b )
    return seq

fib(3)
In [ ]:
fib(4)
In [ ]:
fib(10)
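
For comparison, the same sequence can be built with a for loop instead of recursion. This is a different technique from the recursive fib above, shown only as a sketch; fib_loop is a hypothetical name.

```python
def fib_loop(n):
    """Return the first n Fibonacci numbers using a for loop (hypothetical helper)."""
    assert type(n) == int
    assert n > 0
    seq = [1, 1]
    for _ in range(n - 2):
        seq.append(seq[-2] + seq[-1])
    return seq[:n]  # the slice handles the special case n == 1

print(fib_loop(10))
```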

More strings

str

Useful function for strings:

  • len

Useful methods:

  • strip
  • split
  • startswith
  • endswith
In [ ]:
len("my string")
In [ ]:
"   my string    ".strip()
In [ ]:
len("   my string    ")
In [ ]:
len("   my string    ".strip())
In [ ]:
a="   my string    "
a.strip()
In [ ]:
a
In [ ]:
"a,b,c,def".split(",")
In [ ]:
"hello world".startswith("hello")
In [ ]:
"hello world".endswith("world")

Dictionaries - Python's dict type

A dict is constructed with either {} or dict().

In [ ]:
x = {'key1': 'value1',
     'key2': 'value2',
    'key3': 'value3',
    }
In [ ]:
x
In [ ]:
x['key1']
In [ ]:
key = "key3"
In [ ]:
x[key]
In [ ]:
x[key1]
In [ ]:
x = dict(   (('key1', 'value1'), ['key2', 'value2'], ('key3', 'value3'))    )
In [ ]:
x
In [ ]:
type(x)

Keys in a dict can be any value that is hashable.

In [ ]:
x={1:'value1', 2:'value2'}
In [ ]:
x[1]
In [ ]:
x={(1,2,3): "456"}
x
In [ ]:
x[(1,2,3)]
In [ ]:
x={[1,2,3]: "456"}
x
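
The cell above fails because a list is not hashable. A quick way to see whether a value could serve as a key is to call the built-in hash() on it, as in this sketch. The try/except is only there so that the example runs to the end.

```python
print(hash((1, 2, 3)))   # tuples of hashable items are hashable

try:
    hash([1, 2, 3])       # lists are mutable and not hashable
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```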
In [ ]:
x = {'key1':1, 'key2':2, 'key3':123456, 'key4': [1,2,3], 'key5': {}, 1234: 4321, (1,2,3): '9845712345'}
In [ ]:
x

Just like we can iterate over items in a list, we can iterate over the keys in a dict:

In [ ]:
for key in x:
    print(key)
In [ ]:
for key in x:
    value = x[key]
    print(f"key: {key}, value: {value}")
In [ ]:
x['key5']
In [ ]:
x['key does not exist']
In [ ]:
x['my new key'] = 9843059
In [ ]:
x
In [ ]:
x['key5']['hello'] = 'world'
In [ ]:
x
In [ ]:
tmp = x['key5']
tmp['hello'] = 'world 2'
In [ ]:
x
In [ ]:
x['key4'].append( 4 )
In [ ]:
x
In [ ]:
'key1' in x
In [ ]:
1 in x
In [ ]:
1234 in x
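
Looking up a key that does not exist raises a KeyError, as seen above. Two common ways to avoid this are sketched below: test with in first, or use the dict.get method, which returns a default value instead of raising. The small dict and the variable names are only for illustration.

```python
x = {'key1': 1, 1234: 4321}

# Option 1: test membership first.
if 'key does not exist' in x:
    value = x['key does not exist']
else:
    value = None

# Option 2: dict.get returns a default (None unless given) instead of raising KeyError.
fallback = x.get('key does not exist', 'no such key')

print(value)
print(fallback)
print(x.get(1234))
```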

More about functions: keyword arguments

In [ ]:
def my_function(x, z=1):
    return x+z*z
In [ ]:
my_function(9)
In [ ]:
my_function(9,11)
In [ ]:
my_function(9,z=11)
In [ ]:
my_function(x=9,z=11)
In [ ]:
my_function(z=11)
In [ ]:
my_function(z=11,x=9)
In [ ]:
my_function(z=11,9)
In [ ]:
def my_function2(x, y, z=1, qq=0):
    return x+z+qq+y
In [ ]:
my_function2(0,1)
In [ ]:
my_function2(0,1,qq=-32)
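
The calls above follow one rule: positional arguments come first and are matched from left to right, keyword arguments follow and may appear in any order, and parameters with a default value may be left out. A short sketch that repeats my_function2 so it can be run on its own:

```python
def my_function2(x, y, z=1, qq=0):
    return x + z + qq + y

a = my_function2(0, 1)                 # z and qq take their defaults
b = my_function2(0, 1, qq=-32)         # skip z, set qq by name
c = my_function2(y=1, x=0, qq=-32)     # keywords may come in any order
print(a, b, c)
```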

The + operator on various data types

In [ ]:
1+1
In [ ]:
1 + 2.3
In [ ]:
"1"+1
In [ ]:
1+"1"
In [ ]:
"1"+"1"
In [ ]:
"1   x" + "1   y"
In [ ]:
[1]+1
In [ ]:
[1] + [1]
In [ ]:
x=[1]
y=[1]
z=x+y
z
In [ ]:
list.__add__([1], [1])
In [ ]:
x=[1]
x.append(1)
x
In [ ]:
int.__add__(1, 3)
In [ ]:
1 + 3

Note: "joining" or "combining" one sequence to another is called concatenating the sequences. It works with lists of any length:

In [ ]:
[1,2,3] + [4,5,6]
In [ ]:
[1,2,3] + []
In [ ]:
[] + [1,2,3]
In [ ]:
(1,) + (1,)
In [ ]:
(1,) + 1
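
Concatenation with + is easy to confuse with list.append. As a sketch of the difference: + builds a new list and leaves both inputs unchanged, while append modifies the list in place and adds its argument as a single item.

```python
x = [1, 2, 3]
y = [4, 5, 6]

z = x + y        # concatenation builds a new list
print(z)
print(x)         # x is unchanged

x.append(y)      # append adds y as ONE item, in place
print(x)
```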

The * operator on various data types

In [ ]:
1*5
In [ ]:
"1xz"*5
In [ ]:
"1xz"*"blah"
In [ ]:
[1,2,3]*5
In [ ]:
5 * [1,2,3]

Special method: object.__add__(other)

Many of the bits of Python we have already been using are defined as "special methods". The names of these methods start and end with a double underscore __. They are not usually called directly, but rather Python calls these methods "secretly" to achieve some task. As we saw above, the "add" special method is implemented with __add__:

In [ ]:
six = 6
six.__add__(4)
In [ ]:
int.__add__(6,4)
In [ ]:
six+4

Special method: object.__getitem__(index)

The special method object.__getitem__(index) is how Python implements object[index].

In [ ]:
x={0:1}
In [ ]:
x
In [ ]:
x[0]
In [ ]:
x.__getitem__(0)
In [ ]:
x={1:"value1",2:43}
In [ ]:
x[1]
In [ ]:
x.__getitem__(1)

Special method: sequence.__len__()

Another special method is __len__, which returns the length of a sequence or other container, such as the dict used here.

In [ ]:
x
In [ ]:
len(x)
In [ ]:
x.__len__()

Special methods: object.__str__() (and object.__repr__())

Another special method is __str__, which returns a string representation of the object. (__repr__ does something very similar but can often be used to "reproduce" the original thing and is hence a little more exact if less "nice" or "pretty".)

In [ ]:
str(0.4)
In [ ]:
x = 0.4
x.__str__()
In [ ]:
x={1:"value1",2:43}
x.__str__()
In [ ]:
print(x)
In [ ]:
print("hello")
In [ ]:
"hello"
In [ ]:
f"my value is: {x}"
In [ ]:
one = 1
one.__str__()
In [ ]:
f"my value is: {1}"
In [ ]:
repr(1/9)
In [ ]:
x.__repr__()
In [ ]:
"hello".__repr__()
In [ ]:
"hello".__str__()
In [ ]:
print("hello".__str__())
In [ ]:
print("hello".__repr__())
In [ ]:
print.__str__()
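
The difference between the two is easiest to see with a string, where repr adds the quotes you would need to type it back in. For floats, converting the repr back with float() gives an equal value. A small sketch using the built-in str() and repr() functions, which call __str__ and __repr__:

```python
text = "hello"
print(str(text))    # hello
print(repr(text))   # 'hello'  (with quotes)

# For floats, the repr is exact enough to rebuild the same value.
value = 1 / 9
rebuilt = float(repr(value))
print(rebuilt == value)
```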

Abstract interfaces in python

for loops iterate over "iterables". You can construct a list (or a dict) from iterables.

Functions and methods are "callable".

Getting items with square brackets (e.g. x[0]) works by calling the __getitem__ method (so, x.__getitem__(0)). Any type can define how this works for that type.
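
As a look ahead, here is a sketch of a type that defines these special methods itself. Defining classes has not been covered yet, and Squares is a hypothetical example, so the point is only that len(), square brackets, and for loops work on it because the matching special methods exist.

```python
class Squares:
    """Hypothetical container: item i is i*i, for i in 0..n-1."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if index < 0 or index >= self.n:
            raise IndexError("index out of range")
        return index * index

sq = Squares(4)
print(len(sq))     # calls sq.__len__()
print(sq[3])       # calls sq.__getitem__(3)
print(list(sq))    # iteration falls back to __getitem__ with 0, 1, 2, ...
```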

More on iterators

There are a couple of very handy functions which take an iterable and return a new iterator:

  • enumerate(items) - returns an iterator over the items together with their index. Each iteration produces a tuple (index, item).
  • zip(a_items, b_items) - returns an iterator that combines two other iterables. Each iteration produces a tuple (a_item, b_item).
In [ ]:
my_list = ['abc', 'def', 'ghi']
my_iterator = enumerate(my_list)
for x in my_iterator:
    idx, item = x
    print(f"{idx}: {item}")

Usually, the temporary iterator would be implicit:

In [ ]:
my_list = ['abc', 'def', 'ghi']
for x in enumerate(my_list):
    idx, item = x
    print(f"{idx}: {item}")

We can unpack the tuple into two variables directly in the for statement, which removes the temporary variable x:

In [ ]:
my_list = ['abc', 'def', 'ghi']
for idx, item in enumerate(my_list):
    print(f"{idx}: {item}")

Now, for zip:

In [ ]:
my_list = ['abc', 'def', 'ghi']
list2 = ['red', 'green', 'blue']
my_iterator = zip(my_list, list2)
for x in my_iterator:
    (item, color) = x
    print(f"{item} {color}")
In [ ]:
my_list = ['abc', 'def', 'ghi']
for (item, color) in zip(my_list, ['red', 'green', 'blue']):
    print(f"{item} {color}")
In [ ]:
my_list = ['abc', 'def', 'ghi']
for item, number in zip(my_list, range(3,6)):
    print(f"{item} {number}")
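
Two details are worth knowing, sketched below with the same my_list: zip stops as soon as the shorter of its inputs runs out, and enumerate accepts an optional start value for the index.

```python
my_list = ['abc', 'def', 'ghi']

# zip stops as soon as the shorter input runs out.
pairs = list(zip(my_list, ['red', 'green']))
print(pairs)

# enumerate accepts an optional start value for the index.
numbered = list(enumerate(my_list, start=1))
print(numbered)
```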

Data Frames

We are going to look at data in tables where each row of the table contains measurements or values about a single thing and each column is the measurement type. Such tables are very common in data science.

(Loading the iris data is hidden in this cell. You can ignore this.)

Here is an example of the data we will be looking at. It is a subsampling of the very famous Iris data set.

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
11                 4.8               3.4                1.6               0.2  setosa
81                 5.5               2.4                3.7               1.0  versicolor
97                 6.2               2.9                4.3               1.3  versicolor
37                 4.9               3.6                1.4               0.1  setosa
31                 5.4               3.4                1.5               0.4  setosa
28                 5.2               3.4                1.4               0.2  setosa
141                6.9               3.1                5.1               2.3  virginica
149                5.9               3.0                5.1               1.8  virginica

For now, the data are given as a dict. This dict is organized by column: each key is a column name and each value is a list holding that column's entry for every row, in row order. Later we will read this from a file.
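
To see how this column-oriented layout relates to rows, here is a sketch with a hypothetical miniature table (not the real iris data). One row is recovered by taking the entry at the same index from every column.

```python
# Hypothetical miniature table in the same column-oriented layout.
table = {
    'length (cm)': [4.8, 5.5, 6.2],
    'species': ['setosa', 'versicolor', 'versicolor'],
}

row_index = 1
row = {}
for column_name in table:
    # pick the entry for this row out of each column's list
    row[column_name] = table[column_name][row_index]

print(row)
```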

In [ ]:
iris_dataset = {'sepal length (cm)': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 
                                      4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 
                                      4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0,
                                      5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6,
                                      5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2,
                                      5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1,
                                      6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0,
                                      5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 
                                      5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 
                                      7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, 
                                      7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, 
                                      6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9], 
                'sepal width (cm)': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0], 'petal length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1], 'petal width (cm)': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 
1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8], 'species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 
'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica', 'virginica']}
In [ ]:
plt.plot(iris_dataset['sepal width (cm)'], iris_dataset['petal width (cm)'],'o');
plt.xlabel('sepal width (cm)')
plt.ylabel('petal width (cm)')
In [ ]:
for column_name in iris_dataset:
    print(column_name)

Let's double check that every column (the value of each key) has the same number of rows.

In [ ]:
for column_name in iris_dataset:
    column = iris_dataset[column_name]
    print("'{}': {} rows".format(column_name, len(column)))

Now let's compute the average value for each measurement across all of our data.

In [ ]:
def compute_average(my_list):
    assert type(my_list)==list
    accum = 0.0
    for item in my_list:
        accum = accum + item
    average = accum / len(my_list)
    return average
In [ ]:
compute_average([4, 6])
In [ ]:
for column_name in iris_dataset:
    if column_name == 'species':
        continue
    average = compute_average(iris_dataset[column_name])
    print("'{}' average: {}".format(column_name, average))
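
Writing the loop by hand shows how an average is computed. In practice the same result is available from built-ins, as in this sketch: sum() and len() are built-in functions, and statistics.mean comes from the standard library. The values list is made up for illustration.

```python
import statistics

values = [4.0, 6.0, 5.0]

by_hand = sum(values) / len(values)
by_library = statistics.mean(values)

print(by_hand)
print(by_library)
```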

Let's see what species we have in our data.

In [ ]:
known_species = {}
count = 0
for row_species in iris_dataset['species']:
    known_species[row_species] = None
    count = count + 1

print(count)
for species in known_species:
    print(species)
In [ ]:
known_species
In [ ]:
known_species = {}
for row_species in iris_dataset['species']:
    if row_species in known_species:
        known_species[row_species] += 1
    else:
        known_species[row_species] = 1

print(known_species)
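
Counting how often each value occurs is common enough that the standard library has a ready-made tool for it, collections.Counter, which behaves like a dict of counts. A sketch with a short, hypothetical species column standing in for iris_dataset['species']:

```python
from collections import Counter

# Hypothetical short column, used here instead of the full dataset.
species_column = ['setosa', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']

counts = Counter(species_column)
print(counts['setosa'])
print(dict(counts))
```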

Now we want to calculate values for each species separately, not across all measurements. This is a little tricky, because we first need to work out which rows belong to which species. That is our first step.

In [ ]:
rows_for_species = {'setosa':[], 'versicolor':[], 'virginica':[]}
for species_name in rows_for_species:
    # print(species_name)
    row_index = 0
    for row_species in iris_dataset['species']:
        # print(row_index, row_species)
        if row_species == species_name:
            rows_for_species[species_name].append(row_index)
        row_index = row_index + 1
In [ ]:
rows_for_species
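
The manual row_index counter above can be replaced with enumerate, and a single pass over the species column is enough if each row index is appended to the list for its own species. A sketch with a hypothetical short column in place of the real data:

```python
# Hypothetical short species column standing in for iris_dataset['species'].
species_column = ['setosa', 'versicolor', 'setosa', 'virginica']

rows_for_species = {'setosa': [], 'versicolor': [], 'virginica': []}
for row_index, row_species in enumerate(species_column):
    # look up the list for this row's species and record the row number
    rows_for_species[row_species].append(row_index)

print(rows_for_species)
```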

Let's check that this worked by building, for each species, a list of that species' values in each column.

In [ ]:
for species_name in rows_for_species:
    # get a list of row numbers for `species_name`
    species_indexes = rows_for_species[species_name]
    # iterate over columns in dataset
    for column_name in iris_dataset:
        # get all data for this column (get all data for this measurement type, e.g. sepal width)
        all_rows_for_this_column = iris_dataset[column_name]
        
        # accumulate measurements in a list **for this species**
        this_species_values = []
        for species_index in species_indexes:
            # take only the rows corresponding to this species
            row_value = all_rows_for_this_column[species_index]
            this_species_values.append(row_value)
        print(f"{species_name} -> {column_name}: {this_species_values}")
        print()
</html>