pm21-dragon/lectures/lecture-04/2 - reading-CSV.ipynb
2024-11-08 12:03:56 +01:00

<html> <head> </head>
In [1]:
import matplotlib.pyplot as plt

reading CSV files with pure Python

CSV ("comma separated values") files are very widely used for storing tables of data. Excel and Google Sheets can read and write CSV files quite easily. They are also "human readable" as data files: you can open one in a simple text viewer program such as TextEdit and see the contents.

Unfortunately, the format is not fully standardized. Here we will open a CSV file and read its contents into a dictionary. The dictionary will have one key for each column, and each key will map to a list of that column's values, one value per row.

For example, with file.csv like so:

column 1,column 2
1,2
3,4

We would like to extract a dictionary like this:

{'column 1': [1,3], 'column 2': [2,4]}
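
As a minimal sketch of that goal, the following parses the two-column example from an in-memory list of lines instead of a real file. The helper name `parse_csv_lines` is hypothetical, and converting every value with `int` is an assumption that only holds for this toy table.

```python
def parse_csv_lines(lines):
    """Turn CSV lines (header first) into a dict mapping column name -> list of values."""
    columns = {}
    column_names = []
    for line_num, line in enumerate(lines):
        entries = line.strip().split(",")
        if line_num == 0:
            # The first line holds the column names.
            column_names = entries
            for name in column_names:
                columns[name] = []
            continue
        # Every later line contributes one value to each column.
        for name, entry in zip(column_names, entries):
            columns[name].append(int(entry))  # assumption: all values are integers
    return columns


example_lines = ["column 1,column 2\n", "1,2\n", "3,4\n"]
table = parse_csv_lines(example_lines)
print(table)
```

Each list holds one column read from top to bottom, so the list for 'column 1' collects 1 and then 3.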

We will now do this for the file iris.csv which contains the data used above.

In [2]:
fobj = open("iris.csv")
In [3]:
fobj = open("iris.csv")
for line_num, line in enumerate(fobj.readlines()):
    print(repr(line))
    if line_num > 5:
        break
'sepal_length,sepal_width,petal_length,petal_width,species\n'
'5.1,3.5,1.4,0.2,setosa\n'
'4.9,3.0,1.4,0.2,setosa\n'
'4.7,3.2,1.3,0.2,setosa\n'
'4.6,3.1,1.5,0.2,setosa\n'
'5.0,3.6,1.4,0.2,setosa\n'
'5.4,3.9,1.7,0.4,setosa\n'
In [4]:
fobj = open("iris.csv")
for line_num, line in enumerate(fobj.readlines()):
    line = line.strip()
    print(repr(line))
    if line_num > 5:
        break
'sepal_length,sepal_width,petal_length,petal_width,species'
'5.1,3.5,1.4,0.2,setosa'
'4.9,3.0,1.4,0.2,setosa'
'4.7,3.2,1.3,0.2,setosa'
'4.6,3.1,1.5,0.2,setosa'
'5.0,3.6,1.4,0.2,setosa'
'5.4,3.9,1.7,0.4,setosa'
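
The cells above leave the file open after reading. One possible variation, sketched here, uses a `with` block so the file is closed automatically, and `itertools.islice` to stop after a fixed number of lines. The file `sample.csv` and its contents are assumptions made so the sketch runs without `iris.csv`.

```python
import itertools
import os
import tempfile

# Assumption: a tiny stand-in file so the sketch runs without iris.csv.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w") as fobj:
    fobj.write("sepal_length,species\n5.1,setosa\n4.9,setosa\n4.7,setosa\n")

# The with block closes the file automatically, even if an error occurs.
with open(sample_path) as fobj:
    # A file object yields one line at a time; islice stops after the first 3.
    first_lines = [line.strip() for line in itertools.islice(fobj, 3)]

for line in first_lines:
    print(repr(line))
```

Iterating over the file object directly avoids loading every line into memory the way `readlines()` does, which matters little for a small file but can help with large ones.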
In [9]:
fobj = open("iris.csv")
iris_dataset_from_csv = {}
for line_num, line in enumerate(fobj.readlines()):
    line = line.strip()
    entries = line.split(',')
    # print(entries)
    # if line_num > 5:
    #     break
    if line_num == 0:
        column_names = entries
        for column_name in column_names:
            iris_dataset_from_csv[column_name] = []
        continue
    # If we get here, line_num >= 1, iris_dataset_from_csv already has an empty list
    # for each column, and column_names holds the column names in file order.
    for (column_name, entry) in zip(column_names, entries):
        if column_name != 'species':
            entry = float(entry)
        iris_dataset_from_csv[column_name].append(entry) 
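
The loop above splits each line on every comma, which works for `iris.csv` because no field contains a comma. Other CSV files may wrap such fields in quotes. The sketch below compares a plain `split(',')` with the standard library's `csv.reader` on a made-up two-line table; the table contents are an assumption chosen only to show the difference.

```python
import csv
import io

# Assumption: a made-up two-line table with a quoted field that contains a comma.
text = 'name,notes\n"Iris, bearded",tall\n'

# Splitting on every comma breaks the quoted field into two pieces.
naive = text.splitlines()[1].split(",")

# csv.reader understands the quoting and keeps the field whole.
rows = list(csv.reader(io.StringIO(text)))

print(naive)
print(rows[1])
```

The naive split yields three pieces for the second line, while `csv.reader` yields the two intended fields.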
In [15]:
plt.plot(iris_dataset_from_csv['sepal_width'], iris_dataset_from_csv['petal_width'],'o', alpha=0.2);

Using matplotlib from Python programs (not in Jupyter):

Open an interactive window:

plt.show()

Save the figure to a file:

plt.savefig("plot_filename.png")

Live demo of all the above, ending with a call to plt.savefig().
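
A standalone script along these lines might look like the sketch below. It assumes the non-interactive `Agg` backend so that it runs without a display, uses the first five rows printed earlier as stand-in data, and writes to a temporary directory; all of these are illustrative choices, not part of the lecture's demo.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Stand-in data: sepal_width and petal_width from the first five rows shown earlier.
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6]
petal_width = [0.2, 0.2, 0.2, 0.2, 0.2]

fig, ax = plt.subplots()
ax.plot(sepal_width, petal_width, "o", alpha=0.2)
ax.set_xlabel("sepal_width")
ax.set_ylabel("petal_width")

# Save into a temporary directory; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), "plot_filename.png")
fig.savefig(out_path)
plt.close(fig)
```

To open an interactive window instead, drop the `matplotlib.use("Agg")` line and call `plt.show()` after plotting.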

</html>