Reading CSV files¶
Step 1: Download file from https://archive.ics.uci.edu/ml/datasets/Wine+Quality. Click the "Download" button to get the wine+quality.zip file. Open this file and extract winequality-red.csv. Place it in the folder alongside this notebook.
Let's look at the first lines of this file.
with open('winequality-red.csv') as fobj:
    for line_num, line in enumerate(fobj):
        line = line.strip()
        print(f"line {line_num}: '{line}'")
        if line_num > 3:
            break
Q10 Read the file into a dict called data¶
The dict should have a key for each column in the CSV file and each dictionary value should be a list with all the values in that column.
For example, a CSV file like this:
name,home planet
Arthur,Earth
Zaphod,Betelgeuse V
Trillian,Earth
Would result in a dictionary like this:
{'name':['Arthur','Zaphod','Trillian'], 'home planet':['Earth', 'Betelgeuse V', 'Earth']}
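As a minimal sketch of this mapping, the small example above can be converted with Python's standard csv module. The function name columns_from_csv is hypothetical and chosen only for illustration; the values are kept as strings here, and note that csv.reader strips double quotes around fields, so the keys it produces differ from those produced by a plain line.split().

```python
import csv
import io


def columns_from_csv(fobj, delimiter=","):
    """Return a dict mapping each column name to the list of its values."""
    reader = csv.reader(fobj, delimiter=delimiter)
    column_names = next(reader)  # the first row holds the column names
    columns = {name: [] for name in column_names}
    for row in reader:
        for name, value in zip(column_names, row):
            columns[name].append(value)  # values stay strings in this sketch
    return columns


# The example CSV from the text, held in memory instead of in a file.
example = io.StringIO(
    "name,home planet\n"
    "Arthur,Earth\n"
    "Zaphod,Betelgeuse V\n"
    "Trillian,Earth\n"
)
print(columns_from_csv(example))
```

For a file that uses a different separator, the same function could be called with another delimiter argument, for example delimiter=";".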
But here, we read the file winequality-red.csv which you have uploaded into this folder. Note that in this wine quality "CSV" file, the values are separated with semicolons (;), not commas.
data = {}
with open('winequality-red.csv') as fobj:
    for line_num, line in enumerate(fobj):
        line = line.strip()
        entries = line.split(';')
        if line_num == 0:
            # The header row holds the column names (including their double quotes).
            column_names = entries
            for column_name in column_names:
                data[column_name] = []
            continue
        # Every other row holds one numeric value per column.
        for (colname, entry) in zip(column_names, entries):
            data[colname].append(float(entry))
data.keys()
assert len(data.keys()) == 12
assert len(data['"alcohol"'])==1599
assert sum(data['"quality"']) == 9012
Q11 Plot the "Alcohol" (Y axis) versus "Density" (X axis).¶
Your plot should look like this:
import matplotlib.pyplot as plt
plt.plot(data['"density"'], data['"alcohol"'], '.')
plt.xlabel("Density")
plt.ylabel("Alcohol");
Q12 Make a Python program that does this¶
Create a Python program called plot_red_wine.py which makes the above plot (alcohol vs density for the red wine dataset) and saves the plot to a file called red_wine.png.
Hint: save the figure using the plt.savefig() function. (You might also want to play around with the plt.show() function.)
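One possible shape for such a script is sketched below; it is not the only valid solution. The helper names read_columns and save_plot are hypothetical, the Agg backend is an assumption made so the script can write the image without opening a window, and the column keys keep their double quotes because the header is split on semicolons as in the notebook.

```python
# plot_red_wine.py: a sketch, assuming winequality-red.csv sits next to the script.
import os

import matplotlib
matplotlib.use("Agg")  # assumption: we only write a file, no window is needed
import matplotlib.pyplot as plt

CSV_FILENAME = "winequality-red.csv"
PNG_FILENAME = "red_wine.png"


def read_columns(filename, delimiter=";"):
    """Read a delimited text file into a dict of column name -> list of floats."""
    data = {}
    with open(filename) as fobj:
        for line_num, line in enumerate(fobj):
            entries = line.strip().split(delimiter)
            if line_num == 0:
                column_names = entries  # header row; names keep their quotes
                for name in column_names:
                    data[name] = []
                continue
            for name, entry in zip(column_names, entries):
                data[name].append(float(entry))
    return data


def save_plot(data, out_filename):
    """Plot alcohol (Y axis) against density (X axis) and save the figure."""
    fig, ax = plt.subplots()
    ax.plot(data['"density"'], data['"alcohol"'], ".")
    ax.set_xlabel("Density")
    ax.set_ylabel("Alcohol")
    fig.savefig(out_filename)
    plt.close(fig)  # release the figure once it has been written


if __name__ == "__main__":
    if os.path.exists(CSV_FILENAME):
        save_plot(read_columns(CSV_FILENAME), PNG_FILENAME)
    else:
        print(f"{CSV_FILENAME} not found; place it next to this script.")
```

Running python plot_red_wine.py in the folder that contains the CSV file should then write red_wine.png there. To see the plot in a window instead, one could drop the matplotlib.use("Agg") line and call plt.show() after plotting.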
Uploading the exercise¶
For this exercise, the following files should be uploaded:
- The two .ipynb files (overwriting the original ones, as usual).
- plot_red_wine.py - Your Python script
- winequality-red.csv - The file you downloaded
- red_wine.png - The plot you generated using plot_red_wine.py.