pm21-dragon/exercises/release/exercise-05/2__reading_csv_files.ipynb
2024-11-08 09:26:35 +01:00

<html> <head> </head>

Reading CSV files

Step 1: Download the dataset from https://archive.ics.uci.edu/ml/datasets/Wine+Quality . Click the "Download" button to get the wine+quality.zip file, then extract winequality-red.csv from it and place it in the same folder as this notebook.
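If you prefer to extract the file with Python instead of by hand, the standard-library zipfile module can do it. This is a minimal sketch, assuming wine+quality.zip was saved in the current working directory and that winequality-red.csv sits at the top level of the archive (if it does not, zf.namelist() lists the exact member names). extract_csv is a hypothetical helper name.

```python
import os
import zipfile


def extract_csv(zip_path="wine+quality.zip", member="winequality-red.csv", dest="."):
    """Extract one member of a zip archive into dest and return the extracted file's path.

    Hypothetical helper; assumes `member` is stored at the top level of the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return zf.extract(member, path=dest)


# Only try the extraction if the downloaded archive is actually here
if os.path.exists("wine+quality.zip"):
    print(extract_csv())
```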

Let's look at the first five lines of this file.

In [ ]:
# Print the first five lines (numbered 0-4), without their trailing newlines
with open('winequality-red.csv') as fobj:
    for line_num, line in enumerate(fobj):
        line = line.strip()
        print(f"line {line_num}: '{line}'")
        if line_num >= 4:
            break

Q10 Read the file into a dict called data

The dict should have one key for each column in the CSV file, and each value should be a list of all the values in that column.

For example, a CSV file like this:

name,home planet
Arthur,Earth
Zaphod,Betelgeuse V
Trillian,Earth

would result in a dictionary like this:

{'name': ['Arthur', 'Zaphod', 'Trillian'], 'home planet': ['Earth', 'Betelgeuse V', 'Earth']}

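Here, however, we read winequality-red.csv, which you placed in this folder in Step 1. Note that in this wine quality "CSV" file the values are separated by semicolons (;), not commas. The column names in the header row are enclosed in double quotes, and the checks below expect the keys to keep those quotes (for example '"alcohol"'). The values should be converted to numbers, for example with float(), because the checks add up the "quality" column.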
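The idea of building one list per column can be sketched on the small example above. This is a minimal sketch, not the required solution: read_columns is a hypothetical helper that splits each line with str.split, so header names are used exactly as written (quoted names keep their quotes, as the checks below expect) and all values stay strings. The example CSV is held in memory with io.StringIO instead of being read from a file.

```python
import io


def read_columns(fobj, delimiter=","):
    """Return a dict mapping each header name to the list of values in its column.

    Values are kept as strings; header names are used exactly as they appear,
    so quoted names such as "alcohol" keep their double quotes.
    """
    header = fobj.readline().rstrip("\r\n").split(delimiter)
    columns = {name: [] for name in header}
    for line in fobj:
        line = line.rstrip("\r\n")
        if not line:  # skip blank lines, such as an empty last line
            continue
        for name, value in zip(header, line.split(delimiter)):
            columns[name].append(value)
    return columns


# The small example CSV from the text, held in memory instead of in a file
example_csv = """name,home planet
Arthur,Earth
Zaphod,Betelgeuse V
Trillian,Earth
"""

data_example = read_columns(io.StringIO(example_csv))
print(data_example)
# → {'name': ['Arthur', 'Zaphod', 'Trillian'], 'home planet': ['Earth', 'Betelgeuse V', 'Earth']}
```

For the wine file you would pass delimiter=";" with a file opened in a with block, and still need to convert each value with float().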

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Checks: 12 columns, 1599 rows, and numeric "quality" values that sum to 9012
assert len(data) == 12
assert len(data['"alcohol"']) == 1599
assert sum(data['"quality"']) == 9012

Q11 Plot the "Density" (Y axis) versus "Alcohol" (X axis).

Your plot should look like this:

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
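One way to make this plot with matplotlib is sketched below, assuming `data` is the dict from Q10 with numeric values stored under the quoted keys the checks use ('"alcohol"' and '"density"'). The function name plot_density_vs_alcohol and the small marker size (s=5, so that the 1599 overlapping points stay readable) are illustrative choices, not requirements.

```python
import matplotlib.pyplot as plt


def plot_density_vs_alcohol(data):
    """Scatter-plot density (Y axis) against alcohol (X axis) and return the figure and axes.

    Hypothetical helper: expects a dict shaped like `data` from Q10, i.e. with
    numeric lists under the quoted keys '"alcohol"' and '"density"'.
    """
    fig, ax = plt.subplots()
    ax.scatter(data['"alcohol"'], data['"density"'], s=5)  # small markers for many overlapping points
    ax.set_xlabel("Alcohol")
    ax.set_ylabel("Density")
    return fig, ax
```

In the notebook you would call fig, ax = plot_density_vs_alcohol(data); Jupyter then displays the figure below the cell.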

Q12 Make a Python program that does this

Create a Python program called plot_red_wine.py that makes the plot from Q11 (density versus alcohol for the red wine dataset) and saves it to a file called red_wine.png.

Hint: save the figure using the plt.savefig() function. (You might also want to experiment with the plt.show() function.)
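A minimal sketch of plot_red_wine.py, assuming winequality-red.csv is in the directory you run the script from. It reads the file with the standard csv module (csv.DictReader with delimiter=";"), which removes the double quotes, so the keys here are plain 'alcohol' and 'density' rather than the quoted keys used in the notebook. The helper names load_columns and make_plot and the marker size are illustrative choices. The script selects matplotlib's non-interactive "Agg" backend so it can run without a display; with that backend plt.show() does nothing, so remove that line if you want to experiment with plt.show(). If you do, save before showing: plt.savefig() called after the window has been closed can write an empty image.

```python
"""plot_red_wine.py: plot density versus alcohol for the red wine data and save it as red_wine.png.

A sketch: assumes winequality-red.csv sits in the current working directory.
"""
import csv
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file without opening a window
import matplotlib.pyplot as plt


def load_columns(path, names):
    """Read the semicolon-separated file and return {name: [float, ...]} for the requested columns."""
    columns = {name: [] for name in names}
    with open(path, newline="") as fobj:
        reader = csv.DictReader(fobj, delimiter=";")  # csv removes the quotes around header names
        for row in reader:
            for name in names:
                columns[name].append(float(row[name]))
    return columns


def make_plot(columns, filename):
    """Scatter-plot density (Y) against alcohol (X) and save the figure to filename."""
    fig, ax = plt.subplots()
    ax.scatter(columns["alcohol"], columns["density"], s=5)
    ax.set_xlabel("Alcohol")
    ax.set_ylabel("Density")
    fig.savefig(filename)
    plt.close(fig)  # free the figure once it has been written


if __name__ == "__main__":
    if os.path.exists("winequality-red.csv"):
        make_plot(load_columns("winequality-red.csv", ["alcohol", "density"]), "red_wine.png")
    else:
        print("winequality-red.csv not found in the current directory")
```

Running python plot_red_wine.py from the exercise folder should then leave red_wine.png next to the script.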

Uploading the exercise

For this exercise, the following files should be uploaded:

  • The two .ipynb files (overwriting the original ones, as usual).
  • plot_red_wine.py - Your Python script
  • winequality-red.csv - The file you extracted from wine+quality.zip
  • red_wine.png - The plot you generated using plot_red_wine.py.
</html>