pm21-dragon/exercises/release/exercise-05/2__reading_csv_files.ipynb

2024-11-08 03:26:35 -05:00
{
"cells": [
{
"cell_type": "markdown",
"id": "9be665b8-0e4b-43f0-96b5-1cc821fc7d67",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "905476d45abb629cb03d33e8fc4d9408",
"grade": false,
"grade_id": "cell-293ccdf5e42bf800",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"source": [
"# Reading CSV files\n",
"\n",
"Step 1: Download the dataset from https://archive.ics.uci.edu/ml/datasets/Wine+Quality . Click the \"Download\" button to get the `wine+quality.zip` file. Open this archive and extract `winequality-red.csv`. Place it in the same folder as this notebook.\n",
"\n",
"Let's look at the first lines of this file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c2e4658-ad84-4a02-8176-6ca65fa9140f",
"metadata": {},
"outputs": [],
"source": [
"# Print the first few lines; the `with` block closes the file afterwards.\n",
"with open('winequality-red.csv') as fobj:\n",
"    for line_num, line in enumerate(fobj):\n",
"        line = line.strip()\n",
"        print(f\"line {line_num}: '{line}'\")\n",
"        if line_num > 3:\n",
"            break"
]
},
{
"cell_type": "markdown",
"id": "7b803dcd-a408-476e-9e05-bab37dd64aac",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "530907dd4b607a1abb9019a6434a9d41",
"grade": false,
"grade_id": "cell-65efe24785650af1",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"source": [
"## Q10 Read the file into a dict called `data`\n",
"\n",
"The dict should have a key for each column in the CSV file and each dictionary value should be a list with all the values in that column.\n",
"\n",
"For example, a CSV file like this:\n",
"\n",
"```\n",
"name,home planet\n",
"Arthur,Earth\n",
"Zaphod,Betelgeuse V\n",
"Trillian,Earth\n",
"```\n",
"\n",
"Would result in a dictionary like this:\n",
"\n",
"```python\n",
"{'name':['Arthur','Zaphod','Trillian'], 'home planet':['Earth', 'Betelgeuse V', 'Earth']}\n",
"```\n",
"\n",
"But here, we read the file `winequality-red.csv` which you have uploaded into this folder. Note that in this wine quality \"CSV\" file, the values are separated with semicolons (`;`), not commas."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5443bf3d-2303-4971-85f4-0af37b783247",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "4440c7d3e2dcc5e7b6cfb03c6975c20d",
"grade": false,
"grade_id": "cell-bbe508684824a047",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"tags": []
},
"outputs": [],
"source": [
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9880f13b-acc5-431c-836e-a0d34bdc632c",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "a205654f9a79e0ca05b841f9c4c2f341",
"grade": true,
"grade_id": "cell-1978372e733238bd",
"locked": true,
"points": 0,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"outputs": [],
"source": [
"assert len(data.keys()) == 12\n",
"assert len(data['\"alcohol\"'])==1599\n",
"acc = 0; [acc := acc+x for x in data['\"quality\"']]\n",
"assert acc==9012"
]
},
{
"cell_type": "markdown",
"id": "075f13ae-1d24-4e26-ba68-fa3828da1861",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "20b1e23ac89091143fca2c1434adf737",
"grade": false,
"grade_id": "cell-c76a021eff929a4e",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"source": [
"## Q11 Plot the \"Density\" (Y axis) versus \"Alcohol\" (X axis).\n",
"\n",
"Your plot should look like this:\n",
"\n",
"[Figure: reference plot of density (Y axis) versus alcohol (X axis); the embedded image data is not preserved here.]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "57ace93a-fc8e-48c7-9fb4-b1ebe2b521f0",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "734c1d9fe9608c5903b3f922b1e41f1e",
"grade": true,
"grade_id": "cell-0dd13f6a5af90429",
"locked": false,
"points": 1,
"schema_version": 3,
"solution": true,
"task": false
},
"tags": []
},
"outputs": [],
"source": [
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"id": "7cff50a5-7642-47f6-b342-043bc398e981",
"metadata": {},
"source": [
"## Q12 Make a Python program that does this\n",
"\n",
"Create a Python program called `plot_red_wine.py` which makes the above plot (alcohol vs density for the red wine dataset) and saves the plot to a file called `red_wine.png`.\n",
"\n",
"Hint: save the figure using the `plt.savefig()` function. (You might also want to play around with the `plt.show()` function.)"
]
},
{
"cell_type": "markdown",
"id": "c7ce822f-3b1e-4982-a6c7-cfabb3ee5f80",
"metadata": {},
"source": [
"# Uploading the exercise\n",
"\n",
"For this exercise, the following files should be uploaded:\n",
"\n",
"* The two `.ipynb` files (overwriting the original ones, as usual).\n",
"* `plot_red_wine.py` - Your Python script\n",
"* `winequality-red.csv` - The file you downloaded\n",
"* `red_wine.png` - The plot you generated using `plot_red_wine.py`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}