pm21-dragon/exercises/release/exercise-05/2__reading_csv_files.ipynb

2024-11-08 03:26:35 -05:00
{
"cells": [
{
"cell_type": "markdown",
"id": "9be665b8-0e4b-43f0-96b5-1cc821fc7d67",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "905476d45abb629cb03d33e8fc4d9408",
"grade": false,
"grade_id": "cell-293ccdf5e42bf800",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"source": [
"# Reading CSV files\n",
"\n",
"Step 1: Download file from https://archive.ics.uci.edu/ml/datasets/Wine+Quality . Click the \"Download\" button to get the `wine+quality.zip` file. Open this file and extract `winequality-red.csv`. Place it in the folder alongside this notebook.\n",
"\n",
"Let's look at the first lines of this file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c2e4658-ad84-4a02-8176-6ca65fa9140f",
"metadata": {},
"outputs": [],
"source": [
"fobj = open('winequality-red.csv')\n",
"for line_num, line in enumerate(fobj.readlines()):\n",
" line = line.strip()\n",
" print(f\"line {line_num}: '{line}'\")\n",
" if line_num > 3:\n",
" break"
]
},
{
"cell_type": "markdown",
"id": "7b803dcd-a408-476e-9e05-bab37dd64aac",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "530907dd4b607a1abb9019a6434a9d41",
"grade": false,
"grade_id": "cell-65efe24785650af1",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"source": [
"## Q10 Read the file into a dict called `data`\n",
"\n",
"The dict should have a key for each column in the CSV file and each dictionary value should be a list with all the values in that column.\n",
"\n",
"For example, a CSV file like this:\n",
"\n",
"```\n",
"name,home planet\n",
"Arthur,Earth\n",
"Zaphod,Betelgeuse V\n",
"Trillian,Earth\n",
"```\n",
"\n",
"Would result in a dictionary like this:\n",
"\n",
"```python\n",
"{'name':['Arthur','Zaphod','Trillian'], 'home planet':['Earth', 'Betelgeuse V', 'Earth']}\n",
"```\n",
"\n",
"But here, we read the file `winequality-red.csv` which you have uploaded into this folder. Note that in this wine quality \"CSV\" file, the values are separated with semicolons (`;`), not commas."
]
},
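{
"cell_type": "markdown",
"id": "example-csv-to-dict",
"metadata": {},
"source": [
"As a minimal sketch of the column-wise layout, the toy example above could be turned into a dictionary as follows. The names `lines`, `header`, and `columns` are made up for illustration, and the text is held in a list rather than read from a file.\n",
"\n",
"```python\n",
"# Toy input from the example above, one string per line of the file.\n",
"lines = ['name,home planet', 'Arthur,Earth', 'Zaphod,Betelgeuse V', 'Trillian,Earth']\n",
"\n",
"header = lines[0].split(',')  # column names from the first line\n",
"columns = {key: [] for key in header}  # one empty list per column\n",
"\n",
"for line in lines[1:]:\n",
"    # Pair each value with the name of its column and append it.\n",
"    for key, value in zip(header, line.split(',')):\n",
"        columns[key].append(value)\n",
"\n",
"print(columns)\n",
"```\n",
"\n",
"All values remain strings in this sketch. For the wine file, the separator differs and the checks further down add up a column, so the values there presumably need to be converted to numbers."
]
},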
{
"cell_type": "code",
"execution_count": null,
"id": "5443bf3d-2303-4971-85f4-0af37b783247",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "4440c7d3e2dcc5e7b6cfb03c6975c20d",
"grade": false,
"grade_id": "cell-bbe508684824a047",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
},
"tags": []
},
"outputs": [],
"source": [
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9880f13b-acc5-431c-836e-a0d34bdc632c",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "a205654f9a79e0ca05b841f9c4c2f341",
"grade": true,
"grade_id": "cell-1978372e733238bd",
"locked": true,
"points": 0,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"outputs": [],
"source": [
"assert len(data.keys()) == 12\n",
"assert len(data['\"alcohol\"'])==1599\n",
"acc = 0; [acc := acc+x for x in data['\"quality\"']]\n",
"assert acc==9012"
]
},
{
"cell_type": "markdown",
"id": "075f13ae-1d24-4e26-ba68-fa3828da1861",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "20b1e23ac89091143fca2c1434adf737",
"grade": false,
"grade_id": "cell-c76a021eff929a4e",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
},
"tags": []
},
"source": [
" ## Q11 Plot the \"Density\" (Y axis) versus \"Alcohol\" (X axis).\n",
" \n",
" Your plot should look like this:\n",
" \n",
"<img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAjoAAAGwCAYAAACgi8/jAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAA9hAAAPYQGoP6dpAACBbElEQVR4nO3de3wU9b0//tdsyBXIjSTkYkJCxARJBCJQuYgBPEJQKxKt9qJ4rVV71NqWS+053/7OsQJtbWsv2lPqrV4qRcALFS8NIAIKBCSAAiIkJEJCiCwbSEICZH5/hFl2Z2dmZ3Znd2d3X8/Hg0ebzezM5/OZye7H3dfnPYIoiiKIiIiIIpAt1A0gIiIiChROdIiIiChicaJDREREEYsTHSIiIopYnOgQERFRxOJEh4iIiCIWJzpEREQUsfqFugGB1tvbiyNHjmDgwIEQBCHUzSEiIiIdRFHEyZMnkZubC5vN989lIn6ic+TIEeTn54e6GUREROSDpqYmXHTRRT4/P+InOgMHDgTQN1DJyckhbg0RERHp0d7ejvz8fOf7uK8ifqIjfV2VnJzMiQ4REVGY8Td2wjAyERERRSxOdIiIiChicaJDREREEYsTHSIiIopYnOgQERFRxArpRGf9+vW4/vrrkZubC0EQ8MYbb7j9/o477oAgCG7/rrjiitA0loiIiMJOSCc6HR0dGDlyJP70pz+pbjNjxgw0Nzc7/73zzjtBbCERERGFs5DW0amqqkJVVZXmNvHx8cjOzg5Si4iIiCiSWD6js27dOmRlZeGSSy7Bvffei9bWVs3tu7u70d7e7vaPiIiIopOlJzpVVVV45ZVXsGbNGjz55JPYunUrpk6diu7ubtXnLFy4ECkpKc5/vM8VERFR9BJEURRD3Qigr8TzypUrMWvWLNVtmpubMWTIELz22muYPXu24jbd3d1uEyHpXhkOh4O3gCAiIgoT7e3tSElJ8fv9O6zudZWTk4MhQ4Zg//79qtvEx8cjPj4+iK0iokjV7OhCfVsHijL6IyclMdTNISIfhNVE5+uvv0ZTUxNycnJC3RQiinBLtzZiwYpd6BUBmwAsnF2OW8YWhLpZRGRQSDM6p06dwo4dO7Bjxw4AQH19PXbs2IHGxkacOnUKP/nJT/Dxxx+joaEB69atw/XXX4+MjAzceOONoWw2EUW4ZkeXc5IDAL0i8LMVu9Hs6Aptw4jIsJB+olNbW4spU6Y4f3700UcBAHPmzMEzzzyDXbt24e9//ztOnDiBnJwcTJkyBUuXLsXAgQND1WQiigL1bR3OSY7knCiioa2TX2ERhZmQTnQqKyuhlYV+7733gtgaIqI+RRn9YRPgNtmJEQQUZiSFrlFE5BNLLy8nIgqFnJRELJxdjhhBANA3yXlidhk/zSEKQ2EVRiYiCpZbxhZg8iWZaGjrRGFGEic5RGGKEx0iIhU5KYmc4BCFOX51RURERBGLEx0iIiKKWJzoEBERUcTiRIeIiIgiFic6REREFLE40SEiIqKIxYkOERERRSxOdIiIiChicaJDREREEYsTHSIiIopYnOgQERFRxOJEh4iIiCIWJzpEREQUsTjRISIioojFiQ4RERFFLE50iIiIKGJxokNEREQRixMdIiIiilic6BAREVHE4kSHiIiIIhYnOkRERBSxONEhIiKiiMWJDhEREUUsTnSIiIgoYnGiQ0RERBGLEx0iIiKKWJzoEBERUcTiRIeIiIgiFic6REREFLE40SEiIqKIxYkOERERRSxOdIiIiChicaJDREREEYsTHSIiIopYnOgQERFRxOJEh4iIiCIWJzpEREQUsTjRISIioogV0onO+vXrcf311yM3NxeCIOCNN95Q3fa+++6DIAj4/e9/H7T2hVqzowtv1x3Gqp1H0OzoCsj+Nx1oQ82eFiz56ADqmuxujwfimGQMzwURkX/6hfLgHR0dGDlyJO68805UV1erbvfGG2
9g8+bNyM3NDWLrQmvp1kbMX74L4vmfBQCLqstxy9gC0/a/YMUu9Iruj1cUpGJH0wn0ioBNABbONu+YZIzrOeK5ICLyTUg/0amqqsLjjz+O2bNnq25z+PBh/PCHP8Qrr7yC2NhYr/vs7u5Ge3u7279w0+zocpvkAIAIYMGKXab8l32zo0txkgMA2xtPOB/vFYGfrdjNTxNCQH6OeC6IiHxj6YxOb28vbrvtNvz0pz/FiBEjdD1n4cKFSElJcf7Lz88PcCvNV9/WAYU5CHpFoKGt05T9K01ylJwTRVOOScYonSOeCyIi4yw90Vm8eDH69euHhx56SPdzFixYAIfD4fzX1NQUwBYGRlFGfwgKj9sEoDAjyZT925QOoCBGEEw5JhmjdI54LoiIjLPsRGfbtm146qmn8MILL0AQdL4rA4iPj0dycrLbv3CTk5KIRdXlbpMd4XxGIycl0ZT9L5xdjhiFca0oSHU+HiMIeGJ2mSnHJGPk54jngojIN4Ioijq/xAgsQRCwcuVKzJo1CwDw+9//Ho8++ihstgtzsXPnzsFmsyE/Px8NDQ269tve3o6UlBQ4HI6wm/Q0O7qwrcEOQQAqhqSZ/ibX7OhCQ1snOnvOoKGtE2MK0zAyP835eGFGEt9YQ4zngoiilVnv3yFddaXltttuw9VXX+322PTp03HbbbfhzjvvDFGrgisnJRHXjQzcm1tOSqLim6fa4xR8PBdERP4J6UTn1KlT+PLLL50/19fXY8eOHUhPT0dBQQEGDRrktn1sbCyys7NRUlIS7KYSERFRGArpRKe2thZTpkxx/vzoo48CAObMmYMXXnghRK0iIiKiSBHSiU5lZSWMRIT05nKIiIiIAAuvuiIiIiLyFyc6REREFLE40SEiIqKIxYkOERERRSxOdIiIiChicaJDREREEYsTHSIiIopYnOgQERFRxOJEh4iIiCIWJzpEREQUsTjRMVGzowubDrSh2dEV6qbopqfN4dgvuUjoAxERGRfSe11FkqVbG7FgxS70ioBNABbOLsctYwtC3SxNetocjv2Si4Q+EBGRb/iJjgmaHV3ON1IA6BWBn63YbelPD/S0ORz7JRcJfSAiIt9xomOC+rYO5xup5JwooqGtMzQN0kFPm8OxX3KR0AciIvIdJzomKMroD5vg/liMIKAwIyk0DdJBT5v7x8UoPjcpLnwum3A8N0REZJ7weceysJyURCycXY4Yoe8dNUYQ8MTsMuSkJIa4ZdrunlTknAQotbmj55zi8zp7eoPRPFOE67khIiJzMIxsklvGFmDyJZloaOtEYUaSpd9IXcO5AoDvTy7CnROLPNosfRri+tVPOH4aEk7nhoiIzMVPdEyUk5KI8cWDLP1GKg/nigCe/ahBcdtI+jQkHM4NERGZj5/oRBmtcK7SJICfhhARUTjjRCfK+PJ1VE5KIic4REQUlvjVVZSJpK+j6AJWfu4TbeMQbf0l8gU/0YlC/DoqsrDyc59oG4do6y+Rr/iJTpRiODcysPJzn2gbh2jrL5E/ONEhCmOs/Nwn2sYh2vpL5A9OdIjCGCs/94m2cYi2/hL5gxMdojDGcHmfaBuHaOsvkT8EURRF75uFr/b2dqSkpMDhcCA5OTnUzSGZZkcX6ts6UJTRny/Sfmh2dCmGy6NtfNXGIVLVNdmxtcGOsYVpGJmfFurmEJnKrPdvrrqikOGqEfMo1TqKxvGNpppP0Xh+iXzBr64oJLhqJLA4vpGN55dIP050KCS4aiSwOL6RjeeXSD9OdCgkuGoksDi+kY3nl0g/TnQoJLhqJLACMb683YB18O+HSD+uuqKQirZVMsFm1vgy+GpN/PuhSGbW+zcnOkSkqdnRhYmL1njc8X7D/Cl8cyWigDHr/ZtfXRGRJgZfiSiccaJDRJoYfCWicMaJDgWFlYKsVmpLOGDwlYjCGSsjU8BZKchqpbaEk1vGFmDyJZkMvhJR2OEnOhRQVqrgaqW2hKOclESMLx7ESQ4RhRVOdC
igrBRktVJbiIgoODjRoYCyUpDVSm0hIqLgCOlEZ/369bj++uuRm5sLQRDwxhtvuP3+F7/4BUpLS9G/f3+kpaXh6quvx
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "57ace93a-fc8e-48c7-9fb4-b1ebe2b521f0",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "734c1d9fe9608c5903b3f922b1e41f1e",
"grade": true,
"grade_id": "cell-0dd13f6a5af90429",
"locked": false,
"points": 1,
"schema_version": 3,
"solution": true,
"task": false
},
"tags": []
},
"outputs": [],
"source": [
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"id": "7cff50a5-7642-47f6-b342-043bc398e981",
"metadata": {},
"source": [
" ## Q12 Make a Python program that does this\n",
"\n",
"Create a Python program called `plot_red_wine.py` which makes the above plot (alcohol vs density for the red wine dataset) and saves the plot to a file called `red_wine.png`.\n",
"\n",
"Hint: save the figure using the `plt.savefig()` function. (You might also want to play around with the `plot.show()` function.)"
]
},
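{
"cell_type": "markdown",
"id": "example-savefig-sketch",
"metadata": {},
"source": [
"As a rough illustration of the hint above, the sketch below saves a figure to disk and then shows it. The data values and the file name `example.png` are made up for illustration and are not part of the exercise.\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Made-up values, only to have something to draw.\n",
"x = [1, 2, 3, 4]\n",
"y = [1, 4, 9, 16]\n",
"\n",
"plt.plot(x, y, '.')\n",
"plt.xlabel('x')\n",
"plt.ylabel('y')\n",
"plt.savefig('example.png')  # write the current figure to a file\n",
"plt.show()  # then open a window, if a display is available\n",
"```\n",
"\n",
"Calling `plt.savefig()` before `plt.show()` is the safer order, since the figure may be discarded once the window is closed."
]
},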
{
"cell_type": "markdown",
"id": "c7ce822f-3b1e-4982-a6c7-cfabb3ee5f80",
"metadata": {},
"source": [
"# Uploading the exercise\n",
"\n",
"For this exercise, the following files should be uploaded:\n",
"\n",
"* The two `.ipynb` files (overwriting the original ones, as usual).\n",
"* `plot_red_wine.py` - Your Python script\n",
"* `winequality-red.csv` - The file you downloaded\n",
"* `red_wine.png` - The plot you generated using `plot_red_wine.py`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}