{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9be665b8-0e4b-43f0-96b5-1cc821fc7d67",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-293ccdf5e42bf800",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "source": [
    "# Reading CSV files\n",
    "\n",
    "Step 1: Download the dataset from https://archive.ics.uci.edu/ml/datasets/Wine+Quality . Click the \"Download\" button to get the `wine+quality.zip` file. Open this archive and extract `winequality-red.csv`. Place it in the same folder as this notebook.\n",
    "\n",
    "Let's look at the first lines of this file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5c2e4658-ad84-4a02-8176-6ca65fa9140f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "line 0: '\"fixed acidity\";\"volatile acidity\";\"citric acid\";\"residual sugar\";\"chlorides\";\"free sulfur dioxide\";\"total sulfur dioxide\";\"density\";\"pH\";\"sulphates\";\"alcohol\";\"quality\"'\n",
      "line 1: '7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5'\n",
      "line 2: '7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5'\n",
      "line 3: '7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5'\n",
      "line 4: '11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6'\n"
     ]
    }
   ],
   "source": [
    "# Print the first five lines of the file to inspect its structure.\n",
    "with open('winequality-red.csv') as fobj:\n",
    "    for line_num, line in enumerate(fobj):\n",
    "        line = line.strip()\n",
    "        print(f\"line {line_num}: '{line}'\")\n",
    "        if line_num > 3:\n",
    "            break"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b803dcd-a408-476e-9e05-bab37dd64aac",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-65efe24785650af1",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "source": [
    "## Q10 Read the file into a dict called `data`\n",
    "\n",
    "The dict should have a key for each column in the CSV file and each dictionary value should be a list with all the values in that column.\n",
    "\n",
    "For example, a CSV file like this:\n",
    "\n",
    "```\n",
    "name,home planet\n",
    "Arthur,Earth\n",
    "Zaphod,Betelgeuse V\n",
    "Trillian,Earth\n",
    "```\n",
    "\n",
    "Would result in a dictionary like this:\n",
    "\n",
    "```python\n",
    "{'name':['Arthur','Zaphod','Trillian'], 'home planet':['Earth', 'Betelgeuse V', 'Earth']}\n",
    "```\n",
    "\n",
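    "Here, however, read the file `winequality-red.csv`, which you have uploaded into this folder. Note that in this wine quality \"CSV\" file, the values are separated by semicolons (`;`), not commas. The column names in its header row are wrapped in double quotes, and these quotes stay part of the dict keys (for example `'\"alcohol\"'`)."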
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5443bf3d-2303-4971-85f4-0af37b783247",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-bbe508684824a047",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['\"fixed acidity\"', '\"volatile acidity\"', '\"citric acid\"', '\"residual sugar\"', '\"chlorides\"', '\"free sulfur dioxide\"', '\"total sulfur dioxide\"', '\"density\"', '\"pH\"', '\"sulphates\"', '\"alcohol\"', '\"quality\"'])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {}\n",
    "with open('winequality-red.csv') as fobj:\n",
    "    for line_num, line in enumerate(fobj):\n",
    "        # Fields are separated by semicolons, not commas.\n",
    "        entries = line.strip().split(';')\n",
    "        if line_num == 0:\n",
    "            # The header row supplies the dict keys (its double quotes are kept).\n",
    "            column_names = entries\n",
    "            for column_name in column_names:\n",
    "                data[column_name] = []\n",
    "            continue\n",
    "        for colname, entry in zip(column_names, entries):\n",
    "            data[colname].append(float(entry))\n",
    "data.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9880f13b-acc5-431c-836e-a0d34bdc632c",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-1978372e733238bd",
     "locked": true,
     "points": 0,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert len(data.keys()) == 12\n",
    "assert len(data['\"alcohol\"'])==1599\n",
    "acc = 0; [acc := acc+x for x in data['\"quality\"']]\n",
    "assert acc==9012"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "075f13ae-1d24-4e26-ba68-fa3828da1861",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "cell-c76a021eff929a4e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    },
    "tags": []
   },
   "source": [
    "## Q11 Plot the \"Density\" (Y axis) versus \"Alcohol\" (X axis).\n",
    "\n",
    "Your plot should look like this:\n",
    "\n",
    "[Figure: expected plot of density (Y axis) versus alcohol (X axis)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "57ace93a-fc8e-48c7-9fb4-b1ebe2b521f0",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-0dd13f6a5af90429",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "image/png": "[base64-encoded PNG of the plot omitted]",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "# Alcohol goes on the X axis and density on the Y axis, as Q11 asks.\n",
    "plt.plot(data['\"alcohol\"'], data['\"density\"'], '.')\n",
    "plt.xlabel(\"Alcohol\")\n",
    "plt.ylabel(\"Density\");"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cff50a5-7642-47f6-b342-043bc398e981",
   "metadata": {},
   "source": [
    "## Q12 Make a Python program that does this\n",
    "\n",
    "Create a Python program called `plot_red_wine.py` that makes the above plot (density versus alcohol for the red wine dataset) and saves it to a file called `red_wine.png`.\n",
    "\n",
    "Hint: save the figure using the `plt.savefig()` function. (You might also want to play around with the `plt.show()` function.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7ce822f-3b1e-4982-a6c7-cfabb3ee5f80",
   "metadata": {},
   "source": [
    "# Uploading the exercise\n",
    "\n",
    "For this exercise, the following files should be uploaded:\n",
    "\n",
    "* The two `.ipynb` files (overwriting the original ones, as usual).\n",
    "* `plot_red_wine.py` - Your Python script\n",
    "* `winequality-red.csv` - The file you downloaded\n",
    "* `red_wine.png` - The plot you generated using `plot_red_wine.py`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}