pm21-dragon/lectures/lecture-04/2 - reading-CSV.ipynb

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "e87e5c8f-564b-4330-b7db-bd4c20451c1b",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "c879ab27-3c6d-4fcf-b3e9-54596e952b9d",
"metadata": {},
"source": [
"# reading CSV files with pure Python\n",
"\n",
"CSV (\"comma separated values\") files are very widely used for storing tables of data. Excel and Google Sheets can read and write CSV files quite easily. They are also \"human readable\" as data files -- you can open one it a simple text viewer program such as TextEdit as see the contents.\n",
"\n",
"Unfortunately they are not totally standard. Here we will open one and read its contents into a dictionary. The dictionary will have one key for each column and a list of values for each row.\n",
"\n",
"For example, with `file.csv` like so:\n",
"\n",
"```\n",
"column 1,column 2\n",
"1,2\n",
"3,4\n",
"```\n",
"\n",
"We would like to extract a dictionary like this:\n",
"\n",
"```python\n",
"{'column 1': [1,2], 'column 2': [3,4]}\n",
"```\n",
"\n",
"We will now do this for the file `iris.csv` which contains the data used above."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d10cc532-f354-4262-8a98-55b4f4fdea64",
"metadata": {},
"outputs": [],
"source": [
"fobj = open(\"iris.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "95fdd89a-6ae0-4ecc-bd0b-1fc34472533a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'sepal_length,sepal_width,petal_length,petal_width,species\\n'\n",
"'5.1,3.5,1.4,0.2,setosa\\n'\n",
"'4.9,3.0,1.4,0.2,setosa\\n'\n",
"'4.7,3.2,1.3,0.2,setosa\\n'\n",
"'4.6,3.1,1.5,0.2,setosa\\n'\n",
"'5.0,3.6,1.4,0.2,setosa\\n'\n",
"'5.4,3.9,1.7,0.4,setosa\\n'\n"
]
}
],
"source": [
"fobj = open(\"iris.csv\")\n",
"for line_num, line in enumerate(fobj.readlines()):\n",
" print(repr(line))\n",
" if line_num > 5:\n",
" break"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c74368aa-143f-4310-bdee-5bdab1802d7a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'sepal_length,sepal_width,petal_length,petal_width,species'\n",
"'5.1,3.5,1.4,0.2,setosa'\n",
"'4.9,3.0,1.4,0.2,setosa'\n",
"'4.7,3.2,1.3,0.2,setosa'\n",
"'4.6,3.1,1.5,0.2,setosa'\n",
"'5.0,3.6,1.4,0.2,setosa'\n",
"'5.4,3.9,1.7,0.4,setosa'\n"
]
}
],
"source": [
"fobj = open(\"iris.csv\")\n",
"for line_num, line in enumerate(fobj.readlines()):\n",
" line = line.strip()\n",
" print(repr(line))\n",
" if line_num > 5:\n",
" break"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d73d37e3-876c-4576-a332-dc876726ad15",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fobj = open(\"iris.csv\")\n",
"iris_dataset_from_csv= {}\n",
"for line_num, line in enumerate(fobj.readlines()):\n",
" line = line.strip()\n",
" entries = line.split(',')\n",
2024-11-08 06:03:56 -05:00
" # print(entries)\n",
" # if line_num > 5:\n",
" # break\n",
2024-11-08 02:55:51 -05:00
" if line_num == 0:\n",
" column_names = entries\n",
" for column_name in column_names:\n",
" iris_dataset_from_csv[column_name] = []\n",
" continue\n",
" # if we are here, we are line_num >= 1 and iris_dataset_from_csv is set up with columns and\n",
" # column_names has our column names in the right order.\n",
" for (column_name, entry) in zip(column_names, entries):\n",
" if column_name != 'species':\n",
" entry = float(entry)\n",
2024-11-08 06:03:56 -05:00
" iris_dataset_from_csv[column_name].append(entry) "
2024-11-08 02:55:51 -05:00
]
},
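{
"cell_type": "markdown",
"id": "editor-sketch-csv-module",
"metadata": {},
"source": [
"Splitting on `,` works for `iris.csv`, but it would break on files that quote fields containing commas. As a hedged alternative sketch, Python's standard-library `csv` module handles such quoting. The variable name `iris_from_csv_module` is made up here, and the code assumes the same `iris.csv` with a header row and a `species` column:\n",
"\n",
"```python\n",
"import csv\n",
"\n",
"iris_from_csv_module = {}\n",
"# newline='' is what the csv module documentation recommends when opening files.\n",
"with open(\"iris.csv\", newline='') as fobj:\n",
"    for row in csv.DictReader(fobj):  # each row maps column name -> string\n",
"        for column_name, entry in row.items():\n",
"            if column_name != 'species':\n",
"                entry = float(entry)\n",
"            # Create the column's list on first use, then append to it.\n",
"            iris_from_csv_module.setdefault(column_name, []).append(entry)\n",
"```\n",
"\n",
"The `with` block also closes the file automatically, which the cells above leave open."
]
},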
{
"cell_type": "code",
"execution_count": 15,
"id": "d803ec2c-dc0e-4456-b040-c44393a2b71c",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAisAAAGdCAYAAADT1TPdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA7v0lEQVR4nO3da2zc1Z3/8c/cb7bHlzjOxHYcQ4tpkoLShBazhS2bJYisVqXiAVrpT0u1W5FdCi0RKg19sFKfZKvy77LVtrDZXVGlqBRpTf6lIlthdZPQinRLaFraGJLQNYnjjJ2L47E9Hs/9/yDEi8k4nnGOx2dm3i/JDzKXn7/n+MTz9W9+cz6OfD6fFwAAgKWcy10AAADA1dCsAAAAq9GsAAAAq9GsAAAAq9GsAAAAq9GsAAAAq9GsAAAAq9GsAAAAq7mXu4Bi5HI5nTlzRvX19XI4HMtdDgAAKEI+n9fk5KRWr14tp3Px50cqolk5c+aMOjs7l7sMAACwCENDQ+ro6Fj08yuiWamvr5d0abANDQ3LXA0AACjGxMSEOjs7Z1/HF6simpXLb/00NDTQrAAAUGGu9RIOLrAFAABWo1kBAABWo1kBAABWo1kBAABWo1kBAABWo1kBAABWo1kBAABWo1kBAABWq4hN4QBgIblcTuemUppJZ+X3uNRa572mLBLb5PN5xVNZZbI5uV1OhbwustJQM0pqVnbt2qWXXnpJ77zzjgKBgG677TZ961vfUk9Pz7zPOXDggO68884rbn/77bd14403ll4xAHzI0MVpvTF4QafHEkpl8vK6HepoDuiW7hZ1NgWXu7xrFkukdfJCXGNTKWVyebmdDjXXedXVElI44Fnu8oAlV1KzcvDgQT388MO65ZZblMlk9I1vfENbt27VwMCAQqHQVZ977NixOVvlt7a2Lq5iAPiAoYvT2vdWVOPTaUXCfgW8LiVSWZ0YjevcZErbbopUdMMSS6T1h+GY4smMmoJeed1OpTI5jcRmNDmT0Yb2MA0Lql5JzcrPfvazOf9+7rnntHLlSr355pu64447rvrclStXqrGxseQCAWA+uVxObwxe0Ph0Wje0/W9QWr3fqXq/R8dHJ3X4vTG1h/0V+ZZQPp/XyQtxxZMZRcKB2dv9Hpci4YCisYROjcW1YXWYt4RQ1a7pf28sFpMkNTc3L/jYjRs3KhKJaMuWLdq/f/9VH5tMJjUxMTHnCwA+7NxUSqfHEoqE/QXvj4T9GrowrXNTqTJXZkY8ldXYVEpNQW/B+5uCXl2YTCmeypa5MqC8Ft2s5PN57dixQ5/+9Ke1YcOGeR8XiUS0e/du9fX16aWXXlJPT4+2bNmi1157bd7n7Nq1S+FwePars7NzsWUCqGIz6axSmbwCXlfB+/0el1KZvGbSlflinsnmlMnl5XUX/lXtcTmVyeWVyebKXBlQXo58Pp9fzBMffvhhvfLKK/rlL3+pjo6Okp77l3/5l3I4HHr55ZcL3p9MJpVMJmf/PTExoc7OTsVisTnXvQCobaMTM3rx16fUGPSq3n/ldRuTM2mNT6d0/yfXqK2h8NkXm00lMzo8OKaQzy2/58qGbCadVTyZ0ebuZtX5+HAn7DMxMaFwOHzNr9+LOrPyyCOP6OWXX9b+/ftLblQk6dZbb9WJEyfmvd/n86mhoWHOFwB8WGudVx3NAUVjMwXvj8Zm1NkSVGtd4bdRbBfyutRc59XF6cJvY12cTqml3qvQPGeWgGpRUiuez+f1yCOPaO/evTpw4IC6u7sX9U2PHDmiSCSyqOcCwGVOp1O3dLfo3GRKx0cnFQn75fe4NJPOKhqbUWPQo81rmyvy4lpJcjgc6moJaXImo2gsoaagVx6XU+lsThenUwr53FrTHOLiWlS9kpqVhx9+WD/60Y/0k5/8RPX19RoZGZEkhcNhBQKXrlTfuXOnhoeHtWfPHknS008/rbVr12r9+vVKpVJ6/vnn1dfXp76+PsNDAVCLOp
uC2nZT5Ip9Vm5YVafNa5sr+mPLkhQOeLShPXzFPiuRRr/WNLPPCmpDSc3KM888I0n6zGc+M+f25557Tg8++KAkKRqN6tSpU7P3pVIpPf744xoeHlYgEND69ev1yiuvaNu2bddWOQC8r7MpqPawv2p3sA0HPPp4e5gdbFGzFn2BbTmZukAHAACUz7JeYAsAAFAuNCsAAMBqfDAfMKza03EZH4Byo1kBDKr2dFzGB2A50KwAhlR7Oi7jq+zxAZWMa1YAAz6cjuv3uOR0OGbTcePJjE6NxVUBH74riPFV9viASkezAhhQ7em4jK+yxwdUOpoVwIBqT8dlfJU9PqDS0awABrhdTrmdDqUyhV/M0tmc3E6H3K7K/C/H+Cp7fECl438eYEC1p+MyvsoeH1DpaFYAAy6n44Z8bkVjCc2ks8rm8u+n/yYqPh2X8VX2+IBKx0eXAUOqPR2X8VX2+IBKRrMCGFTt6biMD8ByoFkBDHM4HKrzVe9/LcYHoNy4ZgUAAFiNZgUAAFiNc52oWKTjFi+Xy+ncVEoz6az8Hpda67xyOhf3t4rJec9mszo5llA8mVHI51ZXc0Au1/J+PJh1BdiHZgUViXTc4g1dnNYbgxd0eiyhVCYvr9uhjuaAbuluUWdTsKRjmZz3gWhM/UdHNHh+WulMTh63U90rgrpr/Sqti4RLOpYprCvATjQrqDik4xZv6OK09r0V1fh0WpGwXwGvS4lUVidG4zo3mdK2myJFNywm530gGtOe109qPJ5We5NfIZ9L8WRWb5+ZUjR2Up+/ravsDQvrCrAX16ygopCOW7xcLqc3Bi9ofDqtG9rqVe/3yO10qt7v0Q1t9RqfTuvwe2PK5RbOuzE579lsVv1HRzQeT2vd6gaFA165nS6FA16tW92g8XhaPx8YVTZbvtBA1hVgN5oVVBTScYt3biql02MJRcL+gvdHwn4NXZjWuanCW8x/kMl5PzmW0OD5abU3Fa6rvcmvP56L6+RYYsFjmcK6AuxGs4KKQjpu8WbSWaUyeQXmybPxe1xKZS5tKb8Qk/MeT2aUzuQU8hWuK+B1KZ3JKZ7MLHgsU1hXgN1oVlBRSMctnt/jktftUGKeswEz6ay87ktvdSzE5LyHfG553E7Fk4XrSqSy8ridCpVxYzbWFWA3/uehopCOW7zWOq86mgOKxmYK3h+NzaizJajWusJvfXyQyXnvag6oe0VQwxcL1zV8cUbXt4bU1RxY8FimsK4Au9GsoKKQjls8p9OpW7pb1Bj06PjopCZn0kpnc5qcSev46KQagx5tXttc1H4rJufd5XLprvWr1BjyaODMhGKJlFLZrGKJlAbOTKgx5NGWdW1l3W+FdQXYzZGvgMvbJyYmFA6HFYvF1NDQsNzlwAKF9sNoqfeSjltAoX1WOluC2ry22cg+K4ud90L7rFzfGtKWdW1W7bPCugIWz9TrN80KKhY7jRaPHWyLx7oCzDH1+s2mcKhYpOMWz+l0qq2h8EeFS2Vy3l0ul65rrTNyLFNYV4B9uGYFAABYjWYFAABYjXOdgGE2XvNgY02oDqwtlAPNCmCQjam9NtaE6sDaQrnQrACG2Jjaa2NNqA6sLZQT16wABtiY2mtjTagOrC2UG80KYICNqb021oTqwNpCudGsAAbYmNprY02oDqwtlBvNCmCAjam9NtaE6sDaQrmxkgADbEzttbEmVAfWFsqNZgUwwMbUXhtrQnVgbaHc+OgyYEg44NGG9vAV+05EGv3LltprY02oDqwtlBPNCmBQOODRx9vDVu3oaWNNqA6sLZQLzQpgmI2pvTbWhOrA2kI5cM0KAACwGs0KAACwGufuAMNsTKHNZrM6OZZQPJlRyOdWV3NALtfiPlZq4/hsrgvAtaNZAQyyMYV2IBpT/9ERDZ6fVjqTk8ftVPeKoO5av0rrIuGSjmXj+GyuC4AZNCuAITam0A5EY9rz+kmNx9Nqb/Ir5HMpnszq7TNTisZO6vO3dRXdsNg4PpvrAm
AO16wABtiYQpvNZtV/dETj8bTWrW5QOOCV2+lSOODVutUNGo+n9fOBUWWzC4fN2Tg+m+sCYBbNCmCAjSm0J8cSGjw
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.plot(iris_dataset_from_csv['sepal_width'], iris_dataset_from_csv['petal_width'],'o', alpha=0.2);"
]
},
{
"cell_type": "markdown",
"id": "d6539f9f-726b-4f49-84d0-54356022b572",
"metadata": {},
"source": [
"# Using matplotlib from Python programs (not in Jupyter):\n",
"\n",
"Open an interactive window:\n",
"\n",
"```python\n",
"plt.show()\n",
"```\n",
"\n",
"Save figure to file\n",
"\n",
"```python\n",
"plt.savefig(\"plot_filename.png\")\n",
"```"
]
},
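{
"cell_type": "markdown",
"id": "editor-sketch-savefig-script",
"metadata": {},
"source": [
"Putting the two calls into a standalone program might look like the sketch below. The file name `plot_iris.py` is made up for illustration, and the short lists are placeholders copied from the first rows printed earlier; in practice they would come from the CSV-reading code:\n",
"\n",
"```python\n",
"# plot_iris.py (hypothetical file name); run with: python plot_iris.py\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Placeholder values standing in for two columns of the dataset.\n",
"sepal_width = [3.5, 3.0, 3.2, 3.1]\n",
"petal_width = [0.2, 0.2, 0.2, 0.2]\n",
"\n",
"plt.plot(sepal_width, petal_width, 'o', alpha=0.2)\n",
"plt.xlabel('sepal_width')\n",
"plt.ylabel('petal_width')\n",
"plt.savefig(\"plot_filename.png\")  # write the file before showing the window\n",
"plt.show()  # opens the interactive window and waits until it is closed\n",
"```\n",
"\n",
"Calling `plt.savefig()` before `plt.show()` is the safer order, because the figure may no longer be available once the window has been closed."
]
},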
{
"cell_type": "markdown",
"id": "d00eb38c-03cc-480c-aefd-6a21d54c7325",
"metadata": {},
"source": [
"Live demo of all the above to call `plt.savefig()`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}