{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas\n",
"\n",
"![pandas logo](https://pandas.pydata.org/static/img/pandas.svg)\n",
"\n",
"\"high-performance, easy-to-use data structures and data analysis tools\" https://pandas.pydata.org/\n",
"\n",
"Pandas is typically imported as `pd`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# We will start with the Iris dataset from a previous lecture\n",
"iris_dataset = {'sepal length (cm)': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9], 'sepal width (cm)': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0], 'petal length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 
5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1], 'petal width (cm)': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8], 'species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'setosa', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 
'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolor', 'versicolo
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# pandas `DataFrame`\n",
"\n",
"The central data structure in pandas is the `DataFrame`. A `DataFrame` is a type, conceptually related to a NumPy array, for holding large amounts of data and operating on it efficiently. A `DataFrame`, however, typically has more structure. It is always two-dimensional and is made up of multiple named columns: every element within a column has the same data type, but different columns may have different data types. The easiest way to think about a `DataFrame` is as a well-organized spreadsheet. Indeed, `DataFrame`s are great for the kind of calculations you might do in a spreadsheet.\n",
"\n",
"## Creation\n",
"\n",
"One way to create a pandas `DataFrame` is to call its constructor, `DataFrame()`. Given a single argument, a dictionary, it creates a new `DataFrame` instance with one column per item in the dict. The dict key becomes the column name and the dict value (a Python sequence) becomes the column's data. Pandas infers the data type of each column. All sequences in the dict must have the same length so that every column in the `DataFrame` has the same number of rows.\n",
"\n",
"As an example, let's load our Iris dataset."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(iris_dataset)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
".. ... ... ... ... \n",
"145 6.7 3.0 5.2 2.3 \n",
"146 6.3 2.5 5.0 1.9 \n",
"147 6.5 3.0 5.2 2.0 \n",
"148 6.2 3.4 5.4 2.3 \n",
"149 5.9 3.0 5.1 1.8 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa \n",
".. ... \n",
"145 virginica \n",
"146 virginica \n",
"147 virginica \n",
"148 virginica \n",
"149 virginica \n",
"\n",
"[150 rows x 5 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
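{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the constructor rules described above, using a small made-up dictionary. The name `toy` and its values are illustrative and are not part of the Iris data. It prints the shape and the inferred column types, then shows that sequences of unequal length are rejected with a `ValueError`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: two columns of equal length.\n",
"toy = pd.DataFrame({\n",
"    'length (cm)': [5.1, 4.9, 4.7],\n",
"    'species': ['setosa', 'setosa', 'virginica'],\n",
"})\n",
"print(toy.shape)   # (number of rows, number of columns)\n",
"print(toy.dtypes)  # one inferred data type per column\n",
"\n",
"# Sequences of different lengths are rejected by the constructor.\n",
"try:\n",
"    pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2]})\n",
"except ValueError as err:\n",
"    print('ValueError:', err)"
]
},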
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `head()` and `tail()` methods each return a new dataframe containing a subset of the original: the first rows and the last rows (five by default), respectively:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"145 6.7 3.0 5.2 2.3 \n",
"146 6.3 2.5 5.0 1.9 \n",
"147 6.5 3.0 5.2 2.0 \n",
"148 6.2 3.4 5.4 2.3 \n",
"149 5.9 3.0 5.1 1.8 \n",
"\n",
" species \n",
"145 virginica \n",
"146 virginica \n",
"147 virginica \n",
"148 virginica \n",
"149 virginica "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
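{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both methods also accept an optional row count `n`. A small sketch on a made-up frame follows; the names `toy`, `first_three` and `last_two` are hypothetical and used only here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame with ten rows (index 0 to 9).\n",
"toy = pd.DataFrame({\n",
"    'x': list(range(10)),\n",
"    'y': [v * v for v in range(10)],\n",
"})\n",
"\n",
"first_three = toy.head(3)  # rows with index 0, 1, 2\n",
"last_two = toy.tail(2)     # rows with index 8, 9\n",
"print(first_three)\n",
"print(last_two)\n",
"print(type(first_three))   # the result is itself a DataFrame"
]
},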
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that Jupyter and pandas work together to produce the nicely formatted output you see above. Here is what plain `print()` gives:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"145 6.7 3.0 5.2 2.3 \n",
"146 6.3 2.5 5.0 1.9 \n",
"147 6.5 3.0 5.2 2.0 \n",
"148 6.2 3.4 5.4 2.3 \n",
"149 5.9 3.0 5.1 1.8 \n",
"\n",
" species \n",
"145 virginica \n",
"146 virginica \n",
"147 virginica \n",
"148 virginica \n",
"149 virginica \n"
]
}
],
"source": [
"print(df.tail())"
]
},
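{
"cell_type": "markdown",
"metadata": {},
"source": [
"What's happening behind the scenes is that Jupyter shows the last value of a cell using the `display()` function from IPython, and pandas tells IPython how to render a `DataFrame` as an HTML table. We can also call `display()` ourselves:"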
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(df.head())"
]
},
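{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch of the mechanism, assuming that IPython's rich display looks for a `_repr_html_()` method on the object (a pandas `DataFrame` provides one), while `print()` uses the plain-text form. The frame `toy` is made up for illustration, and calling the underscore method directly is only for demonstration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical two-row frame.\n",
"toy = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})\n",
"\n",
"html = toy._repr_html_()  # HTML string behind the table view\n",
"text = str(toy)           # plain text, as shown by print()\n",
"print(html[:60])\n",
"print(text)"
]
},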
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The `DataFrame.groupby()` method\n",
"\n",
"One of the most useful features of dataframes is the `groupby()` method. It returns an object that you can iterate over to step through the original dataframe in subsets (groups), each containing the rows that share a common value. An example will make this clearer.\n",
"\n",
"Here we step through our original dataframe, grouping by species. On each iteration, the object returned by `groupby()` yields a tuple of `(group_value, group_data_frame)`. Let's look at this in action:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"species: Iris setosa\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"species: Iris versicolor\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>7.0</td>\n",
" <td>3.2</td>\n",
" <td>4.7</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>6.4</td>\n",
" <td>3.2</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>4.9</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>5.5</td>\n",
" <td>2.3</td>\n",
" <td>4.0</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>6.5</td>\n",
" <td>2.8</td>\n",
" <td>4.6</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"50 7.0 3.2 4.7 1.4 \n",
"51 6.4 3.2 4.5 1.5 \n",
"52 6.9 3.1 4.9 1.5 \n",
"53 5.5 2.3 4.0 1.3 \n",
"54 6.5 2.8 4.6 1.5 \n",
"\n",
" species \n",
"50 versicolor \n",
"51 versicolor \n",
"52 versicolor \n",
"53 versicolor \n",
"54 versicolor "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"species: Iris virginica\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>6.3</td>\n",
" <td>3.3</td>\n",
" <td>6.0</td>\n",
" <td>2.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>7.1</td>\n",
" <td>3.0</td>\n",
" <td>5.9</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>6.3</td>\n",
" <td>2.9</td>\n",
" <td>5.6</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.8</td>\n",
" <td>2.2</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"100 6.3 3.3 6.0 2.5 \n",
"101 5.8 2.7 5.1 1.9 \n",
"102 7.1 3.0 5.9 2.1 \n",
"103 6.3 2.9 5.6 1.8 \n",
"104 6.5 3.0 5.8 2.2 \n",
"\n",
" species \n",
"100 virginica \n",
"101 virginica \n",
"102 virginica \n",
"103 virginica \n",
"104 virginica "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"my_iterator = df.groupby('species')\n",
"for x in my_iterator:\n",
"    species, gdf = x\n",
"    print(f\"species: Iris {species}\")\n",
"    display(gdf.head())"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"species: Iris setosa\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"species: Iris versicolor\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>7.0</td>\n",
" <td>3.2</td>\n",
" <td>4.7</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>6.4</td>\n",
" <td>3.2</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>4.9</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>5.5</td>\n",
" <td>2.3</td>\n",
" <td>4.0</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>6.5</td>\n",
" <td>2.8</td>\n",
" <td>4.6</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"50 7.0 3.2 4.7 1.4 \n",
"51 6.4 3.2 4.5 1.5 \n",
"52 6.9 3.1 4.9 1.5 \n",
"53 5.5 2.3 4.0 1.3 \n",
"54 6.5 2.8 4.6 1.5 \n",
"\n",
" species \n",
"50 versicolor \n",
"51 versicolor \n",
"52 versicolor \n",
"53 versicolor \n",
"54 versicolor "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"species: Iris virginica\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>6.3</td>\n",
" <td>3.3</td>\n",
" <td>6.0</td>\n",
" <td>2.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>7.1</td>\n",
" <td>3.0</td>\n",
" <td>5.9</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>6.3</td>\n",
" <td>2.9</td>\n",
" <td>5.6</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.8</td>\n",
" <td>2.2</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"100 6.3 3.3 6.0 2.5 \n",
"101 5.8 2.7 5.1 1.9 \n",
"102 7.1 3.0 5.9 2.1 \n",
"103 6.3 2.9 5.6 1.8 \n",
"104 6.5 3.0 5.8 2.2 \n",
"\n",
" species \n",
"100 virginica \n",
"101 virginica \n",
"102 virginica \n",
"103 virginica \n",
"104 virginica "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for species, gdf in df.groupby('species'):\n",
"    print(f\"species: Iris {species}\")\n",
"    display(gdf.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a closer look at this iteration aspect:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.groupby.generic.DataFrameGroupBy'>\n"
]
}
],
"source": [
"my_iter = df.groupby('species')\n",
"print(type(my_iter))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"setosa 50\n",
"versicolor 50\n",
"virginica 50\n"
]
}
],
"source": [
"my_iter = df.groupby('species')\n",
"for x in my_iter:\n",
"    # species = x[0]\n",
"    # gdf = x[1]\n",
"    species, gdf = x\n",
"    # (species, gdf) = x\n",
"    print(species, len(gdf))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"setosa 50\n",
"versicolor 50\n",
"virginica 50\n"
]
}
],
"source": [
"for species, gdf in df.groupby('species'):\n",
"    print(species, len(gdf))"
]
},
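{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of what the `for` loop does under the hood, on a made-up frame. The names `toy`, `grouped`, `groups` and the column names are illustrative. It assumes the default behaviour of `groupby()`, which sorts the group keys, and it also collects all groups into a dict for later lookup."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame with two group values, 'a' and 'b'.\n",
"toy = pd.DataFrame({\n",
"    'kind': ['b', 'a', 'b', 'a', 'a'],\n",
"    'value': [1, 2, 3, 4, 5],\n",
"})\n",
"\n",
"grouped = toy.groupby('kind')\n",
"it = iter(grouped)   # what a for loop calls implicitly\n",
"key, sub = next(it)  # first (group_value, group_data_frame) pair\n",
"print(key, len(sub))\n",
"print(sub)\n",
"\n",
"# Collect every group into a dict keyed by group value.\n",
"groups = {k: g for k, g in toy.groupby('kind')}\n",
"print(sorted(groups))"
]
},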
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== setosa ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5.4</td>\n",
" <td>3.9</td>\n",
" <td>1.7</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>4.6</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>5.0</td>\n",
" <td>3.4</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4.4</td>\n",
" <td>2.9</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>4.9</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>5.4</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>4.8</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>4.3</td>\n",
" <td>3.0</td>\n",
" <td>1.1</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>5.8</td>\n",
" <td>4.0</td>\n",
" <td>1.2</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>5.7</td>\n",
" <td>4.4</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>5.4</td>\n",
" <td>3.9</td>\n",
" <td>1.3</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>5.7</td>\n",
" <td>3.8</td>\n",
" <td>1.7</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>5.1</td>\n",
" <td>3.8</td>\n",
" <td>1.5</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>5.4</td>\n",
" <td>3.4</td>\n",
" <td>1.7</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>5.1</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>4.6</td>\n",
" <td>3.6</td>\n",
" <td>1.0</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>5.1</td>\n",
" <td>3.3</td>\n",
" <td>1.7</td>\n",
" <td>0.5</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>4.8</td>\n",
" <td>3.4</td>\n",
" <td>1.9</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>5.0</td>\n",
" <td>3.0</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>5.0</td>\n",
" <td>3.4</td>\n",
" <td>1.6</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>5.2</td>\n",
" <td>3.5</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>5.2</td>\n",
" <td>3.4</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>4.8</td>\n",
" <td>3.1</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>5.4</td>\n",
" <td>3.4</td>\n",
" <td>1.5</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>5.2</td>\n",
" <td>4.1</td>\n",
" <td>1.5</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>5.5</td>\n",
" <td>4.2</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>4.9</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>5.0</td>\n",
" <td>3.2</td>\n",
" <td>1.2</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>5.5</td>\n",
" <td>3.5</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>4.9</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.1</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>4.4</td>\n",
" <td>3.0</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>5.1</td>\n",
" <td>3.4</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>5.0</td>\n",
" <td>3.5</td>\n",
" <td>1.3</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>4.5</td>\n",
" <td>2.3</td>\n",
" <td>1.3</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>4.4</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>5.0</td>\n",
" <td>3.5</td>\n",
" <td>1.6</td>\n",
" <td>0.6</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>5.1</td>\n",
" <td>3.8</td>\n",
" <td>1.9</td>\n",
" <td>0.4</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>4.8</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.3</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>5.1</td>\n",
" <td>3.8</td>\n",
" <td>1.6</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>4.6</td>\n",
" <td>3.2</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>5.3</td>\n",
" <td>3.7</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>5.0</td>\n",
" <td>3.3</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"0 5.1 3.5 1.4 0.2 \n",
"1 4.9 3.0 1.4 0.2 \n",
"2 4.7 3.2 1.3 0.2 \n",
"3 4.6 3.1 1.5 0.2 \n",
"4 5.0 3.6 1.4 0.2 \n",
"5 5.4 3.9 1.7 0.4 \n",
"6 4.6 3.4 1.4 0.3 \n",
"7 5.0 3.4 1.5 0.2 \n",
"8 4.4 2.9 1.4 0.2 \n",
"9 4.9 3.1 1.5 0.1 \n",
"10 5.4 3.7 1.5 0.2 \n",
"11 4.8 3.4 1.6 0.2 \n",
"12 4.8 3.0 1.4 0.1 \n",
"13 4.3 3.0 1.1 0.1 \n",
"14 5.8 4.0 1.2 0.2 \n",
"15 5.7 4.4 1.5 0.4 \n",
"16 5.4 3.9 1.3 0.4 \n",
"17 5.1 3.5 1.4 0.3 \n",
"18 5.7 3.8 1.7 0.3 \n",
"19 5.1 3.8 1.5 0.3 \n",
"20 5.4 3.4 1.7 0.2 \n",
"21 5.1 3.7 1.5 0.4 \n",
"22 4.6 3.6 1.0 0.2 \n",
"23 5.1 3.3 1.7 0.5 \n",
"24 4.8 3.4 1.9 0.2 \n",
"25 5.0 3.0 1.6 0.2 \n",
"26 5.0 3.4 1.6 0.4 \n",
"27 5.2 3.5 1.5 0.2 \n",
"28 5.2 3.4 1.4 0.2 \n",
"29 4.7 3.2 1.6 0.2 \n",
"30 4.8 3.1 1.6 0.2 \n",
"31 5.4 3.4 1.5 0.4 \n",
"32 5.2 4.1 1.5 0.1 \n",
"33 5.5 4.2 1.4 0.2 \n",
"34 4.9 3.1 1.5 0.2 \n",
"35 5.0 3.2 1.2 0.2 \n",
"36 5.5 3.5 1.3 0.2 \n",
"37 4.9 3.6 1.4 0.1 \n",
"38 4.4 3.0 1.3 0.2 \n",
"39 5.1 3.4 1.5 0.2 \n",
"40 5.0 3.5 1.3 0.3 \n",
"41 4.5 2.3 1.3 0.3 \n",
"42 4.4 3.2 1.3 0.2 \n",
"43 5.0 3.5 1.6 0.6 \n",
"44 5.1 3.8 1.9 0.4 \n",
"45 4.8 3.0 1.4 0.3 \n",
"46 5.1 3.8 1.6 0.2 \n",
"47 4.6 3.2 1.4 0.2 \n",
"48 5.3 3.7 1.5 0.2 \n",
"49 5.0 3.3 1.4 0.2 \n",
"\n",
" species \n",
"0 setosa \n",
"1 setosa \n",
"2 setosa \n",
"3 setosa \n",
"4 setosa \n",
"5 setosa \n",
"6 setosa \n",
"7 setosa \n",
"8 setosa \n",
"9 setosa \n",
"10 setosa \n",
"11 setosa \n",
"12 setosa \n",
"13 setosa \n",
"14 setosa \n",
"15 setosa \n",
"16 setosa \n",
"17 setosa \n",
"18 setosa \n",
"19 setosa \n",
"20 setosa \n",
"21 setosa \n",
"22 setosa \n",
"23 setosa \n",
"24 setosa \n",
"25 setosa \n",
"26 setosa \n",
"27 setosa \n",
"28 setosa \n",
"29 setosa \n",
"30 setosa \n",
"31 setosa \n",
"32 setosa \n",
"33 setosa \n",
"34 setosa \n",
"35 setosa \n",
"36 setosa \n",
"37 setosa \n",
"38 setosa \n",
"39 setosa \n",
"40 setosa \n",
"41 setosa \n",
"42 setosa \n",
"43 setosa \n",
"44 setosa \n",
"45 setosa \n",
"46 setosa \n",
"47 setosa \n",
"48 setosa \n",
"49 setosa "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== versicolor ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>7.0</td>\n",
" <td>3.2</td>\n",
" <td>4.7</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>6.4</td>\n",
" <td>3.2</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>4.9</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>5.5</td>\n",
" <td>2.3</td>\n",
" <td>4.0</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>6.5</td>\n",
" <td>2.8</td>\n",
" <td>4.6</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>5.7</td>\n",
" <td>2.8</td>\n",
" <td>4.5</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>6.3</td>\n",
" <td>3.3</td>\n",
" <td>4.7</td>\n",
" <td>1.6</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>4.9</td>\n",
" <td>2.4</td>\n",
" <td>3.3</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>6.6</td>\n",
" <td>2.9</td>\n",
" <td>4.6</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>5.2</td>\n",
" <td>2.7</td>\n",
" <td>3.9</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>5.0</td>\n",
" <td>2.0</td>\n",
" <td>3.5</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>4.2</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>6.0</td>\n",
" <td>2.2</td>\n",
" <td>4.0</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>63</th>\n",
" <td>6.1</td>\n",
" <td>2.9</td>\n",
" <td>4.7</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>64</th>\n",
" <td>5.6</td>\n",
" <td>2.9</td>\n",
" <td>3.6</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65</th>\n",
" <td>6.7</td>\n",
" <td>3.1</td>\n",
" <td>4.4</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>66</th>\n",
" <td>5.6</td>\n",
" <td>3.0</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>4.1</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68</th>\n",
" <td>6.2</td>\n",
" <td>2.2</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>5.6</td>\n",
" <td>2.5</td>\n",
" <td>3.9</td>\n",
" <td>1.1</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>5.9</td>\n",
" <td>3.2</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>6.1</td>\n",
" <td>2.8</td>\n",
" <td>4.0</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>4.9</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>6.1</td>\n",
" <td>2.8</td>\n",
" <td>4.7</td>\n",
" <td>1.2</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>6.4</td>\n",
" <td>2.9</td>\n",
" <td>4.3</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75</th>\n",
" <td>6.6</td>\n",
" <td>3.0</td>\n",
" <td>4.4</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>76</th>\n",
" <td>6.8</td>\n",
" <td>2.8</td>\n",
" <td>4.8</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>77</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.0</td>\n",
" <td>1.7</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78</th>\n",
" <td>6.0</td>\n",
" <td>2.9</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>79</th>\n",
" <td>5.7</td>\n",
" <td>2.6</td>\n",
" <td>3.5</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>80</th>\n",
" <td>5.5</td>\n",
" <td>2.4</td>\n",
" <td>3.8</td>\n",
" <td>1.1</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>5.5</td>\n",
" <td>2.4</td>\n",
" <td>3.7</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>3.9</td>\n",
" <td>1.2</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>6.0</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.6</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>5.4</td>\n",
" <td>3.0</td>\n",
" <td>4.5</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>6.0</td>\n",
" <td>3.4</td>\n",
" <td>4.5</td>\n",
" <td>1.6</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>6.7</td>\n",
" <td>3.1</td>\n",
" <td>4.7</td>\n",
" <td>1.5</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>6.3</td>\n",
" <td>2.3</td>\n",
" <td>4.4</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>5.6</td>\n",
" <td>3.0</td>\n",
" <td>4.1</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>5.5</td>\n",
" <td>2.5</td>\n",
" <td>4.0</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>5.5</td>\n",
" <td>2.6</td>\n",
" <td>4.4</td>\n",
" <td>1.2</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>6.1</td>\n",
" <td>3.0</td>\n",
" <td>4.6</td>\n",
" <td>1.4</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>5.8</td>\n",
" <td>2.6</td>\n",
" <td>4.0</td>\n",
" <td>1.2</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>5.0</td>\n",
" <td>2.3</td>\n",
" <td>3.3</td>\n",
" <td>1.0</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>5.6</td>\n",
" <td>2.7</td>\n",
" <td>4.2</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>5.7</td>\n",
" <td>3.0</td>\n",
" <td>4.2</td>\n",
" <td>1.2</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>5.7</td>\n",
" <td>2.9</td>\n",
" <td>4.2</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>6.2</td>\n",
" <td>2.9</td>\n",
" <td>4.3</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>5.1</td>\n",
" <td>2.5</td>\n",
" <td>3.0</td>\n",
" <td>1.1</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>5.7</td>\n",
" <td>2.8</td>\n",
" <td>4.1</td>\n",
" <td>1.3</td>\n",
" <td>versicolor</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"50 7.0 3.2 4.7 1.4 \n",
"51 6.4 3.2 4.5 1.5 \n",
"52 6.9 3.1 4.9 1.5 \n",
"53 5.5 2.3 4.0 1.3 \n",
"54 6.5 2.8 4.6 1.5 \n",
"55 5.7 2.8 4.5 1.3 \n",
"56 6.3 3.3 4.7 1.6 \n",
"57 4.9 2.4 3.3 1.0 \n",
"58 6.6 2.9 4.6 1.3 \n",
"59 5.2 2.7 3.9 1.4 \n",
"60 5.0 2.0 3.5 1.0 \n",
"61 5.9 3.0 4.2 1.5 \n",
"62 6.0 2.2 4.0 1.0 \n",
"63 6.1 2.9 4.7 1.4 \n",
"64 5.6 2.9 3.6 1.3 \n",
"65 6.7 3.1 4.4 1.4 \n",
"66 5.6 3.0 4.5 1.5 \n",
"67 5.8 2.7 4.1 1.0 \n",
"68 6.2 2.2 4.5 1.5 \n",
"69 5.6 2.5 3.9 1.1 \n",
"70 5.9 3.2 4.8 1.8 \n",
"71 6.1 2.8 4.0 1.3 \n",
"72 6.3 2.5 4.9 1.5 \n",
"73 6.1 2.8 4.7 1.2 \n",
"74 6.4 2.9 4.3 1.3 \n",
"75 6.6 3.0 4.4 1.4 \n",
"76 6.8 2.8 4.8 1.4 \n",
"77 6.7 3.0 5.0 1.7 \n",
"78 6.0 2.9 4.5 1.5 \n",
"79 5.7 2.6 3.5 1.0 \n",
"80 5.5 2.4 3.8 1.1 \n",
"81 5.5 2.4 3.7 1.0 \n",
"82 5.8 2.7 3.9 1.2 \n",
"83 6.0 2.7 5.1 1.6 \n",
"84 5.4 3.0 4.5 1.5 \n",
"85 6.0 3.4 4.5 1.6 \n",
"86 6.7 3.1 4.7 1.5 \n",
"87 6.3 2.3 4.4 1.3 \n",
"88 5.6 3.0 4.1 1.3 \n",
"89 5.5 2.5 4.0 1.3 \n",
"90 5.5 2.6 4.4 1.2 \n",
"91 6.1 3.0 4.6 1.4 \n",
"92 5.8 2.6 4.0 1.2 \n",
"93 5.0 2.3 3.3 1.0 \n",
"94 5.6 2.7 4.2 1.3 \n",
"95 5.7 3.0 4.2 1.2 \n",
"96 5.7 2.9 4.2 1.3 \n",
"97 6.2 2.9 4.3 1.3 \n",
"98 5.1 2.5 3.0 1.1 \n",
"99 5.7 2.8 4.1 1.3 \n",
"\n",
" species \n",
"50 versicolor \n",
"51 versicolor \n",
"52 versicolor \n",
"53 versicolor \n",
"54 versicolor \n",
"55 versicolor \n",
"56 versicolor \n",
"57 versicolor \n",
"58 versicolor \n",
"59 versicolor \n",
"60 versicolor \n",
"61 versicolor \n",
"62 versicolor \n",
"63 versicolor \n",
"64 versicolor \n",
"65 versicolor \n",
"66 versicolor \n",
"67 versicolor \n",
"68 versicolor \n",
"69 versicolor \n",
"70 versicolor \n",
"71 versicolor \n",
"72 versicolor \n",
"73 versicolor \n",
"74 versicolor \n",
"75 versicolor \n",
"76 versicolor \n",
"77 versicolor \n",
"78 versicolor \n",
"79 versicolor \n",
"80 versicolor \n",
"81 versicolor \n",
"82 versicolor \n",
"83 versicolor \n",
"84 versicolor \n",
"85 versicolor \n",
"86 versicolor \n",
"87 versicolor \n",
"88 versicolor \n",
"89 versicolor \n",
"90 versicolor \n",
"91 versicolor \n",
"92 versicolor \n",
"93 versicolor \n",
"94 versicolor \n",
"95 versicolor \n",
"96 versicolor \n",
"97 versicolor \n",
"98 versicolor \n",
"99 versicolor "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== virginica ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>6.3</td>\n",
" <td>3.3</td>\n",
" <td>6.0</td>\n",
" <td>2.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>7.1</td>\n",
" <td>3.0</td>\n",
" <td>5.9</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>6.3</td>\n",
" <td>2.9</td>\n",
" <td>5.6</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.8</td>\n",
" <td>2.2</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>105</th>\n",
" <td>7.6</td>\n",
" <td>3.0</td>\n",
" <td>6.6</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>4.9</td>\n",
" <td>2.5</td>\n",
" <td>4.5</td>\n",
" <td>1.7</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>107</th>\n",
" <td>7.3</td>\n",
" <td>2.9</td>\n",
" <td>6.3</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>6.7</td>\n",
" <td>2.5</td>\n",
" <td>5.8</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>7.2</td>\n",
" <td>3.6</td>\n",
" <td>6.1</td>\n",
" <td>2.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>110</th>\n",
" <td>6.5</td>\n",
" <td>3.2</td>\n",
" <td>5.1</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>6.4</td>\n",
" <td>2.7</td>\n",
" <td>5.3</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112</th>\n",
" <td>6.8</td>\n",
" <td>3.0</td>\n",
" <td>5.5</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>113</th>\n",
" <td>5.7</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>114</th>\n",
" <td>5.8</td>\n",
" <td>2.8</td>\n",
" <td>5.1</td>\n",
" <td>2.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115</th>\n",
" <td>6.4</td>\n",
" <td>3.2</td>\n",
" <td>5.3</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>116</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.5</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>117</th>\n",
" <td>7.7</td>\n",
" <td>3.8</td>\n",
" <td>6.7</td>\n",
" <td>2.2</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>118</th>\n",
" <td>7.7</td>\n",
" <td>2.6</td>\n",
" <td>6.9</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>119</th>\n",
" <td>6.0</td>\n",
" <td>2.2</td>\n",
" <td>5.0</td>\n",
" <td>1.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>120</th>\n",
" <td>6.9</td>\n",
" <td>3.2</td>\n",
" <td>5.7</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121</th>\n",
" <td>5.6</td>\n",
" <td>2.8</td>\n",
" <td>4.9</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>122</th>\n",
" <td>7.7</td>\n",
" <td>2.8</td>\n",
" <td>6.7</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123</th>\n",
" <td>6.3</td>\n",
" <td>2.7</td>\n",
" <td>4.9</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124</th>\n",
" <td>6.7</td>\n",
" <td>3.3</td>\n",
" <td>5.7</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>125</th>\n",
" <td>7.2</td>\n",
" <td>3.2</td>\n",
" <td>6.0</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>126</th>\n",
" <td>6.2</td>\n",
" <td>2.8</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>127</th>\n",
" <td>6.1</td>\n",
" <td>3.0</td>\n",
" <td>4.9</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128</th>\n",
" <td>6.4</td>\n",
" <td>2.8</td>\n",
" <td>5.6</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>129</th>\n",
" <td>7.2</td>\n",
" <td>3.0</td>\n",
" <td>5.8</td>\n",
" <td>1.6</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>7.4</td>\n",
" <td>2.8</td>\n",
" <td>6.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>7.9</td>\n",
" <td>3.8</td>\n",
" <td>6.4</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>132</th>\n",
" <td>6.4</td>\n",
" <td>2.8</td>\n",
" <td>5.6</td>\n",
" <td>2.2</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>6.3</td>\n",
" <td>2.8</td>\n",
" <td>5.1</td>\n",
" <td>1.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134</th>\n",
" <td>6.1</td>\n",
" <td>2.6</td>\n",
" <td>5.6</td>\n",
" <td>1.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>135</th>\n",
" <td>7.7</td>\n",
" <td>3.0</td>\n",
" <td>6.1</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>136</th>\n",
" <td>6.3</td>\n",
" <td>3.4</td>\n",
" <td>5.6</td>\n",
" <td>2.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>137</th>\n",
" <td>6.4</td>\n",
" <td>3.1</td>\n",
" <td>5.5</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>138</th>\n",
" <td>6.0</td>\n",
" <td>3.0</td>\n",
" <td>4.8</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.4</td>\n",
" <td>2.1</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>140</th>\n",
" <td>6.7</td>\n",
" <td>3.1</td>\n",
" <td>5.6</td>\n",
" <td>2.4</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>6.9</td>\n",
" <td>3.1</td>\n",
" <td>5.1</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>142</th>\n",
" <td>5.8</td>\n",
" <td>2.7</td>\n",
" <td>5.1</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>143</th>\n",
" <td>6.8</td>\n",
" <td>3.2</td>\n",
" <td>5.9</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144</th>\n",
" <td>6.7</td>\n",
" <td>3.3</td>\n",
" <td>5.7</td>\n",
" <td>2.5</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n",
"100 6.3 3.3 6.0 2.5 \n",
"101 5.8 2.7 5.1 1.9 \n",
"102 7.1 3.0 5.9 2.1 \n",
"103 6.3 2.9 5.6 1.8 \n",
"104 6.5 3.0 5.8 2.2 \n",
"105 7.6 3.0 6.6 2.1 \n",
"106 4.9 2.5 4.5 1.7 \n",
"107 7.3 2.9 6.3 1.8 \n",
"108 6.7 2.5 5.8 1.8 \n",
"109 7.2 3.6 6.1 2.5 \n",
"110 6.5 3.2 5.1 2.0 \n",
"111 6.4 2.7 5.3 1.9 \n",
"112 6.8 3.0 5.5 2.1 \n",
"113 5.7 2.5 5.0 2.0 \n",
"114 5.8 2.8 5.1 2.4 \n",
"115 6.4 3.2 5.3 2.3 \n",
"116 6.5 3.0 5.5 1.8 \n",
"117 7.7 3.8 6.7 2.2 \n",
"118 7.7 2.6 6.9 2.3 \n",
"119 6.0 2.2 5.0 1.5 \n",
"120 6.9 3.2 5.7 2.3 \n",
"121 5.6 2.8 4.9 2.0 \n",
"122 7.7 2.8 6.7 2.0 \n",
"123 6.3 2.7 4.9 1.8 \n",
"124 6.7 3.3 5.7 2.1 \n",
"125 7.2 3.2 6.0 1.8 \n",
"126 6.2 2.8 4.8 1.8 \n",
"127 6.1 3.0 4.9 1.8 \n",
"128 6.4 2.8 5.6 2.1 \n",
"129 7.2 3.0 5.8 1.6 \n",
"130 7.4 2.8 6.1 1.9 \n",
"131 7.9 3.8 6.4 2.0 \n",
"132 6.4 2.8 5.6 2.2 \n",
"133 6.3 2.8 5.1 1.5 \n",
"134 6.1 2.6 5.6 1.4 \n",
"135 7.7 3.0 6.1 2.3 \n",
"136 6.3 3.4 5.6 2.4 \n",
"137 6.4 3.1 5.5 1.8 \n",
"138 6.0 3.0 4.8 1.8 \n",
"139 6.9 3.1 5.4 2.1 \n",
"140 6.7 3.1 5.6 2.4 \n",
"141 6.9 3.1 5.1 2.3 \n",
"142 5.8 2.7 5.1 1.9 \n",
"143 6.8 3.2 5.9 2.3 \n",
"144 6.7 3.3 5.7 2.5 \n",
"145 6.7 3.0 5.2 2.3 \n",
"146 6.3 2.5 5.0 1.9 \n",
"147 6.5 3.0 5.2 2.0 \n",
"148 6.2 3.4 5.4 2.3 \n",
"149 5.9 3.0 5.1 1.8 \n",
"\n",
" species \n",
"100 virginica \n",
"101 virginica \n",
"102 virginica \n",
"103 virginica \n",
"104 virginica \n",
"105 virginica \n",
"106 virginica \n",
"107 virginica \n",
"108 virginica \n",
"109 virginica \n",
"110 virginica \n",
"111 virginica \n",
"112 virginica \n",
"113 virginica \n",
"114 virginica \n",
"115 virginica \n",
"116 virginica \n",
"117 virginica \n",
"118 virginica \n",
"119 virginica \n",
"120 virginica \n",
"121 virginica \n",
"122 virginica \n",
"123 virginica \n",
"124 virginica \n",
"125 virginica \n",
"126 virginica \n",
"127 virginica \n",
"128 virginica \n",
"129 virginica \n",
"130 virginica \n",
"131 virginica \n",
"132 virginica \n",
"133 virginica \n",
"134 virginica \n",
"135 virginica \n",
"136 virginica \n",
"137 virginica \n",
"138 virginica \n",
"139 virginica \n",
"140 virginica \n",
"141 virginica \n",
"142 virginica \n",
"143 virginica \n",
"144 virginica \n",
"145 virginica \n",
"146 virginica \n",
"147 virginica \n",
"148 virginica \n",
"149 virginica "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Print a header line, then show the full sub-DataFrame for each species\n",
"for species, gdf in df.groupby('species'):\n",
"    print(f\"=============== {species} ============\")\n",
"    display(gdf)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.843333</td>\n",
" <td>3.057333</td>\n",
" <td>3.758000</td>\n",
" <td>1.199333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.828066</td>\n",
" <td>0.435866</td>\n",
" <td>1.765298</td>\n",
" <td>0.762238</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.300000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.100000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>5.100000</td>\n",
" <td>2.800000</td>\n",
" <td>1.600000</td>\n",
" <td>0.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.800000</td>\n",
" <td>3.000000</td>\n",
" <td>4.350000</td>\n",
" <td>1.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>6.400000</td>\n",
" <td>3.300000</td>\n",
" <td>5.100000</td>\n",
" <td>1.800000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.900000</td>\n",
" <td>4.400000</td>\n",
" <td>6.900000</td>\n",
" <td>2.500000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) \\\n",
"count 150.000000 150.000000 150.000000 \n",
"mean 5.843333 3.057333 3.758000 \n",
"std 0.828066 0.435866 1.765298 \n",
"min 4.300000 2.000000 1.000000 \n",
"25% 5.100000 2.800000 1.600000 \n",
"50% 5.800000 3.000000 4.350000 \n",
"75% 6.400000 3.300000 5.100000 \n",
"max 7.900000 4.400000 6.900000 \n",
"\n",
" petal width (cm) \n",
"count 150.000000 \n",
"mean 1.199333 \n",
"std 0.762238 \n",
"min 0.100000 \n",
"25% 0.300000 \n",
"50% 1.300000 \n",
"75% 1.800000 \n",
"max 2.500000 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== setosa ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50.00000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.00600</td>\n",
" <td>3.428000</td>\n",
" <td>1.462000</td>\n",
" <td>0.246000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.35249</td>\n",
" <td>0.379064</td>\n",
" <td>0.173664</td>\n",
" <td>0.105386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.30000</td>\n",
" <td>2.300000</td>\n",
" <td>1.000000</td>\n",
" <td>0.100000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.80000</td>\n",
" <td>3.200000</td>\n",
" <td>1.400000</td>\n",
" <td>0.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.00000</td>\n",
" <td>3.400000</td>\n",
" <td>1.500000</td>\n",
" <td>0.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>5.20000</td>\n",
" <td>3.675000</td>\n",
" <td>1.575000</td>\n",
" <td>0.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>5.80000</td>\n",
" <td>4.400000</td>\n",
" <td>1.900000</td>\n",
" <td>0.600000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) \\\n",
"count 50.00000 50.000000 50.000000 \n",
"mean 5.00600 3.428000 1.462000 \n",
"std 0.35249 0.379064 0.173664 \n",
"min 4.30000 2.300000 1.000000 \n",
"25% 4.80000 3.200000 1.400000 \n",
"50% 5.00000 3.400000 1.500000 \n",
"75% 5.20000 3.675000 1.575000 \n",
"max 5.80000 4.400000 1.900000 \n",
"\n",
" petal width (cm) \n",
"count 50.000000 \n",
"mean 0.246000 \n",
"std 0.105386 \n",
"min 0.100000 \n",
"25% 0.200000 \n",
"50% 0.200000 \n",
"75% 0.300000 \n",
"max 0.600000 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== versicolor ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.936000</td>\n",
" <td>2.770000</td>\n",
" <td>4.260000</td>\n",
" <td>1.326000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.516171</td>\n",
" <td>0.313798</td>\n",
" <td>0.469911</td>\n",
" <td>0.197753</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.900000</td>\n",
" <td>2.000000</td>\n",
" <td>3.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>5.600000</td>\n",
" <td>2.525000</td>\n",
" <td>4.000000</td>\n",
" <td>1.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.900000</td>\n",
" <td>2.800000</td>\n",
" <td>4.350000</td>\n",
" <td>1.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>6.300000</td>\n",
" <td>3.000000</td>\n",
" <td>4.600000</td>\n",
" <td>1.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.000000</td>\n",
" <td>3.400000</td>\n",
" <td>5.100000</td>\n",
" <td>1.800000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) \\\n",
"count 50.000000 50.000000 50.000000 \n",
"mean 5.936000 2.770000 4.260000 \n",
"std 0.516171 0.313798 0.469911 \n",
"min 4.900000 2.000000 3.000000 \n",
"25% 5.600000 2.525000 4.000000 \n",
"50% 5.900000 2.800000 4.350000 \n",
"75% 6.300000 3.000000 4.600000 \n",
"max 7.000000 3.400000 5.100000 \n",
"\n",
" petal width (cm) \n",
"count 50.000000 \n",
"mean 1.326000 \n",
"std 0.197753 \n",
"min 1.000000 \n",
"25% 1.200000 \n",
"50% 1.300000 \n",
"75% 1.500000 \n",
"max 1.800000 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== virginica ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50.00000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>6.58800</td>\n",
" <td>2.974000</td>\n",
" <td>5.552000</td>\n",
" <td>2.02600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.63588</td>\n",
" <td>0.322497</td>\n",
" <td>0.551895</td>\n",
" <td>0.27465</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.90000</td>\n",
" <td>2.200000</td>\n",
" <td>4.500000</td>\n",
" <td>1.40000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>6.22500</td>\n",
" <td>2.800000</td>\n",
" <td>5.100000</td>\n",
" <td>1.80000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>6.50000</td>\n",
" <td>3.000000</td>\n",
" <td>5.550000</td>\n",
" <td>2.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>6.90000</td>\n",
" <td>3.175000</td>\n",
" <td>5.875000</td>\n",
" <td>2.30000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.90000</td>\n",
" <td>3.800000</td>\n",
" <td>6.900000</td>\n",
" <td>2.50000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) \\\n",
"count 50.00000 50.000000 50.000000 \n",
"mean 6.58800 2.974000 5.552000 \n",
"std 0.63588 0.322497 0.551895 \n",
"min 4.90000 2.200000 4.500000 \n",
"25% 6.22500 2.800000 5.100000 \n",
"50% 6.50000 3.000000 5.550000 \n",
"75% 6.90000 3.175000 5.875000 \n",
"max 7.90000 3.800000 6.900000 \n",
"\n",
" petal width (cm) \n",
"count 50.00000 \n",
"mean 2.02600 \n",
"std 0.27465 \n",
"min 1.40000 \n",
"25% 1.80000 \n",
"50% 2.00000 \n",
"75% 2.30000 \n",
"max 2.50000 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for species, gdf in df.groupby('species'):\n",
" print(f\"=============== {species} ============\")\n",
" display(gdf.describe())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# More about Pandas `DataFrame`s\n",
"\n",
"Let's get started by making a sample dataframe with fake data:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>6</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>green</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2</td>\n",
" <td>yellow</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"0 1 blue\n",
"1 2 blue\n",
"2 3 red\n",
"3 6 red\n",
"4 2 red\n",
"5 3 blue\n",
"6 2 blue\n",
"7 2 red\n",
"8 1 green\n",
"9 2 yellow"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sample_df = pd.DataFrame({'number':[1,2,3,6,2,3,2,2,1,2], 'color':['blue','blue','red','red','red','blue','blue','red','green','yellow']})\n",
"display(sample_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting a `Series` from a `DataFrame` in Pandas\n",
"\n",
"You can get a `Series` (a Pandas 1D array) from a dataframe column by indexing with the column name:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 blue\n",
"1 blue\n",
"2 red\n",
"3 red\n",
"4 red\n",
"5 blue\n",
"6 blue\n",
"7 red\n",
"8 green\n",
"9 yellow\n",
"Name: color, dtype: object"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"colors = sample_df['color']\n",
"display(colors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, we got a `Series` from a `DataFrame` column using dictionary-like item access: square brackets around a string with the name of the column. In addition, Pandas offers a convenience: a column whose name is a valid Python identifier (and does not clash with an existing `DataFrame` attribute or method) can be accessed with a dot (`.`), as if it were a variable (also called an \"attribute\") of the `DataFrame` instance."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 blue\n",
"1 blue\n",
"2 red\n",
"3 red\n",
"4 red\n",
"5 blue\n",
"6 blue\n",
"7 red\n",
"8 green\n",
"9 yellow\n",
"Name: color, dtype: object"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# This does the same thing as above because \"color\" is a valid Python attribute name.\n",
"\n",
"colors = sample_df.color\n",
"display(colors)"
]
},
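{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dot access is only a shortcut, so it can fail where square brackets still work. The cell below is a minimal sketch of that limit. The frame `demo_df` and its column names are made up for illustration and are not part of the lecture data. One name clashes with the existing `DataFrame.count()` method and the other contains a space, so both columns have to be read with square brackets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frame for illustration only (not part of the lecture data).\n",
"demo_df = pd.DataFrame({'count': [3, 1, 2], 'petal width': [0.2, 0.4, 0.3]})\n",
"\n",
"# `demo_df.count` resolves to the DataFrame.count method, not to the column.\n",
"print(callable(demo_df.count))\n",
"\n",
"# Square brackets always look up the column and return a Series.\n",
"count_column = demo_df['count']\n",
"width_column = demo_df['petal width']\n",
"print(count_column.tolist())\n",
"print(width_column.max())"
]
},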
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(colors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `Series` has many useful methods, such as `.unique()` and `.mean()`:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['blue', 'red', 'green', 'yellow'], dtype=object)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"colors.unique()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.4"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_df['number'].mean()"
]
},
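{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two more `Series` methods that often come up alongside `.unique()` and `.mean()` are `.value_counts()` and `.nunique()`. The cell below is a small sketch using the same sample data (rebuilt here so the cell runs on its own). The variable names `color_counts`, `n_colors` and `median_number` are ours."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Same data as `sample_df` above, repeated so this cell is self-contained.\n",
"sample_df = pd.DataFrame({'number': [1, 2, 3, 6, 2, 3, 2, 2, 1, 2],\n",
"                          'color': ['blue', 'blue', 'red', 'red', 'red', 'blue', 'blue', 'red', 'green', 'yellow']})\n",
"\n",
"# How many rows are there for each color?\n",
"color_counts = sample_df['color'].value_counts()\n",
"print(color_counts)\n",
"\n",
"# Number of distinct colors, and the median of the numeric column.\n",
"n_colors = sample_df['color'].nunique()\n",
"median_number = sample_df['number'].median()\n",
"print(n_colors, median_number)"
]
},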
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"blue\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"0 1 blue\n",
"1 2 blue\n",
"5 3 blue\n",
"6 2 blue"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"green\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>green</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"8 1 green"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"red\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>6</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2</td>\n",
" <td>red</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"2 3 red\n",
"3 6 red\n",
"4 2 red\n",
"7 2 red"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"yellow\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2</td>\n",
" <td>yellow</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"9 2 yellow"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for color_name, color_df in sample_df.groupby('color'):\n",
" print(color_name)\n",
" display(color_df)"
]
},
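{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above visits one group at a time. When only a summary per group is needed, the same grouping can feed an aggregation directly, without an explicit loop. The cell below is a minimal sketch on the same sample data (rebuilt here so the cell runs on its own). The names `mean_by_color` and `size_by_color` are ours."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Same data as `sample_df` above, repeated so this cell is self-contained.\n",
"sample_df = pd.DataFrame({'number': [1, 2, 3, 6, 2, 3, 2, 2, 1, 2],\n",
"                          'color': ['blue', 'blue', 'red', 'red', 'red', 'blue', 'blue', 'red', 'green', 'yellow']})\n",
"\n",
"# Mean of 'number' within each color; the result is a Series indexed by color.\n",
"mean_by_color = sample_df.groupby('color')['number'].mean()\n",
"print(mean_by_color)\n",
"\n",
"# Number of rows in each group.\n",
"size_by_color = sample_df.groupby('color').size()\n",
"print(size_by_color)"
]
},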
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas `read_csv`\n",
"\n",
"[CSV files](https://en.wikipedia.org/wiki/Comma-separated_values) are a very common plain-text format for storing tabular data. Pandas provides a fast reader for them in the `read_csv()` function, which returns a `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('iris.csv')"
]
},
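{
"cell_type": "markdown",
"metadata": {},
"source": [
"`read_csv()` also accepts a file-like object, which is handy for trying it out when no file is at hand. The cell below is a sketch under that assumption: `csv_text` is a hand-typed three-row excerpt using the same column names as `iris.csv`, and `small_df` is our own variable name, chosen so that `df` is left untouched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hand-typed excerpt for illustration: a header line plus three data rows.\n",
"csv_text = '''sepal_length,sepal_width,petal_length,petal_width,species\n",
"5.1,3.5,1.4,0.2,setosa\n",
"7.0,3.2,4.7,1.4,versicolor\n",
"6.3,3.3,6.0,2.5,virginica\n",
"'''\n",
"\n",
"# io.StringIO wraps the string so that it behaves like an open text file.\n",
"small_df = pd.read_csv(io.StringIO(csv_text))\n",
"\n",
"# The first line became the column names; numeric columns were parsed as floats.\n",
"print(small_df.dtypes)\n",
"print(small_df.shape)"
]
},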
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"0 5.1 3.5 1.4 0.2 setosa\n",
"1 4.9 3.0 1.4 0.2 setosa\n",
"2 4.7 3.2 1.3 0.2 setosa\n",
"3 4.6 3.1 1.5 0.2 setosa\n",
"4 5.0 3.6 1.4 0.2 setosa\n",
".. ... ... ... ... ...\n",
"145 6.7 3.0 5.2 2.3 virginica\n",
"146 6.3 2.5 5.0 1.9 virginica\n",
"147 6.5 3.0 5.2 2.0 virginica\n",
"148 6.2 3.4 5.4 2.3 virginica\n",
"149 5.9 3.0 5.1 1.8 virginica\n",
"\n",
"[150 rows x 5 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" <th>species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>setosa</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"0 5.1 3.5 1.4 0.2 setosa\n",
"1 4.9 3.0 1.4 0.2 setosa\n",
"2 4.7 3.2 1.3 0.2 setosa\n",
"3 4.6 3.1 1.5 0.2 setosa\n",
"4 5.0 3.6 1.4 0.2 setosa"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['setosa', 'versicolor', 'virginica'], dtype=object)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['species'].unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conditions and selecting part of the data from a `DataFrame`\n",
"\n",
"Let's consider an equality condition: for every row of `sample_df`, we test whether the `'color'` column is equal to `'blue'`:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>6</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2</td>\n",
" <td>red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>green</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2</td>\n",
" <td>yellow</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"0 1 blue\n",
"1 2 blue\n",
"2 3 red\n",
"3 6 red\n",
"4 2 red\n",
"5 3 blue\n",
"6 2 blue\n",
"7 2 red\n",
"8 1 green\n",
"9 2 yellow"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_df"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 blue\n",
"1 blue\n",
"2 red\n",
"3 red\n",
"4 red\n",
"5 blue\n",
"6 blue\n",
"7 red\n",
"8 green\n",
"9 yellow\n",
"Name: color, dtype: object"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_df['color']"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 True\n",
"6 True\n",
"7 False\n",
"8 False\n",
"9 False\n",
"Name: color, dtype: bool"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_df['color']=='blue'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As usual, we can assign this result to a variable:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"condition = sample_df['color']=='blue'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the type of this result is again a `Series`, this time holding boolean values:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(condition)"
]
},
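{
"cell_type": "markdown",
"metadata": {},
"source": [
"A very useful operation in Pandas is building a new `DataFrame` from the rows of an existing one that satisfy a condition. Indexing a `DataFrame` with a boolean `Series` keeps only the rows where the `Series` is `True`. Let's make a new `DataFrame` from only the rows with a blue color:"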
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"0 1 blue\n",
"1 2 blue\n",
"5 3 blue\n",
"6 2 blue"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"condition = sample_df['color']=='blue'\n",
"blue_sample_df = sample_df[ condition ]\n",
"display(blue_sample_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, this can also be written in a single statement:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>number</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2</td>\n",
" <td>blue</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" number color\n",
"0 1 blue\n",
"1 2 blue\n",
"5 3 blue\n",
"6 2 blue"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"blue_sample_df = sample_df[ sample_df['color']=='blue' ]\n",
"display(blue_sample_df)"
]
},
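{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conditions can also be combined. The cell below is a minimal sketch on the same sample data (rebuilt here so the cell runs on its own). It uses the element-wise operators `&` (and) and `~` (not) plus the `Series.isin()` method. Each comparison is wrapped in parentheses because `&` binds more tightly than `==` in Python. The result names are ours."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Same data as `sample_df` above, repeated so this cell is self-contained.\n",
"sample_df = pd.DataFrame({'number': [1, 2, 3, 6, 2, 3, 2, 2, 1, 2],\n",
"                          'color': ['blue', 'blue', 'red', 'red', 'red', 'blue', 'blue', 'red', 'green', 'yellow']})\n",
"\n",
"# Rows that are blue AND have a number of at least 2.\n",
"blue_and_big = sample_df[(sample_df['color'] == 'blue') & (sample_df['number'] >= 2)]\n",
"display(blue_and_big)\n",
"\n",
"# Rows whose color is one of several values.\n",
"red_or_green = sample_df[sample_df['color'].isin(['red', 'green'])]\n",
"display(red_or_green)\n",
"\n",
"# Rows that are NOT blue: `~` inverts a boolean Series.\n",
"not_blue = sample_df[~(sample_df['color'] == 'blue')]\n",
"display(not_blue)"
]
},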
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A key concept in Pandas is iterating over a dataframe, grouping by values in one (or more) columns\n",
"\n",
"This enables a lot of powerful data science work and requires nothing more than storing your data in a well-organized format, which of course has other advantages as well. Let's have a look at our `groupby()` example again:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== setosa ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50.00000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.00600</td>\n",
" <td>3.418000</td>\n",
" <td>1.464000</td>\n",
" <td>0.24400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.35249</td>\n",
" <td>0.381024</td>\n",
" <td>0.173511</td>\n",
" <td>0.10721</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.30000</td>\n",
" <td>2.300000</td>\n",
" <td>1.000000</td>\n",
" <td>0.10000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.80000</td>\n",
" <td>3.125000</td>\n",
" <td>1.400000</td>\n",
" <td>0.20000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.00000</td>\n",
" <td>3.400000</td>\n",
" <td>1.500000</td>\n",
" <td>0.20000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>5.20000</td>\n",
" <td>3.675000</td>\n",
" <td>1.575000</td>\n",
" <td>0.30000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>5.80000</td>\n",
" <td>4.400000</td>\n",
" <td>1.900000</td>\n",
" <td>0.60000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width\n",
"count 50.00000 50.000000 50.000000 50.00000\n",
"mean 5.00600 3.418000 1.464000 0.24400\n",
"std 0.35249 0.381024 0.173511 0.10721\n",
"min 4.30000 2.300000 1.000000 0.10000\n",
"25% 4.80000 3.125000 1.400000 0.20000\n",
"50% 5.00000 3.400000 1.500000 0.20000\n",
"75% 5.20000 3.675000 1.575000 0.30000\n",
"max 5.80000 4.400000 1.900000 0.60000"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== versicolor ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5.936000</td>\n",
" <td>2.770000</td>\n",
" <td>4.260000</td>\n",
" <td>1.326000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.516171</td>\n",
" <td>0.313798</td>\n",
" <td>0.469911</td>\n",
" <td>0.197753</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.900000</td>\n",
" <td>2.000000</td>\n",
" <td>3.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>5.600000</td>\n",
" <td>2.525000</td>\n",
" <td>4.000000</td>\n",
" <td>1.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>5.900000</td>\n",
" <td>2.800000</td>\n",
" <td>4.350000</td>\n",
" <td>1.300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>6.300000</td>\n",
" <td>3.000000</td>\n",
" <td>4.600000</td>\n",
" <td>1.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.000000</td>\n",
" <td>3.400000</td>\n",
" <td>5.100000</td>\n",
" <td>1.800000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width\n",
"count 50.000000 50.000000 50.000000 50.000000\n",
"mean 5.936000 2.770000 4.260000 1.326000\n",
"std 0.516171 0.313798 0.469911 0.197753\n",
"min 4.900000 2.000000 3.000000 1.000000\n",
"25% 5.600000 2.525000 4.000000 1.200000\n",
"50% 5.900000 2.800000 4.350000 1.300000\n",
"75% 6.300000 3.000000 4.600000 1.500000\n",
"max 7.000000 3.400000 5.100000 1.800000"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=============== virginica ============\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal_length</th>\n",
" <th>sepal_width</th>\n",
" <th>petal_length</th>\n",
" <th>petal_width</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50.00000</td>\n",
" <td>50.000000</td>\n",
" <td>50.000000</td>\n",
" <td>50.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>6.58800</td>\n",
" <td>2.974000</td>\n",
" <td>5.552000</td>\n",
" <td>2.02600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.63588</td>\n",
" <td>0.322497</td>\n",
" <td>0.551895</td>\n",
" <td>0.27465</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>4.90000</td>\n",
" <td>2.200000</td>\n",
" <td>4.500000</td>\n",
" <td>1.40000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>6.22500</td>\n",
" <td>2.800000</td>\n",
" <td>5.100000</td>\n",
" <td>1.80000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>6.50000</td>\n",
" <td>3.000000</td>\n",
" <td>5.550000</td>\n",
" <td>2.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>6.90000</td>\n",
" <td>3.175000</td>\n",
" <td>5.875000</td>\n",
" <td>2.30000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>7.90000</td>\n",
" <td>3.800000</td>\n",
" <td>6.900000</td>\n",
" <td>2.50000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal_length sepal_width petal_length petal_width\n",
"count 50.00000 50.000000 50.000000 50.00000\n",
"mean 6.58800 2.974000 5.552000 2.02600\n",
"std 0.63588 0.322497 0.551895 0.27465\n",
"min 4.90000 2.200000 4.500000 1.40000\n",
"25% 6.22500 2.800000 5.100000 1.80000\n",
"50% 6.50000 3.000000 5.550000 2.00000\n",
"75% 6.90000 3.175000 5.875000 2.30000\n",
"max 7.90000 3.800000 6.900000 2.50000"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('iris.csv')\n",
"for species, gdf in df.groupby('species'):\n",
" print(f\"=============== {species} ============\")\n",
" display(gdf.describe())"
]
},
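{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above shows one table per species. As a sketch of an alternative, assuming the same `iris.csv` with a `species` column and a `sepal_width` column, the per-group statistics can also be collected into a single table with `agg`. The choice of column and of statistics here is only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('iris.csv')\n",
"# One row per species, one column per statistic (illustrative selection)\n",
"summary = df.groupby('species')['sepal_width'].agg(['count', 'mean', 'std', 'min', 'max'])\n",
"display(summary)"
]
},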
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# matplotlib + pandas + ❤️ = seaborn\n",
"\n",
"[Seaborn](https://seaborn.pydata.org/) is \"a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.\" It makes heavy use of pandas to make your life easy."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.stripplot(x=\"species\", y=\"sepal_width\", data=df);"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.stripplot(x=\"species\", y=\"sepal_width\", data=pd.read_csv('iris.csv'));"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"petal_width\", y=\"sepal_width\", data=pd.read_csv('iris.csv'), hue=\"species\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a minute to look at the [seaborn gallery](https://seaborn.pydata.org/examples/index.html).\n",
"\n",
"And while we are at it, we should not forget the [matplotlib gallery](https://matplotlib.org/stable/gallery/index.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}