{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/01_scientific_stack/02_content_pandas.ipynb)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Chapter 1: Python's Scientific Stack (Part 2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"For practitioners, the [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) library may feel a bit too \"technical\" or too close to \"real programming\" and they may prefer something that looks and feels more like Excel. That is where the [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) library comes in.\n",
|
||
"\n",
|
||
"Let's first `pip` install and then `import` it."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Requirement already satisfied: pandas in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (2.2.2)\n",
|
||
"Requirement already satisfied: numpy>=1.26.0 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2.0.0)\n",
|
||
"Requirement already satisfied: python-dateutil>=2.8.2 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2.9.0.post0)\n",
|
||
"Requirement already satisfied: pytz>=2020.1 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2024.1)\n",
|
||
"Requirement already satisfied: tzdata>=2022.7 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2024.1)\n",
|
||
"Requirement already satisfied: six>=1.5 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"!pip install pandas"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Excel-like Data with Pandas"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In the same folder as this notebook there is a file named \"*orders.csv*\" that holds the order data of an urban meal delivery platform operating in Bordeaux, France. Open in with a double-click and take a look at its contents right here in JupyterLab!\n",
|
||
"\n",
|
||
"[pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) provides a [pd.read_csv() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) function that, as the name suggests, can open and read in CSV data. For Excel files, there is also a [pd.read_excel() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) function but the CSV format is probably more widespread in use.\n",
|
||
"\n",
|
||
"Let's read in the \"*orders.csv*\" file with [pd.read_csv() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) specifying the \"order_id\" column as the **index**. Here, index is a column with *unique* values that allow the identification of each row in a dataset. If we don't specify an index column, [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) creates a surrogate index as a sequence of integers 1, 2, 3, and so on."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = pd.read_csv(\"orders.csv\", index_col=\"order_id\")"
|
||
]
|
||
},
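{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the `index_col` argument changes, the following is a minimal sketch that reads the same data twice: once without and once with an index column. It assumes a made-up three-row sample held in memory (the name `sample_csv` and its contents are illustrative, not the real \"*orders.csv*\" file) so that the cell runs on its own and does not overwrite `df`. Without `index_col`, the surrogate index counts up from 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical three-row sample; not read from the real orders.csv\n",
"sample_csv = '''order_id,total\n",
"192594,2050\n",
"192644,2450\n",
"192658,2550\n",
"'''\n",
"\n",
"# No index column specified: pandas creates a surrogate index starting at 0\n",
"sample_default = pd.read_csv(io.StringIO(sample_csv))\n",
"\n",
"# The order_id column becomes the index and is no longer a regular column\n",
"sample_indexed = pd.read_csv(io.StringIO(sample_csv), index_col=\"order_id\")\n",
"\n",
"list(sample_default.index), list(sample_indexed.index)"
]
},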
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"`df` models a table-like data structure, comparable to one tab in an Excel file. [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) and JupyterLab are designed to work well together: The `df` object shows a preview of the dataset below the code cell. The rows are the **records** in the dataset and the columns take the role of the **attributes** each record has. Each column comes with a **domain** of allowable values."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>restaurant</th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>pickup_at</th>\n",
|
||
" <th>delivery_at</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>2016-07-18 12:23:13</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1423.0</td>\n",
|
||
" <td>2016-07-18 12:38:08</td>\n",
|
||
" <td>2016-07-18 12:48:22</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>2016-07-18 12:48:55</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2450</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-18 13:03:08</td>\n",
|
||
" <td>2016-07-18 13:12:01</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192658</th>\n",
|
||
" <td>2016-07-18 13:00:13</td>\n",
|
||
" <td>1205</td>\n",
|
||
" <td>Taj Mahal</td>\n",
|
||
" <td>24 Rue Du Parlement Sainte-Catherine</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.840405</td>\n",
|
||
" <td>-0.573940</td>\n",
|
||
" <td>73830</td>\n",
|
||
" <td>Rue Batailley 12</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.838504</td>\n",
|
||
" <td>-0.591961</td>\n",
|
||
" <td>2550</td>\n",
|
||
" <td>1423.0</td>\n",
|
||
" <td>2016-07-18 13:19:04</td>\n",
|
||
" <td>2016-07-18 13:29:03</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>193242</th>\n",
|
||
" <td>2016-07-18 20:39:54</td>\n",
|
||
" <td>1208</td>\n",
|
||
" <td>Chez Ambre And Michel</td>\n",
|
||
" <td>1 Rue Matignon</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850258</td>\n",
|
||
" <td>-0.586204</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>1550</td>\n",
|
||
" <td>1420.0</td>\n",
|
||
" <td>2016-07-18 20:55:52</td>\n",
|
||
" <td>2016-07-18 21:05:28</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192719</th>\n",
|
||
" <td>2016-07-18 13:52:04</td>\n",
|
||
" <td>1206</td>\n",
|
||
" <td>La Maison Du Glacier</td>\n",
|
||
" <td>1 Place Saint Pierre</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.839706</td>\n",
|
||
" <td>-0.570672</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2450</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-18 14:01:23</td>\n",
|
||
" <td>2016-07-18 14:08:36</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>212021</th>\n",
|
||
" <td>2016-07-30 22:29:52</td>\n",
|
||
" <td>1249</td>\n",
|
||
" <td>Pitaya Sainte Catherine</td>\n",
|
||
" <td>275 Rue Sainte Catherine</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.831692</td>\n",
|
||
" <td>-0.573207</td>\n",
|
||
" <td>80400</td>\n",
|
||
" <td>Boulevard President Franklin Roosevelt 15</td>\n",
|
||
" <td>33400</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.820591</td>\n",
|
||
" <td>-0.582048</td>\n",
|
||
" <td>2250</td>\n",
|
||
" <td>1410.0</td>\n",
|
||
" <td>2016-07-30 22:50:16</td>\n",
|
||
" <td>2016-07-30 23:02:54</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>211501</th>\n",
|
||
" <td>2016-07-30 20:44:50</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" <td>80163</td>\n",
|
||
" <td>Rue Marsan 22</td>\n",
|
||
" <td>33300</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.856133</td>\n",
|
||
" <td>-0.576172</td>\n",
|
||
" <td>1250</td>\n",
|
||
" <td>1415.0</td>\n",
|
||
" <td>2016-07-30 21:02:32</td>\n",
|
||
" <td>2016-07-30 21:06:19</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>211508</th>\n",
|
||
" <td>2016-07-30 20:45:55</td>\n",
|
||
" <td>1254</td>\n",
|
||
" <td>Funky Burger</td>\n",
|
||
" <td>5 Rue Du Loup</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.838081</td>\n",
|
||
" <td>-0.572281</td>\n",
|
||
" <td>80168</td>\n",
|
||
" <td>Rue Des Sablieres 42</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.825488</td>\n",
|
||
" <td>-0.575264</td>\n",
|
||
" <td>1680</td>\n",
|
||
" <td>1461.0</td>\n",
|
||
" <td>2016-07-30 21:13:31</td>\n",
|
||
" <td>2016-07-30 21:19:45</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>211510</th>\n",
|
||
" <td>2016-07-30 20:46:05</td>\n",
|
||
" <td>1219</td>\n",
|
||
" <td>La Tagliatella</td>\n",
|
||
" <td>14 Rue Guiraude</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.839388</td>\n",
|
||
" <td>-0.574781</td>\n",
|
||
" <td>80169</td>\n",
|
||
" <td>Rue Pasteur 35</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.845053</td>\n",
|
||
" <td>-0.601157</td>\n",
|
||
" <td>4085</td>\n",
|
||
" <td>1411.0</td>\n",
|
||
" <td>2016-07-30 21:11:00</td>\n",
|
||
" <td>2016-07-30 21:23:24</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>211519</th>\n",
|
||
" <td>2016-07-30 20:46:55</td>\n",
|
||
" <td>1254</td>\n",
|
||
" <td>Funky Burger</td>\n",
|
||
" <td>5 Rue Du Loup</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.838081</td>\n",
|
||
" <td>-0.572281</td>\n",
|
||
" <td>80172</td>\n",
|
||
" <td>Rue Monadey 28</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.828816</td>\n",
|
||
" <td>-0.570789</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1817.0</td>\n",
|
||
" <td>2016-07-30 21:05:46</td>\n",
|
||
" <td>2016-07-30 21:14:07</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>694 rows × 19 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at restaurant_id restaurant \\\n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:23:13 1204 Max A Table \n",
|
||
"192644 2016-07-18 12:48:55 1204 Max A Table \n",
|
||
"192658 2016-07-18 13:00:13 1205 Taj Mahal \n",
|
||
"193242 2016-07-18 20:39:54 1208 Chez Ambre And Michel \n",
|
||
"192719 2016-07-18 13:52:04 1206 La Maison Du Glacier \n",
|
||
"... ... ... ... \n",
|
||
"212021 2016-07-30 22:29:52 1249 Pitaya Sainte Catherine \n",
|
||
"211501 2016-07-30 20:44:50 1204 Max A Table \n",
|
||
"211508 2016-07-30 20:45:55 1254 Funky Burger \n",
|
||
"211510 2016-07-30 20:46:05 1219 La Tagliatella \n",
|
||
"211519 2016-07-30 20:46:55 1254 Funky Burger \n",
|
||
"\n",
|
||
" o_street o_zip o_city o_latitude \\\n",
|
||
"order_id \n",
|
||
"192594 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"192644 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"192658 24 Rue Du Parlement Sainte-Catherine 33000 Bordeaux 44.840405 \n",
|
||
"193242 1 Rue Matignon 33000 Bordeaux 44.850258 \n",
|
||
"192719 1 Place Saint Pierre 33000 Bordeaux 44.839706 \n",
|
||
"... ... ... ... ... \n",
|
||
"212021 275 Rue Sainte Catherine 33000 Bordeaux 44.831692 \n",
|
||
"211501 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"211508 5 Rue Du Loup 33000 Bordeaux 44.838081 \n",
|
||
"211510 14 Rue Guiraude 33000 Bordeaux 44.839388 \n",
|
||
"211519 5 Rue Du Loup 33000 Bordeaux 44.838081 \n",
|
||
"\n",
|
||
" o_longitude customer_id d_street \\\n",
|
||
"order_id \n",
|
||
"192594 -0.575870 10298 Rue Rolland 14 \n",
|
||
"192644 -0.575870 6037 Rue Rolland 14 \n",
|
||
"192658 -0.573940 73830 Rue Batailley 12 \n",
|
||
"193242 -0.586204 10298 Rue Rolland 14 \n",
|
||
"192719 -0.570672 6037 Rue Rolland 14 \n",
|
||
"... ... ... ... \n",
|
||
"212021 -0.573207 80400 Boulevard President Franklin Roosevelt 15 \n",
|
||
"211501 -0.575870 80163 Rue Marsan 22 \n",
|
||
"211508 -0.572281 80168 Rue Des Sablieres 42 \n",
|
||
"211510 -0.574781 80169 Rue Pasteur 35 \n",
|
||
"211519 -0.572281 80172 Rue Monadey 28 \n",
|
||
"\n",
|
||
" d_zip d_city d_latitude d_longitude total courier_id \\\n",
|
||
"order_id \n",
|
||
"192594 33000 Bordeaux 44.842592 -0.580521 2050 1423.0 \n",
|
||
"192644 33000 Bordeaux 44.842592 -0.580521 2450 1426.0 \n",
|
||
"192658 33000 Bordeaux 44.838504 -0.591961 2550 1423.0 \n",
|
||
"193242 33000 Bordeaux 44.842592 -0.580521 1550 1420.0 \n",
|
||
"192719 33000 Bordeaux 44.842592 -0.580521 2450 1426.0 \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"212021 33400 Bordeaux 44.820591 -0.582048 2250 1410.0 \n",
|
||
"211501 33300 Bordeaux 44.856133 -0.576172 1250 1415.0 \n",
|
||
"211508 33800 Bordeaux 44.825488 -0.575264 1680 1461.0 \n",
|
||
"211510 33200 Bordeaux 44.845053 -0.601157 4085 1411.0 \n",
|
||
"211519 33800 Bordeaux 44.828816 -0.570789 2050 1817.0 \n",
|
||
"\n",
|
||
" pickup_at delivery_at cancelled \n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:38:08 2016-07-18 12:48:22 0 \n",
|
||
"192644 2016-07-18 13:03:08 2016-07-18 13:12:01 0 \n",
|
||
"192658 2016-07-18 13:19:04 2016-07-18 13:29:03 0 \n",
|
||
"193242 2016-07-18 20:55:52 2016-07-18 21:05:28 0 \n",
|
||
"192719 2016-07-18 14:01:23 2016-07-18 14:08:36 0 \n",
|
||
"... ... ... ... \n",
|
||
"212021 2016-07-30 22:50:16 2016-07-30 23:02:54 0 \n",
|
||
"211501 2016-07-30 21:02:32 2016-07-30 21:06:19 0 \n",
|
||
"211508 2016-07-30 21:13:31 2016-07-30 21:19:45 0 \n",
|
||
"211510 2016-07-30 21:11:00 2016-07-30 21:23:24 0 \n",
|
||
"211519 2016-07-30 21:05:46 2016-07-30 21:14:07 0 \n",
|
||
"\n",
|
||
"[694 rows x 19 columns]"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The data type behind `df` is called a [pd.DataFrame <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame). `DataFrame`s are built around [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/)'s `ndarray`s providing an interface optimized for **interactive usage** (i.e., a data scientist exploring a dataset step by step)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"pandas.core.frame.DataFrame"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"type(df)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"`DataFrame`s come with many methdods.\n",
|
||
"\n",
|
||
"For example, [.head() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html#pandas.DataFrame.head) and [.tail() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tail.html#pandas.DataFrame.tail) show the first and last `n` rows, defaulting to `5`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>restaurant</th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>pickup_at</th>\n",
|
||
" <th>delivery_at</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>2016-07-18 12:23:13</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1423.0</td>\n",
|
||
" <td>2016-07-18 12:38:08</td>\n",
|
||
" <td>2016-07-18 12:48:22</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>2016-07-18 12:48:55</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2450</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-18 13:03:08</td>\n",
|
||
" <td>2016-07-18 13:12:01</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192658</th>\n",
|
||
" <td>2016-07-18 13:00:13</td>\n",
|
||
" <td>1205</td>\n",
|
||
" <td>Taj Mahal</td>\n",
|
||
" <td>24 Rue Du Parlement Sainte-Catherine</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.840405</td>\n",
|
||
" <td>-0.573940</td>\n",
|
||
" <td>73830</td>\n",
|
||
" <td>Rue Batailley 12</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.838504</td>\n",
|
||
" <td>-0.591961</td>\n",
|
||
" <td>2550</td>\n",
|
||
" <td>1423.0</td>\n",
|
||
" <td>2016-07-18 13:19:04</td>\n",
|
||
" <td>2016-07-18 13:29:03</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>193242</th>\n",
|
||
" <td>2016-07-18 20:39:54</td>\n",
|
||
" <td>1208</td>\n",
|
||
" <td>Chez Ambre And Michel</td>\n",
|
||
" <td>1 Rue Matignon</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850258</td>\n",
|
||
" <td>-0.586204</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>1550</td>\n",
|
||
" <td>1420.0</td>\n",
|
||
" <td>2016-07-18 20:55:52</td>\n",
|
||
" <td>2016-07-18 21:05:28</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192719</th>\n",
|
||
" <td>2016-07-18 13:52:04</td>\n",
|
||
" <td>1206</td>\n",
|
||
" <td>La Maison Du Glacier</td>\n",
|
||
" <td>1 Place Saint Pierre</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.839706</td>\n",
|
||
" <td>-0.570672</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2450</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-18 14:01:23</td>\n",
|
||
" <td>2016-07-18 14:08:36</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at restaurant_id restaurant \\\n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:23:13 1204 Max A Table \n",
|
||
"192644 2016-07-18 12:48:55 1204 Max A Table \n",
|
||
"192658 2016-07-18 13:00:13 1205 Taj Mahal \n",
|
||
"193242 2016-07-18 20:39:54 1208 Chez Ambre And Michel \n",
|
||
"192719 2016-07-18 13:52:04 1206 La Maison Du Glacier \n",
|
||
"\n",
|
||
" o_street o_zip o_city o_latitude \\\n",
|
||
"order_id \n",
|
||
"192594 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"192644 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"192658 24 Rue Du Parlement Sainte-Catherine 33000 Bordeaux 44.840405 \n",
|
||
"193242 1 Rue Matignon 33000 Bordeaux 44.850258 \n",
|
||
"192719 1 Place Saint Pierre 33000 Bordeaux 44.839706 \n",
|
||
"\n",
|
||
" o_longitude customer_id d_street d_zip d_city \\\n",
|
||
"order_id \n",
|
||
"192594 -0.575870 10298 Rue Rolland 14 33000 Bordeaux \n",
|
||
"192644 -0.575870 6037 Rue Rolland 14 33000 Bordeaux \n",
|
||
"192658 -0.573940 73830 Rue Batailley 12 33000 Bordeaux \n",
|
||
"193242 -0.586204 10298 Rue Rolland 14 33000 Bordeaux \n",
|
||
"192719 -0.570672 6037 Rue Rolland 14 33000 Bordeaux \n",
|
||
"\n",
|
||
" d_latitude d_longitude total courier_id pickup_at \\\n",
|
||
"order_id \n",
|
||
"192594 44.842592 -0.580521 2050 1423.0 2016-07-18 12:38:08 \n",
|
||
"192644 44.842592 -0.580521 2450 1426.0 2016-07-18 13:03:08 \n",
|
||
"192658 44.838504 -0.591961 2550 1423.0 2016-07-18 13:19:04 \n",
|
||
"193242 44.842592 -0.580521 1550 1420.0 2016-07-18 20:55:52 \n",
|
||
"192719 44.842592 -0.580521 2450 1426.0 2016-07-18 14:01:23 \n",
|
||
"\n",
|
||
" delivery_at cancelled \n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:48:22 0 \n",
|
||
"192644 2016-07-18 13:12:01 0 \n",
|
||
"192658 2016-07-18 13:29:03 0 \n",
|
||
"193242 2016-07-18 21:05:28 0 \n",
|
||
"192719 2016-07-18 14:08:36 0 "
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>restaurant</th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>pickup_at</th>\n",
|
||
" <th>delivery_at</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>211510</th>\n",
|
||
" <td>2016-07-30 20:46:05</td>\n",
|
||
" <td>1219</td>\n",
|
||
" <td>La Tagliatella</td>\n",
|
||
" <td>14 Rue Guiraude</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.839388</td>\n",
|
||
" <td>-0.574781</td>\n",
|
||
" <td>80169</td>\n",
|
||
" <td>Rue Pasteur 35</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.845053</td>\n",
|
||
" <td>-0.601157</td>\n",
|
||
" <td>4085</td>\n",
|
||
" <td>1411.0</td>\n",
|
||
" <td>2016-07-30 21:11:00</td>\n",
|
||
" <td>2016-07-30 21:23:24</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>211519</th>\n",
|
||
" <td>2016-07-30 20:46:55</td>\n",
|
||
" <td>1254</td>\n",
|
||
" <td>Funky Burger</td>\n",
|
||
" <td>5 Rue Du Loup</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.838081</td>\n",
|
||
" <td>-0.572281</td>\n",
|
||
" <td>80172</td>\n",
|
||
" <td>Rue Monadey 28</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.828816</td>\n",
|
||
" <td>-0.570789</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1817.0</td>\n",
|
||
" <td>2016-07-30 21:05:46</td>\n",
|
||
" <td>2016-07-30 21:14:07</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at restaurant_id restaurant o_street \\\n",
|
||
"order_id \n",
|
||
"211510 2016-07-30 20:46:05 1219 La Tagliatella 14 Rue Guiraude \n",
|
||
"211519 2016-07-30 20:46:55 1254 Funky Burger 5 Rue Du Loup \n",
|
||
"\n",
|
||
" o_zip o_city o_latitude o_longitude customer_id \\\n",
|
||
"order_id \n",
|
||
"211510 33000 Bordeaux 44.839388 -0.574781 80169 \n",
|
||
"211519 33000 Bordeaux 44.838081 -0.572281 80172 \n",
|
||
"\n",
|
||
" d_street d_zip d_city d_latitude d_longitude total \\\n",
|
||
"order_id \n",
|
||
"211510 Rue Pasteur 35 33200 Bordeaux 44.845053 -0.601157 4085 \n",
|
||
"211519 Rue Monadey 28 33800 Bordeaux 44.828816 -0.570789 2050 \n",
|
||
"\n",
|
||
" courier_id pickup_at delivery_at cancelled \n",
|
||
"order_id \n",
|
||
"211510 1411.0 2016-07-30 21:11:00 2016-07-30 21:23:24 0 \n",
|
||
"211519 1817.0 2016-07-30 21:05:46 2016-07-30 21:14:07 0 "
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.tail(2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[.info() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info) shows on overview of the columns. In particular, it shows how many cells are filled in in a column (i.e., are \"non-null\") and what **data type** (i.e., \"dtype\") *all* values in a column have. \"int64\" and \"float64\" imply that there are only `int` and `float` values in a column (taking up to 64 bits or 1s and 0s in memory). \"object\" is [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/)' way of telling us it could not deduce any data type more specific than textual data. For the columns holding timestamps (e.g., \"placed_at\") we will convert the values further below.\n",
|
||
"\n",
|
||
"Looking at the output, we see that some columns hold the data of **origin**-**destination** pairs, corresponding to restaurants and customers. Other columns store data following the dispatch and delivery process of couriers picking up and delivering meals at various points in time."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"Index: 694 entries, 192594 to 211519\n",
|
||
"Data columns (total 19 columns):\n",
|
||
" # Column Non-Null Count Dtype \n",
|
||
"--- ------ -------------- ----- \n",
|
||
" 0 placed_at 694 non-null object \n",
|
||
" 1 restaurant_id 694 non-null int64 \n",
|
||
" 2 restaurant 694 non-null object \n",
|
||
" 3 o_street 694 non-null object \n",
|
||
" 4 o_zip 694 non-null int64 \n",
|
||
" 5 o_city 694 non-null object \n",
|
||
" 6 o_latitude 694 non-null float64\n",
|
||
" 7 o_longitude 694 non-null float64\n",
|
||
" 8 customer_id 694 non-null int64 \n",
|
||
" 9 d_street 694 non-null object \n",
|
||
" 10 d_zip 694 non-null int64 \n",
|
||
" 11 d_city 694 non-null object \n",
|
||
" 12 d_latitude 694 non-null float64\n",
|
||
" 13 d_longitude 694 non-null float64\n",
|
||
" 14 total 694 non-null int64 \n",
|
||
" 15 courier_id 690 non-null float64\n",
|
||
" 16 pickup_at 665 non-null object \n",
|
||
" 17 delivery_at 663 non-null object \n",
|
||
" 18 cancelled 694 non-null int64 \n",
|
||
"dtypes: float64(5), int64(6), object(8)\n",
|
||
"memory usage: 108.4+ KB\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df.info()"
|
||
]
|
||
},
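{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what the \"Non-Null Count\" column summarizes, the cell below builds a small, made-up `DataFrame` (the name `toy` and its values are assumptions for illustration and not part of the orders data) and counts the missing cells per column with [.isna() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isna.html#pandas.DataFrame.isna) followed by `.sum()`. A missing value is stored as `NaN`, which is a `float`. That is presumably also why the \"courier_id\" column above shows up as \"float64\" even though IDs are whole numbers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data (not the orders file): one courier ID is missing\n",
"toy = pd.DataFrame(\n",
"    {\"total\": [2250, 1689, 3820], \"courier_id\": [1468, None, 1426]},\n",
"    index=pd.Index([200800, 200305, 200300], name=\"order_id\"),\n",
")\n",
"\n",
"# Count the missing cells per column\n",
"toy.isna().sum()"
]
},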
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[.describe() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html#pandas.DataFrame.describe) shows statistics on all numerical columns in a `DataFrame`.\n",
|
||
"\n",
|
||
"For the example orders, such statistics may not be meaningful for all numerical columns as some of them merely hold IDs or zip codes."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>count</th>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" <td>690.000000</td>\n",
|
||
" <td>694.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>mean</th>\n",
|
||
" <td>1228.479827</td>\n",
|
||
" <td>33075.216138</td>\n",
|
||
" <td>44.839258</td>\n",
|
||
" <td>-0.575759</td>\n",
|
||
" <td>74751.126801</td>\n",
|
||
" <td>33191.613833</td>\n",
|
||
" <td>44.838623</td>\n",
|
||
" <td>-0.576040</td>\n",
|
||
" <td>2294.636888</td>\n",
|
||
" <td>1484.755072</td>\n",
|
||
" <td>0.044669</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>std</th>\n",
|
||
" <td>18.001091</td>\n",
|
||
" <td>207.971435</td>\n",
|
||
" <td>0.007471</td>\n",
|
||
" <td>0.006920</td>\n",
|
||
" <td>14604.304963</td>\n",
|
||
" <td>307.378697</td>\n",
|
||
" <td>0.011545</td>\n",
|
||
" <td>0.010799</td>\n",
|
||
" <td>1060.695748</td>\n",
|
||
" <td>154.586210</td>\n",
|
||
" <td>0.206724</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>min</th>\n",
|
||
" <td>1204.000000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.818180</td>\n",
|
||
" <td>-0.599400</td>\n",
|
||
" <td>2377.000000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.809813</td>\n",
|
||
" <td>-0.606892</td>\n",
|
||
" <td>350.000000</td>\n",
|
||
" <td>1403.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>25%</th>\n",
|
||
" <td>1212.000000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.836910</td>\n",
|
||
" <td>-0.579345</td>\n",
|
||
" <td>76648.500000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.829981</td>\n",
|
||
" <td>-0.581612</td>\n",
|
||
" <td>1500.000000</td>\n",
|
||
" <td>1415.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>50%</th>\n",
|
||
" <td>1224.000000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.838287</td>\n",
|
||
" <td>-0.573940</td>\n",
|
||
" <td>78146.000000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.838364</td>\n",
|
||
" <td>-0.575056</td>\n",
|
||
" <td>1969.500000</td>\n",
|
||
" <td>1424.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>75%</th>\n",
|
||
" <td>1244.000000</td>\n",
|
||
" <td>33000.000000</td>\n",
|
||
" <td>44.841721</td>\n",
|
||
" <td>-0.572281</td>\n",
|
||
" <td>79331.500000</td>\n",
|
||
" <td>33300.000000</td>\n",
|
||
" <td>44.846696</td>\n",
|
||
" <td>-0.569601</td>\n",
|
||
" <td>2750.000000</td>\n",
|
||
" <td>1462.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>max</th>\n",
|
||
" <td>1267.000000</td>\n",
|
||
" <td>33800.000000</td>\n",
|
||
" <td>44.855438</td>\n",
|
||
" <td>-0.550576</td>\n",
|
||
" <td>80401.000000</td>\n",
|
||
" <td>33800.000000</td>\n",
|
||
" <td>44.877693</td>\n",
|
||
" <td>-0.537952</td>\n",
|
||
" <td>8370.000000</td>\n",
|
||
" <td>2013.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" restaurant_id o_zip o_latitude o_longitude customer_id \\\n",
|
||
"count 694.000000 694.000000 694.000000 694.000000 694.000000 \n",
|
||
"mean 1228.479827 33075.216138 44.839258 -0.575759 74751.126801 \n",
|
||
"std 18.001091 207.971435 0.007471 0.006920 14604.304963 \n",
|
||
"min 1204.000000 33000.000000 44.818180 -0.599400 2377.000000 \n",
|
||
"25% 1212.000000 33000.000000 44.836910 -0.579345 76648.500000 \n",
|
||
"50% 1224.000000 33000.000000 44.838287 -0.573940 78146.000000 \n",
|
||
"75% 1244.000000 33000.000000 44.841721 -0.572281 79331.500000 \n",
|
||
"max 1267.000000 33800.000000 44.855438 -0.550576 80401.000000 \n",
|
||
"\n",
|
||
" d_zip d_latitude d_longitude total courier_id \\\n",
|
||
"count 694.000000 694.000000 694.000000 694.000000 690.000000 \n",
|
||
"mean 33191.613833 44.838623 -0.576040 2294.636888 1484.755072 \n",
|
||
"std 307.378697 0.011545 0.010799 1060.695748 154.586210 \n",
|
||
"min 33000.000000 44.809813 -0.606892 350.000000 1403.000000 \n",
|
||
"25% 33000.000000 44.829981 -0.581612 1500.000000 1415.000000 \n",
|
||
"50% 33000.000000 44.838364 -0.575056 1969.500000 1424.000000 \n",
|
||
"75% 33300.000000 44.846696 -0.569601 2750.000000 1462.000000 \n",
|
||
"max 33800.000000 44.877693 -0.537952 8370.000000 2013.000000 \n",
|
||
"\n",
|
||
" cancelled \n",
|
||
"count 694.000000 \n",
|
||
"mean 0.044669 \n",
|
||
"std 0.206724 \n",
|
||
"min 0.000000 \n",
|
||
"25% 0.000000 \n",
|
||
"50% 0.000000 \n",
|
||
"75% 0.000000 \n",
|
||
"max 1.000000 "
|
||
]
|
||
},
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.describe()"
|
||
]
|
||
},
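{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to skip the less meaningful statistics, sketched here under the assumption that only the \"total\" column is of interest, is to select the relevant columns first and call `.describe()` on the result. The `toy` data below are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data mixing an ID column with a truly numerical one\n",
"toy = pd.DataFrame(\n",
"    {\"restaurant_id\": [1204, 1207, 1207], \"total\": [2250, 1689, 3820]}\n",
")\n",
"\n",
"# Indexing with a list of names keeps a DataFrame; statistics cover \"total\" only\n",
"toy[[\"total\"]].describe()"
]
},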
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Indexing & Slicing"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"`DataFrame`s support being indexed or sliced, both in the row and column dimensions.\n",
|
||
"\n",
|
||
"To obtain all data in a single column, we index into the `DataFrame` with the column's name.\n",
|
||
"\n",
|
||
"For example, `restaurant_col` provides a list of only the restaurant names. Its index are still the \"order_id\"s."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"order_id\n",
|
||
"192594 Max A Table\n",
|
||
"192644 Max A Table\n",
|
||
"192658 Taj Mahal\n",
|
||
"193242 Chez Ambre And Michel\n",
|
||
"192719 La Maison Du Glacier\n",
|
||
"Name: restaurant, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"restaurant_col = df[\"restaurant\"]\n",
|
||
"\n",
|
||
"restaurant_col.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The data type of a single column is [pd.Series <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series), which is very similar to a `DataFrame` with only one column. `Series` objects work like built-in `list`s with added functionalities."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"pandas.core.series.Series"
|
||
]
|
||
},
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"type(restaurant_col)"
|
||
]
|
||
},
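{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the comparison with `list`s, the following sketch uses a small, made-up `Series` (the name `names` and its values are assumptions for illustration). It has a length and can be looped over like a `list`, and it additionally provides methods such as `.value_counts()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy Series with \"order_id\"-like labels as its index\n",
"names = pd.Series(\n",
"    [\"Max A Table\", \"Taj Mahal\", \"Max A Table\"],\n",
"    index=pd.Index([192594, 192658, 192644], name=\"order_id\"),\n",
"    name=\"restaurant\",\n",
")\n",
"\n",
"# Like a list, a Series has a length and can be looped over\n",
"print(len(names))\n",
"print([name.upper() for name in names])\n",
"\n",
"# Unlike a list, it offers methods such as counting how often each value occurs\n",
"names.value_counts()"
]
},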
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If we index with a `list` of column names, the result is itself another `DataFrame`. That operation is like slicing out a smaller matrix from a larger one as we saw with `ndarray`s before.\n",
|
||
"\n",
|
||
"For example, let's pull out all location data of the orders' origins (i.e., restaurants)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192658</th>\n",
|
||
" <td>24 Rue Du Parlement Sainte-Catherine</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.840405</td>\n",
|
||
" <td>-0.573940</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>193242</th>\n",
|
||
" <td>1 Rue Matignon</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850258</td>\n",
|
||
" <td>-0.586204</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192719</th>\n",
|
||
" <td>1 Place Saint Pierre</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.839706</td>\n",
|
||
" <td>-0.570672</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" o_street o_zip o_city o_latitude \\\n",
|
||
"order_id \n",
|
||
"192594 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"192644 36 Rue Cornac 33000 Bordeaux 44.851402 \n",
|
||
"192658 24 Rue Du Parlement Sainte-Catherine 33000 Bordeaux 44.840405 \n",
|
||
"193242 1 Rue Matignon 33000 Bordeaux 44.850258 \n",
|
||
"192719 1 Place Saint Pierre 33000 Bordeaux 44.839706 \n",
|
||
"\n",
|
||
" o_longitude \n",
|
||
"order_id \n",
|
||
"192594 -0.575870 \n",
|
||
"192644 -0.575870 \n",
|
||
"192658 -0.573940 \n",
|
||
"193242 -0.586204 \n",
|
||
"192719 -0.570672 "
|
||
]
|
||
},
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"origins = df[[\"o_street\", \"o_zip\", \"o_city\", \"o_latitude\", \"o_longitude\"]]\n",
|
||
"\n",
|
||
"origins.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To access individual rows, we index not into a `DataFrame` directly but into its [.loc <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc) object (which also exists for `Series`).\n",
|
||
"\n",
|
||
"Here, `200800` is an \"order_id\" number. The result is a `Series` object where the original `DataFrame`'s columns become the index."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"placed_at 2016-07-24 19:30:52\n",
|
||
"restaurant_id 1204\n",
|
||
"restaurant Max A Table\n",
|
||
"o_street 36 Rue Cornac\n",
|
||
"o_zip 33000\n",
|
||
"o_city Bordeaux\n",
|
||
"o_latitude 44.851402\n",
|
||
"o_longitude -0.57587\n",
|
||
"customer_id 76187\n",
|
||
"d_street Rue Judaique 213\n",
|
||
"d_zip 33000\n",
|
||
"d_city Bordeaux\n",
|
||
"d_latitude 44.840829\n",
|
||
"d_longitude -0.595445\n",
|
||
"total 2250\n",
|
||
"courier_id 1468.0\n",
|
||
"pickup_at 2016-07-24 19:50:52\n",
|
||
"delivery_at 2016-07-24 19:58:16\n",
|
||
"cancelled 0\n",
|
||
"Name: 200800, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[200800]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can also index into the `restaurant_col` and `origins` objects from above. As `restaurant_col` is a `Series`, we get back a scalar value."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'Max A Table'"
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"restaurant_col.loc[200800]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"o_street 36 Rue Cornac\n",
|
||
"o_zip 33000\n",
|
||
"o_city Bordeaux\n",
|
||
"o_latitude 44.851402\n",
|
||
"o_longitude -0.57587\n",
|
||
"Name: 200800, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"origins.loc[200800]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Slicing also works with [.loc <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc). A tiny difference to Python's built-in slicing, the upper bound is included in the slice as well!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>restaurant</th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>pickup_at</th>\n",
|
||
" <th>delivery_at</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>200300</th>\n",
|
||
" <td>2016-07-24 13:46:15</td>\n",
|
||
" <td>1207</td>\n",
|
||
" <td>Le Jardin Pekinois</td>\n",
|
||
" <td>9 Rue Des Freres Bonie</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.837078</td>\n",
|
||
" <td>-0.579572</td>\n",
|
||
" <td>76030</td>\n",
|
||
" <td>Rue Villeneuve 1</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.839927</td>\n",
|
||
" <td>-0.580012</td>\n",
|
||
" <td>3820</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-24 14:12:45</td>\n",
|
||
" <td>2016-07-24 14:16:59</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200305</th>\n",
|
||
" <td>2016-07-24 13:49:25</td>\n",
|
||
" <td>1207</td>\n",
|
||
" <td>Le Jardin Pekinois</td>\n",
|
||
" <td>9 Rue Des Freres Bonie</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.837078</td>\n",
|
||
" <td>-0.579572</td>\n",
|
||
" <td>76033</td>\n",
|
||
" <td>Rue Du Ha 54</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.835898</td>\n",
|
||
" <td>-0.577941</td>\n",
|
||
" <td>1689</td>\n",
|
||
" <td>1405.0</td>\n",
|
||
" <td>2016-07-24 14:12:04</td>\n",
|
||
" <td>2016-07-24 14:15:54</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200800</th>\n",
|
||
" <td>2016-07-24 19:30:52</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" <td>76187</td>\n",
|
||
" <td>Rue Judaique 213</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.840829</td>\n",
|
||
" <td>-0.595445</td>\n",
|
||
" <td>2250</td>\n",
|
||
" <td>1468.0</td>\n",
|
||
" <td>2016-07-24 19:50:52</td>\n",
|
||
" <td>2016-07-24 19:58:16</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at restaurant_id restaurant \\\n",
|
||
"order_id \n",
|
||
"200300 2016-07-24 13:46:15 1207 Le Jardin Pekinois \n",
|
||
"200305 2016-07-24 13:49:25 1207 Le Jardin Pekinois \n",
|
||
"200800 2016-07-24 19:30:52 1204 Max A Table \n",
|
||
"\n",
|
||
" o_street o_zip o_city o_latitude o_longitude \\\n",
|
||
"order_id \n",
|
||
"200300 9 Rue Des Freres Bonie 33000 Bordeaux 44.837078 -0.579572 \n",
|
||
"200305 9 Rue Des Freres Bonie 33000 Bordeaux 44.837078 -0.579572 \n",
|
||
"200800 36 Rue Cornac 33000 Bordeaux 44.851402 -0.575870 \n",
|
||
"\n",
|
||
" customer_id d_street d_zip d_city d_latitude \\\n",
|
||
"order_id \n",
|
||
"200300 76030 Rue Villeneuve 1 33000 Bordeaux 44.839927 \n",
|
||
"200305 76033 Rue Du Ha 54 33000 Bordeaux 44.835898 \n",
|
||
"200800 76187 Rue Judaique 213 33000 Bordeaux 44.840829 \n",
|
||
"\n",
|
||
" d_longitude total courier_id pickup_at \\\n",
|
||
"order_id \n",
|
||
"200300 -0.580012 3820 1426.0 2016-07-24 14:12:45 \n",
|
||
"200305 -0.577941 1689 1405.0 2016-07-24 14:12:04 \n",
|
||
"200800 -0.595445 2250 1468.0 2016-07-24 19:50:52 \n",
|
||
"\n",
|
||
" delivery_at cancelled \n",
|
||
"order_id \n",
|
||
"200300 2016-07-24 14:16:59 0 \n",
|
||
"200305 2016-07-24 14:15:54 0 \n",
|
||
"200800 2016-07-24 19:58:16 0 "
|
||
]
|
||
},
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[200300:200800]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"order_id\n",
|
||
"200300 Le Jardin Pekinois\n",
|
||
"200305 Le Jardin Pekinois\n",
|
||
"200800 Max A Table\n",
|
||
"Name: restaurant, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"restaurant_col.loc[200300:200800]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>200300</th>\n",
|
||
" <td>9 Rue Des Freres Bonie</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.837078</td>\n",
|
||
" <td>-0.579572</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200305</th>\n",
|
||
" <td>9 Rue Des Freres Bonie</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.837078</td>\n",
|
||
" <td>-0.579572</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200800</th>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" o_street o_zip o_city o_latitude o_longitude\n",
|
||
"order_id \n",
|
||
"200300 9 Rue Des Freres Bonie 33000 Bordeaux 44.837078 -0.579572\n",
|
||
"200305 9 Rue Des Freres Bonie 33000 Bordeaux 44.837078 -0.579572\n",
|
||
"200800 36 Rue Cornac 33000 Bordeaux 44.851402 -0.575870"
|
||
]
|
||
},
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"origins.loc[200300:200800]"
|
||
]
|
||
},
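{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference in how the upper bound is treated can be seen side by side in the sketch below. The names `letters` and `series` and the index labels are made up for illustration. Built-in slicing and `.iloc` work on positions and exclude the upper bound, whereas `.loc` works on index labels and includes it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: four values labeled 10, 20, 30, and 40\n",
"letters = [\"a\", \"b\", \"c\", \"d\"]\n",
"series = pd.Series(letters, index=[10, 20, 30, 40])\n",
"\n",
"# Built-in slicing works on positions and excludes the upper bound\n",
"print(letters[1:3])\n",
"\n",
"# .iloc slicing is positional as well and also excludes the upper bound\n",
"print(series.iloc[1:3].tolist())\n",
"\n",
"# .loc slicing works on labels and includes the upper bound\n",
"print(series.loc[20:40].tolist())"
]
},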
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[.loc <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc) also allows us to index and slice in both dimensions simultaneously. The first index or slice goes along the row dimension while the second index or slice selects the columns."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>200300</th>\n",
|
||
" <td>9 Rue Des Freres Bonie</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.837078</td>\n",
|
||
" <td>-0.579572</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200305</th>\n",
|
||
" <td>9 Rue Des Freres Bonie</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.837078</td>\n",
|
||
" <td>-0.579572</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200800</th>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.575870</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" o_street o_zip o_city o_latitude o_longitude\n",
|
||
"order_id \n",
|
||
"200300 9 Rue Des Freres Bonie 33000 Bordeaux 44.837078 -0.579572\n",
|
||
"200305 9 Rue Des Freres Bonie 33000 Bordeaux 44.837078 -0.579572\n",
|
||
"200800 36 Rue Cornac 33000 Bordeaux 44.851402 -0.575870"
|
||
]
|
||
},
|
||
"execution_count": 19,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" 200300:200800,\n",
|
||
" [\"o_street\", \"o_zip\", \"o_city\", \"o_latitude\", \"o_longitude\"]\n",
|
||
"]"
|
||
]
|
||
},
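{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two further variants of indexing in both dimensions are sketched below on a small, made-up `DataFrame` (the name `toy` and its values are assumptions for illustration). A single row label combined with a single column label yields a scalar, and the columns can be sliced by label just like the rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with \"order_id\"-like labels as the index\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        \"o_zip\": [33000, 33000],\n",
"        \"o_city\": [\"Bordeaux\", \"Bordeaux\"],\n",
"        \"o_latitude\": [44.837078, 44.851402],\n",
"    },\n",
"    index=pd.Index([200300, 200800], name=\"order_id\"),\n",
")\n",
"\n",
"# One row label and one column label yield a scalar\n",
"print(toy.loc[200800, \"o_city\"])\n",
"\n",
"# Columns can be sliced by label too, again including the upper bound\n",
"toy.loc[200300:200800, \"o_zip\":\"o_city\"]"
]
},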
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Type Casting"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"As [.info() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info) already revealed above, the timestamp columns could only be parsed as generic objects (i.e., textual data). Also, the \"cancelled\" column which holds only `True` or `False` values does not have a `bool` data type."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"Index: 694 entries, 192594 to 211519\n",
|
||
"Data columns (total 19 columns):\n",
|
||
" # Column Non-Null Count Dtype \n",
|
||
"--- ------ -------------- ----- \n",
|
||
" 0 placed_at 694 non-null object \n",
|
||
" 1 restaurant_id 694 non-null int64 \n",
|
||
" 2 restaurant 694 non-null object \n",
|
||
" 3 o_street 694 non-null object \n",
|
||
" 4 o_zip 694 non-null int64 \n",
|
||
" 5 o_city 694 non-null object \n",
|
||
" 6 o_latitude 694 non-null float64\n",
|
||
" 7 o_longitude 694 non-null float64\n",
|
||
" 8 customer_id 694 non-null int64 \n",
|
||
" 9 d_street 694 non-null object \n",
|
||
" 10 d_zip 694 non-null int64 \n",
|
||
" 11 d_city 694 non-null object \n",
|
||
" 12 d_latitude 694 non-null float64\n",
|
||
" 13 d_longitude 694 non-null float64\n",
|
||
" 14 total 694 non-null int64 \n",
|
||
" 15 courier_id 690 non-null float64\n",
|
||
" 16 pickup_at 665 non-null object \n",
|
||
" 17 delivery_at 663 non-null object \n",
|
||
" 18 cancelled 694 non-null int64 \n",
|
||
"dtypes: float64(5), int64(6), object(8)\n",
|
||
"memory usage: 124.6+ KB\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df.info()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The [pd.to_datetime() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas.to_datetime) function **casts** the timestamp columns correctly."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"order_id\n",
|
||
"192594 2016-07-18 12:23:13\n",
|
||
"192644 2016-07-18 12:48:55\n",
|
||
"192658 2016-07-18 13:00:13\n",
|
||
"193242 2016-07-18 20:39:54\n",
|
||
"192719 2016-07-18 13:52:04\n",
|
||
" ... \n",
|
||
"212021 2016-07-30 22:29:52\n",
|
||
"211501 2016-07-30 20:44:50\n",
|
||
"211508 2016-07-30 20:45:55\n",
|
||
"211510 2016-07-30 20:46:05\n",
|
||
"211519 2016-07-30 20:46:55\n",
|
||
"Name: placed_at, Length: 694, dtype: datetime64[ns]"
|
||
]
|
||
},
|
||
"execution_count": 21,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pd.to_datetime(df[\"placed_at\"])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's overwrite the original \"placed_at\" column with one that has the correct data type."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df[\"placed_at\"] = pd.to_datetime(df[\"placed_at\"])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The [.astype() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype) method generalizes this idea and allows us to cast several columns in a `DataFrame`. It takes a `dict`ionary mapping column names to data types as its input. Instead of references to actual data types (e.g., `bool`), it also understands [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/)-specific data types provides as text."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = df.astype({\n",
|
||
" \"pickup_at\": \"datetime64[ns]\",\n",
|
||
" \"delivery_at\": \"datetime64[ns]\",\n",
|
||
" \"cancelled\": bool,\n",
|
||
"})"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now, all columns in `df` have more applicable data types."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"Index: 694 entries, 192594 to 211519\n",
|
||
"Data columns (total 19 columns):\n",
|
||
" # Column Non-Null Count Dtype \n",
|
||
"--- ------ -------------- ----- \n",
|
||
" 0 placed_at 694 non-null datetime64[ns]\n",
|
||
" 1 restaurant_id 694 non-null int64 \n",
|
||
" 2 restaurant 694 non-null object \n",
|
||
" 3 o_street 694 non-null object \n",
|
||
" 4 o_zip 694 non-null int64 \n",
|
||
" 5 o_city 694 non-null object \n",
|
||
" 6 o_latitude 694 non-null float64 \n",
|
||
" 7 o_longitude 694 non-null float64 \n",
|
||
" 8 customer_id 694 non-null int64 \n",
|
||
" 9 d_street 694 non-null object \n",
|
||
" 10 d_zip 694 non-null int64 \n",
|
||
" 11 d_city 694 non-null object \n",
|
||
" 12 d_latitude 694 non-null float64 \n",
|
||
" 13 d_longitude 694 non-null float64 \n",
|
||
" 14 total 694 non-null int64 \n",
|
||
" 15 courier_id 690 non-null float64 \n",
|
||
" 16 pickup_at 665 non-null datetime64[ns]\n",
|
||
" 17 delivery_at 663 non-null datetime64[ns]\n",
|
||
" 18 cancelled 694 non-null bool \n",
|
||
"dtypes: bool(1), datetime64[ns](3), float64(5), int64(5), object(5)\n",
|
||
"memory usage: 119.9+ KB\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df.info()"
|
||
]
|
||
},
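{
"cell_type": "markdown",
"metadata": {},
"source": [
"A side note on the output above: `courier_id` is still a `float64` column because it contains missing values (only 690 of its 694 entries are non-null), which plain integer columns cannot represent. One possible remedy is the nullable integer data type that [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) understands when given as the text `\"Int64\"`. The cell below is only a minimal sketch on a made-up `Series`; the name `courier_ids` and its values are illustrative assumptions and not part of the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: two courier IDs and one missing value\n",
"courier_ids = pd.Series([1423.0, None, 1426.0])\n",
"\n",
"# \"Int64\" (with a capital \"I\") is the nullable integer type; the missing value is kept as <NA>\n",
"courier_ids.astype(\"Int64\")"
]
},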
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Filtering"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"A common operation when working with `DataFrame`s is to filter for rows fulfilling certain conditions. That is implemented by so-called **boolean filters** in [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/), which is simply a `DataFrame` or `Series` holding only `True` or `False` values.\n",
|
||
"\n",
|
||
"One way to obtain such objects is to use relational operators with columns.\n",
|
||
"\n",
|
||
"`max_a_table` holds `True` values for all orders at the restaurant with the ID `1204`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"order_id\n",
|
||
"192594 True\n",
|
||
"192644 True\n",
|
||
"192658 False\n",
|
||
"193242 False\n",
|
||
"192719 False\n",
|
||
" ... \n",
|
||
"212021 False\n",
|
||
"211501 True\n",
|
||
"211508 False\n",
|
||
"211510 False\n",
|
||
"211519 False\n",
|
||
"Name: restaurant_id, Length: 694, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"max_a_table = df[\"restaurant_id\"] == 1204\n",
|
||
"\n",
|
||
"max_a_table"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Next, let's use a boolean filter to index into `df`. That gives us back a new `DataFame` with all orders belonging to the restaurant \"Max A Table\"."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>restaurant</th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>pickup_at</th>\n",
|
||
" <th>delivery_at</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>2016-07-18 12:23:13</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1423.0</td>\n",
|
||
" <td>2016-07-18 12:38:08</td>\n",
|
||
" <td>2016-07-18 12:48:22</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>2016-07-18 12:48:55</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2450</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-18 13:03:08</td>\n",
|
||
" <td>2016-07-18 13:12:01</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>194335</th>\n",
|
||
" <td>2016-07-19 20:35:21</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>74268</td>\n",
|
||
" <td>Place Canteloup 12</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.833834</td>\n",
|
||
" <td>-0.565674</td>\n",
|
||
" <td>3100</td>\n",
|
||
" <td>1420.0</td>\n",
|
||
" <td>2016-07-19 20:51:16</td>\n",
|
||
" <td>2016-07-19 21:01:08</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196615</th>\n",
|
||
" <td>2016-07-21 19:50:15</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>74901</td>\n",
|
||
" <td>Rue Marcelin Jourdan 55</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850360</td>\n",
|
||
" <td>-0.597361</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1418.0</td>\n",
|
||
" <td>2016-07-21 20:12:29</td>\n",
|
||
" <td>2016-07-21 20:25:57</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196839</th>\n",
|
||
" <td>2016-07-21 20:27:22</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>74966</td>\n",
|
||
" <td>Rue Sainte-Catherine 137</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.836516</td>\n",
|
||
" <td>-0.573983</td>\n",
|
||
" <td>3750</td>\n",
|
||
" <td>1472.0</td>\n",
|
||
" <td>2016-07-21 20:41:42</td>\n",
|
||
" <td>2016-07-21 21:14:41</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at restaurant_id restaurant o_street \\\n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:23:13 1204 Max A Table 36 Rue Cornac \n",
|
||
"192644 2016-07-18 12:48:55 1204 Max A Table 36 Rue Cornac \n",
|
||
"194335 2016-07-19 20:35:21 1204 Max A Table 36 Rue Cornac \n",
|
||
"196615 2016-07-21 19:50:15 1204 Max A Table 36 Rue Cornac \n",
|
||
"196839 2016-07-21 20:27:22 1204 Max A Table 36 Rue Cornac \n",
|
||
"\n",
|
||
" o_zip o_city o_latitude o_longitude customer_id \\\n",
|
||
"order_id \n",
|
||
"192594 33000 Bordeaux 44.851402 -0.57587 10298 \n",
|
||
"192644 33000 Bordeaux 44.851402 -0.57587 6037 \n",
|
||
"194335 33000 Bordeaux 44.851402 -0.57587 74268 \n",
|
||
"196615 33000 Bordeaux 44.851402 -0.57587 74901 \n",
|
||
"196839 33000 Bordeaux 44.851402 -0.57587 74966 \n",
|
||
"\n",
|
||
" d_street d_zip d_city d_latitude d_longitude \\\n",
|
||
"order_id \n",
|
||
"192594 Rue Rolland 14 33000 Bordeaux 44.842592 -0.580521 \n",
|
||
"192644 Rue Rolland 14 33000 Bordeaux 44.842592 -0.580521 \n",
|
||
"194335 Place Canteloup 12 33800 Bordeaux 44.833834 -0.565674 \n",
|
||
"196615 Rue Marcelin Jourdan 55 33200 Bordeaux 44.850360 -0.597361 \n",
|
||
"196839 Rue Sainte-Catherine 137 33000 Bordeaux 44.836516 -0.573983 \n",
|
||
"\n",
|
||
" total courier_id pickup_at delivery_at cancelled \n",
|
||
"order_id \n",
|
||
"192594 2050 1423.0 2016-07-18 12:38:08 2016-07-18 12:48:22 False \n",
|
||
"192644 2450 1426.0 2016-07-18 13:03:08 2016-07-18 13:12:01 False \n",
|
||
"194335 3100 1420.0 2016-07-19 20:51:16 2016-07-19 21:01:08 False \n",
|
||
"196615 2050 1418.0 2016-07-21 20:12:29 2016-07-21 20:25:57 False \n",
|
||
"196839 3750 1472.0 2016-07-21 20:41:42 2016-07-21 21:14:41 False "
|
||
]
|
||
},
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[df[\"restaurant_id\"] == 1204].head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Instead of an explicit condition, we can also use a reference to a boolean filter created above."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>restaurant_id</th>\n",
|
||
" <th>restaurant</th>\n",
|
||
" <th>o_street</th>\n",
|
||
" <th>o_zip</th>\n",
|
||
" <th>o_city</th>\n",
|
||
" <th>o_latitude</th>\n",
|
||
" <th>o_longitude</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>courier_id</th>\n",
|
||
" <th>pickup_at</th>\n",
|
||
" <th>delivery_at</th>\n",
|
||
" <th>cancelled</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>2016-07-18 12:23:13</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1423.0</td>\n",
|
||
" <td>2016-07-18 12:38:08</td>\n",
|
||
" <td>2016-07-18 12:48:22</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>2016-07-18 12:48:55</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" <td>2450</td>\n",
|
||
" <td>1426.0</td>\n",
|
||
" <td>2016-07-18 13:03:08</td>\n",
|
||
" <td>2016-07-18 13:12:01</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>194335</th>\n",
|
||
" <td>2016-07-19 20:35:21</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>74268</td>\n",
|
||
" <td>Place Canteloup 12</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.833834</td>\n",
|
||
" <td>-0.565674</td>\n",
|
||
" <td>3100</td>\n",
|
||
" <td>1420.0</td>\n",
|
||
" <td>2016-07-19 20:51:16</td>\n",
|
||
" <td>2016-07-19 21:01:08</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196615</th>\n",
|
||
" <td>2016-07-21 19:50:15</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>74901</td>\n",
|
||
" <td>Rue Marcelin Jourdan 55</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850360</td>\n",
|
||
" <td>-0.597361</td>\n",
|
||
" <td>2050</td>\n",
|
||
" <td>1418.0</td>\n",
|
||
" <td>2016-07-21 20:12:29</td>\n",
|
||
" <td>2016-07-21 20:25:57</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196839</th>\n",
|
||
" <td>2016-07-21 20:27:22</td>\n",
|
||
" <td>1204</td>\n",
|
||
" <td>Max A Table</td>\n",
|
||
" <td>36 Rue Cornac</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.851402</td>\n",
|
||
" <td>-0.57587</td>\n",
|
||
" <td>74966</td>\n",
|
||
" <td>Rue Sainte-Catherine 137</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.836516</td>\n",
|
||
" <td>-0.573983</td>\n",
|
||
" <td>3750</td>\n",
|
||
" <td>1472.0</td>\n",
|
||
" <td>2016-07-21 20:41:42</td>\n",
|
||
" <td>2016-07-21 21:14:41</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at restaurant_id restaurant o_street \\\n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:23:13 1204 Max A Table 36 Rue Cornac \n",
|
||
"192644 2016-07-18 12:48:55 1204 Max A Table 36 Rue Cornac \n",
|
||
"194335 2016-07-19 20:35:21 1204 Max A Table 36 Rue Cornac \n",
|
||
"196615 2016-07-21 19:50:15 1204 Max A Table 36 Rue Cornac \n",
|
||
"196839 2016-07-21 20:27:22 1204 Max A Table 36 Rue Cornac \n",
|
||
"\n",
|
||
" o_zip o_city o_latitude o_longitude customer_id \\\n",
|
||
"order_id \n",
|
||
"192594 33000 Bordeaux 44.851402 -0.57587 10298 \n",
|
||
"192644 33000 Bordeaux 44.851402 -0.57587 6037 \n",
|
||
"194335 33000 Bordeaux 44.851402 -0.57587 74268 \n",
|
||
"196615 33000 Bordeaux 44.851402 -0.57587 74901 \n",
|
||
"196839 33000 Bordeaux 44.851402 -0.57587 74966 \n",
|
||
"\n",
|
||
" d_street d_zip d_city d_latitude d_longitude \\\n",
|
||
"order_id \n",
|
||
"192594 Rue Rolland 14 33000 Bordeaux 44.842592 -0.580521 \n",
|
||
"192644 Rue Rolland 14 33000 Bordeaux 44.842592 -0.580521 \n",
|
||
"194335 Place Canteloup 12 33800 Bordeaux 44.833834 -0.565674 \n",
|
||
"196615 Rue Marcelin Jourdan 55 33200 Bordeaux 44.850360 -0.597361 \n",
|
||
"196839 Rue Sainte-Catherine 137 33000 Bordeaux 44.836516 -0.573983 \n",
|
||
"\n",
|
||
" total courier_id pickup_at delivery_at cancelled \n",
|
||
"order_id \n",
|
||
"192594 2050 1423.0 2016-07-18 12:38:08 2016-07-18 12:48:22 False \n",
|
||
"192644 2450 1426.0 2016-07-18 13:03:08 2016-07-18 13:12:01 False \n",
|
||
"194335 3100 1420.0 2016-07-19 20:51:16 2016-07-19 21:01:08 False \n",
|
||
"196615 2050 1418.0 2016-07-21 20:12:29 2016-07-21 20:25:57 False \n",
|
||
"196839 3750 1472.0 2016-07-21 20:41:42 2016-07-21 21:14:41 False "
|
||
]
|
||
},
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[max_a_table].head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Combining the filter with a `list` of columns allows us to further narrow down the `DataFrame`.\n",
|
||
"\n",
|
||
"For example, the preview below shows us the first five customers \"Max A Table\" had in the target period."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.842592</td>\n",
|
||
" <td>-0.580521</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>194335</th>\n",
|
||
" <td>74268</td>\n",
|
||
" <td>Place Canteloup 12</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.833834</td>\n",
|
||
" <td>-0.565674</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196615</th>\n",
|
||
" <td>74901</td>\n",
|
||
" <td>Rue Marcelin Jourdan 55</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850360</td>\n",
|
||
" <td>-0.597361</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196839</th>\n",
|
||
" <td>74966</td>\n",
|
||
" <td>Rue Sainte-Catherine 137</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.836516</td>\n",
|
||
" <td>-0.573983</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" customer_id d_street d_zip d_city d_latitude \\\n",
|
||
"order_id \n",
|
||
"192594 10298 Rue Rolland 14 33000 Bordeaux 44.842592 \n",
|
||
"192644 6037 Rue Rolland 14 33000 Bordeaux 44.842592 \n",
|
||
"194335 74268 Place Canteloup 12 33800 Bordeaux 44.833834 \n",
|
||
"196615 74901 Rue Marcelin Jourdan 55 33200 Bordeaux 44.850360 \n",
|
||
"196839 74966 Rue Sainte-Catherine 137 33000 Bordeaux 44.836516 \n",
|
||
"\n",
|
||
" d_longitude \n",
|
||
"order_id \n",
|
||
"192594 -0.580521 \n",
|
||
"192644 -0.580521 \n",
|
||
"194335 -0.565674 \n",
|
||
"196615 -0.597361 \n",
|
||
"196839 -0.573983 "
|
||
]
|
||
},
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" max_a_table,\n",
|
||
" [\"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"d_latitude\", \"d_longitude\"]\n",
|
||
"].head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Boolean filters can be created in an arbitray fashion by combining several conditions with `&` and `|` modeling logical AND and OR operators.\n",
|
||
"\n",
|
||
"The example lists the first five customers of \"Max A Table\" in a target area provided as latitude-longitude coordinates."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>d_latitude</th>\n",
|
||
" <th>d_longitude</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>196615</th>\n",
|
||
" <td>74901</td>\n",
|
||
" <td>Rue Marcelin Jourdan 55</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850360</td>\n",
|
||
" <td>-0.597361</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200800</th>\n",
|
||
" <td>76187</td>\n",
|
||
" <td>Rue Judaique 213</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.840829</td>\n",
|
||
" <td>-0.595445</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>200893</th>\n",
|
||
" <td>76218</td>\n",
|
||
" <td>Rue Notre Dame 21</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.850260</td>\n",
|
||
" <td>-0.572377</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>202788</th>\n",
|
||
" <td>76786</td>\n",
|
||
" <td>Rue De Leybardie 27</td>\n",
|
||
" <td>33300</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.861360</td>\n",
|
||
" <td>-0.565057</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>202563</th>\n",
|
||
" <td>76730</td>\n",
|
||
" <td>Rue Lombard 47</td>\n",
|
||
" <td>33300</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>44.858661</td>\n",
|
||
" <td>-0.563095</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" customer_id d_street d_zip d_city d_latitude \\\n",
|
||
"order_id \n",
|
||
"196615 74901 Rue Marcelin Jourdan 55 33200 Bordeaux 44.850360 \n",
|
||
"200800 76187 Rue Judaique 213 33000 Bordeaux 44.840829 \n",
|
||
"200893 76218 Rue Notre Dame 21 33000 Bordeaux 44.850260 \n",
|
||
"202788 76786 Rue De Leybardie 27 33300 Bordeaux 44.861360 \n",
|
||
"202563 76730 Rue Lombard 47 33300 Bordeaux 44.858661 \n",
|
||
"\n",
|
||
" d_longitude \n",
|
||
"order_id \n",
|
||
"196615 -0.597361 \n",
|
||
"200800 -0.595445 \n",
|
||
"200893 -0.572377 \n",
|
||
"202788 -0.565057 \n",
|
||
"202563 -0.563095 "
|
||
]
|
||
},
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" (\n",
|
||
" max_a_table\n",
|
||
" &\n",
|
||
" (\n",
|
||
" (df[\"d_latitude\"] > 44.85)\n",
|
||
" |\n",
|
||
" (df[\"d_longitude\"] < -0.59)\n",
|
||
" ) \n",
|
||
" ),\n",
|
||
" [\"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"d_latitude\", \"d_longitude\"]\n",
|
||
"].head()"
|
||
]
|
||
},
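{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the parentheses in the cell above: In Python, `&` and `|` bind more tightly than comparison operators such as `>`, `<`, or `==`. Therefore, each individual comparison should be wrapped in parentheses before it is combined with others; without them, Python would try to evaluate `44.85 | df[\"d_longitude\"]` first, which is not what we want. The sketch below illustrates the parenthesized form on a small made-up `DataFrame`; the name `toy` and its values are assumptions for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from the dataset\n",
"toy = pd.DataFrame({\n",
"    \"lat\": [44.84, 44.86, 44.83],\n",
"    \"lng\": [-0.60, -0.57, -0.56],\n",
"})\n",
"\n",
"# Parenthesize each comparison first, then combine the results with | (OR)\n",
"north_or_west = (toy[\"lat\"] > 44.85) | (toy[\"lng\"] < -0.59)\n",
"\n",
"toy.loc[north_or_west]"
]
},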
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[.isin() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html#pandas.DataFrame.isin) is another useful method: It allows us to provide a `list` of values that we are filtering for in a column."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>total</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192644</th>\n",
|
||
" <td>2016-07-18 12:48:55</td>\n",
|
||
" <td>6037</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>2450</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>210945</th>\n",
|
||
" <td>2016-07-30 19:30:39</td>\n",
|
||
" <td>79900</td>\n",
|
||
" <td>Rue Du Couvent 16</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>1650</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>211363</th>\n",
|
||
" <td>2016-07-30 20:27:45</td>\n",
|
||
" <td>80095</td>\n",
|
||
" <td>Rue De La Porte Saint-Jean 8</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>2400</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at customer_id d_street \\\n",
|
||
"order_id \n",
|
||
"192644 2016-07-18 12:48:55 6037 Rue Rolland 14 \n",
|
||
"210945 2016-07-30 19:30:39 79900 Rue Du Couvent 16 \n",
|
||
"211363 2016-07-30 20:27:45 80095 Rue De La Porte Saint-Jean 8 \n",
|
||
"\n",
|
||
" d_zip d_city total \n",
|
||
"order_id \n",
|
||
"192644 33000 Bordeaux 2450 \n",
|
||
"210945 33000 Bordeaux 1650 \n",
|
||
"211363 33000 Bordeaux 2400 "
|
||
]
|
||
},
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" (\n",
|
||
" max_a_table\n",
|
||
" &\n",
|
||
" df[\"customer_id\"].isin([6037, 79900, 80095])\n",
|
||
" ),\n",
|
||
" [\"placed_at\", \"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"total\"]\n",
|
||
"].head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The `~` operator negates a condition. So, in the cell below we see all orders at \"Max A Table\" except the ones from the indicated customers."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>placed_at</th>\n",
|
||
" <th>customer_id</th>\n",
|
||
" <th>d_street</th>\n",
|
||
" <th>d_zip</th>\n",
|
||
" <th>d_city</th>\n",
|
||
" <th>total</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>192594</th>\n",
|
||
" <td>2016-07-18 12:23:13</td>\n",
|
||
" <td>10298</td>\n",
|
||
" <td>Rue Rolland 14</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>2050</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>194335</th>\n",
|
||
" <td>2016-07-19 20:35:21</td>\n",
|
||
" <td>74268</td>\n",
|
||
" <td>Place Canteloup 12</td>\n",
|
||
" <td>33800</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>3100</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196615</th>\n",
|
||
" <td>2016-07-21 19:50:15</td>\n",
|
||
" <td>74901</td>\n",
|
||
" <td>Rue Marcelin Jourdan 55</td>\n",
|
||
" <td>33200</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>2050</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>196839</th>\n",
|
||
" <td>2016-07-21 20:27:22</td>\n",
|
||
" <td>74966</td>\n",
|
||
" <td>Rue Sainte-Catherine 137</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>3750</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>198631</th>\n",
|
||
" <td>2016-07-22 21:29:40</td>\n",
|
||
" <td>75047</td>\n",
|
||
" <td>Rue Boudet 29</td>\n",
|
||
" <td>33000</td>\n",
|
||
" <td>Bordeaux</td>\n",
|
||
" <td>2650</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" placed_at customer_id d_street d_zip \\\n",
|
||
"order_id \n",
|
||
"192594 2016-07-18 12:23:13 10298 Rue Rolland 14 33000 \n",
|
||
"194335 2016-07-19 20:35:21 74268 Place Canteloup 12 33800 \n",
|
||
"196615 2016-07-21 19:50:15 74901 Rue Marcelin Jourdan 55 33200 \n",
|
||
"196839 2016-07-21 20:27:22 74966 Rue Sainte-Catherine 137 33000 \n",
|
||
"198631 2016-07-22 21:29:40 75047 Rue Boudet 29 33000 \n",
|
||
"\n",
|
||
" d_city total \n",
|
||
"order_id \n",
|
||
"192594 Bordeaux 2050 \n",
|
||
"194335 Bordeaux 3100 \n",
|
||
"196615 Bordeaux 2050 \n",
|
||
"196839 Bordeaux 3750 \n",
|
||
"198631 Bordeaux 2650 "
|
||
]
|
||
},
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" (\n",
|
||
" max_a_table\n",
|
||
" &\n",
|
||
" ~df[\"customer_id\"].isin([6037, 79900, 80095])\n",
|
||
" ),\n",
|
||
" [\"placed_at\", \"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"total\"]\n",
|
||
"].head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## DataFrame Methods"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now that we have learned the basics of selecting the data we want from a `DataFrame`, let's look at a couple of methods that allow us to obtain some infos out of a `DataFrame`, in particular, to run some **descriptive statistics**.\n",
|
||
"\n",
|
||
"[.unique() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.Series.unique.html#pandas.Series.unique) is a simple `Series` method returning an `ndarray` with all values that are in the `Series` once.\n",
|
||
"\n",
|
||
"Here, we get an overview of how many restaurants there are in Bordeaux in the target time horizon."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([1204, 1205, 1208, 1206, 1209, 1207, 1211, 1213, 1214, 1212, 1216,\n",
|
||
" 1215, 1217, 1218, 1219, 1220, 1221, 1223, 1222, 1224, 1225, 1229,\n",
|
||
" 1226, 1227, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1239,\n",
|
||
" 1241, 1242, 1243, 1245, 1244, 1246, 1247, 1249, 1254, 1250, 1256,\n",
|
||
" 1258, 1259, 1260, 1263, 1264, 1266, 1265, 1267])"
|
||
]
|
||
},
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"restaurant_id\"].unique()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"52"
|
||
]
|
||
},
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df[\"restaurant_id\"].unique())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To obtain an `ndarray` of all customer IDs of \"Max A Table\", we write the following."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([10298, 6037, 74268, 74901, 74966, 75047, 76187, 76218, 76442,\n",
|
||
" 76396, 76421, 76786, 76822, 76730, 76871, 75687, 77409, 77386,\n",
|
||
" 77355, 77556, 78129, 78353, 78608, 78621, 78958, 79119, 79153,\n",
|
||
" 76838, 79234, 79486, 79576, 79563, 79653, 79900, 79912, 80026,\n",
|
||
" 80204, 80095, 80163])"
|
||
]
|
||
},
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" max_a_table,\n",
|
||
" \"customer_id\"\n",
|
||
"].unique()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[.value_counts() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html#pandas.DataFrame.value_counts) is similar to [.unique() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.Series.unique.html#pandas.Series.unique) and provides an array sorted by the counts of how often an element occurs in a column or `Series` in descending order.\n",
|
||
"\n",
|
||
"We use it to list the `10` most popular restaurants and customers in the dataset."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"restaurant_id\n",
|
||
"1254 78\n",
|
||
"1207 47\n",
|
||
"1204 39\n",
|
||
"1217 37\n",
|
||
"1212 32\n",
|
||
"1244 25\n",
|
||
"1225 25\n",
|
||
"1249 23\n",
|
||
"1242 19\n",
|
||
"1221 18\n",
|
||
"Name: count, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"restaurant_id\"].value_counts().head(10)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"customer_id\n",
|
||
"73919 14\n",
|
||
"10298 12\n",
|
||
"6037 8\n",
|
||
"77048 5\n",
|
||
"4210 4\n",
|
||
"74426 4\n",
|
||
"9304 3\n",
|
||
"76838 3\n",
|
||
"75905 3\n",
|
||
"74791 3\n",
|
||
"Name: count, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 36,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"customer_id\"].value_counts().head(10)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"[.sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html#pandas.DataFrame.sum), [.min() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.min.html#pandas.DataFrame.min), [.max() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.max.html#pandas.DataFrame.max), [.mean() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html#pandas.DataFrame.mean), and [.round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html#pandas.DataFrame.round) are self-explanatory.\n",
|
||
"\n",
|
||
"We use it to analyze the overall spendings in Bordeaux and for \"Max A Table\"."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(15924.78)"
|
||
]
|
||
},
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"total\"].sum() / 100 # Convert to Euro"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(885.0)"
|
||
]
|
||
},
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" max_a_table,\n",
|
||
" \"total\"\n",
|
||
"].sum() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(3.5)"
|
||
]
|
||
},
|
||
"execution_count": 39,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"total\"].min() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(83.7)"
|
||
]
|
||
},
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"total\"].max() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(12.5)"
|
||
]
|
||
},
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" max_a_table,\n",
|
||
" \"total\"\n",
|
||
"].min() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(60.0)"
|
||
]
|
||
},
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" max_a_table,\n",
|
||
" \"total\"\n",
|
||
"].max() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(22.94636887608069)"
|
||
]
|
||
},
|
||
"execution_count": 43,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"total\"].mean() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(22.95)"
|
||
]
|
||
},
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[\"total\"].mean().round() / 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"np.float64(22.69)"
|
||
]
|
||
},
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[\n",
|
||
" max_a_table,\n",
|
||
" \"total\"\n",
|
||
"].mean().round() / 100"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "intro-to-data-science",
|
||
"language": "python",
|
||
"name": "intro-to-data-science"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.12.4"
|
||
},
|
||
"toc": {
|
||
"base_numbering": 1,
|
||
"nav_menu": {},
|
||
"number_sections": false,
|
||
"sideBar": true,
|
||
"skip_h1_title": false,
|
||
"title_cell": "Table of Contents",
|
||
"title_sidebar": "Contents",
|
||
"toc_cell": false,
|
||
"toc_position": {},
|
||
"toc_section_display": true,
|
||
"toc_window_display": false
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|