intro-to-data-science/01_scientific_stack/02_content_pandas.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: Click on \"*Kernel*\" > \"*Restart Kernel and Clear All Outputs*\" in [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) *before* reading this notebook to reset its output. If you cannot run this file on your machine, you may want to open it [in the cloud <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_mb.png\">](https://mybinder.org/v2/gh/webartifex/intro-to-data-science/main?urlpath=lab/tree/01_scientific_stack/02_content_pandas.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 1: Python's Scientific Stack (Part 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For practitioners, the [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/) library may feel a bit too \"technical\" or too close to \"real programming\" and they may prefer something that looks and feels more like Excel. That is where the [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) library comes in.\n",
    "\n",
    "Let's first `pip` install and then `import` it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pandas in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (2.2.2)\n",
      "Requirement already satisfied: numpy>=1.26.0 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2.0.0)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2.9.0.post0)\n",
      "Requirement already satisfied: pytz>=2020.1 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2024.1)\n",
      "Requirement already satisfied: tzdata>=2022.7 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from pandas) (2024.1)\n",
      "Requirement already satisfied: six>=1.5 in /home/instructor/Repositories/intro-to-data-science/.venv/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n"
     ]
    }
   ],
   "source": [
    "!pip install pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Excel-like Data with Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the same folder as this notebook there is a file named \"*orders.csv*\" that holds the order data of an urban meal delivery platform operating in Bordeaux, France. Open in with a double-click and take a look at its contents right here in JupyterLab!\n",
    "\n",
    "[pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) provides a [pd.read_csv() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) function that, as the name suggests, can open and read in CSV data. For Excel files, there is also a [pd.read_excel() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) function but the CSV format is probably more widespread in use.\n",
    "\n",
    "Let's read in the \"*orders.csv*\" file with [pd.read_csv() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) specifying the \"order_id\" column as the **index**. Here, index is a column with *unique* values that allow the identification of each row in a dataset. If we don't specify an index column, [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) creates a surrogate index as a sequence of integers 1, 2, 3, and so on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"orders.csv\", index_col=\"order_id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`df` models a table-like data structure, comparable to one tab in an Excel file. [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/) and JupyterLab are designed to work well together: The `df` object shows a preview of the dataset below the code cell. The rows are the **records** in the dataset and the columns take the role of the **attributes** each record has. Each column comes with a **domain** of allowable values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>restaurant</th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>pickup_at</th>\n",
       "      <th>delivery_at</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>2016-07-18 12:23:13</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2050</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>2016-07-18 12:38:08</td>\n",
       "      <td>2016-07-18 12:48:22</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>2016-07-18 12:48:55</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2450</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-18 13:03:08</td>\n",
       "      <td>2016-07-18 13:12:01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192658</th>\n",
       "      <td>2016-07-18 13:00:13</td>\n",
       "      <td>1205</td>\n",
       "      <td>Taj Mahal</td>\n",
       "      <td>24 Rue Du Parlement Sainte-Catherine</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.840405</td>\n",
       "      <td>-0.573940</td>\n",
       "      <td>73830</td>\n",
       "      <td>Rue Batailley 12</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.838504</td>\n",
       "      <td>-0.591961</td>\n",
       "      <td>2550</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>2016-07-18 13:19:04</td>\n",
       "      <td>2016-07-18 13:29:03</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>193242</th>\n",
       "      <td>2016-07-18 20:39:54</td>\n",
       "      <td>1208</td>\n",
       "      <td>Chez Ambre And Michel</td>\n",
       "      <td>1 Rue Matignon</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850258</td>\n",
       "      <td>-0.586204</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>1550</td>\n",
       "      <td>1420.0</td>\n",
       "      <td>2016-07-18 20:55:52</td>\n",
       "      <td>2016-07-18 21:05:28</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192719</th>\n",
       "      <td>2016-07-18 13:52:04</td>\n",
       "      <td>1206</td>\n",
       "      <td>La Maison Du Glacier</td>\n",
       "      <td>1 Place Saint Pierre</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.839706</td>\n",
       "      <td>-0.570672</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2450</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-18 14:01:23</td>\n",
       "      <td>2016-07-18 14:08:36</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>212021</th>\n",
       "      <td>2016-07-30 22:29:52</td>\n",
       "      <td>1249</td>\n",
       "      <td>Pitaya Sainte Catherine</td>\n",
       "      <td>275 Rue Sainte Catherine</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.831692</td>\n",
       "      <td>-0.573207</td>\n",
       "      <td>80400</td>\n",
       "      <td>Boulevard President Franklin Roosevelt 15</td>\n",
       "      <td>33400</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.820591</td>\n",
       "      <td>-0.582048</td>\n",
       "      <td>2250</td>\n",
       "      <td>1410.0</td>\n",
       "      <td>2016-07-30 22:50:16</td>\n",
       "      <td>2016-07-30 23:02:54</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211501</th>\n",
       "      <td>2016-07-30 20:44:50</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "      <td>80163</td>\n",
       "      <td>Rue Marsan 22</td>\n",
       "      <td>33300</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.856133</td>\n",
       "      <td>-0.576172</td>\n",
       "      <td>1250</td>\n",
       "      <td>1415.0</td>\n",
       "      <td>2016-07-30 21:02:32</td>\n",
       "      <td>2016-07-30 21:06:19</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211508</th>\n",
       "      <td>2016-07-30 20:45:55</td>\n",
       "      <td>1254</td>\n",
       "      <td>Funky Burger</td>\n",
       "      <td>5 Rue Du Loup</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.838081</td>\n",
       "      <td>-0.572281</td>\n",
       "      <td>80168</td>\n",
       "      <td>Rue Des Sablieres 42</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.825488</td>\n",
       "      <td>-0.575264</td>\n",
       "      <td>1680</td>\n",
       "      <td>1461.0</td>\n",
       "      <td>2016-07-30 21:13:31</td>\n",
       "      <td>2016-07-30 21:19:45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211510</th>\n",
       "      <td>2016-07-30 20:46:05</td>\n",
       "      <td>1219</td>\n",
       "      <td>La Tagliatella</td>\n",
       "      <td>14 Rue Guiraude</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.839388</td>\n",
       "      <td>-0.574781</td>\n",
       "      <td>80169</td>\n",
       "      <td>Rue Pasteur 35</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.845053</td>\n",
       "      <td>-0.601157</td>\n",
       "      <td>4085</td>\n",
       "      <td>1411.0</td>\n",
       "      <td>2016-07-30 21:11:00</td>\n",
       "      <td>2016-07-30 21:23:24</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211519</th>\n",
       "      <td>2016-07-30 20:46:55</td>\n",
       "      <td>1254</td>\n",
       "      <td>Funky Burger</td>\n",
       "      <td>5 Rue Du Loup</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.838081</td>\n",
       "      <td>-0.572281</td>\n",
       "      <td>80172</td>\n",
       "      <td>Rue Monadey 28</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.828816</td>\n",
       "      <td>-0.570789</td>\n",
       "      <td>2050</td>\n",
       "      <td>1817.0</td>\n",
       "      <td>2016-07-30 21:05:46</td>\n",
       "      <td>2016-07-30 21:14:07</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>694 rows × 19 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                    placed_at  restaurant_id               restaurant  \\\n",
       "order_id                                                                \n",
       "192594    2016-07-18 12:23:13           1204              Max A Table   \n",
       "192644    2016-07-18 12:48:55           1204              Max A Table   \n",
       "192658    2016-07-18 13:00:13           1205                Taj Mahal   \n",
       "193242    2016-07-18 20:39:54           1208    Chez Ambre And Michel   \n",
       "192719    2016-07-18 13:52:04           1206     La Maison Du Glacier   \n",
       "...                       ...            ...                      ...   \n",
       "212021    2016-07-30 22:29:52           1249  Pitaya Sainte Catherine   \n",
       "211501    2016-07-30 20:44:50           1204              Max A Table   \n",
       "211508    2016-07-30 20:45:55           1254             Funky Burger   \n",
       "211510    2016-07-30 20:46:05           1219           La Tagliatella   \n",
       "211519    2016-07-30 20:46:55           1254             Funky Burger   \n",
       "\n",
       "                                      o_street  o_zip    o_city  o_latitude  \\\n",
       "order_id                                                                      \n",
       "192594                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "192644                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "192658    24 Rue Du Parlement Sainte-Catherine  33000  Bordeaux   44.840405   \n",
       "193242                          1 Rue Matignon  33000  Bordeaux   44.850258   \n",
       "192719                    1 Place Saint Pierre  33000  Bordeaux   44.839706   \n",
       "...                                        ...    ...       ...         ...   \n",
       "212021                275 Rue Sainte Catherine  33000  Bordeaux   44.831692   \n",
       "211501                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "211508                           5 Rue Du Loup  33000  Bordeaux   44.838081   \n",
       "211510                         14 Rue Guiraude  33000  Bordeaux   44.839388   \n",
       "211519                           5 Rue Du Loup  33000  Bordeaux   44.838081   \n",
       "\n",
       "          o_longitude  customer_id                                   d_street  \\\n",
       "order_id                                                                        \n",
       "192594      -0.575870        10298                             Rue Rolland 14   \n",
       "192644      -0.575870         6037                             Rue Rolland 14   \n",
       "192658      -0.573940        73830                           Rue Batailley 12   \n",
       "193242      -0.586204        10298                             Rue Rolland 14   \n",
       "192719      -0.570672         6037                             Rue Rolland 14   \n",
       "...               ...          ...                                        ...   \n",
       "212021      -0.573207        80400  Boulevard President Franklin Roosevelt 15   \n",
       "211501      -0.575870        80163                              Rue Marsan 22   \n",
       "211508      -0.572281        80168                       Rue Des Sablieres 42   \n",
       "211510      -0.574781        80169                             Rue Pasteur 35   \n",
       "211519      -0.572281        80172                             Rue Monadey 28   \n",
       "\n",
       "          d_zip    d_city  d_latitude  d_longitude  total  courier_id  \\\n",
       "order_id                                                                \n",
       "192594    33000  Bordeaux   44.842592    -0.580521   2050      1423.0   \n",
       "192644    33000  Bordeaux   44.842592    -0.580521   2450      1426.0   \n",
       "192658    33000  Bordeaux   44.838504    -0.591961   2550      1423.0   \n",
       "193242    33000  Bordeaux   44.842592    -0.580521   1550      1420.0   \n",
       "192719    33000  Bordeaux   44.842592    -0.580521   2450      1426.0   \n",
       "...         ...       ...         ...          ...    ...         ...   \n",
       "212021    33400  Bordeaux   44.820591    -0.582048   2250      1410.0   \n",
       "211501    33300  Bordeaux   44.856133    -0.576172   1250      1415.0   \n",
       "211508    33800  Bordeaux   44.825488    -0.575264   1680      1461.0   \n",
       "211510    33200  Bordeaux   44.845053    -0.601157   4085      1411.0   \n",
       "211519    33800  Bordeaux   44.828816    -0.570789   2050      1817.0   \n",
       "\n",
       "                    pickup_at          delivery_at  cancelled  \n",
       "order_id                                                       \n",
       "192594    2016-07-18 12:38:08  2016-07-18 12:48:22          0  \n",
       "192644    2016-07-18 13:03:08  2016-07-18 13:12:01          0  \n",
       "192658    2016-07-18 13:19:04  2016-07-18 13:29:03          0  \n",
       "193242    2016-07-18 20:55:52  2016-07-18 21:05:28          0  \n",
       "192719    2016-07-18 14:01:23  2016-07-18 14:08:36          0  \n",
       "...                       ...                  ...        ...  \n",
       "212021    2016-07-30 22:50:16  2016-07-30 23:02:54          0  \n",
       "211501    2016-07-30 21:02:32  2016-07-30 21:06:19          0  \n",
       "211508    2016-07-30 21:13:31  2016-07-30 21:19:45          0  \n",
       "211510    2016-07-30 21:11:00  2016-07-30 21:23:24          0  \n",
       "211519    2016-07-30 21:05:46  2016-07-30 21:14:07          0  \n",
       "\n",
       "[694 rows x 19 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data type behind `df` is called a [pd.DataFrame <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame). `DataFrame`s are built around [numpy <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_np.png\">](https://numpy.org/)'s `ndarray`s providing an interface optimized for **interactive usage** (i.e., a data scientist exploring a dataset step by step)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`DataFrame`s come with many methdods.\n",
    "\n",
    "For example, [.head() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html#pandas.DataFrame.head) and [.tail() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tail.html#pandas.DataFrame.tail) show the first and last `n` rows, defaulting to `5`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>restaurant</th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>pickup_at</th>\n",
       "      <th>delivery_at</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>2016-07-18 12:23:13</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2050</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>2016-07-18 12:38:08</td>\n",
       "      <td>2016-07-18 12:48:22</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>2016-07-18 12:48:55</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2450</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-18 13:03:08</td>\n",
       "      <td>2016-07-18 13:12:01</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192658</th>\n",
       "      <td>2016-07-18 13:00:13</td>\n",
       "      <td>1205</td>\n",
       "      <td>Taj Mahal</td>\n",
       "      <td>24 Rue Du Parlement Sainte-Catherine</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.840405</td>\n",
       "      <td>-0.573940</td>\n",
       "      <td>73830</td>\n",
       "      <td>Rue Batailley 12</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.838504</td>\n",
       "      <td>-0.591961</td>\n",
       "      <td>2550</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>2016-07-18 13:19:04</td>\n",
       "      <td>2016-07-18 13:29:03</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>193242</th>\n",
       "      <td>2016-07-18 20:39:54</td>\n",
       "      <td>1208</td>\n",
       "      <td>Chez Ambre And Michel</td>\n",
       "      <td>1 Rue Matignon</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850258</td>\n",
       "      <td>-0.586204</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>1550</td>\n",
       "      <td>1420.0</td>\n",
       "      <td>2016-07-18 20:55:52</td>\n",
       "      <td>2016-07-18 21:05:28</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192719</th>\n",
       "      <td>2016-07-18 13:52:04</td>\n",
       "      <td>1206</td>\n",
       "      <td>La Maison Du Glacier</td>\n",
       "      <td>1 Place Saint Pierre</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.839706</td>\n",
       "      <td>-0.570672</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2450</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-18 14:01:23</td>\n",
       "      <td>2016-07-18 14:08:36</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    placed_at  restaurant_id             restaurant  \\\n",
       "order_id                                                              \n",
       "192594    2016-07-18 12:23:13           1204            Max A Table   \n",
       "192644    2016-07-18 12:48:55           1204            Max A Table   \n",
       "192658    2016-07-18 13:00:13           1205              Taj Mahal   \n",
       "193242    2016-07-18 20:39:54           1208  Chez Ambre And Michel   \n",
       "192719    2016-07-18 13:52:04           1206   La Maison Du Glacier   \n",
       "\n",
       "                                      o_street  o_zip    o_city  o_latitude  \\\n",
       "order_id                                                                      \n",
       "192594                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "192644                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "192658    24 Rue Du Parlement Sainte-Catherine  33000  Bordeaux   44.840405   \n",
       "193242                          1 Rue Matignon  33000  Bordeaux   44.850258   \n",
       "192719                    1 Place Saint Pierre  33000  Bordeaux   44.839706   \n",
       "\n",
       "          o_longitude  customer_id          d_street  d_zip    d_city  \\\n",
       "order_id                                                                \n",
       "192594      -0.575870        10298    Rue Rolland 14  33000  Bordeaux   \n",
       "192644      -0.575870         6037    Rue Rolland 14  33000  Bordeaux   \n",
       "192658      -0.573940        73830  Rue Batailley 12  33000  Bordeaux   \n",
       "193242      -0.586204        10298    Rue Rolland 14  33000  Bordeaux   \n",
       "192719      -0.570672         6037    Rue Rolland 14  33000  Bordeaux   \n",
       "\n",
       "          d_latitude  d_longitude  total  courier_id            pickup_at  \\\n",
       "order_id                                                                    \n",
       "192594     44.842592    -0.580521   2050      1423.0  2016-07-18 12:38:08   \n",
       "192644     44.842592    -0.580521   2450      1426.0  2016-07-18 13:03:08   \n",
       "192658     44.838504    -0.591961   2550      1423.0  2016-07-18 13:19:04   \n",
       "193242     44.842592    -0.580521   1550      1420.0  2016-07-18 20:55:52   \n",
       "192719     44.842592    -0.580521   2450      1426.0  2016-07-18 14:01:23   \n",
       "\n",
       "                  delivery_at  cancelled  \n",
       "order_id                                  \n",
       "192594    2016-07-18 12:48:22          0  \n",
       "192644    2016-07-18 13:12:01          0  \n",
       "192658    2016-07-18 13:29:03          0  \n",
       "193242    2016-07-18 21:05:28          0  \n",
       "192719    2016-07-18 14:08:36          0  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>restaurant</th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>pickup_at</th>\n",
       "      <th>delivery_at</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>211510</th>\n",
       "      <td>2016-07-30 20:46:05</td>\n",
       "      <td>1219</td>\n",
       "      <td>La Tagliatella</td>\n",
       "      <td>14 Rue Guiraude</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.839388</td>\n",
       "      <td>-0.574781</td>\n",
       "      <td>80169</td>\n",
       "      <td>Rue Pasteur 35</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.845053</td>\n",
       "      <td>-0.601157</td>\n",
       "      <td>4085</td>\n",
       "      <td>1411.0</td>\n",
       "      <td>2016-07-30 21:11:00</td>\n",
       "      <td>2016-07-30 21:23:24</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211519</th>\n",
       "      <td>2016-07-30 20:46:55</td>\n",
       "      <td>1254</td>\n",
       "      <td>Funky Burger</td>\n",
       "      <td>5 Rue Du Loup</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.838081</td>\n",
       "      <td>-0.572281</td>\n",
       "      <td>80172</td>\n",
       "      <td>Rue Monadey 28</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.828816</td>\n",
       "      <td>-0.570789</td>\n",
       "      <td>2050</td>\n",
       "      <td>1817.0</td>\n",
       "      <td>2016-07-30 21:05:46</td>\n",
       "      <td>2016-07-30 21:14:07</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    placed_at  restaurant_id      restaurant         o_street  \\\n",
       "order_id                                                                        \n",
       "211510    2016-07-30 20:46:05           1219  La Tagliatella  14 Rue Guiraude   \n",
       "211519    2016-07-30 20:46:55           1254    Funky Burger    5 Rue Du Loup   \n",
       "\n",
       "          o_zip    o_city  o_latitude  o_longitude  customer_id  \\\n",
       "order_id                                                          \n",
       "211510    33000  Bordeaux   44.839388    -0.574781        80169   \n",
       "211519    33000  Bordeaux   44.838081    -0.572281        80172   \n",
       "\n",
       "                d_street  d_zip    d_city  d_latitude  d_longitude  total  \\\n",
       "order_id                                                                    \n",
       "211510    Rue Pasteur 35  33200  Bordeaux   44.845053    -0.601157   4085   \n",
       "211519    Rue Monadey 28  33800  Bordeaux   44.828816    -0.570789   2050   \n",
       "\n",
       "          courier_id            pickup_at          delivery_at  cancelled  \n",
       "order_id                                                                   \n",
       "211510        1411.0  2016-07-30 21:11:00  2016-07-30 21:23:24          0  \n",
       "211519        1817.0  2016-07-30 21:05:46  2016-07-30 21:14:07          0  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[.info() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info) shows on overview of the columns. In particular, it shows how many cells are filled in in a column (i.e., are \"non-null\") and what **data type** (i.e., \"dtype\") *all* values in a column have. \"int64\" and \"float64\" imply that there are only `int` and `float` values in a column (taking up to 64 bits or 1s and 0s in memory). \"object\" is [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/)' way of telling us it could not deduce any data type more specific than textual data. For the columns holding timestamps (e.g., \"placed_at\") we will convert the values further below.\n",
    "\n",
    "Looking at the output, we see that some columns hold the data of **origin**-**destination** pairs, corresponding to restaurants and customers. Other columns store data following the dispatch and delivery process of couriers picking up and delivering meals at various points in time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 694 entries, 192594 to 211519\n",
      "Data columns (total 19 columns):\n",
      " #   Column         Non-Null Count  Dtype  \n",
      "---  ------         --------------  -----  \n",
      " 0   placed_at      694 non-null    object \n",
      " 1   restaurant_id  694 non-null    int64  \n",
      " 2   restaurant     694 non-null    object \n",
      " 3   o_street       694 non-null    object \n",
      " 4   o_zip          694 non-null    int64  \n",
      " 5   o_city         694 non-null    object \n",
      " 6   o_latitude     694 non-null    float64\n",
      " 7   o_longitude    694 non-null    float64\n",
      " 8   customer_id    694 non-null    int64  \n",
      " 9   d_street       694 non-null    object \n",
      " 10  d_zip          694 non-null    int64  \n",
      " 11  d_city         694 non-null    object \n",
      " 12  d_latitude     694 non-null    float64\n",
      " 13  d_longitude    694 non-null    float64\n",
      " 14  total          694 non-null    int64  \n",
      " 15  courier_id     690 non-null    float64\n",
      " 16  pickup_at      665 non-null    object \n",
      " 17  delivery_at    663 non-null    object \n",
      " 18  cancelled      694 non-null    int64  \n",
      "dtypes: float64(5), int64(6), object(8)\n",
      "memory usage: 108.4+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[.describe() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html#pandas.DataFrame.describe) shows statistics on all numerical columns in a `DataFrame`.\n",
    "\n",
    "For the example orders, such statistics may not be meaningful for all numerical columns as some of them merely hold IDs or zip codes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>694.000000</td>\n",
       "      <td>690.000000</td>\n",
       "      <td>694.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>1228.479827</td>\n",
       "      <td>33075.216138</td>\n",
       "      <td>44.839258</td>\n",
       "      <td>-0.575759</td>\n",
       "      <td>74751.126801</td>\n",
       "      <td>33191.613833</td>\n",
       "      <td>44.838623</td>\n",
       "      <td>-0.576040</td>\n",
       "      <td>2294.636888</td>\n",
       "      <td>1484.755072</td>\n",
       "      <td>0.044669</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>18.001091</td>\n",
       "      <td>207.971435</td>\n",
       "      <td>0.007471</td>\n",
       "      <td>0.006920</td>\n",
       "      <td>14604.304963</td>\n",
       "      <td>307.378697</td>\n",
       "      <td>0.011545</td>\n",
       "      <td>0.010799</td>\n",
       "      <td>1060.695748</td>\n",
       "      <td>154.586210</td>\n",
       "      <td>0.206724</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1204.000000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.818180</td>\n",
       "      <td>-0.599400</td>\n",
       "      <td>2377.000000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.809813</td>\n",
       "      <td>-0.606892</td>\n",
       "      <td>350.000000</td>\n",
       "      <td>1403.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1212.000000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.836910</td>\n",
       "      <td>-0.579345</td>\n",
       "      <td>76648.500000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.829981</td>\n",
       "      <td>-0.581612</td>\n",
       "      <td>1500.000000</td>\n",
       "      <td>1415.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1224.000000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.838287</td>\n",
       "      <td>-0.573940</td>\n",
       "      <td>78146.000000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.838364</td>\n",
       "      <td>-0.575056</td>\n",
       "      <td>1969.500000</td>\n",
       "      <td>1424.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>1244.000000</td>\n",
       "      <td>33000.000000</td>\n",
       "      <td>44.841721</td>\n",
       "      <td>-0.572281</td>\n",
       "      <td>79331.500000</td>\n",
       "      <td>33300.000000</td>\n",
       "      <td>44.846696</td>\n",
       "      <td>-0.569601</td>\n",
       "      <td>2750.000000</td>\n",
       "      <td>1462.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1267.000000</td>\n",
       "      <td>33800.000000</td>\n",
       "      <td>44.855438</td>\n",
       "      <td>-0.550576</td>\n",
       "      <td>80401.000000</td>\n",
       "      <td>33800.000000</td>\n",
       "      <td>44.877693</td>\n",
       "      <td>-0.537952</td>\n",
       "      <td>8370.000000</td>\n",
       "      <td>2013.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       restaurant_id         o_zip  o_latitude  o_longitude   customer_id  \\\n",
       "count     694.000000    694.000000  694.000000   694.000000    694.000000   \n",
       "mean     1228.479827  33075.216138   44.839258    -0.575759  74751.126801   \n",
       "std        18.001091    207.971435    0.007471     0.006920  14604.304963   \n",
       "min      1204.000000  33000.000000   44.818180    -0.599400   2377.000000   \n",
       "25%      1212.000000  33000.000000   44.836910    -0.579345  76648.500000   \n",
       "50%      1224.000000  33000.000000   44.838287    -0.573940  78146.000000   \n",
       "75%      1244.000000  33000.000000   44.841721    -0.572281  79331.500000   \n",
       "max      1267.000000  33800.000000   44.855438    -0.550576  80401.000000   \n",
       "\n",
       "              d_zip  d_latitude  d_longitude        total   courier_id  \\\n",
       "count    694.000000  694.000000   694.000000   694.000000   690.000000   \n",
       "mean   33191.613833   44.838623    -0.576040  2294.636888  1484.755072   \n",
       "std      307.378697    0.011545     0.010799  1060.695748   154.586210   \n",
       "min    33000.000000   44.809813    -0.606892   350.000000  1403.000000   \n",
       "25%    33000.000000   44.829981    -0.581612  1500.000000  1415.000000   \n",
       "50%    33000.000000   44.838364    -0.575056  1969.500000  1424.000000   \n",
       "75%    33300.000000   44.846696    -0.569601  2750.000000  1462.000000   \n",
       "max    33800.000000   44.877693    -0.537952  8370.000000  2013.000000   \n",
       "\n",
       "        cancelled  \n",
       "count  694.000000  \n",
       "mean     0.044669  \n",
       "std      0.206724  \n",
       "min      0.000000  \n",
       "25%      0.000000  \n",
       "50%      0.000000  \n",
       "75%      0.000000  \n",
       "max      1.000000  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexing & Slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`DataFrame`s support being indexed or sliced, both in the row and column dimensions.\n",
    "\n",
    "To obtain all data in a single column, we index into the `DataFrame` with the column's name.\n",
    "\n",
    "For example, `restaurant_col` provides a list of only the restaurant names. Its index are still the \"order_id\"s."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "order_id\n",
       "192594              Max A Table\n",
       "192644              Max A Table\n",
       "192658                Taj Mahal\n",
       "193242    Chez Ambre And Michel\n",
       "192719     La Maison Du Glacier\n",
       "Name: restaurant, dtype: object"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "restaurant_col = df[\"restaurant\"]\n",
    "\n",
    "restaurant_col.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data type of a single column is [pd.Series <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series), which is very similar to a `DataFrame` with only one column. `Series` objects work like built-in `list`s with added functionalities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(restaurant_col)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we index with a `list` of column names, the result is itself another `DataFrame`. That operation is like slicing out a smaller matrix from a larger one as we saw with `ndarray`s before.\n",
    "\n",
    "For example, let's pull out all location data of the orders' origins (i.e., restaurants)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192658</th>\n",
       "      <td>24 Rue Du Parlement Sainte-Catherine</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.840405</td>\n",
       "      <td>-0.573940</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>193242</th>\n",
       "      <td>1 Rue Matignon</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850258</td>\n",
       "      <td>-0.586204</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192719</th>\n",
       "      <td>1 Place Saint Pierre</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.839706</td>\n",
       "      <td>-0.570672</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      o_street  o_zip    o_city  o_latitude  \\\n",
       "order_id                                                                      \n",
       "192594                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "192644                           36 Rue Cornac  33000  Bordeaux   44.851402   \n",
       "192658    24 Rue Du Parlement Sainte-Catherine  33000  Bordeaux   44.840405   \n",
       "193242                          1 Rue Matignon  33000  Bordeaux   44.850258   \n",
       "192719                    1 Place Saint Pierre  33000  Bordeaux   44.839706   \n",
       "\n",
       "          o_longitude  \n",
       "order_id               \n",
       "192594      -0.575870  \n",
       "192644      -0.575870  \n",
       "192658      -0.573940  \n",
       "193242      -0.586204  \n",
       "192719      -0.570672  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "origins = df[[\"o_street\", \"o_zip\", \"o_city\", \"o_latitude\", \"o_longitude\"]]\n",
    "\n",
    "origins.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access individual rows, we index not into a `DataFrame` directly but into its [.loc <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc) object (which also exists for `Series`).\n",
    "\n",
    "Here, `200800` is an \"order_id\" number. The result is a `Series` object where the original `DataFrame`'s columns become the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "placed_at        2016-07-24 19:30:52\n",
       "restaurant_id                   1204\n",
       "restaurant               Max A Table\n",
       "o_street               36 Rue Cornac\n",
       "o_zip                          33000\n",
       "o_city                      Bordeaux\n",
       "o_latitude                 44.851402\n",
       "o_longitude                 -0.57587\n",
       "customer_id                    76187\n",
       "d_street            Rue Judaique 213\n",
       "d_zip                          33000\n",
       "d_city                      Bordeaux\n",
       "d_latitude                 44.840829\n",
       "d_longitude                -0.595445\n",
       "total                           2250\n",
       "courier_id                    1468.0\n",
       "pickup_at        2016-07-24 19:50:52\n",
       "delivery_at      2016-07-24 19:58:16\n",
       "cancelled                          0\n",
       "Name: 200800, dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[200800]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also index into the `restaurant_col` and `origins` objects from above. As `restaurant_col` is a `Series`, we get back a scalar value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Max A Table'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "restaurant_col.loc[200800]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "o_street       36 Rue Cornac\n",
       "o_zip                  33000\n",
       "o_city              Bordeaux\n",
       "o_latitude         44.851402\n",
       "o_longitude         -0.57587\n",
       "Name: 200800, dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "origins.loc[200800]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slicing also works with [.loc <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc). A tiny difference to Python's built-in slicing, the upper bound is included in the slice as well!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>restaurant</th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>pickup_at</th>\n",
       "      <th>delivery_at</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>200300</th>\n",
       "      <td>2016-07-24 13:46:15</td>\n",
       "      <td>1207</td>\n",
       "      <td>Le Jardin Pekinois</td>\n",
       "      <td>9 Rue Des Freres Bonie</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.837078</td>\n",
       "      <td>-0.579572</td>\n",
       "      <td>76030</td>\n",
       "      <td>Rue Villeneuve 1</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.839927</td>\n",
       "      <td>-0.580012</td>\n",
       "      <td>3820</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-24 14:12:45</td>\n",
       "      <td>2016-07-24 14:16:59</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200305</th>\n",
       "      <td>2016-07-24 13:49:25</td>\n",
       "      <td>1207</td>\n",
       "      <td>Le Jardin Pekinois</td>\n",
       "      <td>9 Rue Des Freres Bonie</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.837078</td>\n",
       "      <td>-0.579572</td>\n",
       "      <td>76033</td>\n",
       "      <td>Rue Du Ha 54</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.835898</td>\n",
       "      <td>-0.577941</td>\n",
       "      <td>1689</td>\n",
       "      <td>1405.0</td>\n",
       "      <td>2016-07-24 14:12:04</td>\n",
       "      <td>2016-07-24 14:15:54</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200800</th>\n",
       "      <td>2016-07-24 19:30:52</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "      <td>76187</td>\n",
       "      <td>Rue Judaique 213</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.840829</td>\n",
       "      <td>-0.595445</td>\n",
       "      <td>2250</td>\n",
       "      <td>1468.0</td>\n",
       "      <td>2016-07-24 19:50:52</td>\n",
       "      <td>2016-07-24 19:58:16</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    placed_at  restaurant_id          restaurant  \\\n",
       "order_id                                                           \n",
       "200300    2016-07-24 13:46:15           1207  Le Jardin Pekinois   \n",
       "200305    2016-07-24 13:49:25           1207  Le Jardin Pekinois   \n",
       "200800    2016-07-24 19:30:52           1204         Max A Table   \n",
       "\n",
       "                        o_street  o_zip    o_city  o_latitude  o_longitude  \\\n",
       "order_id                                                                     \n",
       "200300    9 Rue Des Freres Bonie  33000  Bordeaux   44.837078    -0.579572   \n",
       "200305    9 Rue Des Freres Bonie  33000  Bordeaux   44.837078    -0.579572   \n",
       "200800             36 Rue Cornac  33000  Bordeaux   44.851402    -0.575870   \n",
       "\n",
       "          customer_id          d_street  d_zip    d_city  d_latitude  \\\n",
       "order_id                                                               \n",
       "200300          76030  Rue Villeneuve 1  33000  Bordeaux   44.839927   \n",
       "200305          76033      Rue Du Ha 54  33000  Bordeaux   44.835898   \n",
       "200800          76187  Rue Judaique 213  33000  Bordeaux   44.840829   \n",
       "\n",
       "          d_longitude  total  courier_id            pickup_at  \\\n",
       "order_id                                                        \n",
       "200300      -0.580012   3820      1426.0  2016-07-24 14:12:45   \n",
       "200305      -0.577941   1689      1405.0  2016-07-24 14:12:04   \n",
       "200800      -0.595445   2250      1468.0  2016-07-24 19:50:52   \n",
       "\n",
       "                  delivery_at  cancelled  \n",
       "order_id                                  \n",
       "200300    2016-07-24 14:16:59          0  \n",
       "200305    2016-07-24 14:15:54          0  \n",
       "200800    2016-07-24 19:58:16          0  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[200300:200800]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "order_id\n",
       "200300    Le Jardin Pekinois\n",
       "200305    Le Jardin Pekinois\n",
       "200800           Max A Table\n",
       "Name: restaurant, dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "restaurant_col.loc[200300:200800]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>200300</th>\n",
       "      <td>9 Rue Des Freres Bonie</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.837078</td>\n",
       "      <td>-0.579572</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200305</th>\n",
       "      <td>9 Rue Des Freres Bonie</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.837078</td>\n",
       "      <td>-0.579572</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200800</th>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        o_street  o_zip    o_city  o_latitude  o_longitude\n",
       "order_id                                                                  \n",
       "200300    9 Rue Des Freres Bonie  33000  Bordeaux   44.837078    -0.579572\n",
       "200305    9 Rue Des Freres Bonie  33000  Bordeaux   44.837078    -0.579572\n",
       "200800             36 Rue Cornac  33000  Bordeaux   44.851402    -0.575870"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "origins.loc[200300:200800]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[.loc <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc) also allows us to index and slice in both dimensions simultaneously. The first index or slice goes along the row dimension while the second index or slice selects the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>200300</th>\n",
       "      <td>9 Rue Des Freres Bonie</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.837078</td>\n",
       "      <td>-0.579572</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200305</th>\n",
       "      <td>9 Rue Des Freres Bonie</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.837078</td>\n",
       "      <td>-0.579572</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200800</th>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.575870</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        o_street  o_zip    o_city  o_latitude  o_longitude\n",
       "order_id                                                                  \n",
       "200300    9 Rue Des Freres Bonie  33000  Bordeaux   44.837078    -0.579572\n",
       "200305    9 Rue Des Freres Bonie  33000  Bordeaux   44.837078    -0.579572\n",
       "200800             36 Rue Cornac  33000  Bordeaux   44.851402    -0.575870"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[200300:200800, [\"o_street\", \"o_zip\", \"o_city\", \"o_latitude\", \"o_longitude\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Type Casting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As [.info() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info) already revealed above, the timestamp columns could only be parsed as generic objects (i.e., textual data). Also, the \"cancelled\" column which holds only `True` or `False` values does not have a `bool` data type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 694 entries, 192594 to 211519\n",
      "Data columns (total 19 columns):\n",
      " #   Column         Non-Null Count  Dtype  \n",
      "---  ------         --------------  -----  \n",
      " 0   placed_at      694 non-null    object \n",
      " 1   restaurant_id  694 non-null    int64  \n",
      " 2   restaurant     694 non-null    object \n",
      " 3   o_street       694 non-null    object \n",
      " 4   o_zip          694 non-null    int64  \n",
      " 5   o_city         694 non-null    object \n",
      " 6   o_latitude     694 non-null    float64\n",
      " 7   o_longitude    694 non-null    float64\n",
      " 8   customer_id    694 non-null    int64  \n",
      " 9   d_street       694 non-null    object \n",
      " 10  d_zip          694 non-null    int64  \n",
      " 11  d_city         694 non-null    object \n",
      " 12  d_latitude     694 non-null    float64\n",
      " 13  d_longitude    694 non-null    float64\n",
      " 14  total          694 non-null    int64  \n",
      " 15  courier_id     690 non-null    float64\n",
      " 16  pickup_at      665 non-null    object \n",
      " 17  delivery_at    663 non-null    object \n",
      " 18  cancelled      694 non-null    int64  \n",
      "dtypes: float64(5), int64(6), object(8)\n",
      "memory usage: 124.6+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [pd.to_datetime() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas.to_datetime) function **casts** the timestamp columns correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "order_id\n",
       "192594   2016-07-18 12:23:13\n",
       "192644   2016-07-18 12:48:55\n",
       "192658   2016-07-18 13:00:13\n",
       "193242   2016-07-18 20:39:54\n",
       "192719   2016-07-18 13:52:04\n",
       "                 ...        \n",
       "212021   2016-07-30 22:29:52\n",
       "211501   2016-07-30 20:44:50\n",
       "211508   2016-07-30 20:45:55\n",
       "211510   2016-07-30 20:46:05\n",
       "211519   2016-07-30 20:46:55\n",
       "Name: placed_at, Length: 694, dtype: datetime64[ns]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.to_datetime(df[\"placed_at\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's overwrite the original \"placed_at\" column with one that has the correct data type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"placed_at\"] = pd.to_datetime(df[\"placed_at\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [.astype() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype) method generalizes this idea and allows us to cast several columns in a `DataFrame`. It takes a `dict`ionary mapping column names to data types as its input. Instead of references to actual data types (e.g., `bool`), it also understands [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/)-specific data types provides as text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df.astype(\n",
    "    {\n",
    "        \"pickup_at\": \"datetime64[ns]\",\n",
    "        \"delivery_at\": \"datetime64[ns]\",\n",
    "        \"cancelled\": bool,\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, all columns in `df` have more applicable data types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 694 entries, 192594 to 211519\n",
      "Data columns (total 19 columns):\n",
      " #   Column         Non-Null Count  Dtype         \n",
      "---  ------         --------------  -----         \n",
      " 0   placed_at      694 non-null    datetime64[ns]\n",
      " 1   restaurant_id  694 non-null    int64         \n",
      " 2   restaurant     694 non-null    object        \n",
      " 3   o_street       694 non-null    object        \n",
      " 4   o_zip          694 non-null    int64         \n",
      " 5   o_city         694 non-null    object        \n",
      " 6   o_latitude     694 non-null    float64       \n",
      " 7   o_longitude    694 non-null    float64       \n",
      " 8   customer_id    694 non-null    int64         \n",
      " 9   d_street       694 non-null    object        \n",
      " 10  d_zip          694 non-null    int64         \n",
      " 11  d_city         694 non-null    object        \n",
      " 12  d_latitude     694 non-null    float64       \n",
      " 13  d_longitude    694 non-null    float64       \n",
      " 14  total          694 non-null    int64         \n",
      " 15  courier_id     690 non-null    float64       \n",
      " 16  pickup_at      665 non-null    datetime64[ns]\n",
      " 17  delivery_at    663 non-null    datetime64[ns]\n",
      " 18  cancelled      694 non-null    bool          \n",
      "dtypes: bool(1), datetime64[ns](3), float64(5), int64(5), object(5)\n",
      "memory usage: 119.9+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common operation when working with `DataFrame`s is to filter for rows fulfilling certain conditions. That is implemented by so-called **boolean filters** in [pandas <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/), which is simply a `DataFrame` or `Series` holding only `True` or `False` values.\n",
    "\n",
    "One way to obtain such objects is to use relational operators with columns.\n",
    "\n",
    "`max_a_table` holds `True` values for all orders at the restaurant with the ID `1204`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "order_id\n",
       "192594     True\n",
       "192644     True\n",
       "192658    False\n",
       "193242    False\n",
       "192719    False\n",
       "          ...  \n",
       "212021    False\n",
       "211501     True\n",
       "211508    False\n",
       "211510    False\n",
       "211519    False\n",
       "Name: restaurant_id, Length: 694, dtype: bool"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max_a_table = df[\"restaurant_id\"] == 1204\n",
    "\n",
    "max_a_table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's use a boolean filter to index into `df`. That gives us back a new `DataFame` with all orders belonging to the restaurant \"Max A Table\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>restaurant</th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>pickup_at</th>\n",
       "      <th>delivery_at</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>2016-07-18 12:23:13</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2050</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>2016-07-18 12:38:08</td>\n",
       "      <td>2016-07-18 12:48:22</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>2016-07-18 12:48:55</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2450</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-18 13:03:08</td>\n",
       "      <td>2016-07-18 13:12:01</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194335</th>\n",
       "      <td>2016-07-19 20:35:21</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>74268</td>\n",
       "      <td>Place Canteloup 12</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.833834</td>\n",
       "      <td>-0.565674</td>\n",
       "      <td>3100</td>\n",
       "      <td>1420.0</td>\n",
       "      <td>2016-07-19 20:51:16</td>\n",
       "      <td>2016-07-19 21:01:08</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196615</th>\n",
       "      <td>2016-07-21 19:50:15</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>74901</td>\n",
       "      <td>Rue Marcelin Jourdan 55</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850360</td>\n",
       "      <td>-0.597361</td>\n",
       "      <td>2050</td>\n",
       "      <td>1418.0</td>\n",
       "      <td>2016-07-21 20:12:29</td>\n",
       "      <td>2016-07-21 20:25:57</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196839</th>\n",
       "      <td>2016-07-21 20:27:22</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>74966</td>\n",
       "      <td>Rue Sainte-Catherine 137</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.836516</td>\n",
       "      <td>-0.573983</td>\n",
       "      <td>3750</td>\n",
       "      <td>1472.0</td>\n",
       "      <td>2016-07-21 20:41:42</td>\n",
       "      <td>2016-07-21 21:14:41</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   placed_at  restaurant_id   restaurant       o_street  \\\n",
       "order_id                                                                  \n",
       "192594   2016-07-18 12:23:13           1204  Max A Table  36 Rue Cornac   \n",
       "192644   2016-07-18 12:48:55           1204  Max A Table  36 Rue Cornac   \n",
       "194335   2016-07-19 20:35:21           1204  Max A Table  36 Rue Cornac   \n",
       "196615   2016-07-21 19:50:15           1204  Max A Table  36 Rue Cornac   \n",
       "196839   2016-07-21 20:27:22           1204  Max A Table  36 Rue Cornac   \n",
       "\n",
       "          o_zip    o_city  o_latitude  o_longitude  customer_id  \\\n",
       "order_id                                                          \n",
       "192594    33000  Bordeaux   44.851402     -0.57587        10298   \n",
       "192644    33000  Bordeaux   44.851402     -0.57587         6037   \n",
       "194335    33000  Bordeaux   44.851402     -0.57587        74268   \n",
       "196615    33000  Bordeaux   44.851402     -0.57587        74901   \n",
       "196839    33000  Bordeaux   44.851402     -0.57587        74966   \n",
       "\n",
       "                          d_street  d_zip    d_city  d_latitude  d_longitude  \\\n",
       "order_id                                                                       \n",
       "192594              Rue Rolland 14  33000  Bordeaux   44.842592    -0.580521   \n",
       "192644              Rue Rolland 14  33000  Bordeaux   44.842592    -0.580521   \n",
       "194335          Place Canteloup 12  33800  Bordeaux   44.833834    -0.565674   \n",
       "196615     Rue Marcelin Jourdan 55  33200  Bordeaux   44.850360    -0.597361   \n",
       "196839    Rue Sainte-Catherine 137  33000  Bordeaux   44.836516    -0.573983   \n",
       "\n",
       "          total  courier_id           pickup_at         delivery_at  cancelled  \n",
       "order_id                                                                        \n",
       "192594     2050      1423.0 2016-07-18 12:38:08 2016-07-18 12:48:22      False  \n",
       "192644     2450      1426.0 2016-07-18 13:03:08 2016-07-18 13:12:01      False  \n",
       "194335     3100      1420.0 2016-07-19 20:51:16 2016-07-19 21:01:08      False  \n",
       "196615     2050      1418.0 2016-07-21 20:12:29 2016-07-21 20:25:57      False  \n",
       "196839     3750      1472.0 2016-07-21 20:41:42 2016-07-21 21:14:41      False  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df[\"restaurant_id\"] == 1204].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of an explicit condition, we can also use a reference to a boolean filter created above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>restaurant_id</th>\n",
       "      <th>restaurant</th>\n",
       "      <th>o_street</th>\n",
       "      <th>o_zip</th>\n",
       "      <th>o_city</th>\n",
       "      <th>o_latitude</th>\n",
       "      <th>o_longitude</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "      <th>total</th>\n",
       "      <th>courier_id</th>\n",
       "      <th>pickup_at</th>\n",
       "      <th>delivery_at</th>\n",
       "      <th>cancelled</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>2016-07-18 12:23:13</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2050</td>\n",
       "      <td>1423.0</td>\n",
       "      <td>2016-07-18 12:38:08</td>\n",
       "      <td>2016-07-18 12:48:22</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>2016-07-18 12:48:55</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "      <td>2450</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>2016-07-18 13:03:08</td>\n",
       "      <td>2016-07-18 13:12:01</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194335</th>\n",
       "      <td>2016-07-19 20:35:21</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>74268</td>\n",
       "      <td>Place Canteloup 12</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.833834</td>\n",
       "      <td>-0.565674</td>\n",
       "      <td>3100</td>\n",
       "      <td>1420.0</td>\n",
       "      <td>2016-07-19 20:51:16</td>\n",
       "      <td>2016-07-19 21:01:08</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196615</th>\n",
       "      <td>2016-07-21 19:50:15</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>74901</td>\n",
       "      <td>Rue Marcelin Jourdan 55</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850360</td>\n",
       "      <td>-0.597361</td>\n",
       "      <td>2050</td>\n",
       "      <td>1418.0</td>\n",
       "      <td>2016-07-21 20:12:29</td>\n",
       "      <td>2016-07-21 20:25:57</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196839</th>\n",
       "      <td>2016-07-21 20:27:22</td>\n",
       "      <td>1204</td>\n",
       "      <td>Max A Table</td>\n",
       "      <td>36 Rue Cornac</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.851402</td>\n",
       "      <td>-0.57587</td>\n",
       "      <td>74966</td>\n",
       "      <td>Rue Sainte-Catherine 137</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.836516</td>\n",
       "      <td>-0.573983</td>\n",
       "      <td>3750</td>\n",
       "      <td>1472.0</td>\n",
       "      <td>2016-07-21 20:41:42</td>\n",
       "      <td>2016-07-21 21:14:41</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   placed_at  restaurant_id   restaurant       o_street  \\\n",
       "order_id                                                                  \n",
       "192594   2016-07-18 12:23:13           1204  Max A Table  36 Rue Cornac   \n",
       "192644   2016-07-18 12:48:55           1204  Max A Table  36 Rue Cornac   \n",
       "194335   2016-07-19 20:35:21           1204  Max A Table  36 Rue Cornac   \n",
       "196615   2016-07-21 19:50:15           1204  Max A Table  36 Rue Cornac   \n",
       "196839   2016-07-21 20:27:22           1204  Max A Table  36 Rue Cornac   \n",
       "\n",
       "          o_zip    o_city  o_latitude  o_longitude  customer_id  \\\n",
       "order_id                                                          \n",
       "192594    33000  Bordeaux   44.851402     -0.57587        10298   \n",
       "192644    33000  Bordeaux   44.851402     -0.57587         6037   \n",
       "194335    33000  Bordeaux   44.851402     -0.57587        74268   \n",
       "196615    33000  Bordeaux   44.851402     -0.57587        74901   \n",
       "196839    33000  Bordeaux   44.851402     -0.57587        74966   \n",
       "\n",
       "                          d_street  d_zip    d_city  d_latitude  d_longitude  \\\n",
       "order_id                                                                       \n",
       "192594              Rue Rolland 14  33000  Bordeaux   44.842592    -0.580521   \n",
       "192644              Rue Rolland 14  33000  Bordeaux   44.842592    -0.580521   \n",
       "194335          Place Canteloup 12  33800  Bordeaux   44.833834    -0.565674   \n",
       "196615     Rue Marcelin Jourdan 55  33200  Bordeaux   44.850360    -0.597361   \n",
       "196839    Rue Sainte-Catherine 137  33000  Bordeaux   44.836516    -0.573983   \n",
       "\n",
       "          total  courier_id           pickup_at         delivery_at  cancelled  \n",
       "order_id                                                                        \n",
       "192594     2050      1423.0 2016-07-18 12:38:08 2016-07-18 12:48:22      False  \n",
       "192644     2450      1426.0 2016-07-18 13:03:08 2016-07-18 13:12:01      False  \n",
       "194335     3100      1420.0 2016-07-19 20:51:16 2016-07-19 21:01:08      False  \n",
       "196615     2050      1418.0 2016-07-21 20:12:29 2016-07-21 20:25:57      False  \n",
       "196839     3750      1472.0 2016-07-21 20:41:42 2016-07-21 21:14:41      False  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[max_a_table].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Combining the filter with a `list` of columns allows us to further narrow down the `DataFrame`.\n",
    "\n",
    "For example, the preview below shows us the first five customers \"Max A Table\" had in the target period."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.842592</td>\n",
       "      <td>-0.580521</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194335</th>\n",
       "      <td>74268</td>\n",
       "      <td>Place Canteloup 12</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.833834</td>\n",
       "      <td>-0.565674</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196615</th>\n",
       "      <td>74901</td>\n",
       "      <td>Rue Marcelin Jourdan 55</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850360</td>\n",
       "      <td>-0.597361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196839</th>\n",
       "      <td>74966</td>\n",
       "      <td>Rue Sainte-Catherine 137</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.836516</td>\n",
       "      <td>-0.573983</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          customer_id                  d_street  d_zip    d_city  d_latitude  \\\n",
       "order_id                                                                       \n",
       "192594          10298            Rue Rolland 14  33000  Bordeaux   44.842592   \n",
       "192644           6037            Rue Rolland 14  33000  Bordeaux   44.842592   \n",
       "194335          74268        Place Canteloup 12  33800  Bordeaux   44.833834   \n",
       "196615          74901   Rue Marcelin Jourdan 55  33200  Bordeaux   44.850360   \n",
       "196839          74966  Rue Sainte-Catherine 137  33000  Bordeaux   44.836516   \n",
       "\n",
       "          d_longitude  \n",
       "order_id               \n",
       "192594      -0.580521  \n",
       "192644      -0.580521  \n",
       "194335      -0.565674  \n",
       "196615      -0.597361  \n",
       "196839      -0.573983  "
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\n",
    "    max_a_table,\n",
    "    [\"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"d_latitude\", \"d_longitude\"],\n",
    "].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Boolean filters can be created in an arbitray fashion by combining several conditions with `&` and `|` modeling logical AND and OR operators.\n",
    "\n",
    "The example lists the first five customers of \"Max A Table\" in a target area provided as latitude-longitude coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>d_latitude</th>\n",
       "      <th>d_longitude</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>196615</th>\n",
       "      <td>74901</td>\n",
       "      <td>Rue Marcelin Jourdan 55</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850360</td>\n",
       "      <td>-0.597361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200800</th>\n",
       "      <td>76187</td>\n",
       "      <td>Rue Judaique 213</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.840829</td>\n",
       "      <td>-0.595445</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>200893</th>\n",
       "      <td>76218</td>\n",
       "      <td>Rue Notre Dame 21</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.850260</td>\n",
       "      <td>-0.572377</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>202788</th>\n",
       "      <td>76786</td>\n",
       "      <td>Rue De Leybardie 27</td>\n",
       "      <td>33300</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.861360</td>\n",
       "      <td>-0.565057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>202563</th>\n",
       "      <td>76730</td>\n",
       "      <td>Rue Lombard 47</td>\n",
       "      <td>33300</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>44.858661</td>\n",
       "      <td>-0.563095</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          customer_id                 d_street  d_zip    d_city  d_latitude  \\\n",
       "order_id                                                                      \n",
       "196615          74901  Rue Marcelin Jourdan 55  33200  Bordeaux   44.850360   \n",
       "200800          76187         Rue Judaique 213  33000  Bordeaux   44.840829   \n",
       "200893          76218        Rue Notre Dame 21  33000  Bordeaux   44.850260   \n",
       "202788          76786      Rue De Leybardie 27  33300  Bordeaux   44.861360   \n",
       "202563          76730           Rue Lombard 47  33300  Bordeaux   44.858661   \n",
       "\n",
       "          d_longitude  \n",
       "order_id               \n",
       "196615      -0.597361  \n",
       "200800      -0.595445  \n",
       "200893      -0.572377  \n",
       "202788      -0.565057  \n",
       "202563      -0.563095  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\n",
    "    (\n",
    "        max_a_table\n",
    "        &\n",
    "        (\n",
    "            (df[\"d_latitude\"] > 44.85) | (df[\"d_longitude\"] < -0.59)\n",
    "        )\n",
    "    ),\n",
    "    [\"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"d_latitude\", \"d_longitude\"],\n",
    "].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[.isin() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html#pandas.DataFrame.isin) is another useful method: It allows us to provide a `list` of values that we are filtering for in a column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>total</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192644</th>\n",
       "      <td>2016-07-18 12:48:55</td>\n",
       "      <td>6037</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>2450</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>210945</th>\n",
       "      <td>2016-07-30 19:30:39</td>\n",
       "      <td>79900</td>\n",
       "      <td>Rue Du Couvent 16</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>1650</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>211363</th>\n",
       "      <td>2016-07-30 20:27:45</td>\n",
       "      <td>80095</td>\n",
       "      <td>Rue De La Porte Saint-Jean 8</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>2400</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   placed_at  customer_id                      d_street  \\\n",
       "order_id                                                                  \n",
       "192644   2016-07-18 12:48:55         6037                Rue Rolland 14   \n",
       "210945   2016-07-30 19:30:39        79900             Rue Du Couvent 16   \n",
       "211363   2016-07-30 20:27:45        80095  Rue De La Porte Saint-Jean 8   \n",
       "\n",
       "          d_zip    d_city  total  \n",
       "order_id                          \n",
       "192644    33000  Bordeaux   2450  \n",
       "210945    33000  Bordeaux   1650  \n",
       "211363    33000  Bordeaux   2400  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\n",
    "    (max_a_table & df[\"customer_id\"].isin([6037, 79900, 80095])),\n",
    "    [\"placed_at\", \"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"total\"],\n",
    "].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `~` operator negates a condition. So, in the cell below we see all orders at \"Max A Table\" except the ones from the indicated customers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>placed_at</th>\n",
       "      <th>customer_id</th>\n",
       "      <th>d_street</th>\n",
       "      <th>d_zip</th>\n",
       "      <th>d_city</th>\n",
       "      <th>total</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>order_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>192594</th>\n",
       "      <td>2016-07-18 12:23:13</td>\n",
       "      <td>10298</td>\n",
       "      <td>Rue Rolland 14</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>2050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194335</th>\n",
       "      <td>2016-07-19 20:35:21</td>\n",
       "      <td>74268</td>\n",
       "      <td>Place Canteloup 12</td>\n",
       "      <td>33800</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>3100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196615</th>\n",
       "      <td>2016-07-21 19:50:15</td>\n",
       "      <td>74901</td>\n",
       "      <td>Rue Marcelin Jourdan 55</td>\n",
       "      <td>33200</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>2050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196839</th>\n",
       "      <td>2016-07-21 20:27:22</td>\n",
       "      <td>74966</td>\n",
       "      <td>Rue Sainte-Catherine 137</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>3750</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>198631</th>\n",
       "      <td>2016-07-22 21:29:40</td>\n",
       "      <td>75047</td>\n",
       "      <td>Rue Boudet 29</td>\n",
       "      <td>33000</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>2650</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   placed_at  customer_id                  d_street  d_zip  \\\n",
       "order_id                                                                     \n",
       "192594   2016-07-18 12:23:13        10298            Rue Rolland 14  33000   \n",
       "194335   2016-07-19 20:35:21        74268        Place Canteloup 12  33800   \n",
       "196615   2016-07-21 19:50:15        74901   Rue Marcelin Jourdan 55  33200   \n",
       "196839   2016-07-21 20:27:22        74966  Rue Sainte-Catherine 137  33000   \n",
       "198631   2016-07-22 21:29:40        75047             Rue Boudet 29  33000   \n",
       "\n",
       "            d_city  total  \n",
       "order_id                   \n",
       "192594    Bordeaux   2050  \n",
       "194335    Bordeaux   3100  \n",
       "196615    Bordeaux   2050  \n",
       "196839    Bordeaux   3750  \n",
       "198631    Bordeaux   2650  "
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\n",
    "    (max_a_table & ~df[\"customer_id\"].isin([6037, 79900, 80095])),\n",
    "    [\"placed_at\", \"customer_id\", \"d_street\", \"d_zip\", \"d_city\", \"total\"],\n",
    "].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have learned the basics of selecting the data we want from a `DataFrame`, let's look at a couple of methods that allow us to obtain some infos out of a `DataFrame`, in particular, to run some **descriptive statistics**.\n",
    "\n",
    "[.unique() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.Series.unique.html#pandas.Series.unique) is a simple `Series` method returning an `ndarray` with all values that are in the `Series` once.\n",
    "\n",
    "Here, we get an overview of how many restaurants there are in Bordeaux in the target time horizon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1204, 1205, 1208, 1206, 1209, 1207, 1211, 1213, 1214, 1212, 1216,\n",
       "       1215, 1217, 1218, 1219, 1220, 1221, 1223, 1222, 1224, 1225, 1229,\n",
       "       1226, 1227, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1239,\n",
       "       1241, 1242, 1243, 1245, 1244, 1246, 1247, 1249, 1254, 1250, 1256,\n",
       "       1258, 1259, 1260, 1263, 1264, 1266, 1265, 1267])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"restaurant_id\"].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "52"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[\"restaurant_id\"].unique())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To obtain an `ndarray` of all customer IDs of \"Max A Table\", we write the following."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([10298,  6037, 74268, 74901, 74966, 75047, 76187, 76218, 76442,\n",
       "       76396, 76421, 76786, 76822, 76730, 76871, 75687, 77409, 77386,\n",
       "       77355, 77556, 78129, 78353, 78608, 78621, 78958, 79119, 79153,\n",
       "       76838, 79234, 79486, 79576, 79563, 79653, 79900, 79912, 80026,\n",
       "       80204, 80095, 80163])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[max_a_table, \"customer_id\"].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[.value_counts() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html#pandas.DataFrame.value_counts) is similar to [.unique() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.Series.unique.html#pandas.Series.unique) and provides an array sorted by the counts of how often an element occurs in a column or `Series` in descending order.\n",
    "\n",
    "We use it to list the `10` most popular restaurants and customers in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "restaurant_id\n",
       "1254    78\n",
       "1207    47\n",
       "1204    39\n",
       "1217    37\n",
       "1212    32\n",
       "1244    25\n",
       "1225    25\n",
       "1249    23\n",
       "1242    19\n",
       "1221    18\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"restaurant_id\"].value_counts().head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "customer_id\n",
       "73919    14\n",
       "10298    12\n",
       "6037      8\n",
       "77048     5\n",
       "4210      4\n",
       "74426     4\n",
       "9304      3\n",
       "76838     3\n",
       "75905     3\n",
       "74791     3\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"customer_id\"].value_counts().head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[.sum() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html#pandas.DataFrame.sum), [.min() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.min.html#pandas.DataFrame.min), [.max() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.max.html#pandas.DataFrame.max), [.mean() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html#pandas.DataFrame.mean), and [.round() <img height=\"12\" style=\"display: inline-block\" src=\"../static/link/to_pd.png\">](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.round.html#pandas.DataFrame.round) are self-explanatory.\n",
    "\n",
    "We use it to analyze the overall spendings in Bordeaux and for \"Max A Table\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(15924.78)"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"total\"].sum() / 100  # Convert to Euro"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(885.0)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[max_a_table, \"total\"].sum() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(3.5)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"total\"].min() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(83.7)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"total\"].max() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(12.5)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[max_a_table, \"total\"].min() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(60.0)"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[max_a_table, \"total\"].max() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(22.94636887608069)"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"total\"].mean() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(22.95)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"total\"].mean().round() / 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(22.69)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[max_a_table, \"total\"].mean().round() / 100"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "intro-to-data-science",
   "language": "python",
   "name": "intro-to-data-science"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}