Create notebook for the fourth application of tidying

2018-08-26 15:33:33 +02:00
commit d18f7133a8
parent 442a541ad5
3 changed files with 5001 additions and 2 deletions
--- a/1_column_headers_are_values.ipynb
+++ b/1_column_headers_are_values.ipynb
@@ -28,7 +28,7 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "2018-08-26 00:55:18 CEST\n",
+      "2018-08-26 14:39:56 CEST\n",
      "\n",
      "CPython 3.6.5\n",
      "IPython 6.5.0\n",
@@ -1026,7 +1026,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### Tidy Data\n",
+    "### \"Tidy\" Data\n",
    "\n",
    "As before the *pd.melt* function is used to transform the data from \"wide\" to \"long\" form."
   ]
@@ -1079,6 +1079,13 @@
    ")"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that this dataset is not yet fully tidy, as will be explained in notebook No. 4."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 16,
@@ -1313,6 +1320,24 @@
   "source": [
    "molten_billboard.head(15)"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Save the Data\n",
+    "\n",
+    "The above \"tidy\" billboard dataset is saved as input for notebook No. 4."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "molten_billboard.to_csv('data/billboard_cleaned.csv', index=False)"
+   ]
  }
 ],
 "metadata": {
--- a/4_multiple_types_in_one_table.ipynb
+++ b/4_multiple_types_in_one_table.ipynb
@@ -0,0 +1,708 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Multiple Types in one Table\n",
+    "\n",
+    "> Datasets often involve values collected at multiple levels, on different types of observational units. During tidying, each type of observational unit should be stored in its own table. This is closely related to the idea of database normalisation, where each fact is expressed in only one place. If this is not done, it’s possible for inconsistencies to occur."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## \"Housekeeping\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2018-08-26 15:32:47 CEST\n",
+      "\n",
+      "CPython 3.6.5\n",
+      "IPython 6.5.0\n",
+      "\n",
+      "numpy 1.15.1\n",
+      "pandas 0.23.4\n"
+     ]
+    }
+   ],
+   "source": [
+    "%load_ext watermark\n",
+    "%watermark -d -t -v -z -p numpy,pandas"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Example: Billboard revisited"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load the Data\n",
+    "\n",
+    "Load the cleaned and almost tidy dataset from notebook No. 1."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "billboard = pd.read_csv('data/billboard_cleaned.csv')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Messy Data\n",
+    "\n",
+    "> The Billboard dataset described in Table 8 actually contains observations on two types of\n",
+    "observational units: the *song* and its *rank* in each week. This manifests itself through the duplication of facts about the song: *artist* and *time* are repeated for every song in each *week*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>year</th>\n",
+       "      <th>artist</th>\n",
+       "      <th>time</th>\n",
+       "      <th>track</th>\n",
+       "      <th>date</th>\n",
+       "      <th>week</th>\n",
+       "      <th>rank</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-02-26</td>\n",
+       "      <td>1</td>\n",
+       "      <td>87</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-03-04</td>\n",
+       "      <td>2</td>\n",
+       "      <td>82</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-03-11</td>\n",
+       "      <td>3</td>\n",
+       "      <td>72</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-03-18</td>\n",
+       "      <td>4</td>\n",
+       "      <td>77</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-03-25</td>\n",
+       "      <td>5</td>\n",
+       "      <td>87</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-04-01</td>\n",
+       "      <td>6</td>\n",
+       "      <td>94</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>4:22</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>2000-04-08</td>\n",
+       "      <td>7</td>\n",
+       "      <td>99</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2Ge+her</td>\n",
+       "      <td>3:15</td>\n",
+       "      <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
+       "      <td>2000-09-02</td>\n",
+       "      <td>1</td>\n",
+       "      <td>91</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2Ge+her</td>\n",
+       "      <td>3:15</td>\n",
+       "      <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
+       "      <td>2000-09-09</td>\n",
+       "      <td>2</td>\n",
+       "      <td>87</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>2Ge+her</td>\n",
+       "      <td>3:15</td>\n",
+       "      <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
+       "      <td>2000-09-16</td>\n",
+       "      <td>3</td>\n",
+       "      <td>92</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>3:53</td>\n",
+       "      <td>Kryptonite</td>\n",
+       "      <td>2000-04-08</td>\n",
+       "      <td>1</td>\n",
+       "      <td>81</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>3:53</td>\n",
+       "      <td>Kryptonite</td>\n",
+       "      <td>2000-04-15</td>\n",
+       "      <td>2</td>\n",
+       "      <td>70</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>3:53</td>\n",
+       "      <td>Kryptonite</td>\n",
+       "      <td>2000-04-22</td>\n",
+       "      <td>3</td>\n",
+       "      <td>68</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>3:53</td>\n",
+       "      <td>Kryptonite</td>\n",
+       "      <td>2000-04-29</td>\n",
+       "      <td>4</td>\n",
+       "      <td>67</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>2000</td>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>3:53</td>\n",
+       "      <td>Kryptonite</td>\n",
+       "      <td>2000-05-06</td>\n",
+       "      <td>5</td>\n",
+       "      <td>66</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    year        artist  time  \\\n",
+       "0   2000         2 Pac  4:22   \n",
+       "1   2000         2 Pac  4:22   \n",
+       "2   2000         2 Pac  4:22   \n",
+       "3   2000         2 Pac  4:22   \n",
+       "4   2000         2 Pac  4:22   \n",
+       "5   2000         2 Pac  4:22   \n",
+       "6   2000         2 Pac  4:22   \n",
+       "7   2000       2Ge+her  3:15   \n",
+       "8   2000       2Ge+her  3:15   \n",
+       "9   2000       2Ge+her  3:15   \n",
+       "10  2000  3 Doors Down  3:53   \n",
+       "11  2000  3 Doors Down  3:53   \n",
+       "12  2000  3 Doors Down  3:53   \n",
+       "13  2000  3 Doors Down  3:53   \n",
+       "14  2000  3 Doors Down  3:53   \n",
+       "\n",
+       "                                                track        date  week  rank  \n",
+       "0                 Baby Don't Cry (Keep Ya Head Up II)  2000-02-26     1    87  \n",
+       "1                 Baby Don't Cry (Keep Ya Head Up II)  2000-03-04     2    82  \n",
+       "2                 Baby Don't Cry (Keep Ya Head Up II)  2000-03-11     3    72  \n",
+       "3                 Baby Don't Cry (Keep Ya Head Up II)  2000-03-18     4    77  \n",
+       "4                 Baby Don't Cry (Keep Ya Head Up II)  2000-03-25     5    87  \n",
+       "5                 Baby Don't Cry (Keep Ya Head Up II)  2000-04-01     6    94  \n",
+       "6                 Baby Don't Cry (Keep Ya Head Up II)  2000-04-08     7    99  \n",
+       "7   The Hardest Part Of Breaking Up (Is Getting Ba...  2000-09-02     1    91  \n",
+       "8   The Hardest Part Of Breaking Up (Is Getting Ba...  2000-09-09     2    87  \n",
+       "9   The Hardest Part Of Breaking Up (Is Getting Ba...  2000-09-16     3    92  \n",
+       "10                                         Kryptonite  2000-04-08     1    81  \n",
+       "11                                         Kryptonite  2000-04-15     2    70  \n",
+       "12                                         Kryptonite  2000-04-22     3    68  \n",
+       "13                                         Kryptonite  2000-04-29     4    67  \n",
+       "14                                         Kryptonite  2000-05-06     5    66  "
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "billboard.head(15)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tidy Data\n",
+    "\n",
+    "> The billboard dataset needs to be broken down into two datasets: a **song** dataset which stores *artist*, *song name* and *time*, and a **ranking** dataset which gives the *rank* of the song in each *week*.\n",
+    "\n",
+    "In pandas, moving data columns into the index is enough to obtain the unique tuples across several columns, so no dedicated \"function\" is needed to tidy up the dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the unique combinations for the song DataFrame and\n",
+    "# \"store\" them in the original dataset for reuse.\n",
+    "billboard = billboard.set_index(['artist', 'track', 'time'])\n",
+    "\n",
+    "# Create the song DataFrame.\n",
+    "songs = pd.DataFrame.from_records(\n",
+    "    columns=['id', 'artist', 'track', 'time'],\n",
+    "    data=[  # Combine enumerate with tuple unpacking\n",
+    "        (a + 1, b, c, d)  # to create the ID column.\n",
+    "        for (a, (b, c, d))\n",
+    "        in enumerate(billboard.index.unique())\n",
+    "    ],\n",
+    ")\n",
+    "\n",
+    "# Take the date and rank columns from the original dataset\n",
+    "# and use the implicit index alignment to assign the songs' IDs.\n",
+    "ranking = billboard[['date', 'rank']].copy()\n",
+    "ranking['id'] = songs.set_index(['artist', 'track', 'time'])['id']\n",
+    "\n",
+    "# Use the song ID as the index as in the paper.\n",
+    "ranking = ranking.reset_index(drop=True).set_index('id')\n",
+    "songs = songs.set_index('id')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>artist</th>\n",
+       "      <th>track</th>\n",
+       "      <th>time</th>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>id</th>\n",
+       "      <th></th>\n",
+       "      <th></th>\n",
+       "      <th></th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2 Pac</td>\n",
+       "      <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
+       "      <td>4:22</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2Ge+her</td>\n",
+       "      <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
+       "      <td>3:15</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>Kryptonite</td>\n",
+       "      <td>3:53</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>3 Doors Down</td>\n",
+       "      <td>Loser</td>\n",
+       "      <td>4:24</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>504 Boyz</td>\n",
+       "      <td>Wobble Wobble</td>\n",
+       "      <td>3:35</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>98¡</td>\n",
+       "      <td>Give Me Just One Night (Una Noche)</td>\n",
+       "      <td>3:24</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>A*Teens</td>\n",
+       "      <td>Dancing Queen</td>\n",
+       "      <td>3:44</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>Aaliyah</td>\n",
+       "      <td>I Don't Wanna</td>\n",
+       "      <td>4:15</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>Aaliyah</td>\n",
+       "      <td>Try Again</td>\n",
+       "      <td>4:03</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>Adams, Yolanda</td>\n",
+       "      <td>Open My Heart</td>\n",
+       "      <td>5:30</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>Adkins, Trace</td>\n",
+       "      <td>More</td>\n",
+       "      <td>3:05</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>Aguilera, Christina</td>\n",
+       "      <td>Come On Over Baby (All I Want Is You)</td>\n",
+       "      <td>3:38</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>Aguilera, Christina</td>\n",
+       "      <td>I Turn To You</td>\n",
+       "      <td>4:00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>Alice Deejay</td>\n",
+       "      <td>Better Off Alone</td>\n",
+       "      <td>6:50</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15</th>\n",
+       "      <td>Allan, Gary</td>\n",
+       "      <td>Smoke Rings In The Dark</td>\n",
+       "      <td>4:18</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                 artist                                              track  \\\n",
+       "id                                                                           \n",
+       "1                 2 Pac                Baby Don't Cry (Keep Ya Head Up II)   \n",
+       "2               2Ge+her  The Hardest Part Of Breaking Up (Is Getting Ba...   \n",
+       "3          3 Doors Down                                         Kryptonite   \n",
+       "4          3 Doors Down                                              Loser   \n",
+       "5              504 Boyz                                      Wobble Wobble   \n",
+       "6                   98¡                 Give Me Just One Night (Una Noche)   \n",
+       "7               A*Teens                                      Dancing Queen   \n",
+       "8               Aaliyah                                      I Don't Wanna   \n",
+       "9               Aaliyah                                          Try Again   \n",
+       "10       Adams, Yolanda                                      Open My Heart   \n",
+       "11        Adkins, Trace                                               More   \n",
+       "12  Aguilera, Christina              Come On Over Baby (All I Want Is You)   \n",
+       "13  Aguilera, Christina                                      I Turn To You   \n",
+       "14         Alice Deejay                                   Better Off Alone   \n",
+       "15          Allan, Gary                            Smoke Rings In The Dark   \n",
+       "\n",
+       "    time  \n",
+       "id        \n",
+       "1   4:22  \n",
+       "2   3:15  \n",
+       "3   3:53  \n",
+       "4   4:24  \n",
+       "5   3:35  \n",
+       "6   3:24  \n",
+       "7   3:44  \n",
+       "8   4:15  \n",
+       "9   4:03  \n",
+       "10  5:30  \n",
+       "11  3:05  \n",
+       "12  3:38  \n",
+       "13  4:00  \n",
+       "14  6:50  \n",
+       "15  4:18  "
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "songs.head(15)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>date</th>\n",
+       "      <th>rank</th>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>id</th>\n",
+       "      <th></th>\n",
+       "      <th></th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-02-26</td>\n",
+       "      <td>87</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-03-04</td>\n",
+       "      <td>82</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-03-11</td>\n",
+       "      <td>72</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-03-18</td>\n",
+       "      <td>77</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-03-25</td>\n",
+       "      <td>87</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-04-01</td>\n",
+       "      <td>94</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2000-04-08</td>\n",
+       "      <td>99</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2000-09-02</td>\n",
+       "      <td>91</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2000-09-09</td>\n",
+       "      <td>87</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2000-09-16</td>\n",
+       "      <td>92</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2000-04-08</td>\n",
+       "      <td>81</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2000-04-15</td>\n",
+       "      <td>70</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2000-04-22</td>\n",
+       "      <td>68</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2000-04-29</td>\n",
+       "      <td>67</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>2000-05-06</td>\n",
+       "      <td>66</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "          date  rank\n",
+       "id                  \n",
+       "1   2000-02-26    87\n",
+       "1   2000-03-04    82\n",
+       "1   2000-03-11    72\n",
+       "1   2000-03-18    77\n",
+       "1   2000-03-25    87\n",
+       "1   2000-04-01    94\n",
+       "1   2000-04-08    99\n",
+       "2   2000-09-02    91\n",
+       "2   2000-09-09    87\n",
+       "2   2000-09-16    92\n",
+       "3   2000-04-08    81\n",
+       "3   2000-04-15    70\n",
+       "3   2000-04-22    68\n",
+       "3   2000-04-29    67\n",
+       "3   2000-05-06    66"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "ranking.head(15)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
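The split into a song table and a ranking table in the notebook above can also be sketched with `drop_duplicates` and `merge` instead of index alignment. This is only an alternative illustration, not the approach the commit takes. The small `billboard` frame below is a hypothetical stand-in built from a few of the rows shown in the output, and the final check assumes that re-joining both tables should reproduce the original columns.

```python
import pandas as pd

# Hypothetical stand-in for data/billboard_cleaned.csv, using rows shown above.
billboard = pd.DataFrame({
    "artist": ["2 Pac", "2 Pac", "3 Doors Down", "3 Doors Down"],
    "track": ["Baby Don't Cry (Keep Ya Head Up II)"] * 2 + ["Kryptonite"] * 2,
    "time": ["4:22", "4:22", "3:53", "3:53"],
    "date": ["2000-02-26", "2000-03-04", "2000-04-08", "2000-04-15"],
    "rank": [87, 82, 81, 70],
})

song_cols = ["artist", "track", "time"]

# One row per song, plus a 1-based surrogate key.
songs = billboard[song_cols].drop_duplicates().reset_index(drop=True)
songs.insert(0, "id", songs.index + 1)

# Attach the key to every weekly observation and drop the repeated song facts.
ranking = billboard.merge(songs, on=song_cols, how="left")[["id", "date", "rank"]]

# Sanity check: joining the two tables again gives back the original columns.
expected = billboard[song_cols + ["date", "rank"]]
rejoined = ranking.merge(songs, on="id")[song_cols + ["date", "rank"]]
pd.testing.assert_frame_equal(rejoined, expected)
```

The explicit `merge` trades the compactness of index alignment for a join key that is visible in the code, which may be easier to review when the song columns change.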
--- a/data/billboard_cleaned.csv
+++ b/data/billboard_cleaned.csv
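The CSV file added here is the hand-over point between notebook No. 1 (`to_csv(..., index=False)`) and notebook No. 4 (`read_csv`). A minimal sketch of that round trip follows, using an in-memory buffer and a hypothetical two-row frame instead of the real file. Whether the notebooks need real datetimes is not stated in the commit; the `parse_dates` call is shown only as an option.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the molten billboard data.
frame = pd.DataFrame({
    "track": ["Kryptonite", "Kryptonite"],
    "date": pd.to_datetime(["2000-04-08", "2000-04-15"]),
    "rank": [81, 70],
})

# index=False keeps the default integer index out of the file.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# A plain read_csv call returns the date column as text.
buffer.seek(0)
as_text = pd.read_csv(buffer)

# parse_dates restores the datetime dtype on load.
buffer.seek(0)
as_dates = pd.read_csv(buffer, parse_dates=["date"])

print(as_text.dtypes)
print(as_dates.dtypes)
```

Writing without the index means the reloaded frame has exactly the data columns and no extra unnamed column.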