{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Column Headers are Values, not Variable Names\n",
"\n",
"This notebook shows two examples of how column headers display values. These type of messy datasets have practical use in two types of settings:\n",
"\n",
"1. Presentations\n",
"2. Recordings of regularly spaced observations over time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## \"Housekeeping\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2018-08-26 14:39:56 CEST\n",
"\n",
"CPython 3.6.5\n",
"IPython 6.5.0\n",
"\n",
"numpy 1.15.1\n",
"pandas 0.23.4\n"
]
}
],
"source": [
"% load_ext watermark\n",
"% watermark -d -t -v -z -p numpy,pandas"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import re\n",
"\n",
"import pandas as pd\n",
"import savReaderWriter as spss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 1: Religion vs. Income\n",
"\n",
"> A common type of messy dataset is tabular data designed for **presentation**, where variables\n",
"form both the rows and columns, and column headers are values, not variable names.\n",
"\n",
"The [Pew Research Center](http://www.pewresearch.org/) provides many studies on all kinds of aspects of life in the USA. The following examples uses data taken from its [Religious Landscape Study](http://www.pewforum.org/religious-landscape-study/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the Data\n",
"\n",
"The data are provided as a SPSS data file. This is a binary specification with a built-in header section describing the data, for example, what variables / columns are included and what the realizations categorical data can have."
]
},
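  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick orientation (a minimal sketch, not needed for the rest of the notebook), the header section can be inspected with *savReaderWriter* before any further processing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Peek at the SPSS file's header section: the first few variable names\n",
    "# and the first few variables that come with labeled (categorical) values.\n",
    "with spss.SavHeaderReader('data/pew.sav') as header:\n",
    "    meta = header.all()\n",
    "print(meta.varNames[:5])\n",
    "print(list(meta.valueLabels)[:5])"
   ]
  },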
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset's meta data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"columns = ['q16', 'reltrad', 'income']\n",
"encodings = {}\n",
"\n",
"# For sake of simplicity all data cleaning operations\n",
"# are done within the for-loop for all columns.\n",
"with spss.SavHeaderReader('data/pew.sav') as pew:\n",
" for c in columns:\n",
" encodings[c] = {\n",
" int(k): (\n",
" re.sub(r'\\(.*\\)', '', (\n",
" v.decode('iso-8859-1')\n",
" .replace('\\x92', \"'\")\n",
" .replace(' Churches', '')\n",
" .replace('Less than $10,000', '<$10k')\n",
" .replace('10 to under $20,000', '$10-20k')\n",
" .replace('20 to under $30,000', '$20-30k')\n",
" .replace('30 to under $40,000', '$30-40k')\n",
" .replace('40 to under $50,000', '$40-50k')\n",
" .replace('50 to under $75,000', '$50-75k')\n",
" .replace('75 to under $100,000', '$75-100k')\n",
" .replace('100 to under $150,000', '$100-150k')\n",
" .replace('$150,000 or more', '>150k')\n",
" ),\n",
" ).strip()\n",
" )\n",
" for (k, v) in pew.all().valueLabels[c.encode()].items()\n",
" }"
]
},
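  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the loop produced, the next cell displays one of the resulting mappings from numeric codes to cleaned labels (purely illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# One of the three mappings built above: numeric code -> cleaned label.\n",
    "encodings['income']"
   ]
  },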
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the actual data and prepare them as they are presented in the paper."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"with spss.SavReader('data/pew.sav', selectVars=[c.encode() for c in columns]) as pew:\n",
" pew = list(pew)\n",
"\n",
"# Use the above encodings to map the numeric data\n",
"# to the actual labels.\n",
"pew = pd.DataFrame(pew, columns=columns, dtype=int)\n",
"for c in columns:\n",
" pew[c] = pew[c].map(encodings[c])\n",
"\n",
"for v in ('Atheist', 'Agnostic'):\n",
" pew.loc[(pew['q16'] == v), 'reltrad'] = v\n",
"\n",
"income_columns = ['<$10k', '$10-20k', '$20-30k', '$30-40k', '$40-50k', '$50-75k',\n",
" '$75-100k', '$100-150k', '>150k', 'Don\\'t know/Refused']\n",
"\n",
"pew = pew.groupby(['reltrad', 'income']).size().unstack('income')\n",
"\n",
"pew = pew[income_columns]\n",
"pew.index.name = 'religion'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Messy Data\n",
"\n",
"The next cell shows the data as they can actually be provided as \"raw\" data (i.e., the pre-processing as done above is assumed to be done by someone else and the data analyst is only presented with the below dataset)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(18, 10)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pew.shape"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>income</th>\n",
" <th><$10k</th>\n",
" <th>$10-20k</th>\n",
" <th>$20-30k</th>\n",
" <th>$30-40k</th>\n",
" <th>$40-50k</th>\n",
" <th>$50-75k</th>\n",
" <th>$75-100k</th>\n",
" <th>$100-150k</th>\n",
" <th>>150k</th>\n",
" <th>Don't know/Refused</th>\n",
" </tr>\n",
" <tr>\n",
" <th>religion</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Agnostic</th>\n",
" <td>27</td>\n",
" <td>34</td>\n",
" <td>60</td>\n",
" <td>81</td>\n",
" <td>76</td>\n",
" <td>137</td>\n",
" <td>122</td>\n",
" <td>109</td>\n",
" <td>84</td>\n",
" <td>96</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Atheist</th>\n",
" <td>12</td>\n",
" <td>27</td>\n",
" <td>37</td>\n",
" <td>52</td>\n",
" <td>35</td>\n",
" <td>70</td>\n",
" <td>73</td>\n",
" <td>59</td>\n",
" <td>74</td>\n",
" <td>76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Buddhist</th>\n",
" <td>27</td>\n",
" <td>21</td>\n",
" <td>30</td>\n",
" <td>34</td>\n",
" <td>33</td>\n",
" <td>58</td>\n",
" <td>62</td>\n",
" <td>39</td>\n",
" <td>53</td>\n",
" <td>54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Catholic</th>\n",
" <td>418</td>\n",
" <td>617</td>\n",
" <td>732</td>\n",
" <td>670</td>\n",
" <td>638</td>\n",
" <td>1116</td>\n",
" <td>949</td>\n",
" <td>792</td>\n",
" <td>633</td>\n",
" <td>1489</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Don't know/refused</th>\n",
" <td>15</td>\n",
" <td>14</td>\n",
" <td>15</td>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>35</td>\n",
" <td>21</td>\n",
" <td>17</td>\n",
" <td>18</td>\n",
" <td>116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Evangelical Protestant</th>\n",
" <td>575</td>\n",
" <td>869</td>\n",
" <td>1064</td>\n",
" <td>982</td>\n",
" <td>881</td>\n",
" <td>1486</td>\n",
" <td>949</td>\n",
" <td>723</td>\n",
" <td>414</td>\n",
" <td>1529</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Hindu</th>\n",
" <td>1</td>\n",
" <td>9</td>\n",
" <td>7</td>\n",
" <td>9</td>\n",
" <td>11</td>\n",
" <td>34</td>\n",
" <td>47</td>\n",
" <td>48</td>\n",
" <td>54</td>\n",
" <td>37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Historically Black Protestant</th>\n",
" <td>228</td>\n",
" <td>244</td>\n",
" <td>236</td>\n",
" <td>238</td>\n",
" <td>197</td>\n",
" <td>223</td>\n",
" <td>131</td>\n",
" <td>81</td>\n",
" <td>78</td>\n",
" <td>339</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jehovah's Witness</th>\n",
" <td>20</td>\n",
" <td>27</td>\n",
" <td>24</td>\n",
" <td>24</td>\n",
" <td>21</td>\n",
" <td>30</td>\n",
" <td>15</td>\n",
" <td>11</td>\n",
" <td>6</td>\n",
" <td>37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jewish</th>\n",
" <td>19</td>\n",
" <td>19</td>\n",
" <td>25</td>\n",
" <td>25</td>\n",
" <td>30</td>\n",
" <td>95</td>\n",
" <td>69</td>\n",
" <td>87</td>\n",
" <td>151</td>\n",
" <td>162</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"income <$10k $10-20k $20-30k $30-40k $40-50k \\\n",
"religion \n",
"Agnostic 27 34 60 81 76 \n",
"Atheist 12 27 37 52 35 \n",
"Buddhist 27 21 30 34 33 \n",
"Catholic 418 617 732 670 638 \n",
"Don't know/refused 15 14 15 11 10 \n",
"Evangelical Protestant 575 869 1064 982 881 \n",
"Hindu 1 9 7 9 11 \n",
"Historically Black Protestant 228 244 236 238 197 \n",
"Jehovah's Witness 20 27 24 24 21 \n",
"Jewish 19 19 25 25 30 \n",
"\n",
"income $50-75k $75-100k $100-150k >150k \\\n",
"religion \n",
"Agnostic 137 122 109 84 \n",
"Atheist 70 73 59 74 \n",
"Buddhist 58 62 39 53 \n",
"Catholic 1116 949 792 633 \n",
"Don't know/refused 35 21 17 18 \n",
"Evangelical Protestant 1486 949 723 414 \n",
"Hindu 34 47 48 54 \n",
"Historically Black Protestant 223 131 81 78 \n",
"Jehovah's Witness 30 15 11 6 \n",
"Jewish 95 69 87 151 \n",
"\n",
"income Don't know/Refused \n",
"religion \n",
"Agnostic 96 \n",
"Atheist 76 \n",
"Buddhist 54 \n",
"Catholic 1489 \n",
"Don't know/refused 116 \n",
"Evangelical Protestant 1529 \n",
"Hindu 37 \n",
"Historically Black Protestant 339 \n",
"Jehovah's Witness 37 \n",
"Jewish 162 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pew.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tidy Data\n",
"\n",
"> This dataset has **three** variables, **religion**, **income** and **frequency**. To tidy it, we need to **melt**, or stack it. In other words, we need to turn columns into rows.\n",
"\n",
"pandas provides a [pd.melt](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html) function to un-pivot the dataset.\n",
"\n",
"**Notes:** *reset_index()* transforms the religion index column into a data column (*pd.melt()* needs that). Further, the resulting table is sorted implicitly by the *religion* column. To get to the same ordering as in the paper, the molten table is explicitly sorted."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"molten_pew = pd.melt(pew.reset_index(), id_vars=['religion'], value_name='frequency')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Create a ordered column for the income labels.\n",
"income_dtype = pd.api.types.CategoricalDtype(income_columns, ordered=True)\n",
"molten_pew['income'] = molten_pew['income'].astype(income_dtype)\n",
"molten_pew = molten_pew.sort_values(['religion', 'income']).reset_index(drop=True)"
]
},
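  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short demonstration (not needed for the analysis) of why the ordered categorical dtype matters: plain strings would sort lexically, while the categorical sorts by income order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# With plain strings, '$50-75k' would sort before '<$10k';\n",
    "# with the ordered categorical, the income order is respected.\n",
    "pd.Series(['>150k', '<$10k', '$50-75k']).astype(income_dtype).sort_values()"
   ]
  },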
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(180, 3)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"molten_pew.shape"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>religion</th>\n",
" <th>income</th>\n",
" <th>frequency</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Agnostic</td>\n",
" <td><$10k</td>\n",
" <td>27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Agnostic</td>\n",
" <td>$10-20k</td>\n",
" <td>34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Agnostic</td>\n",
" <td>$20-30k</td>\n",
" <td>60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Agnostic</td>\n",
" <td>$30-40k</td>\n",
" <td>81</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Agnostic</td>\n",
" <td>$40-50k</td>\n",
" <td>76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Agnostic</td>\n",
" <td>$50-75k</td>\n",
" <td>137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Agnostic</td>\n",
" <td>$75-100k</td>\n",
" <td>122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Agnostic</td>\n",
" <td>$100-150k</td>\n",
" <td>109</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Agnostic</td>\n",
" <td>>150k</td>\n",
" <td>84</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Agnostic</td>\n",
" <td>Don't know/Refused</td>\n",
" <td>96</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" religion income frequency\n",
"0 Agnostic <$10k 27\n",
"1 Agnostic $10-20k 34\n",
"2 Agnostic $20-30k 60\n",
"3 Agnostic $30-40k 81\n",
"4 Agnostic $40-50k 76\n",
"5 Agnostic $50-75k 137\n",
"6 Agnostic $75-100k 122\n",
"7 Agnostic $100-150k 109\n",
"8 Agnostic >150k 84\n",
"9 Agnostic Don't know/Refused 96"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"molten_pew.head(10)"
]
},
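  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (not part of the paper), melting is reversible: pivoting the molten table back yields the messy presentation form again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pivot the tidy table back into the wide presentation form.\n",
    "molten_pew.pivot(index='religion', columns='income', values='frequency').head()"
   ]
  },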
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2: Billboard\n",
"\n",
"> Another common use of this data format is to record regularly spaced observations over time. For example, the Billboard dataset shown in Table 7 records the date a song first entered the Billboard Top 100. It has variables for **artist**, **track**, **date.entered**, **rank** and **week**. The rank in each week after it enters the top 100 is recorded in 75 columns, wk1 to wk75. If a song is in the Top 100 for less than 75 weeks the remaining columns are filled with missing values. This form of storage is not tidy, but it is useful for data entry. It reduces duplication since otherwise each song in each week would need its own row, and song metadata like title and artist would need to be repeated."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the Data\n",
"\n",
"The data come in a CSV file with tediously named week columns."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Usage of \"1st\", \"2nd\", \"3rd\" should be forbidden by law :)\n",
"usecols = ['artist.inverted', 'track', 'time', 'date.entered'] + (\n",
" [f'x{i}st.week' for i in range(1, 76, 10) if i != 11]\n",
" + [f'x{i}nd.week' for i in range(2, 76, 10) if i != 12]\n",
" + [f'x{i}rd.week' for i in range(3, 76, 10) if i != 13]\n",
" + [f'x{i}th.week' for i in range(1, 76) if (i % 10) not in (1, 2, 3)]\n",
" + [f'x11th.week', f'x12th.week', f'x13th.week']\n",
")\n",
"\n",
"billboard = pd.read_csv('data/billboard.csv', encoding='iso-8859-1',\n",
" parse_dates=['date.entered'], usecols=usecols)\n",
"\n",
"billboard = billboard.assign(year=lambda x: x['date.entered'].dt.year)\n",
"\n",
"# Rename the week columns.\n",
"week_columns = {\n",
" c: ('wk' + re.sub(r'[^\\d]+', '', c))\n",
" for c in billboard.columns\n",
" if c.endswith('.week')\n",
"}\n",
"billboard = billboard.rename(columns={'artist.inverted': 'artist', **week_columns})\n",
"\n",
"# Ensure the columns' order is the same as in the paper.\n",
"columns = ['year', 'artist', 'track', 'time', 'date.entered'] + [\n",
" f'wk{i}' for i in range(1, 76)\n",
"]\n",
"billboard = billboard[columns]\n",
"\n",
"# Ensure the rows' order is similar as in the paper.\n",
"# For unknown reasons the exact ordering as in the paper cannot be reconstructed.\n",
"billboard = billboard[billboard['year'] == 2000]\n",
"billboard = billboard.sort_values(['artist', 'track'])"
]
},
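  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ordinal suffixes could also be produced by a small helper function. The following sketch (not used above; *ordinal_suffix* is a hypothetical helper, not part of the notebook's pipeline) rebuilds exactly the same week column names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def ordinal_suffix(i):\n",
    "    \"\"\"Return the English ordinal suffix for a positive integer.\"\"\"\n",
    "    if 10 < (i % 100) < 14:  # 11th, 12th, and 13th are irregular\n",
    "        return 'th'\n",
    "    return {1: 'st', 2: 'nd', 3: 'rd'}.get(i % 10, 'th')\n",
    "\n",
    "expected = sorted(f'x{i}{ordinal_suffix(i)}.week' for i in range(1, 76))\n",
    "assert expected == sorted(c for c in usecols if c.endswith('.week'))"
   ]
  },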
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Messy Data\n",
"\n",
"Again, the next cell shows the data as they were actually provided as \"raw\" data."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(267, 80)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"billboard.shape"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>artist</th>\n",
" <th>track</th>\n",
" <th>time</th>\n",
" <th>date.entered</th>\n",
" <th>wk1</th>\n",
" <th>wk2</th>\n",
" <th>wk3</th>\n",
" <th>wk4</th>\n",
" <th>wk5</th>\n",
" <th>...</th>\n",
" <th>wk66</th>\n",
" <th>wk67</th>\n",
" <th>wk68</th>\n",
" <th>wk69</th>\n",
" <th>wk70</th>\n",
" <th>wk71</th>\n",
" <th>wk72</th>\n",
" <th>wk73</th>\n",
" <th>wk74</th>\n",
" <th>wk75</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>246</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>4:22</td>\n",
" <td>2000-02-26</td>\n",
" <td>87</td>\n",
" <td>82.0</td>\n",
" <td>72.0</td>\n",
" <td>77.0</td>\n",
" <td>87.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>287</th>\n",
" <td>2000</td>\n",
" <td>2Ge+her</td>\n",
" <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
" <td>3:15</td>\n",
" <td>2000-09-02</td>\n",
" <td>91</td>\n",
" <td>87.0</td>\n",
" <td>92.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>Kryptonite</td>\n",
" <td>3:53</td>\n",
" <td>2000-04-08</td>\n",
" <td>81</td>\n",
" <td>70.0</td>\n",
" <td>68.0</td>\n",
" <td>67.0</td>\n",
" <td>66.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>193</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>Loser</td>\n",
" <td>4:24</td>\n",
" <td>2000-10-21</td>\n",
" <td>76</td>\n",
" <td>76.0</td>\n",
" <td>72.0</td>\n",
" <td>69.0</td>\n",
" <td>67.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>2000</td>\n",
" <td>504 Boyz</td>\n",
" <td>Wobble Wobble</td>\n",
" <td>3:35</td>\n",
" <td>2000-04-15</td>\n",
" <td>57</td>\n",
" <td>34.0</td>\n",
" <td>25.0</td>\n",
" <td>17.0</td>\n",
" <td>17.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2000</td>\n",
" <td>98¡</td>\n",
" <td>Give Me Just One Night (Una Noche)</td>\n",
" <td>3:24</td>\n",
" <td>2000-08-19</td>\n",
" <td>51</td>\n",
" <td>39.0</td>\n",
" <td>34.0</td>\n",
" <td>26.0</td>\n",
" <td>26.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>304</th>\n",
" <td>2000</td>\n",
" <td>A*Teens</td>\n",
" <td>Dancing Queen</td>\n",
" <td>3:44</td>\n",
" <td>2000-07-08</td>\n",
" <td>97</td>\n",
" <td>97.0</td>\n",
" <td>96.0</td>\n",
" <td>95.0</td>\n",
" <td>100.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>135</th>\n",
" <td>2000</td>\n",
" <td>Aaliyah</td>\n",
" <td>I Don't Wanna</td>\n",
" <td>4:15</td>\n",
" <td>2000-01-29</td>\n",
" <td>84</td>\n",
" <td>62.0</td>\n",
" <td>51.0</td>\n",
" <td>41.0</td>\n",
" <td>38.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>2000</td>\n",
" <td>Aaliyah</td>\n",
" <td>Try Again</td>\n",
" <td>4:03</td>\n",
" <td>2000-03-18</td>\n",
" <td>59</td>\n",
" <td>53.0</td>\n",
" <td>38.0</td>\n",
" <td>28.0</td>\n",
" <td>21.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>200</th>\n",
" <td>2000</td>\n",
" <td>Adams, Yolanda</td>\n",
" <td>Open My Heart</td>\n",
" <td>5:30</td>\n",
" <td>2000-08-26</td>\n",
" <td>76</td>\n",
" <td>76.0</td>\n",
" <td>74.0</td>\n",
" <td>69.0</td>\n",
" <td>68.0</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 80 columns</p>\n",
"</div>"
],
"text/plain": [
" year artist track \\\n",
"246 2000 2 Pac Baby Don't Cry (Keep Ya Head Up II) \n",
"287 2000 2Ge+her The Hardest Part Of Breaking Up (Is Getting Ba... \n",
"24 2000 3 Doors Down Kryptonite \n",
"193 2000 3 Doors Down Loser \n",
"69 2000 504 Boyz Wobble Wobble \n",
"22 2000 98¡ Give Me Just One Night (Una Noche) \n",
"304 2000 A*Teens Dancing Queen \n",
"135 2000 Aaliyah I Don't Wanna \n",
"14 2000 Aaliyah Try Again \n",
"200 2000 Adams, Yolanda Open My Heart \n",
"\n",
" time date.entered wk1 wk2 wk3 wk4 wk5 ... wk66 wk67 wk68 \\\n",
"246 4:22 2000-02-26 87 82.0 72.0 77.0 87.0 ... NaN NaN NaN \n",
"287 3:15 2000-09-02 91 87.0 92.0 NaN NaN ... NaN NaN NaN \n",
"24 3:53 2000-04-08 81 70.0 68.0 67.0 66.0 ... NaN NaN NaN \n",
"193 4:24 2000-10-21 76 76.0 72.0 69.0 67.0 ... NaN NaN NaN \n",
"69 3:35 2000-04-15 57 34.0 25.0 17.0 17.0 ... NaN NaN NaN \n",
"22 3:24 2000-08-19 51 39.0 34.0 26.0 26.0 ... NaN NaN NaN \n",
"304 3:44 2000-07-08 97 97.0 96.0 95.0 100.0 ... NaN NaN NaN \n",
"135 4:15 2000-01-29 84 62.0 51.0 41.0 38.0 ... NaN NaN NaN \n",
"14 4:03 2000-03-18 59 53.0 38.0 28.0 21.0 ... NaN NaN NaN \n",
"200 5:30 2000-08-26 76 76.0 74.0 69.0 68.0 ... NaN NaN NaN \n",
"\n",
" wk69 wk70 wk71 wk72 wk73 wk74 wk75 \n",
"246 NaN NaN NaN NaN NaN NaN NaN \n",
"287 NaN NaN NaN NaN NaN NaN NaN \n",
"24 NaN NaN NaN NaN NaN NaN NaN \n",
"193 NaN NaN NaN NaN NaN NaN NaN \n",
"69 NaN NaN NaN NaN NaN NaN NaN \n",
"22 NaN NaN NaN NaN NaN NaN NaN \n",
"304 NaN NaN NaN NaN NaN NaN NaN \n",
"135 NaN NaN NaN NaN NaN NaN NaN \n",
"14 NaN NaN NaN NaN NaN NaN NaN \n",
"200 NaN NaN NaN NaN NaN NaN NaN \n",
"\n",
"[10 rows x 80 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"billboard.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### \"Tidy\" Data\n",
    "\n",
    "As before, the *pd.melt()* function is used to transform the data from \"wide\" to \"long\" format."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"molten_billboard = pd.melt(\n",
" billboard,\n",
" id_vars=['year', 'artist', 'track', 'time', 'date.entered'],\n",
" var_name='week',\n",
" value_name='rank',\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In contrast to R, pandas keeps (unneccesary) rows for weeks where the song was already out of the charts. These are discarded. Also, a new column *date* indicating when exactly a particular song was at a certain rank in the charts is added."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# pandas keeps \"wide\" variables that had missing values as rows.\n",
"molten_billboard = molten_billboard[molten_billboard['rank'].notnull()]\n",
"\n",
"# Cast as integer after missing values are removed.\n",
"molten_billboard['week'] = molten_billboard['week'].map(lambda x: int(x[2:]))\n",
"molten_billboard['rank'] = molten_billboard['rank'].map(int)\n",
"\n",
"# Calculate the actual week from the date of first entering the list.\n",
"molten_billboard = molten_billboard.assign(\n",
" date=lambda x: x['date.entered'] + (x['week'] - 1) * datetime.timedelta(weeks=1)\n",
")\n",
"\n",
"# Sort rows and columns as in the paper.\n",
"molten_billboard = molten_billboard[\n",
" ['year', 'artist', 'time', 'track', 'date', 'week', 'rank']\n",
"]\n",
"molten_billboard = (\n",
" molten_billboard.sort_values(['artist', 'track', 'week']).reset_index(drop=True)\n",
")"
]
},
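  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A tiny check of the date arithmetic above (a sketch only): week 3 of a song that entered the charts on 2000-02-26 should map to 2000-03-11, i.e., two weeks later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Week 1 is the week of entry, so week n lies (n - 1) weeks later.\n",
    "datetime.date(2000, 2, 26) + (3 - 1) * datetime.timedelta(weeks=1)"
   ]
  },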
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this dataset is not yet fully tidy as will be explained in notebook No. 4."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>artist</th>\n",
" <th>time</th>\n",
" <th>track</th>\n",
" <th>date</th>\n",
" <th>week</th>\n",
" <th>rank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-02-26</td>\n",
" <td>1</td>\n",
" <td>87</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-03-04</td>\n",
" <td>2</td>\n",
" <td>82</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-03-11</td>\n",
" <td>3</td>\n",
" <td>72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-03-18</td>\n",
" <td>4</td>\n",
" <td>77</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-03-25</td>\n",
" <td>5</td>\n",
" <td>87</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-04-01</td>\n",
" <td>6</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2000</td>\n",
" <td>2 Pac</td>\n",
" <td>4:22</td>\n",
" <td>Baby Don't Cry (Keep Ya Head Up II)</td>\n",
" <td>2000-04-08</td>\n",
" <td>7</td>\n",
" <td>99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2000</td>\n",
" <td>2Ge+her</td>\n",
" <td>3:15</td>\n",
" <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
" <td>2000-09-02</td>\n",
" <td>1</td>\n",
" <td>91</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2000</td>\n",
" <td>2Ge+her</td>\n",
" <td>3:15</td>\n",
" <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
" <td>2000-09-09</td>\n",
" <td>2</td>\n",
" <td>87</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2000</td>\n",
" <td>2Ge+her</td>\n",
" <td>3:15</td>\n",
" <td>The Hardest Part Of Breaking Up (Is Getting Ba...</td>\n",
" <td>2000-09-16</td>\n",
" <td>3</td>\n",
" <td>92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>3:53</td>\n",
" <td>Kryptonite</td>\n",
" <td>2000-04-08</td>\n",
" <td>1</td>\n",
" <td>81</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>3:53</td>\n",
" <td>Kryptonite</td>\n",
" <td>2000-04-15</td>\n",
" <td>2</td>\n",
" <td>70</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>3:53</td>\n",
" <td>Kryptonite</td>\n",
" <td>2000-04-22</td>\n",
" <td>3</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>3:53</td>\n",
" <td>Kryptonite</td>\n",
" <td>2000-04-29</td>\n",
" <td>4</td>\n",
" <td>67</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>2000</td>\n",
" <td>3 Doors Down</td>\n",
" <td>3:53</td>\n",
" <td>Kryptonite</td>\n",
" <td>2000-05-06</td>\n",
" <td>5</td>\n",
" <td>66</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year artist time \\\n",
"0 2000 2 Pac 4:22 \n",
"1 2000 2 Pac 4:22 \n",
"2 2000 2 Pac 4:22 \n",
"3 2000 2 Pac 4:22 \n",
"4 2000 2 Pac 4:22 \n",
"5 2000 2 Pac 4:22 \n",
"6 2000 2 Pac 4:22 \n",
"7 2000 2Ge+her 3:15 \n",
"8 2000 2Ge+her 3:15 \n",
"9 2000 2Ge+her 3:15 \n",
"10 2000 3 Doors Down 3:53 \n",
"11 2000 3 Doors Down 3:53 \n",
"12 2000 3 Doors Down 3:53 \n",
"13 2000 3 Doors Down 3:53 \n",
"14 2000 3 Doors Down 3:53 \n",
"\n",
" track date week rank \n",
"0 Baby Don't Cry (Keep Ya Head Up II) 2000-02-26 1 87 \n",
"1 Baby Don't Cry (Keep Ya Head Up II) 2000-03-04 2 82 \n",
"2 Baby Don't Cry (Keep Ya Head Up II) 2000-03-11 3 72 \n",
"3 Baby Don't Cry (Keep Ya Head Up II) 2000-03-18 4 77 \n",
"4 Baby Don't Cry (Keep Ya Head Up II) 2000-03-25 5 87 \n",
"5 Baby Don't Cry (Keep Ya Head Up II) 2000-04-01 6 94 \n",
"6 Baby Don't Cry (Keep Ya Head Up II) 2000-04-08 7 99 \n",
"7 The Hardest Part Of Breaking Up (Is Getting Ba... 2000-09-02 1 91 \n",
"8 The Hardest Part Of Breaking Up (Is Getting Ba... 2000-09-09 2 87 \n",
"9 The Hardest Part Of Breaking Up (Is Getting Ba... 2000-09-16 3 92 \n",
"10 Kryptonite 2000-04-08 1 81 \n",
"11 Kryptonite 2000-04-15 2 70 \n",
"12 Kryptonite 2000-04-22 3 68 \n",
"13 Kryptonite 2000-04-29 4 67 \n",
"14 Kryptonite 2000-05-06 5 66 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"molten_billboard.head(15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save the Data\n",
"\n",
"The above \"tidy\" billboard dataset is saved as input for notebook No. 4."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"molten_billboard.to_csv('data/billboard_cleaned.csv', index=False)"
]
  },
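  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the saved file is read back in (e.g., in notebook No. 4), the *date* column must be parsed into datetimes again, since the round trip through CSV does not preserve dtypes. A minimal sketch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-read the cleaned dataset; parse_dates restores the datetime dtype\n",
    "# that is otherwise lost in the CSV round trip.\n",
    "pd.read_csv('data/billboard_cleaned.csv', parse_dates=['date']).dtypes"
   ]
  }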
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}