{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Descriptive Visualizations\n",
"\n",
"The purpose of this notebook is to visually examine the nominal features, discard the uninformative ones, and create new factor variables.\n",
"\n",
"The \"main\" plot used in this notebook is *Gr Liv Area* vs. *SalePrice*, as the overall living area is the predictor most strongly correlated with the sales price (which is also intuitive). Many of the nominal variables significantly change the slopes of the regression lines for sub-groups of data points."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## \"Housekeeping\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"from sklearn.ensemble import IsolationForest\n",
"\n",
"from utils import (\n",
" ALL_COLUMNS,\n",
" NOMINAL_VARIABLES,\n",
" TARGET_VARIABLES,\n",
" load_clean_data,\n",
" encode_ordinals,\n",
" print_column_list,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"pd.set_option(\"display.max_columns\", 120)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"sns.set_style(\"white\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load the Data\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"df = load_clean_data(\"data/data_clean_with_transformations.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2898, 86)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>1st Flr SF</th>\n",
" <th>1st Flr SF (box-cox-0)</th>\n",
" <th>2nd Flr SF</th>\n",
" <th>3Ssn Porch</th>\n",
" <th>Alley</th>\n",
" <th>Bedroom AbvGr</th>\n",
" <th>Bldg Type</th>\n",
" <th>Bsmt Cond</th>\n",
" <th>Bsmt Exposure</th>\n",
" <th>Bsmt Full Bath</th>\n",
" <th>Bsmt Half Bath</th>\n",
" <th>Bsmt Qual</th>\n",
" <th>Bsmt Unf SF</th>\n",
" <th>BsmtFin SF 1</th>\n",
" <th>BsmtFin SF 2</th>\n",
" <th>BsmtFin Type 1</th>\n",
" <th>BsmtFin Type 2</th>\n",
" <th>Central Air</th>\n",
" <th>Condition 1</th>\n",
" <th>Condition 2</th>\n",
" <th>Electrical</th>\n",
" <th>Enclosed Porch</th>\n",
" <th>Exter Cond</th>\n",
" <th>Exter Qual</th>\n",
" <th>Exterior 1st</th>\n",
" <th>Exterior 2nd</th>\n",
" <th>Fence</th>\n",
" <th>Fireplace Qu</th>\n",
" <th>Fireplaces</th>\n",
" <th>Foundation</th>\n",
" <th>Full Bath</th>\n",
" <th>Functional</th>\n",
" <th>Garage Area</th>\n",
" <th>Garage Cars</th>\n",
" <th>Garage Cond</th>\n",
" <th>Garage Finish</th>\n",
" <th>Garage Qual</th>\n",
" <th>Garage Type</th>\n",
" <th>Gr Liv Area</th>\n",
" <th>Gr Liv Area (box-cox-0)</th>\n",
" <th>Half Bath</th>\n",
" <th>Heating</th>\n",
" <th>Heating QC</th>\n",
" <th>House Style</th>\n",
" <th>Kitchen AbvGr</th>\n",
" <th>Kitchen Qual</th>\n",
" <th>Land Contour</th>\n",
" <th>Land Slope</th>\n",
" <th>Lot Area</th>\n",
" <th>Lot Area (box-cox-0.1)</th>\n",
" <th>Lot Config</th>\n",
" <th>Lot Shape</th>\n",
" <th>Low Qual Fin SF</th>\n",
" <th>MS SubClass</th>\n",
" <th>MS Zoning</th>\n",
" <th>Mas Vnr Area</th>\n",
" <th>Mas Vnr Type</th>\n",
" <th>Misc Feature</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Neighborhood</th>\n",
" <th>Open Porch SF</th>\n",
" <th>Overall Cond</th>\n",
" <th>Overall Qual</th>\n",
" <th>Paved Drive</th>\n",
" <th>Pool Area</th>\n",
" <th>Pool QC</th>\n",
" <th>Roof Matl</th>\n",
" <th>Roof Style</th>\n",
" <th>Sale Condition</th>\n",
" <th>Sale Type</th>\n",
" <th>Screen Porch</th>\n",
" <th>Street</th>\n",
" <th>TotRms AbvGrd</th>\n",
" <th>Total Bath</th>\n",
" <th>Total Bsmt SF</th>\n",
" <th>Total Porch SF</th>\n",
" <th>Total SF</th>\n",
" <th>Total SF (box-cox-0.2)</th>\n",
" <th>Utilities</th>\n",
" <th>Wood Deck SF</th>\n",
" <th>Year Built</th>\n",
" <th>Year Remod/Add</th>\n",
" <th>Yr Sold</th>\n",
" <th>SalePrice</th>\n",
" <th>SalePrice (box-cox-0)</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>1656.0</td>\n",
" <td>7.412160</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>3</td>\n",
" <td>1Fam</td>\n",
" <td>Gd</td>\n",
" <td>Gd</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>441.0</td>\n",
" <td>639.0</td>\n",
" <td>0.0</td>\n",
" <td>BLQ</td>\n",
" <td>Unf</td>\n",
" <td>Y</td>\n",
" <td>Norm</td>\n",
" <td>Norm</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>TA</td>\n",
" <td>TA</td>\n",
" <td>BrkFace</td>\n",
" <td>Plywood</td>\n",
" <td>NA</td>\n",
" <td>Gd</td>\n",
" <td>2</td>\n",
" <td>CBlock</td>\n",
" <td>1</td>\n",
" <td>Typ</td>\n",
" <td>528.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>Fin</td>\n",
" <td>TA</td>\n",
" <td>Attchd</td>\n",
" <td>1656.0</td>\n",
" <td>7.412160</td>\n",
" <td>0</td>\n",
" <td>GasA</td>\n",
" <td>Fa</td>\n",
" <td>1Story</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Lvl</td>\n",
" <td>Gtl</td>\n",
" <td>31770.0</td>\n",
" <td>18.196923</td>\n",
" <td>Corner</td>\n",
" <td>IR1</td>\n",
" <td>0.0</td>\n",
" <td>020</td>\n",
" <td>RL</td>\n",
" <td>112.0</td>\n",
" <td>Stone</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>5</td>\n",
" <td>Names</td>\n",
" <td>62.0</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>P</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>CompShg</td>\n",
" <td>Hip</td>\n",
" <td>Normal</td>\n",
" <td>WD</td>\n",
" <td>0.0</td>\n",
" <td>Pave</td>\n",
" <td>7</td>\n",
" <td>2.0</td>\n",
" <td>1080.0</td>\n",
" <td>272.0</td>\n",
" <td>2736.0</td>\n",
" <td>19.344072</td>\n",
" <td>AllPub</td>\n",
" <td>210.0</td>\n",
" <td>1960</td>\n",
" <td>1960</td>\n",
" <td>2010</td>\n",
" <td>215000.0</td>\n",
" <td>12.278393</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>896.0</td>\n",
" <td>6.797940</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>2</td>\n",
" <td>1Fam</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>270.0</td>\n",
" <td>468.0</td>\n",
" <td>144.0</td>\n",
" <td>Rec</td>\n",
" <td>LwQ</td>\n",
" <td>Y</td>\n",
" <td>Feedr</td>\n",
" <td>Norm</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>TA</td>\n",
" <td>TA</td>\n",
" <td>VinylSd</td>\n",
" <td>VinylSd</td>\n",
" <td>MnPrv</td>\n",
" <td>NA</td>\n",
" <td>0</td>\n",
" <td>CBlock</td>\n",
" <td>1</td>\n",
" <td>Typ</td>\n",
" <td>730.0</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Unf</td>\n",
" <td>TA</td>\n",
" <td>Attchd</td>\n",
" <td>896.0</td>\n",
" <td>6.797940</td>\n",
" <td>0</td>\n",
" <td>GasA</td>\n",
" <td>TA</td>\n",
" <td>1Story</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Lvl</td>\n",
" <td>Gtl</td>\n",
" <td>11622.0</td>\n",
" <td>15.499290</td>\n",
" <td>Inside</td>\n",
" <td>Reg</td>\n",
" <td>0.0</td>\n",
" <td>020</td>\n",
" <td>RH</td>\n",
" <td>0.0</td>\n",
" <td>None</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>6</td>\n",
" <td>Names</td>\n",
" <td>0.0</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>CompShg</td>\n",
" <td>Gable</td>\n",
" <td>Normal</td>\n",
" <td>WD</td>\n",
" <td>120.0</td>\n",
" <td>Pave</td>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" <td>882.0</td>\n",
" <td>260.0</td>\n",
" <td>1778.0</td>\n",
" <td>17.333478</td>\n",
" <td>AllPub</td>\n",
" <td>140.0</td>\n",
" <td>1961</td>\n",
" <td>1961</td>\n",
" <td>2010</td>\n",
" <td>105000.0</td>\n",
" <td>11.561716</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>1329.0</td>\n",
" <td>7.192182</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>3</td>\n",
" <td>1Fam</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>406.0</td>\n",
" <td>923.0</td>\n",
" <td>0.0</td>\n",
" <td>ALQ</td>\n",
" <td>Unf</td>\n",
" <td>Y</td>\n",
" <td>Norm</td>\n",
" <td>Norm</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>TA</td>\n",
" <td>TA</td>\n",
" <td>Wd Sdng</td>\n",
" <td>Wd Sdng</td>\n",
" <td>NA</td>\n",
" <td>NA</td>\n",
" <td>0</td>\n",
" <td>CBlock</td>\n",
" <td>1</td>\n",
" <td>Typ</td>\n",
" <td>312.0</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Unf</td>\n",
" <td>TA</td>\n",
" <td>Attchd</td>\n",
" <td>1329.0</td>\n",
" <td>7.192182</td>\n",
" <td>1</td>\n",
" <td>GasA</td>\n",
" <td>TA</td>\n",
" <td>1Story</td>\n",
" <td>1</td>\n",
" <td>Gd</td>\n",
" <td>Lvl</td>\n",
" <td>Gtl</td>\n",
" <td>14267.0</td>\n",
" <td>16.027549</td>\n",
" <td>Corner</td>\n",
" <td>IR1</td>\n",
" <td>0.0</td>\n",
" <td>020</td>\n",
" <td>RL</td>\n",
" <td>108.0</td>\n",
" <td>BrkFace</td>\n",
" <td>Gar2</td>\n",
" <td>12500.0</td>\n",
" <td>6</td>\n",
" <td>Names</td>\n",
" <td>36.0</td>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>CompShg</td>\n",
" <td>Hip</td>\n",
" <td>Normal</td>\n",
" <td>WD</td>\n",
" <td>0.0</td>\n",
" <td>Pave</td>\n",
" <td>6</td>\n",
" <td>1.5</td>\n",
" <td>1329.0</td>\n",
" <td>429.0</td>\n",
" <td>2658.0</td>\n",
" <td>19.203658</td>\n",
" <td>AllPub</td>\n",
" <td>393.0</td>\n",
" <td>1958</td>\n",
" <td>1958</td>\n",
" <td>2010</td>\n",
" <td>172000.0</td>\n",
" <td>12.055250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>2110.0</td>\n",
" <td>7.654443</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>3</td>\n",
" <td>1Fam</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>1045.0</td>\n",
" <td>1065.0</td>\n",
" <td>0.0</td>\n",
" <td>ALQ</td>\n",
" <td>Unf</td>\n",
" <td>Y</td>\n",
" <td>Norm</td>\n",
" <td>Norm</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>TA</td>\n",
" <td>Gd</td>\n",
" <td>BrkFace</td>\n",
" <td>BrkFace</td>\n",
" <td>NA</td>\n",
" <td>TA</td>\n",
" <td>2</td>\n",
" <td>CBlock</td>\n",
" <td>2</td>\n",
" <td>Typ</td>\n",
" <td>522.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>Fin</td>\n",
" <td>TA</td>\n",
" <td>Attchd</td>\n",
" <td>2110.0</td>\n",
" <td>7.654443</td>\n",
" <td>1</td>\n",
" <td>GasA</td>\n",
" <td>Ex</td>\n",
" <td>1Story</td>\n",
" <td>1</td>\n",
" <td>Ex</td>\n",
" <td>Lvl</td>\n",
" <td>Gtl</td>\n",
" <td>11160.0</td>\n",
" <td>15.396064</td>\n",
" <td>Corner</td>\n",
" <td>Reg</td>\n",
" <td>0.0</td>\n",
" <td>020</td>\n",
" <td>RL</td>\n",
" <td>0.0</td>\n",
" <td>None</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>4</td>\n",
" <td>Names</td>\n",
" <td>0.0</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>CompShg</td>\n",
" <td>Hip</td>\n",
" <td>Normal</td>\n",
" <td>WD</td>\n",
" <td>0.0</td>\n",
" <td>Pave</td>\n",
" <td>8</td>\n",
" <td>3.5</td>\n",
" <td>2110.0</td>\n",
" <td>0.0</td>\n",
" <td>4220.0</td>\n",
" <td>21.548042</td>\n",
" <td>AllPub</td>\n",
" <td>0.0</td>\n",
" <td>1968</td>\n",
" <td>1968</td>\n",
" <td>2010</td>\n",
" <td>244000.0</td>\n",
" <td>12.404924</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>928.0</td>\n",
" <td>6.833032</td>\n",
" <td>701.0</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>3</td>\n",
" <td>1Fam</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Gd</td>\n",
" <td>137.0</td>\n",
" <td>791.0</td>\n",
" <td>0.0</td>\n",
" <td>GLQ</td>\n",
" <td>Unf</td>\n",
" <td>Y</td>\n",
" <td>Norm</td>\n",
" <td>Norm</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>TA</td>\n",
" <td>TA</td>\n",
" <td>VinylSd</td>\n",
" <td>VinylSd</td>\n",
" <td>MnPrv</td>\n",
" <td>TA</td>\n",
" <td>1</td>\n",
" <td>PConc</td>\n",
" <td>2</td>\n",
" <td>Typ</td>\n",
" <td>482.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>Fin</td>\n",
" <td>TA</td>\n",
" <td>Attchd</td>\n",
" <td>1629.0</td>\n",
" <td>7.395722</td>\n",
" <td>1</td>\n",
" <td>GasA</td>\n",
" <td>Gd</td>\n",
" <td>2Story</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Lvl</td>\n",
" <td>Gtl</td>\n",
" <td>13830.0</td>\n",
" <td>15.946705</td>\n",
" <td>Inside</td>\n",
" <td>IR1</td>\n",
" <td>0.0</td>\n",
" <td>060</td>\n",
" <td>RL</td>\n",
" <td>0.0</td>\n",
" <td>None</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>Gilbert</td>\n",
" <td>34.0</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>CompShg</td>\n",
" <td>Gable</td>\n",
" <td>Normal</td>\n",
" <td>WD</td>\n",
" <td>0.0</td>\n",
" <td>Pave</td>\n",
" <td>6</td>\n",
" <td>2.5</td>\n",
" <td>928.0</td>\n",
" <td>246.0</td>\n",
" <td>2557.0</td>\n",
" <td>19.016856</td>\n",
" <td>AllPub</td>\n",
" <td>212.0</td>\n",
" <td>1997</td>\n",
" <td>1998</td>\n",
" <td>2010</td>\n",
" <td>189900.0</td>\n",
" <td>12.154253</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1st Flr SF 1st Flr SF (box-cox-0) 2nd Flr SF 3Ssn Porch \\\n",
"Order PID \n",
"1 526301100 1656.0 7.412160 0.0 0.0 \n",
"2 526350040 896.0 6.797940 0.0 0.0 \n",
"3 526351010 1329.0 7.192182 0.0 0.0 \n",
"4 526353030 2110.0 7.654443 0.0 0.0 \n",
"5 527105010 928.0 6.833032 701.0 0.0 \n",
"\n",
" Alley Bedroom AbvGr Bldg Type Bsmt Cond Bsmt Exposure \\\n",
"Order PID \n",
"1 526301100 NA 3 1Fam Gd Gd \n",
"2 526350040 NA 2 1Fam TA No \n",
"3 526351010 NA 3 1Fam TA No \n",
"4 526353030 NA 3 1Fam TA No \n",
"5 527105010 NA 3 1Fam TA No \n",
"\n",
" Bsmt Full Bath Bsmt Half Bath Bsmt Qual Bsmt Unf SF \\\n",
"Order PID \n",
"1 526301100 1 0 TA 441.0 \n",
"2 526350040 0 0 TA 270.0 \n",
"3 526351010 0 0 TA 406.0 \n",
"4 526353030 1 0 TA 1045.0 \n",
"5 527105010 0 0 Gd 137.0 \n",
"\n",
" BsmtFin SF 1 BsmtFin SF 2 BsmtFin Type 1 BsmtFin Type 2 \\\n",
"Order PID \n",
"1 526301100 639.0 0.0 BLQ Unf \n",
"2 526350040 468.0 144.0 Rec LwQ \n",
"3 526351010 923.0 0.0 ALQ Unf \n",
"4 526353030 1065.0 0.0 ALQ Unf \n",
"5 527105010 791.0 0.0 GLQ Unf \n",
"\n",
" Central Air Condition 1 Condition 2 Electrical \\\n",
"Order PID \n",
"1 526301100 Y Norm Norm SBrkr \n",
"2 526350040 Y Feedr Norm SBrkr \n",
"3 526351010 Y Norm Norm SBrkr \n",
"4 526353030 Y Norm Norm SBrkr \n",
"5 527105010 Y Norm Norm SBrkr \n",
"\n",
" Enclosed Porch Exter Cond Exter Qual Exterior 1st \\\n",
"Order PID \n",
"1 526301100 0.0 TA TA BrkFace \n",
"2 526350040 0.0 TA TA VinylSd \n",
"3 526351010 0.0 TA TA Wd Sdng \n",
"4 526353030 0.0 TA Gd BrkFace \n",
"5 527105010 0.0 TA TA VinylSd \n",
"\n",
" Exterior 2nd Fence Fireplace Qu Fireplaces Foundation \\\n",
"Order PID \n",
"1 526301100 Plywood NA Gd 2 CBlock \n",
"2 526350040 VinylSd MnPrv NA 0 CBlock \n",
"3 526351010 Wd Sdng NA NA 0 CBlock \n",
"4 526353030 BrkFace NA TA 2 CBlock \n",
"5 527105010 VinylSd MnPrv TA 1 PConc \n",
"\n",
" Full Bath Functional Garage Area Garage Cars Garage Cond \\\n",
"Order PID \n",
"1 526301100 1 Typ 528.0 2 TA \n",
"2 526350040 1 Typ 730.0 1 TA \n",
"3 526351010 1 Typ 312.0 1 TA \n",
"4 526353030 2 Typ 522.0 2 TA \n",
"5 527105010 2 Typ 482.0 2 TA \n",
"\n",
" Garage Finish Garage Qual Garage Type Gr Liv Area \\\n",
"Order PID \n",
"1 526301100 Fin TA Attchd 1656.0 \n",
"2 526350040 Unf TA Attchd 896.0 \n",
"3 526351010 Unf TA Attchd 1329.0 \n",
"4 526353030 Fin TA Attchd 2110.0 \n",
"5 527105010 Fin TA Attchd 1629.0 \n",
"\n",
" Gr Liv Area (box-cox-0) Half Bath Heating Heating QC \\\n",
"Order PID \n",
"1 526301100 7.412160 0 GasA Fa \n",
"2 526350040 6.797940 0 GasA TA \n",
"3 526351010 7.192182 1 GasA TA \n",
"4 526353030 7.654443 1 GasA Ex \n",
"5 527105010 7.395722 1 GasA Gd \n",
"\n",
" House Style Kitchen AbvGr Kitchen Qual Land Contour \\\n",
"Order PID \n",
"1 526301100 1Story 1 TA Lvl \n",
"2 526350040 1Story 1 TA Lvl \n",
"3 526351010 1Story 1 Gd Lvl \n",
"4 526353030 1Story 1 Ex Lvl \n",
"5 527105010 2Story 1 TA Lvl \n",
"\n",
" Land Slope Lot Area Lot Area (box-cox-0.1) Lot Config \\\n",
"Order PID \n",
"1 526301100 Gtl 31770.0 18.196923 Corner \n",
"2 526350040 Gtl 11622.0 15.499290 Inside \n",
"3 526351010 Gtl 14267.0 16.027549 Corner \n",
"4 526353030 Gtl 11160.0 15.396064 Corner \n",
"5 527105010 Gtl 13830.0 15.946705 Inside \n",
"\n",
" Lot Shape Low Qual Fin SF MS SubClass MS Zoning \\\n",
"Order PID \n",
"1 526301100 IR1 0.0 020 RL \n",
"2 526350040 Reg 0.0 020 RH \n",
"3 526351010 IR1 0.0 020 RL \n",
"4 526353030 Reg 0.0 020 RL \n",
"5 527105010 IR1 0.0 060 RL \n",
"\n",
" Mas Vnr Area Mas Vnr Type Misc Feature Misc Val Mo Sold \\\n",
"Order PID \n",
"1 526301100 112.0 Stone NA 0.0 5 \n",
"2 526350040 0.0 None NA 0.0 6 \n",
"3 526351010 108.0 BrkFace Gar2 12500.0 6 \n",
"4 526353030 0.0 None NA 0.0 4 \n",
"5 527105010 0.0 None NA 0.0 3 \n",
"\n",
" Neighborhood Open Porch SF Overall Cond Overall Qual \\\n",
"Order PID \n",
"1 526301100 Names 62.0 5 6 \n",
"2 526350040 Names 0.0 6 5 \n",
"3 526351010 Names 36.0 6 6 \n",
"4 526353030 Names 0.0 5 7 \n",
"5 527105010 Gilbert 34.0 5 5 \n",
"\n",
" Paved Drive Pool Area Pool QC Roof Matl Roof Style \\\n",
"Order PID \n",
"1 526301100 P 0.0 NA CompShg Hip \n",
"2 526350040 Y 0.0 NA CompShg Gable \n",
"3 526351010 Y 0.0 NA CompShg Hip \n",
"4 526353030 Y 0.0 NA CompShg Hip \n",
"5 527105010 Y 0.0 NA CompShg Gable \n",
"\n",
" Sale Condition Sale Type Screen Porch Street TotRms AbvGrd \\\n",
"Order PID \n",
"1 526301100 Normal WD 0.0 Pave 7 \n",
"2 526350040 Normal WD 120.0 Pave 5 \n",
"3 526351010 Normal WD 0.0 Pave 6 \n",
"4 526353030 Normal WD 0.0 Pave 8 \n",
"5 527105010 Normal WD 0.0 Pave 6 \n",
"\n",
" Total Bath Total Bsmt SF Total Porch SF Total SF \\\n",
"Order PID \n",
"1 526301100 2.0 1080.0 272.0 2736.0 \n",
"2 526350040 1.0 882.0 260.0 1778.0 \n",
"3 526351010 1.5 1329.0 429.0 2658.0 \n",
"4 526353030 3.5 2110.0 0.0 4220.0 \n",
"5 527105010 2.5 928.0 246.0 2557.0 \n",
"\n",
" Total SF (box-cox-0.2) Utilities Wood Deck SF Year Built \\\n",
"Order PID \n",
"1 526301100 19.344072 AllPub 210.0 1960 \n",
"2 526350040 17.333478 AllPub 140.0 1961 \n",
"3 526351010 19.203658 AllPub 393.0 1958 \n",
"4 526353030 21.548042 AllPub 0.0 1968 \n",
"5 527105010 19.016856 AllPub 212.0 1997 \n",
"\n",
" Year Remod/Add Yr Sold SalePrice SalePrice (box-cox-0) \n",
"Order PID \n",
"1 526301100 1960 2010 215000.0 12.278393 \n",
"2 526350040 1961 2010 105000.0 11.561716 \n",
"3 526351010 1958 2010 172000.0 12.055250 \n",
"4 526353030 1968 2010 244000.0 12.404924 \n",
"5 527105010 1998 2010 189900.0 12.154253 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Newly created variables are collected in the *new_variables* list."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"new_variables = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Derived Characteristics\n",
"\n",
"Certain characteristics of a house are assumed to have a \"binary\" influence on the sales price. For example, the existence of a pool could be an important predictor, while the exact size of the pool is arguably less important.\n",
"\n",
"The cell below creates boolean factor variables out of a set of numeric variables."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"derived_variables = {\n",
" \"has 2nd Flr\": \"2nd Flr SF\",\n",
" \"has Bsmt\": \"Total Bsmt SF\",\n",
" \"has Fireplace\": \"Fireplaces\",\n",
" \"has Garage\": \"Garage Area\",\n",
" \"has Pool\": \"Pool Area\",\n",
" \"has Porch\": \"Total Porch SF\",\n",
"}\n",
"# Factorize numeric columns.\n",
"for factor_column, column in derived_variables.items():\n",
" df[factor_column] = df[column].apply(lambda x: 1 if x > 0 else 0)\n",
"derived_variables = list(derived_variables.keys())"
]
},
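{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above calls a Python lambda once per row. As a hedged sketch, the same 0/1 factors can be obtained with a vectorized comparison. The frame *toy* and its values are made up for illustration and are not part of the data set; only the column names are borrowed from it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame (illustrative values, not the real data).\n",
"toy = pd.DataFrame({\"Pool Area\": [0.0, 512.0], \"Fireplaces\": [2, 0]})\n",
"toy_factors = {\"has Pool\": \"Pool Area\", \"has Fireplace\": \"Fireplaces\"}\n",
"for factor_column, column in toy_factors.items():\n",
"    # Compare the whole column at once; True/False are cast to 1/0.\n",
"    toy[factor_column] = (toy[column] > 0).astype(int)\n",
"toy"
]
},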
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"new_variables.extend(derived_variables)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>has 2nd Flr</th>\n",
" <th>has Bsmt</th>\n",
" <th>has Fireplace</th>\n",
" <th>has Garage</th>\n",
" <th>has Pool</th>\n",
" <th>has Porch</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" has 2nd Flr has Bsmt has Fireplace has Garage has Pool \\\n",
"Order PID \n",
"1 526301100 0 1 1 1 0 \n",
"2 526350040 0 1 0 1 0 \n",
"3 526351010 0 1 0 1 0 \n",
"4 526353030 0 1 1 1 0 \n",
"5 527105010 1 1 1 1 0 \n",
"\n",
" has Porch \n",
"Order PID \n",
"1 526301100 1 \n",
"2 526350040 1 \n",
"3 526351010 1 \n",
"4 526353030 0 \n",
"5 527105010 1 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[derived_variables].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2nd Floors\n",
"\n",
"A second floor may have a positive effect on the sales price. However, having a second floor correlates with overall living space. The individual effect is therefore not as clear as it seems in the plot below."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAEGCAYAAACpXNjrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeVyVVf7A8c9duHDZF+GigribuWHljksYmhrjnk5Ni5PZlKVmy+S0WZpTzUxZP9vMthmtNBMtsbTQRHMtFxIzVxQULvtyudz9/P44Ci6IqCBa5/168ZL73Od5zgHh+XK279EIIQSKoiiKUoe0DV0BRVEU5fdHBRdFURSlzqngoiiKotQ5FVwURVGUOqeCi6IoilLn9A1dgatFjx49aNq0aUNXQ1EU5Zpy/Phxtm7des5xFVxOatq0KcuWLWvoaiiKolxTRo0aVe1x1S2mKIqi1DkVXBRFUZQ6p4KLoiiKUufUmEsNnE4nWVlZ2Gy2hq7K756Pjw9RUVF4eXk1dFUURakDKrjUICsri4CAAJo3b45Go2no6vxuCSEoKCggKyuLFi1aNHR1FEWpA6pbrAY2m42wsDAVWOqZRqMhLCxMtRAV5XdEtVwuQAWWK0N9n5XLJgRYzFCeDwEm8Atv6Br9oangoijK74PFDPP7Q1kOXDcMhr8FxpCGrtUfluoWu4plZWVx22231ek9f/31V8aNG8ewYcNITExk1apVF32P+Ph4CgsLqz2emJjI8OHDGT58ODt27KiXr0FRquV2yMACkJ0GLvv5z1XbWNU71XL5g/Hx8eGVV16hefPmmM1mRo8eTVxcHIGBgXVy/08++YTQ0NDK11lZWdWe53K50OvVj59Sh7wD4dZXYO9yuOUF8A079xy3C/L3wZZ3oetd0LgLePlc+br+Aajf7quc2+3mmWeeYefOnZhMJt5++218fHxYsmQJixcvxul0EhMTw6uvvorRaOSbb77hrbfeQqvVEhAQwKJFi8643+mzsUwmE6GhoRQWFhIYGEh8fDwjRoxg3bp1uFwu5s6dS6tWrSgqKuKxxx7DbDYTGxvLpW5eumzZMtasWYPVasXj8bBw4cLL+t4oyhmMwXDTBOg8DnwCQas79xxrAXx8G1QUQdpimJoGXo2vfF3/AFS32FXu6NGj3HnnnSQnJxMQEMDq1asBSEhI4Msvv+Srr76iZcuWLF26FIC3336bDz74gK+++op33nmnxnunpaXhdDpp1qxZ5bGQkBCSkpIYP348H374IQBvvfUWN9xwA8nJySQkJHDixInz3vOee+5h+PDhjB07ttr39+7dy5tvvqkCi1I/9N7gG1J9YAHQaKrGYbwDQKMegfVFtVyuclFRUbRv3x6ADh06cPz4cQAOHDjA3LlzKSsro7y8nLi4OAC6du3KU089xZAhQ0hISDjvfXNzc3niiSd45ZVX0GqrfsEGDRoEQMeOHfnuu+8A2L59O/PmzQNgwIABBAUFnfe+Z3eLna1Pnz4EBwfX5ktXlLrnHwH3roSjmyCqu5pRVo9U2L7KGQyGys91Oh1utxuAp556iueee46vv/6ahx9+GIfDAcCLL77ItGnTyM7OZvTo0RQVFZ1zT4vFwgMPPMCjjz5KbGzsGe+dWiGv1Wory6pLRqOxzu+pKBclsCl0GgshMaBVj8D6or6z16jy8nLCw8NxOp18/fXXlcePHTtGly5dmDp1KiEhIeTk5JxxncPhYPLkyQwfPpxbb721VmV169atsoz169dTUlJSd1+Ioii/S6pb7Bo1depUxo4dS2hoKF26dKG8vByAV199laNHjyKEoGfPnlx33XVnXPfNN9/w008/UVxcTFJSEgAvv/xyZddbdSZPnsxjjz3GsGHD6Nq1K02aNKm/L0xRlN8FjbjUqT+/M6NGjTpns7Bff/21xoeuUrfU91tRrj3VPTtBdYspiqIo9UAFF0VRFKXOqeCiKI
qi1Ll6Cy6HDx+uzDE1fPhwbrjhBj7++GOKi4uZMGECgwYNYsKECZUzj4QQzJ49m4SEBBITE0lPT6+8V1JSEoMGDWLQoEGVg9AAe/bsITExkYSEBGbPnl25cvx8ZSiKoihXRr0Fl5YtW7JixQpWrFjBsmXLMBqNJCQkMH/+fHr16sWaNWvo1asX8+fPByA1NZWMjAzWrFnDrFmzmDlzJiADxbx581iyZAlffPEF8+bNqwwWM2fOZNasWaxZs4aMjAxSU1MBzluGoiiKcmVckW6xzZs3Ex0dTdOmTUlJSWHEiBEAjBgxgu+//x6g8rhGoyE2NpbS0lJyc3PZuHFj5aruoKAg+vTpw4YNG8jNzcVisRAbG4tGo2HEiBGkpKScca+zy1AURVGujCsSXJKTkyvTrhcUFBAREQFAeHg4BQUFAJjNZiIjIyuviYyMxGw2n3PcZDJVe/zU+TWVcS1KTU1l8ODBla0+RVGUa0G9BxeHw8HatWurXQ2u0WjqfQfCK1FGfXG73bz44ossWLCA5ORkVq5cycGDBxu6WoqiKBdU78ElNTWVDh060KhRIwDCwsLIzc0FZPLEU0kOTSbTGalKcnJyMJlM5xw3m83VHj91fk1l1LflO4/T5+W1tHgqmT4vr2X5zuOXdb+0tDRiYmKIjo7GYDAwbNiwyq4/RVGUq1m9B5fk5GSGDRtW+To+Pp7ly5cDsHz5cgYOHHjGcSEEu3btIiAggIiICOLi4ti4cSMlJSWUlJSwceNG4uLiiIiIwN/fn127diGEqPZeZ5dRn5bvPM6MZb9wvLgCARwvrmDGsl8uK8Ccr0tQURTlalevucWsViubNm3ixRdfrDw2adIkpk2bxtKlS2nSpAlz584FoH///qxfv56EhASMRiNz5swBIDg4mIceeogxY8YAMs/VqZTtzz//PDNmzMBms9GvXz/69etXYxn16V+rf6PCeWYW4Qqnm3+t/o0RXZvWe/mKoihXk3oNLr6+vmzduvWMYyEhIXzyySfnnKvRaHj++eervc+YMWMqg8vpOnXqxMqVK885fr4y6tOJ4oqLOl4b5+sSVBRFudqpFfp1pElw9fuUnO94bXTq1ImMjAwyMzNxOBwkJycTHx9/yfdTFEW5UlRwqSNPDG6H0evMrVWNXjqeGNzuku+p1+t57rnnmDhxIkOHDmXIkCG0adPmcquqKIpS79R+LnXk1LjKv1b/xoniCpoEG3licLvLHm/p378//fv3r4sqKoqiXDEquNShEV2bqsF7RVEUVLeYoiiKUg9UcFEURVHqnAouiqIoSp1TwUVRFEWpcyq4KIqiKHVOBZer3IwZM+jVq1fllgWKoijXAhVcrnKjRo1iwYIFDV0NRVGUi6KCS11KWwKvd4SZwfLftCWXfctu3boRFBRUB5VTlDridoHH09C1UK5yahFlXUlbAl9PAefJRJUlmfI1QOfbG65eilKXynJg7WwIjoGb/gp+YQ1dI+UqpVoudSXlxarAcoqzQh5XlN8Dt0sGlp3/g3WzwbynoWukXMVUy6WulGRd3HHlmlJS4cDu8mDQaQn2NTR0dRqGRitbLKf4hTdcXZSrngoudSUoSnaFVXdcuWbkW+zYXR58vXSE+MkgUlLh4K21h1iw8TDjuzXjyVvbXTDAeDwCtxB46X5HnQNaLXT7K0R3k4ElKLqha6RcxX5HP/kNbOBz4HXW3i1eRnn8MkyfPp3x48dz5MgR+vXrxxdffHFZ91POr8BiZ8pnO+nz8lpeWvUrxVYHAHaXh/c3HsYj4NNtx7A7ax7MLrY6+GRzBk98kcbxy9gs7qrkGwYtB4CpA/gENHRtlKtYvQaX0tJSpkyZwq233sqQIUPYuXMnxcXFTJgwgUGDBjFhwgRKSkoAEEIwe/ZsEhISSExMJD09vfI+SUlJDBo0iEGDBpGUlFR5fM+ePSQmJpKQkMDs2bMRQgCct4x61fl2SHzz5F9zGvlv4puXPZj/2muvsXHjRtLT00
lNTWXs2LF1U1/lHA6Xh02HCgD4evcJHG4ZRAw6LbffJFugt3VqjEFf869NdomNF77ey/Jdx3l2+R4sNmf9VlxRrkL
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"has 2nd Flr\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basements\n",
"\n",
"Nearly all houses in Ames, IA, have a basement. Therefore, *has Bsmt* is most likely not an important predictor."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"has Bsmt\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fireplaces\n",
"\n",
"Bigger houses are more likely to have a fireplace. Thus, the variable *has Fireplace* might be an interesting predictor."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"has Fireplace\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Garages\n",
"\n",
"Holding the overall living area fixed, adding a garage seems to affect the price positively. Thus, *has Garage* seems like an interesting predictor as well."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"has Garage\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pools\n",
"\n",
"Unfortunately, almost no one in Ames, IA, has a pool. The predictor *has Pool* seems quite uninteresting."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"has Pool\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Porch\n",
"\n",
"Most houses have a porch."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"has Porch\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Neighborhoods\n",
"\n",
"The instructors' notes say:\n",
"\n",
"> For instructors who cover nominal variables in their class, I would suggest incorporating the neighborhood variable into their models by converting it to a set of dummy (indicator) variables. I have found that the coefficients for the continuous variables tend to have values with more realistic interpretations when used in conjunction with the neighborhood variable.\n",
"\n",
"Indeed, plotting the price distributions by neighborhood reveals significant differences in the price level."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 720x576 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"_, ax = plt.subplots(figsize=(10, 8))\n",
"sns.boxplot(x=\"Neighborhood\", y=\"SalePrice\", data=df, ax=ax)\n",
"ax.set_title(\"Prices by Neighborhood\", fontsize=24)\n",
"ax.set_xlabel(\"Neighborhood\", fontsize=18)\n",
"ax.set_xticklabels(ax.get_xticklabels(), rotation=45)\n",
"ax.set_ylabel(\"House Price\", fontsize=18);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 28 neighborhoods are encoded as factor variables."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"neighborhood = pd.get_dummies(df[\"Neighborhood\"], prefix=\"nhood\")\n",
"df = pd.concat([df, neighborhood], axis=1)\n",
"del df[\"Neighborhood\"]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"new_variables.extend(neighborhood.columns)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2898, 28)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[neighborhood.columns].shape"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>nhood_Blmngtn</th>\n",
" <th>nhood_Blueste</th>\n",
" <th>nhood_BrDale</th>\n",
" <th>nhood_BrkSide</th>\n",
" <th>nhood_ClearCr</th>\n",
" <th>nhood_CollgCr</th>\n",
" <th>nhood_Crawfor</th>\n",
" <th>nhood_Edwards</th>\n",
" <th>nhood_Gilbert</th>\n",
" <th>nhood_Greens</th>\n",
" <th>nhood_GrnHill</th>\n",
" <th>nhood_IDOTRR</th>\n",
" <th>nhood_Landmrk</th>\n",
" <th>nhood_MeadowV</th>\n",
" <th>nhood_Mitchel</th>\n",
" <th>nhood_Names</th>\n",
" <th>nhood_NoRidge</th>\n",
" <th>nhood_NPkVill</th>\n",
" <th>nhood_NridgHt</th>\n",
" <th>nhood_NWAmes</th>\n",
" <th>nhood_OldTown</th>\n",
" <th>nhood_SWISU</th>\n",
" <th>nhood_Sawyer</th>\n",
" <th>nhood_SawyerW</th>\n",
" <th>nhood_Somerst</th>\n",
" <th>nhood_StoneBr</th>\n",
" <th>nhood_Timber</th>\n",
" <th>nhood_Veenker</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" nhood_Blmngtn nhood_Blueste nhood_BrDale nhood_BrkSide \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_ClearCr nhood_CollgCr nhood_Crawfor nhood_Edwards \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_Gilbert nhood_Greens nhood_GrnHill nhood_IDOTRR \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 1 0 0 0 \n",
"\n",
" nhood_Landmrk nhood_MeadowV nhood_Mitchel nhood_Names \\\n",
"Order PID \n",
"1 526301100 0 0 0 1 \n",
"2 526350040 0 0 0 1 \n",
"3 526351010 0 0 0 1 \n",
"4 526353030 0 0 0 1 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_NoRidge nhood_NPkVill nhood_NridgHt nhood_NWAmes \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_OldTown nhood_SWISU nhood_Sawyer nhood_SawyerW \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_Somerst nhood_StoneBr nhood_Timber nhood_Veenker \n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[neighborhood.columns].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nominal Features\n",
"\n",
"This section investigates the remaining nominal variables to determine which realizations or encodings might be useful predictors."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Alley Type of alley access to property\n",
"Bldg Type Type of dwelling\n",
"Central Air Central air conditioning\n",
"Condition 1 Proximity to various conditions\n",
"Condition 2 Proximity to various conditions (if more than one is present)\n",
"Exterior 1st Exterior covering on house\n",
"Exterior 2nd Exterior covering on house (if more than one material)\n",
"Foundation Type of foundation\n",
"Garage Type Garage location\n",
"Heating Type of heating\n",
"House Style Style of dwelling\n",
"Land Contour Flatness of the property\n",
"Lot Config Lot configuration\n",
"MS SubClass Identifies the type of dwelling involved in the sale.\n",
"MS Zoning Identifies the general zoning classification of the sale.\n",
"Mas Vnr Type Masonry veneer type\n",
"Misc Feature Miscellaneous feature not covered in other categories\n",
"Roof Matl Roof material\n",
"Roof Style Type of roof\n",
"Sale Condition Condition of sale\n",
"Sale Type Type of sale\n",
"Street Type of road access to property\n"
]
}
],
"source": [
"print_column_list(set(NOMINAL_VARIABLES) - {\"Neighborhood\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Alleys\n",
"\n",
"Almost no house has access to an alley, so the column carries little information and is dropped."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Alley\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Alley\"]"
]
},
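{
"cell_type": "markdown",
"metadata": {},
"source": [
"The share of houses with alley access can be quantified with relative frequencies. The following is a minimal sketch on a hypothetical toy column; the labels *Grvl*, *Pave*, and *NA* are assumed stand-ins and are not read from the cleaned data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data; \"NA\" is assumed to mark houses without alley access.\n",
"toy = pd.DataFrame({\"Alley\": [\"NA\", \"NA\", \"Grvl\", \"NA\", \"Pave\", \"NA\", \"NA\", \"NA\"]})\n",
"\n",
"# Relative frequency of each realization, most common first.\n",
"alley_shares = toy[\"Alley\"].value_counts(normalize=True)\n",
"print(alley_shares)\n",
"\n",
"# Share of houses that actually have an alley.\n",
"share_with_alley = 1 - alley_shares[\"NA\"]"
]
},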
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building Type\n",
"\n",
"The type of a building clearly affects the valuation. The two types of townhouses as well as the 2-family condo and duplex type are summarized into a single category. This makes sense a) semantically, and b) by looking at the two sub-clusters in the scatter plot."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Bldg Type\", s=15, data=df);"
]
},
{
"cell_type": "code",
2020-06-29 01:10:19 +02:00
"execution_count": 26,
2018-09-05 00:48:12 +02:00
"metadata": {},
"outputs": [],
"source": [
"# Unify the two townhouse types into one.\n",
"df[\"Bldg Type\"] = df[\"Bldg Type\"].apply(\n",
" lambda x: \"Twnhs\" if x in (\"TwnhsE\", \"TwnhsI\") else x\n",
")\n",
"# Unify the two kinds of 2-family homes.\n",
"df[\"Bldg Type\"] = df[\"Bldg Type\"].apply(\n",
" lambda x: \"2Fam\" if x in (\"2FmCon\", \"Duplx\") else x\n",
")"
]
},
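{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two `apply` calls above can also be written as a single label lookup. The following is a minimal sketch on a hypothetical toy Series that reuses the raw labels from the cell above; it is an alternative formulation, not the approach this notebook uses."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy column using the raw labels referenced above.\n",
"bldg = pd.Series([\"1Fam\", \"TwnhsE\", \"TwnhsI\", \"2FmCon\", \"Duplx\", \"1Fam\"])\n",
"\n",
"# Map each raw label to its unified category; unlisted labels are kept as-is.\n",
"unify = {\"TwnhsE\": \"Twnhs\", \"TwnhsI\": \"Twnhs\", \"2FmCon\": \"2Fam\", \"Duplx\": \"2Fam\"}\n",
"bldg_unified = bldg.replace(unify)\n",
"\n",
"# One-hot encode the three remaining categories.\n",
"dummies = pd.get_dummies(bldg_unified, prefix=\"build_type\")\n",
"print(sorted(dummies.columns))"
]
},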
{
"cell_type": "code",
2020-06-29 01:10:19 +02:00
"execution_count": 27,
2018-09-05 00:48:12 +02:00
"metadata": {},
"outputs": [],
"source": [
"build_type = pd.get_dummies(df[\"Bldg Type\"], prefix=\"build_type\")\n",
"df = pd.concat([df, build_type], axis=1)\n",
"del df[\"Bldg Type\"]"
]
},
{
"cell_type": "code",
2020-06-29 01:10:19 +02:00
"execution_count": 28,
2018-09-05 00:48:12 +02:00
"metadata": {},
"outputs": [],
"source": [
"new_variables.extend(build_type.columns)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>build_type_1Fam</th>\n",
" <th>build_type_2Fam</th>\n",
" <th>build_type_Twnhs</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" build_type_1Fam build_type_2Fam build_type_Twnhs\n",
"Order PID \n",
"1 526301100 1 0 0\n",
"2 526350040 1 0 0\n",
"3 526351010 1 0 0\n",
"4 526353030 1 0 0\n",
"5 527105010 1 0 0"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[build_type.columns].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Air Conditioning\n",
"\n",
"Air conditioning clearly increases the valuation (\"steeper\" regression line with respect to the overall living area)."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Central Air\", s=15, data=df);"
]
},
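{
"cell_type": "markdown",
"metadata": {},
"source": [
"The \"steeper\" regression line can be checked numerically by fitting one straight line per group with `np.polyfit`. The following is a minimal sketch on invented toy numbers, which illustrate the mechanics only and say nothing about the real data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Invented toy data: prices grow faster with area in the \"Y\" group.\n",
"toy = pd.DataFrame({\n",
"    \"Gr Liv Area\": [1000, 1500, 2000, 1000, 1500, 2000],\n",
"    \"SalePrice\": [100000, 150000, 200000, 80000, 100000, 120000],\n",
"    \"Central Air\": [\"Y\", \"Y\", \"Y\", \"N\", \"N\", \"N\"],\n",
"})\n",
"\n",
"# Fit SalePrice = slope * area + intercept separately for each group.\n",
"slopes = {\n",
"    label: np.polyfit(group[\"Gr Liv Area\"], group[\"SalePrice\"], deg=1)[0]\n",
"    for label, group in toy.groupby(\"Central Air\")\n",
"}\n",
"print(slopes)"
]
},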
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use a new variable name to cleary show that the variable's *dtype* is changed from *str* to *int*."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"df[\"air_cond\"] = (df[\"Central Air\"] == \"Y\").astype(int)\n",
"del df[\"Central Air\"]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"new_variables.append(\"air_cond\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>air_cond</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" air_cond\n",
"Order PID \n",
"1 526301100 1\n",
"2 526350040 1\n",
"3 526351010 1\n",
"4 526353030 1\n",
"5 527105010 1"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"air_cond\"]].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### \"Proximity to various Conditions\"\n",
"\n",
"The columns *Condition 1* and *Condition 2* have the same realizations and can be regarded as \"tags\" given to a house indicating the nearby presence of a) a major street, b) a railroad, or c) a park.\n",
"\n",
"The default tag \"Norm\" (implying no \"condition\") is given to 86% of the houses (this realization should therefore not be regarded as a tag!).\n",
"\n",
"From the comparison of the grouped scatter plots below, it can be assumed that the proximity of a major street decreases the valuation (lower regression slope through the cloud of blue and orange dots). Therefore, a factor variable *major_street* is extracted indicating the proximity of an \"artery\" or \"feeder\" street.\n",
"\n",
"Further, a factor variable *railway* is extracted as a relatively high proportion of the houses has such a tag. From the plots, a railway seems to not affect the valuations strongly.\n",
"\n",
"Lastly, a factor variable *park* is extracted. From the plots, this does not seem to affect the valuation much."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"List the \"raw\" realizations:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Feedr 174\n",
"Artery 97\n",
"RRAn 48\n",
"PosN 43\n",
"RRAe 29\n",
"PosA 24\n",
"RRNn 11\n",
"RRNe 6\n",
"dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" (\n",
" df[\"Condition 1\"].value_counts() + df[\"Condition 2\"].value_counts()\n",
" )\n",
" .sort_values(ascending=False)[1:]\n",
")"
]
},
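{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding two `value_counts()` results aligns them on their labels, so a label that occurs in only one of the two columns would yield `NaN`. The following is a minimal sketch of a more defensive variant with `Series.add(fill_value=0)`, shown on hypothetical toy columns rather than the real data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy columns; \"RRNe\" occurs only in the first one.\n",
"cond1 = pd.Series([\"Norm\", \"Feedr\", \"RRNe\", \"Norm\"])\n",
"cond2 = pd.Series([\"Norm\", \"Norm\", \"Feedr\", \"Norm\"])\n",
"\n",
"# Treat a label missing from one column as a count of zero instead of NaN.\n",
"tag_counts = (\n",
"    cond1.value_counts()\n",
"    .add(cond2.value_counts(), fill_value=0)\n",
"    .astype(int)\n",
"    .drop(\"Norm\")  # \"Norm\" is the default and not a real tag\n",
"    .sort_values(ascending=False)\n",
")\n",
"print(tag_counts)"
]
},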
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"# Condition 2 is only filled with anything other than \"Norm\"\n",
"# if Condition 1 already has such a tag.\n",
"assert not ((df[\"Condition 1\"] == \"Norm\") & (df[\"Condition 2\"] != \"Norm\")).any()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"86"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 86% of the houses actually have no tag.\n",
"round(100 * (df[\"Condition 1\"] == \"Norm\").sum() / df.shape[0])"
]
},
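{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three factor variables described above could be derived roughly as follows. This is a minimal sketch on hypothetical toy rows; the assignment of raw tags to *major_street*, *railway*, and *park* is an assumption based on the tag names and may differ from the extraction this notebook actually performs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy rows; the grouping of raw tags below is an assumption.\n",
"toy = pd.DataFrame({\n",
"    \"Condition 1\": [\"Norm\", \"Feedr\", \"RRAn\", \"PosN\"],\n",
"    \"Condition 2\": [\"Norm\", \"Norm\", \"Artery\", \"Norm\"],\n",
"})\n",
"groups = {\n",
"    \"major_street\": {\"Artery\", \"Feedr\"},\n",
"    \"railway\": {\"RRAn\", \"RRAe\", \"RRNn\", \"RRNe\"},\n",
"    \"park\": {\"PosN\", \"PosA\"},\n",
"}\n",
"\n",
"# A factor is 1 if either condition column carries one of its tags.\n",
"for name, raw_tags in groups.items():\n",
"    hit = toy[\"Condition 1\"].isin(raw_tags) | toy[\"Condition 2\"].isin(raw_tags)\n",
"    toy[name] = hit.astype(int)\n",
"print(toy[list(groups)])"
]
},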
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From a simple scatter plot it is hard to see any significant impact by a predictor."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
2020-06-29 01:10:19 +02:00
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAEGCAYAAACpXNjrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeVyU1f7A8c8sDAz7Diq4oSK54ZaSW0JaqQgupd6Wm1lm9UuzW5mmae63uldNb4vXNq9mllfluhWJJu6m4r7gjqDsMMMAsz+/P0ZHEUQQENHzfr18JWee5zkHwvnO2b5HJkmShCAIgiBUI3ltN0AQBEF48IjgIgiCIFQ7EVwEQRCEaieCiyAIglDtRHARBEEQqp2ythtwv+jSpQsNGjSo7WYIgiDUKWlpaezdu7dUuQgu1zRo0IDVq1fXdjMEQRDqlMGDB5dZLobFBEEQhGongosgCIJQ7URwEQRBEKqdmHMRBKFOMplMpKamotfra7spDwUnJyeCgoJwcHCo0PUiuAiCUCelpqbi5uZG48aNkclktd2cB5okSeTk5JCamkqTJk0qdI8YFhMEoU7S6/X4+PiIwHIPyGQyfHx8KtVLFD0XQRDqrJsDiyRJmK0SZouEg0KGUiE+O1enygZxEVwEQXggmK0SZzN1mCxW3J0cCPJSiwBTi8RPXhCEB4IkSZgsVgD0JgvlHVRVHcdYZWVlMX78eJ544gkGDx7Mq6++yoULF6r83L179/Laa68BkJCQwOLFiwHYvHkzZ8+etV+3YMECdu3aVeX68vLyeOGFF2jfvj3Tp0+v8vOuEz0XQRAeCHK5jPqeajRFJgI9nFDISw/jSJKE3mQlR2fAy0WF2kGBvIzr7kSSJP7v//6P2NhY5s2bB8CpU6fIycmp8IR3RURFRREVFQXYgsvjjz9Os2bNABg3bly11OHo6Mi4ceM4c+YMZ86cqZZngggugiA8IJRyOd4uKjzVDijksjLnCMxWifPZOixWibxiEy0D3ZBT+eCyZ88elEolI0aMsJe1bNkSsAWeTz75hO3btyOTyXj99dfp168fe/fuZdGiRXh5eZGcnEyrVq347LPPkMlkJCYmMnv2bNRqNR07drQ/c/Xq1Rw7dowBAwawZcsW9u3bx5dffsnChQv54osvePzxx3nqqafYvXs3f//737FYLLRu3ZqPP/4YlUpFZGQksbGxbN26FbPZzPz58wkJCSnxvTg7O9OpUydSUlIq/XMojxgWEwThgSGX2Sbyy5t8vt6jUVRhldmZM2do1apVma/Fx8dz6tQp4uLi+O677/jkk0/IzMwE4MSJE0yaNImNGzeSmprKgQMHMBgMTJkyha+++orVq1eTlZVV6pkdOnQgMjKS999/n7i4OBo2bGh/zWAw8MEHHzBv3jzWrVuHxWLhxx9/tL/u5eXFmjVrGD58ON9+++1df8+VJYKLIAgPDQeFnKa+rjT0dqaZvwvKuxgSu5MDBw7Qv39/FAoFvr6+dO7cmaNHjwLQtm1bAgMDkcvltGzZkrS0NM6fP09QUJB9v87AgQMrVd+FCxcICgqyD8cNGjSI/fv321/v27cvAK1btyYtLa2avss7E8FFEISHikopx9NZhUqpuOs9Ms2bN+f48eOVr1ulsv9doVBgsVjuqv7KuL6jXi6X35P6rhPBRRAEoZK6du2K0Whk5cqV9rJTp06xf/9+OnXqxKZNm7BYLOTm5rJ//37atm1722c1bdqUtLQ0+5zHhg0byrzOxcWFwsLCUuVNmjQhLS2NS5cuARAXF0fnzp2r8u1VCzGhLwiCUEkymYxFixYxe/Zs/v3vf+Po6EiDBg2YNGkSHTt2JCkpiZiYGGQyGe+99x5+fn6cP3++zGc5Ojoyffp0Ro8ebZ/QLyuI9OvXjylTpvCf//yHzz//vMT9c+bMYdy4cfYJ/ZsXGlREZGQkOp0Ok8nE5s2b+fbbb+2r0u
6WTKqOBd8PgMGDB4vDwgShDjl58iRhYWG13YyHSlk/89u9d4phMUEQBKHaieAiCIIgVDsRXARBEIRqV2PB5fz588TExNj/dOjQge+//578/HxGjhxJ3759GTlyJBqNBrDtap05cyZ9+vQhOjq6xDK/NWvW0LdvX/r27cuaNWvs5ceOHSM6Opo+ffowc+ZMe76g29UhCIIg3Bs1FlyaNm1KXFwccXFxrF69GrVaTZ8+fVi8eDERERHEx8cTERFhT8qWmJjIxYsXiY+PZ8aMGUybNg2wBYpFixbx888/88svv7Bo0SJ7sJg2bRozZswgPj6eixcvkpiYCHDbOgRBEIR7454Mi+3evZvg4GAaNGhAQkICsbGxAMTGxrJ582YAe7lMJiM8PBytVktmZiY7duygW7dueHp64uHhQbdu3di+fTuZmZnodDrCw8ORyWTExsaSkJBQ4lm31iEIgiDcG/ckuGzYsIEBAwYAkJOTg7+/PwB+fn7k5OQAkJGRQWBgoP2ewMBAMjIySpUHBASUWX79+vLqEARBqE6hoaHMnTvX/vU333zDwoULa7FF948aDy5Go5EtW7bw1FNPlXpNJis7c2l1uhd1CILwcFKpVMTHx5Obm3tX95vN5mpu0f2jxnfoJyYm0qpVK3x9fQHw8fEhMzMTf39/MjMz8fb2Bmw9kvT0dPt96enpBAQEEBAQwL59++zlGRkZPProo7e9vrw6BEF4eK1NSuPT305zJb+Y+p5q3nsylNj2Dar0TKVSybBhw/jhhx8YP358iddSU1OZNGkSeXl5eHt7M2fOHOrXr88HH3yASqXi5MmTdOjQAY1Gg6OjIydPniQnJ4fZs2ezdu1aDh06RLt27Ur0jOqSGu+5bNiwgf79+9u/joyMZO3atQCsXbvWfhDO9XJJkjh06BBubm74+/vTvXt3duzYgUajQaPRsGPHDrp3746/vz+urq4cOnQISZLKfNatdQiC8HBam5TGxNVHScsvRgLS8ouZuPooa5OqniX4ueeeY926dRQUFJQonzlzJoMGDWLdunVER0czc+ZM+2sZGRn89NNPTJw4EQCtVsvKlSuZOHEir7/+Oi+99BIbNmwgOTmZkydPVrmNtaFGg0tRURG7du2yp3wGGD16NDt37qRv377s2rWL0aNHA9CrVy+Cg4Pp06cPU6ZMYerUqQB4enryxhtvMHToUIYOHcqbb76Jp6cnAFOnTmXy5Mn06dOHhg0b0rNnz3LrEATh4fTpb6cpNpXMCFxssvDpb6er/GxXV1diYmJYunRpifKkpCT7XHNMTAwHDhywv/bUU0+hUCjsX/fu3RuZTEZoaCi+vr6EhoYil8tp1qzZPU2TX51qdFjM2dmZvXv3lijz8vLihx9+KHWtTCazB5RbXQ8st2rTpg3r168vVX67OgRBeDhdyS+uVHll/fWvf2Xw4MEMHjy4Qter1eoSX19PxS+TyUqk5ZfL5XV2Xkbs0BcE4YFX31NdqfLK8vT05KmnnmLVqlX2svbt29vT569bt45OnTpVS111hQgugiA88N57MhS1g6JEmdpBwXtPhlZbHS+//DJ5eXn2r6dMmcLq1auJjo4mLi6ODz/8sNrqqgtEyv1rRMp9QahbKptyvyZWiz1sKpNyXxwWJgjCQyG2fQMRTO4hMSwmCIIgVDsRXARBEIRqJ4KLIAiCUO1EcBEEQRCqnQgugiAIQrUTq8UEQRDuUlhYGC1atLB//a9//YugoKAqPbN9+/YkJSVVtWm1TgQXQRCEu+Tk5ERcXFyN12M2m1Eq69bbdd1qrSAItc5ssSKXyZDL69g5SUd+hoTpoEkFjyCI+gjaPlvt1Rw7doy5c+dSVFSEl5cXc+bMwd/fn5SUFD7++GPy8vJwcnJixowZhISEcPnyZd59912KioqIjIy0P2fv3r0sWLAAd3d3Lly4wG+//Vbtba1JIrgIglBhmVo9/4g/TZC3M891aYS3i+rON90PjvwM68aC6VqiSs
1l29dQpQCj1+uJiYkBICgoiPnz5zNz5ky++OILvL292bhxI/PmzWPOnDlMmTKFjz/+mMaNG3P48GE+/vhjli5dyqx
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Condition 1\", s=15, data=df);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, plotting the groups separately reveals different slopes."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1440x1080 with 9 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Group the realizations by the kind of nearby feature.\n",
"street = [\"Artery\", \"Feedr\"]  # arterial or feeder street\n",
"railway = [\"RRNn\", \"RRAn\", \"RRNe\", \"RRAe\"]  # near or adjacent to a railroad\n",
"park = [\"PosA\", \"PosN\"]  # near or adjacent to a positive off-site feature\n",
"# Fit one robust regression line per realization.\n",
"plot = sns.lmplot(\n",
"    x=\"Gr Liv Area\", y=\"SalePrice\", col=\"Condition 1\", hue=\"Condition 1\",\n",
"    col_order=[\"Norm\"] + street + railway + park,\n",
"    data=df, robust=True, col_wrap=4, ci=None, truncate=True, scatter_kws={\"s\": 15},\n",
")\n",
"# Adjust font sizes.\n",
"for ax in plot.axes:\n",
"    ax.set_title(ax.get_title(), fontsize=20)\n",
"    ax.set_xlabel(ax.get_xlabel(), fontsize=16)\n",
"    ax.set_ylabel(ax.get_ylabel(), fontsize=16)"
]
},
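{
"cell_type": "markdown",
"metadata": {},
"source": [
"The slopes in the panels above can also be compared numerically. The following is a minimal sketch on made-up data, not part of the analysis: the names `toy` and `slope` are illustrative, and an ordinary least-squares fit via `np.polyfit` stands in for the robust fit that `lmplot` draws.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up data: two groups with different price-per-area slopes.\n",
"toy = pd.DataFrame({\n",
"    \"group\": [\"a\"] * 4 + [\"b\"] * 4,\n",
"    \"area\": [1000, 1500, 2000, 2500] * 2,\n",
"    \"price\": [100, 150, 200, 250, 100, 125, 150, 175],\n",
"})\n",
"\n",
"\n",
"def slope(sub):\n",
"    # A degree-1 polyfit returns the coefficients [slope, intercept].\n",
"    return np.polyfit(sub[\"area\"], sub[\"price\"], deg=1)[0]\n",
"\n",
"\n",
"# One fitted slope per group.\n",
"slopes = {name: slope(sub) for name, sub in toy.groupby(\"group\")}\n",
"```\n",
"\n",
"Groups whose slopes differ clearly are candidates for their own factor variable."
]
},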
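{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extract the factor variables *major_street*, *railway*, and *park*. A house is flagged with a 1 if either *Condition 1* or *Condition 2* falls into the respective group."
]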
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"df[\"major_street\"] = 0\n",
"df.loc[\n",
"    df[\"Condition 1\"].isin(street) | df[\"Condition 2\"].isin(street),\n",
"    \"major_street\",\n",
"] = 1"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"df[\"railway\"] = 0\n",
"df.loc[\n",
"    df[\"Condition 1\"].isin(railway) | df[\"Condition 2\"].isin(railway),\n",
"    \"railway\",\n",
"] = 1"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"df[\"park\"] = 0\n",
"df.loc[\n",
"    df[\"Condition 1\"].isin(park) | df[\"Condition 2\"].isin(park),\n",
"    \"park\",\n",
"] = 1"
]
},
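{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three cells above repeat one pattern. As a compact sketch on made-up rows (the names `toy` and `groups` are illustrative and not used elsewhere in this notebook), the indicators could also be built in a loop:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up rows; the column names mirror the notebook.\n",
"toy = pd.DataFrame({\n",
"    \"Condition 1\": [\"Norm\", \"Artery\", \"RRAn\", \"PosN\"],\n",
"    \"Condition 2\": [\"Norm\", \"Norm\", \"Feedr\", \"Norm\"],\n",
"})\n",
"groups = {\n",
"    \"major_street\": [\"Artery\", \"Feedr\"],\n",
"    \"railway\": [\"RRNn\", \"RRAn\", \"RRNe\", \"RRAe\"],\n",
"    \"park\": [\"PosA\", \"PosN\"],\n",
"}\n",
"for name, levels in groups.items():\n",
"    # A house is flagged if either condition column matches the group.\n",
"    hit = toy[\"Condition 1\"].isin(levels) | toy[\"Condition 2\"].isin(levels)\n",
"    toy[name] = hit.astype(int)\n",
"```\n",
"\n",
"Since a house can carry two conditions, a single row may be flagged in more than one group."
]
},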
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Condition 1\"]\n",
"del df[\"Condition 2\"]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"new_variables.extend([\"major_street\", \"railway\", \"park\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show summary of counts:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"major_street 264\n",
"railway 94\n",
"park 60\n",
"dtype: int64"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"major_street\", \"railway\", \"park\"]].sum()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>major_street</th>\n",
" <th>railway</th>\n",
" <th>park</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" major_street railway park\n",
"Order PID \n",
"1 526301100 0 0 0\n",
"2 526350040 1 0 0\n",
"3 526351010 0 0 0\n",
"4 526353030 0 0 0\n",
"5 527105010 0 0 0"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"major_street\", \"railway\", \"park\"]].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exterior\n",
"\n",
"This dimension describes the main exterior material of a house. The categories are too diverse, and the grouped scatter plots do not reveal differing slopes. For simplicity, the variable is dropped.\n",
"\n",
"Like *Condition 1* and *Condition 2* above, this variable actually represents tags, of which a house can have up to two (*Exterior 1st* and *Exterior 2nd*)."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Exterior 1st\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Exterior 2nd\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Exterior 1st\"]\n",
"del df[\"Exterior 2nd\"]\n",
"# Also discard the associated ordinal variables.\n",
"del df[\"Exter Cond\"]\n",
"del df[\"Exter Qual\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Foundation\n",
"\n",
"The type of foundation appears to have an effect. However, only three of the six realizations occur in large numbers. The factor variables *found_BrkTil*, *found_CBlock*, and *found_PConc* are extracted but not regarded as \"interesting\"."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Foundation\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PConc 1282\n",
"CBlock 1242\n",
"BrkTil 310\n",
"Slab 48\n",
"Stone 11\n",
"Wood 5\n",
"Name: Foundation, dtype: int64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"Foundation\"].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1440x360 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot = sns.lmplot(\n",
"    x=\"Gr Liv Area\", y=\"SalePrice\", col=\"Foundation\", hue=\"Foundation\",\n",
"    col_order=[\"PConc\", \"CBlock\", \"BrkTil\"],\n",
"    data=df, robust=True, col_wrap=4, ci=None, truncate=True, scatter_kws={\"s\": 15},\n",
")\n",
"# Adjust font sizes.\n",
"for ax in plot.axes:\n",
"    ax.set_title(ax.get_title(), fontsize=20)\n",
"    ax.set_xlabel(ax.get_xlabel(), fontsize=16)\n",
"    ax.set_ylabel(ax.get_ylabel(), fontsize=16)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"# Request integer dummies explicitly as newer pandas versions default to booleans.\n",
"foundation = pd.get_dummies(df[\"Foundation\"], prefix=\"found\", dtype=int)\n",
"# Only keep the top 3 realizations.\n",
"del foundation[\"found_Slab\"]\n",
"del foundation[\"found_Stone\"]\n",
"del foundation[\"found_Wood\"]\n",
"df = pd.concat([df, foundation], axis=1)\n",
"del df[\"Foundation\"]"
]
},
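{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dummies above are selected by name. A possible alternative, sketched on made-up data with the illustrative names `toy`, `top`, and `dummies`, is to keep the most frequent realizations by count:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up foundation types; only the counts matter here.\n",
"toy = pd.Series(\n",
"    [\"PConc\"] * 5 + [\"CBlock\"] * 4 + [\"BrkTil\"] * 3 + [\"Slab\"],\n",
"    name=\"Foundation\",\n",
")\n",
"# value_counts() sorts by frequency in descending order.\n",
"top = toy.value_counts().index[:3]\n",
"dummies = pd.get_dummies(toy, prefix=\"found\", dtype=int)\n",
"# Keep only the dummy columns of the three most frequent realizations.\n",
"dummies = dummies[[\"found_\" + level for level in top]]\n",
"```\n",
"\n",
"This avoids hard-coding the rare realizations, at the cost of making the kept columns depend on the data."
]
},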
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"new_variables.extend([\"found_BrkTil\", \"found_CBlock\", \"found_PConc\"])"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>found_BrkTil</th>\n",
" <th>found_CBlock</th>\n",
" <th>found_PConc</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" found_BrkTil found_CBlock found_PConc\n",
"Order PID \n",
"1 526301100 0 1 0\n",
"2 526350040 0 1 0\n",
"3 526351010 0 1 0\n",
"4 526353030 0 1 0\n",
"5 527105010 0 0 1"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[foundation.columns].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Garage Type\n",
"\n",
"As can be expected, the *Garage Type* looks very similar to the above *has Garage* variable. Therefore, it is dropped."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAEGCAYAAACpXNjrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeVyU1f7A8c+sMOz7IuC+xFXTNEsCNXFNRFxT62fZtatlXs1sMzUt07LFrLR79baZmbmk4pKJYYm4pYW5p+IKwiDbwACzP78/RkdRNgXE5bxfL18yzzzPOWdY5jvnOed8j0ySJAlBEARBqEHyum6AIAiCcPcRwUUQBEGocSK4CIIgCDVOBBdBEAShxongIgiCINQ4ZV034Hbx8MMPExISUtfNEARBuKOkp6ezZ8+e646L4HJJSEgIq1evrutmCIIg3FEGDhxY5nFxW0wQBEGocSK4CIIgCDVOBBdBEAShxokxF0EQbmtms5m0tDQMBkNdN+We5uzsTGhoKCqVqkrni+AiCMJtLS0tDXd3dxo2bIhMJqvr5tyTJEkiJyeHtLQ0GjVqVKVrxG0xQRBuawaDAV9fXxFY6pBMJsPX1/eGeo+i5yIIwm2vKoFFkiQsNgmLVUKlkKFUiM/ONelGg7sILoIg3BUsNomTWXrMVhsezipCvTUiwNQhEVwEQbgrSJKE2WoDwGC2UtFGVZIk3dAn8ezsbN59913279+Pp6cnKpWKZ599lh49elSz1dWzfft2PvzwQwDOnTtHQEAAzs7OtGjRgvfff79O2yaCiyAIdwW5XEY9Lw26YjNBns4o5NcHD0mSMJht5OiNeLuq0agUyMs479prXnjhBfr3789HH30E2FOebN26tcpts1gsKJU1/3bbqVMnOnXqBMCIESN49dVXad26dY3XczNEcBEE4a6glMvxcVXjpVGhkMvK7JlYbBKnsvVYbRJ5JWbuC3JHTsXBZffu3ahUKoYPH+44FhISwogRIwD7bLZXX32VkpISAKZNm0a7du3Ys2cPn3zyCR4eHpw+fZrNmzczduxYMjMzMRqNPPXUUwwdOhSAlStX8sUXX+Du7s59992HWq3mzTffJDc3l+nTp3PhwgUA3njjDdq3b19he3ft2sWSJUv4/PPPAdixYwfff/89CxYs4IEHHmDIkCHs2LEDPz8/Pv74Y3x8fDh37hxvvfUWeXl5ODs7M3PmTJo0aVLF73w5JEGSJEkaMGBAXTdBEIQyHDlypMbKMlms0tEMnfTX+TzpcLpOMlmslV6zePFiadasWeU+X1xcLBkMBkmSJOn06dOO95Ldu3dLbdq0kc6dO+c4Ny8vT5IkSSopKZFiYmKk3NxcKTMzU+ratauUl5cnmUwmafjw4dJbb70lSZIkvfTSS9LevXslSZKk9PR0qXfv3uW24//+7/+kAwcOSDabTerVq5eUk5PjKCMxMVGSJElq3ry5FB8fL0mSJH322WeOep566inp9OnTkiRJ0v79+6URI0aUWUdZP4vy3jtFz0UQhHuGSiGnsZ8bxSYLLmoFykpuiZXlrbfe4o8//kClUvHjjz9isVh4++23OXbsGHK5nDNnzjjObd26NWFhYY7HS5YsYcuWLQBkZGRw9uxZsrOz6dChA15eXgD07t3bUcbOnTs5efKk43q9Xk9RURGurq7ltk8mkxEXF8e6desYOHAgKSkpzJkzBwC5XE6fPn0AiIuLY9y4cRQVFZGSksKECRMcZZhMphv+vlxLBBdBEO4paqUctVJd5fObNWtGQkKC4/H06dPJzc1l8ODBAHzzzTf4+fkRHx+PzWbj/vvvd5zr4uLi+HrPnj3s3LmT5cuXo9FoGDFiBEajscK6bTYbK1aswMnJqcrtBXum4ueffx61Wk3v3r3LHe+RyWRIkoSHhwfx8fE3VEdlxDw9QRCECnTs2BGj0cj333/vOHb1YsLCwkL8/f2Ry+XEx8
djtVrLLKewsBBPT080Gg2pqans378fsPdu9u7di06nw2KxlApkUVFRLFmyxPH46NGjVWpzYGAgAQEB/Oc//2HQoEGO4zabjc2bNwOwfv162rdvj5ubG6GhoWzatAmwT2A4duxYleqpiAgugiAIFZDJZCxYsIC9e/cSHR3N4MGDee2113j55ZcBeOKJJ1izZg39+vXj1KlTpXorV+vcuTMWi4XHHnuMjz76iLZt2wL2QDBmzBiGDBnC8OHDCQkJwd3dHYApU6Zw6NAhYmNj6dOnD8uWLatyu2NjYwkODi41MO/i4sKBAwfo27cvu3fv5oUXXgDggw8+YNWqVfTr14+YmBh++eWXm/peXU0mSVJF08HvGQMHDhSbhQnCbejo0aOEh4fXdTNq1eVxFIvFwrhx4xg0aFC119C8/fbbhIeHM2TIEMexBx54gJSUlJsus6yfRXnvnWLMRRAEoY7Nnz+fnTt3YjQaiYqKonv37tUqb+DAgWg0Gl5//fUaauGNE8FFEAShjr322ms1Wl55d2Gq02u5UWLMRRAEQahxtRZcTp06RVxcnONfu3bt+Oabb8jPz+eZZ56hZ8+ePPPMM+h0OsA+Q+Gdd96hR48exMbGcvjwYUdZa9asoWfPnvTs2ZM1a9Y4jl8e6OrRowfvvPMOl4ePyqtDEARBuDVqLbg0btyY+Ph44uPjWb16NRqNhh49erBo0SIiIiJISEggIiKCRYsWAZCUlMSZM2dISEhg5syZzJgxA7AHivnz57NixQpWrlzJ/PnzHcFixowZzJw5k4SEBM6cOUNSUhJAuXUIgiAIt8YtuS22a9cuwsLCCAkJITExkf79+wPQv39/x5S3y8dlMhlt27aloKCArKwskpOTiYyMxMvLC09PTyIjI9m+fTtZWVno9Xratm2LTCajf//+JCYmlirr2joEQRCEW+OWBJeNGzfSt29fAHJycggICADA39+fnJwcALRaLUFBQY5rgoKC0Gq11x0PDAws8/jl8yuqQxAE4Wb98ssvtGjRgtTUVMA+LXfbtm2O5/fs2cOff/5ZYRlpaWmO98KaOO92VuvBxWQysXXrVnr37n3dczJZ2ZlLa9KtqEMQhLvfhg0baN++PRs3bgSuDy6///77LZ2Ndbur9anISUlJtGzZEj8/PwB8fX3JysoiICCArKwsfHx8AHuPJDMz03FdZmYmgYGBBAYG8vvvvzuOa7VaHnrooXLPr6gOQRDufmtT0vlg899cyC+hnpeGV3q1oP8DIdUqs6ioiD/++INvv/2W5557jueee45PP/0Ug8HAH3/8QUxMDD/88ANyuZx169Yxbdo0GjZsyPTp0zl//jxgHyMOCAjAarUydepUUlJSCAwM5PPPP8fZ2ZlDhw7xxhtvABAZGVnt70Ndq/Wey8aNG4mJiXE8jo6OZu3atQCsXbuWbt26lTouSRL79+/H3d2dgIAAoqKiSE5ORqfTodPpSE5OJioqioCAANzc3Ni/fz+SJJVZ1rV1CIJwd1ubks7k1QdJzy9BAtLzS5i8+iBrU9KrVW5iYiKdOnWiUaNGeHt7c/z4ccaPH0+fPn2Ij49n9OjRDBs2jJEjRxIfH8+DDz7IO++8Q4cOHVi3bh1r1qyhWbNmAJw9e5Ynn3ySjRs34u7u7sj1NXnyZKZNm8a6deuq+224LdRqcCkuLmbnzp307NnTcWz06NHs2LGDnj17snPnTkaPHg1Aly5dCAsLo0ePHkybNo3p06cD4OXlxdixYxk8eDCDBw/mhRdecKSmnj59OlOnTqVHjx7Ur1+fzp07V1iHIAh3tw82/02JuXTiyBKzlQ82/12tcq/+kNynTx/HrbGK7N69myeeeAIAhULhyBcWGhrqSKHSsmVL0tPTKSgooLCwkA4dOgD2dPh3ulq9Lebi4sKePXtKHfP29mbx4sXXnSuTyRwB5VqXA8u1WrduzYYNG647Xl4dgiDc3S7kl9zQ8arIz89n9+7dHD9+HJlMhtVqRS
aT0bRp05sqT62+ku5foVBUmnb/TiVW6AuCcNeo56W5oeNVsXnzZuLi4vj111/ZunUr27ZtIzQ0lIyMDIqKihznubq
2018-09-05 00:48:12 +02:00
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Garage Type\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Garage Type\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Heating\n",
"\n",
"Most of the houses are heated with gas, so the variable carries little information and is dropped."
]
},
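{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how such a judgment could be backed by numbers, the share of the most frequent category can be computed. The toy data and the 0.95 cut-off are assumptions made for illustration only and are not taken from this notebook's data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not the actual housing data.\n",
"toy = pd.DataFrame({\"Heating\": [\"GasA\"] * 97 + [\"GasW\"] * 2 + [\"Wall\"]})\n",
"\n",
"# Relative frequency of each category, most common first.\n",
"shares = toy[\"Heating\"].value_counts(normalize=True)\n",
"dominant_share = shares.iloc[0]\n",
"\n",
"# The 0.95 cut-off is an arbitrary assumption for illustration.\n",
"is_near_constant = dominant_share > 0.95\n",
"print(shares.index[0], round(dominant_share, 2), is_near_constant)\n",
"```\n",
"\n",
"A share close to one suggests that the variable can hardly separate sub-groups of houses."
]
},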
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Heating\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Heating\"]\n",
"# Also discard the associated ordinal variable.\n",
"del df[\"Heating QC\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### House Style\n",
"\n",
"This variable is very similar to the *has 2nd Flr* variable derived above. Therefore, it is dropped."
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"House Style\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"del df[\"House Style\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Land Contour\n",
"\n",
"This variable is assumed to contain the same information as the ordinal variable *Land Slope* and is dropped."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Land Contour\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Land Contour\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Lot Configuration\n",
"\n",
"This variable shows no clear pattern and is dropped."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Lot Config\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Lot Config\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MS SubClass\n",
"\n",
"By looking at this variable's realizations, one can see that several distinct features are lumped together in one. In particular, the variables *has 2nd Flr* and *build_type_\\** from above, together with the age-related features created at the bottom of this notebook, should capture the same patterns in a more advantageous way. Thus, the column is dropped."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['1-STORY 1946 & NEWER ALL STYLES',\n",
" '1-STORY 1945 & OLDER',\n",
" '1-STORY W/FINISHED ATTIC ALL AGES',\n",
" '1-1/2 STORY - UNFINISHED ALL AGES',\n",
" '1-1/2 STORY FINISHED ALL AGES',\n",
" '2-STORY 1946 & NEWER',\n",
" '2-STORY 1945 & OLDER',\n",
" '2-1/2 STORY ALL AGES',\n",
" 'SPLIT OR MULTI-LEVEL',\n",
" 'SPLIT FOYER',\n",
" 'DUPLEX - ALL STYLES AND AGES',\n",
" '1-STORY PUD (Planned Unit Development) - 1946 & NEWER',\n",
" '1-1/2 STORY PUD - ALL AGES',\n",
" '2-STORY PUD - 1946 & NEWER',\n",
" 'PUD - MULTILEVEL - INCL SPLIT LEV/FOYER',\n",
" '2 FAMILY CONVERSION - ALL STYLES AND AGES']"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(ALL_COLUMNS[\"MS SubClass\"][\"lookups\"].values())"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"del df[\"MS SubClass\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MS Zoning\n",
"\n",
"This variable is dropped as most houses are located in a \"residential\" zone."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RL 2252\n",
"RM 459\n",
"FV 131\n",
"RH 27\n",
"C 25\n",
"I 2\n",
"A 2\n",
"RP 0\n",
"Name: MS Zoning, dtype: int64"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"MS Zoning\"].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"MS Zoning\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"del df[\"MS Zoning\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Masonry Veneer Type\n",
"\n",
"None of the groups has a slope that differs noticeably from the overall one."
]
},
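{
"cell_type": "markdown",
"metadata": {},
"source": [
"A visual comparison of slopes could be complemented by fitting one regression line per group. The following is only a sketch on hypothetical toy data. The column names `area`, `price`, and `group` as well as the helper `slope` are illustrative assumptions and not part of this notebook.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: two groups with different levels but equal slopes.\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        \"area\": [1000, 1500, 2000, 1000, 1500, 2000],\n",
"        \"price\": [100000, 150000, 200000, 120000, 170000, 220000],\n",
"        \"group\": [\"A\", \"A\", \"A\", \"B\", \"B\", \"B\"],\n",
"    }\n",
")\n",
"\n",
"\n",
"def slope(frame):\n",
"    # Slope of a degree-1 least-squares fit of price on area.\n",
"    return np.polyfit(frame[\"area\"], frame[\"price\"], deg=1)[0]\n",
"\n",
"\n",
"overall = slope(toy)\n",
"per_group = {name: slope(frame) for name, frame in toy.groupby(\"group\")}\n",
"print(overall, per_group)\n",
"```\n",
"\n",
"Group slopes that stay close to the overall slope indicate that the grouping variable adds little beyond a possible shift in level."
]
},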
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAEGCAYAAACpXNjrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3RUxdvA8e/W7GbTOwGkN+lIlQgapBMSOr4qRfmhgFIERRAVISAqUgQLKiogKkgJ0iOhBhBRQXoRCJBAerJp2/e+fyxZCAkhQEKdzzkcktm7M7Mp+2TuzDwjkyRJQhAEQRBKkfxed0AQBEF4+IjgIgiCIJQ6EVwEQRCEUieCiyAIglDqRHARBEEQSp3yXnfgftGiRQvKly9/r7shCILwQElISGDfvn2FykVwuaJ8+fKsWrXqXndDEAThgdKzZ88iy8VtMUEQBKHUieAiCIIglDoRXARBEIRSJ+ZcimGxWIiPj8doNN7rrghXaDQaKlSogEqlutddEQShGCK4FCM+Ph53d3cqV66MTCa719155EmSRFpaGvHx8VSpUuVed0cQhGKI22LFMBqN+Pr6isByn5DJZPj6+oqRpCA8AMTI5SZEYLm/iO+HcCOSJJFqSCXdmI6/1h8frc+97tIjTQQXQRAeCqmGVPqt60eKIYXQiqFMaT0FTxfPe92tR5a4LXafq1WrFuPGjXN+brVaadmyJa+88sod1TthwgR++eWXAmVbtmxhyJAht1XfypUrCQ8PJzw8nHr16hEWFkZ4eDgzZ868o34KQklZ7BZSDCkAHE8/jtlmvvHF4hirMidGLvc5V1dXTp8+jdFoRKPRsHv3bgIDA++43q5du7JgwQL69+/vLFu/fj3dunUrcR1WqxWl0vEj1KtXL3r16gVAaGgoixYtwsdH3JYQ7h43lRtvN3+b6LhoRj8xGi8Xr8IX2ayQegL++AoavwjlGoJKc/c7+wgQI5cHQNu2bdm+fTvgCABdu3Z1Pnbo0CH69etHREQE/fv35+zZswCcPn2a3r17Ex4eTlhYGHFxcQXqbNWqFefOnSM5ORmAvLw89uzZw7PPPkt8fDydO3dm0qRJdO3alZdeesk5if7iiy8ybdo0evbsyeLFi4vt94oVK5g2bZrz8+XLlzN9+nTi4+Pp1KkTY8eOpXPnzowcORKDwQDAkSNHeOGFF+jZsycvv/yys3+CcDMeLh70rtGbz0I/o4FfA1SKIpar56XBD93gwBJY1A0MGXe/o48IEVweAF26dGHDhg2YTCZOnjxJw4YNnY9VrVqVpUuXEhUVxciRI5k9ezYAv/zyCwMGDGDNmjWsXLmSoKCgAnUqFAo6dOjAxo0bAdi2bRstWrTAzc0NgPPnz/P888+zfv163N3d2bx5s/O5FouFVatW8dJLLxXb786dO7Nt2zYsFgsAq1atco5uzp07x//93/+xceNGdDodP/30ExaLhcjISD777DPntfmvRxBKwkXpgqeLJwq5ougLZDLQel+52B1k4i2wrIjbYg+A2rVrEx8fz7p162jbtm2Bx7Kzsxk/fjznz59HJpM538gbNWrEV199RWJiIh06dKBy5cqF6u3atSsff/wxAwcOZP369YSHhzsfq1ChAnXq1AGgbt26JCQkOB/r0qVLifqt0+lo2bIl27dvp2rVqlgsFmrVqkV8fDzlypXjiSeeAKB79+4sWbKEp556ilOnTjF48GAA7HY7/v7+Jf9CCcLNuAXAoHVwfg9UaA468fNVVkRweUCEhoby8ccfs3jxYjIzM53lc+fOpUWLFnz++efEx8czYMAAAMLCwmjYsCHbt29n6NChfPDBB7Rq1apAnU2aNCElJYUTJ05w4MCBAqMEtVrt/FihUGAymZyfa7XaEve7T58+fPXVV1StWrVA9tTrlxTLZDIkSaJGjRosW7asxPULwi3zKA/1+9zrXjz0xJjwAdG7d29GjBhBrV
q1CpRnZ2c7J/hXr17tLL948SIVK1ZkwIABtGvXjpMnTxaqUyaT0blzZ8aPH0+bNm1wcXEp9X43bNiQxMRE1q1bV2CxwKVLlzhw4AAA69at44knnqBKlSqkp6c7yy0WC6dPny71PgmCUPZEcHlABAUFOUcl1xoyZAizZs0iIiICq9XqLN+4cSPdunUjPDycU6dOERERUWS93bp148SJEwUWCZS2zp0706RJEzw9r+45qFKlCkuXLqVz585kZWXx3HPPoVar+eyzz5g5cybdu3cnIiLCGWgEQXiwyCRJLPgGx4E31x8Wdvz4cee8g3D7XnnlFQYNGuS8LRcfH8+rr77KunXrbqs+8X0RhPtHUe+dIEYuQhnKysqiY8eOuLi4FJrvEQTh4SYm9IUy4+HhUWAJc74KFSrc9qhFEIQHgxi5CIIgCKWuzILL2bNnnbmmwsPDadKkCT/88AOZmZkMHjyYDh06MHjwYPR6PeDIaBoZGUn79u0JCwvj6NGjzrpWr15Nhw4d6NChQ4EVUUeOHCEsLIz27dsTGRlJ/vTRjdoQBEEQ7o4yCy5Vq1ZlzZo1rFmzhlWrVqHVamnfvj1ff/01rVq1Ijo6mlatWvH1118DsHPnTuLi4oiOjmbq1KlMnjwZcASK+fPns3z5cn799Vfmz5/vDBaTJ09m6tSpREdHExcXx86dOwFu2IYgCIJwd9yV22J79+6lYsWKlC9fnpiYGOey2IiICLZs2QLgLJfJZDRq1IisrCySk5OJjY2ldevWeHl54enpSevWrdm1axfJycnk5OTQqFEjZDIZERERxMTEFKjr+jYEQRCEu+OuTOhfm203LS2NgIAAAPz9/UlLSwMgKSmpQP6roKAgkpKSCpUHBgYWWZ5/fXFtPIi+/PJL1q1bh1wuRy6XM2XKFA4cOEC/fv1uaae8IAjC3VTmwcVsNrN161bGjh1b6DGZTFbmJwvejTbKyoEDB9i+fTurV69GrVaTnp6OxWJh8eLFdO/eXQQXQRDuW2V+W2znzp3UrVsXPz8/AHx9fZ1p1JOTk51nfgQGBpKYmOh8XmJiIoGBgYXKk5KSiizPv764Nspa1IEEWs/YSpW319N6xlaiDiTc/EnFSElJwdvb25nny8fHh82bN5OcnMzAgQN58cUXAUf6lLCwMLp168Ynn3zifH7jxo2ZPXs23bt3p2/fvqSmpgKQnp7O66+/7jyD5e+//76jfgqCIFyvzIPL9eePhIaGEhUVBUBUVBTt2rUrUC5JEgcPHsTd3Z2AgABCQkKIjY1Fr9ej1+uJjY0lJCSEgIAA3NzcOHjwIJIkFVnX9W2UpagDCUxYdZiETAMSkJBpYMKqw3cUYFq3bs3ly5fp2LEjkydP5s8//2TAgAEEBASwaNEilixZQlJSEjNnzmTRokVERUVx+PBh5xxTXl4eDRs25LfffqNp06YsX74cgGnTpjFw4EBWrlzJvHnzmDRpUml8CQRBEJzK9LZY/gFUU6ZMcZYNHTqU0aNHs2LFCoKDg5kzZw7gOBBrx44dtG/fHq1Wy/Tp0wHw8vJi+PDh9O7dG4ARI0bg5eU4Ye79999nwoQJGI1G2rRpQ5s2bYptoyx9svkkBoutQJnBYuOTzSeJaFz+turU6XSsWrWKv/76i3379jFmzJhCtxcPHz5M8+bNnaOzsLAw9u/fz7PPPotKpeKZZ54BoF69euzevRuAPXv28N9//znryMnJITc3F51Od1v9FARBuF6ZBhdXV1f27dtXoMzb25tFixYVulYmk/H+++8XWU/v3r2dweVa9evXL3Kn943aKEuXMg23VF5SCoWCFi1a0KJFC2rWrOkckZWESqVyzjfJ5XJsNkfws9vtLF++vEyyIAuCIIDYoV9qgr2Knly/UXlJnD17tsDxxMePHyc4OBidTkdubi4ADRo0YP/+/aSnp2Oz2Vi/fj3NmjUrtt6QkBCWLFlSoF5BEITSJHKLlZI3O9ZiwqrDBW6NaVUK3uxYq5
hnFS8vL4/IyEiysrJQKBRUqlSJKVOmsH79eoYMGUJAQABLlixh7NixDBw4EEmSaNu2Lc8++2yx9b7zzjtMmTKFsLA
2018-09-05 00:48:12 +02:00
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Mas Vnr Type\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Mas Vnr Type\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Miscellaneous Features\n",
"\n",
"This variable is basically an \"other\" field with no discernible pattern."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Misc Feature\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Misc Feature\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Roof\n",
"\n",
"Roofs in Ames, IA, are not special enough to make a difference in the price. Even \"hip\" roofs seem to be priced in already through the bigger houses they tend to come with."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Roof Matl\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Roof Style\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Roof Matl\"]\n",
"del df[\"Roof Style\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sale Info\n",
"\n",
"Partial and abnormal (= foreclosure) sales seem to go along with higher and lower prices, respectively. These two types are encoded as the factor variables *partial_sale* and *abnormal_sale*. The impact does not seem to be big, though."
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Normal 2396\n",
"Partial 233\n",
"Abnorml 189\n",
"Family 46\n",
"Alloca 22\n",
"AdjLand 12\n",
"Name: Sale Condition, dtype: int64"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"Sale Condition\"].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Sale Condition\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1440x720 with 6 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot = sns.lmplot(\n",
" x=\"Gr Liv Area\", y=\"SalePrice\", col=\"Sale Condition\", hue=\"Sale Condition\",\n",
" data=df, robust=True, col_wrap=4, ci=None, truncate=True, scatter_kws={\"s\": 15},\n",
")\n",
"# Adjust font sizes.\n",
"for ax in plot.axes:\n",
" ax.set_title(ax.get_title(), fontsize=20)\n",
" ax.set_xlabel(ax.get_xlabel(), fontsize=16)\n",
" ax.set_ylabel(ax.get_ylabel(), fontsize=16)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"# Encode the two sale conditions of interest as 0/1 factor variables.\n",
"df[\"partial_sale\"] = (df[\"Sale Condition\"] == \"Partial\").astype(int)\n",
"df[\"abnormal_sale\"] = (df[\"Sale Condition\"] == \"Abnorml\").astype(int)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Homes that are sold for the first time are clearly priced higher. A factor variable *new_home* is introduced."
]
},
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"Sale Type\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"df[\"new_home\"] = (df[\"Sale Type\"] == \"New\").astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"new_variables.extend([\"partial_sale\", \"abnormal_sale\", \"new_home\"])"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Sale Condition\"]\n",
"del df[\"Sale Type\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show summary of counts:"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"partial_sale 233\n",
"abnormal_sale 189\n",
"new_home 227\n",
"dtype: int64"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"partial_sale\", \"abnormal_sale\", \"new_home\"]].sum()"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>partial_sale</th>\n",
" <th>abnormal_sale</th>\n",
" <th>new_home</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" partial_sale abnormal_sale new_home\n",
"Order PID \n",
"1 526301100 0 0 0\n",
"2 526350040 0 0 0\n",
"3 526351010 0 0 0\n",
"4 526353030 0 0 0\n",
"5 527105010 0 0 0"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[[\"partial_sale\", \"abnormal_sale\", \"new_home\"]].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Street Name\n",
"\n",
"Judging by the value counts, this variable is nearly constant (only 12 of 2,898 houses have gravel road access) and therefore carries almost no information."
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Pave 2886\n",
"Grvl 12\n",
"Name: Street, dtype: int64"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"Street\"].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Street\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Age & Remodeling\n",
"\n",
"The sales in this dataset span several years. Therefore, the variables holding year numbers are converted into ages relative to the year in which each house was sold."
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [],
"source": [
"# For one house, the recorded remodeling year is one year\n",
"# before the year it was built. That input error is corrected.\n",
"input_error = (df[\"Year Remod/Add\"] < df[\"Year Built\"])\n",
"assert input_error.sum() == 1\n",
"df.loc[input_error, \"Year Remod/Add\"] = df.loc[input_error, \"Year Built\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Introduce a factor variable *remodeled*. About 46% of the houses were remodeled at some point in time."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"46"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"remodeled = (df[\"Year Remod/Add\"] > df[\"Year Built\"])\n",
"df[\"remodeled\"] = remodeled.astype(int)\n",
"# Share of remodeled houses in percent.\n",
"round(100 * remodeled.sum() / df.shape[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create discrete variables *years_since_built* and *years_since_remodeled*."
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [],
"source": [
"df[\"years_since_built\"] = df[\"Yr Sold\"] - df[\"Year Built\"]\n",
"df[\"years_since_remodeled\"] = df[\"Yr Sold\"] - df[\"Year Remod/Add\"]"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df[[\"years_since_built\", \"years_since_remodeled\"]].hist(bins=20);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two factor variables *recently_built* and *recently_remodeled* are created, indicating that the house was built or remodeled within the 10 years before the sale. The two scatter plots below suggest that this \"recent vs. old\" distinction affects the price."
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [],
"source": [
"df[\"recently_built\"] = (df[\"years_since_built\"] <= 10).astype(int)\n",
"df[\"recently_remodeled\"] = (df[\"years_since_remodeled\"] <= 10).astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"recently_built\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"recently_remodeled\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [],
"source": [
"del df[\"Yr Sold\"]\n",
"del df[\"Year Built\"]\n",
"del df[\"Year Remod/Add\"]"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [],
"source": [
"age_columns = [\n",
" \"remodeled\", \"years_since_built\", \"years_since_remodeled\",\n",
" \"recently_built\", \"recently_remodeled\",\n",
"]\n",
"new_variables.extend(age_columns)"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>remodeled</th>\n",
" <th>years_since_built</th>\n",
" <th>years_since_remodeled</th>\n",
" <th>recently_built</th>\n",
" <th>recently_remodeled</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>0</td>\n",
" <td>50</td>\n",
" <td>50</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>0</td>\n",
" <td>49</td>\n",
" <td>49</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>0</td>\n",
" <td>52</td>\n",
" <td>52</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>0</td>\n",
" <td>42</td>\n",
" <td>42</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>1</td>\n",
" <td>13</td>\n",
" <td>12</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" remodeled years_since_built years_since_remodeled \\\n",
"Order PID \n",
"1 526301100 0 50 50 \n",
"2 526350040 0 49 49 \n",
"3 526351010 0 52 52 \n",
"4 526353030 0 42 42 \n",
"5 527105010 1 13 12 \n",
"\n",
" recently_built recently_remodeled \n",
"Order PID \n",
"1 526301100 0 0 \n",
"2 526350040 0 0 \n",
"3 526351010 0 0 \n",
"4 526353030 0 0 \n",
"5 527105010 0 0 "
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[age_columns].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outliers\n",
"\n",
"The instructors' notes state:\n",
"\n",
"> **Five observations** that an instructor may wish to remove from the data set before giving it to students (a plot of SALE PRICE versus GR LIV AREA will quickly indicate these\n",
"points). Three of them are true **outliers** (Partial Sales that likely don’t represent actual market values) and two of them are simply unusual sales (very large houses priced\n",
"relatively appropriately). I would **recommend removing any houses with more than\n",
"4000 square feet** from the data set (which eliminates these five unusual observations)\n",
"before assigning it to students.\n",
"\n",
"To apply a more \"rigorous\" approach, outlier detection is conducted with a so-called Isolation Forest."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"# Use only numeric columns that are strongly correlated with the target.\n",
"# This mitigates the risk that a poorly chosen factor variable introduced\n",
"# in this notebook causes an observation to be flagged as an outlier.\n",
"with open(\"data/correlated_variables.json\", \"r\") as file:\n",
"    content = json.load(file)\n",
"strongly_correlated = content[\"strongly_correlated\"]\n",
"df_encoded = encode_ordinals(df[list(set(strongly_correlated) & set(df.columns))])\n",
"iso = IsolationForest(n_estimators=100, bootstrap=True, contamination=0.005)\n",
"outliers = pd.DataFrame(\n",
"    iso.fit_predict(df_encoded), columns=[\"outlier\"], index=df.index\n",
")\n",
"# fit_predict() labels outliers as -1 and inliers as +1.\n",
"outliers[\"outlier\"] = outliers[\"outlier\"].apply(lambda x: 1 if x < 0 else 0)\n",
"df = pd.concat([df, outliers], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The five observations mentioned in the instructors' notes are among the detected outliers."
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=\"Gr Liv Area\", y=\"SalePrice\", hue=\"outlier\", s=15, data=df);"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [],
"source": [
"# Remove the outliers.\n",
"df = df[df[\"outlier\"] == 0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save the Data"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [],
"source": [
"# Re-order the columns for convenience.\n",
"final_columns = (\n",
" sorted(set(list(ALL_COLUMNS.keys()) + new_variables) & set(df.columns))\n",
" + TARGET_VARIABLES\n",
")\n",
"df = df[final_columns]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Discarding uninformative predictors and adding new ones changed the dataset considerably: it now has 2,883 rows and 109 columns."
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2883, 109)"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>1st Flr SF</th>\n",
" <th>1st Flr SF (box-cox-0)</th>\n",
" <th>2nd Flr SF</th>\n",
" <th>3Ssn Porch</th>\n",
" <th>Bedroom AbvGr</th>\n",
" <th>Bsmt Cond</th>\n",
" <th>Bsmt Exposure</th>\n",
" <th>Bsmt Full Bath</th>\n",
" <th>Bsmt Half Bath</th>\n",
" <th>Bsmt Qual</th>\n",
" <th>Bsmt Unf SF</th>\n",
" <th>BsmtFin SF 1</th>\n",
" <th>BsmtFin SF 2</th>\n",
" <th>BsmtFin Type 1</th>\n",
" <th>BsmtFin Type 2</th>\n",
" <th>Electrical</th>\n",
" <th>Enclosed Porch</th>\n",
" <th>Fence</th>\n",
" <th>Fireplace Qu</th>\n",
" <th>Fireplaces</th>\n",
" <th>Full Bath</th>\n",
" <th>Functional</th>\n",
" <th>Garage Area</th>\n",
" <th>Garage Cars</th>\n",
" <th>Garage Cond</th>\n",
" <th>Garage Finish</th>\n",
" <th>Garage Qual</th>\n",
" <th>Gr Liv Area</th>\n",
" <th>Gr Liv Area (box-cox-0)</th>\n",
" <th>Half Bath</th>\n",
" <th>Kitchen AbvGr</th>\n",
" <th>Kitchen Qual</th>\n",
" <th>Land Slope</th>\n",
" <th>Lot Area</th>\n",
" <th>Lot Area (box-cox-0.1)</th>\n",
" <th>Lot Shape</th>\n",
" <th>Low Qual Fin SF</th>\n",
" <th>Mas Vnr Area</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Open Porch SF</th>\n",
" <th>Overall Cond</th>\n",
" <th>Overall Qual</th>\n",
" <th>Paved Drive</th>\n",
" <th>Pool Area</th>\n",
" <th>Pool QC</th>\n",
" <th>Screen Porch</th>\n",
" <th>TotRms AbvGrd</th>\n",
" <th>Total Bath</th>\n",
" <th>Total Bsmt SF</th>\n",
" <th>Total Porch SF</th>\n",
" <th>Total SF</th>\n",
" <th>Total SF (box-cox-0.2)</th>\n",
" <th>Utilities</th>\n",
" <th>Wood Deck SF</th>\n",
" <th>abnormal_sale</th>\n",
" <th>air_cond</th>\n",
" <th>build_type_1Fam</th>\n",
" <th>build_type_2Fam</th>\n",
" <th>build_type_Twnhs</th>\n",
" <th>found_BrkTil</th>\n",
" <th>found_CBlock</th>\n",
" <th>found_PConc</th>\n",
" <th>has 2nd Flr</th>\n",
" <th>has Bsmt</th>\n",
" <th>has Fireplace</th>\n",
" <th>has Garage</th>\n",
" <th>has Pool</th>\n",
" <th>has Porch</th>\n",
" <th>major_street</th>\n",
" <th>new_home</th>\n",
" <th>nhood_Blmngtn</th>\n",
" <th>nhood_Blueste</th>\n",
" <th>nhood_BrDale</th>\n",
" <th>nhood_BrkSide</th>\n",
" <th>nhood_ClearCr</th>\n",
" <th>nhood_CollgCr</th>\n",
" <th>nhood_Crawfor</th>\n",
" <th>nhood_Edwards</th>\n",
" <th>nhood_Gilbert</th>\n",
" <th>nhood_Greens</th>\n",
" <th>nhood_GrnHill</th>\n",
" <th>nhood_IDOTRR</th>\n",
" <th>nhood_Landmrk</th>\n",
" <th>nhood_MeadowV</th>\n",
" <th>nhood_Mitchel</th>\n",
" <th>nhood_NPkVill</th>\n",
" <th>nhood_NWAmes</th>\n",
" <th>nhood_Names</th>\n",
" <th>nhood_NoRidge</th>\n",
" <th>nhood_NridgHt</th>\n",
" <th>nhood_OldTown</th>\n",
" <th>nhood_SWISU</th>\n",
" <th>nhood_Sawyer</th>\n",
" <th>nhood_SawyerW</th>\n",
" <th>nhood_Somerst</th>\n",
" <th>nhood_StoneBr</th>\n",
" <th>nhood_Timber</th>\n",
" <th>nhood_Veenker</th>\n",
" <th>park</th>\n",
" <th>partial_sale</th>\n",
" <th>railway</th>\n",
" <th>recently_built</th>\n",
" <th>recently_remodeled</th>\n",
" <th>remodeled</th>\n",
" <th>years_since_built</th>\n",
" <th>years_since_remodeled</th>\n",
" <th>SalePrice</th>\n",
" <th>SalePrice (box-cox-0)</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Order</th>\n",
" <th>PID</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <th>526301100</th>\n",
" <td>1656.0</td>\n",
" <td>7.412160</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>Gd</td>\n",
" <td>Gd</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>441.0</td>\n",
" <td>639.0</td>\n",
" <td>0.0</td>\n",
" <td>BLQ</td>\n",
" <td>Unf</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>Gd</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>Typ</td>\n",
" <td>528.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>Fin</td>\n",
" <td>TA</td>\n",
" <td>1656.0</td>\n",
" <td>7.412160</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Gtl</td>\n",
" <td>31770.0</td>\n",
" <td>18.196923</td>\n",
" <td>IR1</td>\n",
" <td>0.0</td>\n",
" <td>112.0</td>\n",
" <td>0.0</td>\n",
" <td>5</td>\n",
" <td>62.0</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>P</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>7</td>\n",
" <td>2.0</td>\n",
" <td>1080.0</td>\n",
" <td>272.0</td>\n",
" <td>2736.0</td>\n",
" <td>19.344072</td>\n",
" <td>AllPub</td>\n",
" <td>210.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>50</td>\n",
" <td>50</td>\n",
" <td>215000.0</td>\n",
" <td>12.278393</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <th>526350040</th>\n",
" <td>896.0</td>\n",
" <td>6.797940</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>270.0</td>\n",
" <td>468.0</td>\n",
" <td>144.0</td>\n",
" <td>Rec</td>\n",
" <td>LwQ</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>MnPrv</td>\n",
" <td>NA</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Typ</td>\n",
" <td>730.0</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Unf</td>\n",
" <td>TA</td>\n",
" <td>896.0</td>\n",
" <td>6.797940</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Gtl</td>\n",
" <td>11622.0</td>\n",
" <td>15.499290</td>\n",
" <td>Reg</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>6</td>\n",
" <td>0.0</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>120.0</td>\n",
" <td>5</td>\n",
" <td>1.0</td>\n",
" <td>882.0</td>\n",
" <td>260.0</td>\n",
" <td>1778.0</td>\n",
" <td>17.333478</td>\n",
" <td>AllPub</td>\n",
" <td>140.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>49</td>\n",
" <td>49</td>\n",
" <td>105000.0</td>\n",
" <td>11.561716</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <th>526351010</th>\n",
" <td>1329.0</td>\n",
" <td>7.192182</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>406.0</td>\n",
" <td>923.0</td>\n",
" <td>0.0</td>\n",
" <td>ALQ</td>\n",
" <td>Unf</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>NA</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Typ</td>\n",
" <td>312.0</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Unf</td>\n",
" <td>TA</td>\n",
" <td>1329.0</td>\n",
" <td>7.192182</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Gd</td>\n",
" <td>Gtl</td>\n",
" <td>14267.0</td>\n",
" <td>16.027549</td>\n",
" <td>IR1</td>\n",
" <td>0.0</td>\n",
" <td>108.0</td>\n",
" <td>12500.0</td>\n",
" <td>6</td>\n",
" <td>36.0</td>\n",
" <td>6</td>\n",
" <td>6</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>6</td>\n",
" <td>1.5</td>\n",
" <td>1329.0</td>\n",
" <td>429.0</td>\n",
" <td>2658.0</td>\n",
" <td>19.203658</td>\n",
" <td>AllPub</td>\n",
" <td>393.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>52</td>\n",
" <td>52</td>\n",
" <td>172000.0</td>\n",
" <td>12.055250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <th>526353030</th>\n",
" <td>2110.0</td>\n",
" <td>7.654443</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>TA</td>\n",
" <td>1045.0</td>\n",
" <td>1065.0</td>\n",
" <td>0.0</td>\n",
" <td>ALQ</td>\n",
" <td>Unf</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>TA</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>Typ</td>\n",
" <td>522.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>Fin</td>\n",
" <td>TA</td>\n",
" <td>2110.0</td>\n",
" <td>7.654443</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Ex</td>\n",
" <td>Gtl</td>\n",
" <td>11160.0</td>\n",
" <td>15.396064</td>\n",
" <td>Reg</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>4</td>\n",
" <td>0.0</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>8</td>\n",
" <td>3.5</td>\n",
" <td>2110.0</td>\n",
" <td>0.0</td>\n",
" <td>4220.0</td>\n",
" <td>21.548042</td>\n",
" <td>AllPub</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>42</td>\n",
" <td>42</td>\n",
" <td>244000.0</td>\n",
" <td>12.404924</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <th>527105010</th>\n",
" <td>928.0</td>\n",
" <td>6.833032</td>\n",
" <td>701.0</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>TA</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Gd</td>\n",
" <td>137.0</td>\n",
" <td>791.0</td>\n",
" <td>0.0</td>\n",
" <td>GLQ</td>\n",
" <td>Unf</td>\n",
" <td>SBrkr</td>\n",
" <td>0.0</td>\n",
" <td>MnPrv</td>\n",
" <td>TA</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>Typ</td>\n",
" <td>482.0</td>\n",
" <td>2</td>\n",
" <td>TA</td>\n",
" <td>Fin</td>\n",
" <td>TA</td>\n",
" <td>1629.0</td>\n",
" <td>7.395722</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>TA</td>\n",
" <td>Gtl</td>\n",
" <td>13830.0</td>\n",
" <td>15.946705</td>\n",
" <td>IR1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>34.0</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>Y</td>\n",
" <td>0.0</td>\n",
" <td>NA</td>\n",
" <td>0.0</td>\n",
" <td>6</td>\n",
" <td>2.5</td>\n",
" <td>928.0</td>\n",
" <td>246.0</td>\n",
" <td>2557.0</td>\n",
" <td>19.016856</td>\n",
" <td>AllPub</td>\n",
" <td>212.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>13</td>\n",
" <td>12</td>\n",
" <td>189900.0</td>\n",
" <td>12.154253</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1st Flr SF 1st Flr SF (box-cox-0) 2nd Flr SF 3Ssn Porch \\\n",
"Order PID \n",
"1 526301100 1656.0 7.412160 0.0 0.0 \n",
"2 526350040 896.0 6.797940 0.0 0.0 \n",
"3 526351010 1329.0 7.192182 0.0 0.0 \n",
"4 526353030 2110.0 7.654443 0.0 0.0 \n",
"5 527105010 928.0 6.833032 701.0 0.0 \n",
"\n",
" Bedroom AbvGr Bsmt Cond Bsmt Exposure Bsmt Full Bath \\\n",
"Order PID \n",
"1 526301100 3 Gd Gd 1 \n",
"2 526350040 2 TA No 0 \n",
"3 526351010 3 TA No 0 \n",
"4 526353030 3 TA No 1 \n",
"5 527105010 3 TA No 0 \n",
"\n",
" Bsmt Half Bath Bsmt Qual Bsmt Unf SF BsmtFin SF 1 \\\n",
"Order PID \n",
"1 526301100 0 TA 441.0 639.0 \n",
"2 526350040 0 TA 270.0 468.0 \n",
"3 526351010 0 TA 406.0 923.0 \n",
"4 526353030 0 TA 1045.0 1065.0 \n",
"5 527105010 0 Gd 137.0 791.0 \n",
"\n",
" BsmtFin SF 2 BsmtFin Type 1 BsmtFin Type 2 Electrical \\\n",
"Order PID \n",
"1 526301100 0.0 BLQ Unf SBrkr \n",
"2 526350040 144.0 Rec LwQ SBrkr \n",
"3 526351010 0.0 ALQ Unf SBrkr \n",
"4 526353030 0.0 ALQ Unf SBrkr \n",
"5 527105010 0.0 GLQ Unf SBrkr \n",
"\n",
" Enclosed Porch Fence Fireplace Qu Fireplaces Full Bath \\\n",
"Order PID \n",
"1 526301100 0.0 NA Gd 2 1 \n",
"2 526350040 0.0 MnPrv NA 0 1 \n",
"3 526351010 0.0 NA NA 0 1 \n",
"4 526353030 0.0 NA TA 2 2 \n",
"5 527105010 0.0 MnPrv TA 1 2 \n",
"\n",
" Functional Garage Area Garage Cars Garage Cond \\\n",
"Order PID \n",
"1 526301100 Typ 528.0 2 TA \n",
"2 526350040 Typ 730.0 1 TA \n",
"3 526351010 Typ 312.0 1 TA \n",
"4 526353030 Typ 522.0 2 TA \n",
"5 527105010 Typ 482.0 2 TA \n",
"\n",
" Garage Finish Garage Qual Gr Liv Area \\\n",
"Order PID \n",
"1 526301100 Fin TA 1656.0 \n",
"2 526350040 Unf TA 896.0 \n",
"3 526351010 Unf TA 1329.0 \n",
"4 526353030 Fin TA 2110.0 \n",
"5 527105010 Fin TA 1629.0 \n",
"\n",
" Gr Liv Area (box-cox-0) Half Bath Kitchen AbvGr \\\n",
"Order PID \n",
"1 526301100 7.412160 0 1 \n",
"2 526350040 6.797940 0 1 \n",
"3 526351010 7.192182 1 1 \n",
"4 526353030 7.654443 1 1 \n",
"5 527105010 7.395722 1 1 \n",
"\n",
" Kitchen Qual Land Slope Lot Area Lot Area (box-cox-0.1) \\\n",
"Order PID \n",
"1 526301100 TA Gtl 31770.0 18.196923 \n",
"2 526350040 TA Gtl 11622.0 15.499290 \n",
"3 526351010 Gd Gtl 14267.0 16.027549 \n",
"4 526353030 Ex Gtl 11160.0 15.396064 \n",
"5 527105010 TA Gtl 13830.0 15.946705 \n",
"\n",
" Lot Shape Low Qual Fin SF Mas Vnr Area Misc Val Mo Sold \\\n",
"Order PID \n",
"1 526301100 IR1 0.0 112.0 0.0 5 \n",
"2 526350040 Reg 0.0 0.0 0.0 6 \n",
"3 526351010 IR1 0.0 108.0 12500.0 6 \n",
"4 526353030 Reg 0.0 0.0 0.0 4 \n",
"5 527105010 IR1 0.0 0.0 0.0 3 \n",
"\n",
" Open Porch SF Overall Cond Overall Qual Paved Drive \\\n",
"Order PID \n",
"1 526301100 62.0 5 6 P \n",
"2 526350040 0.0 6 5 Y \n",
"3 526351010 36.0 6 6 Y \n",
"4 526353030 0.0 5 7 Y \n",
"5 527105010 34.0 5 5 Y \n",
"\n",
" Pool Area Pool QC Screen Porch TotRms AbvGrd Total Bath \\\n",
"Order PID \n",
"1 526301100 0.0 NA 0.0 7 2.0 \n",
"2 526350040 0.0 NA 120.0 5 1.0 \n",
"3 526351010 0.0 NA 0.0 6 1.5 \n",
"4 526353030 0.0 NA 0.0 8 3.5 \n",
"5 527105010 0.0 NA 0.0 6 2.5 \n",
"\n",
" Total Bsmt SF Total Porch SF Total SF \\\n",
"Order PID \n",
"1 526301100 1080.0 272.0 2736.0 \n",
"2 526350040 882.0 260.0 1778.0 \n",
"3 526351010 1329.0 429.0 2658.0 \n",
"4 526353030 2110.0 0.0 4220.0 \n",
"5 527105010 928.0 246.0 2557.0 \n",
"\n",
" Total SF (box-cox-0.2) Utilities Wood Deck SF \\\n",
"Order PID \n",
"1 526301100 19.344072 AllPub 210.0 \n",
"2 526350040 17.333478 AllPub 140.0 \n",
"3 526351010 19.203658 AllPub 393.0 \n",
"4 526353030 21.548042 AllPub 0.0 \n",
"5 527105010 19.016856 AllPub 212.0 \n",
"\n",
" abnormal_sale air_cond build_type_1Fam build_type_2Fam \\\n",
"Order PID \n",
"1 526301100 0 1 1 0 \n",
"2 526350040 0 1 1 0 \n",
"3 526351010 0 1 1 0 \n",
"4 526353030 0 1 1 0 \n",
"5 527105010 0 1 1 0 \n",
"\n",
" build_type_Twnhs found_BrkTil found_CBlock found_PConc \\\n",
"Order PID \n",
"1 526301100 0 0 1 0 \n",
"2 526350040 0 0 1 0 \n",
"3 526351010 0 0 1 0 \n",
"4 526353030 0 0 1 0 \n",
"5 527105010 0 0 0 1 \n",
"\n",
" has 2nd Flr has Bsmt has Fireplace has Garage has Pool \\\n",
"Order PID \n",
"1 526301100 0 1 1 1 0 \n",
"2 526350040 0 1 0 1 0 \n",
"3 526351010 0 1 0 1 0 \n",
"4 526353030 0 1 1 1 0 \n",
"5 527105010 1 1 1 1 0 \n",
"\n",
" has Porch major_street new_home nhood_Blmngtn \\\n",
"Order PID \n",
"1 526301100 1 0 0 0 \n",
"2 526350040 1 1 0 0 \n",
"3 526351010 1 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 1 0 0 0 \n",
"\n",
" nhood_Blueste nhood_BrDale nhood_BrkSide nhood_ClearCr \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_CollgCr nhood_Crawfor nhood_Edwards nhood_Gilbert \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 1 \n",
"\n",
" nhood_Greens nhood_GrnHill nhood_IDOTRR nhood_Landmrk \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_MeadowV nhood_Mitchel nhood_NPkVill nhood_NWAmes \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_Names nhood_NoRidge nhood_NridgHt nhood_OldTown \\\n",
"Order PID \n",
"1 526301100 1 0 0 0 \n",
"2 526350040 1 0 0 0 \n",
"3 526351010 1 0 0 0 \n",
"4 526353030 1 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_SWISU nhood_Sawyer nhood_SawyerW nhood_Somerst \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" nhood_StoneBr nhood_Timber nhood_Veenker park \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" partial_sale railway recently_built recently_remodeled \\\n",
"Order PID \n",
"1 526301100 0 0 0 0 \n",
"2 526350040 0 0 0 0 \n",
"3 526351010 0 0 0 0 \n",
"4 526353030 0 0 0 0 \n",
"5 527105010 0 0 0 0 \n",
"\n",
" remodeled years_since_built years_since_remodeled \\\n",
"Order PID \n",
"1 526301100 0 50 50 \n",
"2 526350040 0 49 49 \n",
"3 526351010 0 52 52 \n",
"4 526353030 0 42 42 \n",
"5 527105010 1 13 12 \n",
"\n",
" SalePrice SalePrice (box-cox-0) \n",
"Order PID \n",
"1 526301100 215000.0 12.278393 \n",
"2 526350040 105000.0 11.561716 \n",
"3 526351010 172000.0 12.055250 \n",
"4 526353030 244000.0 12.404924 \n",
"5 527105010 189900.0 12.154253 "
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv(\"data/data_clean_with_transformations_and_factors.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}