Clean the raw data
- clean the raw data given by the undisclosed meal delivery platform:
  + keep data only for the three target cities:
    * Bordeaux
    * Lyon
    * Paris
  + merge duplicates
    * it appears that redundant addresses were created
      for each order by the same customer
      => significant reduction in the number of addresses
    * propagate the merges to the other tables
      that reference records merged away
  + cast data types and keep their scopes narrow
  + normalize the data
  + remove obvious outliers
  + adjust/discard implausible values
- map the cleaned data onto the ORM models
- store the cleaned data in a new database schema
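A minimal sketch of the city filter, assuming each raw address row is a dict with a free-text `city` field. `TARGET_CITIES`, `normalize_city` and `keep_target_cities` are hypothetical names, not taken from the notebook. Cities are normalized first, so " PARIS " and "Paris" count as the same target city.

```python
# Hypothetical sketch: keep only records located in the three target cities.
# The field name "city" is an assumption, not the platform's actual schema.
TARGET_CITIES = {"bordeaux", "lyon", "paris"}


def normalize_city(raw: str) -> str:
    """Collapse whitespace and case so ' PARIS ' and 'Paris' compare equal."""
    return " ".join(raw.split()).casefold()


def keep_target_cities(addresses: list[dict]) -> list[dict]:
    """Return only the addresses whose normalized city is a target city."""
    return [a for a in addresses if normalize_city(a["city"]) in TARGET_CITIES]
```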
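One way to merge the redundant addresses and propagate the merge is sketched below. It assumes rows carry `id`, `customer_id`, `street` and `zip_code`, and that orders reference addresses through an `address_id` column. These names and the duplicate criterion (same customer, same normalized street and zip code) are assumptions. The lowest-id row of each group survives, and every order is repointed at it, so no order references a row that was merged away.

```python
# Hypothetical sketch: merge redundant address rows and repoint orders at the
# surviving row. Column names are assumptions made for illustration only.


def address_key(address: dict) -> tuple:
    """Treat rows as duplicates when customer, street and zip code match."""
    street = " ".join(address["street"].split()).casefold()
    return (address["customer_id"], street, address["zip_code"].strip())


def merge_addresses(addresses: list[dict], orders: list[dict]):
    """Return (deduplicated addresses, orders with remapped address_id)."""
    survivors: dict[tuple, int] = {}  # duplicate key -> id of the kept row
    remap: dict[int, int] = {}  # every address id -> id of the kept row
    kept = []
    for address in sorted(addresses, key=lambda a: a["id"]):
        key = address_key(address)
        if key not in survivors:
            # The first (lowest-id) row of each duplicate group survives.
            survivors[key] = address["id"]
            kept.append(address)
        remap[address["id"]] = survivors[key]
    # Propagate the merge to the referencing table.
    new_orders = [{**o, "address_id": remap[o["address_id"]]} for o in orders]
    return kept, new_orders
```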
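"Keep their scopes narrow" is read here, as an assumption, as "use the narrowest type whose domain fits the values and reject anything outside it". The hypothetical `CleanOrder` dataclass and `cast_order` function below illustrate that reading on made-up raw fields (`id`, `placed_at`, `total`, `zip_code`). Amounts are stored as integer cents, timestamps as `datetime`, and postal codes as validated five-digit strings.

```python
# Hypothetical sketch: cast raw string fields into narrow, validated types.
# The field names and the raw formats are assumptions for illustration.
import re
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

ZIP_CODE = re.compile(r"\d{5}")  # French postal codes have five digits


@dataclass(frozen=True)
class CleanOrder:
    id: int
    placed_at: datetime
    total_cents: int  # integer cents instead of a float amount
    zip_code: str


def cast_order(raw: dict) -> CleanOrder:
    """Cast one raw order row, raising if a value falls outside its domain."""
    zip_code = raw["zip_code"].strip()
    if not ZIP_CODE.fullmatch(zip_code):
        raise ValueError(f"invalid zip code: {raw['zip_code']!r}")
    # Decimal avoids float rounding; int() truncates any fractional cent.
    total_cents = int(Decimal(raw["total"].strip()) * 100)
    if total_cents < 0:
        raise ValueError(f"negative total: {raw['total']!r}")
    return CleanOrder(
        id=int(raw["id"]),
        placed_at=datetime.fromisoformat(raw["placed_at"].strip()),
        total_cents=total_cents,
        zip_code=zip_code,
    )
```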
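The commit does not say how outliers and implausible values were detected. The sketch below swaps in Tukey's IQR fences, a common rule, together with one example plausibility check: an order cannot be delivered before it was placed. The helper names and the `delivered_at`/`placed_at` fields are hypothetical.

```python
# Hypothetical sketch: one common way to "remove obvious outliers" (Tukey's
# IQR fences) and to discard implausible rows. Field names are assumptions.
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # needs at least two values
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows, field, k=1.5):
    """Keep only the rows whose `field` lies within the IQR fences."""
    low, high = iqr_bounds([row[field] for row in rows], k)
    return [row for row in rows if low <= row[field] <= high]


def drop_implausible(orders):
    """Discard orders that claim to be delivered before they were placed."""
    return [o for o in orders if o["delivered_at"] >= o["placed_at"]]
```

A larger `k` keeps more of the tail. That is closer to removing only "obvious" outliers than a tight cutoff would be.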
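The commit maps the cleaned data onto the project's ORM models, which are not shown here. The sketch below uses plain `sqlite3` as a stand-in to show the idea of writing cleaned rows into a separate schema, so the raw tables stay untouched. The attached in-memory database named `clean` plays the role of the new schema. The table layout and `store_clean_addresses` are hypothetical.

```python
# Hypothetical sketch: store cleaned rows in a separate schema. SQLite stands
# in for the project's real database and ORM; "clean" is an assumed name.
import sqlite3


def store_clean_addresses(conn: sqlite3.Connection, addresses: list[dict]) -> None:
    """Create clean.addresses and insert the cleaned rows into it."""
    conn.execute("ATTACH DATABASE ':memory:' AS clean")
    conn.execute(
        "CREATE TABLE clean.addresses ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL,"
        " street TEXT NOT NULL,"
        " zip_code TEXT NOT NULL CHECK (length(zip_code) = 5),"
        " city TEXT NOT NULL)"
    )
    # Named placeholders are filled from each dict's matching keys.
    conn.executemany(
        "INSERT INTO clean.addresses (id, customer_id, street, zip_code, city)"
        " VALUES (:id, :customer_id, :street, :zip_code, :city)",
        addresses,
    )
    conn.commit()
```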
parent 3393071db3
commit 6333f1af1e
1 changed file with 7656 additions and 0 deletions
notebooks/00_clean_data.ipynb (+7656)