Clean the raw data
- clean the raw data given by the undisclosed meal delivery platform:
  + keep data only for the three target cities:
    * Bordeaux
    * Lyon
    * Paris
  + merge duplicates
    * it appears that redundant addresses were created for each order by the same customer
      => significant reduction in the number of addresses
    * propagate the merges to the other tables that reference the records merged away
  + cast data types and keep their scopes narrow
  + normalize the data
  + remove obvious outliers
  + adjust/discard implausible values
- map the cleaned data onto the ORM models
- store the cleaned data in a new database schema
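
The city filter, the duplicate merge, and the propagation of merges to referencing tables can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the notebook's actual code: the sample rows, the column names (`customer_id`, `street`, `address_id`), the function name `clean`, and the rule that two addresses are duplicates when customer, normalized street, and city match are all assumptions made for the example.

```python
# Hypothetical sample rows; the real tables and column names may differ.
TARGET_CITIES = {"Bordeaux", "Lyon", "Paris"}

addresses = [
    {"id": 1, "customer_id": 10, "street": "1 Rue de Rivoli ", "city": "Paris"},
    {"id": 2, "customer_id": 10, "street": "1 rue de  rivoli", "city": "Paris"},
    {"id": 3, "customer_id": 11, "street": "5 Cours Victor Hugo", "city": "Bordeaux"},
    {"id": 4, "customer_id": 12, "street": "2 La Canebiere", "city": "Marseille"},
]
orders = [
    {"id": 100, "address_id": 1},
    {"id": 101, "address_id": 2},
    {"id": 102, "address_id": 3},
    {"id": 103, "address_id": 4},
]


def clean(addresses, orders):
    """Filter by city, merge duplicate addresses, and re-point the orders."""
    # 1) Keep only addresses in the three target cities.
    kept = [a for a in addresses if a["city"] in TARGET_CITIES]

    # 2) Merge duplicates (assumed rule: same customer, same normalized
    #    street, same city). The lowest id survives.
    canonical = {}    # dedup key -> surviving address id
    merged_into = {}  # every kept address id -> surviving address id
    unique = []
    for a in sorted(kept, key=lambda row: row["id"]):
        street = " ".join(a["street"].lower().split())  # normalize case/spaces
        key = (a["customer_id"], street, a["city"])
        if key not in canonical:
            canonical[key] = a["id"]
            unique.append(a)
        merged_into[a["id"]] = canonical[key]

    # 3) Propagate the merges: orders pointing at a merged-away address are
    #    re-pointed to the survivor; orders outside the target cities are dropped.
    new_orders = [
        {**o, "address_id": merged_into[o["address_id"]]}
        for o in orders
        if o["address_id"] in merged_into
    ]
    return unique, new_orders


clean_addresses, clean_orders = clean(addresses, orders)
```

In this sketch, address 2 is merged into address 1, so order 101 ends up referencing address 1, while the Marseille address and its order are discarded.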
parent 3393071db3
commit 6333f1af1e
1 changed files with 7656 additions and 0 deletions
7656
notebooks/00_clean_data.ipynb
Normal file
File diff suppressed because it is too large