Skip to content
Verified Commit 6333f1af authored by Alexander Hess's avatar Alexander Hess
Browse files

Clean the raw data

- clean the raw data given by the undisclosed meal delivery platform:
  + keep data only for the three target citis:
    * Bordeaux
    * Lyon
    * Paris
  + merge duplicates
    * it appears as redundant addresses were created
      for each order by the same customer
      =>  significant reduction in the number of addresses
    * propagate the merges to the other tables
      that reference records merged away
  + cast data types and keep their scopes narrow
  + normalize the data
  + remove obvious outliers
  + adjust/discard unplausible values
- map the cleaned data onto the ORM models
- store the cleaned data in a new database schema
parent 3393071d
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment