Clean the raw data

- clean the raw data given by the undisclosed meal delivery platform:
  + keep data only for the three target cities:
    * Bordeaux
    * Lyon
    * Paris
  + merge duplicates
    * it appears that redundant addresses were created
      for each order by the same customer
      =>  significant reduction in the number of addresses
    * propagate the merges to the other tables
      that reference records merged away
  + cast data types and keep their scopes narrow
  + normalize the data
  + remove obvious outliers
  + adjust/discard implausible values
- map the cleaned data onto the ORM models
- store the cleaned data in a new database schema
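
The duplicate merge and its propagation to referencing tables can be
sketched as follows. This is a minimal illustration only, not the code
in this commit: the record layout, the column names (`customer_id`,
`street`, `delivery_address_id`), the normalization rule, and the helper
`merge_duplicate_addresses` are all assumptions made for the example.

```python
# Hypothetical toy records; the real schema is not shown in this commit.
addresses = [
    {"id": 1, "customer_id": 10, "street": "1 Rue de Rivoli ", "city": "Paris"},
    {"id": 2, "customer_id": 10, "street": "1 rue de rivoli", "city": "Paris"},
    {"id": 3, "customer_id": 11, "street": "5 Cours Victor Hugo", "city": "Bordeaux"},
    {"id": 4, "customer_id": 12, "street": "2 Hauptstrasse", "city": "Berlin"},
]
orders = [
    {"id": 100, "delivery_address_id": 1},
    {"id": 101, "delivery_address_id": 2},
    {"id": 102, "delivery_address_id": 3},
    {"id": 103, "delivery_address_id": 4},
]

TARGET_CITIES = {"Bordeaux", "Lyon", "Paris"}


def merge_duplicate_addresses(addresses):
    """Return (kept, id_map), where id_map maps each input id to the surviving id."""
    canonical = {}  # normalized key -> id of the first address seen with that key
    id_map = {}
    kept = []
    for address in sorted(addresses, key=lambda a: a["id"]):
        # Assumed normalization: lower-case the street and collapse whitespace.
        street = " ".join(address["street"].lower().split())
        key = (address["customer_id"], street, address["city"])
        if key not in canonical:
            canonical[key] = address["id"]
            kept.append(address)
        id_map[address["id"]] = canonical[key]
    return kept, id_map


# Keep only the target cities, then merge the duplicates.
in_scope = [a for a in addresses if a["city"] in TARGET_CITIES]
kept, id_map = merge_duplicate_addresses(in_scope)

# Propagate the merges: re-point orders at the surviving address and
# drop orders whose address was filtered out entirely.
clean_orders = [
    {**order, "delivery_address_id": id_map[order["delivery_address_id"]]}
    for order in orders
    if order["delivery_address_id"] in id_map
]
```

Building an explicit old-id-to-new-id map first keeps the propagation
step a plain lookup, so every table that references addresses can be
updated the same way.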
Alexander Hess 2020-09-30 13:39:48 +02:00
parent 3393071db3
commit 6333f1af1e
Signed by: alexander
GPG key ID: 344EA5AB10D868E0

File diff suppressed because it is too large