Data Cleaning

Data cleaning is the process of transforming data into
regular, structured, usable, trustworthy formats.

Setting expectations

Data is never clean, even after you clean it! Do not expect perfection.
Instead, we manage data cleanliness.
99% perfect is close enough.
Getting another 99% of that last 1% (99.99% overall) is amazing.
If you wanted it to be exactly perfect, that would be your full-time job.

Target a table

Make a query

Download a CSV

Inspect and filter your data

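One way to inspect an exported CSV outside a spreadsheet is a short script. The sketch below is only an illustration: it assumes Python, uses made-up column names (`catalog_number`, `country`) and inline sample rows in place of a real export, tallies the distinct values in one column, and keeps only the rows that fall outside an expected set.

```python
import csv
import io
from collections import Counter

# Hypothetical export; a real file would be opened with open(path, newline="").
EXPORT = """catalog_number,country
1001,USA
1002,U.S.A.
1003,Canada
1004,usa
1005,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def value_counts(rows, column):
    """Count how often each distinct value appears in a column."""
    return Counter(row[column] for row in rows)

def rows_needing_review(rows, column, accepted):
    """Keep rows whose value in `column` is not in the accepted set."""
    return [row for row in rows if row[column] not in accepted]

rows = load_rows(EXPORT)
counts = value_counts(rows, "country")
suspect = rows_needing_review(rows, "country", {"USA", "Canada"})

# Show every distinct value with its frequency, most common first.
for value, count in counts.most_common():
    print(repr(value), count)
```

Counting distinct values first tends to surface spelling variants and blanks quickly; the filtered rows are then the ones worth editing.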

Edit and inspect

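Edits can also be scripted so that they are repeatable and easy to review. The following is a minimal sketch, assuming Python; the correction table, the column names, and the `apply_corrections` helper are all hypothetical. It trims stray whitespace, maps known variants to a preferred spelling, and logs each change so it can be inspected before anything is uploaded.

```python
# Hypothetical mapping from observed variants to the preferred spelling.
CORRECTIONS = {"U.S.A.": "USA", "usa": "USA"}

def apply_corrections(rows, column, corrections, key="catalog_number"):
    """Return edited copies of the rows plus a log of (key, old, new) changes."""
    edited, changes = [], []
    for row in rows:
        new_row = dict(row)            # copy, so the original rows stay untouched
        old = row[column]
        trimmed = old.strip()          # drop leading/trailing whitespace
        new = corrections.get(trimmed, trimmed)
        if new != old:
            new_row[column] = new
            changes.append((row[key], old, new))
        edited.append(new_row)
    return edited, changes

rows = [
    {"catalog_number": "1001", "country": "USA"},
    {"catalog_number": "1002", "country": "U.S.A."},
    {"catalog_number": "1003", "country": " Canada "},
]
edited, changes = apply_corrections(rows, "country", CORRECTIONS)

# Inspect each edit as "key: old -> new".
for key, old, new in changes:
    print(f"{key}: {old!r} -> {new!r}")
```

Keeping the originals unchanged and returning a change log makes it possible to compare before and after, which is the "inspect" half of this step.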

Create a sample

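Trying the edits on a small subset first keeps mistakes cheap before repeating the process for the full set. A sample could be drawn as sketched below, assuming Python's standard `random` module; the `take_sample` helper, the sample size, and the seed are illustrative choices rather than anything prescribed here.

```python
import random

def take_sample(rows, size, seed=0):
    """Pick up to `size` rows at random; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))

# Hypothetical edited rows standing in for the full data set.
rows = [{"catalog_number": str(n)} for n in range(1001, 1101)]
sample = take_sample(rows, 10)
print(len(sample), "of", len(rows), "rows selected")
```

A fixed seed means the same rows are chosen on every run, so the sample that was uploaded can be identified again later.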

Import data to Workbench

Map edits and save upload plan

Validate and upload

Verify through original query

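After rerunning the original query and downloading its results again, the new export can be compared with the values you meant to upload. The sketch below assumes Python and hypothetical column names; `find_mismatches` is an illustrative helper, and the inline text stands in for the re-downloaded CSV.

```python
import csv
import io

# Hypothetical rows that were uploaded (the intended, edited values).
INTENDED = [
    {"catalog_number": "1002", "country": "USA"},
    {"catalog_number": "1004", "country": "USA"},
    {"catalog_number": "1005", "country": "Canada"},
]

# Hypothetical re-export from the original query after the upload.
REEXPORT = """catalog_number,country
1002,USA
1004,usa
"""

def find_mismatches(expected_rows, actual_rows, key, column):
    """List (key, expected, actual) for records whose re-exported value differs.

    `actual` is None when the record is missing from the re-export.
    """
    actual_by_key = {row[key]: row for row in actual_rows}
    mismatches = []
    for row in expected_rows:
        found = actual_by_key.get(row[key])
        got = found[column] if found is not None else None
        if got != row[column]:
            mismatches.append((row[key], row[column], got))
    return mismatches

actual_rows = list(csv.DictReader(io.StringIO(REEXPORT)))
mismatches = find_mismatches(INTENDED, actual_rows, "catalog_number", "country")
for key, expected, got in mismatches:
    print(f"{key}: expected {expected!r}, found {got!r}")
```

An empty mismatch list suggests the sample upload did what was intended; anything else is a reason to roll back and adjust before repeating for the full set.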

Rollback (if you want) and repeat for full set