Trifacta – Wrangling US Flight Data
This case study shows how Trifacta can clean and analyze US flight data: cleaning up markup, removing unrelated and redundant columns, cleaning geographic names, and more.
Since there are separate state columns for origin and destination, I decide to remove the state part from the OriginCityName and DestCityName columns. I select the example text ‘, UT’ (including the comma) in the grid:
This does not cover all rows, so I select ‘, CA’ as a second example:
Trifacta’s Predictive Transformation has generalized my selection and now covers all the text that I wanted to select. Since I want to remove it, I select the ‘replace’ preview card:
The suggested transform would remove the state from the OriginCityName column. I could repeat the same steps to remove it from the destination city as well, but it’s faster to include the destination column in the same transform. I click the ‘modify’ button and add ‘DestCityName’ to the col parameter in the transform editor:
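For readers who want to see the same cleanup outside Trifacta, here is a rough Python sketch of what the generalized transform does. It is not Trifacta's transform syntax. The pattern (a comma, a space, and a two-letter uppercase code at the end of the value) is my assumption about what the two examples ‘, UT’ and ‘, CA’ generalize to, and the helper name strip_state_suffix and the sample rows are made up for illustration.

```python
import re

# Assumed pattern: comma, space, two uppercase letters at the end of the value.
STATE_SUFFIX = re.compile(r", [A-Z]{2}$")


def strip_state_suffix(rows, columns):
    """Return copies of the rows with a trailing ', XX' removed from the given columns."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            new_row[col] = STATE_SUFFIX.sub("", new_row[col])
        cleaned.append(new_row)
    return cleaned


# Illustrative rows only, not taken from the actual flight data set.
rows = [
    {"OriginCityName": "Salt Lake City, UT", "DestCityName": "Los Angeles, CA"},
    {"OriginCityName": "San Francisco, CA", "DestCityName": "Salt Lake City, UT"},
]

# Listing both columns mirrors adding the destination column to the col parameter.
cleaned = strip_state_suffix(rows, ["OriginCityName", "DestCityName"])
print(cleaned[0])
```

Passing both column names in one call plays the same role as extending the col parameter: one rule, applied to every listed column, instead of two separate steps.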