Transform to Curated Zone
It’s a good practice to plan the steps to transform your data. Based on the information we captured during data
exploration stage, we can come up with the following transformation step:
- Read standardized data from S3.
- Perform data transformation: Join yellow_tripdata table with taxi_zone_lookup to get location pick up, location drop
off
- Save processed dataset to S3 in a query optimized format.
- Query transformed data in Athena
Create Standardize to Curated Glue job
Go to Glue console.
In the left navigation panel, click ETL jobs.
On the AWS Glue Studio page, click Visual ETL.

Adding taxi_zone_lookup pickup from Glue Data Catalog
- Click on the Source icon, choose Glue Data Catalog.
- Choose your Standardize database and taxi_zone_lookup table
- Rename node with suffix pickup

Adding yellow_tripdata from Glue Data Catalog
- Click on the Source icon, choose Glue Data Catalog.
- Choose your Standardize database and yellow_tripdata table

Modify column name of taxi_zone_lookup pickup
- Choose node taxi_zone_lookup pickup
- Click on the Transform icon, choose Change Schema.
- Change column name with prefix pu

Join yellow_tripdata with taxi_zone_lookup pickup transformed
- Choose node yellow_tripdata
- Click on the Transform icon, choose Join.
- Update node parent
- Rename node with suffix pickup
- Under the Join conditions, select the following keys:
- taxi_zone_lookup pickup - pu_location_id
- yellow_tripdata - pulocationid

Adding taxi_zone_lookup drop off from Glue Data Catalog
- Click on the Source icon, choose Glue Data Catalog.
- Choose your Standardize database and taxi_zone_lookup table
- Rename node with suffix dropoff

Modify column name of taxi_zone_lookup drop off
- Choose node taxi_zone_lookup dropoff
- Click on the Transform icon, choose Change Schema.
- Change column name with prefix do

Join taxi_zone_lookup drop off transformed
- Choose node join pickup
- Click on the Transform icon, choose Join.
- Update node parent
- Under the Join conditions, select the following keys:
- taxi_zone_lookup dropoff - do_location_id
- yellow_tripdata - dolocationid

Save transformed data to Amazon S3
Set job detail
Specify Iam role

Run job
Click Run.

Check log.

Check metrics

Check output
Query Curated data
Choose your Curated
Choose yellow_tripdata
Choose preview table

Review schema
Choose yellow_tripdata
Click View in glue

