So I have decided to try building my own free data platform. If nothing else, this could be a little project for anyone looking to get into data engineering and warehousing to have a go at.
Honestly, I have not yet decided on the subject area I am going to use, but I am looking into the tooling and processes. To give myself a basic starting point for testing the tooling, I am going to use a really simple Google Form and spreadsheet as my data source. I am an avid runner but do not do enough strength training, so I set up a form to track what I am doing and help motivate me.
The form can be seen in the screenshot below, though there are more fields for other types of exercise.
This form feeds straight into a spreadsheet, which then acts as the data source for the ETL. The source could just as easily be a transactional database, a file in an S3 bucket or whatever else you want. I have set all the column headers to upper case, which proved beneficial downstream when working with Snowflake, as it didn't play nicely with mixed-case names. Snowflake folds unquoted identifiers to upper case, so a mixed-case column name has to be wrapped in double quotes every time it is referenced.
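I renamed the headers by hand in the spreadsheet, but the same clean-up could be done in code before loading. Here is a minimal sketch using only the Python standard library. The function names, the sample column names and the sample row are all made up for illustration, and it assumes the sheet has been exported as CSV.

```python
import csv
import io
import re


def normalise_header(name: str) -> str:
    """Upper-case a header and replace runs of non-alphanumerics with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return cleaned.strip("_").upper()


def normalise_csv_headers(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text and return its rows keyed by normalised headers."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [normalise_header(h) for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]


# Hypothetical export of the form responses sheet (not my real columns).
sample = "Timestamp,Exercise Type,Reps (count)\n2020-01-05 07:30:00,Squats,20\n"
rows = normalise_csv_headers(sample)
print(rows[0])
```

With headers like `EXERCISE_TYPE` there are no spaces or lower-case letters left, so the columns should be usable in Snowflake without quoting.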