Thursday, 1 February 2018

Data Engineer

I was having a browse the other day and came across an article that got me thinking about what I actually do at work. In previous jobs I was clearly defined as a Database Developer, but my current role started out as a Data Analyst job with scope to move into Data Science. The thing is, I work in a small company with only a few people in the department, and whilst we do use data analysis to provide information to the management teams, I would not describe myself as a core MI (management information) Data Analyst.

The reason is that the job is not as simple as writing a query and producing some nice visualisations in a reporting tool. We (well, at the moment, mostly just me) are responsible for developing custom ETL (extract, transform, load) processes to build our data warehouse. We really need a data warehouse because we have data coming from MySQL, SQL Server, Google Drive, online platforms and daily e-mails. Much of this data can be linked, but doing that in Excel is just not a sensible solution given the amount of data now flowing through the company.
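To make the idea concrete, here is a minimal sketch of an ETL step that links two sources. It assumes an in-memory sqlite3 database stands in for the warehouse and that two small hypothetical extracts (a customer table from MySQL and sales figures parsed from a daily e-mail) have already been pulled into Python. The table and field names are illustrative only, not our actual schema.

```python
import sqlite3

# Hypothetical extracts standing in for the real sources: a MySQL
# customer table and sales lines parsed from a daily e-mail attachment.
mysql_customers = [
    {"customer_id": 1, "name": "Acme Ltd"},
    {"customer_id": 2, "name": "Globex"},
]
email_sales = [
    {"cust": "1", "amount": "120.50"},
    {"cust": "2", "amount": "80.00"},
    {"cust": "1", "amount": "29.50"},
]


def transform_sales(rows):
    """Transform: cast the text fields from the e-mail extract to proper types."""
    return [(int(r["cust"]), float(r["amount"])) for r in rows]


def load(conn, customers, sales):
    """Load: put both extracts into warehouse tables so SQL can join them."""
    conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE fact_sales (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO dim_customer VALUES (:customer_id, :name)", customers)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", sales)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, mysql_customers, transform_sales(email_sales))

# Once both sources live in one database, linking them is a single join
# rather than a pile of VLOOKUPs in Excel.
totals = warehouse.execute(
    """SELECT c.name, SUM(s.amount)
       FROM fact_sales s JOIN dim_customer c USING (customer_id)
       GROUP BY c.name ORDER BY c.name"""
).fetchall()
print(totals)
```

The point of the sketch is the shape of the process: clean each source on the way in, then let the database do the linking.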

We do have an ETL tool, but given the complexity of the data and, to be honest, the sheer number of tables, it feels like a cheap fit (which it was) rather than the right fit. Using some Python code I have developed, I can create views and transfer across only the required data from the SQL Server database in a direct import. The MySQL database has a much smaller design, so we should be able to get that part of the data warehouse up to scratch more easily. As people who have read past posts will know, I am now able to connect to Google Drive and e-mail, and I soon hope to have the former set up at work. I then hope to set up a scheduling tool so that much of the work is done before I get in.
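The view-then-transfer approach might look something like the sketch below. It is not my actual work code: sqlite3 stands in for both SQL Server and the warehouse so that it runs anywhere, and the `orders` table, `vw_orders_export` view and `transfer_view` helper are hypothetical. The view trims the source down to the rows and columns the warehouse needs, and the helper copies whatever the view returns into a staging table.

```python
import sqlite3


def create_source(conn):
    """Hypothetical source schema standing in for the SQL Server database."""
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             status TEXT, amount REAL, internal_notes TEXT);
        INSERT INTO orders VALUES (1, 10, 'complete', 99.0, 'x');
        INSERT INTO orders VALUES (2, 11, 'cancelled', 15.0, 'y');
        INSERT INTO orders VALUES (3, 10, 'complete', 42.0, 'z');
        -- The view exposes only the columns and rows the warehouse needs
        CREATE VIEW vw_orders_export AS
            SELECT order_id, customer_id, amount
            FROM orders WHERE status = 'complete';
    """)


def transfer_view(source, target, view, table):
    """Copy every row of a source view into a target table with the same columns.

    Returns the number of rows copied. `view` and `table` are formatted
    straight into the SQL, so they must come from trusted code only.
    """
    cur = source.execute(f"SELECT * FROM {view}")
    columns = [d[0] for d in cur.description]  # column names from the view
    target.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    rows = cur.fetchall()
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    target.commit()
    return len(rows)


source = sqlite3.connect(":memory:")
create_source(source)
warehouse = sqlite3.connect(":memory:")
copied = transfer_view(source, warehouse, "vw_orders_export", "stg_orders")
print(copied)
```

Against SQL Server the source connection would come from a DB-API driver such as pyodbc, but the pattern of selecting from a view and bulk-inserting the result stays the same. Keeping the filtering in the view means the transfer code does not need to know anything about each table.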

Next there is the reporting. We are still the reporting team, but we don't have a fancy tool to report in. Again, those who have read previous posts can probably guess that we basically use Excel automated with xlwings. I am hoping that once the data warehouse is sorted we can move on to a proper reporting tool and reduce some of the reporting that is currently required.
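As a rough sketch of the Excel side, assuming a warehouse table of orders (created inline here with made-up rows) and a hypothetical workbook containing a sheet named `Report`, a query can build a plain 2-D list and xlwings can drop it into the sheet in one assignment. `build_report` and `write_to_excel` are illustrative names; `xlwings.Book`, `Book.sheets`, `Sheet.range` and `Range.value` are real xlwings calls, but writing needs Excel installed, so that call is only shown, not run.

```python
import sqlite3


def build_report(conn):
    """Return a header row plus one row per customer, ready to paste into Excel."""
    rows = conn.execute(
        "SELECT customer_id, COUNT(*), SUM(amount) FROM stg_orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return [["Customer", "Orders", "Revenue"]] + [list(r) for r in rows]


def write_to_excel(table, path, sheet="Report"):
    """Write a 2-D list into an existing workbook via xlwings (requires Excel)."""
    import xlwings as xw  # imported lazily so build_report works without Excel

    wb = xw.Book(path)
    # Assigning a list of lists to a single cell fills the block starting there
    wb.sheets[sheet].range("A1").value = table
    wb.save()


# Made-up warehouse data for illustration only
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE stg_orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
warehouse.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, 10, 99.0), (3, 10, 42.0), (4, 11, 5.0)],
)
report = build_report(warehouse)
print(report)
# write_to_excel(report, "daily_report.xlsx")  # hypothetical workbook path
```

Keeping the query and the Excel write in separate functions means the same report data could later be pointed at a proper reporting tool without rewriting the SQL.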
