When my friends and family ask me what I do for a living, I generally just pull a Chandler Bing and say “numbers” or “computers.” The time has come for me to do a service to the entire Analytics Engineering community and the people who love them, and do my best to explain the job of an Analytics Engineer to someone who is not remotely technical. The easiest way for me to describe it is with the graphic below. I essentially play with billions of little data legos every day and try to help people make sense of them. Or at least that is what I tell my nephews.
Where Does Data Come From?
Once upon a time, we all asked our parents where babies come from. Well, now my parents are asking me where data comes from. What a world!
Well, Mom and Dad, when a person loves a website, they visit that website and then the stork comes a few months later and delivers me data. Eh, not exactly, but in a sense there are many data storks who program a website to deliver me timely data to be analyzed by data scientists.
When you enter a website or an app, the app is tracking your every move on that platform. We want to know what you see, what you click on, and what you are interacting with in general. Everything you do within a site or app is tracked by that company through the definition of events, which are simply named records of an action such as a click or a page view. Sometimes I help explain to the engineers what events they need to implement and work with them to ensure we track what we need to gauge success. This is super important because, for the data team to be able to analyze anything, we need to be tracking the right things, and everyone needs a shared understanding that the data matches how the product is actually used.
Now, when engineers implement logging of these events, they generally give you messy datasets. Some are usable as they are, and some need to be organized before analysts can work with them. Also, think about how many scrolls, clicks, swipes, and everything else you do on your phone or on the internet. It adds up quickly! Each one of those actions becomes a row in a table, and the bigger the product, the more data needs to be stored. Data collection also does not grow in a straight line; it tends to snowball. The more we track and the better the tracking tools become, the bigger and bigger those data warehouses get.
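For the curious, here is a minimal sketch of what "each action becomes a row" can look like. The function name `track` and every field name are made up for illustration; real tracking tools have their own formats.

```python
from datetime import datetime, timezone

# Hypothetical event log: every action a visitor takes becomes one row.
events = []

def track(user_id, event_name, target):
    """Append one row describing a single action (all field names are invented)."""
    events.append({
        "user_id": user_id,
        "event_name": event_name,
        "target": target,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    })

# One short visit from two people already produces four rows.
track("user_1", "page_view", "home")
track("user_1", "scroll", "home")
track("user_1", "click", "sign_up_button")
track("user_2", "page_view", "home")

print(len(events))
```

Multiply those few rows by every visitor and every swipe, and it is easy to see how the table grows so quickly.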
So now we’ve got a giant database of raw data. This is great, right? Well, it is great, but it is not usable yet. A good analogy for this is grocery shopping. You made a list and you bought all the ingredients for a meal, but it is not dinner yet. You might have a raw veggie or two that you can eat as is, but if you want to make a whole meal, you need to manipulate those ingredients and turn them into something delicious.
Just like with a meal, when data comes back from the “store” it has not reached its useful state yet. You need to combine different data sets, aggregate different metrics, and turn the mess of your raw data into something wonderful for the business to use. I take those ingredients and I help to assemble them into that final product.
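As a rough illustration of "combining" and "aggregating," the sketch below joins a tiny, invented event log to an invented user table and counts clicks per day and country. In practice this kind of work is usually written in SQL against a data warehouse; the table contents and the function name here are assumptions made purely for the example.

```python
from collections import Counter

# Hypothetical raw "ingredients": an event log and a user table.
raw_events = [
    {"user_id": "u1", "event_name": "click", "day": "2024-01-01"},
    {"user_id": "u1", "event_name": "scroll", "day": "2024-01-01"},
    {"user_id": "u2", "event_name": "click", "day": "2024-01-01"},
    {"user_id": "u2", "event_name": "click", "day": "2024-01-02"},
]
users = {"u1": {"country": "US"}, "u2": {"country": "CA"}}

def clicks_per_day_and_country(events, users):
    """Join each click to its user's country, then count clicks per (day, country)."""
    counts = Counter()
    for event in events:
        if event["event_name"] != "click":
            continue  # only clicks go into this particular "dish"
        country = users[event["user_id"]]["country"]
        counts[(event["day"], country)] += 1
    return dict(counts)

summary = clicks_per_day_and_country(raw_events, users)
print(summary)
```

The four messy rows go in, and a small, tidy summary that an analyst can chart comes out.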
But Mitchell, how do you do this? What are the tools you need to make this happen? What do you actually do? You still haven’t told me? OK Mom, calm down, I am getting there.
What do Analytics Engineers do?
So we now have large databases of data that have been logged by our trusted software and data engineers. We trust that the data is flowing correctly into these large data warehouses but we still need to transform this data to make it usable. That is where I come in!
An Analytics Engineer is there to ensure that all data is built so that it is easy to retrieve and flexible enough for data scientists to run complex algorithms and forecasts for the business. So what I have to do is essentially put together the giant data lego set so that we can see how all the pieces fit together. I put together the legos with a few different tools and processes to ensure that all the data is not only delivered in a structured and usable format but also delivered on time every day and accurate.
To put it into real-world terms, let’s go back to our food analogy. A restaurant has to feed all its customers at the same time every day. It has to buy the food, prepare the food, and serve the food to the patrons. Just like at a restaurant, data has to be procured, transformed, and eventually served to our stakeholders on time every day. If the patrons don’t have food, they aren’t going to pay you. If my data scientists don’t get their data, they cannot perform the proper analysis or serve up their insights via dashboards, charts, and graphs. I have to act as the head data chef and make sure that everyone has what they need on time and that it is actually usable for them. Not only must it be usable, but it also has to be built on the same underlying business logic so that every data scientist can depend on it. When you go to a restaurant, you expect every pizza to have roughly the same sauce, cheese, and crust. If a data scientist is going to report on the number of clicks on the site, then the definition of what is considered a click needs to be the same for everyone.
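One way to picture "the same definition for everyone" is a single shared rule that every report reuses instead of each person writing their own. The sketch below is only an illustration: the names `CLICK_EVENT_NAMES`, `is_click`, and `count_clicks`, and the choice to treat a tap as a click, are all assumptions.

```python
# Hypothetical shared definition: which event names count as a "click".
CLICK_EVENT_NAMES = {"click", "tap"}

def is_click(event):
    """Single source of truth for what counts as a click."""
    return event["event_name"] in CLICK_EVENT_NAMES

def count_clicks(events):
    """Every report calls this, so every report agrees on the number."""
    return sum(1 for event in events if is_click(event))

sample = [
    {"event_name": "click"},
    {"event_name": "tap"},
    {"event_name": "scroll"},
]
total_clicks = count_clicks(sample)
print(total_clicks)
```

If the business later decides that taps should not count, the rule changes in one place and every pizza still comes out with the same sauce.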
What Does My Day to Day Actually Look Like?
But Mitchell, what did you do at work today? Let me walk you through some of what I do every day.
Typically, I check each morning that all the workflows I have automated to run before I come in have in fact done what they are supposed to do. I check the data tables to make sure they are populated and working as expected, and then I start to go through my to-do list. I will often meet with my stakeholders to go over future projects or to ask clarifying questions about things I am working on. If I am lucky, I have some time to work on new projects and do some individual contributor (IC) work.
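A morning check like this can be pictured as a small script that asks two questions of every table: is it populated, and is it fresh? The sketch below is a simplified stand-in. The table names, row counts, dates, and the one-day freshness rule are all invented for the example, and a real check would query the warehouse rather than a hard-coded list.

```python
from datetime import date, timedelta

def check_table(name, row_count, latest_day, today):
    """Return a list of problems for one table (an empty list means healthy)."""
    problems = []
    if row_count == 0:
        problems.append(f"{name} is empty")
    # Assumed rule: the newest data should be no older than yesterday.
    if latest_day < today - timedelta(days=1):
        problems.append(f"{name} is stale (latest data: {latest_day})")
    return problems

today = date(2024, 1, 3)
tables = [
    # (table name, number of rows, most recent day of data) - all hypothetical
    ("daily_clicks", 1200, date(2024, 1, 2)),
    ("daily_signups", 0, date(2024, 1, 2)),
    ("daily_revenue", 300, date(2023, 12, 30)),
]
report = {name: check_table(name, rows, latest, today)
          for name, rows, latest in tables}

for name, problems in report.items():
    print(name, "OK" if not problems else problems)
```

Anything that does not come back "OK" goes to the top of the to-do list.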
This is my time to code and build data infrastructure to try and accomplish all the goals laid out earlier in this article. This is mainly done using SQL (Structured Query Language) with a little bit of Python. I build data tables and make sure that we schedule them to run at the right times. It is important that I build these tables as efficiently as possible, because we pay for the processing power it takes to create them as well as the storage it takes to keep them. So I could be costing my company thousands of dollars if I am not careful. I will also generally take this time to scope out new projects as a way of creating a baseline contract between me and my stakeholders on future work.
Did I do it?
So, Mom and Dad, did I do it? Do you understand what I do for a living now? Of course there are a lot more details and minutiae to it, but I wanted to finally have a blog post I could point people to and say, “Here, read this.” I hope this helps my community understand my job and maybe helps other Analytics Engineers like me explain what they do to their Uncle Bob or Aunt Judy. Just remember: when in doubt, say you play with legos all day, and that might shut them up!