After a few conversations about what a data engineer is these days, I’ve found that there isn’t a shared understanding of the role. I also noticed the majority of the data engineers I spoke to were experienced software engineers.
Based on this, I decided to create a blog post series that consists of interviews with our data engineers. I believe this will help to demystify data engineering, and it might encourage more software engineers to become data engineers.
This week the interview is with Himanshu Agarwal.
What is data engineering for you and how does it overlap with software engineering?
I feel data engineering requires good software engineering skills in order to do things the right way. The main difference is that you also need specialised data skills: you work with the data available, perform transformations, create insights, and make the results available to data scientists and analysts in easy-to-use formats.
How did you get involved in data engineering?
It’s an interesting journey. In my previous gig, they had a data client and were looking for data engineers from India. It was difficult to find enough people in such a short timespan, and that’s where my journey from software engineering to data engineering started, from learning the concepts of Hadoop, Sqoop and Hive to working on data pipelines. After that I joined Equal Experts and went back to doing Java work for a year, but then I thought, why not try data engineering again and see if it excites me? That’s when I started looking around for opportunities in this space and approached the recruitment team, who had an open role with one of our clients. Everything went well and continues to do so.
What are the skills a data engineer must have that a software engineer usually doesn’t have?
When I made the transition I thought it would be all about SQL, but I later realised that I use most of my software engineering skills while working on the development, building, packaging and deployment of data pipelines and processes. That said, you still need some specialised or advanced skills, which you’ll learn when you enter the world of data. Some of them are:
SQL – you definitely need to be very good with advanced SQL, as it’s the backbone of data engineering and lets you do quick data analysis so you can get back to people within a short time.
Terraform / IaC – most of us have worked with Infrastructure as Code in software projects, but here you might need to skill up, because creating a data platform means integrating many different sources and sinks.
Data storage options and data processing – in the big data world there are many ways to do the same thing, so you need to be aware of multiple tools and techniques, and choose the right approach for the current requirements.
Data modelling – it plays an important role in the data engineering world.
Scaling – you process terabytes of data, so you always need to be on your toes and ask whether a solution is scalable and optimised enough to handle that volume.
Also, as a data engineer, you need to be aware of streaming and batch processing concepts, and how to do each one effectively.
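To make the batch versus streaming distinction a little more concrete, here is a minimal sketch in plain Python. It is only an illustration and not taken from any real pipeline: the `Event` shape and the function names `batch_totals` and `streaming_totals` are hypothetical, and a real system would typically use a dedicated processing framework rather than in-memory loops.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape for illustration: (key, amount)
Event = Tuple[str, int]


def batch_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Batch style: consume the whole dataset, then return one final result."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Streaming style: update running state per event and emit a snapshot each time."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
        yield dict(totals)  # copy, so earlier snapshots are not mutated later


events = [("uk", 10), ("in", 5), ("uk", 7)]

# Batch: a single answer once all the input has been read.
print(batch_totals(events))

# Streaming: an updated answer after every event arrives.
for snapshot in streaming_totals(events):
    print(snapshot)
```

Both functions compute the same totals. The difference is when results become available: the batch version answers once at the end, while the streaming version keeps state and produces an updated answer after each event.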
What data trends are you keeping an eye on?
After reading a few blogs from this series, I saw that data mesh was on everyone’s list, so I started reading about it. It looks like a shift towards treating data as a product within each domain, with each domain handling its own data pipelines. So yes, this is something I’m looking at these days.
The data space is also continuously evolving, with new approaches, solutions and frameworks for improving processing, so I’m keeping a focus on how compute power can be used more effectively.
Do you have any recommendations for software engineers who want to be data engineers?
If you’re already working as a software engineer then don’t wait – just grab an opportunity to work with any data engineer and you should be able to make a mark with your engineering skills. Learn about data and its processing techniques on the go, as I have.
One more thing I should mention is that we still pair most of the time in data engineering work too.