Yesterday, I had the pleasure of speaking at Cloud Data Driven, a free data and analytics user group with weekly online meetings. It brought together interested data professionals from all over the world.
My session, titled ‘Don’t Repeat Yourself – how custom Python modules give back hours of your time’, focused on DRY, a software engineering principle that applies just as well to data engineering. After an introduction to the Fabric compute landscape, we zoomed in on Spark, and especially on PySpark notebooks, where custom Python code can make our data engineering tasks a lot easier.
Tomorrow, I will present the same session at Data Saturday Parma in Italy.
Don’t Repeat Yourself, how custom Python modules in Microsoft Fabric give you back hours every day
Warning! This session may contain very DRY content!
Don’t Repeat Yourself, or DRY, is a software engineering principle that states you should never repeat yourself: every piece of logic should live in exactly one place, so that a change only has to be made once.
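To make the principle concrete, here is a minimal sketch of DRY applied to a typical cleaning step. The function name `standardise_column_names` and the cleaning rules are illustrative assumptions, not code from the session: the logic is written once and reused for every table instead of being copied into each notebook cell.

```python
def standardise_column_names(columns: list[str]) -> list[str]:
    """Trim, lower-case and replace spaces with underscores in column names.

    Written once and reused for every table, instead of copying the same
    list comprehension into every cell that needs it.
    """
    return [c.strip().lower().replace(" ", "_") for c in columns]


customers = standardise_column_names(["Customer ID", " Name ", "Country Code"])
orders = standardise_column_names(["Order ID", "Customer ID", "Order Date"])
print(customers)  # ['customer_id', 'name', 'country_code']
print(orders)     # ['order_id', 'customer_id', 'order_date']
```

In a PySpark notebook the same helper could be applied to any DataFrame with `df = df.toDF(*standardise_column_names(df.columns))`, so a change to the cleaning rules only has to be made in one place.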
As a data engineer working with Microsoft Fabric, you will write a lot of code when you start building a data lakehouse: code to connect to source systems, copy and transform data, and orchestrate your ELT process.
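Orchestrating many similar loads is often where repetition creeps in first. The following is a minimal sketch, assuming a hypothetical list of source tables and a hypothetical `<layer>_<system>_<table>` naming convention (neither comes from the session itself). The loop is written once and runs for every table, while the actual connect-and-copy step is passed in as a function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SourceTable:
    """Hypothetical metadata describing one table to copy into the lakehouse."""
    system: str  # source system the table comes from
    table: str   # table name in the source system


def target_name(src: SourceTable, layer: str = "bronze") -> str:
    """Derive the lakehouse table name from a single (assumed) naming convention."""
    return f"{layer}_{src.system}_{src.table}".lower()


def load_all(tables: Iterable[SourceTable],
             copy_table: Callable[[SourceTable, str], None]) -> list[str]:
    """Run the same copy step for every table instead of one copied cell per table."""
    loaded = []
    for src in tables:
        target = target_name(src)
        copy_table(src, target)  # the real connect-and-copy logic is injected here
        loaded.append(target)
    return loaded


tables = [SourceTable("CRM", "Customers"), SourceTable("ERP", "Orders")]
# Stand-in copy step that only prints; in a notebook it would read and write data.
result = load_all(tables, lambda src, target: print(f"{src.system}.{src.table} -> {target}"))
print(result)  # ['bronze_crm_customers', 'bronze_erp_orders']
```

Adding a new source then means adding one entry to the list rather than duplicating a notebook cell.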
Fabric allows you to package your own code as custom Python modules and call them from within your notebooks to streamline these processes.
Never again will you have to write the same function twice!
In this very practical session we will dive deep into:
1. Creating a very simple Python module using Visual Studio Code
2. Publishing our Python module to Microsoft Fabric
3. Calling functions from our Python module in Fabric notebooks
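Steps 1 and 3 can be sketched locally in plain Python. This minimal sketch assumes a hypothetical package called `dataeng_helpers` with a single hypothetical helper, `quote_identifier`. It writes the package to a temporary folder, makes it importable and calls it the way a notebook cell would. Step 2 happens outside Python: one common route is to build a wheel (for example with `python -m build` and a `pyproject.toml`) and upload the `.whl` file as a custom library to a Fabric Environment attached to your notebook. Check the current Fabric documentation for the exact steps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Step 1: a very simple module. Normally this is a package folder you edit in
# Visual Studio Code; here it is written to a temporary folder so the sketch runs anywhere.
MODULE_CODE = '''
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks for Spark SQL, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"
'''

workdir = Path(tempfile.mkdtemp())
package_dir = workdir / "dataeng_helpers"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(MODULE_CODE)

# Step 2 stand-in: instead of installing a wheel in Fabric, put the folder on sys.path.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()  # make sure the import system sees the newly written files

# Step 3: import the module and call its function, as you would in a notebook cell.
from dataeng_helpers import quote_identifier

print(quote_identifier("Order Date"))  # `Order Date`
```

Once the package is installed in Fabric, only the final `from dataeng_helpers import quote_identifier` line is needed in each notebook.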
After this session, you will go home never having to repeat yourself again, because you will know how to write reusable Python modules for all your data engineering needs.