Data Engineering Archives - That Fabric Guy

Delta Lake Liquid Clustering vs Partitioning

1 May 2025 by Bas Land

Introduction to Delta Lake Liquid Clustering. As your Delta tables grow in size, performance tuning in Microsoft Fabric becomes essential. In this post, I’ll explore two powerful optimisation techniques: Delta Lake Partitioning and Liquid Clustering. Both can help improve query speed and reduce costs, but they work in very different ways. …

Delta Lake Partitioning for Microsoft Fabric

27 March 2025 by Bas Land

When managing large-scale data lakes with Microsoft Fabric, performance optimisation becomes crucial. One effective technique to achieve better performance is Delta Lake partitioning. Partitioning can significantly enhance query performance, reduce computational costs, and improve data management efficiency within Microsoft Fabric environments. In this blog post, we will explore what Delta Lake partitioning is, how it …

Extracting Paginating APIs Without NextPage Metadata with Microsoft Fabric Notebooks

31 January 2025 by Bas Land

Most APIs these days have some kind of pagination built in. This keeps queries against the underlying database from returning too much data, which would hurt database performance and send overly large messages across the network. Often, these APIs will tell you in their responses how many …
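The excerpt above stops before the post's own approach, so the following is only a minimal sketch of one common way to page through an API that returns no next-page metadata: keep requesting pages until one comes back shorter than the requested page size. The names `fetch_all_pages`, `fetch_page`, and `fake_api` are hypothetical and do not come from the post; `fake_api` stands in for a real HTTP call so the sketch runs anywhere, including a Fabric notebook.

```python
from typing import Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 100,
    max_pages: int = 10_000,
) -> List[Dict]:
    """Collect rows page by page until a short (or empty) page signals the end."""
    rows: List[Dict] = []
    for page in range(1, max_pages + 1):  # max_pages guards against endless loops
        batch = fetch_page(page, page_size)
        rows.extend(batch)
        # Assumption: a page with fewer rows than requested is the last one.
        if len(batch) < page_size:
            break
    return rows


def fake_api(page: int, page_size: int) -> List[Dict]:
    """Hypothetical stand-in for an HTTP request against a paginated endpoint."""
    data = [{"id": i} for i in range(250)]
    start = (page - 1) * page_size
    return data[start:start + page_size]


all_rows = fetch_all_pages(fake_api, page_size=100)
```

When the total row count is an exact multiple of the page size, this costs one extra request that returns an empty page, which is the price of having no metadata to rely on.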

Implementing the DRY Principle in Microsoft Fabric

29 November 2024 by Bas Land

When we start implementing a data lake using Microsoft Fabric, we might be tempted to start creating pipelines and notebooks right away, without thinking about design principles. However, there’s one design principle I’d like you to consider from the beginning: DRY. DRY is an acronym that stands for Don’t Repeat Yourself. In this blog post, …

Notebook Orchestration in Microsoft Fabric

25 October 2024 by Bas Land

Coming from the ‘old school’ world of SSIS and SQL Server, and later Azure Data Factory and Azure SQL Database, I have always built my ETL orchestration processes using some kind of pipeline. In Fabric, we also have pipelines (the successor of ADF), but we can now also orchestrate notebooks using NotebookUtils and runMultiple(). …

Spark Dataframes are Views, not Tables

17 October 2024 / 14 October 2024 by Bas Land

In Microsoft Fabric data engineering, we use Spark to apply transformations to our data. Just like you would write T-SQL to transform data in SQL Server, you would write PySpark or Spark SQL to transform data in a Fabric lakehouse. There are a lot of parallels between Spark and SQL in the way you can process …