OneLake Security Preview in Microsoft Fabric

Microsoft recently released a private preview of OneLake Security for Microsoft Fabric lakehouses. This is amazing: it is a feature I have been waiting on for a long time. The promise of Fabric was that security at the finest grain (both row-level and column-level) would be implemented in OneLake. However, …


VS Code Notebooks to Improve Your Microsoft Fabric Experience

When you use PySpark notebooks in Microsoft Fabric data engineering, you can develop straight from the web browser. While that is convenient, a browser is usually not the ideal software development environment. In this article I will show you how you can use VS Code Notebooks to develop for Microsoft Fabric. Why Use VS …


Delta Lake Liquid Clustering vs Partitioning


Introduction to Delta Lake Liquid Clustering

As your Delta tables grow in size, the need for performance tuning in Microsoft Fabric becomes essential. In this post, I’ll explore two powerful optimisation techniques: Delta Lake Partitioning and Liquid Clustering. Both can help improve query speed and reduce costs, but they work in very different ways. …


Delta Lake Partitioning for Microsoft Fabric

When managing large-scale data lakes with Microsoft Fabric, performance optimisation becomes crucial. One effective technique to achieve better performance is Delta Lake partitioning. Partitioning can significantly enhance query performance, reduce computational costs, and improve data management efficiency within Microsoft Fabric environments. In this blog post, we will explore what Delta Lake partitioning is, how it …


Extracting Paginating APIs Without NextPage Metadata with Microsoft Fabric Notebooks

Most APIs these days have some kind of pagination built into them. This ensures that queries against the underlying database do not return too much data, which would compromise database performance and send overly large messages across the network. Often, these APIs will tell you in their responses how many …
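As a rough illustration of the general idea, and not necessarily the approach taken in the full post, here is a minimal Python sketch of paging through an API that gives no next-page hint. It assumes the API accepts a page number and a page size, and that a page with fewer items than the page size marks the end. The names `fetch_all_pages`, `fetch_page`, and `fake_api` are hypothetical. In a real notebook, `fetch_page` would wrap an HTTP call.

```python
from typing import Callable, Iterator, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,
    max_pages: int = 10_000,
) -> Iterator[dict]:
    """Yield items page by page until a short (or empty) page is returned.

    Assumption: the API exposes no next-page metadata, so a page with fewer
    than `page_size` items is treated as the last one. `max_pages` is a
    safety cap against endless loops.
    """
    page = 1
    while page <= max_pages:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            break  # short or empty page: nothing more to fetch
        page += 1


def fake_api(page: int, page_size: int) -> List[dict]:
    """Hypothetical stand-in for an HTTP call, serving 250 records."""
    data = [{"id": i} for i in range(250)]
    start = (page - 1) * page_size
    return data[start:start + page_size]


rows = list(fetch_all_pages(fake_api, page_size=100))
```

When the total happens to be an exact multiple of the page size, this sketch makes one extra request that returns an empty page, which is the price of not having a total count or next-page link.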


Implementing the DRY Principle in Microsoft Fabric

When we start implementing a data lake using Microsoft Fabric, we might be tempted to start creating pipelines and notebooks right away, without thinking about design principles. However, there’s one design principle I’d like you to consider from the beginning: DRY. DRY is an acronym that stands for Don’t Repeat Yourself. In this blog post, …


Notebook Orchestration in Microsoft Fabric

Coming from the ‘old school’ world of SSIS and SQL Server, and later Azure Data Factory and Azure SQL Database, I have always built my ETL orchestration processes using some kind of pipeline. In Fabric, we also have pipelines (the successor of ADF), but we can now also create notebook orchestration using NotebookUtils and runMultiple(). …
