I just wrapped up a session at Cloud Data Driven:
“Delta Table Optimisation: Improving Queries using Delta Partitioning and Liquid Clustering”
We talked about the stuff that actually matters when your lakehouse starts dragging:
• Why and when partitioning helps (until it doesn’t)
• How clustering fixes what partitioning can’t
• Why small files are silent query killers
• How to use OPTIMIZE, VACUUM, and Azure Storage Explorer to see what’s really going on under the hood
• And yes, I demoed it all live on 297M rows, with real performance gains
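As a rough sketch of the maintenance commands mentioned above (not the notebook from the session), the helper below builds the Delta SQL statements you would typically pass to `spark.sql` in a notebook. The table name `sales` and the column `customer_id` are hypothetical, the 168-hour retention simply mirrors Delta's default of seven days, and the `CLUSTER BY` statement assumes a runtime whose Delta Lake version supports liquid clustering.

```python
from typing import List, Optional, Sequence


def maintenance_statements(
    table: str,
    retain_hours: int = 168,
    cluster_by: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build Delta maintenance SQL for one table (all names are illustrative)."""
    statements = []
    if cluster_by:
        # Declare the liquid clustering columns; OPTIMIZE then rewrites data accordingly.
        statements.append(f"ALTER TABLE {table} CLUSTER BY ({', '.join(cluster_by)})")
    # Compact small files into larger ones.
    statements.append(f"OPTIMIZE {table}")
    # Delete data files no longer referenced by the table and older than the retention window.
    statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


for stmt in maintenance_statements("sales", cluster_by=["customer_id"]):
    print(stmt)  # in a Spark notebook you would run spark.sql(stmt) instead
```

Running `OPTIMIZE` before `VACUUM` is a deliberate ordering here: compaction leaves the old small files behind as unreferenced data, and `VACUUM` only removes them once they fall outside the retention window.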
Big thanks to the Cloud Data Driven team for having me, and to the crowd that showed up with great questions.
Slides and notebook recap are coming soon to my public GitHub page (/basland/presentations). Until then, keep your layouts lean and your file counts low 😉
You can read more about Partitioning and about Liquid Clustering on my blog.
Delta Table Optimisation with Partitioning and Liquid Clustering
Data volumes are skyrocketing, and with every new project the pressure is on data engineers to deliver snappy queries over ever-growing datasets. In this session, we will dive deep into how Delta Lake’s partitioning and Liquid Clustering capabilities can transform query performance in Microsoft Fabric. We’ll be putting these optimisations to the test against a massive dataset of billions(!) of rows to demonstrate real-world impacts on speed and efficiency.
We’ll explore the details of Delta partitioning to ensure your data is laid out so that queries read only the files they need, reducing query overhead and lowering query runtimes. Then we’ll compare it to Liquid Clustering, an advanced feature that automatically reorganises your data for faster querying and easier maintenance. Finally, we’ll show you how to integrate these Delta optimisations into your Microsoft Fabric Lakehouse, so you can power your dashboards, reports, or machine learning pipelines with near real-time insights without the performance bottlenecks you might expect with such large data volumes.
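To make the idea of data layout concrete, here is a small pure-Python simulation of file skipping. Delta records per-file min/max statistics, and a query with a filter can skip any file whose range cannot contain the value. The sketch assumes a hypothetical `customer_id` column, 100 files of 1,000 rows each, and models clustering as a simple sort, which is a simplification of what Liquid Clustering actually does.

```python
import random


def files_scanned(values, rows_per_file, target):
    """Return (files that may contain `target`, total files) using min/max stats."""
    files = [values[i:i + rows_per_file] for i in range(0, len(values), rows_per_file)]
    hits = sum(1 for f in files if min(f) <= target <= max(f))
    return hits, len(files)


random.seed(42)
# Hypothetical column: 100,000 rows spread over 1,000 distinct customer ids.
customer_ids = [random.randrange(1_000) for _ in range(100_000)]

# Unclustered: rows land in files in arrival order, so each file spans nearly all ids.
unclustered = files_scanned(customer_ids, 1_000, target=500)

# Clustered (modelled as a sort): rows with similar ids share files,
# so each file covers a narrow id range and most files can be skipped.
clustered = files_scanned(sorted(customer_ids), 1_000, target=500)

print(f"unclustered: {unclustered[0]} of {unclustered[1]} files read")
print(f"clustered:   {clustered[0]} of {clustered[1]} files read")
```

The same reasoning explains why partitioning helps only when queries filter on the partition column: a filter on any other column still has to open files in every partition unless the data inside them is also organised by that column.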
By the end of this session, you will:
1. Understand how you could improve performance on your Delta tables.
2. Choose the right optimisation technique: liquid clustering, partitioning, or both.
3. Set up, test, and validate performance enhancements after implementing partitioning or liquid clustering.
If you’re a data engineer looking for hands-on techniques to improve query speed and lower compute cost in Microsoft Fabric, then this is your must-attend session.
Get ready to leave your old, slow tables behind.