All about Lakehouse in Microsoft Fabric

Lakehouses are the foundation of data engineering in Microsoft Fabric. They’re where you store your raw data, your transformed tables, and everything in between. But if you’re new to Fabric, lakehouses can be a bit confusing. What’s the difference between tables and files? What are shortcuts? How do you actually work with the data once it’s in there?

Today I’m going to walk you through everything you need to know about lakehouses in Microsoft Fabric. We’ll create one from scratch, explore the different sections, set up shortcuts, query data with SQL, and even create a semantic model. Let’s dive in.

Creating Your First Lakehouse

Creating a lakehouse in Fabric is surprisingly simple. Navigate to your workspace, click “New item,” and then either select “Lakehouse” from your favorites or find it in the full list of items. If you can’t find it, there’s a search box on the top right where you can just type “lakehouse” and it’ll pop right up.

Give your lakehouse a name—I’m calling mine “Lakehouse Demo”—and you’ll see an option to enable lakehouse schemas. Now mind you, schemas are still in preview as of the time I’m recording this, so there’s no official support yet. But in practice, this feature has been pretty stable. I’ve been using it without issues, so I’d recommend enabling it.

Click create, and boom—you’ve got yourself a lakehouse. You’ll land in the lakehouse explorer view where you can start working with your data.

Understanding Tables vs Files

On the left side of the lakehouse explorer, you’ll see two main sections: Tables and Files. This is probably the most important concept to understand about lakehouses, so let’s break it down.

Tables: The Managed Part

The Tables section is the managed part of your lakehouse. This is where you store structured data as Delta tables. When you enable schemas, you’ll see a default “dbo” schema—just like in a SQL Server database. You can create additional schemas for organizing your tables. For example, I might create an “ingest” schema for my raw ingestion data.

Here’s the thing though: you can’t create tables directly from the UI. You need to use PySpark notebooks, data pipelines with copy activities, or other programmatic methods to actually create tables. The UI is for browsing and managing, not creating.
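Since the UI won’t create tables for you, here’s what that looks like in a notebook. This is a minimal PySpark sketch, assuming a Fabric notebook with this lakehouse attached as the default lakehouse (where `spark` is predefined); the file path, schema, and table name are made up for illustration:

```python
# Minimal sketch: read raw data from the Files section and persist it as a
# managed Delta table. Assumes a Fabric PySpark notebook where `spark` is
# predefined and this lakehouse is the default lakehouse.

# Read a raw CSV from the unmanaged Files section (path is hypothetical)
df = spark.read.option("header", True).csv("Files/landing/taxi_zones.csv")

# Write it as a Delta table in the "ingest" schema of the Tables section
df.write.format("delta").mode("overwrite").saveAsTable("ingest.taxi_zones")
```

Once this runs, the table shows up in the Tables section of the lakehouse explorer, ready to browse and query.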

Under the hood, Delta tables are just folders with Parquet files and a delta log. I’ve got another video and some blog posts about how that works if you want to dive deeper. But from a front-end perspective, the Tables section gives you managed, structured data that works like database tables.
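To make that “folders with Parquet files and a delta log” idea concrete, here’s a small illustration of the on-disk layout. Note this mocks up a commit file by hand purely to show the structure (a real table is written by an engine like Spark, and real commits carry more metadata):

```python
import json
import os
import tempfile

# Mock up the on-disk layout of a Delta table: a folder of Parquet files
# plus a _delta_log folder with JSON commit files.
table = tempfile.mkdtemp(prefix="taxi_zones_")
os.makedirs(os.path.join(table, "_delta_log"))

# An (empty) data file and a minimal commit that adds it to the table
open(os.path.join(table, "part-00000.parquet"), "wb").close()
commit = [
    {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
    {"metaData": {"id": "demo", "format": {"provider": "parquet"}}},
    {"add": {"path": "part-00000.parquet", "size": 0, "dataChange": True}},
]
log_file = os.path.join(table, "_delta_log", "00000000000000000000.json")
with open(log_file, "w") as f:
    f.write("\n".join(json.dumps(action) for action in commit))

# Reading the log tells you which Parquet files make up the current table
with open(log_file) as f:
    actions = [json.loads(line) for line in f]
active_files = [a["add"]["path"] for a in actions if "add" in a]
print(active_files)  # ['part-00000.parquet']
```

That’s really all a Delta table is: data files plus a log that says which of them are currently part of the table.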

Files: The Unmanaged Part

The Files section is the unmanaged part of your lakehouse. This is your free-form storage where you can dump any file format you want: Parquet, CSV, JSON, XML, even JPEGs and PDFs. You decide the folder structure, you decide what goes where.

This is perfect for a raw landing zone. If you’re pulling data from APIs that give you JSON or Parquet files, you can store them in the Files section. Storage in OneLake is very cheap, so you can keep your raw data here and then process it with Spark notebooks to create Delta tables in the Tables section.

Let me show you a real example. In my silver lakehouse, I’ve got a folder structure under Files that contains the New York City Taxi and Limousine Commission dataset. There’s a “landing” folder, then subfolders for years, and within those are the actual Parquet files. This is all raw data that I downloaded and stored before processing it into Delta tables.

By the way, if you want this dataset for your own demos, just Google “NYC TLC taxi data” and you’ll find it. It’s free to use. Be warned though—it’s massive. We’re talking 4.5 billion rows if you take all the history from 2009 to present. That’ll keep you busy for a while 🙂

Working with Shortcuts

Shortcuts are one of the coolest features in Fabric lakehouses. A shortcut lets you access data that physically lives somewhere else—another lakehouse, an Azure Blob Storage account, Amazon S3, Google Cloud Storage—and query it as if it were local to your lakehouse.

You don’t need to copy the data. You just create a shortcut pointing to it, and you can work with it like it’s right there in your lakehouse. This is great for avoiding unnecessary data duplication and keeping storage costs down.

Creating a Table Shortcut

Let me show you how to create a shortcut to a table in another lakehouse. In the Tables section, click on your schema (I’m using the “ingest” schema I created), then click “New table shortcut.”

The shortcut wizard opens and shows you all the different sources you can connect to. For this example, I’m connecting to Microsoft OneLake because I’ve got other lakehouses in my workspace. I’ll browse to my “Gold” lakehouse and select the “dim_taxi_types” table.

Here’s something interesting: in the Gold lakehouse, some tables have a little link icon. Those are shortcuts themselves—shortcuts from Gold to Silver in this case. Shortcuts can reference shortcuts. It’s shortcuts all the way down!

After selecting the table, you can rename the shortcut. The wizard defaults to the full table name (“gold_dim_taxi_types”), but I’ll change it to just “taxi_types” to keep things clean. Click create, and you’re done.

Now in my “ingest” schema, I have a table called “taxi_types” that’s actually pointing to data in my Gold lakehouse. I can query it, use it in notebooks, whatever—it works just like a local table.
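Once the shortcut exists, querying it looks no different from querying any other table. Using the schema and shortcut name from the demo above:

```sql
SELECT *
FROM ingest.taxi_types
```

Behind the scenes, the data still lives in the Gold lakehouse. Only the reference lives in “ingest.”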

File Shortcuts and Folder Structures

You can also create shortcuts to entire folders in the Files section. In my silver lakehouse, that “landing” folder I mentioned earlier? That’s actually a shortcut to the landing folder in my bronze lakehouse. By creating one shortcut to the root folder, I get access to all the subfolders and files underneath it automatically.

Pro tip: when you’re organizing your Files section and planning to use shortcuts between lakehouses, put everything under one root folder. Don’t just dump files and folders at the top level. Create a single parent folder (like “landing”) and structure everything underneath it. Then you can shortcut that entire structure with one link, and any new data you add to the folder will automatically be available through the shortcut.

Browsing Data in the Lakehouse Explorer

The lakehouse explorer lets you browse your data directly in the UI. Click on a table, and you’ll get a preview. Now, be aware—this preview is not instant. It takes a few seconds to load. But in my experience, after you’ve done it a few times, it seems to warm up some kind of cache and gets faster.

The preview typically shows you 1,000 rows. Even if your table has millions or billions of rows, you’ll see a 1,000-row sample. You can use filters in the UI (it looks similar to Power BI’s filter interface) to narrow down what you’re looking at.

For example, if I’m looking at the taxi rides table and I want to see only rides with dropoff location ID 206, I can filter on that column and the preview updates to show only those rows. It’s handy for quick data exploration without writing any code.

Querying with the SQL Analytics Endpoint

Here’s where things get really cool for those of us who love SQL. On the top right of the lakehouse view, you can switch from “Lakehouse” to “SQL Analytics Endpoint.” This gives you a query editor where you can write SQL queries against your lakehouse tables.

If you’re like me and you’ve been working with databases for years, this feels natural. You can use standard SQL syntax to browse and analyze your data.

SELECT * 
FROM dbo.taxi_zone_lookup

Notice the two-part naming convention: schema.table. Even if your lakehouse doesn’t have schemas enabled (like my older silver lakehouse), the SQL Analytics Endpoint will treat everything as being in the “dbo” schema for compatibility.

You can do all the SQL things you’re used to:

SELECT location_id, borough, zone
FROM dbo.taxi_zone_lookup
WHERE borough = 'Manhattan'

Or create calculated columns:

SELECT 
    location_id,
    borough,
    zone,
    location_id * 10 AS some_calculation
FROM dbo.taxi_zone_lookup
WHERE borough = 'Manhattan'

You can run queries with Command+Enter on Mac or Control+Enter on Windows. And if you prefer working in a desktop SQL client, you can download a SQL database project or open your queries in Visual Studio Code or SQL Server Management Studio. The SQL Analytics Endpoint gives you full connectivity like any other SQL database.
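If you want to connect from your own code rather than the browser, the endpoint accepts standard SQL Server connections. Here’s a hedged sketch of building a connection string in Python; the server name is hypothetical (copy the real one from the SQL Analytics Endpoint’s settings), and the driver name and auth mode are assumptions about what’s installed on your machine:

```python
def build_connection_string(server: str, database: str) -> str:
    """Build an ODBC connection string for a SQL Analytics Endpoint.

    The driver version and authentication mode are assumptions; check
    what's available on your machine. Entra ID auth is required.
    """
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};"
        f"Database={database};"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )

conn_str = build_connection_string(
    "yourworkspace.datawarehouse.fabric.microsoft.com",  # hypothetical
    "Lakehouse Demo",
)
# With pyodbc installed, you would then connect with:
# import pyodbc
# conn = pyodbc.connect(conn_str)
print(conn_str)
```

From there, any tool that speaks TDS (SSMS, Azure Data Studio, the VS Code SQL extensions) works the same way.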

Creating Semantic Models from Lakehouses

If you’re using Power BI as your reporting layer (and let’s be honest, most Fabric users are), you can create semantic models directly from your lakehouse. Click “New semantic model,” give it a name, and select which tables you want to include.

The semantic model gets created in Direct Lake storage mode. That means the data stays in your lakehouse as Delta tables—it doesn’t get copied into the Power BI semantic model. You get a live connection to your lakehouse data, which is fantastic for keeping everything in sync and avoiding data duplication.

After the model is created, you can manage it through the browser. This is great for me because I work on a MacBook, and Power BI Desktop doesn’t run on macOS without virtualization. The online semantic modeling experience lets me create relationships, write DAX measures, set up row-level security, and do everything I’d normally do in Power BI Desktop—all from Safari.

Schema Best Practices

Let me share a quick tip about organizing your tables. In my older lakehouses (like the silver one I showed you), I used table name prefixes to indicate the layer. So I’d have tables named “silver_fhv,” “silver_taxi_trips,” and so on.

These days, I recommend using schemas instead of prefixes. Why? Because you can apply security rules at the schema level, but you can’t apply them to table prefixes. If I have a “silver” schema, I can grant or deny access to that entire schema. With prefixes, I’d have to manage permissions table by table. Not fun.

So instead of naming a table “silver_fhv,” create a schema called “silver” and name the table “fhv.” Much cleaner, and way easier to secure.
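As a sketch of what schema-level security looks like on the SQL Analytics Endpoint (the user principal here is made up):

```sql
-- Grant read access to everything in the silver schema at once
GRANT SELECT ON SCHEMA::silver TO [analyst@contoso.com];
```

With prefixes, you’d have to repeat a GRANT for every individual “silver_” table and remember to do it again for each new one.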

What You’ve Learned About Lakehouses

Let’s recap what we covered today:

  • How to create a lakehouse and enable schemas (currently in preview)
  • The difference between Tables (managed Delta tables) and Files (unmanaged storage)
  • How to create shortcuts to access data from other lakehouses or cloud storage
  • How to browse data in the lakehouse explorer
  • How to query lakehouse data using the SQL Analytics Endpoint
  • How to create Direct Lake semantic models for Power BI
  • Best practices for organizing tables with schemas instead of prefixes

Lakehouses are the foundation of data engineering in Microsoft Fabric. Once you understand how they work—the difference between managed and unmanaged storage, how shortcuts simplify data access, and the various ways to query and consume the data—you can start building robust data platforms.

Are you using lakehouses in your Fabric environment? What’s your folder structure strategy? Do you use shortcuts extensively or prefer to copy data? Let me know in the comments below!
