Fabric Capacity Metrics App is Useful but Only If You Know This

Today I’ll be diving deep into the Fabric Capacity Metrics app. With this Power BI app, you can find out which items in your Fabric capacity are using a lot of compute resources, and you can use it to establish your spending baseline.

I’ve created another video on how Fabric is licensed and what the cost structure is. Today we’re going to look at the report that Microsoft gives you to find out where your spend is located.

If you find yourself getting throttled or experiencing overages and outages, then this app is definitely something for you.

Important Setup Note

One thing to keep in mind before we dive in: you want this app to run in a workspace that is hosted on a Power BI Pro license.

You do not want to have your Fabric metrics reporting running on your Fabric capacity. Why? Because if your Fabric capacity goes offline because you’re using too much, then the report goes offline as well, and you cannot see why you’re offline.

So please host your Fabric metrics app in a different workspace and host it on a Pro license workspace.

First Impressions of the App

We’re looking at the Fabric Capacity Metrics app here in Power BI. This is a Power BI report that has been created by Microsoft.

If you think, “Okay, Microsoft created this Power BI report, so it should be an awesome report that’s super fast to work with,” and then you start actually working with this report… it will be very slow.

Yes, I am a bit bummed that it’s very slow, and you could be as well.

But on the other hand, if we think about what’s actually underneath this report—what the dataset actually is—it’s literally trillions of rows in a Kusto database.

All the usage of all Fabric capacities of all Microsoft customers in the entire world lives inside this dataset. So it’s a massive dataset we’re working with.

We’re only seeing our own Fabric capacities, of course—the Fabric capacities that we as a user have access to and are capacity administrators of. That’s important to note.

Understanding the 14-Day Limitation

From here, we can figure out which capacities we have. In my current tenant, I have two paid capacities and a trial capacity. I’ve selected one of the paid capacities that we’re looking at today.

Mind you, in this report we can figure out compute usage and storage usage to see why we’re generating cloud spend and where it’s going. But the report only shows the past 14 days.

You can only look back two weeks at a time. In a later video somewhere in the near future, I’ll probably be creating a solution to persist that information into a lakehouse and store all of this in your own Fabric environment to analyze data further back in time.

The Multi-Metric Ribbon Chart

We’ve selected our capacity name, and we see a few different visuals on the screen. The first one is the multi-metric ribbon chart—that’s just the name of the chart itself.

What it’s actually showing is CU usage over time: CU usage per date, broken down by type of usage.

If we click on this visual, we’ll find item kinds here and then dates on the bottom. We can find out quickly how much spend we’re generating over the past couple of days, per day and per item kind.

We can see, for example, that we’re spending on Synapse notebooks, lakehouses, Data Activator, pipelines—and then suddenly on Tuesday the 2nd of December (two days ago), we were spending quite a lot on warehouse.

I don’t know why there was a big spend on warehouse, but at least we can see there’s a spend that’s going up, and probably a large part of that is coming from a warehouse item.

Understanding the Different Tabs

Duration Tab

We have a few different tabs here. We’re on the Duration tab. Hovering over the bars with the mouse shows, per item kind, how many seconds it has been running.

This is the actual time taken to complete operations across the different Fabric workloads. We see that 87,000 seconds have been spent on Data Activator, 7,000 seconds on Synapse notebooks, 1,700 seconds on pipelines, and so on.

Duration is less important than compute usage, because compute usage measures compute units consumed over time rather than just elapsed time. But at least we can look at time here.

Operations Tab

On the Operations tab, we can see what’s actually running—how many individual operations are taking place on these different Fabric workloads.

We can see there are 16,000 hits on the lakehouse on Friday 28th November. There were hits on the warehouse, function sets being executed, lakehouse operations, and so on.

Again, this tells us a little bit about the usage of our capacity, but it doesn’t necessarily tell us how expensive things are. So the CU tab remains the most important one for me.

Users Tab

The Users tab shows the distinct number of users who performed operations.

We can see that two users performed operations against the warehouse, another two for function sets, another two for datasets, one for Synapse notebooks, one for lakehouse, and so on.

This is a small capacity in a small tenant, so not much happens. But in a larger organization, it might make sense to look at how many users are actually working with certain workloads.

Maybe a workload is generating a lot of spend, but if a lot of users are happy with that and using it, then maybe it’s okay if that spend is happening.

The Utilization Tab (Most Important)

Let’s go back to the CU tab. It’s a little bit quicker now because it’s cached data.

We can see the total spend over time. Then on the right side, if we move over to the Utilization tab, we see a bar chart that shows a percentage against a target. The target is set to 100%, of course: that’s the CU limit.

This is the amount of compute we get with our capacity.

Understanding the 100% Line

Mind you, if we hover over one of these bars, we can see that we actually have 60 compute units (CUs). That top line is the 100% line.

Why is it 60? We have an F2 capacity in this case, the smallest capacity available. An F2 capacity gives us two compute units.

Those two compute units are delivered per second: every second, the capacity accrues 2 CUs of allowance.

Every bar in this graph is a 30-second time window, so each window carries a total allowance of 2 × 30 = 60 CUs. That’s the 100% line.

If you have, for example, an F64, then your 100% line will be 30 × 64 = 1,920 CUs.
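As a quick sanity check, the 100% line for any F-SKU can be computed in a couple of lines. The helper name here is my own, not anything from the app itself:

```python
# The 100% line (CU allowance) of one 30-second window for an F-SKU.
# An F{n} capacity delivers n CUs per second, so one window holds
# n * 30 CUs of allowance.
def window_allowance(sku_size: int, window_seconds: int = 30) -> int:
    return sku_size * window_seconds

print(window_allowance(2))    # F2  -> 60 CUs
print(window_allowance(64))   # F64 -> 1920 CUs
```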

We can see how much our allowance is and how much we’re using inside the allowance over time. This is very interesting to see what our baseline capacity is doing and what our usage patterns look like.

Interpreting Usage Percentages

We see that we’re using a total of 7.36 CUs in this 30-second time slice, and we know we have an allowance of 60. So we’re using about 12.3% of our allowance.

Even though we have the smallest capacity available, we’re still using only maybe a seventh or an eighth of the total capacity.

That’s good news in the sense that we’re not at risk of overspending anytime soon. On the other hand, we’re paying for far more F2 capacity than we use, and since the F2 is the smallest SKU, there’s nothing smaller to drop down to. This is something worth keeping an eye on.
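The utilization math above boils down to a one-line calculation, using the figures from this report:

```python
# Utilization of one 30-second slice, using the figures from above.
used_cu = 7.36       # CU-seconds consumed in this 30-second slice
allowance_cu = 60    # the 100% line for an F2 (2 CU/s * 30 s)

utilization_pct = used_cu / allowance_cu * 100
print(f"{utilization_pct:.1f}%")  # 12.3%
```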

Cross-Filtering and Detailed Analysis

Something I do a lot: I’ll use the Power BI cross-filtering behavior. I’ll click on a date—for example, on Tuesday the 2nd where this massive warehouse usage peak was. Now I can see over time what my spend looks like.

Because each time slice is only 30 seconds, the selections we make here are very granular. We can also use this dataset to look at our throttling and overages.

We don’t have any throttling or overage issues. However, if we do have usage that goes above the 100% line, that will trigger throttling and overages in our capacity.

We have interactive delays, interactive rejections, and background rejections. This is where we can find out how much Fabric is throttling us: slowing operations down so we spend less compute and avoid a rejection or overage situation.

Understanding Overages

If we go to the Overages Over Time tab, there’s probably no data in here. It’s a flat 0% line, so there’s nothing to demonstrate on this capacity.

What we would see here if we had overages on our compute is a carry-forward percentage.

What happens if you spend more than you’re allowed? You basically borrow that from future you. Your future compute allowance will be used for the compute you generate today.

This carry-forward percentage can add up quickly. You can easily borrow a lot of compute from tomorrow, and so on.

The Overages chart will show you by what percentage you’re above your current allowance. Are you 10 times over, or 100 times over? It will also show you the burndown rate in blue.

Spend by Item

At the bottom, we see the spend by item for the past 14 days. Because we can cross-filter, we can actually select data that’s in a single day. I think that’s more useful to look at.

Usage per item over 14 days means I need to do the mental math—divide things by 14 and so on to find out the average spend per item per day. I like to look at just a single day.

I’m interested in what happened on the 2nd of December, on this Tuesday. So I clicked on that one.

Now we get the table sorted by CU usage, per item, showing us what parts of the platform are actually generating or consuming a lot of compute.

Understanding the Hierarchy

The items have a hierarchy. The first level is the workspace: “Internal Data Warehouse” here is a workspace, “KDF2.0 Fabric” is a workspace, and so on.

We have warehouse, Synapse notebook, lakehouse, Data Activator—these are the item kinds. These are the same item kinds we find in the ribbon chart up top.

The last piece of this hierarchy (at the last backslash) is the name of the object. We have a warehouse called “Internal DWH”, a notebook called “OneLake Logging Orchestrator”, and so on.

Calculating Daily Allowance

We can see that on Tuesday the 2nd, we used up 38,887 CUs, and that’s only a fraction of our allowance.

Our allowance for an F2 capacity is the number of seconds in a day (86,400) times the 2 CUs per second an F2 provides: 2 × 86,400 = 172,800 CUs.

That’s the total number of CUs we’re allowed to spend in a day, and we only spent 38,887. That means we’re easily within the limits.
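That daily allowance calculation can be sketched like this. The helper name is mine; the figures come straight from the walkthrough:

```python
# Daily CU allowance for an F-SKU, as described above.
# An F{n} capacity provides n CUs per second, all day long.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def daily_allowance(sku_size: int) -> int:
    return sku_size * SECONDS_PER_DAY

f2_daily = daily_allowance(2)   # 172,800 CUs for an F2
used = 38_887                   # spend on Tuesday the 2nd

print(f2_daily, f"{used / f2_daily:.1%}")  # 172800 22.5%
```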

However, if we weren’t, we could find out what’s actually generating the spend. On warehouse queries, we’re spending 15,000 CUs on that day. On the logging orchestrator notebook, we’re spending 10,000 CUs, and so on.

Here we can find out what’s actually generating the spend on a per-item level. That’s really cool because that helps us find out if we’re looking at overages—where are we actually generating that spend, where are we actually consuming a lot of capacity.

Drill-Through Functionality

One more thing I want to show you about the compute: if we right-click somewhere on the Utilization Over Time chart and go to “Drill through” → “Time point detail”, we’ll get a detailed Power BI page with just that 30-second slice.

We can see the start and end timestamps at the top left: the 30-second time window we’re working with.

We’re well within our limits here. We only spent about 13.5 CUs out of the 60 CUs we have available, which is 22.47% of the limit. So we have plenty of headroom.

Background Operations and Smoothing

Mind you, in this tiny 30-second slice, a whopping almost 10,000 background operations are running. These 10,000 background operations are found in this table.

These background operations are usually carry-forward operations that have been started in the past (in the hours before) and are then smoothed out over the next 24 hours.

If you’re using Fabric compute power, you’re not using it as-is. With a regular virtual machine, or if you buy hardware and run a server, you need to allocate physical resources (CPU, RAM, and so on) up front.

With Fabric, you can use more than your allowance, and that will be smoothed out over the next 24 hours. That’s how you end up with 9,920 background operations in that tiny 30-second slice.

There certainly weren’t 10,000 actual activities happening on this capacity in those 30 seconds. It’s just earlier operations whose cost has been smoothed out.
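The smoothing arithmetic can be illustrated with a toy example. The job cost below is made up for illustration, and Microsoft’s actual smoothing algorithm is internal to Fabric; this only shows the even-spreading idea described above:

```python
# Toy illustration of 24-hour smoothing for a background operation.
# Suppose a background job consumed 17,280 CU-seconds. Its cost is not
# billed all at once but spread evenly over the next 24 hours.
job_cost_cu = 17_280
smoothing_window_s = 24 * 60 * 60   # 86,400 seconds
slice_s = 30                        # each bar in the report is 30 s

per_slice = job_cost_cu * slice_s / smoothing_window_s
print(per_slice)  # 6.0 CU-seconds charged in every 30-second slice
```

This is why a single past job keeps showing up as a "background operation" in every 30-second slice for a full day, and why thousands of such smoothed operations can appear in one tiny slice.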

Storage Tab

Let’s dive into storage for a little bit, because as we know, Fabric will bill you for both compute usage as well as for storage.

Now, storage is super cheap, and for me it’s usually an afterthought. We spend hundreds or thousands of euros or dollars per month on compute, and maybe a few bucks (10 or 20) on storage.

But if you have a situation where you store a lot of data on OneLake, then this is a tab you might want to be looking at.

The storage tab will tell you the number of workspaces this capacity is running, the storage happening inside those workspaces, and the billable storage (because you can have billable and non-billable storage).

The billable storage table up top will give us the top 10 workspaces by billable storage. As you can see, the “KDF2.0 Fabric” workspace is actually consuming 47 GB of billable storage.

That sounds like a lot. It’s really not. I think a gigabyte of billable storage runs about 2 or 2.5 cents. So this is not really expensive.

We do see that storage by date is going up just a little bit. I’m not really worried about this, but at least we can now find out where our storage bill is going.

We’re getting billed for 116 GB, which is probably running us a couple euros. If you have a large bill in your invoice by Microsoft or by your partner, then check your storage. If it’s getting out of hand, then you probably want to dive in here and see what objects you’re getting billed for.
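A quick back-of-the-envelope check confirms the "couple of euros" estimate. The per-GB price below is an assumption based on the rough 2 to 2.5 cents figure mentioned above; check current Azure pricing for your region and billing model:

```python
# Rough monthly OneLake storage cost estimate. The price per GB is an
# assumed midpoint of the ~2-2.5 cents mentioned above, not an official rate.
billable_gb = 116
price_per_gb_month = 0.023  # EUR per GB per month (assumption)

print(f"~EUR {billable_gb * price_per_gb_month:.2f} per month")  # ~EUR 2.67 per month
```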

Wrapping Up

This is a very handy app. It’s maybe not the nicest Power BI report in terms of look and feel. It’s also not the quickest.

But considering there are trillions of transactions in just a two-week rolling period (or at least I imagine it to be trillions of transactions), this is actually quite cool. It will help you dive into the usage you have on your compute and storage spend inside Microsoft Fabric.

In a future episode, as I mentioned, I’ll dive into how we can actually persist this information—write this data to our own Fabric lakehouse and then do reporting on top of that. Then we’ll have control over the number of days of history we keep, and we can create our own reports that maybe are a little bit quicker.

Are you using the Fabric Capacity Metrics app? What insights have you discovered about your capacity usage? Let me know in the comments below!
