Microsoft Fabric
You finally have a day off, no commitments or distractions, and you sit down to binge-watch the new show everyone’s been raving about. You curl up on the couch, immersing yourself in the plot when a message pops up on your screen:
“You’ve reached your maximum number of streaming minutes. Press continue to borrow from tomorrow’s streaming minutes or click here to double your plan.”
Frustrated, you click continue. You get back into the show, and it’s nearing the stunning climax everyone hints at; you’re riveted and on the edge of your seat when another message appears:
“You’ve used all of tomorrow’s streaming minutes. Check back later or click here to double your plan.”
You are not alone if this feels like a confusing, frustrating user experience. This is how the capacity cost model for Microsoft Fabric works.
Microsoft Fabric was introduced in 2023 as “An all-in-one analytics solution for enterprises that covers everything from Data Movement to Data Science, Real-Time Analytics, and Business Intelligence.”
In this blog post, we will explore the intricacies of its capacity model, shedding light on concepts like Capacity Units (CUs), consumption throttling, smoothing, and reservation discounts in the Fabric environment.
What are Capacity Models?
A capacity is a subscription to reserve access to a set number of Capacity Units. Capacity Units (CUs) measure the computing power in a reserved capacity.
A suitable analogy might be internet service provider plans. The download speed in your plan is equivalent to Capacity Units, and your provider tracks how much data you are downloading at any given time.
Microsoft recommends selecting the appropriate size based on the total CUs your workload will consume. To understand what capacity you need, you must also understand how throttling and smoothing work and factor that into your capacity calculations.
What is Smoothing?
In Microsoft Fabric, “smoothing” helps manage capacity and avoid throttling. The consumption of your workloads is averaged over a set period, which depends on the type of job. If you exceed your capacity within a given period, you can borrow compute from the next one.
The idea is that this allows workloads to spike and over-consume within a given period by using some capacity from the next period, on the assumption that consumption in the next period will be lower. There are two types of smoothing:
Interactive Job Smoothing
- Smoothing occurs over a minimum of 5 minutes
- Fabric averages the compute (CU) seconds of operations over that period
- If you can’t “borrow” enough capacity from the next period, throttling may occur
Scheduled / Background Job Smoothing
- Similar to interactive smoothing, except that it occurs over 24 hours
As you might have guessed, this is the first warning in our streaming analogy: when you watch too much in a given period, you are borrowing minutes from a future period. To continue the internet service provider analogy, you have downloaded too much data in a given window and start “borrowing” capacity from a future period.
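To make the two smoothing windows concrete, here is a minimal sketch that spreads one job's consumption evenly over each window. The even spread and the example job size (64 CUs held for five minutes) are illustrative assumptions, and `smoothed_cu_rate` is a hypothetical helper, not part of any Fabric API.

```python
# Smoothing windows taken from the two lists above.
INTERACTIVE_WINDOW_S = 5 * 60        # interactive: minimum of 5 minutes
BACKGROUND_WINDOW_S = 24 * 60 * 60   # scheduled / background: 24 hours


def smoothed_cu_rate(cu_seconds: float, window_seconds: int) -> float:
    """Average CUs drawn per second when a job's CU-seconds are
    spread evenly across the smoothing window (assumed even spread)."""
    return cu_seconds / window_seconds


# Hypothetical job: 64 CUs for 5 minutes = 19,200 CU-seconds.
job_cu_seconds = 64 * 300

interactive_rate = smoothed_cu_rate(job_cu_seconds, INTERACTIVE_WINDOW_S)
background_rate = smoothed_cu_rate(job_cu_seconds, BACKGROUND_WINDOW_S)

print(f"interactive: {interactive_rate:.2f} CUs")  # fills an F64 for the window
print(f"background:  {background_rate:.2f} CUs")   # a small slice of the same capacity
```

The same job looks very different depending on how it is classified: run interactively it occupies the whole of a 64 CU capacity for the window, while as a background job it barely registers at any one moment but keeps counting against you for a full day.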
What is Throttling?
Throttling in Microsoft Fabric happens when the consumption on a capacity exceeds its purchased resources, potentially leading to a degraded user experience. It operates through different phases, each with its own set of rules and implications.
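The phases can be sketched as a simple lookup on how far into future capacity you have already borrowed. The thresholds and stage names below reflect our reading of Microsoft's published throttling policy at the time of writing; treat them as assumptions and check the current documentation before relying on them. `throttling_stage` is a hypothetical helper.

```python
def throttling_stage(future_minutes_borrowed: float) -> str:
    """Map minutes of future capacity already consumed to a throttling phase.

    Thresholds are assumptions based on Microsoft's documented policy.
    """
    if future_minutes_borrowed <= 10:
        # Overage is tolerated; nothing is throttled yet.
        return "overage protection"
    if future_minutes_borrowed <= 60:
        # Interactive requests are delayed on submission.
        return "interactive delay"
    if future_minutes_borrowed <= 24 * 60:
        # Interactive requests are rejected; background jobs still run.
        return "interactive rejection"
    # More than a day borrowed: all requests are rejected.
    return "background rejection"


for minutes in (5, 30, 240, 2000):
    print(minutes, "->", throttling_stage(minutes))
```

In the streaming analogy, the first stage is the silent grace period, the second is the "press continue" prompt, and the last two are the "check back later" screen.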
Can you Recover from Throttling?
Microsoft recommends various workarounds to tackle throttling, including upgrading to a higher capacity, optimizing queries, and redistributing tasks.
Waiting until the overload state subsides is another option, but it comes at the cost of decreased productivity and potential SLA misses. Capacity administrators can also configure alerts to be notified before throttling occurs, allowing for manual intervention.
Can you Avoid Throttling?
The only real way to avoid throttling is to have a highly accurate prediction of future workloads, purchase the appropriate Fabric F SKU with the correct amount of CUs, and adjust accordingly over time. Predicting the implications of smoothing and throttling also becomes challenging when you factor in the diverse range of workloads that can utilize CUs:
- Runtime and scheduled jobs
- Interactive notebooks across multiple teams/users
- Spark environment configuration
- BI tools and dashboards running interactively
What are Reservation Discounts?
It’s also worth mentioning how Reservation Discounts work. They are automatically applied based on your selected capacity and are factored into the price. However, if you underutilize your capacity, you are charged at the PAYG rate, not the reserved rate. Returning to our streaming analogy, you have a subscription that allows you to watch 10 minutes of content an hour at a discounted rate. If you watch for less than 10 minutes, you are charged a higher rate. If you watch for more than 10 minutes, you start to encounter smoothing and throttling.
Azure Databricks Cost Comparison Example
Azure Databricks, which is used by thousands of enterprise customers, leverages a true on-demand computing model. To compare the two, we need to get technical for a minute. Using an F64 capacity in Microsoft Fabric with an 8-core Spark environment translates to a cluster with one driver and 15 workers in Azure Databricks.
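The arithmetic behind that equivalence can be sketched as follows. It assumes Fabric's published ratio of two Spark vCores per Capacity Unit and that one node of the cluster acts as the driver; `equivalent_cluster` is a hypothetical helper for illustration only.

```python
SPARK_VCORES_PER_CU = 2  # assumption: 1 CU = 2 Spark vCores


def equivalent_cluster(capacity_units: int, cores_per_node: int) -> tuple[int, int]:
    """Return (drivers, workers) for a cluster matching a Fabric capacity."""
    total_vcores = capacity_units * SPARK_VCORES_PER_CU
    nodes = total_vcores // cores_per_node
    # One node is reserved as the driver; the rest are workers.
    return 1, nodes - 1


# F64 = 64 CUs -> 128 vCores -> 16 eight-core nodes.
drivers, workers = equivalent_cluster(capacity_units=64, cores_per_node=8)
print(drivers, workers)
```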
The last over-utilization scenario would likely result in doubling the reserved capacity in Fabric to ensure your capacity can handle the workload. You would then be comparing $16,818 per month in Fabric against $13,735 per month in Azure Databricks.
In all scenarios above, unless you can perfectly align your consumption with your reserved capacity, Databricks will either be cheaper (when you under-consume) or much more stable (when you over-consume).
You will likely have quiet months, too, where you only utilize 10% of your capacity. In that case the cost difference is vast: $915 for Azure Databricks against the fixed cost of $8,409 for Fabric!
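A rough way to explore other utilization levels is sketched below. It uses the two figures quoted above ($8,409 fixed for Fabric, $915 for Databricks at 10%) and makes two simplifying assumptions that are ours: on-demand cost scales linearly with utilization, and any month above 100% forces a doubling of the Fabric capacity. The function names are hypothetical.

```python
FABRIC_MONTHLY_USD = 8409            # fixed monthly cost quoted above
DATABRICKS_AT_10_PERCENT_USD = 915   # on-demand cost quoted above for a 10% month


def fabric_cost(utilization: float) -> int:
    """Fixed cost; assumes the capacity is doubled once utilization passes 100%.

    Only models utilization up to 200%.
    """
    capacities = 1 if utilization <= 1.0 else 2
    return FABRIC_MONTHLY_USD * capacities


def databricks_cost(utilization: float) -> float:
    """On-demand cost, assuming it scales linearly from the 10% figure."""
    return DATABRICKS_AT_10_PERCENT_USD * (utilization / 0.10)


for u in (0.10, 0.50, 1.50):
    print(f"{u:>5.0%}  Fabric ${fabric_cost(u):>6,}  Databricks ${databricks_cost(u):>9,.0f}")
```

Real bills will not be perfectly linear, so treat the output as a way to reason about the shape of the two cost curves, not as a quote.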
Microsoft Fabric vs Azure Databricks
Microsoft Fabric’s fixed-cost subscription model appears enticing at first glance, offering an all-encompassing analytics solution. However, beneath the surface lies a complex interplay of capacity models, consumption throttling, borrowed workload capacity, and reservation discounts.
At a lower cost, Azure Databricks also provides additional features, including security and governance, data quality, monitoring, pipeline orchestration, and data science / LLM capabilities.
Analytics and ML workloads are not fixed month to month. Usage will vary greatly. With Microsoft Fabric, you either pay too much by underutilizing or see intermittent throttling, outdated dashboards, and users being locked out. At that point, you’ll likely have to double your capacity and then underutilize the next month.
When you inevitably choose Azure Databricks, Aimpoint Digital is the market leader to guide you to success. Our team of highly skilled analytics experts, data scientists, and engineers works closely with you to empower your team to harness the full potential of Databricks. We have a proven track record of success in delivering projects of all types, ranging from large-scale enterprise deployments of AI applications to accelerated data platform migrations, onboarding, and enablement.