Imagine this: you are a small business owner selling phone cases both online and through a kiosk in a shopping mall. Cost margins are very tight. Your credit card company sends you an email highlighting the advantages of its new, streamlined unified platform that reduces the fees associated with processing payments… What a pleasant surprise!
A few weeks later, you are balancing the books and notice only a small portion of online payments had reduced fees, and none of the in-store payments did. You call your credit card company, and they explain that reduced fees are only available for certain online transactions.
Welcome to the Microsoft Fabric OneLake dual pricing model. We will use this analogy to explain how what you pay depends on how you access your data.
In part one of our Microsoft Fabric series, we discussed the hidden costs associated with Microsoft Fabric. In this second part, we’ll explore the cost of accessing your data in Microsoft Fabric.
How Microsoft Fabric Works: The OneLake
Microsoft Fabric is a collection of integrated data services that allow you to store, manage and analyze your data. One of the main features of Fabric is its data lake solution – known as OneLake.
OneLake is built on ADLS Gen2, which is a cloud-based repository for structured and unstructured data. It comes bundled with Microsoft Fabric, and you only get one instance of OneLake. It provides file exploration like OneDrive; however, many ADLS features, such as SFTP, are not supported. This article provides a nice list of feature comparisons.
Understanding the cost implications of data access methods in OneLake is critical.
First, we will outline the difference between the access methods, then highlight some pitfalls when accessing OneLake outside of Fabric, and then show examples of accessing OneLake via Fabric that further highlight the dual pricing issue.
Two Data Access Methods on OneLake
Let’s dig into the details of how you actually get charged for accessing your data on OneLake. There are two main methods: Redirection and Proxy.
Per Microsoft, “Redirection is an implementation that reduces consumption of OneLake compute” for certain transactions on your data (like reading files hosted on OneLake), while Proxy covers essentially everything else that can’t use redirection (like writing files to OneLake).
To illustrate the difference, we will walk through both methods, first outside of Fabric and then inside it.
Accessing OneLake Outside of Fabric
Microsoft provides many options for accessing data on OneLake and touts it as “a single, unified, logical data lake for the whole organization.” What that pitch doesn’t show is that almost all of these access mechanisms go through the proxy method and will incur 3x the cost compared with accessing the same data via Fabric.
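To make the 3x figure concrete, here is a minimal sketch of how the same workload could be costed under each access path. The per-operation base rate and the workload size are hypothetical placeholders, not Microsoft's published rates, and the function name `estimate_cu_seconds` is ours; only the 3x multiplier comes from the claim above.

```python
# Hypothetical illustration: the base rate and workload are made-up placeholders.
# Only the 3x proxy multiplier reflects the claim made in this article.
PROXY_MULTIPLIER = 3.0


def estimate_cu_seconds(operation_count, base_cu_seconds_per_op, via_proxy):
    """Estimate CU seconds for a batch of identical OneLake operations."""
    multiplier = PROXY_MULTIPLIER if via_proxy else 1.0
    return operation_count * base_cu_seconds_per_op * multiplier


# Placeholder workload: 10,000 operations at an assumed 0.5 CU seconds each.
redirect_cost = estimate_cu_seconds(10_000, 0.5, via_proxy=False)
proxy_cost = estimate_cu_seconds(10_000, 0.5, via_proxy=True)

print(f"Redirect: {redirect_cost:.0f} CU seconds")
print(f"Proxy:    {proxy_cost:.0f} CU seconds")
```

Swapping in the actual consumption figures from your own Fabric Capacity Metrics app would show how much of your capacity the proxy path is absorbing.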
Why would you want to access your data on OneLake outside of Fabric? There are several reasons:
- You need to directly explore your files, like uploading and downloading.
- You need to connect to Databricks for advanced analytics, machine learning, and GenAI.
- You run custom applications that need to connect through the DFS API to Azure Data Lake Storage (ADLS).
Accessing OneLake from outside of Fabric is the in-store payment terminal in our analogy. Your credit card company told you its new, streamlined, unified platform would reduce the fees associated with processing payments, but it turns out that terminal payments do not use the new platform and do not have reduced fees.
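For the custom application case, external tools typically address OneLake through its ADLS-compatible DFS endpoint. The sketch below only builds such a URL; it assumes the `https://onelake.dfs.fabric.microsoft.com/<workspace>/<item>.<itemtype>/<path>` addressing scheme as we understand it from Microsoft's documentation, and the helper name, workspace, item, and file path are all hypothetical.

```python
from urllib.parse import quote

# Assumed OneLake DFS endpoint; verify against Microsoft's current documentation.
ONELAKE_DFS_ENDPOINT = "https://onelake.dfs.fabric.microsoft.com"


def onelake_dfs_url(workspace, item, item_type, path):
    """Build a URL of the form <endpoint>/<workspace>/<item>.<item_type>/<path>."""
    segments = [workspace, f"{item}.{item_type}", *path.strip("/").split("/")]
    # Percent-encode each segment so names containing spaces stay valid.
    return ONELAKE_DFS_ENDPOINT + "/" + "/".join(quote(s) for s in segments)


# Hypothetical workspace, lakehouse, and file names.
url = onelake_dfs_url("SalesWorkspace", "PhoneCases", "Lakehouse", "Files/orders/2024.csv")
print(url)
```

Any request an external application sends to a URL like this is the kind of access this article describes as being billed at the proxy rate.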
Accessing OneLake via Microsoft Fabric
Microsoft’s own example of uploading a file via Fabric and then reading it back illustrates this:
“Requests to OneLake, such as reading, writing, or listing, consumes your Fabric capacity. OneLake follows a similar mapping of APIs to operations like ADLS. The CU consumption per each type of operation can be viewed in the Fabric Capacity Metrics app. In our example, the file upload resulted in a write transaction that consumed 127.46 CU Seconds of Fabric Capacity. This consumption is reported as "OneLake Write via Proxy" in the operation name in Capacity Metrics App. Now let’s read this data using a Fabric notebook. You consume 1.39 CU Seconds of read transactions. This consumption is reported as "OneLake Read via Redirect" in the Metrics app. Refer to the OneLake consumption page to learn more about how each type of operation consumes capacity units.”
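The two figures in that quote can be put side by side with a few lines of arithmetic. This is a rough sketch: the CU-second values are taken from the quote, while the choice of an F2 capacity (which we assume supplies 2 capacity units, or 2 CU seconds per second) and the helper name `seconds_of_capacity` are ours.

```python
# Figures quoted from Microsoft's example above.
write_via_proxy_cu_s = 127.46
read_via_redirect_cu_s = 1.39

# Assumption: an F2 capacity supplies 2 capacity units (2 CU seconds per second).
F2_CAPACITY_UNITS = 2


def seconds_of_capacity(cu_seconds, capacity_units):
    """Seconds of the entire capacity that a CU-second charge represents."""
    return cu_seconds / capacity_units


ratio = write_via_proxy_cu_s / read_via_redirect_cu_s
write_seconds = seconds_of_capacity(write_via_proxy_cu_s, F2_CAPACITY_UNITS)
read_seconds = seconds_of_capacity(read_via_redirect_cu_s, F2_CAPACITY_UNITS)

print(f"Write via proxy consumed {ratio:.1f}x the read via redirect")
print(f"Write: {write_seconds:.2f} s of a full F2 capacity")
print(f"Read:  {read_seconds:.3f} s of a full F2 capacity")
```

The ratio is not a like-for-like comparison, since a write and a read are different operations, but it shows how differently two steps in the same Fabric workflow can land on your capacity bill.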
Back to the credit card analogy: only a portion of website payments can use the new technology that reduces fees. In the same way, only a portion of the transactions that access OneLake via Fabric run on the new mechanism (Read via Redirect); the rest still use the old method (Write via Proxy).
Taking a glance at the Wayback Machine, we can see that Microsoft was initially more explicit about these 3x costs. The previous language referred to “applications running inside of Fabric” and “applications running outside of Fabric environments.” The subtler language used now refers to “applications that redirect certain requests” and cases where “data is accessed using applications that proxy requests.”
Is OneLake as Unified as You Think?
This is another example of Microsoft cobbling together products, giving them a fancy name, and putting together an overly complex and hidden pricing model. It’s not clear if Microsoft is intentionally punishing its clients who do NOT use Fabric, but it seems egregious to label OneLake as a unified service for accessing your data and then charge you more to load files and access that data from external applications.
Our recommendation: if you are storing your data in Azure, keep using ADLS Gen2. That way you will have all the benefits of commodity blob storage with transparent, neutral pricing.
Aimpoint Digital has a proven track record of success in delivering projects of all types, ranging from large-scale enterprise deployments of AI applications to accelerated data platform migrations, onboarding, and enablement.