There are a number of strategies for masking data in Power BI – each with it’s own benefits or potential drawbacks.
The most secure way to mask data is to simply not include it in the data model. It would be removed during the ETL process by removing the column entirely using Power Query.
Of course the obvious drawback to removing the data prior to loading it in the data model is that it is not available for analysis.
Another secure way to mask data is using Object Level Security (OLS) or Row Level Security (RLS). These processes hide columns (OLS) or rows (RLS) by filtering the data as it loads into the report based on rules and role assignments. OLS and RLS are the standard bearers for masking data in most professional projects because they also essentially ‘remove’ the sensitive data from the report.
Let’s examine the impact of Row Level Security to a simple sales fact table.
By design RLS applied directly to this table (or a related UsersTable) would remove rows – based on the applied DAX rule – often leaving just the data relevant to the user signed into the report. Here’s how that might look when for Salesperson ‘Steve’.
However, using RLS in this manner can also impact calculations that might need to see “ALL” of the data. Consider what happens when we use RLS with a “% of Total” calculation.
When RLS is applied, rows associated with other sales persons are removed from the Sales fact table. This (of course) then throws off the calculation for [% of Total] because the formula no longer has access to those missing values.
In order to preserve this calculation we need to keep all the data in the Sales Fact table – and mask the linkage to each salesperson. This is where data masking comes in handy!
To implement this we first have to obscure the connection between sales amounts and the sales person’s name.
- Create a surrogate ID for each Sales Person in the Users Table.
- Merge the surrogate ID in to the Sales Fact Table – and remove any identifiable columns previously in the table.
- Remove any rules or relationships that might be impacting the Sales Fact Table when RLS is applied. We need ALL the data!
- Use DAX to “MASK” other sales persons based on context of who is logged in to Power BI.
The above formula works in tandem with an RLS rule on the Users Table.
The formula is included in a modified table along with the Surrogate ID. If no RLS is applied, the formula returns “Masked SalesPerson” for everyone and is removed from the Total row.
Once RLS is applied – USERPRINCIPALNAME() will filter the Users Table down to one row in order to hide the other surrogate IDs and the formula updates to show a name. (Keep in mind if this were a real report we would need to use an actual email address for USERPRINCIPALNAME() to work correctly in the service).
Of course this solution still “shows” the other salesperson values, even though the names are masked. What if the request is instead to build a solution that hides those values, but still preserves the [% of Total] measure?
We can solve this by writing DAX that returns a value to denote a user is logged in, and then use a visual level filter to remove the others from view.
Now we just have to filter the visual using a Visual Level Filter.
Ok – it should also be noted that users can potentially see and TURN OFF the visual level filter – so we would need to hide this element if this is something we don’t want happening.
However, look what happens when we turn off the RLS and get back to building the report….
The visual level filter is still active and has vanished the entire table. That’s going to be a bit frustrating over time – especially as a report author. So how can we fix that?
We can tweak the formula to also return a ‘filter-able’ value when RLS is turned off.
This updated formula above returns the number 500 when RLS is not applied.
This way, we can adjust our visual level filter with an OR clause for 1 or 500.
Success!! – now we don’t have to deal with our data disappearing during report development.
Another strategy for masking the non-viewing salesperson data would be to employ asterisks to simply replace the numbers.
This involves modifying the [Total Sales] and [% of Total] calculations to change to “**” when “Masked Salesperson” is showing.
These two solutions work well because we are able to piggyback on the ability of Power BI to ‘see’ who is logged in via the USERPRINCIPALNAME() measure. But what happens when that function is no longer available to us.
Masking Data for Content Embedded on a Public Website
This topic seems to go against all best practice when it comes to masking data.
Why would we want to mask data that is embedded on a public website?
A good example of this would be for organizations, like school districts, that are already charged with publishing data on public websites. They often publish data elements like state testing results but still need to obscure elements of the data when the counts are below a certain threshold, for privacy reasons.
Power BI is a great candidate for this reporting – but the dataset MUST be completely cleansed of absolutely any personal or identifiable data. While the underlying dataset is NOT visible to users interacting with the report, a determined actor could reveal the names of tables and databases using browser tools. so keep this in mind when building out anything that goes into the public domain.
It’s also worth noting that RLS is NOT available with content embedded on public website. In fact – Power BI hides the Embed to Public Website option if RLS is enabled on a report.
So with no RLS available – we have to use a different strategy to mask the data elements. For this task we also want to maintain the totals in our tables but remove any visible values that are less than 10. Let’s do this using the following table below that shows the count of students by race that took a basket weaving test.
To solve this problem – we need to use a modified DAX formula that swaps in asterisks any time the value is less than 10.
Pretty easy right!? We can extend this same logic to any accompanying formulas – by forcing them to return “**” if [Count Tested Masked] shows “**”.