Azure Purview is a new data governance and data catalog service that Microsoft released in public preview in December 2020. It enters the market as a low-cost alternative to other data governance programs and promises to provide unique data discoverability and data visibility across an organization’s entire data estate.
There are many ways to dig into this tool to better understand the value it can add to an organization. Instead of walking through every feature of the program, let’s frame it through a simple exercise that many people might experience.
Say we have been tasked with building a report and need access to some basic Employee data tables. Maybe we are also relatively new to the organization and haven’t connected to this data before.
How would you go about locating these data tables?
You’d probably start by asking someone you work with. Maybe you’d send an email to a seasoned data engineer or a colleague you worked with on a new project. Perhaps your organization has a dedicated help desk that can route your request to the proper person.
In many ways – Purview seeks to unpack this task – and to make an organization’s data estate more easily discoverable by those who need it.
In Purview, we could search the data estate on the keyword “employee”,
and on the results page, every data asset scanned in Purview that has “employee” as part of its name is returned.
How powerful is this? Granted, this is a demo list, but imagine being able to use the filters on the left panel to trim down the results until we arrive at a few target data tables.
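To make the search-then-filter idea concrete, here is a minimal sketch in Python. It does not call Purview at all: the catalog entries, the `Asset` class, and the `search` function are hypothetical stand-ins, assuming that a keyword search matches on asset name and that a filter simply narrows the hits by asset type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    asset_type: str  # the kind of filter value shown in a left-hand panel


# Hypothetical catalog entries; the names are illustrative only.
CATALOG = [
    Asset("Employee", "Azure SQL Table"),
    Asset("employee_raw.csv", "Azure Data Lake Storage Gen2"),
    Asset("EmployeeSalary", "Azure SQL Table"),
    Asset("Sales", "Azure SQL Table"),
]


def search(catalog, keyword, asset_type=None):
    """Return assets whose name contains the keyword (case-insensitive).

    If asset_type is given, keep only the hits of that type.
    """
    keyword = keyword.lower()
    hits = [a for a in catalog if keyword in a.name.lower()]
    if asset_type is not None:
        hits = [a for a in hits if a.asset_type == asset_type]
    return hits
```

Calling `search(CATALOG, "employee")` returns the three employee-named assets, and adding `asset_type="Azure SQL Table"` trims the list to the two SQL tables, which is roughly what the filters do for us on the results page.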
Let’s explore one of these tables a little more:
What opens next is a series of rich information windows into this single Azure SQL table. They include:
an OVERVIEW tab showing descriptions, schema classifications, asset hierarchy, and linked glossary terms;
a SCHEMA tab showing each column name and its classifications (including any classified PII);
a LINEAGE tab showing the captured path of data movement (I’ve rearranged the design to show all elements);
a CONTACTS tab listing who we can contact about this data;
and a RELATED tab to visualize other data elements in the same source.
Now let’s say that this is our target table, but we also notice that the same data has been moving through our Azure environment via various Azure Data Factory copy activities that have migrated it from a raw CSV file to a clean file and eventually to Azure SQL DB. If there isn’t a specified data mart, we could certainly reach out to someone on the CONTACTS tab to establish whether our target source is the correct one.
Let’s peek at the far right side of this lineage view, which shows the current end of the lineage line (where an ‘artifact’ copy activity remains visible but doesn’t actually do anything).
Suppose that we have decided to source our Employee data from either the Azure SQL DB or the Storage account (both pictured above). It’s a small file, so performance won’t suffer whether we pick the SQL or the ADLS location (remember, we are here to see Power BI assets in Purview).
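One way to think about what the lineage view is doing is as a walk over a graph of "source feeds destination" edges. The sketch below is only an illustration of that idea, not how Purview stores lineage: the edge list, the asset names, and the `upstream` function are all assumptions made up for this example.

```python
from collections import deque

# Hypothetical lineage edges (source, destination); names are illustrative only.
EDGES = [
    ("employee_raw.csv", "Copy raw to clean"),
    ("Copy raw to clean", "employee_clean.csv"),
    ("employee_clean.csv", "Copy clean to SQL"),
    ("Copy clean to SQL", "dbo.Employee"),
]


def upstream(edges, target):
    """Return every node that feeds `target`, nearest first (breadth-first)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen = set()
    order = []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order
```

Asking for `upstream(EDGES, "dbo.Employee")` walks back through both copy activities to the raw CSV file, which is the same story the LINEAGE tab tells visually.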
Let’s fast forward, and assume we created the following Power BI assets that included this employee data:
- A Dataflow moving the Employee data from Azure SQL to Power Query Online.
- A Power BI shared Dataset built off this dataflow.
- A Power BI Report built in Power BI Desktop and published to the service.
- A Power BI Dashboard summarizing a few Power BI reports (including the one we just specified).
What does this look like in Purview?
(Of course, this assumes that Purview has scanned our Power BI environment after we created these assets.)
We can ‘discover’ each of our Power BI assets in a variety of ways, but perhaps the most compelling is the lineage view, which tracks the movement of our employee data from dataflow to dashboard.
This is a pretty powerful view (and one that will likely only improve as Purview moves through its Preview releases). There are a few early limitations worth calling out:
Power BI assets lack a schema view, meaning we have no visibility into the columns or data included in any of these assets. Notice that the SCHEMA tab is missing from the list of page options.
In other assets, the schema view provides a detailed look at the classifications and columns included in every scanned table.
Unfortunately, our Power BI assets are missing this schema view in Purview’s early release. Let’s hope it’s added soon, as this is an essential part of tracking the flow of sensitive data into Power BI models (notice columns like SSN and salary included above!).
If we step back in the lineage to the Dataflow, we can see everywhere this dataflow has been utilized. We can also see how shared datasets are visualized in the same view.
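To illustrate why the missing schema matters, here is a rough sketch of the best we could do with asset-level lineage alone: flag every downstream asset as possibly carrying the sensitive columns of its source. Everything here is hypothetical (the asset names, the classification labels, and the `downstream` and `exposure` functions), and the result is deliberately conservative, since without column-level metadata we cannot tell which columns a dataset or report actually kept.

```python
from collections import deque

# Hypothetical column classifications on the scanned source table.
COLUMN_CLASSIFICATIONS = {
    "Azure SQL: Employee": {"SSN": "Government ID", "Salary": "Financial", "Department": None},
}

# Hypothetical asset-level lineage edges (source, destination).
EDGES = [
    ("Azure SQL: Employee", "Dataflow"),
    ("Dataflow", "Shared dataset"),
    ("Shared dataset", "Report"),
    ("Report", "Dashboard"),
]


def downstream(edges, start):
    """Return every node reachable from `start`, nearest first."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    seen, order, queue = set(), [], deque([start])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def exposure(edges, column_classifications):
    """Map each downstream asset to the classified columns it might carry."""
    result = {}
    for asset, columns in column_classifications.items():
        sensitive = {col for col, label in columns.items() if label}
        if not sensitive:
            continue
        for node in downstream(edges, asset):
            result.setdefault(node, set()).update(sensitive)
    return {node: sorted(cols) for node, cols in result.items()}
```

With the sample data, `exposure(EDGES, COLUMN_CLASSIFICATIONS)` flags the dataflow, dataset, report, and dashboard alike. A schema view on the Power BI assets would let us narrow that down to the assets that really include SSN or salary.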
We can expand the dataset, report, and dashboard icons to see options to navigate directly to those assets in Power BI (presumably, if we have access to them). Unfortunately, the “switch to asset” button is not available yet for the dataflow.
There is a similar view available to us in Power BI if we navigate to the workspace dataset page view and select “View Lineage”.
Here we can see the same assets we saw in Purview – except scoped to just the workspace PBI Testing Space. In Purview – we can see the movement of data ‘across’ multiple workspaces in one consolidated view.
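The difference between the two views can be sketched as the same edge list with and without a workspace filter. This is a simplification that treats the workspace view as a strict filter, and apart from the PBI Testing Space workspace name, every asset name and the second workspace are made up for illustration.

```python
# Hypothetical mapping of Power BI assets to their workspaces.
WORKSPACE_OF = {
    "Employee dataflow": "PBI Testing Space",
    "Employee dataset": "PBI Testing Space",
    "Employee report": "PBI Testing Space",
    "HR summary report": "Another Workspace",  # hypothetical second workspace
}

# Hypothetical lineage edges (source, destination).
EDGES = [
    ("Employee dataflow", "Employee dataset"),
    ("Employee dataset", "Employee report"),
    ("Employee dataset", "HR summary report"),
]


def lineage_view(edges, workspace_of, workspace=None):
    """Return lineage edges, optionally scoped to a single workspace.

    With workspace=None, every edge is returned (the consolidated view).
    Otherwise only edges whose two ends both live in that workspace are kept.
    """
    if workspace is None:
        return list(edges)
    return [
        (src, dst)
        for src, dst in edges
        if workspace_of.get(src) == workspace and workspace_of.get(dst) == workspace
    ]
```

Here `lineage_view(EDGES, WORKSPACE_OF, "PBI Testing Space")` drops the edge into the other workspace, while `lineage_view(EDGES, WORKSPACE_OF)` keeps all three, which is the consolidated picture Purview gives us.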
Notice that in the Power BI lineage view above we also see some extra items that aren’t visible in Purview, such as the Azure Blob Storage cell that is the source of the Sales_andEmployeeData dataflow.
The native Power BI lineage view also allows us to view column metadata (something that Purview hasn’t quite pulled in yet).
It’s likely only a matter of time before Purview updates and brings in this known Power BI metadata. The idea of one long unified view from raw data to Power BI dashboard is not quite available yet, but it is enticingly close.
If you haven’t had the occasion to explore the capabilities available in Purview – it’s worth a look. Those of you who used Power BI back in 2015 will remember a program that had great potential but lacked a lot of the core functionality and features that exist today (like RLS, variables, shared datasets/dataflows… etc).
Purview is the same way. There is extraordinary potential to it in the data catalog and data governance world (especially for organizations looking for a low-cost point of entry) with new features coming online each month.