Probably one of the most common DAX Patterns to explore is around new and returning customers to a business. This pattern is so wildly popular that it has its own webpage and a detailed breakdown of how to setup and execute these patterns.
According to the site – each solution has been optimized after numerous iterations and tests, but should also be vetted against our own requirements to ensure it produces an acceptable performance. I wanted to explore the ‘new customer’ part of this solution to better understand how each component of it works and to visualize it as we step forward.
Get a Sample DataSet
A simple approach to testing DAX on this pattern is to download a Power BI Desktop file with an AdventureWorks data model already built!
This one is based on the 2020 collection and comes with about 121,000 rows of Sales data and 18,000 customers.
This file also has the Server and Database names parameterized so if we want to be able to refresh it – we would need to install the AdventureWorks sample DB somewhere ‘locally’. For this exercise – we will simply use the PBIX file as-is.
Decision about Design
The patterns we are using today are largely informed by the design of the table and information we seek to have returned. Small changes in design will necessitate small changes in the DAX pattern to accommodate additional filter context requirements.
For this we will use a pattern driven by the Month dimension of the Date Table (which although named ‘Month’ is actually ‘Year & Month’ combined. This will be the foundation for all measures moving forward
We will also create a simple DISTINCTCOUNT measure that returns the unique number of Customers who made a purchase each Month, in order to give us a comparison point when NEW customers start to separate from RETURNING customers.
For this deep dive – we will explore how our New Customers DAX expression is impacted by various changes or filters on this base DATE table. As mentioned earlier – the type of formula we use should be informed by what filters/action we want to take on the table – so I would expect that we will reach the natural limits of this pattern before requiring some code changes.
Two Measure Strategy
One principal of this DAX pattern is that we will create two measures that will work in tandem. The internal measure is a helper-measure that will be called from inside the external measure, which is the work horse of this pattern – and ultimately what is loaded into the table.
Internal Measure: Date of First Purchase
It’s also good practice to ‘think’ about what we want to accomplish before diving into the DAX.
In this exercise – in order to know if a Customer is New – we need to know IF and WHEN they made their first purchase.
We can do this by looking at the sales table and returning the smallest (MIN) OrderDateKey for each customer. Additionally, we need this formula to look across the ENTIRE sales table – so we will need to remove any filters that might be applied from the DATE dimension.
Here are two correct formulas that do just that
Customer Date of First Purchase (all) = CALCULATE ( MIN ( Sales[OrderDateKey] ), ALL('Date') ) ---------- Customer Date of First Purchase (rf) = CALCULATE ( MIN ( Sales[OrderDateKey] ), REMOVEFILTERS ( 'Date' ) )
These formulas are exactly the same – save for how they script the removal of filters from the Date Table. We can test if these formulas are returning the correct result – by setting up a small table with Customer ID (from the Customer table) against these two formulas.
As with all things DAX related – there are typically MULTIPLE CORRECT ways to write a solution. So how do we know if one of these is better than the other?
Let’s run a simple test using Performance Analyzer:
Both test-tables refresh in about the same amount of time. For good measure – let’s copy the QUERY for each visual from performance analyzer and run it in DAX Studio to return the server timings.
Pasting the QUERY that includes ALL in DAX Studio – and running it (after clearing the cache) – shows us it takes about 10 ms total.
If we do the same with the Remove Filters expression – we find that they are still neck-in-neck at 10 ms Total.
Keep in mind that if I run this timing test 5 times concurrently (with clearing the cache) – I get a small variation in Total Time (9ms – 11ms). These are minor differences, so let’s keep both formulas handy to see if either has any measurable impact when we shift to running it inside of our external measure.
External Measure: Count of New Customers for Each Month
We want the number of customers that made their FIRST purchase in a given month. Since we cannot use our Internal Measure directly on the table, we have to create a virtual summary table in memory. This Virtual table will look nearly identical to the tables we built earlier while testing the (ALL) and (Remove Filters) internal measures. The only difference is that we must use the CustomerKey since this is already available to us in the Sales Table.
We build a Virtual table using DAX. The process is exactly the same as if we were building an ACTUAL calculated table using DAX.
We can create a table using the ADDCOLUMNS function, which requires us to specify a table in the first argument. Rather than specifying an actual table in our model, we will use a function that returns a “table”, such as VALUES which returns a 1 column table of the unique values in the column we specify.
Using this VALUES table, we can add a column using the internal measure from earlier.
If we want to visualize this virtual table, we can use the same formula to create an actual calculated table or paste it DAX Studio. In both cases, a 2 column table is returned.
Now consider the impact of Month Context
While these tests are productive in showing us the ‘virtual table’ they do not take in account the month context introduced by the actual table it is evaluated within. This limits the customerkeys in our VALUES expression to only those that are made a purchase in that particular month.
Interact with the Virtual Table
At this point we only want to consider customers that have a Date of first purchase in the same month as the month in context. We could write a filter expression that checks if the Date of First purchase in one of the days in the current month.
Here’s an example of what that might look like:
Since our virtual table already has a unique list of customer IDs, all we need to do is run a COUNTROWS of the filtered virtual table to find the number of new customers for each month.
Is this the “BEST” pattern?
If we compare our pattern for New Customers against one of the patterns posted on the DAX Patterns website, there are some major differences. The key take away from this comparison is to ensure that any pattern selected returns the correct values for our desired report needs, and is responsive and fast against our data model.
Let’s explore one of the solutions provided on the DAX Pattern site (for new customers) and compare it’s performance against our own on our simple sales data model.
Here’s an annotated version of one of the formulas:
The major difference between this solution and our previous solution is the use of TREATAS and KEEPFILTERS. Let’s dive into these two parts of the solution in a little more detail.
By definition, we use TREATAS to create a virtual relationship when a physical relationship is not available. The result of the expression is a table that contains all the rows in column(s) that are also in table expression – which in this example is our virtual table.
Let’s try to visualize what is happening in this step by generating a virtual table in DAX Studio for a single month (June 2019).
We would expect a virtual table for June 2019 to return 499 unique Customer Keys.
Now let’s see what happens when we visualize the second variable in our formula – which contains the TREATAS
What’s very clear at this point is that TREATAS swaps out the columns of our initial virtual table with the specified columns from the DATE and CUSTOMER table. This is especially obvious when comparing the column name of the DateKey column from the previous step to this step.
Now let’s introduce June 2019 context to this TREATAS measure. Keep in mind that this happens automatically when this measure is run in our target visual – but when executed in DAX Studio we have to ‘artificially’ provide this context.
The last variable of this DAX pattern uses KEEPFILTERS in a calculate expression. According to the documentation page, KEEPFILTERS returns results based on the intersection of the arguments used as the context for evaluating the expression.
In layman terms, this means that it only returns the results for the specified context and does not override others that aren’t relevant.
In our example, we can think of KEEPFILTERS as using the filtered TREATAS table (now at 196 rows) to filter a Calculate expression that would otherwise return 499 rows (based on our June 2019 snapshot).
There was a lot of thought (from the guys behind DAX Patterns) around optimizing the performance of this pattern, so while both formulas produce the correct answer, one of them is likely more performant than the other.
Let’s test our solution against the DAX Pattern to see if there is any measurable difference.
Additionally, both perform in a similar total amount of time when testing the query in DAX Studio
Ok, from our simple example, there does not appear to be any significant differences between these two formulas, but certainly if we continued this test against a full-sized model with millions of rows in the FACT table, the performance may start to diverge.