There are a lot of incredible data sources available online that can only be accessed via API requests. Getting to this data may seem a bit daunting, especially for those who may not have experience writing code to pull the data down. In this post we are going to unravel the process of using Power Query to make API requests, but first, two quick disclaimers about using API data in Power BI…
Should we import API data into Power Query? YES! – Power Query is VERY good at dealing with the complex hierarchical structure of API (JSON) data and can help speed up development, especially with new or unfamiliar API data sources.
Should we keep API-based data queries in production Power BI datasets? IT DEPENDS. API data should ultimately be migrated to a more permanent storage solution like a database, where it can persist as long as it is needed. Removing API requests from Power Query will also reduce the likelihood of a potential point of failure. However, there are certainly exceptions where report developers may want to hit the API at each and every refresh and aren’t as concerned with long term data persistence.
Disclaimer over….. Back to the data!
First things first. We need an API-enabled website.
How about: https://collegefootballdata.com/. I like this example because it’s free and requires the use of an API key for authentication.
This site also provides a few other pieces of information that we need when making API requests:
The base URL is the repeatable part of the API call that remains the same throughout.
A swagger page provides a detailed list of all of API calls available from this site – along with a list of parameters that can be included in each request. Today we are going to request player stats by season using the GET /stats/players/season call (pictured below).
This source additionally has a Try it out button which allows us to enter parameters and generates the API request URL automatically! Note that for this call – a YEAR parameter is required. Here is the request URL for 2020.
Before we jump over to Power Query – it’s best to know immediately if this API request is going to work. (For whatever reason, the try it out button on this site isn’t working fully) Instead, let’s use Postman, which is extremely easy to test API calls with. Just paste the URL above into a “GET” request – and under the Authorization tab – paste the API Key provided upon registration. That’s it! – Hit Send.
A working request will immediately return data and in this case a status of 200.
Now let’s translate this process to Power Query. We start with the GET Data from Web tool – and select Advanced. Paste the entire URL request in the URL parts section. Then in the HTTP request Header parameters (optional) cell – type “Authorization”. In the adjoining cell – type “Bearer” then insert a space and then paste the API Key.
Clicking OK will kick off the API request – and open Power Query with the returned results:
It works! More specifically – look at how well Power Query automatically handled the JSON format of the returned results. Granted this particular API call does not return a hierarchal structure (which would require a few more transformations to flatten) but with Power Query – we have easily imported API data directly.
Now there is an obvious problem to this strategy – and that has to do with how the API request limits us to pulling only 1 year at a time. How should we approach getting data from previous years?
The quickest strategy might be to simply ‘duplicate’ the first query and just change the year. This leads to us having 4 queries that must then be appended together. But what if we wanted 10 years of data…. or 20 years … or 50 years!?
There is a better way. Let’s parameterize it!
Open the Advanced Editor and find the part of the query that changes for each year. This is the part we are going to replace with a parameter.
To do so – we need to create a new Parameter:
Let’s name this parameter “Year”, make it type “Text” and give it a default value of 2020.
Next – let’s swap out the hard-coded year in the Advanced Editor with a reference to the parameter value
If this is done correctly – the query will continue to work as expected.
So then how do we take advantage of this newly parameterized API request? Or more specifically… how do we call more than 1 parameter at at time?
Answer: – Make the query a custom function!
It’s a good habit to prefix the name of a function with “fn”. Notice also that the Create Function window automatically detected the parameter value we want to associate with it.
After hitting OK – Power Query spins up a new Query Group containing our original query, the parameter, and the custom function.
To use the custom function – we can call it against a set of data that provides the parameter value. A good example of this would be creating a simple table with our 4 years listed.
The values in this table must be type text to pass correctly to the function and parameter. Next we select “Invoke Custom Function” and verify that our function and ‘parameter’ input column are selected.
Once we hit ok – the function will run against our input table and provide the API results for each year in a table.
We can expand the Stats column….
to show 4 years of API data being pulled in 1 step!
After a little cosmetic cleanup (updating column and query names) – it’s ready to be imported in the model. This particular API call of 4 years brought in over 323,000 rows of data!! Awesome!
This pattern of using column inputs to feed a custom function can be applied for a lot of different applications, considering how easy it is to convert a query to a custom function.