The Rise of A/B Testing – Building effective A/B tests

A/B Testing on the rise? Surely not, only last month I was bemoaning the fact that organisations weren’t doing rigorous A/B testing any longer. Was I lying? Misinformed? Or did I simply like the symmetry of the words “demise” and “rise” in my blog titles ;-)? The truth leans more toward the latter (I do like symmetry), but it does go deeper than that. The truth is not that A/B Testing is on the rise, but that it could and should be. In this blog, I want to discuss building effective A/B tests in the modern credit environment.

Without rewriting the entire blog from last month (The Demise of A/B Testing), here’s a recap. My concerns stem from the following:

  • A/B Testing has fallen out of popularity.
  • Organisations focus on scoring model improvements as a means of improving business quality, rather than strategy testing.
  • Focus has shifted to speed over rigour.
  • Regular strategy changes get applied across the whole base, whereas running multiple diverse tests on smaller segments of the population would be considered ‘best practice’.
  • Accountability for new strategies is negated, as new strategies cannot be directly compared with previous generations.

Do we need A/B Testing?

Let’s pause for a moment and consider the need for A/B Testing. Is it really required in the modern credit environment? As credit risk managers, we’ve been testing strategies for decades now. Surely, we know what the optimal strategies should look like? Well, that might be true if we were still operating in the credit environment of the 1990s or 2000s, but the reality is that the credit industry has changed a great deal since then.

Over the last few years, I’ve noticed a significant shift in my client and prospect base. With the advent of fintech, it’s all about exposing credit offerings to markets and populations that didn’t previously have access to credit, and doing so through different engagement models.

When the COVID pandemic struck, there was a massive shift in the credit industry to digitise credit granting. Small-value, short-term microlenders scrambled to create online application channels, enabling them to continue to grow their businesses at a time when face-to-face contact (and thus in-person credit application) was dangerous. The mobile telephone became the great technology leveller, widely available and widely used. Using this technology (particularly smartphones), organisations can reach broader population bases and gather a wide variety of data about potential customers.

Importantly, with increased digitisation came the need for decision engines that can assimilate and make sense of large quantities of data. Decision engines have allowed credit grantors to move from manual assessment of applications to a faster and more consistent automated approach.

The question now is: how do we learn what the appropriate credit offerings should be for an entirely new population that interacts through modern engagement channels? There is only so much that our previous experience will tell us, and there is often no historic data on which to model outcomes. We can gather a lot more data on potential customers, but it takes time before we can turn that data into usable information.

This is where A/B Testing should be making a comeback. And the great thing is, a lot of these newly digitised lenders are in an excellent position to capitalise on its benefits.

  • Digitally gathered data is easily processed in decision engines
  • Digital processing means that more credit decisions are automated
  • A good decision engine will have A/B testing built in from the ground up

With all the above in place, all that’s left is deciding where and how to start.
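To make the idea of built-in A/B testing a little more concrete, here is a minimal Python sketch of how applications might be split between a champion strategy and its challengers. The group names, the 80/10/10 split and the `assign_group` function are all assumptions made for illustration; this is not a description of how any particular decision engine does it. The one property worth noting is that the assignment is random across applicants but repeatable for the same applicant.

```python
import hashlib

# Hypothetical test design: group name -> share of applications (shares sum to 1.0).
TEST_GROUPS = {"champion": 0.80, "challenger_a": 0.10, "challenger_b": 0.10}


def assign_group(application_id: str, test_name: str,
                 groups: dict[str, float] = TEST_GROUPS) -> str:
    """Assign an application to a test group, pseudo-randomly but repeatably."""
    # Hashing the test name together with the ID gives each test its own split.
    digest = hashlib.sha256(f"{test_name}:{application_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in the range [0, 1).
    draw = int(digest[:8], 16) / 16**8
    cumulative = 0.0
    for name, share in groups.items():
        cumulative += share
        if draw < cumulative:
            return name
    # Guard against floating-point rounding at the final boundary.
    return list(groups)[-1]


print(assign_group("APP-000123", "limit_test_q1"))
```

Because the group is derived from the application ID rather than drawn afresh each time, a resubmitted application lands in the same group, which keeps the comparison between groups clean.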

How to decide what to test

My focus in this blog is going to be on new business originations. This doesn’t mean that A/B testing is less important in existing account management. In fact, some of the most profitable tests I have seen came out of the account management area. It’s simply that I need to start somewhere, and new credit granting is where we have the first (and arguably most important) interaction with the customer. Offering the correct products to the right applicants with the appropriate terms is key to building a strong credit portfolio.

Given that we need to rapidly learn what the appropriate credit granting strategy is, where should we start?

The first thing to do is consider all the levers that are available to us when granting a credit product. Here are a few examples:

  • At what risk level should I accept an applicant?
  • What products can I offer?
  • What size of facility should I offer?
  • How much should I charge?
  • What product repayment terms should we offer?
  • Are there any ancillary services that I can offer?

Every one of these levers provides opportunities for A/B Testing. Not all of them will be applicable for all credit products and all organisations, so start by creating a list of testing opportunities. Every test must be supported by enough volume to produce reliable results, and this limits the number of tests that can be run at once.

We must ensure that there are sufficient application volumes to support the tests that we run. There’s no point in running a test if we are not going to have faith in the observed results. It is far more beneficial to run one bold test that has definitive outcomes than several smaller tests that return marginal outcomes.
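A rough way to see why bold tests are easier to support is to estimate how many accounts each test group needs before a difference in an outcome rate can be detected. The sketch below uses the standard normal-approximation formula for comparing two proportions. The rates, the 5% significance level and the 80% power are illustrative assumptions, and `accounts_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def accounts_per_group(base_rate: float, test_rate: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate accounts needed per group to detect base_rate vs test_rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = base_rate * (1 - base_rate) + test_rate * (1 - test_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / (base_rate - test_rate) ** 2)


# Illustrative bad rates only: a bold change versus a marginal one.
bold = accounts_per_group(0.08, 0.11)
marginal = accounts_per_group(0.08, 0.085)
print(f"bold test: {bold} accounts per group")
print(f"marginal test: {marginal} accounts per group")
```

Under these assumptions the marginal test needs many times the volume of the bold one, which is the practical reason for preferring one decisive test over several timid ones when application volumes are limited.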

How to measure success?

Deciding which levers to test is only the start. Identifying the success criteria of each test is critical to its ability to succeed.

Each lever that we test may have different outcome metrics. For example, adjusting the risk level that we accept will have an impact on the bad rate of the accepted population. It will also have an impact on the take-up rate, as higher risk customers will be more likely to take up credit offers.

A/B refers to the two test metrics

When designing tests, consider using at least two metrics per test. One is the positive effect that you are trying to achieve (e.g. an increase in approval rate). The other is the negative trade-off that will come as a result (e.g. an increase in bad rate).
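As a small illustration of reporting the benefit and the trade-off side by side, the following Python sketch compares a champion and a challenger on approval rate and bad rate. The `GroupOutcome` class, the `compare` function and all the figures are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class GroupOutcome:
    applications: int
    approved: int
    bad: int  # approved accounts later classified as bad

    @property
    def approval_rate(self) -> float:
        return self.approved / self.applications

    @property
    def bad_rate(self) -> float:
        return self.bad / self.approved


def compare(champion: GroupOutcome, challenger: GroupOutcome) -> dict[str, float]:
    """Report the intended benefit and its trade-off together."""
    return {
        "approval_rate_uplift": challenger.approval_rate - champion.approval_rate,
        "bad_rate_increase": challenger.bad_rate - champion.bad_rate,
    }


# Hypothetical results for a test that lowers the score cut-off.
champion = GroupOutcome(applications=8000, approved=4000, bad=200)
challenger = GroupOutcome(applications=2000, approved=1200, bad=84)
result = compare(champion, challenger)
print(result)
```

Keeping both numbers in one result makes it harder to celebrate the uplift while overlooking what it cost.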

Importantly, try to find short-term outcomes that will be good proxies for longer-term performance. Using the example of adjusting risk cut-offs, we can measure performance risk by looking at the first payment default rate and vintage curves to determine the success of the strategy early. Tests are thus evaluated more quickly, and learnings are rolled out rapidly.
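One possible way to read an early proxy such as first payment default (FPD) is to check whether the difference between two test groups is larger than chance alone would explain. The sketch below applies a standard pooled two-proportion z-test. The counts are invented, `fpd_z_test` is a hypothetical name, and the choice of this particular test is my assumption rather than a prescribed method.

```python
from math import sqrt
from statistics import NormalDist


def fpd_z_test(fpd_a: int, n_a: int, fpd_b: int, n_b: int) -> tuple[float, float, float]:
    """Return both FPD rates and a two-sided p-value for their difference."""
    rate_a, rate_b = fpd_a / n_a, fpd_b / n_b
    pooled = (fpd_a + fpd_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, p_value


# Hypothetical counts: first payment defaults out of accounts booked per group.
rate_a, rate_b, p_value = fpd_z_test(fpd_a=120, n_a=4000, fpd_b=54, n_b=1200)
print(f"champion FPD {rate_a:.1%}, challenger FPD {rate_b:.1%}, p-value {p_value:.3f}")
```

A small p-value suggests the gap in FPD rates is unlikely to be noise, but FPD remains a proxy and should still be confirmed against the vintage curves as they mature.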

It’s a clever idea to focus each generation of tests around a single lever. For example, create several tests that vary the credit exposure to be granted by risk grade, with some tests offering more and some less. Empirical tools can then be used to analyse the performance of each risk grade in isolation across all exposure levels. The result may be that the winning strategy is a composite of all the tests: higher exposures may work better in some risk grades, while lower exposures may be more profitable in other risk grades.
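The composite idea can be sketched in a few lines of Python: for each risk grade, pick whichever exposure test performed best in that grade. The grades, test names and profit figures below are entirely hypothetical, and average profit per account is just one possible success measure.

```python
# Hypothetical results: (risk grade, exposure test) -> average profit per account.
results = {
    ("A", "lower"): 110.0, ("A", "current"): 140.0, ("A", "higher"): 175.0,
    ("B", "lower"): 80.0,  ("B", "current"): 95.0,  ("B", "higher"): 90.0,
    ("C", "lower"): 30.0,  ("C", "current"): 10.0,  ("C", "higher"): -45.0,
}


def composite_strategy(results: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each risk grade, keep the exposure test with the highest profit."""
    best: dict[str, str] = {}
    for (grade, test), profit in results.items():
        if grade not in best or profit > results[(grade, best[grade])]:
            best[grade] = test
    return best


winner = composite_strategy(results)
print(winner)
```

With these made-up figures, no single test wins everywhere: the lowest-risk grade does best with higher exposure, the middle grade with the current exposure, and the highest-risk grade with lower exposure.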

What about new scorecards?

In recent years, since the advent of rapidly redeployed scoring models, I have been asked several times about the A/B Testing of new scorecards. How should such tests be implemented, and why don’t we provide native support for A/B testing of scorecards within the ADEPT Decisions Platform? The answer is quite simple. The implementation of a new scorecard is not a test. Empirical scorecards are developed using tools that evaluate their predictive power directly. If your new scorecard has greater predictive power than the current one (specifically in the areas around your scorecard cut-off), then there is no need to implement the scorecard as a test. If the new scorecard performs better, it can be assigned to the entire application base. Implementing the new scorecard as an A/B test implies a lack of faith in the modelling software and the developer that built the model.

A/B Testing is about how the scorecard is used

Adjust the score cut-off to balance acceptance rate and bad rate. Vary the allocated credit exposure to maximise revenue and minimise bad debt. Vary pricing terms to entice lower risk customers. All these tests will mould the portfolio towards the requirements of the business. And each of them can be constantly improved upon with new generations of tests.

And as risk managers, isn’t this the primary purpose of what we do?

About the Author

Jarrod McElhinney is a Client Solutions Manager at ADEPT Decisions.

About ADEPT Decisions

We disrupt the status quo in the lending industry by providing lenders with customer decisioning, credit risk consulting and advanced analytics to level the playing field, promote financial inclusion and support a new generation of financial products.