How we built Success+ using our ethical machine learning programme
Last edited: April 2020 · 5 min read
Here at GoCardless, we recently launched a new product: Success+. This product launch is significant for privacy because it's our first product specifically built on our machine learning capabilities.
Success+ uses payment intelligence to improve the likelihood of payment success. At the moment, businesses using the intelligent retries feature are recovering around 76% of failed payments that would otherwise have been lost.
In my role as GoCardless’s data protection officer, I spend a lot of time helping the company think through not just our lawful use of data, but also its ethical use. That means I work closely with our data science team to make sure our machine learning models and features adhere to best practices for ethical machine learning and privacy by design.
This product launch gives me a great opportunity to share how we approach those principles, with the specific example of how we built them into Success+. Our proactive approach puts both data ethics and data privacy at its very core.
How does Success+ use data?
Success+ will use payments data to develop a suite of features designed to help merchants manage their payment failures. The first feature, intelligent retries, uses a machine learning model, trained on millions of recurring payments processed by GoCardless, to schedule failed payments for a retry on an optimal day for each customer.
Our algorithm uses the timing and characteristics of a payment to predict the likelihood of a retry attempt failing on each of a number of days in the future. We combine these predictions with each merchant's preferences to decide if, when and how many times to retry.
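The decision logic described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the `MerchantPrefs` fields and the `schedule_retry` function are hypothetical, and it assumes the model has already produced a failure probability for each candidate day.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MerchantPrefs:
    max_retries: int      # how many retry attempts the merchant allows
    max_days_ahead: int   # latest day (relative to the failure) a retry may happen

def schedule_retry(fail_probs: Dict[int, float], prefs: MerchantPrefs,
                   attempts_so_far: int) -> Optional[int]:
    """Pick the candidate day with the lowest predicted failure probability,
    respecting the merchant's retry preferences. Returns None if we should
    not retry at all."""
    if attempts_so_far >= prefs.max_retries:
        return None
    candidates = {d: p for d, p in fail_probs.items() if d <= prefs.max_days_ahead}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Example: the model predicts a failure probability for each of days 1-5
probs = {1: 0.9, 2: 0.7, 3: 0.4, 4: 0.55, 5: 0.6}
best_day = schedule_retry(probs, MerchantPrefs(max_retries=3, max_days_ahead=4), 0)
# Day 3 has the lowest predicted failure probability within the allowed window
```

In practice the "if, when and how many times" decision also folds in merchant-specific preferences beyond a simple window, but the shape of the decision is the same: rank candidate days by predicted outcome, then filter by constraints.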
Our privacy by design programme
When we started to build Success+, we took the steps you’d expect of a company that takes its data protection obligations seriously. Regulated under GDPR, we’re subject to its requirement to include privacy by design and by default.
We kicked off the Success+ project with a privacy impact assessment that allowed us to think critically about the potential risks and harms that a tool like this could have to individual rights and freedoms.
The way we do privacy impact assessments at GoCardless allows us to address two issues:
What do we need to do to make sure this product doesn’t violate the law?
What do we need to do to make sure this product doesn’t violate privacy and other fundamental rights?
Those two things are definitely interconnected, but thinking only about violating the law may not lead to the right outcomes. So we follow a model that doesn’t just ask questions about legal obligations, but requires those building the products to think critically about the potential for harm.
If you’re interested in how we do that, we use a taxonomy of privacy harms based on the Solove model, customised to GoCardless.
The ethics of machine learning
When we conduct a privacy impact assessment for machine learning capabilities, we make sure to consider the ethical implications of using machine learning in the product.
The data science programme at GoCardless follows widely recognised principles of ethical machine learning. The 'Big data, artificial intelligence, machine learning and data protection' report from the UK Information Commissioner's Office is an excellent starting point to understand what any business should consider when using machine learning.
The GoCardless data science team uses the 8 principles of responsible machine learning development, a practical framework for making sure the machine learning models we build incorporate principles of data ethics and protect privacy and other fundamental human rights.
How did we apply the ethical machine learning principles to Success+?
Using the 8 principles and conducting a privacy impact assessment helped us identify how to build and train the Success+ intelligent retries model, and how to deploy the product in a way that protects the privacy and fundamental rights of our users. Let me explain that in more detail using seven of the eight principles below. The eighth, displacement strategy, turned out not to be applicable to Success+, as no employees were displaced in the creation of this programme.
1. Trust by privacy
We want our customers to trust us with the use of their data. Our privacy notice discloses to payers how we may use the data to increase payment rates and drive improvements in our service.
2. Data risk awareness
We make sure to secure and minimise the data our machine learning models use. By default, GoCardless employees can only access anonymised datasets; we limit who within GoCardless can access the raw personal data that powers our models. Where personal data access is authorised, we continuously monitor its use.
We continue to take steps to minimise the use of personal data in our machine learning models, for example through techniques like differential privacy. A number of powerful features our models use are created from individual-level data; however, we ensure that these features are combined with population averages, weighted by the amount of individual-level data available.
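One common way to combine individual-level features with population averages, weighted by how much individual data exists, is additive smoothing (a simple empirical-Bayes-style shrinkage). The function below is an illustrative sketch, not a description of our actual feature pipeline; `prior_strength` is a hypothetical tuning parameter.

```python
def smoothed_rate(successes: int, attempts: int,
                  population_rate: float, prior_strength: float = 20.0) -> float:
    """Blend an individual's observed success rate with the population average.
    With little individual data, the estimate stays close to the population
    rate; with lots of data, it approaches the individual's own rate."""
    return (successes + prior_strength * population_rate) / (attempts + prior_strength)

# A customer with a single observed payment barely moves off the population rate,
# while a customer with 100 payments is dominated by their own history.
new_customer = smoothed_rate(1, 1, population_rate=0.6)      # close to 0.6
seasoned_customer = smoothed_rate(90, 100, population_rate=0.6)  # close to 0.9
```

The practical benefit is that no single individual's sparse history dominates a feature value, which helps with both model stability and data minimisation.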
3. Explainability by justification

We can explain how our machine learning models make decisions. We prefer to use signals in our algorithms that are based on an in-depth understanding of the problem at hand, rather than relying on the pure complexity of our modelling techniques to achieve performance. This helps us understand what is behind each signal and its importance to the model.
We also use tools such as SHAP (SHapley Additive exPlanations), which allow for understanding of the effect of features at an individual prediction level.
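To make the idea behind SHAP concrete, here is a self-contained sketch that computes exact Shapley attributions for one prediction of a tiny model, by enumerating feature coalitions and filling in missing features with baseline values. This is the principle the SHAP library approximates efficiently for real models; the function names and the toy model are illustrative, not GoCardless's actual model.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for a single prediction of a small model.
    Features outside a coalition are replaced by baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                # Classic Shapley weight for a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Toy "failure score": a linear combination of two hypothetical payment features
predict = lambda f: 0.5 * f[0] + 2.0 * f[1]
phi = shapley_values(predict, x=[4.0, 1.0], baseline=[2.0, 0.0])
# For a linear model, each attribution is coefficient * (value - baseline),
# and the attributions sum to the gap between this prediction and the baseline.
```

The additivity property shown in the last comment is what makes these attributions useful at the individual prediction level: every unit of the model's output is accounted for by some feature.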
4. Reproducible operations

Reproducibility in our development environment is key to checking and understanding the model's performance and its decisions. We have ensured that we can reproduce our model's decisions where possible, and as we mature, we are looking to securely archive data from each step of our transformation pipelines to improve this further.
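One simple building block for this kind of reproducibility is content-addressing pipeline steps, so a model run can later be matched to exactly the configuration and inputs that produced it. The sketch below is a hypothetical illustration of that idea, not our actual tooling.

```python
import hashlib
import json

def artifact_fingerprint(config: dict, data_rows: list) -> str:
    """Deterministic content hash of a pipeline step's config and input data.
    Archiving this alongside the step's output lets you verify later that a
    rerun used exactly the same inputs."""
    payload = json.dumps({"config": config, "data": data_rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

fp = artifact_fingerprint({"model": "retries-v1", "seed": 42}, [[1, 0.9], [2, 0.7]])
# The same config and data always produce the same fingerprint;
# any change to either produces a different one.
```

Because the JSON is serialised with sorted keys, the hash is stable across runs regardless of dictionary ordering.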
5. Practical accuracy
Chasing improvements in simple accuracy metrics can be tempting, but those metrics often fail to capture other outcomes that result from the algorithm's decisions.
We use machine learning to make decisions in complex environments with multiple outcomes, so we monitor the overall environment the model affects. We check not only the accuracy of predictions, but also changes in the various outcomes that result from those predictions, and any (sometimes unexpected) longer-term effects those decisions can have on our customers.
6. Bias evaluation
We are aware that there are biases in the data used for model training, which can result in biased decision making. We try to understand, document and monitor these biases, and use experimental approaches to measure them. For example, we can compare the decisions made by a model trained on payments processed on days selected by merchants (which may carry societal bias) with one trained on randomly selected days.
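The experimental comparison described above can be sketched very simply: hold out a cohort whose retry day was assigned at random, and compare its outcomes with the merchant-selected cohort. The data below is made up purely for illustration.

```python
def cohort_success_rate(payments: list) -> float:
    """Fraction of payments in a cohort that ultimately succeeded."""
    return sum(p["succeeded"] for p in payments) / len(payments)

# Hypothetical experiment: payments retried on merchant-chosen days
# versus payments retried on randomly assigned days.
merchant_chosen = [{"succeeded": s} for s in [1, 1, 0, 1, 0, 1, 1, 1]]
randomised = [{"succeeded": s} for s in [1, 0, 0, 1, 0, 1, 0, 1]]

gap = cohort_success_rate(merchant_chosen) - cohort_success_rate(randomised)
# A persistent gap between the cohorts suggests the merchant-chosen training
# data carries selection bias that a model trained on it could learn.
```

A real evaluation would of course use far larger cohorts and a significance test on the gap, but the structure of the comparison is the same.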
7. Human intervention
If I can use another machine learning example, GoCardless's algorithms are great at helping us identify and prevent cases of fraud on our system. But they are no replacement for the Fraud team, who ensure that our processes are fair and that their performance is aligned with our objectives.
With Success+, we carefully considered if the intelligent retry model’s decisions could adversely affect a payer. For example, retrying a payment unsuccessfully could mean the payer incurs extra costs. We are planning to introduce a process that weighs this risk with the expected gain.
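A process that weighs payer risk against expected gain could take the shape of a simple expected-value gate, as in the hypothetical sketch below (the function and parameters are illustrative, not a description of the process we will actually ship).

```python
def should_retry(p_success: float, payment_value: float,
                 payer_failure_cost: float) -> bool:
    """Retry only when the expected recovered value outweighs the expected
    cost a failed retry could impose on the payer (e.g. bank charges)."""
    expected_gain = p_success * payment_value
    expected_payer_harm = (1 - p_success) * payer_failure_cost
    return expected_gain > expected_payer_harm

# A likely-to-succeed retry of a meaningful payment clears the bar...
should_retry(p_success=0.6, payment_value=50.0, payer_failure_cost=10.0)
# ...while a long-shot retry of a small payment, where a failure would cost
# the payer more than the payment is worth, does not.
should_retry(p_success=0.05, payment_value=10.0, payer_failure_cost=25.0)
```

The point of framing it this way is that the payer's downside appears explicitly in the decision, rather than being an afterthought to the merchant's recovery rate.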
Launching the product at global scale also means a greater risk of payers being impacted, because our algorithm has primarily been tested on Bacs (GBP) payments. Because of this, we keep a human in the loop when training, deploying and monitoring the model while it's in action.
These controls give GoCardless and all of our team a guide on how to always put privacy at the forefront of any future product development that involves machine learning.
But the very nature of machine learning inevitably means new considerations will come into focus as we explore new products and try to solve different problems. And that means we'll always be evolving our controls too.

Find out more about Success+ here. A big thank you to Liucija in our data science team for helping me pull together and explain a lot of the information in this blog post.