Last edited: Mar 2023 (3 min read)
Data scientists now have a new source of relevant information: open banking. Open banking is a banking practice that securely gives regulated third-party financial service providers access to a customer's financial information.
This enables access and sharing of account information as well as data regarding payment transactions and investments.
Open banking can bring several opportunities for app development and business. When it comes to data science, open banking data offers a lot of new information that can be used to improve models, such as those used to determine creditworthiness through scoring.
In this article, we’ll explore some fundamental concepts that will help us understand how open banking can help data people develop new skills and new products.
What is the difference between AISP and PISP?
Open banking can be characterised by two types of companies that are authorised to provide open banking services:
AISPs (Account Information Service Providers): these companies have “read-only” access to a customer's bank account, meaning they can see account information but cannot act on the account. An AISP cannot, for example, move money between accounts.
PISPs (Payment Initiation Service Providers): these service providers can initiate payment transactions on behalf of a customer, and can take payment directly from a bank account if they have the proper authorisation.
While AISPs can only provide access to financial data and act as an intermediary between financial institutions, PISPs can also use online banking to make instant online payments on behalf of customers.
To share data, financial institutions rely on application programming interfaces (APIs) and informed consent from the customers is a must. To comply with regulations, all third-party financial service providers have to follow data protection legislation and guidelines, including informing customers about which data is being used, for what purpose, and for how long.
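One way to picture the consent requirement is as a small record that captures which data is shared, for what purpose, and for how long. The sketch below is illustrative only: the `Consent` class, its field names, and the scope strings are hypothetical and do not correspond to any particular open banking API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Consent:
    """Hypothetical consent record: what data, why, and for how long."""
    customer_id: str
    data_scopes: frozenset   # e.g. {"transactions"}; scope names are assumed
    purpose: str
    granted_at: datetime
    valid_for: timedelta

    def allows(self, scope: str, at: datetime) -> bool:
        """True only if the scope was granted and the consent has not expired."""
        within_period = self.granted_at <= at < self.granted_at + self.valid_for
        return within_period and scope in self.data_scopes


# Example values are arbitrary and chosen only for illustration.
consent = Consent(
    customer_id="customer-1",
    data_scopes=frozenset({"transactions"}),
    purpose="credit scoring",
    granted_at=datetime(2023, 3, 1, tzinfo=timezone.utc),
    valid_for=timedelta(days=90),
)
```

Checking every data request against a record like this keeps the purpose and duration the customer agreed to next to the data access itself.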
Sharing financial data does not come without some security roadblocks. However, authorised third parties are supervised by the Financial Conduct Authority (FCA) in the UK, or by the relevant national regulator in each European Member State. Authorised third-party companies appear on the FCA's Register, the Open Banking Directory, or both.
Data science challenges and opportunities: dealing with a new data source
Open banking data comes with some data science challenges. Many of them overlap with the challenges developers face when building open banking applications.
Open banking data: consistency and standardisation
Although an open banking standard defines how open banking data is returned, the data still has consistency and standardisation issues. The content can vary, and different banks may present the same information in slightly different ways. For example, one bank may present a date with letters and numbers, while others may only use numbers.
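The date example can be sketched in a few lines of Python. This is a minimal illustration, assuming a small, hypothetical list of formats; real feeds may use others, and ambiguous numeric dates (day-first versus month-first) need a per-bank rule rather than guesswork.

```python
from datetime import date, datetime

# Hypothetical set of formats seen across banks; extend as needed.
# "%b" and "%B" parse month names and assume an English locale.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%d %B %Y")


def normalise_date(raw: str) -> date:
    """Try each known format in turn and return a single date type."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"Unrecognised date format: {raw!r}")


# All three inputs describe the same day.
samples = ["2023-03-20", "20/03/2023", "20 Mar 2023"]
normalised = [normalise_date(s) for s in samples]
```

Raising an error on unknown formats, rather than silently guessing, makes new bank-specific variants visible as soon as they appear.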
Open banking data also stems from multiple sources and can be multilingual. Depending on the country the bank account is from, or the language the user has chosen, data is likely to be organised differently, with different parts of the financial data in different languages.
There are also various transaction sources and point-of-sale terminals, operated through various banks. This means that no single pattern can be applied to every transaction.
Data scientists can use these challenges as an opportunity to apply existing natural language processing techniques, as well as research and develop new algorithms and approaches.
Open banking data: pattern recognition
In terms of pattern recognition, data scientists must understand two kinds of patterns: text-based patterns, found in recurring transactions with similar descriptions, and time series-based patterns, found in transactions that always occur at the same point in time.
The challenge is to recognise the pattern itself and filter out noise in the dataset.
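As a rough sketch of how the two pattern types can be combined, the example below groups transactions by a normalised description (the text pattern) and keeps only groups whose gaps between dates are nearly constant (the time pattern). The function names, thresholds, and sample data are assumptions made for illustration, not a production method.

```python
import re
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def normalise_description(text: str) -> str:
    """Lower-case and drop digits/punctuation so 'NETFLIX 0123' matches 'Netflix 0223'."""
    letters_only = re.sub(r"[^a-z ]+", " ", text.lower())
    return " ".join(letters_only.split())


def find_recurring(transactions, min_occurrences=3, max_jitter_days=3.0):
    """Return {normalised description: mean gap in days} for regularly spaced groups."""
    groups = defaultdict(list)
    for description, booked_on in transactions:
        groups[normalise_description(description)].append(booked_on)

    recurring = {}
    for key, dates in groups.items():
        if len(dates) < min_occurrences:
            continue  # too few observations to call it a pattern
        dates.sort()
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        # A small spread in the gaps means the transaction repeats on a schedule.
        if pstdev(gaps) <= max_jitter_days:
            recurring[key] = mean(gaps)
    return recurring


sample = [
    ("NETFLIX 0123", date(2023, 1, 15)),
    ("Netflix 0223", date(2023, 2, 15)),
    ("NETFLIX 0323", date(2023, 3, 15)),
    ("Corner Cafe", date(2023, 1, 3)),
    ("Corner Cafe", date(2023, 1, 5)),
    ("Corner Cafe", date(2023, 3, 1)),
]
result = find_recurring(sample)  # {'netflix': 29.5}
```

The cafe purchases share a description but are irregularly spaced, so they are treated as noise rather than as a recurring payment.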
Open banking data: personally identifiable information
Another thing we have to think about is personally identifiable information (PII).
Open banking data contains a lot of PII, which can be tricky to remove. It can be easy to remove the personal data of the account holder, such as a person’s name, IBAN and phone number. Personal data of others, however, can become a challenge: a data scientist must recognise, for example, that a name in a transaction belongs to a private person and not to a merchant.
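The easier half of this problem, masking the account holder's own details, might look like the sketch below. The regular expressions are deliberately simplified assumptions: they do not validate IBAN checksums, and the phone pattern can also match other long digit runs such as reference numbers. Recognising third-party names is not covered here and generally needs more than pattern matching.

```python
import re

# Simplified, illustrative patterns (assumptions, not validated formats).
IBAN_PATTERN = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)
PHONE_PATTERN = re.compile(r"(?<!\w)\+?\d[\d \-]{7,14}\d(?!\w)")


def mask_pii(description: str, holder_name: str) -> str:
    """Replace IBANs, phone-like numbers, and the holder's name with placeholders."""
    masked = IBAN_PATTERN.sub("[IBAN]", description)
    masked = PHONE_PATTERN.sub("[PHONE]", masked)
    if holder_name:
        # Case-insensitive, literal match on the known account holder name.
        masked = re.sub(re.escape(holder_name), "[HOLDER]", masked, flags=re.IGNORECASE)
    return masked


raw = "Transfer from Jane Doe GB29NWBK60161331926819 call +44 20 7946 0958"
masked = mask_pii(raw, "Jane Doe")
# masked == "Transfer from [HOLDER] [IBAN] call [PHONE]"
```

IBANs are masked before phone numbers so that the digits inside an IBAN are not partially replaced by the phone pattern.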
Open banking: data cleaning and data enrichment
As a result of these challenges, data scientists must go through extensive data preparation, including data cleaning and data enrichment.
Data cleaning identifies and removes errors and inconsistencies in data, while data enrichment enhances the existing information by supplementing or adding missing data to a dataset.
Data enrichment can add context to each banking transaction, such as on which platform the transaction took place and what category the platform falls under.
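A toy version of these two steps is sketched below: cleaning strips noise from the raw description, and enrichment attaches a merchant and category from a lookup table. The `MERCHANT_DIRECTORY` contents, the noise words, and the class name are all hypothetical; real enrichment services rely on far larger reference data.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrichedTransaction:
    raw_description: str
    clean_description: str
    merchant: Optional[str]
    category: Optional[str]


# Hypothetical lookup table standing in for a real merchant database.
MERCHANT_DIRECTORY = {
    "netflix": ("Netflix", "Entertainment"),
    "tesco": ("Tesco", "Groceries"),
}


def clean_description(raw: str) -> str:
    """Cleaning: lower-case, drop assumed noise words, digits, and punctuation."""
    text = raw.lower()
    text = re.sub(r"\b(card payment|pos|ref)\b", " ", text)
    text = re.sub(r"[^a-z ]+", " ", text)
    return " ".join(text.split())


def enrich(raw: str) -> EnrichedTransaction:
    """Enrichment: add merchant and category when a known keyword is present."""
    cleaned = clean_description(raw)
    words = cleaned.split()
    for keyword, (merchant, category) in MERCHANT_DIRECTORY.items():
        if keyword in words:
            return EnrichedTransaction(raw, cleaned, merchant, category)
    # Unknown merchants keep the cleaned text but get no added context.
    return EnrichedTransaction(raw, cleaned, None, None)


example = enrich("POS TESCO STORES 3297 LONDON")
# example.clean_description == "tesco stores london"
# example.merchant == "Tesco", example.category == "Groceries"
```

Keeping the raw description alongside the cleaned and enriched fields means the original record can always be re-processed when the lookup data improves.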
How can open banking be used to clean and enrich transaction data?
Clean transaction data can have multiple use cases, and open banking has the potential to help you improve your business offerings. Recognising the main purpose of each transaction, based on its description, amount, date, and contextual metadata, is essential.
Open banking data services can help you organise data points and transaction information, specifying business categories and transaction purposes. These services can usually do this quickly, cleaning information to provide quality data that is simple to understand.
Additional details on merchants, like logos, websites, or even location, can help bring better products to market. Raw, unstructured text isn’t useful on its own, but detailed and organised information is priceless.