
Enrichment and categorization models: Unlocking the power of transaction data

Written by GoCardless

Last edited Mar 2025

In today’s competitive financial landscape, accurate transaction categorization is critical for organizations that are seeking to lower credit risk, optimize operations, and unlock new growth opportunities. 

What is financial transaction categorization?

The process of categorizing financial transactions offers insights into customers' income, expenditures, and credit obligations, forming the backbone of personalized lending solutions and other financial services.

However, creating a robust categorization engine is no small feat. It requires structured, clean data from across various institutions, an area where GoCardless’ Account Information Service (AIS) powered product, Bank Account Data, excels.

While GoCardless doesn’t offer an out-of-the-box financial transaction categorization engine, our Bank Account Data product delivers reliable data needed to develop your own solution, making sure you have full control of the design so it fits the specific needs of your business.

Let’s explore the business problems driving this need, the pros and cons of in-house vs. third-party solutions, and how funding and financial insight platform re:cap used GoCardless to generate powerful cash management insights for its customers. 

Ready to build your model on top of the GoCardless data?

Securely access your users’ bank account information for better lending.


Weighing up your options: build or buy?

When deciding on a categorization engine solution, organizations often face a critical choice: build or buy.

Building in-house 

Developing a categorization engine internally allows for complete customization so you can meet any unique business needs, while seamlessly integrating it with any other existing systems. This approach also gives you full control over the engine's evolution and adaptability. 

Context matters in transaction categorization because classifying transactions is highly subjective. The right category often depends on the specific use case and on whose data you are analysing, especially when categorising business bank account data. For example, a purchase at a depot store might mean office supplies for a financial services company, but raw materials for a construction business. Building in-house empowers companies to design a categorization engine according to their unique perspective, making sure it aligns with their business goals and strategies.
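To make the point about context concrete, here is a minimal sketch of an industry-aware lookup. Everything in it (the `INDUSTRY_RULES` table, the `Transaction` fields, and the category names) is a hypothetical illustration, not part of any GoCardless product or API.

```python
from dataclasses import dataclass

# Hypothetical mapping: (merchant type, industry of the account holder) -> category.
INDUSTRY_RULES = {
    ("depot_store", "financial_services"): "office_supplies",
    ("depot_store", "construction"): "raw_materials",
}


@dataclass(frozen=True)
class Transaction:
    description: str
    merchant_type: str  # assumed to be known already, e.g. from merchant enrichment
    amount: float


def categorize(tx: Transaction, industry: str) -> str:
    """Return an industry-specific category, or 'uncategorized' if no rule matches."""
    return INDUSTRY_RULES.get((tx.merchant_type, industry), "uncategorized")


purchase = Transaction("DEPOT STORE 0421", "depot_store", -184.50)
print(categorize(purchase, "financial_services"))  # office_supplies
print(categorize(purchase, "construction"))        # raw_materials
```

The same transaction lands in two different categories depending on who is looking at it, which is the kind of business-specific logic an off-the-shelf engine may not let you express.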

While the benefits are substantial, building in-house requires significant resources, including specialized teams, robust infrastructure, and ongoing maintenance. However, for companies seeking long-term scalability and tailored solutions, building in-house is often the best choice.

Buying a third-party solution

Third-party categorization engines offer convenience and a faster time-to-market, but can lack the flexibility to adapt to unique business models or niche market needs. They also often depend on pre-defined parameters and models, which may not align with a company’s specific goals and are difficult (or often impossible) to adjust.

Organizations building in-house categorization engines need high-quality, standardised input data as a foundation. This is where GoCardless’ Bank Account Data can help.

How re:cap built their in-house data categorization model with GoCardless

Re:cap is a financial insights and funding platform for fast-growing companies. As it expanded its services, it was looking to optimise its lending operations. Faced with several challenges, re:cap made a strategic decision to integrate GoCardless Bank Account Data to access its customers’ account data, and to invest in building an in-house categorization engine to retain full control over its credit assessment process. This resulted in:

  • Lower credit risk: Without an accurate categorization engine, assessing borrowers' risk is inefficient and error-prone. Re:cap improved its risk processes by using GoCardless’ Bank Account Data, which enhanced transaction categorizations and reduced the likelihood of approving high-risk borrowers.

  • Optimize operations: Manual categorization of financial transactions can take up a lot of time and resources. Re:cap streamlined its workflows by integrating GoCardless, automating categorization and allowing staff to focus on higher-value tasks, thus boosting productivity.

  • Expand into new markets: Different transaction formats, currencies, and languages can create roadblocks. GoCardless enabled re:cap to navigate these complexities effectively, providing accurate classifications regardless of locale or currency, facilitating smoother market entry.

“We built our models in-house because we weren't satisfied with third party solutions that tended to be trained by people who didn’t understand the complexities of our data as well as our experts. We benchmarked them against others in the market and ours are more accurate due to this expertise,” says Jonas Tebbe, Co-founder & CPO, re:cap.

“The better the data, the better we can train our model to automatically categorise spending and the less manual work there is for us or customers,” says Fritz Finken.

With GoCardless’ Bank Account Data, re:cap is now able to deliver a deep, real-time understanding of financial health, cashflow runway, and spending trends, as well as the ability to manage risk in its financing portfolio. You can read more about how re:cap use GoCardless in our customer story.

Ultimately, the decision to build or buy depends on the company’s priorities, available resources, and long-term goals. For a company like re:cap, building an in-house categorization engine with the right data partner provides the flexibility and scalability needed to compete in new markets.

The evolution of categorization engines

Transaction categorization has evolved significantly over time, transitioning from simple rule-based approaches to cutting-edge Generative AI (GenAI) models:

Rule-based systems

  • Description: Early categorization relies on manual rules or simple keyword matching (e.g., "rent" or "salary").

  • Use case: Useful for gaining general insights on spend categories when high categorization accuracy isn’t paramount.

  • Benefits: Straightforward, cost-effective, and quick to implement.

  • Challenges:

    • Limited accuracy: Keywords don’t capture context, leading to categorization errors. For example, the term "salary" could indicate income from a job or a reimbursement from a friend.

    • Scalability and localisation errors: A keyword like "gift" means a present in English but "poison" in German, leading to potential misclassifications in multilingual datasets.

  • Conclusion: Quickly categorises transactions in a local market but struggles with multilingual or ambiguous data.
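A rule-based categorizer of the kind described above can be sketched in a few lines. The rule list and category names are assumptions made up for illustration; the point is to show both how quick such a system is to build and where it goes wrong.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
KEYWORD_RULES = [
    ("salary", "income"),
    ("rent", "housing"),
    ("gift", "gifts"),
]


def categorize_by_keyword(description: str) -> str:
    """Match whole words in the description against the keyword rules."""
    words = set(re.findall(r"[a-zäöüß]+", description.lower()))
    for keyword, category in KEYWORD_RULES:
        if keyword in words:
            return category
    return "uncategorized"


print(categorize_by_keyword("Salary March Employer X"))   # income
print(categorize_by_keyword("Salary transfer - Friend"))  # income (context is lost)
print(categorize_by_keyword("Gift gegen Ratten"))         # gifts (German "Gift" is poison)
print(categorize_by_keyword("Parent association fee"))    # uncategorized
```

The second and third calls return a confident but wrong answer: the rules see the keyword and nothing else, which is the accuracy and localisation problem in practice.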

Statistical models

  • Description: Uses historical data and statistical techniques to identify patterns.

  • Use case: Effective when predicting specific, well-defined business outcomes (e.g. the likelihood of loan default) using bank account data.

  • Benefits:

    • Data-driven and predictive.

    • Can identify recurring expenses or seasonal trends.

  • Challenges:

    • Requires clean, high-quality data.

    • Ineffective with novel or ambiguous transactions.

  • Conclusion: May provide richer insights than keyword-based systems but demands ongoing data maintenance.
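As one illustration of a statistical approach, the sketch below flags a series of payments as recurring when the gaps between them are roughly constant. The function name, the thresholds, and the sample dates are all assumptions chosen for the example; a production model would need tuning against real data.

```python
from datetime import date
from statistics import pstdev


def is_recurring(dates: list[date], max_jitter_days: float = 3.0,
                 min_occurrences: int = 3) -> bool:
    """Return True if the payments arrive at roughly regular intervals."""
    if len(dates) < min_occurrences:
        return False
    ordered = sorted(dates)
    # Days between consecutive payments.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    # A low standard deviation of the gaps means a regular schedule.
    return pstdev(gaps) <= max_jitter_days


rent = [date(2025, 1, 1), date(2025, 2, 1), date(2025, 3, 3), date(2025, 4, 1)]
coffee = [date(2025, 1, 3), date(2025, 1, 4), date(2025, 2, 20), date(2025, 3, 1)]

print(is_recurring(rent))    # True
print(is_recurring(coffee))  # False
```

Note that the check only works if the input dates are complete and correct, which is why this class of model depends so heavily on clean data.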

Machine learning (ML) models

  • Description: Trains on labeled datasets to continuously improve categorization.

  • Use case: Suitable for scalable solutions across multiple markets where accuracy is critical, but requires robust infrastructure.

  • Benefits:

    • Highly scalable with improved accuracy over time.

    • Identifies complex patterns across diverse datasets.

  • Challenges:

    • High development and maintenance costs.

    • Requires specialized teams and robust infrastructure.

  • Conclusion: Supports scaling into multiple markets but requires significant initial investment.
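To show what "training on labeled datasets" means in the simplest case, here is a small multinomial naive Bayes classifier written with the standard library only. It is a teaching sketch under stated assumptions: the six training rows and the category names are invented, and a real system would use far more data and likely an established ML library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, descriptions, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(descriptions, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples.
train = [
    ("salary payment employer acme", "income"),
    ("monthly salary acme", "income"),
    ("rent payment landlord", "housing"),
    ("monthly rent flat", "housing"),
    ("albert heijn groceries", "groceries"),
    ("albert heijn amsterdam", "groceries"),
]
model = NaiveBayesCategorizer().fit([t for t, _ in train], [c for _, c in train])

print(model.predict("albert heijn 1234"))  # groceries
print(model.predict("salary acme"))        # income
```

Unlike the keyword approach, the categories here are learned from the labels, so adding better labeled data improves the model without anyone rewriting rules.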

Generative AI (GenAI) models

  • Description: Leverages GenAI (such as Large Language Models) to understand and adapt to the context of transactions dynamically.

  • Use case: Ideal for handling multilingual, ambiguous, or context-specific transactions while continuously learning from new data.

  • Benefits:

    • Captures contextual information, improving categorization accuracy significantly over keyword-based models.

    • Dynamically adapts to new transaction data and formats.

  • Challenges:

    • Resource-intensive to develop and maintain.

    • Black-box nature can make compliance and transparency challenging.

  • Conclusion: The ultimate solution for long-term scalability, though it demands a partnership with a trusted AIS provider like GoCardless.
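One way an LLM-based categorizer might be wired up is sketched below: transaction context goes into the prompt, and the model's answer is validated before it is trusted. This is an assumption-heavy illustration. The category list, prompt wording, and confidence threshold are hypothetical, and `fake_model` is a stand-in that returns canned answers rather than a call to any real LLM API.

```python
import json
from typing import Callable

# Hypothetical category set for the example.
CATEGORIES = ["income", "payroll_expense", "groceries", "personal_transfer", "other"]


def build_prompt(description: str, account_type: str, country: str) -> str:
    """Put the context the model needs (account type, country) next to the text."""
    return (
        "Categorize the bank transaction below.\n"
        f"Allowed categories: {', '.join(CATEGORIES)}\n"
        f"Account type: {account_type}\n"
        f"Country: {country}\n"
        f"Description: {description}\n"
        'Answer as JSON: {"category": "...", "confidence": 0.0}'
    )


def categorize(description: str, account_type: str, country: str,
               model: Callable[[str], str], min_confidence: float = 0.8) -> str:
    """Ask the model, then accept only well-formed, confident, allowed answers."""
    raw = model(build_prompt(description, account_type, country))
    try:
        answer = json.loads(raw)
        category = answer["category"]
        confidence = float(answer["confidence"])
    except (ValueError, KeyError, TypeError):
        return "needs_review"
    if category not in CATEGORIES or confidence < min_confidence:
        return "needs_review"
    return category


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned answers for the demo."""
    if "Albert Heijn" in prompt:
        return '{"category": "groceries", "confidence": 0.95}'
    return '{"category": "income", "confidence": 0.4}'


print(categorize("Albert Heijn 1234 Amsterdam", "private", "NL", fake_model))  # groceries
print(categorize("Salary transfer - Friend", "private", "LV", fake_model))     # needs_review
```

Routing malformed or low-confidence answers to manual review is one way to soften the black-box concern, since every automated decision has at least passed an explicit, auditable check.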

Example transactions: tackling complexity

Here’s an example of how a GenAI-powered model would handle complex scenarios:

  • Ambiguous transactions: "Salary transfer - Friend" is flagged for review, distinguishing it from "Salary transfer - Employer X."

  • Private vs. business transactions: The same transaction may be categorized differently depending on whether it appears in a personal or a business account. For a private individual, salary often represents the primary source of income, but in a business bank account the same transaction represents an expense to the company. Categorization engines need to be trained to distinguish between private and business transactions based on patterns such as time, location, or accompanying metadata.

  • Language variations: "Alga" means “salary” in Latvian, while in Spanish it translates to “algae”, which illustrates the importance of localisation in cases where transactions originate in different countries.

  • Brand names: “Albert Heijn” might be identified as a private individual’s name and categorized as a personal transfer, while it is actually the largest Dutch grocery store chain.

Conclusion

In a rapidly evolving financial landscape, the ability to accurately categorize transactions is no longer a luxury but a necessity. By leveraging a robust categorization engine, organizations like re:cap have demonstrated how efficient data utilization can significantly lower credit risk, optimize operations, and facilitate market expansion. Partnering with GoCardless not only streamlines processes but also empowers businesses to harness the full potential of their transaction data, ensuring they remain competitive and responsive to customer needs. As financial institutions continue to seek innovative solutions, investing in effective transaction categorization will be pivotal to unlocking new opportunities for growth and success.
