Introduction

Most credit risk customers (banks, fintechs, and payment processors) analyze our firmographic and merchant transaction data via a backtesting exercise. This exercise is used to select the attributes most likely to predict a delinquency, charge-off, or chargeback risk event, based on historical outcomes data.

Business rules derived from these attributes are then used to set policy decisions. Examples of production rules derived from our data include:

Set a higher credit limit if revenue growth is positive for several consecutive months and the business is more than 5 years old
Approve a customer who is rejected under the current credit policy if there is a match and transaction presence
Don’t require document verification of payment processing volume if monthly revenue is > $25,000

Note: Enigma is not intended to replace credit bureau data, but rather to supplement it for thin files or enhance risk splitting power.

The steps for conducting this exercise are as follows:

Prepare your outcomes file
Request a historical batch append: Select Enigma attributes that were available at time of application/approval/onboarding via the date_accessible attribute. Work with your Enigma relationship managers to conduct this batch append.
(Optional) Generate derived Enigma attributes: Use our sample notebook to generate derived Enigma attributes.
Conduct univariate analyses across all possible features: Understand predictive power on an attribute-by-attribute basis.
(Optional) Multivariate subpopulation analysis: Select a subpopulation based on multivariate feature thresholds with the greatest possible support and higher or lower risk ratio. Plug this subpopulation into a general underwriting model and examine metrics such as AUC.

Note: We highly recommend working with the Enigma solutions team to conduct the batch append on your behalf so that they can detect any data input/output errors. The Enigma solutions team is also on-hand to help conduct the full evaluation upon request.

Prepare your outcomes dataset

Your input file should include the minimum matching inputs (business name, address) and key outcome variables, such as a charge-off flag. Feel free to include multiple outcome flags. Example fields are listed below:
1. Application date
2. Line assignment amount
3. Approval / decline / manual review codes
4. Delinquency and/or charge-off flags, including dates
Only include businesses whose outcomes have had time to mature. For example, credit card delinquency rates generally stabilize after a year of history, so a credit card issuer may want to exclude the most recent year's worth of applications. Enigma recommends including businesses that applied in 2018 or later and at least one year before the current date. Pandemic-period data from 2020-2021 may also produce anomalous results, so the ideal vintage years are 2018, 2019, 2022, and 2023.
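The vintage guidance above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (business_name, application_date, chargeoff_flag), the sample records, and the is_mature_vintage helper are hypothetical, and the cutoffs (2018 onward, one year of maturity, 2020-2021 excluded) are one reading of the recommendation rather than a fixed rule.

```python
from datetime import date

# Hypothetical outcomes records; field names are illustrative only.
applications = [
    {"business_name": "Acme Bakery", "application_date": date(2017, 6, 1), "chargeoff_flag": 0},
    {"business_name": "Blue Cafe", "application_date": date(2019, 3, 15), "chargeoff_flag": 1},
    {"business_name": "Corner Deli", "application_date": date(2020, 8, 2), "chargeoff_flag": 0},
    {"business_name": "Delta Auto", "application_date": date(2022, 11, 20), "chargeoff_flag": 0},
]

# Assumption: pandemic vintages are dropped entirely.
PANDEMIC_YEARS = {2020, 2021}


def is_mature_vintage(app_date, as_of, min_year=2018, maturity_days=365,
                      exclude_years=PANDEMIC_YEARS):
    """Return True if the application is recent enough, old enough to have a
    stable outcome, and not in an excluded vintage year."""
    old_enough = (as_of - app_date).days >= maturity_days
    return (
        app_date.year >= min_year
        and old_enough
        and app_date.year not in exclude_years
    )


as_of = date(2024, 6, 30)  # stand-in for "the current date"
mature = [a for a in applications if is_mature_vintage(a["application_date"], as_of)]
```

With this sample, only the 2019 and 2022 applications remain. Adjust maturity_days to match how long your own outcome flag takes to stabilize.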

Request a batch append

Request a batch append to retrieve historical Enigma merchant transaction data and to understand fill rates.
For credit risk evaluations, customers usually use the date_accessible column to retrieve historical data and prevent time travel, that is, to ensure that any historical backtest uses only data that was available at the time of a customer's application. To prevent time travel, filter out all records in which the customer’s input application date is prior to the date_accessible value.
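A minimal sketch of this filter is shown below. It assumes the appended file has one row per application and attribute snapshot. The application_id and avg_monthly_revenue field names and the point_in_time helper are hypothetical, and keeping only the latest eligible snapshot per application is an added assumption, not something the batch append does for you.

```python
from datetime import date

# Hypothetical appended rows: one row per (application, attribute snapshot).
appended = [
    {"application_id": "A1", "application_date": date(2022, 5, 1),
     "date_accessible": date(2022, 3, 1), "avg_monthly_revenue": 30000},
    {"application_id": "A1", "application_date": date(2022, 5, 1),
     "date_accessible": date(2022, 6, 1), "avg_monthly_revenue": 42000},
    {"application_id": "A2", "application_date": date(2022, 7, 10),
     "date_accessible": date(2022, 9, 1), "avg_monthly_revenue": 12000},
]


def point_in_time(rows):
    """Keep, per application, the latest snapshot that was accessible on or
    before the application date."""
    latest = {}
    for row in rows:
        # Drop rows where the application date is prior to date_accessible:
        # that data did not exist yet when the customer applied.
        if row["application_date"] < row["date_accessible"]:
            continue
        key = row["application_id"]
        if key not in latest or row["date_accessible"] > latest[key]["date_accessible"]:
            latest[key] = row
    return latest


snapshots = point_in_time(appended)
```

In this sample, application A1 keeps its March 2022 snapshot, and A2 has no usable snapshot because its only row became accessible after the application date.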

Note: Historical data is generally only available for merchant transactions attributes.

Firmographic attributes like industry do not have a date associated with them, but their availability date can generally be tied to the issue_date attribute within the registrations object, which denotes when Enigma received a corporate registration.
Immediately upon receipt, Enigma will generally enrich a website record or obtain an industry.

Generate derived attributes (optional)

Derived attributes are sometimes as predictive and valuable as the out-of-the-box features. We have created a notebook that customers can use to generate derived attributes, such as months_of_mtx_history, revenue volatility factors, or one-hot encoded firmographic variables. The notebook takes as input an Enigma-enriched file and outputs a “model-ready” set of attributes.

Our most predictive attributes are often not available “out of the box”, but must be generated via one-hot encoding and additional feature transformations. We have created a generalized Python notebook that takes as input a historical outcomes file and generates 50+ new attributes from it, which you can access here.
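To give a sense of what such transformations look like, here is a small standalone sketch. It is not the notebook's code. The function names, the output keys other than months_of_mtx_history, and the definition of consecutive positive growth as the trailing streak of month-over-month increases are all assumptions made for illustration.

```python
from statistics import pstdev


def derive_revenue_features(monthly_revenue):
    """Derive simple features from a chronological list of monthly revenue
    values (oldest first)."""
    # Count the trailing streak of month-over-month increases,
    # walking backwards from the most recent month.
    streak = 0
    for i in range(len(monthly_revenue) - 1, 0, -1):
        if monthly_revenue[i] > monthly_revenue[i - 1]:
            streak += 1
        else:
            break
    return {
        "months_of_mtx_history": len(monthly_revenue),
        "months_consecutive_positive_growth": streak,
        # Population standard deviation; None when there is no history.
        "revenue_std_dev": pstdev(monthly_revenue) if monthly_revenue else None,
    }


def one_hot(value, categories, prefix):
    """One-hot encode a categorical value against a fixed list of categories."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


features = derive_revenue_features([10000, 9000, 9500, 11000, 12500])
industry_flags = one_hot("restaurant", ["restaurant", "retail"], "industry")
```

Here the business has five months of history and three consecutive months of growth, and the industry value becomes two 0/1 columns.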

Generate univariate sloping charts

As exploratory analysis, we recommend reviewing all key attributes to understand how the percentiles of numerical variables, and the values of boolean variables, split risk. Our evaluation notebook contains Python code that accomplishes this.

Attributes that clients have found to be predictive across lending or merchant onboarding use cases include the following:

Match attributes (covers 100% of client input records)

Match presence

Firmographic attributes (covers ~70-80% of client input records)

Industry classification

Derived firmographics attributes (covers ~70-80% of client input records)

Website presence
Years in business

Merchant transaction attributes (covers ~25-70% of client input records)

Merchant transaction presence
Average monthly revenue (12m)
Transaction stability daily coverage

Derived merchant transaction attributes (covers ~25-70% of client input records)

Average transaction size
Months of merchant transaction history
Months of consecutive positive growth
Months since last year’s lowest revenue month
Revenue standard deviation

For each attribute, you can create a table that outlines the key thresholds that split risk. An example is below:

Population description	Support / population amount	Delinquency rate
Total	100% (n=100,000)	3.5%
card_revenue.12m.average_monthly_revenue: 0-25 percentile	25%	3.4%
card_revenue.12m.average_monthly_revenue: 25-50 percentile	25%	2.4%
card_revenue.12m.average_monthly_revenue: 50-75 percentile	25%	2.2%
card_revenue.12m.average_monthly_revenue: 75-100 percentile	25%	1.8%
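A table like the one above can be produced with a short helper. The sketch below is an illustration, not the evaluation notebook's code. The quartile_risk_table function, the record field names, and the sample data are hypothetical, and records missing the attribute are left out of the buckets but still counted in the support denominator.

```python
def quartile_risk_table(records, attribute, outcome):
    """Split records into four equal-sized buckets by attribute value and
    report support and outcome rate for each bucket."""
    populated = sorted(
        (r for r in records if r.get(attribute) is not None),
        key=lambda r: r[attribute],
    )
    n = len(populated)
    rows = []
    for q in range(4):
        bucket = populated[q * n // 4:(q + 1) * n // 4]
        if not bucket:
            continue
        rows.append({
            "bucket": f"{q * 25}-{(q + 1) * 25} percentile",
            # Support is measured against all input records.
            "support": len(bucket) / len(records),
            "outcome_rate": sum(r[outcome] for r in bucket) / len(bucket),
        })
    return rows


# Hypothetical sample: revenue rises while delinquency becomes less common.
flags = [1, 1, 1, 0, 0, 1, 0, 0]
sample = [
    {"avg_monthly_revenue": (i + 1) * 1000, "delinquent": flag}
    for i, flag in enumerate(flags)
]
table = quartile_risk_table(sample, "avg_monthly_revenue", "delinquent")
```

A monotonic trend in outcome_rate across buckets, as in the example table, is a sign that the attribute splits risk cleanly.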

Generate multivariate subpopulations

After you find several attributes and thresholds for which both the support (i.e., the subpopulation size) and the outcome rate are adequate, you can union several of these populations together to create a single rule. For example:
Where card_revenue.12m.average_monthly_revenue > 25K OR months_of_mtx_history > 50

This rule, along with the attributes above, can serve as an input to an existing underwriting model to measure lift over a baseline. Alternatively, it can be applied as a business-rule overlay for credit policies such as an automatic line increase.
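Before feeding a rule into a model or policy, it helps to check its support and risk ratio. The sketch below assumes records are plain dictionaries. The evaluate_rule and example_rule helpers, the avg_monthly_revenue and delinquent field names, and the sample data are hypothetical, and the risk ratio is computed here as the outcome rate in the matched subpopulation divided by the overall rate.

```python
def evaluate_rule(records, rule, outcome):
    """Report support, outcome rate, and risk ratio for the subpopulation
    selected by a rule."""
    matched = [r for r in records if rule(r)]
    overall_rate = sum(r[outcome] for r in records) / len(records)
    matched_rate = sum(r[outcome] for r in matched) / len(matched) if matched else None
    return {
        "support": len(matched) / len(records),
        "outcome_rate": matched_rate,
        # Ratio below 1 means lower risk than the overall population.
        "risk_ratio": matched_rate / overall_rate if matched and overall_rate else None,
    }


def example_rule(r):
    """Average monthly revenue > 25K OR months of history > 50.
    Missing values never satisfy either condition."""
    revenue = r.get("avg_monthly_revenue")
    history = r.get("months_of_mtx_history")
    return (revenue is not None and revenue > 25000) or (history is not None and history > 50)


records = [
    {"avg_monthly_revenue": 40000, "months_of_mtx_history": 12, "delinquent": 0},
    {"avg_monthly_revenue": 10000, "months_of_mtx_history": 60, "delinquent": 0},
    {"avg_monthly_revenue": 30000, "months_of_mtx_history": 72, "delinquent": 1},
    {"avg_monthly_revenue": 8000, "months_of_mtx_history": 6, "delinquent": 1},
    {"avg_monthly_revenue": None, "months_of_mtx_history": None, "delinquent": 1},
    {"avg_monthly_revenue": 5000, "months_of_mtx_history": 10, "delinquent": 0},
]
result = evaluate_rule(records, example_rule, "delinquent")
```

In this sample the rule covers half of the population with a risk ratio of about 0.67, meaning the selected businesses are delinquent at roughly two thirds of the overall rate.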

Note: The Enigma solutions team is also on-hand to help conduct the full evaluation upon request.