Match Endpoint
Overview
The match endpoint identifies which of Enigma’s profiles correspond to a business of interest to a user. To use the match endpoint, a user must have identifying information about the business, such as its name, address, or website. Importantly, a user can specify whether they want a business match or a business location match; for more information on these entities, see the section on Enigma’s data structure.
Making a request to businesses/match
Header
To make a businesses API request, ensure you first enter your API key by adding the following to the request header:
x-api-key: YOUR-API-KEY
Query Parameters
Business entity type: Set business_entity_type to either business or business_location. If not configured, the default entity type returned is business_location.
Note: Business revenue is returned at the brand level and will always be greater than business location revenue. For example, McDonald's revenue is aggregated across all of its locations, not just a single one. Be sure to specify business_location if you are looking for the revenue of an individual location.
Match threshold: Set match_threshold to the desired value between 0 and 1. Results are filtered by this threshold based on the match_confidence (see more on match confidence below) determined by the Enigma match model. If not configured, the default value is 0.5.
Number of matches returned: Set top_n to the desired number of matches to be returned. Each match returned will have a match confidence higher than the provided match_threshold. If not configured, the default value is 1.
Show non-matches: If no result has a match confidence higher than the provided match_threshold, setting show_non_matches to 1 causes the API to return as many candidates as possible, up to top_n, even though none of them clears the threshold.
Prioritization: Setting prioritization=MTX ensures that the top profile returned also has revenue data associated with it, as long as that profile is above the match threshold. See below for more explanation.
Inputs
The match endpoint supports POST requests. The minimum payload required will depend on whether you are performing a business or business location match.
Enigma currently allows for matching based on the following inputs:
Input | Business Location | Businesses |
---|---|---|
Business Name | Yes | Yes |
Business Address | Yes | Yes |
Associated Person | Yes | Yes |
Website | No | Yes |
Associated person represents a potential owner of the business.
Enigma does not currently allow phone numbers or EINs to be used as inputs for business matching, but plans to do so in the future.
Business Match
The request body for requesting a business match will have the following fields:
{
"name": "",
"person": {
"first_name": "",
"last_name": ""
},
"address": {
"street_address1": "",
"street_address2": "",
"city": "",
"state": "",
"postal_code": ""
},
"website": ""
}
If at least one match is found, the response to this POST request will contain an Enigma ID corresponding to each matched business entity. The Enigma ID is the unique identifier for each profile in Enigma’s dataset. Additionally, the response object will contain a list titled business_location_enigma_ids, which holds the Enigma IDs of the business locations associated with that business. For a full description of fields included in the response, refer to the API Reference section.
Enigma allows for the following combination of inputs for business matching:
- Website URL (with or without any other information)
- Name + Address + Person
- Name + Address
- Name + Person
- Person + Address
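The allowed combinations reduce to a simple rule: a website on its own is enough, and otherwise at least two of name, address, and person must be present. The sketch below checks that rule client-side before a request is sent. It assumes a nested field (address or person) counts as provided if any of its sub-fields is non-empty; the section does not say how partially filled fields are treated.

```python
def _present(value):
    """True for a non-empty string, or a dict with at least one non-empty value."""
    if isinstance(value, dict):
        return any(str(v).strip() for v in value.values() if v is not None)
    return bool(value and str(value).strip())


def is_valid_business_match_input(payload):
    """Check a business match body against the documented input combinations."""
    if _present(payload.get("website")):
        return True  # Website URL, with or without any other information.
    provided = {f for f in ("name", "address", "person") if _present(payload.get(f))}
    # Name + Address + Person, Name + Address, Name + Person, Person + Address:
    # equivalently, any two or more of the three fields.
    return len(provided) >= 2
```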
If providing a website URL in the request body, a business match is first attempted using only the URL. If a match on URL is successful, the other fields provided in the request body are ignored.
If a match on URL is unsuccessful or no URL was provided, the endpoint uses the other fields (business name, address, and/or person) to attempt a match on a corresponding business location, and returns the business entity that the location is associated with.
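The URL-first order described above can be modeled as a short decision function. This is a conceptual sketch of the documented behavior, not Enigma's implementation: match_by_url, match_location, and business_of are hypothetical callables standing in for server-side steps, so the flow can be exercised with stubs.

```python
from typing import Callable, Optional


def match_business(payload: dict,
                   match_by_url: Callable[[str], Optional[dict]],
                   match_location: Callable[[dict], Optional[dict]],
                   business_of: Callable[[dict], dict]) -> Optional[dict]:
    """Model the documented URL-first business matching order."""
    website = payload.get("website")
    if website:
        business = match_by_url(website)
        if business is not None:
            return business  # A URL match wins; the other fields are ignored.
    # No URL, or the URL did not match: match a business location on the
    # remaining fields, then return the business that location belongs to.
    fields = {k: v for k, v in payload.items() if k != "website"}
    location = match_location(fields)
    if location is None:
        return None
    return business_of(location)
```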
Business Location Match
The request body for making a business location match will have the following fields:
{
"name": "",
"person": {
"first_name": "",
"last_name": ""
},
"address": {
"street_address1": "",
"street_address2": "",
"city": "",
"state": "",
"postal_code": ""
}
}
As with business matching, if at least one match is found, the response to this POST request will contain an Enigma ID corresponding to each matched business location entity. For a full description of fields included in the response refer to the API Reference section.
Enigma allows for the following combination of inputs for business location matching:
- Name + Address + Person
- Name + Address
- Name + Person
- Person + Address
Website URL is an invalid parameter for performing a business location match. This is because there are often multiple locations associated with a business’s website.
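Because website is not accepted for a business location match, it can help to build the request body from flat fields and reject a website up front. The sketch below produces the nested body shown above; the function and its parameter names are hypothetical helpers, not part of the API.

```python
def build_location_payload(name="", first_name="", last_name="",
                           street_address1="", street_address2="",
                           city="", state="", postal_code="", website=None):
    """Build a business location match body in the documented nested shape."""
    if website:
        # Websites are often shared by many locations, so they are not accepted here.
        raise ValueError("website is not a valid input for a business location match")
    return {
        "name": name,
        "person": {"first_name": first_name, "last_name": last_name},
        "address": {
            "street_address1": street_address1,
            "street_address2": street_address2,
            "city": city,
            "state": state,
            "postal_code": postal_code,
        },
    }
```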
We recommend storing the Enigma ID so you can make subsequent calls to the ID Endpoint, where you can retrieve the same entity by its ID and, most importantly, pull Enigma’s attributes for that entity.
The Enigma ID can also be used for entity resolution across your internal data set.
Matched Fields
Alongside each potential match, Enigma provides an attribute called matched_fields. This attribute shows the business name, address, person, and/or website in Enigma’s data asset that was found to be similar to the query inputs.
Match Confidence
Along with each potential match, Enigma provides a confidence score called match_confidence.
This attribute - the output of our match model - is a quantitative representation of the proximity between the information provided in the request input and the business profile. The score ranges from 0 to 1, where 1 indicates an exact match.
How should the match confidence score be interpreted?
The score does not translate to a percent probability of a match. The score is a measure of a combination of string and semantic similarity between the user provided match inputs and the Enigma business or business location entity.
- A score of 0.75 does not mean there is a 75% probability of a match
- A score of 0.25 does not mean there is a 25% probability of a match
- Enigma considers anything with a match score of 0.5 or above to be a match, with an estimated precision of 95% or higher. That is, Enigma expects fewer than 5% false positives among matches with a score of 0.5 or above.
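To make the interaction between match_confidence and the match_threshold, top_n, and show_non_matches parameters concrete, the sketch below emulates the documented filtering on a list of (enigma_id, match_confidence) pairs. It is a client-side approximation of the described behavior, not the server's code; it treats a score at or above the threshold as a match, following the wording used in this section.

```python
def select_matches(candidates, match_threshold=0.5, top_n=1, show_non_matches=False):
    """Emulate threshold / top_n / show_non_matches on (id, score) pairs."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    matches = [c for c in ranked if c[1] >= match_threshold]
    if matches:
        return matches[:top_n]
    if show_non_matches:
        # Nothing cleared the threshold: fall back to the best candidates.
        return ranked[:top_n]
    return []
```

For example, with candidates scored 0.9, 0.6, and 0.3 and the defaults, only the 0.9 candidate is returned; raising top_n to 5 also returns the 0.6 candidate, but never the 0.3 one.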
See below for more detailed information about the match model and how the score works.
Match Models
Below is a description of how Enigma’s matching model works.
Why is matching important?
Essentially, matching means that you don’t need to have perfect information about a business in order to find the correct business profile. The models are highly optimized around the relevancy of the data returned. This means that when you see a match, you can be very confident that it’s the right business. It also means that when there are no relevant results, no match will be returned.
How the matching system works
When a request is made for a business match, the first step is to check for a URL match. If there is an exact URL match, the request is definitively matched to the business profile with that URL, with a match confidence score of 1.
If there is no exact URL match for a business request, or if the request is for a business location, a match is assessed using the other potential fields: business name, address, and associated people. The match system generates potential business profile match candidates based on how closely the inputs match the business identifiers of a profile, as well as how rare those identifiers are in the database.
The system then creates a match score for each candidate, using a machine-learning model to assess whether the surfaced business truly matches the one specified in the input. More specifically, each record pair (the search input and the Enigma SMB record) goes through a model that compares the two using questions such as:
- What is the string distance between the full/cleaned company names?
- What are the semantic similarities between the company names?
- What are the string distances between the full addresses and address components?
- What are the shared tokens between company names and addresses?
Next, the answers to these questions are turned into features that are fed into a neural network model. (The model is trained and tuned using vast quantities of data that has been labeled to indicate true vs. false matches). This full process completes in under a second.
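The kinds of questions listed above can be illustrated with simple pairwise features. The sketch below is an illustration only: it swaps in difflib's SequenceMatcher ratio as a stand-in for string distance and token Jaccard overlap for shared tokens, and omits semantic similarity, which would need an embedding model. Enigma's actual features, cleaning, and model are not described at this level of detail.

```python
import difflib
import re


def _tokens(text):
    """Lowercase alphanumeric tokens, e.g. "Joe's Pizza" -> {"joe", "s", "pizza"}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def string_similarity(a, b):
    """Similarity in [0, 1] from difflib's SequenceMatcher (1.0 means identical)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a, b):
    """Jaccard overlap of the two strings' token sets."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def pair_features(query, record):
    """Illustrative feature vector for one (search input, candidate record) pair."""
    return {
        "name_similarity": string_similarity(query["name"], record["name"]),
        "name_token_overlap": token_overlap(query["name"], record["name"]),
        "address_similarity": string_similarity(query["address"], record["address"]),
        "address_token_overlap": token_overlap(query["address"], record["address"]),
    }
```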
The 0.5 Threshold
As discussed above, by default a match is a result whose match_confidence score is at least 0.5. Enigma arrived at this default based on extensive research indicating that this threshold results in the greatest accuracy while maintaining the optimal precision-recall tradeoff.
For those less familiar with predictive analytics, here’s a breakdown of these terms:
- Accuracy: The share of observations classified correctly (as matches vs. non-matches)
- Precision: The share of predicted matches that are in fact true matches
- Recall: The share of true matches that are classified as such
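The three definitions above can be computed directly from a set of labeled results. The sketch below assumes you have your own list of match_confidence scores and true/false match labels (for example, from a manually reviewed sample) and shows how moving the threshold changes each metric.

```python
def classification_metrics(scores, labels, threshold=0.5):
    """Return (accuracy, precision, recall) when scores >= threshold count as matches."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    predicted = [s >= threshold for s in scores]
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, actual in pairs if p and actual)          # correct matches
    fp = sum(1 for p, actual in pairs if p and not actual)      # false positives
    fn = sum(1 for p, actual in pairs if not p and actual)      # missed matches
    tn = sum(1 for p, actual in pairs if not p and not actual)  # correct non-matches
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

With scores [0.9, 0.7, 0.6, 0.4, 0.2] and labels [True, True, False, True, False], the default threshold gives 2 true positives, 1 false positive, 1 false negative, and 1 true negative, so accuracy is 0.6 and precision and recall are both 2/3.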
Configuration
Recall that the Match endpoint of the Businesses API is configurable. While 0.5 presents the optimal tradeoff in most cases, the threshold can be manipulated on a call-by-call basis. Additionally, you can choose whether you want to show non-matches and/or return multiple results.
One important case where configuration is appropriate is if a user does not have complete inputs. This default threshold assumes that a user provides a company name and a complete address (all fields filled in) or complete person name (both first and last name). If a user is making a match request without a complete address and without a complete person name, the 0.5 threshold will not be appropriate to use. The appropriate threshold will be much lower. For example, Enigma has found that for match requests without street address but that do contain city and state, the appropriate match threshold is around 0.2-0.35.
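The guidance above can be encoded as a small helper that picks a threshold from the completeness of the inputs. This is a sketch under stated assumptions: it treats an address as complete when street_address1, city, state, and postal_code are filled in (street_address2 is assumed optional), uses 0.3 as one illustrative value inside the reported 0.2-0.35 range, and returns None where this section gives no guidance.

```python
def suggested_threshold(payload):
    """Pick a match_threshold from the completeness of a match request body."""
    address = payload.get("address") or {}
    person = payload.get("person") or {}
    # Assumption: street_address2 is optional for a "complete" address.
    full_address = all(address.get(k) for k in
                       ("street_address1", "city", "state", "postal_code"))
    full_person = bool(person.get("first_name") and person.get("last_name"))

    if payload.get("name") and (full_address or full_person):
        return 0.5  # The case the default threshold is designed for.
    if not address.get("street_address1") and address.get("city") and address.get("state"):
        return 0.3  # Illustrative value within the reported 0.2-0.35 range.
    return None  # No documented guidance; choose and validate your own.
```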
Prioritizing MTX transaction data
In some instances, there are multiple profiles that cross the match threshold of 0.5. Each profile may represent a different facet of the exact same business (see Entity Resolution), but did not resolve together because the profile firmographics were all slightly different.
If you are most interested in financial and revenue data, you can prioritize the business or location profile that carries such data by using the prioritization parameter within the URL, i.e. /match?prioritization=mtx.
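The effect of prioritization can be pictured as a reordering of the candidates that already clear the threshold. The sketch below is a client-side emulation only: the has_revenue key is a hypothetical stand-in for whatever signals revenue data in the real response, and ordering the remaining candidates by confidence is an assumption rather than documented behavior.

```python
def prioritize_revenue(candidates, match_threshold=0.5):
    """Put revenue-bearing profiles first among candidates above the threshold.

    Each candidate is a dict with a "match_confidence" score and a
    hypothetical boolean "has_revenue" flag.
    """
    above = [c for c in candidates if c["match_confidence"] >= match_threshold]
    # Revenue-bearing profiles first (False sorts before True), then by confidence.
    return sorted(above, key=lambda c: (not c["has_revenue"], -c["match_confidence"]))
```

Note that a revenue-bearing profile below the threshold is never promoted, which mirrors the condition that prioritization only applies while the profile is above the match threshold.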