File Delivery
If a List Generation workflow is suitable for the use case at hand, the file delivery service is the best way to access Enigma data.
Please reach out to the Enigma team if you need to generate a list. The steps below outline how to best prepare for that conversation.
Requesting the Desired Output
It’s important to be aware of the choices available when generating a list so that the output file meets the requirements of your use case. The following are some things to consider.
Entity Type
Indicate whether the desired file should contain a list of businesses or business locations. One file will contain a list of only one of these entity types.
Filters
Give the Enigma team an idea of the type of business or business location you are targeting. For the dimensions along which Enigma data can be filtered, see the Enigma attribute dictionary.
Attributes
Before discussing list generation with a member of the Enigma team, it helps to familiarize yourself with the Enigma attribute dictionary. It explains the attributes that can be included in the output file alongside the basic identifiers of the business or business location, such as business name, address, and Enigma ID.
Some attributes are available as a monthly time series (currently, only the Merchant Transaction Signals). Let the Enigma team know whether you are interested in only the most recently available month of transaction-related data or in multiple months of history.
Output Format
The final output can be delivered in CSV or Parquet format.
File Structure
The structure of the output file can also be customized.
Enigma data is stored as a collection of attributes. Some of these attributes are represented as objects with properties associated with them. The file can be structured so that all such attributes are either flattened or unflattened.
A flattened file will contain no columns with nested values inside them. Every property of an object-type attribute will be represented as its own column in the file. Choose a flattened structure if the output file(s) will be opened and analyzed in a spreadsheet-like tool.
An unflattened file will contain columns with nested values inside them. For example, an attribute like industries will have properties like classification_type and classification_description, amongst others. These properties will not be represented as separate columns but will instead be nested inside a single column called “industries”. Choose an unflattened file if the output file(s) will be ingested into a data pipeline and the efficiency of ingesting and programmatically parsing the file is a priority.
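The relationship between the two structures can be sketched in a few lines of Python. This is a minimal illustration, not Enigma's actual export code: it assumes the double-underscore header convention described under Interpreting the Output, and the record, its values, the `enigma_id` key, and the nesting of `card_revenue_growth` are all made up for the example.

```python
def flatten(value, prefix=""):
    """Turn nested dicts and lists into a single-level dict of column -> value."""
    columns = {}
    if isinstance(value, dict):
        # Object-type attribute: each property becomes "<prefix>__<property>".
        for key, child in value.items():
            name = f"{prefix}__{key}" if prefix else key
            columns.update(flatten(child, name))
    elif isinstance(value, list):
        # Array attribute: each element becomes "<prefix>__<index>".
        for index, child in enumerate(value):
            columns.update(flatten(child, f"{prefix}__{index}"))
    else:
        columns[prefix] = value
    return columns


# Hypothetical unflattened record (sample values only).
record = {
    "enigma_id": "E-0001",
    "names": ["Acme Coffee", "Acme Coffee LLC"],
    "addresses": [{"street_address1": "1 Main St", "city": "Springfield"}],
    "card_revenue_growth": {"3m": {"rate_sa": 0.12}},
}

flat = flatten(record)
for column, value in flat.items():
    print(column, "=", value)
```

The unflattened record keeps `addresses` as one nested value, while the flattened form spreads it across columns such as `addresses__0__street_address1`, which is what makes it easier to read in a spreadsheet.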
Number of output files
In most cases, Enigma recommends sending the output back in one file. There are a few instances where this is not the case.
- If a user wants both firmographic attributes and multiple months of history for a time-series attribute, Enigma recommends splitting the firmographics into one file and the time series into another. The Enigma ID can serve as a matching key between the two files in these scenarios.
- If a user wants to see matches for businesses and matches for business locations, Enigma recommends splitting those into two separate files.
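When firmographics and time-series data arrive in separate files, they can be recombined on the Enigma ID. The sketch below uses only the Python standard library and in-memory sample data; the column names (`enigma_id`, `names__0`, `period`, `card_revenue`) and all values are hypothetical, so substitute the headers from your own files.

```python
import csv
import io

# Hypothetical file contents; in practice, open the delivered CSV files instead.
firmographics_csv = """enigma_id,names__0
E-0001,Acme Coffee
E-0002,Bolt Hardware
"""
time_series_csv = """enigma_id,period,card_revenue
E-0001,2024-01,1000
E-0001,2024-02,1100
E-0002,2024-01,500
"""

# Index the firmographics file by Enigma ID (one row per ID).
firmographics = {
    row["enigma_id"]: row
    for row in csv.DictReader(io.StringIO(firmographics_csv))
}

# Attach the firmographic columns to every monthly time-series row.
joined = []
for row in csv.DictReader(io.StringIO(time_series_csv)):
    match = firmographics.get(row["enigma_id"])
    if match is not None:
        joined.append({**match, **row})

for row in joined:
    print(row)
```

Keeping the files separate until this point avoids repeating every firmographic column on each monthly row of the delivered file.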
Retrieving Files from Enigma
There are two standard ways you may retrieve files output by Enigma:
- Downloaded directly via the Enigma Console File Manager
- Published into a user-defined Data Source, such as an SFTP server or Amazon S3 bucket that you control, or a private SFTP account on an Enigma SFTP server. For more information on how to set up a Data Source, please reference the Console File Manager documentation.
Note: Parquet Download Limitations
Due to a current technical limitation, Parquet files generated by Enigma cannot be downloaded directly from the Console File Manager web interface. Customer Success can instead copy Parquet files to a user-defined Data Source, such as a customer-controlled SFTP server or Amazon S3 bucket, or a private SFTP server that Enigma provides for each customer. To arrange this, reach out to Customer Success via [email protected] (or through other CS contact channels).
As a temporary workaround, Parquet files may be converted to CSV, TSV, or PSV and downloaded from the web interface. Yes, we recognize the supreme irony here and this paradox will soon be resolved.
Interpreting the Output
The file containing the generated list will contain a number of columns, each representing an attribute that describes a business or business location.
The Enigma ID corresponding to each record will be included first.
While all additional column names can be renamed by the Enigma team upon request, by default column headers are named according to the following rules:
- For attributes that are represented as arrays, each element of the array is added as a separate column with the header appended with __X, where X = 0, 1, 2, … E.g., names__0
- For attributes that are represented as an array of objects, the column header representing each object in the array is distinguished by __X, where X = 0, 1, 2, …, followed by the property name. E.g., addresses__0__street_address1
- For object-type attributes containing nested properties, the column headers for each property are appended by a double underscore (__). E.g., card_revenue_growth__3m__rate_sa
- For attributes containing nested properties that are themselves arrays, the column header is appended by a double underscore to distinguish the property name. Then each element of the array is added as a separate column with the header appended with __X, where X = 0, 1, 2, …
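The naming rules above can be applied in reverse to rebuild nested values from a flattened row. The following Python sketch is an illustration under assumptions, not an Enigma utility: it assumes default (not renamed) headers, treats any all-digit segment as an array index, and uses a made-up sample row with a hypothetical `enigma_id` column.

```python
def _to_key(part):
    """Treat all-digit header segments as array indexes."""
    return int(part) if part.isdigit() else part


def _lists(node):
    """Convert dicts keyed only by integers into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    converted = {key: _lists(child) for key, child in node.items()}
    if converted and all(isinstance(key, int) for key in converted):
        return [converted[i] for i in sorted(converted)]
    return converted


def unflatten(row):
    """Rebuild nested attributes from a row keyed by double-underscore headers."""
    nested = {}
    for header, value in row.items():
        parts = header.split("__")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(_to_key(part), {})
        node[_to_key(parts[-1])] = value
    return _lists(nested)


# Hypothetical flattened row (sample values only).
row = {
    "enigma_id": "E-0001",
    "names__0": "Acme Coffee",
    "names__1": "Acme Coffee LLC",
    "addresses__0__street_address1": "1 Main St",
    "card_revenue_growth__3m__rate_sa": "0.12",
}

print(unflatten(row))
```

A property whose name consists only of digits would be misread as an array index by this sketch, so check the attribute type in the attribute dictionary before relying on it.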
If a time-series attribute was selected (only merchant transaction attributes currently have a time-series component), there will be multiple rows of data for each Enigma ID, with each row corresponding to one month of the time series.
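Because a time-series file repeats each Enigma ID once per month, it is often convenient to group the rows by ID before analysis. The sketch below shows one way to do that and to pick out the most recent month; the column names `enigma_id`, `period`, and `card_revenue`, the `YYYY-MM` period format, and the values are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical time-series rows, one per Enigma ID per month.
rows = [
    {"enigma_id": "E-0001", "period": "2024-02", "card_revenue": "1100"},
    {"enigma_id": "E-0001", "period": "2024-01", "card_revenue": "1000"},
    {"enigma_id": "E-0002", "period": "2024-01", "card_revenue": "500"},
]

# Collect every monthly row under its Enigma ID.
history = defaultdict(list)
for row in rows:
    history[row["enigma_id"]].append(row)

# Sort each history chronologically; YYYY-MM strings sort correctly as text.
for months in history.values():
    months.sort(key=lambda r: r["period"])

# The last entry of each sorted history is the most recent month.
latest = {enigma_id: months[-1] for enigma_id, months in history.items()}
print(latest)
```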
Please take a look at the Enigma attribute dictionary to check the type of the attribute being appended.