Send Enigma a File for Batch Append
If you have had a conversation with the Enigma team about your use case and wish to evaluate Enigma data further or begin an integration, then sending Enigma an input file to batch append may be your best option.
Preparing Input File
Once you are ready to send Enigma a file for batch append, the next step is to prepare your input file.
Accepted File Types
Enrichment (Batch Append) accepts and produces the following tabular data types:
| File Extension | Name | Encoding |
| --- | --- | --- |
| .csv | comma, separated, values | utf-8 |
| .tsv | tab separated values | utf-8 |
| .psv | pipe \| separated \| values | utf-8 |
| .parquet | parquet | binary file format |
Compression support: snappy, lz4, gzip, zstd.
If your file is in another format and cannot easily be reformatted to one of these formats, please reach out to a member of the Enigma team.
If enriching at the business level, please ensure that at least one of the following combinations of fields is included in the input file and clearly marked with a column header:
- Website URL (with or without any other information)
- Name + Address + Person
- Name + Address
- Name + Person
- Person + Address
If enriching at the business location level, please ensure that at least one of the following combinations of fields is included in your file and clearly marked with a column header:
- Name + Address + Person
- Name + Address
- Name + Person
- Person + Address
Download our blank CSV template to help get you started. If you’d like the Enigma team to aid in predictive analysis, you can also provide additional fields like marketing or delinquency outcomes.
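As a sketch of what a minimal input file looks like, the snippet below builds a small CSV using the Name + Address combination. The column headers and business records here are illustrative, not Enigma-mandated names:

```python
import csv
import io

# Hypothetical input rows using the Name + Address field combination.
rows = [
    {"name": "Acme Hardware", "address": "123 Main St, Springfield, IL 62701"},
    {"name": "Blue Bottle Cafe", "address": "456 Oak Ave, Portland, OR 97201"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "address"])
writer.writeheader()          # clearly marked column headers
writer.writerows(rows)        # addresses containing commas are quoted automatically
csv_text = buf.getvalue()
print(csv_text)
```

Whatever headers you choose, the key is that each required field is in its own clearly labeled column.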
Transmitting the File
The best way to transmit the file to Enigma is via the File Manager System feature in the Console. Simply drag and drop a file into this tool and it will be securely transmitted to Enigma. To learn more about how Enigma manages security, see the security page.
Requesting the Desired Output
After transmitting the file, it’s important to be aware of the choices available so that the completed batch append meets the requirements of the use case at hand. The following are some things to consider.
Entity Type
Indicate whether the enrichment of input records should happen at the business or business location level.
Attributes
When discussing a batch append with a member of the Enigma team it helps to familiarize yourself with the Enigma attribute dictionary. This contains explanations of the various attributes available to enrich input records.
Some attributes are available as a monthly time-series (currently, only the Merchant Transaction Signals). Let the Enigma team know if you’d like to receive the most recent month only, or if you’d like to receive a historical time series as well.
Another thing to consider is that some Enigma attributes are array types, i.e., there is a list of values associated with the attribute. An example is `industries`, where some businesses may be classified in multiple closely related industries. If you are just starting out with Enigma data or are unfamiliar with the attribute in question, we recommend requesting only the first value for such attributes. If more values are needed, point that out in advance of the batch append.
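To illustrate why array-type attributes multiply columns, the sketch below shows a business carrying two closely related industry classifications and how "first value only" narrows the output. The record shape and field names are illustrative, not the exact Enigma schema:

```python
# Hypothetical enriched record with an array-type attribute.
record = {
    "name": "Acme Hardware",
    "industries": [
        {"classification": "444130", "description": "Hardware Stores"},
        {"classification": "444110", "description": "Home Centers"},
    ],
}

# Requesting only the first value keeps the output to a single set of columns.
first_industry = record["industries"][0] if record["industries"] else None
print(first_industry)
```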
Output Format
The final enriched output can be in csv or parquet format.
File Structure
The structure of the output file is also something that can be customized.
Enigma data is stored as a collection of attributes. Some of these attributes are represented as objects with properties associated with them. The file can be structured so that all such attributes are either flattened or unflattened.
A flattened file will contain no columns with nested values inside them. Every property of an object type attribute will be represented as its own column in the file. Choose a flattened structure if the output file(s) will be opened in a spreadsheet-like tool and then analyzed.
An unflattened file will contain columns with nested values inside them. For example, an attribute like `industries` will have properties like `classification_type` and `classification_description`, among others. Each of these properties will not be represented as a separate column but will instead be nested inside a single column called `industries`. Choose an unflattened file if the output file(s) will be ingested into a data pipeline and the efficiency of ingesting and programmatically parsing the file is a priority.
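The difference between the two structures can be sketched in a few lines: flattening expands every nested property and array element into its own double-underscore column. The property names follow the `industries` example above; the exact output schema is set by the Enigma team:

```python
# One unflattened row: nested values live inside a single column.
unflattened_row = {
    "industries": [
        {"classification_type": "NAICS", "classification_description": "Hardware Stores"}
    ]
}

def flatten(prefix, value, out):
    """Recursively expand nested objects/arrays into double-underscore columns."""
    if isinstance(value, dict):
        for k, v in value.items():
            flatten(f"{prefix}__{k}" if prefix else k, v, out)
    elif isinstance(value, list):
        for i, v in enumerate(value):
            flatten(f"{prefix}__{i}", v, out)
    else:
        out[prefix] = value

flattened_row = {}
flatten("", unflattened_row, flattened_row)
print(flattened_row)
# Columns: industries__0__classification_type, industries__0__classification_description
```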
Number of output files
In most cases, Enigma recommends sending the output back in one file. There are a few instances where this is not the case.
- If a user wants both firmographics attributes and multiple months of history for a time-series attribute, Enigma recommends splitting the firmographics into one file and the time-series into another file. The Enigma ID can serve as a matching key between the two files in these scenarios.
- If a user wants to see matches for business and matches for business locations separately, Enigma recommends splitting those into two separate files.
Receiving the File Back
There are three ways to receive the file back from Enigma:
File Manager
The enriched file can be deposited back into the File Manager System of the console. Once the file is ready, a notification will be sent by a member of the Enigma team.
We recommend this method of receiving the file if you are evaluating Enigma data prior to a full integration.
SFTP
If an integration with Enigma data is being planned that requires files to be deposited on a server via SFTP, please reach out to the Enigma team. Please refer to the section on Console File Manager for instructions on connecting to an SFTP server. You will have the option of using either an Enigma-controlled server (the default) or your own.
S3
If you use Amazon Web Services to manage your cloud infrastructure, then you may be able to receive files from Enigma deposited in an S3 bucket you control.
Please refer to the section on Console File Manager for instructions on setting up an S3 integration.
Note: Parquet Download Limitations
Due to a current technical limitation, Parquet files generated by Enigma cannot be downloaded directly from the Console File Manager web interface. Parquet files can be accessed once Customer Success copies them to a user-defined Data Source, such as a customer-controlled SFTP server or Amazon S3 bucket, or to a private SFTP server that Enigma provides for each customer. To arrange this, reach out to Customer Success via [email protected] (or through other CS contact channels).
As a temporary workaround, Parquet files may be converted to CSV, TSV, or PSV and downloaded from the web interface. Yes, we recognize the supreme irony here and this paradox will soon be resolved.
Interpreting the Output
The output of the batch append file received via this method will contain the original input columns, with the original column headers prepended with `input_` and positioned at the front of the list of columns.
The Enigma ID corresponding to each matched input record will be included next.
If matching on the business level, Enigma also provides the Enigma ID for up to five business locations associated with that business.
While all column names can be renamed by the Enigma team upon request, by default any additional column headers are named according to the following rules:
- For attributes that are represented as arrays, each element of the array is added as a separate column, with the header appended with `__X` where X = 0, 1, 2, …. E.g. `names__0`
- For attributes that are represented as an array of objects, the column header representing each object in the array is distinguished by `__X` where X = 0, 1, 2, …, followed by the property name. E.g. `addresses__0__street_address1`
- For object-type attributes containing nested properties, the column headers for each property are appended with a double underscore `__`. E.g. `card_revenue_growth__3m__rate_sa`
- For attributes containing nested properties that are themselves arrays, the column header is first appended with a double underscore and the property name; then each element of the array is added as a separate column with the header appended with `__X` where X = 0, 1, 2, …
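The naming rules above can be decoded mechanically: splitting a header on double underscores recovers the attribute path, with numeric segments acting as array indices. A minimal sketch (the example headers are taken from the rules above):

```python
def parse_header(header):
    """Split a flattened column header into its attribute path.

    Numeric segments (e.g. the '0' in names__0) are array indices.
    """
    parts = header.split("__")
    return [int(p) if p.isdigit() else p for p in parts]

print(parse_header("names__0"))                       # ['names', 0]
print(parse_header("addresses__0__street_address1"))  # ['addresses', 0, 'street_address1']
print(parse_header("card_revenue_growth__3m__rate_sa"))
```

Note that a segment like `3m` is not numeric, so it stays a property name rather than an index.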
Please take a look at the Enigma attribute dictionary to check the type of the attribute being appended.
If a time-series attribute is selected (only merchant transaction attributes currently have a time-series component) then there will be multiple rows of data for each Enigma ID - each row corresponding to one month of a time series. Please note the following characteristics for time series attributes:
- If there is transaction presence in ANY month since Jan 2017, Enigma will return the entire history starting from Jan 2017, even if most attribute values are null.
- In other words, any business with transaction presence at any time will have one record per month going back to Jan 2017 by default.
- If there is no transaction presence, the Enigma record will not appear at all in the time series file.
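A common first operation on this long format is picking the most recent month per Enigma ID. The sketch below assumes one row per ID per month with a `YYYY-MM` month column (so months sort lexicographically); the column names are illustrative, not the exact delivery schema:

```python
# Hypothetical long-format time-series rows: one per Enigma ID per month.
rows = [
    {"enigma_id": "E1", "month": "2017-01", "card_revenue": None},
    {"enigma_id": "E1", "month": "2017-02", "card_revenue": 1200.0},
    {"enigma_id": "E1", "month": "2017-03", "card_revenue": 1350.0},
]

# Keep only the most recent month for each Enigma ID.
latest = {}
for r in rows:
    if r["enigma_id"] not in latest or r["month"] > latest[r["enigma_id"]]["month"]:
        latest[r["enigma_id"]] = r
print(latest["E1"]["month"])  # 2017-03
```

Note the first row's null `card_revenue`: per the rules above, months without transaction presence still appear as rows with null attribute values.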
Other columns will be appended by default in the file, indicating the `match_confidence` corresponding to the confidence of each match, as well as the matched fields.
Please get in touch with the Enigma team if you feel sending Enigma a file is the best option for you. Our team will be happy to get you set up with either a one-off or recurring delivery of files based on your requirements.
Working with Parquet
The Enigma File Manager can both accept and produce Parquet tabular data files.
Individual Parquet Files
Individual parquet files can be uploaded and downloaded like any other single file: via the Console, or via a Console File Manager source (e.g. SFTP, S3).
Partitioned Parquet Files
Enigma Console can both consume and produce Parquet split across multiple files.
Partitioned Parquet Data Sources
Using partitioned parquet requires a custom data source configured via the Console File Manager: Source Manager. This enables the upload of multiple files within a directory. In addition to customer-provided source systems such as AWS S3, Enigma provides all customers access to private, secure SFTP accounts, which can be unique for each source.
This custom Data Source is how you can upload multiple Parquet files within a single directory to Enigma, or obtain sharded parquet files back from us.
Structuring Partitioned Parquet
Once a custom data source is configured, making multiple parquet files available to Enigma as a single dataset is done by:
- Ensuring all parquet files are in a single directory whose name ends with `.parquet` (e.g. `enigma-50k-entities.parquet/`)
- Having each filename end in `part-###.parquet`

This looks like:

```
enigma-50k-entities.parquet/
    fiftyk.part-000.parquet
    fiftyk.part-001.parquet
    fiftyk.part-002.parquet
```
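Producing that layout programmatically is straightforward: create a directory whose name ends in `.parquet` and write zero-padded `part-###.parquet` shards into it. In this sketch the dataset name and shard count are illustrative, and the shard files are empty placeholders rather than real parquet data:

```python
import os
import tempfile

# Build the expected partitioned layout in a temporary location.
root = tempfile.mkdtemp()
dataset_dir = os.path.join(root, "enigma-50k-entities.parquet")
os.makedirs(dataset_dir)

# Zero-padded shard names matching the part-###.parquet convention.
shard_names = [f"fiftyk.part-{i:03d}.parquet" for i in range(3)]
for name in shard_names:
    # Placeholder shards; a real pipeline would write parquet data here.
    open(os.path.join(dataset_dir, name), "wb").close()

print(sorted(os.listdir(dataset_dir)))
```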
Hive Partitioning Not Yet Supported
Enigma Console cannot yet consume or produce Hive-style partitioned parquet. Files split over multiple directories or with `key=value/` style directory names will not be read by Enigma. Please reach out to [email protected] or to your Customer Success partner for early access to this feature as soon as it is available.