File parser API - by Structured Concepts AB

API documentation

Last updated: 2018-01-16.
Copyright © Structured Concepts AB, all rights reserved.
Current API version: v0.1.

Purpose

The file parser API was designed with ETL processing in mind. We needed a tool that would parse and, when necessary, repair text files with delimited or fixed-width data. The client can call the API from a custom application, through web service calls from popular integration software such as SSIS or Pentaho, or with a simple "curl" command.

API address

The API uses an HTTPS POST operation. The exact URL contains information about the API version (optional) and the output format for the results.

Output format

The output format is declared in the URL. The following formats are available:

Comma-delimited

Returns rows delimited by a newline character and columns delimited by commas (,). Text is quoted using double quotes (") and backslash (\) is used as an escape character.
https://fileparser.strd.co/csv

Semicolon-delimited

Returns rows delimited by a newline character and columns delimited by semicolons (;). Text is quoted using double quotes (") and backslash (\) is used as an escape character.
https://fileparser.strd.co/sdv

Tab-delimited

Returns rows delimited by a newline character and columns delimited by tabs (ASCII code 9). Text is quoted using double quotes (") and backslash (\) is used as an escape character. Tab-delimited files are useful for data that may contain commas or semicolons that would otherwise have to be escaped.
https://fileparser.strd.co/tab
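
As a rough illustration of consuming the three delimited formats above, the following Python sketch reads a returned payload with the standard csv module. It assumes that the quoting and escaping rules described above map onto the module's quotechar and escapechar settings. The function name and the sample payload are made up for illustration and are not actual API output.

```python
import csv
import io

# Column delimiters for the three delimited output formats described above.
DELIMITERS = {"csv": ",", "sdv": ";", "tab": "\t"}


def parse_delimited(payload, output_format):
    """Parse a delimited payload into a list of rows (lists of strings)."""
    reader = csv.reader(
        io.StringIO(payload, newline=""),
        delimiter=DELIMITERS[output_format],
        quotechar='"',
        escapechar="\\",    # backslash is the escape character
        doublequote=False,  # assumption: quotes are escaped, not doubled
    )
    return list(reader)


# Hypothetical semicolon-delimited payload, not actual API output.
sample = 'id;name\n1;"Smith; John"\n2;"5\\" disk"\n'
rows = parse_delimited(sample, "sdv")
```

Here the quoted semicolon in the second row stays inside its column, and the escaped double quote in the third row is read as a literal quote character.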

SQL INSERT statements

Returns SQL statements to insert the data into a table (defined in the table parameter). If no table parameter is given, the output will use the table name ###table###. To improve INSERT performance on the target server, inserts are batched 100 rows at a time.
https://fileparser.strd.co/sql
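
If the table parameter was left out, a client could substitute the placeholder itself before running the script. The Python sketch below shows one way to do that. The function name is hypothetical and the sample statement is invented, since the exact shape of the generated INSERT statements is not documented here.

```python
# Default table name used by the API when no table parameter is sent.
PLACEHOLDER = "###table###"


def set_table_name(sql, table_name):
    """Replace the default placeholder with a real table name.

    Note: this is a plain text replacement, so it would also change any
    data value that happens to contain the placeholder string.
    """
    return sql.replace(PLACEHOLDER, table_name)


# Invented statement for illustration only, not actual API output.
sample_sql = "INSERT INTO ###table### (id, name) VALUES (1, 'a'), (2, 'b');"
final_sql = set_table_name(sample_sql, "dbo.Customers")
```

Sending the table parameter with the request avoids this step entirely and is probably the simpler choice when the target table is known up front.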

HTML table

Returns the data formatted as an HTML table, using the TABLE element with a THEAD containing a single header row, followed by a TBODY. Only very basic HTML encoding is performed, and no <HTML>, <HEAD> or <BODY> elements or DOCTYPE declaration are generated.
https://fileparser.strd.co/html
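
Because only the table fragment is returned, a client that wants a standalone page has to add the surrounding document itself. A minimal Python sketch, with a hypothetical function name and an invented sample fragment:

```python
import html


def wrap_html_fragment(table_html, title="Parsed file"):
    """Wrap a returned TABLE fragment in a minimal HTML document."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{table_html}\n"
        "</body>\n</html>\n"
    )


# Invented fragment for illustration only, not actual API output.
fragment = (
    "<table><thead><tr><th>id</th></tr></thead>"
    "<tbody><tr><td>1</td></tr></tbody></table>"
)
page = wrap_html_fragment(fragment, title="Rows & columns")
```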

Specific API version

Once you have implemented a working solution, we recommend fixing the API version by declaring it in your API call. This provides a predictable output over time, even when newer versions of the API are deployed to the server.

To set a specific API version, include it in the POST URL, like this:

https://fileparser.strd.co/sdv/api/v0.1

Most recent API version

When you do not specify the API version, the most recent version available is used. The advantage is that you get the latest features and fixes, but your results may change over time as newer versions are released.

To use the most recent API version, leave it out of the POST URL, like this:

https://fileparser.strd.co/sdv
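
The two URL forms above (with and without a fixed version) can be assembled with a small helper. This Python sketch uses only the host, format codes and path layout shown in this document. The function and constant names are hypothetical.

```python
BASE_URL = "https://fileparser.strd.co"

# Output format codes listed in the "Output format" section.
OUTPUT_FORMATS = {"csv", "sdv", "tab", "sql", "html"}


def build_url(output_format, api_version=None):
    """Return the POST URL for a format, optionally pinned to a version."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    url = f"{BASE_URL}/{output_format}"
    if api_version is not None:
        # Pin the version, e.g. "v0.1", for predictable output over time.
        url += f"/api/{api_version}"
    return url


pinned_url = build_url("sdv", "v0.1")
latest_url = build_url("sdv")
```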

Parameters

All parameters, except the output format and the API version, are sent as HTTP POST variables.

file (required)

Contains the source file. The file name is not significant, and the same file name can be reused, simultaneously or repeatedly, as many times as you need.
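
As a sketch of how a custom application might send the file, the Python code below builds a POST request using only the standard library. It assumes the file is uploaded as multipart/form-data, which is the usual encoding for file uploads in HTTP POST forms but is not stated explicitly in this document. The function name and default file name are hypothetical, and the request is built without being sent.

```python
import urllib.request
import uuid


def build_upload_request(url, file_bytes, filename="upload.txt", fields=None):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Ordinary POST variables, e.g. {"output": "plain"}.
    for name, value in (fields or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The source file goes in the "file" parameter.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary marks the end of the form data.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


request = build_upload_request(
    "https://fileparser.strd.co/csv",
    b"a,b\n1,2\n",
    fields={"output": "plain"},
)
```

Passing the resulting object to urllib.request.urlopen would perform the actual call.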

output (optional)

Specifies whether the result is returned as a JSON blob or as raw data. When "json" is selected, the output is included in a JSON blob along with the log output. When "plain" is chosen, the returned data will have a content type of "text/plain" or "text/html", depending on the output format chosen. Valid inputs are "plain" and "json" in lower case. If "output" is not specified, "json" is assumed.

apikey (optional)

Your organization's API key that allows you to parse full-size files. Note: without an API key, files are limited to 20 kB.
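
A client can check the file size before uploading to decide whether an API key is needed. The Python sketch below assumes that "20 kB" means 20,000 bytes and that a file of exactly that size is still accepted. Neither detail is confirmed in this document, and the names are hypothetical.

```python
# Assumption: "20 kB" is read as 20,000 bytes; the exact threshold
# (and whether it is inclusive) is not stated in the documentation.
FREE_LIMIT_BYTES = 20 * 1000


def requires_api_key(size_in_bytes):
    """Return True if a file of this size exceeds the keyless limit."""
    return size_in_bytes > FREE_LIMIT_BYTES


small_ok = requires_api_key(4096)
large_needs_key = requires_api_key(1_000_000)
```

In practice the size could come from os.path.getsize for a file on disk, or from len() for data already in memory.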

maxerrors (optional)

Specifies the maximum number of errors allowed. If the number of errors returned from the parsing exceeds this number, no result set will be returned to the client. Note that to see the errors, you will need to enable JSON output.

table (optional)

Applicable only when the SQL output format is chosen, this sets the fully qualified name of the output table used in the INSERT statements. If no table name is specified, the default name ###table### is used.

Logging (optional)

There are four parameters that control the detail of the output logs.

  • logerrors: Logs severe errors that typically stop the parser.
  • logwarnings: Warnings about potential datatype conversion issues and other problems that may affect data quality but do not necessarily stop the parser.
  • loginfo: Non-critical information generated in the parsing process.
  • logverbose: Very detailed, technical information that can help troubleshooting.

For each parameter, "on", "1", "true" or "yes" will enable the option. If none of them is enabled, no logs are generated. Logs are not saved. They are only returned to the client if JSON output is specified.
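
The optional parameters described in this section can be collected into a dictionary of POST variables before the call is made. The following Python sketch uses the parameter names and accepted values from this document. The function itself and its defaults are hypothetical.

```python
LOG_PARAMETERS = ("logerrors", "logwarnings", "loginfo", "logverbose")


def build_fields(output="json", apikey=None, maxerrors=None, table=None, **log_flags):
    """Build the POST variables (everything except the file itself)."""
    if output not in ("plain", "json"):
        raise ValueError('output must be "plain" or "json" (lower case)')
    fields = {"output": output}
    if apikey is not None:
        fields["apikey"] = apikey
    if maxerrors is not None:
        fields["maxerrors"] = str(int(maxerrors))
    if table is not None:
        # Only meaningful for the SQL output format.
        fields["table"] = table
    for name, enabled in log_flags.items():
        if name not in LOG_PARAMETERS:
            raise ValueError(f"unknown logging parameter: {name!r}")
        if enabled:
            # "on", "1", "true" and "yes" all enable a logging option.
            fields[name] = "true"
    return fields


fields = build_fields(maxerrors=10, logerrors=True, logwarnings=True)
```

Keeping output set to "json" here matters, because the logs are only returned when JSON output is requested.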

Use of this service is subject to terms and conditions.