Legacy Product

Fusion 5.10

    CSV Parser Stage

    This parser breaks down incoming CSV files into the most efficient components for Fusion to index. It produces one new document per row from the CSV input, excluding comment rows and header rows.

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
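
As a hedged illustration of the API-side escaping, the sketch below builds a request body in Python. It assumes the API accepts a JSON body and that the `\\t` shown above refers to the JSON text; the two-field stage object is only a fragment for illustration.

```python
import json

# In Python source, "\\t" is a two-character string: a backslash followed by "t".
stage = {"type": "csv", "delimiter": "\\t"}

# json.dumps escapes the backslash, so the serialized body contains \\t.
body = json.dumps(stage)
print(body)  # {"type": "csv", "delimiter": "\\t"}
```

In the UI, the same delimiter would be typed as `\t`, with a single backslash.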

    Parse CSV content

    id - string

    Default: 148f584d-60b6-4381-8259-d6d7dff69ed0

    label - string

    A label for this Parser Stage

    <= 255 characters

    enabled - boolean

    Default: true

    mediaTypes - array[string]

    pathPatterns - array[object]

    Specify a file name or pattern that must be matched for this parser stage to run. Forward slashes ("/") are used to join names of files inside archives with the archive name.

    object attributes: {
     syntax : {
      display name: Pattern type
      type: string
     }
     pattern : {
      display name: File name or pattern
      type: string
     }
    }
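
The following sketch shows how a name inside an archive might be joined with the archive name and then matched. The `"glob"` value for `syntax`, the file names, and the use of Python's `fnmatch` module are all assumptions for illustration; this page does not list the allowed pattern types or describe the matching engine.

```python
from fnmatch import fnmatchcase

# Hypothetical entry: "glob" is an assumed syntax value, not confirmed by this page.
path_pattern = {"syntax": "glob", "pattern": "*.csv"}

# A file inside an archive is addressed as <archive name>/<path inside the archive>.
candidate = "/".join(["exports.zip", "2024/orders.csv"])

print(candidate)                                        # exports.zip/2024/orders.csv
print(fnmatchcase(candidate, path_pattern["pattern"]))  # True
```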

    inheritMediaTypes - boolean

    Indicates whether the parser stage should use the default media types. Unchecking this box means that ONLY the manually configured media types are parsed by the parser, so you MUST provide at least one media type.

    Default: true

    errorHandling - string

    Default: mark

    Allowed values: ignore, log, fail, mark

    outputFieldPrefix - string

    Fields extracted by this parser are prefixed with this string. The remainder of the field name is as detected in the stream.

    <= 20 characters

    Match pattern: ^$|^[A-Za-z_][A-Za-z0-9_\-\.]+$
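
A prefix can be checked against these two constraints before it is submitted. The sketch below is a local pre-check in Python using the match pattern and length limit quoted above; the function name `is_valid_prefix` is hypothetical. Note that, per the pattern, a non-empty prefix needs at least two characters.

```python
import re

# Pattern and length limit copied from the outputFieldPrefix constraints above.
PREFIX_PATTERN = re.compile(r"^$|^[A-Za-z_][A-Za-z0-9_\-\.]+$")
MAX_PREFIX_LENGTH = 20

def is_valid_prefix(prefix: str) -> bool:
    """Return True if prefix satisfies both the length limit and the pattern."""
    return (
        len(prefix) <= MAX_PREFIX_LENGTH
        and PREFIX_PATTERN.fullmatch(prefix) is not None
    )

print(is_valid_prefix(""))      # True: an empty prefix is allowed
print(is_valid_prefix("csv_"))  # True
print(is_valid_prefix("a"))     # False: one character does not satisfy "+"
print(is_valid_prefix("1col"))  # False: must start with a letter or underscore
```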

    charset - string, required

    Example: "UTF-8"

    Default: detect

    ignoreBOM - boolean, required

    Ignore Byte-Order Mark (BOM) if present and always use the configured character set. When set to false a valid BOM character set overrides the configured default character set.

    Default: false
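
The interaction between charset and ignoreBOM can be sketched as follows. This is an illustration of the rule described above, not Fusion's implementation: the function name `effective_charset` and the list of recognized BOMs are assumptions, and the `detect` setting is not modeled.

```python
import codecs

# The UTF-32LE BOM begins with the UTF-16LE BOM, so the 4-byte marks are tested first.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

def effective_charset(data: bytes, configured: str, ignore_bom: bool) -> str:
    """Pick the charset: a valid BOM wins unless ignore_bom is set."""
    if not ignore_bom:
        for bom, name in BOMS:
            if data.startswith(bom):
                return name
    return configured

data = codecs.BOM_UTF8 + b"id,name\n1,Ada\n"
print(effective_charset(data, "ISO-8859-1", ignore_bom=False))  # UTF-8
print(effective_charset(data, "ISO-8859-1", ignore_bom=True))   # ISO-8859-1
```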

    delimiter - string

    Delimiter character between fields. Any single character, including an escaped character, is valid, e.g. , (comma), \t (tab), or | (pipe). The default is a comma if auto-detection is disabled.

    >= 1 characters

    quote - string

    Quote character. The default is a double quote (") if auto-detection is disabled.

    <= 1 characters

    quoteEscape - string

    Quote escape character. The default is a double quote (") if auto-detection is disabled.

    <= 1 characters
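
To show how the delimiter, quote, and quote escape characters work together, here is a small sketch using Python's standard `csv` module rather than Fusion's parser. The sample data is invented, and `doublequote=True` stands in for a quote escape character of `"`.

```python
import csv
import io

# Pipe-delimited input; a doubled quote inside a quoted field is an escaped quote.
raw = 'id|title\n1|"He said ""hi"""\n2|"a|b"\n'

reader = csv.reader(io.StringIO(raw), delimiter="|", quotechar='"', doublequote=True)
rows = list(reader)
print(rows)  # [['id', 'title'], ['1', 'He said "hi"'], ['2', 'a|b']]
```

The quoted field in the last row keeps its pipe character as data instead of splitting on it.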

    autoDetect - boolean

    Attempt to guess the delimiter, quote, quote escape, and comment characters

    Default: true
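
Auto-detection of this kind can be illustrated with `csv.Sniffer` from Python's standard library. This is only an analogy: Fusion's detection logic is not described on this page and may behave differently. The sample text and the candidate delimiter list are invented.

```python
import csv

# Three consistent rows give the sniffer enough evidence to pick a delimiter.
sample = "id;name;city\n1;Ada;London\n2;Linus;Helsinki\n"

# Restrict the guess to a few candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,|\t")
print(repr(dialect.delimiter))
```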

    trimWhitespace - boolean

    Trim off leading and trailing whitespace from columns, default true

    Default: true

    hasHeaders - boolean

    Treat the first row as column headers, default true

    Default: true

    headers - array[string]

    List of column headers, overrides file headers if present
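
One way to read the interaction between hasHeaders and headers is sketched below with Python's `csv` module. The function name `rows_as_dicts` is hypothetical, and the sketch assumes that when both settings are used the first row is still consumed as a header row while the configured names replace it.

```python
import csv
import io

def rows_as_dicts(text, has_headers=True, headers=None):
    """Map each data row to a dict keyed by the effective column names."""
    rows = list(csv.reader(io.StringIO(text)))
    # Assumption: the header row is consumed even when configured headers override it.
    file_headers = rows.pop(0) if (has_headers and rows) else []
    names = headers if headers else file_headers
    return [dict(zip(names, row)) for row in rows]

text = "a,b\n1,2\n"
print(rows_as_dicts(text))                      # [{'a': '1', 'b': '2'}]
print(rows_as_dicts(text, headers=["x", "y"]))  # [{'x': '1', 'y': '2'}]
```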

    skipEmptyLines - boolean

    Skip any empty lines encountered, default true

    Default: true

    lineSeparator - string

    Line separator character

    >= 1 characters

    nullValue - string

    A string value to replace nulls with, no default

    emptyValue - string

    A string value to replace empty strings with, no default

    includeRowNumber - boolean

    Include the row number (line number) in the emitted documents, default true

    Default: true

    comment - string

    Character at the start of a row that indicates a comment. The default is hash (#) if auto-detection is disabled.

    <= 1 characters

    commentHandling - string

    How to handle comments: ignore them, add them as a field to the next document, or add them as separate documents. The default is ignore.

    Default: ignore

    Allowed values: ignore, as_field, as_document
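
The three comment handling modes can be sketched as follows. This is a simplified model, not Fusion's behavior: the function name `parse_lines` and the output field names `row` and `comment` are hypothetical, and rows are kept as raw strings rather than split into columns.

```python
def parse_lines(lines, comment="#", comment_handling="ignore"):
    """Emit one dict per row, treating comment rows per comment_handling."""
    docs, pending = [], []
    for line in lines:
        if line.startswith(comment):
            text = line[len(comment):].strip()
            if comment_handling == "as_document":
                docs.append({"comment": text})  # comment becomes its own document
            elif comment_handling == "as_field":
                pending.append(text)            # hold it for the next data row
            continue                            # "ignore": drop the comment
        doc = {"row": line}
        if pending:
            doc["comment"] = " ".join(pending)
            pending = []
        docs.append(doc)
    return docs

lines = ["# exported nightly", "1,Ada", "2,Linus"]
print(parse_lines(lines, comment_handling="as_field"))
# [{'row': '1,Ada', 'comment': 'exported nightly'}, {'row': '2,Linus'}]
```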

    maxRowLength - integer

    Maximum number of characters allowed in a single line read from the input. The default is 10485760 characters (10 * 1024 * 1024).

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10485760

    maxNumColumns - integer

    Maximum number of columns allowed in a single row. The default is 1000.

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 1000

    maxColumnChars - integer

    Maximum number of characters a single column value can have. The default is 10485760 characters (10 * 1024 * 1024).

    <= 2147483647

    exclusiveMinimum: false

    exclusiveMaximum: false

    Default: 10485760

    columnHandling - string

    What to do when a row has too many or too few columns: throw an error, align the columns, or do nothing special (default).

    Default: default

    Allowed values: error, align, default

    fillValue - string

    A string value to use when aligning the columns, that is, when Column Mismatch Handling (columnHandling) is "align".

    Default: <FILL>
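
A rough sketch of the three column handling modes follows. The function name `handle_columns` is hypothetical. Padding short rows with the fill value follows from the description above; truncating rows that are too long is an assumption, since this page does not say how extra columns are aligned.

```python
def handle_columns(row, expected, column_handling="default", fill_value="<FILL>"):
    """Apply a column mismatch policy to one parsed row."""
    if len(row) == expected or column_handling == "default":
        return row  # nothing special is done
    if column_handling == "error":
        raise ValueError(f"expected {expected} columns, got {len(row)}")
    if column_handling == "align":
        # Pad short rows with fill_value; dropping extra columns is an assumption.
        padded = row + [fill_value] * (expected - len(row))
        return padded[:expected]
    raise ValueError(f"unknown columnHandling value: {column_handling}")

print(handle_columns(["1", "Ada"], expected=3, column_handling="align"))
# ['1', 'Ada', '<FILL>']
```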

    type - string, required

    Default: csv

    Allowed values: csv
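
Putting the fields together, a stage definition might be assembled as in the sketch below. It uses only field names, defaults, and allowed values listed on this page; the label text and the choice of a tab delimiter are invented for illustration, and how the object is submitted (endpoint, enclosing parser definition) is outside what this page covers.

```python
import json

# Sketch of a CSV parser stage with auto-detection disabled and explicit characters.
csv_stage = {
    "type": "csv",                 # required; the only allowed value
    "label": "Tab-separated export",
    "enabled": True,
    "inheritMediaTypes": True,
    "errorHandling": "mark",
    "charset": "UTF-8",            # required
    "ignoreBOM": False,            # required
    "autoDetect": False,
    "delimiter": "\\t",            # backslash + t, serialized as \\t in JSON
    "quote": '"',
    "quoteEscape": '"',
    "comment": "#",
    "commentHandling": "ignore",
    "hasHeaders": True,
    "trimWhitespace": True,
    "skipEmptyLines": True,
    "includeRowNumber": True,
    "maxRowLength": 10485760,
    "maxNumColumns": 1000,
    "maxColumnChars": 10485760,
    "columnHandling": "align",
    "fillValue": "<FILL>",
}

print(json.dumps(csv_stage, indent=2))
```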