Legacy Product

Fusion 5.10

    Aggregation Jobs

    Define an aggregation job.

    Define an aggregation job to be executed by Fusion Spark.
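    For orientation, a minimal job definition can be sketched as follows, using only properties documented below. The collection names and the job ID are hypothetical placeholders, not values from this reference.

```python
# Minimal aggregation job definition, sketched from the properties
# documented in this reference. Collection names are hypothetical.
job = {
    "type": "aggregation",                        # required; only allowed value
    "id": "click_signals_agg",                    # required; <= 128 chars
    "inputCollection": "products_signals",        # required; signals source
    "outputCollection": "products_signals_aggr",  # '-' disables output
    "timeRange": "[* TO NOW]",                    # Solr date range filter
    "sourceCatchup": True,                        # only signals since last run
    "outputRollup": True,                         # roll up with prior results
}
```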

    id - string, required

    The ID for this Spark job. Used in the API to reference this job.

    <= 128 characters

    Match pattern: ^[A-Za-z0-9_\-]+$
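    The pattern and length limit above can be checked client-side before submitting a job; this helper is merely illustrative.

```python
import re

# Validate a job ID against the documented constraints:
# match pattern ^[A-Za-z0-9_\-]+$ and at most 128 characters.
JOB_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")

def is_valid_job_id(job_id: str) -> bool:
    return len(job_id) <= 128 and bool(JOB_ID_PATTERN.match(job_id))
```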

    inputCollection - string, required

    Collection containing signals to be aggregated.

    definition - Aggregation Settings

    Defines the type of aggregation to perform, either SQL or legacy. SQL aggregations allow you to use ANSI SQL 2003, including numerous built-in functions to define your aggregation and rollup logic. The legacy aggregation option is based on pre-Fusion 4.0 features and will be removed in Fusion 4.1.

    timeRange - string

    The time range to select signals on, e.g., `[* TO NOW]`. See the Solr documentation on working with dates for more options (https://solr.apache.org/guide/8_8/working-with-dates.html).

    >= 1 character
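    Solr date ranges use bracket syntax with date-math expressions such as `NOW-7DAYS` (see the linked Solr guide). A small helper to assemble such strings, purely for illustration:

```python
# Assemble Solr date-range strings usable as 'timeRange' values.
# The date-math expressions ("NOW-7DAYS", etc.) are interpreted by
# Solr; this helper only builds the bracket syntax.
def solr_time_range(start: str, end: str = "NOW") -> str:
    return f"[{start} TO {end}]"

all_signals = solr_time_range("*")          # "[* TO NOW]"
last_week = solr_time_range("NOW-7DAYS")    # "[NOW-7DAYS TO NOW]"
```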

    outputCollection - string

    The collection to write the aggregated results to. This property is required if the selected output / rollup pipeline requires it (the default pipeline does). A special value of '-' disables the output.

    >= 1 character

    sourceRemove - boolean

    If true, the processed source signals will be removed after aggregation.

    Default: false

    sourceCatchup - boolean

    If true, only aggregate the signals received since the last successful run of this job. If a record of such a run exists, its timestamp overrides the start of the range set in the 'timeRange' property.

    Default: true

    outputRollup - boolean

    Roll up the current results with all previous results for this aggregation ID that are available in the 'outputCollection'.

    Default: true

    sql - string

    Use SQL to perform the aggregation. You do not need to include a time range filter in the WHERE clause as it gets applied automatically before executing the SQL statement.

    >= 1 character
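    A hypothetical SQL aggregation for this property might sum click counts per query/document pair. The field and collection names below are illustrative assumptions, not values from this reference; note that no time-range WHERE clause is needed, since the filter is applied automatically.

```python
# Hypothetical SQL for the 'sql' property. Field names (query, doc_id,
# count_i) and the collection name are illustrative assumptions. The
# time-range filter is applied automatically before execution, so no
# timestamp predicate appears in the WHERE clause.
AGGREGATION_SQL = """
SELECT query, doc_id, SUM(count_i) AS aggr_count_i
FROM products_signals
GROUP BY query, doc_id
"""
```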

    rollupSql - string

    Use SQL to perform a rollup of previously aggregated docs. If left blank, the aggregation framework will supply a default SQL query to rollup aggregated metrics.

    >= 1 character

    groupingFields - array[string]

    The fields to group on.

    typeFieldName - string

    Name of the signal type field. Defaults to 'type'.

    signalTypes - array[string]

    The signal types. If not set, any signal type is selected.

    selectQuery - string

    The query to select the desired signals. If not set, '*:*' (or an equivalent match-all query) is used.

    >= 1 character

    Default: *:*

    sort - string

    The criteria to sort on within a group. If not set then sort order is by id, ascending.

    >= 1 character

    outputPipeline - string

    What pipeline to use to process the output. If not set then '_system' pipeline will be used.

    >= 1 character

    Default: _system

    rollupPipeline - string

    Pipeline to use for processing results of roll-up. This is by default the same indexing pipeline used for processing the aggregation results.

    >= 1 character

    rollupAggregator - string

    The aggregator to use when rolling up. If not set, the aggregator specified in the 'aggregator' property is also used for roll-up.

    >= 1 character

    aggregator - string

    Aggregator implementation to use. This is either one of the symbolic names (simple, click, em) or a fully-qualified class name of a class extending EventAggregator. If not set then 'simple' is used.

    >= 1 character

    aggregates - array[object]

    List of functions defining how to aggregate events into results. Not supported for SQL aggregations.

    Object attributes:
    - type (string, required): Type
    - sourceFields (array): Source fields
    - targetField (string): Target field
    - mapper (boolean): Use in map phase
    - parameters (array): Parameters
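    To make that shape concrete, a hypothetical 'aggregates' entry might look like this. The "sum" function type and the field names are illustrative assumptions, not values from this reference.

```python
# Hypothetical entry for the 'aggregates' array, following the
# attribute shape documented above. The "sum" function type and the
# field names are illustrative assumptions.
aggregates = [
    {
        "type": "sum",                  # required: aggregation function type
        "sourceFields": ["count_i"],    # fields read from each signal
        "targetField": "aggr_count_i",  # field written to the result doc
        "mapper": False,                # whether to apply in the map phase
        "parameters": [],               # optional function parameters
    }
]
```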

    statsFields - array[string]

    List of numeric fields in results for which to compute overall statistics. Not supported for SQL aggregations.

    parameters - array[object]

    Other aggregation parameters (e.g. start / aggregate / finish scripts, cache size, etc).

    Object attributes:
    - key (string, required): Parameter Name
    - value (string): Parameter Value
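    For illustration, 'parameters' entries follow the key/value shape above. The key name below is a hypothetical example, not a documented parameter.

```python
# Hypothetical entries for the 'parameters' array, matching the
# key/value attribute shape documented above. The key name is an
# illustrative assumption; all values are passed as strings.
parameters = [
    {"key": "cacheSize", "value": "10000"},
]
```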

    rows - integer

    Number of rows to read from the source collection per request.

    Default: 10000

    readOptions - array[object]

    Additional configuration settings to fine-tune how input records are read for this aggregation.

    Object attributes:
    - key (string, required): Parameter Name
    - value (string): Parameter Value

    aggregationTime - string

    Timestamp to use for the aggregation results. Defaults to NOW.

    referenceTime - string

    Timestamp to use for computing decays and to determine the value of NOW.

    skipCheckEnabled - boolean

    If the catch-up flag is enabled and this field is checked, the job framework will execute a fast Solr query to determine if this run can be skipped.

    Default: true

    type - string, required

    Default: aggregation

    Allowed values: aggregation