Aggregation Jobs
Aggregation jobs compile your raw signals into aggregated signals. Most Fusion jobs that consume signals require aggregated signals.
Default job name |
COLLECTION_NAME_click_signals_aggregation
|
Input |
Raw signals (the COLLECTION_NAME_signals collection by default)
|
Output |
|
|
query |
count_i |
type |
timstamp_tdt |
user_id |
doc_id |
session_id |
fusion_query_id |
Required signals fields: |
|
|
|
|
|
|
|
[1] |
1 Required if you are using response signals.
The aggregation process is specified by an aggregation type consisting of the following list of properties:
Name |
Description |
id
|
Aggregation ID |
groupingFields
|
List of signal field names |
signalTypes
|
List of signal types |
aggregator
|
Symbolic name of the aggregator implementation |
selectQuery
|
Query string, default *:* |
sort
|
Ordering of aggregated signals |
timeRange
|
String specifying time range, e.g., [* TO NOW] |
outputPipeline
|
Pipeline ID for processing aggregated events |
outputCollection
|
Output collection name |
rollupPipeline
|
Rollup pipeline ID |
rollupAggregator
|
Name of the aggregator implementation used for rollups |
sourceRemove
|
Boolean, default is false |
sourceCatchup
|
Boolean, default is true |
outputRollup
|
Boolean, default is true |
aggregates
|
List of aggregation functions |
params
|
Arbitrary parameters to be used by specific aggregator implementations |
Use this job when you want to aggregate your data in some way.
aggregationTime - string
Timestamp to use for the aggregation results. Defaults to NOW.
id - stringrequired
The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, dash (-) and underscore (_). Maximum length: 63 characters.
<= 63 characters
Match pattern: [a-zA-Z][_\-a-zA-Z0-9]*[a-zA-Z0-9]?
inputCollection - stringrequired
Collection containing signals to be aggregated.
optimizeSegments - integer
If set to a value above 0, the aggregator job will optimize the resulting Solr collection into this many segments
exclusiveMinimum: false
Default: 0
outputCollection - string
The collection to write the aggregates to on output. This property is required if the selected output / rollup pipeline requires it (the default pipeline does). A special value of '-' disables the output.
>= 1 characters
parameters - array[object]
Other aggregation parameters (e.g. timestamp field etc..).
object attributes:{key
required : {
display name: Parameter Name
type: string
}value
: {
display name: Parameter Value
type: string
}}
readOptions - array[object]
Additional configuration settings to fine-tune how input records are read for this aggregation.
object attributes:{key
required : {
display name: Parameter Name
type: string
}value
: {
display name: Parameter Value
type: string
}}
referenceTime - string
Timestamp to use for computing decays and to determine the value of NOW.
rollupSql - string
Use SQL to perform a rollup of previously aggregated docs. If left blank, the aggregation framework will supply a default SQL query to rollup aggregated metrics.
>= 1 characters
rows - integer
Number of rows to read from the source collection per request.
Default: 10000
selectQuery - string
The query to select the desired input documents.
>= 1 characters
Default: *:*
signalTypes - array[string]
The signal types. If not set then any signal type is selected
skipCheckEnabled - boolean
If the catch-up flag is enabled and this field is checked, the job framework will execute a fast Solr query to determine if this run can be skipped.
Default: true
skipJobIfSignalsEmpty - boolean
Skip Job run if signals collection is empty
sourceCatchup - boolean
If checked, only aggregate new signals created since the last time the job was successfully run. If there is a record of such previous run then this overrides the starting time of time range set in 'timeRange' property. If unchecked, then all matching signals are aggregated and any previously aggregated docs are deleted to avoid double counting.
Default: true
sourceRemove - boolean
If checked, remove signals from source collection once aggregation job has finished running.
Default: false
sparkConfig - array[object]
Spark configuration settings.
object attributes:{key
required : {
display name: Parameter Name
type: string
}value
: {
display name: Parameter Value
type: string
}}
sql - stringrequired
Use SQL to perform the aggregation. You do not need to include a time range filter in the WHERE clause as it gets applied automatically before executing the SQL statement.
>= 1 characters
timeRange - string
The time range to select signals on, e.g., `[* TO NOW]`. See Solr date range for more options (https://solr.apache.org/guide/8_8/working-with-dates.html).
>= 1 characters
type - stringrequired
Default: aggregation
Allowed values: aggregation
useNaturalKey - boolean
Use a natural key provided in the raw signals data for aggregation, rather than relying on Solr UUIDs. Migrated aggregations jobs from Fusion 4 will need this set to false.
Default: true