Create and Run a SQL Aggregation Job

Table of Contents

Preliminaries
Use the Fusion UI
Use the Fusion API

You can perform a SQL aggregation on a signals collection for a datasource (or on some other collection), through the Fusion UI or using the Fusion API.

Preliminaries

Before you can create and run a SQL aggregation job, you must create an app, create a collection (or use the default collection for the app), and add a datasource. Before you run a SQL aggregation job, you need signal data. Otherwise, there is nothing to aggregate.

Use the Fusion UI

You can use the Fusion UI to perform a SQL aggregation.

Set up a SQL aggregation job

In the Fusion application, navigate to Collections > Jobs.
Click Add and select SQL Aggregation from the dropdown list.
Specify an arbitrary Spark Job ID.
For the Source Collection, select the collection that contains the data to aggregate.

This is not the base collection. For example, to aggregate the signals in experiment_signals, you would select experiment_signals, not experiment.
Click the SQL field to enter aggregation statements. Click Close to close the dialog box.

Select the Advanced option in the top right corner to enter detailed parameters. See SQL Aggregation Jobs for more information.
Enter applicable Signal Types.
Enter the Data Format.
Save the job.

Run a SQL aggregation job

With the app open, navigate to Collections > Jobs.
In the list of jobs, click the job you want to run, and then click Run.

If you do not want to run the job manually, you can also Schedule a Job.

Use the Fusion API

You can use the Fusion API to perform a SQL aggregation. For example:

Configure a SQL aggregation job:

curl ":{api-port}/api/spark/configurations/experiment_signals_aggregation"
{
  "type" : "spark",
  "id" : "experiment_click_signals_aggregation",
  "definition" : {
    "id" : "experiment_click_signals_aggregation",
    "sql" : "SELECT SUM(count_i) AS aggr_count_i, query AS query_s, doc_id AS doc_id_s, time_decay(count_i, timestamp_tdt) AS weight_d FROM default_signals WHERE type_s='click' GROUP BY query, doc_id",
    "selectQuery" : "*:*",
    "outputPipeline" : "_system",
    "rollupAggregator" : "SQL",
    "sourceRemove" : false,
    "sourceCatchup" : true,
    "outputRollup" : true,
    "parameters" : [ {
      "key" : "optimizeOutput",
      "value" : "4"
    } ]
  },
  "inputCollection" : "experiment_click_signals",
  "rows" : 10000
}

Run a SQL aggregation job:

curl -u USERNAME:PASSWORD -X POST -H "Content-Type: application/json" http://localhost:{api-port}/api/jobs/spark:experiment_click_signals_aggregation/actions -d '{"action": "start"}'