Create and Run a SQL Aggregation Job
You can perform a SQL aggregation on a signals collection for a datasource (or on some other collection), through the Fusion UI or using the Fusion API.
Preliminaries
Before you can create and run a SQL aggregation job, you must create an app, create a collection (or use the default collection for the app), and add a datasource. Before you run a SQL aggregation job, you need signal data. Otherwise, there is nothing to aggregate.
Use the Fusion UI
You can use the Fusion UI to perform a SQL aggregation.
-
In the Fusion application, navigate to Collections > Jobs.
-
Click Add and select SQL Aggregation from the dropdown list.
-
Specify an arbitrary Spark Job ID.
-
For the Source Collection, select the collection that contains the data to aggregate.
This is not the base collection. For example, to aggregate the signals in experiment_signals
, you would selectexperiment_signals
, notexperiment
. -
Click the SQL field to enter aggregation statements. Click Close to close the dialog box.
Select the Advanced option in the top right corner to enter detailed parameters. See SQL Aggregation Jobs for more information. -
Enter applicable Signal Types.
-
Enter the Data Format.
-
Save the job.
-
With the app open, navigate to Collections > Jobs.
-
In the list of jobs, click the job you want to run, and then click Run.
If you do not want to run the job manually, you can also Schedule a Job.
Use the Fusion API
You can use the Fusion API to perform a SQL aggregation. For example:
curl ":{api-port}/api/spark/configurations/experiment_signals_aggregation"
{
"type" : "spark",
"id" : "experiment_click_signals_aggregation",
"definition" : {
"id" : "experiment_click_signals_aggregation",
"sql" : "SELECT SUM(count_i) AS aggr_count_i, query AS query_s, doc_id AS doc_id_s, time_decay(count_i, timestamp_tdt) AS weight_d FROM default_signals WHERE type_s='click' GROUP BY query, doc_id",
"selectQuery" : "*:*",
"outputPipeline" : "_system",
"rollupAggregator" : "SQL",
"sourceRemove" : false,
"sourceCatchup" : true,
"outputRollup" : true,
"parameters" : [ {
"key" : "optimizeOutput",
"value" : "4"
} ]
},
"inputCollection" : "experiment_click_signals",
"rows" : 10000
}
curl -u USERNAME:PASSWORD -X POST -H "Content-Type: application/json" http://localhost:{api-port}/api/jobs/spark:experiment_click_signals_aggregation/actions -d '{"action": "start"}'