Querying Fusion SQL
In addition to the SQL functions provided by Spark, Fusion provides several additional functions to simplify common aggregation tasks. To recap, a UDAF aggregates multiple rows for the same group by key and a UDF performs some operation on a single row.
weighted_sum
The weighted_sum
UDAF takes a weight, type, and type-weight mapping to produce an aggregated weight. For example, consider the following SQL snippet:
SELECT query,
doc_id,
filters,
weighted_sum(typed_weight_d, type, 'click:1.0,cart:10.0') AS weight_d
FROM signal_type_groups
GROUP BY query, doc_id, filters
When applied to the rows in the table below, the weighted_sum
function produces a final weight_d
of 12.0 (2*1.0 + 1*10.0). The UDAF is passed rows grouped by query
, doc_id
, and filters
.
query |
type |
doc_id |
filters |
typed_weight_d |
---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
significantTerms
The significantTerms
UDF allows the user to query data for significant terms within the results. This identifies terms that appear an unusually frequently in a specific result set relative to the index. It can be used to identify anomalies in data.
This UDF can be used on tokenized text and multi-value and single-value string fields.
Example:
select significantTerms(fieldX), score from collectionName where fieldy = 'my-query' limit 10
timediff
The timediff
UDF computes the difference, in milliseconds, between to timestamps in the same row. From the session_rollup
job, the timediff
function computes the difference between the current time and the last activity in a session.
click_pos
The click_pos
UDF computes either a reciprocal rank or a raw click position (using a 0-based index) of a document in a page of results. This UDF is used to compute the mean reciprocal rank (MRR) for experiments. For example, given the following list of documents and a doc ID, the click_pos
UDF will return 2:
docs: a,b,c,d
doc ID: c
concat_text
The concat_text
UDF combines multivalued text fields coming from Solr into a field with a single value delimited by spaces. This UDF is useful when a field returned from Solr uses the _txt
suffix, which indicates a multivalued text field.