Collections
Your data is organized into collections. When you create an app, Fusion automatically creates a collection with the same name. You can create additional collections in any app.
A primary collection contains the data that your users will search. Every primary collection is associated with a set of auxiliary collections that contain related data, such as signals, aggregations, and more.
Under the hood, a Fusion collection is a distributed index in Solr, defined by a named configuration stored in ZooKeeper, with these properties:
- Number of shards: Documents are distributed across this number of partitions.
- Document routing strategy: How documents are assigned to shards.
- Replication factor: How many copies of each document exist in the collection.
- Replica placement strategy: Where to place replicas in the cluster.
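Several of these properties correspond to parameters of the `CREATE` action in Solr's Collections API. The following is a minimal sketch that only builds such a request URL and does not send it. The base URL, the collection name `products`, and the helper name `build_solr_create_url` are illustrative assumptions; Fusion normally creates the underlying Solr collection for you, and replica placement is not covered here.

```python
from urllib.parse import urlencode


def build_solr_create_url(base_url, name, num_shards=1, replication_factor=1,
                          router="compositeId", config_name=None):
    """Build a Solr Collections API CREATE URL. The request is not sent."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # number of shards
        "replicationFactor": replication_factor,  # copies of each document
        "router.name": router,                    # document routing strategy
    }
    if config_name is not None:
        # Named configuration stored in ZooKeeper.
        params["collection.configName"] = config_name
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical values: a local Solr and a collection called "products".
print(build_solr_create_url("http://localhost:8983/solr", "products",
                            num_shards=2, replication_factor=3))
```

With two shards and a replication factor of three, the cluster would hold six replicas in total for this collection.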
If your data is already stored in a Solr instance or cluster, you can manage this collection in Fusion by creating a Fusion collection that imports the existing Solr collection. See Installation with an existing Solr instance or cluster.
Collection names are case-insensitive, but Fusion preserves case when displaying collection names.
Auxiliary Collections
Every primary collection is associated with a set of auxiliary collections that contain related data, such as signals, aggregations, and more.
Some auxiliary collections are created for every primary collection. Others are created only for the app’s default collection, one per app.
Auxiliary collections are described below:
| A search query logs and signals collection. | 1 per collection |
| A collection for aggregated signals. | 1 per collection |
Do not create primary collections with names that end in the suffixes above; these are reserved for Fusion auxiliary collections, which are created and managed by Fusion directly.
Fusion maintains a set of Solr collections that store Fusion’s own log files and other internal information. These are called System Collections, described below.
Do not create primary collections named "logs" or beginning with "system_". These names are reserved for Fusion system collections.
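The naming rules in this section can be checked on the client side before a collection is created. The sketch below is one possible pre-check, not Fusion's own validation. The function name and the `reserved_suffixes` parameter are hypothetical, and the auxiliary-collection suffixes must be supplied by the caller because they are not listed here.

```python
def check_collection_name(name, existing_names, reserved_suffixes):
    """Return a list of problems with a proposed primary collection name."""
    problems = []
    lowered = name.lower()
    # Collection names are case-insensitive, so compare in lowercase.
    if lowered in {n.lower() for n in existing_names}:
        problems.append("name already in use (names are case-insensitive)")
    # "logs" and the "system_" prefix are reserved for system collections.
    if lowered == "logs" or lowered.startswith("system_"):
        problems.append("name is reserved for Fusion system collections")
    # Suffixes reserved for auxiliary collections (caller-supplied).
    if any(lowered.endswith(s.lower()) for s in reserved_suffixes):
        problems.append("suffix is reserved for Fusion auxiliary collections")
    return problems


# "_example_aux" is a placeholder, not a real Fusion suffix.
suffixes = ["_example_aux"]
print(check_collection_name("Products", ["products"], suffixes))
print(check_collection_name("system_custom", [], suffixes))
print(check_collection_name("catalog", ["products"], suffixes))
```

An empty list means that none of the rules above were violated.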
Fusion uses ZooKeeper to register information about all collections, and the Fusion components and services related to a collection. The Fusion components associated with a collection include:
- Datasources
- Pipelines
- Profiles
- Signals and aggregations
- Analytics dashboards
System Collections
Fusion automatically creates some collections that are used for internal purposes and shared across all apps:
- system_autocomplete stores the content that the Fusion UI displays when you use the search bar.
- system_blobs stores blobs in Solr. This is used to store model files for the NLP components and other binary files used by Fusion components.
- system_history keeps a record of configuration changes, start and stop times for services and experiments, and more.
- system_jobs_history keeps a record of Fusion jobs, including start/stop times and status.
- system_logs stores parsed Java logs from the REST API, the connectors-classic component, and other parts of Fusion, such as proxy, connectors-rpc, and appkit app insights. It also includes HTTP logs and optional GC logs (off by default in Fusion 4.1). Prior to Fusion 4.1, Java logs were stored in the logs collection and HTTP requests were stored in the audit_logs collection.
- system_messages is used by Fusion’s messaging services.
- system_metrics stores metrics about Fusion hosts and services, when enabled. See System Metrics. The data is polled at regular intervals according to the internal configuration variable com.lucidworks.apollo.metrics.poll.seconds. This collection does not appear until after the first set of metrics is collected.
Collection Configuration Properties
Collections have three properties that you can configure only when you are creating a collection using the Collections API.
Property | Description | Default behavior |
---|---|---|
signals* | The | When you create a collection in the Fusion UI, |
searchLogs | The | When you create a collection in the Fusion UI, this property defaults to true. |
*Signals are events with timestamps that can be used to improve search results. For more information about signals in Fusion, see Signals in the Fusion AI documentation.
**In schemaless mode, if a document contains a field not currently in the Solr schema, Solr processes the field value to determine what the field type should be defined as, and then adds a new field to the schema with the field name and field type. This behavior can be convenient during preliminary application development, but it is rarely appropriate in a production environment.
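To make the schemaless behavior concrete, here is a toy sketch of guessing a field type from the first value seen and adding the field to a schema. It is not Solr's actual implementation: the function names, the type labels (`boolean`, `long`, `double`, `date`, `string`), and the order of the checks are all illustrative assumptions.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a type label from one value (labels are illustrative only)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    text = str(value).strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    # Try numeric parsers first, then an ISO-8601 timestamp.
    parsers = (
        ("long", int),
        ("double", float),
        ("date", lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))),
    )
    for label, parse in parsers:
        try:
            parse(text)
            return label
        except ValueError:
            continue
    return "string"


def add_unknown_fields(schema, doc):
    """Add any field missing from the schema, typed by its first value."""
    for field, value in doc.items():
        if field not in schema:
            schema[field] = guess_field_type(value)
    return schema


schema = {"id": "string"}
add_unknown_fields(schema, {"id": "1", "price": "10"})
add_unknown_fields(schema, {"id": "2", "price": "10.5"})
print(schema)
```

In this sketch the first document fixes `price` as `long`, and the later value `10.5` does not change it. A guess locked in by whichever document happens to arrive first is one plausible reason such behavior suits early development better than production.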
Using profiles to associate collections with pipelines
Index pipelines and query pipelines are not connected to a specific collection by default. Index profiles and query profiles are configurations that create consistent endpoints for indexing and querying, each with a specific pipeline and collection.
- Index Profiles work with index pipelines for getting content into the system.
- Query Profiles work with query pipelines for user queries.
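Conceptually, a profile is a named binding of one pipeline to one collection. The sketch below models that idea with a small registry; it is a conceptual illustration, not Fusion's API, and every class, method, and identifier in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A stable endpoint name bound to one pipeline and one collection."""
    profile_id: str
    pipeline: str
    collection: str


class ProfileRegistry:
    def __init__(self):
        self._profiles = {}

    def register(self, profile):
        # Re-registering the same id repoints the endpoint.
        self._profiles[profile.profile_id] = profile

    def resolve(self, profile_id):
        """Return the (pipeline, collection) pair behind an endpoint."""
        profile = self._profiles[profile_id]
        return profile.pipeline, profile.collection


registry = ProfileRegistry()
registry.register(Profile("site-search", "search-v1", "products"))
print(registry.resolve("site-search"))

# Swap the pipeline; callers keep using the same profile id.
registry.register(Profile("site-search", "search-v2", "products"))
print(registry.resolve("site-search"))
```

The point of the indirection is that client code refers only to the profile id, so the pipeline behind it can change without touching the clients.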
Field Editor UI
The Fusion UI includes a space under Collections to edit Fields. Descriptions for these fields can be found in the Field Type Definitions section of the Solr Reference Guide associated with your Fusion release.
Field options displayed in the UI include:
- Dynamic checkbox (cannot change via UI)
- Field Name (cannot change via UI)
- Field Type (a preset value is shown that can be changed using edit mode)
- Checkboxes for Indexed, Stored, Multivalued, Required
- Text field to enter a Default Value
- Copy Fields uses the plus sign to add rows (static can copy to raw_content or text; dynamic can copy to any raw_content/text or any other dynamic field)
- Advanced toggles checkboxes for Doc Values, Omit Norms, Omit Positions, Omit term freq and positions, Term Vectors, Term Positions, Term Offsets
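The options listed above correspond to field properties in Solr's Schema API. As a rough sketch, the helper below builds the JSON body for an `add-field` command and, optionally, an `add-copy-field` command. The helper name and the example field `brand` are assumptions, and the sketch only builds the body; it says nothing about how the Fusion UI applies changes internally.

```python
import json

# Advanced per-field properties accepted by Solr's Schema API.
ADVANCED_KEYS = {
    "docValues", "omitNorms", "omitPositions", "omitTermFreqAndPositions",
    "termVectors", "termPositions", "termOffsets",
}


def build_schema_commands(name, field_type, indexed=True, stored=True,
                          multi_valued=False, required=False, default=None,
                          copy_to=(), **advanced):
    """Build a Schema API request body for one field (nothing is sent)."""
    unknown = set(advanced) - ADVANCED_KEYS
    if unknown:
        raise ValueError(f"unsupported field properties: {sorted(unknown)}")
    field = {
        "name": name,
        "type": field_type,
        "indexed": indexed,
        "stored": stored,
        "multiValued": multi_valued,
        "required": required,
    }
    if default is not None:
        field["default"] = default
    field.update(advanced)
    commands = {"add-field": field}
    if copy_to:
        # Copy the field's content into one or more destination fields.
        commands["add-copy-field"] = {"source": name, "dest": list(copy_to)}
    return commands


# Hypothetical field "brand", copied to "text", with doc values enabled.
body = build_schema_commands("brand", "string", copy_to=["text"],
                             docValues=True)
print(json.dumps(body, indent=2))
```

In plain Solr, a body like this would be sent as a POST to the collection's `/schema` endpoint.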