Index Stage SDK
Overview
Lucidworks provides an Index Stage SDK in a public repository on GitHub with all the resources you need to develop custom index stages with Java.
Clone the repository to get started:
git clone https://github.com/lucidworks/index-stage-sdk
See the Gradle quickstart documentation for more information on Java projects.
Concepts
Index stage configuration
The index stage configuration file defines configuration options specific to the index stage instance. The options defined in this configuration file are available to the user in the Fusion UI and the API. The plugin configuration class extends the index stage configuration file and is annotated with @RootSchema.
Adding @Property and type annotations to your stage configuration interface methods defines metadata and type requirements for your plugin configuration fields. This is similar to Fusion’s connector configuration schema.
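The following is a minimal sketch of how annotated configuration methods carry metadata, not the SDK's own code. The two annotations are re-declared locally as simplified stand-ins so the example compiles without the SDK on the classpath. The attribute names (title, description, required), the SplitterConfig interface, and the describe helper are all assumptions made for illustration. A real configuration would use the SDK's annotations and extend its base configuration type.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class ConfigSchemaSketch {
    // Stand-in for the SDK's root annotation (attributes are assumptions).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RootSchema {
        String title();
        String description() default "";
    }

    // Stand-in for the SDK's property annotation (attributes are assumptions).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Property {
        String title();
        boolean required() default false;
    }

    // Hypothetical stage configuration: each method is one configuration field.
    @RootSchema(title = "Field Splitter", description = "Splits a field into values")
    interface SplitterConfig {
        @Property(title = "Source field", required = true)
        String sourceField();

        @Property(title = "Delimiter")
        String delimiter();
    }

    // Reads the property metadata by reflection, keyed by method name,
    // roughly as a UI or API layer could do to render a form.
    static Map<String, String> describe(Class<?> configType) {
        Map<String, String> titles = new TreeMap<>();
        for (Method method : configType.getDeclaredMethods()) {
            Property property = method.getAnnotation(Property.class);
            if (property != null) {
                String suffix = property.required() ? " (required)" : "";
                titles.put(method.getName(), property.title() + suffix);
            }
        }
        return titles;
    }
}
```

Calling ConfigSchemaSketch.describe(ConfigSchemaSketch.SplitterConfig.class) returns one entry per annotated method, which is the kind of metadata a configuration form can be built from.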
APIs
The Index Stage SDK includes several APIs for communication with other Fusion components via the Fusion object. This object is passed to the stage during initialization.
- RestCall: The RestCall API provides access to the Fusion REST API. You can find an example of its usage in the Index Stage SDK repository.
- Blobs: The Blobs API enables interactions with the Blob Store API.
- Documents: The Documents API provides a method for creating new document instances. This is useful for custom stages that output multiple documents from a single input document.
Plugins
A plugin is a .zip file that contains one or more index stage implementations. The file contains .jar files for stage definitions and additional dependencies. It also contains a manifest file that holds the metadata Fusion uses to run the plugin.
Plugins are uploaded to the Blob store:
- Navigate to System > Blobs.
- Click Add.
- Select Index Stage Plugin.
- Click Browse… and select your plugin file.
- Click Upload.
Plugin stage classes must implement the com.lucidworks.indexing.api.IndexStage interface and be annotated with the com.lucidworks.indexing.api.Stage annotation. For additional convenience, a stage implementation can extend the com.lucidworks.indexing.api.IndexStageBase class, which already contains initialization logic and some helpful methods.
Lifecycle
Creation and initialization
Fusion begins by creating an IndexStage instance. After the index stage is created, it is initialized using the init(T config, Fusion fusion) method. This allows for the creation of internal storage instructions and the validation of the configuration.
Initialization occurs immediately after the stage configuration is saved in Fusion. The stage can be maintained and used by Fusion for extensive periods of time, even if no documents are being processed through the stage. This should be considered when making decisions on resource allocation.
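As a rough illustration of the initialization step, the sketch below validates the configuration and builds a small, reusable helper inside init. Fusion and SplitterConfig are local stand-ins, not the SDK types, and SplitterStage, split, and the config factory are hypothetical names. Reporting an invalid configuration by throwing IllegalArgumentException is also an assumption; the SDK may expect a different mechanism.

```java
import java.util.regex.Pattern;

class InitSketch {
    // Stand-in for the SDK's Fusion object; unused in this sketch.
    interface Fusion {}

    // Hypothetical configuration with two options.
    interface SplitterConfig {
        String sourceField();
        String delimiter();
    }

    static final class SplitterStage {
        private String sourceField;
        private Pattern delimiter;

        // Mirrors the init(T config, Fusion fusion) shape described above.
        void init(SplitterConfig config, Fusion fusion) {
            // Validate early so a bad configuration fails at save time,
            // not on the first document.
            if (config.sourceField() == null || config.sourceField().trim().isEmpty()) {
                throw new IllegalArgumentException("sourceField must not be empty");
            }
            if (config.delimiter() == null || config.delimiter().isEmpty()) {
                throw new IllegalArgumentException("delimiter must not be empty");
            }
            this.sourceField = config.sourceField();
            // Compile once; the stage may stay alive for a long time,
            // so keep only cheap, long-lived state here.
            this.delimiter = Pattern.compile(Pattern.quote(config.delimiter()));
        }

        String[] split(String value) {
            return delimiter.split(value);
        }
    }

    // Small factory so the sketch can be exercised without a real config source.
    static SplitterConfig config(String sourceField, String delimiter) {
        return new SplitterConfig() {
            @Override public String sourceField() { return sourceField; }
            @Override public String delimiter() { return delimiter; }
        };
    }
}
```

Because an initialized stage can sit idle for long periods, the sketch holds only a compiled pattern rather than connections, thread pools, or large caches.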
Document processing
Once the initialization process completes, Fusion calls the process method for each document the index pipeline processes.
In most use cases, index stages process a single input document and emit a single output document. For these cases, the process(Document document, Context context) method should be used.
In other cases, index stages process a single input document but emit multiple output documents. For these cases, the process(Document document, Context context, Consumer<Document> output) method should be used. The output documents are sent by calling output.accept(doc).
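The two process variants can be sketched side by side as follows. Document and Context here are simplified local stand-ins, not the SDK interfaces, and NormalizeStage and LineSplitStage are hypothetical stages. The return type of the two-argument method is assumed to be the output document, which the text above does not state.

```java
import java.util.Locale;
import java.util.function.Consumer;

class ProcessSketch {
    // Simplified stand-in for the SDK's Document: an id plus one text field.
    static final class Document {
        final String id;
        final String text;

        Document(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Stand-in for the SDK's Context; unused in this sketch.
    interface Context {}

    // One document in, one document out.
    static final class NormalizeStage {
        Document process(Document document, Context context) {
            String normalized = document.text.trim().toLowerCase(Locale.ROOT);
            return new Document(document.id, normalized);
        }
    }

    // One document in, several documents out: one per line of the input text.
    static final class LineSplitStage {
        void process(Document document, Context context, Consumer<Document> output) {
            String[] lines = document.text.split("\n");
            for (int i = 0; i < lines.length; i++) {
                // Each emitted document gets a derived id such as "d2-0".
                output.accept(new Document(document.id + "-" + i, lines[i]));
            }
        }
    }
}
```

In the multi-output variant the stage never returns a collection; it pushes each result to the consumer as soon as it is ready, so the caller decides what to do with every emitted document.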
A single stage instance can be used to process multiple documents, and the process method can be called from multiple concurrently running threads. Additionally, Fusion can initialize and maintain multiple stage instances with the same configuration in separate indexing service nodes. Therefore, it’s important to ensure the plugin stage implementation is thread-safe and the processing logic is stateless.
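One way to meet the thread-safety requirement is sketched below under simplifying assumptions: the stage works on plain strings instead of SDK documents, and UppercaseStage and runConcurrently are hypothetical names. The stage keeps only immutable configuration plus a concurrency-safe counter, and the per-document logic touches nothing but local variables.

```java
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class ThreadSafetySketch {
    static final class UppercaseStage {
        private final String suffix;                          // immutable after construction
        private final LongAdder processed = new LongAdder();  // safe under concurrent updates

        UppercaseStage(String suffix) {
            this.suffix = suffix;
        }

        // Uses only the argument, locals, and immutable fields,
        // so concurrent calls cannot interfere with each other.
        String process(String text) {
            processed.increment();
            return text.toUpperCase(Locale.ROOT) + suffix;
        }

        long processedCount() {
            return processed.sum();
        }
    }

    // Calls process from several threads on one shared stage instance.
    static long runConcurrently(int threads, int docsPerThread) {
        UppercaseStage stage = new UppercaseStage("!");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    stage.process("doc" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stage.processedCount();
    }
}
```

The counter is local to one stage instance. Since several instances may run on separate nodes, it is only suitable for diagnostics and should not influence how a document is processed.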
If the index stage throws an exception while processing a document, that document will not be processed further. The exception does not prevent other documents from being processed. Check the logs for information regarding the exception.
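The per-document failure behavior described above can be pictured with a small simulation. This is not Fusion's pipeline code: runAll is a hypothetical runner, documents are plain strings, and the error list stands in for the log. It only illustrates that a failing document is dropped while the remaining documents continue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class FailureIsolationSketch {
    // Hypothetical runner: applies the stage to each document independently.
    static List<String> runAll(List<String> docs, UnaryOperator<String> stage, List<String> errorLog) {
        List<String> processed = new ArrayList<>();
        for (String doc : docs) {
            try {
                processed.add(stage.apply(doc));
            } catch (RuntimeException e) {
                // The failing document stops here; the loop moves on to the next one.
                errorLog.add("failed on '" + doc + "': " + e.getMessage());
            }
        }
        return processed;
    }

    // Hypothetical stage logic that rejects empty documents.
    static String requireNonEmpty(String doc) {
        if (doc.isEmpty()) {
            throw new IllegalArgumentException("empty document");
        }
        return doc.toUpperCase(Locale.ROOT);
    }
}
```

Throwing a descriptive exception from the stage makes the resulting log entry easier to trace back to the offending document.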
Logging
The Index Stage SDK uses the SLF4J Reporter logging API.