Legacy Product

Fusion 5.4

Fusion Connectors Concepts

Connectors are the built-in mechanism for pulling your data into Fusion. Lucidworks provides a wide variety of connectors, each specialized for a particular data type.

When you add a datasource to a collection, you specify the connector to use for ingesting data.

See the complete list of connectors, with links to configuration reference information for each one.

Connector architecture

Connector plugins can be hosted within Fusion or can run remotely. The messages exchanged between Fusion and a connector are identical in both cases, so Fusion sees hosted and remote connectors as the same kind of connector. This means you can implement a plugin locally, connect it to a remote Fusion instance for initial testing, and when done, upload the same artifact into Fusion so that Fusion can host it for you.

The connector architecture is designed to be scalable. Depending on the connector, jobs can be scaled by adding new instances of the connector. For connectors that support it, the fetching process is also distributed, so that many instances can contribute to the same job.
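As a conceptual illustration of distributed fetching, the following minimal sketch shows several connector instances draining one shared work queue so that each item in a job is fetched exactly once. This is not Fusion's implementation: the function `run_job`, the instance names, and the in-memory queue are all assumptions made for the example.

```python
import queue
import threading


def run_job(items, instance_count):
    """Fetch every item once, sharing the work across connector instances.

    Hypothetical sketch: a real connector would fetch from a data source
    and coordinate through Fusion rather than an in-process queue.
    """
    pending = queue.Queue()
    for item in items:
        pending.put(item)

    fetched = {}  # item -> name of the instance that fetched it
    lock = threading.Lock()

    def instance(name):
        while True:
            try:
                item = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this job
            with lock:
                fetched[item] = name

    workers = [
        threading.Thread(target=instance, args=(f"connector-{i}",))
        for i in range(instance_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return fetched


result = run_job([f"doc-{n}" for n in range(20)], instance_count=3)
print(len(result))  # prints 20: every item was fetched exactly once
```

Adding capacity in this model means raising `instance_count`. The job definition itself does not change, which is the property the scaling description above relies on.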

Hosted connectors

In the hosted case, connectors are cluster aware. This means that when a new instance of Fusion starts up, the connectors on other Fusion nodes become aware of the new connectors, and vice versa. This makes scaling the crawling process very natural and simple.
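The mutual awareness described above can be pictured with a small membership registry. This is a hypothetical sketch only: `ClusterRegistry`, `join`, and `peers_of` are names invented for the illustration and do not correspond to any Fusion API.

```python
class ClusterRegistry:
    """Toy membership registry: each instance tracks the peers it knows about."""

    def __init__(self):
        self._known = {}  # instance name -> set of peer names it is aware of

    def join(self, name):
        # Existing instances learn about the newcomer, and the newcomer
        # learns about every instance that was already present.
        peers = set(self._known)
        for peer in peers:
            self._known[peer].add(name)
        self._known[name] = peers

    def peers_of(self, name):
        return sorted(self._known[name])


registry = ClusterRegistry()
registry.join("node-a")
registry.join("node-b")
print(registry.peers_of("node-a"))  # ['node-b']
print(registry.peers_of("node-b"))  # ['node-a']
```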

Remote connectors

SDK connectors can be hosted within Fusion Server or can run remotely. In the remote case, connectors become clients of Fusion. These clients run a very lightweight process and communicate with Fusion using a very efficient messaging format. This option makes it possible to put the connector wherever the data lives, which may be desirable for performance reasons or for security and access reasons.

V1 and V2 platform versions

Initially, Fusion offered classic connectors, also known as V1 connectors. V1 connectors were developed with a general-purpose crawler framework called Anda, created by Lucidworks. Anda helps simplify and streamline crawler development, reducing the task of developing a new crawler to providing access to your data.

Fusion 4.1.0 and later supports V2 connectors, which utilize a Java SDK framework. The V2 platform version is included by default for all connectors it is available for. As of Fusion 5.6.1, V2 connectors using a gRPC backend are supported on-prem.

In Fusion 5.2.0 and later, V1 connectors are included in the Fusion image. Fusion locates compatible V1 connectors locally for installation at any time through the UI (under Datasources) or via the Connector Plugins Repository API.

In addition to the features and benefits provided by V1 connectors, V2 connectors offer:

  • Security access control lists (ACLs) that are separate from content

  • SSL/TLS security support

  • Improved scalability, depending on the connector

    • Jobs can be scaled by simply adding instances of the connector

    • The fetching process supports distributed fetching, allowing many instances to contribute to the same job

  • Connectors can be hosted within Fusion, or can run remotely

    • Hosted connectors are cluster-aware, allowing connectors on separate nodes to become aware of new connectors

    • Remote connectors become clients of Fusion, run a lightweight process, and communicate with Fusion using an efficient messaging format

    • Remote connectors can be located wherever the data is located, which might be required for performance or security and access

  • gRPC, Google’s fast and efficient RPC framework, is used as the underlying client/server technology

    • Increased flexibility in the way services and their methods are defined

    • HTTP/2 based transport

    • Efficient serialization format for data handling (protocol buffers)

    • Bidirectional, multiplexed streaming

Connector logs

You can find connector logs in https://FUSION_HOST:FUSION_PORT/var/log/connectors.

SDK connectors support Diagnostic Mode, which enables Fusion to print more detailed information to the logs about each request, including the ID of every document inserted, updated, or deleted in the oplog. More information on Diagnostic Mode can be found in the Configuration section of the connectors which offer it: