Legacy Product

Fusion 5.4

Search Cluster API

API Objective: Connect Fusion to other ZooKeeper-managed Solr clusters.

The Search Cluster API lets you connect Fusion to existing Solr instances in a ZooKeeper-managed cluster.

Cluster operations are supported only when connecting through ZooKeeper.

Once the Solr cluster is registered with Fusion, requests can be proxied through Fusion to it. These include search requests as well as indexing requests, such as content crawled with a connector.

Once the search cluster has been configured, you can create Fusion collections that refer to Solr collections already defined in that cluster.
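As a minimal sketch of that last step, the helper below prints a request body that ties a Fusion collection to a Solr collection in a registered search cluster. The field names searchClusterId and solrParams.name are taken from the collection object shown in the commitWithin examples on this page; the helper name collection_payload and the IDs products, mySolrCluster, and products_solr are hypothetical, and whether these three fields are sufficient for your Fusion version is an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON body for the Fusion Collections API that
# points a Fusion collection at a Solr collection in a registered search cluster.
# Arguments: <fusion-collection-id> <search-cluster-id> <solr-collection-name>
# Values are inserted verbatim, so they must not contain quotes or backslashes.
collection_payload() {
  printf '{"id":"%s","searchClusterId":"%s","solrParams":{"name":"%s"}}\n' "$1" "$2" "$3"
}

payload=$(collection_payload products mySolrCluster products_solr)
echo "$payload"

# To send it (endpoint as used by the collection examples on this page):
# curl -H 'Content-type: application/json' -X POST \
#   'http://localhost:8765/api/v1/collections' -d "$payload"
```

The curl call is left commented out so the snippet can be run without a Fusion instance.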

Background

Solr offers three ways to control when new documents become visible in search:

  • You can commit manually

  • You can rely on Solr’s autoCommit setting

  • You can specify commitWithin when adding documents

Fusion uses commitWithin to avoid relying on specific Solr-side configuration. Fusion controls commitWithin on a per-collection basis, so you can have multiple collections with different commit frequencies (for example, product documents can be committed more often than signals).
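For comparison, the three Solr approaches listed above map roughly onto the following update-handler URLs when talking to Solr directly. commit=true and commitWithin are standard Solr update parameters; the helper name solr_update_url, the base URL, and the collection name are placeholders chosen for this sketch. autoCommit has no request parameter because it is configured in solrconfig.xml.

```shell
#!/bin/sh
# Hypothetical helper: print the Solr update URL for a given commit strategy.
# Arguments: <solr-base-url> <collection> <manual|auto|within> [milliseconds]
solr_update_url() {
  base=$1
  collection=$2
  mode=$3
  ms=${4:-10000}   # default mirrors the 10000 ms global value shown on this page
  case $mode in
    manual) echo "$base/$collection/update?commit=true" ;;        # explicit commit
    auto)   echo "$base/$collection/update" ;;                    # rely on autoCommit
    within) echo "$base/$collection/update?commitWithin=$ms" ;;   # commit within N ms
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

solr_update_url http://localhost:8983/solr test manual
solr_update_url http://localhost:8983/solr test auto
solr_update_url http://localhost:8983/solr test within 20000
```

When documents are added through Fusion, you do not build these URLs yourself; Fusion applies the collection's commitWithin value for you.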

Global setting for commitWithin:

curl http://localhost:8765/api/v1/configurations/com.lucidworks.apollo.solr.commitWithin
10000

com.lucidworks.apollo.solr.commitWithin is a global configuration property that defines the default commitWithin, in milliseconds, for all documents added through Fusion. Every time you create a new collection in Fusion, its per-collection commitWithin is initialized to the global default.

Per-collection setting: You can either specify this property when creating the collection or update it later with a PUT request.

# create collection without specifying commitWithin
sh> curl -H 'Content-type: application/json' -X POST 'http://localhost:8765/api/v1/collections' -d '{"id" : "test"}'
{
  "id" : "test",
  ...
  "commitWithin" : 10000,
  ...
}

# create collection and specify a non-default value
sh> curl -H 'Content-type: application/json' -X POST 'http://localhost:8765/api/v1/collections' -d '{"id" : "test2", "commitWithin": 20000}'
{
  "id" : "test2",
  ...
  "commitWithin" : 20000,
  ...
}

# update commitWithin at runtime
sh> curl -H 'Content-type: application/json' -X PUT 'http://localhost:8765/api/v1/collections/test' -d '
{
  "id" : "test",
  "createdAt" : "2015-01-07T17:44:47.396Z",
  "searchClusterId" : "default",
  "commitWithin" : 20000,
  "solrParams" : {
    "name" : "test",
    "numShards" : 1,
    "replicationFactor" : 1
  },
  "type" : "DATA",
  "metadata" : { }
}'
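The PUT example above sends the complete collection definition, which suggests a read-modify-write flow: fetch the current definition, change commitWithin, and send the result back. The sketch below covers only the "modify" step. The helper name set_commit_within is hypothetical, the sample definition is abbreviated from the example above, and the sed pattern assumes commitWithin appears as a plain number, as it does in the responses on this page.

```shell
#!/bin/sh
# Hypothetical helper: read a collection definition on stdin and print it with
# commitWithin replaced by the given number of milliseconds.
set_commit_within() {
  sed "s/\"commitWithin\" *: *[0-9][0-9]*/\"commitWithin\" : $1/"
}

# Abbreviated sample; in practice this would come from a GET on the collection
# (assumed here to be available at the same URL as the PUT).
current='{
  "id" : "test",
  "searchClusterId" : "default",
  "commitWithin" : 10000
}'

updated=$(printf '%s\n' "$current" | set_commit_within 20000)
echo "$updated"

# To apply it:
# curl -H 'Content-type: application/json' -X PUT \
#   'http://localhost:8765/api/v1/collections/test' -d "$updated"
```

Editing JSON with sed is fragile; a JSON-aware tool is a safer choice for anything beyond a quick manual change.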

Search Cluster Definition Properties

Property Description

id
Required

The ID of the search cluster. This is only required when creating a new cluster definition with a POST request.

connectString
Required

The string to use to connect to the existing Solr cluster or standalone instance.

If the existing Solr is running in SolrCloud mode, use the connect string for the ZooKeeper ensemble.

If the existing Solr is running as a standalone instance, use the full URL for the Solr instance.

cloud
Required

Specifies whether the "cluster" being defined is a SolrCloud cluster (true) or a standalone Solr instance (false).

bufferFlushInterval
Optional

Defines how often to flush the buffer when writing to this cluster. If not defined, the system will default to 1000 milliseconds.

bufferSize
Optional

Defines the size of the buffer. If not defined, the system will default to 100 items in the buffer.

concurrency
Optional

Defines the maximum number of concurrent (parallel) requests to Solr servers when the Solr Indexer stage of a Fusion index pipeline has the bufferDocsForSolr property set to true.

zkClientTimeout
Optional

The maximum amount of time to wait when communicating with the ZooKeeper ensemble for a SolrCloud instance.

zkConnectTimeout
Optional

The maximum amount of time to wait when trying to connect to the ZooKeeper ensemble for a SolrCloud instance.
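To show how the required and optional properties fit together, the sketch below assembles a search cluster definition from the three required properties and adds bufferFlushInterval and bufferSize only when values are supplied. The helper name search_cluster_payload is hypothetical, and treating the two buffer properties as JSON numbers is an assumption based on their described defaults.

```shell
#!/bin/sh
# Hypothetical helper: print a search cluster definition as JSON.
# Arguments: <id> <connectString> <cloud: true|false> [bufferFlushInterval] [bufferSize]
# Values are inserted verbatim, so they must not contain quotes or backslashes.
search_cluster_payload() {
  id=$1
  connect=$2
  cloud=$3
  case $cloud in
    true|false) ;;
    *) echo "cloud must be true or false" >&2; return 1 ;;
  esac
  body=$(printf '"id":"%s","connectString":"%s","cloud":%s' "$id" "$connect" "$cloud")
  # Optional properties are included only when a value is given; otherwise
  # Fusion falls back to its defaults (1000 milliseconds and 100 items).
  if [ -n "${4:-}" ]; then body="$body,\"bufferFlushInterval\":$4"; fi
  if [ -n "${5:-}" ]; then body="$body,\"bufferSize\":$5"; fi
  printf '{%s}\n' "$body"
}

# SolrCloud cluster: connectString is the ZooKeeper connect string.
search_cluster_payload mySolrCluster '10.0.1.6:5001,10.0.1.6:5002,10.0.1.6:5003' true 500 200

# Standalone instance: connectString is the full Solr URL.
search_cluster_payload myOtherSolrCluster 'https://FUSION_HOST:8983/solr' false
```

The output of either call can be passed to curl with -d when creating the cluster definition.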

Examples

Create a new search cluster that is an existing SolrCloud cluster:

REQUEST

curl -u USERNAME:PASSWORD -X POST -H 'Content-type: application/json' -d '{"id":"mySolrCluster", "connectString":"10.0.1.6:5001,10.0.1.6:5002,10.0.1.6:5003", "cloud":true}' https://FUSION_HOST:FUSION_PORT/api/searchCluster

RESPONSE

{
  "id" : "mySolrCluster",
  "connectString" : "10.0.1.6:5001,10.0.1.6:5002,10.0.1.6:5003",
  "cloud" : true
}

Create a 'cluster' that is a standalone Solr instance:

REQUEST

curl -u USERNAME:PASSWORD -X POST -H 'Content-type: application/json' -d '{"id":"myOtherSolrCluster", "connectString":"https://FUSION_HOST:8983/solr", "cloud":false}' https://FUSION_HOST:FUSION_PORT/api/searchCluster

RESPONSE

{
  "id" : "myOtherSolrCluster",
  "connectString" : "https://FUSION_HOST:8983/solr",
  "cloud" : false
}

Show the status of each node of 'mySolrCluster':

REQUEST

curl https://FUSION_HOST:FUSION_PORT/api/searchCluster/mySolrCluster/nodes

RESPONSE

[ {
  "name" : "10.0.1.11:7574_solr",
  "baseUrl" : "http://10.0.1.11:7574/solr",
  "state" : "active"
}, {
  "name" : "10.0.1.8:7574_solr",
  "baseUrl" : "http://10.0.1.8:7574/solr",
  "state" : "active"
} ]
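A quick way to summarize such a response is to count nodes by state. The sketch below works on a copy of the sample response above. The helper name count_state is hypothetical, and the grep pattern depends on the pretty-printed layout shown here (each "state" field on its own line), so treat it as a rough check only.

```shell
#!/bin/sh
# Sample response copied from the example above; in practice this would be the
# output of the /nodes request.
nodes_json='[ {
  "name" : "10.0.1.11:7574_solr",
  "baseUrl" : "http://10.0.1.11:7574/solr",
  "state" : "active"
}, {
  "name" : "10.0.1.8:7574_solr",
  "baseUrl" : "http://10.0.1.8:7574/solr",
  "state" : "active"
} ]'

# Hypothetical helper: count the nodes reported in a given state.
# Arguments: <json-text> <state>
count_state() {
  printf '%s\n' "$1" | grep -c "\"state\" : \"$2\""
}

active=$(count_state "$nodes_json" active)
total=$(printf '%s\n' "$nodes_json" | grep -c '"state" :')
echo "$active of $total nodes active"
```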

Show the system information for one named node:

REQUEST

curl 'http://10.0.1.8:6764/api/searchCluster/mySolrCluster/systemInfo?nodeName=10.0.1.8:7574_solr'

RESPONSE

{
  "10.0.1.8:7574_solr" : {
    "mode" : "solrcloud",
    "lucene" : {
      "solr-spec-version" : "4.8.0",
      "lucene-spec-version" : "4.8.0"
    },
    "jvm" : {
      "version" : "1.8.0_121 25.121-b13",
      "name" : "Oracle Corporation Java HotSpot(TM) 64-Bit Server VM",
      "processors" : 4,
      "memory" : {
        "raw" : {
          "free" : 66736272,
          "total" : 204800000,
          "max" : 204800000,
          "used" : 138063728,
          "used%" : 67.4139296875
        }
      }
    },
    "system" : {
      "name" : "Mac OS X",
      "version" : "10.9.3",
      "arch" : "x86_64",
      "systemLoadAverage" : 2.130859375,
      "committedVirtualMemorySize" : 2963378176,
      "freePhysicalMemorySize" : 9321914368,
      "freeSwapSpaceSize" : 1073741824,
      "processCpuTime" : 313176000000,
      "totalPhysicalMemorySize" : 17179869184,
      "totalSwapSpaceSize" : 1073741824,
      "openFileDescriptorCount" : 208,
      "maxFileDescriptorCount" : 10240,
      "uname" : "Darwin MacMini.local 13.2.0 Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64 x86_64\n",
      "uptime" : "15:48  up 3 days,  7:08, 7 users, load averages: 2.13 2.01 1.91\n"
    }
  }
}
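In the sample response, the JVM memory figures appear to be related by used = total - free, with used% being used as a percentage of total. The short check below reproduces those numbers from the free and total values shown above; that relationship is inferred from this one sample, not from a specification.

```shell
#!/bin/sh
# Values copied from the "raw" memory block in the sample response above.
free=66736272
total=204800000

# used = total - free
used=$((total - free))

# used% = used / total * 100, rounded to two decimal places
used_pct=$(awk -v u="$used" -v t="$total" 'BEGIN { printf "%.2f", u / t * 100 }')

echo "used=$used bytes ($used_pct%)"
```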
