Import Data with the REST API
- Push documents to Fusion using index profiles
- Send documents to an index pipeline
- Indexing CSV Files
It is often possible to get documents into Fusion Server by configuring a datasource with the appropriate connector.
But if there are obstacles to using connectors, it can be simpler to index documents with a REST API call to an index profile or pipeline.
Index profiles allow you to send documents to a consistent endpoint (the profile alias) and change the backend index pipeline as needed. The profile is also a simple way to use one pipeline for multiple collections without any one collection "owning" the pipeline.
You can send documents directly to an index profile using the Index REST API. The request paths appear in the curl examples below.
Before sending documents, create an index profile in the Fusion UI. Send each request as a POST request; the content-type request header specifies the format of the request body.
To send a streaming list of JSON documents, post the JSON file that holds these objects to the index profile endpoint with application/json as the content type. If your JSON file is a list or array of many items, the endpoint operates in a streaming way and indexes the documents as it reads them.
Accessing an index profile through an app lets a Fusion admin secure and manage all objects on a per-app basis. Security is then determined by whether a user can access an app. This is the recommended way to manage permissions in Fusion.
The syntax for sending documents to an index profile that is part of an app is as follows:
curl -u USERNAME:PASSWORD -X POST -H 'content-type: application/json' https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/index/INDEX_PROFILE --data-binary @my-json-data.json
Note: Spaces in an app name become underscores. Spaces in an index profile name become hyphens.
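The naming rule above can be sketched as a small shell helper. This is only an illustration: fusion_profile_url is a hypothetical function name, not part of Fusion, and it assumes spaces are the only characters that need translating.

```shell
# Hypothetical helper: build the app-scoped index profile URL.
# Arguments: base URL, app name, index profile name.
fusion_profile_url() {
  base=$1
  # Spaces in an app name become underscores.
  app=$(printf '%s' "$2" | tr ' ' '_')
  # Spaces in an index profile name become hyphens.
  profile=$(printf '%s' "$3" | tr ' ' '-')
  printf '%s/api/apps/%s/index/%s\n' "$base" "$app" "$profile"
}

# Usage, with the same placeholders as the curl example above:
# curl -u USERNAME:PASSWORD -X POST -H 'content-type: application/json' \
#   "$(fusion_profile_url https://FUSION_HOST:FUSION_PORT 'My App' 'My Profile')" \
#   --data-binary @my-json-data.json
```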
To prevent the terminal from displaying all the data and metadata it indexes (useful if you are indexing a large file), you can optionally append ?echo=false to the URL.
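A minimal sketch of appending the parameter safely, assuming the URL may or may not already carry a query string. The function name with_echo_off is hypothetical. Quote the resulting URL when passing it to curl, because some shells treat an unquoted ? as a glob character.

```shell
# Hypothetical helper: add echo=false to a URL.
with_echo_off() {
  case $1 in
    # The URL already has a query string, so join with "&".
    *\?*) printf '%s&echo=false\n' "$1" ;;
    # No query string yet, so start one with "?".
    *)    printf '%s?echo=false\n' "$1" ;;
  esac
}

# Usage:
# curl -u USERNAME:PASSWORD -X POST -H 'content-type: application/json' \
#   "$(with_echo_off 'https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/index/INDEX_PROFILE')" \
#   --data-binary @my-json-data.json
```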
Be sure to set the content type header properly for the content being sent. Some frequently used content types are:
More types: http://filext.com/faq/office_mime_types.php
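One way to avoid mismatched headers is to derive the content type from the file extension. The sketch below covers only the types used on this page, plus text/csv, the standard MIME type for CSV. The function name content_type_for and the application/octet-stream fallback are assumptions made for illustration, not Fusion behavior.

```shell
# Hypothetical helper: pick a content type from a file name.
content_type_for() {
  case $1 in
    *.json) echo 'application/json' ;;
    *.xml)  echo 'application/xml' ;;
    *.pdf)  echo 'application/pdf' ;;
    *.csv)  echo 'text/csv' ;;
    # Assumed generic fallback for unrecognized extensions.
    *)      echo 'application/octet-stream' ;;
  esac
}

# Usage:
# curl -u USERNAME:PASSWORD -X POST -H "content-type: $(content_type_for books.json)" \
#   'https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/index/INDEX_PROFILE' \
#   --data-binary @books.json
```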
In $FUSION_HOME/apps/solr-dist/example/exampledocs you can find a few sample documents. This example uses one of these, books.json.
To push JSON data to an index profile under an app:
Create an index profile. In the Fusion UI, click Indexing > Index Profiles and follow the prompts.
From the directory containing books.json, enter the following, substituting your values for username, password, and index profile name:
curl -u USERNAME:PASSWORD -X POST -H 'content-type: application/json' 'https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/index/INDEX_PROFILE?echo=false' --data-binary @books.json
Test that your data has made it into Fusion:
Log into the Fusion UI.
Navigate to the app where you sent your data.
Navigate to the Query Workbench.
Select relevant Display Fields, for example
In most cases it is best to delegate permissions on a per-app basis. But if your use case requires it, you can push data to Fusion without defining an app.
To send JSON data without app security, issue the following curl command:
curl -u USERNAME:PASSWORD -X POST -H 'content-type: application/json' https://FUSION_HOST:FUSION_PORT/api/index/INDEX_PROFILE --data-binary @my-json-data.json
To send XML data to an app, use the following:
curl -u USERNAME:PASSWORD -X POST -H 'content-type: application/xml' https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/index/INDEX_PROFILE --data-binary @my-xml-file.xml
In Fusion 5.3.0+, documents can be created on the fly using the PipelineDocument JSON notation.
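As a rough sketch of what such a document can look like, the helper below prints a one-document JSON array with an id and a list of name/value fields. The exact field layout and the application/vnd.lucidworks-document content type in the usage comment are assumptions here, so check them against the reference documentation for your release. The function name pipeline_doc is hypothetical, and it performs no JSON escaping.

```shell
# Hypothetical helper: print a single PipelineDocument-style JSON array.
# Arguments: document id, field name, field value.
# No JSON escaping is done, so arguments must not contain quotes or backslashes.
pipeline_doc() {
  printf '[{"id": "%s", "fields": [{"name": "%s", "value": "%s"}]}]\n' "$1" "$2" "$3"
}

# Usage (content type assumed; PIPELINE and COLLECTION are placeholders):
# pipeline_doc myDoc1 title 'My first document' |
#   curl -u USERNAME:PASSWORD -X POST \
#     -H 'content-type: application/vnd.lucidworks-document' \
#     'https://FUSION_HOST:FUSION_PORT/api/index-pipelines/PIPELINE/collections/COLLECTION/index' \
#     --data-binary @-
```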
Although sending documents to an index profile is recommended, if your use case requires it, you can send documents directly to an index pipeline.
For the index pipeline REST API reference documentation, select the link for your Fusion release:
When you push data to a pipeline, you can specify the name of the parser by adding a parserId query string parameter to the URL.
If you do not specify a parser, and you are indexing outside of an app (https://FUSION_HOST:FUSION_PORT/api/index-pipelines/…), then the _system parser is used.
If you do not specify a parser, and you are indexing in an app context (https://FUSION_HOST:FUSION_PORT/api/apps/APP_NAME/index-pipelines/…), then the parser with the same name as the app is used.
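The parser selection rules above can be summarized in a short sketch. Both function names are hypothetical. default_parser only restates the documented fallback, and with_parser appends the parserId parameter to a pipeline URL you supply.

```shell
# Hypothetical helper: which parser applies when no parserId is given.
# Argument: the app name, or an empty string when indexing outside an app.
default_parser() {
  if [ -z "$1" ]; then
    # Outside an app, the _system parser is used.
    echo '_system'
  else
    # In an app context, the parser named after the app is used.
    echo "$1"
  fi
}

# Hypothetical helper: add parserId to a pipeline URL.
# Arguments: pipeline URL, parser name.
with_parser() {
  case $1 in
    *\?*) printf '%s&parserId=%s\n' "$1" "$2" ;;
    *)    printf '%s?parserId=%s\n' "$1" "$2" ;;
  esac
}
```

Quote the resulting URL on the curl command line so the shell does not interpret the ? or & characters.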
Note: This section applies to Fusion 4.0 only.
Index a PDF document through the conn_solr index pipeline to a collection named docs. The conn_solr pipeline includes stages to parse documents with Tika, map fields, and index the documents to Solr (in that order).
curl -u USERNAME:PASSWORD -X POST -H "Content-Type: application/pdf" --data-binary @/solr/core/src/test-files/mailing_lists.pdf https://FUSION_HOST:FUSION_PORT/api/index-pipelines/conn_solr/collections/docs/index
In the usual case, to index a CSV or TSV file, the file is split into records, one per row, and each row is indexed as a separate document.
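To make the row-to-document idea concrete, the sketch below splits CSV input into one record per row and pairs each value with its column name. It illustrates the concept only and is not how Fusion parses CSV. It assumes the first row is a header and that no field contains a quoted comma. The function name csv_rows_to_docs is hypothetical.

```shell
# Hypothetical helper: read CSV on stdin, print one record per data row.
csv_rows_to_docs() {
  awk -F',' '
    # Assumption: the first row holds the column names.
    NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; n = NF; next }
    {
      # Each remaining row becomes its own record.
      printf "doc %d:", NR - 1
      for (i = 1; i <= n; i++) printf " %s=%s", header[i], $i
      printf "\n"
    }
  '
}

# Usage:
# csv_rows_to_docs < my-data.csv
```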