Legacy Product

Fusion 5.10

    Troubleshoot When Installing Fusion 4.x

    This topic explains how to troubleshoot difficulties that occur when installing or upgrading Fusion.

    Fusion run script failures

    Common problems that cause Fusion run scripts to fail:

    • Wrong Java version.

    • Spaces in Windows install path.

    • Users have insufficient privileges for the installation directory.

    • Java bin directory not in the PATH environment variable.

    • Some Fusion services may already be running, or registered as running.

    • Roaming IP address; try uncommenting this line in $FUSION_HOME/conf/fusion.cors (fusion.properties in Fusion 4.x):

      default.address = 127.0.0.1
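
    Several of the checks in the list above can be scripted on Linux, Mac, or other Unix systems. The following is a minimal sketch, not part of Fusion: the function name preflight_check is hypothetical, and it covers only the install path, directory permissions, and PATH items.

```shell
# Hypothetical helper, not shipped with Fusion: reports common causes of
# run-script failure for the install directory given as the first argument.
preflight_check() {
  local dir="$1" problems=0

  # Spaces in the install path are a listed cause of script failures.
  case "$dir" in
    *" "*)
      echo "WARN: install path contains spaces: $dir"
      problems=$((problems + 1))
      ;;
  esac

  # The account running Fusion must be able to write to the install directory.
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "WARN: directory missing or not writable: $dir"
    problems=$((problems + 1))
  fi

  # The Java bin directory must be on PATH.
  if ! command -v java >/dev/null 2>&1; then
    echo "WARN: java not found on PATH"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ] && echo "OK: no problems found"
  return "$problems"
}
```

    Run it as the account that starts Fusion, for example preflight_check "$FUSION_HOME". The return status is the number of problems found.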

    Check the Java version

    Fusion runs on JDK 1.8. See System Requirements.

    Fusion scripts use the environment variable JAVA_HOME. To check its setting, log in to the account used to run Fusion and confirm that the variable points to the correct JDK. On Linux, Mac, or other Unix systems, use the following command:

    echo $JAVA_HOME

    On Windows, the command is:

    echo %JAVA_HOME%

    Fusion scripts execute both the java and javac commands. To check the Java version invoked by these commands, run the following commands from a shell or terminal window:

    java -version
    javac -version
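
    The version check can also be automated. The sketch below assumes that a JDK 1.8 banner contains the text version "1.8., which is how both Oracle and OpenJDK builds report it; the function name is_java8 is hypothetical.

```shell
# Hypothetical helper: succeeds if a `java -version` banner line reports
# a 1.8 release, and fails for any other version string.
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# `java -version` writes its banner to stderr, so redirect it to stdout:
#   is_java8 "$(java -version 2>&1 | head -n 1)" && echo "Java 1.8 found"
# To test the JDK that Fusion scripts use, call "$JAVA_HOME/bin/java" instead.
```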

    Clear browser cache

    If a previous version of Fusion was accessed in the browser with the same URL as that of the newly installed version of Fusion, then there may be old pages and/or cookies in the browser cache. A hard page refresh will clear old pages from the browser cache. If clearing the page cache does not solve this problem, clear session cookies as well.

    Stop/Clean up/Start

    If the command $FUSION_HOME/bin/fusion start completes without reporting an error, but the Fusion UI displays a message that it cannot find Collections or Datasources, the Fusion services may be unable to communicate with one another through ZooKeeper. This can happen with developer deployments running on a laptop if the network connection changes or is interrupted, especially when using the embedded ZooKeeper instance that is bundled with Fusion.

    In this situation, stop Fusion, inspect the system processes, and, if necessary, manually terminate running processes and clean up .pid files to bring the system back to a clean state. Then start Fusion again.

    Although the Fusion run script bin/fusion provides a restart option, the restart option assumes a correctly functioning system and cannot always recover from system failure.

    To stop Fusion:

    Run the script $FUSION_HOME/bin/fusion with the argument stop:

    $ cd $FUSION_HOME
    $ ./bin/fusion stop
    Successfully stopped ui (process ID 41524)
    Successfully stopped connectors (process ID 41328)
    Successfully stopped api (process ID 41159)
    Successfully stopped solr (process ID 41153)
    Successfully stopped zookeeper (process ID 41151)

    After stopping Fusion, make sure that no Fusion services are running. When the Fusion scripts start a Fusion service, they record the process ID in a .pid file in the directory $FUSION_HOME/var. A Fusion instance that is up and running has the following set of .pid files:

    >  find $FUSION_HOME/var -name "*.pid" -print
    
    fusion/var/api/api.pid
    fusion/var/connectors/connectors.pid
    fusion/var/solr/solr.pid
    fusion/var/spark-master/spark-master.pid
    fusion/var/spark-worker/spark-worker.pid
    fusion/var/ui/ui.pid
    fusion/var/zookeeper/zookeeper.pid

    The above output shows the set of .pid files created by a single Fusion instance running with embedded ZooKeeper and Solr.

    If no Fusion services are running, there should be no .pid files. If all services have been stopped but some .pid files remain, delete those files before starting Fusion.
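
    Before deleting anything, it helps to confirm which .pid files are actually stale. The following sketch, assuming a Unix shell, lists only the .pid files whose recorded process is no longer running. The function name list_stale_pid_files is hypothetical and is not part of the Fusion scripts.

```shell
# Hypothetical helper: prints each .pid file under the given directory
# whose recorded process ID does not belong to a running process.
list_stale_pid_files() {
  local var_dir="$1" pid_file pid
  find "$var_dir" -name "*.pid" -print | while IFS= read -r pid_file; do
    pid="$(tr -d '[:space:]' < "$pid_file")"
    case "$pid" in
      # An empty or non-numeric file cannot refer to a live process.
      ''|*[!0-9]*) echo "$pid_file"; continue ;;
    esac
    # kill -0 sends no signal; it only tests whether the process exists.
    # It also fails for processes owned by another user, so run this as
    # the account that runs Fusion.
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "$pid_file"
    fi
  done
}
```

    For example, list_stale_pid_files "$FUSION_HOME/var" prints nothing when every .pid file belongs to a live process. Review the output before removing any of the listed files.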

    Inspect the log files

    If none of the above help, inspect the Fusion log files in the directory $FUSION_HOME/var/log.

    If you experience unexpected termination when running Fusion, first look in the log files for clues.

    One setting you can look into is in $FUSION_HOME/conf/fusion.cors: default.supervision.pollingFailureCountThreshold.

    By default, pollingFailureCountThreshold is set to 1, so the Agent restarts all services the second time it fails to reach a service. Try a modest increase; for example, set pollingFailureCountThreshold to 3.
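
    To see which value is currently in effect before changing it, you can read the property from the configuration file. This sketch assumes the file uses name = value lines with # for comments, as in the default.address example earlier in this topic. The function name get_property is hypothetical.

```shell
# Hypothetical helper: prints the value of the last uncommented
# "name = value" line for the given property, or nothing if it is unset.
#   $1 = property name, $2 = path to the configuration file
get_property() {
  awk -F= -v key="$1" '
    index($0, "=") > 0 {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)            # trim the property name
      if (k == key) {
        v = substr($0, index($0, "=") + 1)      # everything after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", v)          # trim the value
        val = v
      }
    }
    END { if (val != "") print val }
  ' "$2"
}
```

    For example: get_property default.supervision.pollingFailureCountThreshold "$FUSION_HOME/conf/fusion.cors". Empty output means the line is absent or commented out, in which case the default applies.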

    Log file names that start with "oom" indicate out-of-memory problems. You might need to increase the amount of memory allotted to that service. The amount of memory allotted to each kind of Fusion service is controlled by environment variables that are set in the fusion.cors (fusion.properties in Fusion 4.x) file.

    Troubleshoot a Windows Install

    Check common Windows service install script mistakes:

    • Is the account trying to run the install script a power user or administrator of the server?

    • Is the DOMAIN\USERNAME correctly specified? Is the Domain correct?

    • Is Java on the %PATH%? To use a different Java, specify it in bin/windows-service-wrapper.xml.

    • Are there any obvious issues in var/log/windows-service-wrapper.log?

    • Does Fusion start from the normal bin\fusion start?

    • Are there any other errors in the normal logs?

    Increase memory

    If you have not changed any of the default settings, services can run out of memory under heavy load and crash.

    To find out whether this happened, check for files matching the pattern oom_killer-.log* in the log directory of the service that is being restarted, for example $FUSION_HOME/var/log/connectors-classic.
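
    When it is not obvious which service is affected, you can scan the whole log directory at once. The sketch below assumes that each service writes to its own subdirectory of the log directory, as in the connectors-classic example above, and that the files of interest have names starting with oom. The function name services_with_oom_logs is hypothetical.

```shell
# Hypothetical helper: prints the name of each subdirectory of the given
# log directory that contains at least one file whose name starts with "oom".
services_with_oom_logs() {
  local log_dir="$1" f
  find "$log_dir" -type f -name 'oom*' -print |
    while IFS= read -r f; do
      # The parent directory name identifies the service.
      basename "$(dirname "$f")"
    done | sort -u
}
```

    For example, services_with_oom_logs "$FUSION_HOME/var/log" prints one service name per line, or nothing if no such files exist.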

    If this is your issue, the first step is to increase the memory of the affected component by modifying conf/fusion.cors (fusion.properties in Fusion 4.x). Go to the jvmOptions for the service in question and change the value of the -Xmx flag. By default you will see something like:

    connectors-classic.jvmOptions = -Xmx1g -Xss256k -Dcom.lucidworks.connectors.pipelines.embedded=false

    -Xmx1g caps the service's JVM heap at 1 gigabyte, so the service fails if it needs more memory than that. 1g can be very low for some workloads. Increase the limit; for example, set the flag to -Xmx4g to allow an additional 3 gigabytes.
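
    To preview the change before editing the file by hand, you can rewrite a jvmOptions line with sed. This is a minimal sketch: the function name set_xmx is hypothetical, it reads lines from standard input, and it writes the result to standard output without modifying any file.

```shell
# Hypothetical helper: replaces the first -Xmx value on each input line
# with the size given as the first argument (for example 4g).
set_xmx() {
  sed "s/-Xmx[0-9][0-9]*[kKmMgG]*/-Xmx$1/"
}
```

    For example, piping the default line shown above through set_xmx 4g prints connectors-classic.jvmOptions = -Xmx4g -Xss256k -Dcom.lucidworks.connectors.pipelines.embedded=false. Other flags, such as -Xss256k, are left unchanged.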