Legacy Product

Fusion 5.10

    Spark Operations

    These topics provide how-tos for Spark operations:

    Node Selectors

    You can control which nodes Spark executors are scheduled on using a Spark configuration property for a job:

    spark.kubernetes.node.selector.<LABEL>=<LABEL_VALUE>

    Use the label key assigned to the node as LABEL and that label's value as LABEL_VALUE. For example, if a node is labeled with fusion_node_type=spark_only, schedule Spark executor pods to run on that node using:

    spark.kubernetes.node.selector.fusion_node_type=spark_only

    Spark version 2.4.x does not support tolerations for Spark pods. As a result, Spark pods can’t be scheduled on any nodes with taints.
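
The node selector setup above can be sketched as a small shell helper. This is only an illustration: the function name node_selector_conf is hypothetical, <node-name> is a placeholder, and the kubectl commands appear as comments because they need access to your own cluster.

```shell
#!/bin/sh
# Hypothetical helper: build the Spark node-selector property
# from a label key ($1) and a label value ($2).
node_selector_conf() {
  printf 'spark.kubernetes.node.selector.%s=%s\n' "$1" "$2"
}

# Label a node first (placeholder node name; run against your own cluster):
#   kubectl label nodes <node-name> fusion_node_type=spark_only
# Check which nodes carry the label:
#   kubectl get nodes -l fusion_node_type=spark_only

# Print the property to add to the Spark job configuration.
node_selector_conf fusion_node_type spark_only
```

The printed line is the same property shown in the example above, so the helper is only useful when the label key or value varies between jobs.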

    Cluster mode

    Fusion 5 ships with Spark, which runs in "cluster mode" on top of Kubernetes. In cluster mode, each Spark driver runs in a separate pod, so resources can be managed per job. Each executor also runs in its own pod.

    Spark config defaults

    The table below shows the default configurations for Spark. These settings are configured in the job-launcher config map, accessible using kubectl get configmaps <release-name>-job-launcher. Some of these settings are also configurable via Helm.
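
As a minimal sketch of looking up that config map, the snippet below composes its name from a Helm release name. The function name job_launcher_configmap and the release name my-release are hypothetical, and the kubectl call is left as a comment because it needs a live cluster (add -n with your namespace if Fusion is not installed in the current one).

```shell
#!/bin/sh
# Hypothetical helper: compose the job-launcher config map name
# from a Helm release name ($1).
job_launcher_configmap() {
  printf '%s-job-launcher\n' "$1"
}

# "my-release" is an assumed release name used only for illustration.
cm_name=$(job_launcher_configmap my-release)
echo "$cm_name"

# Against a live cluster, print the full contents as YAML:
#   kubectl get configmaps "$cm_name" -o yaml
```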

    Spark Resource Configurations

    Spark Configuration                       Default value   Helm Variable
    spark.driver.memory                       3g              -
    spark.executor.instances                  2               executorInstances
    spark.executor.memory                     3g              -
    spark.executor.cores                      6               -
    spark.kubernetes.executor.request.cores   3               -
    spark.sql.caseSensitive                   true            -
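
To get a rough sense of what these resource defaults add up to for a single job, the arithmetic can be sketched as follows. The variable names are illustrative, the values are copied from the defaults above, and the totals deliberately leave out any per-pod memory overhead that Kubernetes adds on top of the JVM heap.

```shell
#!/bin/sh
# Default values taken from the resource configuration table.
driver_memory_gb=3          # spark.driver.memory=3g
executor_instances=2        # spark.executor.instances
executor_memory_gb=3        # spark.executor.memory=3g
executor_cores=6            # spark.executor.cores (task slots per executor)
executor_request_cores=3    # spark.kubernetes.executor.request.cores

# JVM heap across the driver and all executors (overhead not included).
total_heap_gb=$(( driver_memory_gb + executor_instances * executor_memory_gb ))
# CPU requested from Kubernetes by the executor pods.
total_request_cores=$(( executor_instances * executor_request_cores ))
# Concurrent task slots across all executors.
total_task_slots=$(( executor_instances * executor_cores ))

echo "heap: ${total_heap_gb}g, executor CPU request: ${total_request_cores}, task slots: ${total_task_slots}"
```

With the defaults, the executors run up to 12 concurrent tasks while requesting only 6 CPU cores from Kubernetes, because the pod CPU request is set separately from the number of task slots.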

    Spark Kubernetes Configurations

    Spark Configuration                                       Default value                                     Helm Variable
    spark.kubernetes.container.image.pullPolicy               Always                                            image.imagePullPolicy
    spark.kubernetes.container.image.pullSecrets              -                                                 image.imagePullSecrets
    spark.kubernetes.authenticate.driver.serviceAccountName   <name>-job-launcher-spark                         -
    spark.kubernetes.driver.container.image                   fusion-dev-docker.ci-artifactory.lucidworks.com   image.repository
    spark.kubernetes.executor.container.image                 fusion-dev-docker.ci-artifactory.lucidworks.com   image.repository