Spark Operations

Table of Contents

Spark in Fusion On-Prem
Spark with Fusion AI
Related concepts
Related reference topics
Further Reading

Apache Spark is an open-source cluster-computing framework that serves as a fast, general-purpose execution engine for large-scale data processing. It suits jobs that can be decomposed into stepwise tasks, which Spark distributes across a cluster of networked computers.

Spark improves on previous MapReduce implementations by using resilient distributed datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.
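The RDD idea described above can be illustrated with a small toy model. This is a sketch in plain Python, not Spark's API: the `ToyRDD` class and its `lose_partition` method are hypothetical names invented for illustration. It shows the two properties the paragraph mentions: partitions are computed and kept in memory, and a lost partition can be rebuilt from the recorded chain of transformations (its lineage) rather than from a replica.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    source partitions plus the lineage needed to rebuild results."""

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions   # input data, one list per partition
        self._lineage = tuple(lineage)     # ordered per-element functions
        self._cache = {}                   # partition index -> in-memory result

    def map(self, fn):
        # Transformations are lazy: record the function, compute nothing yet.
        return ToyRDD(self._source, self._lineage + (fn,))

    def compute_partition(self, index):
        # Compute a partition on first use and keep the result in memory.
        if index not in self._cache:
            rows = self._source[index]
            for fn in self._lineage:
                rows = [fn(row) for row in rows]
            self._cache[index] = rows
        return self._cache[index]

    def lose_partition(self, index):
        # Simulate a worker failure that drops one in-memory partition.
        self._cache.pop(index, None)

    def collect(self):
        # Gather every partition's rows, recomputing any that are missing.
        return [
            row
            for index in range(len(self._source))
            for row in self.compute_partition(index)
        ]


rdd = ToyRDD([[1, 2], [3, 4], [5, 6]]).map(lambda x: x * 10).map(lambda x: x + 1)
before_failure = rdd.collect()
rdd.lose_partition(1)            # only partition 1 is recomputed afterwards
after_failure = rdd.collect()
```

In this sketch `before_failure` and `after_failure` are identical, because the dropped partition is replayed from the source data through the same two `map` steps. Real Spark schedules that recomputation across the cluster; the toy runs it in a single process.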

Spark in Fusion On-Prem

These topics provide information about Spark administration in Fusion Server:

Spark Components. How Spark is integrated into Fusion, including a diagram.
Spark Getting Started. How to start Spark processes and work with the shell and the Spark UI.
Spark Driver Processes. Fusion jobs that run on Spark use a driver process started by the API service.
Spark Configuration. How to configure Spark for maximum performance. The article also covers ports, directories, and configuring connections for an SSL-enabled Solr cluster.
Scaling Spark Aggregations. How to configure Spark so that aggregations scale.
Spark Troubleshooting. How to troubleshoot Spark.

Additionally, you can configure and run Spark jobs in Fusion, using the Spark Jobs API or the Fusion UI.
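As a rough illustration of scripting a job start through a REST API, the sketch below builds (but does not send) an HTTP request using only the Python standard library. Everything specific in it is an assumption rather than something this page states: the function name `build_job_start_request`, the default `jobs_path` value, the use of POST, and HTTP Basic authentication are placeholders. Check the Spark Jobs API reference for the actual endpoint and authentication scheme before relying on it.

```python
import base64
import urllib.parse
import urllib.request


def build_job_start_request(base_url, job_id, username, password,
                            jobs_path="/api/spark/jobs"):
    """Build, without sending, a request that would ask Fusion to start a job.

    jobs_path, the POST method, and Basic auth are assumptions for
    illustration; confirm them against the Spark Jobs API reference.
    """
    # Percent-encode the job ID so spaces or slashes cannot alter the path.
    safe_id = urllib.parse.quote(job_id, safe="")
    url = f"{base_url.rstrip('/')}{jobs_path}/{safe_id}"

    # Encode the credentials as an HTTP Basic Authorization header value.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    headers = {"Authorization": "Basic " + token.decode("ascii")}

    return urllib.request.Request(url, method="POST", headers=headers)


# Hypothetical host and job ID, used only to show the resulting request.
request = build_job_start_request(
    "https://fusion.example.com/", "my job", "admin", "secret"
)
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to inspect the URL and headers first.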

Spark with Fusion AI

With a Fusion AI license, you can also use the Spark cluster to train and compile machine learning models, as well as to run experiments via the Fusion UI or the Spark Jobs API.

Spark jobs for Fusion AI
