Built-in SQL Aggregation Jobs Using Cloud Storage Buckets
Built-in SQL aggregation jobs can be set up to use source files in cloud storage buckets.
This process can be used with the following file formats and cloud storage systems:
- File formats such as .parquet and .orc files
- Cloud storage systems such as Google Cloud Storage (GCS), Amazon Web Services (AWS), and Azure Kubernetes Service (AKS)
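As a rough illustration of how a wildcard source path can select files of one format from a bucket directory, here is a minimal Python sketch. The bucket name, the `gs://` scheme, and the `matches_source_pattern` helper are hypothetical, and the matching rule (the wildcard applies to the file name only) is an assumption, not a description of how the job itself resolves paths.

```python
import posixpath
from fnmatch import fnmatchcase


def matches_source_pattern(pattern: str, file_uri: str) -> bool:
    """Return True if file_uri is in the pattern's directory and its
    file name matches the pattern's wildcard (hypothetical helper)."""
    pattern_dir, pattern_name = posixpath.split(pattern)
    file_dir, file_name = posixpath.split(file_uri)
    # Compare directories exactly so the wildcard never crosses a "/".
    return pattern_dir == file_dir and fnmatchcase(file_name, pattern_name)


if __name__ == "__main__":
    # Hypothetical bucket and directory names, for illustration only.
    pattern = "gs://example-bucket/signals/*.parquet"
    for uri in (
        "gs://example-bucket/signals/part-0001.parquet",
        "gs://example-bucket/signals/part-0001.orc",
        "gs://example-bucket/signals/archive/part-0001.parquet",
    ):
        print(uri, matches_source_pattern(pattern, uri))
```

Under these assumptions, only files directly inside the named directory with the requested extension are selected.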
Configure Parameters
Google Cloud Storage (GCS)
- Create a Kubernetes secret with the necessary credentials. For more information about creating a secret containing the credentials JSON file, see Configuring credentials for Spark jobs.
- When the secret is successfully created, set the following parameters:
GENERAL PARAMETERS | | |
---|---|---|
Parameter Name | Example Value | Notes |
SOURCE COLLECTION | | Value: URI path that contains the desired signal data files. Example value returns all parquet files in the directory, where |
DATA FORMAT | | Value: File type of input file. Other value can be |
SPARK SETTINGS | | |
Parameter Name | Example Value | Notes |
 | | Value: The Example: |
 | | Value: The Example: |
 | | Value: The name of the Example: |
 | | Value: The name of the Example: |
 | | Value: The name of the Example: |
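The Spark setting names are not shown in the table above, so the following Python sketch is only a guess at what a GCS configuration might look like. It assumes the standard Apache Spark on Kubernetes properties for mounting a secret and setting environment variables, plus the GCS connector's JSON key file property. The secret name, mount path, key file name, and the `gcs_spark_settings` helper are all hypothetical; check your product documentation for the exact keys it expects.

```python
from typing import Dict


def gcs_spark_settings(secret_name: str, mount_path: str, key_file: str) -> Dict[str, str]:
    """Build a candidate set of Spark settings for reading from GCS.

    Every key below is an assumption based on generic Spark and GCS
    connector properties, not on the table in this document.
    """
    key_path = f"{mount_path.rstrip('/')}/{key_file}"
    return {
        # Mount the Kubernetes secret into the driver and executor pods.
        f"spark.kubernetes.driver.secrets.{secret_name}": mount_path,
        f"spark.kubernetes.executor.secrets.{secret_name}": mount_path,
        # Point the Google client libraries at the mounted credentials file.
        "spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS": key_path,
        "spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS": key_path,
        # Tell the Hadoop GCS connector which key file to use.
        "spark.hadoop.google.cloud.auth.service.account.json.keyfile": key_path,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    settings = gcs_spark_settings("example-gcp-secret", "/mnt/gcp-secrets", "example-key.json")
    for name, value in settings.items():
        print(f"{name} = {value}")
```

Generating the settings from one secret name and one mount path keeps the driver and executor entries consistent with each other.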
Amazon Web Services (AWS)
- Create a Kubernetes secret with the necessary credentials. For more information about creating a secret containing the credentials JSON file, see Configuring credentials for Spark jobs.
- When the secret is successfully created, set the following parameters:
GENERAL PARAMETERS | | |
---|---|---|
Parameter Name | Example Value | Notes |
SOURCE COLLECTION | | Value: URI path that contains the desired signal data files. Example value returns all parquet files in the directory, where |
DATA FORMAT | | Value: File type of input file. Other value can be |
SPARK SETTINGS | | |
Parameter Name | Example Value | Notes |
 | | Value: The |
 | | Value: The |
 | | Value: The |
 | | Value: The |
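The four Spark setting names for AWS are not shown in the table above. One plausible arrangement, sketched below in Python, uses Apache Spark's `secretKeyRef` properties to expose two fields of a Kubernetes secret as the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables on the driver and executors. That mapping is an assumption, and the secret name, field names, and the `aws_spark_settings` helper are hypothetical.

```python
from typing import Dict


def aws_spark_settings(secret_name: str, access_key_field: str, secret_key_field: str) -> Dict[str, str]:
    """Build candidate Spark settings that inject AWS credentials from a
    Kubernetes secret (assumed layout, not confirmed by this document)."""
    settings: Dict[str, str] = {}
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.secretKeyRef"
        # Spark expects the value in the form "<secret name>:<key in secret>".
        settings[f"{prefix}.AWS_ACCESS_KEY_ID"] = f"{secret_name}:{access_key_field}"
        settings[f"{prefix}.AWS_SECRET_ACCESS_KEY"] = f"{secret_name}:{secret_key_field}"
    return settings


if __name__ == "__main__":
    # Hypothetical secret and field names, for illustration only.
    for name, value in aws_spark_settings("example-aws-secret", "key", "secret").items():
        print(f"{name} = {value}")
```

Referencing the secret by name keeps the credential values themselves out of the job configuration.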
Azure settings
GENERAL PARAMETERS | | |
---|---|---|
Parameter Name | Example Value | Notes |
SOURCE COLLECTION | | Value: URI path that contains the desired signal data files. Example value returns all parquet files in the directory, where |
DATA FORMAT | | Value: File type of input file. Other value can be |
SPARK SETTINGS | | |
Parameter Name | Example Value | Notes |
 | | Makes the file system available inside the Spark job. |
 | | Obtain the values for {storage-account-name} and {access-key-value} from the Users Azure UI. |
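To show where `{storage-account-name}` and `{access-key-value}` might be substituted, here is a minimal Python sketch. The table above does not show the setting names, so the two keys used here are assumptions taken from Hadoop's Azure Blob Storage connector (a file system implementation class and a per-account access key), and the `azure_spark_settings` helper and sample values are hypothetical.

```python
from typing import Dict


def azure_spark_settings(storage_account_name: str, access_key_value: str) -> Dict[str, str]:
    """Build candidate Spark settings for Azure Blob Storage access
    (assumed keys from the Hadoop Azure connector)."""
    if not storage_account_name or not access_key_value:
        raise ValueError("both the storage account name and the access key are required")
    return {
        # Registers the file system implementation inside the Spark job.
        "spark.hadoop.fs.wasbs.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
        # Supplies the access key for the named storage account.
        f"spark.hadoop.fs.azure.account.key.{storage_account_name}.blob.core.windows.net": access_key_value,
    }


if __name__ == "__main__":
    # Placeholder values; real ones come from your Azure account.
    settings = azure_spark_settings("examplestorageaccount", "<access-key-value>")
    for name in settings:
        print(name)  # print names only, to avoid echoing the key
```

Because the access key is a credential, the example prints only the setting names.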