Installing Acceldata Torch
Acceldata Torch uses Kubernetes for deployment and execution. Deployment is driven by Replicated Kots, which provides a single-click installation experience for end customers. Torch can be deployed both in a managed cloud Kubernetes environment and on on-premise machines. In an on-premise environment, Replicated Kots installs Kubernetes on the provided nodes and ensures that Torch gets deployed in that environment.
Once you sign up for Torch, you will be provided with a license file that needs to be used during the installation process.
To install Acceldata Torch, you must provide either a cloud-managed Kubernetes environment or on-premise nodes on which Kubernetes will be installed.
Once the Kubernetes environment is ready, the process of configuring and installing Torch is the same for both environments.
Minimum Hardware Recommendation
The recommended hardware configuration depends on the following factors:
- Amount of data to be processed
- Type of Spark deployment used
| Data Volume | Spark Deployment Mode | K8s Cluster Configuration |
|---|---|---|
| Low (< 10 GB) | External Spark (Hadoop cluster) | 1 master and 2 worker nodes (2 cores, 8 GB+ memory each) |
| Low (< 10 GB) | Spark on Kubernetes | 1 master and 4 worker nodes (4 cores, 8 GB+ memory each) |
| Medium (10 GB to 100 GB) | External Spark (Hadoop cluster) | 1 master and 2 worker nodes (2 cores, 8 GB+ memory each) |
| Medium (10 GB to 100 GB) | Spark on Kubernetes | 1 master and 6 worker nodes (4 cores, 8 GB+ memory each) |
| High (100 GB+) | External Spark (Hadoop cluster) | 1 master and 2 worker nodes (2 cores, 8 GB+ memory each) |
| High (100 GB+) | Spark on Kubernetes | 1 master and 8 worker nodes (4 cores, 16 GB+ memory each) |
On-Premise Software Installation
The first step in the process is to install the Kubernetes cluster on the nodes.
- SSH into the master node.
- Execute the following command.
- Follow the guided procedure step by step.
- If you are prompted with the statement "This application is incompatible with memory swapping enabled. Disable swap to continue? (Y/n)", press Y. The firewall must be disabled.
- Select the network interface, if prompted.
- Finally, Kots installs the components required for the Kubernetes master.
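The installer command itself is not reproduced here; for Replicated kURL-based installations it typically has the following shape. This is a sketch: the application slug is a placeholder for the one Acceldata supplies with your license.

```shell
# Hypothetical kURL installer invocation, run on the master node.
# Replace <app-slug> with the slug provided by Acceldata.
curl -sSL https://kurl.sh/<app-slug> | sudo bash
```

The script is interactive, which is why the guided prompts (swap, firewall, network interface) appear during this step.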
Once the Kots components are installed, copy the content printed at the end of the output and store it for future reference; it includes the join command used in the next step.
In the next step, log in to the worker nodes and execute the join command printed at the bottom of the master installation procedure. Follow the instructions; on completion of the installation, the worker nodes are joined to the cluster.
To check whether the nodes are ready, execute the following command on the master node.
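The readiness check is the standard kubectl one: every node should show Ready in the STATUS column. The snippet below applies that criterion to sample output so it can run anywhere; on a live cluster you would pipe `kubectl get nodes --no-headers` instead. The node names and versions here are made up.

```shell
# On the master node, the check is simply:
#   kubectl get nodes
# Below, the same Ready-column test is applied to placeholder output.
nodes="master-1   Ready   control-plane   10m   v1.24.0
worker-1   Ready   <none>          8m    v1.24.0
worker-2   Ready   <none>          8m    v1.24.0"
not_ready=$(printf '%s\n' "$nodes" | awk '$2 != "Ready"' | wc -l)
[ "$not_ready" -eq 0 ] && echo "all nodes Ready"
```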
Managed Cloud Kubernetes Installation
In a managed Kubernetes cluster, the nodes are managed by the cloud provider, so only Kots needs to be installed.
Execute the following command in an environment where kubectl is configured and points to the cluster.
The above command installs Kots, and the system is then ready for Torch deployment.
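The exact command comes with your license; with the standard KOTS CLI, the sequence typically looks like the sketch below. The application slug is a placeholder.

```shell
# Hypothetical managed-cluster install using the KOTS kubectl plugin.
# Replace <app-slug> with the slug provided by Acceldata.
curl https://kots.io/install | bash   # installs the kots kubectl plugin
kubectl kots install <app-slug>       # deploys the Kots admin console
```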
Configure and Install Torch
Open any browser and go to the following URL to open the Replicated admin console:
http://master-node:8800
Click the Continue to Setup button. If a pop-up warns that the connection is not private, proceed by adding an exception.
In the following window, click Skip & continue to bypass setting an SSL certificate for the admin console.
- The password window is displayed. Enter the Kots admin password, which is printed in the output at the end of the installation.
- Upload the license provided by Acceldata in the next window that is displayed.
Next, provide the configurations described in the next section, then click Continue.
In a few minutes, Torch installation will complete and the Kubernetes artifacts will be deployed.
Configurations
Torch version
Displays the Acceldata Torch version that is about to be installed. This is a read-only field, shown for reference.
Hive Configuration
Click Enable hive support if Hive support is required, and upload the hive-site.xml file in the specified location. If enabled, you must also provide the settings under Other Hadoop Configuration.
Other Hadoop Configuration
If you have enabled Hive support, core-site.xml and hdfs-site.xml must be provided. This configuration is also required if job results are to be saved in HDFS.
Job result persistence configuration
Torch stores the results of jobs in a distributed file system. Currently, it can store them in HDFS or AWS S3.
Select one of the two options given below:
- Use HDFS file system
- Use AWS S3 file system
HDFS configuration:
Inputs required:
- Directory: HDFS directory where job results will be stored (Default: /tmp/ad/torch_results)
note
Other Hadoop configurations are also required for this option. Refer to Other Hadoop Configuration.
AWS S3 configuration:
Inputs required:
- AWS S3 Access key: Access key for the bucket
- AWS S3 Secret key: Secret key for the bucket
- AWS S3 Bucket name: Name of the bucket where the job results are to be stored
note
The bucket name must contain only alphanumeric characters.
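The restriction above can be expressed as a simple pattern check. This is an illustrative sketch of the stated rule, not Torch's actual validator.

```shell
# Illustrative bucket-name check mirroring the note above: accept only
# alphanumeric characters (a sketch, not the installer's own validation).
is_valid_bucket() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9]+$'
}
is_valid_bucket "torchresults01" && echo "valid"     # passes
is_valid_bucket "torch-results"  || echo "invalid"   # hyphen rejected
```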
Spark Support
Torch uses Apache Spark for running jobs. Currently, Torch supports three modes of deployment.
Use Embedded Spark
In this mode, Torch runs jobs locally inside a service. No separate installation or configuration is required.
note
This should only be used for testing.
Use Existing Spark cluster
If there is an existing Hadoop cluster with Apache Spark installed, Torch can run the jobs inside that cluster. Apache Livy must be installed as well; Torch connects to Livy over HTTP and submits the Spark jobs to it.
Inputs required:
- Apache Livy URL: HTTP endpoint for Livy
- Apache Livy Queue: The queue name to which the jobs are submitted
- Number of executors: Number of executors that are spawned for each job
- Number of CPU cores: Number of CPU cores per executor
- Memory per executor: Amount of memory to be allocated to each executor
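Since Torch talks to Livy over HTTP, these inputs map onto Livy's standard REST batch API. The call below is illustrative only: the host, port, jar path, and queue name are placeholders, not values Torch exposes.

```shell
# Illustrative Livy batch submission. numExecutors, executorCores, and
# executorMemory correspond to the three executor inputs above; the
# endpoint and jar path are placeholders.
curl -s -X POST "http://livy-host:8998/batches" \
  -H 'Content-Type: application/json' \
  -d '{
        "file": "hdfs:///path/to/job.jar",
        "queue": "default",
        "numExecutors": 2,
        "executorCores": 2,
        "executorMemory": "4g"
      }'
```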
Deploy Spark on Kubernetes
In this mode, the installer deploys Spark on Kubernetes, which is then used for running the jobs.
Inputs required:
- Number of executors: Number of executors to be spawned for each job
- Number of CPU cores: Number of CPU cores per executor
- Memory per executor: Amount of memory to be allocated to each executor
note
This is the preferred option.
Notification Configuration
Click Enable notification if notification support is required. When enabled, Torch sends emails or Slack messages for various events occurring in the system.
Inputs required:
- Default email ID: The default email address from which mails are sent
- Default Slack webhook URL: The default channel to which Slack messages are sent
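For reference, a Slack incoming-webhook notification is just a JSON POST to the webhook URL. The URL below is a placeholder for the one Slack generates for your channel.

```shell
# Example of the kind of call an incoming-webhook notification makes;
# the URL is a placeholder for your channel's webhook.
curl -s -X POST "https://hooks.slack.com/services/T000/B000/XXXX" \
  -H 'Content-Type: application/json' \
  -d '{"text": "Torch: data quality job finished"}'
```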
After configuring the system, click the Deploy button on the next screen. This takes a few minutes to complete.
Verify installation
After a few minutes, the following services should be visible.
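One way to list them is with kubectl; the namespace here is an assumption, so substitute whichever namespace was used during the install.

```shell
# List the deployed workloads and services (namespace is an assumption).
kubectl get pods --namespace default
kubectl get svc --namespace default
```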
Accessing the Torch Application
For the nginx-ingress-nginx-controller service shown above, the system assigns node port 80, so the Torch UI can be accessed on port 80 of the Kubernetes master node.
For example, open http://xxx.xxx.xxx.xxx to start the Torch UI, where xxx.xxx.xxx.xxx is the K8s master node IP address or hostname.
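If you want to confirm which node port was mapped to port 80, you can read it off the ingress service directly; this is a standard kubectl JSONPath query against the service named above.

```shell
# Look up the node port mapped to port 80 on the ingress service.
kubectl get svc nginx-ingress-nginx-controller \
  -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'
```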