Acceldata Torch Cloud (managed Kubernetes) Installation

Acceldata Torch uses Kubernetes for deployment and execution. Deployment is driven by Replicated KOTS, which gives the end customer a single-click install experience. This document describes how to deploy on Amazon EKS; a similar setup can be done on other cloud providers.

Creating the cluster

With eksctl, you can create an Amazon-managed Kubernetes cluster:

☁ ~ eksctl create cluster \
--name replicated-test-01 \
--version 1.17 \
--region us-east-2 \
--nodegroup-name replicated-nodes \
--node-type t3.large \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--managed
[ℹ]  eksctl version 0.27.0
[ℹ]  using region us-east-2
[ℹ]  setting availability zones to [us-east-2b us-east-2c us-east-2a]
[ℹ]  subnets for us-east-2b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ]  subnets for us-east-2c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ]  subnets for us-east-2a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ]  using Kubernetes version 1.17
[ℹ]  creating EKS cluster "replicated-test-01" in "us-east-2" region with managed nodes
[ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=replicated-test-01'
[ℹ]  CloudWatch logging will not be enabled for cluster "replicated-test-01" in "us-east-2"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=replicated-test-01'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "replicated-test-01" in "us-east-2"
[ℹ]  2 sequential tasks: { create cluster control plane "replicated-test-01", 2 sequential sub-tasks: { no tasks, create managed nodegroup "replicated-nodes" } }
[ℹ]  building cluster stack "eksctl-replicated-test-01-cluster"
[ℹ]  deploying stack "eksctl-replicated-test-01-cluster"
[ℹ]  building managed nodegroup stack "eksctl-replicated-test-01-nodegroup-replicated-nodes"
[ℹ]  deploying stack "eksctl-replicated-test-01-nodegroup-replicated-nodes"
[ℹ]  waiting for the control plane availability...
[✔]  saved kubeconfig as "/home/dipayan/.kube/config"
[ℹ]  no tasks
[✔]  all EKS cluster resources for "replicated-test-01" have been created
[ℹ]  nodegroup "replicated-nodes" has 3 node(s)
[ℹ]  node "ip-192-168-10-197.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-61-223.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-84-99.us-east-2.compute.internal" is ready
[ℹ]  waiting for at least 1 node(s) to become ready in "replicated-nodes"
[ℹ]  nodegroup "replicated-nodes" has 3 node(s)
[ℹ]  node "ip-192-168-10-197.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-61-223.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-84-99.us-east-2.compute.internal" is ready
[ℹ]  kubectl command should work with "/home/dipayan/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "replicated-test-01" in "us-east-2" region is ready

Upon completion, you can list the nodes in the cluster with the kubectl get nodes command.

☁ ~ kubectl get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-10-197.us-east-2.compute.internal   Ready    <none>   90s   v1.17.11-eks-cfdc40
ip-192-168-61-223.us-east-2.compute.internal   Ready    <none>   86s   v1.17.11-eks-cfdc40
ip-192-168-84-99.us-east-2.compute.internal    Ready    <none>   84s   v1.17.11-eks-cfdc40

KOTS installation

The next step is to install Replicated KOTS.

☁ ~ curl https://kots.io/install | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3567 0 3567 0 0 2215 0 --:--:-- 0:00:01 --:--:-- 2215
Installing replicatedhq/kots v1.21.2 (https://github.com/replicatedhq/kots/releases/download/v1.21.2/kots_linux_amd64.tar.gz)...
######################################################################## 100.0%#=#=-# #
Installed at /usr/local/bin/kubectl-kots

After installing KOTS, install Acceldata Torch.

☁ ~ kubectl kots install torch/beta
Enter the namespace to deploy to: torch
• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
Enter a new password to be used for the Admin Console: ••••••••
• Waiting for Admin Console to be ready ✓
• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

The command prompts for a namespace; provide the namespace where Torch should be deployed. It also prompts for a KOTS Admin Console password; this password is required to log in to the Admin Console.

At the end, the command starts a port forward, and the Admin Console can be accessed at http://localhost:8800 in a browser.

To start the tunnel again, execute kubectl kots admin-console --namespace <namespace>. Replace <namespace> with the namespace provided during installation.
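For example, with the torch namespace used in this walkthrough:

☁ ~ kubectl kots admin-console --namespace torch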

In the browser, the first screen asks for the Admin Console password set during installation.

Password

Next, it asks you to upload the license file provided by Acceldata.

License upload

Then, the configuration screen loads. The process is the same for both types of installation.

After configuring the system, click the Deploy button on the next screen. Deployment takes a few minutes to complete.

Deployment Screen
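Deployment progress can also be watched from a terminal. A minimal check, assuming the torch namespace entered during installation:

☁ ~ kubectl get pods -n torch -w

Once all pods report Running, proceed to verification below.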

Verify installation

After a few minutes, the following services should be visible:

☁ ~ kubectl get services -n torch
NAME                  TYPE           CLUSTER-IP       CLUSTER EXTERNAL-IP                                                       PORT(S)                      AGE
ad-analysis-service   ClusterIP      10.100.214.57    <none>                                                                    19021/TCP                    159m
ad-catalog            ClusterIP      10.100.94.108    <none>                                                                    8888/TCP                     159m
ad-catalog-auth-db    ClusterIP      10.100.104.183   <none>                                                                    27017/TCP                    159m
ad-catalog-db         ClusterIP      10.100.67.246    <none>                                                                    5432/TCP                     159m
ad-catalog-ui         ClusterIP      10.100.176.217   <none>                                                                    4000/TCP                     159m
ad-torch-auth         ClusterIP      10.100.123.46    <none>                                                                    9090/TCP                     159m
ad-torch-ml           ClusterIP      10.100.156.120   <none>                                                                    19035/TCP                    159m
kotsadm               ClusterIP      10.100.205.153   <none>                                                                    3000/TCP                     164m
kotsadm-minio         ClusterIP      10.100.133.93    <none>                                                                    9000/TCP                     165m
kotsadm-postgres      ClusterIP      10.100.36.198    <none>                                                                    5432/TCP                     165m
livy                  ClusterIP      10.100.92.54     <none>                                                                    80/TCP                       159m
livy-headless         ClusterIP      None             <none>                                                                    <none>                       159m
torch-api-gateway     LoadBalancer   10.100.105.132   a2a261184f9474b4f99f22f90c03f440-167265556.us-east-2.elb.amazonaws.com   80:32651/TCP,443:31041/TCP   148m

Accessing the Torch Application

For the torch-api-gateway service shown above, AWS assigns an external LoadBalancer DNS name.

The Torch UI can be accessed through the DNS name provided for the NGINX ingress.

For example: Open https://a2a261184f9474b4f99f22f90c03f440-167265556.us-east-2.elb.amazonaws.com/ to start the Torch UI.
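If the hostname is needed in a script, it can be read directly from the service status with a jsonpath query. A minimal sketch, assuming the service name and namespace shown in the listing above:

☁ ~ kubectl get service torch-api-gateway -n torch \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}'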