Acceldata Torch Cloud (managed Kubernetes) Installation

Acceldata Torch uses Kubernetes for deployment and execution. Deployment is driven by Replicated KOTS, which gives the end customer a single-click install experience. This document describes how to deploy on Amazon EKS; a similar setup can be done on other cloud providers.

Creating the cluster

With eksctl, you can create an Amazon-managed Kubernetes cluster:

☁ ~ eksctl create cluster \
--name replicated-test-01 \
--version 1.17 \
--region us-east-2 \
--nodegroup-name replicated-nodes \
--node-type t3.large \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--managed
[ℹ]  eksctl version 0.27.0
[ℹ]  using region us-east-2
[ℹ]  setting availability zones to [us-east-2b us-east-2c us-east-2a]
[ℹ]  subnets for us-east-2b - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ]  subnets for us-east-2c - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ]  subnets for us-east-2a - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ]  using Kubernetes version 1.17
[ℹ]  creating EKS cluster "replicated-test-01" in "us-east-2" region with managed nodes
[ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-2 --cluster=replicated-test-01'
[ℹ]  CloudWatch logging will not be enabled for cluster "replicated-test-01" in "us-east-2"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=us-east-2 --cluster=replicated-test-01'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "replicated-test-01" in "us-east-2"
[ℹ]  2 sequential tasks: { create cluster control plane "replicated-test-01", 2 sequential sub-tasks: { no tasks, create managed nodegroup "replicated-nodes" } }
[ℹ]  building cluster stack "eksctl-replicated-test-01-cluster"
[ℹ]  deploying stack "eksctl-replicated-test-01-cluster"
[ℹ]  building managed nodegroup stack "eksctl-replicated-test-01-nodegroup-replicated-nodes"
[ℹ]  deploying stack "eksctl-replicated-test-01-nodegroup-replicated-nodes"
[ℹ]  waiting for the control plane availability...
[✔]  saved kubeconfig as "/home/dipayan/.kube/config"
[ℹ]  no tasks
[✔]  all EKS cluster resources for "replicated-test-01" have been created
[ℹ]  nodegroup "replicated-nodes" has 3 node(s)
[ℹ]  node "ip-192-168-10-197.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-61-223.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-84-99.us-east-2.compute.internal" is ready
[ℹ]  waiting for at least 1 node(s) to become ready in "replicated-nodes"
[ℹ]  nodegroup "replicated-nodes" has 3 node(s)
[ℹ]  node "ip-192-168-10-197.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-61-223.us-east-2.compute.internal" is ready
[ℹ]  node "ip-192-168-84-99.us-east-2.compute.internal" is ready
[ℹ]  kubectl command should work with "/home/dipayan/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "replicated-test-01" in "us-east-2" region is ready

Upon completion, you can list the nodes in the cluster with the kubectl get nodes command.

☁ ~ kubectl get nodes
NAME                                           STATUS   ROLES    AGE   VERSION
ip-192-168-10-197.us-east-2.compute.internal   Ready    <none>   90s   v1.17.11-eks-cfdc40
ip-192-168-61-223.us-east-2.compute.internal   Ready    <none>   86s   v1.17.11-eks-cfdc40
ip-192-168-84-99.us-east-2.compute.internal    Ready    <none>   84s   v1.17.11-eks-cfdc40

KOTS installation

The next step is to install Replicated KOTS.

☁ ~ curl https://kots.io/install | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3567 0 3567 0 0 2215 0 --:--:-- 0:00:01 --:--:-- 2215
Installing replicatedhq/kots v1.21.2 (https://github.com/replicatedhq/kots/releases/download/v1.21.2/kots_linux_amd64.tar.gz)...
######################################################################## 100.0%#=#=-# #
Installed at /usr/local/bin/kubectl-kots

After installing KOTS, install Acceldata Torch.

☁ ~ kubectl kots install torch/beta
Enter the namespace to deploy to: torch
• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
Enter a new password to be used for the Admin Console: ••••••••
• Waiting for Admin Console to be ready ✓
• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

The command prompts for a namespace; provide the namespace where Torch should be deployed. It also prompts for a KOTS Admin Console password; this password is required to log in to the Admin Console.

At the end, the command starts a port forward, and the Admin Console can be accessed at http://localhost:8800 in a browser.

To start the tunnel again, execute kubectl kots admin-console --namespace <namespace>. Replace <namespace> with the namespace provided during installation.
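For example, with the torch namespace used in this walkthrough:

☁ ~ kubectl kots admin-console --namespace torch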

In the browser, the first screen asks for the Admin Console password set during installation.

Password

Next, it asks you to upload the license file provided by Acceldata.

License upload

Then, the configuration screen loads. The process is the same for both types of installation.

After configuring the system, click the Deploy button on the next screen. Deployment takes a few minutes to complete.

Deployment Screen
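Deployment progress can also be watched from a terminal. A minimal check, assuming the torch namespace entered during installation:

☁ ~ kubectl get pods -n torch -w

Once all pods report Running, proceed to verification below.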

Verify installation

After a few minutes, the following services should be visible:

☁ ~ kubectl get services -n torch
NAME                  TYPE           CLUSTER-IP       CLUSTER EXTERNAL-IP                                                       PORT(S)                      AGE
ad-analysis-service   ClusterIP      10.100.214.57    <none>                                                                    19021/TCP                    159m
ad-catalog            ClusterIP      10.100.94.108    <none>                                                                    8888/TCP                     159m
ad-catalog-auth-db    ClusterIP      10.100.104.183   <none>                                                                    27017/TCP                    159m
ad-catalog-db         ClusterIP      10.100.67.246    <none>                                                                    5432/TCP                     159m
ad-catalog-ui         ClusterIP      10.100.176.217   <none>                                                                    4000/TCP                     159m
ad-torch-auth         ClusterIP      10.100.123.46    <none>                                                                    9090/TCP                     159m
ad-torch-ml           ClusterIP      10.100.156.120   <none>                                                                    19035/TCP                    159m
kotsadm               ClusterIP      10.100.205.153   <none>                                                                    3000/TCP                     164m
kotsadm-minio         ClusterIP      10.100.133.93    <none>                                                                    9000/TCP                     165m
kotsadm-postgres      ClusterIP      10.100.36.198    <none>                                                                    5432/TCP                     165m
livy                  ClusterIP      10.100.92.54     <none>                                                                    80/TCP                       159m
livy-headless         ClusterIP      None             <none>                                                                    <none>                       159m
torch-api-gateway     LoadBalancer   10.100.105.132   a2a261184f9474b4f99f22f90c03f440-167265556.us-east-2.elb.amazonaws.com   80:32651/TCP,443:31041/TCP   148m

Accessing the Torch Application

For the torch-api-gateway service shown above, AWS assigns an external LoadBalancer DNS name.

The Torch UI can be accessed through the DNS name provided for the NGINX ingress.

For example: Open https://a2a261184f9474b4f99f22f90c03f440-167265556.us-east-2.elb.amazonaws.com/ to start the Torch UI.
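If the hostname is needed in a script, it can be read directly from the service status with a jsonpath query. A minimal sketch, assuming the service name and namespace shown in the listing above:

☁ ~ kubectl get service torch-api-gateway -n torch \
-o jsonpath='{.status.loadBalancer.ingress[0].hostname}'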