HPC Cluster Deployment Guide

The following text describes how to deploy HEAppE Middleware and connect it to the HPC cluster. HEAppE should be deployed within the HPC centre's internal network. The deployment process is divided into several steps.

1. Server Environment Prerequisites

  1. Server Environment (deploy on virtual machine)

    Running an instance of HEAppE Middleware requires the following hardware and software.

    CPU    RAM     HDD      Operating system
    2      4 GB    50 GB    Any Linux distribution (validated on CentOS/Ubuntu)
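The hardware minimums above can be checked on the virtual machine before installation. The following is a minimal sketch, not part of HEAppE. It assumes a Linux host with `nproc`, `/proc/meminfo` and `df`, and it assumes that the root filesystem is the one HEAppE will use. The RAM threshold is set slightly below 4 GB because the kernel reports a little less than the nominal size; that tolerance is an assumption.

```shell
#!/bin/sh
# Minimums taken from the requirements table above.
MIN_CPUS=2
MIN_RAM_MB=3800   # nominal 4 GB; tolerance for kernel-reserved memory (assumption)
MIN_DISK_GB=50

# meets_minimum ACTUAL REQUIRED -> succeeds when ACTUAL >= REQUIRED
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# report NAME ACTUAL REQUIRED -> prints one status line per resource
report() {
    if meets_minimum "$2" "$3"; then
        echo "OK    $1: $2 (minimum $3)"
    else
        echo "FAIL  $1: $2 (minimum $3)"
    fi
}

cpus=$(nproc)
ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
# Total size of the root filesystem in GB (assumed to hold /opt/heappe).
disk_gb=$(df -Pk / | awk 'NR == 2 {print int($2 / 1024 / 1024)}')

report "CPU cores" "$cpus" "$MIN_CPUS"
report "RAM (MB)" "$ram_mb" "$MIN_RAM_MB"
report "Disk on / (GB)" "$disk_gb" "$MIN_DISK_GB"
```

A filesystem created on a 50 GB disk usually reports slightly less than 50 GB, so a borderline FAIL on the disk line should be read with that in mind.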

    Package installation
    • Git (Highly recommended)

    • Network-utils (Highly recommended)

    • Epel-release (Highly recommended)

    • Wget (Highly recommended)

    • Nginx (required to ensure HTTPS)

    • Ansible Vault

    • Docker

    • Docker Compose
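Whether the packages listed above are present can be checked by looking for their commands on the PATH. This is a minimal sketch; the command names are assumptions, since package and binary names differ between distributions. Network-utils and Epel-release are left out because they do not map to a single command.

```shell
#!/bin/sh
# missing_tools NAME... -> prints each command that is not found on PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Command names are assumptions; adjust them to your distribution.
missing=$(missing_tools git wget nginx ansible-vault docker)

# Docker Compose ships either as the "docker compose" plugin or as docker-compose.
if ! docker compose version >/dev/null 2>&1 &&
   ! command -v docker-compose >/dev/null 2>&1; then
    missing="$missing docker-compose"
fi

if [ -n "$missing" ]; then
    # Unquoted on purpose: prints all missing names on one line.
    echo "Missing:" $missing
else
    echo "All listed tools were found."
fi
```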

    Required ports to open in firewall
    • 22 - SSH access to virtual machine (open in restricted mode for specific IP subnet)

    • 80/443 - HTTP/HTTPS Nginx proxy (ensures SSL connectivity)

    • 5000 - HTTP HEAppE API (open in restricted mode for specific IP subnet)

    • 6000 - MS-SQL Database (open in restricted mode for specific IP subnet)
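As an illustration of the port list above, the sketch below prints firewall rules without executing them. It assumes an Ubuntu host that uses `ufw`; a CentOS host with firewalld needs different commands. The subnet `10.0.0.0/24` is a hypothetical placeholder for the restricted IP subnet.

```shell
#!/bin/sh
# print_ufw_rules SUBNET -> prints (does not execute) ufw commands for the
# ports listed above; restricted ports are limited to SUBNET.
print_ufw_rules() (
    subnet=$1
    for port in 22 5000 6000; do
        echo "ufw allow from $subnet to any port $port proto tcp"
    done
    # The Nginx proxy ports are opened without a source restriction.
    echo "ufw allow 80/tcp"
    echo "ufw allow 443/tcp"
)

# 10.0.0.0/24 is a placeholder subnet, not a recommendation.
print_ufw_rules "10.0.0.0/24"
```

Printing the rules first makes it possible to review them before running them as a sudo user.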

  2. HPC Robot accounts creation

    The process of obtaining robot accounts is described under Robot accounts.

2. HEAppE Environment & Configuration Setup

These steps should be performed as a sudo user (recommended). If you are unsure about the folder structure, please refer to the folder validation scheme at the end of this chapter.

HEAppE Middleware installation procedure

The HEAppE-setup.sh script guides you through the entire HEAppE Middleware setup process. If you prefer to go through the process manually, follow the steps in the HPC Deployment Expert Guide.

The script can be downloaded here: HEAppE-setup.sh

To run the script, execute the following commands in your console as a sudo user:

Running the HEAppE-setup.sh Script

This section provides examples of how to run the HEAppE-setup.sh script with and without data staging, along with explanations of each parameter.

With Data Staging

$ ./HEAppE-setup.sh \
    --project project1 \
    --secret-vault-password SecretPassword \
    --heappe-port 5000 \
    --db-port 6000 \
    --db-password Passw0rd \
    --heappe-core-repo https://github.com/It4innovations/HEAppE.git \
    --heappe-core-branch master \
    --data-staging-port 5001 \
    --defaults

This command configures the script to include a data staging port. Below is a breakdown of each parameter:

  • --project project1: Sets the project name to project1. This name defines the directory structure for the project.

  • --secret-vault-password SecretPassword: Specifies the password (SecretPassword) for the secret vault, used to secure sensitive configuration data.

  • --heappe-port 5000: Assigns port 5000 for the HEAppE service.

  • --db-port 6000: Assigns port 6000 for the database service.

  • --db-password Passw0rd: Sets the password for the database to Passw0rd.

  • --heappe-core-repo https://github.com/It4innovations/HEAppE.git: Specifies the Git repository URL for the HEAppE core codebase.

  • --heappe-core-branch master: Uses the master branch of the HEAppE core repository.

  • --data-staging-port 5001: Assigns port 5001 for data staging, enabling data transfer or preparation services.

  • --defaults: Runs the script without user interaction, using default values for unspecified options.

Use this configuration when you need data staging functionality, such as transferring or preparing files for processing in an HPC environment.
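The command above can be wrapped in a small function so that the data staging option is added only when a port is supplied. This is a sketch that uses only the options shown above. The function name `run_setup` and the `SETUP_SCRIPT` variable are hypothetical helpers introduced for illustration; they are not part of HEAppE.

```shell
#!/bin/sh
# run_setup PROJECT VAULT_PASSWORD DB_PASSWORD [DATA_STAGING_PORT]
# The body runs in a subshell, so its variables do not leak into the caller.
run_setup() (
    project=$1 vault_pw=$2 db_pw=$3 staging_port=${4:-}
    set -- \
        --project "$project" \
        --secret-vault-password "$vault_pw" \
        --heappe-port 5000 \
        --db-port 6000 \
        --db-password "$db_pw" \
        --heappe-core-repo https://github.com/It4innovations/HEAppE.git \
        --heappe-core-branch master
    # Append the data staging option only when a port was given.
    if [ -n "$staging_port" ]; then
        set -- "$@" --data-staging-port "$staging_port"
    fi
    # SETUP_SCRIPT is a hypothetical override, useful for dry runs with echo.
    "${SETUP_SCRIPT:-./HEAppE-setup.sh}" "$@" --defaults
)

# Usage:
#   run_setup project1 "$vault_pw" "$db_pw" 5001   # with data staging
#   run_setup project1 "$vault_pw" "$db_pw"        # without data staging
```

In bash, `read -rs vault_pw` prompts for a value without echoing it, which keeps the passwords out of the shell history. They are still passed to the script as command-line arguments.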

Without Data Staging

$ ./HEAppE-setup.sh \
    --project project1 \
    --secret-vault-password SecretPassword \
    --heappe-port 5000 \
    --db-port 6000 \
    --db-password Passw0rd \
    --heappe-core-repo https://github.com/It4innovations/HEAppE.git \
    --heappe-core-branch master \
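    --defaults

This variant omits the --data-staging-port option; all other parameters have the same meaning as above.

Script actions: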
  1. Creation of the directory structure for the HEAppE Middleware.

  2. Preparation of the .env file for docker containers.

  3. Acquisition of the release from the official HEAppE Middleware repository and copying of the docker-compose files.

  4. Creation of the appsettings.json file.

  5. Creation of the appsettings-data.json file.

  6. Creation of the seed.njson file.

  7. Credentials Vault initialization and unsealing procedure.

  8. Initialization of the Docker containers.

Note

The script creates the following directory structure:

/opt/heappe
├── confs
└── projects
    └── PROJECT
        ├── app
        │   ├── keys
        │   ├── logs
        │   └── confs
        ├── heappe-core
        ├── ssh_agent
        │   └── keys
        └── docker_configurations
            └── .env
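
After the script has finished, the layout above can be verified with a short check. This is a sketch that assumes PROJECT in the tree is replaced by the value passed to --project; the function name `missing_paths` is a hypothetical helper.

```shell
#!/bin/sh
# missing_paths BASE PROJECT -> prints every expected path that does not exist
missing_paths() (
    base=$1
    root=$base/projects/$2
    for dir in "$base/confs" \
               "$root/app/keys" "$root/app/logs" "$root/app/confs" \
               "$root/heappe-core" "$root/ssh_agent/keys" \
               "$root/docker_configurations"; do
        [ -d "$dir" ] || echo "$dir"
    done
    # The .env file is the only regular file in the documented tree.
    [ -f "$root/docker_configurations/.env" ] ||
        echo "$root/docker_configurations/.env"
)

# Usage: missing_paths /opt/heappe project1
```

No output means that every directory and the .env file from the tree were found.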

Warning

The script will ask you for the SA password of the SQL Server, which is deployed in a Docker container and which HEAppE Middleware uses to connect to the database. The password must meet the requirements of the Microsoft SQL Server password policy, so check that policy before choosing one.
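
A candidate password can be pre-checked before it is given to the script. The sketch below assumes the commonly documented default complexity rules for SQL Server: at least 8 characters, drawn from at least three of the four classes uppercase letters, lowercase letters, digits and symbols. Microsoft's documentation remains the authority, and the function name `sa_password_ok` is a hypothetical helper.

```shell
#!/bin/sh
# sa_password_ok PASSWORD -> succeeds when the password has at least 8
# characters and uses at least three of the four character classes.
sa_password_ok() (
    pw=$1
    classes=0
    case $pw in *[[:upper:]]*) classes=$((classes + 1)) ;; esac
    case $pw in *[[:lower:]]*) classes=$((classes + 1)) ;; esac
    case $pw in *[[:digit:]]*) classes=$((classes + 1)) ;; esac
    # Anything that is neither a letter nor a digit counts as a symbol.
    case $pw in *[![:alnum:]]*) classes=$((classes + 1)) ;; esac
    [ "${#pw}" -ge 8 ] && [ "$classes" -ge 3 ]
)

# Usage:
#   sa_password_ok "$db_pw" || echo "password is likely to be rejected" >&2
```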

Generation and storing of SSH keys for connection to the HPC cluster

There are several options for managing SSH keys based on your setup:

  • Using HEAppE Middleware for SSH Key Generation (Preferred)

    After the successful HEAppE deployment, you can generate SSH keys via the HEAppE REST API endpoint /heappe/Management/GenerateSecureShellKey. The keys will be stored in the directory /opt/heappe/projects/PROJECT/app/<ACCOUNTING_STRING>/keys/.

  • Using Existing HPC Accounts

    If you have obtained robot HPC accounts (HEAppE internal accounts) from the HPC centre, place their keys in the directory /opt/heappe/projects/PROJECT/app/keys/.

  • Using SSH Keys from SSH Agent

    If you use SSH keys from the SSH Agent, place these keys in the directory /opt/heappe/projects/PROJECT/ssh_agent/keys/.

    The SSH keys must then be loaded into the SSH Agent (Docker container SshAgent).
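
For the two options that rely on existing key files, copying a key into place can be sketched as follows. The owner-only file mode 600 follows the usual SSH convention for private keys. Whether the HEAppE containers need a different owner or mode is not stated here, so treat the mode as an assumption. The function name `install_key` and the file name `robot_key` are hypothetical.

```shell
#!/bin/sh
# install_key KEY_FILE TARGET_DIR -> copies a private key into TARGET_DIR
# and restricts the copy to owner read/write (mode 600).
install_key() {
    mkdir -p "$2" &&
        install -m 600 "$1" "$2/"
}

# Usage (robot_key is a placeholder file name):
#   install_key ./robot_key /opt/heappe/projects/PROJECT/app/keys
#   install_key ./robot_key /opt/heappe/projects/PROJECT/ssh_agent/keys
```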

Note

More information about SSH key generation is available under Robot accounts.

3. Cluster script directory initialization

The following figure illustrates the comprehensive directory and shared locations structure that has to be created for successful deployment.

HPC cluster directory structure

A) HEAppE Job Execution directory structure

The directory structure for HEAppE Job Execution is created automatically under the Master robot account identity. When a user creates a job by calling the JobManagement/CreateJob endpoint, HEAppE checks whether the directory structure exists and creates it if it does not. The structure is created on shared storage (usually the scratch project directory) that is accessible from all nodes of the HPC cluster.

B) & C) HEAppE Scripts directory structure

The directory structure for HEAppE Scripts is created under the Master robot account identity. The HEAppE cluster scripts are described under Cluster scripts.

Automatic setup

This setup can be done automatically by calling the HEAppE REST API endpoint Management/InitializeClusterScriptDirectory after HEAppE is configured and started. Each Robot account identity has to create symbolic links to this cluster script directory to be able to use it. The symbolic links are created in the home directory of the Robot account identity.

Manual setup

Alternatively, the directory can be prepared manually by following these steps:

  1. Create the directory structure for the application scripts under the Master robot account identity.

    $ mkdir -p {disk_location}/HEAppE
    $ chmod 770 {disk_location}/HEAppE
    $ cd {disk_location}/HEAppE
    
    $ mkdir -p Scripts
    $ chmod 770 Scripts
    
  2. Clone the repository with the application scripts.

    $ cd Scripts
    
    $ git clone https://github.com/It4innovations/HEAppE-scripts.git
    
  3. Move the scripts to the .key_scripts directory.

    $ mv HEAppE-scripts/HPC/.key_scripts .key_scripts
    $ rm -Rf HEAppE-scripts
    $ chmod -R 750 .key_scripts
    $ ln -sf {disk_location}/HEAppE/Scripts/.key_scripts ~/.key_scripts
    
  4. Modify the scripts according to your needs.

  5. Create symbolic links under the Robot account identities.

    $ ln -sf {disk_location}/HEAppE/Scripts/.key_scripts ~/.key_scripts
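
Because `ln -sf` succeeds even when the target path is wrong, it can be worth confirming that each link resolves to the shared script directory. This is a sketch; it assumes `readlink -f` is available (GNU coreutils), and the function name `link_points_to` is a hypothetical helper.

```shell
#!/bin/sh
# link_points_to LINK TARGET -> succeeds when LINK is a symbolic link that
# resolves to the same location as TARGET.
link_points_to() {
    [ -L "$1" ] && [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Usage, run under each Robot account identity:
#   link_points_to ~/.key_scripts "{disk_location}/HEAppE/Scripts/.key_scripts" \
#       || echo "~/.key_scripts does not point to the shared scripts" >&2
```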
    

Warning

If you choose the manual setup, the following HEAppE .key_scripts have to be modified on the cluster:

  • copy_data_to_temp.sh

    Specify the hpcprojectid variable (the corresponding project ID), e.g. in the format DD-XX-XX.

  • remote-cmd3.sh

    Replace the default value of the baseprefixpath variable with the path of the HEAppE/Executions folder.

Note

If everything is successfully deployed, HEAppE is available at http://localhost:5000 (Swagger at http://localhost:5000/swagger/index.html).

You can then authenticate with the default credentials (admin/admin) via the UserAndLimitationManagement/AuthenticateUserPassword endpoint.
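
The containers may need a moment to start, so a short polling loop can confirm that the Swagger page is reachable. This is a minimal sketch that assumes `curl` is installed. The function name `wait_for_url` and the default of 30 attempts with a 2 second delay are arbitrary choices, not HEAppE settings.

```shell
#!/bin/sh
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a successful HTTP response or the attempts run out.
wait_for_url() (
    url=$1 attempts=${2:-30} delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP errors; -s silences progress output.
        if curl -fs -o /dev/null "$url"; then
            echo "reachable after $i attempt(s): $url"
            exit 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not reachable after $attempts attempt(s): $url" >&2
    exit 1
)

# Usage: wait_for_url http://localhost:5000/swagger/index.html
```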