
3 Management Node Setup

This section will guide you through the setup of your System Management Server (SMS) host - the node that is responsible for managing the virtual cluster. To align with the OpenHPC install recipe, we will call it the smshost.

Click here to learn more about some of the different node terminologies that you may encounter in an HPC environment.

In an HPC environment, much like a traditional server environment in a datacentre or business, you will typically see various nodes with different roles.

"What's a node?" A node is another term for a server or computer in the HPC environment. Typically the term given to a node is based on the role that this node performs in the cluster environment. Some roles include:

  • management node: manages the HPC cluster environment
  • master node: another term for the management node
  • boss node: yet another term for the management node
  • host node: a less common term for the management node
  • compute node: the worker node that performs computations in the cluster
  • accelerator node: a node that also performs computations, but primarily through an accelerator
  • GPU node: a specific type of accelerator node, that uses GPUs
  • storage node: a node that manages storage for the cluster
  • login node: a public-facing node that fields login attempts to the cluster
  • head node: another term for the login node

In the virtual lab, the smshost will perform most of these roles, including those of head node, management node, provisioning node and storage node.

Note

Anticipated time to complete this chapter: TBC from user feedback.

3.1 Deploy smshost


We will deploy the smshost using a simple Vagrant command - vagrant up.

Click here to learn what tasks the vagrant up command performs:
  1. initialises the Vagrant environment,
  2. downloads and initialises the base smshost VM,
  3. instructs VirtualBox to configure the VM according to the parameters set out in the Vagrantfile.

This vagrant up process may take some time - depending on the speed of your internet connection and whether or not you have previously downloaded the Vagrant .box file referenced in the Vagrantfile.

Click here to learn more about the Vagrantfile.

There are many parameters that can be defined within the Vagrantfile, including:

.vm.box:
this references the base .box file to be used for provisioning the VM. In the virtual lab, this points to one of two Vagrant boxes, depending on the VM in question ...

smshost.vm.box = "bento/rockylinux-8":
uses the Bento box labelled rockylinux-8 to provision the VM named smshost.
To see what other Bento boxes are available, go to the HashiCorp Vagrant Cloud.

The compute VMs in the virtual lab both point to a local file called compute-node.box:

compute00.vm.box = "file://./compute-node.box"
compute01.vm.box = "file://./compute-node.box"
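
If you want a quick (optional) check of which base boxes Vagrant has already downloaded and cached locally, you can list them as follows - the boxes and versions shown will depend on your machine:

[~/openhpc-2.x-virtual-lab/]$
vagrant box list
# Example output - yours may differ:
# bento/rockylinux-8 (virtualbox, <version>)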

Tip - click here to learn how Vagrant behaves with the Vagrantfile

Running an ls command (or equivalent) should at least list the Vagrantfile in the current working directory. The vagrant up instruction will reference this configuration file. If it is not located in the current working directory, Vagrant will climb up the directory tree (towards the root) looking for the first Vagrantfile it can find.

This could lead to the wrong Vagrantfile being used, so please ensure you are in the correct working directory before running vagrant up.

See this link for more information.
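
To double-check that Vagrant is picking up and parsing a Vagrantfile from your current location (an optional sanity check - the exact wording of the output may vary between Vagrant versions):

[~/openhpc-2.x-virtual-lab/]$
vagrant validate
# Expected output, approximately:
# Vagrantfile validated successfully.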

Important - where to run vagrant up

Be careful when and where you run vagrant up.

Make sure to run vagrant up from your Git root or root lab directory. Vagrant will initialise the VM with your current working directory as a shared directory within the VM at /vagrant/. Doing this ensures that your VM has access to the configuration files that you downloaded from Git - input.local.lab, compute-node.box, etc.

  1. Navigate to your lab root directory. ie. ~/openhpc-2.x-virtual-lab/

  2. Run the command to initialise and deploy the smshost VM with Vagrant, as follows:

    [~/openhpc-2.x-virtual-lab/]$
    vagrant up smshost
    
    The above command will have Vagrant read the parameters in the Vagrantfile for the smshost, create a VirtualBox VM definition (such as vCPUs, RAM, NICs etc.), download and install the Rocky Linux image into the VM, boot the VM and install any additional stipulated packages.

    Note

    The Rocky Linux 8 image is approximately 680MB in size.

    Running the vagrant up command may fail with an error relating to "The IP address configured for host-only network is not within the allowed ranges". This is a known issue with the latest versions of VirtualBox. To resolve this problem please follow the VirtualBox documentation on the matter.

    Important

    This virtual lab uses a single Vagrantfile to manage the smshost VM as well as the compute VMs - known as a multi-machine Vagrantfile. To ensure you are accessing the correct VM definition and VM you are required to add the name of the VM definition after any Vagrant commands, such as:

    • vagrant up smshost
    • vagrant ssh smshost.
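
    If you are ever unsure which VM definitions exist in this multi-machine Vagrantfile, or what state each VM is in, vagrant status will list them (an optional check - the states shown will depend on your progress):

    [~/openhpc-2.x-virtual-lab/]$
    vagrant status
    # Example output, abbreviated:
    # smshost                   running (virtualbox)
    # compute00                 not created (virtualbox)
    # compute01                 not created (virtualbox)
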
  3. Once the VM is booted and the additional packages have been installed, you should be able to access your smshost VM via ssh.

    Tip

    During the vagrant up smshost step, you can open the VirtualBox interface and watch the processes side-by-side. As Vagrant performs certain steps, you can see how they affect the VirtualBox VM configuration.

    You can ssh to the VM at any time using one of the following methods:

    1. Using vagrant:

      [~/openhpc-2.x-virtual-lab/]$
      vagrant ssh smshost
      

    2. Using any SSH client to 127.0.0.1:2299 with the default Vagrant credentials (username: vagrant, password: vagrant)

    While vagrant ssh is the easiest method, some people report that clipboard copy/paste functionality is not available on the VM.
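
    If you prefer an external SSH client and want to confirm the exact forwarded port and key file that Vagrant has set up for the smshost, vagrant ssh-config will print the connection details (optional - the port and paths on your machine may differ):

      [~/openhpc-2.x-virtual-lab/]$
      vagrant ssh-config smshost
      # Example output, abbreviated:
      # Host smshost
      #   HostName 127.0.0.1
      #   User vagrant
      #   Port 2299
      #   IdentityFile <path to the VM's private key>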

    Note

    Your host machine has a shared directory with the VM as defined in the Vagrantfile. By default this directory is the location of the Vagrantfile on your workstation and is at /vagrant/ on the VM.

Congratulations

You have deployed the smshost virtual machine with a base Rocky Linux OS and configured its VirtualBox parameters!

3.2 Add and Configure Parameters


Prior to continuing with the installation of the OpenHPC components on the smshost, several commands must be issued to set up the shell environment. To ensure that all defined variables are set for the current shell environment, the configuration file must be sourced.

Tip

The official OpenHPC recipe mentions an input.local environment file. This file is not present in this OpenHPC 2.x guide.

For the purposes of this virtual lab we are using input.local.lab in its place, which is a simplified pre-customised environment file. In either case the configuration file (or local input file) must be sourced in the existing shell (ie. loaded into the current shell environment).

If you make any updates to the configuration file, the source command must be run again to update the environment variables in the current shell.

Recall that /vagrant/ on the VM is shared with the local host system at the directory location of your Vagrantfile. You will have pulled these configuration files when cloning the lab Git repo.

  1. ssh to the smshost VM and elevate to root user:

    [~/openhpc-2.x-virtual-lab/]$
    vagrant ssh smshost
    
    [vagrant@smshost ~]$
    sudo su
    

    You should be at the following prompt in your terminal:

    [root@smshost vagrant]#
    

    Tip - click here to learn more about best practice and why running commands as root is not a good idea!

    Running commands as root is not best practice!

    Since this is a test lab, it is considerably easier to issue commands as root and not have to worry about occasional sudo workarounds. In general, however, it is not recommended, and to ensure the best-practice habit of not running as root, we will still issue commands with sudo, even when logged in as root.

    Running commands as root makes it very difficult to have an audit trail of exactly which user ran a privileged command and it is much easier to make an irreversible error when you are not consciously aware that you are a privileged user.

  2. Examine the current environment variable status

    [root@smshost vagrant]#
    echo ${sms_name}
    [root@smshost vagrant]#
    echo ${sms_ip}
    

    Both commands will show a blank response.

  3. After sourcing the local input file input.local.lab, the OpenHPC environment variables will hold the values defined in it.

    If you are not in the correct working directory (as the lab anticipates you to be), first navigate to the /vagrant directory:

    [root@smshost vagrant]#
    cd /vagrant
    

    Then source the variable file and verify that things loaded correctly by noting the output of the echo commands:

    [root@smshost vagrant]#
    source input.local.lab
    [root@smshost vagrant]#
    echo ${sms_name}
    # OUTPUT: smshost
    [root@smshost vagrant]#
    echo ${sms_ip}
    # OUTPUT: 10.10.10.10
    

    Click here for a quick explanation of what just happened with the echo example above.

    When you source input.local.lab you are essentially running through the text file line by line as a series of shell commands.

    As with a standard Bash session, anything prefixed by a # is treated as a comment and is safely ignored.

    Knowing that a source will run line-by-line through the file, let's take a look at the first three input lines in input.local.lab (recall that we are effectively running three separate commands in the shell, one after the other):

    1. # sms host information - this is treated as a comment and has no effect when it is run in the terminal.
    2. sms_name=smshost # Hostname for SMS server - this will load the string smshost into an environment variable sms_name for the duration of this shell session. As with the previous line, the section following the # is safely ignored as a comment.
    3. sms_ip=10.10.10.10 # Internal IP address on SMS server - as with the above line, this will load the string 10.10.10.10 into an environment variable sms_ip for the duration of this shell session. Again, the section following the # is safely ignored as a comment.

    When we run an echo command, we are asking the shell to show us what is stored in the variables named sms_name and sms_ip. The ${} syntax encapsulates the variable name so that it can be referenced unambiguously.
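
    As a quick illustration of the difference between running and sourcing the file (a minimal sketch - it assumes you have not already sourced the file in this shell): executing the file with bash sets the variables in a child shell that exits immediately, whereas source sets them in your current shell.

    [root@smshost vagrant]#
    bash input.local.lab && echo ${sms_name}
    # OUTPUT: (blank - the variables were set in a child shell and discarded)
    [root@smshost vagrant]#
    source input.local.lab && echo ${sms_name}
    # OUTPUT: smshost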

    Click here if you are experiencing syntax error messages with source input.local.lab

    Depending on your local environment, there may be some potential pitfalls that lead to syntax errors when sourcing the input file. There are some workarounds to consider, depending on your local environment setup:

    1. Download the file directly via HTTP using wget:

      $
       wget https://gitlab.com/hpc-ecosystems/training/openhpc-2.x-virtual-lab/-/raw/main/input.local.lab
      

    2. Fix the local input.local.lab file by installing dos2unix and converting the file's line endings, then source the file again:
      sudo dnf install dos2unix
      dos2unix input.local.lab

    3. Before cloning with Git, tell it not to change line endings:
      git config --global core.autocrlf false
      This only works if done before cloning the Git repository, and it is a rather blunt approach if you use Git for other repositories on the same machine.

    4. Run a sed command to replace the input.local.lab file without installing dos2unix:
      $
      file input.local.lab
      # OUTPUT: input.local.lab: ASCII text, with CRLF line terminators
      $
      mv input.local.lab input.local.lab.bak
      $
      sed "s/\r//" input.local.lab.bak > input.local.lab
      $
      file input.local.lab
      # OUTPUT: input.local.lab: ASCII text
      $
      rm input.local.lab.bak
      
  4. Once the input file has been sourced successfully (which is verified by the output of the echo commands), all the environment variables will be set for the current shell session.

    Tip - remember to re-source on every shell

    Every new shell instance must have the input file re-sourced.

    This applies to a new tmux window, a new tmux shell, or after a disconnection, reboot, and so forth.

  5. Add the DNS entry to the /etc/hosts file

    [root@smshost ~]#
    echo ${sms_ip} ${sms_name} >> /etc/hosts
    [root@smshost ~]#
    cat /etc/hosts
    # OUTPUT
    # ...
    # 10.10.10.10 smshost
    
    Click here to understand what just happened with /etc/hosts.

    The /etc/hosts file is a local DNS file that is private to the host machine.


    The echo command will usually display the results to the terminal screen, however, the use of >> redirects the output to /etc/hosts. The use of >> means to append the output to the end of /etc/hosts. If we used > instead, we would overwrite /etc/hosts and have only one line in the new file!


    By adding the sms_ip and sms_name to the /etc/hosts file, we are enabling the smshost to be referenced locally by name.

    In this case, the line 10.10.10.10 smshost in the /etc/hosts file means that any reference to smshost will resolve locally to 10.10.10.10.


    Note that /etc/hosts is read from the top down until the first valid match is found or the end of the file is reached. This means that the order of your entries is very important.


    As an aside, can you see how convenient it is to use environment variables to modify system files? If you were to type these values in manually every time you made a change, you would run the risk of inconsistencies when parameters differ across commands due to typos or out-of-date information.

    By storing all the variable parameters in a single file input.local.lab it is generally easier to review the entire cluster configuration by simply reading over the contents of the file.
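
    To confirm that the new entry resolves as expected (an optional check), you can query the local resolver and ping the name:

    [root@smshost ~]#
    getent hosts smshost
    # OUTPUT: 10.10.10.10     smshost
    [root@smshost ~]#
    ping -c 1 smshost
    # The replies should come from 10.10.10.10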

  6. OpenHPC recommends disabling SELinux

    [root@smshost ~]#
    sudo sed -i "s/^SELINUX=.*/SELINUX=disabled/" /etc/selinux/config
    
    Click here to learn more about what the above command does.

    sed is a stream editor that we use to work on text files. Throughout the virtual lab you will need to make changes to existing config files that hold default values. A common habit would be to manually edit each config file with a text editor (vim, vi, etc.) and this is perfectly fine for a once-in-a-while edit, but there's always an inherent risk of configuration-drift or simple human error.

    A safer approach would be to use as much automation as possible. This is where a tool like sed can prove valuable. We will follow this approach throughout the virtual lab.


    In the above step, we are searching the file /etc/selinux/config for a line that contains SELINUX= and we are replacing it with SELINUX=disabled.

    The default configuration in /etc/selinux/config is SELINUX=permissive and if you run a watch in a separate session / tmux pane, you will see the line SELINUX=permissive replaced with SELINUX=disabled.


    To invoke tmux:
    - tmux
    - Press CTRL+B and then " to open another horizontal pane.
    - Type in watch "cat /etc/selinux/config" - pay attention to about halfway down the output, where SELINUX=permissive appears.
    - Switch to the top pane by pressing CTRL+B and then the up arrow.
    - Prepare the guide command: sudo sed -i "s/^SELINUX=.*/SELINUX=disabled/" /etc/selinux/config
    - While watching the bottom pane, press ENTER and watch the result.
    - After a short refresh time (2 seconds by default) SELINUX=permissive will be replaced with SELINUX=disabled.
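
    To verify the edit itself (optional), you can check the configuration file and the runtime mode:

    [root@smshost ~]#
    grep ^SELINUX= /etc/selinux/config
    # OUTPUT: SELINUX=disabled
    [root@smshost ~]#
    getenforce
    # Reports the current runtime mode - it will only report Disabled after the reboot in the next step.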

  7. Reboot the VM for the SELinux settings to take effect.

    [root@smshost ~]#
    sudo reboot
    

    Tip

    Always run as root and re-source the environment variables on every new shell.

    As stated before, this is a test environment, so it is considerably easier to bypass occasional sudo workarounds by running directly as root. This is not best practice in a production environment!

    OpenHPC makes extensive use of environment variable substitution. If you do not source the local input file input.local.lab then these variables will hold blank values and lead to unpredictable behaviour.

    Always remember to source the input.local.lab file on every new Shell instance.

  8. After the reboot, vagrant ssh smshost back into the smshost, return to the root profile and source the correct environment:

    [vagrant@smshost ~]$
    sudo su
    
    [root@smshost vagrant]#
    source /vagrant/input.local.lab
    

    Important - losing shared directories after reboot.

    When rebooting the VM from within the guest OS, the /vagrant directory may not remap on reboot. If the mapping is lost, a vagrant reload will reload the configuration file.

    Alternatively, vagrant halt (graceful shutdown) followed by vagrant up will reboot the VM. This process also reloads the Vagrantfile, reestablishing the shared directory mapping.

    vagrant reload is functionally equivalent to vagrant halt followed by vagrant up.

  9. Disable the Firewall

    [root@smshost vagrant]#
    sudo systemctl disable firewalld
    [root@smshost vagrant]#
    sudo systemctl stop firewalld
    

    To verify the Firewall is indeed disabled:

    [root@smshost vagrant]#
    sudo systemctl status firewalld
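
    Alternatively (optional), systemctl can answer the same question with two short queries:

    [root@smshost vagrant]#
    sudo systemctl is-enabled firewalld
    # OUTPUT: disabled
    [root@smshost vagrant]#
    sudo systemctl is-active firewalld
    # OUTPUT: inactive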
    
    Click here to understand why we disable the firewall and why it's generally a bad idea!

    It is not best practice to run a public-facing server without a firewall!

    Again, since this is a test lab, it simplifies the learning objectives when we know that commands will not be blocked by an unexpected firewall issue, which would only serve as an added distraction and detract from the learning experience.

    Troubleshooting network issues can be a daunting task for the best of us, so we are mitigating the risk for confusion in the lab by removing the possibility of a firewall rule blocking service traffic.

    The OpenHPC install recipe follows this philosophy, but you should be very aware of the risks of following this approach in a production environment.

    Luckily, the risk to your smshost and virtual cluster environment is low, since this is a locally-hosted cluster that is very difficult for a remote malicious actor to access (and even if they did, not much could be stolen or harmed).

Congratulations

You have now installed and prepared the smshost virtual machine for the OpenHPC components.

The next step is to add the necessary packages in order to provision and manage the virtual cluster.

3.3 Add OpenHPC Components


Now that the base operating system is installed and booted, the next step is to add the desired OpenHPC packages to the smshost. These packages will provide provisioning and resource management services to the rest of the virtual cluster.

3.3.1 OpenHPC Repository

You will need to enable the OpenHPC repository for local use - this requires external internet access from your smshost to the OpenHPC repository which is hosted on the internet.

[root@smshost ~]#
sudo dnf install -y http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm 

3.3.2 EPEL Release

In addition to the OpenHPC repository, the smshost needs access to other base OS distro repositories so that it can resolve the necessary dependencies. These include:

  • BaseOS
  • Appstream
  • Extras
  • PowerTools, and
  • EPEL repositories

From the prior output, you may have noticed that epel-release is installed (and its repository enabled) automatically when installing ohpc-release (see the Installed: epel-release-* line). Unlike the other repositories, which are enabled by default, the PowerTools repository must be enabled manually as follows:

[root@smshost vagrant]#
sudo dnf -y install dnf-plugins-core
[root@smshost vagrant]#
sudo dnf -y config-manager --set-enabled powertools
Click here to learn more about the EPEL repository.

EPEL stands for Extra Packages for Enterprise Linux. It is a volunteer-based community repository that provides additional packages for Red Hat Enterprise Linux and compatible distributions.
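
At this point you can optionally confirm that the required repositories are now visible to dnf (the exact repository IDs and list may differ slightly on your system):

[root@smshost vagrant]#
sudo dnf repolist
# Expect to see, amongst others, entries for OpenHPC, OpenHPC-updates, epel and powertools.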

3.3.3 Provisioning and Resource Management

In this virtual lab, system provisioning and workload management will be performed using Warewulf and Slurm, respectively.

To add support for provisioning services, one must add the common base package provided by OpenHPC, as well as the Warewulf Provisioning System.

[root@smshost vagrant]#
sudo dnf -y install ohpc-base
[root@smshost vagrant]#
sudo dnf -y install ohpc-warewulf

To add support for workload management, we will install the Slurm Workload Manager. Simply put, Slurm will perform the role of a job scheduler for our HPC cluster.

Run the installation command:

[root@smshost vagrant]#
sudo dnf -y install ohpc-slurm-server 
Click here to understand what role the smshost plays with Slurm.

The smshost acts as our Slurm server. This means that all jobs submitted to the cluster will be administered by Slurm, which is hosted on the smshost.

Conveniently, in our virtual lab the smshost serves multiple roles, which reduces the complexity slightly.

Users (in this case, you) will remotely connect to the smshost (in its role as a login node) and then submit jobs through the smshost (in its role as the Slurm server) for processing on the compute nodes.

The client-side components of the workload management system will be added to the corresponding compute node image that will eventually be used to boot the compute nodes, in the next chapter.
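
If you want to double-check that the meta-packages landed on the smshost (an optional check), rpm can query them directly:

[root@smshost vagrant]#
rpm -q ohpc-base ohpc-warewulf ohpc-slurm-server
# Each package should be reported with its installed version rather than "package ... is not installed".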

Note

In order for the Slurm server to function correctly, a number of conditions must be satisfied.

  • It is essential that your Slurm configuration file, slurm.conf, is correctly configured. No need to worry! We will do this in the chapter on Resource Management.
  • Slurm (and HPC systems in general) requires synchronised clocks throughout the system. We will utilise NTP for this purpose in the next section.

3.4 Configure Time Server


We will make use of the Network Time Protocol (NTP) to synchronise the clocks of all nodes on our virtual cluster. The following commands will enable NTP services on the smshost using the time server ${ntp_server}, and allow this server to act as a local time server for the cluster. We will be using chrony, which is an alternative to ntpd.

[root@smshost vagrant]#
sudo systemctl enable chronyd
[root@smshost vagrant]#
echo "local stratum 10" >> /etc/chrony.conf
[root@smshost vagrant]#
echo "server ${ntp_server}" >> /etc/chrony.conf

The official OpenHPC recipe opts to allow all servers on the local network to synchronise with the smshost. In this lab, we will restrict the access to fixed IP addresses for our virtual cluster using the variable cluster_ip_range, as follows:

[root@smshost vagrant]#
echo "allow ${cluster_ip_range}" >> /etc/chrony.conf
[root@smshost vagrant]#
sudo systemctl restart chronyd
sudo systemctl restart chronyd

To verify that the chronyd service is started correctly:

[root@smshost vagrant]#
sudo systemctl status chronyd

Click here to understand what just happened with /etc/chrony.conf.

By using echo we are once again redirecting the output into the configuration file; in this case /etc/chrony.conf.

server ${ntp_server}: if you look at input.local.lab you will notice that ntp_server=time.google.com. In the previous steps, we added the NTP server details for time.google.com into our cluster configuration for NTP.

allow ${cluster_ip_range}: if you look at input.local.lab this is defined as 10.10.10.0/24 which is CIDR notation to indicate all valid IPv4 addresses in the range 10.10.10.1 to 10.10.10.254. This means that smshost will only serve time synchronisation to nodes on the IPv4 private network 10.10.10.0/24 which we know is the HPC private IP range.

>> appends to the end of an existing file (creating it if it does not already exist), whereas > overwrites the file.
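
Once chronyd has been restarted, you can optionally confirm that it is reaching the configured upstream server (the output will vary, and it can take a short while for synchronisation to begin):

[root@smshost vagrant]#
chronyc sources
# Lists the configured time sources - time.google.com should appear once it is reachable.
[root@smshost vagrant]#
chronyc tracking
# Shows the current synchronisation status and stratum.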

Congratulations

You have now successfully completed the basic configuration of your smshost!

Before moving on to the configuration of the compute images, we will quickly cover how one can make backups of progress throughout this virtual lab.


While it is possible to run the entire virtual lab without making any backups of your progress, it is recommended to at least take snapshots at major milestones (such as at the end of each chapter of this guide). Be aware that too many snapshots can bloat your resource usage and will increase the amount of disk space you need to host the VMs.

You can make snapshots using either the VirtualBox GUI or the command line:

  1. Through the VirtualBox Manager GUI:

    Figure 2: How to snapshot a VM using the VirtualBox GUI

  2. Through the command prompt:
    Call the snapshot save instruction from the command line, followed by the <vm_name> (where applicable) and then the desired <snapshot_name>.

    Run this command from your primary host machine's shell, not the VM environment!

    snapshot save is not to be invoked within the vagrant ssh session but at the command prompt outside of it.

    [~/openhpc-2.x-virtual-lab/]$
    vagrant snapshot save chapter3-smshost-complete
    
    ==> smshost: Snapshotting the machine as 'chapter3-smshost-complete'...
    ==> smshost: Snapshot saved! You can restore the snapshot at any time by
    ==> smshost: using `vagrant snapshot restore`. You can delete it using
    ==> smshost: `vagrant snapshot delete`.
    Machine 'compute00' has not been created yet, and therefore cannot save snapshots. Skipping...
    Machine 'compute01' has not been created yet, and therefore cannot save snapshots. Skipping...
    

    Note

    Running the above command will take a snapshot of all VMs defined by your Vagrantfile at that present moment (see the sample output above, where compute00 and compute01 are skipped). It may be useful to only take a snapshot of a subset of your VMs later on (to avoid backing up VMs that have not changed configuration since the last snapshot).

    To specify the VM that you want to snapshot:

    [~/openhpc-2.x-virtual-lab/]$
    vagrant snapshot save <vm_name> <snapshot_name>
    

    For more information on how to manage and view your saved snapshots, see the Vagrant snapshot documentation.
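
    One such command (optional) is vagrant snapshot list, which shows the snapshots currently saved for each VM:

    [~/openhpc-2.x-virtual-lab/]$
    vagrant snapshot list
    # Example output:
    # ==> smshost:
    # chapter3-smshost-complete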

Click here to recap what you have accomplished in this chapter.

You used vagrant up to provision the smshost host with the definitions in Vagrantfile.

The smshost is running Rocky Linux and has been configured with two network interface cards (public-facing and hpcnet-facing).

To access the smshost you can use vagrant ssh or an SSH client of your choice.

There is a shared folder between the VM and your local host machine. On your local host machine it is located wherever the Vagrantfile is present, and it maps to /vagrant/ on the VM's OS.

You have loaded system environment variables through sourcing input.local.lab and used these parameters to configure the DNS entries in /etc/hosts.

For the virtual lab, the firewall has been disabled.

Additional OpenHPC components have been installed to prepare for the deployment of the HPC software stack.

Finally, you have saved your smshost state with a snapshot.

Congratulations

You have reached the end of Chapter 3 - Well done!

In this chapter you successfully deployed and configured your smshost VM. You are well on your way to your virtual cluster deployment. In the next chapter you will define and configure your compute node image.


Bug report

Click here if you wish to report a bug.

Provide feedback

Click here if you wish to provide us feedback on this chapter.