
4 Compute Node Preparation

This section will cover the preparation of the compute node image which will be provisioned using Warewulf.

Note

Anticipated time to complete this chapter: TBC from user feedback.

Tip - snapshots can be your best friend!

Keep in mind your ability to take snapshots of your smshost VM. Since this virtual lab is intended as a learning experience, you may wish to take snapshots at various milestones so that you can quickly recover from any major mistakes and reduce interruptions to your learning progression.

4.1 Initialise Warewulf


At this point, all of the packages required to use Warewulf on the smshost should already have been installed. The next step is to update a number of configuration files which will allow Warewulf to work with Rocky 8 and support local provisioning using a second private interface (refer to Figure 1 in Chapter 1).

Note - eth1 default for the virtual lab

By default, Warewulf is configured to provision over the eth1 interface. If you would prefer to use an alternatively named interface, defined by ${sms_eth_internal}, the steps below should be run to override this default.

  1. Configure Warewulf to use the desired internal interface

    [root@smshost ~]#
    sudo sed -i "s/device = eth1/device = ${sms_eth_internal}/" /etc/warewulf/provision.conf 
    
    Click here to learn more about what the above command does.

    sed is a stream editor that we use to work on text files. Throughout the virtual lab you will need to make changes to existing config files that hold default values. A common habit is to manually edit each config file with a text editor (vim, vi, etc.), and this is perfectly fine for a once-in-a-while edit, but there is always an inherent risk of configuration drift or simple human error.

    A safer approach would be to use as much automation as possible. This is where a tool like sed can prove valuable. We will follow this approach throughout the virtual lab.


    The command above uses the sed substitution expression s/device = eth1/device = ${sms_eth_internal}/ followed by the file to edit. It can be broken down into four components:

    1. s
    2. /device = eth1/
    3. /device = ${sms_eth_internal}/
    4. /etc/warewulf/provision.conf

    These components are interpreted as follows:

    1. substitute (search and replace)
    2. searching for the pattern device = eth1
    3. replacing any match with device = ${sms_eth_internal}
    4. performing this substitution on the /etc/warewulf/provision.conf file

    Note that the value defined in input.local.lab for the variable sms_eth_internal will be substituted in the replace step of the sed command (i.e. it isn't replaced with device = ${sms_eth_internal} as a string, but rather something like device = eth0).

    By using a central source of truth (in this case, input.local.lab) we keep configuration changes consistent and greatly reduce the risk of human error, such as a typo in the configuration file or in the referenced parameter. As long as the configuration source file input.local.lab is kept up to date, any system changes can be safely performed from this file.

    Of course, it's also an informative way for any administrator to understand the configuration parameters of the HPC system by simply reading over the contents of input.local.lab.
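
    As an optional sanity check (an illustrative example, not part of the official recipe), you can confirm that the substitution took effect by printing the relevant line from the file:

    grep "device" /etc/warewulf/provision.conf
    # Expected to show your internal interface, e.g. device = eth0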

  2. Enable the internal interface for provisioning:

    [root@smshost ~]#
    [root@smshost ~]#
    sudo ip link set dev ${sms_eth_internal} up
    sudo ip address add ${sms_ip}/${internal_netmask} broadcast + dev ${sms_eth_internal} 
    
    Click here to learn more about what the above command does.

    ip link set dev ${sms_eth_internal} up:
    will set the smshost's internal ethernet interface defined in input.local.lab to UP (i.e. enable the ethernet device in the Operating System).


    Looking at the second command:
    ip address add ${sms_ip}/${internal_netmask} broadcast + dev ${sms_eth_internal}
    We can break this down into sub-parts:

    ip address add:
    will add the defined IPv4 address for the smshost's internal ethernet interface. This is the IPv4 address of the smshost on the hpcnet internal private HPC cluster network (i.e. how the compute nodes will talk to the HPC management server).

    internal_netmask:
    governs the IP address range available for the internal private HPC cluster network.

    broadcast:
    specifies the reserved broadcast address for the network, which can be used to send a message to all devices on that network without knowing their individual IPv4 addresses (one-to-all communication).

    +:
    The special symbol + tells ip to derive the standard broadcast address automatically, by setting all of the host bits of ${sms_ip} (as determined by the ${internal_netmask} netmask) to one.
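
    As an illustration (a hedged example assuming, purely for demonstration, an address of 10.10.10.10 with a /24 netmask), the + shorthand is equivalent to writing the broadcast address out explicitly:

    # Hypothetical equivalent with the broadcast address spelled out (10.10.10.10/24 assumed):
    sudo ip address add 10.10.10.10/24 broadcast 10.10.10.255 dev ${sms_eth_internal}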

    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    watch "ip a | grep ${sms_eth_internal}"
    

Whether or not you overrode the default interface, you now need to enable and restart the relevant services needed for provisioning with Warewulf:

[root@smshost vagrant]#


[root@smshost vagrant]#


[root@smshost vagrant]#


[root@smshost vagrant]#
[root@smshost vagrant]#
sudo systemctl enable httpd.service
# OUTPUT: Created symlink /etc/systemd/system/multi-user.target.wants/httpd.service → /usr/lib/systemd/system/httpd.service.

sudo systemctl enable dhcpd.service 
# OUTPUT: Created symlink /etc/systemd/system/multi-user.target.wants/dhcpd.service → /usr/lib/systemd/system/dhcpd.service.

sudo systemctl enable tftp.socket 
# OUTPUT: Created symlink /etc/systemd/system/sockets.target.wants/tftp.socket → /usr/lib/systemd/system/tftp.socket.

sudo systemctl restart httpd 
sudo systemctl restart tftp.socket  
Click here to learn more about what the above commands do.

When a system service is enabled, it is scheduled to start automatically at boot.

When a system service is restarted, it is stopped and then started again.

We want the httpd and tftp services ready to support the PXE booting process, but we have not yet started the dhcpd service, since it still needs to be configured via the dhcpd.conf file, which we will do using Warewulf.
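
If you would like to confirm the state of these services yourself (an optional check, not part of the recipe), systemctl can report both the enabled state and the running state:

systemctl is-enabled httpd dhcpd tftp.socket
systemctl is-active httpd tftp.socket
# dhcpd is expected to be enabled but not yet active at this point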

4.2 Define Compute Image


Now that the provisioning services are enabled, the next step is to define and customise a system image that can be used to provision the compute nodes.

This process starts with defining a base operating system image using Warewulf.

Tip

It is important to understand that in this virtual lab example (and likely your physical HPC system deployment), your compute nodes will be provisioned using stateless provisioning. This means that rather than loading your operating system onto a persistent storage medium or disk, the operating system image is loaded into memory on boot.

Provisioning your system this way ensures parity across your compute nodes (as the same image is deployed to each node on node boot) and that changes made to a particular node's operating system during operation will not persist after a reboot of that node (it is non-persistent).

The OS image needs to be defined on the smshost VM so that Warewulf can repeatedly use this image to deploy to compute nodes.

  1. The first step is to define a directory structure on the smshost that will represent the root filesystem for the compute node (chroot). The default location in this example is /opt/ohpc/admin/images/rocky8.5.

    [root@smshost ~]#
    export CHROOT=/opt/ohpc/admin/images/rocky8.5 
    
    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    It isn't possible to watch the changes from a separate shell, since you are only declaring the environment variable for your current shell. Instead, you can echo the state of the variable before and after running the command.

    echo $CHROOT should reveal a blank line.

    After running the above command, a subsequent echo $CHROOT should reveal
    /opt/ohpc/admin/images/rocky8.5

    Click here to learn more about the above command.

    export assigns an environment variable with a string value.

    In the above step, an environment variable CHROOT is created with the string value /opt/ohpc/admin/images/rocky8.5. While the variable name itself is not important (we could also have used export noderoot=/... if we preferred), we have chosen CHROOT because it aligns with the concept of chroots in Linux.

    This path will be a directory structure on the smshost that represents the compute node's root filesystem.

    chroot - what is it?
    chroot is a Linux utility that modifies the working root directory for a process, essentially limiting access to the rest of the file system that resides 'further up' the directory hierarchy tree.

    For instance, the path defined above will be used as a chroot jail that prevents navigation above /opt/ohpc/admin/images/rocky8.5 in the smshost file structure.

    Any process within the jail perceives /opt/ohpc/admin/images/rocky8.5 on the smshost as its / directory. In this way, it cannot go further up: when such a process runs cd /, it sees /, but in reality the full path on the smshost is /opt/ohpc/admin/images/rocky8.5.

    If you want to learn more about chroot, click here.

    To ensure that we always remember to set this CHROOT path in future sessions, we will add it to input.local.lab:

    [root@smshost vagrant]#
    echo CHROOT=/opt/ohpc/admin/images/rocky8.5 >> /vagrant/input.local.lab
    
    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    On the second shell:

    watch "cat /vagrant/input.local.lab | grep CHROOT"
    

    Click here for more information about the purpose of the above command.

    The echo command will output the string
    CHROOT=/opt/ohpc/admin/images/rocky8.5 to the terminal.

    The >> redirects the output to the input.local.lab file.

    Recall that a > will create a new input.local.lab whereas a >> appends to the existing input.local.lab file.

    Now, every time the input.local.lab file is sourced, the value for $CHROOT will be loaded into the environment variable.
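
    To convince yourself that the appended line behaves as intended (an illustrative check), you can source the file and inspect the variable:

    source /vagrant/input.local.lab
    echo $CHROOT
    # OUTPUT: /opt/ohpc/admin/images/rocky8.5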

  2. Build the initial chroot image (this provides a minimal default Rocky 8.5 image for use with Warewulf).

    [root@smshost vagrant]#
    wwmkchroot -v rocky-8 $CHROOT  # 81MB 
    
    # OUTPUT:
    # Complete!
    # == Running: postchroot
    # == Running: configure_fstab
    # == Running: configure_network
    # == Running: configure_ntp
    # == Running: configure_pam
    # == Running: configure_authentication
    # == Running: configure_sshkeys
    # == Running: configure_rootacct
    # == Running: configure_runlevel
    # == Running: configure_services
    # == Running: configure_timezone
    # == Running: finalize
    # == Running: cleanup
    
    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    On the second shell:

    watch "ls -la $CHROOT"
    

    Click here to learn more about the wwmkchroot command.

    wwmkchroot (AKA 'Warewulf make chroot') creates chroots.

    The above command creates a chroot from the template rocky-8 to the location $CHROOT (which we previously defined as /opt/ohpc/admin/images/rocky8.5).

    Click here for more information about wwmkchroot and Warewulf.

    Click here to learn how to use a locally cached mirror for wwmkchroot instead.

    Warewulf assumes internet access to an external repository when wwmkchroot is invoked.

    You can use a locally cached mirror as an alternate location by setting an environment variable (e.g. ${BOS_MIRROR}) to the mirror location and running the following command:

    perl -pi -e "s#^YUM_MIRROR=(\S+)#YUM_MIRROR=${BOS_MIRROR}#" \
    /usr/libexec/warewulf/wwmkchroot/rocky-8.tmpl
    

    Note that the \ in the above command is a line continuation, not part of the input.
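
    Regardless of which mirror you use, once wwmkchroot completes, a quick way to confirm that a usable Rocky 8 root filesystem was created (an optional, illustrative check) is to query the image's own os-release file through chroot:

    sudo chroot $CHROOT cat /etc/os-release
    # Expected to report Rocky Linux 8.x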


  3. Enable the OpenHPC and EPEL repositories inside chroot.

    Use dnf to install EPEL repository to the compute node image's relative root:

    [root@smshost vagrant]#
    sudo dnf -y --installroot $CHROOT install epel-release
    
    # OUTPUT:
    # Installed:
    #     epel-release-8-18.el8.noarch
    
    Click here to learn more about what the above command has done.

    The above command mostly resembles a traditional sudo dnf command but there are some extra parameters that you may not be familiar with.

    -y
    answers 'yes' automatically to any confirmation prompts.

    --installroot
    directs the package installation to an alternative relative root location. In the virtual lab, $CHROOT has been defined as the compute node image path.

    The previous instruction is installing the epel-release package to the compute node image, and not the smshost.

    This is a useful mechanism to add any number of packages to the compute node image without needing to have them installed to the smshost. We will follow this approach regularly throughout the virtual lab.

    Click here to learn more about dnf and installroot.
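
    If you want to verify that the package really landed in the image rather than on the smshost (an optional, illustrative check), you can query the image's RPM database directly:

    sudo rpm --root $CHROOT -q epel-release
    # Should report the epel-release package installed in the compute node image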


    Copy the OpenHPC*.repo repository files to the compute node image's repository directory:

    [root@smshost vagrant]#
    sudo cp -p /etc/yum.repos.d/OpenHPC*.repo $CHROOT/etc/yum.repos.d
    

    Click here to learn more about the above command.

    The OpenHPC*.repo files were installed to the smshost's /etc/yum.repos.d repository directory in an earlier step.

    You are now copying those .repo files to the compute node image's /etc/yum.repos.d directory through the chroot relative root.

    Now any dnf command will resolve to the appropriate package repository.
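
    To confirm that the compute node image now resolves the OpenHPC and EPEL repositories (a hedged, optional check; dnf reads the repository definitions under the image's relative root when --installroot is given), you can list the repositories it sees:

    sudo dnf --installroot=$CHROOT repolist
    # Expected to list the EPEL and OpenHPC repositories added above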

4.3 Add Compute Components


Now that we have a minimal Rocky 8 image, we will add components needed for the compute nodes to function as part of the HPC cluster. These include:

  • Resource management client services
  • NTP support
  • Additional packages needed to support the OpenHPC environment

This process augments the chroot-based install performed by wwmkchroot by modifying the base provisioning image.

  1. Install the compute node base meta package:

    [root@smshost vagrant]#
    sudo dnf -y --installroot=$CHROOT install ohpc-base-compute 
    
    Click here to learn more about the above command.

    You are installing the OpenHPC base files ohpc-base-compute to the compute node's relative root, as defined in $CHROOT.

  2. Copy the smshost DNS configuration to the chroot environment:

    [root@smshost vagrant]#
    sudo cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf
    
    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    On the second shell:

    watch "ls -la $CHROOT/etc/resolv.conf"
    

    Click here to learn about potential risks with copying the resolv.conf file.

    One of the common pitfalls with the virtual cluster configuration is the placement order of the smshost IP address definitions in /etc/resolv.conf.

    Since the /etc/resolv.conf file is parsed from the top line down, a forward DNS lookup will use the first entry that produces a 'hit'.

    There is a possibility that the localhost IP address will appear above the hpcnet IP address, which means the name smshost will resolve to localhost instead of 10.10.10.10 (the default ${sms_ip}). This has little impact from the smshost's perspective, but if a compute node is given a copy of the /etc/resolv.conf file with the wrong ordering, it will resolve smshost to localhost, which will fail on the compute nodes.

    It is a good idea (actually, it is critical) to verify that the reference to the hpcnet IP address is placed above any reference to the localhost IP address in the /etc/resolv.conf file before it is copied to the compute node image with the above command.
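
    One simple way to inspect the ordering before copying the file (an illustrative check) is to print the nameserver entries with their line numbers:

    grep -n nameserver /etc/resolv.conf
    # Verify that the hpcnet (10.10.10.x) entry appears before any localhost entry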

    Tip

    When doing this we are making the assumption that your smshost has a working DNS configuration. Please ensure that this is the case before completing this step. There are some tips on common lab-related hiccups (including DNS) in the FAQ.

  3. Copy the local user credential files into chroot to ensure consistent uid/gids for Slurm and MUNGE at install.

    [root@smshost vagrant]#
    sudo cp /etc/passwd /etc/group $CHROOT/etc
    
    # OUTPUT
    # cp: overwrite '/opt/ohpc/admin/images/rocky8.5/etc/passwd'? y
    # cp: overwrite '/opt/ohpc/admin/images/rocky8.5/etc/group'? y 
    

    Note

    Future updates to your user credential files will be synchronised between hosts by your provisioning system.

  4. Add the Slurm client and enable both MUNGE and Slurm


    [root@smshost vagrant]#
    [root@smshost vagrant]#
    
    
    [root@smshost vagrant]#
    sudo dnf -y --installroot=$CHROOT install ohpc-slurm-client 
    sudo chroot $CHROOT systemctl enable munge
    # OUTPUT: Created symlink /etc/systemd/system/multi-user.target.wants/munge.service → /usr/lib/systemd/system/munge.service.
    
    sudo chroot $CHROOT systemctl enable slurmd  
    

    Click here to learn more about the above command.

    Just as the --installroot parameter tells dnf to install packages into a relative root, chroot can be used to run commands against that relative root.

    chroot $CHROOT systemctl enable sets the service to enabled within the $CHROOT relative root path of the compute node image (it creates the systemd symlinks inside the image; it does not start anything on the smshost).

    Click here to learn more about Slurm and MUNGE.

    Slurm:
    Slurm is a workload manager that will schedule jobs on your virtual cluster. We will delve into more details later in the virtual lab about how to use Slurm. To learn more about Slurm, click here.

    MUNGE:
    MUNGE is an authentication service used in HPC clusters to create and validate credentials. For more information about MUNGE, click here.
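
    To double-check that MUNGE and Slurm were enabled inside the image rather than on the smshost (an optional, illustrative check), you can query their state through the same chroot mechanism:

    sudo chroot $CHROOT systemctl is-enabled munge slurmd
    # Expected output: 'enabled' reported for each service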

  5. Register the Slurm server IP address for the compute nodes (using the configless option)

    [root@smshost vagrant]#
    echo SLURMD_OPTIONS="--conf-server ${sms_ip}" > $CHROOT/etc/sysconfig/slurmd
    

    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    On the second shell:

    watch "cat $CHROOT/etc/sysconfig/slurmd | grep SLURMD_"
    

    Click here to learn more about the above command.

    echo:
    will output SLURMD_OPTIONS="--conf-server ${sms_ip}" to the terminal.


    >:
    redirects the output to the named file, creating a new file (or overwriting an existing one), whereas >> would append to an existing file.


    $CHROOT/etc/sysconfig/slurmd:
    points to a file slurmd located in the relative root /etc/sysconfig for the compute node image.


    SLURMD_OPTIONS="--conf-server":
    enables the configless Slurm feature, which is explained in more detail later.


    --conf-server ${sms_ip}:
    sets the smshost as the configless Slurm server.


    Note

    Configless Slurm is a Slurm feature that allows the slurmd process running on the compute nodes to pull the configuration information from slurmctld (on the smshost), rather than from a pre-distributed local file. For more information on how this feature works, see the documentation.

  6. Add Network Time Protocol (NTP) support and identify the smshost as a local NTP server

    [root@smshost vagrant]#
    [root@smshost vagrant]#
    sudo dnf -y --installroot=$CHROOT install chrony
    sudo echo "server ${sms_ip} iburst" >> $CHROOT/etc/chrony.conf
    

    Click here to learn more about the above commands.

    sudo dnf -y --installroot=$CHROOT install chrony:
    Installs the package chrony to the compute node relative root directory.


    sudo echo "server ${sms_ip} iburst" >> $CHROOT/etc/chrony.conf:
    Appends the line server 10.10.10.10 iburst to the compute node's /etc/chrony.conf file, assuming that ${sms_ip} is set to the default value of 10.10.10.10.

  7. Ensure that the compute nodes query the smshost for their time synchronisation and do not attempt to query any alternative sources (the compute nodes have no external network access so these queries would fail).

    [root@smshost vagrant]#
    sed -i 's/pool/#pool/g' $CHROOT/etc/chrony.conf
    
    Click here to understand what we are doing with the above command.

    We are commenting out (effectively disabling) the public pool of NTP servers that are referenced in the compute nodes' /etc/chrony.conf which is stored on smshost at $CHROOT/etc/chrony.conf.
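
    After steps 6 and 7, the image's chrony configuration should reference only the smshost. A quick way to confirm this (an illustrative check) is:

    grep -E "^(server|#pool|pool)" $CHROOT/etc/chrony.conf
    # Expect a 'server 10.10.10.10 iburst' line (default ${sms_ip}) and only commented-out '#pool' lines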

  8. Add kernel drivers (matching the kernel version on the smshost)

    [root@smshost vagrant]#
    sudo dnf -y --installroot=$CHROOT install kernel-`uname -r`  # 276MB
    

    Note

    If the kernel drivers step results in an error, typically:

    Last metadata expiration check: 0:10:30 ago on Thu 24 Aug 2023 08:43:11 PM UTC.
    No match for argument: kernel-4.18.0-425.3.1.el8.x86_64
    Error: Unable to find a match: kernel-4.18.0-425.3.1.el8.x86_64
    
    then the kernel drivers are not available at the repo defined on the smshost.

    Try installing the latest kernel available from the configured repositories instead:

    sudo dnf -y --installroot=$CHROOT install kernel  # 350MB
    
  9. Include the modules user environment:

    [root@smshost vagrant]#
    sudo dnf -y --installroot=$CHROOT install lmod-ohpc
    

    # OUTPUT
    # Installed:
    #   fish-3.3.1-2.el8.x86_64    lmod-ohpc-8.7.6-12.3.ohpc.2.6.x86_64            
    #   lua-5.3.4-12.el8.x86_64                        lua-filesystem-1.6.3-7.el8.x86_64
    #   lua-posix-33.3.1-9.el8.x86_64                  pcre2-utf32-10.32-3.el8_6.x86_64
    #   rc-1.7.4-11.el8.x86_64                         tcl-1:8.6.8-2.el8.x86_64
    #   tcsh-6.20.00-15.el8.x86_64
    #
    # Complete!
    
    Click here to learn more about modules.

    Many HPC-related software tools rely on specific versions of their dependencies. Whenever a new software version is installed on an HPC system, it is common to keep both the old version(s) and the current version.

    To keep the multiple installed versions of a single package in order, and to make them easily accessible and interchangeable, users rely on a modules environment.

    There are two common modules tools: Lmod and Environment Modules.

    These tools make it significantly simpler for users to switch software package versions in an intuitive and manageable way.
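
    As a preview of how users will later interact with the modules environment (the module name below is purely illustrative):

    module avail            # list the software made available via modulefiles
    module load gnu12       # load a toolchain (the name 'gnu12' is just an example)
    module list             # show the currently loaded modules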

4.4 Customise Compute Configuration


Before we assemble the compute image, we should perform any additional customisation within the chroot environment. The following steps detail the process of:

  • Adding a local ssh key created by Warewulf (to support remote access)
  • Enabling NFS mounting of a $HOME filesystem
  • Adding the public OpenHPC install path (/opt/ohpc/pub) of the smshost

Tip

To ensure that you make the correct changes to your NFS client mounts, you should know what the fstab file looks like before configuring it.

(This applies across the board, with all files you are editing, always!)

Run the command cat on the file $CHROOT/etc/fstab both before and after making changes to it in step 2 below to see what effect the changes have.

HINT: tmux has been installed on the smshost and is a very useful tool to split your screen into two panes so you can view the output on one pane while running the commands on the other. watch is a recommended tool for live updates of the changes, for instance: watch "cat $CHROOT/etc/fstab".

  1. Initialise the Warewulf database and ssh keys

    [root@smshost vagrant]#
    sudo wwinit database
    
    OUTPUT:
    
    # database:     Checking to see if RPM or capability 'mysql-server' is install NO
    # database:     Checking to see if RPM or capability 'mariadb-server' is insta OKd
    # database:     Activating Systemd unit: mariadb
    # database:      + /bin/systemctl -q restart mariadb.
    # service                   OK
    # database:      + mysqladmin --defaults-extra-file=/tmp/0.HmYBjDpgbXhN/my.cnf OK
    # database:     Database version: UNDEF (need to create database)
    # database:     Creating database schema
    # database:      + mysql --defaults-extra-file=/tmp/0.HmYBjDpgbXhN/my.cnf ware OK
    # localhost
    # database:     Configured user does not exist in database. Creating user.
    # database:      + mysql --defaults-extra-file=/tmp/0.HmYBjDpgbXhN/my.cnf ware OK
    # database:     DB root user does not exist in database. Creating root user.
    # database:      + mysql --defaults-extra-file=/tmp/0.HmYBjDpgbXhN/my.cnf ware OK
    # database:     Updating database permissions for base user
    # database:      + mysql --defaults-extra-file=/tmp/0.HmYBjDpgbXhN/my.cnf ware OK
    # database:     Updating database permissions for root user
    # database:      + mysql --defaults-extra-file=/tmp/0.HmYBjDpgbXhN/my.cnf ware OK
    # database:     Checking binstore kind                          SUCCESS
    # Done.
    
    [root@smshost vagrant]#
    sudo wwinit ssh_keys
    
    OUTPUT:
    
    # ssh_keys:     Checking ssh keys for root                                     OK
    # ssh_keys:     Checking root's ssh config                                     OK
    # ssh_keys:     Checking for default RSA host key for nodes                    NO
    # ssh_keys:     Creating default node ssh_host_rsa_key:
    # ssh_keys:      + ssh-keygen -q -t rsa -f /etc/warewulf/vnfs/ssh/ssh_host_rsa OK
    # ssh_keys:     Checking for default DSA host key for nodes                    NO
    # ssh_keys:     Creating default node ssh_host_dsa_key:
    # ssh_keys:      + ssh-keygen -q -t dsa -f /etc/warewulf/vnfs/ssh/ssh_host_dsa OK
    # ssh_keys:     Checking for default ECDSA host key for nodes                  NO
    # ssh_keys:     Creating default node ssh_host_ecdsa_key:                      OK
    # ssh_keys:     Checking for default Ed25519 host key for nodes                NO
    # ssh_keys:     Creating default node ssh_host_ed25519_key:                    OK
    # Done.
    
  2. Add NFS client mounts of /home and /opt/ohpc/pub to base compute image

    [root@smshost vagrant]#
    [root@smshost vagrant]#
    echo "${sms_ip}:/home /home nfs nfsvers=3,nodev,nosuid 0 0" >> $CHROOT/etc/fstab
    echo "${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=3,nodev 0 0" >> $CHROOT/etc/fstab 
    
    Click here to learn more about the fstab and NFS sharing.

    The fstab is the filesystem table and stores the information needed to mount filesystems, including NFS (Network File System) shares, which is the approach we will follow to allow files to be shared between the smshost and the compute nodes (recall that the compute nodes are stateless and do not have their own local disks for storage).

    The above commands follow the usual approach of appending new lines to a configuration file.

    The file being appended to (note the >>) is the fstab stored in the compute node image: the compute nodes will see it as /etc/fstab, but on the smshost it is located at $CHROOT/etc/fstab.


    ${sms_ip}:/home /home nfs:
    tells the compute node to mount the NFS share exported at ${sms_ip}:/home onto its local /home directory.

    The entry is read as:
    1. mount the NFS share ${sms_ip}:/home
    2. at the mount point /home
    3. using the filesystem type nfs


    Similarly, the second line mounts the smshost's /opt/ohpc/pub directory via NFS at /opt/ohpc/pub on the compute node.
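
    For reference, the fields of an fstab entry map as follows (shown here with the default sms_ip of 10.10.10.10 purely for illustration):

    # <NFS export (device)>      <mount point>    <type>  <options>               <dump> <pass>
    # 10.10.10.10:/home          /home            nfs     nfsvers=3,nodev,nosuid  0      0
    # 10.10.10.10:/opt/ohpc/pub  /opt/ohpc/pub    nfs     nfsvers=3,nodev         0      0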

  3. Export /home and OpenHPC public packages from smshost

    [root@smshost ~]#
    [root@smshost ~]#
    echo "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" >> /etc/exports
    echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports  
    
    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    On the second shell:

    watch "cat /etc/exports | grep /home"
    
    watch "cat /etc/exports | grep /opt"
    

    Click here to learn more about the /etc/exports file.

    The /etc/exports file is a configuration file for the NFS server. It contains a list of the local files / directories that are to be made available to NFS clients.

    In the above steps, we are adding /home and /opt/ohpc/pub into the /etc/exports file to enable them for access to the NFS clients on the network.

    Important security warning about NFS shares.

    The file sharing settings for /home allow any other machine on the same network as the 'public' interface of the smshost to mount /home. This should not be allowed in a production environment, but it is sufficient for the virtual lab, since there is no sensitive user information that can be exploited on this virtual infrastructure.

  4. Finalise NFS configuration and restart the nfs-server service to enforce the configuration changes made in the previous step.

    [root@smshost vagrant]#
    [root@smshost vagrant]#
    [root@smshost vagrant]#
    
    sudo exportfs -a
    sudo systemctl restart nfs-server  
    sudo systemctl enable nfs-server
    # OUTPUT: Created symlink /etc/systemd/system/multi-user.target.wants/nfs-server.service → /usr/lib/systemd/system/nfs-server.service.
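
    To confirm that the exports are now active (an optional check), you can list them from the smshost:

    sudo exportfs -v
    # Should list /home and /opt/ohpc/pub with the options added to /etc/exports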
    

4.5 Add Additional Components


At this stage of the system setup, the official OpenHPC recipe lists a number of optional customisations that you may want to make to your compute image. These customisations include:

  • Add InfiniBand or Omni-Path drivers
  • Increase memlock limits
  • Restrict ssh access to compute resources
  • Add BeeGFS client
  • Add Lustre client
  • Enable syslog forwarding
  • Add Nagios Core monitoring
  • Add ClusterShell
  • Add mrsh
  • Add genders
  • Add ConMan
  • Add GEOPM

Note - the virtual lab will keep things simple!

For conciseness, the virtual lab will not cover any of these optional extras. If you wish to add any of these additional components to your systems, please refer to the official OpenHPC 2.x Install Recipe for information and guidance on this process.

4.6 Import Files


The Warewulf provisioning system includes functionality to export files from the provisioning server (the smshost in this case) to the managed compute nodes (i.e. to distribute files from the smshost to the compute nodes).

This is a convenient way to distribute user credentials to the compute nodes in your cluster. Similarly, this functionality will be used to import the cryptographic key that the MUNGE authentication library requires to be available on each host in the resource management pool.

  1. Instruct Warewulf to import the local file-based credentials to stage them for distribution to each of the managed hosts:

    [root@smshost vagrant]#
    [root@smshost vagrant]#
    [root@smshost vagrant]#
    sudo wwsh file import /etc/passwd
    sudo wwsh file import /etc/group
    sudo wwsh file import /etc/shadow
    
    Click here to learn how to watch the changes.

    (this assumes you have a second sourced shell visible - such as a tmux pane)

    On the second shell:

    watch "wwsh file list"
    

    Click here to learn more about the authentication method adopted in this virtual lab.

    wwsh file import is a Warewulf command to stage files for transfer to Warewulf managed hosts.


    There are many ways to centrally manage user credentials and access on an HPC system, most of which incorporate a third-party credential-management tool (FreeIPA, LDAP, etc.). OpenHPC follows the approach of using the smshost as the arbiter of access for the compute nodes by copying all local user credentials on the smshost to the compute nodes.

    The /etc/passwd, /etc/group and /etc/shadow files hold the user names, user groups and user password hashes that are copied to each compute node. If a user is created on the smshost, then the user credential files are updated and need to be transferred to the compute nodes before that user is able to connect to them.
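
    If user accounts change later on the smshost, the staged copies will need to be refreshed. With the Warewulf 3 tooling used in this lab, one way to do this (treat the exact subcommand as an assumption to verify against your installed version) is:

    sudo wwsh file resync passwd shadow group
    # Re-stages the updated credential files for distribution to the compute nodes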

  2. Instruct Warewulf to import the MUNGE key to stage it for distribution to each of the managed hosts:

    [root@smshost vagrant]#
    sudo wwsh file import /etc/munge/munge.key
    

Note

MUNGE is a user and group authentication service designed for cluster environments. To quote the man page:

MUNGE (MUNGE Uid 'N' Gid Emporium) is an authentication service for creating and validating credentials. It is designed to be highly scalable for use in an HPC cluster environment. It allows a process to authenticate the UID and GID of another local or remote process within a group of hosts having common users and groups.

Click here to recap what was done in Chapter 4.

In this chapter you successfully defined and configured the compute node image for provisioning by Warewulf:

  • the smshost internal interface was configured
  • the httpd, dhcpd and tftp services were enabled
  • a compute image was defined for the root filesystem of the compute nodes (chroot)
  • OpenHPC and EPEL repositories were installed to this compute node root filesystem
  • The smshost DNS configuration was copied to the compute node image
  • The local user credential files were copied to the compute node image
  • Slurm and MUNGE were enabled on the compute node image and Slurm was configured with a configless environment to query the smshost for configuration parameters
  • NTP was configured on the compute node image
  • Kernel drivers were installed to the compute node image
  • The modules environment was installed to the compute node image
  • An SSH key was added to the compute node image to support remote access for root
  • NFS file sharing was configured on smshost to share files with the compute node images
  • Warewulf was instructed to import the local file-based credentials using wwsh file import

Congratulations!

You have reached the end of Chapter 4.

In the next chapter we will finalise the Warewulf configuration and boot up your compute nodes.


Bug report

Click here if you wish to report a bug.

Provide feedback

Click here if you wish to provide us feedback on this chapter.