
5 Cluster Provisioning

This section covers the provisioning of the HPC cluster using Warewulf. By the end of this section you will have a working, networked set of compute and control machines that form the foundation of the HPC system.

Note

Anticipated time to complete this chapter: TBC from user feedback.

5.1 Finalise Configuration


In the preceding chapters we completed the following steps:

  • Configured the virtual cluster environment
  • Prepared the smshost software stack
  • Defined the compute node images for administration through Warewulf

Next, we need to provision the compute nodes. In order to do so, we create a bootstrap image which is used to boot the nodes and complete provisioning.

  1. Create the bootstrap image with the following command:
[root@smshost vagrant]#
sudo wwbootstrap `uname -r` 
OUTPUT:

# Number of drivers included in bootstrap: 513
# Building and compressing bootstrap
# Integrating the Warewulf bootstrap: 4.18.0-425.3.1.el8.x86_64
# Including capability: provision-adhoc
# Including capability: provision-files
# Including capability: provision-selinux
# Including capability: provision-vnfs
# Including capability: setup-filesystems
# Including capability: setup-ipmi
# Including capability: transport-http
# Compressing the initramfs
# Locating the kernel object
# Bootstrap image '4.18.0-425.3.1.el8.x86_64' is ready
# Done.
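The backticks around uname -r are command substitution: the shell runs uname -r first and passes its output to wwbootstrap, so the bootstrap image is built for, and named after, the kernel currently running on the smshost. A minimal sketch of the mechanics (the echoed command is illustrative, not run):

```shell
# `uname -r` prints the running kernel release, e.g. 4.18.0-425.3.1.el8.x86_64
# on this lab's Rocky 8 image; wwbootstrap receives that string as the name of
# the bootstrap image to build.
kernel=$(uname -r)
echo "Bootstrap will be built for and named after: ${kernel}"
```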
More about the bootstrap image:

A bootstrap image is a small PXE image that includes a Linux kernel, device drivers, and a minimal set of programs to enable the provisioning process.


PXE is the Preboot eXecution Environment: the process by which a computer boots from an image delivered over the network, rather than from an image installed on local disk.



While most of the provisioned image's configuration is conducted in a chroot filesystem, these chroots cannot be directly provisioned by Warewulf.

Once we are satisfied with our chroot configuration, we must encapsulate and compress this filesystem into a Virtual Node File System (VNFS) image which Warewulf can provision.

You can think of the chroot as the source code, and the VNFS as the compiled binary of that source.
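To make the analogy concrete, here is a minimal, hypothetical illustration of the chroot idea itself (not the Warewulf image): a directory tree that acts as / for processes run inside it, and whose contents are the "source" that wwvnfs later "compiles" into the VNFS.

```shell
# A chroot is just a directory tree treated as "/". Edits made under it are
# what eventually end up inside the provisioned node image.
demo_root=/tmp/demo-chroot           # throwaway demo path, not the lab's $CHROOT
mkdir -p "${demo_root}/etc"
echo "compute-image" > "${demo_root}/etc/hostname"
# Inside `chroot ${demo_root}`, this file would appear as /etc/hostname.
cat "${demo_root}/etc/hostname"      # prints: compute-image
```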

[root@smshost vagrant]#
sudo wwvnfs --chroot $CHROOT 
OUTPUT:

# _FORTIFY_SOURCE requires compiling with optimization (-O) at /usr/lib64/perl5/features.ph line 207.
# Using 'rocky8.5' as the VNFS name
# Creating VNFS image from rocky8.5
# Compiling hybridization link tree                           : 0.41 s
# Building file list                                          : 1.36 s
# Compiling and compressing VNFS                              : 160.99 s
# Adding image to datastore                                   : 27.36 s
# Wrote a new configuration file at: /etc/warewulf/vnfs/rocky8.5.conf
# Total elapsed time                                          : 190.11 s
[root@smshost vagrant]#
[root@smshost vagrant]#
[root@smshost vagrant]#
echo "GATEWAYDEV=${eth_provision}" > /tmp/network.$$ 
sudo wwsh -y file import /tmp/network.$$ --name network 
sudo wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0
OUTPUT

# About to apply 3 action(s) to 1 file(s):
# 
#     SET: PATH                 = /etc/sysconfig/network
#     SET: MODE                 = 0644
#     SET: UID                  = 0
# 
# Proceed?
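In the echo command above, $$ is the shell's own process ID, which gives the temporary file a unique name before it is imported into the Warewulf data store. A sketch of the mechanics, with eth_provision set to an assumed example value (the real value was sourced from the lab's input file in an earlier chapter):

```shell
# $$ expands to the current shell's PID, so /tmp/network.$$ is unlikely to
# collide with another user's temp file.
eth_provision=eth1                       # assumed example value
tmpfile="/tmp/network.$$"
echo "GATEWAYDEV=${eth_provision}" > "${tmpfile}"
cat "${tmpfile}"                         # prints: GATEWAYDEV=eth1
```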

5.2 Register Nodes


Now that the base configuration is done for the Warewulf server and the compute node configurations are staged within Warewulf, we must add the node definitions to the Warewulf data store.

Finding the list of node definitions in Warewulf.

You can run wwsh node list to check the list of node definitions before running the following loop, and then again after running the loop, to see how nodes are added and defined in the Warewulf data store.

[root@smshost vagrant]#
for ((i=0; i<$num_computes; i++)) ; do  
   wwsh -y node new ${c_name[i]} --ipaddr=${c_ip[i]} \
--hwaddr=${c_mac[i]} -D ${eth_provision} ; done 
To watch the changes as they happen:

(This assumes you have a second shell visible, such as a tmux pane, with the lab's environment variables sourced.)
On the second shell:

watch "wwsh node list"

More about the above step:

You can declare each compute node manually using wwsh node new, which is perfectly reasonable for the virtual lab, but for a large cluster of tens, hundreds, or even thousands of nodes this would be wholly impractical.

The for ... do loop above runs an equivalent of the following command once per node, substituting the current value of i on each pass. When i is 0, it is equivalent to this:

wwsh -y node new compute00 --ipaddr=10.10.10.100 --hwaddr=08:00:27:f9:f3:b1 -D eth1

The components of the for loop are broken down as follows:

for ((i=0; i<$num_computes; i++)) ; do
Starts a for ... do loop with a counter i at 0, incrementing it (i++) on each pass for as long as i is less than $num_computes. If you ever need to add a new sequence of machines, you can either start at i=0 as above, or substitute the i and num_computes values with the relevant range for your task.

wwsh -y node new
Creates new nodes in the Warewulf node definitions data store, with the appropriate parameters (compute name, IPv4 address, MAC address, and provisioning ethernet port).
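To see the loop mechanics without touching Warewulf, the sketch below uses hypothetical example arrays standing in for the values sourced from the lab's input file, and echoes (rather than runs) the command each iteration would issue:

```shell
# Example data only; the real values come from the lab's sourced input file.
num_computes=2
c_name=(compute00 compute01)
c_ip=(10.10.10.100 10.10.10.101)
c_mac=(08:00:27:f9:f3:b1 08:00:27:aa:bb:cc)   # second MAC is made up
eth_provision=eth1

# Print the command each pass of the loop would run.
for ((i=0; i<num_computes; i++)); do
  echo "wwsh -y node new ${c_name[i]} --ipaddr=${c_ip[i]} --hwaddr=${c_mac[i]} -D ${eth_provision}"
done
```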

The node entries have now been defined through the wwsh -y node new command (wwsh node list confirms this). Next, we set the provisioning image to be used for the compute nodes; the same step also specifies which imported files to use in the provisioning process.

[root@smshost vagrant]#
sudo wwsh -y provision set "${compute_regex}" --vnfs=rocky8.5 --bootstrap=`uname -r` --files=dynamic_hosts,passwd,group,shadow,munge.key,network 
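The quoting of "${compute_regex}" in the command above matters. A typical value, an assumption here since yours was defined when you sourced the lab's input file, is a pattern matching every node name at once:

```shell
# Hypothetical value; the real one comes from the lab's sourced input file.
compute_regex='compute*'
# The quotes stop the shell from glob-expanding compute* against local
# filenames, so the pattern itself reaches wwsh and selects all matching nodes.
echo "pattern passed to wwsh: ${compute_regex}"
```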

After the above changes, we must restart the DHCP server so that it picks up the regenerated dhcpd.conf, and then update the Warewulf PXE database.

[root@smshost vagrant]#
[root@smshost vagrant]#
sudo systemctl restart dhcpd 
sudo wwsh pxe update 

Note

Warewulf does not replace the standard Linux services (such as DHCP) but acts as a frontend to manage these services.

Warewulf does not create any new technologies to accomplish diskless remote-boot Linux, network file sharing, and so on - it uses the existing standard tools and protocols and simply acts as a single interface to these various processes.

For troubleshooting and investigation, the standard files involved with PXE, DHCP, NFS, etc. can still be investigated as usual.
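For example, the locations below are common starting points when troubleshooting. The paths are standard EL8 defaults and an assumption here, not something Warewulf relocates:

```shell
# Standard service files Warewulf manages but does not replace (EL8 defaults;
# your distro's paths may differ).
files="/etc/dhcp/dhcpd.conf /var/lib/tftpboot /etc/exports"
for f in $files; do
  if [ -e "$f" ]; then echo "present: $f"; else echo "not on this host: $f"; fi
done
```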

5.3 Boot Compute Nodes


Now that we have configured the compute node images and prepared the PXE boot process on the smshost, it is time for us to start up the other virtual machines that form part of the Vagrant specification file.

Exit out of smshost (you can press CTRL+D repeatedly to disconnect from any nested sessions), or use another host terminal window, to launch the other virtual machines with the following commands:

[~/openhpc2.x/]$
[~/openhpc2.x/]$
vagrant up compute00 
vagrant up compute01 
To watch the changes:

You can load the VirtualBox GUI and watch the creation of the two compute nodes, labelled compute00 and compute01, in the interface.

If you wish, you can watch the console of the compute nodes and you will see the virtual compute nodes booting from a PXE-delivered compute image from the smshost.

"The requested communicator could not be found" -- safely ignore!

Note that you will see the following error message when running the above commands:

The requested communicator '' could not be found. Please verify the name is correct and try again.

This can be safely ignored. It has no impact on the lab.

Why you can ignore the communicator error:

To enable PXE-bootable compute node images with Vagrant, we created an essentially stripped-down .box file. Vagrant expects a second network interface in order to communicate with the VM, but we don't want that, since it isn't how a traditional compute node would be configured.

We removed the 'communicator' interface from the stripped-down .box file, which results in the warning.

Recap of what you accomplished in this chapter:

You provisioned the compute nodes by creating a bootstrap image which was used to boot the compute nodes and complete their OS provisioning.

The chroot configuration was encapsulated and compressed into a VNFS image which is readable by Warewulf for provisioning.

The compute node definitions were added into the Warewulf node definitions data store (accessible for query through wwsh node list).

Warewulf updated the dhcpd.conf file with the PXE parameters and you used vagrant up to provision and boot up the two compute nodes compute00 and compute01.

Congratulations

Your compute nodes are now successfully booted up and ready!

