How to Set Up a Lustre Client (2.12.x) with InfiniBand on RockyLinux 8

Lustre is a very popular open-source distributed parallel file system used in High Performance Computing. However, in my experience using it, I could not find good, easy-to-understand documentation.

Prerequisites

  1. RockyLinux 8.x (CentOS / RHEL)
  2. ConnectX-5 (or a newer InfiniBand adapter)
  3. dkms

Pre-Installation

Each Lustre version usually targets a specific kernel version and distro. At the time of writing this blog post, Lustre 2.12.8 was the latest LTS Lustre release available to the public. You can find more information about the kernels and distros supported by Lustre in either the changelog posted on the wiki or the support matrix page.

I strongly advise running the exact kernel version supported by Lustre, including the patch number.

You can skip to the next part if you have already installed the kernel supported by Lustre.

I use a nifty little dnf plugin called versionlock, which freezes a package at a given version so that it is not upgraded whenever dnf update is run.

You can install versionlock with the following command.

dnf install python3-dnf-plugin-versionlock

Once you have installed versionlock, you can freeze package versions using versionlock add. For example, I want to freeze my kernel package at 4.18.0-348.2.1.el8_5, the version officially supported by Lustre 2.12.8.

dnf versionlock add kernel-4.18.0-348.2.1.el8_5

Lustre depends on several kernel packages.

Install all the required packages:

VER="4.18.0-348.2.1.el8_5"
dnf install \
kernel-$VER \
kernel-devel-$VER \
kernel-headers-$VER \
kernel-abi-whitelists-$VER \
kernel-tools-$VER \
kernel-tools-libs-$VER \
kernel-tools-libs-devel-$VER

After installing the packages, I suggest freezing them so that dnf does not update them when a new kernel becomes available. You can freeze the packages using versionlock:

VER="4.18.0-348.2.1.el8_5"
dnf versionlock add \
kernel-$VER \
kernel-devel-$VER \
kernel-headers-$VER \
kernel-abi-whitelists-$VER \
kernel-tools-$VER \
kernel-tools-libs-$VER \
kernel-tools-libs-devel-$VER
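Since the install and versionlock commands above repeat the same seven package names, you could generate the list once and reuse it. The sketch below is a hypothetical helper (`kernel_pkgs` is my own name, not part of dnf) that prints each versioned package spec, defaulting to the 4.18.0-348.2.1.el8_5 kernel used in this guide:

```shell
# Hypothetical helper: prints one versioned kernel package spec per line.
# The default version is the Lustre 2.12.8 kernel used in this guide;
# pass a different version as the first argument to override it.
kernel_pkgs() {
  local ver="${1:-4.18.0-348.2.1.el8_5}"
  local name
  for name in kernel kernel-devel kernel-headers kernel-abi-whitelists \
              kernel-tools kernel-tools-libs kernel-tools-libs-devel; do
    printf '%s-%s\n' "$name" "$ver"
  done
}

# Usage (as root):
#   dnf install $(kernel_pkgs)
#   dnf versionlock add $(kernel_pkgs)
```

Keeping the list in one place means the installed and frozen package sets cannot drift apart when you edit one command but forget the other.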

Check which packages are currently frozen with dnf versionlock list:

dnf versionlock list

kernel-tools-0:4.18.0-348.2.1.el8_5.*
kernel-0:4.18.0-348.2.1.el8_5.*
kernel-modules-0:4.18.0-348.2.1.el8_5.*
kernel-tools-libs-0:4.18.0-348.2.1.el8_5.*
kernel-core-0:4.18.0-348.2.1.el8_5.*
kernel-devel-0:4.18.0-348.2.1.el8_5.*
kernel-modules-extra-0:4.18.0-348.2.1.el8_5.*
kernel-headers-0:4.18.0-348.2.1.el8_5.*

You can clear any frozen packages with

dnf versionlock clear

or unfreeze a single package with

dnf versionlock delete <package name>

Once you have installed the kernel, reboot your system.

reboot

Confirm your kernel version with uname -r.
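If you want to script this check, here is a minimal sketch. The function name `kernel_matches` is hypothetical; it assumes `uname -r` prints the kernel version followed by an architecture suffix such as `.x86_64`, which is how EL8 kernels report themselves:

```shell
# Hypothetical helper: succeeds only if the running kernel is the expected one.
# $1: expected version (default: the Lustre 2.12.8 kernel used in this guide)
# $2: version to test (default: output of `uname -r`)
kernel_matches() {
  local expected="${1:-4.18.0-348.2.1.el8_5}"
  local running="${2:-$(uname -r)}"
  case "$running" in
    # Accept an exact match or the same version plus an arch suffix (.x86_64).
    "$expected" | "$expected".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   kernel_matches && echo "Kernel OK: $(uname -r)" || echo "Unexpected kernel: $(uname -r)" >&2
```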

You are now ready to begin installing Mellanox InfiniBand drivers.

Installing the MOFED drivers

The InfiniBand drivers shipped with the distro can be somewhat unreliable, and you might need to uninstall them before proceeding. Once done, download the official MOFED drivers from the Mellanox website.

Select the Downloads tab and scroll down to see the latest available version of MOFED. Select RHEL/CentOS and then RHEL/CentOS 8.5. Select x86_64 (or the architecture you are running on) and then click the ISO link. You need to accept the terms and conditions before downloading.

Save the file somewhere you can access later. For this example we have downloaded MLNX_OFED_LINUX-5.5-1.0.3.2-rhel8.5-x86_64.iso.

The above steps might differ from user to user, so adjust them accordingly.

Mount the ISO file read-only at /mnt, which we use as a temporary mount point:

mount -o ro,loop MLNX_OFED_LINUX-5.5-1.0.3.2-rhel8.5-x86_64.iso /mnt

Install the InfiniBand drivers:

cd /mnt
./mlnxofedinstall --distro rhel8.5 --all

Note that I specified the distro as rhel8.5. The MOFED installer does not support RockyLinux out of the box, but since RockyLinux is designed to be 1:1 bug-for-bug compatible with Red Hat Enterprise Linux (RHEL), you can force the installer to treat the system as RHEL.

The installer may ask you to install some dependency packages before it can proceed; it prints the command needed to install them.

Once the installation finishes, reboot the system to load the InfiniBand drivers.
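After the reboot, a quick sanity check can confirm the MOFED install. This is a sketch: the function names are hypothetical, `ofed_info -s` is the short version query that ships with MLNX_OFED, and the module names assume a ConnectX-5 (mlx5 driver) with IPoIB:

```shell
# Hypothetical check 1: print the installed MLNX_OFED version.
# ofed_info is installed by MLNX_OFED; -s prints only the short version string.
mofed_version() {
  if ! command -v ofed_info >/dev/null 2>&1; then
    echo "ofed_info not found: MLNX_OFED does not appear to be installed" >&2
    return 1
  fi
  ofed_info -s
}

# Hypothetical check 2: succeed only if the ConnectX (mlx5) and IPoIB modules
# are loaded. Reads /proc/modules by default; another file can be passed in.
ib_modules_loaded() {
  local modules_file="${1:-/proc/modules}"
  local mod
  for mod in mlx5_core mlx5_ib ib_ipoib; do
    # Each /proc/modules line starts with "<name> <size> ...".
    if ! grep -q "^${mod} " "$modules_file"; then
      echo "module not loaded: ${mod}" >&2
      return 1
    fi
  done
}

# Usage:
#   mofed_version && ib_modules_loaded && echo "MOFED drivers loaded"
```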

Configuring InfiniBand for IPoIB

Run ibstat to check the state of your InfiniBand adapter.

ibstat

CA 'mlx5_0'
	CA type: MT4119
	Number of ports: 1
	Firmware version: 16.32.1010
	Hardware version: 0
	Node GUID: 0x506b4b0300fbee72
	System image GUID: 0x506b4b0300fbee72
	Port 1:
		State: Active
		Physical state: Initializing
		Rate: 100
		Base lid: 9
		LMC: 0
		SM lid: 12
		Capability mask: 0x2651e848
		Port GUID: 0x506b4b0300fbee72
		Link layer: InfiniBand

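In this output the port reports Initializing. A port stays in the Initializing state until a Subnet Manager (SM) configures it, so enable opensm to bring the port up. Only one SM is needed per fabric, so you can skip this step if a managed switch or another host already runs one.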

systemctl enable --now opensm.service
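To check the port state from a script (for example while waiting for opensm to bring the port up), you can parse the ibstat output. The `ib_port_state` helper below is hypothetical and assumes the indented `State:` / `Physical state:` layout shown above; it prints the first port's logical State:

```shell
# Hypothetical helper: reads `ibstat` output on stdin and prints the first
# "State:" value (the logical port state), ignoring "Physical state:".
ib_port_state() {
  # Split each line on ": " so $1 is the key and $2 is the value.
  awk -F': *' '$1 ~ /^[ \t]*State$/ { print $2; exit }'
}

# Usage:
#   ibstat | ib_port_state
#   # Wait until the port is Active:
#   until [ "$(ibstat | ib_port_state)" = "Active" ]; do sleep 5; done
```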

You now need to configure the InfiniBand interface like a typical Ethernet interface.

You can use nmtui (NetworkManager's text user interface) to configure the interface, which is usually called ib0.

Configure a static or dynamic IP address for your InfiniBand adapter.

You may notice a parameter called Transport Mode. Mellanox recommends Datagram Mode for better scalability and performance and defaults to it (except for Connect-IB cards).

You can read more about it in the official Mellanox documentation and the Linux kernel documentation.

For the sake of simplicity, we will keep the default settings; however, it is worth evaluating the other mode (Connected Mode) when optimizing for your hardware.

Once done, verify that the IP address has been assigned properly with:

ip addr show ib0  # Change according to your interface name
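For scripted checks, the one-line output format of `ip -o` is easier to parse than the default. This hypothetical `first_ipv4` helper assumes the usual `ip -4 -o addr` layout, where the address with its prefix length is the fourth field:

```shell
# Hypothetical helper: reads `ip -4 -o addr show dev <iface>` output on stdin
# and prints the first IPv4 address without its /prefix length.
# Prints nothing if the interface has no IPv4 address.
first_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage:
#   addr=$(ip -4 -o addr show dev ib0 | first_ipv4)
#   [ -n "$addr" ] && echo "ib0 has address $addr" || echo "ib0 has no IPv4 address" >&2
```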

Once verified we can move onto the installation of Lustre.

Installing Lustre Client

The DKMS flavor of the Lustre client requires the Extra Packages for Enterprise Linux (EPEL) repository, because the dkms package is provided by EPEL.

Install the EPEL repository and dkms:

dnf install epel-release
dnf install dkms

Alternatively, you can install a prebuilt binary kernel module (kmod), in which case you can skip installing dkms.

Now we need to add the repository containing Lustre packages.

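Using nano or any other editor, create a file with the following content. Note that latest-release points to the newest Lustre release, which may be newer than 2.12.x; if so, change baseurl to the directory of the release that matches your kernel.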

nano /etc/yum.repos.d/lustre.repo

[lustre-client]
name=lustre-client
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el8/client
# exclude=*debuginfo*
enabled=1
gpgcheck=0
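If you prefer not to use an interactive editor, the same file can be written with a heredoc. This is a sketch: the function name is hypothetical and the content is exactly the repository definition above:

```shell
# Hypothetical helper: writes the lustre-client repo definition shown above.
# Defaults to /etc/yum.repos.d/lustre.repo (requires root); pass another
# path to preview the file first.
write_lustre_repo() {
  local dest="${1:-/etc/yum.repos.d/lustre.repo}"
  # Quoted 'EOF' prevents the shell from expanding anything in the body.
  cat > "$dest" <<'EOF'
[lustre-client]
name=lustre-client
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el8/client
# exclude=*debuginfo*
enabled=1
gpgcheck=0
EOF
}

# Usage (as root):
#   write_lustre_repo
```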

Clear your dnf cache and refresh the repository metadata:

dnf clean metadata
dnf makecache

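Install Lustre Client along with its DKMS kernel module (Recommended)

I recommend this approach since it ensures that the Lustre module is built against your installed kernel and installed properly.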

dnf install lustre-client lustre-client-dkms

Install Lustre Client along with its binary kernel module (Alternative)

dnf install lustre-client kmod-lustre-client

Next, configure Lustre Networking (LNet). This step is required: LNet is the network layer Lustre uses to carry metadata and file I/O traffic.

There are two ways to configure LNet: statically, through kernel module options, or dynamically, using the lnetctl utility available in Lustre 2.7.0 and above. Here we create a static configuration.

Create a lustre.conf modprobe file:

nano /etc/modprobe.d/lustre.conf

options lnet networks="o2ib0(ib0)"

Here the InfiniBand interface we are using is ib0.
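For comparison, the dynamic method mentioned above would configure the same network at runtime with lnetctl. This is a minimal sketch, assuming the lnetctl shipped with the Lustre client; the function name is hypothetical, and you should use either this or the modprobe options file, not both. Configuration made this way applies to the running system only.

```shell
# Hypothetical helper: dynamic LNet setup, equivalent in intent to the static
# `options lnet networks="o2ib0(ib0)"` line. Run as root.
lnet_dynamic_setup() {
  local net="${1:-o2ib0}" iface="${2:-ib0}"
  modprobe lnet || return 1                    # load the LNet kernel module
  lnetctl lnet configure || return 1           # bring LNet up
  lnetctl net add --net "$net" --if "$iface"   # attach the interface to the network
}

# Usage (as root):
#   lnet_dynamic_setup o2ib0 ib0 && lnetctl net show
```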

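For a multi-rail setup with two InfiniBand interfaces (ib0 and ib1), each placed on its own LNet network (to put both interfaces on a single LNet network instead, list them together, as in o2ib0(ib0,ib1)):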

nano /etc/modprobe.d/lustre.conf

options lnet networks="o2ib0(ib0),o2ib1(ib1)"

Once done, reboot the machine so that the new module options take effect.
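After the reboot, you can confirm that LNet is up and that the client can reach the servers before mounting anything. This is a sketch with a hypothetical function name; it defaults to the example MGS NID used in the mount commands and relies on the standard lctl subcommands network up, list_nids and ping:

```shell
# Hypothetical post-reboot check: bring LNet up, list this client's NIDs,
# then ping an MGS NID (default: the example MGS address from this guide).
check_lnet() {
  local mgs_nid="${1:-192.168.15.250@o2ib0}"
  modprobe lnet || return 1
  lctl network up || return 1
  echo "Local NIDs:"
  lctl list_nids || return 1
  echo "Pinging ${mgs_nid}:"
  lctl ping "$mgs_nid"
}

# Usage (as root):
#   check_lnet 192.168.15.250@o2ib0
```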

Mounting Lustre

Let’s say you have created a filesystem called scratch on your Lustre servers and would like to mount it at /scratch (create the directory first if it does not exist):

mount -t lustre -o flock 192.168.15.250@o2ib0:/scratch /scratch

If you have multiple MGS nodes (for example, a primary and a failover), you can list all of their NIDs, separated by colons, for Lustre to connect to.

mount -t lustre -o flock 192.168.15.250@o2ib0:192.168.15.251@o2ib0:/scratch /scratch
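If you mount several filesystems or MGS pairs, a small helper can assemble the device string. This is a sketch with a hypothetical function name, assuming the `NID[:NID...]:/fsname` format used in the commands above:

```shell
# Hypothetical helper: builds the "<NID>[:<NID>...]:/<fsname>" device string
# for mount -t lustre. $1 is the filesystem name; the rest are MGS NIDs.
mgs_spec() {
  local fsname="$1"
  shift
  local IFS=":"           # "$*" joins the remaining arguments with ':'
  printf '%s:/%s\n' "$*" "$fsname"
}

# Usage (as root):
#   mount -t lustre -o flock "$(mgs_spec scratch 192.168.15.250@o2ib0 192.168.15.251@o2ib0)" /scratch
```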

I recommend the flock option when mounting Lustre filesystems. It enables coherent POSIX file locking on open files across all clients. It is the default mode in Lustre 2.13 and above.
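The mount commands above do not persist across reboots. One common approach is an /etc/fstab entry; the sketch below (hypothetical helper name) prints such a line with the flock option, plus _netdev so that systemd treats it as a network filesystem and waits for the network before mounting:

```shell
# Hypothetical helper: prints an /etc/fstab line for a Lustre mount.
# $1: device string (e.g. 192.168.15.250@o2ib0:/scratch), $2: mount point,
# $3: mount options (default keeps flock and adds _netdev).
lustre_fstab_line() {
  local device="$1" mountpoint="$2" opts="${3:-defaults,_netdev,flock}"
  printf '%s %s lustre %s 0 0\n' "$device" "$mountpoint" "$opts"
}

# Usage (as root):
#   mkdir -p /scratch
#   lustre_fstab_line 192.168.15.250@o2ib0:/scratch /scratch >> /etc/fstab
#   mount /scratch
```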

Verify mount points with:

df -ht lustre
Filesystem                                        Size  Used Avail Use% Mounted on
192.168.15.250@o2ib:192.168.15.251@o2ib:/scratch  1.2P  120T  1.1P  10% /scratch
192.168.15.252@o2ib:192.168.15.253@o2ib:/home     8.0T  402G  7.6T   5% /home
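To check from a script that a path is really on Lustre (rather than an empty local directory), GNU stat can report the filesystem type. This is a sketch with a hypothetical helper name, assuming coreutils stat, which reports Lustre as `lustre`:

```shell
# Hypothetical helper: succeeds if the given path (default /scratch) is on a
# Lustre filesystem. `stat -f -c %T` prints the filesystem type name.
is_lustre() {
  [ "$(stat -f -c %T "${1:-/scratch}" 2>/dev/null)" = "lustre" ]
}

# Usage:
#   is_lustre /scratch && echo "/scratch is on Lustre" || echo "/scratch is NOT on Lustre" >&2
```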

Congrats! You have successfully installed, configured and mounted Lustre on your client nodes.

If you want to read further I suggest going through the Lustre documentation.