ITNews

The newsletter of the SIE de Investigación y Enseñanza and the Servicios Informáticos
N. 8 - October 2023

Important configuration changes to the SO burros

If you have recently logged in to the SO burro, you have surely seen the banner announcing that it will be shut down on October 30th for a software update.

This is in line with the upgrade of all "burros" to Ubuntu 22.04, but in this case the change goes deeper. Since its installation, one of the SO machines has been the "head" node for the other, but we are now going to separate them and install them as two distinct "burros": the first, smaller one with 20 cores but two P100 GPUs; the other with 192 cores and 4.5 TB of RAM. Each one will have its own queueing system, following the now "standard" installation in "burros" (see https://vesta.ll.iac.es/SIE/hpc/burros/index.html).

If you were planning to use these machines during the first days of November, you can move your jobs to any of the three public burros now in production, which have the same operating system and Slurm configuration as these two machines will have after the upgrade.

New dashboard for Slurm usage in burros

We keep expanding the pool of burros that use Slurm to manage their workload. To make life easier, we already have extensive documentation with many examples. However, the question often is: which burro should I use? To help you make this decision, we have prepared a dashboard showing the real-time usage of the burros. This way, you can easily identify which ones are currently underused and have short queueing times.

The dashboard can be found at http://pasa/grafana/burros

We also remind you that there is a similar dashboard for LaPalma, accessible at http://pasa/grafana/lapalma

Tip: you can also check the status of the queue in all the burros at once with: squeue -M all.

Software upgrades

A few important upgrades have been carried out in the last few weeks.

Slurm has been upgraded to version 23.02.6, which addresses a number of filesystem race conditions that could let an attacker take control of an arbitrary file, or remove entire directories' contents. Details about this can be found at: https://www.schedmd.com/news.php. Also, memory control with Slurm has been activated in all public "burros". Make sure you read the relevant section in the documentation to understand how to request more/less memory, especially if you expect your jobs to be memory-intensive.
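
As an illustration of requesting memory explicitly, a minimal batch script might look like the sketch below. The job name, resource values and time limit are placeholders chosen for this example, not site defaults. `--mem` is the standard sbatch option for memory per node; the documentation mentioned above remains the reference for the defaults and limits that apply in the burros.

```shell
#!/bin/bash
# Hypothetical job script: all values below are illustrative placeholders.
#SBATCH --job-name=mem_demo
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G            # memory requested per node
#SBATCH --time=01:00:00

# Inside a job, Slurm normally exports the per-node memory request (in MB)
# as SLURM_MEM_PER_NODE; outside a job the variable is unset.
mem_report="Memory requested per node: ${SLURM_MEM_PER_NODE:-unknown}"
echo "$mem_report"
```

With memory control active, a job that uses more memory than it requested may be terminated by Slurm, so it is safer to request slightly more than the expected peak than to rely on the default.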

The Python virtual environment for machine learning has been upgraded to Python 3.10, and several included packages have been updated as well, such as PyTorch, now at v2.0.1, and TensorFlow, now at v2.13. To load this environment, simply type module load ml_py310. It is available in those public burros equipped with GPUs.

Last, HTCondor has been upgraded to the very latest stable version, 23.0.0, released on September 30. This big jump in version number (from 10.0) shifts HTCondor to a version numbering scheme where the "major" number is the calendar year of the release (this scheme is named Calendar Versioning, or CalVer for short, see https://calver.org/ for details).
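
To make the scheme concrete, the following sketch maps a CalVer-style version string to its release year. The helper name calver_year is hypothetical, and it assumes that the major number is the two-digit year, as described above for HTCondor 23.0.0.

```shell
# Hypothetical helper: print the release year implied by a CalVer-style
# version string, assuming the major number is the two-digit year.
calver_year() {
    # ${1%%.*} keeps only the text before the first dot (the major number).
    echo "$((2000 + ${1%%.*}))"
}

calver_year 23.0.0    # prints 2023
```

The helper is only meaningful for versions released under the new scheme; applied to an older version such as 10.0, the result would not correspond to a release year.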

Comparing the performance of the available compilers

At the IAC, three different compilers are available: GNU, Intel and NVIDIA. Which one should you use? While several factors could influence this decision, the performance of the generated executable files is obviously an important one.

As part of a larger benchmarking exercise, we have recently compared the three compilers on three different "burros", running three pseudo-applications (BT, SP and LU) from the NAS Parallel Benchmarks. The conclusion is not clear-cut in all cases, but for the public burros the best bet (for the time being) seems to be the NVIDIA compiler, always assuming that your application is similar to one of the three pseudo-applications above. For details of the results obtained, how the codes were compiled, etc., check this Zulip topic.
