ITNews

The newsletter of the SIE de Investigación y Enseñanza and the Servicios Informáticos N. 7 - July 2023

Supercomputing statistics (1st semester, 2023)

HTCondor usage increased substantially in the first semester of 2023, with about 700,000 CPU hours consumed, more than twice the number consumed in the second semester of 2022. This figure does not include February, which saw the transition from the "old" to the "new" HTCondor system and for which no meaningful usage data could be collected. On average, this corresponds to an occupation of about 200 slots, about 19% of all available slots. The HTCondor usage summary (since 2006!) can be seen at the HTCondor statistics webpage.
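As a rough cross-check of these figures, the sketch below converts the CPU hours into an average number of busy slots. It assumes that the 700,000 hours are spread evenly over the five months counted (January and March to June); the function names are illustrative only, and the pool size it prints is merely what the 19% figure implies, not an official number.

```python
# Days in the months counted (January, March, April, May, June 2023).
# February is left out, as in the text above.
DAYS_COUNTED = 31 + 31 + 30 + 31 + 30  # 153 days


def average_slots(cpu_hours: float, days: int) -> float:
    """Average number of slots kept busy over a period of `days` days."""
    return cpu_hours / (days * 24)


def implied_total_slots(avg_slots: float, occupation: float) -> float:
    """Pool size implied by an average occupation fraction (assumption)."""
    return avg_slots / occupation


busy = average_slots(700_000, DAYS_COUNTED)
pool = implied_total_slots(busy, 0.19)
print(f"average busy slots: {busy:.0f}")
print(f"implied pool size:  {pool:.0f}")
```

Under these assumptions the average comes out slightly above 190 slots, in line with the "about 200" quoted above, and the implied pool is on the order of 1,000 slots.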

As for the LaPalma3 supercomputer, usage by IAC researchers declined by about 20%, with 8.26 million hours in the first semester of 2023, compared to 10.15 million hours in the second semester of 2022. By contrast, RES users consumed 7.14 million hours, almost three times as many as in the second semester of 2022. The latest data and graphs about LaPalma3 usage can be found at the LaPalma statistics webpage.
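The quoted decline can be reproduced directly from the two IAC figures. This is only a worked calculation; the helper name `percent_change` is ours, and the RES figure for the previous semester is not given in the text, so only a bound derived from "almost three times" is shown.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# IAC usage in millions of hours: 2nd semester 2022 vs 1st semester 2023.
iac_change = percent_change(10.15, 8.26)
print(f"IAC usage change: {iac_change:.1f}%")

# "Almost three times" implies the earlier RES usage was somewhat above
# one third of 7.14 million hours (an inference, not a reported figure).
res_previous_lower_bound = 7.14 / 3
print(f"RES usage in 2nd semester 2022: above {res_previous_lower_bound:.2f} million hours")
```

The exact change is closer to -18.6%, which the text rounds to about 20%.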

As for diva, about 550,000 CPU hours have been used, which is about 66% of the total computing capability. This is a slight increase over the ~ 60% usage in the second semester of 2022.
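To see what the 66% figure means in terms of hardware, the sketch below works backwards from the CPU hours used. It assumes that "total computing capability" means every core being busy around the clock for the whole semester (181 days, January to June 2023); the resulting core count is an estimate implied by the numbers above, not an official specification of diva.

```python
HOURS_IN_SEMESTER = 181 * 24  # January to June 2023

used_cpu_hours = 550_000
usage_fraction = 0.66

# Total CPU hours the machine could have delivered at 100% usage.
capacity_cpu_hours = used_cpu_hours / usage_fraction

# Number of cores that would deliver that capacity if always busy.
equivalent_cores = capacity_cpu_hours / HOURS_IN_SEMESTER

print(f"implied capacity: {capacity_cpu_hours:,.0f} CPU hours")
print(f"equivalent always-on cores: {equivalent_cores:.0f}")
```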

LaPalma3 job queue live statistics

The use of LaPalma3 has been steadily increasing over the years, and at the SIE we want users to have a better experience using the system. One common complaint is that "my jobs are stuck forever on the queue, but two weeks ago they were running perfectly". To mitigate these situations and to be more open about the usage of LaPalma3, we have made public the following tools:

  • A web-based dashboard with a lot of interesting real-time information about the system. It shows, among other things, the number of CPUs currently allocated or idle. The "Oversubscription" gauge, which indicates the number of CPUs waiting in the queue, may be especially useful for planning your submissions. If it is close to zero, it is a great time to submit jobs!
  • A new squeue-all command, which shows the jobs of all users in the LaPalma queueing system (the squeue command shows only your own jobs). Due to data protection issues, the submitter of each job is not given, but you can easily identify your jobs by their ID. Particularly useful is the "Priority" column, where you can see the priority of your jobs relative to all other jobs in the queue.
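As an illustration of how the "Priority" column can be used, the sketch below ranks one job among the pending jobs of a listing. The column layout and the sample text are made up for the example (the actual output of squeue-all is not described here), and it assumes the usual Slurm convention that a higher priority value means the job is considered earlier.

```python
# Hypothetical listing: the real squeue-all columns may differ.
SAMPLE = """\
JOBID PRIORITY STATE
1001 5200 PENDING
1002 7400 PENDING
1003 3100 PENDING
1004 9000 RUNNING
"""


def pending_rank(listing: str, job_id: str) -> tuple[int, int]:
    """Return (position, number of pending jobs); position 1 is the highest priority."""
    # Skip the header line and split each remaining line into its columns.
    rows = [line.split() for line in listing.strip().splitlines()[1:]]
    pending = [(jid, int(prio)) for jid, prio, state in rows if state == "PENDING"]
    # Assumption: higher priority values are scheduled first.
    pending.sort(key=lambda item: item[1], reverse=True)
    ids = [jid for jid, _ in pending]
    return ids.index(job_id) + 1, len(ids)


position, n_pending = pending_rank(SAMPLE, "1001")
print(f"job 1001 is number {position} of {n_pending} pending jobs")
```

With the sample data, job 1001 sits second among three pending jobs. Priority is only one of the factors that decide when a job starts, so the rank is an indication rather than a guarantee.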

Upcoming seminars after the summer break

After the summer holidays, we all come back refreshed and reinvigorated, eager to get back to work. We all want to be more productive and efficient, and the next two seminars organized by the SIE fit this agenda perfectly.

  • Slurm Workload Manager (September 19th). If you have used LaPalma, deimos/diva or any other supercomputer, you are most likely already familiar with Slurm, which allows a more efficient use of computing resources by implementing job queues. As explained in our previous Newsletter, we are installing Slurm in the public "burros" (and also in the project "burros" that request it), so in this seminar we will explain the basic Slurm concepts, how to use the queues effectively, how to install Slurm in your project "burro", etc.
  • Data backups (October 17th). Having a sane backup plan for all your data should always have been a top priority, but with the current trend of working mainly on laptops, it is now even more important: would you be able to easily recover all your work if your laptop were lost or stolen? If stolen, would your personal data (passwords, ID numbers, etc.) be safe from prying eyes? Preparing a good backup plan does involve some work, but it can be your best insurance against future problems with your computer (which at some point are bound to happen!). In this seminar Angel will explain his current backup strategy, which will certainly be useful for the design of your own backup plan.

Cancellation of subscription to NAG licences

In 2011 we started the subscription to the NAG library, paying an annual license fee. It consists of a "collection of reliable, portable, and rigorous mathematical and statistical algorithms used in thousands of applications worldwide". In particular, we have two site-wide, floating licenses for the C/C++ and Fortran libraries, for both the GNU and the Intel compilers. However, its usage has been declining over the years, and as of now the NAG library is very seldom, if at all, used. For this reason, we are considering not renewing the NAG license, which currently costs more than five thousand euros annually.

If you are a NAG library user and think that, on the contrary, we should renew the license, please send us (sinfin@iac.es) an email stating so, along with a description of your past usage and how you will use it in the near future. Note that programs compiled with the NAG library should keep running regardless of the license status (the license is only needed at compile time).

Battery life in laptops

Laptop batteries can have a lifetime of several years if used properly, though the charge they can hold unavoidably decreases with time. It is particularly important not to leave the laptop off for a long time. Apple's "Maximizing Battery Life and Lifespan" webpage says that "If you store a device when its battery is fully discharged, the battery could fall into a deep discharge state, which renders it incapable of holding a charge." This probably applies to both Mac and non-Mac laptops, but in our experience Macs seem to be more prone to this problem.

Indeed, three of the MacBook Pro laptops bought by the Research Area have already had to be retired from service because their batteries died after the laptops went unused for many months. If you have an IAC-provided laptop and are not currently using it, please:

  • If you no longer need it, return it to the Research Area or to the CAU.
  • If you do not plan to use it for a few months or so, for whatever reason, recharge it from time to time. The website mentioned above suggests that "If you plan to store your device for longer than six months, charge it to 50% every six months."

Distributed Denial of Service (DDoS) attack on the IAC

A few weeks ago, the IAC (Instituto de Astrofísica de Canarias) suffered a Distributed Denial of Service (DDoS) attack that went unnoticed and had no major impact, thanks to the detection and mitigation carried out by both the IAC and RedIRIS automatic systems.

It is common for the IAC to experience different types of random attacks; however, this DDoS attack deserves special attention as it was specifically directed at the IAC with the aim of overwhelming the website and network-accessible resources.

This attack should make us all think about the importance of adopting the computer security measures provided by the IT Services (Servicios Informáticos, SI). Given that this attack was specifically targeted at the IAC, we can be certain that we are a target for some cybercriminal group, as some Spanish research centers have already been targeted, either to encrypt their systems' information and demand a ransom, or simply to damage the institutions' reputation.

Let's collaborate, to the best of our abilities, in keeping our network secure.

(Full text at https://www.iac.es/en/outreach/news/distributed-denial-service-ddos-attack-iac)
