
Supercomputing at the IAC

Why Supercomputing?

Supercomputing is a general term for any high-speed computational process, and its definition changes as new computing methods are developed. There is an extensive literature on this topic, but on this page we focus on the advantages that IAC researchers can gain by using Supercomputing. If you are interested in the theoretical aspects, the Wikipedia article on Supercomputing gives a good overview, with many references and links to further information.

The main reason to use Supercomputing is to get your computational results in less time. That time could be reduced by a factor of 1.5, 2, 5, 10, 100, 1000, ...; the limit depends on the constraints of your problem and program, the Supercomputing techniques you apply and the resources available to run it. Even if time is not a limiting factor in your computations, with Supercomputing you may be able to tackle much bigger problems than before in the same time.
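
As a rough illustration of why the achievable factor depends on your program, the sketch below uses Amdahl's law, a standard textbook model that is not specific to any IAC resource. It assumes that a fixed fraction of the run time can be parallelized perfectly and that the rest stays serial; the function name `amdahl_speedup` and the example fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the serial run time that can be parallelized (0..1).
    workers: number of cores working on the parallel part.
    This is an idealized model: it ignores communication and I/O overheads.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical programs with different parallelizable fractions.
    for fraction in (0.5, 0.9, 0.99):
        for cores in (10, 100, 1000):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel fraction {fraction:.2f}, {cores:4d} cores: x{speedup:.1f}")
```

In this model a program that is only 50% parallelizable can never run more than twice as fast, however many cores it gets, which is why the structure of your code matters as much as the available hardware.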

Here we describe the Supercomputing resources available at the IAC and give some links to more information about them. On March 6th, 2018, we gave a presentation about the Supercomputing resources at the IAC; you might want to take a look at it (sorry, the link is only available from the IAC's internal network). If you are thinking about using Supercomputing at the IAC and need any help, please contact us. We can help you choose the best option for your problem, compile your applications and transfer your data to these environments, prepare your submit files and perform the submissions, etc.

Teide-HPC and LaPalma Supercomputers (Parallel Computing)

Parallel Computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster (see also: Parallel Computing on Wikipedia). If your problem requires a huge number of calculations, but some of those operations are independent and could be performed at the same time, then you should consider using Parallel Computing to get your results in less time. Algorithms with huge loops whose iterations have no (or only a few) dependencies among them (like simulations of galaxies) are good candidates for parallelization. Once you have your parallel code, you need a Supercomputer to run it.
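
The idea of independent loop iterations can be sketched on a single machine with Python's standard `concurrent.futures` module. This is only a minimal illustration, not the way codes are run on Teide-HPC or LaPalma (production codes there typically rely on technologies such as MPI or OpenMP). The function `simulate` is a hypothetical stand-in for one independent iteration, and `run_all` is a name chosen here for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def simulate(seed: int) -> int:
    """Hypothetical stand-in for one independent, CPU-bound iteration."""
    state = seed
    for _ in range(10_000):
        # Simple linear congruential update, just to keep the CPU busy.
        state = (1103515245 * state + 12345) % 2**31
    return state


def run_all(seeds, workers=4, executor_cls=ProcessPoolExecutor):
    """Run simulate() on every seed, spreading the calls over `workers` workers.

    Because each call depends only on its own seed, the calls can run in any
    order and at the same time; map() still returns results in input order.
    """
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))
```

When using the default process pool, call `run_all(range(8))` from a script protected by `if __name__ == "__main__":`, so that worker processes can import the module safely. The result is the same list a plain serial loop would produce; only the elapsed time changes.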

Researchers at the IAC have access to two Supercomputers, Teide-HPC and LaPalma:

  1. Teide-HPC (Teide High Performance Computing) is a supercomputer located at the Instituto Tecnológico de Energías Renovables S.A. (ITER). It is the second most powerful supercomputer in Spain and ranks 169th (June 2014) in the Top500 list of the most powerful computers in the world. It is composed of 1,100 Fujitsu servers with a total of 17,800 computing cores (Intel Sandy Bridge processors, which offer both high performance and great energy efficiency), 36 TB of memory, a high-performance network and a parallel NetApp storage system. According to Top500, it has a theoretical peak performance of 340.8 TFLOPS and a maximal achieved LINPACK performance of 274.0 TFLOPS.

    IAC users need an account to connect to Teide-HPC and run their programs there. Please send an email to support (res_support@iac.es) to request an account or to resolve any issue related to Teide-HPC. We have also prepared documentation about this machine for IAC users; it is available at SIEwiki in vesta (internal access only).
     
  2. LaPalma, now in its third version (LaPalma3), belongs to the IAC and is one of thirteen nodes located on Spanish territory that are linked together to form the Spanish Supercomputing Network (RES). The LaPalma node was previously part of MareNostrum, one of the most powerful computers in Europe. In its present configuration (March 2018), LaPalma has a peak performance of 83.85 TFLOPS, with 4032 Intel Xeon E5-2670 cores and 2 GB of RAM per core. The total disk space is 346 TB, and a Lustre parallel filesystem is available. LaPalma has a fast Infiniband network (40 Gb/s) for internal communication (both computation and storage). A large set of scientific programs and libraries is already installed, and new software packages can be installed on demand if they are compatible and widely used. 50% of LaPalma's computation time is assigned to the RES, and the other 50% (4,554,547 hours per four-month period) is available to IAC researchers, who can apply for it at any time, although applying during the official periods is recommended in order to get higher priority.

    Please visit the following links for more information (some of them are only available from the internal IAC network):
    • LaPalma User's Guide: complete information about how to use this Supercomputer.
    • LaPalma Computing Time: how to get an account on LaPalma (depending on whether you are an IAC researcher or not) and the time used per period.
    • Support: if you have further questions or run into any issue when using LaPalma, please send an email to res_support@iac.es.

HTCondor (Distributed Computing)

Distributed computing is the process of running a single computational task on more than one computer (see also: Distributed Computing on Wikipedia). For instance, suppose we need to reduce some data using an application we have developed. We have a very large number of data sets to reduce, and processing each set takes several hours. If we process all the sets one by one on our own machine, it may take several weeks to get all the results. Now imagine that we have hundreds of machines on which our program could reduce different data sets at the same time: we could then get all the results in just a few hours. The HTCondor software makes this possible and does all the work for you: HTCondor copies the input files to the remote machines, executes your program there with different data and brings the results back to your machine when they are complete.

At the IAC, the HTCondor system (a High-Throughput Computing, or HTC, system) is installed on several "burros" and desktop machines, allowing us to run our applications (shell scripts, astronomical software, our own programs written in C, Fortran, Python, IDL, Matlab, etc.) on other computers when they are not being used by their owners. At this time (Apr. 2023), HTCondor comprises more than 1000 cores, which are ready to execute other users' programs when they are idle, i.e. you could get the equivalent of one month of serial execution in well under one hour! (This is the theoretical maximum, as not all HTCondor cores are always idle; a more realistic estimate would be an average of around 500 idle cores.)
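
The figures above can be checked with a back-of-the-envelope calculation. The sketch below is an idealized estimate only: it assumes that the work splits into independent jobs of similar length, that one month means 30 days, and that file transfers and scheduling cost nothing. The function name `ideal_wall_hours` is hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month of serial execution


def ideal_wall_hours(serial_hours: float, idle_cores: int) -> float:
    """Ideal elapsed time when serial_hours of work is spread over idle_cores.

    Assumes perfectly divisible, independent jobs and no transfer or
    scheduling overhead, so real runs will take longer.
    """
    if idle_cores < 1:
        raise ValueError("idle_cores must be at least 1")
    return serial_hours / idle_cores


if __name__ == "__main__":
    for cores in (1000, 500):
        hours = ideal_wall_hours(HOURS_PER_MONTH, cores)
        print(f"{cores} idle cores: {hours:.2f} hours ({hours * 60:.0f} minutes)")
```

Under these assumptions, one month of serial work takes 0.72 hours (about 43 minutes) on 1000 idle cores and 1.44 hours on the more realistic 500 idle cores.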

Please, visit the following links for more information:

"Burros" (High Performance Linux PCs)

Users who need to run CPU- or memory-intensive jobs that are unsuitable for their own PCs or for the IAC's other Supercomputing resources (like LaPalma, Teide-HPC, the HTCondor system, etc.) can access any of several high-performance Linux PCs. These machines are also suitable for developing, debugging and testing parallel applications before submitting them to other Supercomputers. They are open to any user and do not require advance reservation, but please follow these simple rules:

  1. Before running any job on them, please check their load (with uptime or htop): if it is higher than the number of cores, wait until it goes down before launching your application. Also check that the load does not exceed the number of cores after your program starts.
  2. If you are testing your parallel codes, check how many cores are being used and don't take up all the cores.
  3. These machines should be used only for developing or testing your parallel programs: if you need to run a parallel application for hours or days on a large number of cores, there are better alternatives, such as Teide-HPC or LaPalma (please contact us).
  4. Some of these machines have a huge amount of disk space (about 20 TB). Don't abuse it! There are no backups of your data on any of these machines, so don't use them as a storage system. Once your executions are done, remember to delete your data or move it elsewhere to make room for other users.
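
The load check in rule 1 can also be scripted. The following is a minimal sketch using only Python's standard library on Linux: `os.getloadavg()` returns the same load averages that uptime prints, and `os.cpu_count()` returns the number of logical cores. The function names and the `planned_cores` parameter are hypothetical, chosen here for illustration.

```python
import os


def load_is_acceptable(load_1min: float, cores: int, planned_cores: int = 0) -> bool:
    """Return True if the machine can take a job using planned_cores cores.

    Follows the rule above: the load (including the job about to be
    launched) should not exceed the number of cores.
    """
    return load_1min + planned_cores <= cores


def current_load_and_cores():
    """Read the 1-minute load average and the core count of this machine."""
    load_1min, _load_5min, _load_15min = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return load_1min, cores


if __name__ == "__main__":
    load, cores = current_load_and_cores()
    if load_is_acceptable(load, cores, planned_cores=4):
        print(f"Load {load:.2f} on {cores} cores: OK to launch a 4-core test.")
    else:
        print(f"Load {load:.2f} on {cores} cores: please wait.")
```

This complements uptime and htop rather than replacing them: the load average says nothing about memory or disk usage, so those still need to be checked by hand.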

Some of these machines are listed here:

Sorry, this list is only available when connecting from the IAC's internal network. Contact SIE for further details...

Other HPC Resources

More resources are available at other institutions and can be accessed by researchers at the IAC: