Next: Basic job submission Up: Hands-on session with Condor: Previous: Preliminary   Contents


Introduction

Condor is developed by the Condor Team at the University of Wisconsin-Madison (UW-Madison), and was first installed as a production system in the UW-Madison Computer Sciences department more than 10 years ago.

In a nutshell, Condor is a specialized batch system for managing compute-intensive jobs. Like most batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme, and resource classifications. Users submit their compute jobs to Condor, Condor puts the jobs in a queue, runs them, and then informs the user of the result.

Batch systems normally operate only with dedicated machines. Often termed compute servers, these dedicated machines are typically owned by one organization and dedicated to the sole purpose of running compute jobs. Condor can schedule jobs on dedicated machines, but unlike traditional batch systems, it is also designed to make effective use of non-dedicated machines. When told to run compute jobs only on machines that are currently not in use (no keyboard activity, low load average, no active telnet users, etc.), Condor can harness otherwise idle machines throughout a pool. This matters because the aggregate compute power of all the non-dedicated desktop workstations sitting on people's desks throughout an organization is often far greater than that of a dedicated central resource.

Getting to know the IAC Condor Pool

Before we run anything with Condor, we need to find out what resources are available in our pool. For this, we can use CondorView to view historical data, or condor_status to find out about the current state of our pool.

CondorView statistics

This is a very easy-to-use web application that lets you see, over time, how many machines were in our pool, how many were being used by Condor, who submitted jobs to the pool, etc.

At present the CondorView interface is at http://duraznero/, accessible through the IAC Condor page at http://research.iac.es/sieinvens/SINFIN/Condor/index.php.

The condor_status command

The concept of matchmaking: ads in Condor.

Before you learn how to submit a job, it is important to understand how Condor allocates resources. Condor simplifies job submission by acting as a matchmaker of ClassAds. Condor's ClassAds are analogous to the classified advertising section of the newspaper. Sellers advertise specifics about what they have to sell, hoping to attract a buyer. Buyers may advertise specifics about what they wish to purchase. Both buyers and sellers list constraints that need to be satisfied. In Condor, users submitting jobs can be thought of as buyers of compute resources and machine owners are sellers.

All machines in a Condor pool advertise their attributes, such as available RAM, CPU type and speed, virtual memory size and current load average, along with other static and dynamic properties. This machine ClassAd also advertises under what conditions it is willing to run a Condor job and what type of job it would prefer. You may advertise that your machine is only willing to run jobs at night and when there is no keyboard activity on your machine. In addition, you may advertise a preference (rank) for running jobs submitted by you or one of your co-workers.

Likewise, when submitting a job, you specify a ClassAd with your requirements and preferences. The ClassAd includes the type of machine you wish to use. For instance, perhaps you are looking for the fastest floating point performance available. You want Condor to rank available machines based upon floating point performance. Or, perhaps you care only that the machine has a minimum of 128 Mbytes of RAM.

Condor plays the role of a matchmaker by continuously reading all the job ClassAds and all the machine ClassAds, matching and ranking job ads with machine ads. Condor makes certain that all requirements in both ClassAds are satisfied.
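To make the two-way nature of matchmaking concrete, here is a toy model in Python. It is a sketch, not the real ClassAd language or Condor's negotiator: ads are plain dictionaries, and Requirements and Rank are Python functions standing in for ClassAd expressions. The attribute names (Memory, KFlops, KeyboardIdle, Owner) follow those found in Condor ads, but the functions, the 15-minute idle policy and the machines in the pool are hypothetical. The "only at night" condition mentioned above is left out to keep the example deterministic.

```python
# Toy model of Condor-style matchmaking: a sketch, NOT the real ClassAd language.
# Ads are plain dicts; "Requirements" and "Rank" are Python callables standing
# in for ClassAd expressions. Each takes (my, target): the ad itself and the
# candidate ad on the other side of the match.

IDLE_THRESHOLD_S = 15 * 60  # hypothetical owner policy: 15 minutes without keyboard use


def job_ad(owner, min_memory_mb):
    """Job ad: needs at least min_memory_mb of RAM, prefers higher KFlops."""
    return {
        "Owner": owner,
        "Requirements": lambda my, target: target["Memory"] >= min_memory_mb,
        "Rank": lambda my, target: target["KFlops"],
    }


def machine_ad(name, memory_mb, kflops, keyboard_idle_s):
    """Machine ad: advertises its resources and only accepts jobs when idle."""
    return {
        "Name": name,
        "Memory": memory_mb,              # MB of RAM
        "KFlops": kflops,                 # floating-point benchmark figure
        "KeyboardIdle": keyboard_idle_s,  # seconds since last keyboard activity
        "Requirements": lambda my, target: my["KeyboardIdle"] > IDLE_THRESHOLD_S,
    }


def matches(job, machine):
    """Two-way match: the Requirements of BOTH ads must be satisfied."""
    return job["Requirements"](job, machine) and machine["Requirements"](machine, job)


def best_machine(job, machines):
    """Among matching machines, return the one the job ranks highest (or None)."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["Rank"](job, m))


# Hypothetical pool: names and numbers are invented for illustration.
pool = [
    machine_ad("node-a", memory_mb=500, kflops=300_000, keyboard_idle_s=3_600),
    machine_ad("node-b", memory_mb=248, kflops=400_000, keyboard_idle_s=7_200),
    machine_ad("node-c", memory_mb=500, kflops=500_000, keyboard_idle_s=10),
]
job = job_ad("alice", min_memory_mb=256)
# prints node-a: node-b has too little RAM, node-c's keyboard is in use
print(best_machine(job, pool)["Name"])
```

Note that node-c would be the job's favourite (highest KFlops), yet it is never chosen: a match needs the machine's requirements to hold as well, which is exactly the "both buyers and sellers list constraints" idea described above.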

Inspecting Machine ClassAds with condor_status.

Once Condor is installed, you will get a feel for what a machine ClassAd does by trying the condor_status command.

naranja(67)~/Condor-Course/dagman1> condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

canistel.iac. LINUX       INTEL  Claimed    Suspended  0.800   500  0+00:00:04
codorniz.iac. LINUX       INTEL  Owner      Idle       5.000   500  0+19:25:20
correhuela.ia LINUX       INTEL  Claimed    Suspended  0.830  1005  0+00:00:04
drosera.iac.e LINUX       INTEL  Claimed    Suspended  0.830   248  0+00:00:04
paraguayo.iac LINUX       INTEL  Owner      Idle       0.000   500  0+00:50:04
resines.ll.ia LINUX       INTEL  Owner      Idle       3.030  1005  0+04:13:10
temple.ll.iac LINUX       INTEL  Owner      Idle       2.000   500  0+04:16:09
abeto.iac.es  SOLARIS29   SUN4u  Claimed    Suspended  0.420   256  0+00:02:00
aguila.iac.es SOLARIS29   SUN4u  Owner      Idle       0.050   640  0+01:18:55
ajedrea.iac.e SOLARIS29   SUN4u  Claimed    Busy       1.000   512  0+19:00:42
albatros.iac. SOLARIS29   SUN4u  Claimed    Suspended  0.090   640  0+00:00:04
anchoa.ll.iac SOLARIS29   SUN4u  Claimed    Busy       1.000   256  0+19:13:47
ansar.iac.es  SOLARIS29   SUN4u  Claimed    Busy       1.000   576  0+15:35:16
asno.iac.es   SOLARIS29   SUN4u  Claimed    Busy       1.020   256  0+01:00:53
avestruz.iac. SOLARIS29   SUN4u  Claimed    Busy       0.990   128  0+17:51:11
[...]

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        7     4       3         0       0          0
     SUN4u/SOLARIS29       94    26      68         0       0          0

               Total      101    30      71         0       0          0
naranja(68)~/Condor-Course/dagman1>
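The summary block at the end of the listing is simply a tally of the rows by platform (Arch/OpSys) and by State. As an illustration, the Python sketch below (plain text parsing, not a Condor tool) counts the rows that are visible above. For INTEL/LINUX it reproduces the 7 machines, 4 Owner and 3 Claimed of the summary; the SUN4u/SOLARIS29 counts come out much smaller (8 machines) because most of those rows were elided with [...].

```python
from collections import Counter, defaultdict

# The 15 rows visible in the condor_status listing above (the rest were elided).
LISTING = """\
canistel.iac. LINUX       INTEL  Claimed    Suspended  0.800   500  0+00:00:04
codorniz.iac. LINUX       INTEL  Owner      Idle       5.000   500  0+19:25:20
correhuela.ia LINUX       INTEL  Claimed    Suspended  0.830  1005  0+00:00:04
drosera.iac.e LINUX       INTEL  Claimed    Suspended  0.830   248  0+00:00:04
paraguayo.iac LINUX       INTEL  Owner      Idle       0.000   500  0+00:50:04
resines.ll.ia LINUX       INTEL  Owner      Idle       3.030  1005  0+04:13:10
temple.ll.iac LINUX       INTEL  Owner      Idle       2.000   500  0+04:16:09
abeto.iac.es  SOLARIS29   SUN4u  Claimed    Suspended  0.420   256  0+00:02:00
aguila.iac.es SOLARIS29   SUN4u  Owner      Idle       0.050   640  0+01:18:55
ajedrea.iac.e SOLARIS29   SUN4u  Claimed    Busy       1.000   512  0+19:00:42
albatros.iac. SOLARIS29   SUN4u  Claimed    Suspended  0.090   640  0+00:00:04
anchoa.ll.iac SOLARIS29   SUN4u  Claimed    Busy       1.000   256  0+19:13:47
ansar.iac.es  SOLARIS29   SUN4u  Claimed    Busy       1.000   576  0+15:35:16
asno.iac.es   SOLARIS29   SUN4u  Claimed    Busy       1.020   256  0+01:00:53
avestruz.iac. SOLARIS29   SUN4u  Claimed    Busy       0.990   128  0+17:51:11
"""


def summarize(listing):
    """Tally machines per Arch/OpSys and per State, like condor_status's summary."""
    summary = defaultdict(Counter)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        _name, opsys, arch, state = fields[:4]
        platform = f"{arch}/{opsys}"
        summary[platform]["Machines"] += 1
        summary[platform][state] += 1
    return summary


for platform, counts in summarize(LISTING).items():
    print(f"{platform:>16} Machines={counts['Machines']} "
          f"Owner={counts['Owner']} Claimed={counts['Claimed']}")
```

Splitting on whitespace only works because every column in this particular listing is a single token; note also that Claimed machines may be in different activities (Busy or Suspended here), which the summary does not distinguish.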

But there is much more to condor_status than this default listing. Here are some useful options of the condor_status command:

Some of the listed attributes are used by Condor for scheduling; others are purely informational. Importantly, any attribute in a machine ad can be used at job submission time as part of a requirement or preference on which machine to use. Additional attributes can also be added easily: for example, your site administrator can add a physical location attribute to your machine ClassAds.


Exercises

Refer to the condor_status command reference page in the Condor Manual to find out how to obtain the following information:

  1. A list of all the Linux machines available, sorted by their amount of memory.
  2. A list of the Java version installed on each Java-capable Solaris machine (printed in the format given below), using a single condor_status command:

    The machine toro.iac.es has Java Version: 1.4.1_01a
    The machine vibora.iac.es has Java Version: 1.4.1_01a
    The machine viola.iac.es has Java Version: 1.4.1_01a
    The machine zorro.ll.iac.es has Java Version: 1.4.1_01a
    [...]
    



Angel M de Vicente 2004-10-25