Condor is developed by the Condor Team at the University of Wisconsin-Madison (UW-Madison), and was first installed as a production system in the UW-Madison Computer Sciences department more than 10 years ago.
In a nutshell, Condor is a specialized batch system for managing compute-intensive jobs. Like most batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme, and resource classifications. Users submit their compute jobs to Condor, Condor puts the jobs in a queue, runs them, and then informs the user as to the result.
Batch systems normally operate only with dedicated machines. Often termed compute servers, these dedicated machines are typically owned by one organization and dedicated to the sole purpose of running compute jobs. Condor can schedule jobs on dedicated machines. But unlike traditional batch systems, Condor is also designed to make effective use of non-dedicated machines. When told to run compute jobs only on machines that are not currently being used (no keyboard activity, no load average, no active telnet users, etc.), Condor can harness otherwise idle machines throughout a pool. This is important because the combined compute power of all the non-dedicated desktop workstations sitting on people's desks throughout an organization is often far greater than the compute power of a dedicated central resource.
Before we run anything with Condor, we need to find out what resources are available in our pool. For this, we can use CondorView to view historical data, or condor_status to find out about the current state of our pool.
CondorView is a very easy-to-use web application that lets you see, over time, how many machines were in our pool, how many were being used by Condor, who submitted jobs to the pool, etc.
At present the CondorView interface is at http://duraznero/, accessible through the IAC Condor page at https://research.iac.es/sieinvens/SINFIN/Condor/index.php.
Before you learn how to submit a job, it is important to understand how Condor allocates resources. Condor simplifies job submission by acting as a matchmaker of ClassAds. Condor's ClassAds are analogous to the classified advertising section of the newspaper. Sellers advertise specifics about what they have to sell, hoping to attract a buyer. Buyers may advertise specifics about what they wish to purchase. Both buyers and sellers list constraints that need to be satisfied. In Condor, users submitting jobs can be thought of as buyers of compute resources and machine owners are sellers.
All machines in a Condor pool advertise their attributes, such as available RAM, CPU type and speed, virtual memory size, and current load average, along with other static and dynamic properties. This machine ClassAd also advertises under what conditions the machine is willing to run a Condor job and what type of job it would prefer. You may advertise that your machine is only willing to run jobs at night and when there is no keyboard activity on your machine. In addition, you may advertise a preference (rank) for running jobs submitted by you or one of your co-workers.
Likewise, when submitting a job, you specify a ClassAd with your requirements and preferences. The ClassAd includes the type of machine you wish to use. For instance, perhaps you are looking for the fastest floating point performance available. You want Condor to rank available machines based upon floating point performance. Or, perhaps you care only that the machine has a minimum of 128 Mbytes of RAM.
Condor plays the role of a matchmaker by continuously reading all the job ClassAds and all the machine ClassAds, matching and ranking job ads with machine ads. Condor makes certain that all requirements in both ClassAds are satisfied.
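The matchmaking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Condor is implemented: the machine names, attribute values, and the helper functions machine_accepts and match are all hypothetical, and real ClassAds are written in Condor's own expression language rather than as Python functions.

```python
# Illustrative sketch of ClassAd-style matchmaking; NOT Condor's implementation.
# Machine names, attribute values and helper names are hypothetical.

machines = [
    {"Name": "alpha", "Memory": 256, "KFlops": 90000, "KeyboardIdle": 1200},
    {"Name": "beta", "Memory": 640, "KFlops": 102016, "KeyboardIdle": 5000},
    {"Name": "gamma", "Memory": 64, "KFlops": 150000, "KeyboardIdle": 9000},
]

job = {
    # The job only accepts machines with at least 128 MB of RAM ...
    "Requirements": lambda machine: machine["Memory"] >= 128,
    # ... and prefers the best floating point performance.
    "Rank": lambda machine: machine["KFlops"],
}

def machine_accepts(machine, job):
    """Machine-side policy: accept jobs only after 15 minutes without keyboard activity."""
    return machine["KeyboardIdle"] > 15 * 60

def match(job, machines):
    """Keep machines where both sides' requirements hold, best job rank first."""
    candidates = [
        m for m in machines
        if job["Requirements"](m) and machine_accepts(m, job)
    ]
    return sorted(candidates, key=job["Rank"], reverse=True)

ranked = match(job, machines)
print([m["Name"] for m in ranked])
```

In this toy pool, gamma has the best floating point figure but is rejected because it fails the job's memory requirement; the remaining machines are ordered by the job's rank.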
Once Condor is installed, you will get a feel for what a machine ClassAd does by trying the condor_status command.
naranja(67)~/Condor-Course/dagman1> condor_status
Name          OpSys     Arch  State   Activity  LoadAv Mem  ActvtyTime
canistel.iac. LINUX     INTEL Claimed Suspended 0.800  500  0+00:00:04
codorniz.iac. LINUX     INTEL Owner   Idle      5.000  500  0+19:25:20
correhuela.ia LINUX     INTEL Claimed Suspended 0.830  1005 0+00:00:04
drosera.iac.e LINUX     INTEL Claimed Suspended 0.830  248  0+00:00:04
paraguayo.iac LINUX     INTEL Owner   Idle      0.000  500  0+00:50:04
resines.ll.ia LINUX     INTEL Owner   Idle      3.030  1005 0+04:13:10
temple.ll.iac LINUX     INTEL Owner   Idle      2.000  500  0+04:16:09
abeto.iac.es  SOLARIS29 SUN4u Claimed Suspended 0.420  256  0+00:02:00
aguila.iac.es SOLARIS29 SUN4u Owner   Idle      0.050  640  0+01:18:55
ajedrea.iac.e SOLARIS29 SUN4u Claimed Busy      1.000  512  0+19:00:42
albatros.iac. SOLARIS29 SUN4u Claimed Suspended 0.090  640  0+00:00:04
anchoa.ll.iac SOLARIS29 SUN4u Claimed Busy      1.000  256  0+19:13:47
ansar.iac.es  SOLARIS29 SUN4u Claimed Busy      1.000  576  0+15:35:16
asno.iac.es   SOLARIS29 SUN4u Claimed Busy      1.020  256  0+01:00:53
avestruz.iac. SOLARIS29 SUN4u Claimed Busy      0.990  128  0+17:51:11
[...]
                Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX            7     4       3         0       0          0
SUN4u/SOLARIS29       94    26      68         0       0          0
Total                101    30      71         0       0          0
naranja(68)~/Condor-Course/dagman1>
But there is much more to condor_status. Here are some useful options of the condor_status command:
condor_status -available
condor_status -run
condor_status -sort Memory
condor_status -l codorniz.iac.es
For example:
naranja(68)~/Condor-Course/dagman1> condor_status -l naranja.iac.es
MyType = "Machine"
TargetType = "Job"
Name = "naranja.iac.es"
Machine = "naranja.iac.es"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "codorniz"
CondorVersion = "$CondorVersion: 6.6.3 Mar 29 2004 $"
CondorPlatform = "$CondorPlatform: SUN4X-SOLARIS29 $"
VirtualMachineID = 1
ExecutableSize = 284
JobUniverse = 5
NiceUser = FALSE
ImageSize = 8304
VirtualMemory = 384888
Disk = 30672106
CondorLoadAvg = 0.940000
LoadAvg = 0.940000
KeyboardIdle = 1
ConsoleIdle = 60233
Memory = 640
Cpus = 1
StartdIpAddr = "<161.72.64.97:62302>"
Arch = "SUN4u"
OpSys = "SOLARIS29"
UidDomain = "iac.es"
FileSystemDomain = "iac.es"
Subnet = "161.72.64"
HasIOProxy = TRUE
TotalVirtualMemory = 384888
TotalDisk = 30672106
KFlops = 102016
Mips = 601
LastBenchmark = 1096346012
TotalLoadAvg = 0.940000
TotalCondorLoadAvg = 0.940000
ClockMin = 671
ClockDay = 2
TotalVirtualMachines = 1
HasFileTransfer = TRUE
HasMPI = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.4.1_01a"
JavaMFlops = 15.282580
HasJava = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasMPI,HasJICLocalConfig,HasJICLocalStdin,
HasJava,HasRemoteSyscalls,HasCheckpointing"
CpuBusyTime = 0
CpuIsBusy = FALSE
State = "Claimed"
EnteredCurrentState = 1096364649
Activity = "Suspended"
EnteredCurrentActivity = 1096366304
Start = ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 0.300000) ||
(State != "Unclaimed" && State != "Owner")))
Requirements = START
CurrentRank = 0.000000
RemoteUser = "plopez@iac.es"
RemoteOwner = "plopez@iac.es"
ClientMachine = "naranja.iac.es"
JobId = "3362.0"
JobStart = 1096364653
[...]
DaemonStartTime = 1096054048
UpdateSequenceNumber = 1087
MyAddress = "<161.72.64.97:62302>"
LastHeardFrom = 1096366308
UpdatesTotal = 247
UpdatesSequenced = 246
UpdatesLost = 3
UpdatesHistory = "0x04400000000000000000000000000000"
naranja(69)~/Condor-Course/dagman1>
Some of the listed attributes are used by Condor for scheduling. Other attributes are for information purposes. An important point is that any of the attributes in a machine ad can be utilized at job submission time as part of a request or preference on what machine to use. Additional attributes can be easily added. For example, your site administrator can add a physical location attribute to your machine ClassAds.
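To make the role of these attributes more concrete, here is a minimal Python sketch that parses a few "Attribute = value" lines taken from the listing above and then checks the machine's Start policy against them. The parser is a simplification that only handles literal values, and start_policy is a hand translation of the Start expression into Python. Both function names are hypothetical. Condor evaluates ClassAd expressions itself, so this only illustrates how the attributes feed into a policy.

```python
# Minimal sketch: parse "Attribute = value" lines as printed by condor_status -l.
# Simplified on purpose; Condor evaluates ClassAd expressions itself.

sample = '''\
Name = "naranja.iac.es"
LoadAvg = 0.940000
CondorLoadAvg = 0.940000
KeyboardIdle = 1
Memory = 640
State = "Claimed"
HasJava = TRUE
'''

def parse_value(text):
    """Convert a literal ClassAd value to a Python value; keep anything else as text."""
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    if text in ("TRUE", "FALSE"):
        return text == "TRUE"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # expressions such as Start stay unevaluated

def parse_ad(listing):
    """Build a dict from a long-format listing, one attribute per line."""
    ad = {}
    for line in listing.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = parse_value(value.strip())
    return ad

def start_policy(ad):
    """Hand translation of the Start expression shown in the listing above."""
    return ad["KeyboardIdle"] > 15 * 60 and (
        ad["LoadAvg"] - ad["CondorLoadAvg"] <= 0.3
        or ad["State"] not in ("Unclaimed", "Owner")
    )

ad = parse_ad(sample)
print(ad["Memory"], ad["State"], start_policy(ad))
```

With the values shown, the policy evaluates to false because the keyboard was touched one second before the listing was taken, well short of the 15 idle minutes the expression asks for.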
Refer to the condor_status command reference page in the Condor Manual to find out how to obtain the following information:
The machine toro.iac.es has Java Version: 1.4.1_01a
The machine vibora.iac.es has Java Version: 1.4.1_01a
The machine viola.iac.es has Java Version: 1.4.1_01a
The machine zorro.ll.iac.es has Java Version: 1.4.1_01a
[...]