HTCondor(3): Submit files (description and examples)
Please note that all the SIEpedia's articles address specific issues or questions raised by IAC users, so they do not attempt to be rigorous or exhaustive, and may or may not be useful or applicable in different or more general contexts.
IMPORTANT: This documentation is deprecated. It will not be further updated. The new documentation for HTCondor can be found here
HTCondor submit files (description and examples)
Introduction
To execute your application with HTCondor, you have to specify some parameters such as the name of your executable, its arguments, inputs and outputs, requirements, etc. This information is written in plain text, using submit commands, in a file called an HTCondor Submit Description File, or simply a submit file. Once that file is filled in with all the needed info, you submit it to HTCondor by running condor_submit in your terminal; it will then be processed and your jobs will be added to the queue to be executed.
Submit files changed considerably with the release of versions 8.4.X (the first version, 8.4.0, was released in September 2015; since February 2017 we have been using versions 8.6.X). Some operations were not possible, or highly painful, in previous versions (like dealing with an undetermined number of files with arbitrary names, declaring variables and macros and performing operations with them, including submission commands from other files, adding conditional statements, etc.). To work around that, many researchers developed external scripts (Perl, Python, bash, etc.) to dynamically create description files and submit them, which in most cases resulted in complex submissions and less efficient executions, not to mention that adapting those scripts when the application, arguments and/or I/O files changed usually required a lot of work.
With the addition of new, powerful and flexible commands, most of those problems have been solved, so there should be no need to use external scripts, and we highly recommend you always use an HTCondor submit description file instead of developing scripts in other languages. If you did that in the past, please consider migrating your old scripts; we will give you support if you find any problems.
In this section you will find templates and examples of HTCondor Submit Description Files. Use them as reference to create your own submit files and contact us if you have any doubt or issue. Topics:
- Creating a submit file (description and structure of submit files: comments, variables, commands, etc.)
- Templates and examples of submit files
- OLD examples
- Some more useful commands and info
Caution!: Before submitting your real jobs, always perform some simple tests to make sure that both your submit file and your program work properly: if you are going to submit hundreds of jobs and each job takes several hours to finish, first try with just a few jobs and change the input data so that they finish in minutes. Then check the results to see if everything went fine before submitting the real jobs. We also recommend you use condor_submit -dry-run to debug your jobs and make sure they will work as expected (see the useful commands page). Bear in mind that submitting untested files and/or jobs may cause a waste of time and resources if they fail, and your priority will also be lower in following submissions.
Creating a Submit File
As in many other languages, HTCondor submit files allow the use of comments, variables, macros, commands, etc. Here we will describe the most common ones; you can check the official documentation for complete and detailed information about submit files and the submission process.
Comments
HTCondor uses the symbol # for comments: everything found after that symbol will be ignored. Please do not mix commands and comments on the same line, since it may produce errors. We recommend you always write commands and comments on different lines.

# This is a valid comment
A = 4  # This may produce errors when expanding A, do not put comments and anything else on the same line!
Variables and macros
There are many predefined variables and macros in HTCondor that you can use, and you can define your own ones.
- To define a variable, just choose a valid name (names are case-insensitive) and assign a value to it, like N = 4 or Name = "example".
- To get the value of a variable, use the syntax $(varName); both the $ symbol and the parentheses () are mandatory.
- You can do basic operations with variables, like B = $(A) + 1, etc. (since version 8.4.0 there is no need to use the old and complex syntax $$([...]) for the operations). To get the expression evaluated, you may need to use function macros like $INT(B), $REAL(B), etc.
- There are several special automatic variables defined by HTCondor that will help you when creating your submit file. The most useful one is $(Process) or $(ProcId), which contains the process ID of each job (if you submit N jobs, the value of $(Process) will be 0 for the first job and N-1 in the last one). This variable works like an iteration counter and you can use it to specify different inputs, outputs, arguments, ... for each job. There are some other automatic variables, like $(Cluster) or $(ClusterId), which store the ID of each submission, $(Item), $(ItemIndex), $(Step), $(Row), etc. (see Example 1 for further information).
- There are several pre-defined Function Macros. Their syntax is $FunctName(varName) and they can perform some operations on the variable varName, like evaluating expressions and type conversions, selecting a value from a list according to an index, getting random numbers, string operations, filename processing, setting environment variables, etc. Before creating your own macros, check if HTCondor already has a pre-defined Function Macro for the same purpose.
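As a quick illustration, a minimal sketch of a submit-file fragment using a user-defined variable, an arithmetic expression and a function macro (the variable names and arguments are only illustrative):

# user-defined variables: names are case-insensitive
N    = 4
Name = "example"
# B holds an arithmetic expression; $INT(B) evaluates it when it is used
B = $(N) + 1
arguments = "-iter $INT(B) -job $(Process)"
# $(Process) runs from 0 to N-1, one value per job
queue $(N)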
Submit commands
You will need to add several HTCondor submit commands to your submit file in order to specify which executable you want to run and where it is located, its arguments if any, input files, which result files will be generated, etc. There are almost 200 different submit description file commands to cover many different scenarios, but in most situations you will only need to specify a few of them (usually about 10-15). Here we will present the most common ones (commands are case-insensitive):
- Mandatory commands:
  - executable: specify where your executable is located (you can use an absolute path, or a path relative to the directory where you do the submission or to another directory specified with initialdir). You should specify only the executable, not other things like arguments, etc.; there are specific commands for that. HTCondor will automatically copy the executable file from your machine to any machine where your job will be executed, so you do not need to worry about that.
  - queue: this command will send your job(s) to the queue, so it should be the last command in your submit file. In previous versions of HTCondor it was quite limited, only allowing the number of jobs as argument, but since version 8.4.0 this command is very powerful and flexible, and you can use it to specify variables, iterations over other commands, files to be processed, lists of arguments, etc. (see the complete syntax and examples).
- Highly recommended commands:
  - output: it will copy the standard output (stdout) printed on the screen of the remote machines when executing your program to the local file you specify here. Since all the jobs will use the same name, the filename should include some variable part that changes from job to job to avoid overwriting the same file, like $(Process) (and also $(Cluster) if you do not want different submissions to ruin your output files). Even if your program does not print any useful results on screen, it is highly recommended to save the screen output to check whether there were errors, debug them if any, etc.
  - error: the same as the previous command, but for the standard error output (stderr).
  - log: it will save a log of your submission that can later be analysed with HTCondor tools. This is very useful when there is any problem with your job(s), to find the problem and fix it. The log should be the same for all jobs submitted in the same cluster, so you should not use $(Process) in the filename (but including $(Cluster) is recommended).
  - universe: there are several runtime environments in HTCondor called universes; we will mostly use the one named vanilla since it is the easiest one. This is the default universe, so if you omit this command, your jobs will also go to the vanilla universe.
- Useful commands when working with inputs and outputs (arguments, files, keyboard, etc.):
  - arguments: it is used to specify options and flags for your executable, like when running it on the command line.
  - should_transfer_files: assign YES to it in order to activate the HTCondor file transfer system (needed when working with files).
  - when_to_transfer_output: it will usually have the value ON_EXIT to copy output files only when your job is finished, avoiding the copy of temporary or incomplete files if your job fails or is moved to another machine.
  - transfer_input_files: it is used to specify where the needed input files are located. We can use a comma-separated list of files (with absolute or relative paths, as mentioned for the executable command). The local path will be ignored, and HTCondor will copy all files to the root directory of a virtual location on the remote machine (your executable will also be copied to the same place, so input files will be in the same directory). If you specify a directory in this command, you can choose whether to copy only the content of the directory (add a slash "/" at the end, for instance myInputDir/) or the directory itself and its content (do not add a slash).
  - transfer_output_files: a comma-separated list of result files to be copied back to our machine. If this command is omitted, HTCondor will automatically copy all files that have been created or modified on the remote machine. Sometimes omitting this command is useful, but other times our program creates many temporary or useless files and we only want to get the ones we specify with this command.
  - transfer_output_remaps: it changes the name of the output files when copying them to your machine. That is useful when your executable generates result file(s) with the same name, so changing the filename to include a variable part (like $(Process) and maybe also $(Cluster)) will avoid overwriting them.
  - initialdir: this command is used to specify the base directory for input and output files, instead of the directory the submission was performed from. If this command includes a variable part (like $(Process)), you can use it to specify a different base directory for each job.
  - input: if your program needs some data from the keyboard, you can specify a file or a comma-separated list of files containing it (each end of line in the file will have the same behaviour as pressing the Enter key, like when using stdin redirection on the command line with <). As with other similar commands, you can use absolute or relative paths.
  - transfer_executable: its value is True by default, but if it is set to False, HTCondor will not copy the executable file to the remote machine(s). This is useful when the executable is a system command or a program that is installed on all machines, so there is no need to copy it.
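For instance, a minimal sketch combining these file-transfer commands (the paths and filenames are only illustrative):

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# the trailing slash copies only the content of inputs/, not the directory itself
transfer_input_files = params.cfg, inputs/
transfer_output_files = result.dat
# rename the result on our machine so different jobs do not overwrite it
transfer_output_remaps = "result.dat=result.$(Cluster).$(Process).dat"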
- Other useful commands:
  - request_memory, request_disk: if your program needs a certain amount of total RAM memory or free disk space, you can use these commands to force your jobs to be executed only on machines with at least the requested memory/free disk space [HowTo]
  - requirements: this is a very useful command if your program has any special needs. With it you can specify that your job can only be executed on some machines (or that some machines cannot run your program) according to a wide set of parameters (machine name, operating system and version, and a long etc.) [HowTo]
  - rank: you can specify some values or a combination of them (total memory, free disk space, MIPS, etc.) and HTCondor will choose the best machines for your jobs according to your specifications, where the higher the value, the better (this command is used to specify preferences, not requirements) [HowTo]
  - getenv: if it is set to True, all your environment variables will be copied at submission time and they will be available when your program is executed on the remote machines (if you do not use this command or it is set to False, then your jobs will have no environment variables). This is useful when running programs that need a special environment, like Python, etc. [HowTo]
  - nice_user: if it is set to True, your jobs will be executed as a fake user with very low priority, which can be very useful when the queue is (almost) empty, so you can run your jobs without wasting your real user priority (you can activate and deactivate this feature while your jobs are being executed, so you can begin running your jobs as nice user if the queue is empty and change to normal user when the queue has many other jobs, or vice versa) [HowTo]
  - concurrency_limits: you can limit the maximum number of your jobs that can be executed at the same time. You should use this command if your program needs licences and there are only a few of them (like IDL, see also this alternative), or if for any reason you cannot use the HTCondor file transfer system and all your jobs access the same shared resource (/scratch, /net/nas, etc.), in order to avoid that too many concurrent accesses stress the network [HowTo]
  - include: since HTCondor v8.4.0 it is possible to include externally defined submit commands using the syntax include : <myfile>. You can even include the output of external scripts that will be executed at submission time, adding a pipe symbol after the file: include : <myscript.sh> |
  - environment: this command allows you to set/unset/change any environment variable(s) [HowTo]
  - priority: if some of your jobs/clusters are more important than others and you want to execute them first, you can use the priority command to assign them a priority (the higher the value, the higher the priority). This command only has an effect on your own jobs, and it is not related to users' priority [HowTo]
  - job_machine_attrs, job_machine_attrs_history_length: use these commands to reduce the effects of black holes in HTCondor, which can cause many of your jobs to fail in a short time [HowTo]
  - noop_job: you specify a condition and those jobs that evaluate it to true will not be executed. This is useful when some of your jobs failed and you want to repeat only the failing jobs, not all of them [HowTo]
  - +PreCmd, +PreArguments, +PostCmd, +PostArguments: these commands allow you to run some scripts before and/or after your executable. That is useful to prepare, convert, decompress, etc. your inputs and outputs if needed, or to debug your executions [HowTo]
  - notify_user, notification: use these commands if you want to receive a notification (an email) when your jobs begin, fail and/or finish [HowTo]
  - if...elif...else...endif: since HTCondor version 8.4.0 a limited conditional semantics is available. You can use it to specify different commands or options depending on defined/undefined variables, the HTCondor version, etc.
  - on_exit_hold, on_exit_remove, periodic_hold, periodic_remove, periodic_release, etc.: you can modify the default behaviour of your jobs and their associated status. These commands can be used in a wide set of circumstances. For instance, you can force jobs that have been running for more than X minutes or hours to be deleted or put on hold (this way you can prevent failing jobs from running forever, since they will be stopped or deleted if they run for much longer than expected), or the opposite, hold those jobs that finish in an abnormally short time to check later what happened. You can also periodically release your held jobs, to run them on other machines if for any reason your jobs work fine on some machines but fail on others [HowTo]
  - deferral_time, deferral_window, deferral_prep_time: you can force your jobs to begin at a given date and time. That is useful when the input data is not ready at submission time and your jobs have to wait until a certain moment [HowTo]
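As a small illustration of some of these commands, a hedged sketch combining a conditional default, an external include and noop_job to re-run only two failed jobs (the filename, the number of jobs and the job IDs are only illustrative):

if !defined NJOBS
  NJOBS = 10
endif
include : common_settings.tmpl
# only jobs 3 and 7 will actually run; the rest evaluate the condition to True and become no-ops
noop_job = ($(Process) != 3) && ($(Process) != 7)
queue $(NJOBS)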
Templates and examples
Here you can find basic templates of submit files; you can use them as a starting point and then make the customizations needed for your executions. Check the examples in the following sections for details and explanations.
Common Template
######################################################
# HTCondor Submit Description File. COMMON TEMPLATE
# Next commands should be added to all your submit files
######################################################

if !defined FNAME
  FNAME = condor_exec
endif
ID = $(Cluster).$(Process)

output = $(FNAME).$(ID).out
error = $(FNAME).$(ID).err
log = $(FNAME).$(Cluster).log

universe = vanilla
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Explanation:
Let's analyse the common template:
- First block:
  - Here we define some variables that will be used later. The first of them is FNAME, and with the if !defined condition we first check whether that variable is already defined (if it is, we keep the previous value). This variable will contain the base name for the files where HTCondor will save the information displayed on the screen (stdout and stderr) and the log file. It is interesting to give a common name to those files generated by HTCondor, so that later we can identify and manage them together. Since all jobs will use the name specified there, we have to include a variable part that is different in each job, in order to avoid overwriting the files. We recommend you use a combination of $(Process) (it contains the process ID, which is different for each job) and $(Cluster) (it contains the cluster ID, which is different for each submission), as we have done when defining $(ID). In this way, different jobs and different submissions will use different filenames and none of them will be overwritten.
- Second block:
  - With the output command we force HTCondor to write to the specified file all the screen output (stdout) generated by each job. We have used the variables $(FNAME) and $(ID) defined above.
  - With the error command we manage stderr in the same way we did with output.
  - Then we have also specified an HTCondor log file with the log command. You should not use $(Process) in the filename of the log, since all jobs should share the same log.
- Third block:
  - universe: there are several runtime environments in HTCondor called universes; we will mostly use the one named vanilla since it is the easiest one. This is the default universe, so if you omit this command, your jobs will also go to the vanilla universe.
  - should_transfer_files = YES and when_to_transfer_output = ON_EXIT are used to specify that input files have to be copied to the remote machines and that output files must be copied back to your machine only when our program has finished. Although these commands are only needed when working with files, we recommend you always use them unless you are totally sure you can omit them.
Examples when working with input/output files and arguments
Most times you will want to run applications that deal with input and/or output files. Commonly, the input files will be located on your local machine, but since your application will be executed on other machine(s), your input files need to be copied there, and the result files copied back to your computer once your program is done. HTCondor has some commands to do both operations automatically and in an easy way, so you do not need to worry about the file transfers: you just need to specify where your files are and HTCondor will copy them.
Note: All these examples will begin by defining a specific variable FNAME that contains the base name of the files that HTCondor will generate to save the stdout, stderr and log. Next, the common template explained above will be included using the include command (we assume that the common template filename is condor_common.tmpl).
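Each example therefore starts with a preamble like the sketch below; note that FNAME is defined before the include, so the if !defined test in the common template keeps our value (condor_common.tmpl is the filename we assume for the template shown above):

FNAME = exampleA
include : condor_common.tmpl
# after the include, output/error/log, universe and the file transfer policy are already set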
Example A (arbitrary filenames)
# Including Common Template
FNAME = exampleA
include : condor_common.tmpl
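A hedged sketch of how the body of this submit file could look, using the queue ... matching files syntax (available since v8.4.0) so one job is created per input file, whatever its name; the *.in pattern, the executable name and the arguments are assumptions:

executable = myprogram
transfer_input_files = $(FILE)
arguments = "$(FILE)"
# one job per file matching the pattern; $(FILE) holds the matched filename in each job
queue FILE matching files *.in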
Example B (based on ProcessID, old system before HTCondor v8.4.0)
# Including Common Template
FNAME = example2
include : condor_common.tmpl
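A hedged sketch of a body based on the process ID, as was usual before HTCondor v8.4.0 (the number of jobs and the data<N>.in / data<N>.out naming are assumptions):

N = 100
executable = myprogram
transfer_input_files = data$(Process).in
transfer_output_files = data$(Process).out
arguments = "data$(Process).in data$(Process).out"
# $(Process) goes from 0 to N-1, selecting a different input file in each job
queue $(N)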
Example C (lists of files and arguments written in submit file)
# Including Common Template
FNAME = exampleC
include : condor_common.tmpl

Explanation: We will use the flexibility of the queue command: at submission time, HTCondor will iterate over the list of items and expand the assignments, so each job gets its own input file and arguments (the automatic variables $(Item), $(ItemIndex), etc. are also set for each job).
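A hedged sketch of such a body, with the items (an input file plus its arguments) written directly after the queue command; the filenames and arguments are only illustrative:

executable = myprogram
transfer_input_files = $(infile)
arguments = "$(infile) $(args)"
# one job per item; each line assigns infile and args (comma-separated fields)
queue infile,args from (
  obs_jan.dat, -level 3
  obs_feb.dat, -level 5
  obs_mar.dat, -level 1 -verbose
)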
When using this format you can specify as many variables separated by commas as needed. Writing the list of items in the submit file can be a little bit tedious, but it may be easily generated in an external file using scripts; then you can directly specify that file in the queue command, as shown in the next example.
Example D (lists of files and arguments in external file)
# Including Common Template
FNAME = exampleD
include : condor_common.tmpl

Explanation: This example is similar to the previous one, but this time the list of input files and arguments is written in a file with the following format:

input_file1,args1
input_file2,args2
input_file3,args3
...

To illustrate the slice feature, we have been asked to process only items (lines) from 28 to 43 with step 5 (28, 33, 38 and 43), which could be useful when we want to run only certain experiments. The syntax for the slices is very easy, the same as in Python. We also have to be careful with the results: our program always writes them to a file with a fixed name, so we need to rename the output of each job to avoid overwriting it.
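A hedged sketch for this case, assuming the items live in a file called exampleD_list.txt with one input_file,args pair per line; the Python-style slice [28:44:5] keeps items 28, 33, 38 and 43, and the output remap assumes the program always writes a file called data.out:

executable = myprogram
transfer_input_files = $(infile)
arguments = "$(infile) $(args)"
transfer_output_files = data.out
# $(ItemIndex) is the position of the item in the list, so each job gets its own output name
transfer_output_remaps = "data.out=data.$(ItemIndex).out"
queue infile,args from [28:44:5] exampleD_list.txt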
Example E
# Including Common Template
FNAME = exampleE
include : condor_common.tmpl
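Purely as a hypothetical illustration (the script name, the archive names and the number of jobs are all assumptions): if the required input files have to be prepared on the remote machine before the program starts, a small bash script can be transferred with the job and run through +PreCmd, for instance to decompress the inputs:

executable = myprogram
# the pre-script must be transferred explicitly, together with the data it will prepare
transfer_input_files = prepare_inputs.sh, data$(Process).tar.gz
+PreCmd = "prepare_inputs.sh"
+PreArguments = "data$(Process).tar.gz"
queue 10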
Example F (loops)
# Including Common Template
FNAME = exampleF
include : condor_common.tmpl

Explanation: In this example we only need to simulate 3 nested loops using the single 1-level counter that $(Process) provides. We only need to set the limits of each loop and derive the loop indices from $(Process) using integer division and the modulo operator. For a 2-level loop, you can use the next code:

I = ($(Process) / $(MAX_J))
J = ($(Process) % $(MAX_J))
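A hedged sketch of a complete 2-level loop built on the code above (the limits MAX_I/MAX_J, the arguments and the total of 20 jobs are illustrative; a 3-level loop extends the same idea with one more integer division):

MAX_I = 4
MAX_J = 5
I = ($(Process) / $(MAX_J))
J = ($(Process) % $(MAX_J))

executable = myprogram
# $INT() evaluates the expressions, so each job gets its own (I, J) pair
arguments = "-i $INT(I) -j $INT(J)"
# MAX_I * MAX_J = 20 combinations, one job per pair
queue 20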
Example G: This example shows the use of several useful commands for specific conditions. It is also a summary of the HOWTOs, you can find further details and explanation about the submit commands there
- Execute myprogram with the argument "-run", from 0 to 99 by default.
- BLOCK A: Execute only on machines with at least 4 GB of RAM and 2 GB of free disk space. The more memory and the faster the calculations, the better (we can use KFLOPS to choose the machines that are faster doing floating-point operations, but since memory and kflops have different units, we need to weight them, for instance multiplying memory by 200).
- BLOCK B: Execute only on machines with Linux Fedora 21 or higher, and avoid executing on cata, miel and those with a hostname beginning with the letter m or d.
- BLOCK C: The script processData.sh needs to be run before (argument: -decompress) and after (argument: -compress) the executable to prepare our data.
- BLOCK D: Our executable needs the environment variables, and the variable OUT has to be set according to the argument.
- BLOCK E: Avoid black holes (when your jobs do not execute correctly on a machine and, since they finish quickly, that machine ends up getting most of the jobs).
- BLOCK F: Get a notification via email when there are errors in the job. If the job finishes in less than 5 minutes or takes more than 2 hours, there was a problem: hold it to check later what happened.
- BLOCK G: Our program needs licences, so we cannot run more than 20 jobs at the same time. Execute the jobs as nice user to save priority, since there are no other jobs running at this moment.

# Including Common Template
FNAME = exampleG
include : condor_common.tmpl
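A hedged sketch of how the blocks described above might be written after the preamble; the machine attribute names, the e-mail address, the concurrency limit name and the time thresholds are illustrative (check the HowTo pages for the exact recipes):

executable = myprogram
arguments = "-run $(Process)"
if !defined N
  N = 100
endif

# BLOCK A: at least 4 GB of RAM and 2 GB of free disk; prefer machines with more memory and faster FP
request_memory = 4 GB
request_disk = 2 GB
rank = (Memory * 200) + KFlops

# BLOCK B: Fedora 21 or higher, excluding cata, miel and hosts whose name starts with "m" or "d"
requirements = (OpSysName == "Fedora") && (OpSysMajorVer >= 21) && (UtsnameNodename != "cata") && (UtsnameNodename != "miel") && (substr(UtsnameNodename,0,1) != "m") && (substr(UtsnameNodename,0,1) != "d")

# BLOCK C: prepare the data before and after the run (the helper script must be transferred too)
transfer_input_files = processData.sh
+PreCmd = "processData.sh"
+PreArguments = "-decompress"
+PostCmd = "processData.sh"
+PostArguments = "-compress"

# BLOCK D: pass our environment and set OUT for each job (the value is an assumption)
getenv = True
environment = "OUT=run$(Process)"

# BLOCK E: remember the last machines used so failing ones do not swallow all the jobs
job_machine_attrs = Machine
job_machine_attrs_history_length = 5

# BLOCK F: e-mail on error; hold jobs that finish in under 5 minutes or run for more than 2 hours
notification = Error
notify_user = your_user@example.org
on_exit_hold = (time() - JobStartDate) < 300
periodic_hold = (JobStatus == 2) && ((time() - JobCurrentStartDate) > 7200)

# BLOCK G: licence-limited concurrency (the limit name and how it maps to 20 jobs depend on the pool configuration) and nice user
concurrency_limits = idl
nice_user = True

queue $(N)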
IMPORTANT: Although your program could use shared locations (/net/XXXX/scratch
, /net/nasX
, etc.) to read/write files from any machine so there is no need to copy files, we highly recommend you always use the HTCondor file transfer system to avoid network congestion since files will be accessed locally on the remote machines. Bear in mind that HTCondor can execute hundreds of your jobs at the same time, and if all of them concurrently access to the same shared location, network could experience a huge stress and fail. If for any reason you cannot copy files and you have to use shared locations -you are using huge files of several GB, etc.-, then contact us before submitting to adapt your jobs in order to avoid network congestion.
Submit file HowTo
NOTE: Submit File HOWTOs have been moved to their own page: HTCondor(4): Submit File (HowTo)
- How to ... add requirements on the target machines where my jobs will be run?
- How to ... add preferences on the target machines where my jobs will be run?
- How to ... get/set environment variables?
- How to ... control HTCondor notifications?
- How to ... run some shell commands/scripts/programs before/after our application?
- How to ... specify the priority of your jobs?
- How to ... deal with jobs that fail?
- How to ... limit the number of concurrent running jobs?
- How to ... do some complex operations in my submit file?
- How to ... work with nested loops?
- How to ... program my jobs to begin at a predefined time?
- How to ... run jobs that have dependencies among them?
- How to ... know the attributes of the machines where our jobs are run?
OLD Examples
This section presents several examples of submit files, from very basic examples to more complex ones, step by step. These examples were created for previous versions of HTCondor and since version 8.4.0 there are easier and more flexible ways to get the same results in most cases. However, we have left these old examples here since they may help you, but bear in mind that they may be obsolete.
- Example 1. Our first submit file: executable and arguments
- Example 2. Adding simple inputs and outputs: stdin, stdout and stderr
- Example 3. Simple examples including input and output files
- Example 4. A more complex example, step by step
- Example 5. Working with more complex loops and macros
These examples will cover the most common cases based on our experience with IAC's users. If you want complete documentation, you can run man condor_submit in your shell, visit the condor_submit page in the reference manual and/or the Submitting a Job section. Some more examples of submit description files are also available at the HTCondor site.
Example 1. Our first submit file: executable and arguments ^ Top
The first thing you have to specify is the executable of the application to run and its arguments, and then launch the jobs. For that purpose we will use the executable, arguments and queue commands, respectively (note that commands are case-insensitive). If your application is located in a private directory that is not accessible to other users and/or from other machines, then you need to add the should_transfer_files command and HTCondor will copy your application to the machines where it will be run.

In our first example we have developed an application called "myprogram", located in the same directory where we are going to do the submission. We want to run it with 2 different sets of arguments: -c -v 453 and -g 212. Then our submit file will be the following one:
universe = vanilla
should_transfer_files = YES

executable = myprogram
arguments = "-c -v 453"
queue

arguments = "-g 212"
queue
We will explain here why we use each of these commands:
- universe: there are several runtime environments in HTCondor; we will mostly use the one named vanilla since it is the easiest one. This is the default universe, so if you omit this command, your jobs will also go to the vanilla universe.
- should_transfer_files: use it with the value YES to specify that your files are not accessible from the remote machines and should be copied to them.
- executable: specify the name and path of your executable. The path can be absolute or relative (to the directory in which the condor_submit command is run). HTCondor will copy the executable to each machine where your job(s) will be run.
- arguments: specify the parameters of your application. There is an old syntax, but it is recommended to use the new one, enclosed in double quote marks. If you need to specify complex arguments including single or double quote marks, check the new syntax for the argument list in the HTCondor documentation.
- queue: place one job into the HTCondor queue, or N jobs if you use queue <N>.
Save this file (for example, call it myprogram.submit) and do the submission in the same directory where your program is located:

[...]$ condor_submit myprogram.submit
That is all: your jobs will be added to the HTCondor queue, and you can check them by running condor_q.
Example 2. Adding simple inputs and outputs: stdin, stdout and stderr ^ Top
Now we will deal with inputs and outputs. Let's configure three HTCondor jobs to print "Hello World!" and the ID of each job. We will use the OS command echo, so the output will be printed on stdout (the screen), but since we cannot access the screen of the other machines running the jobs, we need a way to save these outputs to files. Of course, each job should write to a different file, and it may be interesting to store them in a separate directory, for instance an existing one called output_files. We may also want to see any errors (from stderr) and save a log file. The resulting HTCondor submit file could be the next one:
# First block
N = 3
universe = vanilla
should_transfer_files = YES
initialdir = /path/to/output_files
input =
output = echo_example.$(Cluster).$(Process).out
error = echo_example.$(Cluster).$(Process).err
log = echo_example.$(Cluster).log

# Second block
executable = /bin/echo
transfer_executable = False
arguments = "Hello World, I am job: $(Process)!"
queue $(N)
Let's analyze this example:
- First block:
  - The first line contains a macro declaration, N = 3, so from that point we can use that macro by writing $(N) (you must use the parentheses, $N is NOT valid).
  - The should_transfer_files = YES command is used to specify that files should be copied to/from the remote machines.
  - Then with initialdir we specify the path to the input and output files (not the executable); it can be an absolute path or relative (to the directory in which the condor_submit command is run). If your files are in the same directory where you are doing the submission, then you do not need to use this command.
  - The input command is empty since we do not need it in this example. But if you run your program in this way: myprogram < data.in, then you should add the command input = data.in.
  - With the output command we force HTCondor to write to the specified file all the screen output (stdout). Note that to avoid all jobs writing to the same file, we have used the $(Cluster) macro (an ID of each submission) and the $(Process) macro (an ID given to each job, from 0 to N-1).
  - With the error command we manage stderr in the same way we did with output.
  - Then we have also specified a log file with the log command.
- Second block:
  - We specify the name of the application using the executable command (we set it to /bin/echo).
  - Since the executable is an OS command available on every machine, HTCondor does not need to copy it to each machine, so we have used transfer_executable = False to avoid that.
  - The arguments command specifies the arguments of your program. We have used the predefined $(Process) macro so each job will print its own ID. This can also be used as a counter or loop variable in your arguments.
  - At the end we send N jobs to the queue using the queue <N> command.
If we save the submit file with the name echo.submit and send it to the queue using condor_submit echo.submit (let's suppose it gets Cluster ID 325), the result should be something like the following, assuming we are located in the directory where we did the submission:

./echo.submit
/path/to/output_files/echo_example.325.0.out  # (content: Hello World, I am job: 0!)
/path/to/output_files/echo_example.325.1.out  # (content: Hello World, I am job: 1!)
/path/to/output_files/echo_example.325.2.out  # (content: Hello World, I am job: 2!)
/path/to/output_files/echo_example.325.0.err  # (content: empty if no errors)
/path/to/output_files/echo_example.325.1.err  # (content: empty if no errors)
/path/to/output_files/echo_example.325.2.err  # (content: empty if no errors)
/path/to/output_files/echo_example.325.log    # (content: info about the jobs' execution)
HTCondor is mainly designed to run batch programs, which usually have no interaction with users, but if your program needs any input from stdin (i.e. the keyboard), you can specify it by writing all the inputs in a file and then using the input command to indicate that file, with the same syntax as the output command.
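For instance, a minimal sketch (the filenames are illustrative), assuming a program called myprogram that reads its parameters from the keyboard:

executable = myprogram
# every line of answers.txt is sent to the program as if it had been typed on the keyboard
input = answers.txt
queue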
Example 3. Simple examples including input and output files ^ Top
Now we know how to specify standard inputs and outputs, let's see how we can deal with input and output files. We will study two different situations to see how we can solve each one, depending on whether our executable accepts arguments for input/output files or not.
Example 3A. We can specify our input/output files as arguments ^ Top
Suppose that we have developed an application called myprogram
that needs two arguments, the first one is the name of the input file and the second one is the name of the output file that will be generated. We usually run this application in the following way:
./myprogram /path/to/input/data.in data.out
We have 300 different input data files named data0.in
, data1.in
, data2.in
, ..., data299.in
and we want to use HTCondor to execute them (each job will process a different input file). Then we just need to write the next submit file to execute jobs in HTCondor:
N = 300
ID = $(Cluster).$(Process)
FNAME = example3A

output = $(FNAME).$(ID).out
error = $(FNAME).$(ID).err
log = $(FNAME).$(Cluster).log

universe = vanilla
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /path/to/input/data$(Process).in
transfer_output_files = data$(Process).out

executable = myprogram
arguments = "data$(Process).in data$(Process).out"
queue $(N)
This submit file is similar to previous examples. We have defined some useful macros (ID
and FNAME
) to avoid writing the same text several times, and we have also used some new commands like transfer_input_files
to specify input files and transfer_output_files
for the output files (if you need to specify several input and/or output files, use a comma separated list). Remember we have to activate the HTCondor copying files mechanism using should_transfer_files
command, and we have also used when_to_transfer_output
to tell HTCondor that it should only copy the output files when our program is finished. If you do not use transfer_output_files
command, then HTCondor will copy all generated or modified files located in the same directory where your application was executed (see this FAQ for more info).
You do not need to deal with copying files, HTCondor will copy the input files from the specified location on your machine to the same directory where your program will be executed on the remote machine (that is why we have used no path for the input file in the arguments
command, since that file will be in the same place as the executable). Once your program is finished, HTCondor will copy the output file from the remote machine to yours and it will be located in the same directory where you did the submission (remember you can change this behaviour with initialdir
command).
In this example we have supposed that the input files have convenient names, containing a known pattern that includes a consecutive number from 0 to N-1. This is the easiest situation, and although it is not strictly needed to rename your input files, we recommend you change the filenames to make it much easier to specify them using HTCondor commands. There are several simple ways to rename your files, like using the rename Linux command, a bash script, etc. For instance, if your input files have different names but all of them have the .in extension, then the next simple bash script will do the work, renaming all of them so that the result will be data0.in, data1.in, data2.in, ..., data299.in following alphabetical order (you can modify it to use your own criteria, save the equivalence between old and new names, etc.):
#!/bin/bash
n=0
cd /path/to/input/
for file in *.in
do
    mv $file data$n.in
    n=$((n+1))
done
Example 3B. We cannot specify arguments ^ Top
Sometimes our executable does not accept arguments and it needs to find some specific files. For instance, suppose that our application myprogram
needs to find an input file called data.in
in the same directory where it will be executed and then it will produce an output file called data.out
, also in the same directory. Again, we will also assume that we have all our input files in /path/to/input/
, so we have to prepare them. Since all the files must have the same name, we cannot use the same directory, so we are going to create directories with names input0
, input1
, input2
, ..., input299
and each of these directory will contain the pertinent data.in
file. To do that, we can use a bash script like the next one:
#!/bin/bash
n=0
cd /path/to/input/
for file in *.in
do
    mkdir input$n
    mv $file input$n/data.in
    echo "$file -> input$n/data.in" >> file_map.txt
    n=$((n+1))
done
The last script simply creates a new directory and moves the input file into it, renaming it data.in. We have also added an extra line to create a file called file_map.txt that will contain a list with the original and the new name and location of each file, which could be useful to identify the outputs later. Now we need to write the submit file:
N = 300
ID = $(Cluster).$(Process)
FNAME = example3B

output = $(FNAME).$(ID).out
error = $(FNAME).$(ID).err
log = $(FNAME).$(Cluster).log

universe = vanilla
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /path/to/input/input$(Process)/data.in
transfer_output_files = data.out
transfer_output_remaps = "data.out=data$(ID).out"

executable = myprogram
arguments = ""
queue $(N)
We have introduced a few changes in the submit file. Now we use transfer_input_files to choose the proper data.in file according to the directory of each job. The output files will be copied to the same directory where the submission is done, and since all of them will have the same name, we need to prevent them from being overwritten by using the transfer_output_remaps command: with it we rename all output files to include the ID.

Sometimes we want the output files to be located in the same directory as the related input file. In that case, since the output files will be in different directories, there is no need to change their names. In these situations, we can remove the transfer_output_remaps command and instead use the initialdir command to specify that HTCondor should use a different directory for both input and output files in each execution (this will not affect the executable file):
initialdir = /path/to/input/input$(Process)
transfer_input_files = data.in
transfer_output_files = data.out
Note: Using known patterns and consecutive numbers as filenames makes it very easy to specify input and output files in HTCondor, and you only need simple Linux commands and/or bash scripts to rename those files (always keep a backup of your original files!). However, there are other ways to work with HTCondor if for any reason you do not want to, or cannot, change the names of your files.
Also remember that if you specify directories with transfer_input_files
and transfer_output_files
and they finish with a slash ("/
"), HTCondor will copy the content of the directories, but not the directory itself. That can be used to copy input or output files without knowing their names, we only need to place them in a pertinent directory structure, using a bash script like that presented in example 3B (but without changing the name of the files). Also if your application is able to use the stdin
to get the name of the files, you can write those names in another file with a known pattern and then specify that file using a HTCondor input
command.
You can also add to your submit file some more commands that can be very useful when dealing with input and output files. For instance, the +PreCmd and +PostCmd commands allow you to run scripts or shell commands before and after executing your program, respectively, so you can use them to rename or change the location of your input and output files, or for any other operation you may need. You have more information about these commands in the Submit File (HowTo) section.
Example 4. A more complex example, step by step ^ Top
This example should be enough to run HTCondor jobs in most common situations. In this example, assume that we have an application called myprogram
that accepts two arguments: the first one is the input file to be processed, where each line is a set of values that can be independently computed. The second argument is the name of the output file that will be created with the results.
In our example, we have a huge input file with several thousands of lines, called data.in
and it takes quite a long time to be computed (several days), so we will use HTCondor to reduce this amount of time. What we are going to do is to split the huge input file in N
smaller files with names data0.in
, data1.in
, ..., data(N-1).in
and create a HTCondor job to process each one.
The first step is to decide how many files we will create. Since each file will be a HTCondor job, this is a critical step, we have to make our decision according to next criteria:
- We should create a relatively large number of jobs, at least a few hundred of them. If we split our input in just 2 files, that means there will be only 2 jobs to be executed by HTCondor, so the maximum speedup we could get is 2 (our results will be ready in half the time compared to a normal serial execution). But if we generate 100 jobs, then we could get a time reduction factor of 100x, or 500x if we generate 500 jobs... Of course, this is always a theoretical limit and it is almost impossible to reach it (all jobs have several overheads, there will probably be more users running jobs with HTCondor, the number of idle machines is always changing, your jobs could be evicted and restarted later, etc.), but generating a large number of jobs will increase your chances of getting your results in less time. If you are wondering how much speedup you can get, on average HTCondor has around 350 idle slots during working hours, but at night or on weekends there can be peaks of about 600 idle slots. Anyway, you can generate as many jobs as you want, even several thousands of them; HTCondor will manage them and run your jobs when slots become idle. A large number of short jobs can be more efficient than a small number of long ones, but also bear in mind that transferring input and output files consumes resources and time: if your jobs need HTCondor to transfer many/large files to/from remote machines, then you may need to significantly reduce the number of jobs to avoid overloading the network and also to decrease the total time consumed by those file transfers.
- Most times the number of jobs has to be chosen according to the estimation of the time a job needs to be processed. We should not choose jobs that only last few seconds/minutes, because executing a job has an overhead (communications, creating the execution environment, transferring files, etc.), so if your job is too short, it could happen that this overhead takes more time than executing your program. On the other hand, if your jobs need several hours to be finished, it is likely they will be suspended/killed and restarted from the beginning many times, so the final consumed time could be really high. There is not a fixed rule about the duration of your jobs and sometimes you cannot choose it... But if you can choose, a job that needs from 10 to 30 minutes to be done should be fine (the bigger the files you need to transfer, the larger the jobs should be to reduce the total number of jobs and, therefore, the amount of file transfers). When possible, avoid those large jobs that need more than one hour to be processed, unless heavy file transfers are involved (if files are so big, consider using a share location like
scratch
instead of copying them to all remote machines, and then add a limit to the number of concurrent jobs).
For instance, our original data.in file has 97564 lines, and we will try to follow these recommendations when splitting it. Before choosing the number of jobs, we need to run some tests to estimate how much time our program needs to process different inputs. For example, suppose we have already done those tests and, on average, our program needs about 4 seconds per line, so it can process 250 lines in around 17 minutes. If we split our huge file into smaller ones of 250 lines each, then we will have 391 files. That means 391 jobs will be generated, which is a good amount. Since we just need to transfer one input file and one output file per job, and their sizes will be just a few KB, this time we do not need to worry about the overhead of file transfers. If we are really lucky and HTCondor is able to immediately execute all our jobs at the same time, then we could get our results in about 17 minutes. That will almost surely not happen, and we may need to wait some more minutes or hours, but we will get our results much faster than with a serial execution that needs 97564 * 4 seconds to be processed, almost 5 days.
So, we have finally chosen N = 391. The next step is to split our file, which can easily be done with Linux commands like split or awk. For example, see the next command:
awk '{filename = "A" int((NR-1)/B) "C"; print >> filename}' D

where A is the prefix of the output files, B is the number of lines per file, C is the postfix of the output files and D is the input file. When used, this command will split the input file (D) into files of B lines each, named A0C, A1C, A2C, ...
Then we will use that command in the next way: A = data, B = 250, C = .in and D = data.in:

[...]$ awk '{filename = "data" int((NR-1)/250) ".in"; print >> filename}' data.in
After executing the previous command, we will have 391 files of 250 lines each (except the last one), from data0.in to data390.in, which means we are going to execute 391 jobs. We will also name our output files in the same way: data0.out, data1.out, ..., data390.out. At this point we are ready to create our submission file; we only need to specify what the executable is, the arguments, the inputs and outputs, and where to find them.
If for any reason you want to include a header in every file, you can use the next command:

[...]$ sed -i '1i Write your header here...' data*.in
To process all files we need to change the arguments in each execution. We could explicitly do that writing N
times the proper argument
and queue
commands in the submit file, but this is a very awful way to solve the problem, besides other factors. A much simpler (and elegant) way is to use a loop, from 0 to 390 (N - 1
), to generate all the arguments. To simulate this loop, we could try to write an script (for instance, a bash script) in order to generate N
submit files where each one has the correct arguments, but again this is not the best solution: managing 391 HTCondor submit files is bothersome and, even worse, efficiency will be reduced: every time you do a submission, HTCondor will create a Cluster for that execution, what involves an overhead, so we should try to create only one cluster with N
jobs rather than N
clusters with only one job each. To solve this problem, HTCondor offers us a simple way to process this loop: we can use the $(Process)
macro, so each job will have a different value from 0
to N-1
. Then, the HTCondor submit file should be similar to the following one:
# Set number of jobs to execute
N = 391
ID = $(Cluster).$(Process)

output = myprogram.$(ID).out
error = myprogram.$(ID).err
log = myprogram.$(Cluster).log

universe = vanilla
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = data$(Process).in
transfer_output_files = data$(Process).out

executable = myprogram
arguments = "data$(Process).in data$(Process).out"
queue $(N)
The final submit file shown above is very simple and easy to understand. The first blocks were explained in the previous example, we just defined a new macro called ID
to make some commands shorter. Then, should_transfer_files
command is again used to force the file transfers and we have added a when_to_transfer_output
command to tell HTCondor that the files should be transferred after completion.
The key of this example is the transfer_input_files
and transfer_output_files
commands. With these two commands we tell HTCondor which files have to be copied to the remote machine before executing the program and which files have to be copied back to the machine where the submission was done as results. Before queueing the jobs, we use the arguments
command to specify the name of the input file (first argument) and the output file (second argument).
And that is all: HTCondor will expand $(Process)
macro in every job, so it will copy the file data0.in
to the remote machine where job number 0
will be executed with arguments "data0.in data0.out
" and, afterwards, will copy data0.out
back to the submit machine, and so on with all remaining jobs till N - 1
.
Some remarks to this example:
- NOTE 1: We are supposing that our inputs and outputs are not in a shared directory, so they will not be accessible from the other machines where your jobs will be run. It might be possible to solve this by changing your application and using shared locations, like those in /net/<your_machine>/scratch/..., but this solution is highly discouraged, especially if you are using big files or many of them and your application is constantly accessing them to perform read/write operations. If you do so, a large number of concurrent accesses may produce locks and a considerable slowdown in your and others' computers' performance. To avoid that, it is a much better idea to copy your input files to the target machine where your job will be run and then bring the results back to your machine. You do not need to take care of this copying process, HTCondor will do all the work for you; the only thing you need to do is use the HTCondor commands transfer_input_files and transfer_output_files to specify where the files and directories to be copied are located. If you cannot avoid intensive accesses to your files located in shared resources like scratch, then consider the possibility of limiting your concurrent running jobs.
- NOTE 2: We are assuming here that all inputs and outputs are located in the same directory where the submission will be done. If that is not true, we can specify an absolute or relative path (to the submission directory) in the transfer_input_files command, or use the initialdir command as explained in the previous example, affecting both input and output files. Remember that when using transfer_input_files or transfer_output_files you can also specify a directory to be copied to the remote machine. If you specify a long path, HTCondor will not create it all, just the last level (if you want to copy only the content and not the directory itself, add a slash at the end of the directory). For instance, suppose that the data_inputs directory only contains a file called data1.dat:
Command                                                       | Exec dir @ remote machine
transfer_input_files = /path/to/inputs/data_inputs/data1.dat | data1.dat
transfer_input_files = /path/to/inputs/data_inputs           | data_inputs (and its content)
transfer_input_files = /path/to/inputs/data_inputs/          | data1.dat

You can use the tree command to check where files and directories will be placed when executing.
- NOTE 3: Another assumption is that we can specify arguments to our executable. That is not always true: it could happen that the executable expects to find files with predefined names, for example data.in as input, and that it will generate data.out as output. If we cannot change this behaviour (for instance, we do not have access to the source code), we need to make some small modifications. The first step is to change our awk script for splitting files in order to place every resulting file in a different directory (dataXX/), but with the same name (data.in), so our inputs will be located in data0/data.in, data1/data.in, ..., data390/data.in. Then, we will add the next commands to the submit file (the following lines should be placed before the queue command):
initialdir = data$(Process)
arguments = ""
With the initialdir command we are specifying that HTCondor has to search for the inputs in that directory (which will be different for each job), and the output files will also be placed in that directory. For instance, the job with ID 34 will transfer the input file located in data34/data.in and, after the execution, it will place the output file in data34/data.out.
We can obtain a similar result removing the initialdir command and changing our submit file with the next commands:
transfer_input_files = data$(Process)/data.in
transfer_output_files = data.out
transfer_output_remaps = "data.out=data$(Process).out"
arguments = ""
With the transfer_input_files command we specify that every data.in has to be copied from the proper directory. Then we use transfer_output_files to copy back the output file, but since all the output files will have the same name, we need to use transfer_output_remaps to change the name and avoid all jobs overwriting the same file, so they will be renamed to data0.out, data1.out, ..., data390.out (this command ONLY works with files, NOT with directories). Finally, we do not specify any arguments since the names of the files are those expected by the executable.
Remember that you can also use the +PreCmd and/or +PostCmd commands to run shell commands/scripts/programs before and/or after your main executable, so you can use these commands to rename or move your input and output files. See the Submit File (HowTo) section for more information.
- NOTE 4: If we want to change the number of lines per file, we do not need to change the submit file. For instance, if we now want files with 350 lines, after running the awk command we will have 279 input files and N = 279. Then we can use the same submit file and change the value of N when doing the submission using the -append option, which allows us to change the value of existing macros or define new ones:
[...]$ condor_submit myprogram.submit -append 'N = 279'
Example 5. Working with more complex loops and macros ^ Top
After studying simple loops where we directly use the $(Process)
macro from 0
to N -1
, we will see some more complex situations where we need to do some operations with macros. Now assume that we have developed an application called myprogram
that needs the following inputs:
- We have to specify the next arguments, -init XX -end YY:
  - First job (ID: 0): -init 0 -end 99
  - Second job (ID: 1): -init 100 -end 199
  - ...
  - Last job (ID: N-1): -init [(N-1)*100] -end [(N*100)-1]
- The application expects to find the following files and directories located in the same directory where it will run, although right now they are in different locations:
  - a common file (it does not depend on the arguments) called data.in, located in /path/to/inputs/data.in
  - all the files located inside the /path/to/inputs/data_inputs directory
  - a specific directory called specific-XXX/ (where XXX is the value of the -init argument), located in /path/to/inputs/specific-XXX/
With these inputs, our program will produce the next outputs in the same directory where it was executed:
- A file called data.out
- A directory called data_outputs-XXX (where XXX is the value of the -init argument) with many files inside
We will present the HTCondor submit file for this situation and it will be discussed right after:
# Set number of jobs to execute
N = 50
ID = $(Cluster).$(Process)

output = myprogram.$(ID).out
error = myprogram.$(ID).err
log = myprogram.$(Cluster).log

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
universe = vanilla

# Step in arguments
STEP = 100
init = $$([$(Process) * $(STEP)])
end = $$([(($(Process) + 1) * $(STEP)) - 1])

BDIR = /path/to/inputs
+TransferInput = "$(BDIR)/data.in, $(BDIR)/data_inputs/, $(BDIR)/specific-$(init)"
+TransferOutput = "data.out, data_outputs-$(init)"
transfer_output_remaps = "data.out=data-$(init).out"

executable = myprogram
arguments = "-init $(init) -end $(end)"
queue $(N)
Let's skip the first and second blocks, since we have explained those commands in previous examples (we have just set N = 50 in this example; we can change this value when submitting if we use the -append option). In the third block we have used a special syntax, $$([...]), to define the macros init and end. With this syntax we specify that we want the macro to be evaluated, allowing arithmetic operators like *, /, +, -, %, ... If you need complex macros, there are a number of operators, predefined functions, etc. (for instance, eval() could be very helpful, as well as other functions to manipulate strings, lists, ...) and also other predefined macros (see http://research.cs.wisc.edu/htcondor/manual/v8.6/3_5Configuration_Macros.html#SECTION00451800000000000000) that you can use to generate random numbers, randomly choose one value among several of them, etc.
Most HTCondor commands will use the resulting value when expanding these macros, but unfortunately that does not work for all commands. For instance, the transfer_input_files and transfer_output_files commands do a simple expansion but do not evaluate the operations, so instead of getting the directory specific-100 you would get something like specific-$$([$(Process) * $(STEP)]). To avoid that, we have to use other commands that correctly expand complex macros and have a similar functionality: in this case, +TransferInput and +TransferOutput do the same with a similar syntax (they expect strings, so you have to use quotes). We have also defined a simple macro BDIR to avoid writing the path several times.
According to the commands written in the third and fourth blocks, the behaviour of this submit file will be the next one:
- Inputs: when our program runs on the other machine(s), it will find the next structure in its local directory: data.in (file), all the content of data_inputs (but NOT the directory itself) and specific-XXX (directory and its content).
- Outputs: once all jobs have finished, we should find on our machine the next structure in the same directory where we did the submission: one result file and one data_outputs-XXX directory for each job (where XXX is the value of each -init argument). Note that we would have a problem because our application always names its result file data.out, so all jobs would overwrite it at destination. To avoid that, we use the transfer_output_remaps command to specify that the data.out file has to be renamed to data-XXX.out, so all results will be copied to different files (this command ONLY works with files, NOT with directories).
Some more useful commands and info
If you have some issues when creating submit files or running your jobs, please, check the HOWTOs and FAQs pages, since there you could find some more examples or visit the useful commands page. Much more information is available at the official documentation about HTCondor and the Howto recipes. If you need further support, just contact us.
Check also:
- HTCondor(1): Introduction
- HTCondor(2): Useful Commands
- HTCondor(3): Submit files (description & examples)
- HTCondor(4): Submit files (HowTo)
- HTCondor(5): FAQs
- HTCondor(6): HTCondor and IDL