Why does my job not start?
First, run condor_q -analyze. Its output often explains why the job does not start. In the MorfeusConsole you can execute this command by right-clicking on the corresponding job ("Show job analysis").
I submit my jobs via the terminal server NWZCitrix. When I log back in later, I don't see them in the queue anymore, even though they haven't been executed yet. Why?
The terminal server consists of many different nodes, each with its own queue. When you log on again, you will often land on a different node, so the jobs you submitted earlier are not in the local queue. With condor_q -global (or by ticking the corresponding check box in the MorfeusConsole) you can display the jobs of all queues. Your jobs should appear there.
How do I find out on which computer my job is currently running?
The MorfeusConsole displays this information in the last column of the queue.
At a command prompt, use the following command: condor_q -run
Sometimes my program runs in the grid, sometimes it doesn't. What could be the reason?
You may not have specified the requirements of your program correctly in your job description file. For example, if your program requires 2 GB of memory but you didn't specify this, it is a matter of chance whether the job lands on a PC with enough memory. Also be careful not to send 64-bit programs to 32-bit nodes.
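As a rough sketch of how such requirements can be stated, the snippet below writes a small job description file and submits it if the Condor tools are available. The file name myjob.sub, the executable myprogram.exe and all values are made up for illustration. It also assumes a Condor version that understands request_memory (given in MB) and nodes that report Arch as "X86_64" for 64-bit machines; older installations may need Memory >= 2048 in the requirements expression instead.

```shell
# Write a hypothetical job description file (all names and values are examples).
cat > myjob.sub <<'EOF'
universe = vanilla
executable = myprogram.exe
# Ask for at least 2 GB of memory (value in MB).
request_memory = 2048
# Only run on 64-bit nodes.
requirements = (Arch == "X86_64")
output = myjob.out
error = myjob.err
log = myjob.log
queue
EOF

# Submit only if the Condor tools are installed on this machine.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit myjob.sub
fi
```

If the job then stays idle, condor_q -analyze should report whether any node in the grid matches these requirements.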
I've always been able to use Morfeus without any problems, but recently my jobs no longer start. They remain in the idle state and the log file only contains something like "000 (072.000.000) 05/04 10:25:11 Job submitted from host: <220.127.116.11:1042>".
Have you recently changed your password? If so, you will need to run "condor_store_cred delete" once on the computer from which you submitted the job and then run "condor_store_cred add" again to tell Condor your new password.
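To check whether a job shows this symptom, you can look at its log file. The sketch below assumes the usual Condor log layout, in which every event starts with a three-digit code: 000 for "Job submitted" and 001 for "Job executing". The function name never_started and the file name myjob.log are placeholders, not part of Condor.

```shell
# never_started LOGFILE
# Succeeds if the log contains a "submitted" event (000) but no "executing" event (001).
never_started() {
    grep -q '^000 ' "$1" && ! grep -q '^001 ' "$1"
}

# myjob.log stands for the log file named in your job description file.
if [ -f myjob.log ] && never_started myjob.log; then
    echo "Job was submitted but has not started; check the stored password."
fi
```

A job that was submitted only a moment ago will also match, so the check is only meaningful for jobs that have been idle for a while.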
How do I delete my own job from the queue?
In the MorfeusConsole you can easily remove jobs from the queue by right-clicking on them.
If you work with the command line, you have to distinguish between two cases:
- If you want to delete the job from the computer from which the job was started, run:
condor_rm ID
- If you want to delete the job from another computer, run:
condor_rm -name COMPUTERNAME ID
ID stands for the job ID, i.e. the number assigned to the job. You can get this number from condor_q.
COMPUTERNAME stands for the name of the computer from which you submitted the job. You can find it with condor_q -global: in the output, the name of the computer is listed in the "-- SCHEDD:" line.
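The two cases can be wrapped in a small helper. This is only a sketch: remove_job and DRY_RUN are hypothetical names that are not part of Condor, and 72.0 and PC0815 are placeholders for your own job ID and computer name.

```shell
# remove_job ID [COMPUTERNAME]
# Removes a job from the local queue, or from the queue on COMPUTERNAME if one is given.
# With DRY_RUN=1 the command is only printed, not executed.
remove_job() {
    id="$1"
    host="${2:-}"
    if [ -n "$host" ]; then
        set -- condor_rm -name "$host" "$id"
    else
        set -- condor_rm "$id"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

DRY_RUN=1 remove_job 72.0            # prints: condor_rm 72.0
DRY_RUN=1 remove_job 72.0 PC0815     # prints: condor_rm -name PC0815 72.0
```

Without DRY_RUN=1 the helper runs condor_rm directly, so it is worth checking the printed command first.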
When I use condor_q to look for my job, there is an H in the ST column. What does that mean?
The job has the status "Held", i.e. it is currently not being processed. A job is automatically set to this status if there are too many problems with it. The problem does not necessarily have to be with your job, but can also be caused by short-term disturbances in the Morfeus Grid (e.g. network problems).
If you want to release your job again, you can do this by right-clicking on your job in the MorfeusConsole, or by typing the command condor_release into the command prompt.
To release a specific job with the job ID nr, type: condor_release nr
If you want to release all jobs again, type: condor_release -all
If you submitted the job from another computer, you must also specify the option "-name" (see the previous question).
If you have any further questions, please contact the Hotline by e-mail.