http://markmail.org/message/ps5uxpwcd6bocbx3
I found the following post on the gridengine.info blog:
"Feedback needed: Obsolete options and parameters considered for removal
Posted by chris on 24/06/2008
Grid Engine developers posted a list today of SGE configuration parameters and client arguments that are being considered for removal from the product because they are either obsolete or they duplicate settings found elsewhere.
<snip>
- qmaster_params: merge ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE. We can't imagine a use case to have these values separated"
Our reason for needing them separated is that our CERN grid users use the CPU/wallclock ratio to determine the efficiency of their jobs on any given system. In general, this is a useful metric to have.
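As a rough illustration of the CPU/wallclock metric mentioned above, here is a minimal Python sketch. The function name and the division by the slot count are assumptions made for this example (the post only says "CPU/Wallclock"), and the job figures are hypothetical.

```python
def cpu_efficiency(cpu_seconds, wallclock_seconds, slots=1):
    """Return CPU time as a fraction of the slot time the job occupied.

    Dividing by slots is an assumption for multi-slot jobs; for a
    single-slot job this is simply cpu / wallclock.
    """
    if wallclock_seconds <= 0 or slots <= 0:
        raise ValueError("wallclock_seconds and slots must be positive")
    return cpu_seconds / (wallclock_seconds * slots)


# Hypothetical job: 5400 s of CPU over 7200 s of wallclock on one slot.
print(cpu_efficiency(5400, 7200))  # 0.75

# If the accounted CPU is really wallclock (reserved usage), every job
# appears 100% efficient, which is why the metric needs the true CPU time.
print(cpu_efficiency(20, 20))  # 1.0
```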
Indeed, the accounted CPU time should, in my mind, always represent the amount of CPU time consumed and not the wallclock value (or a derivation from it).
I understand that the ACCT_RESERVED_USAGE option is there to give a value for wallclock*slots in the case where you wish to account for time * slots used, but this seems the wrong place to put it. Also, it doesn't seem to do that for me - at least when using our OpenMP environment:

[root@eddie01 ~]# qconf -sp OpenMP
pe_name           OpenMP
slots             1400
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $pe_slots
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

and with ACCT_RESERVED_USAGE=true and SHARETREE_RESERVED_USAGE=false (trimmed output):

[orichard@frontend02 scripts]$ qacct -j 1445999
==============================================================
jobnumber    1445999
qsub_time    Wed Oct 22 12:05:14 2008
start_time   Wed Oct 22 12:05:48 2008
end_time     Wed Oct 22 12:06:09 2008
granted_pe   OpenMP
slots        2
failed       0
exit_status  0
ru_wallclock 21
ru_utime     0
ru_stime     0
cpu          0

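To make the mismatch concrete, the following small Python sketch works through the arithmetic for the job above. It assumes, as the post describes, that reserved usage means wallclock * slots; the function name is hypothetical and the numbers are taken from the qacct output.

```python
def reserved_cpu(wallclock_seconds, slots):
    """CPU value expected if usage is accounted as wallclock * slots."""
    return wallclock_seconds * slots


# Values from job 1445999: ru_wallclock 21, slots 2, cpu 0.
expected = reserved_cpu(21, 2)
observed = 0

print(expected)              # 42
print(observed == expected)  # False: the accounted cpu is not wallclock * slots
```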
As for SHARETREE_RESERVED_USAGE: we balance usage on our system based on the amount of time a job occupies a slot, regardless of whether it is using the CPU or not (we have one slot per CPU). For that we require SHARETREE_RESERVED_USAGE=true.
-- Orlando.
Orlando Richards wrote:
Hi folks,
We seem to have a problem with CPU time always being accounted as equal to Wallclock time (or sometimes 1s higher) - even if the job is just a "sleep 20s" job. The UTIME and STIME report correctly though.
We're running SGE 6.1u4.
We have

execd_params   SHARETREE_RESERVED_USAGE=TRUE \
               ACCT_RESERVED_USAGE=FALSE

so would expect the CPU time to be recorded as roughly UTIME + STIME - but this is not the case.
I tried setting SHARETREE_RESERVED_USAGE to FALSE as well, to see if it made any difference, and suddenly we get the expected behaviour (CPU time = 0, wallclock = 20).
Does anyone know if this is expected behaviour? Is there anything we can do to correct it?
Sample qacct -j JOBID output for a 20s sleep job:

==============================================================
qname        ecdf
hostname     node005.beowulf.cluster
group        is_iti_ug
owner        orichard
project      ecdf_baseline
department   defaultdepartment
jobname      simple.sh
jobnumber    1445888
taskid       undefined
account      sge
priority     5
qsub_time    Wed Oct 22 11:51:42 2008
start_time   Wed Oct 22 11:52:18 2008
end_time     Wed Oct 22 11:52:38 2008
granted_pe   NONE
slots        1
failed       0
exit_status  0
ru_wallclock 20
ru_utime     0
ru_stime     0
ru_maxrss    0
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    1622
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   0
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     30
ru_nivcsw    4
cpu          20
mem          40.020
io           0.000
iow          0.000
maxvmem      103.973M

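A check like the one described in this message can be scripted. The Python sketch below is only an illustration: it assumes qacct -j prints one "key value" pair per line, embeds a trimmed copy of the record above, and uses hypothetical helper names. The 1 s tolerance reflects the "sometimes 1s higher" observation.

```python
SAMPLE = """\
==============================================================
jobnumber    1445888
slots        1
ru_wallclock 20
ru_utime     0
ru_stime     0
cpu          20
"""


def parse_qacct(text):
    """Parse 'key value' lines into a dict, skipping separator lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("="):
            continue
        key, _, value = line.partition(" ")
        record[key] = value.strip()
    return record


def looks_like_reserved_usage(record, tolerance=1.0):
    """True if cpu tracks wallclock * slots rather than utime + stime."""
    cpu = float(record["cpu"])
    user_plus_sys = float(record["ru_utime"]) + float(record["ru_stime"])
    reserved = float(record["ru_wallclock"]) * int(record["slots"])
    return cpu > user_plus_sys and abs(cpu - reserved) <= tolerance


rec = parse_qacct(SAMPLE)
print(looks_like_reserved_usage(rec))  # True for the 20 s sleep job
```

In practice the text would come from running qacct -j JOBID rather than from an embedded string.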
--
Dr Orlando Richards
Information Services, IT Infrastructure Division, Unix Section
Tel: 0131 650 4994
#################################################################
> The following is the content of the job file, openmp.sh.
> --------------------------------------
> #!/bin/bash
> #$ -V
> #$ -cwd
> #$ -N openmp_job
> #$ -pe openmp 16
> #$ -q long
> #$ -R yes
> #$ -wd /work02/htjou/op2
> #$ -l h_rt=05:00:00
> #$ -M htjou@kordi.re.kr
> #$ -m e
> export OMP_NUM_THREADS=16
> time /work02/htjou/op2/test.x
> cleanipcs
> exit 0