http://idolinux.blogspot.com/2010/04/mpich2-with-sun-grid-engine.html
MPICH2 with Sun Grid Engine
A prerequisite to this install log is a configured and validated Sun Grid Engine (SGE) 6.2 installation. Read more about that at Deploying Sun Grid Engine on a Cluster.
There are multiple ways to integrate MPICH2 and SGE; this build log covers tight integration with the mpd startup method. Contents:
- Download, build, and install MPICH2.
- Configure the queue shell.
- Download and install the integration scripts.
- Create the parallel execution environment template file.
- Import the pe config.
- Add the pe to the allowed list in your queue.
- Enable killing of the processes at the end of a job.
- As a normal user, submit a test job (mpich2_mpd.sh, mpihello.c).
- Check output.
- Test MPICH2 without SGE.
MPICH2 is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard (both MPI-1 and MPI-2). The goals of MPICH2 are: (1) to provide an MPI implementation that efficiently supports different computation and communication platforms including commodity clusters (desktop systems, shared-memory systems, multicore architectures), high-speed networks (10 Gigabit Ethernet, InfiniBand, Myrinet, Quadrics) and proprietary high-end computing systems (Blue Gene, Cray, SiCortex) and (2) to enable cutting-edge research in MPI through an easy-to-extend modular framework for other derived implementations. -www.mcs.anl.gov
There are multiple ways to integrate MPICH2 and SGE:
- Tight Integration of the mpd startup method
- Tight Integration of the daemonless smpd startup method
- Tight Integration of the daemon-based smpd startup method
- Tight Integration of the gforker startup method
This build log will cover the first method with mpd. Download, build, and install MPICH2.
# cd /usr/global/src/
# wget http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.2.1p1/mpich2-1.2.1p1.tar.gz
# tar xzvf mpich2-1.2.1p1.tar.gz
# cd mpich2-1.2.1p1
# ./configure --prefix=/usr/global/mpich2-1.2.1p1 2>&1 |tee configure.log
# make 2>&1 |tee build.log
# make install 2>&1 |tee install.log
# ln -s /usr/global/mpich2-1.2.1p1 /usr/global/mpich2
# export PATH=/usr/global/mpich2/bin:$PATH
# export LD_LIBRARY_PATH=/usr/global/mpich2/lib:$LD_LIBRARY_PATH
# which mpicc
/usr/global/mpich2/bin/mpicc
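The two exports above only last for the current shell. To make them persistent cluster-wide, one common approach is a small profile snippet; a minimal sketch (written to /tmp here so it is self-contained to try — on a real cluster it would live in /etc/profile.d/mpich2.sh, and the /usr/global paths are this site's layout):

```shell
# Sketch of a system-wide profile snippet; /tmp is used for illustration only,
# the real file would be /etc/profile.d/mpich2.sh.
cat > /tmp/mpich2.sh <<'EOF'
export MPICH2_ROOT=/usr/global/mpich2
export PATH=$MPICH2_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$MPICH2_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
EOF

# Source it and confirm the mpich2 bin directory is now first on PATH.
. /tmp/mpich2.sh
echo "$PATH" | grep -q '^/usr/global/mpich2/bin:' && echo "PATH OK"
```

The `${LD_LIBRARY_PATH:+:...}` form avoids a trailing colon (an empty PATH element means "current directory") when the variable starts out unset.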
Configure the queue shell.
# qconf -mq all.q
shell /bin/bash
shell_start_mode unix_behavior
Download and install integration scripts.
# cd /usr/global/src/
# wget http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-62.tgz
# cd /usr/global/sge/
# tar xzvf /usr/global/src/mpich2-62.tgz
# cd mpich2_mpd/src
# ./aimk
# ./install.sh
Create the parallel execution environment template file by pasting the following into a mpich2_mpd.template file:
pe_name mpich2_mpd
slots 64
user_lists NONE
xuser_lists NONE
start_proc_args /usr/global/sge/mpich2_mpd/startmpich2.sh -catch_rsh $pe_hostfile \
/usr/global/mpich2
stop_proc_args /usr/global/sge/mpich2_mpd/stopmpich2.sh -catch_rsh \
/usr/global/mpich2
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
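For reference, the hostfile handling inside startmpich2.sh essentially converts SGE's $pe_hostfile (one "host slots queue processor-range" line per allocated node) into the host:slots machinefile that mpd consumes. A simplified sketch with sample data (the real script additionally wraps rsh with qrsh -inherit and boots the mpd ring):

```shell
# Simplified sketch of startmpich2.sh's pe_hostfile conversion, using sample
# data. SGE writes one "<host> <slots> <queue> <processor-range>" line per node.
cat > /tmp/pe_hostfile <<'EOF'
node06 1 all.q@node06 UNDEFINED
node11 1 all.q@node11 UNDEFINED
EOF

# mpd expects "host:slots" lines, the format that ends up in $TMPDIR/machines.
awk '{ print $1 ":" $2 }' /tmp/pe_hostfile > /tmp/machines
cat /tmp/machines
```

This is why the job output below shows host:slots lines such as node06:1 right after the -catch_rsh line.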
Import the pe config.
# qconf -Ap mpich2_mpd.template
Add the pe to the allowed list in your queue.
# qconf -mq all.q
pe_list make mpich2_mpd
Enable killing of the processes at the end of a job.
# qconf -mconf
execd_params ENABLE_ADDGRP_KILL=TRUE
As a normal user, submit a test job.
$ mkdir ~/test
$ cd ~/test
$ echo 'export PATH=/usr/global/mpich2/bin:$PATH' >>~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/global/mpich2/lib:$LD_LIBRARY_PATH' >>~/.bashrc
$ . ~/.bashrc
$ which mpicc
/usr/global/mpich2/bin/mpicc
$ echo 'MPD_SECRETWORD=mr45-j9z' >~/.mpd.conf
$ chmod 600 ~/.mpd.conf
$ mpicc -o mpihello mpihello.c
$ qsub mpich2_mpd.sh
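The chmod 600 above is not optional: mpd refuses to start if its secret-word file is missing or readable by anyone but the owner, which is a common cause of silently hanging jobs. A self-contained sketch of the check, using a temporary file (the real file is ~/.mpd.conf as created above):

```shell
# mpd requires the secret-word file to be readable only by its owner and
# aborts otherwise. Demonstrated here on a throwaway file.
conf=/tmp/mpd.conf.demo
echo 'MPD_SECRETWORD=mr45-j9z' > "$conf"
chmod 600 "$conf"

# GNU stat prints the octal mode with -c %a; BSD stat uses -f %Lp.
mode=$(stat -c %a "$conf" 2>/dev/null || stat -f %Lp "$conf")
[ "$mode" = 600 ] && echo "mpd.conf mode OK"
rm -f "$conf"
```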
mpich2_mpd.sh
#!/bin/bash
# Export all environment variables
#$ -V
# Your job name
#$ -N test
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
# PARALLEL ENVIRONMENT:
#$ -pe mpich2_mpd 4
export MPICH2_ROOT=/usr/global/mpich2
export PATH=$MPICH2_ROOT/bin:$PATH
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
echo "Got $NSLOTS slots."
# The order of arguments is important. First global, then local options.
mpiexec -machinefile $TMPDIR/machines -n $NSLOTS ~/test/mpihello
exit 0
mpihello.c
// a simple mpi test
// compile with:
// $ mpicc -o mpihello mpihello.c
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
int rank,size;
MPI_Init(&argc,&argv); /* starts MPI */
MPI_Comm_rank(MPI_COMM_WORLD,&rank); /* get current process id */
MPI_Comm_size(MPI_COMM_WORLD,&size); /* get number of processes */
printf("Hello world from process %d of %d\n",rank,size);
MPI_Finalize();
return 0;
}
Check output.
$ cat test.po4479
-catch_rsh /var/spool/sge/node06/active_jobs/4479.1/pe_hostfile /usr/global/mpich2
node06:1
node11:1
node10:1
node08:1
startmpich2.sh: check for local mpd daemon (1 of 10)
/usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node06 /usr/global/mpich2/bin/mpd
startmpich2.sh: check for local mpd daemon (2 of 10)
startmpich2.sh: check for mpd daemons (1 of 10)
/usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node11 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -n
/usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node10 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -n
/usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node08 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -n
startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: got all 4 of 4 nodes
$ cat test.o4479
Got 4 slots.
Hello world from process 0 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4
Hello world from process 1 of 4
$ ssh node06 ps -e f -o pid,ppid,pgrp,command --cols=120 |less
...
3538 1 3538 /usr/global/sge/bin/lx24-amd64/sge_execd
24471 3538 24471 \_ sge_shepherd-4479 -bg
24568 24471 24568 | \_ /bin/bash /var/spool/sge/node06/job_scripts/4479
24569 24568 24568 | \_ python2.4 /usr/global/mpich2/bin/mpiexec -machinefile /tmp/4479.1.all.q/machines -n 4 /hom
24517 3538 24517 \_ sge_shepherd-4479 -bg
24518 24517 24518 \_ /usr/global/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/node06/active_jobs/4479.1/1.nod
24528 24518 24528 \_ python2.4 /usr/global/mpich2/bin/mpd
24570 24528 24570 \_ python2.4 /usr/global/mpich2/bin/mpd
24571 24570 24571 \_ /home/myuser/test/mpihello
...
24509 1 24472 /usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node06 /usr/global/mpich2/bin/mpd
24532 1 24472 /usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node11 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -
24534 1 24472 /usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node10 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -
24537 1 24472 /usr/global/sge/bin/lx24-amd64/qrsh -inherit -V node08 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -
...
$ ssh node11 ps -e f -o pid,ppid,pgrp,command --cols=120 |less
...
3681 1 3681 /usr/global/sge/bin/lx24-amd64/sge_execd
20881 3681 20881 \_ sge_shepherd-4479 -bg
20882 20881 20882 \_ /usr/global/sge/utilbin/lx24-amd64/qrsh_starter /var/spool/sge/node11/active_jobs/4479.1/1.nod
20889 20882 20889 \_ python2.4 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -n
20890 20889 20890 \_ python2.4 /usr/global/mpich2/bin/mpd -h node06 -p 51413 -n
20891 20890 20891 \_ /home/myuser/test/mpihello
...
Test MPICH2 without SGE.
$ ssh node01
$ /usr/global/mpich2/bin/mpdboot -f /etc/machine.list -n 4
$ /usr/global/mpich2/bin/mpdtrace
$ /usr/global/mpich2/bin/mpdringtest
$ /usr/global/mpich2/bin/mpiexec -n 4 hostname
$ /usr/global/mpich2/bin/mpiexec -n 4 /bin/sh -c "hostname; ping -c1 node01; whoami"
$ /usr/global/mpich2/bin/mpiexec -n 4 ~/test/mpihello
$ /usr/global/mpich2/bin/mpdallexit