Back to Stream Computing Spec †
System organization †
Photos †

How to use †

Account for the cluster †
You need an account on gpgpu.cs.tsukuba.ac.jp. (Ask Zhang-san to create the account.) gpgpu is a gateway machine from which you can launch your MPI-based application onto the cluster.

SSH to gpgpu †
gpgpu accepts your SSH connection. After you log in to gpgpu, you can connect to the cluster nodes. From gpgpu, you can access the nodes using the following machine names:
 gpgpue00
 gpgpue01
 gpgpue02
 gpgpue03
 gpgpue04
 gpgpue05
 gpgpue06
 gpgpue07
Your home directory is shared with all the nodes via NIS.

Setup for MPI execution †

PATH for MPI commands †
On gpgpu, you need to add the following path to your PATH environment variable so that the MPI commands can be used on the machine:
 /usr/lib64/openmpi/1.4-gcc/bin/
For example, add the path above to the .bashrc in your home directory as follows:
 PATH=$PATH:/usr/lib64/openmpi/1.4-gcc/bin/
After you add the path to .bashrc, apply it to your current login session with the source command, or just log in to gpgpu again.

MANPATH for MPI-related things †
To read the man pages for MPI on gpgpu, export the path as follows:
 export MANPATH=/usr/lib64/openmpi/1.4-gcc/man/:$MANPATH
Try looking at the man pages for mpirun and MPI_Send.

SSH key †
The OpenMPI we use in the cluster invokes processes remotely via SSH. If you do not set up your SSH key for automatic login (i.e. login without a password), you will have to type your password whenever an MPI process is created on any node of the cluster.
 [yama@gpgpu ~]$ ssh-keygen
 Generating public/private rsa key pair.
 Enter file in which to save the key (/home/yama/.ssh/id_rsa): (<= just press Enter)
 Created directory '/home/yama/.ssh'.
 Enter passphrase (empty for no passphrase): (<= just press Enter)
 Enter same passphrase again: (<= just press Enter)
 Your identification has been saved in /home/yama/.ssh/id_rsa.
 Your public key has been saved in /home/yama/.ssh/id_rsa.pub.
Then execute the following command lines to register the key to be used at SSH login:
 [yama@gpgpu ~]$ cat .ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
 [yama@gpgpu ~]$ chmod 600 $HOME/.ssh/authorized_keys
To test whether the SSH key setting is correct, just log in to gpgpue00:
 [yama@gpgpu ~]$ ssh gpgpue00
 The authenticity of host 'gpgpue00 (192.168.2.1)' can't be established.
 RSA key fingerprint is 34:d4:aa:8c:ef:49:5d:eb:f2:fc:37:ac:c2:4d:7d:19.
 Are you sure you want to continue connecting (yes/no)? yes
 Warning: Permanently added 'gpgpue00,192.168.2.1' (RSA) to the list of known hosts.
 Last login: Fri Feb 11 14:00:20 2011 from gpgpu-gw
 [yama@gpgpu00 ~]$
SSH once to all the nodes of the cluster (gpgpue00-07) so that the fingerprint information of every node is recorded and you can log in automatically. (Can we avoid this step with some technique? If you know a good method, please suggest it.)

MPI execution †
Now you can use the environment for MPI programs. Let us try the ping-pong sample program enclosed in the OpenMPI package. First, download pingpong.c and compile it using the mpicc command:
 mpicc pingpong.c -o pingpong
Now you can find an executable file named "pingpong" in the directory.
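The pingpong.c source is the sample distributed with the OpenMPI package and is not reproduced on this page. To give a rough idea of what such a benchmark does, here is a minimal MPI ping-pong sketch; the message size, iteration count, and printed messages are illustrative assumptions, not the contents of the actual pingpong.c.
 /* minimal_pingpong.c -- a minimal sketch of an MPI ping-pong exchange.
  * This is NOT the pingpong.c shipped with OpenMPI; the message size,
  * iteration count, and printed messages are illustrative assumptions. */
 #include <mpi.h>
 #include <stdio.h>
 #include <string.h>
 
 #define MSG_SIZE 1024   /* assumed message size in bytes */
 #define ITERS    100    /* assumed number of round trips */
 
 int main(int argc, char **argv)
 {
     int rank, size, i;
     char buf[MSG_SIZE];
     double t0;
     MPI_Status st;
 
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);
     memset(buf, 0, MSG_SIZE);
     printf("Hello from %d of %d\n", rank, size);
 
     if (size >= 2 && rank < 2) {
         t0 = MPI_Wtime();
         for (i = 0; i < ITERS; i++) {
             if (rank == 0) {
                 /* rank 0: send the buffer, then wait for the echo */
                 MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                 MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
             } else {
                 /* rank 1: receive the buffer, then echo it back */
                 MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                 MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
             }
         }
         if (rank == 0)
             printf("%d bytes took %.0f usec per round trip\n",
                    MSG_SIZE, (MPI_Wtime() - t0) * 1e6 / ITERS);
     }
     MPI_Finalize();
     return 0;
 }
Rank 0 sends a buffer to rank 1 and waits for it to be echoed back; timing many such round trips and dividing by the number of iterations gives the per-message time that benchmarks of this kind report for each message size.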
It is time to execute it on the cluster nodes using mpirun. You need to make a "machinefile" to specify the machines over which your program is distributed. Edit a machinefile, adding the nodes mentioned above such as gpgpue00-07. Here, we assume that the machinefile is named "machinefile" and contains the following two machines:
 gpgpue00
 gpgpue01
Try to execute the pingpong program using mpirun as follows:
 mpirun -prefix /opt/openmpi/1.4.2/ -machinefile machinefile -np 2 -nolocal pingpong
/opt/openmpi/1.4.2/ is the path to OpenMPI on the nodes. By default, the processes use the InfiniBand network for the MPI connection; you do not need to configure anything for it. The output from the command line above is shown below:
 I am gpgpu00.
 I am gpgpu01.
 Hello from 0 of 2
 Hello from 1 of 2
 Timer accuracy of ~1.907349 usecs
 8 bytes took 51 usec ( 0.314 MB/sec)
 16 bytes took 6 usec ( 5.369 MB/sec)
 32 bytes took 9 usec ( 7.064 MB/sec)
 64 bytes took 6 usec ( 21.475 MB/sec)
 128 bytes took 142 usec ( 1.805 MB/sec)
 256 bytes took 13 usec ( 39.768 MB/sec)
 512 bytes took 10 usec ( 102.261 MB/sec)
 1024 bytes took 13 usec ( 159.073 MB/sec)
 2048 bytes took 24 usec ( 170.098 MB/sec)
 4096 bytes took 21 usec ( 390.452 MB/sec)
 8192 bytes took 28 usec ( 587.346 MB/sec)
 16384 bytes took 324 usec ( 101.132 MB/sec)
 32768 bytes took 185 usec ( 354.224 MB/sec)
 65536 bytes took 242 usec ( 541.631 MB/sec)
 131072 bytes took 459 usec ( 571.175 MB/sec)
 262144 bytes took 583 usec ( 899.028 MB/sec)
 524288 bytes took 1082 usec ( 969.160 MB/sec)
 1048576 bytes took 1930 usec (1086.608 MB/sec)
 2097152 bytes took 3868 usec (1084.398 MB/sec)
 4194304 bytes took 7496 usec (1119.096 MB/sec)
 8388608 bytes took 14646 usec (1145.511 MB/sec)
 16777216 bytes took 30754 usec (1091.056 MB/sec)
 33554432 bytes took 61213 usec (1096.317 MB/sec)
 67108864 bytes took 120136 usec (1117.215 MB/sec)
 134217728 bytes took 239794 usec (1119.442 MB/sec)
 268435456 bytes took 476880 usec (1125.799 MB/sec)
 Asynchronous ping-pong
 8 bytes took 13 usec ( 1.220 MB/sec)
 16 bytes took 6 usec ( 5.369 MB/sec)
 32 bytes took 6 usec ( 10.737 MB/sec)
 64 bytes took 5 usec ( 25.565 MB/sec)
 128 bytes took 10 usec ( 25.565 MB/sec)
 256 bytes took 8 usec ( 65.075 MB/sec)
 512 bytes took 10 usec ( 102.261 MB/sec)
 1024 bytes took 12 usec ( 171.799 MB/sec)
 2048 bytes took 20 usec ( 204.522 MB/sec)
 4096 bytes took 21 usec ( 390.452 MB/sec)
 8192 bytes took 85 usec ( 193.032 MB/sec)
 16384 bytes took 61 usec ( 536.871 MB/sec)
 32768 bytes took 71 usec ( 925.515 MB/sec)
 65536 bytes took 125 usec (1049.152 MB/sec)
 131072 bytes took 232 usec (1128.862 MB/sec)
 262144 bytes took 462 usec (1134.687 MB/sec)
 524288 bytes took 861 usec (1217.958 MB/sec)
 1048576 bytes took 1750 usec (1198.378 MB/sec)
 Bi-directional asynchronous ping-pong
 8 bytes took 5 usec ( 3.196 MB/sec)
 16 bytes took 4 usec ( 7.895 MB/sec)
 32 bytes took 4 usec ( 16.777 MB/sec)
 64 bytes took 4 usec ( 31.581 MB/sec)
 128 bytes took 9 usec ( 29.020 MB/sec)
 256 bytes took 8 usec ( 65.075 MB/sec)
 512 bytes took 9 usec ( 113.025 MB/sec)
 1024 bytes took 11 usec ( 186.738 MB/sec)
 2048 bytes took 18 usec ( 226.051 MB/sec)
 4096 bytes took 19 usec ( 429.497 MB/sec)
 8192 bytes took 32 usec ( 509.033 MB/sec)
 16384 bytes took 286 usec ( 114.628 MB/sec)
 32768 bytes took 129 usec ( 508.092 MB/sec)
 65536 bytes took 227 usec ( 577.475 MB/sec)
 131072 bytes took 364 usec ( 720.047 MB/sec)
 262144 bytes took 521 usec (1006.418 MB/sec)
 524288 bytes took 967 usec (1084.331 MB/sec)
 1048576 bytes took 2062 usec (1017.125 MB/sec)
 Max rate = 1217.958048 MB/sec
 Min latency = 1.907349 usec

OpenMP †
gcc on each node supports OpenMP. Here, we try an OpenMP program named "omptest". Please download omptest.c and compile it as follows:
 gcc -fopenmp omptest.c -o omptest
Then just run the executable directly. It will be parallelized across the CPU cores of the node (i.e. 4 threads).
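The omptest.c source is also provided as a download and is not reproduced on this page. The following is a minimal OpenMP sketch that would produce output of the kind shown below, assuming the program makes each thread count its share of 100 loop iterations and print a per-thread partial sum; the variable names and loop bound are assumptions, not the actual omptest.c.
 /* omptest_sketch.c -- a minimal OpenMP sketch, NOT the omptest.c attached
  * to this page.  It assumes each thread counts its share of 100 loop
  * iterations and prints its partial sum before the total is reduced. */
 #include <omp.h>
 #include <stdio.h>
 
 int main(void)
 {
     int sum = 0;
 
     #pragma omp parallel reduction(+:sum)
     {
         int id = omp_get_thread_num();
         int local = 0;
         int i;
 
         printf("Hello World from %d\n", id);
 
         /* each thread handles its chunk of the 100 iterations */
         #pragma omp for
         for (i = 0; i < 100; i++)
             local += 1;
 
         printf("proc%d: sum = %d\n", id, local);
         sum += local;   /* per-thread copies are combined by the reduction */
     }
 
     printf("sum = %d\n", sum);  /* total over all threads */
     return 0;
 }
With 4 threads, each thread handles 25 iterations, so each partial sum is 25 and the reduced total is 100, matching the example output below.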
 [yama@gpgpu00 OpenMPTest]$ ./omptest
 Hello World from 0
 Hello World from 2
 Hello World from 1
 Hello World from 3
 proc3: sum = 25
 proc0: sum = 25
 proc1: sum = 25
 proc2: sum = 25
 sum = 100

Enjoy!!

Written by Shinichi Yamagiwa