
Formal Verification of Programs That Use MPI One-Sided Communication

Salman Pervez1, Ganesh Gopalakrishnan1, Robert M. Kirby1, Rajeev Thakur2, and William Gropp2

1 School of Computing, University of Utah, Salt Lake City, UT 84112, USA

2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA

Abstract. Formal verification methods based on model checking are applied to analyze the correctness properties of one existing and two new distributed locking protocols implemented using MPI's one-sided communication. Model checking exposed an overlooked correctness issue with the first of these protocols, which was developed relying only on manual reasoning. Model checking helped confirm the basic correctness properties of the two new protocols, while also identifying the remaining problems in them. Our experience is that MPI-based programming, especially the tricky and relatively poorly understood one-sided communication features, stands to gain immensely from model checking. Considering that many other areas of concurrent hardware and software design now routinely employ model checking, our experience confirms that the MPI community can greatly benefit from the use of formal verification.

1 Introduction

Concurrent protocols are notoriously hard to design and verify. Experience has shown that virtually all non-trivial protocol implementations contain bugs such as deadlocks, livelocks, and memory leaks, despite extensive care taken during design and testing. It has also been realized that most of these bugs are basic design errors due to "unexpected" (untested) concurrent behaviors. Therefore, it stands to reason that if finite-state models of these protocols are created and exhaustively analyzed for the desired formal properties, robust protocol implementations would result. The know-how for such finite-state modeling, property description, and exhaustive analysis developed over the last three decades, known as model checking [2], has indeed been successfully applied to numerous software and hardware systems. It is now an integral part of the Windows Device Driver Development Kit [1]. Virtually all cache coherence protocols developed and deployed in modern microprocessors have been verified using model checking. However, even though concurrency and concurrent-programming bugs in scientific programming are similar in nature to those in these other areas, we find very little evidence of model checking being applied in this area (related work is in Section 5).

In this paper, we conduct case studies that show the promise of applying model checking in the area of parallel scientific programming using MPI. In particular, we focus on MPI one-sided communication [?]. Being (relatively) recently introduced, one-sided communication is insufficiently understood as well as documented. One-sided communication involves shared-memory concurrency, which is known to be inherently harder to reason about than the message-passing concurrency of traditional MPI. One-sided communication exacerbates verification complexity because it guarantees only weak ordering semantics with respect to loads and stores, which can freely reorder within a given synchronization epoch. This paper demonstrates that, using model checking, bugs in MPI programs that use one-sided communication can be caught easily, while expending only modest amounts of human and computer time. After presenting background on MPI one-sided communication in the remainder of this section, we provide an overview of model checking in Section 2. We then describe the design of an existing distributed byte-range locking protocol [12] and its formal verification through model checking (Section 3). Model checking helped uncover the serious problem of a potential deadlock, of which the authors of the algorithm were unaware. Model checking also found a more benign problem of extra (zero-byte) sends in the algorithm, which might lend itself to an implementation-dependent correction. Section 4 presents two corrected versions of the protocol.

Common mistakes in programs that use one-sided communication include mixing synchronization modes (such as using lock/unlock within a fence), doing read-modify-writes via a Get-modify-Put in the same synchronization epoch (even though gets and puts are defined to be nonblocking), and doing a put and a get to/from the same memory location in the same synchronization epoch. For example, the broadcast algorithms in Appendix B and C of [3] are incorrect because they rely on MPI_Get being blocking; on an implementation with a truly nonblocking MPI_Get, such as MPICH2 [11], the code will, indeed, deadlock. Since MPI one-sided communication can be implemented in a variety of ways [5], the result of making such mistakes is often implementation dependent: the program may work fine on some implementations and not on others.
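To make the second of these mistakes concrete, the following sketch (ours, not taken from [3] or [12]; the window win, the target rank 0, and the displacement 0 are illustrative assumptions) performs a read-modify-write as a Get-modify-Put inside a single lock/unlock epoch. Since both calls are nonblocking and are guaranteed to complete only when MPI_Win_unlock returns, the increment may operate on a buffer that has not yet been filled in.

    int count = 0;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
    MPI_Get(&count, 1, MPI_INT, 0, 0, 1, MPI_INT, win);   /* nonblocking */
    count = count + 1;   /* WRONG: count need not contain the fetched value yet */
    MPI_Put(&count, 1, MPI_INT, 0, 0, 1, MPI_INT, win);   /* also targets the same location */
    MPI_Win_unlock(0, win);   /* both transfers are guaranteed complete only here */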

2 Model Checking

Model checking is a term that has acquired an overloaded meaning. It is, essentially, the activity of exhaustively examining all possible behaviors of a model of a concurrent program (akin to wind-tunnel testing of scale models of airplanes). We consider finite-state model checking, where the model of the concurrent system is expressed in a modeling language, Promela [6] in our case (all the pseudocode presented in this paper has an almost direct Promela encoding). By exhaustively executing the concurrent system model, a model checker reveals its entire state transition structure and is able to establish temporal properties such as "always P" and "A implies eventually Q" with respect to this structure. The state graphs we generate in our work are a result of the interleaved execution of the various processes/threads. A fundamental problem with model checking is that reachable state graphs are exponential in the number of concurrent processes. The last three decades of research have, essentially, been about getting a good handle on this exponential growth, so much so that astronomically large finite state spaces, and often even many classes of infinite state spaces, can be handled by model checkers. Despite the very large state spaces of the MPI models discussed in this paper, our model checking runs finished within acceptable durations (often in minutes) on standard workstations.

3 Formal Verification of Byte-Range Locking

In [12], Thakur et al. present an algorithm, implemented using MPI one-sided communication (with passive-target lock-unlock synchronization), for coordinating a collection of parallel processes contending for byte-range locks. We first describe the algorithm briefly (details are in [12]), followed by a description of how we model checked it. Due to space limits, we cannot present the full pseudocode of the original protocol; the reader may kindly refer to the original paper [12] for details.

3.1 The Byte-Range Locking Algorithm

The algorithm is split into two phases: the lock-acquire phase and the lock-release phase. Each process in the system keeps a global state, consisting of a flag bit (initialized to 0) and its start and end values (initialized to -1), in a shared memory window lockwin. A flag value of 0 indicates that the process is not currently contending for the lock, while 1 indicates that it either has acquired the lock or is contending for the lock. A process updates its global state and reads the others' by acquiring an exclusive lock on lockwin and making MPI_Put and/or MPI_Get calls. The semantics of MPI one-sided communication guarantee that these actions on lockwin are atomic: if two processes simultaneously try to write into and read from lockwin, one will succeed while the other will have to wait.
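As a concrete illustration (a sketch of ours, not code from [12]; the use of MPI_Alloc_mem and MPI_Win_create, and the choice of hosting the table on a single process homerank, are assumptions consistent with the offsets used in Figure 2; myrank, nprocs, and comm are as in the figures), the window holding three integers of state per process might be created as follows:

    int *locklist = NULL;
    MPI_Win lockwin;
    if (myrank == homerank) {
        /* homerank hosts the table: 3 ints (flag, start, end) per process */
        MPI_Alloc_mem(3 * nprocs * sizeof(int), MPI_INFO_NULL, &locklist);
        for (i = 0; i < 3 * nprocs; i++)
            locklist[i] = (i % 3 == 0) ? 0 : -1;   /* flag = 0, start = end = -1 */
        MPI_Win_create(locklist, 3 * nprocs * sizeof(int), sizeof(int),
                       MPI_INFO_NULL, comm, &lockwin);
    } else {
        /* other processes expose no memory; they only access homerank's window */
        MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL, comm, &lockwin);
    }

With this layout, the flag, start, and end values of process i live at displacements 3i, 3i+1, and 3i+2, which is consistent with the 3*(myrank) offsets used in the figures.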

To acquire the lock, a process sets its flag bit to 1 and updates its start and end values. It then checks whether the lock is held or wanted by a process whose byte range conflicts with its own. If it finds such a process, it blocks on an MPI_Recv call. Otherwise, it has successfully acquired the lock. The process holding the lock must do a similar check when it releases the lock. If it finds a process with a conflicting byte range that tried to do an acquire while it was holding the lock, it makes an MPI_Send call to that process.
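The release phase is not reproduced in the figures below, so the following sketch (ours, based only on the description above; names such as locklistcopy, locktype1, and WAKEUP follow Figure 2, and the conflict test is left in the same pseudocode style as the figures) conveys its structure:

    Lock_release(int start, int end)
    {
        val[0] = 0;  val[1] = -1;  val[2] = -1;   /* clear flag and byte range */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
        MPI_Put(val, 3, MPI_INT, homerank, 3*(myrank), 3, MPI_INT, lockwin);
        MPI_Get(locklistcopy, 3*(nprocs-1), MPI_INT, homerank, 0, 1,
                locktype1, lockwin);
        MPI_Win_unlock(homerank, lockwin);

        for (i = 0; i < (nprocs-1); i++) {
            if ((flag == 1) && (byte ranges conflict with the released range)) {
                /* wake the conflicting waiter with a zero-byte message */
                MPI_Send(NULL, 0, MPI_BYTE, rank of process i, WAKEUP, comm);
            }
        }
    }

It is precisely this wakeup step that can generate the extra sends discussed in Section 3.2.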

3.2 Checking the Byte-Range Locking Algorithm

To model the algorithm, we first had to model the MPI one-sided communication constructs used in the algorithm and capture their semantics precisely as specified in the MPI Standard. For example, the MPI-2 Standard specifies that if a communication epoch is started with MPI_Win_lock, it must end with MPI_Win_unlock, and that the put/get/accumulate calls made within this epoch are not guaranteed to complete before MPI_Win_unlock returns. Furthermore, there are no ordering guarantees for the puts/gets/accumulates within an epoch. Therefore, in order to obtain adequate execution-space coverage, all permutations of the put/get/accumulate calls in an epoch must be examined. However, the byte-range locking algorithm uses the MPI_LOCK_EXCLUSIVE lock type, which means that while one process has entered the synchronization epoch, no other process may enter until that process has left. This makes the synchronization epoch an atomic block and renders all permutations of the calls within it equivalent from the perspective of other processes. Modeling the byte-range locking algorithm itself was relatively straightforward.

When we model checked our model with three processes, our model checker SPIN [6] discovered an error indicating an "invalid end state." Upon deeper probing, the following error scenario was identified (explained through an example, which assumes that P1 tries to lock byte-range (1,2), P2 tries to lock (2,3), and P3 tries to lock (3,4)):

– P1 and P3 successfully pass the acquire phase, and are using their respective byte-range locks.

– P2 enters now, notices a conflict with respect to both P1 and P3, and blocks on its MPI_Recv call.

– When P1 and P3 release their locks, each of them finds P2's conflicting request and each makes a wakeup MPI_Send to P2, when only one send is needed.

Hence, while P2 ends up successfully waking up and acquiring the lock, the extra MPI_Send is never matched by a receive call.

Fig. 1. A Deadlock Scenario Found Through Model Checking

We then proceeded to model the system as if these extra MPI_Send calls were acceptable. Model checking the resulting model exposed a genuine deadlock, an instance of which is illustrated in Figure 1: two processes contending for conflicting byte ranges can each observe the other's flag already set, conclude that they must wait, and block in MPI_Recv with no lock holder left to send either of them a wakeup.

4 Correcting the Byte-Range Locking Algorithm

We present two alternative approaches to correcting the byte-range locking algorithm.

Alternative 1: One way to eliminate deadlocks is to add a third state to the "flag" used in the algorithm. This is shown in the pseudocode in Figure 2. In the algorithm of [12], a flag value of '0' indicates that the process does not have the lock, while a flag value of '1' indicates that it either has the lock or is in the process of determining whether it can acquire the lock. In other words, the '1' state is overloaded. In the proposed fix, we add a third state, '2', with '0' denoting the same as before, '1' now denoting that the process has the lock, and '2' denoting that it is in the process of determining whether it can acquire the lock. There is no change to the lock-release algorithm, but the lock-acquire algorithm changes as follows.

When a process wants to acquire a lock, it writes a flag value of '2' and its start and end values into global memory, indicating its intent to try for the lock. It then reads the state of the other processes from global memory. If it finds a process with a conflicting byte range and a flag value of '1', then the process knows that it does not have the lock; it resets its flag value to '0' and blocks on an MPI_Recv, waiting for a wakeup from the lock holder before retrying the lock. If it instead finds a conflicting process whose flag value is '2', it resets its flag and its start and end values and retries the lock. If it finds no conflicting process, it sets its flag value to '1' and has acquired the lock.

Lock_acquire(int start, int end)
{
    val[0] = 2; /* flag */  val[1] = start;  val[2] = end;
    while (1) {
        /* add self to locklist */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
        MPI_Put(&val, 3, MPI_INT, homerank, 3*(myrank), 3, MPI_INT, lockwin);
        MPI_Get(locklistcopy, 3*(nprocs-1), MPI_INT, homerank, 0, 1,
                locktype1, lockwin);
        MPI_Win_unlock(homerank, lockwin);

        /* check to see if lock is already held */
        conflict = 0;
        flag1 = 0;
        flag2 = 0;
        for (i = 0; i < (nprocs-1); i++) {
            if ((flag == 1) && (byte ranges conflict with lock request)) {
                flag1 = 1;
                break;
            }
            if ((flag == 2) && (byte ranges conflict with lock request)) {
                flag2 = 1;
                break;
            }
        }
        if (flag1 == 1) {
            /* reset flag to 0, wait for notification, and then retry the lock */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
            val[0] = 0;
            MPI_Put(val, 1, MPI_INT, homerank, 3*(myrank), 1, MPI_INT, lockwin);
            MPI_Win_unlock(homerank, lockwin);
            /* wait for notification from some other process */
            MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, WAKEUP, comm,
                     MPI_STATUS_IGNORE);
            /* retry the lock */
            Lock_acquire(start, end);
        }
        else if (flag2 == 1) {
            /* reset flag to 0, start/end offsets to -1, and then retry the lock */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
            val[0] = 0; /* flag */  val[1] = -1;  val[2] = -1;
            MPI_Put(val, 3, MPI_INT, homerank, 3*(myrank), 3, MPI_INT, lockwin);
            MPI_Win_unlock(homerank, lockwin);
            /* retry the lock */
            Lock_acquire(start, end);
        }
        else {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
            val[0] = 1;
            MPI_Put(val, 1, MPI_INT, homerank, 3*(myrank), 1, MPI_INT, lockwin);
            MPI_Win_unlock(homerank, lockwin);
            /* lock is acquired */
            break;
        }
    }
}

Fig. 2. Pseudocode for the deadlock-free byte-range locking algorithm (alternative 1)

Alternative 2: The second approach, shown in Figure 3, keeps the two-state flag of the original algorithm but extends each process's entry in the window with a fourth field, pick. A process that finds conflicting lock holders collects them in a picklist and then, one at a time, picks a conflicting process to be the one that wakes it up, publishing the pick in the window before blocking on MPI_Recv. If the picked process has already released the lock, or if waiting on it would complete a cycle of picks (as determined by the detect_deadlock routine of Figure 4), the process moves on to the next entry in its picklist and, if the entire list is exhausted, retries the lock. Restricting each process to a single designated waker is what reduces, though does not eliminate, the extra sends discussed earlier.

Lock_acquire(int start, int end)
{
    val[0] = 1; /* flag */  val[1] = start;  val[2] = end;  val[3] = -1; /* pick */
    int picklist[num_procs];
    while (1) {
        /* add self to locklist */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
        MPI_Put(&val, 4, MPI_INT, homerank, 4*(myrank), 4, MPI_INT, lockwin);
        MPI_Get(locklistcopy, 4*(nprocs-1), MPI_INT, homerank, 0, 1,
                locktype1, lockwin);
        MPI_Win_unlock(homerank, lockwin);

        /* check to see if lock is already held */
        cprocs_i = 0;
        for (i = 0; i < (nprocs-1); i++) {
            if ((flag == 1) && (byte range conflicts with Pi's request)) {
                conflict = 1;  picklist[cprocs_i] = Pi;  cprocs_i++;
            }
        }
        if (conflict == 1) {
            for (j = 0; j < cprocs_i; ) {   /* j advances only when we move past picklist[j] */
                MPI_Win_lock(MPI_LOCK_EXCLUSIVE, homerank, 0, lockwin);
                val[0] = 0;  val[3] = picklist[j];
                /* reset flag to 0, indicate pick and pick_counter */
                MPI_Put(&val, 4, MPI_INT, homerank, 4*(myrank), 4, MPI_INT,
                        lockwin);
                MPI_Win_unlock(homerank, lockwin);
                if ((picklist[j] has released the lock) || detect_deadlock()) {
                    /* repeat for the next process in picklist */
                    j++;
                }
                else {
                    /* wait for notification from picklist[j] */
                    MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, WAKEUP, comm,
                             MPI_STATUS_IGNORE);
                    /* retry the lock */
                }
            }
            /* if the entire list has been traversed, retry the lock */
        }
        else {
            break;  /* lock is acquired */
        }
    }
}

Fig. 3. Pseudocode for the deadlock-free byte-range locking algorithm (alternative 2)

detect_deadlock()
{
    cur_pick = locklistcopy[4*myrank + 3];
    while (i < nprocs) {
        /* picking this process means a cycle is completed */
        if (locklistcopy[4*cur_pick + 3] == my_rank)
            return 1;
        /* no cycle can be formed */
        else if (locklistcopy[4*cur_pick + 3] == -1)
            return 0;
        else
            cur_pick = locklistcopy[4*cur_pick + 3];
    }
}

Fig. 4. Avoiding circular loops among processes picked to wake up other processes in alternative 2


An Assessment of the Alternative Algorithms: Model checking using SPIN has helped establish the following formal properties of these algorithms:

– Absence of deadlocks (both alternatives).

– Communal progress (if a collection of processes request a lock, then some process will eventually obtain it): Alternative 2 satisfies this under all fair schedules (all processes are scheduled to run infinitely often), while Alternative 1 places a few additional restrictions to rule out a few rare schedules (details are on our website, in our technical report [?]).

We note that neither of these alternatives eliminates the extra sends. That said, Alternative 2 considerably reduces these extra sends, as it restricts the number of processes that can wake up a particular process to one. The exact performance tradeoffs will be determined as part of our future work. We are still seeking algorithms that (i) avoid deadlock, (ii) avoid extra sends, and (iii) are efficient.

5 Related Work

Excellent tutorials as well as research papers on model checking are widely available, and we omit a survey here. In [7], the author surveys the area of parallel scientific programming and techniques to portray executions through intuitive graphical means; he offers a formally based tool that can help understand MPI program executions. In [9], the authors present the use of Promela and SPIN for modeling and verifying simple MPI C programs (2D diffusion) that employ basic MPI message-passing constructs. To our knowledge, nobody has applied model checking to analyze one-sided communication constructs. As remarked earlier, the application of formal verification methods can often have significant impact when applied to relatively new (and hence relatively less understood) parallel programming primitives, as well as their applications.

6 Conclusions and Future Work

We have shown how formal verification based on model checking can be used to find actual deadlocks in published algorithms that use the MPI one-sided communication primitives. We also discussed how this technology can help shed light on a number of related issues, such as forward progress and the possibility of unconsumed messages. We presented two alternative algorithms and analyzed their characteristics.

Our work is still in its early stages. Capitalizing on the maxim that formal methods can have their biggest impact when applied to constructs that are relatively new or still under development, we plan to formalize the entire set of MPI one-sided communication primitives. This can help develop a comprehensive approach to verifying programs that use the MPI one-sided constructs. As future case studies, we plan to analyze numerous other algorithms, including the scalable fetch-and-increment algorithm described in [4]. We plan to explore the use of automated tools to extract models from MPI programs, instead of creating them by hand. We also plan to explore the advantages of using other modeling languages such as +CAL [8], and to investigate the possibility of directly model checking MPI programs, instead of their extracted formal models.

Acknowledgments

This work was supported by NSF award CNS-0509379 and by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract W-31-109-ENG-38.

References

1. Thomas Ball, Byron Cook, Vladimir Levin, and Sriram K. Rajamani. SLAM and Static Driver Verifier: Technology transfer of formal methods inside Microsoft. In IFM 2004: Integrated Formal Methods, pages 1–20. Springer-Verlag, April 2004.

2. Edmund M. Clarke, Orna Grumberg, and Doron Peled. Model Checking. MIT Press, December 1999.

3. Glenn R. Luecke, Silvia Spanoyannis, and Marina Kraeva. The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600. Concurrency and Computation: Practice and Experience, 16(10):1037–1060, 2004.

4. William Gropp, Ewing Lusk, and Rajeev Thakur. Using MPI-2: Advanced Features of the Message-Passing Interface. MIT Press, Cambridge, MA, 1999.

5. William Gropp and Rajeev Thakur. An evaluation of implementation options for MPI one-sided communication. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users' Group Meeting, pages 415–424. Lecture Notes in Computer Science 3666, Springer, September 2005.

6. Gerard J. Holzmann. The Spin Model Checker: Primer and Reference Manual. Addison-Wesley, 2003. ISBN 0-321-22862-6.

7. Dieter Kranzlmüller. Event Graph Analysis for Debugging Massively Parallel Programs. PhD thesis, Johannes Kepler University Linz, Austria, September 2000. http://www.gup.uni-linz.ac.at/~dk/thesis.

8. Leslie Lamport, 2006. http://research.microsoft.com/users/lamport/tla/pluscal.html.

9. Stephen F. Siegel and George Avrunin. Verification of MPI-based software for scientific computation. In Proceedings of the 11th International SPIN Workshop on Model Checking Software, pages 286–303, April 2004. LNCS volume 2989.

10. A. S. Tanenbaum. Distributed Operating Systems, chapter 6, pages 289–375. Prentice-Hall, Inc., 1995.

11. Rajeev Thakur, William Gropp, and Brian Toonen. Optimizing the synchronization operations in MPI one-sided communication. International Journal of High-Performance Computing Applications, 19(2):119–128, Summer 2005.

12. Rajeev Thakur, Robert Ross, and Robert Latham. Implementing byte-range locks using MPI one-sided communication. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users' Group Meeting, pages 120–129. Lecture Notes in Computer Science 3666, Springer, September 2005.
