OpenSteer(oid)+And+GPUX

=OpenSteer(oid) and GPUX:=

OpenSteer(oid) aims to develop a crowd-dynamics application that runs on the Graphics Processing Unit (GPU), exploiting its massively parallel architecture. It builds on the C++ OpenSteer library, which provides state-of-the-art steering behaviors and algorithms for writing crowd-dynamics applications. The problem is that the OpenSteer library code is written for a CPU-based environment and runs sequentially. OpenSteer(oid) therefore not only contributes its own code but also aims to parallelize the portions of the OpenSteer code it intends to use.

The Problem:
Finding the parallelizable code (the code that can run in parallel) is not as easy as it might seem. The code base can run to hundreds of thousands of lines, and doing the analysis by hand would waste months of effort and a few headaches besides. Moreover, newer versions of an application are released frequently and differ from their predecessors, so manually determining the vectorizable portion is definitely not the solution. This process desperately needs to be automated.

Apart from this, there is the issue of data dependency. Languages like C/C++ allow lavish use of dynamically allocated pointers, so you can never say for sure how many times the same memory location is reused for entirely different tasks. In sequential execution this is actually good practice: you can de-allocate a memory region previously used for one purpose and reuse it later in your code. Since sequential code runs in a blocking style, things work fine because both instructions execute at different times. But if you intend to parallelize your code, you could end up making a mess.

Take the example of a cooking pan: on the same day you can use it to make a sweet dish and later to cook a steak in the same pan, because you do so at different points in time. Under parallel execution, however, you would definitely not want to cook the sweet dish and the steak in the same pan (the same memory) at the same time. That would be an embarrassing fiasco. This problem arises because the code was written and optimized for sequential execution, which creates many issues under parallel execution.

The following are the types of data dependencies that we might encounter:


 * //**Read after Write (RAW)**//
 * //**Write after Write (WAW)**//
 * //**Write after Read (WAR)**//

Consider two instructions **i** and **j**, where **j** runs after **i** in the sequential code.


 * //RAW (read after write)// - //j tries to read a source before i writes it, so j incorrectly gets the old value.//

 * //WAW (write after write)// - //j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination.//

 * //WAR (write after read)// - //j tries to write a destination before it is read by i, so i incorrectly gets the new value.//

**What is GPUX?**
GPUX is a tool-chain for porting legacy applications to GPGPU by analyzing program traces. It identifies vectorizable loops and diagnoses loop-carried dependencies in non-vectorizable loops, thereby helping programmers vectorize them.

What does GPUX do?
1) Identify vectorizable loops in a program. Loops are vectorizable when they either have no loop-carried dependency or have one that can be easily mitigated.
2) Find a partitioning of the set of vectorizable loops that minimizes communication between CPU and GPU while maximizing speed-up.
3) Extract the loops earmarked for GPU execution into extern functions implemented using a GPGPU API like CUDA, while inserting code for handling data transfers into what is left of the original program.
4) Compile and execute the transformed program.

Porting OpenSteer(oid) onto the GPU using GPUX:
GPUX will help identify the parallelizable code in the CPU version of OpenSteer(oid), and will therefore help a great deal in porting it onto the GPU by automating the identification of loops, data dependencies, bank conflicts, etc.

