Commit 9ded4b05 authored by zahoussem's avatar zahoussem

updating README.md

parent 515af074
add_pruda_task(p_task_b);
// launch the tasks on GPU and the corresponding memory copies
// (locks and synchronization params are under development)
create_cpu_threads();
}
```
The user must then define the task parameters so that they can be
initialized at compile time in `user.h` and `user.cu`, as follows:
```c
// user.h
std::tuple<struct kernel_t<int *,int*,int *, int> * , struct kernel_t<int *,int*,int *> * > get_listing();
```
where the return type of `get_listing()` is a tuple containing the list of kernels.
```c
// user.cu
// the list of kernels must be defined here
struct kernel_t<int *,int*,int *, int> m_1;
struct kernel_t<int *,int*,int *> m_2;
void init_kernel_listing(){
    // the user must add their kernels here
    tasks = std::make_tuple(&m_1,&m_2);
}
```
Once everything is in place, the user calls `make` to compile PRUDA
along with their own code.
# PRUDA scheduling tools and policies
## Single core strategy for non-preemptive schedulers
The first strategy, called *single-stream*, uses one CUDA stream to
enforce kernel scheduling decisions. The scheduler uses three queues:
the task queue (`tq`), which contains all PRUDA tasks; the active-kernel
queue (`rq`), which contains the active PRUDA jobs; and the stream queue
(`sq`), which contains kernels that will be submitted to the GPU. When a
kernel is activated, it is added to the *correct* active-kernel queue
`rq` via the `pruda_subscribe(...)` primitive. Then, if the CUDA stream
queue `sq` is empty, the highest-priority job according to the given
scheduling policy is moved from `rq` to `sq` using the `pruda_resched`
primitive.
As only one CUDA stream is used, once a PRUDA task is executing, it
cannot be preempted by another higher-priority task; therefore, only
non-preemptive scheduling algorithms can be implemented using this
strategy. However, we would like to highlight that the PRUDA user can
abort the kernel currently under execution by calling the
`pruda_abort()` primitive.
This strategy is simple and easy to implement. It provides implicit
synchronization between active tasks, i.e. if task `B` is in the
stream queue while `A` is running, `B` will wait until `A` finishes
its execution before starting, without overlapping. However, this
strategy reserves all the GPU resources (both SMs) for a single PRUDA
task at a time, even if that task is light and does not use all GPU
cores, so resources are wasted. The next strategies show how to
overcome these limitations.
## Single core strategy for preemptive schedulers
In the second strategy, called *multiple-streams*, PRUDA creates
multiple streams to take scheduling decisions, allowing concurrent
kernel execution on the GPU as well as preemption.
First, we recall that the TX2 allows only two priority
levels. Therefore, we create only two streams: one with high priority
and the other with low priority. The queue of the high-priority stream
is denoted by `h-sq`; the low-priority stream queue is denoted by
`l-sq`. We recall that using several streams allows asynchronous and
concurrent execution between the two streams; however, within the same
stream, execution is always FIFO.
When a task is activated, it is added to the correct ready-task queue
`rq`. Then, the scheduler distinguishes the following situations:
1. `h-sq` = ∅ and `l-sq` = ∅ : the scheduler allocates the task to the
   `l-sq` queue; the task is therefore submitted *immediately* to the
   GPU.
2. `h-sq` = ∅ and `l-sq` ≠ ∅ : the scheduler checks whether the
   activated task has a higher priority than the task in `l-sq`. If so,
   the task is inserted into the high-priority queue `h-sq`, and it
   therefore preempts the task in `l-sq` if possible. Otherwise, no
   scheduling decision is taken.
According to the scheduling-decision mechanism described above, only
one preemption is allowed while a task is already in execution. For
example, if a task `C` arrives after `B` has preempted `A`, task `C`
must wait until `B` finishes, even if it is the highest-priority
active job. We are currently developing a schedulability analysis for
such a limited-preemption priority system. We would also like to
highlight that preempted tasks will continue to use GPU resources if
the high-priority task is not using *all* of the GPU resources.
Even if this strategy overcomes the preemption limitations of the
previous one, it is more complex, and it still uses the GPU as a
single core. In the next section, we use each SM of the GPU as a
single processor, allowing parallel execution within the GPU.
## Multicore strategy for GANG preemptive schedulers
The third strategy uses the GPU in a similar way to the previous one:
two streams are created, with the same queue configuration. However,
we allow tasks to call the primitive `pruda_allocate_to_sm(...)`, thus
using the GPU as a multiprocessor rather than as a single core. We
consider two types of PRUDA tasks: those allocated to a given SM and
those that are not (PRUDA tasks that do not call the allocation
primitive are considered to require the GPU exclusively).
In addition to the scheduling structures described for the previous
strategy, this strategy uses one queue per SM: `sm0-q` and `sm1-q`.
When a task is activated, if it uses both SMs, no other task can be
scheduled at the same time; it is therefore added to `l-sq` or `h-sq`
exactly as in the previous strategy. Otherwise, it uses a single SM
and is assigned to the corresponding SM queue. Later, the two jobs
having the highest priority in `sm0-q` and `sm1-q` are scheduled first
by being inserted into `l-sq` and `h-sq`. This allows parallel
execution on both streaming multiprocessors, i.e. it allows using the
GPU of the TX2 as a 2-core platform.
The allocation primitive in fact tests whether a given block/thread is
running on the correct SM: if so, it continues execution; otherwise,
it exits. Therefore, the user must either take this into account when
using the block and thread indexes, or use the new primitives we
provide to calculate indexes. The thread- and block-indexing mechanism
we provide is simple but effective. The user is free to use the CUDA
indexes, but **carefully**, or to use our platform indexes. We
highlight that neither of the two previous strategies requires any
modification to the kernel code or to the programming style
(indexing). Although this method is more complex to implement than the
two previous ones, it provides both temporal and spatial control over
task execution on GPUs. Analyzing the behavior of this final strategy
is a challenging theoretical question that we leave for future work.
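The SM-gating test can be illustrated with a device-side sketch. Reading the SM id through the `%smid` PTX special register is a known CUDA technique, but the function names and the gating policy below are illustrative assumptions, not PRUDA's actual implementation:

```c
// Illustrative sketch (not PRUDA's actual code): a block checks which SM it
// was placed on and exits early when it is not the requested one.
__device__ unsigned int current_smid(void) {
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid)); // PTX special register %smid
    return smid;
}

__global__ void gated_kernel(int target_sm, int *out, int n) {
    if (current_smid() != (unsigned int)target_sm)
        return; // block landed on the wrong SM: give it up immediately
    // Blocks that survive the gate see "holes" in the block numbering, which
    // is why raw blockIdx-based indexing must be used carefully here.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] += 1;
}
```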
# Real-time policies using PRUDA
Implementing real-time schedulers using PRUDA is simple. In fact, it
requires implementing the `pruda_subscribe` primitive and the
`pruda_resched` primitive. The goal of the first is to put the
activated task into the correct queue according to its priority. If
the scheduling algorithm is fixed-priority, it puts the task directly
into the corresponding priority queue. If the algorithm is EDF, it
first calculates the priority and then inserts the task into the
correct queue. The goal of the second primitive is to select which
active task to run and into which CUDA stream queue it should be
inserted, and therefore submitted to the GPU. The user can also call
`pruda_abort` to exit the execution of a given kernel, e.g. to mix
real-time with non-real-time tasks if desired. The description of
PRUDA provided in this and the previous section is summarized in
Figure \ref{fig:pruda_show}. We highlight that PRUDA primitives
(except subscribe and resched) can be used even for non-PRUDA tasks.
# OPTIONS
PRUDA manages memory copies between the CPU and the GPU, and kernel
pools, to make scheduling decisions.