Commit 9ded4b05 authored by zahoussem's avatar zahoussem

updating README.md

parent 515af074
add_pruda_task(p_task_b);
// launch the tasks on GPU and the corresponding memory copies
// (locks and synchronization params are under development)
create_cpu_threads();
}
```
The user must then define the task parameters so that they can be
initialized at compile time in `user.h` and `user.cu`, as follows:
```c
// user.h
std::tuple<struct kernel_t<int *,int*,int *, int> * , struct kernel_t<int *,int*,int *> * > get_listing();
```
where the return type of `get_listing()` is a tuple containing the list of kernels.
```c
// user.cu
// the list of kernels must be defined here
struct kernel_t<int *,int*,int *, int> m_1;
struct kernel_t<int *,int*,int *> m_2;
void init_kernel_listing(){
    // the user must add their kernels here
    tasks = std::make_tuple(&m_1,&m_2);
}
```
Once everything is in place, the user calls `make` to compile PRUDA
along with their own code.
# PRUDA scheduling tools and policies
## Single core strategy for non-preemptive schedulers
The first strategy, called *single-stream*, uses one CUDA stream to
enforce kernel scheduling decisions. The scheduler uses three queues:
the task queue (`tq`), which contains all PRUDA tasks; the active-kernel
queue (`rq`), which contains the active PRUDA jobs; and the stream queue
(`sq`), which contains kernels that will be submitted to the GPU. When a
kernel is activated, it is added to the *correct* active-kernel queue
`rq` via the `pruda_subscribe(...)` primitive. Then, if the CUDA stream
queue `sq` is empty, the highest-priority job according to the given
scheduling policy is moved from `rq` to `sq` using the `pruda_resched`
primitive.
As only one CUDA stream is used, once a PRUDA task is executing, it
cannot be preempted by another higher-priority task; therefore, only
non-preemptive scheduling algorithms can be implemented using this
strategy. However, we would like to highlight that the PRUDA user can
abort the kernel currently under execution by calling the
`pruda_abort()` primitive.
This strategy is simple and easy to implement. It provides implicit
synchronization between active tasks, i.e. if task `B` is in the
stream queue while `A` is running, `B` will wait until `A` finishes
its execution before starting, without overlapping. However, this
strategy reserves all the GPU resources (both SMs) for a single PRUDA
task at a time, even if that task is light and does not use all GPU
cores, so resources are wasted. The next strategies show how to
overcome these limitations.
## Single core strategy for preemptive schedulers
In the second strategy, called *multiple-streams*, PRUDA creates
multiple streams to take scheduling decisions, allowing concurrent
kernel execution on the GPU as well as preemption.
First, we recall that the TX2 allows only two priority
levels. Therefore, we create only two streams: one with high priority
and the other with low priority. The queue of the high-priority stream
is denoted by `h-sq`; the low-priority stream queue is denoted by
`l-sq`. We recall that using several streams allows asynchronous and
concurrent execution between the two streams; however, within the same
stream, execution is always FIFO.
When a task is activated, it is added to the correct ready-task queue
`rq`. Then, the scheduler distinguishes the following situations:
1. `h-sq` = ∅ and `l-sq` = ∅ : the scheduler allocates the task to the
   `l-sq` queue; the task is therefore submitted *immediately* to the
   GPU.
2. `h-sq` = ∅ and `l-sq` ≠ ∅ : the scheduler checks whether the
   activated task has a higher priority than the task in `l-sq`. If so,
   the task is inserted into the high-priority queue `h-sq`, and it
   therefore preempts the task in `l-sq` if possible. Otherwise, no
   scheduling decision is taken.
According to the scheduling-decision mechanism described above, only
one preemption is allowed while a task is already in execution. For
example, if a task `C` arrives after `B` has preempted `A`, task `C`
must wait until `B` finishes, even if it is the highest-priority
active job. We are currently developing a schedulability analysis for
such a limited-preemption priority system. We would also like to
highlight that preempted tasks will continue to use GPU resources if
the high-priority task is not using *all* of the GPU resources.
Even if this strategy overcomes the preemption limitations of the
previous one, it is more complex, and it still uses the GPU as a
single core. In the next section, we use each SM of the GPU as a
single processor, allowing parallel execution within the GPU.
## Multicore strategy for GANG preemptive schedulers
The third strategy uses the GPU in a similar way to the previous one:
two streams are created, with the same queue configuration. However,
we allow tasks to call the primitive `pruda_allocate_to_sm(...)`, thus
using the GPU as a multiprocessor rather than as a single core. We
consider two types of PRUDA tasks: those allocated to a given SM and
those that are not (PRUDA tasks that do not call the allocation
primitive are considered to require the GPU exclusively).
In addition to the scheduling structures described for the previous
strategy, this strategy uses one queue per SM: `sm0-q` and `sm1-q`.
When a task is activated, if it uses both SMs, no other task can be
scheduled at the same time; it is therefore added to `l-sq` or `h-sq`
exactly as in the previous strategy. Otherwise, it uses a single SM
and is assigned to the corresponding SM queue. Later, the two jobs
having the highest priority in `sm0-q` and `sm1-q` are scheduled first
by being inserted into `l-sq` and `h-sq`. This allows parallel
execution on both streaming multiprocessors, i.e. it allows using the
GPU of the TX2 as a 2-core platform.
The allocation primitive in fact tests whether a given block/thread is
running on the correct SM: if so, it continues execution; otherwise,
it exits. Therefore, the user must either take this into account when
using the block and thread indexes, or use the new primitives we
provide to calculate indexes. The thread- and block-indexing mechanism
we provide is simple but effective. The user is free to use the CUDA
indexes, but **carefully**, or to use our platform indexes. We
highlight that neither of the two previous strategies requires any
modification to the kernel code or to the programming style
(indexing). Although this method is more complex to implement than the
two previous ones, it provides both temporal and spatial control over
task execution on GPUs. Analyzing the behavior of this final strategy
is a challenging theoretical question that we leave for future work.
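The SM-gating test can be illustrated with a device-side sketch. Reading the SM id through the `%smid` PTX special register is a known CUDA technique, but the function names and the gating policy below are illustrative assumptions, not PRUDA's actual implementation:

```c
// Illustrative sketch (not PRUDA's actual code): a block checks which SM it
// was placed on and exits early when it is not the requested one.
__device__ unsigned int current_smid(void) {
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid)); // PTX special register %smid
    return smid;
}

__global__ void gated_kernel(int target_sm, int *out, int n) {
    if (current_smid() != (unsigned int)target_sm)
        return; // block landed on the wrong SM: give it up immediately
    // Blocks that survive the gate see "holes" in the block numbering, which
    // is why raw blockIdx-based indexing must be used carefully here.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] += 1;
}
```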
# Real-time policies using PRUDA
Implementing real-time schedulers using PRUDA is simple. In fact, it
requires implementing the `pruda_subscribe` primitive and the
`pruda_resched` primitive. The goal of the first is to put the
activated task into the correct queue according to its priority. If
the scheduling algorithm is fixed-priority, it puts the task directly
into the corresponding priority queue. If the algorithm is EDF, it
first calculates the priority and then inserts the task into the
correct queue. The goal of the second primitive is to select which
active task to run and into which CUDA stream queue it should be
inserted, and therefore submitted to the GPU. The user can also call
`pruda_abort` to exit the execution of a given kernel, e.g. to mix
real-time with non-real-time tasks if desired. The description of
PRUDA provided in this and the previous section is summarized in
Figure \ref{fig:pruda_show}. We highlight that PRUDA primitives
(except subscribe and resched) can be used even for non-PRUDA tasks.
# OPTIONS
PRUDA manages memory copies between the CPU and the GPU, and kernel
pools, to make scheduling decisions.