PRUDA is a set of programming tools and mechanisms to control
scheduling within the GPU. It also provides the implementation of the
following real-time scheduling policies:
- Fixed Priority (FP): preemptive and non-preemptive, single core
- Earliest Deadline First (EDF): preemptive and non-preemptive
- Gang scheduling: preemptive and non-preemptive GANG, using the GPU as a multiprocessor architecture
Additionally, PRUDA aims not to modify the CUDA programming
style. Therefore, a PRUDA user can use already developed CUDA code.
A GPU is composed of one or several streaming multiprocessors (SMs)
and one or several copy engines (CEs). Streaming multiprocessors are
able to achieve computations (kernels), whereas copy engines execute
memory copy operations between different memory spaces. Programming
the GPU requires dividing parallel computations into several grids,
and each grid into several blocks. A block is a set of multiple
threads. A GPU can be programmed using generic platforms such as
OpenCL or proprietary APIs. We use CUDA, an NVIDIA proprietary
platform, to have tight control over SMs and CEs in the C/C++
programming language, using the NVIDIA compiler *nvcc*.

From the PRUDA perspective, the GPU is a set of copy engines and one
or more processors. Each SM can be considered as a processor, or both
SMs as a single processor, according to the scheduling policy. PRUDA
manages memory copies between CPU and GPU and kernel pools to make
scheduling decisions.
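As a concrete illustration of the grid/block/thread decomposition, here
is a minimal, PRUDA-independent CUDA sketch (the `add_one` kernel and
the launch sizes are arbitrary examples, not part of PRUDA), compiled
with *nvcc*:

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8

// One thread per element; the driver decides which SM runs each block.
__global__ void add_one(int *v) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) v[i] += 1;
}

int main() {
    int *h = (int*)malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) h[i] = i;

    int *d;
    cudaMalloc((void**)&d, N * sizeof(int));
    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);

    add_one<<<2, 4>>>(d);   // one grid: 2 blocks of 4 threads each

    cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h[7] = %d\n", h[7]);  // prints 8

    cudaFree(d);
    free(h);
    return 0;
}
```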
When a kernel is invoked by CPU code, it submits commands to the
GPU. How and when commands are consumed is hidden by the manufacturer
for intellectual-property reasons. PRUDA has been tested on the Jetson
TX2. It is composed of 6 ARM-based CPU cores, along with an integrated
NVIDIA Pascal-based GPU. The GPU in the TX2 is composed of 256 CUDA
cores, divided into two SMs, and one copy engine. CPUs and GPU share
the same memory module. From a programming perspective, one may either
allocate two separate memory spaces for CPU and GPU, using the `malloc`
and `cudaMalloc` primitives respectively, or use a memory space
logically visible to both the CPU and the GPU, called CUDA unified
memory (available even for discrete GPUs). No memory copies are needed
between CPU and GPU tasks for memory spaces (buffers) allocated using
the `cudaMallocManaged` primitive. PRUDA allows handling both memory
copy schemes by enabling and disabling automatic memory copy
operations.
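For comparison with the explicit-copy sketch above, the same
computation with unified memory needs no explicit copies. A minimal
sketch, reusing the same hypothetical `add_one` kernel:

```c
#include <cuda_runtime.h>
#include <stdio.h>

#define N 8

__global__ void add_one(int *v) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) v[i] += 1;
}

int main() {
    int *v;
    cudaMallocManaged(&v, N * sizeof(int));  // one buffer, visible to CPU and GPU
    for (int i = 0; i < N; i++) v[i] = i;    // CPU writes it directly

    add_one<<<2, 4>>>(v);                    // GPU reads/writes the same buffer
    cudaDeviceSynchronize();                 // wait before the CPU reads results

    printf("v[7] = %d\n", v[7]);             // prints 8
    cudaFree(v);
    return 0;
}
```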
Typical CUDA programs are organized in the same way: first, memory
allocation is achieved both on the CPU and the GPU; then, memory
copies are operated from CPU to GPU; later, the GPU kernel is
launched; and finally, results are copied back to the CPU by memory
copy operations. `cudaMalloc` is a costly operation. Therefore, in
PRUDA, it must be achieved by the programmer outside of real-time
task processing, as sketched below.
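A minimal sketch of that structure, with allocation hoisted out of the
job loop; `my_kernel` and the fixed-count loop are placeholders for a
periodic real-time task body (PRUDA's own task creation is shown in
the usage example below):

```c
#include <cuda_runtime.h>

#define N 8

__global__ void my_kernel(int *v, int n) {       // placeholder workload
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) v[i] += 1;
}

int main() {
    int host_buf[N] = {0};
    int *dev_buf;
    cudaMalloc((void**)&dev_buf, N * sizeof(int)); // costly: done once, at init time

    for (int job = 0; job < 3; job++) {            // stands in for the periodic job body
        cudaMemcpy(dev_buf, host_buf, N * sizeof(int), cudaMemcpyHostToDevice);
        my_kernel<<<1, N>>>(dev_buf, N);
        cudaMemcpy(host_buf, dev_buf, N * sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_buf);                             // freed once, at shutdown
    return 0;
}
```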
All threads of any block are executed by one and only one SM; however,
different blocks of the same kernel may be executed on different
SMs. For example, on the Jetson TX2, one kernel may have blocks
running on both SM0 and SM1 while another kernel runs only on SM0. The
kernel execution order and mechanisms are driven by internal
closed-source NVIDIA drivers (in our case study). A PRUDA user may get
the SM where a given block/thread is executing using the
`pruda_get_sm()` primitive. PRUDA also allows enforcing the allocation
of a given kernel to a specific SM by using the PRUDA primitive
`pruda_allocate_to_sm(int sm_id)`, where `sm_id` is the id of the
target streaming multiprocessor. Implementation details about how
these primitives work can be found in the PRUDA description section.
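A sketch of how these two primitives might be used together. Two
assumptions here are not confirmed by this README: that
`pruda_get_sm()` is callable from device code and returns an `int` SM
id (the text above says it reports where a block/thread is executing),
and that the PRUDA headers are the same ones included in the usage
example below.

```c
#include "../inc/user.h"   // PRUDA headers, as in the usage example below
#include "../inc/tools.h"
#include <stdio.h>

// Assumption: pruda_get_sm() is device-callable and returns the id of
// the SM the current block/thread is running on.
__global__ void where_am_i(void) {
    printf("block %d runs on SM %d\n", blockIdx.x, pruda_get_sm());
}

int main() {
    // Pin the next kernel to streaming multiprocessor 0.
    pruda_allocate_to_sm(0);
    where_am_i<<<2, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
```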
## CUDA operations

PRUDA allows a kernel to execute within a single SM.
# PRUDA usage by example
```c
#include "../inc/user.h"
#include "../inc/tools.h"
#define N 8
__global__ void add( int *a, int *b, int *c ) {
  int tid = blockDim.x * blockIdx.x + threadIdx.x;
  while (tid < N) {
    c[tid] = a[tid] + b[tid];
    tid += blockDim.x * gridDim.x; // grid-stride step over all launched threads
  }
}
__global__ void mul( int *a, int *b, int *c, int h ) {
  int tid = blockDim.x * blockIdx.x + threadIdx.x;
  while (tid < N) {
    c[tid] = a[tid] * b[tid];
    tid += blockDim.x * gridDim.x; // grid-stride step, as in add
  }
}
int main(){
// initializing pointers
int *a, *b, *c;
  int *dev_a, *dev_b, *dev_c, ac;
// allocating CPU memory
a = (int*)malloc( N * sizeof(int) );
b = (int*)malloc( N * sizeof(int) );
c = (int*)malloc( N * sizeof(int) );
  // initialize the input vectors and the scalar argument of mul
  for (int i = 0; i < N; i++) {
    a[i] = i;
    b[i] = i;
  }
  ac = 0;
// allocating GPU memory
cudaMalloc( (void**)&dev_a, N * sizeof(int) );
cudaMalloc( (void**)&dev_b, N * sizeof(int) );
cudaMalloc( (void**)&dev_c, N * sizeof(int) );
// initializing the list of kernels
init_kernel_listing();
create_kernel(std::get<1>(get_listing()),add,2,5,dev_a,dev_b,dev_c);
create_kernel(std::get<0>(get_listing()),mul,2,5,dev_a,dev_b,dev_c,ac);
  // GPU scheduling parameters for kernel add
  struct gpu_sched_param add_p;
  add_p.period_us   = 3000000;
  add_p.deadline_us = 3000000;
  add_p.priority    = 20;
  // GPU scheduling parameters for kernel mul
  struct gpu_sched_param mul_p;
  mul_p.period_us   = 6000000;
  mul_p.deadline_us = 6000000;
  mul_p.priority    = 15;
int gs_a = 10;
int bs_a = 5;
int gs_b = 10;
int bs_b = 5;
// declaring the gpu tasks (a container of params)
struct pruda_task_t * p_task_b = create_pruda_task(1, add_p, gs_a, bs_a);
struct pruda_task_t * p_task_a = create_pruda_task(0, mul_p, gs_b, bs_b);
  // initializing the scheduler with the desired strategy and scheduling policy
init_scheduler(SINGLE, FP);
// adding pruda tasks to the scheduling unit
add_pruda_task(p_task_a);
add_pruda_task(p_task_b);
  // launch the tasks on GPU and the corresponding memory copies
  // (locks and synchronization params are under development)
create_cpu_threads();
...
}
```
# PRUDA scheduling tools and policies
## Single core strategy for non-preemptive schedulers
## Single core strategy for preemptive schedulers
## Multicore strategy for GANG preemptive schedulers
# OPTIONS
PRUDA manages memory copies between CPU and GPU and kernel pools to
make scheduling decisions.