# PRUDA : A real-time programming interface on top of CUDA
PRUDA is a set of programming tools and mechanisms to control
scheduling within the GPU. It also provides implementations of the
following scheduling policies:
- Fixed-priority scheduling, both non-preemptive and preemptive
- Earliest Deadline First (EDF) scheduling, both preemptive and non-preemptive
- Gang scheduling, where the GPU is considered as a multiprocessor architecture

Details about each scheduling policy are given in a dedicated
section. First, we describe the PRUDA functionalities and data
structures. We also show how a scheduling policy can be easily
implemented with PRUDA.
## Prerequisites
- Programming with CUDA
- Basic knowledge about real-time systems

PRUDA is a platform built on top of CUDA for real-time systems;
the rest of this document therefore assumes basic familiarity with both.
## The GPU in the eyes of PRUDA
A GPU is composed of one or several streaming multiprocessors (SMs)
and one or several copy engines (CEs). Streaming multiprocessors
perform the computations (kernels), whereas copy engines execute
memory copy operations between the different memory spaces.
Programming the GPU requires dividing parallel computations into
several grids, and each grid into several blocks. A block is a set of
multiple threads. A GPU can be programmed using generic platforms
such as OpenCL or proprietary APIs. We use CUDA, an NVIDIA
proprietary platform, to have tight control over SMs and CEs in the
C/C++ programming language, using the NVIDIA compiler *nvcc*.
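
As a brief illustration of this hierarchy (a hypothetical `scale`
kernel, not part of PRUDA), each thread derives a global index from
its block and thread coordinates:

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel (not part of PRUDA): each thread scales
// one array element; its global index comes from the grid/block/thread
// hierarchy described above.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void) {
    const int n = 1024;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // Launch a grid of 4 blocks, each holding 256 threads (4 * 256 = n).
    scale<<<4, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```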
From the PRUDA perspective, the GPU is a set of copy engines and one
or more processors: each SM can be considered as a processor, or both
SMs as a single processor. PRUDA manages memory copies between the
CPU and the GPU, as well as the pool of submitted kernels, to make
its scheduling decisions.
When a kernel is invoked by CPU code, it submits commands to the
GPU. How and when these commands are consumed is hidden by the GPU
manufacturers for intellectual-property reasons. PRUDA has been
tested on the NVIDIA Jetson TX2, which is composed of 6 ARM-based CPU
cores along with an integrated NVIDIA Pascal-based GPU. The GPU in
the TX2 is composed of 256 CUDA cores, divided into two SMs, and one
copy engine. The CPUs and the GPU share the same memory module. From
a programming perspective, one may allocate two separate memory
spaces for the CPU and the GPU, using the `malloc` and `cudaMalloc`
primitives respectively. Alternatively, the programmer may use a
memory space logically visible to both the CPU and the GPU, called
CUDA unified memory (available even for discrete GPUs); no memory
copies are then needed between CPU and GPU tasks for memory spaces
(buffers) allocated with the `cudaMallocManaged` primitive. PRUDA
supports both approaches by enabling and disabling automatic memory
copy operations.
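
As a minimal sketch of the unified-memory style (plain CUDA,
independent of PRUDA's automatic copy handling), a buffer allocated
with `cudaMallocManaged` is accessed by both sides without explicit
copies:

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: increments each element in place.
__global__ void inc(int *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i]++;
}

int main(void) {
    const int n = 256;
    // Unified memory: one buffer logically visible to CPU and GPU,
    // so no explicit cudaMemcpy is needed.
    int *v;
    cudaMallocManaged(&v, n * sizeof(int));
    for (int i = 0; i < n; i++) v[i] = i;   // CPU writes directly
    inc<<<1, n>>>(v, n);                    // GPU updates the same buffer
    cudaDeviceSynchronize();                // wait before the CPU reads results
    cudaFree(v);
    return 0;
}
```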
Typical CUDA programs are organized in the same way. First, memory
allocation operations are performed on both the CPU and the GPU.
Then, memory copies are issued from the CPU to the GPU. Next, the GPU
kernel is launched, and finally the results are copied back to the
CPU by memory copy operations. `cudaMalloc` is a costly operation;
therefore, in PRUDA, it must be performed by the programmer outside
of the real-time task processing.
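
The sketch below illustrates this organization, under the assumption
of a hypothetical periodic job (`setup()` and `job()` are
illustrative names, not PRUDA primitives): the costly `cudaMalloc`
calls happen once at setup time, outside the real-time path:

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel standing in for the task's computation.
__global__ void work(float *in, float *out, int n) { /* ... */ }

static float *d_in, *d_out;            // device buffers, allocated once
static float h_in[1024], h_out[1024];  // host buffers

void setup(void) {
    // Costly allocations done once, outside real-time processing.
    cudaMalloc(&d_in,  sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
}

void job(void) {
    // Real-time job body: copy in, launch the kernel, copy back.
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    work<<<4, 256>>>(d_in, d_out, 1024);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
}
```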
All threads of a given block are executed by one and only one SM;
however, different blocks of the same kernel may be executed on
different SMs. For example, one kernel may execute on both SM0 and
SM1, while another executes only on SM0. The kernel execution order
and dispatching mechanisms are driven by internal, closed-source
NVIDIA drivers (in our case of study). A PRUDA user may get the SM
where a given block/thread is executing using the `pruda_get_sm()`
primitive. PRUDA also allows enforcing the allocation of a given
kernel to a specific SM by using the PRUDA primitive
`pruda_allocate_to_sm(int sm_id)`, where `sm_id` is the id of the
target streaming multiprocessor. Implementation details about these
primitives can be found in the PRUDA description section.
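
This README does not specify how these primitives are implemented; a
plausible sketch, on NVIDIA GPUs, reads the executing SM id from the
`%smid` PTX special register via inline assembly (an assumption about
the mechanism, not a confirmed excerpt of PRUDA):

```cuda
// Sketch (assumption): reading the current SM id from device code
// through the %smid PTX special register; a pruda_get_sm()-like
// primitive could be built on top of this.
__device__ unsigned int get_smid(void) {
    unsigned int smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Hypothetical use: only blocks running on the target SM do the work,
// which is one way to pin a kernel's execution to a single SM.
__global__ void kernel_on_sm(int target_sm) {
    if (get_smid() != (unsigned int)target_sm)
        return;   // blocks placed on other SMs exit immediately
    // ... actual computation ...
}
```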
To enforce an execution order between different kernels, we use a
specific data structure called a CUDA stream. A CUDA stream has a
FIFO behavior: kernels submitted to the same CUDA stream are executed
one after the other, in a **sequential** fashion, so synchronization
between two consecutive kernels is implicitly achieved. This property
will be used later to implement the non-preemptive EDF and
fixed-priority real-time scheduling policies.
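
As a brief illustration (plain CUDA, with hypothetical
`producer`/`consumer` kernels), two kernels pushed into the same
stream run back to back without any explicit synchronization between
the launches:

```cuda
#include <cuda_runtime.h>

__global__ void producer(float *buf) { /* ... */ }
__global__ void consumer(float *buf) { /* ... */ }

int main(void) {
    float *buf;
    cudaMalloc(&buf, 1024 * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // FIFO behavior: consumer starts only after producer completes,
    // with no explicit synchronization between the two launches.
    producer<<<1, 256, 0, s>>>(buf);
    consumer<<<1, 256, 0, s>>>(buf);

    cudaStreamSynchronize(s);   // wait for both kernels to finish
    cudaStreamDestroy(s);
    cudaFree(buf);
    return 0;
}
```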
In CUDA, the user may define several streams, and priorities may be
set between different streams. Therefore, if a stream `A` has a
higher priority than a stream `B`, all kernels of `A` are meant to
execute before the kernels submitted to `B`. If a kernel in `B` is
executing while a kernel is activated on `A`, the GPU might preempt
the kernel of `B` to execute the kernel of `A`, according to our
benchmarking and depending on the GPU preemption level. We highlight
that fine-grain preemption capabilities are available in NVIDIA GPUs
starting from the Pascal architecture. For example, if preemption
operates at the block level, the preemption takes effect once all
already-executing blocks have finished. Recent Volta GPUs allow even
finer preemption levels. Even if it is possible to create more than
two streams, only two priority levels are available on the Jetson TX2
platform. These properties will be used further on to achieve the
preemptive EDF and fixed-priority scheduling policies.
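
A minimal sketch of priority streams in plain CUDA (kernel names are
hypothetical; how PRUDA wraps this is described later):
`cudaDeviceGetStreamPriorityRange` reports the available range, which
spans only two levels on the Jetson TX2 as noted above:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void low_prio_work(void)  { /* ... */ }
__global__ void high_prio_work(void) { /* ... */ }

int main(void) {
    int least, greatest;   // numerically lower value = higher priority
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    printf("priorities: least=%d greatest=%d\n", least, greatest);

    cudaStream_t sA, sB;
    cudaStreamCreateWithPriority(&sA, cudaStreamNonBlocking, greatest);
    cudaStreamCreateWithPriority(&sB, cudaStreamNonBlocking, least);

    // A kernel activated on the high-priority stream sA may preempt
    // (at the supported preemption granularity) a kernel running on sB.
    low_prio_work<<<32, 256, 0, sB>>>();
    high_prio_work<<<32, 256, 0, sA>>>();

    cudaDeviceSynchronize();
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    return 0;
}
```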
Other PRUDA primitives will be detailed later.
## CUDA functionalities
PRUDA allows a kernel to execute within a single SM.
## Single core strategy