Projects tagged ‘cuda’ and ‘parallel’


[15 total]

0 Users

by SIMD or SIMT processing device such as Nvidia's CUDA-capable GPU
Created 4 months ago.

0 Users

This project will solve the heat equation with serial and parallel code for performance comparisons. The serial code is written in C, and the parallel code is written in CUDA to be executed on NVIDIA GPUs. I will begin with a single GPU, then work up to multiple GPUs. Ultimately I want to spread the data set across a GPU cluster, which I am currently building as part of my research project.
Created 2 months ago.

0 Users

A GPU-based implementation of the lattice Boltzmann CFD method
Created 2 months ago.

0 Users

Project created to serve as a code repository for a study being carried out by students of Information Systems and of Technology in Computer Networks at SETREM. It holds the codes (parallel and sequential) used in the study. These codes are written in several programming languages and, initially, aim to demonstrate the applicability of the capabilities of each tool/language. As articles and texts are published at events, they will also be listed here.
Created 4 months ago.

0 Users

Lets Java developers use the GPU through CUDA C.
Created 4 months ago.

0 Users

OpenCL for Java. Lets Java developers use the GPU through OpenCL.
Created 3 months ago.

0 Users

A fast, parallel, versatile QED modelling framework. Uses Geometric Calculus and CUDA; see the wiki.
Created 12 months ago.

0 Users

Note: Komrade has been superseded by Thrust. Refer to the Thrust website for further details and changes since the final Komrade v0.9 release.

What is Komrade?

Komrade is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Komrade provides a high-level interface for GPU programming while remaining both fast and flexible.

Examples

Komrade is best explained through examples. The following source code generates random numbers on the host and transfers them to the device, where they are sorted.

    // Header paths assumed to follow the komrade/<name>.h layout.
    #include <komrade/host_vector.h>
    #include <komrade/device_vector.h>
    #include <komrade/generate.h>
    #include <komrade/sort.h>
    #include <cstdlib>

    int main(void)
    {
        // generate random data on the host
        komrade::host_vector<int> h_vec(20);
        komrade::generate(h_vec.begin(), h_vec.end(), rand);

        // transfer to device and sort
        komrade::device_vector<int> d_vec = h_vec;
        komrade::sort(d_vec.begin(), d_vec.end());

        return 0;
    }

This code sample computes the sum of 100 random numbers on the GPU.

    // Header paths assumed to follow the komrade/<name>.h layout.
    #include <komrade/host_vector.h>
    #include <komrade/device_vector.h>
    #include <komrade/generate.h>
    #include <komrade/reduce.h>
    #include <komrade/functional.h>
    #include <cstdlib>

    int main(void)
    {
        // generate random data on the host
        komrade::host_vector<int> h_vec(100);
        komrade::generate(h_vec.begin(), h_vec.end(), rand);

        // transfer to device and compute sum
        komrade::device_vector<int> d_vec = h_vec;
        int x = komrade::reduce(d_vec.begin(), d_vec.end(), komrade::plus<int>());

        return 0;
    }

Refer to the Tutorial page for further information and examples.

Acknowledgments

We wish to thank the following people who have made important intellectual and/or software contributions to Komrade: Mark Harris, Michael Garland, Nadathur Satish, and Shubho Sengupta. Additionally, we thank the compiler group at NVIDIA for their continued improvements to nvcc. In particular, we appreciate the work Bastiaan Aarts has done to enhance nvcc's C++ support.
Created 8 months ago.

0 Users

A linear algebra API and command-line interpreter written with CUDA optimizations.
Created 6 months ago.

0 Users

EXISTING PROBLEM: NUMERICAL INTEGRATION

Definite integrals arise in many different areas, and the Fundamental Theorem of Calculus is a powerful tool for evaluating them. However, it cannot always be applied. Some functions have no antiderivative that can be expressed in terms of familiar functions such as polynomials, exponentials and trigonometric functions. One such example is e^(-x^2), an important function because, up to scaling, it is the probability density function of the normal distribution. Moreover, we sometimes know a function only through observations at a certain number of points. In that case we have no formula for the function being integrated, only some data points. One existing solution to this problem is the Trapezoidal Rule.

EXISTING SOLUTION: THE TRAPEZOIDAL RULE

The trapezoidal rule uses trapezoids instead of rectangles to approximate the definite integral over a closed bounded interval. The upper boundary of each trapezoid is formed from points on the graph of the function, determined by a uniform-width partition of the interval. The more subintervals (that is, the more trapezoids), the more accurate the estimate. Here lies the biggest challenge in implementing the Trapezoidal Rule: the computational cost, particularly when high accuracy is required.

SOLUTION PROPOSED USING CUDA / PROJECT OBJECTIVE

We propose a parallel algorithm for the Trapezoidal Rule that exploits the power of CUDA. Each call runs 4 blocks of 256 threads, subject to a maximum of 2^27 calls (beyond this limit the function starts making approximations).

CODE BRIEF

On execution, the user is asked to choose a computation mode (Quick, Standard or Extended), and the corresponding function is called. In the Quick, or Default, mode, the integration is performed over the interval from 0 to 1, with an accuracy of two decimal places. In the Standard and Extended modes, the user chooses one of 5 common types of functions: Inverse, Logarithmic, Algebraic, Trigonometric and Exponential. The accuracy is three decimal places in Standard mode and six decimal places in Extended mode. In addition, Extended mode lets the user control the main kernel function by specifying the depth of recursion at which the function should start making serial calls, as well as the depth of recursion at which it should quit.
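As a rough illustration of the trapezoidal rule described above, here is a minimal serial C++ sketch. It is not the project's CUDA code, and the names `trapezoid` and `gaussian` are hypothetical, chosen only for this example. Each loop iteration is independent of the others, which is the kind of work a CUDA version could spread across threads before combining the partial sums.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Composite trapezoidal rule on [a, b] with n equal-width subintervals.
// Hypothetical helper for illustration; not taken from the project's source.
template <typename F>
double trapezoid(F f, double a, double b, std::size_t n) {
    assert(n > 0 && "need at least one subinterval");
    const double h = (b - a) / static_cast<double>(n);  // width of each trapezoid

    // The two endpoints carry weight 1/2; every interior point carries weight 1.
    double sum = 0.5 * (f(a) + f(b));
    for (std::size_t i = 1; i < n; ++i) {
        sum += f(a + static_cast<double>(i) * h);
    }
    return h * sum;
}

// The integrand singled out in the text: exp(-x^2) has no elementary antiderivative.
inline double gaussian(double x) { return std::exp(-x * x); }
```

With these definitions, `trapezoid(gaussian, 0.0, 1.0, 1000)` evaluates to approximately 0.7468, the integral of e^(-x^2) over the interval from 0 to 1. Increasing `n` tightens the estimate at the price of more function evaluations, which is the cost the project aims to offset with parallelism.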
Created 4 months ago.