Browsing projects by Tag(s)



Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity. Develop high-performance applications rapidly with Thrust!

Rating: 4.66667

0 reviews | 4 users | 35,908 lines of code | 6 current contributors | Analyzed almost 2 years ago
 
 

Introduction

stdcuda is a library of data-parallel algorithms with an STL-like interface.

What is stdcuda?

stdcuda is designed to give CUDA programmers convenient access to parallel algorithms through a templated interface similar to the C++ Standard Template Library. It provides a suite of commonly encountered data-parallel algorithms that can be used as primitive building blocks of larger systems.

Features

stdcuda exposes the high-performance computing capabilities of emerging CUDA-capable parallel platforms through a familiar, serial-style programming interface. Features include:

- vector_dev provides convenient device memory management similar to std::vector.
- scan provides an efficient parallel prefix sum.
- reduce provides an efficient parallel reduction.
- All functions are implemented in header files, avoiding the hassles common to linked libraries.

Examples

Managing Device Arrays with vector_dev

    // vector_example.cu
    // This example demonstrates how stdcuda manages device memory
    #include <cstdio>   // printf
    #include <cstdlib>  // srand, rand
    // ...plus the stdcuda header declaring vector_dev (file name not preserved here)

    // stdcuda classes and functions reside in the stdcuda namespace
    using namespace stdcuda;

    int main(void)
    {
      // create a vector of ints residing on a CUDA device
      vector_dev<int> data(10000);

      // fill it with random values
      // (per-element access to a vector_dev is slow; see Host-to-Device Copy)
      srand(13);
      for(size_t i = 0; i != data.size(); ++i)
      {
        data[i] = rand();
      }

      // print the value at index 1024
      int val = data[1024];
      printf("The value at index 1024 is %i\n", val);

      return 0;
    }

Host-to-Device Copy

    // copy_example.cu
    // This example demonstrates how to copy an array from the host to the device
    #include <cstdio>   // printf
    #include <cstdlib>  // srand, rand
    #include <vector>   // std::vector
    // ...plus the stdcuda header declaring vector_dev (file name not preserved here)

    int main(void)
    {
      // Because per-element access to a vector_dev is slow, we initialize
      // a vector on the host and copy it to a device vector en masse to
      // amortize the transfer cost

      // create a vector of ints residing on the host
      std::vector<int> h_data(10000);

      // fill it with random values
      srand(13);
      for(size_t i = 0; i != h_data.size(); ++i)
      {
        h_data[i] = rand();
      }

      // create a vector of ints residing on the device and copy from h_data
      stdcuda::vector_dev<int> d_data(h_data.begin(), h_data.end());

      // check that the elements at index 1024 of both vectors match
      if(h_data[1024] == d_data[1024])
      {
        printf("No problems!\n");
      }

      return 0;
    }

Parallel Reduction

    // reduction_example.cu
    // This example demonstrates how to compute the sum
    // of a large array of numbers with a parallel reduction
    #include <cstdio>   // printf
    // ...plus the stdcuda headers declaring vector_dev and reduce (file names not preserved here)

    int main(void)
    {
      // create a vector of ints residing on the device
      stdcuda::vector_dev<int> d_data(10000);

      // initialize as before ...

      // find the sum of the elements of d_data with a reduction
      printf("Reducing %u elements:\n", (unsigned int)d_data.size());
      int sum = stdcuda::reduce(d_data.begin(), d_data.end(), 0);
      printf("The sum is %i\n", sum);

      return 0;
    }

Counting

    // counting_example.cu
    // This example demonstrates how to count the number of occurrences
    // of some element in a large array of numbers with match and pop_count
    #include <cstdio>   // printf
    // ...plus the stdcuda headers declaring vector_dev, match and pop_count (file names not preserved here)

    int main(void)
    {
      // create a vector_dev as before
      stdcuda::vector_dev<int> d_data(10000);

      // initialize as before ...

      // create an array to hold a bit vector
      // (element type not preserved in this listing; int is assumed here)
      stdcuda::vector_dev<int> matches(d_data.size());

      // identify all matches of the number 10
      stdcuda::match(d_data.begin(), d_data.end(), matches.begin(), 10);

      // count all non-zero elements of the matches array
      int result = stdcuda::pop_count(matches.begin(), matches.end());
      printf("%i occurrences.\n", result);

      return 0;
    }

Stream compaction

Using stdcuda

To use stdcuda functions in your CUDA code, you need only check out the source and make it accessible via your include path:

    $ svn checkout http://stdcuda.googlecode.com/svn/trunk/stdcuda stdcuda

Related Libraries

CUDPP provides a plan-based interface to several data-parallel primitives, such as scan, stream compaction, and sparse matrix-vector multiplication, using CUDA. CUDPP is carefully tuned for peak performance on GPU hardware.

Rating: 0

0 reviews | 0 users | 4,021 lines of code | 0 current contributors | Analyzed over 2 years ago
 
 
 
 

Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.