cuda_kernels
pack_scalar_f64_kernel = None if cp is None else cp.RawKernel(pack_scalar_code('double'), 'pack_scalar_double')
module-attribute
pack_scalar_f32_kernel = None if cp is None else cp.RawKernel(pack_scalar_code('float'), 'pack_scalar_float')
module-attribute
unpack_scalar_f64_kernel = None if cp is None else cp.RawKernel(unpack_scalar_code('double'), 'unpack_scalar_double')
module-attribute
unpack_scalar_f32_kernel = None if cp is None else cp.RawKernel(unpack_scalar_code('float'), 'unpack_scalar_float')
module-attribute
pack_vector_f64_kernel = None if cp is None else cp.RawKernel(pack_vector_code('double'), 'pack_vector_double')
module-attribute
pack_vector_f32_kernel = None if cp is None else cp.RawKernel(pack_vector_code('float'), 'pack_vector_float')
module-attribute
unpack_vector_f64_kernel = None if cp is None else cp.RawKernel(unpack_vector_code('double'), 'unpack_vector_double')
module-attribute
unpack_vector_f32_kernel = None if cp is None else cp.RawKernel(unpack_vector_code('float'), 'unpack_vector_float')
module-attribute
pack_scalar_code(float_dtype)
Pack data from i_sourceArray into o_destinationBuffer.
The indices into i_sourceArray are stored in i_indexes. i_offset is the offset into the destination buffer. i_nIndex guards against out-of-bounds reads in the kernel.
tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
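The pack step can be emulated on the CPU to check an index list before launching anything on a GPU. This is a minimal sketch, not the kernel source: pack_scalar_reference and n_threads are hypothetical names, and it assumes each thread copies exactly one element and that a launch may contain more threads than indices.

```python
def pack_scalar_reference(source, indexes, n_index, offset, destination, n_threads):
    """CPU emulation of the pack step: one loop iteration stands in for one thread."""
    for tid in range(n_threads):
        # Assumed guard: surplus threads (tid >= n_index) do nothing,
        # which keeps them from reading past the end of `indexes`.
        if tid >= n_index:
            continue
        destination[offset + tid] = source[indexes[tid]]
    return destination


source = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
indexes = [4, 0, 5]          # flat positions to gather from `source`
buffer = [0.0] * 6
# Emulate a launch of 8 threads for only 3 indices.
packed = pack_scalar_reference(source, indexes, len(indexes), 2, buffer, n_threads=8)
print(packed)
```

Elements outside the range starting at the offset and spanning n_index entries are left untouched, so several packs can share one buffer at different offsets.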
unpack_scalar_code(float_dtype)
Unpack data from i_sourceBuffer into o_destinationArray.
The indices into o_destinationArray are stored in i_indexes. i_offset is the offset into the source buffer. i_nIndex guards against out-of-bounds reads in the kernel.
tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
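Unpacking is the scatter counterpart of packing. The sketch below emulates it on the CPU under the same assumptions (one element per thread, surplus threads exit early); unpack_scalar_reference and n_threads are hypothetical names, not part of this module.

```python
def unpack_scalar_reference(source_buffer, indexes, n_index, offset, destination, n_threads):
    """CPU emulation of the unpack step: one loop iteration stands in for one thread."""
    for tid in range(n_threads):
        # Assumed guard: threads beyond the index count do nothing.
        if tid >= n_index:
            continue
        destination[indexes[tid]] = source_buffer[offset + tid]
    return destination


source_buffer = [0.0, 0.0, 14.0, 10.0, 15.0, 0.0]
indexes = [4, 0, 5]          # flat positions to scatter into `destination`
destination = [0.0] * 6
unpacked = unpack_scalar_reference(
    source_buffer, indexes, len(indexes), 2, destination, n_threads=8
)
print(unpacked)
```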
pack_vector_code(float_dtype)
Pack data from i_sourceArrayX/Y into o_destinationBuffer.
The indices into i_sourceArrayX/Y are stored in i_indexesX/Y. i_offset is the offset into the destination buffer. i_nIndexX/Y guard against out-of-bounds reads in the kernel. i_rotate specifies the rotation to apply before assignment.
tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
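A CPU emulation of the vector pack, restricted to the case with no rotation, might look like the sketch below. It is not the kernel source: pack_vector_reference and n_threads are hypothetical names, the buffer layout (all X values followed by all Y values) is an assumption, and rotation is left out because its convention is not documented here.

```python
def pack_vector_reference(source_x, source_y, indexes_x, indexes_y,
                          n_index_x, n_index_y, offset, destination, n_threads):
    """CPU emulation of a vector pack without rotation (assumed X-then-Y layout)."""
    for tid in range(n_threads):
        # Assumed guard: one thread per X or Y element, the rest do nothing.
        if tid >= n_index_x + n_index_y:
            continue
        if tid < n_index_x:
            destination[offset + tid] = source_x[indexes_x[tid]]
        else:
            destination[offset + tid] = source_y[indexes_y[tid - n_index_x]]
    return destination


source_x = [1.0, 2.0, 3.0, 4.0]
source_y = [10.0, 20.0, 30.0, 40.0]
indexes_x = [3, 1]
indexes_y = [0, 2, 3]
buffer = [0.0] * 7
packed = pack_vector_reference(
    source_x, source_y, indexes_x, indexes_y,
    len(indexes_x), len(indexes_y), 1, buffer, n_threads=8,
)
print(packed)
```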
unpack_vector_code(float_dtype)
Unpack data from i_sourceBuffer into o_destinationArrayX/Y.
The indices into o_destinationArrayX/Y are stored in i_indexesX/Y. i_offset is the offset into the source buffer. i_nIndexX/Y guard against out-of-bounds reads in the kernel.
tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
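The vector unpack can be emulated on the CPU in the same spirit. This is a sketch under stated assumptions, not the kernel source: unpack_vector_reference and n_threads are hypothetical names, and the buffer is assumed to hold all X values followed by all Y values.

```python
def unpack_vector_reference(source_buffer, indexes_x, indexes_y,
                            n_index_x, n_index_y, offset,
                            destination_x, destination_y, n_threads):
    """CPU emulation of a vector unpack (assumed X-then-Y buffer layout)."""
    for tid in range(n_threads):
        # Assumed guard: one thread per X or Y element, the rest do nothing.
        if tid >= n_index_x + n_index_y:
            continue
        if tid < n_index_x:
            destination_x[indexes_x[tid]] = source_buffer[offset + tid]
        else:
            destination_y[indexes_y[tid - n_index_x]] = source_buffer[offset + tid]
    return destination_x, destination_y


source_buffer = [0.0, 4.0, 2.0, 10.0, 30.0, 40.0, 0.0]
indexes_x = [3, 1]
indexes_y = [0, 2, 3]
out_x, out_y = unpack_vector_reference(
    source_buffer, indexes_x, indexes_y,
    len(indexes_x), len(indexes_y), 1,
    [0.0] * 4, [0.0] * 4, n_threads=8,
)
print(out_x, out_y)
```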