cuda_kernels

pack_scalar_f64_kernel = None if cp is None else cp.RawKernel(pack_scalar_code('double'), 'pack_scalar_double') module-attribute

pack_scalar_f32_kernel = None if cp is None else cp.RawKernel(pack_scalar_code('float'), 'pack_scalar_float') module-attribute

unpack_scalar_f64_kernel = None if cp is None else cp.RawKernel(unpack_scalar_code('double'), 'unpack_scalar_double') module-attribute

unpack_scalar_f32_kernel = None if cp is None else cp.RawKernel(unpack_scalar_code('float'), 'unpack_scalar_float') module-attribute

pack_vector_f64_kernel = None if cp is None else cp.RawKernel(pack_vector_code('double'), 'pack_vector_double') module-attribute

pack_vector_f32_kernel = None if cp is None else cp.RawKernel(pack_vector_code('float'), 'pack_vector_float') module-attribute

unpack_vector_f64_kernel = None if cp is None else cp.RawKernel(unpack_vector_code('double'), 'unpack_vector_double') module-attribute

unpack_vector_f32_kernel = None if cp is None else cp.RawKernel(unpack_vector_code('float'), 'unpack_vector_float') module-attribute

pack_scalar_code(float_dtype)

Return the CUDA source of a kernel that packs data from i_sourceArray into o_destinationBuffer.

The indices into i_sourceArray are stored in i_indexes. i_offset is the offset into the destination buffer. i_nIndex, the number of indices, guards against out-of-bounds reads in the kernel.

tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
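The gather described above can be sketched on the CPU in plain Python. This is only an illustrative stand-in, not the kernel source: the function name `pack_scalar_reference`, the `threads_per_block` parameter, and the formula `tid = block_idx * threads_per_block + thread_idx` (the common one-dimensional CUDA idiom) are assumptions made for the example.

```python
def pack_scalar_reference(source_array, indexes, n_index, offset,
                          destination_buffer, threads_per_block=4):
    """CPU stand-in for the pack step; each inner iteration plays one CUDA thread."""
    # A CUDA launch is made of whole blocks, so round the thread count up.
    blocks = (n_index + threads_per_block - 1) // threads_per_block
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # Assumed 1D global index, mirroring blockIdx.x * blockDim.x + threadIdx.x.
            tid = block_idx * threads_per_block + thread_idx
            if tid >= n_index:
                continue  # surplus thread: the n_index guard prevents an out-of-bounds read
            destination_buffer[offset + tid] = source_array[indexes[tid]]
    return destination_buffer


source = [10.0, 11.0, 12.0, 13.0, 14.0]
indexes = [4, 0, 2]
packed = pack_scalar_reference(source, indexes, len(indexes), 2, [0.0] * 6)
print(packed)  # [0.0, 0.0, 14.0, 10.0, 12.0, 0.0]
```

With three indices and four threads per block, the fourth thread is launched but skipped by the guard, which is the situation i_nIndex exists for.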

unpack_scalar_code(float_dtype)

Return the CUDA source of a kernel that unpacks data from i_sourceBuffer into o_destinationArray.

The indices into o_destinationArray are stored in i_indexes. i_offset is the offset into the source buffer. i_nIndex, the number of indices, guards against out-of-bounds reads in the kernel.

tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
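Unpacking is the mirror image of packing: a scatter rather than a gather. A minimal CPU sketch follows, assuming a hypothetical helper name `unpack_scalar_reference` and a launch that deliberately runs a few surplus threads so the guard is exercised; it is not the kernel source.

```python
def unpack_scalar_reference(source_buffer, indexes, n_index, offset,
                            destination_array, surplus_threads=3):
    """CPU stand-in for the unpack step; each iteration plays one CUDA thread."""
    # surplus_threads is a hypothetical knob imitating a grid rounded up past n_index.
    for tid in range(n_index + surplus_threads):
        if tid >= n_index:
            continue  # guarded: no read past the end of indexes
        destination_array[indexes[tid]] = source_buffer[offset + tid]
    return destination_array


buffer = [0.0, 0.0, 14.0, 10.0, 12.0, 0.0]
indexes = [4, 0, 2]
unpacked = unpack_scalar_reference(buffer, indexes, len(indexes), 2, [-1.0] * 5)
print(unpacked)  # [10.0, -1.0, 12.0, -1.0, 14.0]
```

Entries of the destination array that no index points at keep their previous value, here the -1.0 placeholder.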

pack_vector_code(float_dtype)

Return the CUDA source of a kernel that packs data from i_sourceArrayX/Y into o_destinationBuffer.

The indices into i_sourceArrayX/Y are stored in i_indexesX/Y. i_offset is the offset into the destination buffer. i_nIndexX/Y, the numbers of X and Y indices, guard against out-of-bounds reads in the kernel. i_rotate specifies the rotation to apply to the vector before assignment.

tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
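The vector variant can be sketched the same way, with two caveats. The description above does not state how the buffer is laid out or how i_rotate is encoded, so this example assumes (hypothetically) that X values precede Y values in the buffer and that `rotate` counts counter-clockwise quarter turns, each mapping (x, y) to (-y, x). The kernel itself may use a different convention; `pack_vector_reference` is an illustrative name only.

```python
def pack_vector_reference(source_x, source_y, indexes_x, indexes_y,
                          n_index_x, n_index_y, offset, rotate,
                          destination_buffer):
    """CPU sketch of a vector pack under the assumptions stated in the lead-in."""
    # Gather each component; slicing to n_index_* plays the role of the kernel guard.
    x_values = [source_x[i] for i in indexes_x[:n_index_x]]
    y_values = [source_y[i] for i in indexes_y[:n_index_y]]
    # Assumed convention: one counter-clockwise quarter turn sends (x, y) to (-y, x).
    for _ in range(rotate % 4):
        x_values, y_values = [-v for v in y_values], x_values
    # Assumed layout: rotated X values first, then rotated Y values, starting at offset.
    packed = x_values + y_values
    destination_buffer[offset:offset + len(packed)] = packed
    return destination_buffer


src_x = [1.0, 2.0, 3.0]
src_y = [10.0, 20.0, 30.0]
no_turn = pack_vector_reference(src_x, src_y, [0, 2], [1], 2, 1, 1, 0, [0.0] * 5)
half_turn = pack_vector_reference(src_x, src_y, [0, 2], [1], 2, 1, 1, 2, [0.0] * 5)
print(no_turn)    # [0.0, 1.0, 3.0, 20.0, 0.0]
print(half_turn)  # [0.0, -1.0, -3.0, -20.0, 0.0]
```

Under this convention a half turn negates both components without swapping them, and four quarter turns leave the data unchanged.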

unpack_vector_code(float_dtype)

Return the CUDA source of a kernel that unpacks data from i_sourceBuffer into o_destinationArrayX/Y.

The indices into o_destinationArrayX/Y are stored in i_indexesX/Y. i_offset is the offset into the source buffer. i_nIndexX/Y, the numbers of X and Y indices, guard against out-of-bounds reads in the kernel.

tid is the globally unique thread index, computed from CUDA's built-in block and thread indices.
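A CPU sketch of the vector scatter is given below. As with the pack sketch, the buffer layout is an assumption: the example supposes that the first i_nIndexX buffer entries after the offset belong to X and the following i_nIndexY entries belong to Y. The name `unpack_vector_reference` and the `surplus_threads` parameter are hypothetical.

```python
def unpack_vector_reference(source_buffer, indexes_x, indexes_y,
                            n_index_x, n_index_y, offset,
                            destination_x, destination_y, surplus_threads=3):
    """CPU stand-in for the vector unpack; each iteration plays one CUDA thread."""
    total = n_index_x + n_index_y
    for tid in range(total + surplus_threads):
        if tid >= total:
            continue  # guarded surplus thread
        value = source_buffer[offset + tid]
        if tid < n_index_x:
            # Assumed layout: the leading segment of the buffer holds X values.
            destination_x[indexes_x[tid]] = value
        else:
            # The remaining threads handle the Y segment.
            destination_y[indexes_y[tid - n_index_x]] = value
    return destination_x, destination_y


buffer = [0.0, 1.0, 3.0, 20.0, 0.0]
out_x, out_y = unpack_vector_reference(buffer, [0, 2], [1], 2, 1, 1,
                                       [0.0] * 3, [0.0] * 3)
print(out_x)  # [1.0, 0.0, 3.0]
print(out_y)  # [0.0, 20.0, 0.0]
```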