QTrk
|
Class to specifically perform quadrant interpolation calculations on the GPU. More...
#include <QI.h>
Classes | |
struct | DeviceInstance |
Contains all QI memory space that is allocated per device (shared between streams). More... | |
struct | StreamInstance |
Contains all QI memory space that is allocated per stream. More... | |
Public Member Functions | |
template<typename TImageSampler > | |
void | Execute (BaseKernelParams &p, const QTrkComputedConfig &cfg, StreamInstance *s, DeviceInstance *d, device_vec< float3 > *initial, device_vec< float3 > *output) |
Execute a batch of QI calculations. Runs Iterate the correct number of times and recalculates the number of radial spokes for every iteration, based on QTrkComputedConfig::qi_angstep_factor. More... | |
void | InitDevice (DeviceInstance *d, QTrkComputedConfig &cc) |
Ready a device for QI calculations. More... | |
void | InitStream (StreamInstance *s, QTrkComputedConfig &cc, cudaStream_t stream, int batchSize) |
Ready a stream for QI calculations. More... | |
void | Init (QTrkComputedConfig &cfg, int batchSize) |
Initialize this QI instance. More... | |
Private Member Functions | |
template<typename TImageSampler > | |
void | Iterate (BaseKernelParams &p, device_vec< float3 > *initial, device_vec< float3 > *output, StreamInstance *s, DeviceInstance *d, int angularSteps) |
Perform one QI iteration. More... | |
dim3 | blocks (int njobs) |
Same function as QueuedCUDATracker::blocks. More... | |
dim3 | threads () |
Same function as QueuedCUDATracker::threads. More... | |
Private Attributes | |
QIParams | params |
Structure with settings relevant to quadrant interpolation. More... | |
int | qi_FFT_length |
Parameter for length required for arrays going into FFT. 2 * radial steps. More... | |
int | batchSize |
See QueuedCUDATracker::batchSize. Local copy. More... | |
int | numThreads |
See QueuedCUDATracker::numThreads. More... | |
Class to specifically perform quadrant interpolation calculations on the GPU.
Maintains all memory and functions related to QI to keep QueuedCUDATracker more clean.
|
inlineprivate |
void QI::Execute | ( | BaseKernelParams & | p, |
const QTrkComputedConfig & | cfg, | ||
QI::StreamInstance * | s, | ||
QI::DeviceInstance * | d, | ||
device_vec< float3 > * | initial, | ||
device_vec< float3 > * | output | ||
) |
Execute a batch of QI calculations. Runs Iterate the correct number of times and recalculates the number of radial spokes for every iteration, based on QTrkComputedConfig::qi_angstep_factor.
[in] | p | Reference to BaseKernelParams with parameters for this call. |
[in] | cfg | Reference to the settings to use. |
[in] | s | The QI::StreamInstance to use. |
[in] | d | The QI::DeviceInstance to use. |
[in] | initial | Vector with the initial positions from which to start the radial samplings on the first iteration. |
[out] | output | Vector that will be filled with the final results. |
Definition at line 268 of file QI_impl.h.
void QI::Init | ( | QTrkComputedConfig & | cfg, |
int | batchSize | ||
) |
Initialize this QI instance.
Copy relevant settings to local datastructures.
[in] | cfg | The configuration used for the algorithm. |
[in] | batchSize | The calculation batch size. See batchSize. |
Definition at line 387 of file QI_impl.h.
void QI::InitDevice | ( | DeviceInstance * | d, |
QTrkComputedConfig & | cc | ||
) |
Ready a device for QI calculations.
Calculate required parameters and fill a DeviceInstance.
[in,out] | d | The instance to be initialized. |
[in] | cc | The configuration used for the algorithm. |
Definition at line 344 of file QI_impl.h.
void QI::InitStream | ( | StreamInstance * | s, |
QTrkComputedConfig & | cc, | ||
cudaStream_t | stream, | ||
int | batchSize | ||
) |
Ready a stream for QI calculations.
Calculate required parameters and fill a StreamInstance.
[in,out] | s | The instance to be initialized. |
[in] | cc | The configuration used for the algorithm. |
[in] | stream | The normal CUDA stream this QI Stream will relate to. |
[in] | batchSize | The calculation batch size. See batchSize. |
Definition at line 363 of file QI_impl.h.
|
private |
Perform one QI iteration.
This is where the actual algorithm is executed. See [3] for details.
The kernels are called in the following order:
[in] | p | KernelParamaters to use. |
[in] | initial | The initial positions to use as sampling centers. |
[out] | output | Vector with the resulting positions. |
[in] | s | The stream to execute. |
[in] | d | The device on which to execute. |
[in] | angularSteps | The number of angular steps to use. |
Huge speedup achieved by calculating each quadrant's radial bin in a seperate thread. This leads to more, smaller kernels running concurrently so more memory latency hiding can be achieved through higher occupancy. For a visual display of this, try running the NVidia Visual Profiler with both versions of quadrant profile kernels.
Definition at line 280 of file QI_impl.h.
|
inlineprivate |
|
private |
See QueuedCUDATracker::batchSize. Local copy.
|
private |
|
private |
|
private |