QTrk
Structure to maintain data for each stream. Shell around CUDA native streams.
#include <QueuedCUDATracker.h>
Public Types | |
enum | State { StreamIdle, StreamPendingExec, StreamExecuting } |
Possible stream states. | |
Public Member Functions | |
Stream (int streamIndex) | |
Constructor. Don't use directly, use CreateStream instead. | |
~Stream () | |
Delete the stream instance. | |
bool | IsExecutionDone () |
Test to see if the stream's batch is finished. | |
void | OutputMemoryUse () |
Prints the device and host allocated memory usage of a single stream to debug output. | |
int | JobCount () |
Get the number of jobs currently in this stream's queue. | |
Public Attributes | |
pinned_array< float3 > | results |
Array of float3 (3D) values in pinned host memory with the localization results. | |
pinned_array< float3 > | com |
Array of float3 (3D) values in pinned host memory with the center of mass results. | |
pinned_array< float > | imgMeans |
Array in pinned host memory with the ROI intensity means. | |
pinned_array< LocalizationParams > | locParams |
Array in pinned host memory with additional localization parameters. | |
device_vec< LocalizationParams > | d_locParams |
Array in device memory with additional localization parameters. | |
std::vector< LocalizationJob > | jobs |
Vector of jobs. It is filled with jobs, then all jobs are executed at once and the vector is cleared. | |
cudaImageListf | images |
Image list of all images belonging to the queued jobs. | |
pinned_array< float > | hostImageBuf |
Buffer in pinned host memory holding the images for the image list images. | |
Threads::Mutex | imageBufMutex |
Mutex for accesses to the images in memory. | |
cudaStream_t | stream |
cudaEvent_t | localizationDone |
CUDA event used to determine when a batch is finished. | |
cudaEvent_t | imageCopyDone |
CUDA event for profiling of image copies. | |
cudaEvent_t | comDone |
CUDA event for profiling of the center of mass algorithm. | |
cudaEvent_t | qiDone |
CUDA event for profiling of the quadrant interpolation algorithm. | |
cudaEvent_t | qalignDone |
CUDA event for profiling of the quadrant align algorithm. | |
cudaEvent_t | zcomputeDone |
CUDA event for profiling of the z localization. | |
cudaEvent_t | batchStart |
CUDA event to record the start of a batch. | |
device_vec< float3 > | d_resultpos |
Vector of float3 (3D) values in device memory to hold intermediate results. | |
device_vec< float3 > | d_com |
QI::StreamInstance | qi_instance |
Linked stream of the QI submodule. | |
QI::StreamInstance | qalign_instance |
Linked stream of the QI submodule used to perform quadrant alignment. See LT_ZLUTAlign. | |
device_vec< float > | d_imgmeans |
Vector in device memory to hold ROI means. | |
device_vec< float > | d_radialprofiles |
Vector in device memory to hold all calculated radial profiles. Size is [ radialsteps * njobs ]. | |
device_vec< float > | d_zlutcmpscores |
Vector in device memory to hold all calculated error curves. Size is [ zlutplanes * njobs ]. | |
uint | localizeFlags |
Flags for localization choices. See LocalizeModeEnum. | |
Device * | device |
Reference to the device instance this stream should run on. | |
State | state |
The state flag for the stream. | |
Structure that holds the data for each stream. It is a thin wrapper around native CUDA streams.
Typically, there are 4 streams per available GPU.
Streams are used to group all GPU transactions (memory transfer to the device, calculations, transfer from the device) into larger batches to increase efficiency. Each stream has its own job queue and pre-allocated memory for a whole batch, and its batch can be executed independently of the other streams. On newer devices, streams can queue and run their operations concurrently, which leads to higher effective calculation speeds by overlapping memory transfers and calculations. Host variables are kept in pinned memory to optimize transfer speeds.
QueuedCUDATracker::ScheduleLocalization finds a currently available stream and adds the new job to its queue. When a stream's state is set to StreamPendingExec, the scheduling thread (SchedulingThreadMain) executes it automatically.
Definition at line 272 of file QueuedCUDATracker.h.
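The queueing behaviour described above can be illustrated with a small host-only C++ sketch. It is not the QueuedCUDATracker implementation: `MockStream`, `Job`, `scheduleJob` and `flush` are hypothetical names introduced for illustration, no CUDA resources are involved, and picking the first idle stream is an assumption about how an "available" stream is chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-ins; the real Stream also owns CUDA resources.
enum class StreamState { Idle, PendingExec, Executing };

struct Job { int frame; };

struct MockStream {
    std::vector<Job> jobs;                  // this stream's job queue
    StreamState state = StreamState::Idle;
};

// Add a job to the first idle stream (assumed selection policy).
// Returns the index of the stream used, or -1 when no stream can accept a job.
int scheduleJob(std::vector<MockStream>& streams, const Job& job, std::size_t batchSize) {
    for (std::size_t i = 0; i < streams.size(); ++i) {
        MockStream& s = streams[i];
        if (s.state != StreamState::Idle) continue;
        s.jobs.push_back(job);
        if (s.jobs.size() >= batchSize)     // queue full: batch is ready to run
            s.state = StreamState::PendingExec;
        return static_cast<int>(i);
    }
    return -1;
}

// Mark partially filled idle streams as ready, as a flush would.
void flush(std::vector<MockStream>& streams) {
    for (MockStream& s : streams)
        if (s.state == StreamState::Idle && !s.jobs.empty())
            s.state = StreamState::PendingExec;
}
```

With two streams and a batch size of 2, the first two jobs fill stream 0 and mark it pending, a third job lands in stream 1, and `flush` marks that partially filled stream as pending too.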
Possible stream states.
Enumerator | |
---|---|
StreamIdle | The Stream is idle and can accept more jobs. In other words, the queue is not full. |
StreamPendingExec | The Stream is ready to be executed. That is, the queue is full and the batch is ready or Flush was called. |
StreamExecuting | The Stream is currently active on the GPU and executing its batch. |
Definition at line 344 of file QueuedCUDATracker.h.
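One way to read the three states is as a cycle that a scheduler loop advances. The host-only C++ sketch below makes that reading concrete under stated assumptions: `MockStream` and `schedulerStep` are hypothetical, the `batchDone` flag stands in for whatever IsExecutionDone reports, and the return to the idle state after a finished batch is inferred from the descriptions rather than confirmed by them.

```cpp
#include <cassert>
#include <vector>

enum class StreamState { Idle, PendingExec, Executing };

// Hypothetical host-only stand-in for a stream.
struct MockStream {
    StreamState state = StreamState::Idle;
    std::vector<int> jobs;      // queued job ids
    bool batchDone = false;     // stand-in for the result of IsExecutionDone()
};

// One pass of a scheduler loop: start pending batches, retire finished ones.
void schedulerStep(std::vector<MockStream>& streams) {
    for (MockStream& s : streams) {
        if (s.state == StreamState::PendingExec) {
            s.batchDone = false;
            s.state = StreamState::Executing;   // real code would launch GPU work here
        } else if (s.state == StreamState::Executing && s.batchDone) {
            s.jobs.clear();                     // batch finished, queue is empty again
            s.state = StreamState::Idle;
        }
    }
}
```

A stream that is still executing is left untouched by `schedulerStep`, so calling it repeatedly is harmless until the completion flag is set.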
QueuedCUDATracker::Stream::Stream | ( | int | streamIndex | ) |
Constructor. Don't use directly, use CreateStream instead.
[in] | streamIndex | Index used for the mutex name. |
Definition at line 265 of file QueuedCUDATracker.cu.
QueuedCUDATracker::Stream::~Stream | ( | ) |
Delete the stream instance.
Definition at line 277 of file QueuedCUDATracker.cu.
bool QueuedCUDATracker::Stream::IsExecutionDone | ( | ) |
Test to see if the stream's batch is finished.
Definition at line 293 of file QueuedCUDATracker.cu.
int QueuedCUDATracker::Stream::JobCount | ( | ) | inline |
Get the number of jobs currently in this stream's queue.
The maximum queue size is batchSize.
Definition at line 299 of file QueuedCUDATracker.h.
void QueuedCUDATracker::Stream::OutputMemoryUse | ( | ) |
Prints the device and host allocated memory usage of a single stream to debug output.
Definition at line 299 of file QueuedCUDATracker.cu.
cudaEvent_t QueuedCUDATracker::Stream::batchStart |
CUDA event to record the start of a batch.
Definition at line 324 of file QueuedCUDATracker.h.
pinned_array<float3> QueuedCUDATracker::Stream::com |
Array of float3 (3D) values in pinned host memory with the center of mass results.
Definition at line 302 of file QueuedCUDATracker.h.
cudaEvent_t QueuedCUDATracker::Stream::comDone |
CUDA event for profiling of the center of mass algorithm.
Definition at line 320 of file QueuedCUDATracker.h.
device_vec<float3> QueuedCUDATracker::Stream::d_com |
Definition at line 328 of file QueuedCUDATracker.h.
device_vec<float> QueuedCUDATracker::Stream::d_imgmeans |
Vector in device memory to hold ROI means.
Definition at line 333 of file QueuedCUDATracker.h.
device_vec<LocalizationParams> QueuedCUDATracker::Stream::d_locParams |
Array in device memory with additional localization parameters.
Definition at line 305 of file QueuedCUDATracker.h.
device_vec<float> QueuedCUDATracker::Stream::d_radialprofiles |
Vector in device memory to hold all calculated radial profiles. Size is [ radialsteps * njobs ].
Definition at line 334 of file QueuedCUDATracker.h.
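A buffer of size [ radialsteps * njobs ] is usually addressed as a flat array. The host-only C++ sketch below shows the index arithmetic, assuming the profiles are stored back to back with one block of `radialSteps` values per job. The source states only the total size, so this ordering is an assumption, and `profileIndex` and `profileOf` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element `step` of the profile belonging to job `job`,
// assuming profiles are stored back to back (job-major order).
std::size_t profileIndex(std::size_t job, std::size_t step, std::size_t radialSteps) {
    return job * radialSteps + step;
}

// Copy one job's profile out of the flat buffer.
std::vector<float> profileOf(const std::vector<float>& flat, std::size_t job,
                             std::size_t radialSteps) {
    auto first = flat.begin() + static_cast<std::ptrdiff_t>(job * radialSteps);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(radialSteps));
}
```

The same arithmetic would apply to any other buffer laid out as one fixed-size block per job.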
device_vec<float3> QueuedCUDATracker::Stream::d_resultpos |
Vector of float3 (3D) values in device memory to hold intermediate results.
Definition at line 327 of file QueuedCUDATracker.h.
device_vec<float> QueuedCUDATracker::Stream::d_zlutcmpscores |
Vector in device memory to hold all calculated error curves. Size is [ zlutplanes * njobs ].
Definition at line 335 of file QueuedCUDATracker.h.
Device* QueuedCUDATracker::Stream::device |
Reference to the device instance this stream should run on.
Definition at line 338 of file QueuedCUDATracker.h.
pinned_array<float> QueuedCUDATracker::Stream::hostImageBuf |
Buffer in pinned host memory holding the images for the image list images.
Definition at line 310 of file QueuedCUDATracker.h.
Threads::Mutex QueuedCUDATracker::Stream::imageBufMutex |
Mutex for accesses to the images in memory.
Definition at line 311 of file QueuedCUDATracker.h.
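The role of such a mutex can be sketched in host-only C++: every write into the shared image buffer happens while the lock is held. This is an illustration, not the library's code. `std::mutex` is substituted for Threads::Mutex, `MockImageBuffer` and `copyImage` are hypothetical, and the one-slot-per-job layout is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in: a host image buffer guarded by a mutex.
struct MockImageBuffer {
    std::vector<float> pixels;   // one slot of roiPixels values per job (assumed layout)
    std::mutex mutex;            // std::mutex used in place of Threads::Mutex

    MockImageBuffer(std::size_t roiPixels, std::size_t batchSize)
        : pixels(roiPixels * batchSize, 0.0f) {}

    // Copy one ROI into slot `jobIndex` while holding the lock.
    void copyImage(std::size_t jobIndex, const std::vector<float>& roi) {
        std::lock_guard<std::mutex> lock(mutex);
        std::copy(roi.begin(), roi.end(),
                  pixels.begin() + static_cast<std::ptrdiff_t>(jobIndex * roi.size()));
    }
};
```

The lock is released automatically when `copyImage` returns, so a caller cannot leave the buffer locked by accident.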
cudaEvent_t QueuedCUDATracker::Stream::imageCopyDone |
CUDA event for profiling of image copies.
Definition at line 319 of file QueuedCUDATracker.h.
cudaImageListf QueuedCUDATracker::Stream::images |
Image list of all images belonging to the queued jobs.
Definition at line 308 of file QueuedCUDATracker.h.
pinned_array<float> QueuedCUDATracker::Stream::imgMeans |
Array in pinned host memory with the ROI intensity means.
Definition at line 303 of file QueuedCUDATracker.h.
std::vector<LocalizationJob> QueuedCUDATracker::Stream::jobs |
Vector of jobs. It is filled with jobs, then all jobs are executed at once and the vector is cleared.
Definition at line 306 of file QueuedCUDATracker.h.
cudaEvent_t QueuedCUDATracker::Stream::localizationDone |
CUDA event used to determine when a batch is finished.
Definition at line 317 of file QueuedCUDATracker.h.
uint QueuedCUDATracker::Stream::localizeFlags |
Flags for localization choices. See LocalizeModeEnum.
Definition at line 337 of file QueuedCUDATracker.h.
pinned_array<LocalizationParams> QueuedCUDATracker::Stream::locParams |
Array in pinned host memory with additional localization parameters.
Definition at line 304 of file QueuedCUDATracker.h.
QI::StreamInstance QueuedCUDATracker::Stream::qalign_instance |
Linked stream of the QI submodule used to perform quadrant alignment. See LT_ZLUTAlign.
Definition at line 331 of file QueuedCUDATracker.h.
cudaEvent_t QueuedCUDATracker::Stream::qalignDone |
CUDA event for profiling of the quadrant align algorithm.
Definition at line 322 of file QueuedCUDATracker.h.
QI::StreamInstance QueuedCUDATracker::Stream::qi_instance |
Linked stream of the QI submodule.
Definition at line 330 of file QueuedCUDATracker.h.
cudaEvent_t QueuedCUDATracker::Stream::qiDone |
CUDA event for profiling of the quadrant interpolation algorithm.
Definition at line 321 of file QueuedCUDATracker.h.
pinned_array<float3> QueuedCUDATracker::Stream::results |
Array of float3 (3D) values in pinned host memory with the localization results.
Definition at line 301 of file QueuedCUDATracker.h.
State QueuedCUDATracker::Stream::state |
The state flag for the stream.
Definition at line 349 of file QueuedCUDATracker.h.
cudaStream_t QueuedCUDATracker::Stream::stream |
Definition at line 314 of file QueuedCUDATracker.h.
cudaEvent_t QueuedCUDATracker::Stream::zcomputeDone |
CUDA event for profiling of the z localization.
Definition at line 323 of file QueuedCUDATracker.h.