QTrk
QueuedCUDATracker::Stream Struct Reference

Structure to maintain data for each stream. Shell around CUDA native streams.

#include <QueuedCUDATracker.h>

Public Types

enum  State { StreamIdle, StreamPendingExec, StreamExecuting }
 Possible stream states.

Public Member Functions

 Stream (int streamIndex)
 Constructor. Don't use directly, use CreateStream instead.

 ~Stream ()
 Delete the stream instance.

bool IsExecutionDone ()
 Test to see if the stream's batch is finished.

void OutputMemoryUse ()
 Prints the device and host allocated memory usage of a single stream to debug output.

int JobCount ()
 Get the number of jobs currently in this stream's queue.

Public Attributes

pinned_array< float3 > results
 3D array in pinned host memory with the localization result values.

pinned_array< float3 > com
 3D array in pinned host memory with the center of mass results.

pinned_array< float > imgMeans
 Array in pinned host memory with the ROI intensity means.

pinned_array< LocalizationParams > locParams
 Array in pinned host memory with additional localization parameters.

device_vec< LocalizationParams > d_locParams
 Array in device memory with additional localization parameters.

std::vector< LocalizationJob > jobs
 Vector of jobs. Filled with jobs then all jobs are executed at once and the vector is cleared.

cudaImageListf images
 Image list of all images belonging to the queued jobs.

pinned_array< float > hostImageBuf
 Buffer in pinned host memory holding the images for the image list images.

Threads::Mutex imageBufMutex
 Mutex for accesses to the images in memory.

cudaStream_t stream

cudaEvent_t localizationDone
 CUDA event used to determine when a batch is finished.

cudaEvent_t imageCopyDone
 CUDA event for profiling of image copies.

cudaEvent_t comDone
 CUDA event for profiling of the center of mass algorithm.

cudaEvent_t qiDone
 CUDA event for profiling of the quadrant interpolation algorithm.

cudaEvent_t qalignDone
 CUDA event for profiling of the quadrant align algorithm.

cudaEvent_t zcomputeDone
 CUDA event for profiling of the z localization.

cudaEvent_t batchStart
 CUDA event to record the start of a batch.

device_vec< float3 > d_resultpos
 3D vector in device memory to hold intermediate results.

device_vec< float3 > d_com

QI::StreamInstance qi_instance
 Linked stream of the QI submodule.

QI::StreamInstance qalign_instance
 Linked stream of the QI submodule used to perform quadrant alignment. See LT_ZLUTAlign.

device_vec< float > d_imgmeans
 Vector in device memory to hold ROI means.

device_vec< float > d_radialprofiles
 Vector in device memory to hold all calculated radial profiles. Size is [ radialsteps * njobs ].

device_vec< float > d_zlutcmpscores
 Vector in device memory to hold all calculated error curves. Size is [ zlutplanes * njobs ].

uint localizeFlags
 Flags for localization choices. See LocalizeModeEnum.

Device * device
 Reference to the device instance this stream should run on.

State state
 The state flag for the stream.

Detailed Description

Structure to maintain data for each stream. Shell around CUDA native streams.

Typically, there are 4 streams per available GPU.

Streams group all GPU transactions (memory transfer to the device, calculations, transfer from the device) into larger batches to increase efficiency. Each stream has its own job queue and pre-allocated memory for a whole batch, and its batch can be executed independently of the other streams' batches. On newer devices, streams can queue and run their operations concurrently, which raises the effective calculation speed by overlapping memory transfers with calculations. Host variables are kept in pinned memory to optimize transfer speeds.

QueuedCUDATracker::ScheduleLocalization finds a currently available stream and adds the new job to its queue. When a stream's state is set to StreamPendingExec, the scheduling thread (SchedulingThreadMain) executes it automatically.
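
The queueing behaviour described above can be sketched in host-only C++. This is a simplified model, not the QTrk implementation: `Job`, `MiniStream`, `scheduleJob` and the `batchSize` parameter are hypothetical stand-ins, and the rule "first idle stream wins" is an assumption about how an available stream is picked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical placeholder for a localization job.
struct Job { int frame; };

// Hypothetical, host-only stand-in for QueuedCUDATracker::Stream.
struct MiniStream {
    enum State { StreamIdle, StreamPendingExec, StreamExecuting };
    std::vector<Job> jobs;
    State state = StreamIdle;
    int JobCount() const { return static_cast<int>(jobs.size()); }
};

// Adds a job to the first idle stream.
// Returns the index of that stream, or -1 if no stream is idle.
int scheduleJob(std::vector<MiniStream>& streams, const Job& job, int batchSize) {
    for (std::size_t i = 0; i < streams.size(); ++i) {
        MiniStream& s = streams[i];
        if (s.state != MiniStream::StreamIdle) continue;  // queue full or busy
        s.jobs.push_back(job);
        if (s.JobCount() >= batchSize) {
            // Queue is full: mark the batch as ready for the scheduling thread.
            s.state = MiniStream::StreamPendingExec;
        }
        return static_cast<int>(i);
    }
    return -1;
}
```

With two streams and a batch size of 2, the first two jobs land in stream 0 (which then becomes StreamPendingExec) and the third job goes to stream 1.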

Definition at line 272 of file QueuedCUDATracker.h.

Member Enumeration Documentation

§ State

Possible stream states.

Todo:
Why is there no StreamDoneExec state?
Enumerator
StreamIdle 

The Stream is idle and can accept more jobs. In other words, the queue is not full.

StreamPendingExec 

The Stream is ready to be executed. That is, the queue is full and the batch is ready or Flush was called.

StreamExecuting 

The Stream is currently active on the GPU and executing its batch.

Definition at line 344 of file QueuedCUDATracker.h.

enum State {
    StreamIdle,
    StreamPendingExec,
    StreamExecuting
};
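
As an illustration of how the three states could cycle, here is a minimal host-only sketch of one scheduler pass over a single stream. It is an assumption-laden model rather than the code in SchedulingThreadMain: `ModelStream`, `schedulerStep` and the `gpuDone` flag (which stands in for IsExecutionDone) are hypothetical, and the direct transition from StreamExecuting back to StreamIdle is inferred from the absence of a "done" state.

```cpp
#include <vector>

enum State { StreamIdle, StreamPendingExec, StreamExecuting };

// Hypothetical host-only model of a stream.
struct ModelStream {
    State state = StreamIdle;
    std::vector<int> jobs;   // placeholder job ids
    bool gpuDone = false;    // stands in for IsExecutionDone()
};

// One pass of a scheduler loop over a single stream.
// Returns the number of jobs whose batch finished in this pass.
int schedulerStep(ModelStream& s) {
    switch (s.state) {
    case StreamPendingExec:
        s.gpuDone = false;          // the real code would launch the batch here
        s.state = StreamExecuting;
        return 0;
    case StreamExecuting: {
        if (!s.gpuDone) return 0;   // batch still running
        int finished = static_cast<int>(s.jobs.size());
        s.jobs.clear();             // the queue is emptied after the batch
        s.state = StreamIdle;       // assumed: straight back to idle
        return finished;
    }
    case StreamIdle:
    default:
        return 0;                   // nothing to do, stream still accepts jobs
    }
}
```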

Constructor & Destructor Documentation

§ Stream()

QueuedCUDATracker::Stream::Stream ( int  streamIndex)

Constructor. Don't use directly, use CreateStream instead.

Parameters
[in]  streamIndex  Index used for the mutex name.

Definition at line 265 of file QueuedCUDATracker.cu.

    : imageBufMutex(SPrintf("imagebuf%d", streamIndex).c_str())
{
    device = 0;
    hostImageBuf = 0;
    images.data = 0;
    stream = 0;
    state = StreamIdle;
    localizeFlags = 0;
}

§ ~Stream()

QueuedCUDATracker::Stream::~Stream ( )

Delete the stream instance.

Bug:
Why aren't QI instances and device vectors deleted?

Definition at line 277 of file QueuedCUDATracker.cu.

{
    cudaSetDevice(device->index);

    if (images.data) images.free();
    cudaEventDestroy(localizationDone);
    cudaEventDestroy(qiDone);
    cudaEventDestroy(comDone);
    cudaEventDestroy(imageCopyDone);
    cudaEventDestroy(zcomputeDone);
    cudaEventDestroy(batchStart);

    if (stream)
        cudaStreamDestroy(stream); // stream can be zero if in debugStream mode.
}

Member Function Documentation

§ IsExecutionDone()

bool QueuedCUDATracker::Stream::IsExecutionDone ( )

Test to see if the stream's batch is finished.

Note
Also returns true whenever the stream is not in the StreamExecuting state.
Returns
Boolean flag indicating whether execution is done.

Definition at line 293 of file QueuedCUDATracker.cu.

{
    cudaSetDevice(device->index);
    return cudaEventQuery(localizationDone) == cudaSuccess;
}

§ JobCount()

int QueuedCUDATracker::Stream::JobCount ( )
inline

Get the number of jobs currently in this stream's queue.

The maximum queue size is batchSize.

Returns
The number of jobs in the queue.

Definition at line 299 of file QueuedCUDATracker.h.

{ return jobs.size(); }

§ OutputMemoryUse()

void QueuedCUDATracker::Stream::OutputMemoryUse ( )

Prints the device and host allocated memory usage of a single stream to debug output.

Definition at line 299 of file QueuedCUDATracker.cu.

{
    // ... (the lines that compute deviceMem are missing from this listing)

    int hostMem = hostImageBuf.memsize() + com.memsize() + locParams.memsize() + results.memsize();

    dbgprintf("Stream memory use: %d MB on host, %d MB device memory (%d for images). \n", hostMem/1024/1024, deviceMem/1024/1024, images.totalNumBytes()/1024/1024);
}
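
The byte-to-megabyte arithmetic in the listing uses integer division, so the reported values are rounded down and any buffer smaller than 1 MB contributes 0 on its own. The sketch below illustrates this with hypothetical helpers (`bufferBytes`, `toWholeMB`) that are not part of QTrk; it only assumes that a buffer's size in bytes is its element count times the element size.

```cpp
#include <cstddef>

// Bytes needed for `count` elements of type T (assumed tightly packed).
template <typename T>
std::size_t bufferBytes(std::size_t count) {
    return count * sizeof(T);
}

// Whole megabytes (1024 * 1024 bytes), rounded down,
// mirroring the /1024/1024 in the listing above.
std::size_t toWholeMB(std::size_t bytes) {
    return bytes / 1024 / 1024;
}
```

For example, a float buffer with 1024 * 1024 elements reports 4 MB when float is 4 bytes wide, while a buffer one byte short of 1 MB reports 0 MB.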

Member Data Documentation

§ batchStart

cudaEvent_t QueuedCUDATracker::Stream::batchStart

CUDA event to record the start of a batch.

Definition at line 324 of file QueuedCUDATracker.h.

§ com

pinned_array<float3> QueuedCUDATracker::Stream::com

3D array in pinned host memory with the center of mass results.

Definition at line 302 of file QueuedCUDATracker.h.

§ comDone

cudaEvent_t QueuedCUDATracker::Stream::comDone

CUDA event for profiling of the center of mass algorithm.

Definition at line 320 of file QueuedCUDATracker.h.

§ d_com

device_vec<float3> QueuedCUDATracker::Stream::d_com

Definition at line 328 of file QueuedCUDATracker.h.

§ d_imgmeans

device_vec<float> QueuedCUDATracker::Stream::d_imgmeans

Vector in device memory to hold ROI means.

Definition at line 333 of file QueuedCUDATracker.h.

§ d_locParams

device_vec<LocalizationParams> QueuedCUDATracker::Stream::d_locParams

Array in device memory with additional localization parameters.

Definition at line 305 of file QueuedCUDATracker.h.

§ d_radialprofiles

device_vec<float> QueuedCUDATracker::Stream::d_radialprofiles

Vector in device memory to hold all calculated radial profiles. Size is [ radialsteps * njobs ].

Definition at line 334 of file QueuedCUDATracker.h.
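
A flat buffer of size [ radialsteps * njobs ] has to be indexed by job and radial step. The sketch below assumes a job-major layout (all steps of job 0, then all steps of job 1, and so on); the source does not state the layout, and `profileIndex` and `profileOfJob` are hypothetical helpers operating on a host-side copy of the data. The same arithmetic would apply to d_zlutcmpscores with zlutplanes in place of radialsteps.

```cpp
#include <cstddef>
#include <vector>

// Flat index of (job, step), assuming a job-major layout.
inline std::size_t profileIndex(std::size_t job, std::size_t step,
                                std::size_t radialsteps) {
    return job * radialsteps + step;
}

// Copies one job's radial profile out of a host-side copy of the flat buffer.
// The caller must ensure that `flat` holds at least (job + 1) * radialsteps values.
std::vector<float> profileOfJob(const std::vector<float>& flat,
                                std::size_t job, std::size_t radialsteps) {
    auto first = flat.begin() +
                 static_cast<std::ptrdiff_t>(profileIndex(job, 0, radialsteps));
    return std::vector<float>(first,
                              first + static_cast<std::ptrdiff_t>(radialsteps));
}
```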

§ d_resultpos

device_vec<float3> QueuedCUDATracker::Stream::d_resultpos

3D vector in device memory to hold intermediate results.

Definition at line 327 of file QueuedCUDATracker.h.

§ d_zlutcmpscores

device_vec<float> QueuedCUDATracker::Stream::d_zlutcmpscores

Vector in device memory to hold all calculated error curves. Size is [ zlutplanes * njobs ].

Definition at line 335 of file QueuedCUDATracker.h.

§ device

Device* QueuedCUDATracker::Stream::device

Reference to the device instance this stream should run on.

Definition at line 338 of file QueuedCUDATracker.h.

§ hostImageBuf

pinned_array<float> QueuedCUDATracker::Stream::hostImageBuf

Buffer in pinned host memory holding the images for the image list images.

Definition at line 310 of file QueuedCUDATracker.h.

§ imageBufMutex

Threads::Mutex QueuedCUDATracker::Stream::imageBufMutex

Mutex for accesses to the images in memory.

Definition at line 311 of file QueuedCUDATracker.h.

§ imageCopyDone

cudaEvent_t QueuedCUDATracker::Stream::imageCopyDone

CUDA event for profiling of image copies.

Definition at line 319 of file QueuedCUDATracker.h.

§ images

cudaImageListf QueuedCUDATracker::Stream::images

Image list of all images belonging to the queued jobs.

Definition at line 308 of file QueuedCUDATracker.h.

§ imgMeans

pinned_array<float> QueuedCUDATracker::Stream::imgMeans

Array in pinned host memory with the ROI intensity means.

Definition at line 303 of file QueuedCUDATracker.h.

§ jobs

std::vector<LocalizationJob> QueuedCUDATracker::Stream::jobs

Vector of jobs. Filled with jobs then all jobs are executed at once and the vector is cleared.

Definition at line 306 of file QueuedCUDATracker.h.

§ localizationDone

cudaEvent_t QueuedCUDATracker::Stream::localizationDone

CUDA event used to determine when a batch is finished.

Definition at line 317 of file QueuedCUDATracker.h.

§ localizeFlags

uint QueuedCUDATracker::Stream::localizeFlags

Flags for localization choices. See LocalizeModeEnum.

Definition at line 337 of file QueuedCUDATracker.h.

§ locParams

pinned_array<LocalizationParams> QueuedCUDATracker::Stream::locParams

Array in pinned host memory with additional localization parameters.

Definition at line 304 of file QueuedCUDATracker.h.

§ qalign_instance

QI::StreamInstance QueuedCUDATracker::Stream::qalign_instance

Linked stream of the QI submodule used to perform quadrant alignment. See LT_ZLUTAlign.

Definition at line 331 of file QueuedCUDATracker.h.

§ qalignDone

cudaEvent_t QueuedCUDATracker::Stream::qalignDone

CUDA event for profiling of the quadrant align algorithm.

Definition at line 322 of file QueuedCUDATracker.h.

§ qi_instance

QI::StreamInstance QueuedCUDATracker::Stream::qi_instance

Linked stream of the QI submodule.

Definition at line 330 of file QueuedCUDATracker.h.

§ qiDone

cudaEvent_t QueuedCUDATracker::Stream::qiDone

CUDA event for profiling of the quadrant interpolation algorithm.

Definition at line 321 of file QueuedCUDATracker.h.

§ results

pinned_array<float3> QueuedCUDATracker::Stream::results

3D array in pinned host memory with the localization result values.

Definition at line 301 of file QueuedCUDATracker.h.

§ state

State QueuedCUDATracker::Stream::state

The state flag for the stream.

Definition at line 349 of file QueuedCUDATracker.h.

§ stream

cudaStream_t QueuedCUDATracker::Stream::stream

Definition at line 314 of file QueuedCUDATracker.h.

§ zcomputeDone

cudaEvent_t QueuedCUDATracker::Stream::zcomputeDone

CUDA event for profiling of the z localization.

Definition at line 323 of file QueuedCUDATracker.h.


The documentation for this struct was generated from the following files: