Structure to maintain data for each stream. Shell around CUDA native streams.

#include <QueuedCUDATracker.h>

Public Types
enum	State { StreamIdle, StreamPendingExec, StreamExecuting }
	Possible stream states.

Public Member Functions
	Stream (int streamIndex)
	Constructor. Don't use directly, use CreateStream instead.

	~Stream ()
	Delete the stream instance.

bool	IsExecutionDone ()
	Test to see if the stream's batch is finished.

void	OutputMemoryUse ()
	Prints the device and host allocated memory usage of a single stream to debug output.

int	JobCount ()
	Get the number of jobs currently in this stream's queue.

Public Attributes
pinned_array< float3 >	results
	Array of 3D vectors (float3) in pinned host memory with the localization result values.

pinned_array< float3 >	com
	Array of 3D vectors (float3) in pinned host memory with the center of mass results.

pinned_array< float >	imgMeans
	Array in pinned host memory with the ROI intensity means.

pinned_array< LocalizationParams >	locParams
	Array in pinned host memory with additional localization parameters.

device_vec< LocalizationParams >	d_locParams
	Array in device memory with additional localization parameters.

std::vector< LocalizationJob >	jobs
	Vector of jobs. Filled with jobs then all jobs are executed at once and the vector is cleared.

cudaImageListf	images
	Image list of all images belonging to the queued jobs.

pinned_array< float >	hostImageBuf
	Buffer in pinned host memory holding the images for the image list `images`.

Threads::Mutex	imageBufMutex
	Mutex for accesses to the images in memory.

cudaStream_t	stream

cudaEvent_t	localizationDone
	CUDA event used to determine when a batch is finished.

cudaEvent_t	imageCopyDone
	CUDA event for profiling of image copies.

cudaEvent_t	comDone
	CUDA event for profiling of the center of mass algorithm.

cudaEvent_t	qiDone
	CUDA event for profiling of the quadrant interpolation algorithm.

cudaEvent_t	qalignDone
	CUDA event for profiling of the quadrant align algorithm.

cudaEvent_t	zcomputeDone
	CUDA event for profiling of the z localization.

cudaEvent_t	batchStart
	CUDA event to record the start of a batch.

device_vec< float3 >	d_resultpos
	Vector of 3D positions (float3) in device memory to hold intermediate results.

device_vec< float3 >	d_com

QI::StreamInstance	qi_instance
	Linked stream of the QI submodule.

QI::StreamInstance	qalign_instance
	Linked stream of the QI submodule used to perform quadrant alignment. See LT_ZLUTAlign.

device_vec< float >	d_imgmeans
	Vector in device memory to hold ROI means.

device_vec< float >	d_radialprofiles
	Vector in device memory to hold all calculated radial profiles. Size is [ radialsteps * njobs ].

device_vec< float >	d_zlutcmpscores
	Vector in device memory to hold all calculated error curves. Size is [ zlutplanes * njobs ].

uint	localizeFlags
	Flags for localization choices. See LocalizeModeEnum.

Device *	device
	Pointer to the device instance this stream should run on.

State	state
	The state flag for the stream.

Detailed Description

Structure to maintain data for each stream. Shell around CUDA native streams.

Typically, there are 4 streams per available GPU.

Streams group all GPU transactions (memory transfer to the device, calculations, transfer from the device) into larger batches to increase efficiency. Each stream has its own job queue and pre-allocated memory for a whole batch, and its batches can be executed independently of those of other streams. On newer devices, streams can queue and run their operations concurrently, which raises effective calculation speed by overlapping memory transfers with calculations. Host variables are kept in pinned memory to optimize transfer speeds.

QueuedCUDATracker::ScheduleLocalization finds a currently available stream and adds the new job to its queue. Once a stream's state is set to StreamPendingExec, the scheduling thread (SchedulingThreadMain) executes its batch automatically.
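
The queueing behaviour described above can be illustrated with a host-only sketch. It is not the library's implementation: SketchJob, SketchStream, scheduleJob and flushStreams are hypothetical names, the batch size is passed in as a plain parameter, and no CUDA calls are made. It only shows one plausible way a job ends up in an idle stream and how a full queue (or a flush) moves that stream to StreamPendingExec.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for LocalizationJob and QueuedCUDATracker::Stream.
struct SketchJob { int frame; };

struct SketchStream {
    enum State { StreamIdle, StreamPendingExec, StreamExecuting };
    State state = StreamIdle;
    std::vector<SketchJob> jobs;
};

// Adds `job` to the first idle stream and returns that stream's index.
// Returns -1 if every stream is pending or executing, in which case a
// caller would have to wait and retry.
int scheduleJob(std::vector<SketchStream>& streams, const SketchJob& job,
                std::size_t batchSize) {
    assert(batchSize > 0);
    for (std::size_t i = 0; i < streams.size(); ++i) {
        SketchStream& s = streams[i];
        if (s.state != SketchStream::StreamIdle) continue;
        s.jobs.push_back(job);
        if (s.jobs.size() >= batchSize)  // queue full: hand over to the scheduler
            s.state = SketchStream::StreamPendingExec;
        return static_cast<int>(i);
    }
    return -1;
}

// Marks partially filled idle streams as ready, as a flush request would.
void flushStreams(std::vector<SketchStream>& streams) {
    for (SketchStream& s : streams)
        if (s.state == SketchStream::StreamIdle && !s.jobs.empty())
            s.state = SketchStream::StreamPendingExec;
}
```

With two streams and a batch size of 2, the first two jobs fill stream 0, the third lands in stream 1, and a flush then marks stream 1 as pending even though its queue is only half full.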

Definition at line 272 of file QueuedCUDATracker.h.

Member Enumeration Documentation

§ State

enum QueuedCUDATracker::Stream::State

Possible stream states.

Todo: Why is there no StreamDoneExec state?

Enumerator
StreamIdle	The Stream is idle and can accept more jobs. In other words, the queue is not full.
StreamPendingExec	The Stream is ready to be executed. That is, the queue is full and the batch is ready or Flush was called.
StreamExecuting	The Stream is currently active on the GPU and executing its batch.

Definition at line 344 of file QueuedCUDATracker.h.

                    {
             StreamIdle,         
             StreamPendingExec,  
             StreamExecuting     
         };

Constructor & Destructor Documentation

§ Stream()

QueuedCUDATracker::Stream::Stream ( int streamIndex )

Constructor. Don't use directly, use CreateStream instead.

Parameters

[in] streamIndex Index used in the mutex name.

Definition at line 265 of file QueuedCUDATracker.cu.

     : imageBufMutex(SPrintf("imagebuf%d", streamIndex).c_str())
 { 
     device = 0;
     hostImageBuf = 0; 
     images.data=0; 
     stream=0;
     state=StreamIdle;
     localizeFlags=0;
 }

§ ~Stream()

QueuedCUDATracker::Stream::~Stream ( )

Delete the stream instance.

Bug: Why aren't QI instances and device vectors deleted?

Definition at line 277 of file QueuedCUDATracker.cu.

 {
     cudaSetDevice(device->index);
 
     if(images.data) images.free();
     cudaEventDestroy(localizationDone);
     cudaEventDestroy(qiDone);
     cudaEventDestroy(comDone);
     cudaEventDestroy(imageCopyDone);
     cudaEventDestroy(zcomputeDone);
     cudaEventDestroy(batchStart);
 
     if (stream)
         cudaStreamDestroy(stream); // stream can be zero if in debugStream mode.
 }

Member Function Documentation

§ IsExecutionDone()

bool QueuedCUDATracker::Stream::IsExecutionDone ( )

Test to see if the stream's batch is finished.

Note: Also returns true whenever the stream is not in the StreamExecuting state.

Returns: Boolean flag indicating whether execution is done.

Definition at line 293 of file QueuedCUDATracker.cu.

 {
     cudaSetDevice(device->index);
     return cudaEventQuery(localizationDone) == cudaSuccess;
 }
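
How a scheduler might use this non-blocking test can be sketched without CUDA. The code below is an assumption-laden analogue, not the library's scheduling thread: PollStream and pollStream are hypothetical names, job ids replace LocalizationJob, and a plain boolean stands in for the result of querying the localizationDone event.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-only model of a stream.
struct PollStream {
    enum State { StreamIdle, StreamPendingExec, StreamExecuting };
    State state = StreamIdle;
    std::vector<int> jobs;       // job ids; stand-in for LocalizationJob
    bool batchFinished = true;   // stand-in for the localizationDone event query

    // Non-blocking completion test, analogous to IsExecutionDone().
    bool isExecutionDone() const { return batchFinished; }
};

// One pass of a hypothetical scheduler loop over a single stream.
// Returns the number of jobs whose results were collected in this pass.
int pollStream(PollStream& s) {
    switch (s.state) {
    case PollStream::StreamPendingExec:
        assert(!s.jobs.empty());   // a pending stream should hold at least one job
        // Copies and kernels would be enqueued here.
        s.batchFinished = false;
        s.state = PollStream::StreamExecuting;
        return 0;
    case PollStream::StreamExecuting: {
        if (!s.isExecutionDone()) return 0;   // still running, check again later
        const int collected = static_cast<int>(s.jobs.size());
        s.jobs.clear();                       // results copied out, queue reusable
        s.state = PollStream::StreamIdle;
        return collected;
    }
    case PollStream::StreamIdle:
        return 0;
    }
    return 0;
}
```

Because the test never blocks, one thread can cycle over all streams of all devices and only act on those whose batch has actually finished.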

§ JobCount()

int QueuedCUDATracker::Stream::JobCount ( )

inline

Get the number of jobs currently in this stream's queue.

The maximum queue size is batchSize.

Returns: The number of jobs in the queue.

Definition at line 299 of file QueuedCUDATracker.h.

{ return jobs.size(); }

§ OutputMemoryUse()

void QueuedCUDATracker::Stream::OutputMemoryUse ( )

Prints the device and host allocated memory usage of a single stream to debug output.

Definition at line 299 of file QueuedCUDATracker.cu.

 {
     int deviceMem = d_com.memsize() + d_locParams.memsize() + qi_instance.memsize() + d_radialprofiles.memsize() +
         d_resultpos.memsize() + d_zlutcmpscores.memsize() + images.totalNumBytes();
 
     int hostMem = hostImageBuf.memsize() + com.memsize() + locParams.memsize() + results.memsize();
 
     dbgprintf("Stream memory use: %d MB on host, %d MB device memory (%d for images). \n", hostMem/1024/1024, deviceMem/1024/1024, images.totalNumBytes()/1024/1024);
 }
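
The listing above accumulates byte counts in int, so on platforms with a 32-bit int a total of 2 GiB or more would not fit. The following sketch shows the same bytes-to-megabytes arithmetic with 64-bit counters. It assumes the memsize()-style inputs are byte counts; toMiB and formatMemoryUse are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Converts a byte count to whole mebibytes (rounding down).
std::uint64_t toMiB(std::uint64_t bytes) { return bytes / (1024u * 1024u); }

// Formats a report line similar to the dbgprintf call above, using 64-bit
// byte counts so that totals of 2 GiB and more are represented correctly.
std::string formatMemoryUse(std::uint64_t hostBytes, std::uint64_t deviceBytes,
                            std::uint64_t imageBytes) {
    assert(imageBytes <= deviceBytes);  // image memory is part of the device total
    char buf[128];
    std::snprintf(buf, sizeof buf,
                  "Stream memory use: %llu MB on host, %llu MB device memory (%llu for images).",
                  static_cast<unsigned long long>(toMiB(hostBytes)),
                  static_cast<unsigned long long>(toMiB(deviceBytes)),
                  static_cast<unsigned long long>(toMiB(imageBytes)));
    return std::string(buf);
}
```

For example, a stream holding 3 GiB on the device is reported as 3072 MB, a value whose byte count exceeds what a 32-bit signed integer can hold.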

Member Data Documentation

§ batchStart

cudaEvent_t QueuedCUDATracker::Stream::batchStart

CUDA event to record the start of a batch.

Definition at line 324 of file QueuedCUDATracker.h.

§ com

pinned_array<float3> QueuedCUDATracker::Stream::com

Array of 3D vectors (float3) in pinned host memory with the center of mass results.

Definition at line 302 of file QueuedCUDATracker.h.

§ comDone

cudaEvent_t QueuedCUDATracker::Stream::comDone

CUDA event for profiling of the center of mass algorithm.

Definition at line 320 of file QueuedCUDATracker.h.

§ d_com

device_vec<float3> QueuedCUDATracker::Stream::d_com

Definition at line 328 of file QueuedCUDATracker.h.

§ d_imgmeans

device_vec<float> QueuedCUDATracker::Stream::d_imgmeans

Vector in device memory to hold ROI means.

Definition at line 333 of file QueuedCUDATracker.h.

§ d_locParams

device_vec<LocalizationParams> QueuedCUDATracker::Stream::d_locParams

Array in device memory with additional localization parameters.

Definition at line 305 of file QueuedCUDATracker.h.

§ d_radialprofiles

device_vec<float> QueuedCUDATracker::Stream::d_radialprofiles

Vector in device memory to hold all calculated radial profiles. Size is [ radialsteps * njobs ].

Definition at line 334 of file QueuedCUDATracker.h.
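
A flat buffer of size [ radialsteps * njobs ] has to be indexed by job and radial step. The sketch below assumes a job-major layout (all samples of job 0 first, then job 1, and so on); the page does not state the actual layout, so this ordering and the helper names profileOffset and profileSample are assumptions for illustration only. A host-side std::vector stands in for the device vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of the first radial-profile sample of `job` in a flat buffer of
// size radialsteps * njobs. ASSUMPTION: job-major layout.
std::size_t profileOffset(std::size_t job, std::size_t radialsteps) {
    return job * radialsteps;
}

// Reads one sample of one job's radial profile from the flat buffer.
float profileSample(const std::vector<float>& profiles, std::size_t job,
                    std::size_t step, std::size_t radialsteps) {
    assert(step < radialsteps);
    assert(profileOffset(job, radialsteps) + step < profiles.size());
    return profiles[profileOffset(job, radialsteps) + step];
}
```

With radialsteps = 4, the profile of job 2 starts at element 8, and sample 3 of job 1 is element 7 of the buffer.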

§ d_resultpos

device_vec<float3> QueuedCUDATracker::Stream::d_resultpos

Vector of 3D positions (float3) in device memory to hold intermediate results.

Definition at line 327 of file QueuedCUDATracker.h.

§ d_zlutcmpscores

device_vec<float> QueuedCUDATracker::Stream::d_zlutcmpscores

Vector in device memory to hold all calculated error curves. Size is [ zlutplanes * njobs ].

Definition at line 335 of file QueuedCUDATracker.h.

§ device

Device* QueuedCUDATracker::Stream::device

Pointer to the device instance this stream should run on.

Definition at line 338 of file QueuedCUDATracker.h.

§ hostImageBuf

pinned_array<float> QueuedCUDATracker::Stream::hostImageBuf

Buffer in pinned host memory holding the images for the image list images.

Definition at line 310 of file QueuedCUDATracker.h.

§ imageBufMutex

Threads::Mutex QueuedCUDATracker::Stream::imageBufMutex

Mutex for accesses to the images in memory.

Definition at line 311 of file QueuedCUDATracker.h.

§ imageCopyDone

cudaEvent_t QueuedCUDATracker::Stream::imageCopyDone

CUDA event for profiling of image copies.

Definition at line 319 of file QueuedCUDATracker.h.

§ images

cudaImageListf QueuedCUDATracker::Stream::images

Image list of all images belonging to the queued jobs.

Definition at line 308 of file QueuedCUDATracker.h.

§ imgMeans

pinned_array<float> QueuedCUDATracker::Stream::imgMeans

Array in pinned host memory with the ROI intensity means.

Definition at line 303 of file QueuedCUDATracker.h.

§ jobs

std::vector<LocalizationJob> QueuedCUDATracker::Stream::jobs

Vector of jobs. Filled with jobs then all jobs are executed at once and the vector is cleared.

Definition at line 306 of file QueuedCUDATracker.h.

§ localizationDone

cudaEvent_t QueuedCUDATracker::Stream::localizationDone

CUDA event used to determine when a batch is finished.

Definition at line 317 of file QueuedCUDATracker.h.

§ localizeFlags

uint QueuedCUDATracker::Stream::localizeFlags

Flags for localization choices. See LocalizeModeEnum.

Definition at line 337 of file QueuedCUDATracker.h.

§ locParams

pinned_array<LocalizationParams> QueuedCUDATracker::Stream::locParams

Array in pinned host memory with additional localization parameters.

Definition at line 304 of file QueuedCUDATracker.h.

§ qalign_instance

QI::StreamInstance QueuedCUDATracker::Stream::qalign_instance

Linked stream of the QI submodule used to perform quadrant alignment. See LT_ZLUTAlign.

Definition at line 331 of file QueuedCUDATracker.h.

§ qalignDone

cudaEvent_t QueuedCUDATracker::Stream::qalignDone

CUDA event for profiling of the quadrant align algorithm.

Definition at line 322 of file QueuedCUDATracker.h.

§ qi_instance

QI::StreamInstance QueuedCUDATracker::Stream::qi_instance

Linked stream of the QI submodule.

Definition at line 330 of file QueuedCUDATracker.h.

§ qiDone

cudaEvent_t QueuedCUDATracker::Stream::qiDone

CUDA event for profiling of the quadrant interpolation algorithm.

Definition at line 321 of file QueuedCUDATracker.h.

§ results

pinned_array<float3> QueuedCUDATracker::Stream::results

Array of 3D vectors (float3) in pinned host memory with the localization result values.

Definition at line 301 of file QueuedCUDATracker.h.

§ state

State QueuedCUDATracker::Stream::state

The state flag for the stream.

Definition at line 349 of file QueuedCUDATracker.h.

§ stream

cudaStream_t QueuedCUDATracker::Stream::stream

Definition at line 314 of file QueuedCUDATracker.h.

§ zcomputeDone

cudaEvent_t QueuedCUDATracker::Stream::zcomputeDone

CUDA event for profiling of the z localization.

Definition at line 323 of file QueuedCUDATracker.h.

The documentation for this struct was generated from the following files:

cudatrack/QueuedCUDATracker.h
cudatrack/QueuedCUDATracker.cu
