QTrk
Classes | Public Types | Public Member Functions | Public Attributes | Protected Member Functions | Static Protected Member Functions | Protected Attributes | List of all members
QueuedCUDATracker Class Reference

CUDA implementation of the QueuedTracker interface. More...

#include <QueuedCUDATracker.h>

Inheritance diagram for QueuedCUDATracker:
QueuedTracker

Classes

struct  Device
 Structure to maintain data for each GPU. More...
 
struct  KernelProfileTime
 Structure used to hold profiling data. More...
 
struct  Stream
 Structure to maintain data for each stream. Shell around CUDA native streams. More...
 

Public Types

typedef Image4DMemory< float > ImageLUT
 Datatype to hold image lookup tables. More...
 
- Public Types inherited from QueuedTracker
typedef std::map< std::string, std::string > ConfigValueMap
 Datastructure used to return additional settings in a string-string key-value pair mapping. More...
 

Public Member Functions

 QueuedCUDATracker (const QTrkComputedConfig &cc, int batchSize=-1)
 Initialize a QueuedCUDATracker instance. More...
 
 ~QueuedCUDATracker ()
 Delete an instance of a CUDA Tracker. More...
 
void EnableTextureCache (bool useTextureCache)
 Enable or disable the use of textures. More...
 
void SetLocalizationMode (LocMode_t locType) override
 Select which algorithm is to be used. More...
 
void ScheduleLocalization (void *data, int pitch, QTRK_PixelDataType pdt, const LocalizationJob *jobInfo) override
 Add a job to the queue to be processed. A job entails running the required algorithms on a single region of interest. More...
 
void ClearResults () override
 Clear results. More...
 
void SetRadialZLUT (float *data, int numLUTs, int planes) override
 Set the radial lookup tables to be used for z tracking. More...
 
void SetRadialWeights (float *zcmp) override
 Set radial weights used for comparing LUT profiles. More...
 
void GetRadialZLUT (float *data) override
 Get the radial lookup tables used for z tracking. More...
 
void GetRadialZLUTSize (int &count, int &planes, int &radialSteps) override
 Get the dimensions of the radial lookup table data. More...
 
int FetchResults (LocalizationResult *results, int maxResults) override
 Fetch available results. More...
 
void BeginLUT (uint flags) override
 Setup to begin building a lookup table. More...
 
void BuildLUT (void *data, int pitch, QTRK_PixelDataType pdt, int plane, vector2f *known_pos=0) override
 Add a new lookup table plane. More...
 
void FinalizeLUT () override
 Finalize the lookup tables in memory. More...
 
void EnableRadialZLUTCompareProfile (bool enabled)
 Set a flag to enable saving of error curves. More...
 
void GetRadialZLUTCompareProfile (float *dst)
 Get saved error curve. More...
 
std::string GetProfileReport () override
 Get the output of performance profiling. More...
 
void Flush () override
 Stop waiting for more jobs to do, and just process the current batch. More...
 
int GetQueueLength (int *maxQueueLen) override
 Get the lengths of the queue of jobs to be handled. More...
 
bool IsIdle () override
 Test to see if the tracker is idle. More...
 
int GetResultCount () override
 Get the number of finished localization jobs (=results) available in memory. More...
 
void SetPixelCalibrationImages (float *offset, float *gain) override
 Set the pixel calibration images. More...
 
void SetPixelCalibrationFactors (float offsetFactor, float gainFactor) override
 Set the pixel calibration factors. More...
 
ConfigValueMap GetConfigValues () override
 Get the used additional configurations. More...
 
void SetConfigValue (std::string name, std::string value) override
 Set an additional setting. More...
 
- Public Member Functions inherited from QueuedTracker
 QueuedTracker ()
 
virtual ~QueuedTracker ()
 
void ScheduleImageData (ImageData *data, const LocalizationJob *jobInfo)
 Quick function to schedule a single ROI from an ImageData object. More...
 
virtual int ScheduleFrame (void *imgptr, int pitch, int width, int height, ROIPosition *positions, int numROI, QTRK_PixelDataType pdt, const LocalizationJob *jobInfo)
 Schedule an entire frame at once, allowing for further optimizations. More...
 
virtual void GetImageZLUTSize (int *dims)
 Get the dimensions of the image lookup table data. More...
 
virtual void GetImageZLUT (float *dst)
 Get the image lookup tables used. More...
 
virtual bool SetImageZLUT (float *src, float *radial_zlut, int *dims)
 Set the image lookup tables to be used for z tracking. More...
 
virtual std::string GetWarnings ()
 Get a report of encountered errors. More...
 
virtual bool GetDebugImage (int ID, int *w, int *h, float **pData)
 Get the debug image for a specific thread. More...
 
ImageData DebugImage (int ID)
 Get the debug image as an ImageData object. More...
 
void ScheduleLocalization (uchar *data, int pitch, QTRK_PixelDataType pdt, uint frame, uint timestamp, vector3f *initial, uint zlutIndex)
 Add an image to the queue to be processed. Creates a job. More...
 
void ComputeZBiasCorrection (int bias_planes, CImageData *result, int smpPerPixel, bool useSplineInterp)
 
float ZLUTBiasCorrection (float z, int zlut_planes, int bead)
 
void SetZLUTBiasCorrection (const CImageData &data)
 
CImageDataGetZLUTBiasCorrection ()
 

Public Attributes

KernelProfileTime time
 
KernelProfileTime cpu_time
 
int batchesDone
 Number of fully completed batches. More...
 
std::string deviceReport
 String holding a human-readable description of used GPUs. Filled during InitializeDeviceList. More...
 
- Public Attributes inherited from QueuedTracker
QTrkComputedConfig cfg
 The settings used by this instance of QueuedTracker. More...
 

Protected Member Functions

dim3 blocks ()
 Calculate the number of thread blocks of size numThreads needed to have 1 thread per job. More...
 
dim3 blocks (int workItems)
 Calculate the number of thread blocks of size numThreads needed to have 1 thread per workItems. More...
 
dim3 threads ()
 Get the CUDA native datatype with the threadblock dimensions to use. More...
 
void SchedulingThreadMain ()
 Loop executed by the scheduling thread which executes threads when needed. More...
 
template<typename TImageSampler >
void ExecuteBatch (Stream *s)
 Execute the queued batch on a stream. More...
 
StreamGetReadyStream ()
 Get a stream that is not currently executing, and still has room for images. More...
 
void InitializeDeviceList ()
 Build the list of devices to be used based on the QTrkSettings::cuda_device flag. More...
 
StreamCreateStream (Device *device, int streamIndex)
 Initialize a stream instance. More...
 
void CopyStreamResults (Stream *s)
 Copy localization results from device to host memory. Also updates profiling times. More...
 
void StreamUpdateZLUTSize (Stream *s)
 Update zlut vector dimensions. Use when settings change. More...
 
void CPU_ApplyOffsetGain (CPUTracker *trk, int beadIndex)
 Use the CPU-based tracker to apply the pixel calibrations. More...
 

Static Protected Member Functions

static void SchedulingThreadEntryPoint (void *param)
 Entry point for thread creation. More...
 

Protected Attributes

int batchSize
 Amount of images to be sent at once per stream. More...
 
int numThreads
 Number of threads to use in a general thread block. More...
 
std::vector< Stream * > streams
 Vector of usable streams. More...
 
std::list< LocalizationResultresults
 Vector of completed results. More...
 
int resultCount
 Number of results available. More...
 
LocMode_t localizeMode
 Flags for localization choices. See LocalizeModeEnum. More...
 
Threads::Mutex resultMutex
 Mutex for result memory accesses. More...
 
Threads::Mutex jobQueueMutex
 Mutex for job queue accesses. More...
 
std::vector< Device * > devices
 Vector of device instances used. More...
 
bool useTextureCache
 Flag to use texture cache. Default is true. Disable using EnableTextureCache. More...
 
float gc_offsetFactor
 Factor by which to scale the pixel calibration offset. More...
 
float gc_gainFactor
 Factor by which to scale the gain. More...
 
std::vector< float > gc_offset
 Vector with offsets used for pixel correction. More...
 
std::vector< float > gc_gain
 Vector with gains used for pixel correction. More...
 
Threads::Mutex gc_mutex
 Mutex for pixel calibration operations. More...
 
uint zlut_build_flags
 Flags for ZLUT building. Not actually used yet. More...
 
QI qi
 The QI instance used to perform the 2D localization using the quadrant interpolation algorithm. More...
 
QI qalign
 Instance of QI used specifically for quadrant alignment. More...
 
cudaDeviceProp deviceProp
 Variable used to save device properties obtained from the CUDA API. More...
 
Threads::HandleschedulingThread
 Handle to the scheduling thread for later reference. More...
 
Atomic< bool > quitScheduler
 Thread shutdown flag with built-in thread safety (atomic). More...
 
- Protected Attributes inherited from QueuedTracker
CImageDatazlut_bias_correction
 

Detailed Description

CUDA implementation of the QueuedTracker interface.

Exploits the inherent parallel computing power of GPU hardware to increase the speed at which tracking can be performed. Optimizations will be listed on a per-function basis. Speeds of almost 40000 100x100 ROI/s with 5 QI iterations and Z localization have been achieved on a setup with a GTX1080 card.

The currently implemented tracking algorithms are COM, Quadrant Interpolation and 2D Gaussian with Max-Likelihood estimation.

It will automatically use all available CUDA devices if using the QTrkCUDA_UseAll value for QTrkSettings::cuda_device.

Method:

Thread-Safety:

We assume 2 threads concurrently accessing the tracker functions.

Issues:

Definition at line 137 of file QueuedCUDATracker.h.

Member Typedef Documentation

§ ImageLUT

Datatype to hold image lookup tables.

Definition at line 198 of file QueuedCUDATracker.h.

Constructor & Destructor Documentation

§ QueuedCUDATracker()

QueuedCUDATracker::QueuedCUDATracker ( const QTrkComputedConfig cc,
int  batchSize = -1 
)

Initialize a QueuedCUDATracker instance.

Parameters
[in]ccThe settings to be used.
[in]batchSizeThe number of ROIs to be handled in one batch. Default (-1) calculates optimum. See batchSize for optimization info.
Exceptions
runtime_errorInit error: GPU does not support CUDA capability 2.0 or higher.
runtime_errorInit error: Failed to create GPU streams.

We use QTrkComputedConfig::numThreads for the number of CUDA streams. Default (numThreads < 1) is 4 streams per CUDA device.

Number of threads per thread block is a single warp so each block can be executed concurrently on a single SM. Assuming no code branching, which should hold for kernels doing nothing but the same calculations on different locations in memory.

Calculate maximum batchSize based on available texture memory. Factor 0.99 is empirical, 100% of maximum does not work for some reason.

Initialize the QI module to handle the Quadrant Interpolation algorithm.

Pre-calculate the radial sampling grid based on radial and angular step settings from the passed QTrkComputedConfig.

Initialize the available CUDA devices. Copy the sampling table to device memory.

Todo:
Experiment with global values (settings) in global and/or shared device memory.

Initialize streams.

Texture cache is enabled by default. Use EnableTextureCache to disable if required.

Start the scheduling thread running SchedulingThreadMain.

Run an empty kernel to startup driver and get rid of first run overhead.

Definition at line 103 of file QueuedCUDATracker.cu.

104  : resultMutex("result"), jobQueueMutex("jobqueue")
105 {
106  cfg = cc;
107 
109 
112  if (cfg.numThreads < 1) {
113  cfg.numThreads = devices.size()*4;
114  }
115  int numStreams = cfg.numThreads;
116 
117  cudaGetDeviceProperties(&deviceProp, devices[0]->index);
118 
119  // Commented out because disabling timeout protection causes a bug that disables stream concurrency.
120  // Only useful for debugging anyway; no single kernel should take longer than 2 seconds unless breakpointed.
121  //if (deviceProp.kernelExecTimeoutEnabled)
122  // throw std::runtime_error(SPrintf("CUDA Tracker init error: CUDA Kernel execution timeout is enabled for %s. Disable WDDM Time-out Detection and Recovery (TDR) in the nVidia NSight Monitor before running this code", deviceProp.name));
123  if (deviceProp.major < 2)
124  throw std::runtime_error("CUDA Tracker init error: GPU not supported, capability < 2.0");
125 
128  numThreads = deviceProp.warpSize;
129 
130  if(batchSize<0)
131  batchSize = (int)(deviceProp.maxTexture2D[1]/cfg.height * 0.99f);
132  this->batchSize = batchSize;
133 
134  dbgprintf("CUDA Hardware: %s. \n# of CUDA processors: %d. Using %d streams\n", deviceProp.name, deviceProp.multiProcessorCount, numStreams);
135  dbgprintf("Warp size: %d. Max threads: %d, Batch size: %d\n", deviceProp.warpSize, deviceProp.maxThreadsPerBlock, batchSize);
136  // dbgprintf("Mem: %u MB. Per block: %u B\n", deviceProp.totalGlobalMem/1024/1024, deviceProp.sharedMemPerBlock); // Total memory available in Mbytes
137 
139  qi.Init(cfg, batchSize);
140 
142  std::vector<float2> zlut_radialgrid(cfg.zlut_angularsteps);
143  for (int i=0;i<cfg.zlut_angularsteps;i++) {
144  float ang = 2*3.141593f*i/(float)cfg.zlut_angularsteps;
145  zlut_radialgrid[i]=make_float2(cos(ang),sin(ang));
146  }
147 
151  for (uint i=0;i<devices.size();i++) {
152  Device* d = devices[i];
153  cudaSetDevice(d->index);
154  qi.InitDevice(&d->qi_instance, cfg);
155  d->zlut_trigtable = zlut_radialgrid;
156  }
157 
158  //dbgprintf("\n");
159  //outputTotalGPUMemUse("pre-streams");
160 
162  streams.reserve(numStreams);
163  try {
164  for (int i=0;i<numStreams;i++)
165  streams.push_back( CreateStream( devices[i%devices.size()], i ) );
166  }
167  catch(...) {
169  throw std::runtime_error("CUDA Tracker init error: Failed to create GPU streams.");
170  }
171 
172  //dbgprintf("\n");
173  //streams[0]->OutputMemoryUse();
174  //dbgprintf("\n");
175  //outputTotalGPUMemUse("post-streams");
176  //dbgprintf("\n");
177 
178  batchesDone = 0;
180  useTextureCache = true;
181  resultCount = 0;
183 
184  quitScheduler = false;
187 
190 
192  ForceCUDAKernelsToLoad <<< dim3(),dim3() >>> ();
193 }
float gc_offsetFactor
Factor by which to scale the pixel calibration offset.
static void SchedulingThreadEntryPoint(void *param)
Entry point for thread creation.
bool useTextureCache
Flag to use texture cache. Default is true. Disable using EnableTextureCache.
Threads::Mutex resultMutex
Mutex for result memory accesses.
unsigned int uint
Definition: std_incl.h:127
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
int batchSize
Amount of images to be sent at once per stream.
Threads::Handle * schedulingThread
Handle to the scheduling thread for later reference.
int numThreads
Number of threads to use in a general thread block.
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
LocMode_t localizeMode
Flags for localization choices. See LocalizeModeEnum.
static Handle * Create(ThreadEntryPoint method, void *param)
Definition: threads.h:99
int resultCount
Number of results available.
cudaDeviceProp deviceProp
Variable used to save device properties obtained from the CUDA API.
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
void DeleteAllElems(T &c)
Definition: utils.h:19
std::vector< Device * > devices
Vector of device instances used.
Stream * CreateStream(Device *device, int streamIndex)
Initialize a stream instance.
std::vector< Stream * > streams
Vector of usable streams.
void dbgprintf(const char *fmt,...)
Definition: utils.cpp:149
Use only COM.
Definition: qtrk_c_api.h:7
float gc_gainFactor
Factor by which to scale the gain.
void InitializeDeviceList()
Build the list of devices to be used based on the QTrkSettings::cuda_device flag. ...
Atomic< bool > quitScheduler
Thread shutdown flag with built-in thread safety (atomic).
void InitDevice(DeviceInstance *d, QTrkComputedConfig &cc)
Ready a device for QI calculations.
Definition: QI_impl.h:344
int numThreads
Number of threads/streams to use. Defaults differ between CPU and GPU implementations.
Definition: qtrk_c_api.h:114
int batchesDone
Number of fully completed batches.
QI qi
The QI instance used to perform the 2D localization using the quadrant interpolation algorithm...
void Init(QTrkComputedConfig &cfg, int batchSize)
Initialize this QI instance.
Definition: QI_impl.h:387
int zlut_angularsteps
Number of angular steps to sample on.
Definition: qtrk_c_api.h:199
uint zlut_build_flags
Flags for ZLUT building. Not actually used yet.

§ ~QueuedCUDATracker()

QueuedCUDATracker::~QueuedCUDATracker ( )

Delete an instance of a CUDA Tracker.

Definition at line 195 of file QueuedCUDATracker.cu.

196 {
197  quitScheduler = true;
199 
202 }
Threads::Handle * schedulingThread
Handle to the scheduling thread for later reference.
void DeleteAllElems(T &c)
Definition: utils.h:19
static void WaitAndClose(Handle *h)
Definition: threads.h:128
std::vector< Device * > devices
Vector of device instances used.
std::vector< Stream * > streams
Vector of usable streams.
Atomic< bool > quitScheduler
Thread shutdown flag with built-in thread safety (atomic).

Member Function Documentation

§ BeginLUT()

void QueuedCUDATracker::BeginLUT ( uint  flags)
overridevirtual

Setup to begin building a lookup table.

Sets the flags by which the lookup table is built. Flags are defined in a uint bitmask format as:

Name Value Description
0 Default, no special flags.
BUILDLUT_IMAGELUT 1 (2^0) Build an image LUT. An image LUT seems to be a work in progress to save ROIs rather than profiles in the LUT.
BUILDLUT_FOURIER 2 (2^1) Build a fourier LUT.
BUILDLUT_NORMALIZE 4 (2^2) Normalize radial profiles. Irrelevant, since FinalizeLUT always normalizes.
BUILDLUT_BIASCORRECT8 (2^3) Enable bias correction. Currently only partly implemented, don't use.
Parameters
[in]flagsUINT interpreted as a series of bits to set settings.

Implements QueuedTracker.

Definition at line 442 of file QueuedCUDATracker.cu.

443 {
444  zlut_build_flags = flags;
445 }
uint zlut_build_flags
Flags for ZLUT building. Not actually used yet.

§ blocks() [1/2]

dim3 QueuedCUDATracker::blocks ( )
inlineprotected

Calculate the number of thread blocks of size numThreads needed to have 1 thread per job.

Use in conjunction with threads as:

kernel <<< blocks(), threads() >>> (a, b);
Returns
CUDA 3D data for use in kernel calls.

Definition at line 376 of file QueuedCUDATracker.h.

376 { return dim3((batchSize+numThreads-1)/numThreads); }
int batchSize
Amount of images to be sent at once per stream.
int numThreads
Number of threads to use in a general thread block.

§ blocks() [2/2]

dim3 QueuedCUDATracker::blocks ( int  workItems)
inlineprotected

Calculate the number of thread blocks of size numThreads needed to have 1 thread per workItems.

Use in conjunction with threads as:

kernel <<< blocks(items), threads() >>> (a, b);
Parameters
[in]workItemsNumber of total threads needed.
Returns
CUDA 3D data for use in kernel calls.

Definition at line 388 of file QueuedCUDATracker.h.

388 { return dim3((workItems+numThreads-1)/numThreads); }
int numThreads
Number of threads to use in a general thread block.

§ BuildLUT()

void QueuedCUDATracker::BuildLUT ( void *  data,
int  pitch,
QTRK_PixelDataType  pdt,
int  plane,
vector2f known_pos = 0 
)
overridevirtual

Add a new lookup table plane.

Takes a stack of ROI images through data. Determines and adds the profile for each ROI to its respective LUT.

Parameters
[in]dataPointer to the start of an image stack.
[in]pitchWidth of the data in memory so offsets can be calculated.
[in]pdtPixel data type for the data. See QTRK_PixelDataType.
[in]planeThe plane number the ROIs are taken for.
[in]known_posCenter position from which to start making the radial profile. A standard QI localization with applied settings is performed if 0 (default).
Note
Uses the CPU tracker for LUT building!

Implements QueuedTracker.

Definition at line 447 of file QueuedCUDATracker.cu.

448 {
449  // Copy to image
450  Device* d = streams[0]->device;
451  cudaSetDevice(d->index);
452 
453  int nbeads = d->radial_zlut.count; // jobcount
454  std::vector<vector2f> positions(nbeads);
455 
456  float *profiles = new float [nbeads * cfg.zlut_radialsteps];
457  memset(profiles, 0, sizeof(float) * nbeads * cfg.zlut_radialsteps);
458 
459  // Should be fast enough for zlut building
460  parallel_for(nbeads, [&] (int i) {
461  CPUTracker trk (cfg.width,cfg.height);
462  void *img_data = (uchar*)data + pitch * cfg.height * i;
463 
464  if (pdt == QTrkFloat)
465  trk.SetImage((float*)img_data, pitch);
466  else if (pdt == QTrkU8)
467  trk.SetImage8Bit((uchar*)img_data, pitch);
468  else
469  trk.SetImage16Bit((ushort*)img_data,pitch);
470 
471  CPU_ApplyOffsetGain(&trk, i);
472 
473  if(known_pos) {
474  positions[i] = known_pos[i];
475  } else {
476  vector2f com = trk.ComputeMeanAndCOM();
477  bool bhit;
479  trk.ComputeRadialProfile(&profiles[i * cfg.zlut_radialsteps], cfg.zlut_radialsteps, cfg.zlut_angularsteps, cfg.zlut_minradius, cfg.zlut_maxradius, positions[i], false, 0, true);
480  }
481  });
482 
483  // add to device 0 LUT
484  device_vec<float> d_profiles (nbeads * cfg.zlut_radialsteps);
485  d_profiles.copyToDevice(profiles, nbeads * cfg.zlut_radialsteps);
486  delete[] profiles;
487 
488  dim3 numThreads(4, 64);
489  dim3 numBlocks( (nbeads + numThreads.x - 1) / numThreads.x, (cfg.zlut_radialsteps + numThreads.y - 1) / numThreads.y);
490  AddProfilesToZLUT<<< numBlocks, numThreads, 0, streams[0]->stream >>> (d_profiles.data, nbeads, cfg.zlut_radialsteps, plane, d->radial_zlut);
491  cudaStreamSynchronize(streams[0]->stream);
492 }
float zlut_minradius
Distance in pixels from the bead center from which to start sampling profiles. Default 1...
Definition: qtrk_c_api.h:135
64 bit float
Definition: qtrk_c_api.h:37
float qi_minradius
Distance in pixels from the bead center from which to start sampling profiles. Default 1...
Definition: qtrk_c_api.h:141
int width
Width of regions of interest to be handled. Typically equals height (square ROI). ...
Definition: qtrk_c_api.h:106
Class with all CPU algorithm implementations.
Definition: cpu_tracker.h:38
int numThreads
Number of threads to use in a general thread block.
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
unsigned short ushort
Definition: std_incl.h:128
8 bit unsigned int
Definition: qtrk_c_api.h:35
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
void parallel_for(int count, TF f)
Definition: threads.h:251
float qi_maxradius
Max radius in pixels of the sampling circle.
Definition: qtrk_c_api.h:204
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
int qi_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:202
__global__ void AddProfilesToZLUT(float *d_src, int nbeads, int radialsteps, int plane, cudaImageListf zlut)
std::vector< Stream * > streams
Vector of usable streams.
int qi_angstepspq
Number of angular steps to sample on per quadrant.
Definition: qtrk_c_api.h:203
int qi_iterations
Number of times to run the QI algorithm, sampling around the last found position. ...
Definition: qtrk_c_api.h:140
float qi_angstep_factor
Factor to reduce angular steps on lower iterations. Default 1.0 (no effect).
Definition: qtrk_c_api.h:157
int zlut_angularsteps
Number of angular steps to sample on.
Definition: qtrk_c_api.h:199
float zlut_maxradius
Max radius in pixels of the sampling circle.
Definition: qtrk_c_api.h:200
unsigned char uchar
Definition: std_incl.h:130
void CPU_ApplyOffsetGain(CPUTracker *trk, int beadIndex)
Use the CPU-based tracker to apply the pixel calibrations.

§ ClearResults()

void QueuedCUDATracker::ClearResults ( )
overridevirtual

Clear results.

Implements QueuedTracker.

Definition at line 861 of file QueuedCUDATracker.cu.

862 {
863  resultMutex.lock();
864  results.clear();
865  resultCount=0;
867 }
Threads::Mutex resultMutex
Mutex for result memory accesses.
void lock()
Definition: threads.h:74
int resultCount
Number of results available.
std::list< LocalizationResult > results
Vector of completed results.
void unlock()
Definition: threads.h:79

§ CopyStreamResults()

void QueuedCUDATracker::CopyStreamResults ( Stream s)
protected

Copy localization results from device to host memory. Also updates profiling times.

Parameters
[in]sThe stream from which to copy results.

Definition at line 674 of file QueuedCUDATracker.cu.

675 {
676  resultMutex.lock();
677  for (int a=0;a<s->JobCount();a++) {
678  LocalizationJob& j = s->jobs[a];
680  r.job = j;
681  r.firstGuess = vector2f( s->com[a].x, s->com[a].y );
682  r.pos = vector3f( s->results[a].x , s->results[a].y, s->results[a].z);
683  r.imageMean = s->imgMeans[a];
684  r.pos.z = ZLUTBiasCorrection(s->results[a].z, devices[0]->radial_zlut.h, j.zlutIndex);
685  results.push_back(r);
686  }
687  resultCount+=s->JobCount();
689 
690  // Update times
691  float qi, com, imagecopy, zcomp, getResults;
692  cudaEventElapsedTime(&imagecopy, s->batchStart, s->imageCopyDone);
693  cudaEventElapsedTime(&com, s->imageCopyDone, s->comDone);
694  cudaEventElapsedTime(&qi, s->comDone, s->qiDone);
695  cudaEventElapsedTime(&zcomp, s->qiDone, s->zcomputeDone);
696  cudaEventElapsedTime(&getResults, s->zcomputeDone, s->localizationDone);
697  time.com += com;
698  time.qi += qi;
699  time.imageCopy += imagecopy;
700  time.zcompute += zcomp;
701  time.getResults += getResults;
702  batchesDone ++;
703 }
Threads::Mutex resultMutex
Mutex for result memory accesses.
int zlutIndex
Bead number of this ROI. Used to get the right ZLUT from memory.
Definition: qtrk_c_api.h:58
void lock()
Definition: threads.h:74
Structure for job results.
Definition: qtrk_c_api.h:67
vector2< float > vector2f
Definition: std_incl.h:39
float imageMean
Average pixel value of the ROI.
Definition: qtrk_c_api.h:73
int resultCount
Number of results available.
std::list< LocalizationResult > results
Vector of completed results.
KernelProfileTime time
vector3f pos
Final 3D position found. If no z localization was performed, the value of z will be 0...
Definition: qtrk_c_api.h:69
LocalizationJob job
Job metadata. See LocalizationJob.
Definition: qtrk_c_api.h:68
std::vector< Device * > devices
Vector of device instances used.
float ZLUTBiasCorrection(float z, int zlut_planes, int bead)
vector2f firstGuess
(x,y) position found by the COM localization. Used as initial position for the subsequent algorithms...
Definition: qtrk_c_api.h:71
vector3< float > vector3f
Definition: std_incl.h:114
void unlock()
Definition: threads.h:79
int batchesDone
Number of fully completed batches.
QI qi
The QI instance used to perform the 2D localization using the quadrant interpolation algorithm...
Structure for region of interest metadata.
Definition: qtrk_c_api.h:49

§ CPU_ApplyOffsetGain()

void QueuedCUDATracker::CPU_ApplyOffsetGain ( CPUTracker trk,
int  beadIndex 
)
protected

Use the CPU-based tracker to apply the pixel calibrations.

This is only used for LUT building because that entirely still runs on the CPU (because speed is not a limit).

Parameters
[in]trkInstance of CPUTracker to use
[in]beadIndexNumber of the ROI to scale

Definition at line 739 of file QueuedCUDATracker.cu.

740 {
741  if (!gc_offset.empty() || !gc_gain.empty()) {
742  int index = cfg.width*cfg.height*beadIndex;
743 
744  gc_mutex.lock();
745  float gf = gc_gainFactor, of = gc_offsetFactor;
746  gc_mutex.unlock();
747 
748  trk->ApplyOffsetGain(gc_offset.empty() ? 0: &gc_offset[index] , gc_gain.empty() ? 0: &gc_gain[index], of, gf);
749 // if (j->job.frame%100==0)
750  }
751 }
float gc_offsetFactor
Factor by which to scale the pixel calibration offset.
Threads::Mutex gc_mutex
Mutex for pixel calibration operations.
int width
Width of regions of interest to be handled. Typically equals height (square ROI). ...
Definition: qtrk_c_api.h:106
void lock()
Definition: threads.h:74
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
void ApplyOffsetGain(float *offset, float *gain, float offsetFactor, float gainFactor)
Scale the input image with the background calibration values.
Definition: cpu_tracker.cpp:95
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
float gc_gainFactor
Factor by which to scale the gain.
std::vector< float > gc_gain
Vector with gains used for pixel correction.
void unlock()
Definition: threads.h:79
std::vector< float > gc_offset
Vector with offsets used for pixel correction.

§ CreateStream()

QueuedCUDATracker::Stream * QueuedCUDATracker::CreateStream ( Device device,
int  streamIndex 
)
protected

Initialize a stream instance.

Parameters
[in]devicePointer to the device instance this stream runs on.
[in]streamIndexThe number of this stream.

Definition at line 309 of file QueuedCUDATracker.cu.

310 {
311  Stream* s = new Stream(streamIndex);
312 
313  try {
314  s->device = device;
315  cudaSetDevice(device->index);
316  cudaStreamCreate(&s->stream);
317 
319  s->images.allocateHostImageBuffer(s->hostImageBuf);
320 
321  s->jobs.reserve(batchSize);
322  s->results.init(batchSize);
323  s->com.init(batchSize);
324  s->d_com.init(batchSize);
325  s->d_resultpos.init(batchSize);
326  s->locParams.init(batchSize);
327  s->imgMeans.init(batchSize);
328  s->d_imgmeans.init(batchSize);
329  s->d_locParams.init(batchSize);
330  s->d_radialprofiles.init(cfg.zlut_radialsteps*batchSize);
331 
332  qi.InitStream(&s->qi_instance, cfg, s->stream, batchSize);
333  qalign.InitStream(&s->qalign_instance, cfg, s->stream, batchSize);
334 
335  cudaEventCreate(&s->localizationDone);
336  cudaEventCreate(&s->comDone);
337  cudaEventCreate(&s->imageCopyDone);
338  cudaEventCreate(&s->zcomputeDone);
339  cudaEventCreate(&s->qiDone);
340  cudaEventCreate(&s->batchStart);
341  } catch (...) {
342  delete s;
343  throw;
344  }
345  return s;
346 }
int width
Width of regions of interest to be handled. Typically equals height (square ROI). ...
Definition: qtrk_c_api.h:106
static cudaImageList< float > alloc(int w, int h, int amount)
Definition: cudaImageList.h:35
int batchSize
Amount of images to be sent at once per stream.
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
QI qalign
Instance of QI used specifically for quadrant alignment.
QI qi
The QI instance used to perform the 2D localization using the quadrant interpolation algorithm...
void InitStream(StreamInstance *s, QTrkComputedConfig &cc, cudaStream_t stream, int batchSize)
Ready a stream for QI calculations.
Definition: QI_impl.h:363

§ EnableRadialZLUTCompareProfile()

void QueuedCUDATracker::EnableRadialZLUTCompareProfile ( bool  enabled)
inlinevirtual

Set a flag to enable saving of error curves.

Errors obtained by comparing a radial profile to a ZLUT will be kept in memory rather than destroyed. Only saves for one localization. Error curve can be retreived by GetRadialZLUTCompareProfile.

Note
Not implemented for CUDA.
Parameters
[in]enabledFlag (boolean) to save error curves. Default is false.

Implements QueuedTracker.

Definition at line 179 of file QueuedCUDATracker.h.

179 {}

§ EnableTextureCache()

void QueuedCUDATracker::EnableTextureCache ( bool  useTextureCache)
inline

Enable or disable the use of textures.

Texture cache provides a significant speedup due to 2D L1 caching as compared to normal linear memory caching (List of CUDA references).

Parameters
[in]useTextureCacheBoolean to enable texture cache usage.

Definition at line 158 of file QueuedCUDATracker.h.

bool useTextureCache
Flag to use texture cache. Default is true. Disable using EnableTextureCache.

§ ExecuteBatch()

template<typename TImageSampler >
void QueuedCUDATracker::ExecuteBatch ( Stream s)
protected

Execute the queued batch on a stream.

This entails the following steps:

  1. Async copy host-side buffer to device
  2. Bind image to texture memory
  3. Run COM kernel
  4. QI loop:
    1. Run QI kernels: Sample from texture into quadrant profiles
    2. Run CUFFT. Each iteration per axis does 2x forward FFT, and 1x backward FFT.
    3. Run QI kernel: Compute positions
  5. Compute ZLUT profiles
  6. Depending on localize flags:
    1. copy ZLUT profiles (for ComputeBuildZLUT flag)
    2. generate compare profile kernel + compute Z kernel (for ComputeZ flag)
  7. Unbind image from texture memory
  8. Async copy results to host
Parameters
[in]sPointer to the stream to execute.

Definition at line 560 of file QueuedCUDATracker.cu.

561 {
562  if (s->JobCount()==0)
563  return;
564  //dbgprintf("Sending %d images to GPU stream %p...\n", s->jobCount, s->stream);
565 
566  Device *d = s->device;
567  cudaSetDevice(d->index);
568 
569  BaseKernelParams kp;
570  kp.imgmeans = s->d_imgmeans.data;
571  kp.images = s->images;
572  kp.njobs = s->JobCount();
573  kp.locParams = s->d_locParams.data;
574 
575  cudaEventRecord(s->batchStart, s->stream);
576 // dbgprintf("copying %d jobs to gpu\n", s->JobCount());
577 
578  s->d_locParams.copyToDevice(s->locParams.data(), s->JobCount(), true, s->stream);
579 
581  s->images.copyToDevice(s->hostImageBuf.data(), true, s->stream);
582  }
583 
584  if (!d->calib_gain.isEmpty() || !d->calib_offset.isEmpty()) {
585  dim3 numThreads(16, 16, 2);
586  dim3 numBlocks((cfg.width + numThreads.x - 1 ) / numThreads.x,
587  (cfg.height + numThreads.y - 1) / numThreads.y,
588  (s->JobCount() + numThreads.z - 1) / numThreads.z);
589 
590  gc_mutex.lock();
591  float of = gc_offsetFactor, gf = gc_gainFactor;
592  gc_mutex.unlock();
593 
594  ApplyOffsetGain <<< numBlocks, numThreads, 0, s->stream >>>
595  (kp, s->device->calib_gain, s->device->calib_offset, gf, of);
596  }
597 
598  cudaEventRecord(s->imageCopyDone, s->stream);
599 
600 
601  TImageSampler::BindTexture(s->images);
603  BgCorrectedCOM<TImageSampler> <<< blocks(s->JobCount()), threads(), 0, s->stream >>>
604  (s->JobCount(), s->images, s->d_com.data, cfg.com_bgcorrection, s->d_imgmeans.data);
605  checksum(s->d_com.data, 1, s->JobCount(), "com");
606  }
607  cudaEventRecord(s->comDone, s->stream);
608 
609  // DbgOutputVectorToFile("D:\\TestImages\\imgmeans.csv", s->d_imgmeans, false);
610 
611  device_vec<float3> *curpos = &s->d_com;
612  if (s->localizeFlags & LT_QI) {
614  qi.Execute <TImageSampler> (kp, cfg, &s->qi_instance, &s->device->qi_instance, &s->d_com, &s->d_resultpos);
615  curpos = &s->d_resultpos;
616  }
617 
618  if (s->localizeFlags & LT_Gaussian2D) {
619  G2MLE_Compute<TImageSampler> <<< blocks(s->JobCount()), threads(), 0, s->stream >>>
620  (kp, cfg.gauss2D_sigma, cfg.gauss2D_iterations, s->d_com.data, s->d_resultpos.data, 0, 0);
621  curpos = &s->d_resultpos;
622  }
623 
624  cudaEventRecord(s->qiDone, s->stream);
625 
626  int numZIterations = (s->localizeFlags & LT_ZLUTAlign) ? 3 : 1;
627  for (int i=0;i<numZIterations;i++) {
629  ZLUTParams zp;
630  zp.planes = d->radial_zlut.h;
634  zp.img = d->radial_zlut;
635  zp.trigtable = d->zlut_trigtable.data;
636  zp.zcmpwindow = d->zcompareWindow.data;
637 
638  // Compute radial profiles
639  if (s->localizeFlags & LT_LocalizeZ) {
640  dim3 numThreads(16, 16);
641  dim3 numBlocks( (s->JobCount() + numThreads.x - 1) / numThreads.x,
642  (cfg.zlut_radialsteps + numThreads.y - 1) / numThreads.y);
643  ZLUT_RadialProfileKernel<TImageSampler> <<< numBlocks , numThreads, 0, s->stream >>>
644  (s->JobCount(), s->images, zp, curpos->data, s->d_radialprofiles.data, s->d_imgmeans.data);
645  ZLUT_NormalizeProfiles<<< blocks(s->JobCount()), threads(), 0, s->stream >>> (s->JobCount(), zp, s->d_radialprofiles.data);
646  }
647  // Compute Z
648  if (s->localizeFlags & LT_LocalizeZ) {
649  dim3 numThreads(8, 16);
650  ZLUT_ComputeProfileMatchScores <<< dim3( (s->JobCount() + numThreads.x - 1) / numThreads.x, (zp.planes + numThreads.y - 1) / numThreads.y), numThreads, 0, s->stream >>>
651  (s->JobCount(), zp, s->d_radialprofiles.data, s->d_zlutcmpscores.data, s->d_locParams.data);
652  ZLUT_ComputeZ <<< blocks(s->JobCount()), threads(), 0, s->stream >>> (s->JobCount(), zp, curpos->data, s->d_zlutcmpscores.data);
653  }}
654 
655  if (i>0) {
657  qalign.Execute<TImageSampler> (kp, cfg, &s->qalign_instance, &s->device->qalign_instance, &s->d_resultpos, &s->d_resultpos);
658  }
659  }
660 
661  TImageSampler::UnbindTexture(s->images);
662  cudaEventRecord(s->zcomputeDone, s->stream);
663 
665  s->d_com.copyToHost(s->com.data(), true, s->stream);
666  curpos->copyToHost(s->results.data(), true, s->stream);
667  s->d_imgmeans.copyToHost(s->imgMeans.data(), true, s->stream);
668  }
669 
670  // Make sure we can query the all done signal
671  cudaEventRecord(s->localizationDone, s->stream);
672 }
int gauss2D_iterations
Number of times to run the 2D gaussian algorithm.
Definition: qtrk_c_api.h:163
float zlut_minradius
Distance in pixels from the bead center from which to start sampling profiles. Default 1...
Definition: qtrk_c_api.h:135
float gc_offsetFactor
Factor by which to scale the pixel calibration offset.
Structure to group parameters to be passed to every kernel.
void copyToHost(T *dst, bool async, cudaStream_t s=0)
Definition: gpu_utils.h:126
Structure to group parameters about the ZLUT.
Threads::Mutex gc_mutex
Mutex for pixel calibration operations.
int width
Width of regions of interest to be handled. Typically equals height (square ROI). ...
Definition: qtrk_c_api.h:106
__global__ void ZLUT_ComputeProfileMatchScores(int njobs, ZLUTParams params, float *profiles, float *compareScoreBuf, LocalizationParams *locParams)
Definition: Kernels.h:121
void lock()
Definition: threads.h:74
int numThreads
Number of threads to use in a general thread block.
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
void checksum(T *data, int elemsize, int numelem, const char *name)
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
KernelProfileTime cpu_time
void Execute(BaseKernelParams &p, const QTrkComputedConfig &cfg, StreamInstance *s, DeviceInstance *d, device_vec< float3 > *initial, device_vec< float3 > *output)
Execute a batch of QI calculations. Runs Iterate the correct number of times and recalculates the num...
Definition: QI_impl.h:268
__global__ void ApplyOffsetGain(BaseKernelParams kp, cudaImageListf calib_gain, cudaImageListf calib_offset, float gainFactor, float offsetFactor)
Definition: Kernels.h:169
__global__ void ZLUT_ComputeZ(int njobs, ZLUTParams params, float3 *positions, float *compareScoreBuf)
Definition: Kernels.h:108
float2 * trigtable
Array of precomputed radial spoke sampling points (cos,sin pairs)
Enable z localization.
Definition: qtrk_c_api.h:21
float com_bgcorrection
Background correction factor for COM. Defines the number of standard deviations data needs to be away...
Definition: qtrk_c_api.h:133
int planes
The number of planes per lookup table.
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
cudaImageListf images
The images for this batch.
LocalizationParams * locParams
Additional localization parameters. Holds the ZLUT/bead index.
__global__ void ZLUT_NormalizeProfiles(int njobs, ZLUTParams params, float *profiles)
Definition: Kernels.h:142
QI qalign
Instance of QI used specifically for quadrant alignment.
COM+QI.
Definition: qtrk_c_api.h:9
int angularSteps
The number of angular steps used to generate the lookup table.
2D Gaussian localization
Definition: qtrk_c_api.h:10
float gc_gainFactor
Factor by which to scale the gain.
float maxRadius
Maximum radial distance in pixels of the sampling area.
cudaImageListf img
The imagelist holding the LUTs.
int njobs
Number of jobs in the batch.
Enable ZLUT align.
Definition: qtrk_c_api.h:19
dim3 blocks()
Calculate the number of thread blocks of size numThreads needed to have 1 thread per job...
float gauss2D_sigma
Standard deviation to use in the 2D gaussian algorithm.
Definition: qtrk_c_api.h:164
float * imgmeans
Array of image means.
void unlock()
Definition: threads.h:79
float minRadius
Radius in pixels of the starting point of the sampling area.
QI qi
The QI instance used to perform the 2D localization using the quadrant interpolation algorithm...
int zlut_angularsteps
Number of angular steps to sample on.
Definition: qtrk_c_api.h:199
float zlut_maxradius
Max radius in pixels of the sampling circle.
Definition: qtrk_c_api.h:200
dim3 threads()
Get the CUDA native datatype with the threadblock dimensions to use.
float * zcmpwindow
The radial weights to use for the error curve calculation.

§ FetchResults()

int QueuedCUDATracker::FetchResults ( LocalizationResult results,
int  maxResults 
)
overridevirtual

Fetch available results.

Note
Removes results from internal QueuedTracker memory.
Parameters
[in]resultsArray of pre-allocated LocalizationResult to which to add the results.
[in]maxResultsMaximum number of results to fetch. Corresponds to maximum size of dstResult.
Returns
Number of fetched results.

Implements QueuedTracker.

Definition at line 705 of file QueuedCUDATracker.cu.

706 {
707  resultMutex.lock();
708  int numResults = 0;
709  while (numResults < maxResults && !results.empty()) {
710  dstResults[numResults++] = results.front();
711  results.pop_front();
712  resultCount--;
713  }
715  return numResults;
716 }
Threads::Mutex resultMutex
Mutex for result memory accesses.
void lock()
Definition: threads.h:74
int resultCount
Number of results available.
void unlock()
Definition: threads.h:79

§ FinalizeLUT()

void QueuedCUDATracker::FinalizeLUT ( )
overridevirtual

Finalize the lookup tables in memory.

Normalizes the profiles for radial lookup tables and calculates derivates and adds boundary conditions for image LUTs.

Implements QueuedTracker.

Definition at line 494 of file QueuedCUDATracker.cu.

495 {
496  Device* srcd = devices[0];
497  cudaSetDevice(srcd->index);
498 
499  cudaImageListf& src = srcd->radial_zlut;
500  float *tmp = new float [src.w*src.h*src.count];
501  srcd->radial_zlut.copyToHost(tmp);
502 
503  for (int i=0;i<src.h*src.count;i++){
504  NormalizeRadialProfile(&tmp[src.w*i],src.w);
505  }
506 
507  SetRadialZLUT(tmp, src.count, src.h);
508 
509  for(int bead = 0; bead < src.count; bead++){
510  for(int step = 0; step < cfg.zlut_radialsteps; step++){
511  dbgprintf("%f ",tmp[(bead * cfg.zlut_radialsteps)+step]);
512  }
513  }
514  dbgprintf("\n"); //*/
515 
516  delete[] tmp;
517 }
void NormalizeRadialProfile(scalar_t *prof, int rsteps)
Definition: utils.cpp:257
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
std::vector< Device * > devices
Vector of device instances used.
void SetRadialZLUT(float *data, int numLUTs, int planes) override
Set the radial lookup tables to be used for z tracking.
void dbgprintf(const char *fmt,...)
Definition: utils.cpp:149
void copyToHost(T *dst, bool async=false, cudaStream_t s=0)

§ Flush()

void QueuedCUDATracker::Flush ( )
overridevirtual

Stop waiting for more jobs to do, and just process the current batch.

Implements QueuedTracker.

Definition at line 519 of file QueuedCUDATracker.cu.

520 {
522  for (uint i=0;i<streams.size();i++) {
523  if(streams[i]->JobCount()>0 && streams[i]->state != Stream::StreamExecuting)
525  }
527 }
The Stream is ready to be executed. That is, the queue is full and the batch is ready or Flush was ca...
void lock()
Definition: threads.h:74
unsigned int uint
Definition: std_incl.h:127
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
std::vector< Stream * > streams
Vector of usable streams.
The Stream is currently active on the GPU and executing its batch.
void unlock()
Definition: threads.h:79

§ GetConfigValues()

QueuedCUDATracker::ConfigValueMap QueuedCUDATracker::GetConfigValues ( )
overridevirtual

Get the used additional configurations.

Implements QueuedTracker.

Definition at line 881 of file QueuedCUDATracker.cu.

882 {
883  ConfigValueMap cvm;
884  cvm["use_texturecache"] = useTextureCache ? "1" : "0";
885  return cvm;
886 }
bool useTextureCache
Flag to use texture cache. Default is true. Disable using EnableTextureCache.
std::map< std::string, std::string > ConfigValueMap
Datastructure used to return additional settings in a string-string key-value pair mapping...

§ GetProfileReport()

std::string QueuedCUDATracker::GetProfileReport ( )
overridevirtual

Get the output of performance profiling.

Note
Profiling is only implemented in CUDA at the moment.
Returns
String with the parsed profiling output.

Reimplemented from QueuedTracker.

Definition at line 869 of file QueuedCUDATracker.cu.

870 {
871  float f = 1.0f/batchesDone;
872 
873  return deviceReport + "Time profiling: [GPU], [CPU] \n" +
874  SPrintf("%d batches done of size %d, on %d streams", batchesDone, batchSize, streams.size()) + "\n" +
875  SPrintf("Image copying: %.2f,\t%.2f ms\n", time.imageCopy*f, cpu_time.imageCopy*f) +
876  SPrintf("QI: %.2f,\t%.2f ms\n", time.qi*f, cpu_time.qi*f) +
877  SPrintf("COM: %.2f,\t%.2f ms\n", time.com*f, cpu_time.com*f) +
878  SPrintf("Z Computing: %.2f,\t%.2f ms\n", time.zcompute*f, cpu_time.zcompute*f);
879 }
int batchSize
Amount of images to be sent at once per stream.
KernelProfileTime cpu_time
KernelProfileTime time
std::string deviceReport
String holding a human-readable description of used GPUs. Filled during InitializeDeviceList.
std::vector< Stream * > streams
Vector of usable streams.
int batchesDone
Number of fully completed batches.
std::string SPrintf(const char *fmt,...)
Definition: utils.cpp:132

§ GetQueueLength()

int QueuedCUDATracker::GetQueueLength ( int *  maxQueueLen)
overridevirtual

Get the lengths of the queue of jobs to be handled.

Parameters
[out]maxQueueLenPre-allocated integer that returns the maximum size of the queue if nonzero.
Returns
Number of jobs currently being handled and in the queue.

Implements QueuedTracker.

Definition at line 379 of file QueuedCUDATracker.cu.

380 {
382  int qlen = 0;
383  for (uint a=0;a<streams.size();a++){
384  qlen += streams[a]->JobCount();
385  }
387 
388  if (maxQueueLen) {
389  *maxQueueLen = streams.size()*batchSize;
390  }
391 
392  return qlen;
393 }
void lock()
Definition: threads.h:74
unsigned int uint
Definition: std_incl.h:127
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
int batchSize
Amount of images to be sent at once per stream.
std::vector< Stream * > streams
Vector of usable streams.
void unlock()
Definition: threads.h:79

§ GetRadialZLUT()

void QueuedCUDATracker::GetRadialZLUT ( float *  dst)
overridevirtual

Get the radial lookup tables used for z tracking.

Parameters
[out]dstPointer to the pre-allocated memory in which to save the data.

Implements QueuedTracker.

Definition at line 834 of file QueuedCUDATracker.cu.

835 {
836  cudaImageListf* zlut = &devices[0]->radial_zlut;
837 
838  if (zlut->data) {
839  for (int i=0;i<zlut->count;i++) {
840  float* img = &data[i*cfg.zlut_radialsteps*zlut->h];
841  zlut->copyImageToHost(i, img);
842  }
843  }
844 }
void copyImageToHost(int img, T *dst, bool async=false, cudaStream_t s=0)
Definition: cudaImageList.h:97
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
std::vector< Device * > devices
Vector of device instances used.

§ GetRadialZLUTCompareProfile()

void QueuedCUDATracker::GetRadialZLUTCompareProfile ( float *  dst)
inlinevirtual

Get saved error curve.

See EnableRadialZLUTCompareProfile.

Note
Not implemented for CUDA.
Parameters
[in]dstPointer to the pre-allocated memory in which to save the error curve. Size is count * planes.

Implements QueuedTracker.

Definition at line 180 of file QueuedCUDATracker.h.

180 {} // dst = [count * planes]

§ GetRadialZLUTSize()

void QueuedCUDATracker::GetRadialZLUTSize ( int &  count,
int &  planes,
int &  radialsteps 
)
overridevirtual

Get the dimensions of the radial lookup table data.

Parameters
[out]countReference to pre-allocated int. Returns number of lookup tables.
[out]planesReference to pre-allocated int. Returns number of planes per lookup table.
[out]radialstepsReference to pre-allocated int. Returns number of steps per plane.

Implements QueuedTracker.

Definition at line 846 of file QueuedCUDATracker.cu.

847 {
848  count = devices[0]->radial_zlut.count;
849  planes = devices[0]->radial_zlut.h;
850  rsteps = cfg.zlut_radialsteps;
851 }
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
std::vector< Device * > devices
Vector of device instances used.

§ GetReadyStream()

QueuedCUDATracker::Stream * QueuedCUDATracker::GetReadyStream ( )
protected

Get a stream that is not currently executing, and still has room for images.

Definition at line 349 of file QueuedCUDATracker.cu.

350 {
351  while (true) {
353 
354  Stream *best = 0;
355  for (uint i=0;i<streams.size();i++)
356  {
357  Stream*s = streams[i];
358  if (s->state == Stream::StreamIdle) {
359  if (!best || (s->JobCount() > best->JobCount()))
360  best = s;
361  }
362  }
363 
365 
366  if (best)
367  return best;
368 
369  Threads::Sleep(1);
370  }
371 }
void lock()
Definition: threads.h:74
unsigned int uint
Definition: std_incl.h:127
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
The Stream is idle and can accept more jobs. In other words, the queue is not full.
std::vector< Stream * > streams
Vector of usable streams.
static void Sleep(int ms)
Definition: threads.h:134
void unlock()
Definition: threads.h:79

§ GetResultCount()

int QueuedCUDATracker::GetResultCount ( )
overridevirtual

Get the number of finished localization jobs (=results) available in memory.

Returns
The number of available results.

Implements QueuedTracker.

Definition at line 853 of file QueuedCUDATracker.cu.

854 {
855  resultMutex.lock();
856  int r = resultCount;
858  return r;
859 }
Threads::Mutex resultMutex
Mutex for result memory accesses.
void lock()
Definition: threads.h:74
int resultCount
Number of results available.
void unlock()
Definition: threads.h:79

§ InitializeDeviceList()

void QueuedCUDATracker::InitializeDeviceList ( )
protected

Build the list of devices to be used based on the QTrkSettings::cuda_device flag.

Definition at line 76 of file QueuedCUDATracker.cu.

77 {
78  int numDevices;
79  cudaGetDeviceCount(&numDevices);
80 
81  // Select the most powerful one
84  devices.push_back(new Device(cfg.cuda_device));
85  } else if(cfg.cuda_device == QTrkCUDA_UseAll) {
86  // Use all devices
87  for (int i=0;i<numDevices;i++)
88  devices.push_back(new Device(i));
89  } else if (cfg.cuda_device == QTrkCUDA_UseList) {
90  for (uint i=0;i<cudaDeviceList.size();i++)
91  devices.push_back(new Device(cudaDeviceList[i]));
92  } else {
93  devices.push_back (new Device(cfg.cuda_device));
94  }
95  deviceReport = "Using devices: ";
96  for (uint i=0;i<devices.size();i++) {
97  cudaDeviceProp p;
98  cudaGetDeviceProperties(&p, devices[i]->index);
99  deviceReport += SPrintf("%s%s", p.name, i<devices.size()-1?", ":"\n");
100  }
101 }
static int GetBestCUDADevice()
#define QTrkCUDA_UseBest
Definition: qtrk_c_api.h:118
unsigned int uint
Definition: std_incl.h:127
#define QTrkCUDA_UseList
Definition: qtrk_c_api.h:116
std::string deviceReport
String holding a human-readable description of used GPUs. Filled during InitializeDeviceList.
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
std::vector< Device * > devices
Vector of device instances used.
int cuda_device
CUDA only. Flag for device selection.
Definition: qtrk_c_api.h:131
#define QTrkCUDA_UseAll
Definition: qtrk_c_api.h:117
static std::vector< int > cudaDeviceList
std::string SPrintf(const char *fmt,...)
Definition: utils.cpp:132

§ IsIdle()

bool QueuedCUDATracker::IsIdle ( )
overridevirtual

Test to see if the tracker is idle.

That is, GetQueueLength == 0.

Returns
Boolean indicating if the tracker is idle.

Implements QueuedTracker.

Definition at line 373 of file QueuedCUDATracker.cu.

374 {
375  int ql = GetQueueLength(0);
376  return ql == 0;
377 }
int GetQueueLength(int *maxQueueLen) override
Get the lengths of the queue of jobs to be handled.

§ ScheduleLocalization()

void QueuedCUDATracker::ScheduleLocalization ( void *  data,
int  pitch,
QTRK_PixelDataType  pdt,
const LocalizationJob jobInfo 
)
overridevirtual

Add a job to the queue to be processed. A job entails running the required algorithms on a single region of interest.

If a localization can not be added to the queue because it is full, the thread will be put to sleep and periodically try again.

Parameters
[in]dataPointer to the data. Type specified by pdt.
[in]pitchDistance in bytes between two successive rows of pixels (e.g. address of (0,0) - address of (0,1)).
[in]pdtType of data, specified by QTRK_PixelDataType.
[in]jobInfoStructure with metadata for the ROI to be handled. See LocalizationJob.

Implements QueuedTracker.

Definition at line 405 of file QueuedCUDATracker.cu.

406 {
407  Stream* s = GetReadyStream();
408 
410  int jobIndex = s->jobs.size();
411  LocalizationJob job = *jobInfo;
412  s->jobs.push_back(job);
413  s->localizeFlags = localizeMode; // which kernels to run
414  s->locParams[jobIndex].zlutIndex = jobInfo->zlutIndex;
415 
416  if (s->jobs.size() == batchSize)
417  s->state = Stream::StreamPendingExec;
419 
420  s->imageBufMutex.lock();
421  // Copy the image to the batch image buffer (CPU side)
422  float* hostbuf = &s->hostImageBuf[cfg.height*cfg.width*jobIndex];
423  CopyImageToFloat( (uchar*)data, cfg.width, cfg.height, pitch, pdt, hostbuf);
425  for(int i=0;i<4;i++) hostbuf[i]=0;
426  }
427  s->imageBufMutex.unlock();
428 
429  //dbgprintf("Job: %d\n", jobIndex);
430 }
Ignore the first four pixels of a frame. Some cameras save timestamps rather than image data in the f...
Definition: qtrk_c_api.h:23
int width
Width of regions of interest to be handled. Typically equals height (square ROI). ...
Definition: qtrk_c_api.h:106
The Stream is ready to be executed. That is, the queue is full and the batch is ready or Flush was ca...
int zlutIndex
Bead number of this ROI. Used to get the right ZLUT from memory.
Definition: qtrk_c_api.h:58
void lock()
Definition: threads.h:74
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
int batchSize
Amount of images to be sent at once per stream.
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
LocMode_t localizeMode
Flags for localization choices. See LocalizeModeEnum.
void CopyImageToFloat(uchar *data, int width, int height, int pitch, QTRK_PixelDataType pdt, float *dst)
Copies image data from a generic QTRK_PixelDataType array to a float array.
Definition: utils.cpp:641
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
void unlock()
Definition: threads.h:79
Structure for region of interest metadata.
Definition: qtrk_c_api.h:49
unsigned char uchar
Definition: std_incl.h:130
Stream * GetReadyStream()
Get a stream that is not currently executing, and still has room for images.

§ SchedulingThreadEntryPoint()

void QueuedCUDATracker::SchedulingThreadEntryPoint ( void *  param)
staticprotected

Entry point for thread creation.

Definition at line 212 of file QueuedCUDATracker.cu.

213 {
214  ((QueuedCUDATracker*)param)->SchedulingThreadMain();
215 }
CUDA implementation of the QueuedTracker interface.

§ SchedulingThreadMain()

void QueuedCUDATracker::SchedulingThreadMain ( )
protected

Loop executed by the scheduling thread which executes threads when needed.

Definition at line 217 of file QueuedCUDATracker.cu.

218 {
219  std::vector<Stream*> activeStreams;
220 
221  while (!quitScheduler) {
223  Stream* s = 0;
224  for (uint i=0;i<streams.size();i++)
225  if (streams[i]->state == Stream::StreamPendingExec) {
226  s=streams[i];
227  s->state = Stream::StreamExecuting;
228  // dbgprintf("Executing stream %p [%d]. %d jobs\n", s, i, s->JobCount());
229  break;
230  }
232 
233  if (s) {
234  s->imageBufMutex.lock();
235 
236  // Launch filled batches, or if flushing launch every batch with nonzero jobs
237  if (useTextureCache)
238  ExecuteBatch<ImageSampler_Tex> (s);
239  else
240  ExecuteBatch<ImageSampler_MemCopy> (s);
241  s->imageBufMutex.unlock();
242  activeStreams.push_back(s);
243  }
244 
245  // Fetch results
246  for (uint a=0;a<activeStreams.size();a++) {
247  Stream* s = activeStreams[a];
248  if (s->IsExecutionDone()) {
249  // dbgprintf("Stream %p done.\n", s);
251  s->localizeFlags = 0; // reset this for the next batch
253  s->jobs.clear();
254  s->state = Stream::StreamIdle;
256  activeStreams.erase(std::find(activeStreams.begin(),activeStreams.end(),s));
257  break;
258  }
259  }
260 
261  Threads::Sleep(1);
262  }
263 }
The Stream is ready to be executed. That is, the queue is full and the batch is ready or Flush was ca...
bool useTextureCache
Flag to use texture cache. Default is true. Disable using EnableTextureCache.
void CopyStreamResults(Stream *s)
Copy localization results from device to host memory. Also updates profiling times.
void lock()
Definition: threads.h:74
unsigned int uint
Definition: std_incl.h:127
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
The Stream is idle and can accept more jobs. In other words, the queue is not full.
std::vector< Stream * > streams
Vector of usable streams.
static void Sleep(int ms)
Definition: threads.h:134
The Stream is currently active on the GPU and executing its batch.
Atomic< bool > quitScheduler
Thread shutdown flag with built-in thread safety (atomic).
void unlock()
Definition: threads.h:79

§ SetConfigValue()

void QueuedCUDATracker::SetConfigValue ( std::string  name,
std::string  value 
)
overridevirtual

Set an additional setting.

Parameters
[in]nameName of the setting.
[in]valueValue of the setting.

Implements QueuedTracker.

Definition at line 888 of file QueuedCUDATracker.cu.

889 {
890  if (name == "use_texturecache")
891  useTextureCache = atoi(value.c_str()) != 0;
892 }
bool useTextureCache
Flag to use texture cache. Default is true. Disable using EnableTextureCache.

§ SetLocalizationMode()

void QueuedCUDATracker::SetLocalizationMode ( LocMode_t  locType)
overridevirtual

Select which algorithm is to be used.

Parameters
[in]locTypeAn integer used as a bitmask for settings based on LocalizeModeEnum.

Implements QueuedTracker.

Definition at line 395 of file QueuedCUDATracker.cu.

396 {
397  Flush();
398  while (!IsIdle());
399 
401  localizeMode = mode;
403 }
void Flush() override
Stop waiting for more jobs to do, and just process the current batch.
void lock()
Definition: threads.h:74
Threads::Mutex jobQueueMutex
Mutex for job queue accesses.
LocMode_t localizeMode
Flags for localization choices. See LocalizeModeEnum.
bool IsIdle() override
Test to see if the tracker is idle.
void unlock()
Definition: threads.h:79

§ SetPixelCalibrationFactors()

void QueuedCUDATracker::SetPixelCalibrationFactors ( float  offsetFactor,
float  gainFactor 
)
overridevirtual

Set the pixel calibration factors.

The factors can be used to increase or decrease the effects of the images supplied through SetPixelCalibrationImages for further finetuning. These only have an effect when an image is actually set through that function.

Parameters
[in]offsetFactorFactor by which to scale the offset values.
[in]gainFactorFactor by which to scale the gain values.

Implements QueuedTracker.

Definition at line 753 of file QueuedCUDATracker.cu.

754 {
755  gc_mutex.lock();
756  gc_gainFactor = gainFactor;
757  gc_offsetFactor = offsetFactor;
758  gc_mutex.unlock();
759 }
float gc_offsetFactor
Factor by which to scale the pixel calibration offset.
Threads::Mutex gc_mutex
Mutex for pixel calibration operations.
void lock()
Definition: threads.h:74
float gc_gainFactor
Factor by which to scale the gain.
void unlock()
Definition: threads.h:79

§ SetPixelCalibrationImages()

void QueuedCUDATracker::SetPixelCalibrationImages ( float *  offset,
float *  gain 
)
overridevirtual

Set the pixel calibration images.

These images are used to scale the input image to get rid of background influences in the image. The values are per-pixel-per-ROI. Result = gain*(pixel+offset).

Parameters
[in]offsetArray with the offset values to use per pixel. Size and order is [width*height*numbeads].
[in]gainArray with gain values to use per pixel. Size and order is [width*height*numbeads].

Implements QueuedTracker.

Definition at line 718 of file QueuedCUDATracker.cu.

719 {
720  for (uint i=0;i<devices.size();i++) {
721  devices[i]->SetPixelCalibrationImages(offset, gain, cfg.width, cfg.height);
722  }
723 
724  // Copy to CPU side buffers for BuildLUT
725  int nelem = devices[0]->radial_zlut.count * cfg.width * cfg.height;
726  if (offset && gc_offset.size()!=nelem) {
727  gc_offset.resize(nelem);
728  gc_offset.assign(offset,offset+nelem);
729  }
730  if (!offset) gc_offset.clear();
731 
732  if (gain && gc_gain.size()!=nelem) {
733  gc_gain.reserve(nelem);
734  gc_gain.assign(gain, gain+nelem);
735  }
736  if (!gain) gc_gain.clear();
737 }
int width
Width of regions of interest to be handled. Typically equals height (square ROI). ...
Definition: qtrk_c_api.h:106
unsigned int uint
Definition: std_incl.h:127
int height
Height of regions of interest to be handled. Typically equals width (square ROI). ...
Definition: qtrk_c_api.h:107
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
std::vector< Device * > devices
Vector of device instances used.
std::vector< float > gc_gain
Vector with gains used for pixel correction.
std::vector< float > gc_offset
Vector with offsets used for pixel correction.

§ SetRadialWeights()

void QueuedCUDATracker::SetRadialWeights ( float *  zcmp)
overridevirtual

Set radial weights used for comparing LUT profiles.

Parameters
[in]zcmpArray of radial weights to use. zcmp has to have zlut_radialsteps elements.

Implements QueuedTracker.

Definition at line 798 of file QueuedCUDATracker.cu.

799 {
800  for (uint i=0;i<devices.size();i++) {
801  devices[i]->SetRadialWeights(zcmp);
802  }
803 }
unsigned int uint
Definition: std_incl.h:127
std::vector< Device * > devices
Vector of device instances used.

§ SetRadialZLUT()

void QueuedCUDATracker::SetRadialZLUT ( float *  data,
int  count,
int  planes 
)
overridevirtual

Set the radial lookup tables to be used for z tracking.

Data can be zero to allocate ZLUT data. LUTs should have been created before by BuildLUT, but not necessarily by the current instance as long as setting match.

Parameters
[in]dataPointer to the start of the ZLUT data.
[in]countNumber of ZLUTs in the dataset.
[in]planesNumber of planes per ZLUT.

Implements QueuedTracker.

Definition at line 787 of file QueuedCUDATracker.cu.

788 {
789  for (uint i=0;i<devices.size();i++) {
790  devices[i]->SetRadialZLUT(data, cfg.zlut_radialsteps, planes, numLUTs);
791  }
792 
793  for (uint i=0;i<streams.size();i++) {
795  }
796 }
void StreamUpdateZLUTSize(Stream *s)
Update zlut vector dimensions. Use when settings change.
unsigned int uint
Definition: std_incl.h:127
int zlut_radialsteps
Number of radial steps to sample on.
Definition: qtrk_c_api.h:198
QTrkComputedConfig cfg
The settings used by this instance of QueuedTracker.
std::vector< Device * > devices
Vector of device instances used.
std::vector< Stream * > streams
Vector of usable streams.

§ StreamUpdateZLUTSize()

void QueuedCUDATracker::StreamUpdateZLUTSize ( Stream s)
protected

Update zlut vector dimensions. Use when settings change.

Parameters
[in]sThe stream for which to update.

Definition at line 805 of file QueuedCUDATracker.cu.

806 {
807  cudaSetDevice(s->device->index);
808  s->d_zlutcmpscores.init(s->device->radial_zlut.h * batchSize);
809  // radialsteps never changes
810 }
int batchSize
Amount of images to be sent at once per stream.

§ threads()

dim3 QueuedCUDATracker::threads ( )
inlineprotected

Get the CUDA native datatype with the threadblock dimensions to use.

Use in conjunction with blocks as:

kernel <<< blocks(), threads() >>> (a, b);
Returns
CUDA 3D data for use in kernel calls.

Definition at line 399 of file QueuedCUDATracker.h.

399 { return dim3(numThreads); }
int numThreads
Number of threads to use in a general thread block.

Member Data Documentation

§ batchesDone

int QueuedCUDATracker::batchesDone

Number of fully completed batches.

Definition at line 487 of file QueuedCUDATracker.h.

§ batchSize

int QueuedCUDATracker::batchSize
protected

Amount of images to be sent at once per stream.

Higher batchsize = higher speeds. Reason why it's faster:

  1. More threads & blocks are created, allowing more efficient memory latency hiding by warp switching, or in other words, higher occupancy is achieved.
  2. Bigger batches are copied at a time, achieving higher effective PCIe bus bandwidth.

Definition at line 359 of file QueuedCUDATracker.h.

§ cpu_time

KernelProfileTime QueuedCUDATracker::cpu_time

Definition at line 486 of file QueuedCUDATracker.h.

§ deviceProp

cudaDeviceProp QueuedCUDATracker::deviceProp
protected

Variable used to save device properties obtained from the CUDA API.

Definition at line 418 of file QueuedCUDATracker.h.

§ deviceReport

std::string QueuedCUDATracker::deviceReport

String holding a human-readable description of used GPUs. Filled during InitializeDeviceList.

Definition at line 489 of file QueuedCUDATracker.h.

§ devices

std::vector<Device*> QueuedCUDATracker::devices
protected

Vector of device instances used.

Definition at line 407 of file QueuedCUDATracker.h.

§ gc_gain

std::vector<float> QueuedCUDATracker::gc_gain
protected

Vector with gains used for pixel correction.

Definition at line 412 of file QueuedCUDATracker.h.

§ gc_gainFactor

float QueuedCUDATracker::gc_gainFactor
protected

Factor by which to scale the gain.

Definition at line 410 of file QueuedCUDATracker.h.

§ gc_mutex

Threads::Mutex QueuedCUDATracker::gc_mutex
protected

Mutex for pixel calibration operations.

Definition at line 413 of file QueuedCUDATracker.h.

§ gc_offset

std::vector<float> QueuedCUDATracker::gc_offset
protected

Vector with offsets used for pixel correction.

Definition at line 411 of file QueuedCUDATracker.h.

§ gc_offsetFactor

float QueuedCUDATracker::gc_offsetFactor
protected

Factor by which to scale the pixel calibration offset.

Definition at line 409 of file QueuedCUDATracker.h.

§ jobQueueMutex

Threads::Mutex QueuedCUDATracker::jobQueueMutex
protected

Mutex for job queue accesses.

Definition at line 406 of file QueuedCUDATracker.h.

§ localizeMode

LocMode_t QueuedCUDATracker::localizeMode
protected

Flags for localization choices. See LocalizeModeEnum.

Definition at line 404 of file QueuedCUDATracker.h.

§ numThreads

int QueuedCUDATracker::numThreads
protected

Number of threads to use in a general thread block.

Used by the blocks and threads functions to quickly calculate parameterspace covering kernel execution dimensions.

Definition at line 365 of file QueuedCUDATracker.h.

§ qalign

QI QueuedCUDATracker::qalign
protected

Instance of QI used specifically for quadrant alignment.

Definition at line 417 of file QueuedCUDATracker.h.

§ qi

QI QueuedCUDATracker::qi
protected

The QI instance used to perform the 2D localization using the quadrant interpolation algorithm.

Definition at line 416 of file QueuedCUDATracker.h.

§ quitScheduler

Atomic<bool> QueuedCUDATracker::quitScheduler
protected

Thread shutdown flag with built-in thread safety (atomic).

Definition at line 421 of file QueuedCUDATracker.h.

§ resultCount

int QueuedCUDATracker::resultCount
protected

Number of results available.

Definition at line 403 of file QueuedCUDATracker.h.

§ resultMutex

Threads::Mutex QueuedCUDATracker::resultMutex
protected

Mutex for result memory accesses.

Definition at line 405 of file QueuedCUDATracker.h.

§ results

std::list<LocalizationResult> QueuedCUDATracker::results
protected

Vector of completed results.

Definition at line 402 of file QueuedCUDATracker.h.

§ schedulingThread

Threads::Handle* QueuedCUDATracker::schedulingThread
protected

Handle to the scheduling thread for later reference.

Definition at line 420 of file QueuedCUDATracker.h.

§ streams

std::vector<Stream*> QueuedCUDATracker::streams
protected

Vector of usable streams.

Definition at line 401 of file QueuedCUDATracker.h.

§ time

KernelProfileTime QueuedCUDATracker::time

Definition at line 486 of file QueuedCUDATracker.h.

§ useTextureCache

bool QueuedCUDATracker::useTextureCache
protected

Flag to use texture cache. Default is true. Disable using EnableTextureCache.

Definition at line 408 of file QueuedCUDATracker.h.

§ zlut_build_flags

uint QueuedCUDATracker::zlut_build_flags
protected

Flags for ZLUT building. Not actually used yet.

Definition at line 414 of file QueuedCUDATracker.h.


The documentation for this class was generated from the following files: