GPU Stress Module

Memory, Compute & Atomic Operation Stress Testing

Overview

The GPU Stress module is designed to push your Graphics Processing Unit to its absolute limits by implementing three highly intensive algorithms: a brutal memory test, an extreme computational test, and a maximum-intensity atomic operations test. These tests are engineered to expose potential instabilities, probe thermal limits, and evaluate the raw performance of your GPU's subsystems.

Algorithms

Maximum Intensity Memory Test

This test employs extreme random access patterns across eight distinct data buffers to maximize cache thrashing and memory bandwidth utilization. It also relies on intensive shared memory operations and high register usage to stress the GPU's memory hierarchy and register file.

// EXTREME random access patterns - maximum cache thrashing
const unsigned int chaos1 = (tid * 1009 + i * 1013 + warp_id * 1019) % size;
const float val1 = data[chaos1];

// Intensive shared memory operations with maximum contention
shared_data[shared_idx] = fmaf(shared_data[shared_idx], val1 + s, val2 * s);
__syncthreads();
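
The fragment above is easier to follow inside a complete kernel. The sketch below shows one plausible shape for it; the kernel name, initialization, accumulator, and iteration scheme are illustrative assumptions rather than the module's actual source:

// Sketch: chaotic-access memory stress kernel (names and sizes are illustrative)
__global__ void memory_stress_sketch(float* __restrict__ data, unsigned int size, int iters)
{
    extern __shared__ float shared_data[];               // dynamic shared memory
    const unsigned int tid        = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int warp_id    = threadIdx.x / warpSize;
    const unsigned int shared_idx = threadIdx.x;

    shared_data[shared_idx] = data[tid % size];          // seed shared memory
    __syncthreads();

    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        // Prime multipliers scatter the index stream so accesses defeat caching
        const unsigned int chaos1 = (tid * 1009u + i * 1013u + warp_id * 1019u) % size;
        const float val1 = data[chaos1];

        // Shared-memory read-modify-write, with a barrier forcing contention
        shared_data[shared_idx] = fmaf(shared_data[shared_idx], val1, acc);
        __syncthreads();
        acc += shared_data[shared_idx];
    }
    data[tid % size] = acc;                              // keep the result observable
}

The prime multipliers (1009, 1013, 1019) keep successive indices from falling into a cache-friendly stride, which is what the "prime number spacing" feature listed further down refers to.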

Maximum Intensity Computational Test

This algorithm pushes the GPU's Streaming Multiprocessors (SMs) and Floating Point Units (FPUs) to their limits. It features maximum register usage and a cascade of complex mathematical operations, including transcendental, power, root, hyperbolic, and inverse trigonometric functions, creating ultra-complex dependency chains.

// LEVEL 1: Basic transcendental functions (maximum intensity)
r[i+1] = fmaf(sinf(r[i] * 0.1f), cosf(r[i+1] * 0.1f), ...);

// LEVEL 8: Ultra-complex dependency chains
float temp1 = fmaf(r[i], r[(i+32) % 128], r[(i+64) % 128]);
r[i+1] = fmaf(sinf(temp1), cosf(temp2),
              fmaf(expf(temp1 * 0.001f), logf(fabsf(temp2) + 1.0f), r[i]));
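
For context, a self-contained kernel built around the same idea might look like the sketch below. The 128-entry register array, the initialization values, and the single dependency-chain level shown are assumptions made for illustration; the module's actual kernel layers many more levels:

// Sketch: FMA/transcendental dependency-chain kernel (illustrative only)
__global__ void compute_stress_sketch(float* __restrict__ out, int iters)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Large per-thread register array drives register pressure
    float r[128];
    #pragma unroll
    for (int i = 0; i < 128; ++i)
        r[i] = (float)(tid + i) * 0.001f + 1.0f;

    for (int it = 0; it < iters; ++it) {
        #pragma unroll
        for (int i = 0; i < 127; ++i) {
            // Each result feeds the next FMA, forming a long dependency chain
            float t = fmaf(r[i], r[(i + 32) % 128], r[(i + 64) % 128]);
            r[i + 1] = fmaf(sinf(t), cosf(r[i + 1] * 0.1f),
                            fmaf(expf(t * 0.001f), logf(fabsf(r[i]) + 1.0f), r[i]));
        }
    }

    // Reduce into a single value so the compiler cannot discard the work
    float sum = 0.0f;
    for (int i = 0; i < 128; ++i) sum += r[i];
    out[tid] = sum;
}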

Maximum Intensity Atomic Test

This test focuses on stressing the GPU's atomic operation capabilities by generating maximum contention on a small set of memory locations that all threads target simultaneously. It performs a wide array of atomic operations (add, subtract, max, min, exchange, OR, AND, XOR, CAS, increment) across integer, float, unsigned long long, and double data types.

// MAXIMUM contention - all threads fight for same locations
atomicAdd(&counters[ultra_contested_int], 1);
atomicExch(&float_data[ultra_contested_float], (float)tid);
atomicOr(&ull_data[ultra_contested_ull], (unsigned long long)tid);

// INSANE contention - single location battles
atomicAdd(&counters[insane_location], 1);
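
A compact, self-contained sketch of this kind of kernel is shown below. The array names, the tiny index ranges that create the contention, and the particular mix of operations are illustrative assumptions (the target arrays are presumed to hold at least a few elements each):

// Sketch: high-contention atomic kernel (illustrative only)
__global__ void atomic_stress_sketch(int* __restrict__ counters,
                                     float* __restrict__ float_data,
                                     unsigned long long* __restrict__ ull_data,
                                     int iters)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = 0; i < iters; ++i) {
        // Tiny index ranges force every thread onto the same few cache lines
        const int hot_int   = (tid + i) % 4;
        const int hot_float = (tid + i) % 2;

        atomicAdd(&counters[hot_int], 1);
        atomicMax(&counters[hot_int], (int)tid);
        atomicExch(&float_data[hot_float], (float)tid);
        atomicOr(&ull_data[hot_int], (unsigned long long)tid);
        atomicCAS(&ull_data[hot_int], (unsigned long long)tid, (unsigned long long)i);

        // Single-location battle: every thread in the grid hits index 0
        atomicAdd(&counters[0], 1);
    }
}

Keeping the target arrays deliberately small is what produces the "single location battles" called out in the excerpt above.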

Hardware Stress Targets


Global Memory & Caches

Chaotic access patterns and large data sets overwhelm GPU caches and saturate memory bandwidth.

Streaming Multiprocessors (SMs) & FPUs

Extensive floating-point arithmetic and complex function calls push SMs and FPUs to their limits.


Atomic Units & L2 Cache

High-contention atomic operations severely stress the GPU's atomic units and the L2 cache's ability to handle concurrent writes.


Register File & Shared Memory

Maximum register usage per thread and intensive shared memory access patterns stress the GPU's on-chip memory resources.

Performance Characteristics

Computational Intensity

  • ~328 memory ops per thread per iteration
  • ~89k compute ops per thread per iteration
  • 40+ atomic ops per thread per iteration
  • 100% GPU utilization

Technical Implementation

Key Stress Features:

  • Multiple global memory buffers for maximal thrashing
  • Aggressive use of `fmaf` for combined operations
  • Prime number spacing for memory accesses to avoid caching patterns
  • Very high register pressure per thread
  • Diverse atomic operations on different data types
  • Explicit `__syncthreads()` for shared memory contention
  • Deliberately small atomic target arrays for extreme contention

HIP/CUDA Specifics:

  • Utilizes `__global__` kernels for direct GPU execution
  • `__restrict__` pointers for compiler optimization hints
  • `extern __shared__` for dynamic shared memory allocation
  • `hipMalloc` and `hipMemcpy` for host-device data transfer
  • Synchronization with `hipDeviceSynchronize()`
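
Putting those pieces together, a minimal host-side flow might look like the sketch below. The kernel body is a hypothetical stand-in for the module's kernels, and the buffer size, launch geometry, and iteration count are arbitrary illustrative values:

#include <hip/hip_runtime.h>
#include <vector>
#include <cstdio>

// Hypothetical stand-in kernel; the real module's kernels follow the patterns above
__global__ void stress_kernel(float* __restrict__ data, unsigned int size, int iters)
{
    extern __shared__ float shared_data[];
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    shared_data[threadIdx.x] = data[tid % size];
    __syncthreads();
    for (int i = 0; i < iters; ++i)
        data[tid % size] = fmaf(data[tid % size], 1.000001f, shared_data[threadIdx.x]);
}

int main()
{
    const unsigned int size  = 1u << 24;                  // illustrative buffer size (~16M floats)
    const size_t       bytes = size * sizeof(float);

    std::vector<float> host(size, 1.0f);
    float* device = nullptr;

    hipMalloc((void**)&device, bytes);                             // device allocation
    hipMemcpy(device, host.data(), bytes, hipMemcpyHostToDevice);  // host -> device

    const dim3   grid(1024), block(256);
    const size_t shared_bytes = block.x * sizeof(float);           // dynamic shared memory per block

    stress_kernel<<<grid, block, shared_bytes>>>(device, size, 1000);
    hipDeviceSynchronize();                                        // wait for the kernel to finish

    hipMemcpy(host.data(), device, bytes, hipMemcpyDeviceToHost);  // device -> host
    hipFree(device);
    std::printf("first element after stress: %f\n", host[0]);
    return 0;
}

The third launch parameter (shared_bytes) is what sizes the `extern __shared__` array declared inside the kernel.
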
⚠️ EXTREME PERFORMANCE WARNING

This module is designed to consume 100% of your GPU's resources. Extended execution *will* produce very high temperatures and power draw, and can lead to system instability or even hardware damage. Ensure your cooling solution is robust and monitor temperatures diligently during testing. Use at your own risk!
