CUTLASS supports various memory layouts for tensors. The layout determines how multi-dimensional tensors are stored in linear memory.
LayoutType Enum
The cutlass.LayoutType enum defines the available memory layouts, grouped below into basic, interleaved, and tensor layouts.
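To make the grouping concrete, the following is a minimal stand-in for the enum, listing only the members named on this page. It is not the definition from cutlass_library: the member values produced by `enum.auto()` and the helper `layout_family` are assumptions made for illustration.

```python
import enum


class LayoutType(enum.Enum):
    """Illustrative stand-in; the real enum lives in cutlass_library."""

    ColumnMajor = enum.auto()
    RowMajor = enum.auto()
    ColumnMajorInterleaved2 = enum.auto()
    RowMajorInterleaved2 = enum.auto()
    ColumnMajorInterleaved32 = enum.auto()
    RowMajorInterleaved32 = enum.auto()
    ColumnMajorInterleaved64 = enum.auto()
    RowMajorInterleaved64 = enum.auto()
    TensorNWC = enum.auto()
    TensorNHWC = enum.auto()
    TensorNDHWC = enum.auto()
    TensorNCHW = enum.auto()


def layout_family(layout: LayoutType) -> str:
    """Hypothetical helper: classify a member by the sections of this page."""
    if layout.name.startswith("Tensor"):
        return "tensor"
    if "Interleaved" in layout.name:
        return "interleaved"
    return "basic"


print(layout_family(LayoutType.RowMajor))                  # basic
print(layout_family(LayoutType.ColumnMajorInterleaved32))  # interleaved
print(layout_family(LayoutType.TensorNHWC))                # tensor
```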
Basic Layouts
LayoutType.RowMajor
Row-major layout (C/C++ convention). Consecutive elements in a row are stored contiguously in memory. For an M x N matrix, element (i, j) is stored at linear offset i * N + j.
Use case: Standard for C/C++ applications and most PyTorch operations.
LayoutType.ColumnMajor
Column-major layout (Fortran/BLAS convention). Consecutive elements in a column are stored contiguously in memory. For an M x N matrix, element (i, j) is stored at linear offset j * M + i.
Use case: Interoperability with BLAS libraries, Fortran code, and column-major frameworks.
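The two offset formulas, i * N + j and j * M + i, can be checked with a small plain-Python sketch. The function names are hypothetical and the matrices are assumed to be densely packed (no padding between rows or columns).

```python
def row_major_offset(i: int, j: int, num_cols: int) -> int:
    """Linear offset of element (i, j) in a packed row-major matrix."""
    return i * num_cols + j


def column_major_offset(i: int, j: int, num_rows: int) -> int:
    """Linear offset of element (i, j) in a packed column-major matrix."""
    return j * num_rows + i


M, N = 2, 3
logical = [[10, 11, 12],
           [20, 21, 22]]

# Store the same logical matrix in both layouts.
row_major = [None] * (M * N)
col_major = [None] * (M * N)
for i in range(M):
    for j in range(N):
        row_major[row_major_offset(i, j, N)] = logical[i][j]
        col_major[column_major_offset(i, j, M)] = logical[i][j]

print(row_major)  # [10, 11, 12, 20, 21, 22]
print(col_major)  # [10, 20, 11, 21, 12, 22]
```

The logical matrix is identical in both cases; only the order of elements in linear memory differs.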
Interleaved Layouts
Interleaved layouts pack multiple elements together for improved memory access patterns with certain data types.
LayoutType.ColumnMajorInterleaved2
Column-major with 2-way interleaving.
LayoutType.RowMajorInterleaved2
Row-major with 2-way interleaving.
LayoutType.ColumnMajorInterleaved32
Column-major with 32-way interleaving.
LayoutType.RowMajorInterleaved32
Row-major with 32-way interleaving.
LayoutType.ColumnMajorInterleaved64
Column-major with 64-way interleaving.
LayoutType.RowMajorInterleaved64
Row-major with 64-way interleaving.
Tensor Layouts
Tensor layouts are used for convolution operations and multi-dimensional tensors.
LayoutType.TensorNHWC
Tensor layout with dimensions ordered as (N, H, W, C), commonly used in computer vision.
- N: Batch size
- H: Height
- W: Width
- C: Channels
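For a densely packed tensor, the (N, H, W, C) ordering means the channel index varies fastest in memory. The sketch below computes the linear offset under that assumption and contrasts it with the (N, C, H, W) ordering described next. The function names are hypothetical, and padded or strided tensors would need explicit strides instead.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Offset in a packed (N, H, W, C) tensor: C is the fastest-varying index."""
    return ((n * H + h) * W + w) * C + c


def nchw_offset(n, c, h, w, C, H, W):
    """Offset in a packed (N, C, H, W) tensor: W is the fastest-varying index."""
    return ((n * C + c) * H + h) * W + w


H, W, C = 2, 2, 3

# The three channels of pixel (h=0, w=0) in image n=0:
print([nhwc_offset(0, 0, 0, c, H, W, C) for c in range(C)])  # [0, 1, 2]
print([nchw_offset(0, c, 0, 0, C, H, W) for c in range(C)])  # [0, 4, 8]
```

In NHWC the channels of one pixel are adjacent, while in NCHW they are separated by a full H * W plane.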
LayoutType.TensorNCHW
Tensor layout with dimensions ordered as (N, C, H, W).
LayoutType.TensorNDHWC
5D tensor layout for 3D convolutions: (N, D, H, W, C).
LayoutType.TensorNWC
3D tensor layout: (N, W, C).
Layout Selection Guide
Performance Considerations
| Scenario | Recommended Layout | Reason |
|---|---|---|
| PyTorch matrices | RowMajor | Default PyTorch layout |
| NumPy matrices | RowMajor | Default NumPy layout |
| BLAS/LAPACK interop | ColumnMajor | BLAS convention |
| INT8 GEMM | ColumnMajorInterleaved32 | Optimized Tensor Core access |
| CNN inputs (TF) | TensorNHWC | TensorFlow default |
| CNN inputs (PyTorch) | TensorNCHW | PyTorch default |
Alignment Requirements
Some layouts require specific alignment:
- Interleaved layouts require dimensions divisible by the interleaving factor
- Tensor Core operations may require 8-byte or 16-byte alignment
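The divisibility requirement for interleaved layouts follows from how elements are grouped. The sketch below models a k-way column-major interleaved mapping in which k adjacent columns of a row are stored together. It is patterned on the C++ layout in cutlass/include/cutlass/layout/; treat the exact formula, the choice of which dimension must be divisible, and the function names as assumptions to verify against the header.

```python
def column_major_interleaved_offset(row, col, num_rows, interleave):
    """Offset of (row, col) when `interleave` adjacent columns are packed together."""
    col_group, col_minor = divmod(col, interleave)
    # Each group of `interleave` columns occupies num_rows * interleave elements.
    return col_group * (num_rows * interleave) + row * interleave + col_minor


def check_interleaved_extent(num_cols, interleave):
    """Raise if the interleaved dimension is not a multiple of the factor."""
    if num_cols % interleave != 0:
        raise ValueError(
            f"extent {num_cols} is not divisible by interleave factor {interleave}"
        )


M, N, K_INTERLEAVE = 2, 4, 2
check_interleaved_extent(N, K_INTERLEAVE)

offsets = {
    (i, j): column_major_interleaved_offset(i, j, M, K_INTERLEAVE)
    for i in range(M)
    for j in range(N)
}
print(offsets[(0, 0)], offsets[(0, 1)], offsets[(1, 0)], offsets[(1, 1)])  # 0 1 2 3
print(offsets[(0, 2)], offsets[(0, 3)], offsets[(1, 2)], offsets[(1, 3)])  # 4 5 6 7
```

When the extent is a multiple of the factor, every element maps to a distinct offset in a packed buffer; otherwise the last group would be only partially filled.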
Layout Conversion
PyTorch Transpose
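In PyTorch, transposing a contiguous 2D tensor with .t() returns a view with swapped strides, so the same memory reads as the column-major form of the transposed matrix. Calling .contiguous() on that view materializes a row-major copy.
NumPy Transpose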
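In NumPy the same distinction shows up in the array strides. The sketch below uses only standard NumPy calls (`.T`, `np.asfortranarray`, `np.ascontiguousarray`) to show a transpose view versus an actual change of storage order; the variable names are illustrative.

```python
import numpy as np

# NumPy arrays are row-major (C order) by default.
a = np.arange(6, dtype=np.float32).reshape(2, 3)

# .T is a view: no data moves, only shape and strides are swapped.
a_t = a.T

# asfortranarray copies the data into column-major (Fortran) order
# while keeping the logical shape (2, 3).
a_col = np.asfortranarray(a)

# ascontiguousarray converts back to row-major storage.
a_row = np.ascontiguousarray(a_col)

print(a.strides)      # (12, 4)
print(a_t.strides)    # (4, 12)
print(a_col.strides)  # (4, 8)
print(a_col.flags.f_contiguous, a_row.flags.c_contiguous)  # True True
```

Note that `a_col` and `a` compare equal element by element; only their memory order differs, which is what the layout type passed to a kernel has to describe.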
Layout in GEMM Operations
Example: Row-Major GEMM
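As a library-independent illustration of what row-major operands mean for a GEMM, the sketch below computes D = A @ B over flat buffers in which every matrix is stored row-major. It is a plain-Python reference, not the CUTLASS Python API, and the function name `gemm_row_major` is hypothetical.

```python
def gemm_row_major(a, b, m, n, k):
    """Return D = A @ B as a flat row-major list.

    A is m x k, B is k x n, and both are packed row-major.
    """
    d = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Row-major indexing: element (r, c) sits at r * num_cols + c.
                acc += a[i * k + p] * b[p * n + j]
            d[i * n + j] = acc
    return d


a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]   # 2 x 3
b = [1.0, 0.0,
     0.0, 1.0,
     1.0, 1.0]        # 3 x 2

print(gemm_row_major(a, b, m=2, n=2, k=3))  # [4.0, 5.0, 10.0, 11.0]
```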
Example: Mixed Layouts
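When operands use different layouts, only the address computation changes; the arithmetic is the same. The following plain-Python sketch reads a row-major A and a column-major B. The string layout names stand in for the enum members, and `element_offset` and `gemm_mixed` are hypothetical helpers rather than CUTLASS functions.

```python
def element_offset(layout, i, j, rows, cols):
    """Linear offset of element (i, j) in a packed rows x cols matrix."""
    if layout == "RowMajor":
        return i * cols + j
    if layout == "ColumnMajor":
        return j * rows + i
    raise ValueError(f"unsupported layout: {layout}")


def gemm_mixed(a, layout_a, b, layout_b, m, n, k, layout_d="RowMajor"):
    """Return D = A @ B, with each operand addressed through its own layout."""
    d = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += (a[element_offset(layout_a, i, p, m, k)]
                        * b[element_offset(layout_b, p, j, k, n)])
            d[element_offset(layout_d, i, j, m, n)] = acc
    return d


a_row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # A = [[1, 2, 3], [4, 5, 6]], row-major
b_col = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]  # B = [[1, 0], [0, 1], [1, 1]], column-major

print(gemm_mixed(a_row, "RowMajor", b_col, "ColumnMajor", 2, 2, 3))
# [4.0, 5.0, 10.0, 11.0]
```

Passing the wrong layout for a buffer would not raise an error here; it would silently compute with a different logical matrix, which is why the declared layout must match how the data was actually stored.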
Layout Naming Convention
CUTLASS uses shorthand notation in kernel names:

| Layout | Shorthand | Example |
|---|---|---|
| ColumnMajor | n | cutlass_gemm_n |
| RowMajor | t | cutlass_gemm_t |
| ColumnMajorInterleaved32 | n32 | cutlass_gemm_n32 |
| RowMajorInterleaved32 | t32 | cutlass_gemm_t32 |
| TensorNHWC | nhwc | cutlass_conv_nhwc |
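The table can be expressed as a small lookup. The dictionary below contains only the entries listed above. The helper `gemm_layout_suffix`, which concatenates one shorthand per operand (giving suffixes such as `tn`), is a hypothetical illustration; the exact kernel-name format is an assumption here.

```python
# Shorthand codes taken from the table above.
SHORT_LAYOUT_NAMES = {
    "ColumnMajor": "n",
    "RowMajor": "t",
    "ColumnMajorInterleaved32": "n32",
    "RowMajorInterleaved32": "t32",
    "TensorNHWC": "nhwc",
}


def gemm_layout_suffix(layout_a: str, layout_b: str) -> str:
    """Hypothetical helper: join the shorthand for the A and B operands."""
    return SHORT_LAYOUT_NAMES[layout_a] + SHORT_LAYOUT_NAMES[layout_b]


print(gemm_layout_suffix("RowMajor", "ColumnMajor"))  # tn
print(gemm_layout_suffix("ColumnMajor", "RowMajor"))  # nt
```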
C++ Mapping
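Python layout types map directly to C++ CUTLASS layout types in the cutlass::layout namespace. For example, LayoutType.RowMajor corresponds to cutlass::layout::RowMajor, LayoutType.ColumnMajorInterleaved32 to cutlass::layout::ColumnMajorInterleaved<32>, and LayoutType.TensorNHWC to cutlass::layout::TensorNHWC.
Source Code References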
- Python enum: cutlass/python/cutlass_library/library.py:401-421
- C++ layout tags: cutlass/python/cutlass_library/library.py:424-445
- Layout implementation: cutlass/include/cutlass/layout/