rocBLAS User Guide
Contents:
- 1. Getting Started Guide
- 2. Installation and Building for Linux
- 3. Installation and Building for Windows
- 4. API Reference Guide
- 5. Using rocBLAS API
- 5.1. rocBLAS Datatypes
- 5.2. rocBLAS Enumeration
- 5.2.1. rocblas_operation
- 5.2.2. rocblas_fill
- 5.2.3. rocblas_diagonal
- 5.2.4. rocblas_side
- 5.2.5. rocblas_status
rocblas_status
rocblas_status::rocblas_status_success
rocblas_status::rocblas_status_invalid_handle
rocblas_status::rocblas_status_not_implemented
rocblas_status::rocblas_status_invalid_pointer
rocblas_status::rocblas_status_invalid_size
rocblas_status::rocblas_status_memory_error
rocblas_status::rocblas_status_internal_error
rocblas_status::rocblas_status_perf_degraded
rocblas_status::rocblas_status_size_query_mismatch
rocblas_status::rocblas_status_size_increased
rocblas_status::rocblas_status_size_unchanged
rocblas_status::rocblas_status_invalid_value
rocblas_status::rocblas_status_continue
rocblas_status::rocblas_status_check_numerics_fail
- 5.2.6. rocblas_datatype
rocblas_datatype
rocblas_datatype::rocblas_datatype_f16_r
rocblas_datatype::rocblas_datatype_f32_r
rocblas_datatype::rocblas_datatype_f64_r
rocblas_datatype::rocblas_datatype_f16_c
rocblas_datatype::rocblas_datatype_f32_c
rocblas_datatype::rocblas_datatype_f64_c
rocblas_datatype::rocblas_datatype_i8_r
rocblas_datatype::rocblas_datatype_u8_r
rocblas_datatype::rocblas_datatype_i32_r
rocblas_datatype::rocblas_datatype_u32_r
rocblas_datatype::rocblas_datatype_i8_c
rocblas_datatype::rocblas_datatype_u8_c
rocblas_datatype::rocblas_datatype_i32_c
rocblas_datatype::rocblas_datatype_u32_c
rocblas_datatype::rocblas_datatype_bf16_r
rocblas_datatype::rocblas_datatype_bf16_c
rocblas_datatype::rocblas_datatype_invalid
- 5.2.7. rocblas_pointer_mode
- 5.2.8. rocblas_atomics_mode
- 5.2.9. rocblas_layer_mode
- 5.2.10. rocblas_gemm_algo
- 5.2.11. rocblas_gemm_flags
- 5.3. rocBLAS Helper functions
- 5.3.1. Auxiliary Functions
rocblas_create_handle()
rocblas_destroy_handle()
rocblas_set_stream()
rocblas_get_stream()
rocblas_set_pointer_mode()
rocblas_get_pointer_mode()
rocblas_set_atomics_mode()
rocblas_get_atomics_mode()
rocblas_query_int8_layout_flag()
rocblas_pointer_to_mode()
rocblas_set_vector()
rocblas_get_vector()
rocblas_set_matrix()
rocblas_get_matrix()
rocblas_set_vector_async()
rocblas_set_matrix_async()
rocblas_get_matrix_async()
rocblas_initialize()
rocblas_status_to_string()
- 5.3.2. Device Memory Allocation Functions
- 5.3.3. Build Information Functions
- 5.4. rocBLAS Level-1 functions
- 5.4.1. rocblas_iXamax + batched, strided_batched
- 5.4.2. rocblas_iXamin + batched, strided_batched
- 5.4.3. rocblas_Xasum + batched, strided_batched
- 5.4.4. rocblas_Xaxpy + batched, strided_batched
rocblas_saxpy()
rocblas_daxpy()
rocblas_haxpy()
rocblas_caxpy()
rocblas_zaxpy()
rocblas_saxpy_batched()
rocblas_daxpy_batched()
rocblas_haxpy_batched()
rocblas_caxpy_batched()
rocblas_zaxpy_batched()
rocblas_saxpy_strided_batched()
rocblas_daxpy_strided_batched()
rocblas_haxpy_strided_batched()
rocblas_caxpy_strided_batched()
rocblas_zaxpy_strided_batched()
- 5.4.5. rocblas_Xcopy + batched, strided_batched
- 5.4.6. rocblas_Xdot + batched, strided_batched
rocblas_sdot()
rocblas_ddot()
rocblas_hdot()
rocblas_bfdot()
rocblas_cdotu()
rocblas_cdotc()
rocblas_zdotu()
rocblas_zdotc()
rocblas_sdot_batched()
rocblas_ddot_batched()
rocblas_hdot_batched()
rocblas_bfdot_batched()
rocblas_cdotu_batched()
rocblas_cdotc_batched()
rocblas_zdotu_batched()
rocblas_zdotc_batched()
rocblas_sdot_strided_batched()
rocblas_ddot_strided_batched()
rocblas_hdot_strided_batched()
rocblas_bfdot_strided_batched()
rocblas_cdotu_strided_batched()
rocblas_cdotc_strided_batched()
rocblas_zdotu_strided_batched()
rocblas_zdotc_strided_batched()
- 5.4.7. rocblas_Xnrm2 + batched, strided_batched
- 5.4.8. rocblas_Xrot + batched, strided_batched
rocblas_srot()
rocblas_drot()
rocblas_crot()
rocblas_csrot()
rocblas_zrot()
rocblas_zdrot()
rocblas_srot_batched()
rocblas_drot_batched()
rocblas_crot_batched()
rocblas_csrot_batched()
rocblas_zrot_batched()
rocblas_zdrot_batched()
rocblas_srot_strided_batched()
rocblas_drot_strided_batched()
rocblas_crot_strided_batched()
rocblas_csrot_strided_batched()
rocblas_zrot_strided_batched()
rocblas_zdrot_strided_batched()
- 5.4.9. rocblas_Xrotg + batched, strided_batched
- 5.4.10. rocblas_Xrotm + batched, strided_batched
- 5.4.11. rocblas_Xrotmg + batched, strided_batched
- 5.4.12. rocblas_Xscal + batched, strided_batched
rocblas_sscal()
rocblas_dscal()
rocblas_cscal()
rocblas_zscal()
rocblas_csscal()
rocblas_zdscal()
rocblas_sscal_batched()
rocblas_dscal_batched()
rocblas_cscal_batched()
rocblas_zscal_batched()
rocblas_csscal_batched()
rocblas_zdscal_batched()
rocblas_sscal_strided_batched()
rocblas_dscal_strided_batched()
rocblas_cscal_strided_batched()
rocblas_zscal_strided_batched()
rocblas_csscal_strided_batched()
rocblas_zdscal_strided_batched()
- 5.4.13. rocblas_Xswap + batched, strided_batched
- 5.5. rocBLAS Level-2 functions
- 5.5.1. rocblas_Xgbmv + batched, strided_batched
- 5.5.2. rocblas_Xgemv + batched, strided_batched
- 5.5.3. rocblas_Xger + batched, strided_batched
rocblas_sger()
rocblas_dger()
rocblas_cgeru()
rocblas_zgeru()
rocblas_cgerc()
rocblas_zgerc()
rocblas_sger_batched()
rocblas_dger_batched()
rocblas_cgeru_batched()
rocblas_zgeru_batched()
rocblas_cgerc_batched()
rocblas_zgerc_batched()
rocblas_sger_strided_batched()
rocblas_dger_strided_batched()
rocblas_cgeru_strided_batched()
rocblas_zgeru_strided_batched()
rocblas_cgerc_strided_batched()
rocblas_zgerc_strided_batched()
- 5.5.4. rocblas_Xsbmv + batched, strided_batched
- 5.5.5. rocblas_Xspmv + batched, strided_batched
- 5.5.6. rocblas_Xspr + batched, strided_batched
- 5.5.7. rocblas_Xspr2 + batched, strided_batched
- 5.5.8. rocblas_Xsymv + batched, strided_batched
- 5.5.9. rocblas_Xsyr + batched, strided_batched
- 5.5.10. rocblas_Xsyr2 + batched, strided_batched
- 5.5.11. rocblas_Xtbmv + batched, strided_batched
- 5.5.12. rocblas_Xtbsv + batched, strided_batched
- 5.5.13. rocblas_Xtpmv + batched, strided_batched
- 5.5.14. rocblas_Xtpsv + batched, strided_batched
- 5.5.15. rocblas_Xtrmv + batched, strided_batched
- 5.5.16. rocblas_Xtrsv + batched, strided_batched
- 5.5.17. rocblas_Xhemv + batched, strided_batched
- 5.5.18. rocblas_Xhbmv + batched, strided_batched
- 5.5.19. rocblas_Xhpmv + batched, strided_batched
- 5.5.20. rocblas_Xher + batched, strided_batched
- 5.5.21. rocblas_Xher2 + batched, strided_batched
- 5.5.22. rocblas_Xhpr + batched, strided_batched
- 5.5.23. rocblas_Xhpr2 + batched, strided_batched
- 5.6. rocBLAS Level-3 functions
- 5.6.1. rocblas_Xgemm + batched, strided_batched
rocblas_sgemm()
rocblas_dgemm()
rocblas_hgemm()
rocblas_cgemm()
rocblas_zgemm()
rocblas_sgemm_batched()
rocblas_dgemm_batched()
rocblas_hgemm_batched()
rocblas_cgemm_batched()
rocblas_zgemm_batched()
rocblas_sgemm_strided_batched()
rocblas_dgemm_strided_batched()
rocblas_hgemm_strided_batched()
rocblas_cgemm_strided_batched()
rocblas_zgemm_strided_batched()
- 5.6.2. rocblas_Xsymm + batched, strided_batched
- 5.6.3. rocblas_Xsyrk + batched, strided_batched
- 5.6.4. rocblas_Xsyr2k + batched, strided_batched
- 5.6.5. rocblas_Xsyrkx + batched, strided_batched
- 5.6.6. rocblas_Xtrmm + batched, strided_batched
- 5.6.7. rocblas_Xtrsm + batched, strided_batched
- 5.6.8. rocblas_Xhemm + batched, strided_batched
- 5.6.9. rocblas_Xherk + batched, strided_batched
- 5.6.10. rocblas_Xher2k + batched, strided_batched
- 5.6.11. rocblas_Xherkx + batched, strided_batched
- 5.6.12. rocblas_Xtrtri + batched, strided_batched
- 5.7. rocBLAS Extension
- 5.7.1. rocblas_axpy_ex + batched, strided_batched
- 5.7.2. rocblas_dot_ex + batched, strided_batched
- 5.7.3. rocblas_dotc_ex + batched, strided_batched
- 5.7.4. rocblas_nrm2_ex + batched, strided_batched
- 5.7.5. rocblas_rot_ex + batched, strided_batched
- 5.7.6. rocblas_scal_ex + batched, strided_batched
- 5.7.7. rocblas_gemm_ex + batched, strided_batched
- 5.7.8. rocblas_gemm_ext2
- 5.7.9. rocblas_trsm_ex + batched, strided_batched
- 5.7.10. rocblas_Xgeam + batched, strided_batched
- 5.7.11. rocblas_Xdgmm + batched, strided_batched
- 5.8. rocBLAS Beta Features
- 5.9. Graph Support for rocBLAS
- 5.10. Device Memory Allocation in rocBLAS
- 5.10.1. Environment Variable for Preallocating
- 5.10.2. Functions for Manually Setting Memory Size
- 5.10.3. Function for Setting User Owned Workspace
- 5.10.4. Functions for Finding How Much Memory Is Required
- 5.10.5. rocBLAS Function Return Values for Insufficient Device Memory
- 5.10.6. Stream-Ordered Memory Allocation
- 5.11. Logging in rocBLAS
- 6. Programmer’s Guide
- 6.1. Library Source Code Organization
- 6.2. Handle, Stream, and Device Management
- 6.3. Device Memory Allocation
- 6.4. Thread Safe Logging
- 6.5. rocBLAS Numerical Checking
- 6.6. rocBLAS Order of Argument Checking and Logging
- 6.6.1. Legacy BLAS
- 6.6.2. rocBLAS
- 6.6.3. rocBLAS has the Following Differences When Compared To Legacy BLAS
- 6.6.4. To Accommodate the Additions
- 6.6.5. Device Memory Size Queries
- 6.6.6. rocBLAS Control Flow
- 6.6.7. Legacy L1 BLAS “single vector”
- 6.6.8. Legacy L1 BLAS “two vector”
- 6.6.9. Legacy L2 BLAS
- 6.6.10. Legacy L3 BLAS
- 6.7. rocBLAS Benchmarking and Testing
- 7. Contributor’s Guide
- 8. Acknowledgement
- 9. Disclaimer