NCU
- NCU doc
ERR_NVGPUCTRPERM: NCU requires permission to access the GPU performance counters on the target device. Solution: create a file with the .conf extension in /etc/modprobe.d containing the line
options nvidia NVreg_RestrictProfilingToAdminUsers=0
then reboot (or reload the nvidia kernel module) so that the option takes effect.
Example:
ncu -k <kernel name> -s <skip match count> -c <# calls to profile> --kill yes --set full -o <output file> -f <program>
NSYS
- NSYS doc
- Post on how to profile PyTorch applications with NSYS. With NVTX ranges, the programmer can manually mark sections of the Python code so that they appear as named ranges in the NSYS profile results.
Example:
nsys profile -t cuda,cublas,cudnn,nvtx -c cudaProfilerApi -o <output name> -f true <program>
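The `nvtx` trace option and the `-c cudaProfilerApi` capture range in the command above both rely on calls made from the Python side. Below is a minimal sketch of what that could look like, assuming PyTorch with CUDA is installed. The names `nvtx_range`, `run_profiled`, `step_fn` and `warmup_steps` are made up for illustration; only `torch.cuda.nvtx.range_push`/`range_pop`, `torch.cuda.profiler.start`/`stop` and `torch.cuda.synchronize` are PyTorch calls.

```python
from contextlib import contextmanager


@contextmanager
def nvtx_range(name, nvtx=None):
    """Wrap a block in an NVTX range so it shows up as a named span in NSYS.

    `nvtx` is a hypothetical injection point; by default torch.cuda.nvtx is used.
    """
    if nvtx is None:
        # Imported lazily so the helper can be loaded without PyTorch present.
        import torch
        nvtx = torch.cuda.nvtx
    nvtx.range_push(name)
    try:
        yield
    finally:
        # Always close the range, even if the wrapped code raises.
        nvtx.range_pop()


def run_profiled(step_fn, num_steps, warmup_steps=3):
    """Hypothetical driver: skip warm-up steps, then capture the rest.

    With `nsys profile -c cudaProfilerApi`, collection begins at
    torch.cuda.profiler.start() and ends at torch.cuda.profiler.stop().
    """
    import torch

    capturing = False
    for step in range(num_steps):
        if step == warmup_steps:
            torch.cuda.profiler.start()
            capturing = True
        with nvtx_range(f"step_{step}"):
            step_fn(step)
    if capturing:
        # Wait for outstanding GPU work so it is included in the capture.
        torch.cuda.synchronize()
        torch.cuda.profiler.stop()
```

In a training script, `step_fn` would run one iteration, and nested `with nvtx_range("forward"):` / `with nvtx_range("backward"):` blocks inside it would appear as nested ranges on the NSYS timeline.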
Common
- NCU and NSYS are backward compatible (they can read profiles produced by older versions) but not necessarily forward compatible: a profile produced by a later version may not be readable by an older version of NCU/NSYS.
- NCU and NSYS are somewhat bound to NVIDIA driver versions. If the driver version is newer than the maximum driver version supported by the NSYS in use, seemingly random errors may occur. For example, I got an
Unable to read device UUID
error on a system that had both CUDA 12.3 and CUDA 11.7 installed. The installed driver was the one matching CUDA 12.3, but I ran the NSYS shipped with CUDA 11.7 and got this error only for _some specific applications_, which was very confusing. Nvidia Forum Post
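When several CUDA toolkits are installed side by side, it helps to see which `nsys`/`ncu` binaries are actually on the `PATH` and which driver is loaded. The following is a rough sketch, assuming a Linux system where `nvidia-smi` is available. The helper names (`find_on_path`, `tool_output`, `report`) are made up for illustration, and the script only prints versions; it does not know which driver versions a given NSYS release supports.

```python
import os
import subprocess


def find_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in hits:
                hits.append(candidate)
    return hits


def tool_output(cmd):
    """Run `cmd` and return its stripped stdout, or None if it cannot be run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def report():
    """Print the loaded driver version and the version of each nsys/ncu found."""
    driver = tool_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    )
    print("driver version:", driver)
    for tool in ("nsys", "ncu"):
        for exe in find_on_path(tool):
            print(exe)
            print(tool_output([exe, "--version"]))
```

Calling `report()` lists the binaries in the order the shell would find them, so the first `nsys` printed is the one a bare `nsys profile ...` would run.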