Updated CUDA basics (markdown)

Jane (Yuan) Xu
2025-08-22 23:08:56 -04:00
parent 28477f8138
commit d34bed95e2

@@ -30,9 +30,9 @@ TensorIterator kernels
* you can use `torch.cuda.set_sync_debug_mode` to warn or error out on CUDA synchronizations, which is useful when you are trying to understand where synchronizations in your workload come from, or whether your operations are synchronizing accidentally (see the sketch after this list)
* Use the PyTorch built-in profiler (Kineto) or Nsight Systems (nsys) to get information on GPU utilization and the most time-consuming kernels (see the profiler sketch after this list)
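
A minimal sketch of the sync-debug setting mentioned above; the tensor shape and the `.item()` call are just illustrative ways to trigger an implicit device-to-host synchronization.

```python
import torch

# "warn" prints a warning on every implicit CUDA synchronization;
# "error" raises instead; "default" turns the checking back off.
torch.cuda.set_sync_debug_mode("warn")

x = torch.randn(1_000_000, device="cuda")
total = x.sum().item()  # .item() copies to host and forces a sync -> warning

torch.cuda.set_sync_debug_mode("default")
```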
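
And a minimal sketch of collecting a kernel-level summary with the built-in (Kineto-based) profiler; the matmul loop is a placeholder workload.

```python
import torch
from torch.profiler import profile, ProfilerActivity

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Record both CPU-side ops and the CUDA kernels they launch.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        c = a @ b
    torch.cuda.synchronize()  # ensure kernels finish inside the profiled region

# Sort by total GPU time to find the most expensive kernels.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

The same script can instead be run under `nsys profile python script.py` to get a full timeline view in Nsight Systems.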
-See https://www.dropbox.com/s/3350s3qfy8rpm5a/CUDA_presentation.key?dl=0 for high level overview of CUDA programming and useful links
+See the linked slides in https://www.nvidia.com/en-us/on-demand/session/gtc24-s62191/ for a high level overview of CUDA programming.
-Goto N1023015 to do TensorIterator cuda perf lab
+Go to N1023015 to do TensorIterator cuda perf lab.
## Next
Unit 7: Data (Optional) - [[Data Basics|Data-Basics]]