This fixes 4 main issues:
- The way the CUDA sanitizer handles its state is awkward: because the lifetime of the Mode is linked to the submodule, it might outlive the Python runtime and other loaded modules. On my current version it even outlives the "sys" module. Since I'm not sure what the impact of changing this lifetime handling would be, I'm making the exit handler a no-op when Python is already dying, as there is no point cleaning up at that stage (see the sketch after this list).
- Adds a "disable" method so that the mode can be turned off again in tests after it has been enabled.
- Fix `Tensor.as_subclass()` to properly disable modes when creating the new Tensor object, just like we already do in `make_subclass` and `make_wrapper_subclass`. The change here just applies the exact same treatment to it.
- ~Fix `Tensor.as_subclass()` not to propagate autograd, as there is no valid backward associated here.~ We have tests that check for this behavior, so I guess it is not an obvious bugfix but rather expected behavior. Reverted that change.
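Below is a minimal sketch of the shutdown guard described in the first bullet. The class and method names are hypothetical and only illustrate the idea of skipping cleanup once the interpreter is finalizing; this is not the actual sanitizer code:
```
import sys


class _ModeHolder:
    """Illustrative stand-in for an object whose lifetime is tied to a module,
    so its destructor may run while the interpreter is shutting down."""

    # Capture the check up front: by the time __del__ runs during shutdown,
    # even the `sys` module may already be partially torn down.
    _is_finalizing = staticmethod(sys.is_finalizing)

    def __del__(self):
        if self._is_finalizing():
            # Python is already dying: other modules may be gone, so there is
            # no point (and no safe way) to clean up. Make this a no-op.
            return
        self._cleanup()

    def _cleanup(self):
        # Placeholder for the real teardown (e.g. exiting the dispatch mode).
        pass
```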
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138218
Approved by: https://github.com/ngimel
Small rework of how the error message is formatted; introduces a distinction between the arguments and the outputs of kernels. Verified manually on multiple examples that the message is printed as expected.
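A minimal sketch of the argument/output split described above; the function and parameter names are illustrative only, not the sanitizer's actual formatting code:
```
def format_access(kernel_schema, args_written, outputs_written):
    # Render one access report, listing written arguments and kernel outputs
    # on separate lines instead of mixing them together.
    lines = [f"during kernel: {kernel_schema}"]
    if args_written:
        lines.append("writing to arguments: " + ", ".join(args_written))
    if outputs_written:
        lines.append("writing to outputs: " + ", ".join(outputs_written))
    return "\n".join(lines)


print(format_access(
    "aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)",
    args_written=["self", "out"],
    outputs_written=["output"],
))
```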
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85008
Approved by: https://github.com/lw
Example of a simple synchronization error:
```
import torch

a = torch.rand(4, 2, device="cuda")  # allocated on the default stream
second_stream = torch.cuda.Stream()
with torch.cuda.stream(second_stream):
    torch.mul(a, 5, out=a)  # unsynchronized write to `a`: the race CSAN reports
```
Output produced by CSAN:
```
============================
CSAN detected a possible data race on tensor with data pointer 139719969079296
Access by stream 94646435460352 during kernel:
aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
writing to argument: self, out, output
With stack trace:
File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
stack_trace = traceback.StackSummary.extract(
File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 544, in __torch_dispatch__
errors = self.event_handler._handle_kernel_launch(
File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped
return f(self, *args, **kwargs)
File "/private/home/sypniewski/pytorch/tester.py", line 9, in <module>
torch.mul(a, 5, out=a)
Previous access by stream 0 during kernel:
aten::rand(int[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
writing to argument: output
With stack trace:
File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
stack_trace = traceback.StackSummary.extract(
File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 544, in __torch_dispatch__
errors = self.event_handler._handle_kernel_launch(
File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped
return f(self, *args, **kwargs)
File "/private/home/sypniewski/pytorch/tester.py", line 6, in <module>
a = torch.rand(10000, device="cuda")
Tensor was allocated with stack trace:
File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 420, in _handle_memory_allocation
traceback.StackSummary.extract(
File "/private/home/sypniewski/pytorch/torch/utils/_cuda_trace.py", line 23, in fire_callbacks
cb(*args, **kwargs)
File "/private/home/sypniewski/pytorch/torch/_ops.py", line 60, in __call__
return self._op(*args, **kwargs or {})
File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 541, in __torch_dispatch__
outputs = func(*args, **kwargs)
File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped
return f(self, *args, **kwargs)
File "/private/home/sypniewski/pytorch/tester.py", line 6, in <module>
a = torch.rand(10000, device="cuda")
```
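For completeness, a hedged sketch of how this particular race would be avoided (the sanitizer only reports the problem; `Stream.wait_stream` is the usual PyTorch way to order the two streams, and CSAN itself is assumed to be enabled separately, e.g. via `TORCH_CUDA_SANITIZER=1` in builds that support it):
```
import torch

a = torch.rand(4, 2, device="cuda")
second_stream = torch.cuda.Stream()
# Make second_stream wait for the work already queued on the default stream
# (including the rand that produced `a`) before issuing the in-place mul.
second_stream.wait_stream(torch.cuda.default_stream())
with torch.cuda.stream(second_stream):
    torch.mul(a, 5, out=a)
```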
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83984
Approved by: https://github.com/ezyang