Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-20 21:14:14 +08:00
Add a stable TORCH_LIBRARY to C shim (#148124)
This PR adds two main parts:
- shim.h: stable C APIs into the torch::Library APIs
- a higher-level API in torch/csrc/stable/library.h that calls into this shim.h and is otherwise self-contained

Goal: custom kernel writers should be able to call the APIs in the directories above to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with.

Subplots resolved:
- Do we want a whole separate StableLibrary, or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t))` into it? Yes, we want a separate StableLibrary. We cannot freeze Library, and it is NOT header-only.
- Should I use uint64_t as the common denominator instead of void* to better support 32-bit architectures? Yes, and done.
- Should I add a stable `def` and `fragment` when those can be done in Python? I think we do want these, and now they're done.
- Where should library_stable_impl.cpp live? No longer relevant.
- I need some solid test cases to make sure everything's going OK. I've intentionally thrown a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc. Have since tested all the torch library endpoints; the others can be tested in a follow-up, to separate components that need to be in shim.h from those that can be added later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/atalman
Committed by: PyTorch MergeBot
Parent: 4d10da731b
Commit: 971606befa
```diff
@@ -227,6 +227,49 @@ class TestCppExtensionAOT(common.TestCase):
         if return_code != 0:
             return return_code
 
+    @unittest.skipIf(not TEST_CUDA, "some aspects of this test require CUDA")
+    def test_libtorch_agnostic(self):
+        import libtorch_agnostic
+
+        # (1) first test that SGD CPU kernel works
+        param = torch.rand(5, device="cpu")
+        grad = torch.rand_like(param)
+        weight_decay = 0.01
+        lr = 0.001
+        maximize = False
+
+        new_param = libtorch_agnostic.ops.sgd_out_of_place(
+            param, grad, weight_decay, lr, maximize
+        )
+        torch._fused_sgd_(
+            (param,),
+            (grad,),
+            (),
+            weight_decay=weight_decay,
+            momentum=0.0,
+            lr=lr,
+            dampening=0.0,
+            nesterov=False,
+            maximize=maximize,
+            is_first_step=False,
+        )
+        self.assertEqual(new_param, param)
+
+        # (2) then test that we don't hog unnecessary memory
+        def _run_identity(prior_mem, device):
+            t = torch.rand(32, 32, device=device)
+            self.assertGreater(torch.cuda.memory_allocated(device), prior_mem)
+            identi_t = libtorch_agnostic.ops.identity(t)
+            assert identi_t is t
+
+        device = torch.cuda.current_device()
+        init_mem = torch.cuda.memory_allocated(device)
+
+        for _ in range(3):
+            _run_identity(init_mem, device)
+            curr_mem = torch.cuda.memory_allocated(device)
+            self.assertEqual(curr_mem, init_mem)
+
+
 @torch.testing._internal.common_utils.markDynamoStrictTest
 class TestPybindTypeCasters(common.TestCase):
 
```