Not only is this change usually shorter and more readable, it can also yield better performance. `size()` is not guaranteed to be a constant-time operation on every container (linked lists being the classic example), but `empty()` always is.
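For illustration, a minimal before/after in the spirit of this change (sketch only, not code from the PR):
```cpp
#include <list>

// Illustrative only: empty() is always O(1) and states the intent directly.
void drain(std::list<int>& queue) {
  // Before: while (queue.size() != 0) { ... }
  while (!queue.empty()) {
    queue.pop_front();
  }
}
```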
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
This PR is the first step towards refactoring the build for nvfuser so that the codegen becomes a standalone library.
Contents of this PR:
1. The nvfuser code base has been moved from `./torch/csrc/jit/codegen/cuda/` to `./nvfuser`, except for the registration code for integration (interface.h/interface.cpp)
2. The build system is split so that nvfuser generates its own `.so` files. Currently these are:
- `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser
- `nvfuser.so`, which is nvfuser's Python API via pybind. The Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser`
3. nvfuser's C++ tests are now compiled into `nvfuser_tests`
4. CMake is refactored so that:
- nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`.
- nvfuser backend code is no longer compiled into `libtorch_cuda_xxx`
- nvfuser is added as a subdirectory in the top-level `./CMakeLists.txt`, at the very end, after torch is built.
- since nvfuser depends on torch, nvfuser is registered at runtime via dlopen (`at::DynamicLibrary`); a sketch of the loading pattern follows this list. This avoids a circular dependency in CMake, which would be a nightmare to handle. For details, see `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary`
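A minimal sketch of the dlopen-based loading pattern (the entry-point symbol here is an illustrative assumption, not the PR's exact code):
```cpp
#include <ATen/DynamicLibrary.h>

// Opening the library runs its static initializers, which is where the
// nvfuser backend can register itself with torch at runtime.
void loadNvfuserLibrary() {
  static at::DynamicLibrary nvfuserLib("libnvfuser_codegen.so");
  // An explicit symbol lookup, if one were needed, would look like:
  //   void* entry = nvfuserLib.sym("initNvfuserEntryPoint");  // hypothetical
}
```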
Future work scoped for follow-up PRs:
- The nvfuser codegen currently depends on torch; we need to refactor that dependency out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet
- Since we moved nvfuser into a CMake build, we effectively disabled the Bazel build for nvfuser. This could impact internal workloads at Meta, so we need to restore that support. cc'ing @vors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621
Approved by: https://github.com/davidberard98
Replace C++ string comparisons with the more efficient equality operators. The equality operators are not just more readable; they also allow short-circuiting (e.g., on a length mismatch) for faster string equality checks.
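A minimal illustration of the pattern (sketch only):
```cpp
#include <string>

// Illustrative only: operator== can bail out early (e.g., on a length
// mismatch), while compare() performs a full lexicographic comparison.
bool isAddOp(const std::string& op) {
  // Before: return op.compare("aten::add") == 0;
  return op == "aten::add";
}
```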
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92765
Approved by: https://github.com/ezyang
We have known for a while that we should in principle support SymBool as a separate concept from SymInt and SymFloat (in particular, every distinct numeric type should get its own API). However, recent work with unbacked SymInts in, e.g., https://github.com/pytorch/pytorch/pull/90985 has made this a priority to implement. The essential problem is that our logic for computing the contiguity of tensors branches on the passed-in input sizes, and this causes us to require guards when constructing tensors from unbacked SymInts. Morally, this should not be a big deal because we only really care about the regular (non-channels-last) contiguity of the tensor, which should be guaranteed since most people aren't calling `empty_strided` on the tensor. However, because we store a bool (not a SymBool, which prior to this PR didn't exist) on TensorImpl, we are forced to *immediately* compute these values, even if the value ends up not being used at all. In particular, even when a user allocates a contiguous tensor, we still must compute channels-last contiguity (as some contiguous tensors are also channels-last contiguous, but others are not.)
This PR implements SymBool, and makes TensorImpl use SymBool to store the contiguity information in ExtraMeta. There are a number of knock on effects, which I now discuss below.
* I introduce a new C++ type SymBool, analogous to SymInt and SymFloat. This type supports logical and, logical or, and logical negation. I support the bitwise operations on this class (but not the conventional logic operators) to make it clear that logical operations on SymBool are NOT short-circuiting; see the sketch after this list. I also, for now, do NOT support implicit conversion of SymBool to bool (which would create a guard). This does not matter too much in practice, as in this PR I did not modify the equality operations (e.g., `==` on SymInt) to return SymBool, so all preexisting implicit guards did not need to be changed. I also introduced symbolic comparison functions `sym_eq`, etc. on SymInt to make it possible to create a SymBool. The current implementation of the comparison functions makes it unfortunately easy to accidentally introduce guards when you do not mean to (as both `s0 == s1` and `s0.sym_eq(s1)` are valid spellings of the equality operation); in the short term, I intend to prevent excess guarding in this situation by unit testing; in the long term, making the equality operators return SymBool is probably the correct fix.
* ~~I modify TensorImpl to store SymBool for the `is_contiguous` fields and friends on `ExtraMeta`. In practice, this essentially meant reverting most of the changes from https://github.com/pytorch/pytorch/pull/85936. In particular, the fields on ExtraMeta are no longer strongly typed; at the time I was particularly concerned about the giant lambda I was using as the setter getting a desynchronized argument order, but now that I have individual setters for each field, the only "big list" of boolean arguments is in the constructor of ExtraMeta, which seems like an acceptable risk. The semantics of TensorImpl are now that we guard only when you actually attempt to access the contiguity of the tensor via, e.g., `is_contiguous`. By and large, the contiguity calculation in the implementations now needs to be duplicated (as the boolean version can short-circuit, but the SymBool version cannot); you should carefully review the duplicated new implementations. I typically use the `identity` template to disambiguate which version of the function I need, and rely on overloading to allow for implementation sharing. The changes to the `compute_` functions are particularly interesting; for most of the functions, I preserved their original non-symbolic implementation, and then introduced a new symbolic implementation that is branch-less (making use of our new SymBool operations). However, `compute_non_overlapping_and_dense` is special; see the next bullet.~~ This appears to cause performance problems, so I am leaving this to an update PR.
* (Update: the Python side pieces for this are still in this PR, but they are not wired up until later PRs.) While the contiguity calculations are relatively easy to write in a branch-free way, `compute_non_overlapping_and_dense` is not: it involves a sort on the strides. While in principle we could still make it go through by using a data-oblivious sorting network, this seems like too much complication for a field that is likely never used (because typically it will be obvious that a tensor is non-overlapping and dense, since the tensor is contiguous). So we take a different approach: instead of trying to trace through the logical computation of non-overlapping and dense, we instead introduce a new opaque operator, IsNonOverlappingAndDenseIndicator, which represents all of the compute that would have been done here. This function returns an integer 0 if `is_non_overlapping_and_dense` would have returned `False`, and an integer 1 otherwise, for technical reasons (Sympy does not easily allow defining custom functions that return booleans). The function itself only knows how to evaluate itself if all of its arguments are integers; otherwise it is left unevaluated. This means we can always guard on it (as `size_hint` will always be able to evaluate through it), but otherwise its insides are left a black box. We typically do NOT expect this custom function to show up in actual boolean expressions, because we will typically shortcut it due to the tensor being contiguous. It's possible we should apply this treatment to all of the other `compute_` operations; more investigation is necessary. As a technical note, because this operator takes a pair of lists of SymInts, we need to support converting `ArrayRef<SymNode>` to Python, and I also unpack the pair of lists into a single list because I don't know if Sympy operations can validly take lists of Sympy expressions as inputs. See for example `_make_node_sizes_strides`
* On the Python side, we also introduce a SymBool class, and update SymNode to track bool as a valid pytype. There is some subtlety here: bool is a subclass of int, so one has to be careful about `isinstance` checks (in fact, in most cases I replaced `isinstance(x, int)` with `type(x) is int` for expressly this reason.) Additionally, unlike C++, I do NOT define bitwise inverse on SymBool, because it does not do the correct thing when run on booleans, e.g., `~True` is `-2`. (For that matter, the bitwise operators don't do the right thing in C++ either, but at least in principle the compiler can warn you about it with `-Wbool-operation`, and so the rule in C++ is simple: only use the bitwise operators if the types are statically known to be SymBool.) Alas, logical negation is not overridable, so we have to introduce `sym_not`, which must be used in place of `not` whenever a SymBool can turn up. To avoid confusion with `__not__`, which may imply that `operator.__not__` might be acceptable to use (it isn't), our magic method is called `__sym_not__`. The other bitwise operators `&` and `|` do the right thing with booleans and are acceptable to use.
* There is some annoyance working with booleans in Sympy. Unlike int and float, booleans live in their own algebra and support fewer operations than regular numbers. In particular, `sympy.expand` does not work on them. To get around this, I introduce `safe_expand`, which only calls expand on operations that are known to be expandable.
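Returning to the C++ API from the first bullet, a quick sketch of the non-short-circuiting style (method names follow the `sym_eq` naming above; exact signatures and operator spellings may differ from the real c10 headers):
```cpp
#include <c10/core/SymBool.h>
#include <c10/core/SymInt.h>

// Sketch only: both operands are evaluated; nothing short-circuits and no
// guard is introduced along the way.
c10::SymBool bothPositive(const c10::SymInt& s0, const c10::SymInt& s1) {
  c10::SymBool a = s0.sym_gt(0);  // SymBool, not bool: no guard here
  c10::SymBool b = s1.sym_gt(0);
  return a & b;  // bitwise spelling signals non-short-circuiting logic
}
```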
TODO: this PR appears to greatly regress performance of symbolic reasoning. In particular, `python test/functorch/test_aotdispatch.py -k max_pool2d` performs really poorly with these changes. Need to investigate.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92149
Approved by: https://github.com/albanD, https://github.com/Skylion007
It turns out our old max/min implementation didn't do anything, because `__max__` and `__min__` are not actually magic methods in Python. So I give 'em the `sym_` treatment, similar to the other non-overrideable builtins.
NB: I would like to use `sym_max` when computing contiguous strides but this appears to make `python test/functorch/test_aotdispatch.py -v -k test_aot_autograd_symbolic_exhaustive_nn_functional_max_pool2d_cpu_float32` run extremely slowly. Needs investigating.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92107
Approved by: https://github.com/albanD, https://github.com/voznesenskym, https://github.com/Skylion007
As we live in a C++17 world
This is a functional no-op, just
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
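For reference, the change relies on C++17's nested namespace definitions (illustrative declaration only):
```cpp
// Before (pre-C++17 spelling):
namespace at { namespace native {
void fooKernel();
}} // namespace at::native

// After (C++17 nested namespace definition):
namespace at::native {
void fooKernel();
} // namespace at::native
```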
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
This adds `torch.cuda._DeviceGuard`, which is a stripped-down version of `torch.cuda.device` with lower overhead. To achieve this, it only accepts an `int` as the device, so we don't need to call `_get_device_index`, and it is implemented with a new C++ helper, `torch._C._cuda_exchangeDevice`, that allows `_DeviceGuard.__enter__` to be just a single function call. On my machine, I see a drop from 3.8 µs of overhead to 0.94 µs with this simple benchmark:
```python
def set_device():
    with torch.cuda.device(0):
        pass
%timeit set_device()
```
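For intuition, a rough sketch of what an exchange-style helper does (a hypothetical standalone function with error handling elided; the real helper is the C++ binding behind `torch._C._cuda_exchangeDevice`):
```cpp
#include <cuda_runtime.h>

// Set the device and return the previous one, so the Python
// __enter__/__exit__ pair needs only one call in each direction.
int exchangeDevice(int to_device) {
  int prev = -1;
  cudaGetDevice(&prev);        // read the current device
  if (prev != to_device) {
    cudaSetDevice(to_device);  // switch only when it actually changes
  }
  return prev;  // __exit__ restores this index with one more call
}
```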
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91045
Approved by: https://github.com/ngimel, https://github.com/anijain2305
#75854
A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.
Still needs sanity checks, testing, and minimization of BC-breaking changes.
Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While this seems to work for `ivalue.h` and `ivalue_inl.h`, it doesn't naively work for the JIT CUDA stream wrapper (something about ambiguous calls when an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`). It turns out that the methods required to access the fields for rematerializing a CUDA stream are basically already present anyway, so `pack()` is simply removed from the wrapper for now and the methods that access the required fields are called directly.
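For illustration, a sketch of the unpacked three-field representation (the field types here are assumptions based on the description above, not the exact declarations):
```cpp
#include <cstdint>

// Sketch: three separate fields instead of one packed 64-bit integer.
struct StreamData3Sketch {
  int64_t stream_id;    // gets the full 64 bits instead of a packed slice
  int8_t device_index;  // c10::DeviceIndex-like
  int8_t device_type;   // c10::DeviceType-like
};
```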
CC @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
Summary: * When we try to port a Python object of a script module/object to C++, `tryToInferType` is flawed in the type-inference metadata it provides, but changing it would break the normal torch.jit.script flow, so we instead try to extract the IValue held in the Python object.
Test Plan: NA
Reviewed By: PaliC
Differential Revision: D41749823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91776
Approved by: https://github.com/842974287
#84624 introduces an update to `torch.norm`'s [dispatch logic](eaa43d9f25/torch/functional.py (L1489)), which now depends on `layout`, resulting in regressions when exporting related operators from TorchScript.
This PR resolves the regression by partially supporting a subset of use cases for the `prim::layout` (only `torch.strided`) and `aten::__contains__` (only constants) operators. Properly supporting other layouts, e.g. `torch.sparse_coo`, would require much more effort: extending JIT types and supporting the related family of ops like `aten::to_sparse`. That is out of the scope of this PR.
Fixes #83661
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91660
Approved by: https://github.com/justinchuby, https://github.com/kit1980
This applies some more clang-tidy fixups. In particular, it applies the modernize loops and modernize-use-transparent-functors checks. Transparent functors are less error-prone since you don't have to worry about accidentally specifying the wrong type, and they are available as of C++14.
Modern range-based for loops tend to be more readable and can be more efficient to iterate with, since the explicit loop condition and index bookkeeping are removed.
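A small example showing both checks applied (illustrative only, not code from the PR):
```cpp
#include <algorithm>
#include <functional>
#include <vector>

int sumSortedDesc(std::vector<int> v) {
  // modernize-use-transparent-functors:
  // Before: std::sort(v.begin(), v.end(), std::greater<int>());
  std::sort(v.begin(), v.end(), std::greater<>());  // no element type to get wrong

  // modernize loops:
  // Before: for (size_t i = 0; i < v.size(); ++i) total += v[i];
  int total = 0;
  for (int x : v) {
    total += x;
  }
  return total;
}
```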
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91449
Approved by: https://github.com/ezyang
Noticed that the toSymFloat/toSymInt overloads always copied the internal pointer of an IValue, even when the IValue was an rvalue, unlike other overloads (such as toTensor). This fixes that issue by adding the appropriate ref-qualified methods needed to facilitate the move.
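A minimal sketch of the ref-qualified overload pattern involved (names simplified; `std::shared_ptr` stands in for IValue's intrusive payload pointer):
```cpp
#include <memory>
#include <utility>

struct Holder {
  std::shared_ptr<int> payload;

  // Called on lvalues: must copy (a refcount bump).
  std::shared_ptr<int> toPayload() const& { return payload; }

  // Called on rvalues: steals the pointer, no refcount traffic.
  std::shared_ptr<int> toPayload() && { return std::move(payload); }
};
```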
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91405
Approved by: https://github.com/ezyang
Apply the clang-tidy check modernize-use-emplace. This is slightly more efficient because the element is constructed in place, and it is the recommended style in the parts of the codebase covered by clang-tidy. This PR just manually applies the check to the rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed, like #89000
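A minimal example of the check's effect (illustrative only):
```cpp
#include <string>
#include <utility>
#include <vector>

void fill(std::vector<std::pair<int, std::string>>& v) {
  // Before: v.push_back(std::make_pair(1, "conv"));  // builds a temporary pair
  v.emplace_back(1, "conv");  // forwards the args straight to the constructor
}
```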
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
Avoid a double exception in the destructor when attempting to serialize to a
Python object that does not have a `write` method.
Use the `Finalizer` class in `PyTorchStreamWriter::writeEndOfFile()` to
always set the `finalized_` property even if an exception occurs (as there
isn't much one can do at that point).
Add an explicit check for the attribute to `_open_zipfile_writer_buffer` and
add unit tests.
Modernize the code a bit by using the Python 3 `super()` method.
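For reference, a minimal sketch of the destructor-based finalizer pattern this relies on (simplified; not the actual serialization classes):
```cpp
#include <functional>
#include <utility>

// Runs its callback when the scope unwinds, even if an exception is
// propagating, so cleanup cannot be skipped.
class Finalizer {
 public:
  explicit Finalizer(std::function<void()> fn) : fn_(std::move(fn)) {}
  ~Finalizer() { fn_(); }

 private:
  std::function<void()> fn_;
};

struct WriterSketch {
  bool finalized_ = false;

  void writeEndOfFile() {
    Finalizer mark([this] { finalized_ = true; });  // always runs
    // ... write the end-of-archive record; this may throw ...
  }  // finalized_ is set here whether or not an exception escaped
};
```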
Fixes https://github.com/pytorch/pytorch/issues/87997
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88128
Approved by: https://github.com/albanD
Fixes a minor perf regression I saw in #85688; the replacement is applied throughout the code base. `obj == Py_None` is directly equivalent to `is_none()`. Constructing a temporary `py::none()` object needlessly increfs/decrefs the refcount of `None`; this method avoids that and is therefore more efficient.
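A minimal illustration in pybind11 terms (the helper name is made up):
```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

// is_none() is a pointer check against the None singleton, with no
// temporary py::none() and no refcount traffic.
bool hasValue(const py::object& obj) {
  // Before: return !obj.is(py::none());  // constructs a temporary none
  return !obj.is_none();
}
```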
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88051
Approved by: https://github.com/albanD
This refactor was prompted by challenges handling mixed int/float operations in C++. A previous version of this patch added overloads for each permutation of int/float and was unwieldy: https://github.com/pytorch/pytorch/pull/87722/. This PR takes a different approach.
The general outline of the patch is to combine the C++ types SymIntNode and SymFloatNode into a single type, SymNode. This type is erased; we no longer know statically in C++ whether we have an int or a float, and have to test with the is_int()/is_float() virtual methods. This has a number of knock-on effects.
- We no longer have C++ classes to bind to Python. Instead, we take an
entirely new approach to our Python API, where we have a SymInt/SymFloat
class defined entirely in Python, which hold a SymNode (which corresponds
to the C++ SymNode). However, SymNode is not pybind11-bound; instead,
it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
when it goes into C++. This implies a userland rename.
In principle, it is also possible for the canonical implementation of SymNode
to be written in C++, and then bound to Python with pybind11 (we have
this code, although it is commented out.) However, I did not implement
this as we currently have no C++ implementations of SymNode.
Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
code needs to know how to find these classes. Currently, this is done
just by manually importing torch and getting the attributes.
- Because SymInt/SymFloat are simple Python wrappers, __sym_dispatch__ now takes SymInt/SymFloat, rather than SymNode, bringing it in line with how __torch_dispatch__ works.
Some miscellaneous improvements:
- SymInt now has a constructor that takes SymNode. Note that this
constructor is ambiguous if you pass in a subclass of SymNode,
so an explicit downcast is necessary. This means toSymFloat/toSymInt
are no more. This is a mild optimization as it means rvalue reference
works automatically.
- We uniformly use the caster for c10::SymInt/SymFloat, rather than
going the long way via the SymIntNode/SymFloatNode.
- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
functions, pretty sure this doesn't do anything.
- guard_int is now a free function, since to guard on an int you cannot
assume the method exists. A function can handle both int and SymInt
inputs.
- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
plain methods; this is to help avoid confusion between the two types.
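For orientation, a rough sketch of the type-erased node shape described above (simplified; not the actual c10 declarations):
```cpp
#include <c10/util/intrusive_ptr.h>

// One erased node type replaces SymIntNode/SymFloatNode; the numeric kind
// is queried at runtime instead of being encoded in the C++ type.
struct SymNodeImplSketch : c10::intrusive_ptr_target {
  virtual bool is_int() = 0;
  virtual bool is_float() = 0;
  // arithmetic, comparison, and guard methods elided
};
using SymNodeSketch = c10::intrusive_ptr<SymNodeImplSketch>;
```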
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
At the moment, they were cast to `int64`, which breaks quite a few casting rules, for example in `ops.aten`.
Quite a vintage bug, circa 2020.
With this fix, the following code prints `torch.bool`, rather than `torch.int64`.
```python
import torch
msk = torch.tensor([False])
b = torch.tensor([False])
print(torch.ops.aten.where.ScalarSelf(msk, True, b).dtype)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87179
Approved by: https://github.com/albanD
# Support unpacking python dictionary in **torch.jit.trace()**
## Problem statement & Motivation
### Problem 1 (usability):
Say, if you have a model and its forward method defined as follows:
**`def forward(self, key1=value1, key2=value2, key3=value3)`**
And you have a dataset and each data point in the dataset is a python dict as follows:
**`data = {key1:value1, key3:value3, key2:value2}`**
The problem is that if you want to trace the model using the dict data from the given dataset, you need to unpack the dictionary, reorder its values manually, and build a tuple such as **`data_tuple = (value1, value2, value3)`** to pass as the **`example_inputs`** parameter of **`torch.jit.trace()`**. This marshalling process is not user friendly.
### Problem 2 (feasibility):
Say, if you have a model and its forward method defined as follows:
**`def forward(self, key1=None, key2=None, key3=None)`** -> The default value is **None**
And you have a dataset and each data point in the dataset is a python dict as follows:
**`data = {key1:value1, key3:value3}`** -> Only **part of** the values required by forward are given; the rest fall back to the default values.
The problem is that if you want to trace the model using the dict data from the given dataset, it's not feasible at all, because you can pass neither a tuple like **`T1 = (value1, value3)`** nor **`T2 = (value1, None, value3)`**. T1 would mismatch value3 with key2, and T2 includes the **None** type, which is blocked by the tracer's type checking. (Of course you can pass **`T3 = (value1,)`** to make the trace function finish without an exception, but the traced model you get is probably not what you expect, since different inputs may produce different traced results.)
These problems come up with HuggingFace's PyTorch models, especially in text-classification tasks with datasets such as [MRPC](https://paperswithcode.com/dataset/mrpc), [MNLI](https://paperswithcode.com/dataset/multinli) etc.
## Solution
To address these two issues, we propose to support a new type, namely the python dict, as the example_inputs parameter for torch.jit.trace(). We can use the runtime type information of the example_inputs object to determine whether we fall back to the original tuple path or go into the new dictionary path. Both problem 1 and problem 2 can be solved by utilizing the "**`**`**" operator.
## Limitation & Mitigation
1. If we use a dict as example_inputs to trace the model, then we have to pass a dictionary to the traced model too. (We will probably change the order of the debug names of the input parameters in the TorchScript IR, so we can't assume the traced model's input parameter order is the same as the original model's.) We need to highlight this in the documentation to mitigate the problem.
For example:
```python
# fetch a data from dataloader, and the data is a dictionary
# and the example_inputs_dict is like: {key1:value1, key3:value3, key2:value2}
# the forward() is like: def forward(self, key1=value1, key2=value2, key3=value3)
example_inputs_dict = next(iter(dataloader))
jit_model = model.eval()
# use the dictionary to trace the model
jit_model = torch.jit.trace(jit_model, example_inputs_dict, strict=False) # Now the IR will be graph(%self : __torch__.module.___torch_mangle_n.Mymodule, %key1 : type1, %key3 : type3, %key2 : type2)
jit_model = torch.jit.freeze(jit_model)
# It's OK to use dict as the parameter for traced model
jit_model(**example_inputs_dict)
example_inputs_tuple = (value1, value3, value2)
# It's wrong to rely on the original args order.
jit_model(*example_inputs_tuple)
```
## Note
1. This PR will make some UTs introduced in [#39601](https://github.com/pytorch/pytorch/pull/39601) fail; in our solution those cases should instead be classified as unpacking a tuple containing a single dictionary element.
2. I think there is an ambiguity: currently **torch.jit.trace()**'s documentation only specifies passing a tuple or a single Tensor as the example_inputs parameter, but it seems a dictionary can still be passed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81623
Approved by: https://github.com/davidberard98
This reverts commit 978b46d7c96627e3b3553ad70ad21cb161d05f90.
Reverted https://github.com/pytorch/pytorch/pull/86488 on behalf of https://github.com/osalpekar due to Broke executorch builds internally with the following message: RuntimeError: Missing out variant for functional op: aten::split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] . Make sure you have loaded your custom_ops_generated_lib
symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops
add meta_bernoulli_
meta kernel for at::gather
get pytorch_struct to pass: meta for scatter_add, fix backward
symintify split ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86488
Approved by: https://github.com/ezyang
symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops
add meta_bernoulli_
meta kernel for at::gather
get pytorch_struct to pass: meta for scatter_add, fix backward
symintify split ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86334
Approved by: https://github.com/ezyang
- TensorGeometry supports symint
- check_size supports symint
- functorch batch rule improved symint
- Some operator support for symint in LTC
- More supported operations on SymInt and SymFloat
- More symint support in backwards formulas
This merge includes code contributions from bdhirsh and anjali411.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86160
Approved by: https://github.com/Chillee