pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Michael Andreas Dagitses	501d0729cb	move build_variables.bzl and ufunc_defs.bzl from pytorch-root/tools/ to the root Pull Request resolved: https://github.com/pytorch/pytorch/pull/78542 This makes importing easier in different build systems that have different absolute names for the pytorch-root. Differential Revision: [D36782582](https://our.internmc.facebook.com/intern/diff/D36782582/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36782582/)! Approved by: https://github.com/malfet	2022-06-02 19:39:27 +00:00
Michael Andreas Dagitses	844368c032	remove Bazel globs that don't match any files Pull Request resolved: https://github.com/pytorch/pytorch/pull/78007 These are vestigial. Differential Revision: [D36558465](https://our.internmc.facebook.com/intern/diff/D36558465/) Approved by: https://github.com/kit1980	2022-06-02 18:39:47 +00:00
dzdang	2aad28a539	[quant][core][gpu][feature] Implemented quantized cuda gelu Summary: Support for quantized cuda gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work. Test function `test_qgelu` was amended to test gelu for quantized cuda backends. Test Plan: ``` python test/test_quantization.py -k test_qgelu ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/77212 Approved by: https://github.com/jerryzh168	2022-05-24 22:31:45 +00:00
Antonio Kim	02c4d877b4	Codegen Non-Native IR Nodes (#76535 ) Add codegen infrastructure to generate IR nodes for non-native ops. The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`. e.g. ``` non_native: ... - func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor ... ``` these definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes using `GenLazyIR`. Fixes #74628 CC: @wconstab @desertfire @henrytwo Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535 Approved by: https://github.com/wconstab	2022-05-24 19:29:23 +00:00
Michael Andreas Dagitses	e517fc8b28	eliminate Bazel's libtorch_cpp_generated_sources Pull Request resolved: https://github.com/pytorch/pytorch/pull/76179 This list is redundant with the shared build structure. Differential Revision: [D35818500](https://our.internmc.facebook.com/intern/diff/D35818500/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35818500/)! Approved by: https://github.com/dreiss	2022-05-17 03:46:49 +00:00
Michael Andreas Dagitses	a013d83bf9	eliminate Bazel's libtorch_python_generated_sources Pull Request resolved: https://github.com/pytorch/pytorch/pull/76178 These contents are already identified in the shared build structure. Differential Revision: [D35817999](https://our.internmc.facebook.com/intern/diff/D35817999/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35817999/)! Approved by: https://github.com/dreiss	2022-05-17 03:43:02 +00:00
PyTorch MergeBot	7eaf4780ba	Revert "[LT] Store OpKind for each IR subclass in a static field" This reverts commit ac37ddc79557d7a5ec184a7d2924241ccfc8a333. Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet	2022-05-09 20:50:09 +00:00
Bin Bao	ac37ddc795	[LT] Store OpKind for each IR subclass in a static field Summary: Currently OpKind is stored as an object field called op_ for each IR node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we need to downcast a base-node pointer into a concrete sub-node pointer. As a result, we need to construct and pass in an op when downcasting nodes, and this becomes quite anonnying when we start to implement the trie-based IR node reusing. More importantly, the op for each subclass should be unique for that subclass and thus making it a const static field is a more logical design. In this PR, we still keep the object-level op_ for easier XLA adoption. As furture work, we can come back to remove op_, make the op() method virtual, and get rid of OpKind in all the node constructors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711 Approved by: https://github.com/wconstab, https://github.com/JackCaoG	2022-05-06 19:14:46 +00:00
mikey dagitses	37fb636b7f	fix package violation caused by D35587412 (#76808 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76808 This reached into aten/TARGETS in fbcode. ghstack-source-id: 155484095 Test Plan: Verified manually. Reviewed By: dreiss, malfet Differential Revision: D36128458 fbshipit-source-id: c7447b3a40fe905993e799d211241e72183f8acb (cherry picked from commit b68eb7a45d8973fadab2dfcafcbb0f63801abd40)	2022-05-05 23:39:03 +00:00
mikey dagitses	ac45fb9b93	switch Bazel to the shared generate-code genrule (#75790 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75790 We were building it before, but now we use it in downstream rules. This enables us to eliminate the handwritten genrule. ghstack-source-id: 155300051 Test Plan: Verified locally and in CI. Reviewed By: dreiss Differential Revision: D35645390 fbshipit-source-id: 478bb37a6ec295c232f66383babf46606e83ed5e (cherry picked from commit 2822d4c5b48c6d9282149b2d43cf72d645237196)	2022-05-04 15:26:25 +00:00
mikey dagitses	eb27c85160	move generate-code into shared build structure (#75699 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75699 ghstack-source-id: 155255334 Test Plan: Rely on CI. Reviewed By: dreiss Differential Revision: D35587412 fbshipit-source-id: 5ab79c07029de279a1fae36519654a73bb61d430 (cherry picked from commit 4896b72a6c0cc087e36889d21d2d885009d94a6d)	2022-05-03 09:53:37 +00:00
anjali411	b204ad863f	Revert "Revert "Allow specifying tags for aten operators in native_functions.yaml"" This reverts commit ea44645c9a682a4e212e64b94a86383c3388ed6b. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76456 Approved by: https://github.com/osalpekar	2022-04-28 02:04:57 +00:00
dzdang	6e292f1a21	[quant][core][gpu][improvement] Integrated quantized cudnn max pool2d with existing quantized_max_pool2d (#76129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76129 Previously, quantized_max_pool2d_cudnn was made available to the frontend through torch.ops.quantized.max_pool2d. We improve the integration by also making it available through torch.max_pool2d, which is made possible by registering quantized_max_pool2d_cudnn in native_functions.yaml under quantized_max_pool2d, which is called in max_pool2d. Ideally and ultimately, we will get rid of the quantized_max_pool2d registration in native_functions.yaml, and directly register quantized_max_pool2d and quantized_max_pool2d_cudnn under max_pool2d, but current support for quantized dispatch keys blocks us from doing so. Test Plan: ``` python test/run_tests.py ``` ``` python test/run_tests.py ``` Differential Revision: D35789078 D35789078 Reviewed By: jerryzh168 Pulled By: dzdang fbshipit-source-id: 5d8220255bfab663b4779b5d3c66dea9f79d8ee7 (cherry picked from commit c27164da29043f7dc9a4c27d24a93cd37162c23e)	2022-04-27 01:52:45 +00:00
Scott Wolchok	e816e17655	[PyTorch] Add native fast path for transformer encoder inference (#76333 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76333 The current PyTorch multi-head attention and transformer implementations are slow. This should speed them up for inference. ghstack-source-id: 154737857 (Note: this ignores all push blocking failures!) Test Plan: CI Reviewed By: cpuhrsch Differential Revision: D35239925 fbshipit-source-id: 5a7eb8ff79bc6afb4b7d45075ddb2a24a6e2df28	2022-04-26 12:58:03 -04:00
mikey dagitses	f4200600e4	move Bazel version header generation to shared build structure (#75332 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75332 ghstack-source-id: 154678044 Test Plan: Rely on OSS CI. Reviewed By: malfet Differential Revision: D35434229 fbshipit-source-id: 7cdd33fa32d0c485f44477e414c24c9bc4b74963 (cherry picked from commit 60285c613e8703c52f36f0bf1178e35c04574ffa)	2022-04-25 17:51:30 +00:00
mikey dagitses	d78dd825ba	define the caffe2_serialize target in Bazel (#75942 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75942 This also requires changes to the target definition and the xplat translator to get it working. ghstack-source-id: 154678046 Test Plan: Verify locally and rely on CI. Reviewed By: malfet Differential Revision: D35704597 fbshipit-source-id: 6b0d9f5a044609b24dda656f80233ba6186c097f (cherry picked from commit 6de43c5ca7a973c9f8b71f4d60d4d5e85cc2ba21)	2022-04-25 16:14:05 +00:00
Jon Janzen	2387efd356	Revert "[PyTorch] Add native fast path for transformer encoder inference" This reverts commit b369b89f235f54bc9de85d768fb62ac4579681dc. This has internal changes and should not have been landed via mergebot. Ref: https://github.com/pytorch/pytorch/pull/75809#issuecomment-1108717166	2022-04-25 11:40:02 -04:00
Scott Wolchok	b369b89f23	[PyTorch] Add native fast path for transformer encoder inference Pull Request resolved: https://github.com/pytorch/pytorch/pull/75809 The current PyTorch multi-head attention and transformer implementations are slow. This should speed them up for inference. Differential Revision: [D35239925](https://our.internmc.facebook.com/intern/diff/D35239925/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35239925/)! Approved by: https://github.com/ezyang	2022-04-25 06:11:36 +00:00
Edward Yang	36420b5e8c	Rename tools/codegen to torchgen (#76275 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76275 In preparation for addressing https://github.com/pytorch/pytorch/issues/73212 Diff was generated with: ``` git mv tools/codegen torchgen git grep -l 'tools.codegen' \| xargs sed -i 's/tools.codegen/torchgen/g' sed -i "s/\${TOOLS_PATH}\/codegen/\${TORCH_ROOT}\/torchgen/g" caffe2/CMakeLists.txt ``` and a manual edits to: * tools/test/test_gen_backend_stubs.py * torchgen/build.bzl * torchgen/gen_backend_stubs.py aka this diff: ``` diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py index 3dc26c6d2d..104054575e 100644 --- a/tools/test/test_gen_backend_stubs.py +++ b/tools/test/test_gen_backend_stubs.py @@ -9,7 +9,7 @@ from torchgen.gen_backend_stubs import run from torchgen.gen import _GLOBAL_PARSE_NATIVE_YAML_CACHE # noqa: F401 path = os.path.dirname(os.path.realpath(__file__)) -gen_backend_stubs_path = os.path.join(path, '../torchgen/gen_backend_stubs.py') +gen_backend_stubs_path = os.path.join(path, '../../torchgen/gen_backend_stubs.py') # gen_backend_stubs.py is an integration point that is called directly by external backends. # The tests here are to confirm that badly formed inputs result in reasonable error messages. diff --git a/torchgen/build.bzl b/torchgen/build.bzl index ed04e35a43..d00078a3cf 100644 --- a/torchgen/build.bzl +++ b/torchgen/build.bzl @@ -1,6 +1,6 @@ def define_targets(rules): rules.py_library( - name = "codegen", + name = "torchgen", srcs = rules.glob(["*/.py"]), deps = [ rules.requirement("PyYAML"), @@ -11,6 +11,6 @@ def define_targets(rules): rules.py_binary( name = "gen", - srcs = [":codegen"], + srcs = [":torchgen"], visibility = ["//visibility:public"], ) diff --git a/torchgen/gen_backend_stubs.py b/torchgen/gen_backend_stubs.py index c1a672a655..beee7a15e0 100644 --- a/torchgen/gen_backend_stubs.py +++ b/torchgen/gen_backend_stubs.py @@ -474,7 +474,7 @@ def run( ) -> None: # Assumes that this file lives at PYTORCH_ROOT/torchgen/gen_backend_stubs.py - pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute() + pytorch_root = pathlib.Path(__file__).parent.parent.absolute() template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates") def make_file_manager(install_dir: str) -> FileManager: ``` run_all_fbandroid_tests Test Plan: sandcastle Reviewed By: albanD, ngimel Differential Revision: D35770317 fbshipit-source-id: 153ac4a7fef15b1e750812a90bfafdbc8f1ebcdf (cherry picked from commit c6d485d1d4648fa1c8a4c14c5bf3d8e899b9b4dd)	2022-04-25 01:38:06 +00:00
Scott Wolchok	0a5e788ab2	[PyTorch] Add NestedTensorCPU and NestedTensorCUDA dispatch keys (#75808 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808 Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor. ghstack-source-id: 154171542 (Note: this ignores all push blocking failures!) Test Plan: CI? Reviewed By: bdhirsh Differential Revision: D35603836 fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6 (cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)	2022-04-19 18:12:12 +00:00
Michael Andreas Dagitses	ef50186a7d	remove unused //:tools_jit target and dependency from generate-code Pull Request resolved: https://github.com/pytorch/pytorch/pull/74837 This is no longer used. Differential Revision: [D35187901](https://our.internmc.facebook.com/intern/diff/D35187901/) Approved by: https://github.com/malfet	2022-04-17 20:54:46 +00:00
Scott Wolchok	97c993ca7a	[PyTorch] Add NestedTensor support functions for transformers Pull Request resolved: https://github.com/pytorch/pytorch/pull/75491 Here are the NestedTensor kernels we'll need for the improved transformer implementation. Differential Revision: [D35409275](https://our.internmc.facebook.com/intern/diff/D35409275/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35409275/)! Approved by: https://github.com/cpuhrsch	2022-04-14 16:30:23 +00:00
Brian Hirsh	23b8414391	code-generate non-aliasing {view}_copy kernels (#73442 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73442 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D35016025 Pulled By: bdhirsh fbshipit-source-id: 2a7f303ec76f5913b744c7822a531d55a57589c9 (cherry picked from commit 3abe13c2a787bcbe9c41b0a335c96e5a3d3642fb)	2022-04-11 19:48:55 +00:00
mikey dagitses	d92213f741	move setup_helpers programs to their own package (#74838 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74838 This matches structure of internal builds. ghstack-source-id: 153272856 Test Plan: Should be a no-op, rely on CI to validate. Reviewed By: ezyang Differential Revision: D35187899 fbshipit-source-id: 44b51145df54c41836149704d0d84d4a882f158e (cherry picked from commit af48ae5e7dd6ea7d6dc3acfb367d76508f7d6b0c)	2022-04-08 11:17:48 +00:00
mikey dagitses	0d7aad822e	move Bazel //:tools_autograd to the //tools/autograd package (#74745 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74745 This is to be consistent with internal builds. ghstack-source-id: 152722873 Test Plan: Rely on CI. Reviewed By: ezyang Differential Revision: D35144475 fbshipit-source-id: f4cea54ceb05cd70de76c7a679466c92d063d15e (cherry picked from commit a52691384ad6773cf6e7ebbb5709a4d2696875e8)	2022-04-06 16:18:37 +00:00
mikey dagitses	60729d02f1	remove unused nn_path from generate_code (#74563 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74563 This is used inconsistently in all the generate_code program invocations. Nevertheless, nothing consumes this flag, so we can safely remove it. This was removed in #25353. ghstack-source-id: 152249818 Test Plan: Should be a no-op, rely on CI. Reviewed By: malfet Differential Revision: D35053096 fbshipit-source-id: 3ad19e83ca14649b514dc163c3caff6cbd118e14 (cherry picked from commit a43f05bb43553249caac3c3479986cbc45d286ae)	2022-03-31 18:35:30 +00:00
mikey dagitses	40bf3cfeb7	use the shared //tools/codegen:gen in OSS Bazel (#74471 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74471 ghstack-source-id: 152438662 Test Plan: Rely on CI. Reviewed By: malfet Differential Revision: D35011683 fbshipit-source-id: 3729c6c7c1b323e253d9832bb51752af8ae2faa6 (cherry picked from commit 1727d8693fa07e308f7390e6f645516dc9dee6a9)	2022-03-31 16:01:42 +00:00
mikey dagitses	79307fbde0	use the //tools/codegen target in Bazel (#74465 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74465 This requires adding py_library and its PyPI dependency provider "requirement". ghstack-source-id: 152438643 Test Plan: Rely on CI to validate. Reviewed By: malfet Differential Revision: D35009795 fbshipit-source-id: 424c4968474b3c2fb37d2c7dba932b37605a63f7 (cherry picked from commit 91e442c3bf0e204b0fb6c98405aaaa7308011511)	2022-03-31 12:54:14 +00:00
PyTorch MergeBot	ea44645c9a	Revert "Allow specifying tags for aten operators in native_functions.yaml" This reverts commit 1dab71ab258df5168bb635530a820625f9d4b522. Reverted https://github.com/pytorch/pytorch/pull/72549 on behalf of https://github.com/malfet	2022-03-28 18:04:38 +00:00
anjali411	1dab71ab25	Allow specifying tags for aten operators in native_functions.yaml Pull Request resolved: https://github.com/pytorch/pytorch/pull/72549 Approved by: https://github.com/ezyang	2022-03-25 21:17:52 +00:00
Will Constable	3547f20872	Land remaining parts of Torchscript Lazy Tensor backend (#74111 ) Summary: Also enables bazel build to run lazy codegen. Bazel (oss) build feeds off the same filelists as cmake/buck (build_variables.bzl), so enabling it is easier than keeping it disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74111 Test Plan: Run CI and verify test_lazy_ops is running via OSS cmake builds Reviewed By: bdhirsh Differential Revision: D34772403 fbshipit-source-id: 8a63f58b9536e6ac1be530667932176ef2549496 (cherry picked from commit e807ffb1918853d10b924fdc24f85ee5b1a39021)	2022-03-22 23:14:03 +00:00
Will Constable	44a8d4d998	Add lazy tensor unit tests, disabled (#74309 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74309 Since the test file is large, it can be landed on its own and then switched on in the diff that actually builds lazy tensor code. Test Plan: verify CI passes Reviewed By: desertfire Differential Revision: D34928619 fbshipit-source-id: cd556155326f7fb55b3f29031f80bc36c936d565 (cherry picked from commit 60945adbefb6a8d19f89e330f8b344d076b13bfc)	2022-03-17 15:31:26 +00:00
Will Constable	72b1194464	Run lazy tensor codegen in generate_code.py (#73996 ) Summary: Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel. Adds a new option (--gen_lazy_ts_backend) to. generate_code.py, calling this from CMake OSS build and fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later) Bazel support is added in a later diff. Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996 Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h Reviewed By: ezyang Differential Revision: D34408536 fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515 (cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)	2022-03-17 15:31:26 +00:00
Christian Puhrsch	484c0de670	Minimal NestedTensor (#72881 ) Summary: This PR adds a minimal version of a NestedTensor. It introduces the general harness future development can be built around. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72881 Reviewed By: albanD Differential Revision: D34259177 Pulled By: cpuhrsch fbshipit-source-id: 0245c36f603424e20f3b09651043c207f526d760 (cherry picked from commit 10764e8d427f29b364567e4cbc86ed73c3933158)	2022-03-02 16:31:51 +00:00
Edward Yang	ce7910ba81	ufunc codegen (#65851 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65851 Design doc: https://docs.google.com/document/d/12rtlHnPUpaJ-I52Iob3L0WA3rKRr_OY7fXqeCvn2MVY/edit First read the design doc to understand the user syntax. In this PR, we have converted add to use ufunc codegen; most of the cpp changes are deleting the preexisting implementations of add, and ufunc/add.h are the new implementations in the ufunc format. The bulk of this PR is in the new codegen machinery. Here's the order to read the files: * `tools/codegen/model.py` * Some self-explanatory utility classes: `ScalarType`, `DTYPE_CLASSES` * New classes for representing ufunc entries in `native_functions.yaml`: `UfuncKey` and `UfuncInnerLoop`, as well as parsing logic for these entries. UfuncKey has some unusual entries (e.g., CPUScalar) that don't show up in the documentation, more on these below). * A predicate `is_ufunc_dispatch_key` for testing which dispatch keys should get automatically generated when an operator opts into ufuncs (CPU and CUDA, for now!) * `tools/codegen/api/types.py` * More self-explanatory utility stuff: ScalarTypeToCppMapping mapping ScalarType to CppTypes; Binding.rename for changing the name of a binding (used when we assign constructor variables to member variables inside CUDA functors) * New VectorizedCType, representing `at::vec::Vectorized<T>`. This is used inside vectorized CPU codegen. * New `scalar_t` and `opmath_t` BaseCppTypes, representing template parameters that we work with when doing codegen inside ufunc kernel loops (e.g., where you previously had Tensor, now you have `scalar_t`) * `StructuredImplSignature` represents a `TORCH_IMPL_FUNC` definition, and straightforwardly follows from preexisting `tools.codegen.api.structured` * `tools/codegen/translate.py` - Yes, we use translate a LOT in this PR. I improved some of the documentation, the only substantive changes are adding two new conversions: given a `scalar_t` or a `const Scalar&`, make it convertible to an `opmath_t` * `tools/codegen/api/ufunc.py` * OK, now we're at the meaty stuff. This file represents the calling conventions of three important concepts in ufunc codegen, which we'll describe shortly. All of these APIs are relatively simple, since there aren't any complicated types by the time you get to kernels. * stubs are the DispatchStub trampolines that CPU kernels use to get to their vectorized versions. They drop all Tensor arguments (as they are in TensorIterator) but otherwise match the structured calling convention * ufuncs are the inner loop template functions that you wrote in ufunc/add.h which do the actual computation in question. Here, all the Tensors and Scalars have been converted into the computation type (`opmath_t` in CUDA, `scalar_t` in CPU) * ufunctors are a CUDA-only concept representing functors that take some of their arguments on a host-side constructor, and the rest in the device-side apply. Once again, Tensors and Scalars are converted into the computation type, `opmath_t`, but for clarity all the functions take `scalar_t` as argument (as this is the type that is most salient at the call site). Because the constructor and apply are code generated separately, `ufunctor_arguments` returns a teeny struct `UfunctorBindings` * `tools/codegen/dest/ufunc.py` - the workhorse. This gets its own section below. * `tools/codegen/gen.py` - just calling out to the new dest.ufunc implementation to generate UfuncCPU_add.cpp, UFuncCPUKernel_add.cpp and UfuncCUDA_add.cu files per ufunc operator. Each of these files does what you expect (small file that registers kernel and calls stub; CPU implementation; CUDA implementation). There is a new file manager for UFuncCPUKernel files as these need to get replicated by cmake for vectorization. One little trick to avoid recompilation is we directly replicate code generated forward declarations in these files, to reduce the number of headers we depend on (this is codegen, we're just doing the preprocessors job!) * I'll talk about build system adjustments below. OK, let's talk about tools/codegen/dest/ufunc.py. This file can be roughly understood in two halves: one for CPU code generation, and the other for CUDA code generation. CPU codegen. Here's roughly what we want to generate: ``` // in UfuncCPU_add.cpp using add_fn = void ()(TensorIteratorBase&, const at::Scalar&); DECLARE_DISPATCH(add_fn, add_stub); DEFINE_DISPATCH(add_stub); TORCH_IMPL_FUNC(ufunc_add_CPU) (const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha, const at::Tensor& out) { add_stub(device_type(), this, alpha); } // in UfuncCPUKernel_add.cpp void add_kernel(TensorIteratorBase& iter, const at::Scalar& alpha) { at::ScalarType st = iter.common_dtype(); RECORD_KERNEL_FUNCTION_DTYPE("add_stub", st); switch (st) { AT_PRIVATE_CASE_TYPE("add_stub", at::ScalarType::Bool, bool, [&]() { auto _s_alpha = alpha.to<scalar_t>(); cpu_kernel(iter, [=](scalar_t self, scalar_t other) { return ufunc::add(self, other, _s_alpha); }); }) AT_PRIVATE_CASE_TYPE( "add_stub", at::ScalarType::ComplexFloat, c10::complex<float>, [&]() { auto _s_alpha = alpha.to<scalar_t>(); auto _v_alpha = at::vec::Vectorized<scalar_t>(_s_alpha); cpu_kernel_vec( iter, [=](scalar_t self, scalar_t other) { return ufunc::add(self, other, _s_alpha); }, [=](at::vec::Vectorized<scalar_t> self, at::vec::Vectorized<scalar_t> other) { return ufunc::add(self, other, _v_alpha); }); }) ... ``` The most interesting change about the generated code is what previously was an `AT_DISPATCH` macro invocation is now an unrolled loop. This makes it easier to vary behavior per-dtype (you can see in this example that the entry for bool and float differ) without having to add extra condtionals on top. Otherwise, to generate this code, we have to hop through several successive API changes: * In TORCH_IMPL_FUNC(ufunc_add_CPU), go from StructuredImplSignature to StubSignature (call the stub). This is normal argument massaging in the classic translate style. * In add_kernel, go from StubSignature to UfuncSignature. This is nontrivial, because we must do various conversions outside of the inner kernel loop. These conversions are done by hand, setting up the context appropriately, and then the final ufunc call is done using translate. (BTW, I introduce a new convention here, call on a Signature, for code generating a C++ call, and I think we should try to use this convention elsewhere) The other piece of nontrivial logic is the reindexing by dtype. This reindexing exists because the native_functions.yaml format is indexed by UfuncKey: ``` Generic: add (AllAndComplex, BFloat16, Half) ScalarOnly: add (Bool) ``` but when we do code generation, we case on dtype first, and then we generate a `cpu_kernel` or `cpu_kernel_vec` call. We also don't care about CUDA code generation (which Generic) hits. Do this, we lower these keys into two low level keys, CPUScalar and CPUVector, which represent the CPU scalar and CPU vectorized ufuncs, respectively (Generic maps to CPUScalar and CPUVector, while ScalarOnly maps to CPUScalar only). Reindexing then gives us: ``` AllAndComplex: CPUScalar: add CPUVector: add Bool: CPUScalar: add ... ``` which is a good format for code generation, but too wordy to force native_functions.yaml authors to write. Note that when reindexing, it is possible for there to be a conflicting definition for the same dtype; we just define a precedence order and have one override the other, so that it is easy to specialize on a particular dtype if necessary. Also note that because CPUScalar/CPUVector are part of UfuncKey, technically you can manually specify them in native_functions.yaml, although I don't expect this functionality to be used. CUDA codegen. CUDA code generation has many of the same ideas as CPU codegen, but it needs to know about functors, and stubs are handled slightly differently. Here is what we want to generate: ``` template <typename scalar_t> struct CUDAFunctorOnSelf_add { using opmath_t = at::opmath_type<scalar_t>; opmath_t other_; opmath_t alpha_; CUDAFunctorOnSelf_add(opmath_t other, opmath_t alpha) : other_(other), alpha_(alpha) {} __device__ scalar_t operator()(scalar_t self) { return ufunc::add(static_cast<opmath_t>(self), other_, alpha_); } }; ... two more functors ... void add_kernel(TensorIteratorBase& iter, const at::Scalar & alpha) { TensorIteratorBase& iter = this; at::ScalarType st = iter.common_dtype(); RECORD_KERNEL_FUNCTION_DTYPE("ufunc_add_CUDA", st); switch (st) { AT_PRIVATE_CASE_TYPE("ufunc_add_CUDA", at::ScalarType::Bool, bool, [&]() { using opmath_t = at::opmath_type<scalar_t>; if (false) { } else if (iter.is_cpu_scalar(1)) { CUDAFunctorOnOther_add<scalar_t> ufunctor( iter.scalar_value<opmath_t>(1), (alpha).to<opmath_t>()); iter.remove_operand(1); gpu_kernel(iter, ufunctor); } else if (iter.is_cpu_scalar(2)) { CUDAFunctorOnSelf_add<scalar_t> ufunctor( iter.scalar_value<opmath_t>(2), (alpha).to<opmath_t>()); iter.remove_operand(2); gpu_kernel(iter, ufunctor); } else { gpu_kernel(iter, CUDAFunctor_add<scalar_t>((alpha).to<opmath_t>())); } }) ... REGISTER_DISPATCH(add_stub, &add_kernel); TORCH_IMPL_FUNC(ufunc_add_CUDA) (const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha, const at::Tensor& out) { add_kernel(this, alpha); } ``` The functor business is the bulk of the complexity. Like CPU, we decompose CUDA implementation into three low-level keys: CUDAFunctor (normal, all CUDA kernels will have this), and CUDAFunctorOnOther/CUDAFunctorOnScalar (these are to support Tensor-Scalar specializations when the Scalar lives on CPU). Both Generic and ScalarOnly provide ufuncs for CUDAFunctor, but for us to also lift these into Tensor-Scalar specializations, the operator itself must be eligible for Tensor-Scalar specialization. At the moment, this is hardcoded to be all binary operators, but in the future we can use tags in native_functions.yaml to disambiguate (or perhaps expand codegen to handle n-ary operators). The reindexing process not only reassociates ufuncs by dtype, but it also works out if Tensor-Scalar specializations are needed and codegens the ufunctors necessary for the level of specialization here (`compute_ufunc_cuda_functors`). Generating the actual kernel (`compute_ufunc_cuda_dtype_body`) just consists of, for each specialization, constructing the functor and then passing it off to `gpu_kernel`. Most of the hard work is in functor generation, where we take care to make sure `operator()` has the correct input and output types (which `gpu_kernel` uses to arrange for memory accesses to the actual CUDA tensor; if you get these types wrong, your kernel will still work, it will just run very slowly!) There is one big subtlety with CUDA codegen: this won't work: ``` Generic: add (AllAndComplex, BFloat16, Half) ScalarOnly: add_bool (Bool) ``` This is because, even though there are separate Generic/ScalarOnly entries, we only generate a single functor to cover ALL dtypes in this case, and the functor has the ufunc name hardcoded into it. You'll get an error if you try to do this; to fix it, just make sure the ufunc is named the same consistently throughout. In the code, you see this because after testing for the short circuit case (when a user provided the functor themselves), we squash all the generic entries together and assert their ufunc names are the same. Hypothetically, if we generated a separate functor per dtype, we could support differently named ufuncs but... why would you do that to yourself. (One piece of nastiness is that the native_functions.yaml syntax doesn't stop you from shooting yourself in the foot.) A brief word about CUDA stubs: technically, they are not necessary, as there is no CPU/CPUKernel style split for CUDA kernels (so, if you look, structured impl actually calls add_kernel directly). However, there is some code that still makes use of CUDA stubs (in particular, I use the stub to conveniently reimplement sub in terms of add), so we still register it. This might be worth frying some more at a later point in time. Build system changes. If you are at FB, you should review these changes in fbcode, as there are several changes in files that are not exported to ShipIt. The build system changes in this patch are substantively complicated by the fact that I have to implement these changes five times: * OSS cmake build * OSS Bazel build * FB fbcode Buck build * FB xplat Buck build (selective build) * FB ovrsource Buck build Due to technical limitations in the xplat Buck build related to selective build, it is required that you list every ufunc header manually (this is done in tools/build_variables.bzl) The OSS cmake changes are entirely in cmake/Codegen.cmake there is a new set of files cpu_vec_generated (corresponding to UfuncCPUKernel files) which is wired up in the same way as other files. These files are different because they need to get compiled multiple times under different vectorization settings. I adjust the codegen, slightly refactoring the inner loop into its own function so I can use different base path calculation depending on if the file is traditional (in the native/cpu folder) or generated (new stuff from this diff. The Bazel/Buck changes are organized around tools/build_variables.bzl, which contain the canonical list of ufunc headers (aten_ufunc_headers), and tools/ufunc_defs.bzl (added to ShipIt export list in D34465699) which defines a number of functions that compute the generated cpu, cpu kernel and cuda files based on the headers list. For convenience, these functions take a genpattern (a string with a {} for interpolation) which can be used to easily reformat the list of formats in target form, which is commonly needed in the build systems. The split between build_variables.bzl and ufunc_defs.bzl is required because build_variables.bzl is executed by a conventional Python interpreter as part of the OSS cmake, but we require Skylark features to implement the functions in ufunc_defs.bzl (I did some quick Googling but didn't find a lightweight way to run the Skylark interpreter in open source.) With these new file lists, the rest of the build changes are mostly inserting references to these files wherever necessary; in particular, cpu kernel files have to be worked into the multiple vectorization build flow (intern_build_aten_ops in OSS Bazel). Most of the subtlety relates to selective build. Selective build requires operator files to be copied per overall selective build; as dhruvbird explains to me, glob expansion happens during the action graph phase, but the selective build handling of TEMPLATE_SOURCE_LIST is referencing the target graph. In other words, we can't use a glob to generate deps for another rule, because we need to copy files from wherever (included generated files) to a staging folder so the rules can pick them up. It can be somewhat confusing to understand which bzl files are associated with which build. Here are the relevant mappings for files I edited: * Used by everyone - tools/build_tools.bzl, tools/ufunc_defs.bzl * OSS Bazel - aten.bzl, BUILD.bazel * FB fbcode Buck - TARGETS * FB xplat Buck -BUCK, pt_defs.bzl, pt_template_srcs.bzl * FB ovrsource Buck - ovrsource_defs.bzl, pt_defs.bzl Note that pt_defs.bzl is used by both xplat and ovrsource. This leads to the "tiresome" handling for enabled backends, as selective build is CPU only, but ovrsource is CPU and CUDA. BTW, while I was at it, I beefed up fb/build_arvr.sh to also do a CUDA ovrsource build, which was not triggered previously. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31306586 Pulled By: ezyang fbshipit-source-id: 210258ce83f578f79cf91b77bfaeac34945a00c6 (cherry picked from commit d65157b0b894b6701ee062f05a5f57790a06c91c)	2022-03-01 00:33:40 +00:00
Shunting Zhang	763ad1bf25	(2/2) Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions: frontend change (#72899 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899 Reland D33282878 (`911d527b87`). This is the frontend change. ghstack-source-id: 149204031 Test Plan: Refer to D33282878 (`911d527b87`). Also check CI Reviewed By: gmagogsfm Differential Revision: D34252127 fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388 (cherry picked from commit 1d276baca308110ac40111ccd622400b3bbdc864)	2022-02-16 03:45:15 +00:00
Nikita Shulga	133461e5d6	Move CUDA linalg code to its own subfolder (#72304 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72304 This is a no-op change that simply moves files around in preparation of moving linear algebra in its own dynamically boundable module This also simplifies torch_cuda_cu build rules, as all files from linalg it needs are in its own folder now. Bazel CUDA rules are in some weird disarray(needed to add wildcard there as it ignores files mentioned in build_variables.so) and similar wildcard needs to be added to internal build system. Test Plan: Imported from OSS Reviewed By: dagitses, ngimel Differential Revision: D33992796 Pulled By: malfet fbshipit-source-id: 3f4fa1c224016d03e1a982a7ae5ac7807bc772e2 (cherry picked from commit 6a5a1b0c3f306a8915a45bcdf2c51f15b02d8a14)	2022-02-07 17:55:50 +00:00
mikey dagitses	286f5a51f9	move //c10:tests target to the shared //c10/test package (#70928 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70928 ghstack-source-id: 148159366 Test Plan: Ensured that the same number of tests are found and run. Reviewed By: malfet Differential Revision: D33455272 fbshipit-source-id: fba1e3409b14794be3e6fe4445c56dd5361cfe9d (cherry picked from commit b45fce500aa9c3f69915bf0857144ba6d268e649)	2022-02-03 20:14:57 +00:00
Peter Bell	847dbb8684	CMake: Clean up unused definitions (#69216 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69216 This cleans up 4 pre-processor defines not used by any code: - HAVE_GCC_GET_CPUID - USE_GCC_GET_CPUID - USE_AVX - USE_AVX2 `cpuid` isn't used in PyTorch any more, we only use `cpuinfo`. `USE_AVX` is also not used, instead `HAVE__CPU_DEFINITIONS` tells you which `CPU_CAPABILITY` flags are being compiled. There is also `fbgemm`'s code path adding `third_party` as an include path, despite `fbgemm` having a dedicated include directory and a CMake setup that properly includes it. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33794424 Pulled By: malfet fbshipit-source-id: 99d504af088818d4a26c2f6ce67ec0d59a5eb703 (cherry picked from commit 2e099d41f0e2f7d96c6013ac83223a75f4e4f862)	2022-01-31 22:49:11 +00:00
Peter Bell	2bb6a4f437	Generate aten_interned_strings.h automatically (#69407 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69407 This generates aten_interned_strings.h from `native_functions.yaml` which is more like how it was originally done. The items deleted from `interned_strings.h` are duplicates that need to be removed in order for the code to compile, some of the remaining items may still be out of date but it is fairly benign even if that's the case. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D32923636 Pulled By: albanD fbshipit-source-id: a0fd6b3714e70454c5f4ea9b19da5e047d2a4687	2022-01-18 08:29:54 -08:00
Han Qi	1bc3571078	[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201 Included functions: save_mobile_module -> saves a mobile::Module to flatbuffer load_mobile_module_from_file -> loads a flatbuffer into mobile::Module parse_mobile_module -> parses from bytes or deserialized flatbuffer module object Compared to previous attempts, this diff only adds flatbuffer to cmake target and leaves fbcode/xplat ones unchanged. Test Plan: unittest Reviewed By: malfet, gmagogsfm Differential Revision: D33239362 fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763	2022-01-12 16:30:39 -08:00
Peter Bell	e1aa5db108	Bazel: Only run ATen codegen once (#70147 ) Summary: Due to a merge conflict, the new bazel cuda build does something rather obnoxious. It runs ATen codegen with `--per-operator-headers` enabled and extracts a subset of the generated files; then calls it again without the flag to extract the CUDA files. This PR instead calls the codegen once but keeps track of what is CPU and what is CUDA in separate lists. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70147 Reviewed By: VitalyFedyunin Differential Revision: D33413020 Pulled By: malfet fbshipit-source-id: 4b502c38a209d1aa63d715e2336df6fc5aac2212	2022-01-05 06:56:52 -08:00
Peter Bell	c34aa715fa	AT_MKL_SEQUENTIAL and build changes (#70259 ) Summary: Re-land of https://github.com/pytorch/pytorch/pull/69419 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70259 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D33246757 Pulled By: ngimel fbshipit-source-id: 738f8558d4cad6752be14108f9931ec3514f6682	2021-12-22 13:52:23 -08:00
Thuyen Ngo	e35bf56461	[Bazel] Add CUDA build to CI (#66241 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/35316 On master, bazel cuda build is disabled due to lack of a proper `cu_library` rule. This PR: - Add `rules_cuda` to the WORKSPACE and forward `cu_library` to `rules_cuda`. - Use a simple local cuda and cudnn repositories (adopted from TRTorch) for cuda 11.3. - Fix current broken cuda build. - Enable cuda build in CI, not just for `:torch` target but all the test binaries to catch undefined symbols. Pull Request resolved: https://github.com/pytorch/pytorch/pull/66241 Reviewed By: ejguan Differential Revision: D31544091 Pulled By: malfet fbshipit-source-id: fd3c34d0e8f80fee06f015694a4c13a8e9e12206	2021-12-17 13:44:29 -08:00
Michael Dagitses	02c63c3006	extract out c10 targets to the c10 package (#69992 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69992 Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D33141013 fbshipit-source-id: e5edd6bd5b5834ac27390ba940ebed9148512c8d	2021-12-16 13:11:49 -08:00
Peter Bell	4829dcea09	Codegen: Generate seperate headers per operator (#68247 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247 This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and `NativeMetaFunctions.h` into seperate headers per operator base name. With `at::sum` as an example, we can include: ```cpp <ATen/core/sum.h> // Like Functions.h <ATen/core/sum_ops.h> // Like Operators.h <ATen/core/sum_native.h> // Like NativeFunctions.h <ATen/core/sum_meta.h> // Like NativeMetaFunctions.h ``` The umbrella headers are still being generated, but all they do is include from the `ATen/ops' folder. Further, `TensorBody.h` now only includes the operators that have method variants. Which means files that only include `Tensor.h` don't need to be rebuilt when you modify function-only operators. Currently there are about 680 operators that don't have method variants, so this is potentially a significant win for incremental builds. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32596272 Pulled By: albanD fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272	2021-12-14 06:40:08 -08:00
Peter Bell	b08d64202a	Remove THGeneral (#69041 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041 `TH_CONCAT_{N}` is still being used by THP so I've moved that into it's own header but all the compiled code is gone. Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872477 Pulled By: ngimel fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a	2021-12-13 16:14:28 -08:00
Yanan Cao	17f3179d60	Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796 (Note: this ignores all push blocking failures!) Test Plan: External CI + Sandcastle Reviewed By: zhxchen17 Differential Revision: D33032671 fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef	2021-12-10 21:29:53 -08:00
Peter Bell	9962bfb3c9	Remove THTensor (#69040 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69040 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872478 Pulled By: ngimel fbshipit-source-id: f93e16509d64308d91e374744410a6a811e7f4e3	2021-12-10 02:29:11 -08:00
Peter Bell	b2e79ed5ec	Remove WindowsTorchApiMacro.h in favor of Export.h (#69585 ) Summary: Follow up to https://github.com/pytorch/pytorch/issues/68095 This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585 Reviewed By: mrshenli Differential Revision: D32958594 Pulled By: albanD fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061	2021-12-09 17:30:09 -08:00

1 2 3 4 5

224 Commits