Commit Graph

54606 Commits

36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build-system-related changes; the more invasive functional ones are to follow.
Among the many expected tweaks to the build system, here are a few unexpected ones:
 - Force the onnx_proto project to be updated to C++17 to avoid a `duplicate symbols` error when compiled by gcc-7.5.0, as the storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when the CUDA runtime picks the host rather than the device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move` -> `::std::move`, as VC++ for some reason claims that the `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
f2d95765e4 [pthreadpool] Set max threadlimit to tsan limit (#89453)
Summary:
This makes sure we don't run into an internal assert for clang TSan, which caps the number of concurrently held locks at 63.
It seems to fail at 64 since the comparison is `<`, so we set the limit to 63 here.

```
llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))"
```

Created from CodeHub with https://fburl.com/edit-in-codehub

Test Plan:
CI

Sandcastle run

Reviewed By: kimishpatel, salilsdesai

Differential Revision: D41444710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89453
Approved by: https://github.com/mcr229
2022-12-08 02:02:53 +00:00
772b726068 Revert "Disable dynamo tracing torchrec.distributed (#90087)" (#90416)
This reverts commit 7e9a8a1361a090cee86544a3c029b9b4ed622e9c.

This revert fixes a torchbench dlrm AMP crash. The automatic revert failed due to a conflict.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90416
Approved by: https://github.com/yanboliang, https://github.com/malfet
2022-12-08 01:50:54 +00:00
00118f5c30 Fix issue 38095 TODO in test_jit_fuser_te.py (#90246)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90246
Approved by: https://github.com/clee2000
2022-12-08 01:39:26 +00:00
ad188a227e Introduce CUDA Device Assertions Infrastructure (#84609)
Summary:
This diff introduces a set of changes that makes it possible for the host to get assertions from CUDA devices. This includes the introduction of

**`CUDA_KERNEL_ASSERT2`**

A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.

Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible and a simpler design which holds only a single message may well be all that is necessary.

**`TORCH_DSA_KERNEL_ARGS`**

This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.

**`c10::cuda::get_global_cuda_kernel_launch_registry()`**

This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).

**`TORCH_DSA_KERNEL_LAUNCH`**

This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks that the kernel launches correctly. This abstraction on kernel launches can be modified to provide additional safety/logging.

**`c10::cuda::c10_retrieve_device_side_assertion_info`**
This host-side function checks, when called, that no kernel assertions have occurred. If one has, it raises an exception with:
1. Information (file, line number) about what kernel was launched.
2. Information (file, line number, message) about the device-side assertion.
3. Information (file, line number) about where the failure was detected.

**Checking for device-side assertions**

Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered

Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)
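
For context, here is a host-side illustration in plain Python (no new API from this diff assumed) of the failure mode described above: an out-of-bounds index trips a device-side assert, and the host only sees a generic error once it synchronizes.

```python
import torch

# Minimal repro sketch: indexing with an out-of-range index fires a
# device-side assert; the RuntimeError surfaces at the next synchronization.
if torch.cuda.is_available():
    x = torch.zeros(3, device="cuda")
    bad_idx = torch.tensor([10], device="cuda")
    try:
        print(x[bad_idx].sum().item())  # .item() forces a device synchronization
    except RuntimeError as err:
        print(err)  # typically "CUDA error: device-side assert triggered ..."
```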

# Notes on special cases

* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two processes are using the same GPU and one of them fails with a device-side assertion, the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should be all be shown upon exit, but we've been unable to generate a test that produces this condition

Differential Revision: D37621532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-08 01:26:07 +00:00
f99f239531 Fix issue 38095 TODOs in gloo tests (#89985)
Fix TODOs related to https://github.com/pytorch/pytorch/issues/38095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89985
Approved by: https://github.com/ZainRizvi
2022-12-08 01:12:37 +00:00
1ba94b3882 Support pickle version 4 by adding missing ops (#90223)
Summary:
In this logic, we are traversing the entries to find the module for STACK_GLOBAL entries.

According to 2837241f22/Lib/pickletools.py (L1799) we need to look for GET, BINGET and LONG_BINGET.

This diff updates that. While testing, I also found some cases of empty modules, e.g. for ops such as tanh; for these I added the option to skip processing.
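
For illustration, a minimal Python sketch (a hypothetical standalone helper, not the internal implementation) of why GET, BINGET and LONG_BINGET all have to be treated as memo reads when resolving the module operand of STACK_GLOBAL:

```python
import collections
import pickle
import pickletools

def stack_globals(payload: bytes):
    """Yield (module, name) pairs referenced via STACK_GLOBAL opcodes."""
    memo, stack = {}, []
    for opcode, arg, _pos in pickletools.genops(payload):
        op = opcode.name
        if op in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            stack.append(arg)                      # string literal pushed on the stack
        elif op in ("MEMOIZE", "PUT", "BINPUT", "LONG_BINPUT"):
            key = len(memo) if op == "MEMOIZE" else arg
            if stack:
                memo[key] = stack[-1]              # remember the top of the stack
        elif op in ("GET", "BINGET", "LONG_BINGET"):
            stack.append(memo.get(arg))            # all three push a memoized value back
        elif op == "STACK_GLOBAL" and len(stack) >= 2:
            module, name = stack[-2], stack[-1]
            stack = stack[:-2] + [None]            # STACK_GLOBAL consumes both operands
            if module is not None and name is not None:
                yield module, name

payload = pickle.dumps(collections.OrderedDict)    # pickled as a global reference
print(list(stack_globals(payload)))                # e.g. [('collections', 'OrderedDict')]
```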

Test Plan: Tested with f392778829

Differential Revision: D41748595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90223
Approved by: https://github.com/PaliC
2022-12-08 01:06:40 +00:00
d5c6a74699 Rewrite dynamo cond() handling to not recursively call export (#90286)
The original implementation of cond() operator support in dynamo operated by recursively calling export() on the inner subgraph.  This is problematic for a number of reasons:

* My original motivating reason: the original implementation had to play tricks to feed real tensors to the recursive export call, which means that it doesn't work well with tracing with dynamic shapes (where we MUST stay in fake tensors to accurately track dynamic shapes across the cond invocation)
* If there are pending side effects, the recursive export() call won't see those side effects (as they are only tracked by Dynamo, not actually applied to the Python environment.) You can see an example where dynamo cond tracing does the wrong thing at https://github.com/pytorch/pytorch/pull/90208
* If there were side effects inside the true/false branch, these side effects were silently lost (as the export only returns the graph of tensor operations, and not any of the residual Python bytecodes necessary to reapply any side effects.) This could have substantive effects on the export of subsequent parts of the model, as those parts of the models could rely on the side effects.
* It was not possible to track NN module accesses inside the true/false branches, necessitating a hack where the NN module was explicitly passed in as an input to cond https://github.com/pytorch/pytorch/pull/87020#issuecomment-1338842844 which doesn't really make any sense from a backend compilation perspective
* Guards induced from the inside of the true/false branch were not properly propagated to the top level guards; they were just silently dropped (in fact, the original implementation checked that the true/false branch produce the same guards which... is not useful? Like, I don't think that actually is even necessary for correctness)

This PR replaces the old implementation with a new implementation based on graphstate checkpointing. The basic idea is that to process a cond(), we checkpoint the state of our interpreter, run the true branch, roll back to our checkpoint, run the false branch, roll back to our checkpoint, and then merge the changes from both of the checkpoints. I require the true/false branches to have exactly the same side effects, but union their guards.
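
As a concrete reference point, a small usage sketch of the kind of program this tracing has to handle (the `functorch.experimental.control_flow` import path reflects the API of this era and is an assumption here):

```python
import torch
import torch._dynamo as dynamo
from functorch.experimental.control_flow import cond

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

def f(pred, x):
    # Both branches are traced speculatively against the same checkpointed
    # graph state; their guards are unioned into the top-level guards.
    return cond(pred, true_fn, false_fn, [x])

opt_f = dynamo.optimize("eager")(f)
print(opt_f(torch.tensor(True), torch.randn(4)))
```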

Some of the details:

* Dynamo is too aggressive with tracking side effects when processing closures, c.f. https://github.com/pytorch/torchdynamo/pull/233/files#r1040480078 The basic problem is whenever I define a closure, this immediately counts as a side effect, even if I didn't actually mutate anything. This triggered on the nested cond export example. To prevent this from happening, I optimistically avoid tracking side effects, but if a STORE_DEREF happens, I restart analysis with the relevant Source.name() added to `mutated_closure_cell_contents` so we start tracking on closure allocation. This is enough to fix the relevant test.
* For the most part, I assert that the graph states must be equivalent after applying the true/false branches. During debugging, I found it useful to be able to compare two graph states and give a better description about what the divergence was. You can test this using the `diff()` method I've added to a few structures.
* The implementation now supports NestedUserFunctionVariable, which is nice as it allows the true/false branches to be defined closer to the cond implementation.
* I fixed the naming of the true/false subgraphs; previously they were named `name_0`, `name_1`, now they are named `cond_true_0` and `cond_false_0`
* I added `name_to_input` to the saved graph state. I don't actually know if this is necessary, but it seemed like a good idea.
* I have to play some tricks to get the speculating execution of the true/false branch to record into a subgraph. After a careful read of OutputGraph, I found that what would work is overriding graph with a fresh Graph that we want to write things into, and manually setting up the inputs/outputs. It's a little delicate as you have to make sure you reset the Graph to its original before you restore a checkpoint, as checkpoints don't actually save graph for efficiency, and just undo changes on the graph. This capability may usefully get refactored to OutputGraph but I didn't do it in this PR for simplicity.

There are some further problems with the cond() implementation that I leave for future work. Most of these were preexisting with the original implementation.

* Not a problem per se, but if an NN module is used by both the true/false branch, it will show up in the final graph twice (since it has to be a submodule of the GraphModule that makes use of it.) I hope the export pipeline can deal with this.
* List of tensor output for cond is not supported.
* The true/false return values may not have consistent sizes/dims/etc, and we don't check them for consistency.
* If we modify fake tensors in the true/false branches, we aren't rolling them back, c.f. https://github.com/pytorch/torchdynamo/issues/1840

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90286
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
54d344b0b7 Type torch._dynamo.side_effects (#90202)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90202
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
ca5f69ef19 Convert InstructionTranslatorGraphState and OutputGraphState to NamedTuple (#90186)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90186
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
1119aac485 Type torch._dynamo.symbolic_convert (#90185)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90185
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
7abd035b2f Add missing mypy-nofollow.ini (#90179)
I'm not sure how lintrunner worked without this lol.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90179
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
47071c3d47 [quant] Add support for symmetric quant in executorch (#90304)
Summary:
This PR adds symmetric quant in the backend config for executorch

Test Plan:
NA, will be tested in meta internal flow

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90304
Approved by: https://github.com/cccclai, https://github.com/jcaip, https://github.com/andrewor14
2022-12-08 01:03:00 +00:00
9f7bc7bc24 Revert "[Quant][fx][bc-breaking] Make convert.py smaller (#90189)"
This reverts commit 824641b083860df4d7ffef06a798ea2702bc4bde.

Reverted https://github.com/pytorch/pytorch/pull/90189 on behalf of https://github.com/seemethere due to Fails internal tests due to potential circular import, see https://www.internalfb.com/diff/D41817429?dst_version_fbid=1453307181865235&transaction_fbid=899728221278938
2022-12-08 00:51:13 +00:00
d7c30e11c6 [inductor] Remove .to from lowering (#90280)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90280
Approved by: https://github.com/ngimel
2022-12-08 00:40:41 +00:00
b8b439aede C++17 friendly iterator implementation (#90379)
Get rid of std::iterator inheritance/references for `c10::DictIterator`, `c10::IListRefIterator` and `c10::ListIterator`

Followup after https://github.com/pytorch/pytorch/pull/90174

Fixes deprecation warnings and extension compilation failures with VC++,
which raises the following errors:
```
C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\ATen/core/IListRef.h(517): error C4996: 'std::iterator<std::bidirectional_iterator_tag,T,ptrdiff_t,T *,T &>::value_type': warning STL4015: The std::iterator class template (used as a base class to provide typedefs) is deprecated in C++17. (The <iterator> header is NOT deprecated.) The C++ Standard has never required user-defined iterators to derive from std::iterator. To fix this warning, stop deriving from std::iterator and start providing publicly accessible typedefs named iterator_category, value_type, difference_type, pointer, and reference. Note that value_type is required to be non-const, even for constant iterators. You can define _SILENCE_CXX17_ITERATOR_BASE_CLASS_DEPRECATION_WARNING or _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS to acknowledge that you have received this warning.

C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\include\ATen/core/List.h(169): error C4996: 'std::iterator<std::random_access_iterator_tag,T,ptrdiff_t,T *,T &>::difference_type': warning STL4015: The std::iterator class template (used as a base class to provide typedefs) is deprecated in C++17. (The <iterator> header is NOT deprecated.) The C++ Standard has never required user-defined iterators to derive from std::iterator. To fix this warning, stop deriving from std::iterator and start providing publicly accessible typedefs named iterator_category, value_type, difference_type, pointer, and reference. Note that value_type is required to be non-const, even for constant iterators. You can define _SILENCE_CXX17_ITERATOR_BASE_CLASS_DEPRECATION_WARNING or _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS to acknowledge that you have received this warning.

```

Discovered while working on https://github.com/pytorch/pytorch/pull/85969
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90379
Approved by: https://github.com/ezyang, https://github.com/dagitses
2022-12-08 00:30:20 +00:00
5351176caa Kineto activity fix (#89785)
Continuation of https://github.com/pytorch/pytorch/pull/88207

A compile-time guard was preventing ActivityType::CUDA from being available on ROCm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time, so operators were being charged GPU time for both the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) CUDA times in, e.g., table().

Previously, a CMake variable was not being propagated to a '-D' define, causing an issue on Windows, which uses CUDA but not CUPTI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89785
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-12-08 00:24:55 +00:00
79406378ae [primTorch] Add prim and ref for as_strided_scatter (#88426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88426
Approved by: https://github.com/mruberry
2022-12-08 00:17:39 +00:00
1f137c1e2f add save and load stats in memory_tracker (#90144)
Add saving and loading of stats in memory_tracker so that users can plot the traces elsewhere rather than only inside the trainer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90144
Approved by: https://github.com/rohan-varma
2022-12-08 00:17:21 +00:00
bc93454e4a correctly set strides for expanded/unsqueezed dimensions (#90341)
Fixes https://github.com/pytorch/torchdynamo/issues/1959, #90260
However, I wasn't able to make the existing stride tests fail before the fix, even though I'm comparing all strides, not just the significant ones.
Separately, running refs on meta tensors produces wrong strides as shown in #90260; however, it looks like the meta tests use some other way of computing meta info (I've been running
```
pytest -s -v test/test_meta.py -k test_meta_outplace_expand_cuda_float64
```
and verified that it has a sample input that should fail, and that it indeed compares all the strides, but the produced `meta_rs` results somehow still had correct strides).

Edit: @SherlockNoMad helped me figure out how to make the tests fail, and now I've set the correct ops for checking. `expand` fails for some test inputs because it special-cases the 0-dim input case; correctly modeling it in prims would require a lot of changes, so I'm skipping that for now.
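
A small illustration of the invariant being fixed (a hedged sketch, not the test added here): expanded dimensions get stride 0, and meta/ref results should report the same full set of strides as eager.

```python
import torch

x = torch.randn(3, 1).expand(3, 4)
print(x.stride())   # (1, 0): the broadcast dimension has stride 0

m = torch.empty(3, 1, device="meta").expand(3, 4)
print(m.stride())   # should match the eager strides exactly, not just the significant ones
```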

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90341
Approved by: https://github.com/SherlockNoMad
2022-12-07 23:38:33 +00:00
50ec416599 Fix C2 Ambiguous namespace (#89534)
Summary: cuda:: is an ambiguous namespace. Make it explicitly c10::cuda

Differential Revision: D41469007
```
/caffe2/caffe2/core/context_gpu.cu(564): error: "caffe2::cuda" is ambiguous
/caffe2/caffe2/core/context_gpu.cu(564): error: expected a ";"
/caffe2/caffe2/core/context_gpu.cu(568): warning #12-D: parsing restarts here after previous syntax error
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
/caffe2/caffe2/core/context_gpu.cu(569): error: "caffe2::cuda" is ambiguous
/caffe2/caffe2/core/context_gpu.cu(628): error: "caffe2::cuda" is ambiguous
4 errors detected in the compilation of "/caffe2/caffe2/core/context_gpu.cu".
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89534
Approved by: https://github.com/malfet
2022-12-07 23:36:41 +00:00
56ab94d6e4 [Vulkan][TCC] Add tests for quantized convolution with QUInt8 activation, weights and bias (#90012)
Summary:
- Registered vulkan_prepack::create_qconv2d_context to the QuantizedCPU backend.
- Registered vulkan_prepack::run_qconv2d_context to the Vulkan backend.
- Added function test_quantized_conv2d to test Vulkan quantized conv2d with QUInt8 activation, weight and bias (all QUInt8).
- Added multiple tests for Vulkan quantized conv2d (regular, depthwise and pointwise). All these tests make use of the test_quantized_conv2d function.

This function tests the correctness of Vulkan quantized conv2d by comparing the following two processes (starting from randomly generated float CPU tensors):
- random float cpu tensors -> to vulkan -> quantize them -> apply vulkan conv2d quantized op -> dequantize result -> to cpu
- random float cpu tensors -> quantize them -> dequantize -> apply cpu floating point conv2d op on dequantized tensors -> quantize result -> dequantize

This function takes three boolean flags that modify its behavior:
- prepacking:
  - if false, then we directly call at::native::vulkan::ops::quantized_conv2d
  - if true, then we call vulkan_prepack::create_qconv2d_context and vulkan_prepack::run_qconv2d_context.
- compute_quantization_params & random_quantization_params:
  - if both are false, all quantization params are fixed (given as input)
  - if compute_quantization_params is true, all params are computed
  - if random_quantization_params is true, the input params are random and the output params are computed.
(compute_quantization_params takes precedence over random_quantization_params)
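
For reference, a CPU-only Python sketch of the second (reference) pipeline above; the actual test is a C++ Vulkan gtest, and the fixed quantization parameters here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)                 # random float CPU activation
w = torch.rand(4, 3, 3, 3)                 # random float CPU weight
b = torch.rand(4)

in_scale, in_zp = 0.1, 10                  # fixed input quantization params
out_scale, out_zp = 0.2, 10                # fixed output quantization params
qx = torch.quantize_per_tensor(x, in_scale, in_zp, torch.quint8)
qw = torch.quantize_per_tensor(w, in_scale, in_zp, torch.quint8)

# quantize -> dequantize -> float conv2d -> quantize result -> dequantize
ref = F.conv2d(qx.dequantize(), qw.dequantize(), b)
ref = torch.quantize_per_tensor(ref, out_scale, out_zp, torch.quint8).dequantize()
# The Vulkan pipeline (to vulkan -> quantize -> quantized conv2d -> dequantize -> to cpu)
# is then compared against `ref` for approximate equality.
```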

Test Plan:
On Mac
```
cd ~/fbsource
buck1 run -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64
```

On Android
```
cd ~/fbsource
buck1 build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_quantized_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_quantized_api_test
adb shell "/data/local/tmp/vulkan_quantized_api_test"
```

Reviewed By: SS-JIA

Differential Revision: D41047096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90012
Approved by: https://github.com/salilsdesai
2022-12-07 23:21:57 +00:00
e0f681aa85 Add manual cuda deps search logic (#90411)
If PyTorch is packaged into a wheel alongside [nvidia-cublas-cu11](https://pypi.org/project/nvidia-cublas-cu11/), which is designated as a purelib while the `torch` wheel is not, this can cause a torch_globals loading problem.

Fix that by searching for `nvidia/cublas/lib/libcublas.so.11` and `nvidia/cudnn/lib/libcudnn.so.8` across all `sys.path` folders.
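
A minimal Python sketch of the idea (not necessarily the exact code added here): walk `sys.path` for the nvidia wheel layout and preload the libraries so later loads resolve their symbols.

```python
import ctypes
import os
import sys

def preload_cuda_deps():
    # Hypothetical helper: look for the pip-installed nvidia libraries in every
    # sys.path entry and load them globally before torch's own libraries.
    for rel in ("nvidia/cublas/lib/libcublas.so.11", "nvidia/cudnn/lib/libcudnn.so.8"):
        for base in sys.path:
            candidate = os.path.join(base, rel)
            if os.path.exists(candidate):
                ctypes.CDLL(candidate, mode=ctypes.RTLD_GLOBAL)
                break
```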

Test plan:
```
docker pull amazonlinux:2
docker run --rm -t amazonlinux:2 bash -c 'yum install -y python3 python3-devel python3-distutils patch;python3 -m pip install torch==1.13.0;curl -OL https://patch-diff.githubusercontent.com/raw/pytorch/pytorch/pull/90411.diff; pushd /usr/local/lib64/python3.7/site-packages; patch -p1 </90411.diff; popd; python3 -c "import torch;print(torch.__version__, torch.cuda.is_available())"'
```

Fixes https://github.com/pytorch/pytorch/issues/88869

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90411
Approved by: https://github.com/atalman
2022-12-07 23:06:51 +00:00
3ef4fc2012 Automated submodule update: FBGEMM (#74729)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f99e161663

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74729
Approved by: https://github.com/malfet
2022-12-07 22:36:35 +00:00
ecd418673b [FSDP][Easy] ufmt files (#90384)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90384
Approved by: https://github.com/H-Huang
2022-12-07 21:18:23 +00:00
32973651e6 [Vulkan] Enable copying QInt8 and QInt32 tensors from cpu to vulkan. (#90357)
Summary:
Copying QInt8 and QInt32 from cpu to vulkan:
 - Added shader nchw_to_image_int8
 - Added shader nchw_to_image_int32

Copying QInt8 and QInt32 from vulkan to cpu
Note: This functionality is currently disabled until issues on Android are resolved.
- Added shader image_to_nchw_int32
- QInt8 works with the same existing image_to_nchw_quantized shaders

Added multiple tests for each supported dtype:
- cpu_to_vulkan_and_dequantize:
These tests check the correctness of copying quantized cpu tensor to vulkan by comparing the output of the following:
  - cpu float tensor -> quantize -> to vulkan -> dequantize -> to cpu
  - cpu float tensor -> quantize -> dequantize
- cpu_to_vulkan_and_vulkan_to_cpu
(currently disabled until copying vulkan quantized to cpu is enabled):
These tests check the correctness of copying from cpu to vulkan and from vulkan to cpu by creating a random cpu float tensor, quantizing it, then copying it to vulkan, then back to cpu and comparing the output tensor to the original quantized tensor.
- quantize_per_tensor_and_vulkan_to_cpu
(currently disabled until copying vulkan quantized to cpu is enabled):
These tests check the correctness of copying quantized tensor from vulkan to cpu by comparing the output of the following:
  - cpu float tensor -> to vulkan -> quantize -> to cpu
  - cpu float tensor -> quantize

Test Plan:
On Mac
```
cd ~/fbsource
buck1 run -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64
```

On Android
```
cd ~/fbsource
buck1 build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_quantized_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_quantized_api_test
adb shell "/data/local/tmp/vulkan_quantized_api_test"
```

Reviewed By: kimishpatel

Differential Revision: D41654287

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90357
Approved by: https://github.com/SS-JIA
2022-12-07 21:17:35 +00:00
a076bdb357 [fx] Copy codegen in legalize_graph (#90023)
Test Plan: CI

Differential Revision: D41666330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90023
Approved by: https://github.com/SherlockNoMad
2022-12-07 21:09:38 +00:00
6dcc214ac2 Fix AssertionError fake_mode is not None in distributed (#90392)
Fixes https://github.com/pytorch/pytorch/issues/90375

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90392
Approved by: https://github.com/voznesenskym
2022-12-07 20:12:39 +00:00
2ad6ed8ac9 Fix some typed storage is deprecated warnings. (#89867)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
1b1301f16a Revert "[pruning][core][feature] Implement prune for structured pruning (#89777)"
This reverts commit 3531e44307fa58460e2488bcaace948678d6cf9f.

Reverted https://github.com/pytorch/pytorch/pull/89777 on behalf of https://github.com/clee2000 due to breaking test_ao_sparsity because of an import 3531e44307 https://github.com/pytorch/pytorch/actions/runs/3641476330/jobs/6147830487, probably a landrace with 824641b083860df4d7ffef06a798ea2702bc4bde?
2022-12-07 19:41:15 +00:00
44779d9bc6 [FSDP][optim_state_dict][2/N] Add _get_fqn_to_fsdp_param_info to map from original FQN to flat_param (#89899)
**Motivation:**
Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89899
Approved by: https://github.com/awgu
2022-12-07 19:40:47 +00:00
f7cdd3a7a0 [inductor] Use a large tolerance for botnet26t_256 (#90383)
Summary: botnet26t_256 shows random tolerance failures on CI. The root
cause of this randomness is still to be investigated, but let's use a
larger tolerance for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90383
Approved by: https://github.com/ezyang
2022-12-07 19:35:06 +00:00
2b0b4bb6fd [Dynamo] Fix llvm target for meta schedule & add torch to tvm ndarray helper func (#90214)
Fixes #90213. Also adds a torch.Tensor-to-tvm.nd.array helper function to avoid a data copy, via DLPack.

@jansel @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90214
Approved by: https://github.com/wconstab
2022-12-07 19:23:56 +00:00
6a7659f304 Fix issue 38095 TODO in test_autograd.py (#90031)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90031
Approved by: https://github.com/clee2000
2022-12-07 19:09:43 +00:00
4b1053497c [vmap] Prepend "legacy" to files for old vmap implementation (#90324)
We have an older torch.vmap implementation. It is no longer supported.
It still needs to exist somewhere for the sake of BC with
torch.autograd.functional.

This PR makes it clear which files implement the old vmap. I've seen a
couple of recent PRs adding support to the old vmap implementation, so
this should lessen the confusion.

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90324
Approved by: https://github.com/samdow
2022-12-07 18:46:15 +00:00
94d800ffd1 Make Transformers compilable by C++17 (#90389)
The `register` keyword is removed in C++17, but it is kept there under an ifdef
as I have not measured the perf implications on older compilers, though
there shouldn't be any: all modern compilers are supposed to simply
ignore it.

This code originates from https://github.com/facebookresearch/xformers/pull/375; I will propose a similar PR to remove the register keyword usage in that repo.

Yet another thing discovered while working on https://github.com/pytorch/pytorch/pull/85969

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90389
Approved by: https://github.com/drisspg
2022-12-07 18:10:44 +00:00
3531e44307 [pruning][core][feature] Implement prune for structured pruning (#89777)
Summary:

This PR implements `prune` in BaseStructuredSparsifier:

`prune` is a function that takes in a model with structured sparsity parametrizations (the result of `prepare`) and returns a resized model with the masked-out weights removed.

`prune` is defined by a mapping from **patterns** to different **pruning functions**.
	- **patterns** are just sequences of operations, for example `(nn.Linear, activation, nn.Linear)`
	- **pruning functions** are functions that take in a matched pattern as args and will resize the appropriate layers and weights.
	  ```
	  def prune_linear_activation_linear(linear1, activation, linear2):
		pass
	  ```
	- This is one line in the pattern config `(nn.Linear, activation, nn.Linear): prune_linear_activation_linear`

At a high level `prune` works by finding instances of the graph that match different patterns and then calling the mapped pruning functions on those matched patterns.
This is unlike the previous code which attempted to do both at the same time.

There may be some gaps in the patterns compared to the previous implementation, but the supported conversion functionality should be the same.

Currently we have pruning functions for the following patterns:
    - linear -> linear
    - linear -> activation -> linear
    - conv2d -> conv2d
    - conv2d -> activation -> conv2d
    - conv2d -> activation -> pool -> conv2d
    - conv2d -> pool -> activation -> conv2d
    - conv2d -> adaptive pool -> flatten -> linear

Also added mypy type hints for the prune functions.
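
As an illustration, a hedged sketch of what one entry of such a pattern config and its pruning function might look like (names and the channel-selection logic are illustrative, not the exact helpers in BaseStructuredSparsifier):

```python
import torch.nn as nn

def prune_linear_activation_linear(linear1: nn.Linear, activation: nn.Module, linear2: nn.Linear) -> None:
    # Keep only the output channels of linear1 whose weights were not fully masked out,
    # then shrink the matching input channels of linear2 so the shapes still compose.
    keep = linear1.weight.abs().sum(dim=1) != 0
    linear1.weight = nn.Parameter(linear1.weight[keep])
    if linear1.bias is not None:
        linear1.bias = nn.Parameter(linear1.bias[keep])
    linear1.out_features = int(keep.sum())
    linear2.weight = nn.Parameter(linear2.weight[:, keep])
    linear2.in_features = int(keep.sum())

# One line of the pattern config: a matched (Linear, activation, Linear) chain
# is dispatched to the pruning function above.
PATTERN_CONFIG = {(nn.Linear, nn.ReLU, nn.Linear): prune_linear_activation_linear}
```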

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89777
Approved by: https://github.com/vkuzo
2022-12-07 17:52:01 +00:00
d680ea7e36 [quant]Fix public bindings for DTypeWithConstraint (#90315)
Summary:
Need this to fix `test_public_bindings`.

Test Plan:
`python test/test_public_bindings.py`
Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90315
Approved by: https://github.com/HDCharles
2022-12-07 17:52:01 +00:00
4cdc96fb4f Add hooks structure for passing around user provided hooks, add a new guard_failure_fn (#90371)
This PR introduces a new function we can pass to torch._dynamo.optimize - guard_failure_fn. Usage is shown in the PR and the one stacked on top of it, but the gist is that it emits failed-guard reason strings alongside the code. This is useful for tests and debugging, as it gives far finer-grained assertions and control than the compile counter alone.

This is a resubmit of https://github.com/pytorch/pytorch/pull/90129
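
A hedged usage sketch based on the description above (the exact keyword name and callback payload are assumptions): collect the reasons guards failed so a test can assert on them instead of only counting recompiles.

```python
import torch
import torch._dynamo as dynamo

failures = []

def on_guard_failure(failure):
    failures.append(str(failure))  # failed-guard reason string plus the offending code

@dynamo.optimize("eager", guard_fail_fn=on_guard_failure)  # keyword name assumed
def fn(x):
    return x + 1

fn(torch.randn(3))
fn(torch.randn(4, 4))  # a shape change can fail a guard and trigger recompilation
print(failures)
```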

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90371
Approved by: https://github.com/ezyang
2022-12-07 17:51:53 +00:00
c92cf6bee3 [BE][CI] Add windows test run instructions (#90388)
Specifies how to activate Visual Studio and Anaconda and set `PYTHONPATH` to run tests in CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90388
Approved by: https://github.com/atalman, https://github.com/ZainRizvi
2022-12-07 17:41:54 +00:00
824641b083 [Quant][fx][bc-breaking] Make convert.py smaller (#90189)
Summary: This commit moves helper functions that are not core
to the convert logic out of convert.py, which was more than
1000 lines. This helps with readability since a new developer
won't have to scroll through hundreds of lines of util functions
to understand the core logic. There should be no change in
functionality in this commit.

BC-breaking notes: The following helper functions that were
previously exposed under the `torch.ao.quantization.fx.convert`
namespace are now made private. Many of these are moved to the
new convert_utils.py
```
convert_custom_module
convert_standalone_module
convert_weighted_module
get_module_path_and_prefix,
has_none_qconfig,
insert_dequantize_node,
is_conversion_supported,
maybe_recursive_remove_dequantize,
replace_observer_or_dequant_stub_with_dequantize_node,
restore_state,
run_weight_observers,
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90189
Approved by: https://github.com/jerryzh168
2022-12-07 16:16:25 +00:00
99fb39f508 reland #89243: [Composable API] replicate: add support for DDP args (#90255)
reland https://github.com/pytorch/pytorch/pull/89243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90255
Approved by: https://github.com/zhaojuanmao
2022-12-07 15:22:33 +00:00
e6a7278753 Give std/var correction overloads proper defaults (#56398)
The correction overloads' defaults were left off for forward-compatibility (FC)
reasons, but that FC window expired well over a year ago at this point.
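
A quick sketch of the equivalences involved (the defaults themselves are what this PR restores): `correction=1` is Bessel's correction and matches `unbiased=True`, while `correction=0` matches `unbiased=False`.

```python
import torch

x = torch.randn(5, 3)
assert torch.allclose(torch.var(x, dim=0, correction=1), torch.var(x, dim=0, unbiased=True))
assert torch.allclose(torch.var(x, dim=0, correction=0), torch.var(x, dim=0, unbiased=False))
# With the default in place, omitting correction behaves like correction=1.
assert torch.allclose(torch.var(x, dim=0), torch.var(x, dim=0, correction=1))
```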

Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398
Approved by: https://github.com/mruberry
2022-12-07 15:15:00 +00:00
b0bd5c4508 [MPS] Fix median_out_mps caching (#90326)
We should cache the graph based on the input tensor type

Fixes https://github.com/pytorch/pytorch/issues/90311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90326
Approved by: https://github.com/kulinseth
2022-12-07 07:24:58 +00:00
85ae28b454 Reformat optim import (#90294)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90294
Approved by: https://github.com/awgu
2022-12-07 07:11:12 +00:00
15949fc248 [ROCm] Enable few test_prim UTs for ROCm (#88983)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88983
Approved by: https://github.com/IvanYashchuk, https://github.com/jeffdaily, https://github.com/malfet
2022-12-07 06:21:31 +00:00
26d1dbc4f8 [inductor] More correct check for fbcode environment (#90312)
Summary:
importing torch.fb seemed like a good idea, but we don't always have
torch.fb inside fbcode.  Testing for torch.version.git_version is more
reliable, since we'll never have a git_version inside fbcode, which is an hg
repo.
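
A minimal sketch of the check described (the helper name is an assumption): fbcode builds come from an hg checkout, so the absence of `torch.version.git_version` is a reliable signal.

```python
import torch

def is_fbcode() -> bool:
    # OSS builds are produced from a git checkout and expose git_version;
    # fbcode builds come from an hg repo and do not.
    return not hasattr(torch.version, "git_version")
```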

Test Plan: `buck2 run mode/dev-nosan //caffe2/test/inductor:smoke`

Reviewed By: soumith, jansel

Differential Revision: D41777058

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90312
Approved by: https://github.com/soumith
2022-12-07 04:50:11 +00:00
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation of #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
8f079b895b [Dynamo+FSDP] Update benchmarks with use_orig_params=True (#90100)
After https://github.com/pytorch/pytorch/pull/89523, we now need to assert use_orig_params=True, even in the non-recursive case where (I think) we wouldn't otherwise need to run with use_orig_params=True.

Tested with `python benchmarks/dynamo/torchbench.py --training --accuracy --only hf_T5 --fsdp`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90100
Approved by: https://github.com/wconstab
2022-12-07 03:33:58 +00:00
898b46d6cc [Dynamo][Easy] capture more exceptions when import skip modules (#90338)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90338
Approved by: https://github.com/williamwen42
2022-12-07 02:05:39 +00:00