Summary: Introduce nlohmann/json as a submodule within pytorch/third_party. This library is already a transitive dependency and is included in our licenses file. Adding it directly to third_party will enable its use by the CoreML backend.
Test Plan: There are no code changes, so run a submodule sync and perform the steps outlined in the "Building from source" section of the PyTorch README.
Differential Revision: D37449817
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80322
Approved by: https://github.com/mcr229
This functionality does not seem to be used,
and there are some requests to update the dependency.
Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201
Included functions:
save_mobile_module -> saves a mobile::Module to a flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into a mobile::Module
parse_mobile_module -> parses a mobile::Module from raw bytes or from a deserialized flatbuffer module object
Compared to previous attempts, this diff only adds flatbuffer to the cmake target and leaves the fbcode/xplat ones unchanged.
Test Plan: unittest
Reviewed By: malfet, gmagogsfm
Differential Revision: D33239362
fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake-based build, so we can add breakpad as a proper dependency rather than relying on including it in Docker images as a system library, which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.
```python
import torch
# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()
# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186
Reviewed By: malfet, seemethere
Differential Revision: D30318404
Pulled By: driazati
fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
Summary:
Using https://github.com/mreineck/pocketfft
Also delete the explicit installation of pocketfft during the build, as it will be available via the submodule.
Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5
Partially addresses https://github.com/pytorch/pytorch/issues/62821
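For reference, the calls that this submodule serves are the ordinary `torch.fft` CPU FFTs; a minimal illustration (not part of this PR's changes; whether PocketFFT or another backend such as MKL handles the call depends on the build configuration):
```python
import torch

# CPU FFTs go through torch.fft; on builds where PocketFFT is the CPU FFT
# backend, calls like these are the ones it serves.
x = torch.randn(1024)
spectrum = torch.fft.rfft(x)                      # real-to-complex forward FFT
roundtrip = torch.fft.irfft(spectrum, n=x.numel())
assert torch.allclose(x, roundtrip, atol=1e-6)
```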
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841
Reviewed By: seemethere
Differential Revision: D30140441
Pulled By: malfet
fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
Summary:
This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using the cuDNN v8 API. It is supposed to have faster release cycles, so that, for example, if people find that a specific kernel has a bug, they can report it, that kernel will be blocked in the cuDNN frontend, and frameworks can simply update the submodule without waiting for a whole cuDNN release.
The work is not complete, and this PR is only step 0.
**What this PR does:**
- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in `Conv_v8.cpp`, which are disabled by default behind a macro.
- Tested manually by enabling the macro and running `test_nn.py`. All tests pass except those mentioned below.
**What this PR doesn't:**
- Only convolution forward, no backward. The backward will use the v7 API.
- No 64-bit-indexing support for some configurations. This is a known issue in cuDNN and will be fixed in a later cuDNN version. PyTorch will not implement any workaround for this issue; instead, the v8 API should be disabled on problematic cuDNN versions.
- No test beyond PyTorch's unit tests.
- Not tested for correctness on real models.
- Not benchmarked for performance.
- The benchmark cache is not thread-safe. (This is marked as `FIXME` in the code and will be fixed in a follow-up PR.)
- cuDNN benchmark is not supported.
- There are failing tests, which will be resolved later:
```
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
```
Although this is not a complete implementation of the cuDNN v8 API binding, I still want to merge this first. This would allow me to do small, incremental work, for ease of development and review.
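For orientation, the convolution forward bindings in `Conv_v8.cpp` sit behind ordinary CUDA convolution calls such as the following (a minimal sketch with illustrative tensors; while the macro stays disabled, these calls continue to go through the existing v7 path):
```python
import torch
import torch.nn.functional as F

# An ordinary cuDNN convolution in channels-last (NHWC) layout; with the macro
# enabled, forward calls like this are routed through the v8 frontend bindings.
# Assumes a CUDA device is available.
x = torch.randn(8, 3, 32, 32, device="cuda").contiguous(memory_format=torch.channels_last)
w = torch.randn(16, 3, 3, 3, device="cuda")
out = F.conv2d(x, w, stride=1, padding=1)
```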
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51390
Reviewed By: malfet
Differential Revision: D28513167
Pulled By: ngimel
fbshipit-source-id: 9cc20c9dec5bbbcb1f94ac9e0f59b10c34f62740
Summary:
Separate PR to add the Kineto submodule; mirrors the one I used
in my stack (#45887)
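For context, Kineto is the library that backs the PyTorch profiler; a minimal sketch of the profiler usage it enables (taken from later releases, not part of this PR, which only adds the submodule):
```python
import torch
from torch.profiler import profile, ProfilerActivity

# Kineto records the CPU/CUDA activity collected here.
# Assumes a CUDA device is available.
x = torch.randn(1024, 1024, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = torch.mm(x, x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```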
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48332
Reviewed By: gdankel
Differential Revision: D25139969
Pulled By: ilia-cher
fbshipit-source-id: b9ca2be5f15647655eeb4b2fbf4c82f84eee3dd8
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45586
Test Plan: The unit test has been softened to be less platform sensitive.
Reviewed By: mruberry
Differential Revision: D24025415
Pulled By: robieta
fbshipit-source-id: ee986933b984e736cf1525e1297de6b21ac1f0cf
Summary:
This PR allows Timer to collect deterministic instruction counts for (some) snippets. Because of the intrusive nature of Valgrind (effectively replacing the CPU with an emulated one) we have to perform our measurements in a separate process. This PR writes a `.py` file containing the Timer's `setup` and `stmt`, and executes it within a `valgrind` subprocess along with a plethora of checks and error handling. There is still a bit of jitter around the edges due to the Python glue that I'm using, but the PyTorch signal is quite good and thus this provides a low friction way of getting signal. I considered using JIT as an alternative, but:
A) Python-specific overheads (e.g. parsing) are important.
B) The JIT might do rewrites that would complicate measurement.
Consider the following bit of code, related to https://github.com/pytorch/pytorch/issues/44484:
```python
from torch.utils._benchmark import Timer
counts = Timer(
"x.backward()",
setup="x = torch.ones((1,)) + torch.ones((1,), requires_grad=True)"
).collect_callgrind()
for c, fn in counts[:20]:
print(f"{c:>12} {fn}")
```
```
812800 ???:_dl_update_slotinfo
355600 ???:update_get_addr
308300 work/Python/ceval.c:_PyEval_EvalFrameDefault'2
304800 ???:__tls_get_addr
196059 ???:_int_free
152400 ???:__tls_get_addr_slow
138400 build/../c10/core/ScalarType.h:c10::typeMetaToScalarType(caffe2::TypeMeta)
126526 work/Objects/dictobject.c:_PyDict_LoadGlobal
114268 ???:malloc
101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV
85900 work/Python/ceval.c:_PyEval_EvalFrameDefault
79946 work/Objects/typeobject.c:_PyType_Lookup
72000 build/../c10/core/Device.h:c10::Device::validate()
70000 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector()
66400 work/Objects/object.c:_PyObject_GenericGetAttrWithDict
63000 ???:pthread_mutex_lock
61200 work/Objects/dictobject.c:PyDict_GetItem
59800 ???:free
58400 work/Objects/tupleobject.c:tupledealloc
56707 work/Objects/dictobject.c:lookdict_unicode_nodummy
```
Moreover, if we backport this PR to 1.6 (just copy the `_benchmarks` folder) and load those counts as `counts_1_6`, then we can easily diff them:
```python
print(f"Head instructions: {sum(c for c, _ in counts)}")
print(f"1.6 instructions: {sum(c for c, _ in counts_1_6)}")
count_dict = {fn: c for c, fn in counts}
for c, fn in counts_1_6:
_ = count_dict.setdefault(fn, 0)
count_dict[fn] -= c
count_diffs = sorted([(c, fn) for fn, c in count_dict.items()], reverse=True)
for c, fn in count_diffs[:15] + [["", "..."]] + count_diffs[-15:]:
print(f"{c:>8} {fn}")
```
```
Head instructions: 7609547
1.6 instructions: 6059648
169600 ???:_dl_update_slotinfo
101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV
74200 ???:update_get_addr
63600 ???:__tls_get_addr
46800 work/Python/ceval.c:_PyEval_EvalFrameDefault
33512 work/Objects/dictobject.c:_PyDict_LoadGlobal
31800 ???:__tls_get_addr_slow
31700 build/../aten/src/ATen/record_function.cpp:at::RecordFunction::RecordFunction(at::RecordScope)
28300 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object*, _object*, _object*, _object**, bool)
27800 work/Objects/object.c:_PyObject_GenericGetAttrWithDict
27401 work/Objects/dictobject.c:lookdict_unicode_nodummy
24115 work/Objects/typeobject.c:_PyType_Lookup
24080 ???:_int_free
21700 work/Objects/dictobject.c:PyDict_GetItemWithError
20700 work/Objects/dictobject.c:PyDict_GetItem
...
-3200 build/../c10/util/SmallVector.h:at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
-3400 build/../aten/src/ATen/native/TensorIterator.cpp:at::TensorIterator::resize_outputs(at::TensorIteratorConfig const&)
-3500 /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:std::unique_lock<std::mutex>::unlock()
-3700 build/../torch/csrc/utils/python_arg_parser.cpp:torch::PythonArgParser::raw_parse(_object*, _object*, _object**)
-4207 work/Objects/obmalloc.c:PyMem_Calloc
-4500 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector()
-4800 build/../torch/csrc/autograd/generated/VariableType_2.cpp:torch::autograd::VariableType::add__Tensor(at::Tensor&, at::Tensor const&, c10::Scalar)
-5000 build/../c10/core/impl/LocalDispatchKeySet.cpp:c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKey)
-5300 work/Objects/listobject.c:PyList_New
-5400 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionParameter::check(_object*, std::vector<pybind11::handle, std::allocator<pybind11::handle> >&)
-5600 /usr/include/c++/8/bits/std_mutex.h:std::unique_lock<std::mutex>::unlock()
-6231 work/Objects/obmalloc.c:PyMem_Free
-6300 work/Objects/listobject.c:list_repeat
-11200 work/Objects/listobject.c:list_dealloc
-28900 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object*, _object*, _object**, bool)
```
Remaining TODOs:
* Include a timer in the generated script for cuda sync.
* Add valgrind to CircleCI machines and add a unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44717
Reviewed By: soumith
Differential Revision: D24010742
Pulled By: robieta
fbshipit-source-id: df6bc765f8efce7193893edba186cd62b4b23623
Summary:
fmt is a formatting library for C++. It has several properties that make it nice
for inclusion in PyTorch:
- Widely used
- Basically copies how Python does it
- Support for all the compilers and platforms we care about
- Standards track (C++20)
- Small code size
- Header only
This PR includes it as a submodule and sets up the build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37356
Differential Revision: D21262619
Pulled By: suo
fbshipit-source-id: 1d9a1a5ed08a634213748e7b02fc718ef8dac4c9
Summary:
Mainly renames C2's pthread_create, the only conflicting symbol referenced internally in NNPACK,
to pthread_create_c2.
Removed two other conflicting symbols that are not used internally at all.
Points XNNPACK to the original repo instead of the fork.
Copy-pasted the new interface and implementation to
caffe2/utils/threadpool, so that internal builds compile against
this.
When the threadpool is unified, this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869
Differential Revision: D20140580
Pulled By: kimishpatel
fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow-up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMake variable, which defaults to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
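For context, the JIT-pass approach described above later took the form of `torch.utils.mobile_optimizer.optimize_for_mobile`; a minimal sketch of that flow (from later releases, not part of this PR):
```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

scripted = torch.jit.script(Net().eval())
# Rewrites conv/linear into XNNPACK prepacked ops so that one-time weight
# packing is factored out of the forward() loop.
optimized = optimize_for_mobile(scripted)
```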
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Test Plan:
Build: CI
Functionality: Not exposed
Reviewed By: dreiss
Differential Revision: D20069796
Pulled By: AshkanAliabadi
fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow-up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMake variable, which defaults to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Reviewed By: dreiss
Differential Revision: D19521853
Pulled By: AshkanAliabadi
fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
Summary:
The central fbjni repository is now public, so point to it and
take the latest version, which includes support for host builds
and some condensed syntax.
Test Plan: CI
Differential Revision: D18217840
fbshipit-source-id: 454e3e081f7e3155704fed692506251c4018b2a1
Summary:
TLDR; initial commit of android java-jni wrapper of pytorchscript c++ api
The main idea is to provide a Java interface for Android developers to use pytorchscript modules.
The Java API tries to mirror the semantics of the C++ and Python pytorchscript APIs.
org.pytorch.Module (wrapper of torch::jit::script::Module)
- static Module load(String path)
- IValue forward(IValue... inputs)
- IValue runMethod(String methodName, IValue... inputs)
org.pytorch.Tensor (semantics of at::Tensor)
- newFloatTensor(long[] dims, float[] data)
- newFloatTensor(long[] dims, FloatBuffer data)
- newIntTensor(long[] dims, int[] data)
- newIntTensor(long[] dims, IntBuffer data)
- newByteTensor(long[] dims, byte[] data)
- newByteTensor(long[] dims, ByteBuffer data)
org.pytorch.IValue (semantics of at::IValue)
- static factory methods to create pytorchscript supported types
Examples of API usage can be found in PytorchInstrumentedTests.java:
Module module = Module.load(path);
IValue input = IValue.tensor(Tensor.newByteTensor(new long[]{1}, Tensor.allocateByteBuffer(1)));
IValue output = module.forward(input);
Tensor outputTensor = output.getTensor();
Thread safety:
The API is not thread safe; all synchronization must be done on the caller side.
Mutability:
The org.pytorch.Tensor buffer is a DirectBuffer with native byte order; it can be created with the static factory methods that take a DirectBuffer.
At the moment org.pytorch.Tensor does not hold an at::Tensor on the JNI side; it has: long[] dimensions, type, DirectByteBuffer blobData.
Input tensors are mutable (they can be modified and reused for the next inference);
the values in the buffer are read at the moment of the Module#forward or Module#runMethod call.
The buffers of input tensors are used directly by the input at::Tensor.
The output is copied from the output at::Tensor and is immutable.
Dependencies:
The JNI level is implemented using the fbjni library, which was developed at Facebook
and has already been used and open-sourced in several open-source projects.
It is added to the repo as a submodule from a personal account so that the submodule
can be switched when fbjni is open-sourced separately.
ghstack-source-id: b39c848359a70d717f2830a15265e4aa122279c0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25084
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25105
Reviewed By: dreiss
Differential Revision: D16988107
Pulled By: IvanKobzarev
fbshipit-source-id: 41ca7c9869f8370b8504c2ef8a96047cc16516d4
Summary:
Changes:
- protobuf was moved to protocolbuffers/protobuf a while ago.
- cpuinfo has been moved to pytorch/cpuinfo and updated in FBGEMM recently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20973
Differential Revision: D15511926
Pulled By: soumith
fbshipit-source-id: 2c50373c9b245524f839bd1059870dd2b84e3b81
Summary:
This fixes rebuild issues with the ninja part of the build. With this patch, all ninja files will now report `nothing to do` if nothing has changed, assuming `BUILD_CAFFE2_OPS=0`.
1. Only do the Python file processing for Caffe2 when BUILD_CAFFE2_OPS=1. This part of the build file is written in such a way that it always has to rerun, and it can take substantial time moving files around in a no-op build. In the future, this part should be rewritten to use a faster method of copying the files, or it should treat copying the files as part of the build rules and only run when the files are out of date.
2. Point `sleef` to a patched version that fixes a dead build output that was causing everything to relink all the time. See https://github.com/shibatch/sleef/pull/231#partial-pull-merging for the upstream change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14969
Reviewed By: soumith
Differential Revision: D13395998
Pulled By: zdevito
fbshipit-source-id: ca85b7be9e99c5c578103c144ef0f2c3b927e724