1480 Commits

Author SHA1 Message Date
92770d25cd fix comparison of narrow type with wide type in loop condition (#53951)
Summary:
fix Semmle warning: Comparison of narrow type with wide type in loop condition

For example, consider the following piece of code:

for (int i=0; i<array.size(); ++i) {}

The problem is that array.size() returns size_t, which can be a wider type than int depending on the implementation, so there is a chance that i overflows (for a very large array whose size is beyond the range of int) and the loop never terminates.
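A minimal standalone sketch of the usual fix for this class of warning (not necessarily the exact change made in this PR) is to give the loop index the container's own width:

```
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> array(10, 1);
  // size_t matches the type returned by array.size(), so the comparison is no
  // longer narrow-vs-wide and the index cannot wrap around before the end.
  for (size_t i = 0; i < array.size(); ++i) {
    std::printf("%d ", array[i]);
  }
  std::printf("\n");
  return 0;
}
```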

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951

Reviewed By: zou3519

Differential Revision: D27181495

Pulled By: malfet

fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
2021-03-22 16:40:35 -07:00
ccdcfba5de [caffe2] Refactor tensor serialization function (#53404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53404

This refactors `TensorSerializer::Serialize()` so that we have a separate
helper function for each data type.

This should make it slightly easier in the future to add new serialization
formats for specific data types.
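A standalone sketch of the "one helper per data type" structure described here; the type names and helper functions below are illustrative, not caffe2's actual serialization API:

```
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class DataType { kFloat, kInt64 };

struct TensorBlob {
  DataType type;
  std::vector<float> float_data;
  std::vector<int64_t> int64_data;
};

// One small, focused helper per data type...
static void SerializeFloat(const TensorBlob& t, std::vector<uint8_t>* out) {
  const auto* p = reinterpret_cast<const uint8_t*>(t.float_data.data());
  out->insert(out->end(), p, p + t.float_data.size() * sizeof(float));
}

static void SerializeInt64(const TensorBlob& t, std::vector<uint8_t>* out) {
  const auto* p = reinterpret_cast<const uint8_t*>(t.int64_data.data());
  out->insert(out->end(), p, p + t.int64_data.size() * sizeof(int64_t));
}

// ...and a single dispatcher, so adding a new format only touches one helper.
void SerializeTensor(const TensorBlob& t, std::vector<uint8_t>* out) {
  switch (t.type) {
    case DataType::kFloat:
      SerializeFloat(t, out);
      break;
    case DataType::kInt64:
      SerializeInt64(t, out);
      break;
    default:
      throw std::runtime_error("unsupported data type");
  }
}
```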
ghstack-source-id: 124085413

Test Plan:
Confirmed the existing tests pass.  This diff is not expected to have any
behavior changes.

Reviewed By: mraway, glamtechie

Differential Revision: D26658204

fbshipit-source-id: 232776262db6486ba845a7ba223e3987053dac27
2021-03-17 12:36:31 -07:00
8a5b946ff6 [caffe2] Don't call TensorImpl::size() in dim32() (#53852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53852

dim32() requires that its argument is in range, so we can use the faster `TensorImpl::sizes()` call instead.
ghstack-source-id: 123784862

Test Plan:
Ran MergeNet AdIndexer benchmark under perf stat.

Before:

```
 Performance counter stats for 'scripts/bwasti/static_runtime/run.sh' (5 runs):

          7,008.70 msec task-clock                #    0.997 CPUs utilized            ( +-  0.25% )
             4,203      context-switches          #    0.600 K/sec                    ( +- 14.71% )
                 3      cpu-migrations            #    0.000 K/sec
            93,896      page-faults               #    0.013 M/sec                    ( +-  0.80% )
    13,869,719,763      cycles                    #    1.979 GHz                      ( +-  0.23% )  (50.05%)
    27,561,765,867      instructions              #    1.99  insn per cycle           ( +-  0.06% )  (50.04%)
     4,288,245,412      branches                  #  611.846 M/sec                    ( +-  0.05% )  (50.01%)
        19,633,433      branch-misses             #    0.46% of all branches          ( +-  0.83% )  (50.01%)

            # Table of individual measurements:
            7.0670 (+0.0379) #
            6.9897 (-0.0394) #
            7.0203 (-0.0088) #
            6.9829 (-0.0462) #
            7.0856 (+0.0565) #

            # Final result:
            7.0291 +- 0.0205 seconds time elapsed  ( +-  0.29% )
```

After:
```
 Performance counter stats for 'scripts/bwasti/static_runtime/run.sh' (5 runs):

          6,935.61 msec task-clock                #    0.997 CPUs utilized            ( +-  0.47% )
             2,913      context-switches          #    0.420 K/sec                    ( +- 15.25% )
                 3      cpu-migrations            #    0.000 K/sec
            92,628      page-faults               #    0.013 M/sec                    ( +-  0.50% )
    13,724,940,495      cycles                    #    1.979 GHz                      ( +-  0.47% )  (50.01%)
    27,226,217,974      instructions              #    1.98  insn per cycle           ( +-  0.02% )  (50.03%)
     4,220,129,358      branches                  #  608.472 M/sec                    ( +-  0.06% )  (50.04%)
        19,025,346      branch-misses             #    0.45% of all branches          ( +-  0.53% )  (50.04%)

            # Table of individual measurements:
            6.9402 (-0.0145) #
            6.8570 (-0.0978) #
            6.9311 (-0.0236) #
            7.0101 (+0.0554) #
            7.0352 (+0.0805) #

            # Final result:
            6.9547 +- 0.0315 seconds time elapsed  ( +-  0.45% )

```

Roughly 1% cycles win, which is outside the quoted noise level.

Reviewed By: hlu1

Differential Revision: D26994107

fbshipit-source-id: f4c4963be0a5c268cbcdac5359f8278750218ae6
2021-03-12 16:22:29 -08:00
33aaea912a [caffe2] Support deserializing tensors using alternate serialization formats (#53403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53403

This updates the `TensorProto` field so that the data type of the in-memory
(deserialized) data is tracked independently of the serialized data format.

This will allow us to support multiple different serialization formats in the
future.  For instance, we could choose to perform quantization of floating
point data types, or varint encoding for integer fields.

For now this diff does not actually change the serialization code path yet,
and does not introduce any new serialization formats, but only refactors the
deserialization code path to make it easier to introduce new formats.

I'm not really that thrilled with the heavy use of macros and templates here,
but I didn't really see better alternatives that made it as simple to specify
new deserialization function implementations.
ghstack-source-id: 123594220

Test Plan:
Confirmed that the existing unit tests pass.  This diff only touches the
deserialization code path and not the serialization code to help ensure that
the deserialization code works with the existing serialization logic, and that
there are no changes to the current serialization format.

Reviewed By: mraway

Differential Revision: D26658206

fbshipit-source-id: d7297d600aee28b92fd9f4ece437b7f519060942
2021-03-12 11:35:15 -08:00
91531d3047 [caffe2] add a CAFFE2_NODISCARD macro to help support old compilers (#53754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53754

Some of the PyTorch CircleCI builds still use gcc 5.4 and compile with
`-Werror=attributes`, causing this old compiler to fail because it does not
understand the `[[nodiscard]]` attribute.

Let's define a `CAFFE2_NODISCARD` macro to work around this.
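A sketch of the standard feature-detection pattern such a macro typically uses; the actual definition in caffe2 may differ in detail:

```
// Expand to [[nodiscard]] only when the compiler claims to support it;
// otherwise expand to nothing so gcc 5.4 with -Werror=attributes stays happy.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(nodiscard)
#define CAFFE2_NODISCARD [[nodiscard]]
#endif
#endif
#ifndef CAFFE2_NODISCARD
#define CAFFE2_NODISCARD
#endif

// Example use: new compilers warn if the return value is ignored,
// old compilers simply see a plain function declaration.
CAFFE2_NODISCARD inline bool ParseConfig(const char* path) {
  return path != nullptr;
}
```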
ghstack-source-id: 123594084

Test Plan: I'm using this macro in subsequent diffs in the stack.

Reviewed By: mraway

Differential Revision: D26959584

fbshipit-source-id: c7ba94f7ea944b6340e9fe20949ba41931e11d41
2021-03-12 11:32:30 -08:00
7e5ffbfa94 [caffe2] add a SerializationOptions field for the save operator (#53402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402

Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs.  At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
ghstack-source-id: 123567034

Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.

  buck test caffe2/caffe2:caffe2_test_cpu \
    caffe2/caffe2/core:serialization_test \
    caffe2/caffe2/python/operator_test:load_save_test

Reviewed By: mraway

Differential Revision: D26502577

fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
2021-03-11 13:02:58 -08:00
99d7c8ff94 [caffe2] use AddNAlreadyReserved() when serializing blobs (#53400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53400

This is a reland of D26617038 (b4a8d98247) after rebasing onto D26802576 (f595ba1bae).

Optimize the blob serialization code by using `AddNAlreadyReserved()` when
serializing tensor data, rather than making N separate `Add()` calls.
`AddNAlreadyReserved()` is a simple addition operation, while each `Add()`
call checks to see if it needs to reserve new space, and then updates the
element data, which is unnecessary in this case.
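A sketch of the pattern being described, assuming a generated protobuf message with a `repeated float float_data` field (as caffe2's `TensorProto` has); the helper name and surrounding code are illustrative, not the exact code in this diff:

```
#include <algorithm>
#include "caffe2/proto/caffe2_pb.h"

// Copy n floats into the proto's repeated field with a single size bump
// instead of n individual Add() calls.
void AppendFloatData(const float* src, int n, caffe2::TensorProto* proto) {
  auto* field = proto->mutable_float_data();
  field->Reserve(field->size() + n);   // allocate the space up front
  field->AddNAlreadyReserved(n);       // grow the size without per-element checks
  std::copy(src, src + n, field->mutable_data() + field->size() - n);
}
```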
ghstack-source-id: 123567030

Test Plan:
This appears to improve raw serialization performance by 30 to 35% for float,
double, and int64_t types which use this function.  This improvement appears
relatively consistent across large and small tensor sizes.

Reviewed By: mraway

Differential Revision: D26853941

fbshipit-source-id: 4ccaa5bc1dd7f7864068d71a0cde210c699cbdba
2021-03-10 15:27:52 -08:00
b2758cdc77 [PyTorch] Don't copy vector arguments to caffe2::Tensor::Resize (#53389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53389

Resize was written to take its arguments by value, which is totally fine when
they are ArrayRef or a series of integers, but not so fine when they are
std::vector.
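A standalone sketch of the problem and one way around it; the added const-reference overload here is illustrative and not necessarily how the actual PR addresses it:

```
#include <cstdint>
#include <vector>

struct Tensor {
  // Generic by-value form: cheap for an integer pack like Resize(2, 3, 4),
  // but if a std::vector is passed, the deduced by-value parameter copies it.
  template <typename... Ts>
  void Resize(Ts... dims) {
    dims_ = {static_cast<int64_t>(dims)...};
  }

  // Dedicated overload: binding to a const reference avoids copying the
  // caller's vector at the call boundary.
  void Resize(const std::vector<int64_t>& dims) {
    dims_ = dims;
  }

 private:
  std::vector<int64_t> dims_;
};

int main() {
  Tensor t;
  t.Resize(2, 3, 4);                      // integer pack, by value is fine
  std::vector<int64_t> dims = {2, 3, 4};
  t.Resize(dims);                         // picks the const& overload
  return 0;
}
```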
ghstack-source-id: 123212128

Test Plan:
Existing CI should make sure it builds

Inspected assembly for ios_caffe.cc and saw no more vector copy before
calling Resize

Reviewed By: smessmer

Differential Revision: D26852105

fbshipit-source-id: 9c3b9549d50d32923b532bbc60d0246e2c2b5fc7
2021-03-08 12:33:33 -08:00
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
69bb0e0285 [caffe2] Avoid some double (and triple) lookups in workspace (#53319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53319

Noticed these in profiles.

Also switch to `unordered_map`.

Test Plan: Unit tests.

Reviewed By: swolchok

Differential Revision: D26504408

fbshipit-source-id: 9e14d55909a4af019058b8c27c67ee2348cd02a9
2021-03-04 22:57:02 -08:00
cyy
d8730194e7 use device methods (#52899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52899

Reviewed By: zou3519

Differential Revision: D26752203

Pulled By: albanD

fbshipit-source-id: eaef89377999b20655fe85d5a38ca7a2c5882de7
2021-03-02 20:14:23 -08:00
a296fa36ac [Caffe2] Implement BlackBoxPredictor::BenchmarkIndividualOps (#52903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52903

Implement BlackBoxPredictor::BenchmarkIndividualOps so that we can clean up the output tensors properly after each iteration and get more accurate per operator timing.

Add four more metrics to track setup_time, memory_alloc_time, memory_dealloc_time, and output_dealloc_time.

Reviewed By: ajyu

Differential Revision: D26657473

fbshipit-source-id: 1cf282192b531513b9ee40b37252087818412f81
2021-02-27 19:49:22 -08:00
21c3f6f415 Revert D26617038: [caffe2] use AddNAlreadyReserved() when serializing blobs
Test Plan: revert-hammer

Differential Revision:
D26617038 (b4a8d98247)

Original commit changeset: 97dedbae889d

fbshipit-source-id: 6921d0a64dee26e18f16628773953bbe7280998e
2021-02-25 21:32:40 -08:00
b4a8d98247 [caffe2] use AddNAlreadyReserved() when serializing blobs
Summary:
Optimize the blob serialization code by using `AddNAlreadyReserved()` when
serializing tensor data, rather than making N separate `Add()` calls.
`AddNAlreadyReserved()` is a simple addition operation, while each `Add()`
call checks to see if it needs to reserve new space, and then updates the
element data, which is unnecessary in this case.

Test Plan:
This appears to improve raw serialization performance by 30 to 35% for float,
double, and int64_t types which use this function.  This improvement appears
relatively consistent across large and small tensor sizes.

Differential Revision: D26617038

fbshipit-source-id: 97dedbae889d35463628f3016ac56986e685289e
2021-02-25 20:24:01 -08:00
27d89057f8 [caffe2] fix deserialization of unknown tensor data_type values (#52411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52411

The `TensorDeserializer` code previously did not correctly handle unknown
`data_type` values.  It attempted to deserialize the data as floats, rather
than recognizing that it did not understand the data type and erroring out.

Google protobuf will never return unknown values for enum fields.  If an
unknown value is found in serialized data, the protobuf code discards it.
As a result `has_data_type()` will return false, but `get_data_type()` will
simply return the default value, which happens to be set to `FLOAT`.
Consequently, if we ever encounter a serialized blob with an unknown data type,
the previous code would incorrectly think the data type was `FLOAT`.

This fixes the code to check if the `data_type` value is present before
reading it.
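A sketch of that presence check, assuming caffe2's proto2-generated `TensorProto` accessors (`has_data_type()` / `data_type()`); the helper name and error-handling style are illustrative:

```
#include <stdexcept>
#include "caffe2/proto/caffe2_pb.h"

caffe2::TensorProto::DataType GetCheckedDataType(const caffe2::TensorProto& proto) {
  // proto2 drops unrecognized enum values on parse, so data_type() would
  // silently report the default (FLOAT). Checking presence first turns an
  // unknown serialization format into an explicit error.
  if (!proto.has_data_type()) {
    throw std::runtime_error("tensor data_type is missing or unrecognized");
  }
  return proto.data_type();
}
```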
ghstack-source-id: 121915981

Test Plan:
Included a unit test that verifies this behavior.  Confirmed that without this
fix the code proceeded with the float deserialization code path.  When
deserializing int32_t data it fortunately did fail later due to an unexpected
field length check, but this isn't guaranteed to be the case.  In some cases
it potentially could incorrectly succeed and return wrong data.

Reviewed By: mraway

Differential Revision: D26375502

fbshipit-source-id: 4f84dd82902e18df5e693f4b28d1096c96de7916
2021-02-17 19:13:43 -08:00
cyy
39aa3db62b use make_shared and make_unique and clean unneeded code (#51829)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51829

Reviewed By: izdeby

Differential Revision: D26306098

Pulled By: smessmer

fbshipit-source-id: 4f6c0469c68f044c0bfe0925fcf7b030a25d15e2
2021-02-10 21:38:43 -08:00
fa325d7c9f Use sum_integers and multiply_integers (#51146)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51146

Test Plan: Sandcastle tests

Reviewed By: ngimel

Differential Revision: D25903430

fbshipit-source-id: 329c14018c9e5192864eed88a8ed0a5068ff1c69
2021-02-10 18:05:45 -08:00
fc314350ad Make RebatchingBuffer compatible with auto shape inference
Summary: no-op to operator behavior, resolve https://fburl.com/wte0v7tf

Test Plan: buck test

Reviewed By: huangyi1979

Differential Revision: D26333212

fbshipit-source-id: d237e8caf5977bc19fcced6aeedc6464fc905457
2021-02-09 12:37:26 -08:00
094d597679 raise windows tol to 30% (#51733)
Summary:
Up the Windows tolerance set by https://github.com/pytorch/pytorch/pull/35818, as CI is still showing some flakes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51733

Test Plan: CI

Reviewed By: zou3519

Differential Revision: D26258005

Pulled By: robieta

fbshipit-source-id: 864c848b7b31a05a2d07d1e683342b3202377c10
2021-02-04 14:09:10 -08:00
621198978a Move USE_NUMPY to more appropriate targets (#51143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51143

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D26084123

fbshipit-source-id: af4abe4ef87c1ebe5434938320526a925f5c34c8
2021-01-27 15:44:12 -08:00
533cb9530e Introducing TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API to the code (#50627)
Summary:
Sub-step of my attempt to split up the torch_cuda library, as it is huge. Please look at https://github.com/pytorch/pytorch/issues/49050 for details on the split and which files are in which target.

This PR introduces two new macros for Windows DLL purposes, TORCH_CUDA_CPP_API and TORCH_CUDA_CU_API. Both are defined as TORCH_CUDA_API for the time being.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50627

Reviewed By: mruberry

Differential Revision: D25955441

Pulled By: janeyx99

fbshipit-source-id: ff226026833b8fb2fb7c77df6f2d6c824f006869
2021-01-21 19:09:11 -08:00
71ca600af9 Renaming CAFFE2_API to TORCH_API (#49496)
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.

Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496

Reviewed By: malfet, samestep

Differential Revision: D25600726

Pulled By: janeyx99

fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
2020-12-18 10:54:50 -08:00
4431731c68 Making ops c10-full: Storage arguments (#49146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49146

Add support for Storage arguments to IValue and the JIT typing system, and make ops that were blocked on that c10-full.
ghstack-source-id: 118710665

(Note: this ignores all push blocking failures!)

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25456799

fbshipit-source-id: da14f125af352de5fcf05a83a69ad5a69d5a3b45
2020-12-16 14:00:34 -08:00
da6f249a10 [caffe2] DeserializeToNDArray (#49135)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49135

Differential Revision: D25417845

fbshipit-source-id: 4d8efd440bc2577fb717f911a401e7b81d48b907
2020-12-10 21:59:25 -08:00
54022e4f9b add new build settings to torch.__config__ (#48380)
Summary:
Many newly added build settings are not saved in torch.__config__; this adds them to the mix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48380

Reviewed By: samestep

Differential Revision: D25161951

Pulled By: walterddr

fbshipit-source-id: 1d3dee033c93f2d1a7e2a6bcaf88aedafeac8d31
2020-12-01 14:16:36 -08:00
b5149513ec migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API, update code_analyzer regex (#48308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48308

The original regex that I added didn't correctly match namespaces that started with an underscore (e.g. `_test`), which caused a master-only test to fail.

The only change from the previous commit is that I updated the regex like so:

before: `^.*TORCH_LIBRARY_IMPL_init_([^_]+)_([^_]+)_[0-9]+(\(.*)?$`
after: `^.*TORCH_LIBRARY_IMPL_init_([_]*[^_]+)_([^_]+)_[0-9]+(\(.*)?$`

I added in a `[_]*` to the beginning of the namespace capture. I did the same for the `_FRAGMENT` regex.

Verified that running `ANALYZE_TEST=1 tools/code_analyzer/build.sh` (as the master-only test does) produces no diff in the output.

Fixing regex pattern to allow for underscores at the beginning of the
namespace

This reverts commit 3c936ecd3c68f395dad01f42935f20ed8068da02.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25123295

Pulled By: bdhirsh

fbshipit-source-id: 54bd1e3f0c8e28145e736142ad62a18806bb9672
2020-11-30 13:05:33 -08:00
3c936ecd3c Revert D25056091: migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API
Test Plan: revert-hammer

Differential Revision:
D25056091 (0ea4982cf3)

Original commit changeset: 0f647ab9bc5e

fbshipit-source-id: e54047b91d82df25460ee00482373c4580f94d50
2020-11-19 19:10:14 -08:00
0ea4982cf3 migrate export_caffe2_op_to_c10.h macros to the new dispatcher registration API (#48097)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48097

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25056091

Pulled By: bdhirsh

fbshipit-source-id: 0f647ab9bc5e5aee497dac058df492f6e742cfe9
2020-11-19 17:56:56 -08:00
0f89be616a Removing non-thread-safe log statement from ReinitializeTensor (#48185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48185

In a scenario where we have Caffe2 wrapped into a dynamic library, we were running into a memory corruption crash at program termination:

"corrupted size vs. prev_size in fastbins"

It turns out the crash occurs in glog's logging.cc, which is not thread-safe and has to initialize a static hostname string when flushing. If this ends up happening on multiple threads simultaneously, it can lead to memory corruption.

```
==1533667== Invalid free() / delete / delete[] / realloc()
==1533667==    at 0xA3976BB: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==1533667==    by 0x37E36AE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:647)
==1533667==    by 0xAD87F6B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD8809F: exit (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD71799: (below main) (in /usr/lib64/libc-2.28.so)
==1533667==  Address 0x165cd720 is 0 bytes inside a block of size 31 free'd
==1533667==    at 0xA3976BB: operator delete(void*, unsigned long) (vg_replace_malloc.c:595)
==1533667==    by 0x37E36AE: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (basic_string.h:647)
==1533667==    by 0xAD87F6B: __run_exit_handlers (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD8809F: exit (in /usr/lib64/libc-2.28.so)
==1533667==    by 0xAD71799: (below main) (in /usr/lib64/libc-2.28.so)
==1533667==  Block was alloc'd at
==1533667==    at 0xA39641F: operator new(unsigned long) (vg_replace_malloc.c:344)
==1533667==    by 0x37F4E18: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:317)
==1533667==    by 0x37F4F2E: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) (basic_string.tcc:466)
==1533667==    by 0x5170344: GetHostName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (logging.cc:227)
==1533667==    by 0x51702D4: google::LogDestination::hostname[abi:cxx11]() (logging.cc:555)
==1533667==    by 0x5173789: google::(anonymous namespace)::LogFileObject::Write(bool, long, char const*, int) (logging.cc:1072)
==1533667==    by 0x51746DF: google::LogDestination::LogToAllLogfiles(int, long, char const*, unsigned long) (logging.cc:773)
==1533667==    by 0x5170BDC: google::LogMessage::SendToLog() (logging.cc:1386)
==1533667==    by 0x5171236: google::LogMessage::Flush() (logging.cc:1305)
==1533667==    by 0x517114D: google::LogMessage::~LogMessage() (logging.cc:1264)
==1533667==    by 0x108DC840: caffe2::ReinitializeTensor(caffe2::Tensor*, c10::ArrayRef<long>, c10::TensorOptions) (tensor.cc:0)
==1533667==    by 0x103BBED0: caffe2::int8::Int8GivenTensorFillOp::RunOnDevice() (int8_given_tensor_fill_op.h:29)
==1533667==
```

There doesn't seem to be an obvious easy solution here. The logging API being used by c10 is fundamentally not thread-safe, at least when it uses glog. Glog does have a threadsafe API (raw_logging), but this doesn't seem to be used by c10 right now. I suspect other callers are not running into this crash because:
- They have other libraries using glog in their module, so the static variable in glog gets initialized before getting into a race condition
- They don't use int8 network in a glog context, thus avoiding this problematic log statement

An alternative fix would be to correctly initialize the dtype of the int8 tensor, which is currently always uninitialized, making the log statement always trigger for int8 networks. Initializing the int8 tensor correctly in tensor_int8.h is proving to be challenging though, at least without knowledge of Caffe2's codebase. And even then, it wouldn't fix the issue for all use cases.

Test Plan: Ran my app under Valgrind; I no longer get the crash, and Valgrind no longer complains about memory corruption.

Reviewed By: thyu, qizzzh

Differential Revision: D25040725

fbshipit-source-id: 1392a97ccf9b4c9ade1ea713610ee44a1578ae7d
2020-11-18 17:42:22 -08:00
549ef1d668 [caffe][memonger] Extend operator schema check to dag memonger (#48021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48021

Extends the operator schema check from the simple memonger to the DAG memonger as well. As part of this, a fix is made to handle in-place ops (ops with at least one output name that is the same as an input blob). Previously all output blobs from ops were treated as shareable, but this failed the assertion that external input blobs with the same name are not allowed to be shared.

Test Plan: Added corresponding unit tests

Reviewed By: hlu1

Differential Revision: D24968862

fbshipit-source-id: b6679a388a82b0d68f65ade64b85560354aaa3ef
2020-11-16 19:17:55 -08:00
825ee7e7f8 [caffe2] plan_executor_test: add test case for should_stop loops (#47613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47613

This is to test some more cancellation edge cases that were missing before. It passes under the current code.

Test Plan: buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 10

Reviewed By: dahsh

Differential Revision: D24836956

fbshipit-source-id: 3b00dc081cbf4f26e7756d597099636edb49d256
2020-11-16 12:59:13 -08:00
c543b3b582 Fix a downcast (#47919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47919

Suppresses a downcast warning.

Test Plan:
Reproduces with
```
buck test mode/dev-nosan //caffe2/torch/fb/sparsenn:gpu_test
```

Reviewed By: suphoff

Differential Revision: D24866987

fbshipit-source-id: 44f19ab37a7d95abe08f570abfebc702827a2510
2020-11-13 22:26:29 -08:00
f743b5639a [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger (#47718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47718

Distributed Inference splits a predict net into multiple parts, part0 being the main part, which contains ops that make remote calls to the other parts. The part0 predict net may contain AsyncIf ops to optimize RPC call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops by updating their internal nets to refer to memongered blobs.

As part of this change, I am also updating the DAG memonger traversal to always start from root ops, i.e. ops with in-degree 0. The earlier logic started traversing ops based on the input head blobs, and if one of the head inputs was used in a non-root op that got visited before its parent, the traversal would throw the assertion error here: https://fburl.com/diffusion/ob110s9z . For almost all of the distributed inference part0 nets, it was throwing this assertion error.

Test Plan: Added corresponding tests in memonger_test.py. Could not find unit tests for the C++ version of memonger.

Reviewed By: hlu1

Differential Revision: D24872010

fbshipit-source-id: 1dc99b2fb52b2bc692fa4fc0aff6b7e4c5e4f5b0
2020-11-13 14:12:07 -08:00
17c58720fe Revert D24346771: [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
Test Plan: revert-hammer

Differential Revision:
D24346771 (5882f2e540)

Original commit changeset: ad2dd2e63f3e

fbshipit-source-id: 90346f08c890eebe71f068748a8e24e4db88c250
2020-11-10 12:11:22 -08:00
5882f2e540 [caffe2][memonger] Add support for distributed inference predict nets in DAG memonger
Summary:
Distributed Inference splits a predict net into multiple parts, part0 being the main part, which contains ops that make remote calls to the other parts. The part0 predict net may contain AsyncIf ops to optimize RPC call usage. AsyncIf ops have internal nets which may refer to memongered blobs. This change handles AsyncIf ops by updating their internal nets to refer to memongered blobs. Here is one reference part0 predict net with AsyncIf ops: https://www.internalfb.com/intern/paste/P145812115/

As part of this change, I am also updating the DAG memonger traversal to always start from root ops, i.e. ops with in-degree 0. The earlier logic started traversing ops based on the input head blobs, and if one of the head inputs was used in a non-root op that got visited before its parent, the traversal would throw the assertion error here: https://fburl.com/diffusion/ob110s9z . For almost all of the distributed inference part0 nets, it was throwing this assertion error.

Reviewed By: hlu1

Differential Revision: D24346771

fbshipit-source-id: ad2dd2e63f3e822ad172682f6d63f8474492255d
2020-11-10 09:35:28 -08:00
f05b66b70d pass TypeMeta by value (#45026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45026

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23802943

Pulled By: bhosmer

fbshipit-source-id: 81b06ef00bf8eb4375c0e0ff2032e03bd1d1188a
2020-10-30 10:14:17 -07:00
51bf7bed84 [caffe2] Allow memonger to optimize nets with inplace(enforced) ops (#46560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46560

Follow-up for D24236604 (16c52d918b).

For nets that pass the schema check, memonger actually makes sure to preserve the inplaceness of operators if they are already inplace. So we can safely enable it for correct input nets.

(Note: this ignores all push blocking failures!)

Differential Revision: D24402482

fbshipit-source-id: a7e95cb0e3eb87adeac79b9b69eef207957b0bd5
2020-10-22 13:23:33 -07:00
c44300884e Clarify timing of GetDeviceProperty() (#46715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46715

Test Plan: N/A

Reviewed By: ezyang

Differential Revision: D24455538

fbshipit-source-id: 1770807d178f618ef6338e28f669f09e4cbd2009
2020-10-22 11:29:31 -07:00
0c9787c758 caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987

This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, which is the one currently used in PyTorch. The ATen RNG is 10x faster than the std one and appears to be more robust given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb).

For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes, as we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to 10% of its current value.
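A sketch of drawing values from the ATen engine; the header path, the seed constructor, and operator() returning a 32-bit value are assumptions about at::mt19937 rather than code from this diff:

```
#include <cstdint>
#include <ATen/core/MT19937RNGEngine.h>

// Fill a buffer with uniform values in [0, 1) from raw 32-bit draws.
void FillUniform(float* data, int64_t n, uint64_t seed) {
  at::mt19937 gen(seed);
  for (int64_t i = 0; i < n; ++i) {
    const uint32_t bits = gen();                  // one 32-bit draw
    data[i] = (bits >> 8) * (1.0f / (1u << 24));  // keep 24 random bits
  }
}
```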

Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.

Reviewed By: dzhulgakov

Differential Revision: D23219710

fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
2020-10-16 16:08:35 -07:00
dd169ca17c caffe2/plan_executor: propagate exceptions from reporter substeps (#46424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424

Currently if an exception occurs in a reporter thread the process is killed via std::terminate. This adds support for handling the reporter exception if FLAGS_caffe2_handle_executor_threads_exceptions is set to true.

Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100

Reviewed By: dahsh

Differential Revision: D24345027

fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
2020-10-16 12:28:57 -07:00
16c52d918b [caffe2] Bypass memonger for in-place ops (#46378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46378

Reviewed By: dzhulgakov

Differential Revision: D24236604

fbshipit-source-id: 9f599687467ea969e89243482f8e2a41f7db0a23
2020-10-15 16:03:52 -07:00
85c3ba5588 [caffe2] add PlanExecutorTest ErrorPlanWithCancellableStuckNet (#46110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and exhibit that we can cancel a stuck net and propagate the error with the plan executor.

## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for plan executor.
* Set cancelCount to zero at the beginning of the tests to avoid global state being carried over in some test environments.

Test Plan:
## Unit Test Added

```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```

Reviewed By: d4l3k

Differential Revision: D24226577

fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
2020-10-12 12:00:15 -07:00
87226f72d2 [caffe2] temp remove ErrorPlanWithCancellableStuckNet (#46080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080

Temporary removal of ErrorPlanWithCancellableStuckNet; will fill it out more later.

Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test

Reviewed By: fegin

Differential Revision: D24213971

fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
2020-10-08 23:35:45 -07:00
487624e369 [caffe2] plan executor error propagation test with blocking cancellable op (#45319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145)
* We need a test to cover and exhibit that we can cancel a stuck net and propagate the error with the plan executor.

## Summary
* Added `ErrorPlanWithCancellableStuckNet` for plan executor.
* We set up a plan with two nets: one stuck net with a blocking operator that never returns, and one error
  net with an op that throws, and tested that the plan throws and cancels.

Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
  Pass: 400
  ListingSuccess: 2
```

Reviewed By: d4l3k

Differential Revision: D23920548

fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
2020-10-08 19:54:49 -07:00
59e4803b94 Recommit: caffe2/plan_executor: wait for 1 minute after exception and then abort (#45981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981

This is a recommit of previously reverted D20850851 (3fbddb92b1).

TL;DR - combining condition_variables and atomics is a bad idea

https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock

This also adds some ifdefs to disable the death test for mobile, xplat and tsan builds since forking doesn't play nicely with them.
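A standalone illustration of the pitfall behind the TL;DR (a lost wakeup when the flag is updated outside the mutex) and the usual fix of keeping the predicate under the same lock; this is a generic sketch, not code from the diff:

```
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mu;
std::condition_variable cv;
bool done = false;  // plain bool, only touched while holding `mu`

void worker() {
  {
    std::lock_guard<std::mutex> lock(mu);
    done = true;            // update the predicate under the mutex, so the
  }                         // notify cannot slip between check and wait
  cv.notify_one();
}

int main() {
  std::thread t(worker);
  {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [] { return done; });  // predicate re-checked under the lock
  }
  t.join();
  return 0;
}
```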

Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
  buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
  buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/

will ensure no timeouts in OSS

Reviewed By: walterddr, dahsh

Differential Revision: D24165505

fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
2020-10-08 14:17:30 -07:00
1bb2d41b68 Revert D20850851: caffe2/plan_executor: wait for 1 minute after exception and then abort
Test Plan: revert-hammer

Differential Revision:
D20850851 (3fbddb92b1)

Original commit changeset: 330503775d80

fbshipit-source-id: 612c6c3c4d5586bc8ad00a112cd00fc74fb44243
2020-10-07 09:04:24 -07:00
3fbddb92b1 caffe2/plan_executor: wait for 1 minute after exception and then abort (#45297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297

If we have two concurrent substeps and one of them throws an exception and the other is blocking, we'll currently hang. This waits up to 1 minute for it to complete before terminating the process.

Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

Reviewed By: dahsh

Differential Revision: D20850851

fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
2020-10-06 12:59:09 -07:00
2ac7de7d53 Remove hacky_wrapper from BackendSelect kernels (#44062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062

Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, and pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments and packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.

Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789

Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/

Reviewed By: ezyang

Differential Revision: D23484192

fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
2020-09-25 09:04:03 -07:00
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00