590 Commits

Author SHA1 Message Date
fb18c29486 [BE] Tweak Meta copyright headers (#90805)
s/Facebook, Inc./Meta Platforms, Inc/
s/Confidential and proprietary./This source code is licensed under the BSD-style license/

Per https://www.internalfb.com/intern/wiki/Open_Source/Licenses/Straight_BSD/

Also, add a linter that prevents adding those in the future

Fixes https://github.com/pytorch/pytorch/issues/90187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90805
Approved by: https://github.com/zpao
2022-12-14 20:30:31 +00:00
d3a3604581 [pthreadpool] Don't recreate threadpool if the counts are same (#90478)
Summary: Don't do anything if the incoming count and the current threadpool size are the same

Test Plan: CI

Reviewed By: salilsdesai

Differential Revision: D41628132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90478
Approved by: https://github.com/salilsdesai
2022-12-10 03:17:08 +00:00
f2d95765e4 [pthreadpool] Set max threadlimit to tsan limit (#89453)
Summary:
This makes sure we don't run into an internal assert in clang TSAN, which has a cap of 63 on the number of concurrently held locks.
It seems to fail at 64 since the comparison is `<`, so we set it to 63 here.

```
llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))"
```

Created from CodeHub with https://fburl.com/edit-in-codehub

Test Plan:
CI

Sandcastle run

Reviewed By: kimishpatel, salilsdesai

Differential Revision: D41444710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89453
Approved by: https://github.com/mcr229
2022-12-08 02:02:53 +00:00
7a3afe61d2 Check all CUDA API calls for errors in caffe2/ (#81816)
Test Plan: Sandcastle

Differential Revision: D35194868

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81816
Approved by: https://github.com/ezyang
2022-10-28 00:41:06 +00:00
e0229d6517 Remove caffe2 mobile (#84338)
We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our makefiles. Any lingering internal dependencies will use the buck build and so won't be affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338
Approved by: https://github.com/dreiss
2022-09-08 01:49:55 +00:00
d79ccb7b45 [pthreadpool] Cap max thread count to fix TSAN issues (#83950)
Summary: Cap the thread count to 64 unconditionally to solve this tsan issue, which leads to hard-to-debug, flaky test failures.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D38136212

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83950
Approved by: https://github.com/kimishpatel
2022-08-24 18:17:27 +00:00
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed.
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
e3c9cb675a Explicitly convert to double for comparison (#79964)
Otherwise it errors on ONNX clang10 build: https://github.com/pytorch/pytorch/runs/6976721937?check_suite_focus=true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79964
Approved by: https://github.com/malfet
2022-06-21 22:46:52 +00:00
de9fd07093 Relax the thread count assertion, that is modify EXPECT_EQ -> EXPECT_GE (#79806)
Summary:
D36484910 added logging integration which ended up starting a new thread with the following stacktrace, causing the [caffe2/caffe2:caffe2_test_cpu_asan_no_sig tests](https://fburl.com/test/jk58c0el) to fail:

    SIGINT(2), PID: 1229124, Thread 1229126:
     # 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
    # 1  c10::FatalSignalHandler::stacktraceSignalHandler(bool)
    # 2  c10::FatalSignalHandler::stacktraceSignalHandler(int, siginfo_t*, void*)
    # 3  c10::FatalSignalHandler::stacktraceSignalHandlerStatic(int, siginfo_t*, void*)
    # 4  0x0000000000000000
    # 5  __GI___futex_abstimed_wait_cancelable64
    # 6  __GI___pthread_cond_wait
    # 7  std::condition_variable::wait(std::unique_lock<std::mutex>&)
    # 8  folly::AsyncLogWriter::ioThread()
    # 9  folly::AsyncLogWriter::restartThread()::$_4::operator()() const
    # 10 void std::__invoke_impl<void, folly::AsyncLogWriter::restartThread()::$_4>(std::__invoke_other, folly::AsyncLogWriter::restartThread()::$_4&&)
    # 11 std::__invoke_result<folly::AsyncLogWriter::restartThread()::$_4>::type std::__invoke<folly::AsyncLogWriter::restartThread()::$_4>(folly::AsyncLogWriter::restartThread()::$_4&&)
    # 12 void std::thread::_Invoker<std::tuple<folly::AsyncLogWriter::restartThread()::$_4> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)
    # 13 std::thread::_Invoker<std::tuple<folly::AsyncLogWriter::restartThread()::$_4> >::operator()()
    # 14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<folly::AsyncLogWriter::restartThread()::$_4> > >::_M_run()
    # 15 execute_native_thread_routine
    # 16 start_thread
    # 17 __GI___clone

Reviewed By: dustinh1999

Differential Revision: D37251436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79806
Approved by: https://github.com/voznesenskym
2022-06-21 00:05:48 +00:00
5932c37198 [caffe2] drop XROS ports (#76366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76366

caffe2 is not currently being built for XROS.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D35923922

fbshipit-source-id: 260dacadf0bd5b6bab7833a4ce81e896d280b053
(cherry picked from commit 8370b8dd2519d55a79fa8d45e7951ca8dc0b21a8)
2022-04-26 23:54:22 +00:00
6c03f8d9e5 Drop unused variables and add some const (#71106)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71106

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D33490855

fbshipit-source-id: 9fc4a4e4a7ad5e6c31f394ec6d8221b964fdf043
2022-01-11 12:38:59 -08:00
704af23ee4 Use a reference in GetSingleArgument (#71007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71007

A string copy at Line 417 is currently consuming 125,749,287,000 cycles/day. I suspect the issue is with a copy-on-return, but we can experiment with introducing a reference in the middle to see if that produces a good savings without changing the interface.

Reference
```
["Inline caffe2::ArgumentHelper::GetSingleArgument @ caffe2/caffe2/utils/proto_utils.cc:417"]
```

Test Plan: Sandcastle

Reviewed By: xw285cornell

Differential Revision: D33478883

fbshipit-source-id: e863e359c0c718fcd0d52fd4b3c7858067de0670
2022-01-07 20:18:56 -08:00
2d38d37f5f use irange for loops (#69533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69533

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837942

fbshipit-source-id: 8663037a38ade8f81bd5e983a614d197ea11f0d1
2021-12-07 16:53:27 -08:00
f587267dc7 Revert D31705359: use irange for loops 8
Test Plan: revert-hammer

Differential Revision:
D31705359 (17e5200441)

Original commit changeset: c9ea2fbc0f9c

fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14
2021-12-02 12:55:08 -08:00
17e5200441 use irange for loops 8 (#66743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705359

fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
2021-12-02 10:21:29 -08:00
b8dfb45ac2 Refactor cub namespace handling (#66219)
Summary:
This PR is to update PyTorch with the following cub changes:
- Starting cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added.

And I do the following change to PyTorch:
- Starting CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add a `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219

Reviewed By: bdhirsh

Differential Revision: D31626931

Pulled By: ngimel

fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
2021-10-25 14:37:09 -07:00
77beccaedb Do not build PyTorch with caffe2 by default (#66658)
Summary:
CAFFE2 has been deprecated for a while, but still included in every PyTorch build.
We should stop building it by default, although CI should still validate that caffe2 code is buildable.

Build even fewer dependencies when compiling mobile builds without Caffe2
Introduce `TEST_CAFFE2` in torch.common.utils
Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2
Should be landed after https://github.com/pytorch/builder/pull/864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D31669156

Pulled By: malfet

fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d
2021-10-21 20:32:47 -07:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
eb3b9fe719 [XROS][ML] System specific adjustments for UTs to work. (#65245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245

Building and running c10 and qnnpack tests on XROS.

Notable changes:
- Adding #if defined(_XROS_) in a few places not supported by XROS
- Changing Threadpool to abstract class
ghstack-source-id: 139513579

Test Plan: Run c10 and qnnpack tests on XROS.

Reviewed By: veselinp, iseeyuan

Differential Revision: D30137333

fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
2021-10-01 18:15:14 -07:00
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION.

- In the next PR
   - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
03a58a2ba0 [Caffe2] Create fewer strings during argument fetching (#64285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285

With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
ghstack-source-id: 137139818

Test Plan: buildsizebot iOS apps -- code size win. Fewer strings is probably marginally good for perf, but this only happens at setup time anyway.

Reviewed By: dzhulgakov

Differential Revision: D26826676

fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
2021-09-01 13:30:54 -07:00
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
f82d4b8957 Mark unused functions with C10_UNUSED (#62929)
Summary:
Which fixes number of warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62929

Reviewed By: walterddr, albanD

Differential Revision: D30171953

Pulled By: malfet

fbshipit-source-id: f82475289ff4aebb0c97794114e94a24d00d2ff4
2021-08-09 13:00:33 -07:00
08f6bc1da6 Stop exporting symbols in anonymous namespaces (#62952)
Summary:
The cases were found by compiling with clang on Windows.
Those functions would still be exported in this case, which wastes space in the symbol table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62952

Reviewed By: gchanan

Differential Revision: D30191291

Pulled By: ezyang

fbshipit-source-id: 3319b0ec4f5fb02e0fe1b81dbbcedcf12a0c795e
2021-08-09 12:52:12 -07:00
174433267c [dte] fastpath implementation for broadcast utility function (4/x) (#62493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62493

This diff adds a broadcast fastpath for the caffe2 broadcast utility function, which just copies the contents of a smaller tensor into a larger one. We also update the tests to exercise the new functionality.

Test Plan: unit tests + let CI run

Differential Revision: D29938285

fbshipit-source-id: 543ecc548500380e307be91902696033454964a2
2021-07-30 16:15:10 -07:00
eef85f89b9 [dte] broadcast fastpath implementations for reduce utility functions (2/x) (#62428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62428

In this diff we add a broadcast fastpath for reduce utility functions. These functions are used by various elementwise ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new reducer functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request  `allow_broadcast_fastpath=True`, and nothing outside of the added tests currently does so.

Differential Revision: D29938264

fbshipit-source-id: 5d5542bd93afb85fd9f7a4073f766adc07eb3b65
2021-07-29 17:27:39 -07:00
9f9244aabe [dte] scaffolding for c2 operator broadcasting fastpath (1/x) (#62369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62369

This diff is a big no-op that just sets up scaffolding for passing the "allow_broadcast_fastpath" from caffe2 operator protos created in Python down to C++. To facilitate this, we create helper template wrappers that pass a flag for "allow_broadcast_fastpath" down to elementwise functors. This flag will determine whether to try and take the broadcast fastpath, which we will add in subsequent diffs.

Test Plan: sandcastle + let github CI run

Differential Revision: D28154475

fbshipit-source-id: 15750a0bcd2994fbc6a61fb5653d8cae6b0177dd
2021-07-29 16:31:02 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
c74c0c5718 add thrust/host_vector.h header for cuda 11.4 build (#61004)
Summary:
needed for cuda 11.4 build

Close https://github.com/pytorch/pytorch/issues/61011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61004

Reviewed By: ngimel

Differential Revision: D29523896

Pulled By: malfet

fbshipit-source-id: acb11bdd19c0cc240696be21e5c492f8976fea65
2021-07-06 12:44:56 -07:00
357a21bc92 Fix numerical issue of rowwise normalization in Caffe2 and internal tests. (#60880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60880

Fix numerical issue of rowwise normalization in Caffe2 and internal tests.

Test Plan: buck test mode/opt //dper3/dper3/modules/tests:xdeepint_test -- --exact 'dper3/dper3/modules/tests:xdeepint_test - test_xdeepint_with_full_features_with_interactions_3 (dper3.dper3.modules.tests.xdeepint_test.XdeepInt_Test)'

Reviewed By: esqu1

Differential Revision: D29431597

fbshipit-source-id: 72df52fdcbb29ad3de7b9472f25fde26cf804a76
2021-06-30 17:31:04 -07:00
b4a4a8434d [1/n]support double for Caffe2 ScatterWeightedSum (#60402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402

Add float64 data type support for ScatterWeightedSum for cases where 10^7 precision is not sufficient.

Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum

Reviewed By: jianyuh

Differential Revision: D29190324

fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
2021-06-29 14:17:04 -07:00
d36ce61a5e use explicitly non-returning GPU atomics (#60607)
Summary:
Enables an important performance optimization for ROCm, in light of the discussion in https://github.com/pytorch/pytorch/issues/41028.

CC jithunnair-amd sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60607

Reviewed By: jbschlosser

Differential Revision: D29409894

Pulled By: ngimel

fbshipit-source-id: effca258a0f37eaefa35674a7fd19459ca7dc95b
2021-06-28 18:17:29 -07:00
c3977bf3da [caffe2/utils] Add some fine-grained rules to avoid package boundary violations
Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29401295

fbshipit-source-id: e921e5578c1fcc8df6bd670ae9f95722b8e32d85
2021-06-28 14:45:30 -07:00
03de807d81 [caffe2/utils] Add explicit rule to avoid package boundary violation (#60677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60677

Add a rule to wrap conversions.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: mzlee

Differential Revision: D29370841

fbshipit-source-id: d4dd383eb8457d4f5118574e34e6f17c32fde647
2021-06-28 14:43:30 -07:00
20bda0057e [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap proto_utils.h and depend on that, rather than
relying on a glob which violates package boundaries.

Reviewed By: igorsugak

Differential Revision: D29273453

fbshipit-source-id: 08f198a03d06ee2fdf61f5dbe1d0087db22aec8b
2021-06-22 12:22:24 -07:00
7c1bca9e94 [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap simple_queue.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: igorsugak

Differential Revision: D29273415

fbshipit-source-id: f2b62a82cd6478bd71a8194d661d1c8b023c0953
2021-06-22 12:21:08 -07:00
567e6d3a87 Remove Caffe2 thread-pool leak warning (#60318)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems the caffe2 pthreadpool already exists because of some dependency in the binary distribution, so a `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

The test script in https://github.com/pytorch/pytorch/issues/60171 does have a `set_num_threads` invocation, which is why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
2021-06-22 10:26:55 -07:00
aeb55225e0 [caffe2] add a basic implementation of run-time feature rollout checks (#59355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59355

Add a `CheckKnob()` function for doing run-time checks of feature roll-out
knobs.  This provides an API for safely controlling the roll-out of new
functionality in the code.

Test Plan: Included some basic unit tests.

Reviewed By: voznesenskym

Differential Revision: D26536430

fbshipit-source-id: 2e53234c6d9ce624848fc8b2c76f6833f344f48b
2021-06-04 14:34:41 -07:00
f976275858 Run pthreadpool with _NoPThreadPoolGuard on the same thread (#58759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58759

* Makes `pthreadpool()->run` respect `_NoPThreadPoolGuard`
   Runs tasks on the same thread instead of parallelizing when guard is present

Test Plan:
buck build //xplat/caffe2:aten_test_test_thread_pool_guard
./buck-out/last/aten_test_test_thread_pool_guard

Reviewed By: kimishpatel

Differential Revision: D28597425

fbshipit-source-id: 0365ad9947c239f5b37ce682802d4d401b8b0a48
2021-05-25 11:39:05 -07:00
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
dc8a8cea79 Move caffe2 signal_handler to c10. (#56717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717

The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.

I've fixed this by moving it to the c10 namespace, where now both C2 and PyTorch
can use it.

The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility for C2, but most of the common code is moved to c10.
ghstack-source-id: 127446929

Test Plan: waitforbuildbot

Reviewed By: ezyang

Differential Revision: D27946738

fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
2021-04-26 23:08:12 -07:00
a8ea490f67 Revert caffe2 print stack traces flag (#56496)
Summary:
This reverts the change in #56198 which broke some internal tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56496

Pulled By: driazati

Reviewed By: walterddr

Differential Revision: D27886611

fbshipit-source-id: b04de01b3bcf886294ff7ae45776b5955ce19858
2021-04-20 11:43:33 -07:00
43c747859c Use c10 backtrace generation in caffe2 (#56198)
Summary:
This cuts out caffe2's old backtrace generation in favor of the one already in c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56198

Pulled By: driazati

Reviewed By: nikithamalgifb

Differential Revision: D27868282

fbshipit-source-id: aa9b9691271eaa3f95baab48773ffefebd924ae2
2021-04-20 07:00:33 -07:00
2f5c352162 Fix protobuf warnings in caffe2 (#56186)
Summary:
This guards some deprecated usages of the Protobuf API behind an `#ifdef` (this is how onnx does it as well)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56186

Pulled By: driazati

Reviewed By: bertmaher, dzhulgakov

Differential Revision: D27803121

fbshipit-source-id: 2d3a348ec1ab9879a0d8f2dff17c5444fd4baf2c
2021-04-19 15:19:53 -07:00
facbcec298 Make leak_corrupted_threadpool non-atomic (#55341)
Summary:
Following up on https://github.com/pytorch/pytorch/pull/54895#discussion_r606402656.

A race-condition wouldn't arise because `leak_corrupted_threadpool` can be set to true only after fork via the `pthread_atfork` handler, when a (child) process would be single-threaded. It's set to false also when the process is still single-threaded (`pthreadpool` is called during an invocation to `set_num_threads`, prior to which a child process would remain single-threaded). All threads (if & when multiple threads would be created) would always see `leak_corrupted_threadpool` as false if it would be accessed concurrently.

Since no reader threads can exist while a writer thread changes its value (false->true and true->false), `leak_corrupted_threadpool` might as well be a non-atomic bool.

### Pros
1. No thread-synchronization is required for `leak_corrupted_threadpool`, as it's a non-atomic bool.
2. The call to `compare_exchange_strong` has been removed.

cc: malfet VitalyFedyunin ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55341

Reviewed By: albanD

Differential Revision: D27669442

Pulled By: ezyang

fbshipit-source-id: 926cb5c1b0a537c1c2ab164b0d51d37c1f1b67f0
2021-04-10 19:25:33 -07:00
f1a0b817f0 [pthreadpool] Apply cap for macos builds (#55435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55435

We've seen reports via the macOS Skylight app that PyTorch is super slow due to the lack of cap support in pthreadpool. For Mac builds, we cap the thread count at `#threads/2`.
ghstack-source-id: 125900852

Test Plan:
- Sandcastle CI
- CircleCI

Reviewed By: kimishpatel

Differential Revision: D27578871

fbshipit-source-id: 7b947bc5d6cf289378abf5f479575e112325d02b
2021-04-08 03:56:12 -07:00
8c1a70a7c9 [A*][Gen-1.5] Add shape inference func for PredictorCall.
Summary:
ATT, so that the shape inference works for a model with only distributed parts.

Previously, we relied on a full_predictor net to do shape inference. For very large models, the full_predictor net won't be generated, so we have to do shape inference based on the distributed parts. Surprisingly, the PredictorCall op does tensor name mapping, so it needs a shape inference func as well.

Test Plan: Added unittests.

Reviewed By: khabinov

Differential Revision: D27250956

fbshipit-source-id: 3ebd36ba1eb020bb5d00358cffb8f038a6a996e8
2021-04-06 21:18:40 -07:00
e3691be2d9 Dump C++ stack traces of all threads for distributed tests. (#55003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55003

Using the `caffe2::setPrintStackTracesOnFatalSignal` utility in
distributed tests to set a signal handler that dumps the state of all threads
for all processes when it receives a FATAL signal. This would help in debugging
tests further.

I had to revert all the python faulthandler code since only one signal handler
function is supported, so running python faulthandler with
`setPrintStackTracesOnFatalSignal` doesn't work.

Sample output:
```
SIGSEGV(11), PID: 3492872, Thread 3492872:
[0] ???(0x7fa7b2d1d61b) in libcaffe2_caffe2_caffe2_cpu.so
[1] ???(0x7fa7b2d1d3fb) in libcaffe2_caffe2_caffe2_cpu.so
[2] ???(0x7fa7b2d1d33d) in libcaffe2_caffe2_caffe2_cpu.so
[3] ???(0x7fa7b2d1d167) in libcaffe2_caffe2_caffe2_cpu.so
[4] ???(0x7fa7ce683150) in libpthread.so.0
[5] ???(0x7fa7be2b233c) in libcaffe2__C_impl_cuda.so
[6] ???(0x7fa7be2ce80c) in libcaffe2__C_impl_cuda.so
[7] ???(0x7fa7be2a0512) in libcaffe2__C_impl_cuda.so
[8] torch::distributed::rpc::TensorPipeAgent::send(torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, float, std::unordered_map<signed char, signed char, std::hash<signed char>, std::equal_to<signed char>, std::allocator<std::pair<signed char const, signed char> > > const&)+0x24f(0x7fa7be29f71f) in libcaffe2__C_impl_cuda.so
[9] torch::distributed::autograd::sendMessageWithAutograd(torch::distributed::rpc::RpcAgent&, torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, bool, float, bool)+0x393(0x7fa7b602b203) in libcaffe2_libtorch.so
[10] torch::distributed::rpc::pyRpcPythonUdf(torch::distributed::rpc::WorkerInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<at::Tensor, std::allocator<at::Tensor> >&, float, bool)+0x201(0x7fa7bd844971) in libcaffe2__C_impl_cuda.so
```
ghstack-source-id: 125630551

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27419714

fbshipit-source-id: 8aca9a14ef688004053d8798124d9c3a3fbe3489
2021-04-03 13:59:56 -07:00