Commit Graph

130 Commits

Author SHA1 Message Date
93b1e47586 [inductor][Observability] Add log for Optimus to enable easier debug (#110452)
Summary: The log breaks one of the ads-model export flows, so we change the log level to debug.

Test Plan: see details in D49710166

Differential Revision: D49844303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110452
Approved by: https://github.com/jackiexu1992
2023-12-01 18:25:56 +00:00
d1c092ae1b Update impl_abstract_pystub to be less boilerplatey (#113182)
Summary:

We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
  Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
  the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
  Library.define in Python appends the op to a global set, which is analogous
  to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
  we require that it has an `impl_abstract_pystub` specified and we also check
  that the module in the `impl_abstract_pystub` is the same as the module where
  the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
  buck-based systems) because buck sits above us.
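The module check described in the list above can be modeled in plain Python. This is a toy sketch, not PyTorch's implementation: `PYSTUBS`, `declare_pystub`, and `check_abstract_impl_module` are hypothetical names, and the op and module names are made up for illustration.

```python
# Toy model of the pystub module check; every name here is hypothetical.
PYSTUBS = {}  # op name -> (python module, context)


def declare_pystub(op_name, module, context=""):
    """Record which Python module is expected to hold the abstract impl."""
    PYSTUBS[op_name] = (module, context)


def check_abstract_impl_module(op_name, caller_module):
    """Fail unless the op has a pystub that names the registering module."""
    if op_name not in PYSTUBS:
        raise RuntimeError(f"{op_name}: no impl_abstract_pystub was declared")
    expected, _context = PYSTUBS[op_name]  # the context is deliberately not checked
    if expected != caller_module:
        raise RuntimeError(
            f"{op_name}: pystub names module {expected!r}, "
            f"but the abstract impl was registered from {caller_module!r}"
        )


declare_pystub("mylib::foo", "mylib.ops")
check_abstract_impl_module("mylib::foo", "mylib.ops")  # passes silently
```

Calling `check_abstract_impl_module("mylib::foo", "other.module")` raises a `RuntimeError` that names both modules.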

bypass-github-export-checks

Test Plan: - existing tests

Differential Revision: D51080493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113182
Approved by: https://github.com/ezyang
2023-11-08 00:39:00 +00:00
bc3e2e03cd Revert "Update impl_abstract_pystub to be less boilerplatey (#112851)"
This reverts commit 6ae4e3a8d249a96d9a8bbfba389d0509783e11e1.

Reverted https://github.com/pytorch/pytorch/pull/112851 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/112851#issuecomment-1799539354))
2023-11-07 18:53:13 +00:00
6ae4e3a8d2 Update impl_abstract_pystub to be less boilerplatey (#112851)
Summary:
We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
  Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
  the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
  Library.define in Python appends the op to a global set, which is analogous
  to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
  we require that it has an `impl_abstract_pystub` specified and we also check
  that the module in the `impl_abstract_pystub` is the same as the module where
  the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
  buck-based systems) because buck sits above us.

Test Plan: - existing tests

Differential Revision: D50972148

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112851
Approved by: https://github.com/ezyang
2023-11-07 16:07:42 +00:00
8124a6c40c [TORCH_LIBRARY] Add impl_abstract_pystub (#109529)
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).

`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.

Some details:
- We construct a new global AbstractImplPyStub mapping in
  Dispatcher.cpp. Read/write to this map is protected by the Dispatcher
  lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
  no meta kernel, but also offers a nicer error message if we see that there is
  a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
  helper function to throw errors. This way, we can throw different error
  messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
  added a PyInterpreter::throw_abstract_impl_not_imported_error.

Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-09-22 04:55:36 +00:00
49b18ae546 Revert "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)"
This reverts commit 0ad595954a1766f26aa55b0f72814d55865bb1dc.

Reverted https://github.com/pytorch/pytorch/pull/107917 on behalf of https://github.com/clee2000 due to breaking internal builds D49346637 ([comment](https://github.com/pytorch/pytorch/pull/107917#issuecomment-1722566885))
2023-09-17 20:57:41 +00:00
0ad595954a python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)
Added two new utils to help with turning python functionalization on in AOTAutograd (next PR):

(1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle Python `FunctionalTensor`s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a Python shim that first tries to unwrap the Python `FunctionalTensor` if there is one, then calls the existing C++ logic.

(2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd, when we unwrap and re-wrap a tensor subclass, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
2023-09-15 20:19:25 +00:00
fbfb9a1648 [Dynamo] Improve PT2 fbcode logging observability (#106932)
Summary:
https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit

This is the revamped version of D47908299.

For each frame, we will record a list of compilation metrics, e.g., backend_compile time, entire_frame_compile time, cache_size, co_filename, co_firstlineno, co_name, guards, graph input_count, graph node_count, graph op_count.
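One way to picture such a per-frame record is a small dataclass. This is only a sketch: the class name `FrameCompileMetrics`, the helper `metrics_for`, and every field type are assumptions; only the metric names come from the list above.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FrameCompileMetrics:
    # Identity of the compiled frame, taken from its code object.
    co_name: str
    co_filename: str
    co_firstlineno: int
    # Timings (seconds) and sizes; the field types are assumptions.
    backend_compile_time_s: float = 0.0
    entire_frame_compile_time_s: float = 0.0
    cache_size: int = 0
    guards: List[str] = field(default_factory=list)
    graph_input_count: int = 0
    graph_node_count: int = 0
    graph_op_count: int = 0


def metrics_for(fn) -> FrameCompileMetrics:
    """Start a record for the frame that would run `fn`."""
    code = fn.__code__
    return FrameCompileMetrics(code.co_name, code.co_filename, code.co_firstlineno)


def example(x):
    return x + 1


record = metrics_for(example)
record.graph_op_count = 1
row = asdict(record)  # a flat dict is convenient to hand to a logger
```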

With the help of job info: mast_job_name, global_rank, we can satisfy the requirements from `Things I’ve used/wanted to use our logging to determine` in https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit (or add more metrics for this framework)

Test Plan:
```
buck2 test //caffe2/test:test_dynamo
```

Differential Revision: D48142400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106932
Approved by: https://github.com/anijain2305
2023-08-11 20:46:04 +00:00
59dff01319 Add top level function to check if running with deploy (#101420)
Also not sure if this should be a public function or not. Leaving it private for now but let me know if you prefer for it to be public.

FYI @nikitaved this will logically conflict with your triton kernel PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101420
Approved by: https://github.com/malfet
2023-05-16 16:05:49 +00:00
1e807f1189 Log PT2 compile to Scuba (#98790)
Summary:
Modeled off of https://www.internalfb.com/code/fbsource/[5f363eaeab1b5d620b9df83ba0de65adfd96771b]/fbcode/caffe2/torch/fb/trainer/profilers/gpu_mem_signpost.py?lines=106-115

I didn't use the Scuba integration in torch/_inductor/fb/logging.py to avoid
having to make a new Scuba table; probably should do this.

Test Plan:
```
buck2 test //caffe2/test:test_dynamo
```

Differential Revision: D44850903

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98790
Approved by: https://github.com/desertfire, https://github.com/bertmaher
2023-04-11 20:10:35 +00:00
12cb26509a Apply ufmt to torch internal (#81643)
This is a big bang PR, merge conflicts are probably expected and will be addressed at merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643
Approved by: https://github.com/ezyang
2022-07-22 02:19:50 +00:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
e62189ad69 [jit] Better checking for overload function declarations. (#59956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59956

Issue #50175. Basically two things need to be checked and are lacking currently:
1. Overload declarations should always have a single `pass` statement as the body.
2. There should always be an implementation provided for decls that don't
   have the torch.jit._overload decorator. So in this case we need to check
   whether we are actually compiling a function body with the decorator ahead.
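The first check can be illustrated with the standard `ast` module. This is a standalone sketch, not the TorchScript frontend code; `is_valid_overload_decl` is a hypothetical helper that only inspects the function body and does not look at decorators.

```python
import ast
import textwrap


def is_valid_overload_decl(source: str) -> bool:
    """Return True if the function body is exactly one `pass` statement."""
    func = ast.parse(textwrap.dedent(source)).body[0]
    if not isinstance(func, ast.FunctionDef):
        return False
    return len(func.body) == 1 and isinstance(func.body[0], ast.Pass)


good_decl = """
def add(x: int, y: int) -> int:
    pass
"""

bad_decl = """
def add(x: int, y: int) -> int:
    return x + y
"""
```

Here `is_valid_overload_decl(good_decl)` is true and `is_valid_overload_decl(bad_decl)` is false, because the second body contains a `return` instead of a lone `pass`.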

Test Plan:
python test/test_jit.py TestScript.test_function_overloads

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29106555

fbshipit-source-id: 2d9d7df2fb51ab6db0e1b726f9644e4cfbf733d6
2021-08-05 14:21:48 -07:00
737d920b21 Strictly type everything in .github and tools (#59117)
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117

Test Plan:
```
flake8
mypy --config mypy-strict.ini
```

Reviewed By: malfet

Differential Revision: D28765386

Pulled By: samestep

fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
2021-06-07 14:49:36 -07:00
f2e41257e4 Back out "Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"" (#51267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51267

Original commit changeset: b70185916502

Test Plan: test locally, oss ci-all, fbcode incl deferred

Reviewed By: suo

Differential Revision: D26121251

fbshipit-source-id: 4315b7fd5476914c8e5d6f547e1cfbcf0c227781
2021-01-28 19:30:45 -08:00
12a434abbc Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"
Test Plan: revert-hammer

Differential Revision:
D26077905 (dc2a44c4fc)

Original commit changeset: fae83bf9822d

fbshipit-source-id: b70185916502ba9ebe16d781cf0659b9f7865c9a
2021-01-27 19:53:29 -08:00
dc2a44c4fc Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" (#51124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124

Original commit changeset: 1c7133627da2

Test Plan: Test locally with interpreter_test and on CI

Reviewed By: suo

Differential Revision: D26077905

fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d
2021-01-27 16:49:42 -08:00
e843974a6e Revert D25850783: Add torch::deploy, an embedded torch-python interpreter
Test Plan: revert-hammer

Differential Revision:
D25850783 (3192f9e4fe)

Original commit changeset: a4656377caff

fbshipit-source-id: 1c7133627da28fb12848da7a9a46de6d3b2b67c6
2021-01-26 02:07:44 -08:00
3192f9e4fe Add torch::deploy, an embedded torch-python interpreter (#50458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458

libinterpreter.so contains a frozen python distribution including
torch-python bindings.

Freezing refers to serializing bytecode of python standard library modules as
well as the torch python library and embedding them in the library code.  This
library can then be dlopened multiple times in one process context, each
interpreter having its own python state and GIL.  In addition, each python
environment is sealed off from the filesystem and can only import the frozen
modules included in the distribution.
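The idea of freezing can be shown in miniature with the standard library. This sketch is not how frozenpython or libinterpreter.so is built; it only demonstrates serializing bytecode and loading it back without reading any file. The module name `demo` and the function `greet` are made up.

```python
import marshal
import types

source = "def greet(name):\n    return 'hello ' + name\n"

# "Freeze": compile to a code object and serialize the bytecode to bytes.
frozen_blob = marshal.dumps(compile(source, "<frozen demo>", "exec"))

# Later, without touching the filesystem: rebuild a module from the blob.
module = types.ModuleType("demo")
exec(marshal.loads(frozen_blob), module.__dict__)

print(module.greet("deploy"))       # hello deploy
print(hasattr(module, "__file__"))  # False
```

The module built this way has no `__file__` attribute, since no file backs it.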

This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose.  Frozenpython provides libpython3.8-frozen.a which
contains frozen bytecode and object code for the python standard library.

Building on top of frozen python, the frozen torch-python bindings are added in
this diff, providing each embedded interpreter with a copy of the torch
bindings.  Each interpreter is intended to share one instance of libtorch and
the underlying tensor libraries.

Known issues

- Autograd is not expected to work with the embedded interpreter currently, as it manages
its own python interactions and needs to coordinate with the duplicated python
states in each of the interpreters.
- Distributed and cuda stuff is disabled in libinterpreter.so build, needs to be revisited
- __file__ is not supported in the context of embedded python since there are no
files for the underlying library modules.
- __version__ is not properly supported in the embedded torch-python, just a
workaround for now

Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test

Reviewed By: ailzhang

Differential Revision: D25850783

fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88
2021-01-25 15:14:28 -08:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
f17d7a5556 Fix exception chaining in torch/ (#43836)
Summary:
## Motivation
Fixes https://github.com/pytorch/pytorch/issues/43770.

## Description of the change
This PR fixes exception chaining only in files under `torch/` where appropriate.
To fix exception chaining, I used either:
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
I subjectively chose which one to use from the above options.
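The two forms behave as follows. This is a minimal illustration with made-up functions (`parse_port`, `lookup`), not code from the PR.

```python
def parse_port(text):
    try:
        return int(text)
    except ValueError as err:
        # Option 1: keep the original exception attached as __cause__.
        raise RuntimeError(f"invalid port: {text!r}") from err


def lookup(table, key):
    try:
        return table[key]
    except KeyError:
        # Option 2: hide the original exception from the traceback.
        raise ValueError(f"unknown key: {key!r}") from None
```

With option 1 the traceback shows both exceptions, joined by "The above exception was the direct cause of the following exception". With option 2 only the new exception is printed.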

## List of lines containing raise in except clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list the lines that `raise` inside an `except` clause.

- [x] 000739c31a/torch/jit/annotations.py (L35)
- [x] 000739c31a/torch/jit/annotations.py (L150)
- [x] 000739c31a/torch/jit/annotations.py (L158)
- [x] 000739c31a/torch/jit/annotations.py (L231)
- [x] 000739c31a/torch/jit/_trace.py (L432)
- [x] 000739c31a/torch/nn/utils/prune.py (L192)
- [x] 000739c31a/torch/cuda/nvtx.py (L7)
- [x] 000739c31a/torch/utils/cpp_extension.py (L1537)
- [x] 000739c31a/torch/utils/tensorboard/_pytorch_graph.py (L292)
- [x] 000739c31a/torch/utils/data/dataloader.py (L835)
- [x] 000739c31a/torch/utils/data/dataloader.py (L849)
- [x] 000739c31a/torch/utils/data/dataloader.py (L856)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L186)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L189)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L424)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1279)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1283)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1356)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1388)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1391)
- [ ] 000739c31a/torch/testing/_internal/common_utils.py (L1412)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L310)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L329)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L332)
- [x] 000739c31a/torch/testing/_internal/jit_utils.py (L183)
- [x] 000739c31a/torch/testing/_internal/common_nn.py (L4789)
- [x] 000739c31a/torch/onnx/utils.py (L367)
- [x] 000739c31a/torch/onnx/utils.py (L659)
- [x] 000739c31a/torch/onnx/utils.py (L892)
- [x] 000739c31a/torch/onnx/utils.py (L897)
- [x] 000739c31a/torch/serialization.py (L108)
- [x] 000739c31a/torch/serialization.py (L754)
- [x] 000739c31a/torch/distributed/rpc/_testing/faulty_agent_backend_registry.py (L76)
- [x] 000739c31a/torch/distributed/rpc/backend_registry.py (L260)
- [x] 000739c31a/torch/distributed/distributed_c10d.py (L184)
- [x] 000739c31a/torch/_utils_internal.py (L57)
- [x] 000739c31a/torch/hub.py (L494)
- [x] 000739c31a/torch/contrib/_tensorboard_vis.py (L16)
- [x] 000739c31a/torch/distributions/lowrank_multivariate_normal.py (L100)
- [x] 000739c31a/torch/distributions/constraint_registry.py (L142)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43836

Reviewed By: ailzhang

Differential Revision: D23431212

Pulled By: malfet

fbshipit-source-id: 5f7f41b391164a5ad0efc06e55cd58c23408a921
2020-08-31 20:26:23 -07:00
5c39146c34 Fix get_writable_path (#42895)
Summary:
As the name suggests, this function should always return a writable path.
Call `mkdtemp` to create a temp folder if the path is not writable.

This fixes `TestNN.test_conv_backcompat` if PyTorch is installed in non-writable location
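A minimal sketch of the described behavior, assuming writability is tested with `os.access`. The function name matches the commit title, but the body is an illustration rather than a copy of the patch.

```python
import os
import tempfile


def get_writable_path(path: str) -> str:
    """Return `path` if it is writable, else a fresh temporary directory."""
    if os.access(path, os.W_OK):
        return path
    # The suffix is cosmetic; it makes the fallback directory easier to recognize.
    return tempfile.mkdtemp(suffix=os.path.basename(path))
```

A caller that passes a read-only install location gets back a new directory under the system temp folder instead of an error.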


Pull Request resolved: https://github.com/pytorch/pytorch/pull/42895

Reviewed By: dzhulgakov

Differential Revision: D23070320

Pulled By: malfet

fbshipit-source-id: ed6a681d46346696a0de7e71f0b21cba852a964e
2020-08-12 09:38:24 -07:00
ca665c682c Separate RTLD_GLOBAL from _load_global_deps() (#36682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36682

For fb internal builds we need to separate whether to use global deps library from loading with RTLD_GLOBAL.

Test Plan: CI -- this should be a no-op for existing builds

Reviewed By: ezyang

Differential Revision: D21051427

fbshipit-source-id: 83bb703d6ceb0265a4c58166749312a44172e78c
2020-04-22 19:08:44 -07:00
e35dd4f603 [jit] Include call stack in OSError message (#34669)
Summary:
Previously there was no indication of why you would get an `OSError` for something (such as the generated methods of a `dataclass`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34669

Pulled By: driazati

Differential Revision: D20426570

fbshipit-source-id: 45d63631984fa26a87c03de5523fb10d8abbc6db
2020-03-18 15:10:23 -07:00
ddff4efa26 Don't use RTLD_GLOBAL to load _C. (#31162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162

This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.

Fixes #3059.

Some of the subtleties in preparing this patch:

* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we now may load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I sprayed a few ways of getting things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is we should have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols which doesn't work when loaded locally, and if we load a library with `RTLD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about the C++ standard library.
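For readers unfamiliar with the two loading modes: on POSIX systems CPython exposes the flags it passes to `dlopen()` when importing extension modules. The sketch below shows how an import can be made to use `RTLD_GLOBAL` temporarily; `dlopen_flags` is a hypothetical helper, it assumes a POSIX platform, and it is not the code from this patch.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def dlopen_flags(flags):
    """Temporarily change the flags CPython passes to dlopen() on import."""
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(flags)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)  # always restore the previous flags


# Extension modules imported inside the block are loaded with RTLD_GLOBAL,
# so their symbols become visible to libraries loaded afterwards.
with dlopen_flags(os.RTLD_GLOBAL | os.RTLD_LAZY):
    pass  # an extension module would be imported here
```

Leaving the default flags in place corresponds to the `RTLD_LOCAL` behavior the patch moves to, where a library's symbols are not used to resolve references in libraries loaded later.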

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D19262579

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
2020-01-09 07:28:15 -08:00
df338f80a6 Add a wrapper for inspect in JIT to produce better error message (#25415)
Summary:
If source code is not available due to packaging (e.g. sources are compiled to .pyc), TorchScript produces a very obscure error message. This tries to make it nicer and allows customizing the message by overriding _utils_internal.
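A wrapper of this kind could look roughly like the following. It is a sketch under assumptions: the function name, its `error_msg` parameter, and the message text are illustrative, not the exact code added by the PR.

```python
import inspect


def get_source_lines_and_file(obj, error_msg=None):
    """Wrap inspect so that missing source produces an actionable OSError."""
    try:
        filename = inspect.getsourcefile(obj)
        lines, first_lineno = inspect.getsourcelines(obj)
    except OSError as e:
        msg = (
            f"Can't get source for {obj}. Make sure the original .py files "
            "are available, not only compiled .pyc files."
        )
        if error_msg:
            # Hook for an environment-specific hint supplied by the caller.
            msg += "\n" + error_msg
        raise OSError(msg) from e
    return lines, first_lineno, filename
```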
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25415

Test Plan: Really hard to unittest properly. Did one off testing by compiling to .pyc and checking the message.

Differential Revision: D17118238

Pulled By: dzhulgakov

fbshipit-source-id: 3cbfee0abddc8613000680548bfe0b8ed52a36b0
2019-09-14 21:27:51 -07:00
8bc3b66be9 Override the resolve_library_path in FBCode (#17497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17497

The following problems have been addressed: 1) import torch.ops correctly, 2) make realpath call optional

Reviewed By: dzhulgakov

Differential Revision: D14094358

fbshipit-source-id: 2f9a6fca656867287a7c82c465a4554384ff7323
2019-03-12 22:09:24 -07:00
2a6431ba2d Use fixed MASTER_PORT in test_distributed (#13109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13109

The "right" strategy of creating a socket, binding to an undefined port, closing the socket, and reusing the port it was bound to, was subject to a race condition. Another process could bind to that same port sooner than the tests would, causing an "Address already in use" failure when rank 0 would try and bind to that same port. The THD tests have been using a fixed port since forever. Time will tell if this fixes #12876.
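The race is easy to see when the strategy is written out. A sketch with hypothetical helper names (`find_free_port`, `bind_rank0`), not the test suite's code:

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused port, then release it again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def bind_rank0(port: int) -> socket.socket:
    """Bind the port chosen earlier; fails if another process took it first."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", port))
    except OSError:
        server.close()
        raise
    return server
```

Between the return of `find_free_port()` and the later call to `bind_rank0(port)`, nothing reserves the port, so any other process can bind it and trigger "Address already in use".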

Differential Revision: D10850614

fbshipit-source-id: c19f12bb4916141187ee8ddb52880f5f418310dc
2018-10-25 08:51:34 -07:00
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00