Summary:
Adds a note explaining the difference between several often conflated mechanisms in the autograd note
Also adds a link to this note from the docs in `grad_mode` and `nn.module`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58513
Reviewed By: gchanan
Differential Revision: D28651129
Pulled By: soulitzer
fbshipit-source-id: af9eb1749b641fc1b632815634eea36bf7979156
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Uses cmake's `configure_file()` macro to generate a new `torch/csrc/api/include/torch/version.h` header with `TORCH_VERSION_{MAJOR,MINOR,PATCH}` \#defines from an input file `torch/csrc/api/include/torch/version.h.in`.
For Bazel builds, this is accomplished with `header_template_rule()`.
For Buck builds, this is accomplished with `fb_native.genrule()`.
Fixes https://github.com/pytorch/pytorch/issues/44365
<img width="1229" alt="Screen Shot 2021-01-05 at 3 19 24 PM" src="https://user-images.githubusercontent.com/75754324/103809279-3fd80380-5027-11eb-9039-fd23922cebd5.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50073
Reviewed By: glaringlee
Differential Revision: D25855877
Pulled By: jbschlosser
fbshipit-source-id: 6bb792718c97e2c2dbaa74b7b7b831a4f6938e49
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47462, but not completely.
Update breathe to the latest version to get fixes for the "Unable to resolve..." issues. There are still some build errors, but much fewer than before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49407
Reviewed By: izdeby
Differential Revision: D25562163
Pulled By: glaringlee
fbshipit-source-id: 91bfd9e9ac70723816309f489022d72853f5fdc5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244
- What does the generated binding code do?
The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:
```
+--------+ parsing +------------------------+ binding +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+ +------------------------+ +-----------------------+
```
- Are Python arguments 1-1 mapped to C++ arguments?
Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:
```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
Python Args Cpp Args
-----------------------------------------------------------
0: size size
1: names names
2: memory_format -------+
3: dtype -----+-|--> options
4: layout / |
5: device / +--> memory_format
6: pin_memory /
7: requires_grad -+
// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
Python Args Cpp Args
-----------------------------------------------------------
+----> max
/-----> max_values
0: input / self
1: dim / dim
2: keepdim / keepdim
3: out -----+
```
- Why do we want to rewrite the python binding codegen?
The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of non-typed dict objects. Different methods process these arg
objects and add new attributes for various different purposes. It's not so
obvious to figure out the semantics of these attributes. The complicated
binding logic happens implicitly and scatteredly.
```
+--------------------+
| Native Functions |
+--------------------+
|
|
v
+--------------------+
| Cpp Signatures |
+--------------------+
|
|
v
+--------------------+
| Declarations.yaml |
+--------------------+
| +-------------------------------------+
| +-------> | PythonArgParser Schema |
| | +-------------------------------------+
| | .
| | .
v | .
+--------------------+ +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+ +-------------------------------------+
| .
| .
| .
| +-------------------------------------+
+-------> | Cpp Function Dispatch |
+-------------------------------------+
```
This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for python schema.
This way, we can not only avoid subtle Declaration.yaml conversions but
also decouple the generation of python schema, python to c++ binding and
c++ function call.
The ultimate state will be like the following diagram:
```
+-------------------+ +-------------------------------------+
+-------> | Python Signatures | --> | PythonArgParser Schema |
| +-------------------+ +-------------------------------------+
| | .
| | .
| | .
+------------------+ | +-------------------------------------+
| Native Functions | +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+ | +-------------------------------------+
| | .
| | .
| | .
| +-------------------+ +-------------------------------------+
+-------> | Cpp Signatures | --> | Cpp Function Dispatch |
+-------------------+ +-------------------------------------+
```
This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.
It produces the byte-for-byte same results (tested with #46243).
Will migrate the rest of gen_python_functions.py in subsequent PRs.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D24388874
Pulled By: ljk53
fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629
How to approach reviewing this diff:
- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D23183978
Pulled By: ezyang
fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41312
I was hoping that exhale had gotten incremental recompilation
in its latest version, but experimentally this does not seem
to have been the case. Still, I had gotten the whole shebang
to be working on the latest version of these packages, so might
as well land the upgrade. There was one bug in Optional.h that
I had to fix; see the cited bug report.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D22526349
Pulled By: ezyang
fbshipit-source-id: d4169c2f48ebd8dfd8a593cc8cd232224d008ae9
Summary:
See https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-j
> Distribute the build over N processes in parallel, to make building on multiprocessor machines more effective. Note that not all parts and not all builders of Sphinx can be parallelized. If auto argument is given, Sphinx uses the number of CPUs as N.
- Timing results
- Python doc build on a 40-core machine: 9:34 down to 1:29
- pytorch_cpp_doc_push: ~1h 10m down to 47m
- pytorch_python_doc_push: 34m down to 32m
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38785
Differential Revision: D21691991
Pulled By: zou3519
fbshipit-source-id: cfc5e8cd13414640f82edfd2ad1ce4d9c7afce12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
m.class_<Class>("Class")
.def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25845.
**Test Plan:**
Check `pytorch_cpp_doc_push` CI job, and see if there is `classat_1_1_tensor` generated (similar to `structat_1_1native_1_1_convolution_descriptor`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34467
Differential Revision: D20338190
Pulled By: yf225
fbshipit-source-id: 52dc05af5e0d742e740de5576d0d2b3e17ef28dd
Summary:
This commit fixes overlapping keywords in the CPP Docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34142
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20319949
Pulled By: yf225
fbshipit-source-id: e7bb2efdc286c85792c6f18a260c3bba33c54008
Summary:
This PR move glu to Aten(CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time
torch.manual_seed(0)
def _time():
if torch.cuda.is_available():
torch.cuda.synchronize()
return time.time()
device = "cpu"
#warm up
for n in [10, 100, 1000, 10000]:
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n // 2, device=device)
for i in range(1000):
output = F.glu(input)
output.backward(grad_output)
for n in [10, 100, 1000, 10000]:
fwd_t = 0
bwd_t = 0
input = torch.randn(128, n, requires_grad=True, device=device)
grad_output = torch.ones(128, n // 2, device=device)
for i in range(10000):
t1 = _time()
output = F.glu(input)
t2 = _time()
output.backward(grad_output)
t3 = _time()
fwd_t = fwd_t + (t2 -t1)
bwd_t = bwd_t + (t3 - t2)
fwd_avg = fwd_t / 10000 * 1000
bwd_avg = bwd_t / 10000 * 1000
print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
% (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backwad avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backwad avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backwad avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backwad avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179
Differential Revision: D19839835
Pulled By: VitalyFedyunin
fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
Summary:
Instead of a mixture of direct calls to library provided atomicAdd calls, such as float atomicAdd(float*, float) and calls provided internally, such as void atomicAdd(long*, long), abstract to one API void gpuAtomicAdd(T*, T) in THCAtomics.cuh for the PyTorch backend.
The advantage of this approach is that it allows us to more easily distinguish between capabiltiies of different platforms (and their versions). Additionally, the abstraction of void returning atomicAdds allows us to, in the future, support fast HW instructions on some platforms that will not return the previous value.
Call sites that do not satisfy above conditions and are either highly platform specific (__half2 atomicAdd fast path in one operator) or require the return explicitly (some int atomicAdd invocations) are left untouched. The Caffe2 backend also remains untouched.
While here, add a bunch of includes of THCAtomics.cuh that were missing before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31992
Differential Revision: D19330220
Pulled By: ezyang
fbshipit-source-id: d6ab73ec5168c77e328faeef6c6f48eefba00861
Summary:
This is a first pass attempt at documenting `IValue` to help with problems like in #17165. Most users are probably concerned with
* how to make an `IValue` that matches the input type to their graph (most of the constructors are pretty self explanatory, so as long as they are in the docs I think its enough)
* how to extract the results after running their graph (there is a small note on the behavior of `.toX()` based on confusions we've had in the past)
Preview:
https://driazati.github.io/pytorch_doc_previews/31904/api/structc10_1_1_i_value.html#exhale-struct-structc10-1-1-i-value
There are also some random CSS fixes to clean up the style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31904
Pulled By: driazati
Differential Revision: D19318733
fbshipit-source-id: b29dae3349d5a7ea5a3b8e09cd23f7ff8434edb4
Summary:
Stacked PRs
* **#31908 - Remove C++ docs contributing page**
* #31905 - Add doc previewing instructions
We should have 1 source of truth for contribution instructions (CONTRIBUTING.md).
This PR moves the instructions from the C++ doc pages there instead of having its
own separate page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31908
Pulled By: driazati
Differential Revision: D19296366
fbshipit-source-id: c1daf004259342bd09e09dea3b80e34db47066ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30530
Switch some mentions of "C++11" in the docs to "C++14"
ghstack-source-id: 95812049
Test Plan: testinprod
Differential Revision: D18733733
fbshipit-source-id: b9d0490eb3f72bad974d134bbe9eb563f6bc8775
Summary:
To avoid ABI issue
EDIT: After this PR, the example CMakeLists.txt will always use the `-D_GLIBCXX_USE_CXX11_ABI` value set in `share/cmake/Torch/TorchConfig.cmake`, regardless of the `-D_GLIBCXX_USE_CXX11_ABI` value passed to the `cmake` command by the user.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29890
Differential Revision: D18531391
Pulled By: yf225
fbshipit-source-id: 2db78ae7a33a4088b579e81c60b9a74861f1ccde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29041
1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 93098679
ghstack-source-id: 93098679
Test Plan: waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D18273041
fbshipit-source-id: 85d3932fed6337668a812367fdfce233c1b3ff8e
Summary:
`at::ArrayRef` / `torch::IntArrayRef` should be discouraged in user code, because users might not be aware of the fact that it doesn't own the underlying data, which already leads to memory access bugs when they try to write the following:
```cpp
auto expected_sizes = torch::IntArrayRef({2, 16, 6}); // The memory that represents `{2, 16, 6}` is released after this line
ASSERT_EQ(output.sizes(), expected_sizes); // `expected_sizes` is pointing to invalid memory region
```
This PR changes all usage of `at::ArrayRef` and `torch::IntArrayRef` to the corresponding `std::vector` version, so that users won't pick up the habit of using `ArrayRef` by looking at the test code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27884
Differential Revision: D17921646
Pulled By: yf225
fbshipit-source-id: 461e79fc22b598aac230d36cc028085ce6cbe937
Summary:
According to https://github.com/pytorch/pytorch/issues/27285 , seems we do not intend to use shebang as an indication of Python version, thus
we enable EXE001 flake8 check.
For violations, we either remove shebang from non-executable Python scripts or grant them executable permission.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27560
Differential Revision: D17831782
Pulled By: ezyang
fbshipit-source-id: 6282fd3617b25676a6d959af0d318faf05c09b26