Options to address the "undocumented python objects":
1. Reference the functions in the .rst via the torch.nn.modules namespace. Note that this changes the generated doc filenames / locations for most of these functions!
2. [Not an option] Monkeypatch `__module__` for these objects (broke several tests in CI due to `inspect.findsource` failing after this change)
3. Update the .rst files to also document the torch.nn.modules forms of these functions, duplicating docs.
#### [this is the docs page added](https://docs-preview.pytorch.org/pytorch/pytorch/158491/nn.aliases.html)
This PR takes option 3 by adding an rst page nn.aliases that documents the aliases in nested namespaces, removing all the torch.nn.modules.* entries from the coverage skiplist except
- NLLLoss2d (deprecated)
- Container (deprecated)
- CrossMapLRN2d (what is this?)
- NonDynamicallyQuantizableLinear
This mostly required adding docstrings to `forward`, `extra_repr` and `reset_parameters`. Since forward arguments are already part of the module docstrings I just added a very basic docstring.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158491
Approved by: https://github.com/janeyx99
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re
with open("error_file.txt", "r") as f:
errors = f.readlines()
error_lines = {}
for error in errors:
match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
if match:
file_path, line_number, error_type = match.groups()
if file_path not in error_lines:
error_lines[file_path] = {}
error_lines[file_path][int(line_number)] = error_type
for file_path, lines in error_lines.items():
with open(file_path, "r") as f:
code = f.readlines()
for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n"
with open(file_path, "w") as f:
f.writelines(code)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re
with open("error_file.txt", "r") as f:
errors = f.readlines()
error_lines = {}
for error in errors:
match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
if match:
file_path, line_number, error_type = match.groups()
if file_path not in error_lines:
error_lines[file_path] = {}
error_lines[file_path][int(line_number)] = error_type
for file_path, lines in error_lines.items():
with open(file_path, "r") as f:
code = f.readlines()
for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n"
with open(file_path, "w") as f:
f.writelines(code)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
Fixes#112602
For some reason, I could not get the same output when running pycodestyle command as indicated in the issue. I manually ran ruff checks fixing the following issues `D202`, `D204`, `D205`, `D207`, `D400` and `D401`.
### Requested output
nn.modules.activation:
before: 135
after: 79
nn.modules.batchnorm
before: 21
after: 3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113531
Approved by: https://github.com/mikaylagawarecki
I approved https://github.com/pytorch/pytorch/pull/110850 which did the following
Previously:
`num_batches_tracked` not in state_dict when doing `m.load_state_dict(state_dict)` --> always overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor
Now:
`num_batches_tracked` not in state_dict loaded when doing `m.load_state_dict(state_dict)` --> only overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor if module does not have `num_batches_tracked`
This causes the following issue:
```
with torch.device('meta'):
m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```
If `num_batches_tracked` is not in `state_dict`, since `modules's` `num_batches_tracked` is present on meta device, it is not overwritten with a 0 cpu tensor. When compiling, this error is raised
```
AssertionError: Does not support mixing cuda+meta
```
I am not sure whether the explicit check for meta device makes sense as a fix, will add testing if this fix is ok
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285
Approved by: https://github.com/albanD
When converting bn to sync bn, we need to keep sync bn's training flag with the original bn flag, the motivation is there in case the given origin model has set some bn training flag and others are not seated, after we convert sync bn, we hoping not to change this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
Fixes#103818
1. for some special nn.Modules, there are checks which only support cuda, so I add `privateuse1` check.
2. when get the device type for `privateuse1` by `torch._C._get_privateuse1_backend_name()`, it will get error in `torch.jit.script`, so I add a global variable to avoid this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103419
Approved by: https://github.com/albanD
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary:
This PR was not my worst debugging annoyance, nor my smallest in lines changed, but it has the highest `debugging annoyance/lines changed` ratio.
The current pattern
```
self.num_batches_tracked = self.num_batches_tracked + 1
```
, if captured, deletes an eagerly-allocated tensor and overwrites it with a captured tensor. Replays read from the (deallocated) original tensor's address.
This can cause
1. an IMA on graph replay
2. failure to actually increment `num_batches_tracked` during graph replay, because every replay reads from the old location without adding to it
3. numerical corruption if the allocator reassigns the original tensor's memory to some unrelated tensor
4. combinations of 1, 2, and 3, depending on global allocation patterns and if/when the BN module is called eagerly sometimes between replays
(ask me how I know).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70444
Reviewed By: albanD
Differential Revision: D33342203
Pulled By: ngimel
fbshipit-source-id: 5f201cc25030517e75af010bbaa88c452155df21
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).
I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.
This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do malfet ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474
Reviewed By: mrshenli
Differential Revision: D29309633
Pulled By: albanD
fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56982
SyncBatchNorm should behave as a regular BN layer in eval model, this
change ensures that this is the case.
In particular, the bug was when `track_running_stats=False`, `bn_training` would be set to True in eval mode, but this would trigger a collective sync in syncBN.
However, in eval mode syncBN should behave like a regular BN layer and not do this sync.
Closes https://github.com/pytorch/pytorch/issues/48988
Ensured with unittest that when used for inference on a single rank, stats sync is not triggered.
ghstack-source-id: 127544421
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27579297
fbshipit-source-id: 26406e2793f0be14f2daa46ae66f97a8494182ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56425
As SPMD mode is gone, `_specify_ddp_gpu_num` becomes useless. It only checks if the module is a GPU module. This actually is already checked by the caller of this function (in fairscale and some other codebases).
Additionally also remove `enable_pytorch_sync_bn` wrapper that only calls this function and does nothing else.
ghstack-source-id: 126885376
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27866440
fbshipit-source-id: d2fd5cf43eda25c0a2bd35f647848ec0dbd6ad0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55946
As `ddp_gpu_size` field of `SyncBatchNorm` will always be 1 for GPU modules, remove this field and the relevant code.
ghstack-source-id: 126883498
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27746021
fbshipit-source-id: b4518c07e6f0c6943fbd7a7548500a7d4337126c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54208
It seems like it was added to suppress some errors in LazyModules, but I think we should solve those more directly with some type ignores in more surgical places.
Fixes#54087.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27137363
Pulled By: ezyang
fbshipit-source-id: 017cafcc3350e73cd62436078835b97cd9b3b929