From 9e50c21e27268dcd4dbf82de26e7a2094b88d363 Mon Sep 17 00:00:00 2001 From: Anthony Shoumikhin Date: Fri, 25 Apr 2025 21:27:27 +0000 Subject: [PATCH] Fix xrefs (#151888) Fix existing cross references and removed old ones Pull Request resolved: https://github.com/pytorch/pytorch/pull/151888 Approved by: https://github.com/eqy, https://github.com/huydhn, https://github.com/svekars --- CONTRIBUTING.md | 2 -- aten/src/ATen/cudnn/README.md | 2 +- aten/src/ATen/mkl/README.md | 2 +- aten/src/ATen/native/nested/README.md | 4 ++-- test/distributed/elastic/test_control_plane.py | 7 ++++--- test/onnx/torchlib/README.md | 2 +- torch/ao/quantization/backend_config/README.md | 2 +- torch/ao/quantization/fx/README.md | 2 +- .../c10d/control_plane/WorkerServer.cpp | 5 ++--- torch/csrc/jit/OVERVIEW.md | 15 +++++++-------- torch/csrc/jit/README.md | 2 +- torch/csrc/profiler/README.md | 4 ++-- torch/fx/passes/README.md | 4 ---- 13 files changed, 23 insertions(+), 30 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index cdb34d5073c8..151033526835 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -281,8 +281,6 @@ dependencies as well as the nightly binaries into the repo directory. * [caffe2](caffe2) - The Caffe2 library. * [core](caffe2/core) - Core files of Caffe2, e.g., tensor, workspace, blobs, etc. - * [operators](caffe2/operators) - Operators of Caffe2. - * [python](caffe2/python) - Python bindings to Caffe2. * ... * [.circleci](.circleci) - CircleCI configuration management. [README](.circleci/README.md) diff --git a/aten/src/ATen/cudnn/README.md b/aten/src/ATen/cudnn/README.md index 057fbc92ecb0..1b605343a792 100644 --- a/aten/src/ATen/cudnn/README.md +++ b/aten/src/ATen/cudnn/README.md @@ -1,4 +1,4 @@ All files living in this directory are written with the assumption that cuDNN is available, which means that these code are not guarded by `#if AT_CUDNN_ENABLED()`. Therefore, whenever you need to use definitions from here, please guard the `#include` and -definition usages with `#if AT_CUDNN_ENABLED()` macro, e.g. [native/cudnn/BatchNorm.cpp](native/cudnn/BatchNorm.cpp). +definition usages with `#if AT_CUDNN_ENABLED()` macro, e.g. [native/cudnn/BatchNorm.cpp](../native/cudnn/BatchNorm.cpp). diff --git a/aten/src/ATen/mkl/README.md b/aten/src/ATen/mkl/README.md index ee10392bdccd..2c9285b5c3bd 100644 --- a/aten/src/ATen/mkl/README.md +++ b/aten/src/ATen/mkl/README.md @@ -1,4 +1,4 @@ All files living in this directory are written with the assumption that MKL is available, which means that these code are not guarded by `#if AT_MKL_ENABLED()`. Therefore, whenever you need to use definitions from here, please guard the `#include` and -definition usages with `#if AT_MKL_ENABLED()` macro, e.g. [SpectralOps.cpp](native/mkl/SpectralOps.cpp). +definition usages with `#if AT_MKL_ENABLED()` macro, e.g. [SpectralOps.cpp](../native/mkl/SpectralOps.cpp). diff --git a/aten/src/ATen/native/nested/README.md b/aten/src/ATen/native/nested/README.md index e79256cff2d6..8c2b51e7e24f 100644 --- a/aten/src/ATen/native/nested/README.md +++ b/aten/src/ATen/native/nested/README.md @@ -16,7 +16,7 @@ When constructing a NestedTensor in C++ you will likely not be using the NestedT ## Code Structure -The NestedTensor code is split into two parts: the C++ code and the Python code. The C++ code is located in [aten/src/ATen/native/nested](.) and the Python code is located in [torch/nested/__init__.py](/torch/nested/__init__.py). 
The C++ code is split into the following files: +The NestedTensor code is split into two parts: the C++ code and the Python code. The C++ code is located in [aten/src/ATen/native/nested](.) and the Python code is located in [torch/nested/__init__.py](../../../../../torch/nested/__init__.py). The C++ code is split into the following files: - `NestedTensorImpl.h | NestedTensorImpl.cpp`: The NestedTensor data structure and its methods. - `NestedTensorUtils.h | NestedTensorUtils.cpp`: Utility functions for working with NestedTensors. (This is where you will find `map_nested_tensor` which is discussed below in the section on implementing new functions.) @@ -60,4 +60,4 @@ If performance is not your main concern and you would like to enable coverage th ## Best Practices ## Testing -Unit tests for NestedTensors can be found at [test/test_nestedtensor.py](/test/test_nestedtensor.py). If a new operator is added to NestedTensors it is important to add a unit test for it. The unit tests are run on the CI and if they fail the PR will not be merged. +Unit tests for NestedTensors can be found at [test/test_nestedtensor.py](../../../../../test/test_nestedtensor.py). If a new operator is added to NestedTensors it is important to add a unit test for it. The unit tests are run on the CI and if they fail the PR will not be merged. diff --git a/test/distributed/elastic/test_control_plane.py b/test/distributed/elastic/test_control_plane.py index cfa221147789..8fc51b5bf7e0 100644 --- a/test/distributed/elastic/test_control_plane.py +++ b/test/distributed/elastic/test_control_plane.py @@ -57,9 +57,10 @@ class WorkerServerTest(TestCase): self.assertEqual(resp.status, 200) self.assertEqual( resp.data, - b"""
<h1>torch.distributed.WorkerServer</h1>
-Handler names
-""",
+            b"<h1>torch.distributed.WorkerServer</h1>
\n" + b'Handler names\n', ) resp = pool.request("POST", "/handler/ping") diff --git a/test/onnx/torchlib/README.md b/test/onnx/torchlib/README.md index 0ea8c6c524d4..5a81438155e9 100644 --- a/test/onnx/torchlib/README.md +++ b/test/onnx/torchlib/README.md @@ -36,7 +36,7 @@ Sometimes, there is no existing OpInfo that fits our need to test an operator. Y Follow the steps below to create new OpInfo tests: -1. Use the implementation for `ops.aten.slice_scatter` as a reference (https://github.com/microsoft/onnxscript/blob/e67335101e4a06b8cc98cb4129935a9af5062c77/tests/function_libs/torch_lib/extra_opinfo.py#L2412-L2418) to declare an OpInfo in [`extra_opinfo.py`](./extra_opinfo.py) +1. Use the implementation for `ops.aten.slice_scatter` as a reference (https://github.com/microsoft/onnxscript/blob/e67335101e4a06b8cc98cb4129935a9af5062c77/tests/function_libs/torch_lib/extra_opinfo.py#L2412-L2418) to declare an `OpInfo` in `extra_opinfo.py`. ```py opinfo_core.OpInfo( diff --git a/torch/ao/quantization/backend_config/README.md b/torch/ao/quantization/backend_config/README.md index 5e63af1af355..6c5186dc2d3f 100644 --- a/torch/ao/quantization/backend_config/README.md +++ b/torch/ao/quantization/backend_config/README.md @@ -1,6 +1,6 @@ ## BackendConfig Overview -BackendConfig allows PyTorch quantization to work with different backend or kernel libraries. These backends may have different sets of supported quantized operator patterns, and the same operator patterns may require different handling across different backends. To make quantization work with different backends and allow maximum flexibility, we strived to make all the parts of the quantization flow configurable with BackendConfig. Currently, it is only used by FX graph mode quantization. For more details on how it integrates with the FX graph mode quantization flow, refer to this [README](/torch/ao/quantization/fx/README.md). +BackendConfig allows PyTorch quantization to work with different backend or kernel libraries. These backends may have different sets of supported quantized operator patterns, and the same operator patterns may require different handling across different backends. To make quantization work with different backends and allow maximum flexibility, we strived to make all the parts of the quantization flow configurable with BackendConfig. Currently, it is only used by FX graph mode quantization. For more details on how it integrates with the FX graph mode quantization flow, refer to this [README](../fx/README.md). BackendConfig configures quantization behavior in terms of operator patterns. For each operator pattern, we need to specify what the supported data types are for the input and output activations, weights, and biases, and also specify the QAT modules, the reference quantized modules etc., which will be used in module swapping during the quantization passes. diff --git a/torch/ao/quantization/fx/README.md b/torch/ao/quantization/fx/README.md index ca116b282e7a..c41fd51ff6f3 100644 --- a/torch/ao/quantization/fx/README.md +++ b/torch/ao/quantization/fx/README.md @@ -446,4 +446,4 @@ However, for some operator based backends, like the current pytorch native backe ## Extensibility -FX graph mode quantization can be extended to work with different backends, which may have different sets of supported quantized operator patterns and different requirements for each pattern. For more detail, please refer to the [BackendConfig README](/torch/ao/quantization/backend_config/README.md). 
+FX graph mode quantization can be extended to work with different backends, which may have different sets of supported quantized operator patterns and different requirements for each pattern. For more detail, please refer to the [BackendConfig README](../backend_config/README.md). diff --git a/torch/csrc/distributed/c10d/control_plane/WorkerServer.cpp b/torch/csrc/distributed/c10d/control_plane/WorkerServer.cpp index 5d406656b094..a5ff2b22af2b 100644 --- a/torch/csrc/distributed/c10d/control_plane/WorkerServer.cpp +++ b/torch/csrc/distributed/c10d/control_plane/WorkerServer.cpp @@ -96,9 +96,8 @@ WorkerServer::WorkerServer(const std::string& hostOrFile, int port) { "/", [](const httplib::Request& req [[maybe_unused]], httplib::Response& res) { res.set_content( - R"BODY(
<h1>torch.distributed.WorkerServer</h1>
-Handler names
-)BODY",
+          "<h1>torch.distributed.WorkerServer</h1>
\n" + "Handler names\n", "text/html"); }); server_.Get( diff --git a/torch/csrc/jit/OVERVIEW.md b/torch/csrc/jit/OVERVIEW.md index b15fe34d4397..29079448abfa 100644 --- a/torch/csrc/jit/OVERVIEW.md +++ b/torch/csrc/jit/OVERVIEW.md @@ -367,7 +367,7 @@ Values are abstract representations of data in the program. When executing, the ## Type ## -[aten/src/ATen/core/jit_type.h](/aten/src/ATen/core/jit_type.h) +[aten/src/ATen/core/jit_type.h](../../../aten/src/ATen/core/jit_type.h) TorchScript, unlike Python, is statically typed, so every `Value` has a Type associated with it, and every FunctionSchema has a list of argument types and a return type for a function. Type is the base class of a hierarchy of C++ objects that represent the built-in types of TorchScript. Types provide methods such as `Type::isSubtypeOf` that describe the typing relationships. Common type are: @@ -389,7 +389,6 @@ JIT programs are created using either the tracing frontend (`torch.jit.trace`) o [tracer.h](frontend/tracer.h) -[tracer_state.h](frontend/tracer_state.h) The tracer produces graphs by recording what actual operations are done on `Tensors`. The entry point from Python into C++ for tracing using `torch.jit.trace` is `_create_method_from_trace`. @@ -398,7 +397,7 @@ A thread local instance of the TracingState object maintains a mapping between a An initial `IValue` to `Value` mapping is set up between the inputs to the function being traced and symbolic `Value` inputs to the `Graph` being constructed. If we are tracing a `torch.nn.Module`, the tracer also adds Parameters and sub-Modules to the Module being constructed that correspond to the Python `torch.nn.Module` being traced. Mappings for these values are also added so that uses of the Parameters in the trace will create uses of the Parameters in the `Graph`. -As the trace runs, individual operators create `Nodes` in the `Graph` being traced to record what happens. This code is currently generated per operator in [tools/autograd/gen_variable_type.py](/tools/autograd/gen_variable_type.py). It results in code that looks like the following: +As the trace runs, individual operators create `Nodes` in the `Graph` being traced to record what happens. This code is currently generated per operator in [tools/autograd/gen_variable_type.py](../../../tools/autograd/gen_variable_type.py). It results in code that looks like the following: ```cpp torch::jit::Node* node = nullptr; @@ -434,7 +433,7 @@ The resulting `Graph` created by tracing is installed as the 'forward' method of ## Script ## -The script frontend directly converts Python syntax into Modules. Like many compilers this happens in two phases. First, we generate an abstract syntax tree (AST), which is constructed out of Tree objects. The IR emitter then does semantic analysis on the Tree and lowers it into a Module. We can generate Trees in two ways: (1) using frontend.py, which takes the Python AST and transliterates it into Tree objects, or (2) via the Lexer and Parser which parse Python syntax directly. The Lexer+Parser path may seem redundant but it is crucially important. We need to define builtin functions ([frontend/builtin_functions.cpp](frontend/builtin_functions.cpp)) when Python is not linked because we allow users to generate TorchScript programs directly from strings containing Python source code ([api/include/torch/jit.h](/torch/csrc/api/include/torch/jit.h)) without linking a full Python implementation (e.g. CPython). 
We also use this Python syntax as the serialization format for TorchScript, since it allows us to make changes to our IR without breaking backward compatibility. Furthermore, the Lexer is reused to implement the FunctionSchema parser, which turns FunctionSchema declarations from strings into FunctionSchema objects. +The script frontend directly converts Python syntax into Modules. Like many compilers this happens in two phases. First, we generate an abstract syntax tree (AST), which is constructed out of Tree objects. The IR emitter then does semantic analysis on the Tree and lowers it into a Module. We can generate Trees in two ways: (1) using frontend.py, which takes the Python AST and transliterates it into Tree objects, or (2) via the Lexer and Parser which parse Python syntax directly. The Lexer+Parser path may seem redundant but it is crucially important. We need to define builtin functions ([frontend/builtin_functions.cpp](frontend/builtin_functions.cpp)) when Python is not linked because we allow users to generate TorchScript programs directly from strings containing Python source code ([api/include/torch/jit.h](../api/include/torch/jit.h)) without linking a full Python implementation (e.g. CPython). We also use this Python syntax as the serialization format for TorchScript, since it allows us to make changes to our IR without breaking backward compatibility. Furthermore, the Lexer is reused to implement the FunctionSchema parser, which turns FunctionSchema declarations from strings into FunctionSchema objects. The following sections look into each the stages in the script frontend in detail. @@ -761,7 +760,7 @@ Optimization passes that wish to exploit multi-threaded execution may automatica ## IValue ## -[ivalue.h](/aten/src/ATen/core/ivalue.h) +[ivalue.h](../../../aten/src/ATen/core/ivalue.h) All evaluation involves computation using `IValues`, 16-byte tagged unions that can hold the concrete representation of any type in TorchScript. TorchScript is statically typed, so it would be possible to operate on unboxed primitive types, but the interface between interpreter, built-in ops and user functions would be significantly more complicated. A single tagged union keeps these interfaces simple and since most objects are `Tensors` anyway, the overhead of storing a tag is small compared to the data stored in the `Tensors`. @@ -1407,7 +1406,7 @@ def foo(a : Tensor, b : Tensor): ``` Will produce a graph like this: -![AliasTracker graph](/docs/source/_static/img/aliastracker_graph.png) +![AliasTracker graph](../../../docs/source/_static/img/aliastracker_graph.png) A few things to note: - "Graph Input Element" is an example of an `Element` that isn't a first-class `Value`. Alias analysis happens on a per-function level, so we don't necessarily know the aliasing relationships of the inputs. The only safe assumption is that `a` and `b` may alias each other, so they point to a special `Element` that describes "the world outside of this function". @@ -1459,8 +1458,8 @@ When differentiating a graph, each node that has a symbolic gradient will be inc Adding/updating symbolic gradient functions must be tested carefully as it's easy to get CI green by comparing autograd result with itself, but potentially cause an autodiff support regression. 
If your PR adds/updates a gradient formula for `torch`/`nn` functions, you **MUST** enable/update the corresponding tests in -- `torch` functions: `method_tests` in [common_method_tests.py](../../../test/common_method_tests.py) -- `nn` functions: `nn_functional_tests` in [test_jit.py](../../../test/test_jit.py) +- `torch` functions: `module_tests` in [common_nn.py](../../testing/_internal/common_nn.py) +- `nn` functions: `nn_functional_tests` in [test_jit.py](../../testing/_internal/jit_metaprogramming_utils.py) To turn on autodiff check, you can add an optional `check_ad(should_autodiff_node[bool], nonfusible_nodes[str|list[str]], fusible_nodes[str|list[str]])` tuple after the optional test variant name field. If `should_autodiff_node=True`, the differentiated traced/script forward graph must have a `prim::DifferentiableGraph`. diff --git a/torch/csrc/jit/README.md b/torch/csrc/jit/README.md index 2b80d51a182a..83a2393a7862 100644 --- a/torch/csrc/jit/README.md +++ b/torch/csrc/jit/README.md @@ -26,5 +26,5 @@ A brief summary of the source tree: **Refer** to each folder for more in-depth documentation. Other relevant parts of the codebase not contained here: -- [aten/src/ATen/core](/aten/src/ATen/core): contains JIT code re-used by other elements of the +- [aten/src/ATen/core](../../../aten/src/ATen/core): contains JIT code re-used by other elements of the runtime system (eager, mobile, etc.) diff --git a/torch/csrc/profiler/README.md b/torch/csrc/profiler/README.md index 36c743132b50..339c84c0a08e 100644 --- a/torch/csrc/profiler/README.md +++ b/torch/csrc/profiler/README.md @@ -23,7 +23,7 @@ TODO ## `RecordFunction` ## -[/aten/src/ATen/record_function.h](/aten/src/ATen/record_function.h) +[aten/src/ATen/record_function.h](../../../aten/src/ATen/record_function.h) `RecordFunction` is used by the profiler to instrument CPU-side events. @@ -38,7 +38,7 @@ There is also a python binding for `RecordFunction` in python (`with torch.profi The autograd engine is responsible for automatically computing gradients. The profiler records two pieces of information from the autograd engine: -* [Sequence number](/aten/src/ATen/SequenceNumber.h): this is a unique-per-thread index assigned to each op call(\*) in the forward pass. When a backward op is triggered, it is also assigned a sequence number matching the sequence number of the forward op that caused that backward op to be executed. Using this information, the profiler is able to match forward and backward ops; in chrome traces, this feature can be enabled with the "fwd_bwd" flow events +* [Sequence number](../../../aten/src/ATen/SequenceNumber.h): this is a unique-per-thread index assigned to each op call(\*) in the forward pass. When a backward op is triggered, it is also assigned a sequence number matching the sequence number of the forward op that caused that backward op to be executed. Using this information, the profiler is able to match forward and backward ops; in chrome traces, this feature can be enabled with the "fwd_bwd" flow events * [Forward thread id](https://github.com/pytorch/pytorch/blob/2e3fce54506ba82eee2c890410bf7a1405a64ec6/aten/src/ATen/record_function.h#L357): Autograd can be used in multi-threaded environments. The forward thread ID indicates the ID of the thread on which the forward op was executed on. This information is needed because the sequence number, mentioned above, is only unique within a thread; the forward thread ID is used for differentiating different ops with the same sequence number. 
(\*) Note that only op invocations whose inputs require gradients are assigned a sequence number diff --git a/torch/fx/passes/README.md b/torch/fx/passes/README.md index 1fd169bf54a3..e6f3c7a8b0e2 100644 --- a/torch/fx/passes/README.md +++ b/torch/fx/passes/README.md @@ -12,9 +12,5 @@ This folder contains the pass infrastructure and passes for transforming fx.Grap * [dialect](dialect) - dialect specific passes * [common](dialect/common) - common passes that can be shared by all dialects * [cse_pass.py](dialect/common/cse_pass.py) - a CSE pass - * [aten](dialect/aten) - aten dialect specific passes - * [prims](dialect/prims) - prim dialect specific passes * [backends](backends) - Backend specific passes - * [nvfuser](backends/nvfuser) - passes for nvfuser - * [operator_support.py](backends/nvfuser/operator_support.py) - nvFuser supported ops * [conversion](conversion) - Conversion passes between dialects
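Editor's note (not part of the patch): the hunks above replace repo-root-anchored links such as `/torch/nested/__init__.py` with relative ones such as `../../../../../torch/nested/__init__.py`, because a Markdown link is resolved against the directory of the file that contains it, and only some renderers treat a leading `/` as the repository root. The sketch below is a minimal, hypothetical illustration of that resolution rule; the `resolve` helper is not part of PyTorch, and the example paths are taken from the hunks in this patch.

```py
# Hypothetical sketch: how a relative Markdown link resolves against the
# README that contains it. This mirrors what GitHub's renderer does and is
# why the patch rewrites root-anchored links to relative ones.
import posixpath


def resolve(readme_path: str, link: str) -> str:
    """Resolve a relative link against the directory of the containing README."""
    base_dir = posixpath.dirname(readme_path)
    return posixpath.normpath(posixpath.join(base_dir, link))


# Examples taken from hunks in this patch:
assert resolve("aten/src/ATen/cudnn/README.md",
               "../native/cudnn/BatchNorm.cpp") == "aten/src/ATen/native/cudnn/BatchNorm.cpp"
assert resolve("aten/src/ATen/native/nested/README.md",
               "../../../../../torch/nested/__init__.py") == "torch/nested/__init__.py"
assert resolve("torch/ao/quantization/backend_config/README.md",
               "../fx/README.md") == "torch/ao/quantization/fx/README.md"
```

A root-anchored link like `/test/test_nestedtensor.py` happens to work on GitHub's web UI, but relative links of the form above resolve correctly regardless of where the Markdown is rendered, which appears to be the motivation for this change.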