[BE][EZ] Minor doc fixes (#158574)

[BE] Minor doc fixes
Zain Rizvi
2025-07-18 10:34:55 -05:00
committed by GitHub
parent 036eb1f65d
commit 193b29ee0c
9 changed files with 16 additions and 16 deletions

View File

@@ -520,7 +520,7 @@ on [our website](https://pytorch.org/get-started/previous-versions).
## Getting Started
-Three-pointers to get you started:
+Three pointers to get you started:
- [Tutorials: get you started with understanding and using PyTorch](https://pytorch.org/tutorials/)
- [Examples: easy to understand PyTorch code across all domains](https://github.com/pytorch/examples)
- [The API Reference](https://pytorch.org/docs/)

View File

@@ -7,7 +7,7 @@
| [**Future Plans**](#future-plans)
**This library is currently under heavy development - if you have suggestions
-on the API or use-cases you'd like to be covered, please open an github issue
+on the API or use-cases you'd like to be covered, please open a GitHub issue
or reach out. We'd love to hear about how you're using the library.**
`functorch` is [JAX-like](https://github.com/google/jax) composable function
@@ -161,7 +161,7 @@ result = vmap(model)(examples)
### grad
-`grad(func)(*inputs)` assumes `func` returns a single-element Tensor. It compute
+`grad(func)(*inputs)` assumes `func` returns a single-element Tensor. It computes
the gradients of the output of func w.r.t. to `inputs[0]`.
```py
@@ -192,7 +192,7 @@ def compute_loss(weights, example, target):
weights = torch.randn(feature_size, requires_grad=True)
examples = torch.randn(batch_size, feature_size)
targets = torch.randn(batch_size)
-inputs = (weights,examples, targets)
+inputs = (weights, examples, targets)
grad_weight_per_example = vmap(grad(compute_loss), in_dims=(None, 0, 0))(*inputs)
```
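To make the `grad` and `vmap(grad(...))` hunks above concrete, here is a minimal, self-contained sketch of per-sample gradients. The `compute_loss` body is hypothetical (the excerpt elides it), and the imports use `torch.func`, which exposes the same `grad`/`vmap` transforms the functorch docs describe.

```py
import torch
from torch.func import grad, vmap  # functorch.grad / functorch.vmap in the original docs

batch_size, feature_size = 3, 5

def compute_loss(weights, example, target):
    # hypothetical loss: squared error of a linear model on a single example
    prediction = example @ weights
    return (prediction - target) ** 2

weights = torch.randn(feature_size)
examples = torch.randn(batch_size, feature_size)
targets = torch.randn(batch_size)

# grad differentiates w.r.t. inputs[0] (weights); vmap maps over the batch
# dimension of examples/targets while broadcasting weights (in_dims=None).
per_sample_grads = vmap(grad(compute_loss), in_dims=(None, 0, 0))(weights, examples, targets)
print(per_sample_grads.shape)  # torch.Size([3, 5])
```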

View File

@@ -5,7 +5,7 @@ First off, what are batching rules and why do we need so many of them? Well, to
### How does vmap work?
Vmap is a function transform (pioneered by Jax) that allows one to batch functions. That is, given a function `f(x: [N]) -> [N]`, `vmap(f)` now transforms the signature to be `f(x: [B, N]) -> [B, N]`. That is - it adds a batch dimension to both the input and the output of the function.
-This guide will gloss over all the cool things you can do this (there are many!), so let's focus on how we actually implement this.
+This guide will gloss over all the cool things you can do with this (there are many!), so let's focus on how we actually implement this.
One misconception is that this is some magic compiler voodoo, or that it is inherently some function transform. It is not - and there's another framing of it that might make it more clear.
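As a quick illustration of the signature change described above, a minimal sketch using `torch.func.vmap` (the function `f` and shapes are arbitrary; the functorch import behaves the same way):

```py
import torch
from torch.func import vmap

def f(x):                  # written for a single vector of shape [N]
    return x * 2.0 + 1.0

xs = torch.randn(8, 3)     # a batch: shape [B, N] with B=8, N=3
ys = vmap(f)(xs)           # f is applied independently along the batch dimension
assert ys.shape == (8, 3)
```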

View File

@@ -130,7 +130,7 @@ This part is a little tedious but it seems to work. May want to explore using pa
5. Install the google doc extension [docs to markdown](https://github.com/evbacher/gd2md-html)
6. Start to compile back down these markdown files into a single markdown file.
-`TODO`: This is by far the most manual process and is ripe for automation. If the next person up would like to investigate Google Doc APIS there is some room hor improvement here.
+`TODO`: This is by far the most manual process and is ripe for automation. If the next person up would like to investigate Google Doc APIS there is some room for improvement here.
### Part 4: Cherry Picks
@@ -187,7 +187,7 @@ You will then create a release at [Pytorch Release](https://github.com/pytorch/p
#### Tidbits
You will probably have a release note that doesn't fit into the character limit of github. I used the following regex:
-`\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)` to replace the full lunks to (#<pull-request-number>).
+`\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)` to replace the full links to (#<pull-request-number>).
This will get formatted correctly in the github UI and can be checked when creating a draft release.
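For reference, a minimal sketch of applying that substitution with Python's `re` module (the sample release-note string and PR number below are made up):

```py
import re

note = "Fix flaky CI job [#12345](https://github.com/pytorch/pytorch/pull/12345)"
short = re.sub(
    r"\[#(\d+)\]\(https://github.com/pytorch/pytorch/pull/\d+\)",
    r"(#\1)",
    note,
)
print(short)  # Fix flaky CI job (#12345)
```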

View File

@@ -14,7 +14,7 @@ So far the recommendation was to standardize on fused conditionals.
## Expression Conditionals vs Statement Conditionals
-Tensor IR contains both expression conditionals (`CompareSelect` and `IfThenElse`), as well as statement conditionals (`Cond`). Expression conditionals are defined by being functional in nature: there is no side effect from duplicating the conditional, evaluating it twice, etc. They are an important ingredient in expression important operators like ReLU:
+Tensor IR contains both expression conditionals (`CompareSelect` and `IfThenElse`), as well as statement conditionals (`Cond`). Expression conditionals are defined by being functional in nature: there is no side effect from duplicating the conditional, evaluating it twice, etc. They are an important ingredient in expressing important operators like ReLU:
```
store (((load A) >= 0.0) ? (load A) : 0.0), B

View File

@@ -2,7 +2,7 @@
Please go through PyTorch's top level [Contributing Guide](../../CONTRIBUTING.md) before proceeding with this guide.
-[PyTorch Distributed Overview](https://pytorch.org/tutorials//beginner/dist_overview.html) is a great starting point with a lot of tutorials, documentation and design docs covering PyTorch Distributed. We would highly recommend going through some of that material before you start working on PyTorch Distributed.
+[PyTorch Distributed Overview](https://pytorch.org/tutorials//beginner/dist_overview.html) is a great starting point with a lot of tutorials, documentation and design docs covering PyTorch Distributed. We highly recommend going through some of that material before you start working on PyTorch Distributed.
In this document, we mostly focus on some of the code structure for PyTorch distributed and implementation details.

View File

@@ -70,7 +70,7 @@ Here, we set up a simple Module that exercises different language features: fetc
The `fx.Graph` is a core data structure in FX that represents the operations and their dependencies in a structured format. It consists of a List of `fx.Node` representing individual operations and their inputs and outputs. The Graph enables simple manipulation and analysis of the model structure, which is essential for implementing various transformations and optimizations.
## Node
-An `fx.Node` is a datastructure that represent individual operations within an `fx.Graph`, it maps to callsites such as operators, methods and modules. Each `fx.Node` keeps track of its inputs, the previous and next nodes, the stacktrace so you can map back the node to a line of code in your python file and some optional metadata stored in a `meta` dict.
+An `fx.Node` is a data structure that represents individual operations within an `fx.Graph`, it maps to callsites such as operators, methods and modules. Each `fx.Node` keeps track of its inputs, the previous and next nodes, the stacktrace so you can map back the node to a line of code in your python file and some optional metadata stored in a `meta` dict.
## [GraphModule](https://pytorch.org/docs/main/fx.html#torch.fx.GraphModule) ##
The `fx.GraphModule` is a subclass of `nn.Module` that holds the transformed Graph, the original module's parameter attributes and its source code. It serves as the primary output of FX transformations and can be used like any other `nn.Module`. `fx.GraphModule` allows for the execution of the transformed model, as it generates a valid forward method based on the Graph's structure.
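To ground the `Graph`/`GraphModule` description, a small sketch of tracing a module and inspecting the result (the module and shapes are arbitrary):

```py
import torch
from torch import fx, nn

net = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
gm = fx.symbolic_trace(net)   # an fx.GraphModule holding an fx.Graph

print(gm.graph)               # the Graph: placeholder, call_module, output nodes
print(gm.code)                # the forward() regenerated from the Graph
out = gm(torch.randn(2, 4))   # executes like the original nn.Module
```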
@@ -115,11 +115,11 @@ Tracing captures an intermediate representation (IR), which is represented as a
Node is the data structure that represents individual operations within a Graph. For the most part, Nodes represent callsites to various entities, such as operators, methods, and Modules (some exceptions include Nodes that specify function inputs and outputs). Each Node has a function specified by its `op` property. The Node semantics for each value of `op` are as follows:
-- `placeholder` represents a function input. The `name` attribute specifies the name this value will take on. `target` is similarly the name of the argument. `args` holds either: 1) nothing, or 2) a single argument denoting the default parameter of the function input. `kwargs` is don't-care. Placeholders correspond to the function parameters (e.g. `x`) in the graph printout.
-- `get_attr` retrieves a parameter from the module hierarchy. `name` is similarly the name the result of the fetch is assigned to. `target` is the fully-qualified name of the parameter's position in the module hierarchy. `args` and `kwargs` are don't-care
+- `placeholder` represents a function input. The `name` attribute specifies the name this value will take on. `target` is similarly the name of the argument. `args` holds either: 1) nothing, or 2) a single argument denoting the default parameter of the function input. `kwargs` is ignored. Placeholders correspond to the function parameters (e.g. `x`) in the graph printout.
+- `get_attr` retrieves a parameter from the module hierarchy. `name` is similarly the name the result of the fetch is assigned to. `target` is the fully-qualified name of the parameter's position in the module hierarchy. `args` and `kwargs` are ignored
- `call_function` applies a free function to some values. `name` is similarly the name of the value to assign to. `target` is the function to be applied. `args` and `kwargs` represent the arguments to the function, following the Python calling convention
- `call_module` applies a module in the module hierarchy's `forward()` method to given arguments. `name` is as previous. `target` is the fully-qualified name of the module in the module hierarchy to call. `args` and `kwargs` represent the arguments to invoke the module on, *including the self argument*.
-- `call_method` calls a method on a value. `name` is as similar. `target` is the string name of the method to apply to the `self` argument. `args` and `kwargs` represent the arguments to invoke the module on, *including the self argument*
+- `call_method` calls a method on a value. `name` is similar. `target` is the string name of the method to apply to the `self` argument. `args` and `kwargs` represent the arguments to invoke the module on, *including the self argument*
- `output` contains the output of the traced function in its `args[0]` attribute. This corresponds to the "return" statement in the Graph printout.
To facilitate easier analysis of data dependencies, Nodes have read-only properties `input_nodes` and `users`, which specify which Nodes in the Graph are used by this Node and which Nodes use this Node, respectively. Although Nodes are represented as a doubly-linked list, the use-def relationships form an acyclic graph and can be traversed as such.
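A short sketch that exercises most of the `op` values listed above (the module is arbitrary and the printed output is only approximate):

```py
import torch
from torch import fx, nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.scale = nn.Parameter(torch.ones(4))

    def forward(self, x):
        return (self.linear(x) * self.scale).clamp(min=0.0)

gm = fx.symbolic_trace(M())
for node in gm.graph.nodes:
    print(node.op, node.name, node.target)

# Prints roughly:
#   placeholder    x       x
#   call_module    linear  linear
#   get_attr       scale   scale
#   call_function  mul     <built-in function mul>
#   call_method    clamp   clamp
#   output         output  output
```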

View File

@@ -23,7 +23,7 @@ symbolic_opset9.py.
To extend support for updated operators in different opset versions on top of opset 9,
simply add the updated symbolic functions in the respective symbolic_opset{version}.py file.
-Checkout topk in symbolic_opset10.py, and upsample_nearest2d in symbolic_opset8.py for example.
+Check out topk in symbolic_opset10.py, and upsample_nearest2d in symbolic_opset8.py for example.
## Editing Symbolic Files

View File

@@ -25,7 +25,7 @@ into two broad categories:
* `Timer` implements the `blocked_autorange` function which is a
mixture of `timeit.Timer.repeat` and `timeit.Timer.autorange`. This function
-selects and appropriate number and runs for a roughly fixed amount of time
+selects an appropriate number and runs for a roughly fixed amount of time
(like `autorange`), but is less wasteful than `autorange` which discards
~75% of measurements. It runs many times, similar to `repeat`, and returns
a `Measurement` containing all of the run results.
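A minimal sketch of how `blocked_autorange` is typically invoked; the workload, `min_run_time`, and the `label`/`sub_label`/`description` metadata (discussed in the next hunk) are arbitrary:

```py
import torch
from torch.utils.benchmark import Timer

x = torch.randn(1024, 1024)
y = torch.randn(1024, 1024)

timer = Timer(
    stmt="torch.add(x, y)",
    globals={"x": x, "y": y},
    label="add",                 # rows sharing a label are grouped into one table
    sub_label="contiguous",      # distinguishes logically equivalent variants
    description="1024x1024",     # describes the inputs
)
m = timer.blocked_autorange(min_run_time=0.5)  # a Measurement keeping every run
print(m)          # summary statistics over all measured blocks
print(m.median)   # seconds per run
```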
@@ -46,7 +46,7 @@ table will be generated per unique label.
may be logically equivalent differ in implementation. Assigning separate
sub_labels will result in a row per sub_label. If a sublabel is not provided,
`stmt` is used instead. Statistics (such as computing the fastest
-implementation) are use all sub_labels.
+implementation) use all sub_labels.
* `description`: This describes the inputs. For instance, `stmt=torch.add(x, y)`
can be run over several values of `x` and `y`. Each pair should be given its