Forbid trailing whitespace (#53406)

Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
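For reference, `gsed` here is GNU sed (on macOS it is usually installed via Homebrew); on Linux, where the stock `sed` is already GNU sed, a rough equivalent (untested for this diff, shown only as a sketch) would be:
```
# Sketch: Linux equivalent of the command above; assumes the system sed is GNU sed.
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs sed -i 's/ *$//'
```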

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Author: Sam Estep (2021-03-05 17:19:22 -08:00)
Committer: Facebook GitHub Bot
Parent: cab2689eb1
Commit: 8c798e0622
238 changed files with 799 additions and 798 deletions

View File

@ -24,6 +24,6 @@ rm cert.txt
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
fi
PROFILE=PyTorch_CI_2021
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}

View File

@ -40,6 +40,9 @@ jobs:
rm -r "shellcheck-${scversion}"
shellcheck --version
.jenkins/run-shellcheck.sh
- name: Ensure no trailing spaces
run: |
(! git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above files have trailing spaces; please remove them"; false))
- name: Ensure no tabs
run: |
(! git grep -I -l $'\t' -- . ':(exclude)*.svg' ':(exclude)**Makefile' ':(exclude)**/contrib/**' ':(exclude)third_party' ':(exclude).gitattributes' ':(exclude).gitmodules' || (echo "The above files have tabs; please convert them to spaces"; false))
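The trailing-space check added above can also be run locally before pushing. A minimal sketch of the equivalent shell (the same `git grep` invocation as the CI step, with the inverted exit status written out explicitly; meant to be run as a script, since it calls `exit`):
```
# Sketch: local version of the "Ensure no trailing spaces" step above.
# Lists offending files and exits non-zero if any trailing spaces are found.
if git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party'; then
  echo "The above files have trailing spaces; please remove them"
  exit 1
fi
```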

View File

@ -21,7 +21,7 @@ if (( $num_gpus == 0 )); then
fi
if (( $num_gpus >= 1 )); then
"$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1
# Let's skip the fp16 bench runs for now, as it recompiles the miopen kernels and can take 10+min to run.
# Let's skip the fp16 bench runs for now, as it recompiles the miopen kernels and can take 10+min to run.
# We can resume when we (1) bindmount the miopen cache folder in jenkins; (2) install the pre-compiled miopen kernel library in the docker
# "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16
fi

View File

@ -159,7 +159,7 @@ with `brew install cmake` if you are developing on MacOS or Linux system.
check whether your Git local or global config file contains any `submodule.*` settings. If yes, remove them and try again.
(please reference [this doc](https://git-scm.com/docs/git-config#Documentation/git-config.txt-submoduleltnamegturl) for more info).
- If you encountered error such as
- If you encountered error such as
```
fatal: unable to access 'https://github.com/pybind11/pybind11.git': could not load PEM client certificate ...
```
@ -169,11 +169,11 @@ with `brew install cmake` if you are developing on MacOS or Linux system.
openssl x509 -noout -in <cert_file> -dates
```
- If you encountered error that some third_party modules are not checkout correctly, such as
- If you encountered error that some third_party modules are not checkout correctly, such as
```
Could not find .../pytorch/third_party/pybind11/CMakeLists.txt
```
remove any `submodule.*` settings in your local git config (`.git/config` of your pytorch repo) and try again.
remove any `submodule.*` settings in your local git config (`.git/config` of your pytorch repo) and try again.
## Nightly Checkout & Pull

View File

@ -1,4 +1,4 @@
# PyTorch Glossary
# PyTorch Glossary
- [PyTorch Glossary](#pytorch-glossary)
- [Operation and Kernel](#operation-and-kernel)
@ -39,7 +39,7 @@ For example, this
to create Custom Operations.
## Kernel
Implementation of a PyTorch operation, specifying what should be done when an
Implementation of a PyTorch operation, specifying what should be done when an
operation executes.
## Compound Operation
@ -57,7 +57,7 @@ Same as Compound Operation.
## Leaf Operation
An operation that's considered a basic operation, as opposed to a Compound
Operation. Leaf Operation always has dispatch functions defined, usually has a
derivative function defined as well.
derivative function defined as well.
## Device Kernel
Device-specific kernel of a leaf operation.
@ -79,4 +79,4 @@ using just-in-time compilation.
## Scripting
Using `torch.jit.script` on a function to inspect source code and compile it as
TorchScript code.
TorchScript code.

View File

@ -300,7 +300,7 @@ Tensor trace_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes)
auto grad_input = at::zeros(grad_physical.getPhysicalShape(input_sizes), grad.options());
// Batched Diagonal View
auto grad_input_diag = at::diagonal(grad_input, /*offset*/0, /*dim1*/-2, /*dim2*/-1);
// Append a dimension of size one to the grad output
// Append a dimension of size one to the grad output
auto grad_physical_tensor = grad_physical.tensor().unsqueeze(-1);
grad_input_diag.copy_(grad_physical_tensor);
return grad_physical.getPhysicalToLogicalMap().apply(grad_input);

View File

@ -38,7 +38,7 @@ struct CPUGeneratorImplStateLegacy {
* new data introduced in at::CPUGeneratorImpl and the legacy state. It is used
* as a helper for torch.get_rng_state() and torch.set_rng_state()
* functions.
*/
*/
struct CPUGeneratorImplState {
CPUGeneratorImplStateLegacy legacy_pod;
float next_float_normal_sample;
@ -119,7 +119,7 @@ uint64_t CPUGeneratorImpl::seed() {
* must be a strided CPU byte tensor and of the same size as either
* CPUGeneratorImplStateLegacy (for legacy CPU generator state) or
* CPUGeneratorImplState (for new state).
*
*
* FIXME: Remove support of the legacy state in the future?
*/
void CPUGeneratorImpl::set_state(const c10::TensorImpl& new_state) {

View File

@ -94,7 +94,7 @@ TORCH_API Tensor flatten_indices(const Tensor& indices, IntArrayRef full_size, b
// new_indices = [ 3, 1, 3 ] # uncoalesced
TORCH_API Tensor flatten_indices_by_dims(const Tensor& indices, const IntArrayRef& sizes, const IntArrayRef& dims_to_flatten);
// Find the CSR representation for a row `indices` from the COO format
// Find the CSR representation for a row `indices` from the COO format
TORCH_API Tensor coo_to_csr(const int64_t* indices, int64_t dim, int64_t nnz);
}} // namespace at::sparse

View File

@ -114,7 +114,7 @@ std::string used_cpu_capability() {
case native::CPUCapability::AVX2:
ss << "AVX2";
break;
#endif
#endif
default:
break;
}

View File

@ -47,7 +47,7 @@ using VmapDimVector = SmallVector<int64_t, kVmapStaticDimVecSize>;
// argument.
// VmapTransform for operators that take tensors with multiple batch dims.
// Given one or more logical views on Tensors, `logicalToPhysical`
// Given one or more logical views on Tensors, `logicalToPhysical`
// permutes all of the batch dims to the front of the tensor, aligns
// and expands the batch dims to match each other (according to their `level`),
// and returns a VmapPhysicalView on the tensor(s).

View File

@ -143,7 +143,7 @@ namespace detail {
/**
* Helper function for checking the validity of new random generator
* state. Right now following conditions are checked:
*
*
* - The new state tensor must be a torch.ByteTensor
* - Data of the new state tensor must be contiguous
*/

View File

@ -40,13 +40,13 @@ typedef at::detail::Array<float, 2> FLOAT2;
* Note that currently this implementation of the philox engine is not used
* anywhere except for tests in cpu_generator_test.cpp. However, this engine
* will replace curandStatePhilox4_32_10_t in the future.
*
*
* The philox engine takes a seed value, a subsequeunce
* for starting the generation and an offset for the subsequence.
* Think of this engine as an algorithm producing a huge array. We are
* parallelizing this array by partitioning the huge array and assigning
* a thread index to each partition. In other words, each seed value
* (there are 2^64 possible seed values) gives a sub array of size
* Think of this engine as an algorithm producing a huge array. We are
* parallelizing this array by partitioning the huge array and assigning
* a thread index to each partition. In other words, each seed value
* (there are 2^64 possible seed values) gives a sub array of size
* 2^128 (each element in that array is a 128 bit number). Reasoning
* behind the array being of size 2^128 is, there are 2^64 possible
* thread index value and there is an array of size 2^64 for each of
@ -59,9 +59,9 @@ typedef at::detail::Array<float, 2> FLOAT2;
* seed: Seed values could be any number from 0 to 2^64-1.
* subsequence: Subsequence is just the cuda thread indexing with:
* - blockIdx.x * blockDim.x + threadIdx.x
* offset: The offset variable in PhiloxEngine decides how many 128-bit
* offset: The offset variable in PhiloxEngine decides how many 128-bit
* random numbers to skip (i.e. how many groups of 4, 32-bit numbers to skip)
* and hence really decides the total number of randoms that can be achieved
* and hence really decides the total number of randoms that can be achieved
* for the given subsequence.
*/

View File

@ -254,5 +254,3 @@ Also, there's some requirements on the operator schema for it to be callable fro
* Except for `Tensor` or `Tensor[]`, only arguments of type `int`, `double` and `bool` are supported. These can be in any position in the argument list and will be read from the caffe2 operator arguments, based on the argument name in the operator schema.
* We do not support lists (`int[]`, `double[]` or `bool[]`) or optionals (`int?`, `double?`, `bool?`) yet.
* The operator must return a single `Tensor` or multiple tensors as in `(Tensor, Tensor, Tensor)`. It cannot return a list `Tensor[]`, optional `Tensor?` or any primitive types.

View File

@ -1124,12 +1124,12 @@ std::string ClassType::getForwardPreHookErrorMessage(int pre_hook_idx) const {
const FunctionSchema& forward_schema = getMethod("forward").getSchema();
std::string input_types = getSchemaInputTypesString(forward_schema);
const std::vector<Argument>& forward_args = forward_schema.arguments();
std::string single_output = "";
if (forward_args.size() == 2 &&
forward_args[1].type()->cast<TupleType>() == nullptr) {
// if the output type is a single tuple, it needs to be wrapped in an outer tuple
// to match eager's behavior
// to match eager's behavior
single_output = ", '" + forward_args[1].type()->annotation_str() + "',";
}
std::string pre_hook_schema =
@ -1138,9 +1138,9 @@ std::string ClassType::getForwardPreHookErrorMessage(int pre_hook_idx) const {
"This error occured while scripting the forward pre-hook '" +
pre_hook_name + "' on module '" + name()->name() +
"'. If you did not want to script this pre-hook remove it from the "
"original NN module before scripting. Pre-hooks for module '" +
name()->name() + "' are expected to have the following signature: "
+ pre_hook_schema + " with a return type of either 'None'" +
"original NN module before scripting. Pre-hooks for module '" +
name()->name() + "' are expected to have the following signature: "
+ pre_hook_schema + " with a return type of either 'None'" +
single_output + " or 'Tuple[" + input_types + "]'.";
return return_string;
}
@ -1148,7 +1148,7 @@ std::string ClassType::getForwardPreHookErrorMessage(int pre_hook_idx) const {
std::string ClassType::getForwardHookErrorMessage(int hook_idx) const {
const std::string& hook_name = forward_hooks_[hook_idx]->name();
const FunctionSchema& forward_schema = getMethod("forward").getSchema();
std::string input_types = getSchemaInputTypesString(forward_schema);
std::string input_types = getSchemaInputTypesString(forward_schema);
// create expected output types string
const Argument& pre_output =
@ -1160,33 +1160,33 @@ std::string ClassType::getForwardHookErrorMessage(int hook_idx) const {
std::string hook_schema = hook_name + "(self, input: Tuple[" +
input_types + "], output: " + output_types + ")";
std::string return_string =
"This error occured while scripting the forward hook '"
"This error occured while scripting the forward hook '"
+ hook_name + "' on module " + name()->name() +
". If you did not want to script this hook remove it from" +
" the original NN module before scripting. This hook was" +
" expected to have the following signature: " + hook_schema +
". The type of the output arg is the returned type from" +
" either the forward method or the previous hook if it exists. " +
"Note that hooks can return anything, but if the hook is " +
". The type of the output arg is the returned type from" +
" either the forward method or the previous hook if it exists. " +
"Note that hooks can return anything, but if the hook is " +
"on a submodule the outer module is expecting" +
" the same return type as the submodule's forward.";
return return_string;
}
void checkForwardHookInputArguments(
const FunctionSchema& forward_schema,
const FunctionSchema& hook_schema,
const std::string& hook_id,
const FunctionSchema& forward_schema,
const FunctionSchema& hook_schema,
const std::string& hook_id,
const std::string& hook_err_msg) {
// check for proper tuple input types
const std::vector<Argument>& forward_args = forward_schema.arguments();
const Argument input_arg = hook_schema.arguments()[1];
TORCH_CHECK(
input_arg.type()->cast<TupleType>() != nullptr,
input_arg.type()->cast<TupleType>() != nullptr,
hook_id,
"expected the input argument to be typed as a Tuple but found type: '",
input_arg.type()->annotation_str(),
"' instead.\n",
input_arg.type()->annotation_str(),
"' instead.\n",
hook_err_msg
);
@ -1229,7 +1229,7 @@ void checkForwardHookInputArguments(
}
void ClassType::checkForwardPreHookSchema(
int pre_hook_idx,
int pre_hook_idx,
const FunctionSchema& pre_hook_schema) const {
const torch::jit::Function* pre_hook = forward_pre_hooks_[pre_hook_idx];
std::string hook_id =
@ -1261,7 +1261,7 @@ void ClassType::checkForwardPreHookSchema(
pre_hook_err_msg
);
const Argument return_arg = pre_hook_schema.returns()[0];
std::string wrong_type_returned_err_msg = hook_id +
std::string wrong_type_returned_err_msg = hook_id +
"returned the wrong type of: '" +
return_arg.type()->annotation_str() + "'.";
@ -1269,9 +1269,9 @@ void ClassType::checkForwardPreHookSchema(
return;
}
if (forward_args.size() == 2 && *forward_args[1].type() == *return_arg.type()) {
// TORCH_CHECK below is for the edge case where forward's input is a tuple and the
// TORCH_CHECK below is for the edge case where forward's input is a tuple and the
// pre-hook returns a matching tuple. Eager doesn't support this- the working eager return
// for a tuple type is the forward's input tuple wrapped inside of another tuple.
// for a tuple type is the forward's input tuple wrapped inside of another tuple.
TORCH_CHECK(
return_arg.type()->cast<TupleType>() == nullptr,
wrong_type_returned_err_msg,
@ -1316,7 +1316,7 @@ void ClassType::checkForwardPreHookSchema(
for (int i = 1; i < forward_args.size(); ++i) {
if (*forward_args[i].type() != *return_tuple_types[i - 1]) {
TORCH_CHECK(
false,
false,
wrong_type_returned_err_msg,
" The returned tuple contains the wrong inner types.\n",
pre_hook_err_msg);
@ -1325,7 +1325,7 @@ void ClassType::checkForwardPreHookSchema(
}
void ClassType::checkForwardHookSchema(
int hook_idx,
int hook_idx,
const FunctionSchema& hook_schema) const {
const torch::jit::Function* hook = forward_hooks_[hook_idx];
std::string hook_id =
@ -1388,8 +1388,8 @@ torch::jit::Function& ClassType::getMethod(const std::string& name) const {
torch::jit::Function* ClassType::findHook(const std::string& name) const {
auto hook = findForwardHook(name);
if (hook == nullptr) {
hook = findForwardPreHook(name);
}
hook = findForwardPreHook(name);
}
return hook;
}

View File

@ -113,7 +113,7 @@ public:
const auto not_nan_mask = _mm256_cmp_pd(values, values, _CMP_EQ_OQ);
const auto nan_mask = _mm256_cmp_pd(not_nan_mask, zero_vec, _CMP_EQ_OQ);
const auto pi = _mm256_set1_pd(c10::pi<double>);
const auto neg_mask = _mm256_cmp_pd(values, zero_vec, _CMP_LT_OQ);
auto angle = _mm256_blendv_pd(zero_vec, pi, neg_mask);
angle = _mm256_blendv_pd(angle, nan_vec, nan_mask);

View File

@ -120,7 +120,7 @@ public:
const auto not_nan_mask = _mm256_cmp_ps(values, values, _CMP_EQ_OQ);
const auto nan_mask = _mm256_cmp_ps(not_nan_mask, zero_vec, _CMP_EQ_OQ);
const auto pi = _mm256_set1_ps(c10::pi<float>);
const auto neg_mask = _mm256_cmp_ps(values, zero_vec, _CMP_LT_OQ);
auto angle = _mm256_blendv_ps(zero_vec, pi, neg_mask);
angle = _mm256_blendv_ps(angle, nan_vec, nan_mask);

View File

@ -364,7 +364,7 @@ class Vec256<ComplexDbl> {
}
Vec256<ComplexDbl> sqrt() const {
return map(std::sqrt);
return map(std::sqrt);
}
Vec256<ComplexDbl> reciprocal() const {

View File

@ -417,7 +417,7 @@ class Vec256<ComplexFlt> {
}
Vec256<ComplexFlt> sqrt() const {
return map(std::sqrt);
return map(std::sqrt);
}
Vec256<ComplexFlt> reciprocal() const {

View File

@ -82,7 +82,7 @@ class Vec256<double> {
blend(const Vec256<double>& a, const Vec256<double>& b) {
return { a._vec0, b._vec1 };
}
template <int64_t mask>
static std::enable_if_t<blendChoiceDbl(mask) == 4, Vec256<double>> C10_ALWAYS_INLINE
@ -206,7 +206,7 @@ class Vec256<double> {
for (int i = 0; i < size()/2; i++) {
ret._vec0[i] = f(_vec0[i], other._vec0[i]);
}
for (int i = 0; i < size()/2; i++) {
for (int i = 0; i < size()/2; i++) {
ret._vec1[i] = f(_vec1[i], other._vec1[i]);
}
return ret;
@ -314,7 +314,7 @@ class Vec256<double> {
Vec256<double> C10_ALWAYS_INLINE sqrt() const {
return {vec_sqrt(_vec0), vec_sqrt(_vec1)};
}
Vec256<double> C10_ALWAYS_INLINE reciprocal() const {
Vec256<double> C10_ALWAYS_INLINE reciprocal() const {
return {
vec_div(vd_one, _vec0), // vec_re(_vec0) is estimated one.
vec_div(vd_one, _vec1)};

View File

@ -134,11 +134,11 @@ struct Vec256<c10::qint32> {
Vec256<float> vf0 = rhs[0];
vfloat32 vecf0 = vf0.vec0();
vfloat32 vecf1 = vf0.vec1();
vfloat32 vecf1 = vf0.vec1();
vecf0 = vec_mul(vecf0, inverse_scale_v);
vecf1 = vec_mul(vecf1, inverse_scale_v);
vecf0 = vec_add(vec_rint(vecf0), vec_zero_point);
vecf1 = vec_add(vec_rint(vecf1), vec_zero_point);
vecf1 = vec_add(vec_rint(vecf1), vec_zero_point);
vint32 veci0 = vec_signed(vecf0);
vint32 veci1 = vec_signed(vecf1);
@ -171,7 +171,7 @@ struct Vec256<c10::qint32> {
float multiplier,
int32_t zero_point) {
const vint32 vmin = vec_splats(std::numeric_limits<value_type>::min());
const vint32 vmax = vec_splats(std::numeric_limits<value_type>::max());
const vint32 vmax = vec_splats(std::numeric_limits<value_type>::max());
vfloat32 vec_mult = vec_splats(multiplier);
vint32 vec_zero_point = vec_splats(zero_point);
Vec256<c10::qint32> vi = inp[0];

View File

@ -337,7 +337,7 @@ struct Vec256<c10::qint8> {
vint32 veci4 = vec_signed(vecf4);
vint32 veci5 = vec_signed(vecf5);
vint32 veci6 = vec_signed(vecf6);
vint32 veci7 = vec_signed(vecf7);
vint32 veci7 = vec_signed(vecf7);
veci0 = vec_add(veci0, vec_zero_point);
veci1 = vec_add(veci1, vec_zero_point);
@ -348,7 +348,7 @@ struct Vec256<c10::qint8> {
veci5 = vec_add(veci5, vec_zero_point);
veci6 = vec_add(veci6, vec_zero_point);
veci7 = vec_add(veci7, vec_zero_point);
vint16 vecshi0 = vec_packs(veci0, veci1);
vint16 vecshi1 = vec_packs(veci2, veci3);
vint16 vecshi2 = vec_packs(veci4, veci5);

View File

@ -345,8 +345,8 @@ struct Vec256<c10::quint8> {
vint32 veci4 = vec_signed(vecf4);
vint32 veci5 = vec_signed(vecf5);
vint32 veci6 = vec_signed(vecf6);
vint32 veci7 = vec_signed(vecf7);
vint32 veci7 = vec_signed(vecf7);
veci0 = vec_add(veci0, vec_zero_point);
veci1 = vec_add(veci1, vec_zero_point);
veci2 = vec_add(veci2, vec_zero_point);
@ -356,11 +356,11 @@ struct Vec256<c10::quint8> {
veci5 = vec_add(veci5, vec_zero_point);
veci6 = vec_add(veci6, vec_zero_point);
veci7 = vec_add(veci7, vec_zero_point);
vint16 vecshi0 = vec_packs(veci0, veci1);
vint16 vecshi1 = vec_packs(veci2, veci3);
vint16 vecshi2 = vec_packs(veci4, veci5);
vint16 vecshi3 = vec_packs(veci6, veci7);
vint16 vecshi3 = vec_packs(veci6, veci7);
vuint8 vec0 = vec_packsu(vecshi0, vecshi1);
vuint8 vec1 = vec_packsu(vecshi2, vecshi3);

View File

@ -193,7 +193,7 @@ void CUDAGeneratorImpl::set_state(const c10::TensorImpl& new_state) {
} else {
TORCH_CHECK(new_state_size == total_size, "RNG state is wrong size");
}
uint64_t input_seed;
auto new_rng_state = new_state.data<uint8_t>();
memcpy(&input_seed, new_rng_state + states_size, seed_size);

View File

@ -3,7 +3,7 @@
namespace at { namespace native {
at::Tensor linspace_from_neg_one(const Tensor& grid, int64_t num_steps,
at::Tensor linspace_from_neg_one(const Tensor& grid, int64_t num_steps,
bool align_corners) {
if (num_steps <= 1) {
return at::tensor(0, grid.options());

View File

@ -96,7 +96,7 @@ void avg_pool2d_out_cpu_template(
Tensor &output,
const Tensor &input_,
IntArrayRef kernel_size,
IntArrayRef stride,
IntArrayRef stride,
IntArrayRef padding,
bool ceil_mode,
bool count_include_pad,

View File

@ -11,11 +11,11 @@ extern "C" void zgemm_(char *transa, char *transb, int *m, int *n, int *k, void
#endif // AT_BUILD_WITH_BLAS()
#if AT_BUILD_WITH_BLAS()
extern "C" void cswap_(int *n, const void *x, int *incx, void *y, int *incy);
extern "C" void cswap_(int *n, const void *x, int *incx, void *y, int *incy);
extern "C" void dcopy_(int *n, const double *x, int *incx, double *y, int *incy);
extern "C" void scopy_(int *n, const float *x, int *incx, float *y, int *incy);
extern "C" void zcopy_(int *n, const void *x, int *incx, void *y, int *incy);
extern "C" void ccopy_(int *n, const void *x, int *incx, void *y, int *incy);
extern "C" void zcopy_(int *n, const void *x, int *incx, void *y, int *incy);
extern "C" void ccopy_(int *n, const void *x, int *incx, void *y, int *incy);
extern "C" void daxpy_(int *n, double *a, const double *x, int *incx, double *y, int *incy);
extern "C" void saxpy_(int *n, float *a, const float *x, int *incx, float *y, int *incy);
extern "C" void caxpy_(int *n, void *a, const void *x, int *incx, void *y, int *incy);
@ -279,7 +279,7 @@ void axpy(int64_t n, double a, const double *x, int64_t incx, double *y, int64_t
daxpy_(&i_n, &a, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
axpy_stub(
kCPU, at::kDouble,
n, a, x, incx, y, incy);
@ -300,7 +300,7 @@ void axpy(int64_t n, float a, const float *x, int64_t incx, float *y, int64_t in
saxpy_(&i_n, &a, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
axpy_stub(
kCPU, at::kFloat,
n, a, x, incx, y, incy);
@ -321,7 +321,7 @@ void axpy(int64_t n, c10::complex<double> a, const c10::complex<double> *x, int6
zaxpy_(&i_n, &a, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
axpy_stub(
kCPU, at::kComplexDouble,
n, a, x, incx, y, incy);
@ -342,7 +342,7 @@ void axpy(int64_t n, c10::complex<float> a, const c10::complex<float> *x, int64_
caxpy_(&i_n, &a, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
axpy_stub(
kCPU, at::kComplexFloat,
n, a, x, incx, y, incy);
@ -364,7 +364,7 @@ void copy(int64_t n, const double *x, int64_t incx, double *y, int64_t incy) {
dcopy_(&i_n, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
copy_stub(
kCPU, at::kDouble,
n, x, incx, y, incy);
@ -384,7 +384,7 @@ void copy(int64_t n, const float *x, int64_t incx, float *y, int64_t incy) {
scopy_(&i_n, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
copy_stub(
kCPU, at::kFloat,
n, x, incx, y, incy);
@ -404,7 +404,7 @@ void copy(int64_t n, const c10::complex<double> *x, int64_t incx, c10::complex<d
zcopy_(&i_n, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
copy_stub(
kCPU, at::kComplexDouble,
n, x, incx, y, incy);
@ -424,10 +424,10 @@ void copy(int64_t n, const c10::complex<float> *x, int64_t incx, c10::complex<fl
ccopy_(&i_n, x, &i_incx, y, &i_incy);
return;
}
#endif
#endif
copy_stub(
kCPU, at::kComplexFloat,
n, x, incx, y, incy);
}
}}} // namespace at::native::cpublas

View File

@ -4,7 +4,7 @@
namespace at { namespace native {
// View tensor with new dtype, storage offset, sizes and strides
// View tensor with new dtype, storage offset, sizes and strides
inline Tensor view_tensor(
const Tensor &tensor, ScalarType dtype,
int64_t offset, IntArrayRef sizes, IntArrayRef strides) {

View File

@ -128,7 +128,7 @@ C10_DEVICE static inline scalar_t polevl(const scalar_t x, const scalar_t A[],
}
/* the functions stirling_approx_tail, binomial_inversion, and btrs are adapted
* from TensorFlow's random_binomial_op.cc implementation. That code is under
* from TensorFlow's random_binomial_op.cc implementation. That code is under
* copyright: 2019 The TensorFlow Authors.
*
* It was released under the Apache License, Version 2.0 (the "License"), available at:

View File

@ -4,7 +4,7 @@
namespace at {
namespace native {
namespace {
// Check foreach API restrictions
// Check foreach API restrictions
// - Tensor lists must be non-empty.
// - All tensors in all lists must have the same dtype.
// - All TensorLists and ScalarLists must have the same number of elements.

View File

@ -51,7 +51,7 @@ Tensor& glu_backward_out(Tensor& grad_input,
Tensor secondHalf = input.narrow(wrap_dim, inputSize, inputSize);
Tensor gradInputfirstHalf = grad_input.narrow(wrap_dim, 0, inputSize);
Tensor gradInputsecondHalf = grad_input.narrow(wrap_dim, inputSize, inputSize);
at::sigmoid_out(gradInputfirstHalf, secondHalf);
// for second gradinput half, can get a better performance by fusion
auto iter = at::TensorIteratorConfig()

View File

@ -485,7 +485,7 @@ Tensor _grid_sampler_2d_cpu_fallback(const Tensor& input, const Tensor& grid,
}
}
} else if (interpolation_mode == GridSamplerInterpolation::Bicubic) {
// grid_sampler_compute_source_index will "clip the value" of idx depends on the padding,
// grid_sampler_compute_source_index will "clip the value" of idx depends on the padding,
// which would cause calculation to be wrong,
// for example x = -0.1 -> ix = 0 for zero padding, but in bicubic ix = floor(x) = -1
// There would be more problem in reflection padding, since the -1 and +1 direction is not fixed in boundary condition

View File

@ -10,7 +10,7 @@
namespace at {
namespace native {
namespace {
static void im2col_out_cpu_template(
Tensor& output,
const Tensor& input_,

View File

@ -56,7 +56,7 @@ namespace {
nframe = input.size(0);
dim = input.size(1);
}
TORCH_CHECK(
valid_inputs,
"Expected non-empty vector or matrix with optional 0-dim batch size, but got: ",

View File

@ -40,7 +40,7 @@ inline scalar_t multilabel_margin_loss_forward_inner_sum_cpu(
}
}
}
return sum;
}
@ -103,7 +103,7 @@ static void multilabel_margin_loss_forward_out_cpu_template(
int64_t reduction) {
auto target_arg = TensorArg(target, "target", 2);
int64_t nframe, dim;
const int64_t ndims = input.dim();
const int64_t ndims = input.dim();
if (ndims <= 1) {
nframe = 1;
dim = ndims == 0 ? 1 : input.size(0);
@ -113,7 +113,7 @@ static void multilabel_margin_loss_forward_out_cpu_template(
dim = input.size(1);
}
multilabel_margin_loss_shape_check(nframe, dim, ndims, target_arg, input, target);
// special case target.dim() <= 1: produce scalar output for scalar inputs
// even if reduction == Reduction::None
if (reduction != Reduction::None || target.dim() <= 1) {
@ -228,12 +228,12 @@ static void multilabel_margin_loss_backward_out_cpu_template(
multilabel_margin_loss_shape_check(nframe, dim, ndims, target_arg, input, target);
checkSameSize(c, target_arg, is_target_arg);
grad_input.resize_as_(input);
if (grad_input.numel() == 0) {
return;
}
TORCH_CHECK(grad_input.is_contiguous(), "grad_input must be contiguous");
grad_input.zero_();

View File

@ -205,7 +205,7 @@ void multi_margin_loss_backward_out_cpu_template(
int64_t nframe, dim;
auto target_arg = TensorArg(target, "target", 2);
const auto ndims = input.dim();
TORCH_CHECK(p == 1 || p == 2, "only p == 1 and p == 2 supported");
multi_margin_loss_shape_check(nframe, dim, ndims, target_arg, input, target);

View File

@ -153,7 +153,7 @@ static void nll_loss2d_forward_out_frame(
for (int64_t b = 0; b < batch_size; b++) {
for (int64_t elem = 0; elem < map_size; elem++) {
const int64_t cur_target = target_data[b * map_size + elem];
if (cur_target == ignore_index) {
continue;
}
@ -284,7 +284,7 @@ static void nll_loss2d_backward_out_frame(
for (int64_t b = start; b < end; b++) {
for (int64_t elem = 0; elem < map_size; elem++) {
const int64_t cur_target = target_data[b * map_size + elem];
if (cur_target == ignore_index) {
continue;
}

View File

@ -68,7 +68,7 @@ pool2d_shape_check(
TORCH_CHECK(dilationH > 0 && dilationW > 0,
"dilation should be greater than zero, but got ",
"dilationH: ", dilationH, " dilationW: ", dilationW);
bool valid_dims = input.size(1) != 0 && input.size(2) != 0;
if (memory_format == at::MemoryFormat::ChannelsLast){
// Expect tensor in NHWC format and allow 0-dim only for N.
@ -81,7 +81,7 @@ pool2d_shape_check(
"Expected 3D or 4D (batch mode) tensor with optional 0 dim batch size for input, but got:",
input.sizes());
}
TORCH_CHECK(kW/2 >= padW && kH/2 >= padH,
"pad should be smaller than or equal to half of kernel size, but got ",
"padW = ", padW, ", padH = ", padH, ", kW = ", kW, ", kH = ", kH);

View File

@ -16,7 +16,7 @@ namespace native {
#endif
// integral power in pytorch allows for negative exponents, giving truncated integral results.
// e.g. since 2**-1==0.5, the truncated integral result is zero. 1**negative_exponent is the
// e.g. since 2**-1==0.5, the truncated integral result is zero. 1**negative_exponent is the
// only non-zero result.
template <class T,
typename std::enable_if<std::is_integral<T>::value, T>::type* = nullptr>

View File

@ -219,7 +219,7 @@ Tensor& replication_pad1d_backward_out_cpu_template(
gradInput.resize_as_(input);
if (gradInput.numel() == 0) {
return gradInput;
}
}
gradInput.zero_();
/* backprop */
@ -522,7 +522,7 @@ Tensor& replication_pad2d_backward_out_cpu_template(
if (gradInput.numel() == 0) {
return gradInput;
}
gradInput.zero_();
/* backprop */

View File

@ -143,7 +143,7 @@ public:
return (ptr - other.ptr) / stride;
}
// }
// Comparison operators {
C10_HOST_DEVICE
bool operator==(const ConstStridedRandomAccessor& other) const {
@ -175,7 +175,7 @@ public:
return !(*this < other);
}
// }
protected:
PtrType ptr;
index_t stride;
@ -186,7 +186,7 @@ template <
typename index_t = int64_t,
template <typename U> class PtrTraits = DefaultPtrTraits
>
class StridedRandomAccessor
class StridedRandomAccessor
: public ConstStridedRandomAccessor<T, index_t, PtrTraits> {
public:
using difference_type = index_t;

View File

@ -12,7 +12,7 @@ namespace native {
static inline void flip_check_errors(int64_t total_dims, int64_t flip_dims_size, IntArrayRef dims) {
if (flip_dims_size==0) {
return;
}
}
// check if number of axis in dim is valid
if (flip_dims_size < 0 || flip_dims_size > total_dims) {
TORCH_CHECK_INDEX(false, "flip dims size out of range, got flip dims size=", flip_dims_size);

View File

@ -844,14 +844,14 @@ struct ApplyGridSample<scalar_t, 2, GridSamplerInterpolation::Bicubic,
auto mask_x = must_in_bound ? iVec(-1) : (ix > iVec(-1)) & (ix < iVec(inp_W));
auto mask_y = must_in_bound ? iVec(-1) : (iy > iVec(-1)) & (iy < iVec(inp_H));
auto mask = cast<scalar_t>(mask_x & mask_y);
auto offset = iy * iVec(inp_sH) + ix * iVec(inp_sW);
auto val = mask_gather<sizeof(scalar_t)>(Vec(0), data, offset, mask);
return val;
}
inline void add_value_bounded(scalar_t* data, int64_t len, const Vec& x, const Vec&y,
inline void add_value_bounded(scalar_t* data, int64_t len, const Vec& x, const Vec&y,
const Vec& delta) const {
auto ix = convert_to_int_of_same_size(compute_W.compute_coordinates(x));
@ -860,7 +860,7 @@ struct ApplyGridSample<scalar_t, 2, GridSamplerInterpolation::Bicubic,
auto mask_x = must_in_bound ? iVec(-1) : (ix > iVec(-1)) & (ix < iVec(inp_W));
auto mask_y = must_in_bound ? iVec(-1) : (iy > iVec(-1)) & (iy < iVec(inp_H));
auto mask = cast<scalar_t>(mask_x & mask_y);
auto i_gInp_offset = iy * iVec(inp_W) + ix;
integer_t i_gInp_offset_arr[iVec::size()];
i_gInp_offset.store(i_gInp_offset_arr);
@ -899,7 +899,7 @@ struct ApplyGridSample<scalar_t, 2, GridSamplerInterpolation::Bicubic,
// Interpolate the 4 values in the x direction
Vec interp_x[4];
for (int64_t i = 0; i < 4; ++i) {
interp_x[i] =
interp_x[i] =
coeff_x[0] * get_value_bounded(inp_slice_C_ptr, ix - Vec(1), iy + Vec(-1 + i)) +
coeff_x[1] * get_value_bounded(inp_slice_C_ptr, ix + Vec(0), iy + Vec(-1 + i)) +
coeff_x[2] * get_value_bounded(inp_slice_C_ptr, ix + Vec(1), iy + Vec(-1 + i)) +

View File

@ -76,7 +76,7 @@ inline void _vec_log_softmax_lastdim(
scalar_t* output_data = output_data_base + i * dim_size;
scalar_t tmp_sum = tmp_sum_scalar[j];
scalar_t max_input = max_input_arr[j];
// It's necessary to keep the order of the operations below.
// In some cases that input is large digits and the difference
// is small, if we compute `max_input` plus `tmp_sum` before,

View File

@ -39,7 +39,7 @@
// grad_in[...,i_in_dim,...,i_in_last_dim], where
// i_in_dim is in [left_idx_fold, right_idx_fold],
// i_in_last_dim = i_out_dim - i_in_dim * step,
// left_idx_fold = (i_out_dim - size) / step
// left_idx_fold = (i_out_dim - size) / step
// if i_out_dim in [left_idx_fold * step, left_idx_fold * step + size)
// else (i_out_dim - size) / step + 1,
// right_idx_fold = i_out_dim / step.

View File

@ -45,7 +45,7 @@ static inline void compute_source_index_and_lambda(
// Helper structs and methods for cpu_upsample_linear
//
// Interpolation methods that used below are separable, and as such we can compute the interpolation
// Interpolation methods that used below are separable, and as such we can compute the interpolation
// independently per dimension in a recursive way. Please, refer to #10482 for more context.
//
// Linear Interpolation structure to compute output value in n-dimensional case.
@ -96,26 +96,26 @@ static inline bool is_contiguous_stride(const int64_t* strides) {
}
// Helper class to recursively check if all input strides corresponding to interpolated dimensions
// Helper class to recursively check if all input strides corresponding to interpolated dimensions
// are equal zero except on a single dimension.
//
//
// Inputs: array of strides of size N, non_zero_stride_dim which can be -1, 0, 1, 2, ...
// if non_zero_stride_dim, we check that all strides are equal zero, otherwise
// 4 strides corresponding to the strides for index_0, weight_0, index_1 and weight_1 for non_zero_stride_dim
// dimension should be non zero.
//
// Unit check of the recursion is to verify whether 4 strides for one interpolated dimension are either zero,
//
// Unit check of the recursion is to verify whether 4 strides for one interpolated dimension are either zero,
// see method is_zero_stride, or (sizeof(index_t), sizeof(scalar_t), sizeof(index_t), sizeof(scalar_t)), see
// method is_contiguous_stride.
//
//
// In practice, we have the following cases:
// - for ND, float32, channel first, strides are
// - for ND, float32, channel first, strides are
// dimN-1, dim1, dim0
// i0, w0, i1, w1, ..., i0, w0, i1, w1, i0, w0, i1, w1
// strides=(0, 0, 0, 0, ..., 0, 0, 0, 0, 4, 4, 4, 4)
//
// if size dim0 is 1 then its strides are 0 and dim1 strides are equal 4
//
//
// - for ND, float32, channel last, strides are
// dimN-1, dimN-2, dim0
// i0, w0, i1, w1, i0, w0, i1, w1, ... i0, w0, i1, w1
@ -155,7 +155,7 @@ static inline void basic_loop(char** data, const int64_t* strides, int64_t n) {
}
// Linear upsampling computation method using TensorIterator for Nd case.
//
//
// Single loop function for 1d, 2d and 3d cases.
// For N dimensions, output value up to Di dimension can be computed as
//
@ -505,7 +505,7 @@ void cpu_upsample_linear_backward(
//
template<typename scalar_t>
std::vector<Tensor> compute_indices_weights_linear(
int64_t input_size, int64_t output_size, int64_t stride, int64_t ndims, int64_t reshape_dim,
int64_t input_size, int64_t output_size, int64_t stride, int64_t ndims, int64_t reshape_dim,
bool align_corners, const c10::optional<double> opt_scale
) {
@ -516,7 +516,7 @@ std::vector<Tensor> compute_indices_weights_linear(
new_shape[reshape_dim] = output_size;
output.emplace_back(empty(new_shape, CPU(at::kLong)));
output.emplace_back(empty(new_shape, CPU(c10::CppTypeToScalarType<scalar_t>())));
output.emplace_back(empty(new_shape, CPU(c10::CppTypeToScalarType<scalar_t>())));
output.emplace_back(empty(new_shape, CPU(at::kLong)));
output.emplace_back(empty(new_shape, CPU(c10::CppTypeToScalarType<scalar_t>())));
@ -524,7 +524,7 @@ std::vector<Tensor> compute_indices_weights_linear(
auto lambda0_ptr = output[1].data_ptr<scalar_t>();
auto input_index1_ptr = output[2].data_ptr<int64_t>();
auto lambda1_ptr = output[3].data_ptr<scalar_t>();
for (int64_t i=0; i<output_size; i++) {
compute_source_index_and_lambda<scalar_t>(
@ -543,7 +543,7 @@ std::vector<Tensor> compute_indices_weights_linear(
}
// Upsampling linear interpolation kernel for N-d case.
// Input is assumed to be like NCHW, NCL, NCKHW - interpolated spatial dimension
// Input is assumed to be like NCHW, NCL, NCKHW - interpolated spatial dimension
// are those from the end up to batch size N and number of channels C.
//
// Internally, it uses TensorIterator to optimize the computations.
@ -588,8 +588,8 @@ void upsample_linearNd_kernel_impl(
.declare_static_dtype_and_device(input.scalar_type(), input.device())
.add_output(output)
.add_input(restrided_input);
for (auto iter=indices_weights.begin(); iter!=indices_weights.end(); iter++) {
for (auto iter=indices_weights.begin(); iter!=indices_weights.end(); iter++) {
for (auto& tensor : *iter) {
config.add_input(tensor);
}

View File

@ -71,7 +71,7 @@ void batch_norm_cpu_inference_contiguous_impl(Tensor& output,
if (image_size != 1) {
const int64_t n_offset = n_channel * image_size;
const int64_t loop_size = image_size - (image_size % Vec::size());
for (int64_t n = 0; n < n_batch; n++) {
for (int64_t n = 0; n < n_batch; n++) {
for (int64_t c = 0; c < n_channel; c++) {
const Vec alpha_vec(alpha_data[c]);
const Vec beta_vec(beta_data[c]);

View File

@ -1894,7 +1894,7 @@ AT_ERROR("triangular_solve: MAGMA library not found in "
magma_int_t n = magma_int_cast(A.size(-2), "A.size(-2)");
magma_int_t nrhs = magma_int_cast(b.size(-1), "b.size(-1)");
// magma returns early if m <= 0 || n <= 0 for magmaTriangularSolveBatched
// magmaTriangularSolve is calling cuBLAS and it prints
// magmaTriangularSolve is calling cuBLAS and it prints
// ** On entry to DTRSM parameter number 9 had an illegal value
// so let's use proper lda parameter here
magma_int_t lda = std::max<magma_int_t>(1, n);
@ -2282,7 +2282,7 @@ std::tuple<Tensor, Tensor> _syevd_helper_cuda(const Tensor& self, bool compute_e
bool upper = uplo == 'U' ? true : false;
return _symeig_helper_cuda(self, compute_eigenvectors, upper);
}
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ svd ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
template<typename scalar_t>

View File

@ -50,7 +50,7 @@ static void apply_batched_inverse_lib(Tensor& self, Tensor& self_inv, Tensor& in
auto& allocator = *::c10::cuda::CUDACachingAllocator::get();
// Heuristic: For small batch size or large matrix size, we use for-loop to iterate over the batches instead of
// Heuristic: For small batch size or large matrix size, we use for-loop to iterate over the batches instead of
// calling the batched cublas routine.
if (batch_size <= 8 || /* batch_size > 8 && */ n >= 512) {
for (int64_t i = 0; i < batch_size; i++) {

View File

@ -260,7 +260,7 @@ public:
#ifdef __HIP_PLATFORM_HCC__
// clone input to avoid issues with hipfft clobering the input and failing tests
clone_input = true;
clone_input = true;
#else
clone_input = false;
#endif

View File

@ -20,9 +20,9 @@ std::vector<Tensor> foreach_tensor_list_op(TensorList tensors1, TensorList tenso
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kBool, kBFloat16, kHalf, tensors1[0].scalar_type(), "foreach_binary_op_list_cuda", [&]() {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<3>(tensor_lists,
BinaryOpListAlphaFunctor<scalar_t,
BinaryOpListAlphaFunctor<scalar_t,
/* depth */ 3,
/* r_args_depth */ 2,
/* r_args_depth */ 2,
/* res_arg_index */ 2>(),
Op<opmath_t>(),
alpha.to<opmath_t>());
@ -40,9 +40,9 @@ void foreach_tensor_list_op_(TensorList tensors1, TensorList tensors2, Scalar al
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kBool, kBFloat16, kHalf, tensors1[0].scalar_type(), "foreach_binary_op_list_cuda_", [&]() {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<2>(tensor_lists,
BinaryOpListAlphaFunctor<scalar_t,
BinaryOpListAlphaFunctor<scalar_t,
/* depth */ 2,
/* r_args_depth */ 2,
/* r_args_depth */ 2,
/* res_arg_index */ 0>(),
Op<opmath_t>(),
alpha.to<opmath_t>());

View File

@ -19,9 +19,9 @@ std::vector<Tensor> foreach_binary_op(TensorList tensors, Scalar scalar) {
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kBool, kBFloat16, kHalf, tensors[0].scalar_type(), "foreach_binary_op_scalar_cuda", [&]() {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<2>(tensor_lists,
BinaryOpScalarFunctor<scalar_t,
BinaryOpScalarFunctor<scalar_t,
/* depth */ 2,
/* r_args_depth */ 1,
/* r_args_depth */ 1,
/* res_arg_index */ 1>(),
Op<opmath_t>(),
scalar.to<opmath_t>());
@ -37,9 +37,9 @@ void foreach_binary_op_(TensorList tensors, Scalar scalar) {
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kBool, kBFloat16, kHalf, tensors[0].scalar_type(), "foreach_binary_op_scalar_cuda_", [&]() {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<1>(tensor_lists,
BinaryOpScalarFunctor<scalar_t,
BinaryOpScalarFunctor<scalar_t,
/* depth */ 1,
/* r_args_depth */ 1,
/* r_args_depth */ 1,
/* res_arg_index */ 0>(),
Op<opmath_t>(),
scalar.to<opmath_t>());

View File

@ -20,9 +20,9 @@ std::vector<Tensor> foreach_binary_op(TensorList tensors, at::ArrayRef<Scalar> s
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<2, opmath_t>(tensor_lists,
scalars,
BinaryOpScalarListFunctor<scalar_t,
BinaryOpScalarListFunctor<scalar_t,
/* depth */ 2,
/* r_args_depth */ 1,
/* r_args_depth */ 1,
/* res_arg_index */ 1>(),
Op<opmath_t>());
@ -39,9 +39,9 @@ void foreach_binary_op_(TensorList tensors, at::ArrayRef<Scalar> scalars) {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<1, opmath_t>(tensor_lists,
scalars,
BinaryOpScalarListFunctor<scalar_t,
BinaryOpScalarListFunctor<scalar_t,
/* depth */ 1,
/* r_args_depth */ 1,
/* r_args_depth */ 1,
/* res_arg_index */ 0>(),
Op<opmath_t>());
});

View File

@ -22,9 +22,9 @@ std::vector<Tensor> foreach_pointwise_op(TensorList input, TensorList tensors1,
AT_DISPATCH_ALL_TYPES_AND(kHalf, input[0].scalar_type(), "foreach_pointwise_op_cuda", [&]() {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<4>(tensor_lists,
PointwiseOpScalarFunctor<scalar_t,
PointwiseOpScalarFunctor<scalar_t,
/* depth */ 4,
/* r_args_depth */ 3,
/* r_args_depth */ 3,
/* res_arg_index */ 3>(),
Op<opmath_t>(),
scalar.to<opmath_t>());
@ -43,9 +43,9 @@ void foreach_pointwise_op_(TensorList input, TensorList tensors1, TensorList ten
AT_DISPATCH_ALL_TYPES_AND(kHalf, input[0].scalar_type(), "foreach_pointwise_op__cuda", [&]() {
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<3>(tensor_lists,
PointwiseOpScalarFunctor<scalar_t,
PointwiseOpScalarFunctor<scalar_t,
/* depth */ 3,
/* r_args_depth */ 3,
/* r_args_depth */ 3,
/* res_arg_index */ 0>(),
Op<opmath_t>(),
scalar.to<opmath_t>());
@ -64,9 +64,9 @@ void foreach_pointwise_op_(TensorList input, TensorList tensors1, TensorList ten
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<3, opmath_t>(tensor_lists,
scalars,
PointwiseOpScalarListFunctor<scalar_t,
PointwiseOpScalarListFunctor<scalar_t,
/* depth */ 3,
/* r_args_depth */ 3,
/* r_args_depth */ 3,
/* res_arg_index */ 0>(),
Op<opmath_t>());
});
@ -91,9 +91,9 @@ std::vector<Tensor> foreach_pointwise_op(TensorList input, TensorList tensors1,
using opmath_t = get_opmath_t<scalar_t>::opmath_t;
multi_tensor_apply<4, opmath_t>(tensor_lists,
scalars,
PointwiseOpScalarListFunctor<scalar_t,
PointwiseOpScalarListFunctor<scalar_t,
/* depth */ 4,
/* r_args_depth */ 3,
/* r_args_depth */ 3,
/* res_arg_index */ 3>(),
Op<opmath_t>());
});

View File

@ -33,7 +33,7 @@ template <typename scalar_t, template<class> class Op> void foreach_unary_op_(Te
multi_tensor_apply<1>(tensor_lists,
UnaryOpFunctor<scalar_t,
/* depth */ 1,
/* r_args_depth */ 1,
/* r_args_depth */ 1,
/* res_arg_index */ 0>(),
Op<opmath_t>());
}
@ -230,7 +230,7 @@ void foreach_tensor_neg_cuda_(TensorList tensors) {
}
// Abs have to go via slow path in case of a complex type.
// This is because foreach kernels can't return a different dtype than passed, while
// This is because foreach kernels can't return a different dtype than passed, while
// abs with complex inputs will produce float output.
template<typename T>
struct Abs {
@ -283,7 +283,7 @@ void foreach_tensor_zero_cuda_(TensorList tensors) {
multi_tensor_apply<1>(tensor_lists,
ZeroFunctor<scalar_t,
/* depth */ 1,
/* r_args_depth */ 1,
/* r_args_depth */ 1,
/* res_arg_index */ 0>());
});
}

View File

@ -142,14 +142,14 @@ scalar_t reflect_coordinates_set_grad(scalar_t in, int twice_low, int twice_high
}
}
template<typename scalar_t>
static __forceinline__ __device__
template<typename scalar_t>
static __forceinline__ __device__
scalar_t safe_downgrade_to_int_range(scalar_t x){
// -100.0 does not have special meaning. This is just to make sure
// it's not within_bounds_2d or within_bounds_3d, and does not cause
// undefined behavior. See #35506.
if (x > INT_MAX-1 || x < INT_MIN || !::isfinite(static_cast<double>(x)))
return static_cast<scalar_t>(-100.0);
// -100.0 does not have special meaning. This is just to make sure
// it's not within_bounds_2d or within_bounds_3d, and does not cause
// undefined behavior. See #35506.
if (x > INT_MAX-1 || x < INT_MIN || !::isfinite(static_cast<double>(x)))
return static_cast<scalar_t>(-100.0);
return x;
}
@ -219,7 +219,7 @@ scalar_t grid_sampler_compute_source_index_set_grad(
*grad_in = (*grad_in) * grad_refl * grad_clip;
}
coord = safe_downgrade_to_int_range(coord);
coord = safe_downgrade_to_int_range(coord);
return coord;
}
@ -244,7 +244,7 @@ scalar_t get_value_bounded(
y = compute_coordinates(y, H, padding_mode, align_corners);
int ix = static_cast<int>(x);
int iy = static_cast<int>(y);
int iy = static_cast<int>(y);
if (within_bounds_2d(iy, ix, H, W)) {
return data[iy * sH + ix * sW];
@ -284,7 +284,7 @@ void add_value_bounded(
y = compute_coordinates(y, H, padding_mode, align_corners);
int ix = static_cast<int>(x);
int iy = static_cast<int>(y);
int iy = static_cast<int>(y);
safe_add_2d(data, iy, ix, sH, sW, H, W, delta);
}

View File

@ -978,7 +978,7 @@ Tensor & masked_fill__cuda(Tensor& self, const Tensor & mask, Scalar value) {
.add_output(self)
.add_input(self)
.add_input(b_mask)
.build();
.build();
if (b_mask.dtype() == at::ScalarType::Byte) {
TORCH_WARN("masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated," \

View File

@ -58,7 +58,7 @@ Tensor kl_div_backward_cuda(const Tensor& grad, const Tensor& input, const Tenso
});
});
}
else {
else {
grad_input = -at::exp(target) * grad;
if (reduction == at::Reduction::Mean) {
grad_input /= input.numel();

View File

@ -91,7 +91,7 @@ struct MagmaStreamSyncGuard {
static inline int cuda_int_cast(int64_t value, const char* varname) {
auto result = static_cast<int>(value);
TORCH_CHECK(static_cast<int64_t>(result) == value,
TORCH_CHECK(static_cast<int64_t>(result) == value,
"cuda_int_cast: The value of ", varname, "(", (long long)value,
") is too large to fit into a int (", sizeof(int), " bytes)");
return result;

View File

@ -555,7 +555,7 @@ __global__ void batch_norm_backward_elemt_kernel(
const GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> sum_dy_xmu,
GenericPackedTensorAccessor<input_scalar_t, 3, DefaultPtrTraits, index_t> grad_input,
const int* __restrict__ numel, const int world_size) {
int64_t div = 0;
for (int i = 0; i < world_size; i ++) {
div += numel[i];
@ -955,7 +955,7 @@ std::tuple<Tensor, Tensor> batch_norm_update_stats_cuda_template(
}
// welford kernel for c last tensor calculating mean/biased_variance/unbiased_variance
// original apex name: welford_kernel_c_last
// original apex name: welford_kernel_c_last
template
<template<typename T> class VarTransform,
typename scalar_t,
@ -1632,7 +1632,7 @@ at::Tensor batch_norm_backward_elemt_channels_last_cuda_template(
});
}
C10_CUDA_KERNEL_LAUNCH_CHECK();
return grad_input;
}

View File

@ -24,7 +24,7 @@ namespace {
// applied to the result of the inline function, and thus the result is incorrect.
// e.g. if we use 1.0 / sqrt(2) for 2 ^ (-0.5) in MSVC, we get
// int(2 ^ (-0.5)) = int(1.0 / sqrt(2)) = int(1.0 / int(1.414)) = int(1.0 / 1) = 1
// However, the correct result is
// However, the correct result is
// int(2 ^ (-0.5)) = int(1.0 / 1.414) = 0
#ifdef _MSC_VER
// Functions for pow

View File

@ -119,14 +119,14 @@ static void _aminmax_kernel_impl(
const Tensor& self,
int64_t dim,
bool keepdim) {
at::TensorIterator iter = make_reduction("_aminmax", min_result,
at::TensorIterator iter = make_reduction("_aminmax", min_result,
max_result, self, dim, keepdim, self.scalar_type());
AT_DISPATCH_ALL_TYPES_AND2(kHalf, kBool, self.scalar_type(), "_aminmax_cuda", [&]() {
gpu_reduce_kernel<scalar_t, scalar_t>(
iter,
MinMaxOps<scalar_t, scalar_t, int32_t>{},
thrust::pair<scalar_t, scalar_t>(
at::numeric_limits<scalar_t>::upper_bound(),
at::numeric_limits<scalar_t>::upper_bound(),
at::numeric_limits<scalar_t>::lower_bound()
)
);

View File

@ -553,7 +553,7 @@ static inline void split_batch_dim_to_32bit_out(
const at::Tensor& input,
const at::Tensor& weight,
IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups,
bool benchmark, bool deterministic, bool allow_tf32,
bool benchmark, bool deterministic, bool allow_tf32,
int64_t max_worksize, func_t func_32bit) {
constexpr int64_t int_max = std::numeric_limits<int>::max();
const int64_t ni = input.numel();

View File

@ -28,7 +28,7 @@
# |out ch indx| 16
# |params | 20
# |-----------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r8 and d8-d15 on stack
@ -42,7 +42,7 @@
# |out ch indx| 112
# |params | 116
# |-----------|
#
#
# void pytorch_q8conv_ukernel_4x8__aarch32_neon(
# size_t mr,

View File

@ -16,8 +16,8 @@
# x2: kc
# x3: ks
# x4: a
# x5: w
# x6: c
# x5: w
# x6: c
# x7: c_stride
#

View File

@ -28,7 +28,7 @@
# |out ch indx| 16
# |params | 20
# |-----------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r9 and d8-d15 on stack
@ -42,7 +42,7 @@
# |out ch indx| 104
# |params | 108
# |-----------|
#
#
#
# New Struct for pytorch_qnnp_conv_quantization_params

View File

@ -41,7 +41,7 @@
# |out ch indx| 16
# |params | 20
# |-----------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r8 and d8-d15 on stack
@ -56,7 +56,7 @@
# |out ch indx| 100
# |params | 104
# |-----------|
#
#
# void pytorch_q8gemm_ukernel_4x8__aarch32_neon(
# size_t mr,

View File

@ -15,8 +15,8 @@
# x2: k
# x3: a
# x4: a_stride
# x5: w
# x6: c
# x5: w
# x6: c
# x7: c_stride
#

View File

@ -21,15 +21,15 @@
# |----------------|
# |packed_a | 0
# |----------------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r9 and d8-d15 on stack
# |----------------|
# |r4 - r11 | 0
# |r4 - r11 | 0
# |packed_a | 32
# |----------------|
#
#
# Packed A format.
# 4kx4m blocks for alls blocks given 4 rows (4m) are placed in contiguous memory.
@ -42,7 +42,7 @@
# | | Thus Packed A has (K + 4 - 1)/4 * (M + 4 -1)/4 blocks
# | |
# |---------------------|
#
#
# Each 4 x 4 blocks is transposed and stored.
# Each of the (K + 4 - 1)/4 blocks for a given group of 4 m blocks
# are stored adjacent in memory

View File

@ -20,7 +20,7 @@
## Stack
# 4 a_stride
# 4 packed_w
# 4 w_row_ptr
# 4 w_row_ptr
# 4 w_block_ids_ptr
# 4 b
# 4 c
@ -43,7 +43,7 @@
# |out ch indx | 24
# |params | 28
# |----------------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r9 and d8-d15 on stack
@ -58,7 +58,7 @@
# |out ch indx | 120
# |params | 124
# |----------------|
#
#
# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon(
# size_t mr,
@ -223,7 +223,7 @@ k_loop:
# Each iteration produce 4 values each of 4 bytes
# Thus 4 x 4 = 16 bytes 2^4
# In this implementation, first value will be stored at
# 1st value: sp - 12 - r1 * 16
# 1st value: sp - 12 - r1 * 16
# 2nd value: sp - 12 - (r1 - 1) * 16
# and so on.
SUB r9, r9, r1, LSL #4

View File

@ -20,7 +20,7 @@
## Stack
# 4 a_stride
# 4 packed_w
# 4 w_row_ptr
# 4 w_row_ptr
# 4 w_block_ids_ptr
# 4 b
# 4 c
@ -43,7 +43,7 @@
# |out ch indx | 24
# |params | 28
# |----------------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r9 and d8-d15 on stack
@ -58,7 +58,7 @@
# |out ch indx | 120
# |params | 124
# |----------------|
#
#
# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon(
# size_t mr,

View File

@ -21,15 +21,15 @@
# |----------------|
# |packed_a | 0
# |----------------|
#
#
# After loading w pointer in ip reg.
# And after pushing r4-r9 and d8-d15 on stack
# |----------------|
# |r4 - r11 | 0
# |r4 - r11 | 0
# |packed_a | 32
# |----------------|
#
#
# Packed A format.
# 8kx4m blocks for alls blocks given 4 rows (4m) are placed in contiguous memory.
@ -42,7 +42,7 @@
# | | Thus Packed A has (K + 4 - 1)/4 * (M + 8 -1)/8 blocks
# | |
# |---------------------|
#
#
# Each 8 x 4 blocks is transposed and stored.
# Each of the (K + 4 - 1)/4 blocks for a given group of 8 m blocks
# are stored adjacent in memory

View File

@ -19,7 +19,7 @@
# | | Thus Packed A has (K + 4 - 1)/4 * (M + 8 -1)/8 blocks
# | |
# |---------------------|
#
#
# Each 8 x 4 blocks is transposed and stored.
# Each of the (K + 4 - 1)/4 blocks for a given group of 8 m blocks
# are stored adjacent in memory

View File

@ -278,7 +278,7 @@ k_loop:
# v10 : x10, x11, x12, x13
# v12 : x20, x21, x22, x23
# v14 : x30, x31, x32, x33
# Then using
# Then using
# TRANSPOSE_4X4_S32 v16, v18, v20, v22, v4, v5, v6, v7
# We get
# v16 : x04, x05, x06, x07

View File

@ -635,7 +635,7 @@ Tensor _sparse_log_softmax(const Tensor& input_, const int64_t dim_, c10::option
namedinference::propagate_names(result, input_);
return result;
}
Tensor _sparse_log_softmax(const Tensor& self, Dimname dim, optional<ScalarType> dtype) {
return at::_sparse_log_softmax(self, dimname_to_position(self, dim), dtype);
}

View File

@ -541,8 +541,8 @@ Tensor sparse_mask_helper_cpu(
`t` - coalesced sparse tensor input
`mask_indices` - mask indices tensor
Note: The nnz in the output tensor will be same as the `mask_indices`. So it will
works independently if the mask is coalesced or not.
Note: The nnz in the output tensor will be same as the `mask_indices`. So it will
works independently if the mask is coalesced or not.
*/
TORCH_CHECK(t.is_sparse(), "t: input is not a sparse tensor");
TORCH_CHECK(t.is_coalesced(), "t: input is uncoalesced");
@ -554,7 +554,7 @@ Tensor sparse_mask_helper_cpu(
auto t_v = t._values();
auto vsize = t_v.sizes().vec();
vsize[0] = r_nnz;
Tensor r_values = at::zeros(vsize, t_v.options());
auto t_i = t._indices();
auto t_nnz = t._nnz();
@ -583,7 +583,7 @@ Tensor sparse_mask_helper_cpu(
}
}
});
return r_values;
return r_values;
}
}} // namespace at::native

View File

@ -1116,7 +1116,7 @@ SparseTensor& _sspaddmm_out_cpu(
"sspaddmm: Argument #1: Expected dim 1 size ", dim_k, ", got ", t.size(1));
int64_t nnz = sparse._nnz();
// We have to make indices contiguous as we use indices.data_ptr in _to_csr which assumes row-contiguous storage
// We have to make indices contiguous as we use indices.data_ptr in _to_csr which assumes row-contiguous storage
Tensor indices = sparse._indices().contiguous();
Tensor values = sparse._values();

View File

@ -144,8 +144,8 @@ void csrmm2(
TORCH_CUDASPARSE_CHECK(cusparseCreateDnMat(
&descC, /* output */
m, n, ldc, /* rows, cols, leading dimension */
c, /* values */
cusparse_value_type, /* data type of values */
c, /* values */
cusparse_value_type, /* data type of values */
CUSPARSE_ORDER_COL /* memory layout, ONLY column-major is supported now */
));

View File

@ -105,7 +105,7 @@ def main(argv):
out_dir = pathlib.Path(__file__).parent
(out_dir / "nnapi_wrapper.h").write_text(
PREFIX +
PREFIX +
textwrap.dedent("""\
#ifndef NNAPI_WRAPPER_H_
#define NNAPI_WRAPPER_H_
@ -124,7 +124,7 @@ def main(argv):
)
(out_dir / "nnapi_wrapper.cpp").write_text(
PREFIX +
PREFIX +
textwrap.dedent("""\
#ifndef _WIN32
#include <dlfcn.h>
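The hunk above is part of a small code generator that writes nnapi_wrapper.h and nnapi_wrapper.cpp by concatenating a PREFIX banner with textwrap.dedent blocks. A stripped-down sketch of that pattern (the file name, banner, and header body here are invented for illustration, not taken from the generator):

```python
import pathlib
import textwrap

# Hypothetical "generated file" banner; the real script builds its own
# PREFIX and much larger header/source bodies.
PREFIX = "// This file is generated; do not edit by hand.\n"

def write_generated_header(out_dir: pathlib.Path) -> None:
    (out_dir / "example_wrapper.h").write_text(
        PREFIX +
        textwrap.dedent("""\
            #ifndef EXAMPLE_WRAPPER_H_
            #define EXAMPLE_WRAPPER_H_
            void example_check_fn(void);
            #endif  // EXAMPLE_WRAPPER_H_
            """)
    )

if __name__ == "__main__":
    write_generated_header(pathlib.Path("."))
```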

View File

@ -140,7 +140,7 @@ struct NnapiCompilation : torch::jit::CustomClassHolder {
}
check_nnapi->Execution_compute(execution);
// TODO: Maybe skip this for fixed-size outputs?
for (size_t i = 0; i < outputs.size(); i++) {
auto& t = outputs[i];

View File

@ -18,7 +18,7 @@ TEST(CPUGeneratorImpl, TestGeneratorDynamicCast) {
}
TEST(CPUGeneratorImpl, TestDefaultGenerator) {
// Test Description:
// Test Description:
// Check if default generator is created only once
// address of generator should be same in all calls
auto foo = at::detail::getDefaultCPUGenerator();
@ -27,7 +27,7 @@ TEST(CPUGeneratorImpl, TestDefaultGenerator) {
}
TEST(CPUGeneratorImpl, TestCloning) {
// Test Description:
// Test Description:
// Check cloning of new generators.
// Note that we don't allow cloning of other
// generator states into default generators.
@ -47,9 +47,9 @@ void thread_func_get_engine_op(CPUGeneratorImpl* generator) {
}
TEST(CPUGeneratorImpl, TestMultithreadingGetEngineOperator) {
// Test Description:
// Test Description:
// Check CPUGeneratorImpl is reentrant and the engine state
// is not corrupted when multiple threads request for
// is not corrupted when multiple threads request for
// random samples.
// See Note [Acquire lock when using random generators]
auto gen1 = at::detail::createCPUGenerator();
@ -74,7 +74,7 @@ TEST(CPUGeneratorImpl, TestMultithreadingGetEngineOperator) {
}
TEST(CPUGeneratorImpl, TestGetSetCurrentSeed) {
// Test Description:
// Test Description:
// Test current seed getter and setter
// See Note [Acquire lock when using random generators]
auto foo = at::detail::getDefaultCPUGenerator();
@ -92,7 +92,7 @@ void thread_func_get_set_current_seed(Generator generator) {
}
TEST(CPUGeneratorImpl, TestMultithreadingGetSetCurrentSeed) {
// Test Description:
// Test Description:
// Test current seed getter and setter are thread safe
// See Note [Acquire lock when using random generators]
auto gen1 = at::detail::getDefaultCPUGenerator();
@ -107,7 +107,7 @@ TEST(CPUGeneratorImpl, TestMultithreadingGetSetCurrentSeed) {
}
TEST(CPUGeneratorImpl, TestRNGForking) {
// Test Description:
// Test Description:
// Test that state of a generator can be frozen and
// restored
// See Note [Acquire lock when using random generators]
@ -124,7 +124,7 @@ TEST(CPUGeneratorImpl, TestRNGForking) {
ASSERT_EQ(target_value.sum().item<double>(), forked_value.sum().item<double>());
}
/**
/**
* Philox CPU Engine Tests
*/
@ -208,7 +208,7 @@ TEST(CPUGeneratorImpl, TestMT19937EngineReproducibility) {
// Test Description:
// Tests if same inputs give same results when compared
// to std.
// test with zero seed
at::mt19937 engine1(0);
std::mt19937 engine2(0);
@ -231,5 +231,5 @@ TEST(CPUGeneratorImpl, TestMT19937EngineReproducibility) {
for(int i = 0; i < 10000; i++) {
ASSERT_EQ(engine1(), engine2());
}
}
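The test descriptions above cover three properties of the CPU generator: the default generator is created only once, seed get/set is thread-safe, and a generator's state can be frozen and restored ("forked"). A hedged sketch of the user-facing Python analogue of the forking test; the C++ tests above exercise CPUGeneratorImpl directly rather than this API.

```python
import torch

# Fork and restore RNG state via an explicit generator.
gen = torch.Generator()
gen.manual_seed(123)

snapshot = gen.get_state()           # "freeze" the generator
expected = torch.randn(4, generator=gen)

gen.set_state(snapshot)              # "restore" it
forked = torch.randn(4, generator=gen)

assert torch.equal(expected, forked)

# The default generator is a singleton, so the seed set here is the one
# reported back for subsequent default-generator calls.
torch.manual_seed(42)
assert torch.initial_seed() == 42
```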

View File

@ -80,7 +80,7 @@ __global__ void testEngineOffset2(){
unsigned long long increment_val = ::ldexp(1.0, 64);
at::Philox4_32_10 engine1(123, 0, increment_val);
at::Philox4_32_10 engine2(123, increment_val, increment_val);
engine2.incr_n(increment_val);
engine2.incr();
assert(engine1() == engine2());
@ -166,7 +166,7 @@ TEST(CUDAGeneratorImpl, TestGeneratorDynamicCast) {
}
TEST(CUDAGeneratorImpl, TestDefaultGenerator) {
// Test Description:
// Test Description:
// Check if default generator state is created only once
// address of generator should be same in all calls
if (!at::cuda::is_available()) return;
@ -186,7 +186,7 @@ TEST(CUDAGeneratorImpl, TestDefaultGenerator) {
}
TEST(CUDAGeneratorImpl, TestCloning) {
// Test Description:
// Test Description:
// Check cloning of new generators.
// Note that we don't allow cloning of other
// generator states into default generators.
@ -211,9 +211,9 @@ void thread_func_get_set_current_seed(Generator generator) {
current_seed++;
generator.set_current_seed(current_seed);
}
TEST(CUDAGeneratorImpl, TestMultithreadingGetSetCurrentSeed) {
// Test Description:
// Test Description:
// Test current seed getter and setter are thread safe
// See Note [Acquire lock when using random generators]
if (!at::cuda::is_available()) return;
@ -229,7 +229,7 @@ TEST(CUDAGeneratorImpl, TestMultithreadingGetSetCurrentSeed) {
}
TEST(CUDAGeneratorImpl, TestRNGForking) {
// Test Description:
// Test Description:
// Test that state of a generator can be frozen and
// restored
// See Note [Acquire lock when using random generators]

View File

@ -765,7 +765,7 @@ int main()
test_THDoubleVector_fill_VSX();
test_THFloatVector_fill_VSX();
test_THDoubleVector_muls_VSX();
test_THFloatVector_muls_VSX();

View File

@ -93,7 +93,7 @@ static inline uint32_t detectHostSIMDExtensions()
}
#endif
#elif defined(__EMSCRIPTEN__)
static inline uint32_t detectHostSIMDExtensions()

View File

@ -294,9 +294,9 @@ inline __device__ at::BFloat16 gpuAtomicMul(at::BFloat16 * address, at::BFloat16
return AtomicFPOp<at::BFloat16>()(address, val,
[](at::BFloat16 bsum, at::BFloat16 val) {
return THCNumerics<at::BFloat16>::mul(bsum, val);
});
});
}
inline __device__ double gpuAtomicMul(double * address, double val) {
return AtomicFPOp<double>()(address, val,
[](double val, unsigned long long int assumed) {

View File

@ -270,7 +270,7 @@ BENCHMARK_DEFINE_F(Reduce1D, TeSplitTail)(benchmark::State& state) {
te::For* tail;
loop.splitWithTail(m, kChunkSize, &mo, &mi, &tail);
}
loop.prepareForCodegen();
te::Stmt* s = loop.root_stmt();
s = te::IRSimplifier::simplify(s);
@ -313,7 +313,7 @@ BENCHMARK_DEFINE_F(Reduce1D, TeSplitMask)(benchmark::State& state) {
te::For* mi;
loop.splitWithMask(m, kChunkSize, &mo, &mi);
}
loop.prepareForCodegen();
te::Stmt* s = loop.root_stmt();
s = te::IRSimplifier::simplify(s);
@ -369,7 +369,7 @@ BENCHMARK_DEFINE_F(Reduce1D, TeRfactorV1)(benchmark::State& state) {
auto bt_body = te::NodeFinder<te::ReduceOp>::find(loop.root_stmt())[0];
loop.rfactor(bt_body, mi->var());
}
loop.prepareForCodegen();
te::Stmt* s = loop.root_stmt();
s = te::IRSimplifier::simplify(s);
@ -419,7 +419,7 @@ BENCHMARK_DEFINE_F(Reduce1D, TeRfactorV2)(benchmark::State& state) {
{
// Look for the new For and vectorize, but rfactor didn't return the newly added "For *".
// Resort to a hack to find the lost "For *".
// Resort to a hack to find the lost "For *".
// TODO: make it easier to find the transformed loop after rfactor.
auto loops = te::NodeFinder<te::For>::find(loop.root_stmt());
TORCH_CHECK(loops.size() == 4);
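The benchmarks above compare splitWithTail and splitWithMask before vectorizing the reduction. As a conceptual sketch of the difference (plain Python, not the TensorExpr API): splitting with a tail emits a separate remainder loop, while splitting with a mask keeps every chunk full width and guards out-of-range iterations.

```python
import math

# Conceptual analogue of the two TE loop transforms; the real
# splitWithTail/splitWithMask rewrite TensorExpr IR, not Python loops.
def split_with_tail(m, c, body):
    for mo in range(m // c):            # full chunks
        for mi in range(c):
            body(mo * c + mi)
    for i in range((m // c) * c, m):    # explicit tail loop for the remainder
        body(i)

def split_with_mask(m, c, body):
    for mo in range(math.ceil(m / c)):  # every chunk runs full width...
        for mi in range(c):
            i = mo * c + mi
            if i < m:                   # ...but out-of-range iterations are masked off
                body(i)

out = []
split_with_tail(10, 4, out.append)
assert out == list(range(10))
```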

View File

@ -163,7 +163,7 @@ class AgentBase:
r"""
Finishes the episode
Args:
rets (list): List containing rewards generated by select action calls during
rets (list): List containing rewards generated by select action calls during
episode run
"""
return self.agent_latency, self.agent_throughput

View File

@ -48,13 +48,13 @@ class CoordinatorBase:
def run_coordinator(self, episodes, episode_steps, queue):
r"""
Runs n benchmark episodes. Each episode is started by the coordinator telling each
observer to contact the agent. Each episode is concluded by the coordinator telling the agent
Runs n benchmark episodes. Each episode is started by the coordinator telling each
observer to contact the agent. Each episode is concluded by the coordinator telling the agent
to finish the episode, and then the coordinator records benchmark data
Args:
episodes (int): Number of episodes to run
episode_steps (int): Number of steps to be run in each episode by each observer
queue (SimpleQueue): SimpleQueue from torch.multiprocessing.get_context() for
queue (SimpleQueue): SimpleQueue from torch.multiprocessing.get_context() for
saving benchmark run results to
"""
@ -96,9 +96,9 @@ class CoordinatorBase:
observer_throughput_final = [
t for s in observer_throughput_final for t in s]
benchmark_metrics = {'agent latency (seconds)': {},
'agent throughput': {},
'observer latency (seconds)': {},
benchmark_metrics = {'agent latency (seconds)': {},
'agent throughput': {},
'observer latency (seconds)': {},
'observer throughput': {}}

View File

@ -44,19 +44,19 @@ args = vars(args)
def run_worker(rank, world_size, master_addr, master_port, batch, state_size, nlayers, out_features, queue):
r"""
Initializes an RPC worker
Initializes an RPC worker
Args:
rank (int): Rpc rank of worker machine
world_size (int): Number of workers in rpc network (number of observers +
1 agent + 1 coordinator)
master_addr (str): Master address of coordinator
master_port (str): Master port of coordinator
batch (bool): Whether agent will use batching or process one observer
batch (bool): Whether agent will use batching or process one observer
request at a time
state_size (str): Numerical str representing state dimensions (ie: 5-15-10)
nlayers (int): Number of layers in model
out_features (int): Number of out features in model
queue (SimpleQueue): SimpleQueue from torch.multiprocessing.get_context() for
queue (SimpleQueue): SimpleQueue from torch.multiprocessing.get_context() for
saving benchmark run results to
"""
state_size = list(map(int, state_size.split('-')))
@ -82,9 +82,9 @@ def find_graph_variable(args):
r"""
Determines if user specified multiple entries for a single argument, in which case
benchmark is run for each of these entries. Comma separated values in a given argument indicate multiple entries.
Output is presented so that user can use plot repo to plot the results with each of the
variable argument's entries on the x-axis. Args is modified in accordance with this.
More than 1 argument with multiple entries is not permitted.
Output is presented so that user can use plot repo to plot the results with each of the
variable argument's entries on the x-axis. Args is modified in accordance with this.
More than 1 argument with multiple entries is not permitted.
Args:
args (dict): Dictionary containing arguments passed by the user (and default arguments)
"""
@ -138,12 +138,12 @@ def print_benchmark_results(report):
if x_axis_name:
x_axis_output_label = f'{x_axis_name} |'
heading += append_spaces(x_axis_output_label, col_width)
metric_headers = ['agent latency (seconds)', 'agent throughput',
metric_headers = ['agent latency (seconds)', 'agent throughput',
'observer latency (seconds)', 'observer throughput']
percentile_subheaders = ['p50', 'p75', 'p90', 'p95']
subheading = ""
if x_axis_name:
subheading += append_spaces(' ' * (len(x_axis_output_label) - 1), col_width)
subheading += append_spaces(' ' * (len(x_axis_output_label) - 1), col_width)
for header in metric_headers:
heading += append_spaces(header, col_width * len(percentile_subheaders))
for percentile in percentile_subheaders:
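The table assembled above has one column group per metric with p50/p75/p90/p95 sub-columns. A small sketch of how such percentiles can be taken from a list of per-step measurements (illustrative only; not the benchmark's own computation):

```python
import math

def percentiles(samples, points=(50, 75, 90, 95)):
    """Nearest-rank percentiles of `samples` (illustrative helper)."""
    ordered = sorted(samples)
    return {f'p{p}': ordered[max(0, math.ceil(p / 100 * len(ordered)) - 1)]
            for p in points}

print(percentiles([0.12, 0.31, 0.07, 0.25, 0.18]))
# {'p50': 0.18, 'p75': 0.25, 'p90': 0.31, 'p95': 0.31}
```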
@ -163,7 +163,7 @@ def print_benchmark_results(report):
def main():
r"""
Runs rpc benchmark once if no argument has multiple entries, and otherwise once for each of the multiple entries.
Runs rpc benchmark once if no argument has multiple entries, and otherwise once for each of the multiple entries.
Multiple entries is indicated by comma separated values, and may only be done for a single argument.
Results are printed as well as saved to output file. In case of multiple entries for a single argument,
the plot repo can be used to benchmark results on the y axis with each entry on the x axis.
@ -171,7 +171,7 @@ def main():
find_graph_variable(args)
# run once if no x axis variables
x_axis_variables = args[args['x_axis_name']] if args.get('x_axis_name') else [None]
x_axis_variables = args[args['x_axis_name']] if args.get('x_axis_name') else [None]
ctx = mp.get_context('spawn')
queue = ctx.SimpleQueue()
benchmark_runs = []
@ -197,7 +197,7 @@ def main():
print(f"Time taken benchmark run {i} -, {time.time() - start_time}")
if args.get('x_axis_name'):
# save x axis value was for this iteration in the results
benchmark_run_results[args['x_axis_name']] = x_axis_variable
benchmark_run_results[args['x_axis_name']] = x_axis_variable
benchmark_runs.append(benchmark_run_results)
report = args

View File

@ -1,6 +1,6 @@
# Fast RNN benchmarks
Benchmarks for TorchScript models
Benchmarks for TorchScript models
For most stable results, do the following:
- Set CPU Governor to performance mode (as opposed to energy save)
@ -24,7 +24,7 @@ or run the test independently:
should give a good comparison, or you can specify the type of model to run
`python -m fastrnns.bench --rnns cudnn aten jit --group rnns`
`python -m fastrnns.bench --rnns cudnn aten jit --group rnns`
## Run model profiling, calls nvprof
@ -33,7 +33,7 @@ should give a good comparison, or you can specify the type of model to run
should generate nvprof file for all models somewhere.
you can also specify the models to generate nvprof files separately:
`python -m fastrnns.profile --rnns aten jit`
`python -m fastrnns.profile --rnns aten jit`
### Caveats

View File

@ -1,16 +1,16 @@
import operator_benchmark as op_bench
import benchmark_caffe2 as op_bench_c2
from benchmark_caffe2 import Caffe2BenchmarkBase # noqa
from caffe2.python import core
from caffe2.python import core
"""Microbenchmarks for element-wise Add operator. Supports both Caffe2/PyTorch."""
# Configs for C2 add operator
# Configs for C2 add operator
add_long_configs = op_bench.cross_product_configs(
M=[8, 64, 128],
N=range(2, 10, 3),
K=[2 ** x for x in range(0, 3)],
K=[2 ** x for x in range(0, 3)],
dtype=["int", "float"],
tags=["long"]
)
@ -22,20 +22,20 @@ add_short_configs = op_bench.config_list(
[16, 16, 64, "float"],
[64, 64, 128, "int"],
],
attr_names=["M", "N", "K", "dtype"],
tags=["short"],
attr_names=["M", "N", "K", "dtype"],
tags=["short"],
)
class AddBenchmark(op_bench_c2.Caffe2BenchmarkBase):
def init(self, M, N, K, dtype):
self.input_one = self.tensor([M, N, K], dtype)
self.input_two = self.tensor([M, N, K], dtype)
def init(self, M, N, K, dtype):
self.input_one = self.tensor([M, N, K], dtype)
self.input_two = self.tensor([M, N, K], dtype)
self.output = self.tensor([M, N, K], dtype)
self.set_module_name("add")
def forward(self):
op = core.CreateOperator(
"Add", [self.input_one, self.input_two], self.output, **self.args
"Add", [self.input_one, self.input_two], self.output, **self.args
)
return op
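op_bench.cross_product_configs above enumerates every combination of the listed M, N, K, and dtype values, while op_bench.config_list takes explicit attribute tuples. A rough standalone sketch of the cross-product expansion (illustrative; `cross_product` is a stand-in, not the operator_benchmark implementation, and tags are omitted):

```python
import itertools

def cross_product(**axes):
    """Expand keyword lists into one config dict per combination."""
    names = list(axes)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(axes[n] for n in names))]

configs = cross_product(
    M=[8, 64, 128],
    N=list(range(2, 10, 3)),
    K=[2 ** x for x in range(0, 3)],
    dtype=["int", "float"],
)
print(len(configs))   # 3 * 3 * 3 * 2 = 54 combinations
print(configs[0])     # {'M': 8, 'N': 2, 'K': 1, 'dtype': 'int'}
```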

View File

@ -2,7 +2,7 @@
import operator_benchmark as op_bench
import benchmark_caffe2 as op_bench_c2
from benchmark_caffe2 import Caffe2BenchmarkBase # noqa
from caffe2.python import core
from caffe2.python import core
"""Microbenchmarks for MatMul operator"""
@ -10,7 +10,7 @@ from caffe2.python import core
mm_long_configs = op_bench.cross_product_configs(
M=[8, 64, 128],
N=range(2, 10, 3),
K=[2 ** x for x in range(0, 3)],
K=[2 ** x for x in range(0, 3)],
trans_a=[True, False],
trans_b=[True, False],
tags=["long"]
@ -23,13 +23,13 @@ mm_short_configs = op_bench.config_list(
[1024, 1024, 256, True, False],
[8192, 8192, 1024, True, False],
],
attr_names=["M", "N", "K", "trans_a", "trans_b"],
tags=["short"],
attr_names=["M", "N", "K", "trans_a", "trans_b"],
tags=["short"],
)
class MatMulBenchmark(op_bench_c2.Caffe2BenchmarkBase):
def init(self, M, N, K, trans_a, trans_b):
def init(self, M, N, K, trans_a, trans_b):
self.input_one = self.tensor([N, M]) if trans_a else self.tensor([M, N])
self.input_two = self.tensor([K, N]) if trans_b else self.tensor([N, K])
self.args = {'trans_a': trans_a, 'trans_b': trans_b}
@ -38,7 +38,7 @@ class MatMulBenchmark(op_bench_c2.Caffe2BenchmarkBase):
def forward(self):
op = core.CreateOperator(
"MatMul", [self.input_one, self.input_two], self.output, **self.args
"MatMul", [self.input_one, self.input_two], self.output, **self.args
)
return op

View File

@ -22,7 +22,7 @@ unary_ops_list = op_bench.op_list(
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
def init(self, M, N, op_func):
def init(self, M, N, op_func):
self.input_one = torch.rand(M, N)
self.op_func = op_func

View File

@ -1,5 +1,5 @@
import operator_benchmark as op_bench
from caffe2.python import core
from caffe2.python import core
add_configs = op_bench.cross_product_configs(
@ -11,24 +11,24 @@ add_configs = op_bench.cross_product_configs(
)
class AddBenchmark(op_bench.Caffe2BenchmarkBase):
def init(self, M, N, K, device):
def init(self, M, N, K, device):
self.set_module_name("add")
self.input_one = self.tensor([M, N, K], device=device)
self.input_two = self.tensor([M, N, K], device=device)
self.input_one_grad = self.tensor([M, N, K], device=device)
self.input_two_grad = self.tensor([M, N, K], device=device)
self.input_one = self.tensor([M, N, K], device=device)
self.input_two = self.tensor([M, N, K], device=device)
self.input_one_grad = self.tensor([M, N, K], device=device)
self.input_two_grad = self.tensor([M, N, K], device=device)
self.output = self.tensor([M, N, K], device=device)
def forward(self):
op = core.CreateOperator(
"Add", [self.input_one, self.input_two], self.output, **self.args
"Add", [self.input_one, self.input_two], self.output, **self.args
)
return op
def backward(self):
grad_op = core.CreateOperator(
"AddGradient", [self.output, self.input_one, self.input_two],
[self.input_one_grad, self.input_two_grad], **self.args
"AddGradient", [self.output, self.input_one, self.input_two],
[self.input_one_grad, self.input_two_grad], **self.args
)
return grad_op

View File

@ -5,8 +5,8 @@ intraop_bench_configs = op_bench.config_list(
attrs=[
[8, 16],
],
attr_names=["M", "N"],
tags=["short"],
attr_names=["M", "N"],
tags=["short"],
)
@torch.jit.script
@ -24,9 +24,9 @@ class TorchSumBenchmark(op_bench.TorchBenchmarkBase):
self.input_one = torch.rand(M, N)
self.set_module_name("sum")
# This is a very temporary method and will be removed soon, so
# This is a very temporary method and will be removed soon, so
# don't use this method in your benchmark
# TODO(mingzhe): use one forward method for both JIT and Eager
# TODO(mingzhe): use one forward method for both JIT and Eager
def jit_forward(self, iters):
return torch_sumall(self.input_one, iters)

View File

@ -10,9 +10,9 @@ add_configs = op_bench.cross_product_configs(
)
# This benchmark uses the auto_set to automatically set requires_grad
# for both inputs. The test name can also be used for filtering.
# for both inputs. The test name can also be used for filtering.
class AddBenchmark(op_bench.TorchBenchmarkBase):
def init(self, M, N, K):
def init(self, M, N, K):
self.input_one = torch.rand(M, N, K, requires_grad=self.auto_set())
self.input_two = torch.rand(M, N, K, requires_grad=self.auto_set())
self.set_module_name("add")

View File

@ -4,7 +4,7 @@ import torch
"""Microbenchmarks for element-wise Add operator. Supports both Caffe2/PyTorch."""
add_short_configs = op_bench.config_list(
attr_names=['M', 'N', 'K'],
attr_names=['M', 'N', 'K'],
attrs=[
[8, 16, 32],
[16, 16, 64],
@ -14,12 +14,12 @@ add_short_configs = op_bench.config_list(
'device': ['cpu', 'cuda'],
'dtype': [torch.float, torch.float64],
},
tags=['short'],
tags=['short'],
)
class AddBenchmark(op_bench.TorchBenchmarkBase):
def init(self, M, N, K, device, dtype):
def init(self, M, N, K, device, dtype):
self.input_one = torch.rand(M, N, K, device=device, dtype=dtype, requires_grad=True)
self.input_two = torch.rand(M, N, K, device=device, dtype=dtype)
self.set_module_name('add')

View File

@ -12,7 +12,7 @@ add_configs = op_bench.cross_product_configs(
class AddBenchmark(op_bench.TorchBenchmarkBase):
def init(self, M, N, K, device):
def init(self, M, N, K, device):
self.input_one = torch.rand(M, N, K, device=device, requires_grad=True)
self.input_two = torch.rand(M, N, K, device=device, requires_grad=True)
self.set_module_name("add")

View File

@ -7,19 +7,19 @@ configs = op_bench.random_sample_configs(
N=[7, 8, 9, 10, 11, 12],
K=[13, 14, 15, 16, 17, 18],
# probs saves the weights of each value
probs=op_bench.attr_probs(
probs=op_bench.attr_probs(
M=[0.5, 0.2, 0.1, 0.05, 0.03, 0.1],
N=[0.1, 0.3, 0.4, 0.02, 0.03, 0.04],
K=[0.03, 0.6, 0.04, 0.02, 0.03, 0.01],
),
# this is the number of returned inputs
total_samples=10,
# this is the number of returned inputs
total_samples=10,
tags=["short"],
)
class AddBenchmark(op_bench.TorchBenchmarkBase):
def init(self, M, N, K):
def init(self, M, N, K):
self.input_one = torch.rand(M, N, K)
self.input_two = torch.rand(M, N, K)
self.set_module_name("add")
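random_sample_configs above draws total_samples configurations, weighting each candidate value by the given probs. A hedged sketch of that sampling idea using only the standard library (`sample_configs` is a stand-in name, not the operator_benchmark implementation):

```python
import random

def sample_configs(values, probs, total_samples, seed=0):
    """Draw `total_samples` configs, sampling each attribute independently
    with the supplied per-value weights (illustrative only)."""
    rng = random.Random(seed)
    return [{name: rng.choices(values[name], weights=probs[name], k=1)[0]
             for name in values}
            for _ in range(total_samples)]

configs = sample_configs(
    values={'M': [1, 2, 3, 4, 5, 6], 'N': [7, 8, 9, 10, 11, 12]},
    probs={'M': [0.5, 0.2, 0.1, 0.05, 0.03, 0.1],
           'N': [0.1, 0.3, 0.4, 0.02, 0.03, 0.04]},
    total_samples=10,
)
print(configs[0])
```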

View File

@ -1,9 +1,9 @@
# Sparse benchmarks
# These benchmarks are for the sparse matrix functionality.
# These benchmarks are for the sparse matrix functionality.
# They exist for comparing the performance of sparse matrix routines
# torch.sparse.mm(sparse, sparse)` with different backends (CPU/CUDA)
# and with other frameworks such as scipy.
# and with other frameworks such as scipy.
import sys
from scipy import sparse
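The comments above say these benchmarks compare torch.sparse.mm(sparse, sparse) against other backends such as SciPy. A minimal hedged sketch of one such comparison with toy sizes and single-run timing; the real benchmark uses the DLMC datasets and a proper timing harness.

```python
import time

import torch
from scipy import sparse

# Random square matrix at roughly 1% density, in both representations.
a = torch.rand(1000, 1000)
a[a > 0.01] = 0
a_t = a.to_sparse()
a_sp = sparse.csr_matrix(a.numpy())

start = time.time()
_ = torch.sparse.mm(a_t, a_t)
print("torch.sparse.mm :", time.time() - start)

start = time.time()
_ = a_sp @ a_sp
print("scipy csr @ csr :", time.time() - start)
```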

View File

@ -3,9 +3,9 @@
DATASET_ROOT_DIR=$HOME/datasets/
# wget https://storage.googleapis.com/sgk-sc2020/dlmc.tar.gz -P $DATASET_ROOT_DIR
# tar -xvf $DATASET_ROOT_DIR/dlmc.tar.gz
# tar -xvf $DATASET_ROOT_DIR/dlmc.tar.gz
echo "!! SPARSE SPMS TIME BENCHMARK!! "
echo "!! SPARSE SPMS TIME BENCHMARK!! "
python matmul_dlmc_bench.py --path $DATASET_ROOT_DIR/dlmc/rn50 --dataset random_pruning --operation matmul --output /tmp/matmul_bench.pkl
python matmul_dlmc_bench.py --path $DATASET_ROOT_DIR/dlmc/rn50 --dataset random_pruning --operation backward --output /tmp/backward_bench.pkl

Some files were not shown because too many files have changed in this diff.