Compare commits

...

12 Commits

Author SHA1 Message Date
a8e7c98cb9 Revert "Require less alignment for attn bias (#114173) (#114837)"
This reverts commit 59656491f3b1da809312942872cce010337504b0.
2023-12-12 08:41:07 -08:00
448700d18e Fix NULL dereference in binary CPU ops (#115241)
* Fix NULL dereference in binary CPU ops (#115183)

Targeted fix for https://github.com/pytorch/pytorch/issues/113037

A more fundamental fix, where those functions are not even called for
empty tensors, is coming later
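
A minimal repro sketch (hedged; it mirrors the regression test added in this change for issue #113037), showing the scalar-vs-empty-tensor case that used to dereference a NULL data pointer:

```python
import torch

# Before this fix, combining a scalar tensor with an empty tensor in a
# binary CPU op could dereference a NULL data pointer; afterwards the
# result is simply an empty tensor.
x = torch.rand((), dtype=torch.float16)
y = torch.empty_strided((0,), (0,), dtype=torch.float16)
print(torch.div(x, y, rounding_mode="floor").shape)  # expected: torch.Size([0])
```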

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115183
Approved by: https://github.com/drisspg, https://github.com/atalman, https://github.com/huydhn

* Fix build after conflict resolution

* Also include https://github.com/pytorch/pytorch/pull/113262 to pass the test

---------

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2023-12-06 01:20:06 -08:00
59656491f3 Require less alignment for attn bias (#114173) (#114837)
Improved Fix for Attention Mask Alignment Issue (#112577)

This PR addresses Issue #112577 by refining the previously implemented fix, which was found to be incorrect and caused unneeded memory regressions. The update simplifies the approach to handling the alignment of the attention mask for memory-efficient attention.

Alignment Check and Padding: Initially, the alignment of the attention mask is checked. If misalignment is detected, padding is applied, followed by slicing. During this process, a warning is raised to alert users.

Should this be warn_once?

We only call expand once, on the aligned mask.

Reference
https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/cutlass.py#L115
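
For illustration, a minimal sketch of the pad-then-slice idea described above (hedged; the alignment value of 8 elements and the helper name are assumptions for this example, not the actual kernel code):

```python
import torch
import torch.nn.functional as F

def _pad_and_slice_bias(attn_bias: torch.Tensor, alignment: int = 8) -> torch.Tensor:
    # Hypothetical helper: pad the last dim up to the alignment boundary,
    # then slice back so the result is a view into a well-aligned buffer.
    last = attn_bias.size(-1)
    if last % alignment == 0:
        return attn_bias
    pad = alignment - last % alignment
    padded = F.pad(attn_bias, (0, pad))
    # The slice keeps the original shape while the underlying row stride
    # remains a multiple of the alignment.
    return padded[..., :last]
```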

@albanD, @mruberry, @jbschlosser, @walterddr, and @mikaylagawarecki.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114173
Approved by: https://github.com/danthe3rd
2023-12-05 14:50:58 -05:00
41210eaedc [MPS] Fix out-of-bounds fill to sliced tensor (#114958)
This fixes a regression introduced by https://github.com/pytorch/pytorch/pull/81951 that caused an out-of-bounds access when a sliced tensor is filled with zeros

Remove bogus `TORCH_INTERNAL_ASSERT(length >= offset)` as [NSMakeRange](https://developer.apple.com/documentation/foundation/1417188-nsmakerange?language=objc) arguments are location and length rather than start and end offsets.

In `fill_mps_tensor_`:
- Pass `value` argument to `MPSStream::fill`
- Pass `self.nbytes()` rather than `self.storage().nbytes()` as the length of the buffer to fill, as the latter will always result in an out-of-bounds write if the offset within the storage is non-zero

Add regression test

Fixes https://github.com/pytorch/pytorch/issues/114692
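
A hedged repro sketch (requires an MPS device), mirroring the regression test added in this change: filling a slice with a non-zero storage offset used to write past the slice.

```python
import torch

t = torch.ones(1, 10, device="mps")
t[:, 5].fill_(0.0)   # slice with a non-zero storage offset
print(t)             # only column 5 should be zeroed after the fix
```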

Cherry pick of https://github.com/pytorch/pytorch/pull/114838 into release/2.1 branch

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2023-12-01 10:58:57 -08:00
3183bcd417 Fix mkldnn_matmul error on AArch64 (#114851)
Fixes https://github.com/pytorch/pytorch/issues/110149

Cherry pick of https://github.com/pytorch/pytorch/pull/110150. This is a bug fix against the 2.1 release
2023-11-30 08:11:08 -08:00
b5a89bbc5f Fix broadcasting cosine_similarity (#114795)
* Fix broadcasting cosine_similarity (#109363)

Fixes https://github.com/pytorch/pytorch/issues/109333
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109363
Approved by: https://github.com/peterbell10
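
A small, hedged illustration of the broadcasting case covered by the new tests (shapes taken from the test diff further down):

```python
import torch
import torch.nn.functional as F

# Broadcasting across both batch and feature dimensions; after the fix this
# returns a vector of ones rather than failing or producing the wrong shape.
a = torch.ones(2, 3)
b = torch.ones(1, 1)
print(F.cosine_similarity(a, b))  # expected: tensor([1., 1.])
```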

* The PR incidentally fixes the test by switching from sizes to sym_sizes

test_make_fx_symbolic_exhaustive_masked_scatter_cpu_float32

---------

Co-authored-by: lezcano <lezcano-93@hotmail.com>
2023-11-30 00:23:40 -08:00
3f662b6255 Package pybind11/eigen/ (#113055) (#114756)
Which was added in the pybind11 2.11 release, see https://github.com/pybind/pybind11/tree/v2.11.0/include/pybind11/eigen

Fixes https://github.com/pytorch/pytorch/issues/112841

Cherry-pick of  https://github.com/pytorch/pytorch/pull/113055 into release/2.1 branch

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2023-11-29 07:04:29 -08:00
614af50378 [release only] Pin disabled-test-condensed and slow-tests json (#114514)
* [release only] Pin disabled-test-condensed json

* pin slow tests json
2023-11-27 13:30:27 -05:00
b3b22d7390 [BE] Handle errors in set_num_threads (#114420)
and `set_num_interop_threads`

Before this change, calling `torch.set_num_threads(2**65)` resulted in a segmentation fault; afterwards it becomes a good old runtime error:
```
% python -c "import torch;torch.set_num_threads(2**65)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: Overflow when unpacking long
```

Similar to https://github.com/pytorch/pytorch/pull/60073

Cherry pick of https://github.com/pytorch/pytorch/pull/113684 into release/2.1

(cherry picked from commit 78f3937ee84e71475942598f4b51dce7c8a70783)
2023-11-23 14:04:26 -05:00
7405d70c30 [MPS] Fix crashes during Conv backward pass (#114419)
By adding the weights tensor to the MPSGraph cache key.
Add a regression test to validate that the collision no longer happens

Fixes https://github.com/pytorch/pytorch/issues/112998
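
A hedged repro sketch of the cache-key collision (requires an MPS device; it mirrors the regression test added in this change):

```python
import torch
import torch.nn as nn

# Two convolutions with different weight shapes but identical output shapes
# used to hit the same cached MPSGraph and crash in the backward pass.
x = torch.rand(1, 1, 10, 10, device="mps", requires_grad=True)
m1 = nn.Conv2d(1, 1, 3, stride=2, padding=1).to("mps")
m2 = nn.Conv2d(1, 1, 4, stride=2, padding=1).to("mps")
y1, y2 = m1(x), m2(x)
y1.sum().backward()
y2.sum().backward()  # used to fail an MPSNDArrayConvolution assertion
```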

Cherry pick of https://github.com/pytorch/pytorch/pull/113398 into release/2.1

(cherry picked from commit 265d6aac0b71b917d6e36c5dd65c22f61644b715)
2023-11-23 14:02:46 -05:00
d62c757533 [Caffe2] Handle cpuinfo_initialize() failure (#114418)
It can fail on ARM platforms if the `/sys` folder is not accessible.
In that case, call `std::thread::hardware_concurrency()`, which is
aligned with the thread-pool initialization logic of `c10::TaskThreadPoolBase::defaultNumThreads()`

Further addresses issue raised in https://github.com/pytorch/pytorch/issues/113568
This is a cherry-pick of https://github.com/pytorch/pytorch/pull/114011 into release/2.1 branch

(cherry picked from commit 310e3060b7e4d0c76149aadad4519c7abed8c2a7)
2023-11-23 14:01:16 -05:00
7833889a44 Fix chrome trace entry format (#113763) (#114416)
Fix regression introduced by https://github.com/pytorch/pytorch/pull/107519

`'"args": {{}}}}, '` was part of format string, when curly braces a duplicated to get them printed single time, but ruff change left the string format as is

Fixes https://github.com/pytorch/pytorch/issues/113756
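
A hedged illustration of the brace-escaping difference (plain Python, not the profiler code itself): inside an f-string, doubled braces render as single braces, while in a regular string they are emitted literally.

```python
next_id = 1
# f-string: '{{' and '}}}}' collapse to '{' and '}}'
print(f'"id": {next_id}, "args": {{}}}}, ')  # -> "id": 1, "args": {}},
# regular string: braces are already literal, so no doubling is needed
print('"args": {}}, ')                       # -> "args": {}},
```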

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113763
Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi

(cherry picked from commit e100ff42fd087d7a1696cb52c216507d45b8fb85)
2023-11-23 13:57:43 -05:00
18 changed files with 140 additions and 57 deletions

View File

@ -187,17 +187,18 @@ expand_inplace(
// See NOTE [ ExpandUtils Borrowing ] above for `MaybeOwned` explanation.
inline std::tuple<c10::MaybeOwned<Tensor>, c10::MaybeOwned<Tensor>>
expand_outplace(const Tensor& to_expand1, const Tensor& to_expand2) {
if (to_expand1.sizes().equals(to_expand2.sizes())) {
auto s1 = to_expand1.sym_sizes();
auto s2 = to_expand2.sym_sizes();
if (s1.equals(s2)) {
return std::make_tuple(
c10::MaybeOwned<Tensor>::borrowed(to_expand1),
c10::MaybeOwned<Tensor>::borrowed(to_expand2));
}
auto expanded_size =
infer_size_dimvector(to_expand1.sizes(), to_expand2.sizes());
auto expanded_size = infer_size_symdimvector(s1, s2);
return std::make_tuple(
c10::MaybeOwned<Tensor>::owned(to_expand1.expand(expanded_size)),
c10::MaybeOwned<Tensor>::owned(to_expand2.expand(expanded_size)));
c10::MaybeOwned<Tensor>::owned(to_expand1.expand_symint(expanded_size)),
c10::MaybeOwned<Tensor>::owned(to_expand2.expand_symint(expanded_size)));
}
inline std::tuple<c10::MaybeOwned<Tensor>, c10::MaybeOwned<Tensor>>

View File

@ -147,9 +147,9 @@ void MPSStream::addCompletedHandler(MTLCommandBufferHandler block) {
}
void MPSStream::fill(id<MTLBuffer> buffer, uint8_t value, size_t length, size_t offset, SyncType syncType) {
TORCH_INTERNAL_ASSERT(length >= offset);
if (length == 0)
if (length == 0) {
return;
}
dispatch_sync(_serialQueue, ^() {
@autoreleasepool {
endKernelCoalescing();

View File

@ -308,16 +308,18 @@ Tensor cosine_similarity(const Tensor& x1_, const Tensor& x2_, int64_t dim, doub
// We accept integral types (and bools lol) but vector_norm does not
auto x1_is_int = c10::isIntegralType(x1_.scalar_type(), /*includeBool=*/true);
auto x2_is_int = c10::isIntegralType(x2_.scalar_type(), /*includeBool=*/true);
auto x1 = x1_is_int ? x1_.to(commonDtype) : x1_;
auto x2 = x2_is_int ? x2_.to(commonDtype) : x2_;
auto x1_t = x1_is_int ? x1_.to(commonDtype) : x1_;
auto x2_t = x2_is_int ? x2_.to(commonDtype) : x2_;
c10::MaybeOwned<Tensor> x1, x2;
std::tie(x1, x2) = expand_outplace(x1_t, x2_t);
// We want to divide each tensor by its norm first, as it's more numerically stable.
// This keeps the result between -1.0 and 1.0
// We clone them, as we're going to modify them in-place
// This allows the gradients to propagate properly all the way to x1 and x2
auto x1_norm = at::linalg_vector_norm(x1, 2, /*dim=*/dim, /*keepdim=*/true).clone();
auto x2_norm = at::linalg_vector_norm(x2, 2, /*dim=*/dim, /*keepdim=*/true).clone();
auto x1_norm = at::linalg_vector_norm(*x1, 2, /*dim=*/dim, /*keepdim=*/true).clone();
auto x2_norm = at::linalg_vector_norm(*x2, 2, /*dim=*/dim, /*keepdim=*/true).clone();
{
at::NoGradGuard guard;
@ -325,7 +327,7 @@ Tensor cosine_similarity(const Tensor& x1_, const Tensor& x2_, int64_t dim, doub
x2_norm.clamp_min_(eps);
}
return ((x1 / x1_norm) * (x2 / x2_norm)).sum(dim);
return ((*x1 / x1_norm) * (*x2 / x2_norm)).sum(dim);
}
}} // namespace at::native

View File

@ -1483,12 +1483,14 @@ static void addmm_impl_cpu_(
// it is faster to call oneDNN matrix multiplication primitive with RHS*LHS
// that will call then into Arm® Compute Library (ACL) GEMM kernel and also
// additionally have support for running kernel with BF16 instructions
bool apply_heur = apply_mkldnn_matmul_heur(b.sizes()[0], b.sizes()[1], a.sizes()[1]);
if (apply_heur && transpose_a && !transpose_b && result.scalar_type() == at::ScalarType::Float) {
mkldnn_matmul(b, a, c, beta.to<float>(), alpha.to<float>());
// We have dispatched to ACL GEMM for single precision float
// so do not need to dispatch to BLAS GEMM below
dispatched = true;
if (transpose_c) {
bool apply_heur = apply_mkldnn_matmul_heur(b.sizes()[0], b.sizes()[1], a.sizes()[1]);
if (apply_heur && transpose_a && !transpose_b && result.scalar_type() == at::ScalarType::Float) {
mkldnn_matmul(b, a, c, beta.to<float>(), alpha.to<float>());
// We have dispatched to ACL GEMM for single precision float
// so do not need to dispatch to BLAS GEMM below
dispatched = true;
}
}
#endif

View File

@ -101,7 +101,7 @@ void mul_kernel(TensorIteratorBase& iter) {
using comp_t = c10::complex<float>;
return comp_t{a} * comp_t{b};
});
} else if (iter.is_scalar(2) && at::isReducedFloatingType(dtype)) {
} else if (iter.is_scalar(2) && iter.data_ptr(2) != nullptr && at::isReducedFloatingType(dtype)) {
AT_DISPATCH_REDUCED_FLOATING_TYPES(dtype, "mul_cpu_reduced_float", [&]() {
using opmath_t = at::opmath_type<scalar_t>;
opmath_t b = iter.original_scalar_value<opmath_t>(2);
@ -125,7 +125,7 @@ void mul_kernel(TensorIteratorBase& iter) {
void div_true_kernel(TensorIteratorBase& iter) {
const auto dtype = iter.common_dtype();
if (iter.is_scalar(2) && at::isReducedFloatingType(dtype)) {
if (iter.is_scalar(2) && iter.data_ptr(2) != nullptr && at::isReducedFloatingType(dtype)) {
AT_DISPATCH_REDUCED_FLOATING_TYPES(dtype, "div_cpu_reduced_float", [&]() {
using opmath_t = at::opmath_type<scalar_t>;
opmath_t b = iter.original_scalar_value<opmath_t>(2);
@ -162,19 +162,28 @@ void div_trunc_kernel(TensorIteratorBase& iter) {
return a / b;
});
});
} else if (iter.is_scalar(2) && at::isReducedFloatingType(dtype)) {
AT_DISPATCH_REDUCED_FLOATING_TYPES(dtype, "div_trunc_cpu_reduced_float", [&]() {
using opmath_t = at::opmath_type<scalar_t>;
opmath_t b = iter.original_scalar_value<opmath_t>(2);
iter.remove_operand(2);
cpu_kernel_vec(iter,
[=](scalar_t a) __ubsan_ignore_float_divide_by_zero__ -> scalar_t {
return std::trunc(static_cast<opmath_t>(a) / b);
},
[=](Vectorized<scalar_t> a) {
return binary_op_scalar(a, b, [](const Vectorized<opmath_t>& x, const Vectorized<opmath_t>& y) { return (x / y).trunc(); });
} else if (iter.is_scalar(2) && iter.data_ptr(2) != nullptr && at::isReducedFloatingType(dtype)) {
AT_DISPATCH_REDUCED_FLOATING_TYPES(
dtype, "div_trunc_cpu_reduced_float", [&]() {
using opmath_t = at::opmath_type<scalar_t>;
opmath_t b = iter.original_scalar_value<opmath_t>(2);
iter.remove_operand(2);
cpu_kernel_vec(
iter,
[=](scalar_t a)
__ubsan_ignore_float_divide_by_zero__ -> scalar_t {
return std::trunc(static_cast<opmath_t>(a) / b);
},
[=](Vectorized<scalar_t> a) {
return binary_op_scalar(
a,
b,
[](const Vectorized<opmath_t>& x,
const Vectorized<opmath_t>& y) {
return (x / y).trunc();
});
});
});
});
} else {
AT_DISPATCH_FLOATING_TYPES_AND2(kBFloat16, kHalf, dtype, "div_trunc_cpu", [&]() {
cpu_kernel_vec(iter,
@ -223,20 +232,25 @@ void div_floor_kernel(TensorIteratorBase& iter) {
});
} else {
// See NOTE: [Floor Division in Python]
if (iter.is_scalar(2) && at::isReducedFloatingType(dtype)) {
AT_DISPATCH_REDUCED_FLOATING_TYPES(dtype, "div_floor_cpu_reduced_float", [&]() {
using opmath_t = at::opmath_type<scalar_t>;
opmath_t b = iter.original_scalar_value<opmath_t>(2);
iter.remove_operand(2);
using vec_t = Vectorized<opmath_t>;
cpu_kernel_vec(iter,
[=](scalar_t a) -> scalar_t {
return div_floor_floating(static_cast<opmath_t>(a), b);
},
[=](Vectorized<scalar_t> a) {
return binary_op_scalar(a, b, [](const vec_t& x, const vec_t& y) { return div_floor_floating_vec(x, y); });
if (iter.is_scalar(2) && iter.data_ptr(2) != nullptr && at::isReducedFloatingType(dtype)) {
AT_DISPATCH_REDUCED_FLOATING_TYPES(
dtype, "div_floor_cpu_reduced_float", [&]() {
using opmath_t = at::opmath_type<scalar_t>;
opmath_t b = iter.original_scalar_value<opmath_t>(2);
iter.remove_operand(2);
using vec_t = Vectorized<opmath_t>;
cpu_kernel_vec(
iter,
[=](scalar_t a) -> scalar_t {
return div_floor_floating(static_cast<opmath_t>(a), b);
},
[=](Vectorized<scalar_t> a) {
return binary_op_scalar(
a, b, [](const vec_t& x, const vec_t& y) {
return div_floor_floating_vec(x, y);
});
});
});
});
} else {
AT_DISPATCH_FLOATING_TYPES_AND2(kBFloat16, kHalf, dtype, "div_floor_cpu", [&]() {
using vec_t = Vectorized<scalar_t>;

View File

@ -72,7 +72,7 @@ static bool fill_mps_tensor_(Tensor& self, uint8_t value) {
if (self.is_contiguous()) {
MPSStream* stream = getCurrentMPSStream();
auto storage_byte_offset = self.storage_offset() * self.itemsize();
stream->fill(mps::getMTLBufferStorage(self), 0, self.storage().nbytes(), storage_byte_offset);
stream->fill(mps::getMTLBufferStorage(self), value, self.nbytes(), storage_byte_offset);
return true;
}
return false;

View File

@ -445,7 +445,7 @@ static Tensor mps_convolution_backward_weights(IntArrayRef weight_size,
string key = "mps_convolution_backward_weights:" + to_string(stride[0]) + ":" + to_string(stride[1]) + ":" +
to_string(dilation[0]) + ":" + to_string(dilation[1]) + ":" + to_string(padding[0]) + ":" +
to_string(padding[1]) + ":" + to_string(groups) + ":" + mem_format_key +
getTensorsStringKey({grad_output_t, input_t}) + ":" + string([ns_shape_key UTF8String]);
getTensorsStringKey({grad_output_t, input_t, grad_weight_t}) + ":" + string([ns_shape_key UTF8String]);
auto cachedGraph = LookUpOrCreateCachedGraph<CachedGraph>(key, [&](auto mpsGraph, auto newCachedGraph) {
MPSGraphConvolution2DOpDescriptor* conv2dDescriptor_ = [[MPSGraphConvolution2DOpDescriptor new] autorelease];

View File

@ -41,8 +41,13 @@ namespace {
}
size_t getDefaultNumThreads() {
CAFFE_ENFORCE(cpuinfo_initialize(), "cpuinfo initialization failed");
int numThreads = cpuinfo_get_processors_count();
auto numThreads = 1U;
if (cpuinfo_initialize()) {
numThreads = std::max(cpuinfo_get_processors_count(), 1U);
} else {
LOG(WARNING) << "cpuinfo initialization failed";
numThreads = std::max(std::thread::hardware_concurrency(), 1U);
}
bool applyCap = false;
#if defined(C10_ANDROID)
@ -109,7 +114,7 @@ size_t getDefaultNumThreads() {
* detect if we are running under tsan, for now capping the default
* threadcount to the tsan limit unconditionally.
*/
int tsanThreadLimit = 63;
auto tsanThreadLimit = 63U;
numThreads = std::min(numThreads, tsanThreadLimit);
return numThreads;

View File

@ -1323,6 +1323,7 @@ def main():
"include/torch/csrc/lazy/ts_backend/*.h",
"include/pybind11/*.h",
"include/pybind11/detail/*.h",
"include/pybind11/eigen/*.h",
"include/TH/*.h*",
"include/TH/generic/*.h*",
"include/THC/*.cuh",

View File

@ -516,16 +516,20 @@ class TestForeach(TestCase):
sum(ref((ref_tensors,), ord=ord)).backward()
self.assertEqual([t.grad for t in tensors], [t.grad for t in ref_tensors])
@dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16, torch.bool))
@dtypes(*all_types_and_complex_and(torch.half, torch.bfloat16))
def test_add_scalar_with_empty_list_and_empty_tensor(self, device, dtype):
# TODO: enable empty list case
for tensors in [[torch.randn([0])]]:
for tensors in [[torch.randn([0], device=device, dtype=dtype)],
[torch.empty_strided((0, 1), (0, 0), dtype=dtype, device=device)]]:
res = torch._foreach_add(tensors, 1)
self.assertEqual(res, tensors)
torch._foreach_add_(tensors, 1)
self.assertEqual(res, tensors)
# Regression test for https://github.com/pytorch/pytorch/issues/113156
torch._foreach_mul_(tensors, 1)
@ops(
filter(lambda op: not op.has_no_out_of_place, foreach_binary_op_db),
dtypes=OpDTypes.supported,

View File

@ -1197,17 +1197,22 @@ class TestMPS(TestCaseMPS):
tensor_cpu = tensor_0[:][1].fill_(val)
self.assertEqual(tensor_mps, tensor_cpu)
self.assertEqual(tensor, tensor_0)
shape = [1, 10]
val = 0.0
tensor = torch.ones(shape, device="mps")
val_tensor_mps = torch.tensor(val, device="mps")
tensor_mps = tensor[:, 9].fill_(val_tensor_mps)
# Regression test for https://github.com/pytorch/pytorch/issues/114692
tensor[:, 5].fill_(val_tensor_mps)
tensor_0 = torch.ones(shape, device="cpu")
val_tensor_cpu = torch.tensor(val, device="cpu")
tensor_cpu = tensor_0[:, 9].fill_(val_tensor_cpu)
tensor_0[:, 5].fill_(val_tensor_cpu)
self.assertEqual(tensor_mps, tensor_cpu)
self.assertEqual(tensor_mps.to(device="cpu"), tensor_cpu)
self.assertEqual(tensor.to(device="cpu"), tensor_0)
def test_cdist_large(self, device="mps"):
for cm in ['use_mm_for_euclid_dist_if_necessary', 'use_mm_for_euclid_dist', 'donot_use_mm_for_euclid_dist']:
@ -7982,6 +7987,18 @@ class TestNNMPS(NNTestCase):
actual = F.conv2d(x, y, padding='valid')
self.assertEqual(expect.to('cpu'), actual.to('cpu'))
def test_conv2d_backward_collision(self):
# Test for https://github.com/pytorch/pytorch/issues/112998
x = torch.rand(1, 1, 10, 10, device="mps", requires_grad=True)
m1 = nn.Conv2d(1, 1, 3, stride=2, padding=1).to("mps")
m2 = nn.Conv2d(1, 1, 4, stride=2, padding=1).to("mps")
y1, y2 = m1(x), m2(x)
self.assertEqual(y1.shape, y2.shape)
y1.sum().backward()
# This used to crash with MPSNDArrayConvolutionA14.mm:4352: failed assertion
y2.sum().backward()
def test_gemm_permute_transpose(self):
batch_size = 32
n = 20

View File

@ -5609,6 +5609,18 @@ tensor(..., device='meta', size=(1,), requires_grad=True)""")
out = F.cosine_similarity(input.to(torch.int8), input, dim=-1)
self.assertEqual(out, 1.)
# Check broadcasting #109333
a = torch.ones(2, 3, dtype=torch.float)
b = torch.ones(1, 1, dtype=torch.float)
out = F.cosine_similarity(a, b)
self.assertEqual(out, torch.ones(2, dtype=torch.float))
a = torch.ones(2, 3, dtype=torch.float)
b = torch.ones(1, dtype=torch.float)
out = F.cosine_similarity(a, b)
self.assertEqual(out, torch.ones(2, dtype=torch.float))
def test_grid_sample_error_checking(self):
input = torch.empty(1, 1, 2, 2)
grid = torch.empty(1, 1, 1, 2)

View File

@ -472,6 +472,21 @@ class TestNumPyInterop(TestCase):
else:
self.assertTrue(t == a)
@onlyCPU
def test_empty_tensors_interop(self, device):
x = torch.rand((), dtype=torch.float16)
y = torch.tensor(np.random.rand(0), dtype=torch.float16)
# Same can be achieved by running
# y = torch.empty_strided((0,), (0,), dtype=torch.float16)
# Regression test for https://github.com/pytorch/pytorch/issues/115068
self.assertEqual(torch.true_divide(x, y).shape, y.shape)
# Regression test for https://github.com/pytorch/pytorch/issues/115066
self.assertEqual(torch.mul(x, y).shape, y.shape)
# Regression test for https://github.com/pytorch/pytorch/issues/113037
self.assertEqual(torch.div(x, y, rounding_mode='floor').shape, y.shape)
instantiate_device_type_tests(TestNumPyInterop, globals())
if __name__ == '__main__':

View File

@ -1604,7 +1604,6 @@ symbolic_tensor_failures.update(symbolic_tensor_segfaults)
outplace_symbolic_tensor_failures = {
xfail('i0', ''), # aten.i0.default - couldn't find symbolic meta function/decomposition
xfail('masked_scatter', ''), # aten.masked_scatter.default - couldn't find symbolic meta function/decomposition
}
inplace_symbolic_tensor_failures = {

View File

@ -9151,6 +9151,13 @@ tensor([[[1.+1.j, 1.+1.j, 1.+1.j, ..., 1.+1.j, 1.+1.j, 1.+1.j],
t2 = t[0:0].view(0, 1)
self.assertEqual(t2.data_ptr(), 0)
def test_invalid_arg_error_handling(self) -> None:
""" Tests that errors from old TH functions are propagated back """
for invalid_val in [-1, 2**65]:
self.assertRaises(RuntimeError, lambda: torch.set_num_threads(invalid_val))
self.assertRaises(RuntimeError, lambda: torch.set_num_interop_threads(invalid_val))
# The following block extends TestTorch with negative dim wrapping tests
# FIXME: replace these with OpInfo sample inputs or systemic OpInfo tests
# Functions to test negative dimension wrapping

View File

@ -66,7 +66,7 @@ def fetch_and_cache(
def get_slow_tests(
dirpath: str, filename: str = SLOW_TESTS_FILE
) -> Optional[Dict[str, float]]:
url = "https://ossci-metrics.s3.amazonaws.com/slow-tests.json"
url = "https://ossci-metrics.s3.amazonaws.com/slow-tests.json?versionId=iWAOsEqlVH1mfs7w5A3KlyTalvubE4Ru"
try:
return fetch_and_cache(dirpath, filename, url, lambda x: x)
except Exception:
@ -98,7 +98,7 @@ def get_disabled_tests(
return disabled_test_from_issues
try:
url = "https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json"
url = "https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json?versionId=JMUOdxgUAeI4yXhzc.dJlCuxVrsfkZTj"
return fetch_and_cache(dirpath, filename, url, process_disabled_test)
except Exception:
print("Couldn't download test skip set, leaving all tests enabled...")

View File

@ -253,7 +253,7 @@ class EventList(list):
'"pid": "CPU functions", '
f'"id": {next_id}, '
f'"cat": "cpu_to_{device_name}", '
'"args": {{}}}}, '
'"args": {}}, '
)
# Note: use torch.profiler to get device kernel trace
next_id += 1

View File

@ -240,6 +240,7 @@ static PyObject* THPModule_getNumThreads(PyObject* module, PyObject* noargs) {
}
static PyObject* THPModule_setNumThreads(PyObject* module, PyObject* arg) {
HANDLE_TH_ERRORS
THPUtils_assert(
THPUtils_checkLong(arg),
"set_num_threads expects an int, "
@ -249,6 +250,7 @@ static PyObject* THPModule_setNumThreads(PyObject* module, PyObject* arg) {
THPUtils_assert(nthreads > 0, "set_num_threads expects a positive integer");
at::set_num_threads(nthreads);
Py_RETURN_NONE;
END_HANDLE_TH_ERRORS
}
static PyObject* THPModule_getNumInteropThreads(
@ -260,6 +262,7 @@ static PyObject* THPModule_getNumInteropThreads(
static PyObject* THPModule_setNumInteropThreads(
PyObject* module,
PyObject* arg) {
HANDLE_TH_ERRORS
THPUtils_assert(
THPUtils_checkLong(arg),
"set_num_interop_threads expects an int, "
@ -270,6 +273,7 @@ static PyObject* THPModule_setNumInteropThreads(
nthreads > 0, "set_num_interop_threads expects a positive integer");
at::set_num_interop_threads(nthreads);
Py_RETURN_NONE;
END_HANDLE_TH_ERRORS
}
PyObject* THPModule_setDefaultTensorType(PyObject* _unused, PyObject* type) {