Reset stepcurrent cache if file succeeds (#115775)

Attempt to surface the segfault that happens on exit by resetting the "pytest last run" cache if pytest succeeds. CI does not rerun on success, so this cannot cause an infinite loop, and I don't expect people to rerun on success (unless they're looking for flakes; either way, I highly doubt anyone is using the --sc/--scs flags locally).

This ensures that if pytest succeeds but the process exits with a nonzero exit code, the rerun will start at the beginning instead of skipping all the "succeeding" tests.

This only applies when the --sc/--scs flags are used. These flags are custom to PyTorch and probably not used anywhere other than CI; they are not to be confused with --stepwise, which pytest provides by default.
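The reset logic can be sketched without pytest. The following is a minimal simulation, assuming a dictionary-backed stand-in for `pytest.Cache`; `FakeCache`, `StepcurrentSketch`, `run_session`, and the `stepcurrent/demo` key are hypothetical names for illustration, not part of the PyTorch plugin.

```python
from typing import Dict, List, Optional


class FakeCache:
    """Dictionary-backed stand-in for pytest's cache (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._store: Dict[str, Optional[str]] = {}

    def get(self, key: str, default: Optional[str]) -> Optional[str]:
        return self._store.get(key, default)

    def set(self, key: str, value: Optional[str]) -> None:
        self._store[key] = value


class StepcurrentSketch:
    """Mirrors the bookkeeping in the diff: record each test, restore on success."""

    def __init__(self, cache: FakeCache, key: str) -> None:
        self.cache = cache
        self.key = key
        self.lastrun = cache.get(key, None)
        # Remember what the cache held before this session started.
        self.initial_val = self.lastrun

    def runtest(self, nodeid: str) -> None:
        # Record the test that is about to run, so a crash points at it.
        self.lastrun = nodeid
        self.cache.set(self.key, nodeid)

    def sessionfinish(self, exitstatus: int) -> None:
        # On success, restore the starting value so a rerun does not skip tests.
        if exitstatus == 0:
            self.cache.set(self.key, self.initial_val)


def run_session(cache: FakeCache, tests: List[str], exitstatus: int) -> Optional[str]:
    """Run one simulated session and return what the cache holds afterwards."""
    key = "stepcurrent/demo"  # hypothetical cache key
    plugin = StepcurrentSketch(cache, key)
    for nodeid in tests:
        plugin.runtest(nodeid)
    plugin.sessionfinish(exitstatus)
    return cache.get(key, None)
```

Starting from an empty cache, a simulated session that finishes with status 0 leaves the entry at `None`, so a rerun triggered by a crash at interpreter exit would begin from the first test. A session that finishes with a nonzero status leaves the last recorded test ID in place.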

Here's a list of segfaulting inductor/test_aot_inductor tests, which I added skips for:
```
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_duplicated_params_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fqn_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_no_args_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_misaligned_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_pytree_inputs_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_seq_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_split_abi_compatible_cpu_with_stack_allocation
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_addmm_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_aliased_buffer_reuse_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_buffer_reuse_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_convolution_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_duplicated_params_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_empty_graph_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_fqn_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_large_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_missing_output_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_no_args_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_output_misaligned_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_output_path_1_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_pytree_inputs_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_repeat_interleave_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_return_constant_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_reuse_kernel_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_seq_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_simple_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_simple_split_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_small_constant_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_with_no_triton_profiler_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_with_offset_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_with_profiler_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpuWithStackAllocationAndMinimalArrayRefInterface::test_zero_size_weight_abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface
```
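The names above follow a pattern: a base test name plus a per-configuration suffix. Here is a minimal sketch of that expansion, assuming the suffix is joined with an underscore as the listed IDs suggest. `expand_skips` and `example` are hypothetical and do not reproduce what `copy_tests` actually does.

```python
from typing import Dict, List, Tuple

STACK = "abi_compatible_cpu_with_stack_allocation"
MINIMAL = "abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface"


def expand_skips(skips: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return the suffixed test names for each (base name, suffixes) entry."""
    names = []
    for base, suffixes in skips.items():
        for suffix in suffixes:
            # Assumed naming scheme: "<base>_<suffix>".
            names.append(f"{base}_{suffix}")
    return sorted(names)


# Two entries taken from the list above, for illustration only.
example = {
    "test_fqn": (STACK, MINIMAL),
    "test_addmm": (MINIMAL,),
}
```

Calling `expand_skips(example)` yields three names: `test_fqn` under both suffixes and `test_addmm` under the minimal-arrayref suffix only, matching how those tests appear in the list.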

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115775
Approved by: https://github.com/desertfire
Catherine Lee
2023-12-19 22:19:57 +00:00
committed by PyTorch MergeBot
parent f88c9af98e
commit 7f7a7b0b48
2 changed files with 50 additions and 0 deletions

@@ -276,6 +276,7 @@ class StepcurrentPlugin:
        self.cache: pytest.Cache = config.cache
        self.directory = f"{STEPCURRENT_CACHE_DIR}/{config.getoption('stepcurrent')}"
        self.lastrun: Optional[str] = self.cache.get(self.directory, None)
        self.initial_val = self.lastrun
        self.skip: bool = config.getoption("stepcurrent_skip")

    def pytest_collection_modifyitems(self, config: Config, items: List[Any]) -> None:
@@ -310,3 +311,7 @@ class StepcurrentPlugin:
    def pytest_runtest_protocol(self, item, nextitem) -> None:
        self.lastrun = item.nodeid
        self.cache.set(self.directory, self.lastrun)

    def pytest_sessionfinish(self, session, exitstatus):
        if exitstatus == 0:
            self.cache.set(self.directory, self.initial_val)

@@ -1477,6 +1477,16 @@ def fail_with_and_without_stack_allocation(is_skip=False):
    )


def fail_stack_allocation(is_skip=False):
    return TestFailure(
        (
            "abi_compatible_cpu_with_stack_allocation",
            "abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface",
        ),
        is_skip=is_skip,
    )


def fail_minimal_arrayref_interface(is_skip=False):
    return TestFailure(
        ("abi_compatible_cpu_with_stack_allocation_and_minimal_arrayref_interface",),
@@ -1517,6 +1527,41 @@ CPU_TEST_FAILURES = {
    "test_simple_dynamic": fail_with_and_without_stack_allocation(),
}

if not IS_FBCODE:
    # The following tests look like they pass in both pytest and unittest (xml
    # and terminal output say pass), but the process will segfault. This only
    # happens in OSS CI and is fine internally.
    CPU_TEST_FAILURES.update(
        {
            "test_duplicated_params": fail_stack_allocation(is_skip=True),
            "test_fqn": fail_stack_allocation(is_skip=True),
            "test_no_args": fail_stack_allocation(is_skip=True),
            "test_output_misaligned": fail_stack_allocation(is_skip=True),
            "test_pytree_inputs": fail_stack_allocation(is_skip=True),
            "test_seq": fail_stack_allocation(is_skip=True),
            "test_simple_split": fail_stack_allocation(is_skip=True),
            "test_addmm": fail_minimal_arrayref_interface(is_skip=True),
            "test_aliased_buffer_reuse": fail_minimal_arrayref_interface(is_skip=True),
            "test_buffer_reuse": fail_minimal_arrayref_interface(is_skip=True),
            "test_convolution": fail_minimal_arrayref_interface(is_skip=True),
            "test_empty_graph": fail_minimal_arrayref_interface(is_skip=True),
            "test_large": fail_minimal_arrayref_interface(is_skip=True),
            "test_missing_output": fail_minimal_arrayref_interface(is_skip=True),
            "test_output_path_1": fail_minimal_arrayref_interface(is_skip=True),
            "test_repeat_interleave": fail_minimal_arrayref_interface(is_skip=True),
            "test_return_constant": fail_minimal_arrayref_interface(is_skip=True),
            "test_reuse_kernel": fail_minimal_arrayref_interface(is_skip=True),
            "test_simple": fail_minimal_arrayref_interface(is_skip=True),
            "test_small_constant": fail_minimal_arrayref_interface(is_skip=True),
            "test_with_no_triton_profiler": fail_minimal_arrayref_interface(
                is_skip=True
            ),
            "test_with_offset": fail_minimal_arrayref_interface(is_skip=True),
            "test_with_profiler": fail_minimal_arrayref_interface(is_skip=True),
            "test_zero_size_weight": fail_minimal_arrayref_interface(is_skip=True),
        }
    )

copy_tests(
    AOTInductorTestsTemplate,
    AOTInductorTestABICompatibleCpu,