Mirror of https://github.com/pytorch/pytorch.git (synced 2025-10-21 05:34:18 +08:00)
Fix flaky test_udf_remote_message_delay_timeout_to_self (#41217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41217

Fixes the flaky test test_udf_remote_message_delay_timeout_to_self.

The callback finishCreatingOwnerRRef can run after request_callback has already processed the request and created the owner RRef. In that case we end up with 0 owners on the node, since the callback removes the entry from the owners_ map, and shutdown is fine because there are no owners left. If the callback runs first instead, there will be 1 owner, which we delete in shutdown when we detect that it has no forks. Either way, shutdown works, and we don't need to enforce that there is exactly 1 owner.

ghstack-source-id: 107883497

Test Plan: Ran the test 500 times with TSAN.

Reviewed By: ezyang

Differential Revision: D22469806

fbshipit-source-id: 02290d6d5922f91a9e2d5ede21d1cf1c4598cb46
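The two orderings described in the summary can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: ToyContext, createOwner, finishCreatingOwner, and ownersBeforeShutdown are hypothetical names invented for illustration, the model is single-threaded, and it does not reproduce the real RRefContext logic.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical toy stand-in for the owner bookkeeping; NOT the real RRefContext.
struct ToyContext {
  std::set<std::string> owners;

  // Models request_callback creating the owner RRef on this node.
  void createOwner(const std::string& id) { owners.insert(id); }

  // Models the finishCreatingOwnerRRef callback: drops the entry if it exists.
  void finishCreatingOwner(const std::string& id) { owners.erase(id); }

  // Models shutdown: every remaining owner is unforked in this toy, so remove all.
  std::size_t shutdown() {
    std::size_t removed = owners.size();
    owners.clear();
    return removed;
  }
};

// Replays one of the two orderings and returns how many owners exist
// just before shutdown. Shutdown must leave the map empty in both cases.
std::size_t ownersBeforeShutdown(bool callbackFirst) {
  ToyContext ctx;
  if (callbackFirst) {
    ctx.finishCreatingOwner("rref0");  // nothing to remove yet
    ctx.createOwner("rref0");          // owner remains until shutdown
  } else {
    ctx.createOwner("rref0");
    ctx.finishCreatingOwner("rref0");  // removes the freshly created owner
  }
  std::size_t before = ctx.owners.size();
  ctx.shutdown();
  assert(ctx.owners.empty());
  return before;
}
```

In this model the owner count before shutdown depends on the ordering, while the state after shutdown does not, which is why a check for exactly 1 owner would be flaky.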
Committed by: Facebook GitHub Bot
Parent: 94e4248d80
Commit: b5e32528d0
@@ -273,6 +273,9 @@ void RRefContext::delAllUsersAndUnforkedOwners(
  for (auto& rrefId : unforkedOwners) {
    LOG(INFO) << "Removing unforked OwnerRRef with RRefId: " << rrefId;
    auto iter = owners_.find(rrefId);
    TORCH_CHECK(
        iter != owners_.end(),
        c10::str("Did not find OwnerRRef with RRefId: ", rrefId));
    owners_.erase(iter);
  }
}
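The loop in the hunk above follows a find, check, erase pattern. A standalone sketch of that pattern is shown here, assuming a plain std::unordered_map with integer keys and std::runtime_error in place of TORCH_CHECK. ToyOwnerMap and eraseUnforkedOwners are hypothetical names for illustration and are not part of PyTorch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for owners_: integer ids mapped to a placeholder payload.
using ToyOwnerMap = std::unordered_map<int, std::string>;

// Erases every id listed in `unforked` from `owners` and returns the number erased.
// Throws std::runtime_error if an id is missing (standing in for TORCH_CHECK).
std::size_t eraseUnforkedOwners(ToyOwnerMap& owners,
                                const std::vector<int>& unforked) {
  std::size_t erased = 0;
  for (int id : unforked) {
    auto iter = owners.find(id);
    if (iter == owners.end()) {
      throw std::runtime_error(
          "Did not find OwnerRRef with RRefId: " + std::to_string(id));
    }
    // Erasing through the iterator avoids a second hash lookup.
    owners.erase(iter);
    ++erased;
  }
  return erased;
}
```

Checking the iterator before erasing matters because calling erase with the end iterator is undefined behavior for standard containers.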