-
Notifications
You must be signed in to change notification settings - Fork 268
Description
While running the CI for my own fork of C++/WinRT, I observed that test_old fails randomly (typically requiring 100-500 runs to reproduce). The failure line is
cppwinrt/test/old_tests/UnitTests/async.cpp
Line 1411 in 129c925
| REQUIRE(completed); |
The following is a simplified version of the code, stripped of all unrelated distractions. It is equivalent to the content in async.cpp, with some comments added to aid understanding.
struct signal_done
{
HANDLE signal;
~signal_done()
{
SetEvent(signal);
}
};
IAsyncOperationWithProgress<std::uint64_t, std::uint64_t> AutoCancel_IAsyncOperationWithProgress(HANDLE go)
{
signal_done d{ go };
co_await resume_on_signal(go); // switches to the thread pool due to suspension.
co_await std::suspend_never{}; // at this point, an exception is thrown due to cancellation being detected
REQUIRE(false);
co_return 0;
}
TEST_CASE("async, AutoCancel_IAsyncOperationWithProgress, 2")
{
handle event { CreateEvent(nullptr, false, false, nullptr)}; // # 1 auto-reset event and initialized as unset
IAsyncOperationWithProgress<std::uint64_t, std::uint64_t> async = AutoCancel_IAsyncOperationWithProgress(event.get());
REQUIRE(async.Status() == AsyncStatus::Started);
bool completed = false; // # 2 not atomic and not protected by a mutex
bool objectMatches = false;
bool statusMatches = false;
async.Completed([&](const IAsyncOperationWithProgress<std::uint64_t, std::uint64_t> & sender, AsyncStatus status)
{
completed = true; # 3
objectMatches = (async == sender);
statusMatches = (status == AsyncStatus::Canceled);
});
async.Cancel();
SetEvent(event.get()); // #4 signal async to run
REQUIRE(WaitForSingleObject(event.get(), INFINITE) == WAIT_OBJECT_0); // # 5 wait for async to be canceled
REQUIRE(async.Status() == AsyncStatus::Canceled);
REQUIRE_THROWS_AS(async.GetResults(), hresult_canceled); # 6
REQUIRE(completed); # 7
REQUIRE(objectMatches);
REQUIRE(statusMatches);
}A simplified execution flow of this test is as follows:
- An auto-reset event is create.
WaitForSingleObjectis responsible for resetting it. - Execute the coroutine and its body (C++/WinRT coroutine's
initial_awaiter::suspenddoes not suspend the coroutine).cppwinrt/strings/base_coroutine_foundation.h
Line 582 in 129c925
return{}; signal_doneis initialized; it will signal the event upon destruction.- Execute
co_await resume_on_signal(go);. This causes the coroutine to switch to the thread pool and suspend.cppwinrt/strings/base_coroutine_threadpool.h
Lines 486 to 498 in 129c925
bool await_ready() const noexcept { return WINRT_IMPL_WaitForSingleObject(m_handle, 0) == 0; } template <typename T> void await_suspend(impl::coroutine_handle<T> resume) { set_cancellable_promise_from_handle(resume); m_resume = resume; create_threadpool_wait(); } - The coroutine returns to the test function.
- Set the completion callback.
- Set the coroutine state to canceled.
- Signal the event.
- The coroutine resumes.
- A.
co_await std::suspend_never{};detects that the coroutine has been canceled and throws acanceledexception.
B. The test function is suspended at step # 5, waiting for the signal. - A.
signal_donedestructs and signals the event.
B. The test function resumes execution. - A. The coroutine executes
final_suspend, which invokes the completion callback.cppwinrt/strings/base_coroutine_foundation.h
Lines 585 to 610 in 129c925
struct final_suspend_awaiter { promise_base* promise; bool await_ready() const noexcept { return false; } void await_resume() const noexcept { } bool await_suspend(coroutine_handle<>) const noexcept { promise->set_completed(); uint32_t const remaining = promise->subtract_reference(); if (remaining == 0) { std::atomic_thread_fence(std::memory_order_acquire); } return remaining > 0; } };
B. The test function checks thecompletedvariable at step # 7.
The problem lies in step 4. The coroutine is resumed on the thread pool. Consequently, the completed callback is set to true on a thread pool thread (# 3). Simultaneously, the test thread checks the variable at step # 7. The setting of the completed variable and the checking of it are unordered. This causes the test to fail randomly. The failure is not always observed because step # 7 throws and catches an exception, which often slows down the test function, creating a timing window that allows the test to pass more often than it fails.
I believe the key to this issue lies in two points:
- Is this test correct?
- Does the completion callback have to be executed in
final_suspend? Can it be executed by theCancelfunction? By modifyingbasic_coroutine_foundation.has follows, the test can also succeed.
cppwinrt/strings/base_coroutine_foundation.h
Lines 481 to 509 in 129c925
| void Cancel() noexcept | |
| { | |
| winrt::delegate<> cancel; | |
| { | |
| slim_lock_guard const guard(m_lock); | |
| if (m_status.load(std::memory_order_relaxed) == AsyncStatus::Started) | |
| { | |
| m_status.store(AsyncStatus::Canceled, std::memory_order_relaxed); | |
| if (cancellable_promise::originate_on_cancel()) | |
| { | |
| m_exception = std::make_exception_ptr(hresult_canceled()); | |
| } | |
| else | |
| { | |
| m_exception = std::make_exception_ptr(hresult_canceled(hresult_error::no_originate)); | |
| } | |
| cancel = std::move(m_cancel); | |
| } | |
| } | |
| if (cancel) | |
| { | |
| cancel(); | |
| } | |
| cancellable_promise::cancel(); | |
| } |
void Cancel() noexcept
{
winrt::delegate<> cancel;
async_completed_handler_t<AsyncInterface> completed; // NB
{
slim_lock_guard const guard(m_lock);
if (m_status.load(std::memory_order_relaxed) == AsyncStatus::Started)
{
m_status.store(AsyncStatus::Canceled, std::memory_order_relaxed);
if (cancellable_promise::originate_on_cancel())
{
m_exception = std::make_exception_ptr(hresult_canceled());
}
else
{
m_exception = std::make_exception_ptr(hresult_canceled(hresult_error::no_originate));
}
cancel = std::move(m_cancel);
completed = std::move(this->m_completed); // NB
}
}
if (cancel)
{
cancel();
}
cancellable_promise::cancel();
if (completed) // NB
{
winrt::impl::invoke(completed, *this, AsyncStatus::Canceled); //NB
}
}This change gives the completion callback a chance to execute on the thread that calls Cancel. I'm not sure if this is a good idea or if it will break existing code. The documentation currently doesn't specify this in detail.