Bug: There is a data race in the `test_old` test

While running the CI for my own fork of C++/WinRT, I observed that `test_old` fails randomly (typically requiring 100-500 runs to reproduce). The failure line is [https://github.com/microsoft/cppwinrt/blob/129c9258fedbae96a0155f634bf56b68f6f75053/test/old_tests/UnitTests/async.cpp#L1411](https://github.com/microsoft/cppwinrt/blob/129c9258fedbae96a0155f634bf56b68f6f75053/test/old_tests/UnitTests/async.cpp#L1411), and I have identified the root cause.

The following is a simplified version of the code, stripped of all unrelated distractions. It is equivalent to the content in `async.cpp`, with some comments added to aid understanding.

```cpp
struct signal_done
{
    HANDLE signal;
    ~signal_done()
    {
        SetEvent(signal);
    }
};
IAsyncOperationWithProgress<std::uint64_t, std::uint64_t> AutoCancel_IAsyncOperationWithProgress(HANDLE go)
{
    signal_done d{ go };
    co_await resume_on_signal(go); // switches to the thread pool due to suspension.
    co_await std::suspend_never{}; // at this point, an exception is thrown due to cancellation being detected
    REQUIRE(false);
    co_return 0;
}
TEST_CASE("async, AutoCancel_IAsyncOperationWithProgress, 2")
{
    handle event { CreateEvent(nullptr, false, false, nullptr)}; // # 1 auto-reset event and initialized as unset
    IAsyncOperationWithProgress<std::uint64_t, std::uint64_t> async = AutoCancel_IAsyncOperationWithProgress(event.get());
    REQUIRE(async.Status() == AsyncStatus::Started);

    bool completed = false;     // # 2 not atomic and not protected by a mutex
    bool objectMatches = false;
    bool statusMatches = false;

    async.Completed([&](const IAsyncOperationWithProgress<std::uint64_t, std::uint64_t> & sender, AsyncStatus status)
    {
        completed = true; # 3
        objectMatches = (async == sender);
        statusMatches = (status == AsyncStatus::Canceled);
    });

    async.Cancel();
    SetEvent(event.get()); // #4 signal async to run
    REQUIRE(WaitForSingleObject(event.get(), INFINITE) == WAIT_OBJECT_0); // # 5 wait for async to be canceled
    REQUIRE(async.Status() == AsyncStatus::Canceled);
    REQUIRE_THROWS_AS(async.GetResults(), hresult_canceled); # 6

    REQUIRE(completed);     # 7
    REQUIRE(objectMatches);
    REQUIRE(statusMatches);
}
```

A simplified execution flow of this test is as follows:

1.  An auto-reset event is create. `WaitForSingleObject` is responsible for resetting it.
2.  Execute the coroutine and its body (C++/WinRT coroutine's `initial_awaiter::suspend` does not suspend the coroutine). https://github.com/microsoft/cppwinrt/blob/129c9258fedbae96a0155f634bf56b68f6f75053/strings/base_coroutine_foundation.h#L582
3.  `signal_done` is initialized; it will signal the event upon destruction.
4.  Execute `co_await resume_on_signal(go);`. This causes the coroutine to switch to the thread pool and suspend. https://github.com/microsoft/cppwinrt/blob/129c9258fedbae96a0155f634bf56b68f6f75053/strings/base_coroutine_threadpool.h#L486-L498
5.  The coroutine returns to the test function.
6.  Set the completion callback.
7.  Set the coroutine state to canceled.
8.  Signal the event.
9.  The coroutine resumes.
10. A. `co_await std::suspend_never{};` detects that the coroutine has been canceled and throws a `canceled` exception.
    B. The test function is suspended at step # 5, waiting for the signal.
11. A. `signal_done` destructs and signals the event.
    B. The test function resumes execution.
12. A. The coroutine executes `final_suspend`, which invokes the completion callback. https://github.com/microsoft/cppwinrt/blob/129c9258fedbae96a0155f634bf56b68f6f75053/strings/base_coroutine_foundation.h#L585-L610
    B. The test function checks the `completed` variable at step # 7.

The problem lies in step 4. The coroutine is resumed on the thread pool. Consequently, the `completed` callback is set to `true` on a **thread pool thread (# 3)**. Simultaneously, the **test thread** checks the variable at step # 7. The setting of the `completed` variable and the checking of it are **unordered**. This causes the test to fail randomly. The failure is not always observed because step # 7 throws and catches an exception, which often slows down the test function, creating a timing window that allows the test to pass more often than it fails.

I believe the key to this issue lies in two points: 
1. Is this test correct?
2. Does the completion callback have to be executed in `final_suspend`? Can it be executed by the `Cancel` function? By modifying `basic_coroutine_foundation.h` as follows, the test can also succeed.

https://github.com/microsoft/cppwinrt/blob/129c9258fedbae96a0155f634bf56b68f6f75053/strings/base_coroutine_foundation.h#L481-L509

```cpp
void Cancel() noexcept
{
    winrt::delegate<> cancel;
    async_completed_handler_t<AsyncInterface> completed; // NB
    {
        slim_lock_guard const guard(m_lock);
        if (m_status.load(std::memory_order_relaxed) == AsyncStatus::Started)
        {
            m_status.store(AsyncStatus::Canceled, std::memory_order_relaxed);
            if (cancellable_promise::originate_on_cancel())
            {
                m_exception = std::make_exception_ptr(hresult_canceled());
            }
            else
            {
                m_exception = std::make_exception_ptr(hresult_canceled(hresult_error::no_originate));
            }
            cancel = std::move(m_cancel);
            completed = std::move(this->m_completed); // NB
        }
    }
    if (cancel)
    {
        cancel();
    }
    cancellable_promise::cancel();
    if (completed) // NB
    {
        winrt::impl::invoke(completed, *this, AsyncStatus::Canceled); //NB
    }
}
```
This change gives the completion callback a chance to execute on the thread that calls `Cancel`. I'm not sure if this is a good idea or if it will break existing code. The documentation currently doesn't specify this in detail.

	bool await_ready() const noexcept
	{
	return WINRT_IMPL_WaitForSingleObject(m_handle, 0) == 0;
	}

	template <typename T>
	void await_suspend(impl::coroutine_handle<T> resume)
	{
	set_cancellable_promise_from_handle(resume);

	m_resume = resume;
	create_threadpool_wait();
	}

	struct final_suspend_awaiter
	{
	promise_base* promise;

	bool await_ready() const noexcept
	{
	return false;
	}

	void await_resume() const noexcept
	{
	}

	bool await_suspend(coroutine_handle<>) const noexcept
	{
	promise->set_completed();
	uint32_t const remaining = promise->subtract_reference();

	if (remaining == 0)
	{
	std::atomic_thread_fence(std::memory_order_acquire);
	}

	return remaining > 0;
	}
	};

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: There is a data race in the `test_old` test #1540

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	void Cancel() noexcept
	{
	winrt::delegate<> cancel;

	{
	slim_lock_guard const guard(m_lock);

	if (m_status.load(std::memory_order_relaxed) == AsyncStatus::Started)
	{
	m_status.store(AsyncStatus::Canceled, std::memory_order_relaxed);
	if (cancellable_promise::originate_on_cancel())
	{
	m_exception = std::make_exception_ptr(hresult_canceled());
	}
	else
	{
	m_exception = std::make_exception_ptr(hresult_canceled(hresult_error::no_originate));
	}
	cancel = std::move(m_cancel);
	}
	}

	if (cancel)
	{
	cancel();
	}

	cancellable_promise::cancel();
	}

Bug: There is a data race in the test_old test #1540

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Bug: There is a data race in the `test_old` test #1540