Task Scheduling & Lifecycle

This reference covers how asyncio instantiates, schedules, suspends, and terminates tasks — and how those mechanics interact with the event loop's ready queue under production load. The focus is narrow on purpose: not coroutines in the abstract, but the concrete state transitions a Task undergoes, the catalogue of scheduling patterns that fan work out across the loop, and the resource boundaries that keep a high-throughput service from drowning in pending work. Get these wrong and you see the classic asyncio failure signatures: loop starvation, unbounded memory growth from retained tasks, and non-deterministic teardown that drops in-flight requests.

A Task is a thin wrapper that drives a coroutine to completion by repeatedly stepping it and re-scheduling the next step through the loop. Everything in this guide flows from that single fact: the loop owns a queue of callbacks, a task is just a recurring producer of callbacks into that queue, and "scheduling" is the discipline of controlling how many such producers exist and how fast they enqueue.

Architectural principles

A task is a self-rescheduling callback. Each await that yields control returns the task to the loop; the loop resumes it via loop.call_soon() on a later iteration. The task is never "running in parallel" — it occupies the single loop thread in discrete steps.
Creation is decoupled from awaiting. create_task() schedules immediately and returns a handle; the result is collected later. This split is what enables concurrency, and also what enables silent failures when the handle is dropped.
References are load-bearing. The loop holds only a weak reference to a task. A fire-and-forget task with no strong reference can be garbage-collected mid-flight, so retaining the handle is part of correctness, not bookkeeping.
Cancellation is cooperative and deferred. task.cancel() schedules a CancelledError to be raised at the next suspension point; it does not interrupt synchronous code. Deterministic teardown depends on every task yielding regularly and re-raising on cancel.
Backpressure belongs at the application layer. The loop will happily accept millions of pending tasks. Bounding concurrency with a Semaphore, a bounded queue, or a structured group is the only thing standing between you and an out-of-memory kill.
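The cooperative nature of cancellation can be seen in a minimal sketch. The names `worker` and `log` and the sleep durations are illustrative assumptions; the point is only that `cancel()` takes effect at the first suspension point and that the task re-raises after cleanup.

```python
import asyncio
import time

log: list[str] = []


async def worker() -> None:
    try:
        time.sleep(0.02)  # synchronous section: nothing else, cancel() included, runs meanwhile
        log.append("sync section finished")
        await asyncio.sleep(1)  # first suspension point: CancelledError is raised here
        log.append("never reached")
    except asyncio.CancelledError:
        log.append("cleanup")
        raise  # re-raise so the task ends in the cancelled state


async def main() -> bool:
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # yield once so the worker runs up to its first await
    task.cancel()  # only requests cancellation; delivery is deferred
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(log, was_cancelled)
```

If the `raise` in the handler were removed, the task would finish normally and `task.cancelled()` would report `False`, which is how swallowed cancellations go unnoticed.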

How scheduling integrates with the event loop

Task scheduling is best understood as a feedback cycle into the loop's ready queue. In CPython, each loop iteration does three things in order: it polls the selector for I/O that became ready and enqueues the corresponding callbacks, it moves any timers that have come due into the ready queue (loop._ready), and then it runs every callback that was in the ready queue at that point. Callbacks scheduled while that batch runs wait for the next iteration. A Task participates as follows. When you call create_task(), the task schedules its first step with loop.call_soon(), which appends a callback to _ready. On the next iteration the loop pops that callback and runs Task.__step, which sends None into the coroutine. The coroutine executes synchronously until it hits an await on something not yet ready, at which point it yields a future. The task registers a done-callback on that future and stops. It is now suspended, holding no slot in the ready queue. When the awaited future completes (I/O ready, timer fired, another task resolved), its done-callback re-enqueues the task's next step via call_soon, and the cycle repeats until the coroutine returns or raises.
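The step-by-step hand-off can be observed directly. The sketch below assumes the default (non-eager) task factory; `child`, `main`, and the event labels are illustrative names.

```python
import asyncio


async def child(events: list[str]) -> None:
    events.append("child: first step")
    await asyncio.sleep(0)  # bare yield: the task re-enqueues its own next step
    events.append("child: second step")


async def main() -> list[str]:
    events: list[str] = []
    task = asyncio.create_task(child(events))  # queues the first step; runs nothing yet
    events.append("main: task created")
    await asyncio.sleep(0)  # main yields, so the loop can run the child's first step
    events.append("main: resumed")
    await task  # main suspends until the child's final step completes
    return events


order = asyncio.run(main())
for line in order:
    print(line)
```

The child does not start until `main` reaches its first `await`, and the two then alternate one step at a time in ready-queue order.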

This is why a CPU-bound section with no await freezes everything: while __step runs synchronously, the loop cannot advance to the next ready callback, poll the selector, or fire timers. For the broader picture of how the selector, timers, and thread-pool executors compose into a single loop iteration, start from Asyncio Fundamentals & Event Loop Architecture, the overview that this section sits under. The precise mechanics of which API places a task into _ready — and the historical baggage around it — are dissected in understanding asyncio.create_task vs asyncio.ensure_future.
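A small experiment makes the freeze visible. It is a sketch under assumed timings: the same 50 ms wait is performed once on the loop thread and once through `asyncio.to_thread()`, with a `call_soon` callback queued beforehand each time.

```python
import asyncio
import time


async def main() -> tuple[bool, bool]:
    loop = asyncio.get_running_loop()

    ran_during_block: list[str] = []
    loop.call_soon(ran_during_block.append, "tick")
    time.sleep(0.05)  # blocking call on the loop thread: the ready queue is frozen
    blocked = not ran_during_block  # the queued callback never got a turn

    ran_during_offload: list[str] = []
    loop.call_soon(ran_during_offload.append, "tick")
    await asyncio.to_thread(time.sleep, 0.05)  # same wait, but on a worker thread
    offloaded = bool(ran_during_offload)  # the loop kept serving callbacks meanwhile

    return blocked, offloaded


blocked, offloaded = asyncio.run(main())
print("callback starved while blocking:", blocked)
print("callback ran while offloaded:", offloaded)
```

Offloading helps here because the wait is a blocking call; for pure-Python CPU work a thread still competes for the interpreter lock, so a process pool may be the better executor.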

Pattern catalogue

Each scheduling pattern is a different answer to one question: how do results and failures flow back from the tasks you fan out? Choose by failure semantics first, ergonomics second.

Fire-and-forget with reference retention

Use when a side effect (a metric flush, a cache warm) should run concurrently and you do not need its result, but you must not let it be silently dropped. The loop keeps only a weak reference, so an un-retained task can vanish mid-execution. Retain it in a set and discard on completion.

import asyncio
import logging

logger = logging.getLogger("fire_and_forget")
_background: set[asyncio.Task] = set()


async def flush_metrics(payload: dict) -> None:
    await asyncio.sleep(0.05)  # network write
    logger.info("flushed %d metrics", len(payload))


def spawn_background(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    _background.add(task)  # strong ref keeps it alive
    task.add_done_callback(_background.discard)
    task.add_done_callback(_log_if_failed)
    return task


def _log_if_failed(task: asyncio.Task) -> None:
    if not task.cancelled() and task.exception():
        logger.error("background task failed: %r", task.exception())


async def main() -> None:
    spawn_background(flush_metrics({"latency_ms": 12}))
    await asyncio.sleep(0.1)  # let it complete before loop exits


asyncio.run(main())

The done-callback that inspects exception() is mandatory: without it, a failing fire-and-forget task only logs "Task exception was never retrieved" when the task object is finalized, long after the fault, and the error is easy to miss. Diagnosing that class of leak is covered in debugging unawaited coroutines in large codebases.

gather fan-out

Use when you have a fixed set of awaitables, want every result back in order, and are happy to wait for the slowest. gather schedules all of them concurrently and returns a list aligned to the input. With return_exceptions=True, failures become result values instead of propagating, letting you handle partial success.

import asyncio


async def fetch(shard: int) -> int:
    await asyncio.sleep(0.01 * shard)
    if shard == 3:
        raise ConnectionError(f"shard {shard} down")
    return shard * 10


async def main() -> list:
    results = await asyncio.gather(
        *(fetch(i) for i in range(5)),
        return_exceptions=True,
    )
    for i, r in enumerate(results):
        if isinstance(r, Exception):
            print(f"shard {i} failed: {r!r}")
        else:
            print(f"shard {i} -> {r}")
    return results


asyncio.run(main())

Note the sharp edge: with return_exceptions=False (the default), the first exception propagates but the sibling tasks are not cancelled — they run to completion in the background. That asymmetry is the main reason to prefer TaskGroup for new code.
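The asymmetry is easy to reproduce. In this sketch (function names and delays are assumptions), one awaitable fails after 10 ms while its sibling needs 50 ms:

```python
import asyncio


async def slow_ok(finished: list[str]) -> None:
    await asyncio.sleep(0.05)
    finished.append("slow_ok")


async def fails_fast() -> None:
    await asyncio.sleep(0.01)
    raise ValueError("boom")


async def main() -> tuple[bool, list[str]]:
    finished: list[str] = []
    survivor = asyncio.create_task(slow_ok(finished))
    try:
        await asyncio.gather(survivor, fails_fast())
    except ValueError:
        pass  # gather re-raised the first failure...
    still_running = not survivor.done()  # ...but left the sibling running
    await survivor  # it completes normally, unobserved by gather
    return still_running, finished


still_running, finished = asyncio.run(main())
print(still_running, finished)
```

If the caller had returned right after the `except`, the surviving task would keep running with nobody awaiting it.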

as_completed streaming

Use when you want to process results the instant each finishes rather than waiting for the batch — for example, returning the first healthy replica or rendering search hits incrementally. It yields awaitables in completion order.

import asyncio


async def probe(replica: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"{replica} ok"


async def main() -> None:
    coros = [probe("r1", 0.3), probe("r2", 0.1), probe("r3", 0.2)]
    for earliest in asyncio.as_completed(coros):
        result = await earliest
        print("first available:", result)
        break  # take the fastest replica, ignore the rest


asyncio.run(main())  # on exit, asyncio.run() cancels the probes still pending

To avoid leaking the slower probes after you break, wrap them in tasks and cancel the unused ones — as_completed alone does not cancel stragglers.
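One way to do that cleanup is sketched below. The helper name `first_healthy` is hypothetical; the probes are wrapped in tasks up front so the losers can be cancelled and drained.

```python
import asyncio


async def probe(replica: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"{replica} ok"


async def first_healthy() -> tuple[str, list[bool]]:
    tasks = [
        asyncio.create_task(probe(name, delay))
        for name, delay in [("r1", 0.3), ("r2", 0.1), ("r3", 0.2)]
    ]
    winner = ""
    for earliest in asyncio.as_completed(tasks):
        winner = await earliest
        break  # stop after the fastest replica
    losers = [t for t in tasks if not t.done()]
    for t in losers:
        t.cancel()
    await asyncio.gather(*losers, return_exceptions=True)  # let the cancellations land
    return winner, [t.cancelled() for t in losers]


winner, losers_cancelled = asyncio.run(first_healthy())
print(winner, losers_cancelled)
```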

wait with FIRST_COMPLETED

Use when you need explicit control over the done/pending split — implementing a race, a timeout fan-in, or a "first to respond wins" with deliberate cleanup of the losers.

import asyncio


async def slow(name: str, t: float) -> str:
    await asyncio.sleep(t)
    return name


async def main() -> None:
    tasks = {asyncio.create_task(slow(n, t)) for n, t in
             [("primary", 0.2), ("fallback", 0.05)]}
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    winner = done.pop().result()
    print("winner:", winner)
    for p in pending:
        p.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # drain cancellations


asyncio.run(main())

wait returns sets, not ordered results, and never raises on task failure — you inspect each task yourself. The explicit cancel() plus drained gather is the canonical way to retire the losers cleanly; the deeper teardown rules live in cancellation patterns.
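Because wait hands back tasks rather than results, failure handling is a manual inspection step. A minimal sketch, with `ok` and `broken` as illustrative coroutines:

```python
import asyncio


async def ok() -> str:
    await asyncio.sleep(0.01)
    return "ok"


async def broken() -> str:
    await asyncio.sleep(0.01)
    raise RuntimeError("backend down")


async def main() -> tuple[list[str], list[str]]:
    tasks = [asyncio.create_task(ok()), asyncio.create_task(broken())]
    done, pending = await asyncio.wait(tasks)  # default ALL_COMPLETED; does not raise
    assert not pending
    results: list[str] = []
    errors: list[str] = []
    for t in tasks:  # iterate the original list to keep a stable order
        if t.exception() is not None:  # reading exception() marks it as retrieved
            errors.append(repr(t.exception()))
        else:
            results.append(t.result())
    return results, errors


results, errors = asyncio.run(main())
print(results, errors)
```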

TaskGroup structured scheduling

Use this for almost all new fan-out work. asyncio.TaskGroup (Python 3.11+) ties task lifetimes to a lexical scope: the async with block does not exit until every child finishes, and if any child raises, the rest are cancelled and the failures surface together as an ExceptionGroup.

import asyncio


async def fetch(endpoint: str) -> dict:
    await asyncio.sleep(0.05)
    if "critical" in endpoint:
        raise ConnectionError("service unavailable")
    return {"endpoint": endpoint, "status": "ok"}


async def main() -> None:
    endpoints = ["/data", "/critical", "/health"]
    try:
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(fetch(e)) for e in endpoints]
    except* ConnectionError as eg:
        print(f"group failed: {[str(e) for e in eg.exceptions]}")
    else:
        print([t.result() for t in tasks])


asyncio.run(main())

The structured guarantee — no task outlives its scope, no failure is silently dropped — is why this is the default recommendation. The dedicated walkthrough is structured concurrency with asyncio.TaskGroup.

Resource boundaries

Patterns decide how results flow; boundaries decide how much runs at once. The loop imposes no limit, so every fan-out over an unbounded input needs one of these.

Bounding task count with a Semaphore. Wrap the body of each task in a semaphore so only N are in their critical section concurrently. This caps in-flight I/O (open sockets, DB connections) without limiting how many tasks you create:

import asyncio

sem = asyncio.Semaphore(20)  # at most 20 concurrent fetches


async def bounded_fetch(url: str) -> str:
    async with sem:
        await asyncio.sleep(0.05)  # the actual I/O
        return f"got {url}"


async def main() -> None:
    urls = [f"https://api/{i}" for i in range(1000)]
    async with asyncio.TaskGroup() as tg:
        for u in urls:
            tg.create_task(bounded_fetch(u))


asyncio.run(main())

Creating 1000 tasks is cheap; the semaphore ensures only 20 hold a connection at any moment. Note that the tasks themselves still all exist in memory — for truly large inputs, prefer the queue pattern below so you never materialize the full task set.
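To confirm the cap actually holds, a counter inside the critical section can record the peak concurrency. This is a sketch with assumed numbers (5 slots, 50 jobs), not a benchmark:

```python
import asyncio


async def main(limit: int = 5, jobs: int = 50) -> int:
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def bounded_job() -> None:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)  # highest number of holders seen so far
            await asyncio.sleep(0.01)  # stand-in for the real I/O
            in_flight -= 1

    await asyncio.gather(*(bounded_job() for _ in range(jobs)))
    return peak


peak = asyncio.run(main())
print("peak concurrency:", peak)
```

All 50 tasks exist at once, but the peak never exceeds the semaphore's limit.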

Queue-based worker fan-out. When the input is huge or a stream, spawn a fixed pool of workers draining a bounded asyncio.Queue. The queue's maxsize provides backpressure: producers block when it fills, so memory stays flat regardless of input size.

import asyncio


async def worker(name: str, q: asyncio.Queue) -> None:
    while True:
        item = await q.get()
        try:
            await asyncio.sleep(0.01)  # process item
        finally:
            q.task_done()


async def main() -> None:
    q: asyncio.Queue = asyncio.Queue(maxsize=100)  # backpressure bound
    workers = [asyncio.create_task(worker(f"w{i}", q)) for i in range(8)]
    for item in range(10_000):
        await q.put(item)  # blocks when queue is full
    await q.join()  # wait for all items processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)


asyncio.run(main())

This is the workhorse for sustained throughput: fixed worker count, fixed queue depth, constant memory. The same primitives underpin the broader treatment of locks, semaphores, and events in synchronization primitives.
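Backpressure can be verified by tracking the queue's high-water mark from the producer side. The sketch below uses assumed sizes (depth 10, 200 items, 2 consumers) and a hypothetical `consumer` coroutine:

```python
import asyncio


async def consumer(q: asyncio.Queue) -> None:
    while True:
        await q.get()
        await asyncio.sleep(0.001)  # simulated work, slower than the producer
        q.task_done()


async def main(maxsize: int = 10, items: int = 200) -> int:
    q: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    workers = [asyncio.create_task(consumer(q)) for _ in range(2)]
    high_water = 0
    for item in range(items):
        await q.put(item)  # suspends the producer whenever the queue is full
        high_water = max(high_water, q.qsize())
    await q.join()  # every item has been marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return high_water


high_water = asyncio.run(main())
print("deepest the queue ever got:", high_water)
```

However many items are fed in, the depth stays at or below `maxsize`, which is what keeps memory flat.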

Integrated production example

The following ties the catalogue and boundaries together: a bounded crawler that fans out with a worker pool, retries transient failures, enforces a per-item deadline, retires cleanly on cancellation, and exposes a diagnostic snapshot.

import asyncio
import logging
import time
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("crawler")


@dataclass
class Stats:
    processed: int = 0
    failed: int = 0
    started_at: float = field(default_factory=time.perf_counter)


async def process_item(item: int) -> None:
    # Simulated I/O with a deadline; every 7th item overruns it on every attempt.
    async with asyncio.timeout(0.5):
        await asyncio.sleep(0.05 if item % 7 else 0.6)


async def worker(name: str, q: asyncio.Queue, stats: Stats) -> None:
    while True:
        item = await q.get()
        try:
            for attempt in range(3):
                try:
                    await process_item(item)
                    stats.processed += 1
                    break
                except TimeoutError:
                    if attempt == 2:
                        stats.failed += 1
                        logger.warning("item %s gave up after retries", item)
                        break  # no backoff after the final attempt
                    await asyncio.sleep(0.02 * (attempt + 1))  # backoff
        except asyncio.CancelledError:
            logger.info("%s cancelled; draining", name)
            raise
        finally:
            q.task_done()


async def diagnostics(stats: Stats, q: asyncio.Queue) -> None:
    while True:
        await asyncio.sleep(0.2)
        elapsed = time.perf_counter() - stats.started_at
        rate = stats.processed / elapsed if elapsed else 0
        logger.info(
            "pending_tasks=%d queue=%d processed=%d failed=%d rate=%.0f/s",
            len(asyncio.all_tasks()), q.qsize(),
            stats.processed, stats.failed, rate,
        )


async def main() -> None:
    q: asyncio.Queue = asyncio.Queue(maxsize=200)
    stats = Stats()
    workers = [asyncio.create_task(worker(f"w{i}", q, stats)) for i in range(10)]
    monitor = asyncio.create_task(diagnostics(stats, q))
    try:
        for item in range(2_000):
            await q.put(item)  # backpressure when full
        await q.join()
    finally:
        for t in (*workers, monitor):
            t.cancel()
        await asyncio.gather(*workers, monitor, return_exceptions=True)
    logger.info("done: processed=%d failed=%d", stats.processed, stats.failed)


asyncio.run(main())

Diagnostic Hook: The diagnostics coroutine is the production-critical piece. It emits len(asyncio.all_tasks()) every 200 ms — a flat or falling number means the pool is keeping up; a steadily climbing number means task creation outpaces completion and you are heading toward an OOM. Pair it with q.qsize(): a queue pinned at maxsize confirms backpressure is engaging and producers are correctly throttled.

Diagnostic Hook — scheduling health metrics

Instrument three signals on any task-heavy loop. Pending task count: sample len(asyncio.all_tasks()) on a timer; a monotonic rise is the leading indicator of an unbounded fan-out. Scheduling latency: record loop.time() immediately before and after a no-op await asyncio.sleep(0) from a watchdog coroutine; the delta is how long a ready callback waited behind synchronous work, and a growing delta means the loop is being starved. Slow-callback warnings: enable debug mode with PYTHONASYNCIODEBUG=1 (or asyncio.run(main(), debug=True)) so the loop logs any callback that runs longer than loop.slow_callback_duration (default 0.1 s), pinpointing the code that held the thread without yielding.
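The scheduling-latency probe might look like the sketch below. The `watchdog` and `hog` names, the 10 ms sampling interval, and the 50 ms blocking calls are all assumptions chosen to make the effect visible.

```python
import asyncio
import time


async def watchdog(samples: list[float], interval: float = 0.01) -> None:
    loop = asyncio.get_running_loop()
    while True:
        await asyncio.sleep(interval)
        start = loop.time()
        await asyncio.sleep(0)  # no-op yield: go to the back of the ready queue
        samples.append(loop.time() - start)  # time spent waiting for a turn


async def hog(rounds: int = 6) -> None:
    for _ in range(rounds):
        time.sleep(0.05)  # deliberately blocks the loop thread
        await asyncio.sleep(0)


async def main() -> float:
    samples: list[float] = []
    monitor = asyncio.create_task(watchdog(samples))
    await asyncio.sleep(0.03)  # a few quiet samples first
    await hog()
    monitor.cancel()
    await asyncio.gather(monitor, return_exceptions=True)
    return max(samples)


worst = asyncio.run(main())
print(f"worst scheduling latency: {worst * 1000:.0f} ms")
```

On an idle loop the samples sit near zero; once `hog` runs, the worst sample approaches the length of one blocking call. In a real service the samples would feed a histogram or gauge rather than a list.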

Failure modes

| Failure mode | Root cause | Detection | Fix |
| --- | --- | --- | --- |
| Task silently vanishes mid-run | Loop holds only a weak ref; no strong ref retained for a fire-and-forget task | Result/side effect never appears; no traceback | Keep the handle in a set and discard on done; never drop the return of `create_task()` |
| `Task exception was never retrieved` | A spawned task failed and its `exception()`/result was never read | Warning at GC/finalization, long after the fault | Add a done-callback that inspects `exception()`, or await/gather the task |
| Loop starvation / latency spikes | A task ran CPU-bound or blocking code without yielding, freezing `__step` | Rising scheduling latency; slow-callback log lines | Offload to `asyncio.to_thread()`/executor; insert `await asyncio.sleep(0)` in hot loops |
| Unbounded memory growth | Fan-out over a large input with no concurrency cap | `asyncio.all_tasks()` count climbs without plateau | Bound with `Semaphore` or a queue-backed worker pool |
| Zombie tasks survive shutdown | `CancelledError` swallowed in a `finally`/`except` without re-raising | Tasks remain pending after `loop.close()`; teardown hangs | Always re-raise `CancelledError` after cleanup |
| Siblings keep running after a sibling fails | `gather()` default does not cancel peers on first exception | Background work continues past the error | Use `TaskGroup`, or cancel pending tasks explicitly |

Frequently Asked Questions

How does asyncio.create_task differ from directly awaiting a coroutine?

create_task immediately schedules the coroutine as a Task in the loop's ready queue and returns a handle, enabling concurrent execution and independent lifecycle tracking. A direct await runs the coroutine inline and suspends the caller until it completes, so no concurrency is gained and there is no Task object to introspect.

What happens to pending tasks when the event loop closes?

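Closing the loop does not cancel anything by itself. asyncio.run() cancels every remaining task before it closes the loop, raising CancelledError at each task's next suspension point; a bare loop.close() simply abandons pending tasks, which later surface as "Task was destroyed but it is pending!" warnings. If a task swallows CancelledError without re-raising, resources leak and teardown can hang. Drain tasks deterministically with asyncio.gather(*tasks, return_exceptions=True) or a TaskGroup before the loop closes.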

How can I diagnose loop starvation caused by long-running tasks?

Run in debug mode (PYTHONASYNCIODEBUG=1) so the loop logs callbacks that hold the thread longer than loop.slow_callback_duration, and lower that threshold if the default 0.1 s is too coarse. Measure scheduling latency by timing a no-op await asyncio.sleep(0) from a watchdog, and offload CPU-bound work to executors via asyncio.to_thread() to keep scheduling cooperative.

When should I use asyncio.TaskGroup over asyncio.gather?

Use TaskGroup (Python 3.11+) for almost all fan-out: it scopes task lifetimes, cancels siblings on the first failure, and surfaces errors as an ExceptionGroup. Use gather for a fixed set of awaitables where you want every result back and, with return_exceptions=True, want failures returned as values rather than raised.

Why does a fire-and-forget task sometimes never run to completion?

The event loop keeps only a weak reference to a task. If you do not retain the handle returned by create_task(), the task can be garbage-collected mid-flight. Store it in a set and discard it in a done-callback so a strong reference exists until it finishes.

Asyncio Fundamentals & Event Loop Architecture — up to the overview for how the selector, timers, and executors compose the loop iteration that drives every task.
Understanding asyncio.create_task vs ensure_future — which scheduling API to call and why, in depth.
Structured concurrency with asyncio.TaskGroup — the recommended default for fan-out with scoped, fail-fast semantics.
Cancellation patterns — how to retire losing and pending tasks without leaking CancelledError.
Synchronization primitives — locks, semaphores, and events for bounding and coordinating concurrent tasks.