Object-Oriented Design for Task/Job Scheduler

Difficulty: Hard. Updated: Jan 1, 2026

Problem

Design an object-oriented task/job scheduler that accepts tasks scheduled for future execution (one-shot or recurring), dispatches them reliably to worker threads at the right time, supports cancellation and retries, and scales to many concurrent schedules.

Solution

1. Requirements Analysis

Assumptions: This note combines interview prompts and community examples. Where a source left details unspecified, the design sticks to the minimal capabilities commonly expected of a scheduler.

Functional Requirements:

  • Submit tasks (function/callable) with scheduling metadata: absolute run time, delay, or recurrence (cron-like).
  • Persist and restore scheduled tasks (optional pluggable PersistenceStore).
  • Dispatch and execute tasks using a WorkerPool at or after the scheduled time.
  • Allow cancellation, status queries, and simple priority handling.
  • Retry failed tasks per a RetryPolicy with configurable backoff and max attempts.
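The retry requirement above can be made concrete with a small policy object. This is a minimal sketch, assuming exponential backoff with a cap; the field names (`max_attempts`, `base_delay_s`, `multiplier`, `max_delay_s`) and their defaults are illustrative and not part of the original design, and both methods take an attempt count rather than a task to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Exponential backoff with a cap; all defaults are illustrative."""
    max_attempts: int = 3        # total executions allowed, including the first
    base_delay_s: float = 1.0    # delay before the first retry
    multiplier: float = 2.0      # growth factor per additional failure
    max_delay_s: float = 60.0    # upper bound on any single delay

    def should_retry(self, attempts: int) -> bool:
        # `attempts` is the number of executions that have already failed.
        return attempts < self.max_attempts

    def next_delay(self, attempts: int) -> float:
        # attempts=1 -> base delay, attempts=2 -> base * multiplier, and so on.
        delay = self.base_delay_s * self.multiplier ** (attempts - 1)
        return min(delay, self.max_delay_s)


policy = RetryPolicy(max_attempts=4, base_delay_s=0.5)
delays = [policy.next_delay(n) for n in range(1, 5)]
```

With these settings the delays double from 0.5 seconds on each failure, and the scheduler would stop retrying once four executions have failed.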

Non-Functional Requirements:

  • Thread-safety: protect scheduling/cancellation/dispatch operations.
  • Scalability: support large numbers of scheduled entries and concurrent task execution.
  • Observability: task auditing, metrics, and error logging.
  • Extensibility: pluggable persistence, cron parsing, and retry strategies.
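One way to keep persistence pluggable is to have the scheduler depend only on a small interface. The sketch below assumes tasks are stored as plain dictionaries with `id` and `status` keys; that record shape, the `InMemoryStore` class, and the `restore` helper are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Protocol


class PersistenceStore(Protocol):
    """Structural interface: any object with these two methods qualifies."""
    def save(self, task_id: str, record: dict) -> None: ...
    def load_all(self) -> List[dict]: ...


class InMemoryStore:
    """Dictionary-backed store; contents are lost when the process exits."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def save(self, task_id: str, record: dict) -> None:
        # Copy so later mutation by the caller does not alter stored state.
        self._records[task_id] = dict(record)

    def load_all(self) -> List[dict]:
        return [dict(r) for r in self._records.values()]


def restore(store: PersistenceStore) -> List[str]:
    """Return ids of tasks that should be re-enqueued after a restart."""
    return [r["id"] for r in store.load_all() if r["status"] == "SCHEDULED"]


store = InMemoryStore()
store.save("t1", {"id": "t1", "status": "SCHEDULED"})
store.save("t2", {"id": "t2", "status": "SUCCEEDED"})
pending = restore(store)
```

A database- or Redis-backed store would implement the same two methods, so the scheduler code would not need to change.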

2. Use Case Diagram

Actors: Client (submitting tasks), Scheduler System (coordinator), Worker (executor), Admin (operator).

Use case summary: Client submits tasks to Scheduler; Scheduler enqueues and persists tasks; TimerService wakes Scheduler when tasks are due; Scheduler dispatches tasks to WorkerPool; Workers execute and report results. Clients and Admins can query or cancel tasks.

graph TB
  subgraph "Scheduler System"
    UC_Submit("Submit Task")
    UC_Cancel("Cancel Task")
    UC_Query("Query Status")
    UC_Dispatch("Dispatch Task")
    UC_Retry("Retry Policy")
  end
  Client([Client]) --> UC_Submit
  Client --> UC_Query
  Client --> UC_Cancel
  Admin([Admin]) --> UC_Query
  Admin --> UC_Cancel
  Worker([Worker]) --> UC_Dispatch
  UC_Dispatch -.->|on failure| UC_Retry
  style Client fill:#4CAF50,color:#fff
  style Admin fill:#FF9800,color:#fff
  style Worker fill:#2196F3,color:#fff

3. Class Diagram

Core classes (concise responsibilities):

  • Task: id, payload (callable), scheduleSpec, status, attempts, priority. Methods: execute(), cancel(), markSucceeded(), markFailed().
  • ScheduleSpec: Encapsulates run time, interval, cron expressions and computes the next run time.
  • Scheduler: Accepts tasks, persists/restores them, manages DispatchQueue and TimerService, applies RetryPolicy, and exposes admin APIs.
  • DispatchQueue: Time-ordered priority queue of tasks (min-heap or DelayQueue equivalent).
  • WorkerPool: Bounded thread pool that executes Task.execute().
  • TimerService: Responsible for sleeping/waking the Scheduler when the next task becomes due.
  • PersistenceStore: Interface for persisting tasks (in-memory/DB/redis).
  • RetryPolicy: Encapsulates retry/backoff logic.
  • TaskResult / AuditLog: Stores execution metadata for observability.

classDiagram
  class Task { +String id +Object payload +ScheduleSpec schedule +TaskStatus status +int attempts +int priority +execute() +cancel() +markSucceeded() +markFailed() }
  class ScheduleSpec { +nextRunAfter(now): Timestamp }
  class Scheduler { +submitTask(task) +cancelTask(id) +queryTask(id) +dispatchLoop() }
  class DispatchQueue { +add(task) +pollReady(now) }
  class WorkerPool { +submit(task) +shutdown() }
  class TimerService { +sleepUntil(ts) }
  class PersistenceStore { +save(task) +loadAll() }
  class RetryPolicy { +shouldRetry(task) +nextDelay(attempts) }
  Task "1" -- "1" ScheduleSpec : "uses"
  Scheduler "1" -- "1" DispatchQueue : "manages"
  Scheduler "1" -- "1" WorkerPool : "uses"
  Scheduler "1" -- "1" PersistenceStore : "persists"
  Scheduler "1" -- "1" TimerService : "schedules"
  Scheduler "1" -- "1" RetryPolicy : "applies"
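The DispatchQueue is described as a time-ordered priority queue, which maps naturally onto a binary min-heap. The sketch below assumes run times are plain floats (seconds) and that a larger `priority` number wins among tasks due at the same instant; the tie-breaking rule and the `peek_time` helper are assumptions added for illustration.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class DispatchQueue:
    """Min-heap ordered by (run_at, -priority, insertion order)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps equal entries in FIFO order

    def add(self, task_id: str, run_at: float, priority: int = 0) -> None:
        # Negate priority so a larger number is served first at equal run_at.
        heapq.heappush(self._heap, (run_at, -priority, next(self._seq), task_id))

    def peek_time(self) -> Optional[float]:
        # Earliest due time, or None when the queue is empty.
        return self._heap[0][0] if self._heap else None

    def poll_ready(self, now: float) -> List[str]:
        # Pop every entry whose run time has arrived.
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[3])
        return ready


q = DispatchQueue()
q.add("late", run_at=30.0)
q.add("low", run_at=10.0, priority=1)
q.add("high", run_at=10.0, priority=5)
ready = q.poll_ready(now=10.0)
```

Insertion and removal are both O(log n), and the scheduler only ever needs to look at the head of the heap to know how long it can sleep. This class is not thread-safe on its own; the caller is expected to hold a lock around it.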

4. Activity Diagrams

Activity: Submit -> Execute -> Complete

graph TB
  S1[Client submits Task] --> S2[Scheduler validates & persists Task]
  S2 --> S3[Insert Task into DispatchQueue]
  S3 --> S4[TimerService wakes when Task due]
  S4 --> S5[Scheduler dispatches task to worker pool]
  S5 --> S6[Worker executes task]
  S6 --> S7{Execution succeeded?}
  S7 -- Yes --> S8[Mark succeeded and record result]
  S7 -- No --> S9[Apply retry policy; reschedule or mark failed]
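The submit-to-execute flow above can be sketched with a condition variable standing in for the TimerService and a standard thread pool standing in for the WorkerPool. `MiniScheduler` and its method names are hypothetical; retries and persistence are left out to keep the wake-up logic visible.

```python
import heapq
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


class MiniScheduler:
    def __init__(self, max_workers: int = 2) -> None:
        self._heap: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = 0                      # unique tie-breaker; callables are never compared
        self._cond = threading.Condition()
        self._stopped = False
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._thread = threading.Thread(target=self._dispatch_loop, daemon=True)
        self._thread.start()

    def submit(self, fn: Callable[[], None], delay_s: float) -> None:
        with self._cond:
            heapq.heappush(self._heap, (time.monotonic() + delay_s, self._seq, fn))
            self._seq += 1
            self._cond.notify()  # the earliest due time may have changed

    def _dispatch_loop(self) -> None:
        while True:
            with self._cond:
                while not self._stopped:
                    if self._heap:
                        wait = self._heap[0][0] - time.monotonic()
                        if wait <= 0:
                            break                      # head of the heap is due
                        self._cond.wait(timeout=wait)  # sleep until due or notified
                    else:
                        self._cond.wait()              # nothing scheduled yet
                if self._stopped:
                    return
                _, _, fn = heapq.heappop(self._heap)
            self._pool.submit(fn)  # hand off outside the lock

    def shutdown(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify()
        self._thread.join()
        self._pool.shutdown(wait=True)


results: List[str] = []
scheduler = MiniScheduler()
scheduler.submit(lambda: results.append("b"), delay_s=0.10)
scheduler.submit(lambda: results.append("a"), delay_s=0.02)
time.sleep(0.3)
scheduler.shutdown()
```

Using `time.monotonic()` keeps the loop immune to wall-clock adjustments. Notifying on every submit matters because a newly submitted task may be due earlier than the one the loop is currently sleeping for.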

Activity: Cancel Task

graph TB
  C1[Client requests cancel task] --> C2[Scheduler looks up task]
  C2 --> C3{Task running?}
  C3 -- No --> C4[Remove from queue and persist cancel]
  C3 -- Yes --> C5[Signal worker to stop or mark no retry]
  C4 --> C6[Return cancelled]
  C5 --> C7[Return cancel requested, best effort]
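The two branches of the cancel flow can be sketched as follows. The names `TaskEntry`, `CancelRegistry`, `allow_retry`, and `stop_requested` are hypothetical. The sketch also assumes lazy removal: a cancelled entry stays in the queue and the dispatcher skips it when popped, which avoids an O(n) search through a heap.

```python
import threading
from enum import Enum
from typing import Dict


class Status(Enum):
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    CANCELLED = "CANCELLED"


class TaskEntry:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.status = Status.SCHEDULED
        self.allow_retry = True
        self.stop_requested = threading.Event()  # polled by cooperative payloads


class CancelRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[str, TaskEntry] = {}

    def register(self, entry: TaskEntry) -> None:
        with self._lock:
            self._tasks[entry.task_id] = entry

    def cancel(self, task_id: str) -> str:
        with self._lock:
            entry = self._tasks.get(task_id)
            if entry is None:
                return "not_found"
            if entry.status is Status.RUNNING:
                # Best effort: ask the worker to stop and suppress any retry.
                entry.stop_requested.set()
                entry.allow_retry = False
                return "cancel_requested"
            if entry.status is Status.SCHEDULED:
                # Not started yet: the dispatcher skips CANCELLED entries.
                entry.status = Status.CANCELLED
                return "cancelled"
            return "already_" + entry.status.value.lower()


registry = CancelRegistry()
queued, running = TaskEntry("queued"), TaskEntry("running")
running.status = Status.RUNNING
registry.register(queued)
registry.register(running)
outcomes = [registry.cancel(t) for t in ("queued", "running", "missing")]
```

Python threads cannot be interrupted from outside, so cancelling a running task only works if the payload checks `stop_requested` periodically; that is why the running branch is labelled best effort.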

5. High-Level Code Implementation

Java skeleton (shapes only):

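import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Skeleton only: each public type would live in its own .java file.
public enum TaskStatus { SCHEDULED, RUNNING, SUCCEEDED, FAILED, CANCELLED }

public class ScheduleSpec {
    // Next run time after `now`; null when no run remains (stub).
    public Instant nextRunAfter(Instant now) { return null; }
}

public abstract class Task {
    protected String id;
    protected Object payload; // Runnable/Callable
    protected ScheduleSpec schedule;
    protected volatile TaskStatus status = TaskStatus.SCHEDULED; // read across threads
    protected int attempts;
    protected int priority;
    public abstract void execute() throws Exception;
    public void cancel() { status = TaskStatus.CANCELLED; }
    public void markSucceeded() { status = TaskStatus.SUCCEEDED; }
    public void markFailed() { status = TaskStatus.FAILED; }
}

// Collaborator contracts, matching the class diagram.
interface DispatchQueue { void add(Task t); List<Task> pollReady(Instant now); }
interface TimerService { void sleepUntil(Instant ts) throws InterruptedException; }
interface PersistenceStore { void save(Task t); List<Task> loadAll(); }
interface RetryPolicy { boolean shouldRetry(Task t); Duration nextDelay(int attempts); }

public class Scheduler {
    private DispatchQueue queue;
    private WorkerPool workers;
    private PersistenceStore store;
    private TimerService timer;
    private RetryPolicy retryPolicy;
    public String submitTask(Task t) { /* persist + enqueue */ return t.id; }
    public boolean cancelTask(String id) { return false; /* stub */ }
    public Task queryTask(String id) { return null; /* stub */ }
    public void dispatchLoop() { /* main loop: sleep until the next task is due, then dispatch */ }
}

public class WorkerPool {
    public void submit(Task t) { /* hand to thread pool */ }
    public void shutdown() { }
}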

Python skeleton (type-hinted):

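from __future__ import annotations
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional
import datetime

class TaskStatus(Enum):
    SCHEDULED = "SCHEDULED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"

@dataclass
class ScheduleSpec:
    cron: Optional[str] = None
    run_at: Optional[datetime.datetime] = None
    def next_run_after(self, now: datetime.datetime) -> Optional[datetime.datetime]:
        # One-shot only in this skeleton; cron evaluation is left to a pluggable parser.
        if self.run_at is not None and self.run_at > now:
            return self.run_at
        return None

class Task:
    def __init__(self, id: str, payload: Any, schedule: ScheduleSpec, priority: int = 0) -> None:
        self.id = id
        self.payload = payload  # callable to run
        self.schedule = schedule
        self.priority = priority
        self.status = TaskStatus.SCHEDULED
        self.attempts = 0
    def execute(self) -> None:
        raise NotImplementedError  # subclasses run the payload
    def cancel(self) -> None:
        self.status = TaskStatus.CANCELLED

class Scheduler:
    def __init__(self) -> None:
        self.queue = None    # DispatchQueue
        self.workers = None  # WorkerPool
        self.store = None    # PersistenceStore
        self.timer = None    # TimerService
        self.tasks: Dict[str, Task] = {}  # id -> task, for lookup and cancellation
    def submit_task(self, task: Task) -> str:
        # Persisting and enqueueing are omitted in this skeleton.
        self.tasks[task.id] = task
        return task.id
    def cancel_task(self, id: str) -> bool:
        # Only tasks that have not started can be cancelled here.
        task = self.tasks.get(id)
        if task is None or task.status is not TaskStatus.SCHEDULED:
            return False
        task.cancel()
        return True
    def query_task(self, id: str) -> Optional[Task]:
        return self.tasks.get(id)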
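One way to implement `next_run_after` for one-shot and fixed-interval schedules is shown below. `IntervalSpec` and its `interval` field are assumptions for illustration (the skeleton's ScheduleSpec carries `cron` and `run_at` only), and cron expressions are deliberately out of scope. The sketch treats "after" as strictly after `now` and skips occurrences that were missed rather than replaying them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class IntervalSpec:
    run_at: datetime                      # first (or only) run time
    interval: Optional[timedelta] = None  # None means one-shot

    def __post_init__(self) -> None:
        if self.interval is not None and self.interval <= timedelta(0):
            raise ValueError("interval must be positive")

    def next_run_after(self, now: datetime) -> Optional[datetime]:
        if now < self.run_at:
            return self.run_at
        if self.interval is None:
            return None  # one-shot whose time has already arrived
        # Jump to the first slot strictly after `now`, skipping missed ones.
        periods = (now - self.run_at) // self.interval + 1
        return self.run_at + periods * self.interval


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
every_15 = IntervalSpec(run_at=start, interval=timedelta(minutes=15))
next_run = every_15.next_run_after(start + timedelta(minutes=40))
one_shot = IntervalSpec(run_at=start)
```

Computing the next slot from the original `run_at` instead of from the last actual execution time prevents drift when tasks start late. Whether missed runs should be skipped or replayed is a policy choice the requirements leave open.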
