Why I built my own node-schedule alternative

Job Stash is an alternative to node-schedule that persists jobs, supports atomic execution and has an inbuilt retry mechanism.

You know how YouTube lets you schedule a video to go live at a specific date and time? I had a similar requirement a few months back for an LMS product: I had to publish students' reports at the end of the term, at a specific date and time.

When I googled how to schedule date-based jobs, I found a package called node-schedule.

(Why not just use EventBridge? My answer is further down.)

node-schedule was super easy to use. You just pass it a name, a callback and the date on which the callback should be executed.

But it had two major problems and one minor one:

  • Jobs weren't persistent. If I restarted my Express app, the jobs were gone.

  • Jobs weren't atomic. If the same job was scheduled on multiple machines, it would execute on every one of them, leading to duplicate data, so the machines running jobs could only be scaled vertically. Cancelling a job on every machine wasn't possible either.

  • There was no inbuilt retry mechanism. If a job failed, I had to catch the error and reschedule the job myself.

To fix these things, I had previously built a class on top of node-schedule in my code base that made sure the jobs were atomic and had the rescheduling ability.
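
For context, the catch-and-reschedule workaround looks roughly like the sketch below. It is a simplified illustration, not the original class: `withRetry`, `scheduleAt`, `retryWindowMs` and `maxRetries` are hypothetical names, and `scheduleAt(date, fn)` stands in for whatever scheduler you use (with node-schedule it could be `(date, fn) => schedule.scheduleJob(date, fn)`).

```javascript
// Hypothetical wrapper: run the job and, if it throws, hand it back to the
// scheduler after `retryWindowMs`, at most `maxRetries` times.
function withRetry(callback, scheduleAt, { retryWindowMs = 60000, maxRetries = 3 } = {}) {
  return async function run(job, attempt = 0) {
    try {
      await callback(job);
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: surface the error
      const retryOn = new Date(Date.now() + retryWindowMs);
      scheduleAt(retryOn, () => run(job, attempt + 1));
    }
  };
}

// Demo with a fake scheduler that only records what would be scheduled.
const pending = [];
const flaky = withRetry(
  async () => { throw new Error('report service down'); },
  (date, fn) => pending.push({ date, fn }),
  { retryWindowMs: 5 * 60 * 1000 }
);
flaky({ jobId: '123' }).then(() => console.log(pending.length)); // 1
```

Once the retries run out, the wrapper rethrows, so the caller still has to decide what a permanently failed job means.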

A similar requirement popped up in a different project recently, so I decided to bundle this into a new package. That's how Job Stash began.

Introducing Job Stash - Alternative to Node Schedule and Agenda

Job Stash is an npm package for scheduling tasks: you give it a callback function and the date on which that function should be executed.

  • Persists task metadata in the DB - Use it to reschedule all your jobs when the app restarts.

  • Abstracts the DB implementation - Give it a MongoDB address (URI) and it opens a new connection. Give it an existing MongoDB connection and it reuses that.

  • It's atomic - If you have multiple machines running, only one of them will execute your callback function.

  • Has an inbuilt retry mechanism - If something goes wrong while executing a job, it is retried after a customisable retry window.

  • Unpacked size less than 30KB.

For persistent storage I am using MongoDB, since our app already used it and the idea was that I could reuse the same connection.
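
To make the "only one machine runs it" idea concrete, here is a minimal in-memory sketch of claiming a job before running it. This is an illustration of the general technique, not Job Stash's actual implementation: `claimJob` and the `lockedBy` field are hypothetical. With a real MongoDB collection the check and the write would have to happen in a single atomic operation, for example a `findOneAndUpdate` that filters on the unlocked state.

```javascript
// In-memory stand-in for the jobs collection (hypothetical document shape).
const jobs = new Map([['123', { jobId: '123', lockedBy: null }]]);

// Hypothetical helper: a worker may run the job only if it wins the claim.
// In MongoDB this check-and-set must be one atomic operation,
// e.g. findOneAndUpdate with a filter on lockedBy: null.
function claimJob(jobId, workerId) {
  const job = jobs.get(jobId);
  if (!job || job.lockedBy !== null) return false; // missing or already claimed
  job.lockedBy = workerId;
  return true;
}

// Two machines race for the same job; only the first claim succeeds.
const results = ['machine-a', 'machine-b'].map((id) => claimJob('123', id));
console.log(results); // [ true, false ]
```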

Tutorial on How to Schedule Jobs in Node.js using Job Stash

Step 1 - Install it in your project

npm i job-stash

Step 2 - Initialise the Scheduler

The scheduler has to be initialised with DB credentials and other options.

Here I am reusing the DB connection from Mongoose and setting useLock to true.

Setting useLock to true ensures that if you are running a service on multiple machines, the job is executed on only one of them.

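import mongoose from 'mongoose';
import { Scheduler } from 'job-stash';

// Assumes mongoose.connect(...) has already resolved, so connection.db is available.
await Scheduler.init({ mongo: mongoose.connection.db }, { useLock: true });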

/* 
If you don't already have a mongodb connection running
use the below snippet.

await Scheduler.init(
  {
    db: {
      address: mongoConnectionString,
      collection: 'jobs',
      name: 'evaluation_service',
    },
  },
  { useLock: true }
);
*/

Let's talk about the callback function, which decides what to do when the job is executed.

I have a common function that decides which function to call based on a metadata field called operation.

Example:

If my operation is equal to publishReport, I want to call publishReport().

async function decideCallback(job) {
  const { jobId, operation, activityType } = job;
  if (operation === "publishReport") {
    await publishReport(jobId);
  }
  else if (operation === "gradeZero" && activityType === "quiz") {
    await gradeZero(jobId);
    await createQuizReport(jobId);
  }
  else if (operation === "gradeZero") {
    await gradeZero(jobId);
  }
}

If you notice, the decideCallback function has access to a job object which includes jobId and other metadata that you store.

The Scheduler binds the job data inside any callback you pass to it when scheduling or rescheduling jobs.
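
The binding can be pictured as a small closure. This is a sketch of the idea only, not the package's source: `bindJob` is a hypothetical name, and the flat job shape (metadata spread next to jobId and dateToRunOn) is inferred from how decideCallback destructures its argument.

```javascript
// Hypothetical helper: wrap a callback so it always receives the job data
// as its first argument, no matter who triggers it later.
function bindJob(callback, jobId, dateToRunOn, metadata = {}) {
  const job = { jobId, dateToRunOn, ...metadata };
  return () => callback(job);
}

const seen = [];
const run = bindJob(
  (job) => seen.push(job.operation),
  '123',
  '2023-11-13T07:44:06.191Z',
  { operation: 'publishReport' }
);
run();
console.log(seen); // [ 'publishReport' ]
```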

Step 3 - Reschedule Jobs From Disk to Memory on App Start

After initialising the Scheduler, I use the decideCallback function to reschedule all my tasks whenever my Express app restarts.

await Scheduler.init({ mongo: mongoose.connection.db }, { useLock: true });
await Scheduler.rescheduleJobs(decideCallback);

Scheduler.rescheduleJobs internally fetches all the active jobs and passes each job's data to the decideCallback function.
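
Conceptually, the restart step is a loop over the stored jobs. The sketch below uses an in-memory array instead of MongoDB and is not the package's code: `restoreJobs`, the `status` field and the injected `scheduleAt(date, fn)` are all hypothetical.

```javascript
// Hypothetical restore step: put every active stored job back on a timer,
// handing the stored job data to the same callback.
function restoreJobs(storedJobs, callback, scheduleAt) {
  const active = storedJobs.filter((job) => job.status === 'active');
  for (const job of active) {
    scheduleAt(new Date(job.dateToRunOn), () => callback(job));
  }
  return active.length; // number of jobs put back in memory
}
```

What to do with jobs whose date passed while the app was down is a policy choice; this sketch leaves it to `scheduleAt`.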

You can use the same decideCallback function to schedule a new job too.

How to Schedule A New Job

It's a little weird to cover rescheduling before scheduling a new job, but it mirrors how you use this package in your app: you initialise it, you reschedule all the current jobs from disk to memory, and then you schedule new jobs in response to an API request or event.

Here is how you schedule a new job:

// decideCallback has access to jobId, dateToRunOn and metadata if any.
// the Scheduler binds this data when scheduling or rescheduling jobs.
async function decideCallback(job) {
  const { jobId, dateToRunOn, operation } = job;
  if (operation === "publishReport") {
    await publishReport(jobId, dateToRunOn);
  }
}

async function scheduleJob(jobId, dateToRunOn, metadata) {
  await Scheduler.scheduleJob(decideCallback, dateToRunOn, jobId, metadata);
}

await scheduleJob(
  '123', // jobId
  '2023-11-13T07:44:06.191Z', // dateToRunOn
  { operation: 'publishReport' }, // metadata
);

When the callback (decideCallback in our case) is executed, the Scheduler calls it with a single job object. That means jobId, dateToRunOn and all the other metadata are accessible inside decideCallback through the first argument.

The Scheduler.scheduleJob function expects a callback, dateToRunOn, jobId and metadata.

metadata and jobId are optional:

  1. metadata - This is any information you want to save in the database or use to choose which function or API to call when the job runs.

    In our case, I am storing a field called operation that will help me decide which function to call when the job is executed.

  2. jobId - If you don't pass a jobId then the package creates a unique UUID.

    Scheduler.scheduleJob returns a job object with a method called getJobId.

    You can use it to store the jobId, which you will need later if you want to cancel the job.

    Here is how you cancel a job:

await Scheduler.cancelJob(jobId);

How to Update a Job

Maybe you want to change the date of execution or add some new metadata to a job.

There is an inbuilt method that lets you update a job.

It accepts the same arguments as Scheduler.scheduleJob (note that jobId comes first here). The only difference is that it cancels the currently scheduled job and schedules a new one.

await Scheduler.updateJob(jobId, callback, dateToRunOn, metadata)
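
The cancel-then-schedule behaviour can be sketched with plain timers. This is an in-memory illustration only (no persistence, no locking, and only suitable for dates less than about 24 days away); the `schedule`, `cancel` and `update` functions and the `timers` map are hypothetical and are not part of Job Stash.

```javascript
const timers = new Map(); // jobId -> timer handle (in-memory only)

function schedule(jobId, callback, runAt, metadata = {}) {
  const delay = Math.max(runAt.getTime() - Date.now(), 0);
  const handle = setTimeout(() => {
    timers.delete(jobId);
    callback({ jobId, dateToRunOn: runAt, ...metadata });
  }, delay);
  timers.set(jobId, handle);
}

function cancel(jobId) {
  clearTimeout(timers.get(jobId)); // no-op if the job is unknown
  return timers.delete(jobId);     // true if a job was actually removed
}

// Updating is just cancelling the old timer and scheduling a new one
// under the same jobId.
function update(jobId, callback, runAt, metadata) {
  cancel(jobId);
  schedule(jobId, callback, runAt, metadata);
}
```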

Why not use a Managed Service like EventBridge?

It's not that I don't want to use EventBridge. In fact, I use it for a couple of cron-style APIs that have to run at fixed intervals, like recalculating analytics for all the links on tapthe.link every 10 minutes.

I also agree that managed services are great.

However, the use case for scheduling reports was a little different in my case.

Here are my requirements:

  • Student reports are usually published at the end of the term, which means long-lived jobs.

  • Jobs had to run within 5 minutes of their scheduled time.

  • I won't have more than 5000 active jobs at any given time.

I just didn't want to poll my DB every 2-5 minutes through a cron expression because I know that there won't be any new job the majority of the time.

The job should be like an event.

an event occurred -> timestamp was hit -> Job has to be executed.

That's why I liked the idea of using a date-based scheduler instead of polling the DB. There is an internal clock in the app that knows when to execute a job.
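
The "internal clock" idea can be sketched with a plain timer. This illustrates date-based scheduling in general, not how Job Stash is implemented; `runAtDate` and `nextDelay` are hypothetical names. One detail matters for long-lived jobs: Node's `setTimeout` only accepts delays up to 2^31 - 1 ms (about 24.8 days), so a date further out has to be reached in several hops.

```javascript
const MAX_DELAY_MS = 2 ** 31 - 1; // largest delay setTimeout accepts (~24.8 days)

// Hypothetical helper: how long to sleep before looking at the clock again.
function nextDelay(runAtMs, nowMs = Date.now()) {
  return Math.min(Math.max(runAtMs - nowMs, 0), MAX_DELAY_MS);
}

// Run `callback` once `runAt` is reached, re-arming the timer in chunks
// when the date is further away than a single setTimeout can cover.
function runAtDate(runAt, callback) {
  const remaining = runAt.getTime() - Date.now();
  if (remaining <= 0) return callback();
  setTimeout(() => runAtDate(runAt, callback), nextDelay(runAt.getTime()));
}
```

Nothing runs between hops, which is the appeal over polling: the process only wakes up when a job is due or a hop expires.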

Current Limitations of Job Stash

  • It requires MongoDB. If MongoDB isn't your primary database, it doesn't make sense to introduce another DB just to support a scheduler. If you plan to run the scheduler as a separate service, it's still okay to have MongoDB just for that (even the free plan is more than enough).

  • It doesn't support cron jobs yet. Enabling this isn't hard; I just haven't worked on it.

    For now, all my cron-related things run on EventBridge. There is a hacky way to make interval-based jobs work in the current setup too: schedule a new job whenever the current job is executed. I haven't built that yet. I will soon, or you can build it and contribute.
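
Here is a rough sketch of that interval workaround. It is untested against the real package: `makeRecurring` is a hypothetical helper, and `scheduleJob` is injected so the sketch runs without a database. In an app you would pass something like `(cb, date, id, meta) => Scheduler.scheduleJob(cb, date, id, meta)`, and passing `undefined` as the jobId to get a generated one is an assumption.

```javascript
// Hypothetical wrapper: after each successful run, queue the next run
// `intervalMs` later using the same callback and metadata.
function makeRecurring(task, intervalMs, scheduleJob) {
  return async function recurring(job) {
    await task(job);
    // Carry the metadata over, but let the next run get its own id and date.
    const { jobId, dateToRunOn, ...metadata } = job;
    const next = new Date(Date.now() + intervalMs);
    await scheduleJob(recurring, next, undefined, metadata);
  };
}
```

Because rescheduling on restart goes through the single callback you pass to Scheduler.rescheduleJobs, the "queue the next run" step would in practice have to live inside decideCallback for the chain to survive a restart.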

Why does Job Stash even exist?

There is a plethora of packages and ways to schedule jobs. The Hacker News community loves using systemd, but that involves a lot of boilerplate and research.

It's just a pain.

Job Stash exists for developers who have a Node.js codebase with MongoDB as their database and want to get a scheduler up and running within 5 minutes.

Job Stash exists for developers:

  1. Who have a Node.js codebase

  2. Who have MongoDB as their database

  3. Who like their machines to be scaled horizontally and want only one of them to execute a given job

  4. Who want easy rescheduling of jobs on restart

  5. Who schedule a medium scale of 50K-100K jobs

  6. Who want to schedule a job on a specific date and time instead of polling the database every X minutes

Do let me know what you think in the comment section.

Shout out to Prajwal, who reviewed the code and motivated me to build this.