- `publishOnce()` updated to fall back to the queue if the key argument is missing.
- Upgraded uuid dependency to version 8.0.0.
- Retention policies added for internal maintenance queues to reduce the number of records in the job table.
- Fixed issue in some multi-master use cases where too many maintenance jobs were being created.
- Changed `deleteQueue(name)` and `deleteAllQueues()` behavior to only impact pending queue items and not delete completed or active jobs.
- Added `getQueueSize(name)` to retrieve the current size of a queue.
- Added `clearStorage()` as a utility function if and when needed to empty all job storage, archive included.
- Restored older schema migrations to allow upgrading directly to version 4 from version 1.1 and higher.
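To illustrate the new `deleteQueue()` semantics, here is a minimal in-memory sketch. The real implementation runs SQL against the job table; the `jobs` array shape and helper names below are hypothetical, used only to show which job states are affected.

```javascript
// Illustrative only: an in-memory model of the new deleteQueue() semantics.
// Only pending items (not yet fetched) are removed; active and completed jobs survive.
const PENDING_STATES = ['created', 'retry'];

function deleteQueue(jobs, name) {
  return jobs.filter(job => job.name !== name || !PENDING_STATES.includes(job.state));
}

function getQueueSize(jobs, name) {
  // Queue size counts jobs still waiting to be fetched
  return jobs.filter(job => job.name === name && job.state === 'created').length;
}
```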
- Upgraded pg dependency to version 8.0.0.
- Restored BYODB support for Knex.js
- `start()` is now fully multi-master ready and supported for installation, schema migrations and maintenance operations.
- Added default configurations. The following options can now be set in the constructor and will apply to all usages of `publish()` or `subscribe()` on the instance unless overridden on the functions themselves.
  - Subscribe
    - Polling interval
  - Publish
    - Expiration
    - Retries
    - Retention (new)
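The constructor-default cascade described above can be sketched as a simple shallow merge, where per-call options win over instance-wide defaults. This is just the merge policy as described, not pg-boss source, and the option values shown are arbitrary examples.

```javascript
// Hypothetical sketch: constructor defaults apply to every publish()/subscribe()
// call on the instance unless overridden per call.
const constructorDefaults = { expireInMinutes: 15, retryLimit: 2, newJobCheckIntervalSeconds: 2 };

function resolveOptions(callOptions = {}) {
  // per-call options take precedence over instance-wide defaults
  return { ...constructorDefaults, ...callOptions };
}
```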
- MAJOR: Replaced expiration pg interval string configuration in `publish()` with specific integer options for better validation and API consistency. If the `expireIn` option is detected after upgrading, you will see a warning such as the following, which will only be emitted once per instance. As mentioned above, all of these options can become defaults if used in the constructor configuration.

  `(node:1) [pg-boss-w01] Warning: 'expireIn' option detected. This option has been removed. Use expireInSeconds, expireInMinutes or expireInHours`

  - Removed: `expireIn`
  - Added: `expireInSeconds`, `expireInMinutes`, `expireInHours`
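A sketch of how the new integer expiration options might be normalized, including the one-time warning for the removed `expireIn` option. The function name and the assumed 15-minute default are hypothetical; only the option names come from the changelog above.

```javascript
// Hypothetical normalization of expiration options to seconds.
let warned = false;

function expirationSeconds(options) {
  if ('expireIn' in options && !warned) {
    warned = true; // the warning is emitted only once per instance
    console.warn("'expireIn' option detected. This option has been removed. " +
      'Use expireInSeconds, expireInMinutes or expireInHours');
  }
  if (Number.isInteger(options.expireInSeconds)) return options.expireInSeconds;
  if (Number.isInteger(options.expireInMinutes)) return options.expireInMinutes * 60;
  if (Number.isInteger(options.expireInHours)) return options.expireInHours * 3600;
  return 15 * 60; // assumed default of 15 minutes
}
```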
- MAJOR: Added retention policies for created jobs. In v3, maintenance operations archived completed jobs, but this policy ignored jobs which were created and never fetched.
  - Added the following configuration options to `publish()` and `new PgBoss()`: `retentionSeconds`, `retentionMinutes`, `retentionHours`, `retentionDays`
- MAJOR: Replaced maintenance pg interval string configurations with specific integer options for better validation and API consistency.
  - Removed: `deleteArchivedJobsEvery`, `archiveCompletedJobsEvery`
  - Added: `archiveIntervalSeconds`, `archiveIntervalMinutes`, `archiveIntervalHours`, `archiveIntervalDays`, `deleteIntervalSeconds`, `deleteIntervalMinutes`, `deleteIntervalHours`, `deleteIntervalDays`
- MAJOR: Consolidated the maintenance constructor options and removed any options for intervals less than 1 second.
  - Removed: `expireCheckInterval`, `expireCheckIntervalSeconds`, `expireCheckIntervalMinutes`, `archiveCheckInterval`, `archiveCheckIntervalSeconds`, `archiveCheckIntervalMinutes`, `deleteCheckInterval`
  - Added: `maintenanceIntervalSeconds`, `maintenanceIntervalMinutes`
- MAJOR: Split static `getMigrationPlans()` function into 2 functions for clarity.
  - Removed: `uninstall` argument from `getMigrationPlans(schema, version)`
  - Added: `getRollbackPlans(schema, version)`
- MAJOR: Removed pgcrypto from installation script.
The breaking changes introduced in this release should not cause any run-time failures, as they are focused on maintenance and operations. However, if you use the deferred publishing options, read the section below regarding retention policy changes, as this version will now archive jobs which have been created but never fetched.
This release was originally started to support rolling deployments where a new instance was being started before another instance was turned off in a container orchestration system. When this happened, a race condition sometimes occurred between maintenance operations, causing unpredictable deadlock errors (see issue #133). This was primarily because of the use of unordered data sets in CTEs from a `DELETE ... RETURNING` statement. However, instead of focusing on the SQL itself, the concurrency problem proved a far superior use case to resolve holistically, and this became a perfect example of pg-boss eating its own dog food via a dedicated maintenance queue (mentioned below).
The result of using a queue for maintenance instead of timers such as `setTimeout()` is the same distributed concurrency benefit of using queues for other workloads. This is sometimes referred to as a multi-master configuration, where more than one instance is using `start()` simultaneously. If and when this occurs in your environment, only one of them will be able to fetch a job (maintenance or state monitoring) and issue the related SQL commands.
Additionally, all schema operations, both first-time provisioning and migrations, are nested within advisory locks to prevent race conditions during `start()`. Internally, these locks are created using `pg_advisory_xact_lock()`, which auto-unlocks at the end of the transaction and doesn't require a persistent session or the need to issue an unlock. This should make it compatible with most connection poolers, such as pgBouncer in transactional pooling mode.
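As a sketch of the pattern described above, here is how a migration might be wrapped in a transaction-scoped advisory lock, assuming a node-postgres style client. The function name, lock key, and migration SQL are hypothetical; `pg_advisory_xact_lock()` itself is the real PostgreSQL function and releases automatically at `COMMIT` or `ROLLBACK`.

```javascript
// Sketch: run a migration under a transaction-level advisory lock so that
// concurrent start() calls from multiple instances cannot race each other.
async function migrateWithLock(client, lockKey, migrationSql) {
  await client.query('BEGIN');
  try {
    await client.query('SELECT pg_advisory_xact_lock($1)', [lockKey]);
    await client.query(migrationSql);
    await client.query('COMMIT'); // lock auto-released here
  } catch (err) {
    await client.query('ROLLBACK'); // lock auto-released here too
    throw err;
  }
}
```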
One example of how this is useful would be including `start()` inside the bootstrapping of a pod in a ReplicaSet in Kubernetes. Being able to scale up your job processing using a container orchestration tool like k8s is becoming more popular, and pg-boss can be dropped into this system with no additional code or special configuration.
As mentioned above, previously only completed jobs were included in the archive maintenance, but with one exception: completion jobs were also moved to the archive even though they were in `created` state. This would sometimes result in missed jobs if an `onComplete` subscription were to reach a backlogged state that couldn't keep up with the configured archive interval.
A new set of retention options (listed above) has been added which controls how long any job may exist in `created` state, whether original or completion. Currently, the default retention is 30 days, but even if it's customized it automatically carries over to the associated completion job as well.
Furthermore, this retention policy is aware of any deferred jobs, such as those created with `publishAfter()`. If you have future-dated or interval-deferred jobs, the retention policy start date is internally based on the deferred date, not the created timestamp.
If you're upgrading from v3, a migration script will run and set the retention date on all jobs found in 'created' state. For example, if you use the option `retentionDays: 7` in the constructor, then run `start()`, the migration will assign a retention date of 7 days after the created or deferred date, whichever is later.
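The "whichever is later" rule above can be sketched as a small date computation. The function and property names (`createdOn`, `startAfter`) are illustrative, not pg-boss internals.

```javascript
// Sketch: retention is measured from the later of the created date and any
// deferred start date, then extended by the configured number of days.
function retentionDate(job, retentionDays) {
  const createdOn = new Date(job.createdOn);
  const startAfter = job.startAfter ? new Date(job.startAfter) : createdOn;
  const base = startAfter > createdOn ? startAfter : createdOn;
  return new Date(base.getTime() + retentionDays * 24 * 60 * 60 * 1000);
}
```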
To keep maintenance overhead as light as possible, the concurrency of each task (expiration, archiving, deletion) has been adjusted to one operation at a time and placed into dedicated queues prefixed with `'__pgboss__'`. The same was also done for the optional state count monitoring.
If you were running pg-boss as a superuser account in production to have it auto-provision the pgcrypto extension in a new database, this change might be viewed as a disadvantage. The primary principle at play in this decision is "It should be simple to uninstall anything which was installed". Adding an extension to a database cannot be scoped to a schema, and it requires superuser privilege. If pg-boss were to install pgcrypto, it would be unsafe to assume it could be later removed, as it may be in use elsewhere. Also, having a script embedded in the installation which requires superuser privilege sends the wrong message about how applications should be configured in production, where a least-privilege model should always be used. As a reminder, below is a simple one-liner to run in your database if it's not already installed. If you are upgrading pg-boss from a previous version, this is obviously not an issue.
CREATE EXTENSION pgcrypto;
- Deferring housekeeping operations on start to reduce deadlocks during concurrent `start()` instances.
- Fixed rare deadlocks by stacking housekeeping operations one at a time during `start()`.
- Added `archive()`, `purge()` and `expire()` to exports for manual housekeeping if desired, along with `connect()`. Use these only if you need them for special cases, as it's not a good idea to run them in parallel (see deadlock comment above).
- Added index to archive table by date to improve housekeeping performance.
- Node 8 is now officially the minimum supported version. Not only have I stopped testing anything lower than 8 in Travis, but I finally migrated to async await in this round of changes.
- Typescript type defs.
- Typescript type defs for singletonNextSlot config updated via PR.
- Typescript type defs for deletion config updated via PR.
- Typescript type defs updated for job priority via PR.
- Set default `teamConcurrency` to 1 when `teamSize` > 1.
- Typescript type defs updated for static function exports via PR.
- Added support for typeorm with job insertion script via PR.
- Prevented duplicate state completion jobs being created from an expired onComplete() subscription.
- Typescript defs patch
- Added wildcard pattern matching for subscriptions. This allows you to have 1 subscription over many queues. For example, the following subscription uses the `*` placeholder to fetch completed jobs from all queues that start with the text `sensor-report-`.

  `boss.onComplete('sensor-report-*', processSensorReport);`

  Wildcards may be placed anywhere in the queue name. The motivation for this feature is adding the capability for an orchestration to use a single subscription to listen to potentially thousands of job processors that each have just 1 thing to do via isolated queues.
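One plausible way to implement the wildcard match described above: escape regex metacharacters in the pattern, then turn each `*` into `.*`. This mirrors the documented behavior but is not necessarily the library's exact implementation.

```javascript
// Sketch: convert a queue-name pattern with '*' wildcards into an anchored regex.
function queueMatches(pattern, queueName) {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp('^' + escaped.replace(/\\\*/g, '.*') + '$');
  return regex.test(queueName);
}
```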
- Multiple subscriptions to the same queue are now allowed on the same instance. Previously, an error was thrown when attempting to subscribe to the same queue more than once on the same instance. This was merely an internal concern with worker tracking. Since `teamConcurrency` was introduced in 3.0, it blocks polling until the last job in a batch is completed, which may have the side effect of slowing down queue operations if one job is taking a long time to complete. Being able to have multiple subscriptions isn't necessarily something I'd advertise as a feature, but it's something easy I can offer until implementing a more elaborate producer-consumer queue pattern that monitors its promises. Remember that `subscribe()` is intended to provide a nice abstraction over `fetch()` and `complete()`, which are always there if and when you have a use case that `subscribe()` cannot provide.
- Internal state job suffixes are now prefixes. The following shows a comparison of completed state jobs for the queue `some-job`.
  - 3.0: `some-job__state__completed`
  - 3.1: `__state__completed__some-job`

  This is an internal implementation detail included here in case you have any custom queries written against the job tables. The migration will handle this for the job table (the archive will remain as-is).
- Removed connection string parsing and validation. The pg module bundles pg-connection-string which supports everything I was trying to do previously with connection strings. This resolves some existing issues related to conditional connection arguments as well as allowing auto-promotion of any future enhancements that may be provided by these libraries.
- Retry support added for failed jobs! Pretty much the #1 feature request of all time.
- Retry delay and backoff options added! Expired and failed jobs can now delay a retry by a fixed time, or even a jittered exponential backoff.
  - New publish options: `retryDelay` (int) and `retryBackoff` (bool). `retryBackoff` will use an exponential backoff algorithm with jitter to somewhat randomize the distribution. Inspired by Marc on the AWS blog post Exponential Backoff and Jitter.
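In the spirit of the AWS post referenced above, here is a minimal "full jitter" sketch of exponential backoff. The base and cap values are hypothetical parameters; the actual delay formula used by `retryBackoff` may differ.

```javascript
// Sketch of exponential backoff with full jitter: the delay is drawn
// uniformly between 0 and an exponentially growing (but capped) ceiling.
function backoffDelay(retryCount, baseSeconds = 1, capSeconds = 3600) {
  const ceiling = Math.min(capSeconds, baseSeconds * 2 ** retryCount);
  return Math.random() * ceiling;
}
```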
- Backpressure support added to `subscribe()`! If your callback returns a promise, it will defer polling and other callbacks until it resolves.
  - Returning a promise replaces the need to use the `job.done()` callback, as this will be handled automatically. Any errors thrown will also automatically fail the job.
  - A new option, `teamConcurrency`, was added that can be used along with `teamSize` for single-job callbacks to control backpressure if a promise is returned.
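The promise contract above can be sketched with an illustrative wrapper: if the handler's promise resolves, the job is completed automatically; if it rejects, the job is failed. This is not the library's code, just the described behavior.

```javascript
// Sketch: a promise-returning handler never needs to call job.done() itself.
async function runHandler(job, handler, complete, fail) {
  try {
    const result = await handler(job);
    await complete(job.id, result); // resolved value becomes the completion result
  } catch (err) {
    await fail(job.id, err); // thrown errors automatically fail the job
  }
}
```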
- `subscribe()` will now return an array of jobs all at once when `batchSize` is specified.
- `fetch()` now returns jobs with a convenience `job.done()` function like `subscribe()`.
- Reduced polling load by consolidating all state-based completion subscriptions to `onComplete()`.
  - Want to know if the job failed? `job.data.failed` will be true.
  - Want to know if the job expired? `job.data.state` will be `'expired'`.
  - Want to avoid hard-coding that constant? All state names are now exported in the root module and can be required as needed, like in the following example.

    ```js
    const { states } = require('pg-boss');

    if (job.data.state === states.expired) {
      console.log(`job ${job.data.request.id} in queue ${job.data.request.name} expired`);
      console.log(`createdOn: ${job.data.createdOn}`);
      console.log(`startedOn: ${job.data.startedOn}`);
      console.log(`expiredOn: ${job.data.completedOn}`);
      console.log(`retryCount: ${job.data.retryCount}`);
    }
    ```
- Batch failure and completion now create completed state jobs for `onComplete()`. Previously, if you called complete or fail with an array of job IDs, no state jobs were created.
- Added convenience publish functions that set different configuration options:
  - `publishThrottled(name, data, options, seconds, key)`
  - `publishDebounced(name, data, options, seconds, key)`
  - `publishAfter(name, data, options, seconds | ISO date string | Date)`
  - `publishOnce(name, data, options, key)`
- Added `deleteQueue()` and `deleteAllQueues()` API to clear queues if and when needed.
- Removed all events that emitted jobs, such as `failed`, `expired-job`, and `job`, as these were all instance-bound and pre-dated the distribution-friendly `onComplete()`.
- Removed extra convenience `done()` argument in the `subscribe()` callback in favor of consolidating to `job.done()`.
- Renamed the `expired-count` event to `expired`.
- Failure and completion results are now wrapped in an object with a `value` property if they're not an object.
- `subscribe()` with a `batchSize` property now runs the callback only once with an array of jobs. The `teamSize` option still calls back once per job.
- Removed `onFail()`, `offFail()`, `onExpire()`, `offExpire()`, `fetchFailed()` and `fetchExpired()`. All job completion subscriptions should now use `onComplete()`, and fetching is consolidated to `fetchCompleted()`. In order to determine how the job completed, additional helpful properties have been added to `data` on completed jobs, such as `state` and `failed`.
- The `startIn` option has been renamed to `startAfter` to make its behavior more clear. Previously, this value accepted an integer for the number of seconds of delay, or a PostgreSQL interval string. The interval string has been replaced with a UTC ISO date time string (must end in Z), or you can pass a Date object.
- The `singletonDays` option has been removed.
- Dropping node 4 support. All tests in 3.0 have passed in CI on node 4, but during release I removed the Travis CI config for it, so future releases may not work.
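The accepted `startAfter` shapes described above can be sketched as one normalization function. The function name and the "reject non-Z strings" check are assumptions for illustration; only the three accepted input shapes come from the changelog.

```javascript
// Hypothetical sketch: an integer means "seconds from now", a string must be
// a UTC ISO date ending in Z, and a Date object is used as-is.
function resolveStartAfter(startAfter, now = new Date()) {
  if (typeof startAfter === 'number') {
    return new Date(now.getTime() + startAfter * 1000);
  }
  if (typeof startAfter === 'string') {
    if (!startAfter.endsWith('Z')) {
      throw new Error('startAfter string must be a UTC ISO date ending in Z');
    }
    return new Date(startAfter);
  }
  if (startAfter instanceof Date) return startAfter;
  return now; // no deferral requested
}
```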
- The pgcrypto extension is now used internally for uuid generation with onComplete(). It will be added in the database if it's not already added.
- Adjusted indexes to help with fetch performance
- Errors thrown in job handlers will now correctly serialize into the response property of the completion job.
- Typescript defs patch
- Added `max` constructor option in addition to `poolSize`.
- Migration: use pg transaction to avoid inconsistency
- Archive: Existing archive configuration settings now apply to moving jobs into a new table `archive` instead of immediate deletion. This allows the concerns of job indexing and job retention to be separated.
- Archive: `deleteArchivedJobsEvery` and `deleteCheckInterval` settings added for defining job retention. The default retention interval is 7 days.
- Archive: Changed default archive interval to 1 hour from 1 day.
- Monitoring: Updated contract for the `monitor-states` event to add counts by queue, not just totals.
- Monitoring: Adjusted queue size counting to exclude state-based jobs. While these were technically correct in regards to physical record count, it was a bit too difficult to explain.
- Downgraded bluebird to a dev dependency. Always nice to have 1 less dependency.
- Typescript defs patch
- Typescript defs patch
- Typescript defs patch
- Added constructor option `db` for using an external/existing database connection. This bypasses having to create an additional connection pool.
- Patch to prevent state transition jobs from being created from existing state transition jobs. Kind of meta. These were unfetchable and therefore just clutter.
- Patch to allow custom schema name with a connectionString constructor option.
- Patch to fix missing error on the `failed` event, via PR #37.
- Patch to fix typescript types path
- Typescript defs
- Patched pg driver to 7.1
- Upgrade pg driver to 7.0
- Added state transition jobs and API for orchestration/saga support.
- Added job fetch batching.
- Added `onExpire(jobName, callback)` for guaranteed handling of expiration (not just an event anymore).
- `failed` was added as a job status.
- Now emits 'failed' on unhandled subscriber errors instead of 'error', which is far safer.
- `done()` in `subscribe()` callbacks now supports passing an error (the popular node convention) to automatically mark the job as failed as well as emitting failed. For example, if you are processing a job and you want to explicitly mark it as failed, you can just call `done(error)` at any time.
- `fail(jobId)` added for external failure reporting, along with `fetch()` and `complete()`.
- `unsubscribe(jobName)` added to undo a `subscribe()`.
- Dropped support for node 0.10 and 0.12
- Added new publish option `singletonKey` in order to make sure only 1 job of a certain type is active, queued or in a retry state.
- Added new publish option `singletonNextSlot` in order to make sure a job is processed eventually, even if it was throttled down (not accepted). Basically, this is debouncing with a lousy name, because I'm not very good at naming things and didn't realize it at the time.
- Added `newJobCheckInterval` and `newJobCheckIntervalSeconds` to `subscribe()` instead of just in the constructor.
- Added `poolSize` constructor option to explicitly control the maximum number of connections that can be used against the specified database.
- 0.x had a data management bug which caused expired jobs to not be archived and remain in the job table. I also added a fix to the migration script, so if you had any old expired jobs they should be automatically archived.
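The throttling idea behind `singletonKey` time slots can be sketched as dividing time into fixed windows: one job per key per window is accepted, and `singletonNextSlot` moves a rejected publish into the following window instead of dropping it. The function below is purely illustrative, not the library's implementation.

```javascript
// Sketch: compute the throttle slot for a publish. Two publishes with the
// same key and slot collide; singletonNextSlot retargets the next slot.
function throttleSlot(nowSeconds, slotSeconds, nextSlot = false) {
  const slot = Math.floor(nowSeconds / slotSeconds);
  return nextSlot ? slot + 1 : slot;
}
```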
- Error handling in subscriber functions! Previously I've encouraged folks to handle their own errors with try-catch and be as defensive as possible in callback functions passed to `subscribe()`. However, it was too easy to miss that at times, and if an error occurred that wasn't caught, it had the pretty lousy side effect of halting all job processing. 1.0.0 now wraps all subscriber functions in try-catch blocks and emits the 'error' event if one is encountered.