2018/02/16

Redesigned jBPM executor

The jBPM executor is the backbone of asynchronous execution in jBPM. This applies to so-called async continuation (when a given activity is marked as isAsync), async work item handlers, and standalone jobs.

Currently, the jBPM executor has two mechanisms to trigger execution:

  • a polling mechanism that is available in all environments 
  • a JMS-based mechanism that is only available in a JEE environment with a configured queue

The JMS part was left as is because it proved to be extremely efficient and performs far better than the polling one. Worth mentioning is that the JMS-based mechanism only applies to immediate jobs - retries will always be processed by the polling mechanism.

On the other hand, the polling-based mechanism is not really efficient and in some cases (like cloud deployments where the charging model is pay-as-you-go) can cost more, due to periodic queries that check for jobs to execute even when there are none. In addition, with a high volume of jobs the polling mechanism suffers from race conditions between jBPM executor threads that constantly try to find a job to execute and may compete for the same job. To solve that, the jBPM executor uses pessimistic locks on its queries to make sure that only one instance (or thread) can fetch a given job, which in turn caused a bunch of issues with some databases.

All of this led to a redesign of the jBPM executor internals to make it more robust - not only in a JEE environment and not only for immediate jobs.

What has changed?

The most important change (from the user's point of view) is the meaning of one of the system properties used to control the jBPM executor:
  • org.kie.executor.interval 
This property used to refer to how often the polling thread was invoked to check for available jobs, and it was set to 3 (seconds) by default.
After the redesign, it defaults to 0 and refers to how often the executor should sync with the underlying database. It should only be set in a cluster setup where failover (execution of jobs from another instance) should be enabled.
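To give an idea of the two modes, here is a minimal sketch - the interval property name is taken from the description above, while the time unit property name is my assumption about how the unit would be configured:

public class ExecutorConfigExample {
    public static void main(String[] args) {
        // single node: no periodic DB sync needed (the new default)
        System.setProperty("org.kie.executor.interval", "0");

        // clustered setup: sync with the database every 10 seconds to pick up
        // jobs scheduled (or left behind) by other executor instances
        // System.setProperty("org.kie.executor.interval", "10");
        // System.setProperty("org.kie.executor.timeunit", "SECONDS"); // assumed property name
    }
}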

There is no longer an initial delay that used to let the executor postpone execution so that other parts of the environment could finish bootstrapping. Instead, the executor is started (initialised) only when all components have finished - in the context of KIE Server this means only when KIE Server is actually ready to serve requests.

There are no more polling threads (except the optional sync with the database) responsible for executing jobs. With that, all EJBs with asynchronous methods are gone too.


Implementation

So how does it actually work now? The diagram below shows the components (classes) involved, and the following is an explanation of how they interact.



ExecutorService is the entry point and the only class that user/client code interacts with. Whatever a client needs to do with the executor must go via the executor service.

ExecutorService delegates all scheduling-related operations to the executor (impl), like:
  • schedule jobs
  • cancel jobs
  • requeue jobs
Additionally, ExecutorService uses other services to deal with the persistent store, though this part has not changed.
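As an illustration, client code could talk to the executor roughly like this - a sketch only, where PrintOutCommand is one of the sample commands shipped with jBPM and obtaining the ExecutorService instance depends on your setup:

import java.util.Date;

import org.kie.api.executor.CommandContext;
import org.kie.api.executor.ExecutorService;

public class ScheduleJobExample {

    public void scheduleJobs(ExecutorService executorService) {
        CommandContext ctx = new CommandContext();
        ctx.setData("businessKey", "order-12345");

        // immediate job - picked up by JMS when available, otherwise by the thread pool
        Long immediateJobId = executorService.scheduleRequest(
                "org.jbpm.executor.commands.PrintOutCommand", ctx);

        // timed job - fires in one minute, handled by the thread pool executor
        Long timedJobId = executorService.scheduleRequest(
                "org.jbpm.executor.commands.PrintOutCommand",
                new Date(System.currentTimeMillis() + 60_000), ctx);

        // cancel the timed job before it fires
        executorService.cancelRequest(timedJobId);
    }
}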

ExecutorImpl is the main component of the jBPM executor and takes full responsibility for maintaining consistent scheduling of jobs. It embeds a special extension of ScheduledThreadPoolExecutor called PrioritisedScheduledThreadPoolExecutor. The main extension point is the overridden decorateTask method, used to enforce prioritisation of jobs that should fire at the same time.
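Below is a simplified sketch of that idea - not the actual jBPM implementation - assuming a simple Prioritised marker interface on the job runnables. decorateTask wraps each task so that, when two tasks are due at the same time, the one with the higher priority comes out of the queue first:

import java.util.concurrent.Delayed;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.RunnableScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PrioritisedScheduledThreadPoolExecutor extends ScheduledThreadPoolExecutor {

    // assumed marker interface: jobs expose a numeric priority (higher wins)
    public interface Prioritised {
        int getPriority();
    }

    public PrioritisedScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
        super(corePoolSize, threadFactory);
    }

    @Override
    protected <V> RunnableScheduledFuture<V> decorateTask(Runnable runnable, RunnableScheduledFuture<V> task) {
        int priority = (runnable instanceof Prioritised) ? ((Prioritised) runnable).getPriority() : 0;
        return new PrioritisedFuture<>(task, priority);
    }

    // wrapper that delegates everything, but compares by priority when fire times are equal
    private static class PrioritisedFuture<V> implements RunnableScheduledFuture<V> {
        private final RunnableScheduledFuture<V> delegate;
        private final int priority;

        PrioritisedFuture(RunnableScheduledFuture<V> delegate, int priority) {
            this.delegate = delegate;
            this.priority = priority;
        }

        @Override
        public int compareTo(Delayed other) {
            // compare fire times at millisecond granularity; on a tie, higher priority goes first
            long diff = this.getDelay(TimeUnit.MILLISECONDS) - other.getDelay(TimeUnit.MILLISECONDS);
            if (diff != 0) {
                return diff < 0 ? -1 : 1;
            }
            if (other instanceof PrioritisedFuture) {
                return Integer.compare(((PrioritisedFuture<?>) other).priority, this.priority);
            }
            return 0;
        }

        @Override public boolean isPeriodic() { return delegate.isPeriodic(); }
        @Override public void run() { delegate.run(); }
        @Override public boolean cancel(boolean mayInterruptIfRunning) { return delegate.cancel(mayInterruptIfRunning); }
        @Override public boolean isCancelled() { return delegate.isCancelled(); }
        @Override public boolean isDone() { return delegate.isDone(); }
        @Override public V get() throws InterruptedException, ExecutionException { return delegate.get(); }
        @Override public V get(long timeout, TimeUnit unit)
                throws InterruptedException, ExecutionException, TimeoutException {
            return delegate.get(timeout, unit);
        }
        @Override public long getDelay(TimeUnit unit) { return delegate.getDelay(unit); }
    }
}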

ExecutorImpl schedules the sync with the database (depending on the settings of the interval and time unit properties). As soon as it starts (is initialised), it loads all eligible jobs (those with queued or retrying status) and schedules them on the thread pool executor. At the same time it handles duplicates to avoid scheduling the same job multiple times. 
What is actually scheduled in the thread pool executor is a thin object (PrioritisedJobRunnable) that holds only three values:
  • id of the job
  • priority of the job
  • execution time (when it should fire)
Each job also holds a reference to AvailableJobProcessor, which is actually used to execute the given job.
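A hypothetical sketch of such a thin handle - not the actual jBPM class - could look like this; only the id, priority and fire time live in memory, while the full job data stays in the database:

public class PrioritisedJobRunnable implements Runnable {

    private final long jobId;
    private final int priority;
    private final long fireTime; // epoch millis

    public PrioritisedJobRunnable(long jobId, int priority, long fireTime) {
        this.jobId = jobId;
        this.priority = priority;
        this.fireTime = fireTime;
    }

    public long getJobId()    { return jobId; }
    public int getPriority()  { return priority; }
    public long getFireTime() { return fireTime; }

    @Override
    public void run() {
        // in the real implementation this delegates to AvailableJobProcessor,
        // which fetches the complete job by id and executes it
    }
}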

AvailableJobProcessor is pretty much the same as it was; its responsibility is to fetch the given job by id (this time the complete job with all its data) and execute it. It also handles exceptions and completion of the job by interacting with ExecutorImpl whenever needed. It uses a pessimistic lock when fetching a job, but it avoids any of the problematic constructs as it gets the job by id.

LoadJobsRunnable is a job, either one-time or periodic, that syncs the thread pool executor with the underlying database. In non-cluster environments it should run only once - on startup - and this is always the case regardless of the setting of the org.kie.executor.interval property. In a cluster environment, where multiple instances of the executor use the same database, the interval can be set to a positive integer to enable a periodic sync with the database. This provides failover between executor instances.
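The wiring could look roughly like the sketch below (again, an illustration rather than the actual jBPM code): the sync job always runs once on startup and only repeats when a positive interval is configured.

import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyncScheduling {

    public void scheduleSync(ScheduledExecutorService scheduler, Runnable loadJobsRunnable,
                             long interval, TimeUnit unit) {
        if (interval > 0) {
            // clustered setup: periodically reload eligible jobs so that jobs
            // scheduled (or left behind) by other instances are picked up
            scheduler.scheduleAtFixedRate(loadJobsRunnable, 0, interval, unit);
        } else {
            // single node: load queued/retrying jobs from the database once, on startup
            scheduler.execute(loadJobsRunnable);
        }
    }
}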


How jobs are managed and executed

On executor start, all jobs are always loaded, regardless of whether there are one or more instances in the environment. This makes sure all jobs will be executed, even when their fire time has already passed or they were scheduled by another executor instance.
Jobs are always stored in the database, no matter which trigger mechanism (JMS or thread pool) is used to execute them.

With JMS

Whenever JMS is available it will be used for immediate jobs, meaning they won't be scheduled in the thread pool executor; they will be executed directly via the JMS queue, just as they are handled in the current implementation. 

Without JMS

Jobs are stored in the database and scheduled in the thread pool executor. Scheduling takes place only after the transaction has committed successfully, to make sure the job is actually stored before any attempt to execute it.
Scheduling always takes place in the same JVM that handles the request. That means there is no load balancing: regardless of when the job should fire, it will fire in the same server (JVM). Failover only applies when the periodic sync is enabled.
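The commit-bound scheduling follows the standard JTA synchronization pattern; a sketch of that general pattern (not the jBPM code itself) is shown below, where the Runnable stands in for putting the thin job handle into the thread pool executor:

import javax.transaction.Status;
import javax.transaction.Synchronization;
import javax.transaction.TransactionSynchronizationRegistry;

public class ScheduleAfterCommit {

    public void scheduleWhenCommitted(TransactionSynchronizationRegistry tsr,
                                      Runnable scheduleInThreadPool) {
        tsr.registerInterposedSynchronization(new Synchronization() {
            @Override
            public void beforeCompletion() {
                // nothing to do before the commit
            }

            @Override
            public void afterCompletion(int status) {
                if (status == Status.STATUS_COMMITTED) {
                    // the job row is now visible in the database - safe to schedule it in memory
                    scheduleInThreadPool.run();
                }
            }
        });
    }
}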

The thread pool executor has a configurable ThreadFactory, and in a JEE environment it relies on a ManagedThreadFactory to allow access to application server components such as the transaction manager. The ManagedThreadFactory is configurable as well, so users can define their own thread factory in the application server instead of using the default one.
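For illustration, resolving the thread factory could be done along these lines; the JNDI name used here is the default ManagedThreadFactory defined by the Java EE Concurrency Utilities spec (a custom one configured in the application server would be looked up the same way), and outside a container a plain JDK thread factory is used instead:

import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ThreadFactoryResolver {

    public ThreadFactory resolve() {
        try {
            // ManagedThreadFactory implements ThreadFactory, so it can be used directly
            return (ThreadFactory) new InitialContext()
                    .lookup("java:comp/DefaultManagedThreadFactory");
        } catch (NamingException e) {
            // not running in a JEE container (e.g. Tomcat, Spring Boot)
            return Executors.defaultThreadFactory();
        }
    }
}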


Performance

The thread pool executor is extremely fast at executing jobs, but it is less efficient when scheduling them, as it must reorganise its internal queue of jobs. Obviously that depends on the size of the queue, but it's worth keeping in mind. 
Overall, the performance of this approach is far better than the polling mechanism, and it does not cause additional load on the database to periodically check for jobs.
Its performance is close to what JMS provides (at least with an embedded broker), so it is a really good alternative for non-JEE environments like servlet containers or Spring Boot.


Conclusion

The main conclusion is that this brings efficient background processing to every runtime environment that jBPM can be deployed to. Moreover, it reduces the load on the database and thereby, in some cases, reduces costs. 
When it comes to which approach to use, I'd say:
  • use JMS when possible for immediate jobs; in many cases it will be more efficient (especially with a big volume of jobs) and provides load balancing with a clustered JMS setup
  • use the thread pool executor only when JMS is not applicable - servlet containers, Spring Boot, etc.
The defaults for KIE Server are as described above: on JEE servers it relies on JMS, while on Tomcat or Spring Boot it uses only the thread pool executor. 

Either way, the performance of async job execution is now comparable.