This month, we look at the top eight questions about Camunda Platform 8 - as determined by the super-scientific rule of thumb of what people ask about in Meetups and on the Forum and Slack.
A number of these questions were discussed in a panel discussion at a US Camunda User Group Meetup with consultants from NTConsult. If you want to watch that, you can view a recording on YouTube.
Here are the top eight questions covered in this episode:
1. Camunda 8 relies on a remote engine paradigm instead of the embedded engine that is more common in Spring Boot implementations. What are the advantages of remote engines over embedded ones?
See also Bernd's blog post on this topic "Moving from Embedded to Remote Workflow Engines"
2. We know remote engines come with some limitations: we don’t have delegates, for instance. What strategies are recommended for overcoming them?
3. For Camunda 8 SaaS, Web Modeler is used to deploy processes, but for self-hosted, what is the best strategy for DevOps and project management? And are there plans to have Web Modeler for self-hosted?
4. Camunda 8 has three APIs in different technologies: gRPC, GraphQL, and REST. Each is responsible for part of the functionality that was provided by the REST API in Camunda 7.
Can you clarify the differences in scope of each of them?
5. What are the best alternatives for someone who wants to implement their own tasklist?
6. Why do I get RESOURCE_EXHAUSTED errors when I add workers / increase the maxJobsActive setting?
7. Why the change in licensing between Camunda 7 and Camunda 8, and what is it?
Here is Bernd's blog post "How Open is Camunda Platform 8?"
8. Implementing a cloud native application requires new roles and new capabilities within our teams. Can you describe the internal stack, and what skills we need to be leveraging not only to implement the projects (both SaaS and self-hosted), but also to operate Camunda once in production?
Welcome to this month’s episode of the Camunda Nation podcast. I’m your host - my name is Josh Wulf and I’m a Developer Advocate at Camunda.
This month, I’m answering the “Top 8 questions about Camunda Platform 8”.
I selected these eight questions from two sources: the forum and Slack, which are primarily people who are using Camunda Platform 8 and asking specific questions related to deployment and development; and from Meetups, where there are a lot of Camunda Platform 7 users asking about the differences between the two.
I also answered many of these questions as a panel talk at the Camunda User Group Meetup in the United States, with some of the folks at NTConsult. I’ll put a link to the video of that Meetup in the show notes, in case you want to check that out.
So without any further ado, let’s jump into the top 8 questions about Camunda 8.
[00:01:03] Question 1: Camunda 8 relies on a remote engine paradigm instead of the embedded engine that is more common in Spring Boot implementations. What are the advantages of remote engines over embedded ones?
Answer: A principal advantage of a remote engine is that it can be horizontally scaled across multiple nodes to take advantage of the hardware for failure tolerance and high performance. That’s one of the principal design goals for Camunda 8. It’s a radical re-imagining of the workflow engine to make it able to scale linearly to an extent that Camunda 7 cannot.
So if you really need an embedded engine, then by all means use Camunda 7. If you need the level of scalability that Camunda 8 gives you - then yes, that means that you need to use a remote engine architecture.
As well as the scalable architecture piece, we found that having multiple scenarios for development and deployment of the engine vastly increased the complexity of development and support for us.
With Camunda Platform 8, we opted to do one thing and do it well: build the world’s most scalable workflow engine.
So there are obvious advantages to an embedded engine: the resources are lower. The complexity of the system is lower. The number of teams involved in deploying and maintaining it is lower. It is effectively an application library, whereas the remote engine is a piece of infrastructure.
Camunda 8 is for when you need the scalability that makes those tradeoffs worthwhile.
[00:02:44] Question 2: We know remote engines come with some limitations: we don’t have delegates, for instance. What strategies are recommended for overcoming them?
What to do instead of listeners?
How to replicate the functionality we had with Global listeners and engine plugins?
The first thing here is to ask yourself the question: should I be doing this?
I’d say that you shouldn’t be doing this just because Camunda 8 is the new hotness and “we have to use the latest version”. You really want to be doing this because Camunda 8 provides something that Camunda 7 doesn’t - which really means the scalability piece.
If you wait long enough, Camunda 8 is going to approach closer and closer to feature parity with Camunda 7. At the moment, it’s a radically innovative platform that provides breakthrough scalability with a much smaller feature set than Camunda 7 - for example, fewer BPMN symbols are supported.
Also, listeners and engine plugins are not supported.
The entire architecture is built around the external task worker pattern. Everything - literally everything - is an external task. The engine itself is effectively a black box with a simple interface for deploying and starting processes and getting work for external workers.
This black box, however, does emit a stream of events. Or rather, it can, by loading an event exporter. You can write your own custom event exporters quite easily. These can filter on the events that you are interested in, and can emit them using the transport you want.
One approach to execution listeners, for example, would be to write an exporter that emits all events, and then to put your execution listener logic in something that reads that event stream. This is an event stream for the engine - not per-process-instance. So it has a global read on everything happening in the engine.
For task listeners, you would need to implement your own Tasklist, and fire the task lifecycle events from there.
So it is doable - with one caveat. You can only do this using Self-Managed, because you can’t load custom exporters in the SaaS offering.
But again - if you wait long enough, this functionality will eventually show up in Camunda 8. Or, you might go ahead and implement it, and open source it, and it gets rolled in from the community as the solution. Or you might go back to first principles, and have a look at, like: “What are actual primitives that are available - the building blocks that are available in Camunda Platform 8? And how do I reimagine this problem and solve it with the things that I have at hand?”
Changing the model, for example.
But again, it’s looking into the specifics of your situation: “is the additional complexity of writing event-streaming exporters and listeners worth it? Is there another way to do it? Or do we just stay with Camunda 7?”
[00:05:59] Question 3: For Camunda 8 SaaS, Web Modeler is used to deploy processes, but for self-hosted, what is the best strategy for DevOps and project management? And are there plans to have Web Modeler for self-hosted?
Another great question. This one’s very hot, this question. People love Web Modeler.
So, I don’t know that web modeler deploying processes is really the best strategy for DevOps, even in SaaS. It’s a great demo, and a great fast REPL cycle for development - for developing, deploying, and starting instances to test them. It’s great for collaboration - especially between distributed teams. But is it the best strategy for production? This is a different question.
Personally, for production, I would probably export the models to source control, and deploy them in a CI/CD pipeline from a tag. You can do that using Jenkins or even GitHub workflows - either using zbctl, or code you write in any of the client libraries. There is even a GitHub Action for Camunda 8 SaaS that can be used to deploy models on git push to a production deployment branch or matching on a tag.
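To make the CI/CD idea concrete, here is an illustrative GitHub Actions workflow that deploys models on a version tag. The secret names, the models path, and the exact zbctl invocation are assumptions - command and flag names vary between zbctl versions, so treat this as a sketch rather than a drop-in pipeline.

```yaml
# Illustrative only: deploy BPMN models to Camunda 8 SaaS when a v* tag is pushed.
# Secret names and paths are assumptions for this example.
name: deploy-models
on:
  push:
    tags:
      - 'v*'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy models with zbctl
        env:
          ZEEBE_ADDRESS: ${{ secrets.ZEEBE_ADDRESS }}
          ZEEBE_CLIENT_ID: ${{ secrets.ZEEBE_CLIENT_ID }}
          ZEEBE_CLIENT_SECRET: ${{ secrets.ZEEBE_CLIENT_SECRET }}
        run: npx zbctl deploy models/*.bpmn
```

The same idea works with Jenkins or any other CI system - the point is that the deployment is driven from a tag in source control, not from a modeling tool.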
So having your models and your code in the same place, tagged with the same revision labels, makes it easy to see what’s running and what revision it’s at.
So maybe the best strategy for both SaaS and self-hosted is the same thing. You want a single view of your production system if possible, and tags and source control allow you to do that.
And the Web Modeler is fantastic for Rapid Application Development and collaboration, so I understand why folks want that in self-hosted. That is part of the roadmap for self-hosted. It is coming. When? Not tomorrow, but it is high in the list of priorities.
[00:08:01] Question 4: Camunda 8 has three APIs in different technologies: gRPC, GraphQL, and REST. Each is responsible for part of the functionality that was provided by the REST API in Camunda 7.
Can you clarify the differences in scope of each of them?
Workflow commands are sent to the engine over gRPC. You won’t have to deal with gRPC implementation or gRPC as a transport, however, as client libraries wrap these operations into high-level APIs. So you can effectively ignore it in 90% of cases. The 10% of cases where you do have to pay attention to the fact that these calls are going out over gRPC are related to firewall configuration, especially if you’re running Self-Managed, or punching out of your corporate firewall to access SaaS. So that gRPC API talks directly to the Zeebe engine.
Tasklist - the Tasklist component - is queried over GraphQL. GraphQL is a good match for Tasklist. It’s a technology designed to reduce the number of related calls you have to make to a remote system, traversing a graph returned by previous calls so you can batch it all into a single query.
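As a rough sketch of what that looks like, here is an illustrative Tasklist query. The exact field and filter names depend on your Tasklist version’s GraphQL schema, so check the schema (for example via the GraphQL playground) for the real shape - this is just to show the single-query style:

```graphql
# Illustrative: fetch open user tasks in one round trip.
# Field and filter names are assumptions - verify against your Tasklist schema.
query {
  tasks(query: { state: CREATED }) {
    id
    name
    assignee
    creationTime
  }
}
```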
The REST API - so REST is used by two components in the stack. One is the Cloud Console, which in the SaaS offering allows you to query your organization and provision new clusters. So you could write your own company frontend to the Cloud Console to allow developers to provision new clusters according to your own rules. A REST API is also used by Operate, which is the runtime management component of Camunda Platform 8; it’s roughly analogous to Cockpit in Camunda 7.
For Operate, you can use it to do things like listing deployed workflows. This is an interesting one. You might expect listing deployed workflows to be part of the gRPC API with the Zeebe engine, but it turns out that it’s not. Effectively, the gRPC interface is write-only with one or two exceptions. It’s a command channel. The only exceptions to that are a status query - topology - to get the cluster status, and the activate jobs command to get more work. That is a command that returns a stream of jobs for a worker. Otherwise, it’s pretty much write-only.
So to find out what process definitions are deployed, you need to query Operate, which builds a picture of the current state of the system by consuming an event stream from an exporter - a picture that includes what process definitions are deployed. So, that’s an interesting one there, where you expect it to be part of the gRPC command set but it’s actually part of the Operate REST command set. But those are the three different ones: gRPC for the Zeebe workflow engine; GraphQL for Tasklist; and REST for Operate and for the Cloud Console.
[00:11:20] Question 5: What are the best alternatives for someone who wants to implement their own tasklist?
You can do it in one of two ways:
You can either consume the Tasklist GraphQL API, or you can implement your own worker that subscribes to the task type: io.camunda.zeebe:userTask.
If you consume the Tasklist API, you need to get a license to use it in production. With the Tasklist component you get an Identity provider via Keycloak - so you can easily integrate an existing user identity system. You also get the GraphQL API to do queries and mutations.
You don’t get Task listeners that can fire on the Task claimed lifecycle event, however. Using Camunda Platform 8 self-managed with an exporter you could trigger events on Task created and Task completed. However, there is no broker event for “Task claimed”, because it happens in Tasklist.
So if you really needed or wanted that, you could implement your own Tasklist using the external worker pattern.
[00:13:00] Question 6: Why do I get RESOURCE_EXHAUSTED errors when I add workers / increase the maxJobsActive setting?
OK, I love this question because it requires really understanding the architecture of Zeebe - the workflow engine underlying Camunda Platform 8.
You might think “more external task workers equals more performance”, or “making my external task worker capable of more work in parallel equals more performance”. However, neither of these is necessarily the case.
Why? At a high-level, it’s because Camunda Platform 8 is fundamentally a distributed streaming system.
This means that it has greater linear scalability than your traditional RDBMS-bound workflow engine, but it has different bottlenecks, back pressure at various points, and a radically different performance envelope. It is a complex system and tuning it requires understanding how it works.
In this specific case, every worker periodically sends an ActivateJobs command to the gateway. The gateway then sends this command to each workflow engine in the cluster. This command gets appended to the event stream and needs to be processed by the stream processor. This all takes CPU cycles and I/O operations. More workers means more requests. When the system is under load and the stream processor starts lagging behind the event stream - that is, more events are being appended per second than are being processed - the engine starts to reject commands with a back pressure signal: gRPC error 8, RESOURCE_EXHAUSTED.
Why raising the maxJobsActive setting for a worker should cause this is less obvious.
Let’s imagine a worker with maxJobsActive set to 1. This is effectively a statement about how many jobs this worker can execute in parallel. The worker asks for one job, and when it gets that one job, it starts work on it, completes the work, then asks for another job.
Now consider a worker with maxJobsActive set to 5. This worker is saying: “I can handle five jobs at the same time.”
So, let’s say this worker asks for 5 jobs, but only one job is available. The one job is returned and the worker starts work on it - but immediately asks for another four jobs. Again, one job is available, so it gets one job and starts work on it, and immediately asks for three jobs.
So the requests for jobs have been parallelized. At the same time, the completion of jobs has also been parallelized. In the case where jobs are becoming available one at a time, it is like having five workers connected. This leads to many more job activation requests per unit of time.
However, if you have a higher throughput of work - more jobs available at a time, and jobs that take longer to process - then it makes sense to have a worker ask for five at a time. The work is then parallelized in the worker. This setup is not going to result in more job activation requests per unit of time, but it does mean that the worker can crunch through the work in parallel. So it is better suited to I/O- or CPU-intensive tasks that take more time in the worker, and to situations where the load on the system is high enough that the worker is not being given an anemic stream of work and constantly asking for more.
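The request-amplification effect under a trickle load can be sketched with a toy simulation. This is not the real gateway protocol - just the polling arithmetic: a worker with spare capacity immediately asks for more work, so when jobs arrive one at a time, a worker with maxJobsActive set to 5 issues far more ActivateJobs requests than a worker with maxJobsActive set to 1.

```python
def simulate(max_jobs_active, total_steps=100, job_duration=3):
    """Count ActivateJobs requests a single worker issues when jobs
    become available one per time step (a 'trickle' load).

    Toy model: each step, one job arrives; the worker polls whenever
    it has free capacity, and each activated job takes job_duration
    steps to complete.
    """
    active = []      # remaining durations of in-flight jobs
    available = 0    # jobs waiting in the engine
    requests = 0
    for _ in range(total_steps):
        available += 1                       # one new job per step
        active = [t - 1 for t in active if t - 1 > 0]  # finish work
        free = max_jobs_active - len(active)
        if free > 0:
            requests += 1                    # one ActivateJobs call
            granted = min(free, available)
            available -= granted
            active.extend([job_duration] * granted)
    return requests


if __name__ == "__main__":
    print("requests with maxJobsActive=1:", simulate(1))
    print("requests with maxJobsActive=5:", simulate(5))
```

In this model, the maxJobsActive=1 worker only polls when it is idle, while the maxJobsActive=5 worker polls almost every step because it nearly always has free slots - which is exactly the extra command load that can trip back pressure.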
So tuning the performance of this setup depends on the time taken in the worker and the number of jobs created per second in the cluster. “More” is only better if it is impedance-matched to the rest of the system. The best way to tune the system is to map the performance envelope by trying different settings while using a spreadsheet to track the variables - things like process instances started per second, jobs available per second, time spent in the worker, and the maxJobsActive setting. This is how our consultants tune these systems. They go down to the hardware, tracking the performance based on things like the amount of L2 cache memory in the CPU.
It’s part science and part magic - basically alchemy. You can find dead-spots and surprising performance zones. Usually you are not tuning for the absolute best possible performance, and you are tuning for a range of loads. We have discussed distributing intelligence in the system to allow workers to do things like change their own settings based on things like jobs/second, time spent in the worker, and back pressure signals.
This would, however, introduce a new class of problems - second-order effects that create complex harmonics and unpredictable behavior.
That’s the nature of a distributed, streaming system. It’s complicated, and you need to understand what you are tuning.
[00:18:15] Question 7: Why the change in licensing between Camunda 7 and Camunda 8, and what is it?
There are many variations on this question - asking things like “how is this licensed? How can I use this?”
I think the best way to think through the licensing is to ask a lawyer. And since I am not a lawyer, I’m going to give you the next best thing, and that is to understand the intent or the spirit of the licensing.
With Camunda Platform 8’s core engine, Zeebe, we wanted to make it source-available to enable community contribution, but not expose ourselves to large cloud providers consuming it and reselling it as a service. This is a common issue faced by open source software vendors today, and so the Zeebe license prohibits using Zeebe to provide a generic “Workflow engine as a service” to clients.
With the other components, like Tasklist, Operate, and Optimize - these are free to use in development, and require a license to use in production.
With Camunda Platform 7, we offer a free-to-use-in-production version, and an Enterprise licensed version that unlocks additional features. However, we find that this is complicated to understand, and can make it difficult for champions inside the organization to advocate to the decision makers about the value of these features.
So we licensed the ancillary components in Camunda Platform 8 to be free forever in development, to allow technical teams to build solutions using them that demonstrate the value they provide, making the purchasing conversation more straightforward.
I think that understanding these background considerations, which informed the licensing structure, makes it easier to understand the licensing when you read through the specifics. There is a great blog post on this by Bernd Ruecker titled “How Open is Camunda Platform 8?”. I’ll link it in the show notes.
[00:20:11] Question 8: Implementing a cloud native application requires new roles and new capabilities within our teams. Can you describe the internal stack, and what skills we need to be leveraging not only to implement the projects (both SaaS and self-hosted), but also to operate Camunda once in production?
So we have two scenarios: self-managed, where you host everything yourself; and SaaS, where Camunda hosts everything except your business logic - the engine and all the other components of the stack are managed.
SaaS is pretty straightforward - you don’t need anything for the stack itself, Camunda’s SRE team does everything for you.
Self-managed is where you need additional capabilities. Cloud-native means that everything is optimized for, tested on, and documented to deploy to Kubernetes. In this case, it is a really good match for organizations that are already familiar with and using Kubernetes, in either their own cloud or an external cloud provider.
For local development or testing, you can stand up an entire platform stack using Docker Compose. But for production you are going to need Kubernetes, so you need a dedicated DevOps / SRE-type capability.
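For a sense of what the local Docker Compose setup looks like, here is a deliberately minimal, illustrative fragment wiring Zeebe to Elasticsearch. Image tags, ports, and environment variable values are assumptions for this sketch - the official camunda-platform Docker Compose files are the real starting point:

```yaml
# Illustrative fragment only - see Camunda's published docker-compose files
# for a complete, supported configuration.
services:
  zeebe:
    image: camunda/zeebe:8.1.0
    ports:
      - "26500:26500"   # gRPC gateway - what zbctl and client libraries talk to
    environment:
      - ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME=io.camunda.zeebe.exporter.ElasticsearchExporter
      - ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_ARGS_URL=http://elasticsearch:9200
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - discovery.type=single-node
```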
In terms of the stack:
There is the core broker engine - Zeebe; this can be clustered as one or more nodes, and you have one or more gateways that act as the client contact point. You might have a reverse proxy, like Nginx, in front of the gateways to provide load-balancing and a single point of contact. That’s how the Camunda-hosted SaaS does it.
Then you have an Elasticsearch cluster for event stream exports.
You have Operate, which is the process instance management UI. This reads the event stream from Elasticsearch, and also writes its own indices in there - basically projections of the “current state” from the historical events.
You have Optimize, which is the Process Business Intelligence piece for analytics and reporting. It interacts with Elasticsearch in a similar way to Operate.
And you have Tasklist, which provides a UI for user tasks, along with a GraphQL API that you can use for queries and mutations.
Then there is Identity and Keycloak, which are used for access control and groups for user task assignment.
One difference between Self-Managed and SaaS - apart from the fact that with Self-Managed you have to manage the deployment, configuration and security of all these components, whereas with SaaS that is all done for you - is that in SaaS you also have the web modeler, which is integrated with the broker clusters, and you also have the Cloud Console - which allows you to provision clusters and create API credentials for client applications, either via the UI or via a REST API.
So web modeler and Cloud Console are not there in Self-Managed - yet. There are plans to bring them, but they aren’t yet part of that stack.
So you have all the Kubernetes deployment and configuration, including things like configuring ingresses and securing transport layers. One thing about this is that with Camunda Platform 8 you have an architecture that maps onto the structure of modern organizations that are running cloud-native workloads.
By that, I mean that the separation of concerns maps to the division of labor. You have a DevOps team, and you have a clear DevOps piece. You have an application development team, and you have an application development piece.
So those are the eight top questions about Camunda Platform 8, as decided by me - based on Slack, Forum, and Meetup conversations. If you have a question that didn’t get asked, feel free to drop by the Forum and ask away.
I’m your host, Josh Wulf, signing off for this episode.
Stay safe out there and keep automating those processes.