Jenkins multiple masters using Consul
2019-11-26 / modified at 2023-07-30 / 1.1k words / 6 mins

Consul is HashiCorp’s service networking solution, used here as a naming (service discovery) service. This post shows how to use Consul to design serverless Jenkins clusters.

Whether a team is small or enterprise-sized, there are many DevOps tools to choose from.

| | Small teams | Enterprise | Price |
|---|---|---|---|
| SaaS | CircleCI, Azure, Codefresh, GitHub Actions | Depends on the security level. | Pay per pipeline time |
| Self-hosted | A single free Jenkins instance is enough. | Buy an enterprise license of a SaaS/Jenkins product, or develop from the Jenkins OSS version. | Infrastructure/software license/customization |

This post is for readers who:

  • Are interested in the details of Jenkins.
  • Are in a large team looking for a self-hosted, open-source (or vendor-neutral) solution, and have the ability and time to customize Jenkins.

Note that this post contains high-level thoughts only. The implementation may cost 6 to 15 person-months, and there is no open-source product publicly available.

Liebig’s law of Jenkins

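For historical reasons, Jenkins uses an in-memory and file-based approach to maintain the state of running jobs. As in Liebig’s law of the minimum, the weakest of these parts limits how far a single master can scale.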

| Running status | Internal | Problems |
|---|---|---|
| Configuration | XML file based | Not too bad |
| Queue | ArrayList | Central embedded queue |
| Agent | ConcurrentSkipListMap | O(log n) complexity; may slow down with too many agents |
| Running jobs | Bound to an agent | Real-time scheduling |
| Logs | Sent from the agent to the master | Log/network stress |

Existing multiple masters solutions

Several solutions already exist to overcome these problems.

  • Binlog-like solution: Use the SCM Sync Configuration plugin, rsync, the Enterprise version, or a shared NFS to replicate settings from one instance to another. However, this is just a hot-standby failover solution; only one Jenkins instance is working at a time.
  • Strongly consistent solution: You need to implement a centralized lock (database/Raft/WAL) to ensure each write waits until confirmation is received from both master and slave. That means you may need a team to modify and maintain the Jenkins core source code.
  • Sharding-based solution: Split Jenkins instances into disjoint services. A global, dynamic mapping (consistent hashing or similar) between users (tenants) and services is required.
    • Client-side mapping: Gearman is a real multiple-masters solution in which the masters are treated as ‘Jobs’ in Gearman. However, the repository has been unmaintained for nearly 5 years, its CLI is more like an Ansible solution, and there is no open-source Jenkins GUI available for it.
    • Server-side mapping: Create a load balancer such as Nginx/Citus. However, these are hard to maintain and upgrade.
  • Sidecar-based solution: Use AOP and wrappers to forward RESTful services, webhooks, and logs into a centralized data lake.
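The tenant-to-master mapping mentioned for the sharding approach could be sketched with a consistent hash ring. This is only an illustration, not part of any of the products above: the class name, the master names, and the number of virtual nodes are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical helper: maps tenants to Jenkins masters on a hash ring."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per master, smooths the distribution
        self._keys = []           # sorted hash positions on the ring
        self._owners = {}         # hash position -> master name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            if h not in self._owners:
                bisect.insort(self._keys, h)
            self._owners[h] = node

    def remove(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            if self._owners.get(h) == node:
                del self._owners[h]
                self._keys.remove(h)

    def lookup(self, tenant):
        """Return the master owning the first ring position at or after the tenant's hash."""
        if not self._keys:
            raise LookupError("no masters registered")
        idx = bisect.bisect_right(self._keys, self._hash(tenant)) % len(self._keys)
        return self._owners[self._keys[idx]]
```

With such a ring, removing one master only remaps the tenants that master owned; tenants on the other masters keep their assignment, which is the property that makes the mapping cheap to maintain.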

Here is a summary

| Solution | Pros | Cons |
|---|---|---|
| Binlog | Easy to create | Only one instance is working; tied to NFS/rsync |
| Global lock | Real HA | Must patch and maintain the source code |
| Sharding | Easy to understand | Must maintain the mapping |
| Sidecar | AOP based | Too much work to wrap and too many stacks |

Serverless

What is serverless?

Serverless here means stateless (or with the state moved outside): your Jenkins instance is only a Jenkinsfile interceptor and is not bound to the local database (in JENKINS_HOME).

[Diagram: Jenkinsfile → Jenkinsfile Runner → Output]

What is Jenkins X’s serverless solution?

Jenkins X is based on Kubernetes and uses the sidecar pattern too. However:

  • I just want a Jenkinsfile interceptor, but I get a full Kubernetes/Helm stack (buy, install, and maintain the Kubernetes clusters, and configure the F**K YAML).
  • Single-vendor lock-in: CD works only with Kubernetes/GitHub.
  • It is hard to customize your own steps, and it is bound to a fixed Git methodology/YAML.
  • It saves JVM memory, but k8s creates new problems.

See more at https://codefresh.io/continuous-deployment/codefresh-versus-jenkins-x/

My Sidecar based Solution

Here is my solution: every master is independent, and data becomes eventually consistent with low latency.

Summary

All components are built from open-source products.

[Diagram: multiple-clouds Jenkins. Components: multi-cloud agent pool (Nomad, Docker, Kubernetes, Mesos), masters (Jenkins1, Jenkins2), controller, client, Consul, Filebeat, data lake. Edge labels: health check, RunListener, job.xml with Jenkinsfile, help choose, TCP.]

This is not a clever solution, but it saves you time:

  • It is not strongly bound to any cloud platform (Kubernetes/Nomad/Mesos).
  • Most of the work becomes a Spring/database-based project, which is cost-effective, easy to design, develop, and maintain, and makes recruitment or outsourcing easier.

Client

  • Creates and sends raw XML to Jenkins instances.
  • Inside the XML, the payload can be a Jenkinsfile, YAML, JSON, or another DSL.
  • Job definitions are saved into a database (pg/mysql/elk).
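As a rough sketch of the first two points, the client could wrap a Jenkinsfile into a minimal Pipeline job XML and prepare a POST to Jenkins’ `createItem` endpoint. The function names and the example host are hypothetical, and a real instance would also need authentication (and usually a CSRF crumb), which is left out here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_pipeline_config(jenkinsfile: str) -> bytes:
    """Wrap a Jenkinsfile into a minimal Pipeline job config.xml."""
    root = ET.Element("flow-definition")
    definition = ET.SubElement(
        root,
        "definition",
        {"class": "org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition"},
    )
    # ElementTree escapes <, > and & in the script text for us.
    ET.SubElement(definition, "script").text = jenkinsfile
    ET.SubElement(definition, "sandbox").text = "true"
    return ET.tostring(root, encoding="utf-8")


def build_create_job_request(base_url: str, job_name: str, config_xml: bytes):
    """Prepare (but do not send) the POST that creates the job on one master."""
    query = urllib.parse.urlencode({"name": job_name})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/createItem?{query}",
        data=config_xml,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Usage (hypothetical host); urllib.request.urlopen(request) would send it.
config = build_pipeline_config("node { echo 'hello' }")
request = build_create_job_request("http://jenkins1.example:8080", "demo", config)
```

Keeping the request construction separate from sending makes it easy to store the same XML in the database before dispatching it to whichever master is chosen.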

Data lake

  • ELK/ksqlDB: ETL, Grafana, and visualization.
  • The Jenkins shell step needs to be reimplemented with output redirection for centralized logs.
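The redirection idea behind the reimplemented shell step can be illustrated outside Jenkins. The sketch below is not the plugin’s code: `run_step` and the log path layout are assumptions, and it only shows a command’s stdout and stderr being appended to a per-build file that a log shipper such as Filebeat could tail.

```python
import subprocess
from pathlib import Path


def run_step(command: str, log_path: Path) -> int:
    """Run a shell command, appending stdout and stderr to log_path; return the exit code."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "ab") as log:
        completed = subprocess.run(
            command,
            shell=True,                # the step receives a shell snippet, as sh does
            stdout=log,
            stderr=subprocess.STDOUT,  # keep both streams in one ordered file
        )
    return completed.returncode
```

Because the output never passes through the master, the master’s disk and network are no longer on the log path.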

Jenkins

No modifications to jenkins.war or other plugins are required. PaaS-bound naming services (k8s/Nomad) are not required either.

Just create a Jenkins plugin (not open-sourced here):

  • Use hudson.model.listeners.RunListener to send back job results (RUNNING/SUCCESS/FAIL).
  • Use a shared library to intercept the high-level DSL.

Consul

  • Use Nomad’s HCL, Kubernetes CRDs/annotations, or a sidecar to register the service with Consul, so no SDK needs to be installed in jenkins.war.
  • Use Consul blocking queries and metadata for client-side load balancing.
  • Use maintenance mode when releasing or upgrading a Jenkins instance.
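The last two points can be sketched against Consul’s HTTP API. The health endpoint, the `index`/`wait` parameters, the `X-Consul-Index` header, and the agent maintenance endpoint are Consul’s own; the agent address, the service name, and the metadata keys used for selection are assumptions made for illustration.

```python
import json
import random
import urllib.parse
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: a local Consul agent


def health_url(service, index=0, wait="30s"):
    """Blocking query URL: Consul holds the request until the index changes or wait expires."""
    query = urllib.parse.urlencode({"passing": "true", "index": index, "wait": wait})
    return f"{CONSUL}/v1/health/service/{urllib.parse.quote(service)}?{query}"


def watch_once(service, index=0):
    """Run one blocking query; returns (entries, new_index). Needs a reachable agent."""
    with urllib.request.urlopen(health_url(service, index), timeout=60) as resp:
        return json.load(resp), int(resp.headers["X-Consul-Index"])


def choose_master(entries, required_meta, rng=random):
    """Pick one passing instance whose Service.Meta contains every required key/value."""
    candidates = []
    for entry in entries:
        svc = entry["Service"]
        meta = svc.get("Meta") or {}
        if all(meta.get(k) == v for k, v in required_meta.items()):
            # An empty service address means the node address should be used.
            host = svc.get("Address") or entry["Node"]["Address"]
            candidates.append(f"http://{host}:{svc['Port']}")
    if not candidates:
        raise LookupError("no matching Jenkins master")
    return rng.choice(candidates)


def maintenance_request(service_id, enable, reason=""):
    """Prepare the PUT that toggles maintenance mode for one service on the local agent."""
    query = urllib.parse.urlencode({"enable": str(enable).lower(), "reason": reason})
    path = f"/v1/agent/service/maintenance/{urllib.parse.quote(service_id)}"
    return urllib.request.Request(f"{CONSUL}{path}?{query}", method="PUT")
```

A client would call `watch_once` in a loop, passing the returned index back in, and refresh its candidate list whenever the call returns. An instance in maintenance mode fails its health check, so it drops out of the `passing` results and stops receiving new jobs while it is upgraded.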

Why not create and destroy a Jenkins container instance per job?

  • JVM cold starts are too slow when you have lots of jobs/plugins (30s or more).
  • When self-hosting, the main cost is infrastructure (machines/energy), and Jenkins masters are always filled with workload.

Summary

| Problems | Before | After |
|---|---|---|
| Pipeline definition | Master disk | Database |
| Build result | Master disk | Forwarded to the data lake by a RunListener class |
| Step details | Master disk | Just forwarded, not persisted |
| Stdout log | Master disk | Forwarded to ELK by Filebeat |
| Webhook | Master disk | Spring controller wrappers |
| Cron job | Master disk | Quartz (on database) + webhook |
| SCM polling | Master disk | Quartz or webhook |
| Load balancer | No | Consul blocking query |
| Visualization | JSP-like | Powered by a Spring RESTful API |

I should emphasize again that this solution is not cheap at all.

Conclusion

Pros and cons

| | Single Jenkins | Paid solutions | Sidecar-based solution |
|---|---|---|---|
| Costs | Free, no guarantees | License fees | Customization costs and time |
| Parallel jobs | Limited | By license type | 10K+ jobs, depends on your data lake |
| HA | No | Yes | Yes (in-flight jobs still go down) |
| Pipelines as code | Groovy | YAML/NodeJS | Same as Jenkins + DSL |
| Tenant | No | Yes | Customized by Spring |
| Authorization | Simple | Yes | Customized by Spring (RBAC/LDAP/…) |
| CD | No | Yes | DSL template |

So, what you get depends on how much you pay. This post proposes a way to implement multiple masters, but it requires a lot of work and time.