Setting up an active-passive Gitlab instance
2024-11-11 / modified at 2024-11-11 / 1.2k words / 7 mins

Gitlab is an open-source DevOps platform used by many companies. This guide sets up a hot standby OSS Gitlab in a self-hosted environment.

Gitlab is more than repository hosting

Gitlab Inc. ($GTLB) is under pressure from stockholders to grow revenue. It releases new features every month to adopt the latest technologies, including AI chat and Kubernetes deployment, which increases the complexity of Gitlab instances.

Besides the complexity of new features, Gitlab also takes on production release roles such as config center, CI/CD workflow, and GitOps provider, demanding higher availability in our rapid development.

Gitlab Architecture

Gitlab is a Ruby application built on top of Postgres, Redis, and cloud storage. Compared with Subversion, which focuses on repository hosting and only requires an httpd load balancer and a shared disk, Gitlab requires far more resources to host repositories.

Component Overview

The diagram below shows the minimum components for repository hosting.

[Diagram: the minimum Gitlab components. Stateful: Gitaly (the FS layer) and Sidekiq (Redis-based job queue); stateless: Rails (the RESTful API) and Nginx/Workhorse; databases: Postgres/Redis; S3 for LFS/uploads; disk (EBS/NVMe).]

Unrelated components are disabled due to vulnerability concerns.

Ruby components that hinder high availability

The major problem hindering availability when deploying Gitlab on multiple nodes is that Gitlab needs redundant block storage.

Two components require shared block storage. The first is Sidekiq, a distributed job executor: when a user retrieves a temporary file generated by a background job, Nginx cannot guarantee the request is routed to the node that produced the file.

The other is Gitaly, the filesystem layer for Git, which has officially ended support for NFS, Lustre, GlusterFS, multi-attach EBS, and EFS. However, virtual block storage such as Cinder, EBS, or VMDK is still supported.

Keeping components minimal

Features such as CI/CD, GitOps, KAS (Kubernetes management), and artifacts are disabled in our self-hosted Gitlab, as we are concerned about segregation of duties (SoD) and security. Instead, we opt for other well-known alternatives (JFrog, Jenkins, Gerrit) to keep the Gitlab instances simple.
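As a sketch, these features can be switched off in /etc/gitlab/gitlab.rb on an omnibus installation; the exact keys may differ between Gitlab versions, so verify them against your release before applying:

    # /etc/gitlab/gitlab.rb -- sketch of a minimal repository-hosting node.
    # Verify key names against your Gitlab version before applying.
    gitlab_kas['enable'] = false              # KAS (Kubernetes agent server)
    gitlab_pages['enable'] = false            # Gitlab Pages
    registry['enable'] = false                # container registry
    prometheus_monitoring['enable'] = false   # bundled monitoring; we use external Grafana

    # Disable CI/CD by default on new projects; builds run on Jenkins instead.
    gitlab_rails['gitlab_default_projects_features_builds'] = false

Run gitlab-ctl reconfigure afterwards to apply the changes.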

Which machine is best for Gitlab self-hosting?

We have 3K monthly active users on our self-hosted Gitlab instances, which serve CI/CD builds (run by Jenkins) and source code management. We continuously monitor the instances with Grafana to find performance issues.

Cloud instance specs

According to our statistics, Gitlab hosting has the following requirements:

  • CPU: Gitlab is not a CPU-bound application; CPU usage peaks only during merge requests. However, avoid oversold CPU instances, as the high-performance network adapter demands better-specced instance types.

  • Memory: Gitlab requires between 8 GB and 128 GB of memory; besides the application memory, the operating system also caches files in memory pages.

  • Networking: Gitlab requires high-performance I/O during code pulls and pushes.

    • Without LFS: When Git LFS is not enabled, or LFS traffic is redirected to third-party S3 storage (see the config sketch after this list), we observed 5 Gb/s peak traffic during pipeline builds.
    • With LFS: If LFS storage is proxied or hosted by the Gitlab instance itself, we found that a 10 Gb/s network adapter is required.
  • Local storage: Gitlab requires high-performance storage while diffing code; cloud SSD disks are recommended.
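For the "without LFS" case above, pushing LFS objects to third-party, S3-compatible storage keeps large-file traffic off the Gitlab node. A hedged gitlab.rb sketch follows; the bucket name, region, endpoint, and credentials are placeholders, and newer Gitlab versions prefer the consolidated object storage settings over these per-feature keys:

    # /etc/gitlab/gitlab.rb -- route LFS objects to external S3-compatible storage.
    # Bucket, region, endpoint, and credentials below are placeholders.
    gitlab_rails['lfs_enabled'] = true
    gitlab_rails['lfs_object_store_enabled'] = true
    gitlab_rails['lfs_object_store_remote_directory'] = 'gitlab-lfs'   # bucket name
    gitlab_rails['lfs_object_store_connection'] = {
      'provider'              => 'AWS',
      'region'                => 'us-east-1',
      'aws_access_key_id'     => 'REPLACE_ME',
      'aws_secret_access_key' => 'REPLACE_ME',
      'endpoint'              => 'https://s3.example.com'   # S3-compatible endpoint
    }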

In summary, I'd recommend deploying on a 4-vCPU/32 GB or 8-vCPU/64 GB instance with large cloud SSD disks.

Besides the machine specs, we also found that running on bare metal was not efficient:

  • Managing local RAID5 SSD disks became an extra burden, even though we already had centralized storage installed. Moreover, most of our existing x86 compute servers are not designed for storage and have limited disk slots.
  • Only 100 GB of memory (including OS page cache) was used, while 1 TB was installed on the machine.
  • The 25 Gb/s fiber network adapter was wasted, as we only see 5 Gb/s peak traffic.

The tradeoffs between storage reliability and cost optimization

There is no silver bullet that eliminates every single point of failure and disaster. In Gitlab, storage replication is required.

However, dedicated storage such as cloud EBS is a zone-specific resource that is only redundant within a single availability zone. Cross-region disaster tolerance is out of our scope: Git operations are latency-sensitive, and replicating data over WANs would slow down I/O performance.

[Diagram: storage replication options. Between racks: 0.01 ms latency, 10G/25G up to 100 Gb/s; among multiple datacenters: 0.1~1 ms, up to 100 Gb/s (shared); between regions: ~30 ms, limited bandwidth.]

Here is a comparison of some Git hosting solutions.

|                        | Gitlab Enterprise                              | Wandisco's Gerrit             | Hot standby Gitlab             |
|------------------------|------------------------------------------------|-------------------------------|--------------------------------|
| Free                   | N                                              | N                             | Y (OSS)                        |
| Docs                   | Gitlab Geo-replica                             | Wandisco's Multisite solution | Here                           |
| Minimum local nodes    | 5                                              | 1 *                           | 2                              |
| Supports local storage | Y                                              | Y                             | N (dedicated storage required) |
| High availability      | Y                                              | Y                             | Generally                      |
| Disaster tolerance     | Y (with Gitlab Geo)                            | Y                             | N                              |
| Comments               | Gitaly Cluster requires an additional Postgres | Ecosystem change              | Switchover is not automatic    |

Implement replication through Gitlab Enterprise

You need to pay for an enterprise license to get the premium features:

  • Gitlab Geo (a push-through proxy), which works between datacenters.
  • Gitaly Cluster (sharding and replication between nodes), which requires an additional Postgres and three dedicated storage nodes; technical support is only available to enterprise users.

Both work fine on bare metal and virtual machines, even with local storage.

Implement with Wandisco’s Gerrit

Using Gerrit means switching your ecosystem from Gitlab to Gerrit. I believe Gerrit is a reasonable choice if your team is migrating from Subversion to Git.

Implement centralized storage with the hot standby node

To tolerate compute node disruptions, we believe a Gitlab node should never rely on its local storage. Instead, it should access centralized, dedicated storage nodes over fiber, such as NetApp/OceanStor hardware or a Cinder virtualization platform.

The diagram below shows a hot standby architecture. The secondary VM connects directly to the primary VM as a Gitaly client over gRPC, rather than mounting a disk; a configuration sketch follows the diagram.

You can create one or more secondary VMs to minimize the downtime risk.

[Diagram: hot standby architecture. In Region 1, an ELB fronts the primary VM (Rails/Nginx, local Gitaly, Sidekiq) and secondary VM(s) (Rails/Nginx, Gitaly as client), backed by HA Redis, HA Postgres, and a virtualized disk; users send requests and jobs, the secondary reaches the primary over Gitaly RPC, and periodic snapshots go to a backup disk in Region 2 for disaster tolerance.]
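As a sketch, the secondary VM runs no Gitaly of its own and points Rails at the primary's Gitaly over gRPC. The host name and token below are placeholders, and note that git_data_dirs has been superseded by gitaly['configuration'] in recent Gitlab versions:

    # /etc/gitlab/gitlab.rb on the secondary VM -- act as a Gitaly client only.
    # Host name and token are placeholders.
    gitaly['enable'] = false   # no local Gitaly; repositories live on the primary
    git_data_dirs({
      'default' => {
        'gitaly_address' => 'tcp://primary-vm.internal:8075',  # primary's Gitaly gRPC port
        'gitaly_token'   => 'REPLACE_WITH_SHARED_TOKEN'
      }
    })

During a switchover, the process reverses: remount the virtualized disk on the promoted secondary, re-enable its local Gitaly, and run gitlab-ctl reconfigure, matching the reconfiguration note in the list below.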

Compared with a single-node deployment, when the primary VM goes down, the Rails application on the secondary node keeps running (it is backed by the database), even though code access is unavailable.

  • The administrator can send a broadcast message so users know what just happened, in their terminals and on web pages (see the sketch after this list).
  • Third-party integration jobs, such as group and user synchronization, will not fail.
  • Rather than waiting for the infra provider, you can switch over from the primary VM to another VM by remounting the disk (Gitlab reconfiguration is required).
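For example, the broadcast message mentioned above can be automated against Gitlab's REST API (POST /api/v4/broadcast_messages, which requires an admin token). This is a sketch; the host and the GITLAB_ADMIN_TOKEN environment variable are assumptions for your environment:

    # broadcast.rb -- post a site-wide broadcast message during an outage.
    require 'json'
    require 'net/http'
    require 'uri'

    # The host and admin token source are placeholders.
    uri = URI('https://gitlab.example.com/api/v4/broadcast_messages')
    request = Net::HTTP::Post.new(uri)
    request['PRIVATE-TOKEN'] = ENV.fetch('GITLAB_ADMIN_TOKEN')
    request['Content-Type'] = 'application/json'
    request.body = {
      message: 'The primary node is down; repository access is degraded until switchover completes.',
      broadcast_type: 'banner'   # show as a banner on every page
    }.to_json

    response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      http.request(request)
    end
    puts "#{response.code} #{response.body}"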

Summary

There are three approaches to setting up highly available Git hosting. The hot standby solution comes with tradeoffs, but it requires no additional license. I hope you find the design useful when deploying Gitlab services.