Setting up an active-passive Gitlab instance
2024-11-11 / modified at 2024-12-31 / 1.1k words / 7 mins

Gitlab is an open source DevOps platform used in many companies. This guide walks through setting up a hot-standby OSS Gitlab in a self-hosted environment.

Gitlab is more than repository hosting

Gitlab Inc ($GTLB) is under pressure from stockholders to grow revenue. It ships new features every month to keep up with the latest technologies, including AI chat and Kubernetes deployment, which increases the complexity of Gitlab instances.

Besides repository hosting, Gitlab plays production release roles such as config center, CI/CD workflow, and GitOps provider, which demands higher availability in our rapid development.

As a consequence, it is critical for SRE engineers to maintain the high availability of Gitlab instances. This guide helps engineers create a hot-standby Git hosting service with Gitlab OSS.

Gitlab Architecture

Gitlab is a Ruby on Rails application built on top of Postgres, Redis, and cloud storage. Compared with Subversion, which focuses on repository hosting and only requires an Apache httpd load balancer and a shared disk, Gitlab requires far more resources to host repositories.

Component Overview

The diagram shows the minimum components for repository hosting.

[Diagram] Minimum Gitlab components:
  • Stateful: Gitaly (the FS layer), Sidekiq (Redis-based job queue)
  • Stateless: Rails (the RESTful API), Nginx/Workhorse
  • Database: Postgres/Redis
  • Storage: S3 (for LFS/uploads), disk (EBS/NVMe)

Unrelated components are disabled in our team due to vulnerability concerns.

Ruby components that hinder shared disks

Before Gitlab 15, we implemented Gitlab availability via shared NFS, despite occasional minor latency errors when accessing the filesystem.

Since Gitlab 16, Gitaly, the filesystem layer that executes Git commands, has officially ended support for NFS, Lustre, GlusterFS, multi-attach EBS, and EFS. The only supported block storage is cloud block storage, such as Cinder, EBS, or VMDK.
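This deprecation is easy to miss during upgrades, so a quick check of the backing filesystem can help. A minimal sketch, assuming a GNU/Linux host and the default Omnibus data path (override via the GIT_DATA environment variable):

```ruby
# Warn if the Gitaly data directory sits on a filesystem that
# Gitlab 16+ no longer supports (NFS, GlusterFS, Lustre).
# /var/opt/gitlab/git-data is the default Omnibus path.
git_data = ENV.fetch("GIT_DATA", "/var/opt/gitlab/git-data")

if Dir.exist?(git_data)
  # `stat -f -c %T` prints the filesystem type name (GNU coreutils)
  fstype = `stat -f -c %T #{git_data}`.strip
  if fstype.start_with?("nfs") || %w[fuse.glusterfs lustre].include?(fstype)
    warn "unsupported filesystem for Gitaly: #{fstype}"
  else
    puts "ok: #{git_data} is on #{fstype}"
  end
else
  puts "skip: #{git_data} not present on this host"
end
```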

Which machine is best for Gitlab self hosting?

We have 3K monthly active users on our self-hosted Gitlab instances, which handle CI/CD builds (via Jenkins) and source code management. We continuously monitor the instances with Grafana to spot performance issues.

Cloud VM Specs

According to our statistics, Gitlab hosting has the following requirements:

  • CPU: Gitlab is not a CPU-bound application; CPU usage only peaks during merge requests. However, avoid oversold CPU instances, as the network adapter requires consistent CPU performance.

  • Memory: Gitlab requires between 8GB and 128GB of memory. Besides the application memory, the operating system caches Git repository files in memory pages.

  • Networking: Gitlab requires high I/O throughput during code pulls and pushes.

    • Without LFS: when Git LFS is not enabled, or its traffic is redirected to third-party S3 storage, we observed 5Gb/s peak traffic during pipeline builds.
    • With LFS: if LFS storage is proxied or hosted by the Gitlab instance, a 10Gb/s adapter is required.
  • Local storage: Gitlab requires high-performance storage during merge requests; cloud SSD disks are recommended.

In summary, I'd recommend deploying on a 4U/32G or 8U/64G instance with large local cloud SSD disks.
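To sanity-check the bandwidth figures above, a back-of-the-envelope transfer-time estimate helps. This is a sketch; the 80% usable-throughput factor and the 2 GiB payload size are illustrative assumptions, not measurements:

```ruby
# Estimate how long a payload of `size_gib` GiB takes to move over a
# `link_gbps` Gb/s link, assuming ~80% of line rate is usable.
def transfer_seconds(size_gib, link_gbps, efficiency: 0.8)
  bits = size_gib * 8.0 * 1024**3
  bits / (link_gbps * 1e9 * efficiency)
end

# A 2 GiB LFS artifact during a pipeline build:
puts transfer_seconds(2, 5).round(1)   # 5 Gb/s link  -> ~4.3 s
puts transfer_seconds(2, 10).round(1)  # 10 Gb/s link -> ~2.1 s
```

Halving the transfer time matters when dozens of pipeline jobs pull the same artifacts concurrently, which is why the LFS-hosting case above calls for a 10Gb/s adapter.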

Bare Metal Specs

Besides virtual machines, we also tested deploying instances on bare metal and found that running on bare metal was not as efficient as we expected, as resources were not utilized effectively.

Local storage inefficiency and low-density datacenter
  • We must manage RAID5 disks at a fundamental level.
  • There are only 8 disk slots in the existing computing nodes, while we had already bought dedicated storage.
Memory inefficiency

Only 100GB of memory (including OS page cache) was used, while 512GB or 1TB of memory was installed on each machine.

Network inefficiency

The 25Gb/s fiber network adapter was wasted, as our peak traffic is only 5Gb/s.

Which components could be disabled?

To keep repository hosting highly available, we apply the segregation of duties (SoD) principle to Gitlab. Features such as CI/CD, GitOps, KAS (Kubernetes management), and artifacts are disabled in our self-hosted Gitlab. Instead, we opt for other well-known alternatives (JFrog, Jenkins, Gerrit) to keep the Gitlab instances simple.
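As a sketch of this SoD setup, the following /etc/gitlab/gitlab.rb fragment disables some of these components on an Omnibus install. Option names change between releases, so verify them against your version's documentation before reconfiguring:

```ruby
# /etc/gitlab/gitlab.rb -- disable components unrelated to repository hosting
gitlab_kas['enable'] = false    # KAS (Kubernetes agent server)
registry['enable'] = false      # container registry (artifacts live elsewhere)
gitlab_pages['enable'] = false  # static pages hosting

# Disable the builds (CI/CD) feature by default for new projects;
# pipelines run on Jenkins instead.
gitlab_rails['gitlab_default_projects_features_builds'] = false
```

Run `gitlab-ctl reconfigure` afterwards to apply the changes.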

Git hosting solutions

Here are some Git hosting solutions.

|                        | Gitlab Enterprise                           | Wandisco's Gerrit              | Hot standby Gitlab             |
|------------------------|---------------------------------------------|--------------------------------|--------------------------------|
| License                | Commercial                                  | Commercial                     | OSS                            |
| Docs                   | Gitlab Geo-replica                          | Wandisco's Multisite solution  | Here                           |
| Minimum local nodes    | 5                                           | 1 *                            | 2                              |
| Supports local storage | Y                                           | Y                              | N (dedicated storage required) |
| High availability      | Y                                           | Y (Paxos)                      | Generally                      |
| Disaster tolerance     | Y (with Gitlab Geo)                         | Y (replicating over WANs)      | N                              |
| Comments               | Gitaly cluster requires an additional Postgres | Ecosystem change required   | Switchover is not automatic    |

Implement replication with Gitlab Enterprise

You need to pay for enterprise licenses to get the premium features:

  • Gitlab Geo (push-through proxy), which works between datacenters.
  • Gitaly Cluster (sharding and replication between nodes), which requires an additional Postgres and three dedicated storage nodes; technical support is only available to enterprise users.

It works fine on both bare metal and virtualized machines, even with local storage.

Implement with Wandisco’s Gerrit

With Gerrit, you have to switch your ecosystem from Gitlab to Gerrit. I believe adopting Gerrit is reasonable if your team is migrating from Subversion to Git.

Implement centralized storage with the hot standby node

To avoid computing node disruptions, we believe a Gitlab node should never access its local storage, but should instead access centralized, dedicated storage nodes through fiber, such as NetApp/OceanStor hardware or a Cinder virtualization platform.

The diagram shows a hot standby architecture. The secondary VM connects directly to the primary VM via the Gitaly client over gRPC, rather than mounting a disk.

The component Sidekiq, a distributed job executor, is disabled on the secondary node: we found that when users retrieve a temporary file generated by a background job, Nginx cannot guarantee the request is routed to the same node.
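A sketch of the corresponding /etc/gitlab/gitlab.rb fragments for this topology. The hostname, token, and Gitaly port 8075 are illustrative, and the `git_data_dirs` form varies across Gitlab versions, so check your release's documentation:

```ruby
# Primary VM: expose the local Gitaly over TCP so secondaries can reach it.
gitaly['listen_addr'] = "0.0.0.0:8075"
gitaly['auth_token'] = "shared-secret-token"   # hypothetical shared token

# Secondary VM(s): run Rails/Nginx only; no local Gitaly, no Sidekiq.
gitaly['enable']  = false
sidekiq['enable'] = false
git_data_dirs({
  "default" => {
    "gitaly_address" => "tcp://primary.internal:8075",  # hypothetical host
    "gitaly_token"   => "shared-secret-token",
  },
})
```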

You can create one or more secondary VMs to minimize the downtime risk.

[Diagram] Hot standby architecture:
  • Region 1: users reach an ELB in front of the Primary VM (Rails/Nginx, local Gitaly, Sidekiq jobs) and Secondary VM(s) (Rails/Nginx, Gitaly as client); both share Redis (HA) and Postgres (HA); the primary performs I/O against a virtualized disk, while secondaries reach its Gitaly over RPC
  • Region 2 (disaster tolerance): a backup disk receives periodic snapshots

Compared with a single-node deployment, when the primary VM is down, code access becomes unavailable, but the Rails application on the secondary node, which is backed by the database, keeps running.

  • The administrator can send a broadcast message so users know what just happened in their terminals or on web pages.
  • Third-party integration jobs, such as group and user synchronization, will not fail.
  • Rather than waiting for the infra provider, you can switch the primary over to another VM by remounting the disk (Gitlab reconfiguration and root access required).

Summary

There are three approaches to set up highly available Git hosting. As there is no silver bullet that eliminates all single points of failure and disasters, the hot standby solution comes with tradeoffs, but it requires no additional license. I hope you find the design useful when deploying Gitlab services.