Setting up an active-passive GitLab instance
2024-11-11 / modified at 2025-07-13 / 1.4k words / 8 mins

GitLab is an open-source DevOps platform widely adopted across many companies. This guide outlines how to set up a hot standby (active-passive) GitLab instance.

Key Considerations

  • To maintain segregation of duties and adhere to a vendor-neutral technology stack, we use a reduced-feature ("dumbed-down") GitLab configuration focused on core repository hosting functionality.
  • We’re introducing an active-passive deployment that eliminates the need for NFS.

GitLab: More Than Just a Repository Host

GitLab Inc ($GTLB) is under pressure from stockholders to demonstrate income growth, leading to a rapid release cycle of new features. While these additions can be beneficial, they also increase the complexity of a GitLab instance, requiring platform engineers to manage a broader range of components, disable potentially vulnerable features, and maintain high availability. This guide details how to create a hot standby Git hosting solution using GitLab OSS.

Choosing the Right Hardware for GitLab Self-Hosting

We support approximately 3,000 active users per month performing code commits and CI/CD builds (via Jenkins) on self-hosted GitLab instances. We continuously evaluate GitLab performance across various cloud and bare metal environments.

Cloud VM Specs

According to our statistics, the following specifications are recommended for GitLab hosting.

  • CPU: GitLab is not inherently CPU-intensive, with peak load occurring during merge request processing. However, avoid oversold CPUs, as a high-speed I/O network adapter demands adequate CPU capacity.
  • Memory: GitLab requires between 8 GB and 128 GB of RAM. In addition to the application memory, the operating system may cache Git repository files into memory pages.
  • Networking: GitLab requires high-speed I/O during pull and push operations.
    • Without LFS: We've observed peak traffic of 5 Gb/s during pipeline builds.
    • With LFS: If LFS storage is proxied or hosted by a GitLab instance, a 10 Gb/s networking card is recommended.
    • Local storage: High-performance storage (e.g., cloud SSD disks) is crucial for optimal performance during merge request processing.

Bare Metal Considerations

While bare metal deployments offer potential benefits, our testing indicates that they are not always as efficient as virtualized environments.

Storage Scalability

Managing RAID 5 disks at a low level presents a barrier for non-IDC engineers. Furthermore, the limited disk slots (typically 8) on existing compute nodes, combined with our existing dedicated enterprise storage, make bare metal expansion less attractive.

Memory Utilization

Large amounts of installed RAM (e.g., 512 GB or 1 TB) may be underutilized, as the application consumes only around 100 GB (including OS page cache).

Network Inefficiency

A 25 Gb/s fiber network adapter may be underutilized if peak traffic remains below 5 Gb/s.

In summary, we recommend deploying GitLab on instances with 4U/32G or 8U/64G configurations and large local cloud SSD disks.

GitLab Architecture

GitLab is a Ruby on Rails application built on top of Postgres, Redis, and cloud storage. Compared with Subversion, which focuses solely on repository hosting and requires only an Apache httpd load balancer and a shared disk, GitLab demands more resources.

Component Overview

The diagram below shows the minimum components required for repository hosting.

[Diagram: GitLab components. Stateful: Gitaly (the FS layer), Sidekiq (Redis-based job queue). Stateless: Rails (the RESTful API), Nginx/Workhorse. Database: Postgres/Redis. Storage: S3 for LFS/uploads, disk (EBS/NVMe).]
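
To make the stateless/stateful split concrete, here is a minimal, illustrative /etc/gitlab/gitlab.rb sketch that points a single Omnibus node at external Postgres, Redis, and S3. The hostnames, region, and bucket names are hypothetical placeholders, not our production values.

```ruby
# /etc/gitlab/gitlab.rb -- illustrative sketch, not a complete configuration

# Use an external HA Postgres instead of the bundled one.
postgresql['enable'] = false
gitlab_rails['db_host']     = 'postgres.internal'   # hypothetical endpoint
gitlab_rails['db_password'] = 'CHANGE_ME'

# Use an external HA Redis instead of the bundled one.
redis['enable'] = false
gitlab_rails['redis_host'] = 'redis.internal'       # hypothetical endpoint

# Consolidated object storage for LFS and uploads on S3.
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region'   => 'us-east-1',
  'use_iam_profile' => true
}
gitlab_rails['object_store']['objects']['lfs']['bucket']     = 'gitlab-lfs'      # hypothetical bucket
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'  # hypothetical bucket
```

After editing, apply the settings with gitlab-ctl reconfigure.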

NFS: End of support

Prior to GitLab 15, we implemented GitLab availability using shared NFS, despite occasional latency errors while accessing the filesystem.

Since GitLab 16, Gitaly, the filesystem layer of Git, has officially ended support for shared filesystems, including NFS, Lustre, GlusterFS, multi-attach EBS, and EFS. The only supported storage is block storage, such as Cinder, EBS, or VMDK.
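
As a minimal illustration (the mount point is hypothetical, and newer Omnibus releases express the same setting through gitaly['configuration']), the repository path simply points at a locally mounted block device:

```ruby
# /etc/gitlab/gitlab.rb -- illustrative sketch
# Repositories live on a locally attached block device (EBS/Cinder/VMDK),
# never on NFS or another shared filesystem.
git_data_dirs({
  "default" => { "path" => "/mnt/gitlab-git-data" }  # hypothetical mount point
})
```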

Which components could be disabled?

Due to segregation of duties (SoD) and vulnerability concerns, we opt to keep GitLab simple. Features such as CI/CD, GitOps, KAS (the Kubernetes agent server), and artifacts are disabled in our self-hosted GitLab. Instead, we rely on well-known alternatives (JFrog, Jenkins) to reduce the workload on GitLab; a sketch of the relevant settings follows.
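
A hedged sketch of the corresponding /etc/gitlab/gitlab.rb toggles for bundled components we do not need; treat it as illustrative rather than an exhaustive hardening list.

```ruby
# /etc/gitlab/gitlab.rb -- illustrative sketch of disabling unneeded components
gitlab_kas['enable'] = false              # Kubernetes agent server (KAS)
gitlab_pages['enable'] = false            # static Pages hosting
registry['enable'] = false                # built-in container registry
prometheus_monitoring['enable'] = false   # bundled monitoring stack
# CI/CD pipelines and artifact hosting are handled by Jenkins and JFrog;
# shared runners and Auto DevOps are switched off in the admin area.
```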

Git high availability hosting solutions

Here are some Git hosting solutions.

| | GitLab Enterprise | Wandisco's Gerrit | Hot standby GitLab |
| --- | --- | --- | --- |
| License | Commercial | Commercial | OSS |
| Docs | GitLab Geo replication | Wandisco's multisite solution | This guide |
| Minimum nodes | 5 | 3 | 2 |
| Local storage support | Yes | Yes | No (centralized storage required) |
| AZ replication | Yes (Gitaly Cluster) | Yes (Paxos) | Partial |
| Regional replication | Yes (GitLab Geo) | Yes (WAN replication) | No |
| Comments | Gitaly Cluster requires an additional Postgres | Requires ecosystem migration | Switchover is not automatic |

Implement replication through GitLab Enterprise

You need to purchase enterprise licenses to access these premium services.

  • GitLab Geo (Push-through Proxy) facilitates communication between regional datacenters.
  • Gitaly Cluster (sharding and replication between nodes) leverages GitLab's Gitaly Cluster with an additional Postgres and three dedicated storage nodes; technical support is available only to enterprise users.

It works fine on both bare metal and virtual machines, even with local storage. The remaining issue is that there will still be downtime during upgrades.

Implement with Wandisco’s Gerrit

Wandisco applied Paxos, a consensus algorithm for distributed systems, to Git in order to replicate repositories across multiple continents. Its performance has been proven over decades of use in their Subversion product.

To use Gerrit by Wandisco, consider the following:

  • Licensing fees: you need to purchase a license to receive their support.
  • Ecosystem change: you have to switch from GitLab to Gerrit. This may be acceptable if your team is migrating from Subversion to Git.

Implement centralized storage with the hot standby node

To tolerate disruptions to compute nodes, we believe each node should access dedicated storage nodes through fiber connections, such as NetApp/OceanStor hardware or a Cinder virtualization platform.

The diagram shows a hot standby architecture along with disaster tolerance in another region.

  • In Region 1, there are two nodes. The secondary node connects to the primary VM's Gitaly service directly over gRPC, rather than to a local disk.
  • For backup in Region 2, storage-level replication is sufficient. Incremental backup tools such as AWS Backup or Restic are acceptable. There is a minor risk of losing transactional data uploaded while a snapshot is in progress, but we consider it acceptable.
[Diagram: In Region 1, an ELB routes roughly 3/4 of user traffic to the primary VM (16U/64G: Rails/Nginx, local Gitaly, Sidekiq) and 1/4 to the secondary VM (4U/8G: Rails/Nginx, Gitaly as client), with shared HA Redis, HA Postgres, a virtualized disk, and S3. The secondary reaches the primary's Gitaly via RPC. Region 2 (disaster tolerance) holds a backup S3 bucket fed by cloud regional replication and Restic snapshots.]

Here is a more detailed explanation (a configuration sketch for the secondary node follows this list):

  • The Sidekiq component, a distributed job executor, is disabled on the secondary node, because we found that when users attempt to retrieve a temporary file generated by a background job, Nginx cannot reliably route the request to the node that produced the file.
  • All services should be protected within private subnets.
  • You can add more secondary machines to further reduce the downtime risk.
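
Below is a minimal sketch of the secondary node's /etc/gitlab/gitlab.rb, assuming the primary exposes Gitaly over TCP (for example via gitaly['listen_addr'] in its own gitlab.rb) and that both nodes share the same external Postgres, Redis, and S3. The hostname, port, and token are hypothetical.

```ruby
# /etc/gitlab/gitlab.rb on the secondary VM -- illustrative sketch
# The external Postgres/Redis/S3 settings mirror those on the primary node.

# Do not run Gitaly locally; reach the primary's Gitaly service over gRPC.
gitaly['enable'] = false
git_data_dirs({
  "default" => {
    "gitaly_address" => "tcp://gitlab-primary.internal:8075",  # hypothetical host and port
    "gitaly_token"   => "shared-gitaly-secret"                 # hypothetical shared token
  }
})

# Background jobs run only on the primary (see the note on Sidekiq above).
sidekiq['enable'] = false
```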

Compared with a single-node deployment, when the primary VM is unavailable, the database-backed Rails application on the secondary node continues to run.

  • The administrator can send broadcast messages to inform users about the outage in their terminals or on web pages.
  • Third-party integration jobs such as group and user synchronization remain functional.
  • Instead of waiting for the infrastructure provider, you can manually switch over from the primary VM to another VM by remounting the disk (GitLab reconfiguration and root access are required); see the sketch after this list.
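
A hedged sketch of promoting a secondary, assuming the virtualized disk can be detached from the failed primary and reattached to the secondary; the mount path is hypothetical.

```ruby
# /etc/gitlab/gitlab.rb on the promoted secondary -- illustrative sketch
# Prerequisite (outside this file): detach the virtualized disk from the failed
# primary, attach it to this VM, and mount it (e.g., at /mnt/gitlab-git-data).

# Run Gitaly locally against the remounted repository disk.
gitaly['enable'] = true
git_data_dirs({
  "default" => { "path" => "/mnt/gitlab-git-data" }  # hypothetical mount point
})

# Re-enable background jobs now that this node is the only active one.
sidekiq['enable'] = true

# Apply with: sudo gitlab-ctl reconfigure && sudo gitlab-ctl restart
```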

GitLab’s version policy

Unlike Jenkins or SonarQube, GitLab has no LTS version for security and bug fixes, so we must keep the instance up to date roughly every three months. Moreover, according to GitLab's version policy, new releases may include new features, and no one can guarantee that new features are free from vulnerabilities. In other words, we are locked into a frequent upgrade cycle.

  • New features may remain experimental (e.g., AI chat, Kubernetes deployment) for an extended period; you might wait for further iterations.

  • The upgrade requires extensive preparations every three months.

    • It requires thorough testing of any platforms that integrate with the GitLab API, in case of significant API changes.
    • It requires a downtime window for backing up databases and repositories.

Summary

While no solution eliminates all single points of failure, the hot standby design offers a license-free option with acceptable tradeoffs. I hope you find it useful when deploying GitLab services.