Available reference architectures
- GitLab package (Omnibus)
- Cloud native hybrid
Deciding which architecture to use
Requirements
Recommended cloud providers and services
- Recommendation notes for the database services
- Recommendation notes for Azure
Validation and test results
Cost to run
Deviating from the suggested reference architectures

Reference architectures

The GitLab Reference Architectures have been designed and tested by the GitLab Quality and Support teams to provide recommended deployments at scale.

Available reference architectures

Depending on your workflow, the following recommended reference architectures may need to be adapted accordingly. Your workload is influenced by factors including how active your users are, how much automation you use, mirroring, and repository/change size. Additionally, the displayed memory values are based on GCP machine types. For other cloud vendors, select the options that best match the provided architecture.

GitLab package (Omnibus)

The following reference architectures, where the GitLab package is used, are available:

Cloud native hybrid

The following Cloud Native Hybrid reference architectures, where select recommended components can be run in Kubernetes, are available:

Deciding which architecture to use

The Reference Architectures are designed to strike a balance between two important factors: performance and resilience.

While they are designed to make it easier to set up GitLab at scale, it can still be a challenge to know which one meets your requirements.

As a general guide, the more performant and/or resilient you want your environment to be, the more complex it is.

This section explains the designs you can choose from. It begins with the least complex option, progresses to the most complex, and ends with a decision tree.

Backups

For environments serving 2,000 or fewer users, we generally recommend an automated backup strategy instead of HA.

Depending on your setup and requirements, this can include configuring backups on any external services you may be using, such as Object Storage (AWS S3 / Google Cloud Storage) or PostgreSQL (AWS RDS / Google Cloud SQL), for further resilience.

Backups can provide a good level of RPO (recovery point objective) and RTO (recovery time objective) while avoiding the complexities that come with HA.

High Availability (HA)

High Availability ensures every component in the GitLab setup can handle failures through various mechanisms. However, achieving this is complex, and the required environments can be sizable.

For environments serving 3,000 or more users, we generally recommend an HA strategy, because at this scale outages affect more users. For this reason, all the architectures in this range have HA built in by design.

If you still need HA for a lower number of users, you can achieve it with an adjusted 3K architecture.

Do you need High Availability (HA)?

As mentioned above, achieving HA comes at a cost. The required environments are sizable because each component needs to be multiplied, which adds both infrastructure and maintenance costs.

For many of our customers with fewer than 3,000 users, we’ve found a backup strategy is sufficient and even preferable. While this does have a slower recovery time, it also means you have a much smaller architecture and lower maintenance costs as a result.

In general, then, we only recommend HA in the following scenarios:

When you have 3,000 or more users.
When GitLab being down would critically impact your workflow.

Zero Downtime Upgrades

Zero Downtime Upgrades are available for standard Reference Architecture environments with HA (Cloud Native Hybrid is not supported at this time). This allows an environment to stay up during an upgrade, but the process is more complex as a result and has some limitations, as detailed in the documentation.

When going through this process, note that there may still be brief moments of downtime when the HA mechanisms take effect.

In most cases, the downtime required for an upgrade shouldn’t be substantial, so we only recommend Zero Downtime Upgrades if they’re a key requirement for you.

Cloud Native Hybrid (Kubernetes HA)

As an additional layer of HA resilience you can deploy select components in Kubernetes, known as a Cloud Native Hybrid Reference Architecture.

This is an alternative and more advanced setup compared to a standard Reference Architecture. Running services in Kubernetes is well known to be complex. This setup is only recommended if you have strong working knowledge and experience in Kubernetes.

GitLab Geo (Cross Regional Distribution / Disaster Recovery)

With GitLab Geo, you can have both distributed environments in different regions and a full Disaster Recovery (DR) setup in place. With this setup, you have two or more separate environments, with one being a primary that gets replicated to the others. In the rare event that the primary site goes down completely, you can fail over to one of the other environments.

This is an advanced and complex setup and should only be undertaken if you have DR as a key requirement. You also need to decide how each environment is configured, such as whether each environment is full size and/or has HA.

Decision Tree

The following decision tree summarizes the guidance above. We recommend reading that guidance in full first.

%%{init: { 'theme': 'base' } }%%
graph TD
    L1A(What Reference Architecture should I use?) --> L2A(More than 3000 users?)

    L2A -->|No| L3A("<a href=#do-you-need-high-availability-ha>Do you need HA?</a> (or Zero-Downtime Upgrades)") --> |Yes| L4A>Recommendation 3K architecture with HA including supported modifications]
    L3A -->|No| L4B>Recommendation Architecture closest to user count with Backups]

    L2A -->|Yes| L3B[Do you have experience with and want additional resilience with select components in Kubernetes?]
    L3B -->|No| L4C>Recommendation Architecture closest to user count with HA]
    L3B -->|Yes| L4D>Recommendation Cloud Native Hybrid architecture closest to user count]

    L5A("<a href=#gitlab-geo-cross-regional-distribution-disaster-recovery>Do you need cross regional distribution or disaster recovery?</a>") --> |Yes| L6A>Additional Recommendation GitLab Geo]

    L4A -.- L5A
    L4B -.- L5A
    L4C -.- L5A
    L4D -.- L5A

    classDef default fill:#FCA326
    linkStyle default fill:none,stroke:#7759C2
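The same decision tree can be expressed as a small function, which may help if you want to record the choice in your own planning scripts. This is a minimal sketch, not GitLab tooling: the function and parameter names are hypothetical, and reading “closest to user count” as rounding up to the next listed size is an assumption.

```python
from dataclasses import dataclass

# Architecture sizes listed in this document, in users.
ARCHITECTURE_SIZES = [1_000, 2_000, 3_000, 5_000, 10_000, 25_000, 50_000]


@dataclass(frozen=True)
class Recommendation:
    architecture: str  # e.g. "10k" or "10k Cloud Native Hybrid"
    resilience: str    # "HA" or "Backups"
    add_geo: bool      # the additional GitLab Geo recommendation


def closest_size(users: int) -> int:
    """Smallest listed size that covers `users` (assumption: round up)."""
    for size in ARCHITECTURE_SIZES:
        if users <= size:
            return size
    return ARCHITECTURE_SIZES[-1]  # beyond 50k: start from the largest size


def recommend(users: int, need_ha: bool, use_kubernetes: bool, need_geo: bool) -> Recommendation:
    """Walk the decision tree. All parameter names here are hypothetical.

    use_kubernetes: you have Kubernetes experience AND want the extra
    resilience of running select components in Kubernetes.
    """
    size = f"{closest_size(users) // 1000}k"
    if users <= 3000:
        # "More than 3000 users?" -> No
        if need_ha:  # also covers needing Zero Downtime Upgrades
            return Recommendation("3k (with supported modifications)", "HA", need_geo)
        return Recommendation(size, "Backups", need_geo)
    # "More than 3000 users?" -> Yes: HA is built in at this scale.
    if use_kubernetes:
        return Recommendation(f"{size} Cloud Native Hybrid", "HA", need_geo)
    return Recommendation(size, "HA", need_geo)
```

For example, `recommend(8000, need_ha=False, use_kubernetes=True, need_geo=True)` walks the "Yes" branches and returns a 10k Cloud Native Hybrid recommendation with GitLab Geo added.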

Requirements

Before implementing a reference architecture, refer to the following requirements and guidance.

Supported CPUs

These reference architectures were built and tested on Google Cloud Platform (GCP) using the Intel Xeon E5 v3 (Haswell) CPU platform as a baseline (Sysbench benchmark).

Newer, similarly-sized CPUs are supported and may have improved performance as a result. For Omnibus GitLab environments, ARM-based equivalents are also supported.

Any “burstable” instance types are not recommended due to inconsistent performance.
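If you provision nodes programmatically, a simple name-based check can catch burstable types before they reach production. The sketch below is an assumption-based heuristic, not an official list: the patterns reflect common naming for AWS T-series, GCP shared-core, and Azure B-series instances, the helper name is hypothetical, and you should confirm the patterns against each provider's documentation.

```python
import re

# Non-exhaustive naming patterns for burstable / shared-core instance types.
# These are assumptions based on public provider naming conventions.
BURSTABLE_PATTERNS = {
    "aws": re.compile(r"^t\d+[a-z]*\.", re.IGNORECASE),                   # e.g. t3.large, t4g.xlarge
    "gcp": re.compile(r"^(e2-(micro|small|medium)|f1-micro|g1-small)$"),  # shared-core types
    "azure": re.compile(r"^Standard_B\d", re.IGNORECASE),                 # B-series, e.g. Standard_B2ms
}


def is_burstable(provider: str, instance_type: str) -> bool:
    """Return True if `instance_type` matches a known burstable naming pattern."""
    pattern = BURSTABLE_PATTERNS.get(provider.lower())
    if pattern is None:
        raise ValueError(f"unknown provider: {provider}")
    return bool(pattern.match(instance_type))
```

A `False` result only means the name didn't match these patterns; it isn't a guarantee of consistent performance.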

Supported infrastructure

As general guidance, GitLab should run on most infrastructure, such as reputable cloud providers (AWS, GCP, Azure) and their services, or self-managed (ESXi), that meets both:

The specifications detailed in each reference architecture.
Any requirements in this section.

However, this does not constitute a guarantee for every potential permutation.

See Recommended cloud providers and services for more information.

Additional workloads

These reference architectures have been designed and tested for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate.

This also applies to some GitLab features where it’s possible to run custom scripts, for example server hooks.

As a general rule, you should have robust monitoring in place to measure the impact of any additional workloads and inform any necessary changes.

No swap

Swap is not recommended in the reference architectures. It’s a failsafe that greatly degrades performance. The reference architectures are designed to have enough memory headroom to avoid needing swap.
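To confirm that a node follows this guidance, you can check the `SwapTotal` field in `/proc/meminfo`. This is a minimal sketch assuming Linux nodes, with a hypothetical helper name; turning swap off (for example with `swapoff -a` and removing the swap entry from `/etc/fstab`) is outside its scope.

```python
from pathlib import Path


def swap_total_kib(meminfo_text: str) -> int:
    """Return SwapTotal parsed from the contents of /proc/meminfo.

    /proc/meminfo reports sizes in "kB", which are actually KiB.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo")


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # Linux-only, provided by the kernel
    if meminfo.exists():
        total = swap_total_kib(meminfo.read_text())
        print("swap disabled" if total == 0 else f"swap enabled: {total} KiB")
```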

Large repositories

The relevant reference architectures were tested with repositories of varying sizes that follow best practices.

However, large repositories or monorepos (several gigabytes or more) can significantly impact the performance of Git, and in turn the environment itself, if best practices aren’t followed, such as storing binary or blob files in LFS.

Repositories are at the core of any environment, and the consequences can be wide-ranging when they are not optimized. Some examples of this impact include:

Git packing operations taking longer and consuming high CPU and memory resources.
Git checkouts taking longer, which affects both users and CI/CD pipelines.

As such, large repositories come with a notable cost and typically require more resources to handle, significantly more in some cases. You should review large repositories to ensure they maintain good health and reduce their size wherever possible.

If best practices aren’t followed and large repositories are present on the environment, increased Gitaly specs may be required to ensure stable performance.

Refer to the Managing large repositories documentation for more information and guidance.
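As a first pass at spotting files that may belong in LFS, you can scan a checkout for unusually large files. This is a minimal sketch with a hypothetical helper name and an arbitrary 50 MiB default threshold (not a GitLab limit). It only inspects the current working tree, not Git history, and doesn't check whether a file is already tracked by LFS (checked-out LFS files appear at full size), so treat its output as candidates to review rather than a verdict.

```python
import os
from pathlib import Path


def large_files(root: str, threshold_bytes: int = 50 * 1024 * 1024) -> list[tuple[str, int]]:
    """List files under `root` at or above `threshold_bytes`, largest first."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip links so sizes reflect real files only
            size = path.stat().st_size
            if size >= threshold_bytes:
                results.append((str(path.relative_to(root)), size))
    return sorted(results, key=lambda item: item[1], reverse=True)
```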

Praefect PostgreSQL

Praefect requires its own database server. To achieve full High Availability, a third-party PostgreSQL database solution is required.

We hope to offer built-in solutions for these restrictions in the future. In the meantime, a non-HA PostgreSQL server can be set up using Omnibus GitLab, which the specifications reflect. Refer to the following issues for more information:

Recommended cloud providers and services

The following lists are non-exhaustive. Generally, other cloud providers not listed here likely work with the same specs, but this hasn’t been validated. Additionally, be cautious with other cloud provider services not listed here, as each implementation can be notably different and should be tested thoroughly before production use.

Through testing and real life usage, the Reference Architectures are validated and supported on the following cloud providers:

Reference Architecture	GCP	AWS	Azure	Bare Metal
Omnibus	🟢	🟢	🟡¹	🟢
Cloud Native Hybrid	🟢	🟢

¹ We only recommend smaller setups (up to 2k users) on Azure at this time due to performance issues at larger scales. See the Recommendation notes for Azure section for more info.

Additionally, the following cloud provider services are validated and supported for use as part of the Reference Architectures:

Cloud Service	GCP	AWS	Bare Metal
Object Storage	🟢 Cloud Storage	🟢 S3	🟢 MinIO
Database	🟢 Cloud SQL	🟢 RDS
Redis		🟢 ElastiCache

Recommendation notes for the database services

When selecting a database service, it should run a standard, performant, and supported version of PostgreSQL with the following features:

Read Replicas for Database Load Balancing.
Cross Region replication for GitLab Geo.

Several cloud provider services are known not to support the above, or have been found to have other issues, and aren’t recommended:

Amazon Aurora is incompatible and not supported. See 14.4.0 for more details.
Azure Database for PostgreSQL (Single Server or Flexible Server) is strongly discouraged due to notable performance or stability issues, or missing functionality. See Recommendation notes for Azure for more details.
Google AlloyDB and Amazon RDS Multi-AZ DB cluster have not been tested and are not recommended. Both solutions are specifically not expected to work with GitLab Geo.
- Amazon RDS Multi-AZ DB instance is a separate product and is supported.

Recommendation notes for Azure

Due to performance issues that we found with several key Azure services, we only recommend smaller architectures (up to 2k users) to be deployed to Azure. For larger architectures, we recommend using another cloud provider.

In addition, be aware of the following Azure-specific guidance:

We strongly recommend against Azure Database for PostgreSQL Single Server due to significant performance and stability issues. For GitLab 14.0 and higher, the service is not supported because it only supports up to PostgreSQL 11.
- A new service, Azure Database for PostgreSQL Flexible Server, has been released. Internal testing has shown that it appears to perform as expected, but this hasn’t been validated in production, so it generally isn’t recommended at this time. Additionally, as it’s a new service, you may find that it’s missing some functionality depending on your requirements.
Azure Blob Storage has been found to have performance limits that can impact production use at certain times. However, this has only been seen in our largest architectures (25k+) so far.

Validation and test results

The Quality Engineering team does regular smoke and performance tests for the reference architectures to ensure they remain compliant.

Why we perform the tests

The Quality Department has a focus on measuring and improving the performance of GitLab, as well as creating and validating reference architectures that self-managed customers can rely on as performant configurations.

For more information, see our handbook page.

How we perform the tests

Testing occurs against all reference architectures and cloud providers in an automated and ad hoc fashion, using two tools:

The GitLab Environment Toolkit for building the environments.
The GitLab Performance Tool for performance testing.

Network latency between components on the test environments was measured at <5 ms on all cloud providers. This is shared as an observation and not as an implicit recommendation.
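If you want to compare your own environment against this observation, TCP connection time between nodes gives a rough latency figure. This is a minimal sketch, assuming the target component accepts TCP connections on the given port; the helper name is hypothetical, and it isn't the method GitLab uses in its testing.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> float:
    """Median time in milliseconds to open a TCP connection to host:port.

    A TCP connect takes roughly one network round trip, so this is a rough
    proxy for latency, not a precise measurement.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)
```

For example, `tcp_connect_latency_ms("10.0.0.5", 5432)` would measure against a PostgreSQL node on its default port; the address is a placeholder.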

We aim for a “test smart” approach, where the architectures tested cover a good range that also applies to the others. Testing focuses on the 10k Omnibus architecture on GCP, as testing has shown it is a good bellwether for the other architectures and cloud providers, as well as for Cloud Native Hybrids.

The Standard Reference Architectures are designed to be platform-agnostic, with everything being run on VMs via Omnibus GitLab. While testing occurs primarily on GCP, ad-hoc testing has shown that they perform similarly on hardware with equivalent specs on other Cloud Providers or if run on premises (bare-metal).

Testing on these reference architectures is performed with the GitLab Performance Tool at specific coded workloads, and the throughputs used for testing are calculated based on sample customer data. Select the reference architecture that matches your scale.

Each endpoint type is tested with the following number of requests per second (RPS) per 1,000 users:

API: 20 RPS
Web: 2 RPS
Git (Pull): 2 RPS
Git (Push): 0.4 RPS (rounded to the nearest integer)
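The per-1,000-user rates scale linearly with the user count. The sketch below computes the target throughput for a given architecture size; the function name is hypothetical, and rounding every endpoint (not just Git push) is harmless for multiples of 1,000 users because the other rates are already whole numbers there.

```python
# Request rates per 1,000 users, in tenths of an RPS, so the arithmetic
# stays in exact integers (for example, 4 tenths = 0.4 RPS for Git push).
RATES_TENTHS_PER_1000_USERS = {
    "API": 200,
    "Web": 20,
    "Git (Pull)": 20,
    "Git (Push)": 4,
}


def target_rps(users: int) -> dict[str, int]:
    """Scale the per-1,000-user rates to `users`, rounding to the nearest integer.

    rps = tenths * users / 10_000; adding 5_000 before the integer division
    rounds half up (an assumption; exact .5 ties don't occur for multiples
    of 1,000 users).
    """
    return {
        endpoint: (tenths * users + 5_000) // 10_000
        for endpoint, tenths in RATES_TENTHS_PER_1000_USERS.items()
    }
```

For example, `target_rps(10_000)` gives 200 API, 20 Web, 20 Git pull, and 4 Git push RPS. For the 1k architecture, 0.4 RPS rounds to 0; the text above doesn't say how that case is handled in practice.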

How to interpret the results

Read our blog post on how our QA team leverages GitLab performance testing tool.

Testing is done publicly, and all results are shared.

The following table details the testing done against the reference architectures along with the frequency and results. GCP results also act as a proxy for bare metal. Additional testing is continuously evaluated, and the table is updated accordingly.

Reference Architecture	GCP Omnibus	GCP Cloud Native Hybrid	AWS Omnibus	AWS Cloud Native Hybrid	Azure Omnibus
1k	Weekly
2k	Weekly
3k	Weekly			Weekly
5k	Weekly
10k	Daily	Weekly	Weekly	Weekly	Ad-Hoc
25k	Weekly				Ad-Hoc
50k	Weekly		Ad-Hoc (inc Cloud Services)

Cost to run

The following table details the cost to run the different reference architectures across GCP, AWS, and Azure. Bare-metal costs are not included here, as they vary widely depending on each customer configuration.

Reference Architecture	GCP Omnibus	GCP Cloud Native Hybrid	AWS Omnibus	AWS Cloud Native Hybrid	Azure Omnibus
1k	Calculated cost		Calculated cost		Calculated cost
2k	Calculated cost		Calculated cost		Calculated cost
3k	Calculated cost		Calculated cost		Calculated cost
5k	Calculated cost		Calculated cost		Calculated cost
10k	Calculated cost		Calculated cost		Calculated cost
25k	Calculated cost		Calculated cost		Calculated cost
50k	Calculated cost		Calculated cost		Calculated cost

Deviating from the suggested reference architectures

As a general guideline, the further away you move from the Reference Architectures, the harder it is to get support for it. With any deviation, you’re introducing a layer of complexity that adds challenges to finding out where potential issues might lie.

The reference architectures use the official GitLab Linux packages (Omnibus GitLab) or Helm Charts to install and configure the various components. The components are installed on separate machines (virtualized or bare metal), with machine hardware requirements listed in the “Configuration” column and equivalent VM standard sizes listed in GCP/AWS/Azure columns of each available reference architecture.

Running components on Docker (including Compose) with the same specs should be fine, as Docker is well known in terms of support. However, it is still an additional layer and may add some support complexities, such as not being able to run strace easily in containers.

Other technologies, like Docker Swarm, are not officially supported but can be implemented at your own risk. In that case, GitLab Support is not able to help you.
