Real-Time Features

This guide explains how to safely roll out new real-time features.

Real-time features are implemented using GraphQL Subscriptions. Developer documentation is available.

WebSockets are a relatively new technology at GitLab, and supporting them at scale introduces some challenges. For that reason, new features should be rolled out using the instructions below.

Reuse an existing WebSocket connection

Features reusing an existing connection incur minimal risk. A feature flag roll-out is recommended to give more control to self-hosting customers. However, it is not necessary to roll out in percentages, or to estimate new connections for GitLab.com.

Introduce a new WebSocket connection

Any change that introduces a WebSocket connection to part of the GitLab application incurs some scalability risk, both to the nodes responsible for maintaining open connections and to downstream services, such as Redis and the primary database.

Estimate peak connections

The first real-time feature to be fully enabled on GitLab.com was real-time assignees. By comparing peak throughput to the issue page against peak simultaneous WebSocket connections, it is possible to crudely estimate that each request per second of peak throughput adds approximately 4200 WebSocket connections.

To understand the impact a new feature might have, sum the peak throughput (RPS) to the pages it originates from (n) and apply the formula:

(n * 4200) / peak_active_connections

Current active connections are visible on this Grafana chart.

This calculation is crude, and should be revised as new features are deployed. It yields a rough estimate of the capacity that must be supported, as a proportion of existing capacity.
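
As a worked illustration of the formula above, here is a minimal Python sketch. The page names, throughput figures, and peak active connection count are hypothetical placeholders, not measurements; only the 4200 multiplier comes from this guide.

```python
# Crude multiplier from this guide: WebSocket connections added per
# request per second of peak page throughput.
CONNECTIONS_PER_RPS = 4200


def estimate_new_connections(peak_rps_by_page):
    """Sum peak throughput (n) over the originating pages and scale it."""
    n = sum(peak_rps_by_page.values())
    return n * CONNECTIONS_PER_RPS


def proportion_of_existing_capacity(peak_rps_by_page, peak_active_connections):
    """Apply (n * 4200) / peak_active_connections."""
    if peak_active_connections <= 0:
        raise ValueError("peak_active_connections must be positive")
    return estimate_new_connections(peak_rps_by_page) / peak_active_connections


# Hypothetical inputs, for illustration only.
pages = {"page_a": 3.0, "page_b": 2.0}  # peak RPS per originating page
peak_active = 100_000                   # read from the connections chart

new_connections = estimate_new_connections(pages)
ratio = proportion_of_existing_capacity(pages, peak_active)
print(f"{new_connections:.0f} new connections, {ratio:.0%} of existing peak")
```

With these placeholder inputs, the feature would add roughly a fifth of the existing peak connection count.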

Graduated roll-out

New capacity may need to be provisioned to support your changes, depending on current saturation and the proportion of new connections required. While Kubernetes makes this relatively easy in most cases, there remains a risk to downstream services.

To mitigate this, ensure that the code establishing the new WebSocket connection is behind a feature flag that defaults to off. A careful, percentage-based roll-out of the feature flag ensures that effects can be observed on the WebSocket dashboard.

  1. Create a feature flag roll-out issue.
  2. Add the estimated new connections required under the "What are we expecting to happen" section.
  3. Copy in a member of the Plan and Scalability teams to estimate a percentage-based roll-out plan.
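
When drafting the roll-out plan for the steps above, it can help to tabulate how many new connections to expect at each stage. The sketch below assumes that new connections scale linearly with the roll-out percentage, which is a simplification. The stage percentages and the total of 21,000 connections are hypothetical; the actual plan should come from the teams named in step 3.

```python
def connections_per_stage(estimated_new_connections, percentages):
    """Return (percentage, expected connections) pairs for each stage.

    Assumes connections grow linearly with the percentage of users
    for whom the feature flag is enabled.
    """
    stages = []
    for pct in percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        stages.append((pct, round(estimated_new_connections * pct / 100)))
    return stages


# Hypothetical total and stages, for illustration only.
plan = connections_per_stage(21_000, [1, 10, 25, 50, 100])
for pct, connections in plan:
    print(f"{pct:>3}% of users: about {connections} new connections")
```

Comparing each stage's figure against the WebSocket dashboard before moving on makes it easier to spot when observed load diverges from the estimate.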

Backward compatibility

For the duration of the feature flag roll-out and indefinitely thereafter, real-time features must be backward-compatible, or at least degrade gracefully. Not all customers have Action Cable enabled, and further work needs to be done before Action Cable can be enabled by default.
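
One way to picture graceful degradation is a feature that prefers a subscription but falls back to another update mechanism when Action Cable is unavailable. The sketch below is language-agnostic logic written in Python. The function name, its parameters, and the choice of polling as the fallback are all assumptions for illustration, not GitLab APIs.

```python
from typing import Callable


def start_updates(
    action_cable_enabled: bool,
    subscribe: Callable[[], None],
    poll: Callable[[], None],
) -> str:
    """Start live updates and report which mechanism is in use.

    `subscribe` and `poll` are hypothetical callables supplied by the
    feature. Polling stands in for whatever non-real-time behaviour
    the feature had before.
    """
    if action_cable_enabled:
        try:
            subscribe()
            return "subscription"
        except ConnectionError:
            # The connection could not be established: degrade below.
            pass
    poll()
    return "polling"


# Usage with stand-in callables that only record what was called.
calls = []
mode = start_updates(
    action_cable_enabled=False,
    subscribe=lambda: calls.append("subscribe"),
    poll=lambda: calls.append("poll"),
)
print(mode, calls)
```

The key property is that the feature stays usable on instances without Action Cable, and real-time behaviour is an enhancement rather than a requirement.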

Making real-time a requirement represents a breaking change, so the next opportunity to do this is version 15.0.

Enable Real-Time by default

Mounting the Action Cable library adds a minimal memory footprint. However, serving WebSocket requests introduces additional memory requirements. For this reason, enabling Action Cable by default requires additional work: at a minimum, revising the Reference Architectures, and possibly reducing overall memory usage, including a known issue with Workhorse.

Real-time infrastructure on GitLab.com

On GitLab.com, WebSocket connections are served from dedicated infrastructure, entirely separate from the regular Web fleet and deployed with Kubernetes. This limits the risk to nodes handling requests, but not the risk to shared services. For more information on the WebSockets Kubernetes deployment, see this epic.