Go at Scale: Engineering Patterns from Uber, Stripe, and High-Growth Teams

Learn how large engineering teams organize Go codebases for scale. Discover Uber's monorepo strategy, dependency management, testing patterns, and workflows that handle thousands of developers.

By Omar Flores

The Problem That Only Big Teams Face

Imagine you are a junior developer at a startup. You have 10 developers. Every morning, someone pushes code. Everyone pulls it. Things work or they break immediately. Fix or revert. Simple.

Now imagine you are at Uber. You have 5,000 developers. Ten platforms. Thousands of services. Millions of lines of code. Code changes happen thousands of times per day across the codebase. Most changes succeed. Some break systems serving hundreds of millions of users.

The question shifts from β€œdoes the code work?” to β€œhow do we prevent disasters when thousands of people touch the same codebase daily?”

Uber engineers solved this problem not with better tools, but with better structure. They built systems around organization, ownership, and verification β€” not complexity.

This guide reveals those patterns. Not theory. Actual strategies used by companies building at massive scale.


Part 1: The Monorepo β€” Uber’s Foundation

Uber does not manage hundreds of separate Git repositories. Uber uses a monorepo β€” a single Git repository containing code for virtually every service, library, and tool.

This sounds chaotic. It is actually the opposite.

Why a Monorepo Makes Sense at Scale

Problem at small scale: Each service in its own repo works fine. But when you have 500 services that depend on each other, managing versions becomes impossible. Every change to a shared library forces coordinated updates across dozens of repositories. You end up in dependency hell.

Solution: One repository. Entire codebase visible. A change to a shared library is immediately visible to all code that uses it. Dependencies are up-to-date automatically. No version mismatches because there is no versioning β€” everything is on the latest commit.

uber-monorepo/
  services/
    auth-service/
    payment-service/
    ride-matching/
    user-platform/
    driver-platform/
  libraries/
    common/
      errors/
      logging/
      metrics/
    storage/
      postgres/
      redis/
      cassandra/
  tools/
    deploy/
    monitoring/
    testing/
  vendor/
    dependencies/

Advantages at scale:

  • Atomic commits β€” You commit a library change and its consumers’ updates at the same time. No partial states.
  • Easier refactoring β€” Rename a function in a library, the compiler tells you every place that breaks. You fix them all in one commit.
  • Dependency transparency β€” See exactly what depends on what. No hidden transitive dependencies.
  • Shared standards β€” One lint configuration, one test standard, one deployment process, everyone follows it.

Challenges:

  • Repository size β€” Uber’s monorepo is hundreds of gigabytes. Cloning it takes time, and everyday operations like status and diff slow down.
  • Tool overhead β€” Plain Git struggles once a repository reaches millions of files and a deep history. Uber layers custom tooling on top of Git.
  • Access control complexity β€” How do you give Person A access to Service B but not Service C when they are in the same repo?

Most companies adopt the monorepo strategy as they grow beyond 100 engineers and 50 services. Before that, separate repos are simpler.

Managing Monorepo Scale

For large monorepos, Uber uses tools like:

Bazel (build system)

  • Replaces go build
  • Understands entire dependency graph
  • Caches build artifacts across the organization
  • Enables building only changed code and its dependents

Phabricator/Arcanist (code review)

  • Understands which files changed
  • Routes reviews to responsible teams
  • Blocks merge if tests fail
  • Enforces ownership rules

Git with filtering

# Only clone the code you need
git clone --filter=blob:none <monorepo-url>
cd <monorepo-dir>
git sparse-checkout set services/my-service/

This reduces clone size from 100GB to what you actually use.


Part 2: Ownership and Responsibility β€” The OWNERS File

In a monorepo with thousands of developers, you cannot have everyone changing everything. You need clear ownership.

Uber uses OWNERS files throughout the codebase:

# services/auth-service/OWNERS
auth-team@uber.com
@mchen
@priya.patel

# services/auth-service/internal/jwt/OWNERS
@mchen (primary)
@priya.patel (backup)

How it works:

  1. Developer submits a code review
  2. System checks which OWNERS files apply
  3. Those people are automatically added as reviewers
  4. Code cannot merge without their approval

Benefits:

  • Clarity β€” Everyone knows who owns what
  • Accountability β€” Clear owner for each piece
  • Knowledge β€” You know who to ask questions about a service
  • Quality gates β€” Experienced people must approve changes

The rule: Every directory has an OWNERS file. No exceptions. No β€œanyone can change this.”
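This is not Uber’s actual tooling, but the lookup such a system performs can be sketched in a few lines of Go: walk up from a changed file until the nearest OWNERS entry is found. The `owners` map and paths below are illustrative.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// owners maps a directory to the reviewers listed in its OWNERS file.
// A real hook would load these from disk; here they are inlined.
var owners = map[string][]string{
	"services/auth-service":              {"auth-team@uber.com", "@mchen", "@priya.patel"},
	"services/auth-service/internal/jwt": {"@mchen", "@priya.patel"},
}

// requiredReviewers walks from the changed file's directory toward the
// repository root and returns the nearest OWNERS entry it finds.
func requiredReviewers(file string) []string {
	dir := path.Dir(file)
	for dir != "." && dir != "/" {
		if revs, ok := owners[dir]; ok {
			return revs
		}
		dir = path.Dir(dir)
	}
	return nil
}

func main() {
	file := "services/auth-service/internal/jwt/token.go"
	// The most specific OWNERS file wins.
	fmt.Println(strings.Join(requiredReviewers(file), ", "))
}
```

Because the most specific OWNERS file wins, a change deep inside `internal/jwt` routes to its two listed owners rather than the whole auth team.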


Part 3: Testing at Scale β€” The Test Pyramid

Uber runs millions of tests daily. Not all at once. But the testing strategy creates confidence:

The Three-Level Pyramid

          / \
         /   \
        / E2E \         End-to-end tests across services
       /_______\        (slow, realistic, few tests)
      /         \
     /           \
    / Integration \     Tests involving real databases, caches
   /_______________\    (medium speed, medium count)
  /                 \
 /    Unit Tests     \  Tests for individual functions
/_____________________\ (fast, many tests, instant feedback)

Level 1: Unit Tests (Vast Majority)

Fast. Isolated. No database. No network. Mock everything.

// userservice/user_test.go
package userservice

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCreateUserValidation(t *testing.T) {
	user, err := CreateUser("", "invalid")
	assert.Error(t, err)
	assert.Nil(t, user)
}

Developers run unit tests locally before pushing. Takes seconds. Catches 80% of bugs immediately.
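A common way to keep this level fast and exhaustive in Go is the table-driven style: one loop, many cases, no network or database. The sketch below is illustrative; `validateEmail` is a hypothetical helper inlined so the file is self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// validateEmail is a hypothetical helper; a real service would
// import it from its internal package.
func validateEmail(s string) bool {
	return strings.Count(s, "@") == 1 &&
		!strings.HasPrefix(s, "@") &&
		!strings.HasSuffix(s, "@")
}

// TestValidateEmail shows the table-driven style: each case gets
// its own subtest name, so failures pinpoint the exact input.
func TestValidateEmail(t *testing.T) {
	cases := []struct {
		name  string
		input string
		want  bool
	}{
		{"valid", "rider@uber.com", true},
		{"missing at", "rider.uber.com", false},
		{"empty", "", false},
		{"leading at", "@uber.com", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := validateEmail(tc.input); got != tc.want {
				t.Errorf("validateEmail(%q) = %v, want %v", tc.input, got, tc.want)
			}
		})
	}
}

func main() {
	// go test would run TestValidateEmail; main lets `go run` smoke it.
	fmt.Println("valid:", validateEmail("rider@uber.com"))
}
```

Adding a new edge case is one line in the table, which keeps coverage growing without new test boilerplate.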

Level 2: Integration Tests

Real database (test database). Real services talking to each other. Slower. Fewer of them.

// auth_test/integration_test.go
package auth_test

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestAuthServiceWithRealDatabase(t *testing.T) {
	db := setupTestDB() // provisions a real test database
	defer db.Close()

	svc := auth.NewService(db)
	token, err := svc.GenerateToken(userID)
	assert.NoError(t, err)

	valid := svc.ValidateToken(token)
	assert.True(t, valid)
}

These run in CI after unit tests pass. Takes minutes. Catches integration issues.

Level 3: E2E Tests

Real services deployed in test environment. Real HTTP calls. Extremely slow and expensive.

// Minimal E2E tests
// Only test critical user journeys
// - User signup β†’ verification β†’ login
// - Rider books ride β†’ sees driver β†’ gets rating

Only critical paths. Maybe 50 E2E tests for entire platform. Run on each release, not on every commit.

Test Execution Strategy

Commit pushed
    ↓
Run unit tests (30 sec)
    β”œβ”€ FAIL β†’ Notify developer, stop
    └─ PASS β†’ Run integration tests in parallel (5 min)
              β”œβ”€ FAIL β†’ Notify developer, stop
              └─ PASS β†’ Run E2E tests on staging (30 min)
                        β”œβ”€ FAIL β†’ Block merge, investigate
                        └─ PASS β†’ Merge to main, trigger deployment

Key principle: Fast feedback for common cases, thorough verification before production.


Part 4: Code Organization β€” The Service Structure

At Uber, each service follows a consistent structure. This uniformity allows any developer to navigate any service.

services/
  payment/
    cmd/
      payment-server/
        main.go              // Service entry point
    internal/
      service/
        payment_service.go   // Core business logic
        refund_service.go
      repository/
        payment_repo.go      // Database layer
      handler/
        payment_http.go      // HTTP handlers
    api/
      v1/
        payment.proto        // API definition (gRPC or OpenAPI)
    config/
      config.go             // Configuration loading
    tests/
      integration/
        payment_test.go      // Integration tests
    Makefile               // Local development
    go.mod
    go.sum

Why this structure:

  • cmd/ β€” Entry points. One service = one binary = one main.go
  • internal/ β€” Private code. Compiler enforces it. Cannot import from another service’s internal/
  • api/ β€” API contracts. Humans and machines read this
  • config/ β€” Centralized configuration
  • tests/ β€” Test structure mirrors production structure

The rule: Every developer can find code in any service using the same mental model. No surprises.
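A hypothetical `cmd/payment-server/main.go` shows how the layers wire together. The types are inlined here for readability; in the real layout they would live in `internal/repository`, `internal/service`, and `internal/handler`, and every name is illustrative.

```go
package main

import (
	"fmt"
	"net/http"
)

// PaymentRepo is the database layer (internal/repository).
type PaymentRepo struct{ /* db *sql.DB in a real service */ }

func (r *PaymentRepo) Save(amount int) error { return nil }

// PaymentService holds the business logic (internal/service).
type PaymentService struct{ repo *PaymentRepo }

func (s *PaymentService) Charge(amount int) error {
	if amount <= 0 {
		return fmt.Errorf("invalid amount: %d", amount)
	}
	return s.repo.Save(amount)
}

// PaymentHandler is the HTTP layer (internal/handler).
type PaymentHandler struct{ svc *PaymentService }

func (h *PaymentHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if err := h.svc.Charge(100); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	// Wiring happens only here: repository β†’ service β†’ handler.
	repo := &PaymentRepo{}
	svc := &PaymentService{repo: repo}
	http.Handle("/v1/charge", &PaymentHandler{svc: svc})
	// http.ListenAndServe(":8080", nil) // commented out so the sketch exits
	fmt.Println("wired: handler -> service -> repository")
}
```

Keeping all construction in `main` means each layer can be unit-tested in isolation with a fake for the layer below it.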


Part 5: Dependency Management at Scale

In a monorepo, dependencies are both a strength and a hazard.

The strength: All code is visible. If Service A depends on Library B, you see that relationship clearly.

The hazard: Circular dependencies. Tight coupling. Complex dependency graphs.

Dependency Rules

Uber enforces strict rules:

1. Acyclic Dependency Graph

Services can depend on libraries and lower-level services. But never circular.

User Service
  ↓
Auth Library
  ↓
Common Library
  ↓
(Nothing above)

Payment Service
  ↓
User Service  ← ALLOWED (depends on lower layer)
  ↓
Common Library

Not allowed:

User Service β†’ Payment Service β†’ User Service ← CYCLE

The build system catches cycles like this and refuses to build.
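A build-system acyclicity check boils down to a depth-first search for back edges. Here is a minimal sketch over a hypothetical dependency map; the service names mirror the diagram above.

```go
package main

import "fmt"

// deps is a hypothetical service/library dependency graph.
var deps = map[string][]string{
	"user-service":    {"auth-library"},
	"auth-library":    {"common-library"},
	"payment-service": {"user-service", "common-library"},
	"common-library":  {},
}

// hasCycle runs a depth-first search with three node states:
// unvisited, on the current DFS stack, and fully explored.
// Reaching a node that is already on the stack means a cycle.
func hasCycle(graph map[string][]string) bool {
	const (
		unvisited = iota
		inStack
		done
	)
	state := make(map[string]int)
	var visit func(node string) bool
	visit = func(node string) bool {
		state[node] = inStack
		for _, dep := range graph[node] {
			switch state[dep] {
			case inStack:
				return true // back edge: cycle found
			case unvisited:
				if visit(dep) {
					return true
				}
			}
		}
		state[node] = done
		return false
	}
	for node := range graph {
		if state[node] == unvisited && visit(node) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasCycle(deps)) // the layered graph is acyclic

	// Introduce user-service β†’ payment-service and the check trips.
	deps["user-service"] = append(deps["user-service"], "payment-service")
	fmt.Println(hasCycle(deps))
}
```

Running a check like this in CI turns β€œnever circular” from a convention into an enforced invariant.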

2. Explicit Dependencies

Every external import must be declared in go.mod. No implicit dependencies. The build system verifies it.

import (
	"uber/services/user"           // internal - resolved inside the monorepo
	"uber/libraries/common/errors" // internal - resolved inside the monorepo
	"some-external/package"        // external - must be declared in go.mod
)

3. Dependency Versioning Strategy

In the monorepo, everything uses HEAD (the latest commit). But external dependencies are pinned.

// go.mod
require (
	github.com/some-org/some-library v1.2.3      // Pinned
	github.com/another-org/another-lib/v2 v2.0.0 // Pinned
)

// Internal: always latest
import "uber/libraries/common"  // Resolved at HEAD, no version tag

This prevents external library breakage while ensuring internal consistency.


Part 6: Deployment and Rollback

When thousands of developers are pushing code, deployment becomes safety-critical.

Canary Deployments

Instead of deploying to all servers at once, Uber deploys to a small percentage first:

Version 2.0.0
    ↓
Deploy to 1% of servers (canary)
    ↓
Monitor for 5 minutes
    β”œβ”€ Error rate normal? β†’ Continue
    β”œβ”€ Error rate spiked? β†’ Automatic rollback
    └─ Manual inspection needed? β†’ Pause for review
    ↓
Deploy to 25% (still safe to rollback)
    ↓
Monitor for 10 minutes
    β”œβ”€ All good? β†’ Continue
    └─ Issues? β†’ Rollback
    ↓
Deploy to 100% (full rollout)

Key metric: Error rate. If error rate increases more than X% between versions, automatic rollback triggers.

Implementation:

// In the deployment system: compare the canary's error rate
// against the pre-deploy baseline.
if currentErrorRate > baselineErrorRate*1.2 {
	// Error rate is up more than 20%: abort automatically.
	rollback()
	notifyOnCall()
	createIncident()
}
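The staged 1% β†’ 25% β†’ 100% rollout above can be sketched as a loop over stages with a health check between steps. This is a toy model, not Uber’s deployment system; `deploy` and `healthy` stand in for real infrastructure calls.

```go
package main

import "fmt"

// Stage describes one step of a staged rollout.
type Stage struct {
	Percent     int
	MonitorMins int
}

// rollout walks the canary stages, calling healthy() after each
// deploy step; any failed check aborts and signals a rollback.
func rollout(stages []Stage, deploy func(pct int), healthy func() bool) bool {
	for _, s := range stages {
		deploy(s.Percent)
		// In production this step would wait MonitorMins while
		// comparing canary error rates against the baseline.
		if !healthy() {
			fmt.Printf("rollback at %d%%\n", s.Percent)
			return false
		}
	}
	return true
}

func main() {
	stages := []Stage{{1, 5}, {25, 10}, {100, 0}}
	ok := rollout(stages,
		func(pct int) { fmt.Printf("deployed to %d%%\n", pct) },
		func() bool { return true }, // pretend metrics look healthy
	)
	fmt.Println("full rollout:", ok)
}
```

The important property is that each stage is cheap to abandon: a bad release never touches more traffic than the current stage exposes.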

Deployment Windows

Uber does not deploy during peak hours. Deployment happens during low-traffic windows when issues are easiest to spot and rollback is fastest.


Part 7: Code Review Culture

Code review is where quality is enforced. At Uber scale, code review must be:

  1. Fast β€” Average review time under 2 hours
  2. Thorough β€” All tests pass before review
  3. Clear β€” Comments explain the why, not the what
  4. Non-blocking β€” Reviewers suggest, not dictate

Code Review Checklist

Reviewers check:

  • Tests cover happy path and error cases
  • No new complexity without justification
  • Dependencies are explicit
  • Performance implications considered
  • Error handling is correct
  • Logging is sufficient for debugging
  • Security implications reviewed (if relevant)
  • Documentation updated

The culture: Disagreements are about ideas, not people. A senior developer deferring to a junior developer’s better solution is celebrated, not seen as weakness.

Handling Large Changes

For large refactorings or architectural changes:

  1. RFC (Request for Comments) β€” Written proposal describing problem, solution, and trade-offs
  2. Discussion β€” Async comments and suggestions
  3. Decision β€” The tech lead makes the final call
  4. Implementation β€” Small incremental commits, each reviewed
  5. Rollout β€” Canary deployment with monitoring

This prevents β€œsurprises” where a massive change breaks assumptions people didn’t know existed.


Part 8: Workflow for New Developers

A new engineer at Uber follows this onboarding:

Week 1: Local Setup

# Clone monorepo (sparse checkout, minimal)
git clone --filter=blob:none --sparse <monorepo>
cd <monorepo-dir>

# Check out just their team's code
git sparse-checkout set services/my-team/

# Build and test
bazel build //services/my-team/...
bazel test //services/my-team/...

Week 2: First PR

  • Small bug fix or doc improvement
  • Assigned to experienced reviewer
  • Learn code review culture

Week 3-4: Small Feature

  • Feature in their service only
  • Understanding: how testing works, how to review others’ code

Month 2-3: Cross-Service Feature

  • Feature spanning multiple services
  • Understanding: dependencies, coordination, larger systems

This gradual escalation prevents β€œnew developer breaks prod” while building knowledge.


Part 9: Lessons for Your Team

You don’t need Uber’s scale to benefit from these patterns.

Applicable immediately:

  1. OWNERS files β€” Even for 10 developers, know who owns what
  2. Test pyramid β€” Unit tests fast, integration tests thorough
  3. Consistent structure β€” New service = same directory layout
  4. Explicit dependencies β€” go.mod is always complete and accurate
  5. Code review standards β€” Checklist, async, fast
  6. Dependency acyclicity β€” Check this as you grow

Adopt as you scale:

  1. Monorepo β€” When you have 50+ services
  2. Build system β€” Custom tooling for caching and parallelization
  3. Ownership automation β€” OWNERS files + CI enforcement
  4. Canary deployments β€” Error rate monitoring, automatic rollback
  5. Staged rollouts β€” 1% β†’ 25% β†’ 100%

Part 10: The Uncomfortable Truth

Uber’s infrastructure is not better because Uber has more engineers. Uber has better infrastructure because of decisions made when they were smaller β€” decisions that scaled.

Most companies approach scale reactively. They have 100 developers working on 50 services, each in separate repos, no clear ownership, inconsistent code structure. Then they hire 500 more developers and wonder why everything breaks.

Uber built patterns that scaled from 10 developers to 5,000. That is the insight. Not the tools. Not the budget. The intentional architecture.

And here is the thing: You can adopt these patterns today.

You don’t need Bazel to enforce acyclic dependencies; a simple import check in CI will do. You don’t need ownership automation; an OWNERS file and someone checking PRs is enough. You don’t need Uber’s deployment system to do canary releases; deploy to 10% first and monitor. You don’t need a monorepo to have consistent code structure; you need a template that every service follows.

The limiting factor is not tools. It is discipline.

The best large engineering teams are not distinguished by their tools or their budget. They are distinguished by their discipline in maintaining clarity when systems grow complex. That discipline, applied early, scales infinitely.

Tags

#go #golang #engineering-at-scale #monorepo #workflow #architecture #uber #large-teams #code-organization #dependency-management #testing #ci-cd #best-practices