CI/CD Patterns Quick Reference
The canonical pipeline
Every modern CI pipeline should have these stages, in order:
1. Lint / format check: cheap, catches ~20% of problems.
2. Type check (if typed language): catches another 30%.
3. Unit tests: fast feedback, runs on every commit.
4. Integration tests: slower, may need a database/network.
5. Build: compile / bundle / package the artifact.
6. Container image build and push (if deploying containers).
7. Security scan: SAST, dependency audit, container scan.
8. Deploy to staging: automatic on merge to main.
9. Smoke tests on staging: prove the deploy actually works.
10. Deploy to prod: manual approval gate.
Stages 1–4 should complete in under 5 minutes. If they don’t, developers stop running them locally and start pushing broken code.
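One way to keep the fast stages cheap to run locally is a small fail-fast script that runs them in order and compares the elapsed time with the 5-minute budget. The following is a minimal sketch under stated assumptions, not a prescribed tool: `run_stages` and `FAST_STAGES` are hypothetical names, and the commands are borrowed from the example configs on this page and may not match your project.

```python
import subprocess
import time

# Illustrative commands for the fast stages (assumption: a uv-managed project
# using ruff, mypy, and pytest). Adjust to your own tooling.
FAST_STAGES = [
    ("lint", ["uv", "run", "ruff", "check", "."]),
    ("format", ["uv", "run", "ruff", "format", "--check", "."]),
    ("typecheck", ["uv", "run", "mypy", "src/"]),
    ("tests", ["uv", "run", "pytest"]),
]


def run_stages(stages, budget_seconds=300.0):
    """Run stages in order and stop at the first failure.

    Returns (failed_stage, elapsed_seconds, within_budget), where
    failed_stage is None if every stage passed.
    """
    start = time.monotonic()
    failed = None
    for name, command in stages:
        # check=False: inspect the return code ourselves instead of raising.
        result = subprocess.run(command, check=False)
        if result.returncode != 0:
            failed = name
            break  # Fail fast: later, slower stages are skipped.
    elapsed = time.monotonic() - start
    return failed, elapsed, elapsed <= budget_seconds
```

Calling `run_stages(FAST_STAGES)` from a pre-push hook would give roughly the same signal as the first CI stages. Ordering the list from cheapest to most expensive means a lint failure is reported before any test time is spent.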
GitLab CI example
# .gitlab-ci.yml
stages:
  - lint
  - test
  - build
  - deploy

variables:
  PYTHON_VERSION: "3.11"
  # Keep uv's cache inside the project so the cache paths below capture it.
  UV_CACHE_DIR: .uv-cache

default:
  image: python:${PYTHON_VERSION}-slim
  cache:
    key:
      files:
        - pyproject.toml
        - uv.lock
    paths:
      - .uv-cache/
  before_script:
    - pip install uv
    - uv sync --frozen

lint:
  stage: lint
  script:
    - uv run ruff check .
    - uv run ruff format --check .

typecheck:
  stage: lint
  script:
    - uv run mypy src/

test:
  stage: test
  services:
    - postgres:16-alpine
  variables:
    POSTGRES_PASSWORD: test
    DATABASE_URL: postgresql://postgres:test@postgres:5432/test
  script:
    - uv run pytest --cov=src --cov-report=term --cov-report=xml
  coverage: '/^TOTAL.+?(\d+\%)$/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  # Override the default before_script: the docker image has no pip or uv.
  before_script: []
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:latest
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

deploy_staging:
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - ./scripts/deploy.sh staging $CI_COMMIT_SHA

deploy_prod:
  stage: deploy
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  environment:
    name: production
    url: https://example.com
  script:
    - ./scripts/deploy.sh prod $CI_COMMIT_SHA
GitHub Actions equivalent
# .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - run: uv sync --frozen
      - run: uv run ruff check .
      - run: uv run ruff format --check .

  test:
    runs-on: ubuntu-latest
    needs: lint
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_PASSWORD: test
        ports: ["5432:5432"]
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - run: uv sync --frozen
      - run: uv run pytest --cov=src
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/postgres

  build-and-push:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
Secrets management
Never commit secrets to Git. Options in order of preference:
1. Cloud-native identity: GitHub Actions OIDC → AWS IAM assume role, no long-lived keys at all.
2. Secret manager integration: AWS Secrets Manager, HashiCorp Vault, Infisical, injected at job start.
3. CI-native secrets: GitLab CI variables, GitHub Actions secrets. Easy, still much better than Git.
4. Never: environment variables committed to the repo.
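Whichever option injects the secret, the job usually ends up reading it from an environment variable. The sketch below shows one way application code might consume it: fail fast when the variable is missing, and scrub the value from anything that gets logged. The names `require_secret`, `redact`, and `MissingSecretError` are hypothetical helpers for illustration, not part of any CI system's API.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name, environ=None):
    """Return the secret stored in the environment variable `name`.

    The error message names the variable but never echoes a value.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value.strip():
        raise MissingSecretError(f"required secret {name} is not set")
    return value


def redact(text, secrets):
    """Replace any occurrence of a secret value in `text` before logging it."""
    for secret in secrets:
        if secret:  # Skip empty strings, which would match everywhere.
            text = text.replace(secret, "***")
    return text
```

Because the value is looked up by name at run time, rotating the secret in the CI store changes what the next job sees without any code change.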
Caching tips
Cache the dependency lockfile’s output (the downloaded and installed packages), not its input.
Use a content-addressed cache key (hash of uv.lock, pnpm-lock.yaml, etc.) so the cache invalidates automatically on changes.
Cache build tool output (ruff, mypy, tsc incremental files) for a 2–5× speedup on incremental runs.
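GitLab's `cache:key:files` and the GitHub Actions `hashFiles()` expression compute content-addressed keys for you. To make the idea concrete, here is a minimal sketch of the same technique: hash the lockfile contents and derive the key from the digest. The function name `cache_key` and the `deps` prefix are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def cache_key(lockfiles, prefix="deps"):
    """Build a cache key from the SHA-256 of the given lockfiles' contents."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on the order the files were listed in.
    for path in sorted(Path(p) for p in lockfiles):
        # Include the file name so moving content between files changes the key.
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

The key stays the same for as long as the lockfiles are byte-for-byte identical, and any dependency change produces a new key, so a stale cache is never restored under the old name.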
Zero-downtime deploys
For containerized services, the deploy step should:
1. Push the new image with a unique tag (commit SHA, never latest).
2. Update the Kubernetes Deployment / ECS service / Cloud Run revision to point at the new tag.
3. Use a rolling strategy with a readiness probe so old pods drain only when new ones pass /ready.
4. Verify with a smoke test against the new version before marking success.
5. Keep the previous version’s image available for fast rollback.
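The verification step can be sketched as a small polling loop that only reports success after the readiness endpoint has answered several times in a row. This is an illustrative sketch rather than a replacement for the orchestrator's own readiness probes; `http_ok`, `wait_until_ready`, and the default attempt counts are assumptions.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe, attempts=30, interval=2.0,
                     required_successes=3, sleep=time.sleep):
    """Poll `probe` until it succeeds `required_successes` times in a row.

    Returns True once the streak is reached, False if `attempts` run out.
    """
    streak = 0
    for attempt in range(attempts):
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # A single failure resets the streak.
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

A deploy script might call `wait_until_ready(lambda: http_ok("https://staging.example.com/ready"))` and exit non-zero on `False`, which is the signal to roll back to the previous image. Requiring consecutive successes guards against a pod that answers once and then crashes.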
Common mistakes
Slow pipelines — anything over 15 minutes and devs stop waiting, start pushing broken code, and the signal value collapses.
Flaky tests — one flaky test ruins the entire feedback loop. Quarantine or fix aggressively.
Everything in one job — makes failures hard to diagnose. Split stages.
No branch protection — merge buttons that don’t require CI to pass defeat the whole point.
Manual steps hidden in runbooks — if it’s not in the pipeline, it will drift.
Practice
1. Build the canonical pipeline
Take a small Python FastAPI service. Write a .gitlab-ci.yml (or
.github/workflows/ci.yml) that implements the full canonical
pipeline from the documentation page: lint → typecheck → test → build
→ deploy (to a fake staging target).
Target: total pipeline under 5 minutes on cache hit.
2. Matrix test against multiple Python versions
Run the test suite against Python 3.11, 3.12, and 3.13 in parallel.
Use parallel: matrix: (GitLab) or strategy.matrix (GitHub). Verify
the pipeline summary shows all three as independent jobs.
3. Cache hit rate
Run your pipeline twice. Measure total time and cache hit rate on the second run. If cache hit rate isn’t >80%, your cache key is wrong — fix it.
4. Secret rotation
Add a secret (e.g., a fake API key) to the CI-native secret store. Use it in a job via an environment variable. Rotate it — confirm the next pipeline run uses the new value without any code change.
Bonus: migrate the same secret to a cloud secret manager and inject it via OIDC instead of a static CI variable.
5. Manual approval gate
Add a deploy_prod job that requires manual approval via GitLab
when: manual (or a GitHub Actions environment: with required
reviewers). Confirm the pipeline pauses and waits for a human to
click the button before running the deploy step.
6. Flake detection
Deliberately introduce a flaky test (for example, assert random.random() > 0.3,
which fails about 30% of the time). Run the pipeline 10 times. Configure the
pipeline to retry failing tests once and report the flake. Then fix or quarantine it.
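The retry-and-report behaviour can be sketched in a few lines: a test that fails and then passes on retry is labelled flaky rather than silently counted as a pass. Test runners and plugins offer their own retry mechanisms; the `run_with_retry` helper and its three labels are hypothetical and only illustrate the classification.

```python
import random


def run_with_retry(test, retries=1):
    """Run `test`, a callable that raises AssertionError on failure.

    Returns "passed", "flaky" (failed, then passed on a retry), or "failed".
    """
    for attempt in range(retries + 1):
        try:
            test()
        except AssertionError:
            continue  # Try again if any retries are left.
        return "passed" if attempt == 0 else "flaky"
    return "failed"


def flaky_test(rng):
    """The deliberately flaky test from this exercise."""
    assert rng.random() > 0.3
```

With one retry, a test that fails 30% of the time is reported as flaky with probability 0.3 × 0.7 = 0.21 and still fails outright with probability 0.3 × 0.3 = 0.09, so a retry hides most failures. That is why the flake has to be reported and not just retried.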
Review Questions
What is the target wall-clock time for the lint-through-integration-test phase of a CI pipeline?
A. Under 60 minutes
B. Under 15 minutes; ideally under 5
C. Under 2 hours
D. There is no target
Why is latest a dangerous tag for container images in a deploy pipeline?
A. It’s slower to pull
B. It’s mutable — you cannot reliably roll back or identify what is running in production
C. It’s banned by Docker Hub
D. It uses more disk space
What is the most secure way to give a GitHub Actions workflow access to AWS?
A. Long-lived access keys stored as GitHub Secrets
B. OIDC federation with an AWS IAM role and an assume-role trust policy (no long-lived keys)
C. Committing keys to a private repo
D. Sharing the root account password
Which stage of a canonical CI pipeline should run first?
A. Integration tests
B. Lint and format check (cheap, catches ~20% of problems)
C. Build and push Docker image
D. Deploy to production
What makes a cache key effective in CI?
A. Using a fixed string like "cache"
B. Hashing the dependency lockfile (e.g., uv.lock, pnpm-lock.yaml) so the cache invalidates automatically on changes
C. Using the current timestamp
D. Not caching at all
A flaky test is in your pipeline. What should you do?
A. Ignore it and re-run the pipeline until it passes
B. Quarantine (skip) or fix it — one flake destroys the signal value of the whole pipeline
C. Delete the test
D. Mark the whole suite as optional
How should the deploy step tag a container image?
A. With latest
B. With an immutable unique tag like the commit SHA
C. With a random UUID generated at deploy time
D. It shouldn’t tag at all
Why should the production deploy require a manual approval gate?
A. To give someone credit
B. To add a human checkpoint for high-blast-radius changes, even after automated tests pass
C. It’s required by law
D. It’s free extra compute time
What does “zero-downtime deploy” typically require?
A. Taking the service offline during deploys
B. A rolling update strategy with health/readiness probes, so old instances drain only when new ones are healthy
C. Recompiling the kernel
D. Running two separate clusters
Why should manual steps be avoided in the deploy pipeline?
A. Manual steps are slower
B. They drift from documentation, are unauditable, and can’t be reproduced — anything not in the pipeline eventually breaks
C. They use more electricity
D. Manual steps are illegal
Answer Key
B — Under 15 minutes; ideally under 5 for fast feedback.
B — Mutable tags break rollback and auditability.
B — OIDC federation is the modern, keyless approach.
B — Cheap checks first; they catch a large fraction of problems at minimal cost.
B — Content-addressed keys (hashing lockfiles) give automatic invalidation.
B — Quarantine or fix; never just re-run.
B — Immutable, unique tags (commit SHA) for rollback and traceability.
B — Manual approval is a human checkpoint for high-blast-radius changes.
B — Rolling updates with readiness probes are the standard zero-downtime pattern.
B — Manual steps drift and become un-reproducible; automate everything.