Method
How an engagement actually runs, week by week.
We are asked, every engagement, what the first month looks like in enough detail to explain to a VP. This page is the answer we give. It is also the thing we send in advance so nobody is surprised on day one.
The sixteen-week shape
Weeks 1–2
Assessment
Read the code, end to end, with a stopwatch. Run the existing load tests; if none exist, write a minimal harness. Produce a twenty-to-forty-page written review containing an architectural diagram of the current state, a ranked list of bottlenecks with measured evidence, a proposed target architecture with two or three alternatives, and an honest risk register. The review is signed by all three of us. It is the deliverable whether we continue or not.
Artifacts: architectural-review.md · bench/harness_test.go · risks.md
Weeks 3–5
Differential test harness
Before the first production line of the new implementation is written, we build a harness that can replay production traffic or journal data against both the old and the new path and fail on any semantic divergence. On the ledger engagement this caught two real bugs in the replacement during the first week of implementation. Every rewrite we have done since 2021 has started here.
Artifacts: difftest/replay.go · difftest/fixtures/ · CI integration
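A minimal sketch of the shape, using the same hypothetical names as the bench further down (Engine, legacyRuby, goV2, loadReplayCorpus, here assumed to accept a testing.TB); the Entries method is illustrative too. The details are engagement-specific, the pattern is not: same entry into both paths, the old path is the oracle, stop on the first divergence.
func TestReplayDivergence(t *testing.T) {
	// Same fixture format as the bench corpus below: one journal entry per line.
	corpus := loadReplayCorpus(t, "fixtures/payday-2024-03-15.jsonl")
	for i, entry := range corpus.Entries() {
		want := legacyRuby.Post(entry) // the old path is the oracle
		got := goV2.Post(entry)
		if !reflect.DeepEqual(got, want) {
			t.Fatalf("entry %d: semantic divergence\n  legacy: %+v\n  new:    %+v", i, want, got)
		}
	}
}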
Weeks 4–12
Implementation in parallel
New code is written in your repository, on feature branches, reviewed by your engineers. We pair with at least two of your staff throughout, because handoff is a practice, not an event. The new path is deployed alongside the old one — behind a header, a tenant flag, or an Envoy route — and runs in shadow mode on production traffic long before it serves any response.
We hold a thirty-minute review every Friday with the client's technical lead. Scope changes are decided here and nowhere else. If a decision cannot wait a week, we are probably reacting to pressure rather than evidence.
Artifacts: code in your repo · weekly review notes · shadow-mode dashboards
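When the split is done in-process rather than at the Envoy layer, the wrapper can be as small as this sketch, which leans on net/http and httptest from the standard library. recordDivergence is a stand-in for whatever feeds the shadow-mode dashboards; the only property that matters is that the new path can never alter the caller's response.
func shadowMode(oldPath, newPath http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Buffer the body once so both paths read identical bytes.
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}

		// The old path answers the caller, exactly as it did before.
		liveReq := r.Clone(r.Context())
		liveReq.Body = io.NopCloser(bytes.NewReader(body))
		live := httptest.NewRecorder()
		oldPath.ServeHTTP(live, liveReq)
		for k, v := range live.Header() {
			w.Header()[k] = v
		}
		w.WriteHeader(live.Code)
		w.Write(live.Body.Bytes())

		// The new path sees the same request out of band; its answer only
		// feeds the divergence dashboards and is never sent to the caller.
		shadowReq := r.Clone(context.Background())
		shadowReq.Body = io.NopCloser(bytes.NewReader(body))
		go func() {
			shadow := httptest.NewRecorder()
			newPath.ServeHTTP(shadow, shadowReq)
			recordDivergence(shadowReq.URL.Path, live.Result(), shadow.Result())
		}()
	})
}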
Weeks 12–15
Cutover
Gradual. Tenant-by-tenant or percentage-by-percentage, with explicit rollback criteria agreed in advance, in writing. We have never done a flag-day cutover and we will not propose one. The cutover plan includes a named owner for each step, a measured rollback trigger, and a communication template for the internal status channel.
Artifacts: cutover.md · rollback playbook · SLO dashboards
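The gating decision itself stays small enough to read in one sitting. A sketch, with illustrative names (CutoverConfig, useNewPath): the allowlist carries the tenant-by-tenant stage, a deterministic hash carries the percentage stage, and the rollback flag is the kill switch tied to the written trigger.
type CutoverConfig struct {
	RolledBack bool            // set the moment the agreed rollback trigger fires
	Allowlist  map[string]bool // named tenants moved first
	Percent    uint32          // 0-100, raised step by step per the cutover plan
}

// useNewPath is deliberately deterministic: the same tenant always lands on
// the same side of the split, so nobody flaps between implementations.
func useNewPath(tenantID string, cfg CutoverConfig) bool {
	if cfg.RolledBack {
		return false // everyone back on the old path, no exceptions
	}
	if cfg.Allowlist[tenantID] {
		return true // tenant-by-tenant stage
	}
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	return h.Sum32()%100 < cfg.Percent // percentage-by-percentage stage
}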
Week 16
Handoff & on-call week
We run the on-call rotation alongside your team for the final week. The runbooks get exercised while we are still around to answer "why did we do it this way?" questions. At the end of the week we hold a retrospective, archive our commit access, and leave. The ninety-day warranty starts the day we leave.
Artifacts: runbooks/ · retrospective notes · warranty period begins
Benchmark discipline
Every performance claim we publish — including the ones on the Work page — carries the following minimum metadata. If it does not, it does not leave the engagement.
Percentiles
p50, p95, p99 at minimum. p99.9 when tail matters to the business. "Average" is not a percentile.
Workload
A named, reproducible workload — usually a replay of a specific production window, referenced by its Kafka offset range or its time bounds.
Hardware
Exact instance type or bare-metal spec, kernel version, Go version, and whether the measurement ran on isolated or shared cores.
Sample size
Run count and duration. A single run is a demonstration, not a measurement. We report thirty runs or ten minutes of sustained load, whichever gives tighter variance.
Commits
The exact SHA of the old implementation and the new one. Both must be re-runnable from the artifacts in the final report.
Variance
Standard deviation or IQR reported alongside the central tendency. A number without a spread is a wish.
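One way to keep that list honest is to make it a single record the harness refuses to emit half-filled. A sketch; the field names below are illustrative rather than a fixed schema.
type BenchRecord struct {
	Workload  string // named replay window: Kafka offset range or time bounds
	Hardware  string // exact instance type or bare-metal spec
	Kernel    string
	GoVersion string
	Isolated  bool // isolated or shared cores

	OldCommit string // exact SHA, re-runnable from the report artifacts
	NewCommit string

	Runs     int           // run count
	Duration time.Duration // or sustained-load duration, whichever was used

	P50, P95, P99, P999 time.Duration
	StdDev              time.Duration // the spread that keeps the number from being a wish
}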
A bench harness as we tend to write it
This is the shape of a typical bench we hand over. It is deliberately boring. It runs the same shape of workload against the old and new implementation and reports the same fields every time.
func BenchmarkPostingEngine(b *testing.B) {
	// One corpus, both implementations: the comparison only means something
	// if old and new see the identical replay window.
	corpus := loadReplayCorpus(b, "fixtures/payday-2024-03-15.jsonl")
	for _, impl := range []Engine{legacyRuby, goV2} {
		b.Run(impl.Name(), func(b *testing.B) {
			lat := newLatencyRecorder()
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				start := time.Now()
				impl.Post(corpus.Next())
				lat.Observe(time.Since(start))
			}
			// Same fields every time: percentiles plus spread, in seconds.
			b.ReportMetric(lat.P50().Seconds(), "p50_s")
			b.ReportMetric(lat.P95().Seconds(), "p95_s")
			b.ReportMetric(lat.P99().Seconds(), "p99_s")
			b.ReportMetric(lat.StdDev().Seconds(), "stddev_s")
		})
	}
}
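An invocation like go test -bench=PostingEngine -count=30 ./bench yields thirty samples per implementation, enough to satisfy the sample-size rule above; benchstat over the combined output then reports the spread across runs alongside the central tendency.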
Footnote on payday: a "payday corpus" is a six-hour replay window sampled from the fifteenth of the month, when postings peak at roughly 14× weekday-mean. We keep one per quarter per engagement. The field note on benchmarks that lie goes into why this matters.