Work · Healthtech · 12 weeks · 2023
Auth service carved out of a Django monolith
A healthcare-adjacent SaaS needed to decouple session and token handling from its Django monolith — both for compliance reasons and to enable a mobile client rewrite — without forcing a migration to a third-party identity provider.
The situation
The existing auth logic ran to fifty thousand lines, was tangled with user-profile code, and used a custom bcrypt-based password scheme that predated anything we would recommend today (bcrypt work factor 10, no pepper, and a per-user salt stored in the same row). The compliance audit that triggered the engagement required demonstrable separation of authentication concerns within a fixed deadline: eleven weeks from first meeting to auditor review.
What we did
We designed a thin Go auth service that sits in front of the monolith and handles login, token issuance, refresh, and revocation. Password hashes were migrated lazily: on successful authentication against the legacy bcrypt hash, the service re-hashed the password under argon2id (memory 64 MiB, iterations 3, parallelism 4), stored the result in a new column, and cleared the legacy column. This avoided a bulk migration, which would have required a flag day; the trade-off was that the compliance auditors reviewed a four-month-long cohort migration rather than a single transition point.
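The lazy upgrade path described above can be sketched roughly as follows. This is an illustration of the control flow only, not the service's actual code: the `Credential` fields, the `Hasher` interface, and `Authenticate` are hypothetical names, and `toyHasher` is a deliberately unsafe stand-in so the example runs on the standard library alone. A real implementation would presumably plug in `bcrypt.CompareHashAndPassword` and `argon2.IDKey` from `golang.org/x/crypto`, with per-user salts, and perform the two column writes in one transaction.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
)

// Credential mirrors one user row with a legacy and a new hash column.
// Field names are hypothetical.
type Credential struct {
	LegacyHash []byte // cleared once the user has been upgraded
	NewHash    []byte // empty until the first successful login after launch
}

// Hasher abstracts a password hashing scheme so the flow can be shown
// without third-party imports (assumption made for this sketch).
type Hasher interface {
	Hash(password string) []byte
	Verify(hash []byte, password string) bool
}

// toyHasher is a stand-in for illustration only. It is NOT a password hash:
// it is unsalted and fast. Substitute bcrypt / argon2id in real code.
type toyHasher struct{ label string }

func (t toyHasher) Hash(password string) []byte {
	sum := sha256.Sum256([]byte(t.label + ":" + password))
	return sum[:]
}

func (t toyHasher) Verify(hash []byte, password string) bool {
	return subtle.ConstantTimeCompare(hash, t.Hash(password)) == 1
}

var ErrBadCredentials = errors.New("invalid credentials")

// Authenticate verifies against the new hash when one exists. Otherwise it
// verifies against the legacy hash and, on success, re-hashes the password
// under the current scheme and clears the legacy value.
func Authenticate(c *Credential, password string, legacy, current Hasher) (upgraded bool, err error) {
	if len(c.NewHash) > 0 {
		if current.Verify(c.NewHash, password) {
			return false, nil
		}
		return false, ErrBadCredentials
	}
	if len(c.LegacyHash) == 0 || !legacy.Verify(c.LegacyHash, password) {
		return false, ErrBadCredentials
	}
	c.NewHash = current.Hash(password) // upgrade while the plaintext is in hand
	c.LegacyHash = nil                 // clear the legacy column
	return true, nil
}

func main() {
	legacy, current := toyHasher{"legacy"}, toyHasher{"current"}
	cred := &Credential{LegacyHash: legacy.Hash("hunter2")}

	upgraded, err := Authenticate(cred, "hunter2", legacy, current)
	fmt.Println("upgraded:", upgraded, "err:", err)

	upgraded, err = Authenticate(cred, "hunter2", legacy, current)
	fmt.Println("upgraded:", upgraded, "err:", err)
}
```

The upgrade can only happen at login because that is the one moment the service holds the plaintext password, which is why migration progress tracks how often users sign in.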
Tokens were moved from legacy session cookies to PASETO v4.public with short-lived access tokens (15 minutes) and rotating refresh tokens (30 days, single-use, family-revocation on reuse). Cutover was done gradually via Envoy routing: the monolith continued to accept legacy sessions for a grandfathering window of 45 days so that every long-lived browser session could roll over naturally on its next refresh rather than being forcibly logged out. Zero forced logouts were observed during the three-week active cutover.
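To make the refresh-token policy above concrete, here is a minimal sketch of single-use rotation with family revocation on reuse. It assumes opaque random token IDs and an in-memory map; the real token format and storage are not specified here, and `Store`, `Issue`, and `Rotate` are hypothetical names. The sketch is not safe for concurrent use.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

const refreshTTL = 30 * 24 * time.Hour // 30-day refresh lifetime

var (
	ErrUnknown = errors.New("unknown refresh token")
	ErrExpired = errors.New("refresh token expired")
	ErrRevoked = errors.New("token family revoked")
	ErrReuse   = errors.New("refresh token reuse detected")
)

type refreshRecord struct {
	family  string // all tokens descended from one login share a family
	expires time.Time
	used    bool
}

// Store is an in-memory stand-in for whatever backs the real service.
type Store struct {
	now     func() time.Time // injectable clock
	tokens  map[string]*refreshRecord
	revoked map[string]bool // revoked family IDs
}

func NewStore() *Store {
	return &Store{now: time.Now, tokens: map[string]*refreshRecord{}, revoked: map[string]bool{}}
}

func randomID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func (s *Store) issueInFamily(family string) string {
	tok := randomID()
	s.tokens[tok] = &refreshRecord{family: family, expires: s.now().Add(refreshTTL)}
	return tok
}

// Issue starts a new token family, as would happen at login.
func (s *Store) Issue() string { return s.issueInFamily(randomID()) }

// Rotate spends a refresh token and returns its successor. Presenting an
// already-spent token revokes the whole family, including the successor.
func (s *Store) Rotate(tok string) (string, error) {
	rec, ok := s.tokens[tok]
	switch {
	case !ok:
		return "", ErrUnknown
	case s.revoked[rec.family]:
		return "", ErrRevoked
	case rec.used:
		s.revoked[rec.family] = true
		return "", ErrReuse
	case s.now().After(rec.expires):
		return "", ErrExpired
	}
	rec.used = true
	return s.issueInFamily(rec.family), nil
}

func main() {
	s := NewStore()
	first := s.Issue()
	second, _ := s.Rotate(first)

	_, err := s.Rotate(first) // replay of a spent token
	fmt.Println("replay:", err)

	_, err = s.Rotate(second) // legitimate successor is now dead too
	fmt.Println("successor:", err)
}
```

Revoking the whole family on reuse means a stolen refresh token is useful at most until either party presents a spent token, at which point both the attacker and the legitimate client must log in again.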
The compliance-sensitive piece was the audit trail. Every auth event (successful login, failed login, token issuance, token revocation, hash upgrade) emits a structured log record authenticated by an HMAC chain, so the auditor could verify that no events had been retroactively removed. This took ten days of the engagement and was, in the client's words, "the part the auditor actually cared about."
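One way an HMAC chain of this kind can work is sketched below: each record's MAC covers the previous record's MAC as well as its own fields, so deleting or editing a record breaks verification from that point on. The record layout, event names, and the `Chain`, `Append`, and `Verify` names are assumptions for illustration, and key management is out of scope.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Record is one audit event. Field names are hypothetical.
type Record struct {
	Seq     uint64
	Event   string // e.g. "login.success"
	Subject string // e.g. a user ID
	MAC     string // hex HMAC-SHA256 over the previous MAC and this record
}

// Chain holds the key and the records in append order.
type Chain struct {
	key     []byte
	records []Record
}

func NewChain(key []byte) *Chain { return &Chain{key: key} }

// computeMAC binds a record to its predecessor. Fields are length-prefixed
// so adjacent values cannot be shifted into one another.
func computeMAC(key []byte, prevMAC string, seq uint64, event, subject string) string {
	m := hmac.New(sha256.New, key)
	fmt.Fprintf(m, "%d:%s|%d|%d:%s|%d:%s",
		len(prevMAC), prevMAC, seq, len(event), event, len(subject), subject)
	return hex.EncodeToString(m.Sum(nil))
}

// Append adds an event, chaining it to the last record's MAC.
func (c *Chain) Append(event, subject string) Record {
	prev := ""
	if n := len(c.records); n > 0 {
		prev = c.records[n-1].MAC
	}
	r := Record{Seq: uint64(len(c.records) + 1), Event: event, Subject: subject}
	r.MAC = computeMAC(c.key, prev, r.Seq, r.Event, r.Subject)
	c.records = append(c.records, r)
	return r
}

// Records returns a copy of the log.
func (c *Chain) Records() []Record {
	return append([]Record(nil), c.records...)
}

// Verify returns the index of the first record that fails the chain check,
// or -1 if the whole log is intact.
func Verify(key []byte, records []Record) int {
	prev := ""
	for i, r := range records {
		want := computeMAC(key, prev, r.Seq, r.Event, r.Subject)
		if r.Seq != uint64(i+1) || !hmac.Equal([]byte(want), []byte(r.MAC)) {
			return i
		}
		prev = r.MAC
	}
	return -1
}

func main() {
	key := []byte("demo-key-not-for-production")
	c := NewChain(key)
	c.Append("login.success", "user-1")
	c.Append("login.failure", "user-2")
	c.Append("token.revoked", "user-1")

	recs := c.Records()
	fmt.Println("intact log:", Verify(key, recs))

	// Drop the middle record, as a retroactive deletion would.
	tampered := []Record{recs[0], recs[2]}
	fmt.Println("after deletion:", Verify(key, tampered))
}
```

A chain on its own cannot reveal that the newest records were cut off the end; catching that requires keeping the latest MAC somewhere the log writer cannot alter.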
Outcome
- Compliance audit passed on the original deadline
- No user-visible logout events during cutover
- Auth-related p99 latency dropped from 180 ms to 22 ms
- Password hash scheme fully migrated within four months post-launch, with no engineering intervention required
- Mobile client rewrite proceeded in parallel without auth-team coordination
Stack
Go 1.21 · PASETO v4 · argon2id · PostgreSQL · Envoy · Redis