Most teams talk about "zero downtime" like it is a routing trick. It is not. It is a risk management problem.
If you migrate fast but corrupt state, you did not win. Build validation into the pipeline: checksums, row counts, business invariants, and replayable audit trails.
Dual-write introduces drift. If you cannot prove ordering and idempotency, you are manufacturing incidents. Prefer a single source of truth and replicate outward, not two primaries.
Write down the rollback plan as code and rehearse it. A rollback that depends on people reading a wiki is not a rollback.
Without visibility, you are guessing. Define SLOs for the cutover window, track error budgets, and make "stop" the default when signals degrade.
"Zero downtime" for the business means users can complete critical flows. Identify the 3-5 flows that matter and instrument them end-to-end.
If you want this pattern applied to your environment, send context and constraints. I will tell you what I would do and what I would not do.