NIP — what happens during a deploy, step by step

We walk through what happens on NIP from commit to a new ready pod: eight steps, nine-minute median, four-step rollback.

What happens during a deploy on NIP

The Nortinia Infrastructure Platform (NIP) is where an engineer's commit turns into a running service. It is not a CI/CD tool — it is an infrastructure controller that ties Kubernetes clusters, GitOps state, image rollouts, and notifications into one coherent flow. This article walks through what happens between a simple git push and a new pod taking traffic.

Eight steps (median: 9 minutes)

  1. Commit — an engineer pushes to the main branch (or merges a PR), which triggers a GitHub Actions workflow.
  2. CI build — the workflow builds an OCI image. With layer caching, a typical backend build takes 3-4 minutes.
  3. Push to GHCR — image is uploaded to ghcr.io/nortinia-ltd/<repo> with two tags: main-<sha> and main-latest.
  4. Webhook to NIP — CI POSTs an image-built payload to NIP at /api/v1/deploy/image-built. Payload includes repo, commit SHA, image tag, and target environment (staging/prod).
  5. NIP dedupes — using an idempotency key (repo+sha+env) NIP checks whether the webhook has already been processed. If so, it returns 200 and does nothing.
  6. Flux reconcile — NIP updates the Kustomization image tag in the GitOps repo (one commit bumping the tag in kustomization.yaml). Flux notices within 60 seconds and starts the sync.
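  7. kubectl rollout — the Deployment controller creates a new ReplicaSet and moves pods over to it. Each new pod must pass its readiness probe before taking traffic. Platform defaults: maxSurge 25%, maxUnavailable 0 (the Kubernetes default for maxUnavailable is 25%).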
  8. Notification — Slack message in #deploys tagging the commit author. Green on success, red plus on-call on failure.
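
Steps 3 and 4 above can be sketched as follows. This is a minimal illustration, not NIP's actual schema: the article only lists which pieces of information the payload carries, so the field names, the repo name billing-api, and the short SHA are assumptions made for the example.

```python
import json

REGISTRY = "ghcr.io/nortinia-ltd"  # registry path from step 3
ENDPOINT = "/api/v1/deploy/image-built"  # webhook path from step 4


def image_tags(sha: str) -> list[str]:
    """Return the two tags pushed for a main-branch build."""
    return [f"main-{sha}", "main-latest"]


def image_built_payload(repo: str, sha: str, env: str) -> dict:
    """Build the body CI would POST to ENDPOINT.

    Field names are assumptions; the article does not show the schema.
    """
    if env not in ("staging", "prod"):
        raise ValueError(f"unknown environment: {env}")
    immutable_tag = image_tags(sha)[0]  # deploy by the SHA tag, not main-latest
    return {
        "repo": repo,
        "commit_sha": sha,
        "image": f"{REGISTRY}/{repo}:{immutable_tag}",
        "image_tag": immutable_tag,
        "environment": env,
    }


# Hypothetical repo and SHA, for illustration only.
payload = image_built_payload("billing-api", "3f9c2ab", "staging")
print(json.dumps(payload, indent=2))
```

The sketch sends the SHA-based tag rather than main-latest because a moving tag would make the deployed version ambiguous once a newer build lands.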

End-to-end median: 9 minutes (commit to first ready pod). The most common bottleneck is the CI build (image size + Next.js production build).
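
For step 7, the surge settings translate into concrete pod counts. The sketch below works through the arithmetic for a 25% surge and zero unavailable pods. It relies on the documented Kubernetes rounding rule (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down); the function name and the replica counts are illustrative assumptions.

```python
def rollout_bounds(replicas: int, max_surge_pct: int = 25,
                   max_unavailable_pct: int = 0) -> tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update."""
    # Integer ceiling division: percentage maxSurge rounds up.
    surge = (replicas * max_surge_pct + 99) // 100
    # Integer floor division: percentage maxUnavailable rounds down.
    unavailable = (replicas * max_unavailable_pct) // 100
    return replicas + surge, replicas - unavailable


for n in (2, 4, 10):
    max_pods, min_available = rollout_bounds(n)
    print(f"{n} replicas: at most {max_pods} pods, at least {min_available} available")
```

With zero unavailable pods, an old pod is only removed after a new one has passed its readiness probe, so the rollout needs spare node capacity for the surge pods.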

The four-step rollback

When something goes wrong, rollback is not git revert — there's a faster path:

  1. "Rollback to previous tag" button in the NIP UI.
  2. NIP commits to the GitOps repo, pinning the last known good tag.
  3. Flux syncs within 60 seconds.
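  4. The Deployment rolls pods back to the pinned tag (Kubernetes reuses the earlier ReplicaSet if it is still in the revision history), and the broken ReplicaSet drains.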

Execution time: 2-3 minutes. The git revert path would be 8+ minutes (CI build + push + webhook).
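
A minimal sketch of rollback steps 1 and 2, assuming NIP keeps a per-service deploy history and that the GitOps repo uses the standard Kustomize images/newTag entry. The history format, the SUCCEEDED state name, the tags, and the manifest are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical deploy history, oldest first: (image tag, final deploy state).
HISTORY = [
    ("main-a1b2c3d", "SUCCEEDED"),
    ("main-e4f5a6b", "SUCCEEDED"),
    ("main-0c9d8e7", "FAILED"),
]

# Hypothetical kustomization.yaml from the GitOps repo.
MANIFEST = """\
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
images:
  - name: ghcr.io/nortinia-ltd/billing-api
    newTag: main-0c9d8e7
"""


def last_known_good(history, current_tag: str) -> Optional[str]:
    """Return the most recent successfully deployed tag other than the current one."""
    for tag, state in reversed(history):
        if tag != current_tag and state == "SUCCEEDED":
            return tag
    return None


def pin_tag(kustomization_yaml: str, new_tag: str) -> str:
    """Replace the value of the first newTag entry; the rest of the file is untouched."""
    updated, count = re.subn(
        r"(?m)^([ \t-]*newTag:[ \t]*).*$",
        lambda match: match.group(1) + new_tag,
        kustomization_yaml,
        count=1,
    )
    if count == 0:
        raise ValueError("no newTag entry found")
    return updated


good_tag = last_known_good(HISTORY, "main-0c9d8e7")
if good_tag is None:
    raise SystemExit("no known-good tag to roll back to")
pinned = pin_tag(MANIFEST, good_tag)
print(pinned)  # this text would be committed to the GitOps repo
```

Because the pinned image already exists in the registry, this path skips the CI build entirely, which is where the time saving over git revert comes from.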

Why webhook deduplication matters

The CI job (GitHub Actions) retries the webhook if NIP responds slowly or with a 5xx. Before dedup, each retry produced another Flux commit for the same SHA, which meant pointless reconcile churn. The dedup key is ${repo}:${sha}:${env}, stored in Redis with a 24-hour TTL. Simple, but it filters 30+ duplicate hits per day.
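
The dedup check can be sketched as below. To keep the example self-contained, an in-memory dictionary stands in for Redis; against a real Redis the same check would be a single SET with the NX and EX options. The class, the function names, and the 202 status for a first delivery are assumptions. The article only states that duplicates get a 200 and no further work.

```python
import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the article


class DedupStore:
    """In-memory stand-in for Redis; set_if_absent mimics SET key value NX EX ttl."""

    def __init__(self, clock=time.monotonic):
        self._expires_at = {}
        self._clock = clock

    def set_if_absent(self, key: str, ttl: int) -> bool:
        """Store the key and return True, or return False if it is still live."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + ttl
        return True


def dedup_key(repo: str, sha: str, env: str) -> str:
    return f"{repo}:{sha}:{env}"


def handle_image_built(store: DedupStore, repo: str, sha: str, env: str) -> tuple[int, str]:
    """Return (HTTP status, action) for one webhook delivery."""
    if not store.set_if_absent(dedup_key(repo, sha, env), TTL_SECONDS):
        return 200, "duplicate"  # already processed: acknowledge and do nothing
    return 202, "accepted"  # assumption: status code for a first delivery


store = DedupStore()
print(handle_image_built(store, "billing-api", "3f9c2ab", "staging"))  # first delivery
print(handle_image_built(store, "billing-api", "3f9c2ab", "staging"))  # CI retry
print(handle_image_built(store, "billing-api", "3f9c2ab", "prod"))     # other env
```

Checking and writing the key in one atomic operation matters: a separate "get, then set" would let two concurrent retries both pass the check.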

What happens when Flux gets stuck

Flux reports reconciliation failures in the status conditions of the HelmRelease/Kustomization: the Ready condition turns False with a reason and message. NIP polls that status every 30 seconds. On failure (invalid manifest, image pull error, etc.) the deploy state moves to FAILED, and on-call receives a Slack DM plus a PagerDuty page (Sev2).
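
A minimal sketch of the polling check, assuming the failure signal is read from the Kubernetes-style Ready condition in the resource's status.conditions. Only the FAILED state and the 30-second interval come from the article; the other state names, the function names, and the sample resources are hypothetical. Fetching the resource from the cluster is left to the caller.

```python
import time

POLL_INTERVAL_SECONDS = 30  # polling interval from the article


def ready_condition(resource: dict):
    """Return the Ready condition of a Kustomization/HelmRelease dict, if present."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return None


def deploy_state(resource: dict) -> str:
    """Map the Ready condition onto a deploy state."""
    cond = ready_condition(resource)
    if cond is None or cond.get("status") == "Unknown":
        return "IN_PROGRESS"  # not reconciled yet, or still reconciling
    if cond.get("status") == "True":
        return "SUCCEEDED"
    return "FAILED"


def poll_until_settled(fetch, max_polls: int = 20, sleep=time.sleep) -> str:
    """Call fetch() every POLL_INTERVAL_SECONDS until the state settles."""
    state = "IN_PROGRESS"
    for attempt in range(max_polls):
        state = deploy_state(fetch())
        if state != "IN_PROGRESS":
            break
        if attempt < max_polls - 1:
            sleep(POLL_INTERVAL_SECONDS)
    return state


# Hypothetical status snapshots: still reconciling, then failed.
progressing = {"status": {"conditions": [
    {"type": "Ready", "status": "Unknown", "reason": "Progressing"}]}}
failed = {"status": {"conditions": [
    {"type": "Ready", "status": "False", "reason": "ReconciliationFailed",
     "message": "secret not found"}]}}

snapshots = iter([progressing, failed])
# sleep is replaced with a no-op so the example finishes immediately.
print(poll_until_settled(lambda: next(snapshots), sleep=lambda _seconds: None))
```

A production version would likely treat some False reasons (for example a dependency that is not ready yet) as transient rather than failing on the first one; this sketch does not make that distinction.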

The most common failures over the past six months: (1) memory request too large for the node; (2) missing secret (SealedSecret not yet deployed); (3) ConfigMap key typo. All three have runbooks.

What we did not build (and why)

  • Canary deployments with auto-rollback — planned, but our traffic is not high enough for metric-based decisions to be reliable. Instead we hold a 5-minute stability window and a human reviews the Sentry error rate.
  • Multi-region failover automation — we run in a single region (Hetzner FSN1). If we ever add a second, failover still won't be automatic on day one.
  • In-house container registry — GHCR works, it is free, no reason to switch.

Numbers from last quarter

  • 1,247 successful deploys
  • 23 rollbacks (1.8%)
  • 9.1-minute median end-to-end
  • 0 lost deploys (webhook retries are safe because of dedup)
