nip-platform · 8 June 2026 · EN

NIP — what happens during a deploy, step by step

We walk through what happens on NIP from commit to a new ready pod: eight steps, nine-minute median, four-step rollback.

What happens during a deploy on NIP

The Nortinia Infrastructure Platform (NIP) is where an engineer's commit turns into a running service. It is not a CI/CD tool — it is an infrastructure controller that ties Kubernetes clusters, GitOps state, image rollouts, and notifications into one coherent flow. This article walks through what happens between a simple git push and a new pod taking traffic.

Eight steps (median: 9 minutes)

1. Commit: an engineer pushes to the main branch (or merges a PR), which triggers the GitHub Actions workflow.
2. CI build: the workflow builds an OCI image. With a warm layer cache, a typical backend build takes 3-4 minutes.
3. Push to GHCR: the image is uploaded to ghcr.io/nortinia-ltd/<repo> with two tags, main-<sha> and main-latest.
4. Webhook to NIP: CI POSTs an image-built payload to /api/v1/deploy/image-built on NIP. The payload includes the repo, commit SHA, image tag, and target environment (staging or prod).
5. NIP dedupes: using an idempotency key (repo + SHA + env), NIP checks whether the webhook has already been processed. If so, it returns 200 and does nothing.
6. Flux reconcile: NIP updates the image tag in the GitOps repo (a single commit bumping the tag in kustomization.yaml). Flux picks up the change within 60 seconds and starts the sync.
7. Rollout: the Deployment gets a new ReplicaSet. Each new pod must pass its readiness probe before taking traffic. Default: maxSurge 25%, maxUnavailable 0.
8. Notification: a Slack message in #deploys tags the commit author. Green on success, red plus on-call on failure.
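
The tag bump in the Flux reconcile step can be sketched as a small text edit on kustomization.yaml. This is a minimal sketch, not NIP's actual code: it assumes the file uses the standard Kustomize `images:` list with `name` and `newTag` fields, and the function name and the repo name `example-api` are hypothetical.

```python
def bump_image_tag(kustomization_yaml: str, image_name: str, new_tag: str) -> str:
    """Return the YAML text with `newTag` replaced for one image entry.

    Assumption: each entry in `images:` starts with `- name: <image>` and
    its `newTag:` line appears before the next `- name:` line.
    """
    lines = kustomization_yaml.splitlines()
    in_target = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("- name:"):
            # Entering a new list entry; check whether it is the image we want.
            in_target = stripped.split(":", 1)[1].strip() == image_name
        elif in_target and stripped.startswith("newTag:"):
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}newTag: {new_tag}"
            return "\n".join(lines) + "\n"
    raise ValueError(f"no newTag entry found for image {image_name!r}")


# Hypothetical GitOps file for illustration only.
EXAMPLE = """\
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
images:
  - name: ghcr.io/nortinia-ltd/example-api
    newTag: main-1111111
"""

updated = bump_image_tag(EXAMPLE, "ghcr.io/nortinia-ltd/example-api", "main-2222222")
```

Editing a single line keeps the resulting GitOps commit to a one-line diff, which makes the deploy history easy to read. A tag made only of digits would need quoting in YAML, which this sketch does not handle.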

End-to-end median: 9 minutes (commit to first ready pod). The most common bottleneck is the CI build (image size + Next.js production build).
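
To make the rollout settings from the rollout step concrete, here is a small sketch of what a 25% surge with zero unavailable pods means in pod counts. It relies on the Kubernetes rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down); the function name is hypothetical.

```python
from typing import Tuple


def rollout_bounds(
    replicas: int, max_surge_pct: int = 25, max_unavailable_pct: int = 0
) -> Tuple[int, int]:
    """Return (max_total_pods, min_available_pods) during a rolling update."""
    # Integer ceiling division: Kubernetes rounds a percentage maxSurge up.
    surge = (replicas * max_surge_pct + 99) // 100
    # Floor division: a percentage maxUnavailable rounds down.
    unavailable = (replicas * max_unavailable_pct) // 100
    return replicas + surge, replicas - unavailable


print(rollout_bounds(4))   # (5, 4)
print(rollout_bounds(10))  # (13, 10)
```

With maxUnavailable at 0, the number of available pods never drops below the desired replica count, so capacity is preserved at the cost of temporarily running extra pods.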

The four-step rollback

When something goes wrong, rollback is not git revert — there's a faster path:

1. An engineer clicks the "Rollback to previous tag" button in the NIP UI.
2. NIP commits to the GitOps repo, pinning the last known good tag.
3. Flux syncs within 60 seconds.
4. The Deployment rolls out a ReplicaSet for the pinned tag, and the broken one drains.

Execution time: 2-3 minutes. The git revert path would be 8+ minutes (CI build + push + webhook).
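
How NIP chooses the "last known good tag" is not described here. One plausible approach, shown as a sketch with hypothetical names (`DeployRecord`, `last_known_good`, and the `SUCCEEDED` state are assumptions), is to walk the deploy history backwards and take the newest successful tag that differs from the one currently running.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeployRecord:
    image_tag: str
    state: str  # assumed values: "SUCCEEDED" or "FAILED"


def last_known_good(history: List[DeployRecord], current_tag: str) -> Optional[str]:
    """Return the newest successful tag other than the current one.

    `history` is ordered oldest first. Returns None when there is nothing
    to roll back to (for example, on a service's first deploy).
    """
    for record in reversed(history):
        if record.state == "SUCCEEDED" and record.image_tag != current_tag:
            return record.image_tag
    return None


history = [
    DeployRecord("main-aaaaaaa", "SUCCEEDED"),
    DeployRecord("main-bbbbbbb", "SUCCEEDED"),
    DeployRecord("main-ccccccc", "FAILED"),
]
print(last_known_good(history, current_tag="main-ccccccc"))  # main-bbbbbbb
```

Excluding the current tag matters because a deploy can roll out cleanly and still be the one that needs reverting.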

Why webhook deduplication matters

The GitHub Actions workflow retries the webhook if NIP responds slowly or with a 5xx. Before dedup, a retry could produce two consecutive GitOps commits for the same SHA, which is pointless reconcile churn. The dedup key is ${repo}:${sha}:${env}, stored in Redis with a 24-hour TTL. Simple, but it filters 30+ duplicate hits per day.
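
The dedup check can be sketched as follows. This is an illustration under stated assumptions, not NIP's handler: an in-memory dictionary stands in for Redis (where the atomic equivalent would typically be a SET with the NX and EX options), and the names `ImageBuilt`, `SeenKeys`, and `handle_image_built` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ImageBuilt:
    repo: str
    sha: str
    image_tag: str
    env: str  # "staging" or "prod"


def idempotency_key(event: ImageBuilt) -> str:
    return f"{event.repo}:{event.sha}:{event.env}"


class SeenKeys:
    """In-memory stand-in for a key store with per-key TTL."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def claim(self, key: str) -> bool:
        """Return True the first time a key is seen within its TTL."""
        now = self._clock()
        expiry = self._expiry.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within the TTL window
        self._expiry[key] = now + self._ttl
        return True


def handle_image_built(event: ImageBuilt, seen: SeenKeys,
                       start_deploy: Callable[[ImageBuilt], None]) -> int:
    """Process a webhook once per key; duplicates return 200 and do nothing."""
    if seen.claim(idempotency_key(event)):
        start_deploy(event)
    return 200
```

Returning 200 for duplicates tells the sender to stop retrying. One trade-off of claiming the key before the deploy starts is that a crash mid-processing would cause later retries to be dropped, so a real handler may want to release the key on failure.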

What happens when Flux gets stuck

Flux reports reconciliation health on the HelmRelease/Kustomization status, through the Ready condition under status.conditions. NIP polls it every 30 seconds. On failure (invalid manifest, image pull error, etc.), the deploy state moves to FAILED and on-call receives a Slack DM plus a PagerDuty page (Sev2).
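
The polling decision can be sketched as a pure function over the status object. This sketch assumes NIP reads the Ready condition that Flux Kustomization and HelmRelease objects publish under status.conditions; the function name and the state names other than FAILED are hypothetical.

```python
from typing import Any, Dict


def deploy_state(status: Dict[str, Any]) -> str:
    """Map a Flux object's .status to a deploy state.

    Kubernetes condition statuses are the strings "True", "False",
    and "Unknown".
    """
    for condition in status.get("conditions", []):
        if condition.get("type") != "Ready":
            continue
        if condition.get("status") == "True":
            return "SUCCEEDED"
        if condition.get("status") == "False":
            return "FAILED"
        return "IN_PROGRESS"  # "Unknown": reconciliation still running
    return "IN_PROGRESS"  # no Ready condition reported yet


# Example input shaped like a failed Kustomization status (illustrative values).
failed = {
    "conditions": [
        {"type": "Ready", "status": "False", "reason": "ReconciliationFailed"}
    ]
}
print(deploy_state(failed))  # FAILED
```

A Ready condition of "False" can be transient while Flux retries, so a production poller might require the failure to persist across a few polls before paging anyone.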

The most common failures over the past six months: (1) memory request too large for the node; (2) missing secret (SealedSecret not yet deployed); (3) ConfigMap key typo. All three have runbooks.

What we did not build (and why)

Canary deployments with auto-rollback — planned, but our traffic is not high enough for metric-based decisions to be reliable. Instead we hold a 5-minute stability window and a human reviews the Sentry error rate.
Multi-region failover automation — one region (Hetzner FSN1). If we ever add another, it still won't be automatic on day one.
In-house container registry — GHCR works, it is free, no reason to switch.

Numbers from last quarter

1,247 successful deploys
23 rollbacks (1.8%)
9.1-minute median end-to-end
0 lost deploys (webhook retries are safe thanks to dedup)