canary code

Production-Grade AI Agents That Won't Break at 3 AM

Diêgo — Sun, 26 Apr 2026 20:13:44 GMT

Every AI agent demo I’ve seen works perfectly. The agent calls a tool, gets a response, formats it nicely, done. Fifteen seconds, clean terminal output, applause.

Then you deploy it. And it calls the same tool four times in a loop because the LLM hallucinated a retry instruction. Or it silently eats an error and returns a confident, completely wrong answer to your user. Or it runs for 47 minutes burning tokens on a task that should take 10 seconds.

I’ve been building agent-based systems for the past few months, and the gap between “works in my notebook” and “runs in production without waking me up” is enormous. This is my attempt to write down what I’ve actually learned about closing that gap. Some of this I’m confident about. Some of it I’m still figuring out.

The problem nobody talks about in agent tutorials

AI agents are stateful, non-deterministic processes that make decisions at runtime. That sentence sounds obvious, but it has consequences that most tutorials skip.

A traditional API endpoint receives a request, does some work, returns a response. The work is predictable. You can write tests for it. You can set timeouts. You know the blast radius.

An agent is different. It decides what to do next based on LLM output, which means you can’t fully predict the execution path. It might call one tool or five. It might finish in 2 seconds or loop for a minute. It might encounter an error from an external API and decide (on its own) to retry, or to try a completely different approach, or to give up and hallucinate an answer.

This is why durability matters so much for agents. Not durability in the “survives a server restart” sense (though that too), but durability in the broader sense: the agent should behave predictably even when the world around it doesn’t.

Step 1: Put boundaries on everything

Before you think about orchestration patterns or fancy frameworks, the single most useful thing you can do is constrain your agent’s behavior.

I mean this literally. Set hard limits on:

Maximum number of LLM calls per task (I usually start with 10 and adjust)
Maximum wall-clock time per agent run
Maximum tokens spent per run
Maximum number of tool invocations

Without these, a confused agent will happily burn through your entire monthly API budget in one run. I’ve seen it happen. Not to me, thankfully. Okay, once to me.

Here’s what a simple bounded agent loop looks like in TypeScript:

async function runAgent(task: string, tools: Tool[], options: AgentOptions) {
  const maxSteps = options.maxSteps ?? 10;
  const maxDurationMs = options.maxDurationMs ?? 30_000;
  const startTime = Date.now();
  const messages: Message[] = [{ role: "user", content: task }];
  let steps = 0;

  while (steps < maxSteps) {
    if (Date.now() - startTime > maxDurationMs) {
      return { status: "timeout", steps, messages };
    }

    const response = await callLLM(messages, tools);
    messages.push(response);
    steps++;

    if (response.toolCalls && response.toolCalls.length > 0) {
      for (const call of response.toolCalls) {
        const result = await executeToolWithTimeout(call, 5000);
        messages.push({ role: "tool", content: result, toolCallId: call.id });
      }
    } else {
      return { status: "complete", steps, messages };
    }
  }

  return { status: "max_steps_exceeded", steps, messages };
}

Nothing fancy. But notice the return type always includes status. That's the first principle: every agent run should terminate with an explicit status, not just a response. You need to know whether it finished, timed out, or hit a limit. This is the thing that makes the difference between "it worked" and "I can monitor and alert on it."
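The loop above enforces the step and time limits, but not the token and tool-invocation limits from the list. Here is a minimal sketch of how those two could be tracked, assuming your LLM provider reports token usage per call. `RunBudget`, `BudgetTracker`, and the status strings are hypothetical names I made up for illustration, not part of any library.

```typescript
// Hypothetical helper: RunBudget, BudgetTracker and the status names are
// illustrative assumptions, not part of any library.
interface RunBudget {
  maxTokens: number;
  maxToolCalls: number;
}

type BudgetStatus = "ok" | "token_budget_exceeded" | "tool_budget_exceeded";

class BudgetTracker {
  private readonly budget: RunBudget;
  private tokensUsed = 0;
  private toolCalls = 0;

  constructor(budget: RunBudget) {
    this.budget = budget;
  }

  // Add the usage reported for one LLM call.
  addTokens(promptTokens: number, completionTokens: number): void {
    this.tokensUsed += promptTokens + completionTokens;
  }

  // Count one tool invocation, whether or not it succeeded.
  addToolCall(): void {
    this.toolCalls += 1;
  }

  // Token overruns are reported first because they map directly to cost.
  check(): BudgetStatus {
    if (this.tokensUsed > this.budget.maxTokens) return "token_budget_exceeded";
    if (this.toolCalls > this.budget.maxToolCalls) return "tool_budget_exceeded";
    return "ok";
  }
}
```

In the agent loop, `check()` would sit next to the wall-clock check and return early with the budget status whenever it is not "ok".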

Step 2: Make tool execution the reliability boundary

Your agent is only as reliable as its tools. And tools fail. APIs return 500s, databases time out, rate limits kick in.

The pattern I’ve found most useful: wrap every tool in its own error boundary, with its own timeout, and return structured results regardless of success or failure. The LLM is surprisingly good at handling “this tool failed with error X” if you give it that information cleanly. What it’s terrible at is handling a thrown exception that kills the entire agent loop.

async function executeToolWithTimeout(
  call: ToolCall,
  timeoutMs: number
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const tool = toolRegistry.get(call.name);
    if (!tool) {
      return JSON.stringify({
        error: true,
        message: `Unknown tool: ${call.name}`,
      });
    }

    const result = await tool.execute(call.arguments, {
      signal: controller.signal,
    });
    return JSON.stringify({ error: false, data: result });
  } catch (err) {
    const message =
      err instanceof Error ? err.message : "Tool execution failed";
    return JSON.stringify({ error: true, message });
  } finally {
    clearTimeout(timer);
  }
}

The key insight: never throw from tool execution. Always return a structured result. Let the LLM decide what to do with failures. This is one of those things I’m quite certain about after watching agents in production for a while.
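One caveat with the abort-signal approach: it only stops a tool that actually listens to the signal. A possible fallback is to race the tool against a timer, so the agent loop gets a structured timeout result either way. This is a sketch under that assumption. `ToolOutcome` and `withHardTimeout` are hypothetical names, and the underlying work is not cancelled, it just stops being awaited.

```typescript
// Hypothetical names: ToolOutcome and withHardTimeout are illustrative.
type ToolOutcome<T> =
  | { error: false; data: T }
  | { error: true; message: string };

async function withHardTimeout<T>(
  work: Promise<T>,
  timeoutMs: number
): Promise<ToolOutcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves (never rejects) with a structured timeout result.
  const timeout = new Promise<ToolOutcome<T>>((resolve) => {
    timer = setTimeout(
      () => resolve({ error: true, message: `Timed out after ${timeoutMs}ms` }),
      timeoutMs
    );
  });

  // Convert both success and failure of the tool into structured results.
  const guarded = work.then(
    (data): ToolOutcome<T> => ({ error: false, data }),
    (err: unknown): ToolOutcome<T> => ({
      error: true,
      message: err instanceof Error ? err.message : "Tool execution failed",
    })
  );

  try {
    return await Promise.race([guarded, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}
```

The result can be passed through `JSON.stringify` before it goes back to the LLM, the same way the function above does.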

Step 3: Think about durability for long-running agents

Short agents that finish in a few seconds? The pattern above is probably enough. But once agents start running for minutes, or need to survive server restarts, or coordinate with other agents, you need something more. This is where the concept of durable execution comes in. If the process dies after step 3 of 7, you should be able to resume from step 3 instead of starting over.

I think this matters more than most people realize. In serverless environments especially, your function might get killed by the platform after a timeout. Without checkpointing, that’s a complete waste of every token and API call that already happened.

The principle is straightforward even if you don’t use a specific durability framework. After each significant step (LLM call, tool result, decision point), persist the agent’s state somewhere. A database, a queue, a file. Whatever your infrastructure supports. Then build your agent loop to accept a “resume from” parameter.

I’m not going to pretend I’ve nailed this perfectly. My current approach is to store the full message history after each step in Postgres, with a run ID and step number. If the process crashes, a recovery worker picks up incomplete runs and resumes them. It’s not elegant but it works.
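To make the "resume from" idea concrete, here is a minimal sketch. It uses an in-memory store as a stand-in for the Postgres table, and `Checkpoint`, `CheckpointStore`, `StepFn`, and `runResumable` are hypothetical names, not my production code. The step function stands in for "one LLM call plus its tool calls".

```typescript
// Hypothetical shapes; a real store would be a table keyed by run ID and step.
interface Checkpoint {
  runId: string;
  step: number;
  messages: string[];
  done: boolean;
}

interface CheckpointStore {
  load(runId: string): Promise<Checkpoint | undefined>;
  save(cp: Checkpoint): Promise<void>;
}

// In-memory stand-in for a database, for illustration only.
class InMemoryStore implements CheckpointStore {
  private rows = new Map<string, Checkpoint>();

  async load(runId: string): Promise<Checkpoint | undefined> {
    const row = this.rows.get(runId);
    return row ? { ...row, messages: [...row.messages] } : undefined;
  }

  async save(cp: Checkpoint): Promise<void> {
    this.rows.set(cp.runId, { ...cp, messages: [...cp.messages] });
  }
}

type StepFn = (
  messages: string[],
  step: number
) => Promise<{ message: string; done: boolean }>;

async function runResumable(
  runId: string,
  task: string,
  stepFn: StepFn,
  store: CheckpointStore,
  maxSteps = 10
): Promise<Checkpoint> {
  // Resume from the last persisted step if one exists, else start fresh.
  const cp: Checkpoint =
    (await store.load(runId)) ?? { runId, step: 0, messages: [task], done: false };

  while (!cp.done && cp.step < maxSteps) {
    const out = await stepFn(cp.messages, cp.step);
    cp.messages.push(out.message);
    cp.step += 1;
    cp.done = out.done;
    await store.save(cp); // persist after every completed step
  }
  return cp;
}
```

Calling `runResumable` again with the same run ID after a crash continues from the last saved step. A step that crashed halfway through is executed again on resume, so tools with side effects need to be safe to repeat.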

Step 4: Parallel agents are powerful and dangerous

The Pragmatic Engineer blog recently covered an interesting trend: developers kicking off multiple AI agents in parallel to work on different parts of a codebase simultaneously. The idea is that instead of one agent doing everything sequentially, you split the work and let multiple agents tackle sub-tasks at the same time.

I’ve been experimenting with this and it’s genuinely useful. But it introduces failure modes that sequential agents don’t have.

The obvious one: what happens when agent 3 out of 5 fails? Do you retry just that one? Do you cancel all of them? Does the output of agent 3 depend on agents 1 and 2?

Here’s the pattern I’ve settled on for parallel agent work:

interface AgentTask {
  id: string;
  prompt: string;
  tools: Tool[];
  dependsOn?: string[];
}

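// AgentResult is assumed here to be the { status, steps, messages } shape that runAgent returns.
async function runParallelAgents(tasks: AgentTask[]): Promise<Map<string, AgentResult>> {
  const results = new Map<string, AgentResult>();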
  const pending = new Map(tasks.map((t) => [t.id, t]));

  while (pending.size > 0) {
    const ready: AgentTask[] = [];

    for (const [id, task] of pending) {
      const depsResolved = (task.dependsOn ?? []).every(
        (dep) => results.has(dep) && results.get(dep)!.status === "complete"
      );
      if (depsResolved) ready.push(task);
    }

    if (ready.length === 0 && pending.size > 0) {
      for (const [id] of pending) {
        results.set(id, { status: "blocked", steps: 0, messages: [] });
        pending.delete(id);
      }
      break;
    }

    const batchResults = await Promise.allSettled(
      ready.map(async (task) => {
        const depContext = (task.dependsOn ?? [])
          .map((dep) => results.get(dep))
          .filter(Boolean);
        const contextualPrompt = buildContextualPrompt(task.prompt, depContext);
        const result = await runAgent(contextualPrompt, task.tools, {
          maxSteps: 10,
          maxDurationMs: 30_000,
        });
        return { id: task.id, result };
      })
    );

    // Promise.allSettled preserves input order, so batchResults[i] belongs to ready[i].
    batchResults.forEach((settled, i) => {
      const task = ready[i];
      if (settled.status === "fulfilled") {
        results.set(task.id, settled.value.result);
      } else {
        // The agent run itself threw: record an explicit error status.
        results.set(task.id, { status: "error", steps: 0, messages: [] });
      }
      pending.delete(task.id);
    });
  }

  return results;
}

Notice the dependency graph. Some agents can run in parallel, but others depend on earlier results. The orchestrator resolves dependencies, runs independent tasks concurrently, and handles failures without killing the entire batch.

I’m going to be honest: the error handling here is something I’m still iterating on. The “blocked” status when dependencies can’t be resolved feels like the right thing, but I haven’t tested it under enough real scenarios to be certain.

Step 5: Observe everything, trust nothing

Remember the observability point from step 1? It comes back here, and it’s even more important with agents than with normal services.

For every agent run, I log:

Total steps taken
Total tokens consumed (prompt and completion separately)
Wall-clock duration
Which tools were called and how many times
The terminal status (complete, timeout, max_steps_exceeded, error)
Whether the agent retried any tool calls

This is how you catch the patterns that kill you. “Hey, the invoice-processing agent has been averaging 8 steps for the past week, but today it’s averaging 14.” That’s your early warning. Something changed in the data, or the LLM is behaving differently, or a downstream API is returning errors that cause retries.

Without these metrics, you’ll find out when your token bill arrives. Or when a user complains. Or at 3 AM.

One thing I keep going back to: the bounded execution from step 1 is what makes observability useful. If an agent can run unbounded, your metrics are meaningless because the variance is infinite. Boundaries give you a normal range to compare against.
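As a rough sketch of the "8 steps last week, 14 today" check: the record below mirrors the fields from the list, and the drift function compares recent runs against a baseline. `RunMetrics`, `stepsDrifted`, and the 1.5x threshold are assumptions for illustration. Pick a threshold that fits your own variance.

```typescript
// Hypothetical record shape based on the fields listed above.
interface RunMetrics {
  agent: string;
  status: "complete" | "timeout" | "max_steps_exceeded" | "error";
  steps: number;
  promptTokens: number;
  completionTokens: number;
  durationMs: number;
  toolCalls: Record<string, number>; // tool name -> invocation count
}

function average(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Returns true when the recent average step count exceeds the baseline
// average by more than `ratio` (an assumed, tunable threshold).
function stepsDrifted(
  baseline: RunMetrics[],
  recent: RunMetrics[],
  ratio = 1.5
): boolean {
  if (baseline.length === 0 || recent.length === 0) return false;
  const base = average(baseline.map((r) => r.steps));
  const now = average(recent.map((r) => r.steps));
  return base > 0 && now > base * ratio;
}
```

The same comparison works for tokens or duration by swapping the field that gets averaged.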

Step 6: Test the failure modes, not just the happy path

This is the part most people skip and it’s the part that matters most.

Your tests for an agent system should include:

What happens when the LLM returns malformed tool calls?
What happens when a tool times out on every invocation?
What happens when the agent hits its step limit without completing the task?
What happens when two parallel agents try to modify the same resource?
What happens when the LLM decides to call a tool that doesn’t exist?

I write these as integration tests with a mock LLM that returns predefined sequences. It’s not perfect because you can’t predict every weird thing a real LLM will do. But it catches the structural failures: the ones where your orchestration logic breaks, not where the LLM says something dumb.

For the LLM-says-something-dumb cases, I rely on the boundaries from step 1 and the observability from step 5. You can’t test for every hallucination. But you can make sure hallucinations don’t cause unbounded damage.
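Here is roughly what the mock-LLM approach can look like. This is a self-contained sketch, not my actual test suite: `scriptedLLM` and `runBounded` are hypothetical names, and the loop is a stripped-down stand-in for the real agent loop so the example runs on its own.

```typescript
interface LLMResponse {
  content: string;
  toolCalls?: { id: string; name: string }[];
}

type LLM = (history: string[]) => Promise<LLMResponse>;

interface RunOutcome {
  status: "complete" | "max_steps_exceeded";
  steps: number;
  history: string[];
}

// Returns the scripted responses in order and repeats the last one forever,
// which makes "the model never stops calling tools" easy to simulate.
function scriptedLLM(script: LLMResponse[]): LLM {
  if (script.length === 0) throw new Error("script must not be empty");
  let i = 0;
  return async () => script[Math.min(i++, script.length - 1)];
}

// Simplified bounded loop used only for this example.
async function runBounded(
  llm: LLM,
  knownTools: Set<string>,
  maxSteps: number
): Promise<RunOutcome> {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const res = await llm(history);
    history.push(res.content);
    if (!res.toolCalls || res.toolCalls.length === 0) {
      return { status: "complete", steps: step, history };
    }
    for (const call of res.toolCalls) {
      // Unknown tools become a structured error message, never a throw.
      history.push(
        knownTools.has(call.name)
          ? `ok:${call.name}`
          : `error: Unknown tool: ${call.name}`
      );
    }
  }
  return { status: "max_steps_exceeded", steps: maxSteps, history };
}
```

A script with a single tool-calling response exercises the step limit. A script that names a missing tool and then finishes exercises the unknown-tool path.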

What I’d do differently starting from scratch

If I were building a new agent system today, I’d start with the boring stuff first. Timeouts, structured tool results, status tracking, logging. Then I’d add the actual agent logic on top.

Most teams do it the other way around. They get the agent working, it’s exciting, it does cool things. Then they spend three months retrofitting all the production-hardening stuff. I’ve done this. It’s painful. The guardrails are much easier to build when you design around them from the start.

I’d also think carefully about whether I actually need agents at all. A lot of problems that people solve with agents can be solved with a well-structured prompt and a single LLM call. Agents add complexity. Every step in an agent loop is a place where things can go wrong. If your task doesn’t require dynamic tool selection or multi-step reasoning, a simpler approach is almost always better.

That said, when you do need agents (and there are real cases where you do), building them with durability in mind from day one will save you more headaches than any framework or library choice.

Where I’m still figuring things out

I don’t have a great answer for agent memory yet. For short tasks, passing the full message history works fine. For agents that run across multiple sessions or need to remember things from days ago, I’m experimenting with summarization and retrieval patterns, but nothing feels solid yet.

I also don’t have strong opinions on agent frameworks. There are a lot of them. Some seem good, some seem like thin wrappers around API calls with a lot of abstraction for abstraction’s sake. I’ve been writing my own orchestration code because it helps me understand the failure modes, but I could be wrong that this is the best use of my time.

And multi-agent coordination where agents communicate with each other, not just run in parallel, is something I’ve read about more than I’ve built. Projects like Wuphf (which uses Git and Markdown files as a shared knowledge base between agents) are interesting because they solve the coordination problem through a shared artifact instead of direct communication. That feels right to me, but I haven’t tested it enough to recommend it.

The honest summary: if you get the basics right (boundaries, structured tool results, observability, explicit status tracking), you can build agent systems that run in production without constant babysitting. The fancy orchestration patterns matter less than you’d think. The boring reliability patterns matter more.

Build the guardrails first. Then let the agents loose inside them.

How to actually get better output from AI coding assistants

Diêgo — Wed, 22 Apr 2026 21:51:17 GMT

Most people treat AI coding assistants like a smarter autocomplete. Type a prompt, get some code, edit it until it works, repeat. And that works fine for small stuff: generating a utility function, explaining an unfamiliar API, drafting a quick test.

But when you try to use AI on real production work, things with actual constraints, team conventions, and code that has to survive contact with other engineers, that approach falls apart fast. The output is technically correct but doesn’t fit anywhere. You end up rewriting most of it anyway.

I’ve been reading through some of the most useful practical writing I’ve found on this topic, and I want to walk through three patterns that actually change the output quality in a meaningful way. Not theory. Patterns you can start using today.

Start with context, not prompts

Here’s a pattern I see constantly: engineers write detailed prompts but give the AI no context about where the code will live. No project structure. No team conventions. No architectural decisions. Just a description of the feature they want.

The AI fills in the blanks. And it’s good at that. But it fills them in with generic, statistically average choices, not your choices.

The fix is something called knowledge priming (or context engineering if you want to sound fancy). Before you start a session, you feed the AI the information it needs to make decisions that match your codebase.

This can be as simple as pasting your team’s style guide into the conversation. Or pointing the AI at a representative file from your codebase and saying “write new code that looks like this.” Or, and this is where it gets more structured, maintaining a document that lives in your repo and gets included in every AI session automatically.

The Encoding Team Standards piece from Martin Fowler’s site gets into exactly this. The idea is to make your team’s conventions explicit and machine-readable, so you’re not re-explaining them every session. Things like: how you name variables, how you handle errors, what packages you prefer, what patterns you avoid. Not a vague “we care about clean code.” Specific, concrete rules the AI can actually follow.

This matters more than most people realize. The AI isn’t being sloppy when it ignores your conventions. It genuinely doesn’t know them. Give it the information and the output quality shifts noticeably.

Build a harness before you build a feature

This one changed how I think about AI-assisted development entirely.

The instinct when working with a coding agent is to ask it to build the thing you need. But what usually happens is the agent generates something, you’re not sure if it’s right, you ask for changes, the changes break something else, and you spend more time debugging than you would have writing it yourself.

The pattern that actually works is to build the harness first.

A harness here isn’t a testing framework in the traditional sense. It’s a set of constraints (tests, type contracts, lint rules, example inputs and outputs) that define what correct looks like before any implementation exists. You give the agent something to run against. It can iterate on its own output, catch its own mistakes, and come back to you with something that already passes your criteria.

This is the core of what harness engineering describes. Instead of reviewing AI output by reading it and hoping you catch the bugs, you create an automated feedback mechanism. The agent fails fast, locally, on your constraints, not in production, not in code review.

Here’s what this looks like in practice. Say you’re asking an agent to implement a data transformation function. Before you write the prompt, you write:

- A few unit tests covering the expected behavior

- Type signatures that constrain the inputs and outputs

- Maybe a couple of edge cases you know are tricky

Then you give the agent the tests and tell it to make them pass. Not “build me a function that does X.” Give it something to aim at.

The difference in output quality is real. The agent has a ground truth to orient around. It stops guessing at what “correct” means and starts solving a specific, verifiable problem.

This also makes code review faster. When an agent’s output comes with passing tests, you’re not starting from scratch when evaluating it. You’re asking: are these tests sufficient? That’s a much smaller question.
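To make that concrete, here is a small sketch of a harness written before any implementation exists. The task (summing quantities per SKU) and every name in it (`LineItem`, `Transform`, `cases`, `runHarness`) are made up for illustration. The point is the shape: a type contract, a few cases including a tricky one, and a runner the agent can execute against its own attempts.

```typescript
// Hypothetical task: sum quantities per SKU, ignoring non-positive quantities.
interface LineItem {
  sku: string;
  qty: number;
}

// The type contract the implementation must satisfy.
type Transform = (items: LineItem[]) => Record<string, number>;

interface Case {
  name: string;
  input: LineItem[];
  expected: Record<string, number>;
}

const cases: Case[] = [
  {
    name: "sums duplicate SKUs",
    input: [{ sku: "a", qty: 1 }, { sku: "a", qty: 2 }],
    expected: { a: 3 },
  },
  { name: "handles empty input", input: [], expected: {} },
  {
    name: "ignores non-positive quantities",
    input: [{ sku: "a", qty: 0 }, { sku: "b", qty: -1 }, { sku: "c", qty: 2 }],
    expected: { c: 2 },
  },
];

function sameTotals(a: Record<string, number>, b: Record<string, number>): boolean {
  const keysA = Object.keys(a).sort();
  const keysB = Object.keys(b).sort();
  return (
    keysA.length === keysB.length &&
    keysA.every((k, i) => k === keysB[i] && a[k] === b[k])
  );
}

// Returns the names of failing cases; an empty array means the harness passes.
function runHarness(impl: Transform): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    try {
      if (!sameTotals(impl(c.input), c.expected)) failures.push(c.name);
    } catch {
      failures.push(`${c.name} (threw)`);
    }
  }
  return failures;
}
```

The prompt then becomes "implement a `Transform` so that `runHarness` returns an empty array", and the review question becomes whether `cases` covers enough.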

Create a feedback loop, not a conversation

Most AI-assisted coding sessions look like a conversation. You ask for something, you get something, you give feedback, you get a revision. Back and forth.

That works. But it doesn’t scale, and it doesn’t improve over time. Every session starts from zero. Every mistake is one you have to catch yourself.

The Feedback Flywheel pattern is about turning that conversation into something self-improving. The idea is to capture the feedback you’re giving the AI (the corrections, the style notes, the “no, not like that” moments) and encode them back into the context the AI starts with next time.

So say you’re working with an agent and it keeps generating code with a pattern you don’t use. You correct it. That correction disappears when the session ends. But if you take that correction and add it to your team standards document, it’s part of the context the next session starts with. You stop correcting the same mistake.

Over time, this compounds. Your AI sessions get progressively less corrective work, because the common mistakes are already ruled out before the session begins. The flywheel is slow at first (encoding one convention at a time is tedious) but it pays back quickly.

The practical steps for this are roughly:

1. Run a session, collect corrections and feedback

2. Group the feedback into categories (naming, patterns, architecture, style)

3. Rewrite the corrections as rules, not commentary: “use X” not “avoid Y because...”

4. Add those rules to a shared context document that every session gets

This is also how teams start sharing AI productivity gains. If one engineer figures out a better prompt structure or catches a common mistake pattern, it shouldn’t stay in their head. It should go into the shared context, where everyone benefits automatically.
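The four steps above can be sketched in a few lines. This is only one possible shape: `Rule`, `addRule`, and `renderContext` are hypothetical names, and the categories are the ones from step 2.

```typescript
type Category = "naming" | "patterns" | "architecture" | "style";

interface Rule {
  category: Category;
  text: string; // phrased as an instruction, e.g. "Use early returns."
}

const normalize = (s: string): string => s.trim().toLowerCase();

// Returns a new list; a rule that is already present is not added twice.
function addRule(rules: Rule[], rule: Rule): Rule[] {
  const exists = rules.some(
    (r) =>
      r.category === rule.category &&
      normalize(r.text) === normalize(rule.text)
  );
  return exists ? rules : [...rules, { ...rule, text: rule.text.trim() }];
}

// Renders the rules as a plain-text block to prepend to every session.
function renderContext(rules: Rule[]): string {
  const order: Category[] = ["naming", "patterns", "architecture", "style"];
  return order
    .map((category) => ({
      category,
      items: rules.filter((r) => r.category === category),
    }))
    .filter((section) => section.items.length > 0)
    .map((section) =>
      [`${section.category}:`, ...section.items.map((r) => `- ${r.text}`)].join("\n")
    )
    .join("\n\n");
}
```

The rendered text is what would live in the shared context document, so a correction made once shows up in everyone's next session.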

A note on what these patterns have in common

Looking at all three, the thread is the same: friction reduction through upfront investment.

Knowledge priming requires writing down your conventions explicitly. Harness engineering requires writing tests before implementation. The feedback flywheel requires capturing corrections and updating shared context. None of these feel productive in the moment. They all slow down the first session.

But they’re the difference between AI assistance that compounds and AI assistance that plateaus.

The engineers I’ve seen get the most out of these tools aren’t the ones with the cleverest prompts. They’re the ones treating the AI’s working environment with the same care they’d treat their own. Good tooling, good constraints, good feedback mechanisms.

The AI isn’t going to ask for any of this. It will work with whatever you give it. The question is whether what you give it is enough for it to do good work.

Where to start

If you’re not doing any of this yet, pick one:

Easiest: Write a short conventions document for your project. Three to five specific rules the AI should follow. Paste it at the start of every session for a week and notice what changes.
Higher impact: Before your next feature task, write two or three tests that describe the expected behavior. Use those as the prompt. See if you spend less time revising.
Long game: After your next AI session, write down the corrections you made. Find the most common one. Turn it into a rule in your conventions document.

None of this requires new tools. It’s a shift in how you set up the work before the AI touches it.

That’s the whole idea. AI coding assistants are powerful, but they’re not magic. They’re tools that reflect the quality of their inputs. Make the inputs better and the outputs follow.

Kubernetes Core Concepts Explained with a Golang Example

Diêgo — Mon, 10 Nov 2025 23:46:47 GMT

Why Use Kind?

Kind is an excellent tool for setting up a local Kubernetes environment. It offers:

Kubernetes clusters that run inside Docker containers.
Quick and simple setup for local development.
A good fit for both beginners and experienced developers experimenting with Kubernetes features.

Who Is This Article For?

Kubernetes Beginners: Developers getting started with Kubernetes who want a practical, hands-on introduction.

Experienced Developers: Those who prefer a “deploy first” approach—setting up containers and Kubernetes clusters locally before moving to cloud infrastructure.

What You’ll Learn

This article breaks down Kubernetes core concepts step by step. In each section, we’ll dive deeper into the fundamentals, explaining key Kubernetes components and how they interact with each other through practical examples.

Base Project

To keep things practical and focused on Kubernetes concepts, we’ll use a simple Go application. You can find the complete code here: gst-app

Containerization and Kind: Building and Managing Our Kubernetes Environment

Containerization has transformed how applications are built, shipped, and deployed. By isolating applications and their dependencies into lightweight, self-contained packages, containers ensure consistent behavior across different environments—from development to production. Let’s explore containerization fundamentals and how we use Kind (Kubernetes in Docker) to set up our Kubernetes environment.

Containerization: A Modern Approach to Application Deployment

What is Containerization?

Containerization involves packaging an application and its dependencies into a “container”—a lightweight, portable, self-sufficient environment that runs consistently across various infrastructures.

Key Benefits:

Isolation: Containers provide isolated environments, preventing conflicts between applications running on the same host.
Portability: Containers run on any system supporting the container runtime, ensuring consistent deployment across development, testing, and production.
Scalability: Containers can easily scale up or down, making them ideal for dynamic, cloud-native applications.

Example: Dockerfile Configuration

In our project, dockerfile.todo defines the Docker image for the todo-api service:

FROM golang:1.23.0 AS build_todo-api

ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

RUN go build -o todo-api ./main.go 

FROM alpine:3.18
RUN apk --no-cache add postgresql-client
RUN addgroup -g 1000 -S todo && \
    adduser -u 1000 -h /app -G todo -S todo

WORKDIR /app
COPY --from=build_todo-api --chown=todo:todo /app/todo-api /app/todo-api
USER todo
EXPOSE 8000

CMD ["./todo-api"]

LABEL org.opencontainers.image.title="todo-api" \
      org.opencontainers.image.authors="Diêgo" \
      org.opencontainers.image.source="https://github.com/diegom7s-dev/gst-app" \
      org.opencontainers.image.version="1.0.0"

This Dockerfile uses a two-stage build:

Build Stage (golang:1.23.0 AS build_todo-api): Compiles the application in a clean Go environment, ensuring the final image contains only necessary binaries.
Runtime Stage (FROM alpine:3.18): Copies the compiled binary to a minimal Alpine Linux image, providing a lightweight runtime environment with only essential dependencies like postgresql-client.

By separating build and runtime stages, we optimize the image for both size and security—following containerization best practices.

Kind: Simulating a Kubernetes Cluster in Docker

What is Kind?

Kind (Kubernetes in Docker) is a tool for running local Kubernetes clusters using Docker containers as nodes. It’s excellent for local development and testing, allowing developers to create multi-node clusters without needing multiple physical or virtual machines.

Example: Kind Configuration

The kind.config.yaml file defines our Kind cluster configuration:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  # Todo-Api
  - containerPort: 8000
    hostPort: 8000
  # Postgres
  - containerPort: 5432
    hostPort: 5432

This configuration creates a single-node Kind cluster with a control plane role. It maps ports 8000 and 5432 on the host to the same ports on the node container, so services exposed on those node ports (the todo-api on port 8000 and PostgreSQL on port 5432) are reachable directly from our local machine.

Integrating Containerization with Kind: Building and Running the Service

Makefile Setup for Automation

Our project’s Makefile automates various tasks related to building and deploying the todo-api service using Docker and Kind:

# Define dependencies
GOLANG          := golang:1.22.2
ALPINE          := alpine:3.18
KIND            := kindest/node:v1.27.3
POSTGRES        := postgres:15.4

# Building containers
service:
    docker build \
        -f infra/docker/dockerfile.todo \
        -t $(SERVICE_IMAGE) \
        --build-arg BUILD_REF=$(VERSION) \
        --build-arg BUILD_DATE=`date -u +"%Y-%m-%dT%H:%M:%SZ"` \
        .

# Running from within k8s/kind
dev-up:
    kind create cluster \
        --image $(KIND) \
        --name $(KIND_CLUSTER) \
        --config infra/k8s/dev/kind/kind.config.yaml

    kubectl config use-context kind-$(KIND_CLUSTER)
    kubectl wait --timeout=120s --namespace=local-path-storage --for=condition=Available deployment/local-path-provisioner
    kind load docker-image $(POSTGRES) --name $(KIND_CLUSTER)

Key Makefile Targets:

service: Builds the Docker image for the todo-api service using the Dockerfile at infra/docker/dockerfile.todo.
dev-up: Creates a Kind cluster using the specified configuration file and loads necessary Docker images into the cluster.

By leveraging Docker and Kind, our setup ensures a streamlined development workflow that mirrors a production environment (within limitations). This allows us to build, deploy, and test our Go application in a local Kubernetes cluster, providing a high-fidelity environment for development and testing.

Essential Kubernetes Components: What They Are and How to Use Them

Understanding Kubernetes core components is fundamental to effectively deploying and managing applications. Let’s explore the key components that form the foundation of Kubernetes, using our Go application as a practical example.

1. Nodes: The Worker Machines in Kubernetes Clusters

What are Nodes?

Nodes are the worker machines in Kubernetes clusters. They’re responsible for running containerized applications and providing the computational resources needed to keep your applications running smoothly. Nodes can be physical servers or virtual machines, depending on your cluster configuration.

Architectural Role:

Runtime Environment: Nodes serve as the execution environment for your Pods. Each node runs at least a kubelet (an agent responsible for communicating with the Kubernetes control plane), a container runtime (like containerd or CRI-O), and kube-proxy (which maintains network rules on nodes).
Resource Management: Nodes provide CPU, memory, storage, and network resources for running containers. Kubernetes manages these resources efficiently, ensuring each Pod receives the necessary resources as specified in its configuration.

Node Components:

Kubelet: An agent running on each node that ensures containers are running in a Pod. It continuously monitors Pod status and communicates with the Kubernetes API server to maintain the desired state.
Container Runtime: The software responsible for running containers. Popular runtimes include containerd and CRI-O. Kubernetes supports any runtime implementing the Kubernetes Container Runtime Interface (CRI); since built-in Docker support (dockershim) was removed in Kubernetes 1.24, Docker Engine needs the cri-dockerd adapter to be used as a runtime.
Kube-proxy: A network proxy running on each node that implements part of the Kubernetes Service concept. It maintains the network rules that route traffic sent to a Service to one of its backing Pods. Pod-to-Pod networking across nodes is provided by the cluster's network plugin (CNI), not by kube-proxy.

Example in Our Project:

In our project, nodes are represented by Docker containers running Kubernetes when using Kind. Each node in a Kind cluster is a Docker container, allowing us to simulate a multi-node Kubernetes cluster locally.

While we don’t have a specific YAML manifest to define nodes (since nodes are managed by the control plane), we rely on them to provide the necessary environment for our Pods and services. For example, when deploying the PostgreSQL database or the todo-api application, Kubernetes schedules these Pods on available nodes, utilizing their computational resources.

Key Concepts Related to Nodes:

Node Affinity and Anti-Affinity: Kubernetes provides mechanisms to control how Pods are scheduled on nodes. Node affinity allows you to define rules that attract Pods to nodes with certain labels, while Pod anti-affinity keeps Pods away from nodes that already run matching Pods, spreading replicas across nodes for improved fault tolerance.
Taints and Tolerations: Used to prevent certain Pods from being scheduled on specific nodes. For example, a node can be tainted to allow only specific workloads, like those requiring GPUs, ensuring only compatible Pods are scheduled there.
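The matching rule between taints and tolerations is easier to see written out. The sketch below is a simplified model for illustration only, not the scheduler's code. It covers the `Equal` and `Exists` operators and the three taint effects, and it ignores details such as `tolerationSeconds`. The function names `tolerates` and `canSchedule` are hypothetical.

```typescript
type Effect = "NoSchedule" | "PreferNoSchedule" | "NoExecute";

interface Taint {
  key: string;
  value?: string;
  effect: Effect;
}

interface Toleration {
  key?: string;
  operator?: "Equal" | "Exists"; // defaults to "Equal"
  value?: string;
  effect?: Effect; // omitted means "any effect"
}

// True when a single toleration matches a single taint.
function tolerates(toleration: Toleration, taint: Taint): boolean {
  if (toleration.effect !== undefined && toleration.effect !== taint.effect) {
    return false;
  }
  const operator = toleration.operator ?? "Equal";
  // An empty key with operator Exists matches every taint key.
  if (toleration.key === undefined || toleration.key === "") {
    return operator === "Exists";
  }
  if (toleration.key !== taint.key) return false;
  return operator === "Exists" || (toleration.value ?? "") === (taint.value ?? "");
}

// A Pod can be placed on a node only if every hard taint is tolerated.
// PreferNoSchedule is a soft preference, so it never blocks scheduling here.
function canSchedule(taints: Taint[], tolerations: Toleration[]): boolean {
  return taints
    .filter((taint) => taint.effect !== "PreferNoSchedule")
    .every((taint) => tolerations.some((tol) => tolerates(tol, taint)));
}
```

For the GPU example, a node tainted with key `gpu`, value `true`, and effect `NoSchedule` rejects every Pod except those that carry a matching toleration.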

Understanding Node Management:

Node Status: Each node maintains a status providing essential information like node health, capacity (CPU, memory, etc.), and conditions (e.g., Ready, DiskPressure, MemoryPressure).
Node Maintenance: Nodes can be marked as unschedulable (cordoned) when needing maintenance, preventing new Pods from being scheduled while existing Pods continue running. Draining a node additionally evicts its Pods so they can be rescheduled elsewhere.

2. Pods: The Fundamental Building Block of Kubernetes

What are Pods?

Pods are the smallest deployable units in Kubernetes, representing a single instance of a running process. A Pod can encapsulate one or more containers that share the same network namespace and storage. Containers within a Pod can communicate using localhost and share storage volumes.

Architectural Role:

Ephemeral Nature: Pods are designed to be ephemeral. When a Pod managed by a controller (such as a Deployment or StatefulSet) fails, the controller creates a new Pod to replace it rather than repairing the existing one.
Container Co-location: Containers that need to share resources (like storage or networking) or must always be deployed together are grouped in a single Pod.

Example in Our Project:

In our project, the StatefulSet configuration in dev-database.yaml defines a Pod template for running a PostgreSQL container:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
  namespace: simple-go-todo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: postgres
        image: postgres:15.4
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data

This configuration ensures a Pod running a PostgreSQL database is created and maintained, with persistent storage mounted at /var/lib/postgresql/data.

3. Deployments: Managing Your Application’s Desired State

What are Deployments?

Deployments are abstractions that manage Pods and ReplicaSets. They provide declarative updates, ensuring the specified number of Pods is always running, and handle tasks like scaling, rolling updates, and rollbacks.

Architectural Role:

Scalability and Resilience: Deployments enable horizontal scaling of applications (increasing the number of replicas) to handle increased traffic or workload.
Rolling Updates and Rollbacks: Deployments support zero-downtime updates by incrementally replacing Pods with new application versions, and they can roll back to a previous revision if needed.
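The incremental replacement can be illustrated with a small simulation. The Go sketch below is not how the Deployment controller is implemented; it is a toy model, assuming new Pods become ready instantly, that shows how the maxSurge and maxUnavailable settings of a rolling update bound the number of Pods at every step. The rollingUpdate function is a name made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// rollingUpdate simulates replacing every old Pod with a new one.
// It assumes new Pods are ready as soon as they are created, which real
// Pods are not. It returns the lowest number of running Pods and the
// highest total number of Pods seen during the rollout.
func rollingUpdate(replicas, maxSurge, maxUnavailable int) (minReady, maxTotal int, err error) {
	if replicas < 0 || maxSurge < 0 || maxUnavailable < 0 {
		return 0, 0, errors.New("arguments must not be negative")
	}
	if maxSurge == 0 && maxUnavailable == 0 {
		// With no budget in either direction the rollout could never progress.
		return 0, 0, errors.New("maxSurge and maxUnavailable cannot both be zero")
	}

	oldPods, newPods := replicas, 0
	minReady, maxTotal = replicas, replicas

	for oldPods > 0 || newPods < replicas {
		// Start new Pods while the surge budget allows it.
		for newPods < replicas && oldPods+newPods < replicas+maxSurge {
			newPods++
		}
		if total := oldPods + newPods; total > maxTotal {
			maxTotal = total
		}
		// Stop old Pods while the availability floor still holds.
		for oldPods > 0 && oldPods+newPods-1 >= replicas-maxUnavailable {
			oldPods--
		}
		if ready := oldPods + newPods; ready < minReady {
			minReady = ready
		}
	}
	return minReady, maxTotal, nil
}

func main() {
	minReady, maxTotal, err := rollingUpdate(3, 1, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("replicas=3 maxSurge=1 maxUnavailable=0: minReady=%d maxTotal=%d\n", minReady, maxTotal)
}
```

With a surge of one and no unavailability allowed, the simulated rollout never drops below the desired replica count, which is the property behind the zero-downtime claim.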

Example in Our Project:

The base-service.yaml file specifies a Deployment for our todo application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo
  namespace: simple-go-todo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: todo
  template:
    metadata:
      labels:
        app: todo
    spec:
      containers:
      - name: todo-api
        image: service-image

This Deployment manages the lifecycle of the todo-api Pod, ensuring one instance is always running and can be scaled as needed.

4. Services: Stable Networking for Your Pods

What are Services?

Services provide stable network endpoints for accessing Pods within a Kubernetes cluster. They abstract network access to Pods, enabling communication within the cluster and with external clients.

Service Types:

ClusterIP: Exposes the Service on an internal cluster IP, accessible only within the cluster
NodePort: Exposes the Service on a static port on each node’s IP
LoadBalancer: Provisions an external IP to load balance traffic across nodes
ExternalName: Maps a Service to an external DNS name

Architectural Role:

Decoupling: Services decouple clients from the underlying Pod IP addresses, which can change if Pods are recreated or rescheduled
Service Discovery: They provide a consistent interface for service discovery, allowing other applications to reliably discover and communicate with Pods

Example in Our Project:

The dev-todo-patch-service.yaml creates a Service for the todo-api:

apiVersion: v1
kind: Service
metadata:
  name: todo-api
  namespace: simple-go-todo
spec:
  type: ClusterIP
  ports:
  - name: todo-api
    port: 8000
    targetPort: todo-api

This Service gives clients inside the cluster a stable IP and port (8000) for reaching the todo-api. The targetPort value refers to a container port by name rather than by number.
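Inside the cluster, clients usually reach a Service through its DNS name rather than its IP. The Go sketch below builds that name for the Service above. It assumes the default cluster domain cluster.local and plain HTTP, neither of which is confirmed by the project files shown here, and serviceURL is a helper invented for this example.

```go
package main

import "fmt"

// clusterDomain is an assumption: "cluster.local" is the usual default,
// but a cluster can be configured with a different domain.
const clusterDomain = "cluster.local"

// serviceURL builds the in-cluster URL of a Service from the conventional
// <service>.<namespace>.svc.<cluster-domain> DNS name.
func serviceURL(service, namespace string, port int) string {
	return fmt.Sprintf("http://%s.%s.svc.%s:%d", service, namespace, clusterDomain, port)
}

func main() {
	fmt.Println(serviceURL("todo-api", "simple-go-todo", 8000))
}
```

Pods in the same namespace can also use the short name todo-api, since the cluster DNS search path fills in the rest.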

5. ConfigMaps and Secrets: Managing Configuration and Sensitive Data

What are ConfigMaps and Secrets?

ConfigMaps: Store non-sensitive configuration data in key-value pairs
Secrets: Store sensitive data like passwords, OAuth tokens, and SSH keys. Values are base64-encoded in manifests, which is an encoding and not encryption

Architectural Role:

Separation of Configuration and Code: Enable separating configuration from application code, making applications portable and easier to manage
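Secure and Flexible Management: Secrets keep sensitive data out of images and application code, and access to them can be restricted with RBAC and protected with encryption at rest when the cluster is configured for it, while ConfigMaps provide a flexible way to manage configurations without hardcoding values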
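Because base64 is easy to mistake for protection, here is a short Go sketch showing that a Secret value can be decoded by anyone who can read the manifest. The password is a made-up example value and the two helper functions are written only for this illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeSecretValue produces the base64 form used in a Secret's data field.
func encodeSecretValue(plain string) string {
	return base64.StdEncoding.EncodeToString([]byte(plain))
}

// decodeSecretValue reverses the encoding; no key or password is required.
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	encoded := encodeSecretValue("s3cr3t") // hypothetical password
	decoded, err := decodeSecretValue(encoded)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("stored in manifest:", encoded)
	fmt.Println("recovered by reader:", decoded)
}
```

This is why Secret manifests should not be committed to version control in this form.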

Example in Our Project:

We use a ConfigMap to configure PostgreSQL settings in dev-database.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pghbaconf
  namespace: simple-go-todo
data:
  pg_hba.conf: |
    local   all             all                                     trust
    # IPv4 local connections:
    host    all             all             0.0.0.0/0               trust
    # IPv6 local connections:
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    local   replication     all                                     trust
    host    replication     all             0.0.0.0/0               trust
    host    replication     all             ::1/128                 trust

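This ConfigMap stores PostgreSQL’s access control configuration, mounted as a file in the Pod. Note that the trust method accepts connections without a password, so this configuration is suitable only for local development.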

6. StatefulSets: Managing Stateful Applications

What are StatefulSets?

StatefulSets manage the deployment and scaling of a set of Pods, providing guarantees about the ordering and uniqueness of these Pods.

Architectural Role:

Stateful Application Management: Ideal for managing stateful applications where each Pod must have a unique identity and stable persistent storage
Stable Network Identity and Storage: Ensures each Pod has a unique, stable network identity and can maintain persistent storage across restarts
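The stable identity comes from a predictable naming scheme: each Pod is named after its StatefulSet plus an ordinal index. The Go sketch below reproduces that scheme, assuming the default behavior where ordinals start at 0; podNames is a helper written for this example.

```go
package main

import "fmt"

// podNames returns the Pod names a StatefulSet would create, using the
// <statefulset-name>-<ordinal> convention with ordinals starting at 0.
func podNames(statefulSet string, replicas int) []string {
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d", statefulSet, i))
	}
	return names
}

func main() {
	// With replicas: 1, the database StatefulSet yields a single Pod.
	fmt.Println(podNames("database", 1))
	// Scaling to three would add Pods in order, keeping the existing names.
	fmt.Println(podNames("database", 3))
}
```

A replacement Pod reuses the name of the one it replaces, which is what lets it reattach to the same persistent volume claim.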

Example in Our Project:

The dev-database.yaml file uses a StatefulSet to deploy a PostgreSQL instance:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
  namespace: simple-go-todo
spec:
  selector:
    matchLabels:
      app: database
  replicas: 1
  template:
    metadata:
      labels:
        app: database

This configuration provides stable identity and persistent storage for our PostgreSQL database.

Kubernetes’ Declarative Model

Kubernetes uses a declarative model where you define your application’s desired state, and Kubernetes continuously works to maintain that state. By defining your application components as YAML manifests, you can easily manage and scale your applications in a Kubernetes cluster. This approach contrasts with imperative models where each step is executed manually, offering a more scalable and resilient way to manage applications.
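The idea of continuously working toward a desired state can be sketched as a reconciliation loop. The Go code below is a toy model, not Kubernetes controller code: the State type and reconcile function are invented for this example, and real controllers observe the cluster through the API server rather than an in-memory struct.

```go
package main

import "fmt"

// State is a deliberately tiny model of what a controller tracks.
type State struct {
	Replicas int
}

// reconcile compares desired and actual state and takes one step toward
// closing the gap. It returns the new actual state and the action taken.
func reconcile(desired, actual State) (State, string) {
	switch {
	case actual.Replicas < desired.Replicas:
		actual.Replicas++
		return actual, "create pod"
	case actual.Replicas > desired.Replicas:
		actual.Replicas--
		return actual, "delete pod"
	default:
		return actual, "no-op"
	}
}

func main() {
	desired := State{Replicas: 3} // what the manifest declares
	actual := State{Replicas: 1}  // what is currently running

	for {
		var action string
		actual, action = reconcile(desired, actual)
		fmt.Printf("%s -> %d replicas\n", action, actual.Replicas)
		if action == "no-op" {
			break
		}
	}
}
```

The loop never needs to know how the gap appeared. A crashed Pod and a manifest edit both show up as a difference between desired and actual, and both are handled the same way.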

Understanding these Kubernetes objects and their architecture is crucial to leveraging Kubernetes’ full potential. In our project, we apply these concepts to deploy a Go application, providing a practical example of how each component fits into the overall architecture.

Putting It Into Practice: Running the Project with Makefile Commands

In this final section, we’ll walk through the step-by-step process of building, deploying, and running our Go application using the commands defined in the Makefile. This will provide a comprehensive understanding of how each command contributes to the overall deployment process and ensure everything works correctly in your Kubernetes environment.

The Makefile simplifies the workflow by automating repetitive tasks. Let’s break down the main commands and what happens when you execute each one.

1. Installing Dependencies

The first step in setting up our environment is installing all necessary dependencies. The Makefile provides a target to install these dependencies using Homebrew (feel free to adapt this for your preferred package manager):

dev-brew:
    brew update
    brew list kind || brew install kind
    brew list kubectl || brew install kubectl
    brew list kustomize || brew install kustomize
    brew list pgcli || brew install pgcli

This step ensures all necessary tools are available on your machine to interact with the Kubernetes cluster and manage configurations.

2. Pulling Docker Images

Before building our custom Docker image, we need to ensure we have the necessary base images:

dev-docker:
    docker pull $(GOLANG)
    docker pull $(ALPINE)
    docker pull $(KIND)
    docker pull $(POSTGRES)

This script pulls the specified Docker images for Go, Alpine, Kind node, and PostgreSQL. These images are the foundation for building our custom application image and running our local Kubernetes cluster.

3. Building the Docker Image

The Makefile includes a command to build the Docker image for our todo-api service:

service:
    docker build \
        -f infra/docker/dockerfile.todo \
        -t $(SERVICE_IMAGE) \
        --build-arg BUILD_REF=$(VERSION) \
        --build-arg BUILD_DATE=`date -u +"%Y-%m-%dT%H:%M:%SZ"` \
        .

Docker Image Build: This command builds the Docker image using the Dockerfile at infra/docker/dockerfile.todo. It tags the image with the value of the SERVICE_IMAGE variable, which includes the version (like todo-api:0.0.1).

Build Arguments: BUILD_REF and BUILD_DATE are passed as build arguments to incorporate versioning information and build metadata into the image.
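One common way for a Go binary to pick up a value like BUILD_REF is the linker's -X flag, which overwrites a package-level string variable at build time. The sketch below shows that pattern. It is an assumption about how the Dockerfile might consume the argument, since the Dockerfile itself is not shown here, and the variable name build and the versionString helper are hypothetical.

```go
package main

import "fmt"

// build holds the version reference. The default applies to local runs.
// A Dockerfile could override it during compilation with, for example:
//
//	go build -ldflags "-X main.build=${BUILD_REF}"
//
// The variable name and this wiring are assumptions made for illustration.
var build = "develop"

// versionString formats the service name together with the build reference.
func versionString(service string) string {
	return fmt.Sprintf("%s build=%s", service, build)
}

func main() {
	fmt.Println(versionString("todo-api"))
}
```

Logging this string at startup makes it easy to confirm which image version a Pod is actually running.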

The resulting Docker image contains the compiled Go application, ready to be deployed to our Kubernetes cluster.

4. Creating the Kind Cluster

To simulate a Kubernetes environment locally, we use Kind to create a new cluster:

dev-up:
    kind create cluster \
        --image $(KIND) \
        --name $(KIND_CLUSTER) \
        --config infra/k8s/dev/kind/kind.config.yaml

    kubectl config use-context kind-$(KIND_CLUSTER)
    kubectl wait --timeout=120s --namespace=local-path-storage --for=condition=Available deployment/local-path-provisioner
    kind load docker-image $(POSTGRES) --name $(KIND_CLUSTER)

Creating the Kind Cluster: The kind create cluster command creates a new Kubernetes cluster named sgt-kind-cluster using the specified Kind node image (kindest/node:v1.27.3) and configuration file (kind.config.yaml).
Setting Kubernetes Context: kubectl config use-context switches the current Kubernetes context to the new Kind cluster, allowing subsequent kubectl commands to interact with it.
Waiting for Storage Provisioner: The kubectl wait command waits until the local-path-provisioner deployment is available, ensuring the cluster is ready to provision storage volumes.
Loading Docker Image into Cluster: kind load docker-image loads the PostgreSQL Docker image into the Kind cluster, making it available for our application.

5. Deploying the Application to Kubernetes

With the cluster configured and images loaded, we can now deploy our application and its dependencies.

Using Kustomize to Manage Kubernetes Configurations

What is Kustomize?

Kustomize is a Kubernetes-native tool that allows you to customize Kubernetes resource configurations without modifying the original YAML files. It’s especially useful for managing different environments (like development, testing, and production) from a common base of configuration files. Using Kustomize, we can automatically generate customized manifests for our cluster by applying specific overlays that adjust configurations as needed.

dev-apply:
    kustomize build infra/k8s/dev/database | kubectl apply -f -
    kubectl rollout status --namespace=$(NAMESPACE) --watch --timeout=120s sts/database
    
    kustomize build infra/k8s/dev/service | kubectl apply -f -
    kubectl wait pods --namespace=$(NAMESPACE) --selector app=$(APP) --timeout=120s --for=condition=Ready

Apply Database Configuration: kustomize build generates Kubernetes manifests for the database configuration from base YAML files. kubectl apply -f - applies these configurations to the cluster, creating necessary resources (e.g., StatefulSet for PostgreSQL)
Wait for Database Deployment: The kubectl rollout status command waits for the PostgreSQL StatefulSet to be fully deployed and running before proceeding
Apply Service Configuration: The process repeats for the todo-api service, ensuring the service and its dependencies are deployed to the cluster
Wait for Pods to Be Ready: kubectl wait pods ensures all Pods associated with the todo-api application are running and ready before completing the deployment process
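Both wait steps follow the same pattern: check a condition repeatedly and give up after a timeout. The Go sketch below shows that pattern in isolation. It is not how kubectl is implemented; waitUntil and pollEvery are names made up for this example, and the readiness check is simulated with a counter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollEvery is the interval between checks in this example.
const pollEvery = 10 * time.Millisecond

// waitUntil calls check immediately and then once per interval until it
// returns true or the timeout elapses.
func waitUntil(timeout, interval time.Duration, check func() bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		if check() {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("timed out waiting for condition")
		case <-ticker.C:
			// Interval elapsed; loop around and check again.
		}
	}
}

func main() {
	// Simulated readiness: the "Pod" reports ready on the third check.
	checks := 0
	err := waitUntil(100*pollEvery, pollEvery, func() bool {
		checks++
		return checks >= 3
	})
	fmt.Println("ready after", checks, "checks, err:", err)
}
```

The timeout matters as much as the check. Without it, a Pod stuck in a crash loop would block the Makefile target indefinitely instead of failing with a clear error.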

6. Testing Application Endpoints

Finally, we can use the Makefile to test our REST API endpoints and verify everything is working as expected:

test_all: create get_all get_one update delete

By following these steps, you can successfully build, deploy, and test your Go application in a local Kubernetes cluster using Docker and Kind. The Makefile automates much of this process, making it easier to manage and reducing the risk of errors.

Conclusion

By combining containerization, Kubernetes, and Kind, we can create a powerful and flexible local development environment that closely resembles a production setup (within limitations). This approach enables efficient development, testing, and iteration, and it surfaces many deployment problems before they reach production.