<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[canary code]]></title><description><![CDATA[Notes from a software engineer on what he's reading, building, and thinking about.]]></description><link>https://diegom7s.com</link><image><url>https://substackcdn.com/image/fetch/$s_!A5zw!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca548b57-8d42-41b4-aaed-cb3bf389708b_574x574.png</url><title>canary code</title><link>https://diegom7s.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 29 Jun 2026 16:48:32 GMT</lastBuildDate><atom:link href="https://diegom7s.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Diêgo]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[diegomagalhaescontact@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[diegomagalhaescontact@substack.com]]></itunes:email><itunes:name><![CDATA[Diêgo]]></itunes:name></itunes:owner><itunes:author><![CDATA[Diêgo]]></itunes:author><googleplay:owner><![CDATA[diegomagalhaescontact@substack.com]]></googleplay:owner><googleplay:email><![CDATA[diegomagalhaescontact@substack.com]]></googleplay:email><googleplay:author><![CDATA[Diêgo]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Structured Logging and Observability in Backend Systems]]></title><description><![CDATA[Your logs are probably lying to you. 
Here's how to make them useful.]]></description><link>https://diegom7s.com/p/structured-logging-and-observability</link><guid isPermaLink="false">https://diegom7s.com/p/structured-logging-and-observability</guid><dc:creator><![CDATA[Diêgo]]></dc:creator><pubDate>Wed, 10 Jun 2026 00:34:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/47b5f0af-96c1-4dcd-bedf-4f088e172130_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few months ago I spent almost four hours tracking down a bug in production. The request was failing intermittently, maybe one in every fifty calls. I had logs. Plenty of them. But they looked like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">Error: connection refused
Error: timeout exceeded
Info: request completed</code></pre></div><p>No timestamp context. No request ID. No indication of which service, which user, or which downstream dependency was involved. I was reading thousands of lines of text, trying to mentally correlate entries by their position in the file. Like solving a puzzle where half the pieces are from a different box.</p><p>That experience changed how I think about logging. Not as an afterthought you sprinkle into your code, but as infrastructure you design up front. The same way you&#8217;d design your database schema or your API contracts.</p><h2>The problem with unstructured logs</h2><p>Most applications start with <code>console.log</code> or <code>fmt.Println</code>. And honestly, that&#8217;s fine for a single process running on your laptop. You can read the output. You know what&#8217;s happening because you just wrote the code.</p><p>But the moment you have two services talking to each other, unstructured text logs become almost useless. You can&#8217;t filter them. You can&#8217;t correlate them across services. You can&#8217;t aggregate them into dashboards. You end up grepping through gigabytes of text, hoping the timestamp and some keyword will be enough.</p><p>The core issue is that unstructured logs are optimized for humans reading a single stream in real time. Production systems don&#8217;t work that way. 
You have dozens of processes, thousands of requests per second, and you&#8217;re usually looking at the logs hours after the problem happened.</p><p>Structured logging solves this by treating every log entry as a data record with typed fields, not a sentence for a human to read.</p><h2>What structured logging actually looks like</h2><p>Instead of this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">User 4521 failed to authenticate: invalid password</code></pre></div><p>You produce this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "timestamp": "2025-01-15T14:32:01.442Z",
  "level": "warn",
  "message": "authentication failed",
  "user_id": "4521",
  "reason": "invalid_password",
  "ip": "192.168.1.42",
  "service": "auth-api",
  "request_id": "req-a8f3c"
}</code></pre></div><p>Same information. But now every field is queryable. You can ask your log aggregator: &#8220;show me all authentication failures in the last hour, grouped by reason.&#8221; Or: &#8220;show me every log entry with <code>request_id: req-a8f3c</code> across all services.&#8221; You couldn&#8217;t do either of those with plain text.</p><p>The format doesn&#8217;t have to be JSON. Some teams use logfmt (the <code>key=value</code> format popular in the Go ecosystem). The point is that the log entry has a predictable structure with named fields.</p><h2>Setting up structured logging in practice</h2><p>I&#8217;ll show this in two languages because the patterns are the same, but the ergonomics differ.</p><h3>Go (standard library, slog)</h3><p>Go added the <a href="https://go.dev/blog/slog">log/slog package in version 1.21</a>. Before that, most teams used third-party libraries. Now there&#8217;s a good option in the standard library.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">package main

import (
&#9;"log/slog"
&#9;"os"
)

func main() {
&#9;logger := slog.New(slog.NewJSONHandler(os.Stdout, &amp;slog.HandlerOptions{
&#9;&#9;Level: slog.LevelInfo,
&#9;}))

&#9;slog.SetDefault(logger)

&#9;slog.Info("server starting",
&#9;&#9;"port", 8080,
&#9;&#9;"environment", "production",
&#9;)
}</code></pre></div><p>This outputs a JSON line with timestamp, level, message, and your custom fields. No extra dependencies.</p><p>Where <code>slog</code> really shines is when you create child loggers with pre-bound fields:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">func handleRequest(w http.ResponseWriter, r *http.Request) {
&#9;requestID := r.Header.Get("X-Request-ID")
&#9;if requestID == "" {
&#9;&#9;requestID = generateID()
&#9;}

&#9;log := slog.With(
&#9;&#9;"request_id", requestID,
&#9;&#9;"method", r.Method,
&#9;&#9;"path", r.URL.Path,
&#9;)

&#9;log.Info("request received")

&#9;user, err := authenticate(r)
&#9;if err != nil {
&#9;&#9;log.Warn("authentication failed", "error", err.Error())
&#9;&#9;http.Error(w, "unauthorized", http.StatusUnauthorized)
&#9;&#9;return
&#9;}

&#9;log = log.With("user_id", user.ID)
&#9;log.Info("user authenticated")

&#9;// Every subsequent log in this handler carries request_id, method, path, and user_id
}</code></pre></div><p>Every log line from this handler now carries the request ID, the HTTP method, the path, and (once authenticated) the user ID. You don&#8217;t pass these fields around manually to every function call; you build them up as context accumulates.</p><h3>Node.js (pino)</h3><p>In Node.js, <a href="https://getpino.io">pino</a> has been the go-to structured logger for years. It&#8217;s fast (low overhead is its main selling point, with asynchronous writes available as an opt-in) and outputs JSON.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">const pino = require('pino');

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
});

logger.info({ port: 8080, environment: 'production' }, 'server starting');</code></pre></div><p>The child logger pattern works the same way:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">function handleRequest(req, res) {
  const requestId = req.headers['x-request-id'] || generateId();

  const log = logger.child({
    request_id: requestId,
    method: req.method,
    path: req.url,
  });

  log.info('request received');

  try {
    const user = authenticate(req);
    const userLog = log.child({ user_id: user.id });
    userLog.info('user authenticated');
    // use userLog for the rest of this request
  } catch (err) {
    log.warn({ error: err.message }, 'authentication failed');
    res.writeHead(401);
    res.end('unauthorized');
  }
}</code></pre></div><p>The pattern is identical across languages. Create a logger, bind context fields as you learn them, and every log entry automatically carries that context.</p><h3>Correlation IDs: the thing that makes distributed debugging possible</h3><p>I mentioned <code>request_id</code> in both examples. This is probably the single most impactful thing you can add to your logging infrastructure. I&#8217;m sure about this one.</p><p>The idea is simple. When a request enters your system, you generate a unique ID (or read one from an incoming header). Every service that touches this request includes that ID in its logs. When something goes wrong, you search for that one ID and get the full story across every service.</p><p>Here&#8217;s the thing that trips people up: you have to propagate it. If Service A calls Service B, A needs to include the correlation ID in the outgoing request header. B needs to read it and attach it to its own logger.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">// Service A: outgoing call
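func callServiceB(ctx context.Context, requestID string) error {
&#9;req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://service-b/data", nil)
&#9;if err != nil {
&#9;&#9;return err
&#9;}
&#9;// Forward the correlation ID so Service B can attach it to its own logs.
&#9;req.Header.Set("X-Request-ID", requestID)
&#9;resp, err := http.DefaultClient.Do(req)
&#9;if err != nil {
&#9;&#9;return err
&#9;}
&#9;defer resp.Body.Close()
&#9;// ...
&#9;return nil
}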
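
// Sketch, not from the original post: once a service has the ID, "attaching it
// to its own logger" can be a one-liner. loggerWithRequestID is a hypothetical
// helper name; base is whatever *slog.Logger the service already uses.
func loggerWithRequestID(base *slog.Logger, requestID string) *slog.Logger {
&#9;// Every entry written through the returned logger carries request_id.
&#9;return base.With("request_id", requestID)
}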

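// Service B: incoming middleware

// ctxKey is an unexported type for context keys, so this key cannot collide
// with keys set by other packages (a bare string key can).
type ctxKey string

const requestIDKey ctxKey = "request_id"

func correlationMiddleware(next http.Handler) http.Handler {
&#9;return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
&#9;&#9;requestID := r.Header.Get("X-Request-ID")
&#9;&#9;if requestID == "" {
&#9;&#9;&#9;requestID = generateID()
&#9;&#9;}

&#9;&#9;ctx := context.WithValue(r.Context(), requestIDKey, requestID)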
&#9;&#9;next.ServeHTTP(w, r.WithContext(ctx))
&#9;})
}</code></pre></div><p>The header name is a convention, not a standard. <code>X-Request-ID</code> is common. Some teams use <code>X-Correlation-ID</code> or <code>X-Trace-ID</code>. Pick one and stick with it everywhere. The name matters less than consistency.</p><p>If you&#8217;re using a message queue (Kafka, RabbitMQ, SQS), put the correlation ID in the message metadata too. Async boundaries are where correlation breaks down most often, and that&#8217;s exactly where you need it most.</p><h2>Log levels: fewer than you think</h2><p>I&#8217;ve seen logging configs with eight or ten levels. In practice, I use four:</p><ul><li><p><strong>debug</strong> is for development. Verbose, noisy, turned off in production unless you&#8217;re actively investigating something.</p></li><li><p><strong>info</strong> records normal operations. Request received, request completed, job started, job finished. The &#8220;everything is fine&#8221; level.</p></li><li><p><strong>warn</strong> means something unexpected happened, but the system handled it. A retry succeeded. A cache miss on something you expected to be cached. A deprecated endpoint got called.</p></li><li><p><strong>error</strong> means something failed and a human should probably know about it. A database query timed out and the request returned a 500. A downstream service is unreachable. Payment processing failed.</p></li></ul><p>I&#8217;ve seen teams debate whether they need <strong>fatal</strong> or <strong>critical</strong> as separate levels. I think that&#8217;s overthinking it. If your process is about to crash, log it as <strong>error</strong> with a field like <code>"fatal": true</code> and let your alerting system pick it up.</p><p>The more important discipline is not <em>which</em> level to use but making sure you actually use them consistently. I&#8217;ve worked on codebases where <strong>error</strong> was used for input validation failures. 
That meant the error rate dashboard was always noisy, and real errors got buried. That&#8217;s a people problem, not a tools problem, but you can make it easier by writing down what each level means for your team.</p><h2>What to log (and what not to)</h2><p>This is where I see the biggest mistakes.</p><p>Log too little and you can&#8217;t debug anything. Log too much and you can&#8217;t find anything (and your log storage bill gets alarming).</p><p>Things I always log:</p><p>Incoming requests (method, path, status code, duration). Outgoing calls to other services or databases (target, duration, success or failure). Business events that matter (order created, payment processed, user signed up). Errors and the context around them.</p><p>Things I never log:</p><p>Request or response bodies in production (too much data, and you will accidentally log passwords or tokens). Personally identifiable information without redaction. Secrets, API keys, or database credentials. Health check requests (they flood your logs with noise and they tell you nothing useful).</p><p>The PII point deserves emphasis. I&#8217;ve seen production logs that included full email addresses, phone numbers, and once even credit card numbers. Structured logging makes this worse because it&#8217;s so easy to just spread an object into your log fields. You need explicit allowlists or redaction for any user-facing data.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// Don't do this
log.info({ user: req.body }, 'user signup');

// Do this
log.info({
  user_id: user.id,
  email_domain: user.email.split('@')[1],
  plan: user.plan,
}, 'user signup');</code></pre></div><h2>From logs to observability</h2><p>Structured logs are one pillar. On their own, they&#8217;re a massive improvement over unstructured text. But they&#8217;re not the full picture.</p><p>The industry generally talks about three signals: logs, metrics, and traces. I think of it differently. Logs tell you <em>what</em> happened. Metrics tell you <em>how much</em> is happening. Traces tell you <em>where time was spent</em>.</p><p>You don&#8217;t need all three on day one. Start with structured logs and correlation IDs. That alone will solve 80% of your debugging problems. I&#8217;m pretty confident about that number based on my own experience, though your mileage will vary.</p><p>When you&#8217;re ready to add metrics, the pattern is similar to structured logging: named, typed, labeled data points. Request duration as a histogram, labeled by service and endpoint. Error counts, labeled by error type. Queue depth over time. The same fields you put in your logs (service name, endpoint, environment) should appear as labels on your metrics.</p><p>Traces are the most complex to set up. They require instrumentation at every network boundary, and the tooling is still maturing. <a href="https://opentelemetry.io/">OpenTelemetry</a> has become the standard way to collect all three signals, and most popular frameworks have integrations for it. But I&#8217;ll be honest: I&#8217;ve only set up tracing on one production system, and it took significantly more effort than I expected. Worth it for complex distributed systems with many hops. Probably overkill for a two-service architecture.</p><h2>Operational patterns that actually help</h2><p>A few things I&#8217;ve learned the hard way.</p><p>First, always include the service name and environment in every log entry. 
This sounds obvious, but when you&#8217;re aggregating logs from twelve services into one system, and someone forgot to tag their logs with the service name, you&#8217;ll spend twenty minutes figuring out where a log line came from.</p><p>Second, log at the boundaries. When a request enters your service, log it. When it leaves (response sent), log it with the duration and status code. When you call another service, log the call and the result. If you do nothing else, boundary logging with correlation IDs gives you a timeline of every request through your system.</p><p>Third, make your log pipeline resilient. Your application should not crash because the log aggregator is down. Write to stdout and let a sidecar or agent handle shipping. If the agent falls behind, you lose some logs. That&#8217;s better than losing the service.</p><p>Remember that correlation ID propagation I talked about earlier? It becomes even more important here. When you have boundary logging on every service, the correlation ID is the thread that stitches those boundary events into a coherent story. Without it, you have a pile of disconnected entries. With it, you have a timeline.</p><p>Fourth, and I could be wrong about this being universally true: I&#8217;ve found that a single log entry at the end of a request with all the context (duration, status, user ID, any errors) is more useful than many small entries scattered throughout. Some teams call this the &#8220;request summary log.&#8221; It gives you one searchable record per request, which makes aggregation and alerting much simpler.</p><h2>The mistake I keep seeing</h2><p>Teams invest in log aggregation tooling but never invest in log quality. They set up <a href="https://www.elastic.co/">Elasticsearch</a> or Loki or whatever, pipe everything in, build dashboards. 
Then when something breaks, they search the logs and find entries like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">Something went wrong
Error occurred
null</code></pre></div><p>The aggregation system is only as good as what you put into it. If your log entries don&#8217;t have context, no amount of tooling will help. Spend the time making every log entry useful on its own. Include the operation that failed, the inputs that caused it (redacted if sensitive), and what the system did about it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">// This is useless
slog.Error("database error")

// This is useful
slog.Error("failed to fetch user profile",
&#9;"user_id", userID,
&#9;"query_duration_ms", elapsed.Milliseconds(),
&#9;"error", err.Error(),
&#9;"retry_attempted", true,
&#9;"retry_succeeded", false,
)</code></pre></div><p>The second entry tells you everything. Who was affected, how long the query ran, whether a retry was attempted, and whether it worked. That&#8217;s the difference between &#8220;something broke&#8221; and &#8220;I know exactly what broke and for whom.&#8221;</p><h2>Where to start if you have nothing</h2><p>If you&#8217;re starting from scratch or retrofitting an existing system, here&#8217;s what I&#8217;d do in order:</p><p>1. Pick a structured logger for your language. In Go, use <code>log/slog</code>. In Node.js, pino is the safe choice. In Python, the standard <code>logging</code> module with a JSON formatter works fine. Don&#8217;t build your own.</p><p>2. Add a correlation ID middleware. Generate an ID on the first service that receives a request. Propagate it through headers on every outgoing call.</p><p>3. Log at every boundary. Request in, request out, external call made, external call returned.</p><p>4. Write to stdout. Let the infrastructure handle shipping logs somewhere searchable.</p><p>5. Agree with your team on what each log level means. Write it down. Two sentences per level is enough.</p><p>That&#8217;s it for the first pass. You can add metrics, traces, and fancier tooling later. But these five steps will transform your ability to debug production issues. I&#8217;ve done this exact sequence on three different projects now, and each time the improvement was immediate.</p><p>The honest truth is that structured logging isn&#8217;t exciting. Nobody&#8217;s going to write a blog post about how your JSON log lines are a breakthrough. 
But the first time you search for a correlation ID and see the entire request path across four services in under a minute, you&#8217;ll wonder how you ever worked without it.</p>]]></content:encoded></item><item><title><![CDATA[Connection Pooling and Resource Management Under Load]]></title><description><![CDATA[The outage nobody sees coming until every connection is gone]]></description><link>https://diegom7s.com/p/connection-pooling-and-resource-management</link><guid isPermaLink="false">https://diegom7s.com/p/connection-pooling-and-resource-management</guid><dc:creator><![CDATA[Diêgo]]></dc:creator><pubDate>Tue, 19 May 2026 22:51:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/30fc86c5-3958-41db-b1ef-815b6f46a6d1_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You know the kind of outage that comes from a traffic spike that wasn&#8217;t even that big? The database is fine. The app servers have plenty of CPU and memory. But every request is hanging, timing out, and eventually failing. The culprit is connection pool exhaustion. Every available connection to <a href="https://postgresql.org/">PostgreSQL</a> is checked out, waiting on slow queries that have piled up, and new requests can&#8217;t get a connection at all. The queue backs up, health checks start failing, and <a href="https://kubernetes.io/">Kubernetes</a> does what Kubernetes does: it kills the pods and restarts them. Fresh pods come up, try to establish connections, hit the same wall, and die again.</p><p>The whole thing cascades in about four minutes. Twelve minutes of availability can be gone before someone manually scales down the traffic.</p><p>The part worth dwelling on isn&#8217;t the outage itself. It&#8217;s how invisible the problem is before it happens. Most teams have dashboards for CPU, memory, error rates, request latency. Nobody is watching connection pool utilization. And the failure mode isn&#8217;t gradual. 
It&#8217;s a cliff.</p><p>This is worth thinking about whenever a team spins up a new service without thinking carefully about how it manages connections to its dependencies. So I want to walk through what actually happens when connection pools go wrong, why the defaults are almost always wrong for production, and what recovery looks like when you&#8217;re already in the hole.</p><h2>What connection pooling actually solves</h2><p>Opening a new database connection is expensive. For PostgreSQL, each connection spawns a new backend process. There&#8217;s a TCP handshake, TLS negotiation if you&#8217;re using encryption (you should be), authentication, and some session setup. That easily takes 20 to 50 milliseconds, sometimes more. If every incoming request opens a fresh connection and closes it when done, you&#8217;re burning a lot of time and putting real pressure on the database server&#8217;s process table.</p><p>A connection pool solves this by keeping a set of pre-established connections ready. Your application checks one out, uses it, returns it. The next request grabs the same connection without paying the setup cost.</p><p>Simple idea. The complexity is in the details.</p><h2>The defaults will hurt you</h2><p>Most connection pool libraries ship with defaults that work fine for development and fall apart under load. This pattern shows up across languages and databases. The pool size is set to something like 10 or 20, the connection timeout is either infinite or very long, and there&#8217;s no limit on how long a connection can be held.</p><p>Here&#8217;s the thing: the right pool size depends on your database, your query profile, your concurrency model, and your infrastructure. There&#8217;s a well-known formula from the PostgreSQL wiki that suggests <code>connections = (core_count * 2) + effective_spindle_count</code> for the database side. For SSDs, the spindle count is effectively 1. 
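</p><p>As a rough sketch of that arithmetic in Go (the function name <code>suggestedActiveConnections</code> is made up for illustration, and the result is a starting point to benchmark against, not a guarantee):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">package main

import "fmt"

// suggestedActiveConnections applies the rule of thumb quoted above:
// (core_count * 2) + effective_spindle_count.
// Hypothetical helper, named for this example only.
func suggestedActiveConnections(coreCount, effectiveSpindleCount int) int {
&#9;return coreCount*2 + effectiveSpindleCount
}

func main() {
&#9;// A 4-core database server on SSDs, treating the spindle count as 1.
&#9;fmt.Println(suggestedActiveConnections(4, 1)) // prints 9
}</code></pre></div><p>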
So a 4-core database server might optimally handle around 9 to 10 concurrent active queries. Not 100. Not 200.</p><p>But most teams don&#8217;t think about it from the database&#8217;s perspective. They think about it from the application side: &#8220;I have 50 concurrent requests, so I need 50 connections.&#8221; That math ignores the fact that the database itself becomes slower with too many concurrent connections. Context switching, lock contention, shared buffer pressure. More connections past the sweet spot means every query gets slower, which means connections are held longer, which means you need more connections. It&#8217;s a death spiral.</p><p>The approach that tends to work well for most web applications is to keep the per-instance pool small (something like 5 to 15 connections per application instance) and use an external connection pooler like <a href="https://pgbouncer.org/">PgBouncer</a> in front of PostgreSQL when you have many application instances. PgBouncer multiplexes hundreds of application-side connections onto a much smaller number of actual database connections. It&#8217;s been around for ages, it&#8217;s battle-tested, and it handles the mismatch between &#8220;lots of app instances&#8221; and &#8220;database that works best with fewer connections&#8221; really well.</p><h2>What exhaustion looks like from the inside</h2><p>Let me walk through a concrete scenario. Here&#8217;s the sequence of events in a typical incident.</p><p>Picture a marketing campaign that drives a traffic spike around 2x normal peak. Not extreme. The app servers can handle the request volume. But several of the hot endpoints run queries that, under normal load, complete in 5 to 15 milliseconds. Under the increased concurrency, those queries start competing for the same rows and indexes. Lock contention pushes some of them to 200, 500, even 800 milliseconds.</p><p>Each slow query holds a connection from the pool for that entire duration. 
Say the pool size is 20 per instance. With 4 instances, that&#8217;s 80 connections total. PostgreSQL&#8217;s <code>max_connections</code> is set to 100 (the default). You&#8217;re already close to the ceiling.</p><p>As queries slow down, connections aren&#8217;t returned fast enough. New requests come in and wait for a connection. If the pool&#8217;s checkout timeout is set to 30 seconds, which is absurdly long, requests just sit there. Waiting. The client-side timeout on the API gateway is 10 seconds, so users see errors, but the server-side requests are still holding spots in the queue.</p><p>Health check endpoints share the same connection pool. When the pool is exhausted, health checks can&#8217;t get a connection, time out, and Kubernetes marks the pods as unhealthy. Rolling restarts kick in. New pods come up, try to establish 20 connections each to PostgreSQL, push the total connection count over <code>max_connections</code>, get rejected, and crash.</p><p>This is the part that tends to catch people off guard. The recovery mechanism (pod restarts) makes the problem worse. Each restart attempt consumes connections during initialization, adding pressure to a database that&#8217;s already at its limit.</p><h2>The fixes, in order</h2><p>The fix breaks into roughly three phases: stop the bleeding, tune the pools, then add visibility so you see it coming next time.</p><h3>Immediate stabilization</h3><p>First, manually reduce the replica count to 2 instances. Fewer instances mean fewer total connections. The database can breathe again. Queries start completing normally once contention drops.</p><p>Then kill the long-running queries manually using <code>pg_terminate_backend</code>. Ideally you&#8217;d have a statement timeout configured in PostgreSQL itself from the start; if not, this is the moment you wish you did. Setting <code>statement_timeout</code> to 5 seconds for the application role is a good baseline. 
If a query runs longer than that, something is wrong and it&#8217;s better to fail fast than hold a connection hostage.</p><h3>Pool configuration changes</h3><p>Reduce the pool size per instance from 20 to 8. This feels counterintuitive at first. Fewer connections per instance? But with 6 instances running (scale up the instance count instead), that&#8217;s 48 application-side connections. More than enough for typical throughput, and well within what PostgreSQL handles efficiently.</p><p>Set the checkout timeout to 2 seconds. If a request can&#8217;t get a connection within 2 seconds, it fails with a clear error instead of waiting in the queue. This is important. A short checkout timeout converts invisible queuing into visible errors. You can monitor error rates. You can&#8217;t easily monitor &#8220;requests silently waiting in a pool queue.&#8221;</p><p>Add a connection max lifetime. Connections that have been open for more than 30 minutes get recycled. This prevents problems with stale connections and server-side session bloat, and it helps after database failovers where old connections might be pointing at the wrong host.</p><p>Add idle connection cleanup too. If a connection has been sitting unused for more than 5 minutes, close it. No reason to hold resources you&#8217;re not using.</p><h3>Health check isolation</h3><p>This one is worth repeating until it sticks. Your health check endpoint should never depend on the same resource pool as your business logic. Move health checks to use a separate, tiny connection pool (2 connections). Some teams just check if the process is alive without hitting the database at all for liveness probes, and only check the database for readiness probes. Either approach works. The point is that your orchestrator&#8217;s ability to assess your application&#8217;s health shouldn&#8217;t compete with actual traffic.</p><p>Remember that cascading pod restart problem? It goes away once health checks have their own pool. 
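</p><p>Here is a minimal sketch of the pool settings and the health check split described above, assuming a Go service on the standard <code>database/sql</code> package. The package and function names (<code>dbpool</code>, <code>configureMainPool</code>, <code>configureHealthPool</code>, <code>pingWithTimeout</code>) are made up for illustration. <code>database/sql</code> has no dedicated checkout-timeout setting, so the sketch bounds the wait with a context deadline instead:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">package dbpool

import (
&#9;"context"
&#9;"database/sql"
&#9;"time"
)

// configureMainPool applies the numbers from this section to the pool
// that serves business traffic.
func configureMainPool(db *sql.DB) {
&#9;db.SetMaxOpenConns(8)                   // small per-instance pool
&#9;db.SetMaxIdleConns(8)                   // allow idle connections up to the pool size
&#9;db.SetConnMaxLifetime(30 * time.Minute) // recycle long-lived connections
&#9;db.SetConnMaxIdleTime(5 * time.Minute)  // close connections unused for 5 minutes
}

// configureHealthPool sizes a second *sql.DB reserved for health checks,
// so probes never compete with request traffic for connections.
func configureHealthPool(db *sql.DB) {
&#9;db.SetMaxOpenConns(2)
&#9;db.SetMaxIdleConns(2)
}

// pingWithTimeout gives up after 2 seconds. The deadline covers both the
// wait for a free connection and the ping itself.
func pingWithTimeout(ctx context.Context, db *sql.DB) error {
&#9;ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
&#9;defer cancel()
&#9;return db.PingContext(ctx)
}</code></pre></div><p>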
Even when the main pool is under pressure, readiness probes can still respond, so Kubernetes stops killing pods that are actually making progress.</p><h2>This isn&#8217;t just about databases</h2><p>I&#8217;ve focused on PostgreSQL here because that&#8217;s where this pain shows up most vividly. But connection exhaustion follows the same pattern everywhere you maintain persistent connections or limited resource pools.</p><p><a href="https://redis.io/">Redis</a> connection pools behave similarly. HTTP client pools for outbound service calls have the same dynamics. gRPC channels maintain underlying connections that can get saturated. Thread pools in worker-based architectures (think Java or .NET) exhibit identical cliff-edge behavior when saturated.</p><p>The pattern is always the same:</p><p>1. A resource pool has a fixed size.</p><p>2. Under load, consumers hold resources longer than expected.</p><p>3. New consumers can&#8217;t acquire resources and start queuing.</p><p>4. The queue grows faster than resources are released.</p><p>5. Timeouts (or lack thereof) determine whether the failure is visible and fast, or invisible and cascading.</p><p>I could be wrong about this, but I think most production outages trace back to some form of resource exhaustion rather than the thing people usually blame first (like &#8220;the database was slow&#8221;). The database was slow <strong>because</strong> it had too many connections. The service was slow <strong>because</strong> the HTTP client pool was saturated. The root cause is usually the pool management, not the downstream system.</p><h2>What to actually monitor</h2><p>There are a few metrics I now consider non-negotiable for any service that talks to a database or external dependency.</p><p>Pool utilization as a percentage: active connections divided by max pool size. Alert when this stays above 80% for more than a minute. Not on a spike. On sustained pressure. Brief spikes to 90% during a burst are normal. 
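</p><p>For a Go service on <code>database/sql</code>, that ratio can be read from <code>sql.DBStats</code>. A sketch, with <code>poolUtilization</code> as a hypothetical helper name and hand-written numbers standing in for a real <code>db.Stats()</code> call:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;go&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-go">package main

import (
&#9;"database/sql"
&#9;"fmt"
)

// poolUtilization returns in-use connections as a fraction of the pool maximum.
// It returns 0 when the pool has no limit (MaxOpenConnections == 0),
// because the ratio is undefined in that case.
func poolUtilization(s sql.DBStats) float64 {
&#9;if s.MaxOpenConnections == 0 {
&#9;&#9;return 0
&#9;}
&#9;return float64(s.InUse) / float64(s.MaxOpenConnections)
}

func main() {
&#9;// Illustrative values only; a real service would pass db.Stats().
&#9;stats := sql.DBStats{MaxOpenConnections: 8, InUse: 7}
&#9;fmt.Printf("%.1f%%\n", poolUtilization(stats)*100) // prints 87.5%
}</code></pre></div><p>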
Sitting at 85% for five minutes means you&#8217;re one slow query away from exhaustion.</p><p>Checkout wait time: how long requests spend waiting for a connection. If this number is usually zero and suddenly it&#8217;s not, something changed. This is the metric that catches slow-burn pool pressure &#8212; wait times creep up, you investigate, you find something like a missing index on a new table, and you fix it before users notice anything.</p><p>Connection creation rate: how often the pool is creating new connections. A sudden spike in new connection creation can indicate connection churn (connections being closed and reopened rapidly) or a failover event where all connections got invalidated at once.</p><p>Pool error count: connection checkout timeouts, failed connection attempts, connections rejected by the database. These should normally be zero. Any non-zero value is worth investigating.</p><h2>A note on connection poolers</h2><p>I mentioned PgBouncer earlier. I want to be clear about when it&#8217;s worth adding and when it&#8217;s just more complexity.</p><p>If you have 2 to 4 application instances with well-tuned pool sizes, you probably don&#8217;t need PgBouncer. The total connection count stays manageable, and the extra hop adds latency (small, but it&#8217;s there) and another component to operate.</p><p>If you have 20 or 50 or 200 instances (common in Kubernetes deployments with aggressive autoscaling), you almost certainly need something between your apps and the database. Without it, autoscaling events can overwhelm PostgreSQL&#8217;s connection limit. PgBouncer in transaction mode works well here. It assigns a real database connection only for the duration of a transaction, then returns it to the shared pool. 
This means 200 application instances can share, say, 50 real database connections.</p><p>The gotcha with transaction-mode pooling is that you can&#8217;t use session-level features: prepared statements (in some configurations), <code>SET</code> commands, <code>LISTEN/NOTIFY</code>, temp tables. If your application relies on those, you need to think carefully about which pooling mode to use. Teams can burn days debugging weird behavior that traces back to PgBouncer silently swapping the underlying connection mid-session.</p><h2>What&#8217;s worth doing up front</h2><p>The fixes for this kind of outage are all things you can configure from day one. Statement timeouts, reasonable pool sizes, checkout timeouts, separate health check pools, basic pool monitoring. None of it is novel. None of it requires new technology.</p><p>The problem is that connection pooling rarely gets treated as something that needs active design. It&#8217;s left as a library default. Set it up once, forget about it. And that works right up until it doesn&#8217;t, and when it stops working, it stops all at once.</p><p>If you&#8217;re setting up a new service today, it&#8217;s worth spending maybe an hour on pool configuration before writing any business logic. Set the pool size based on the database&#8217;s capacity divided by the expected number of instances. Set a checkout timeout of 1 to 3 seconds. Set a statement timeout on the database role. Set up the four metrics listed above. Wire health checks to a separate pool or at least a separate connection.</p><p>That hour saves you from a 2 a.m. page six months later. I&#8217;m pretty confident about that.</p><p>The other thing worth doing, and it&#8217;s the kind of thing that&#8217;s easy to skip until you&#8217;ve watched this pattern play out somewhere, is load testing specifically for pool exhaustion. 
Not just &#8220;can the system handle 10,000 requests per second&#8221; but &#8220;what happens when a downstream dependency gets slow.&#8221; Inject latency into your database calls and watch what the pool does. Watch where the cliff is. You want to find it in a test environment at 3 p.m., not in production at midnight.</p><p>Connection pools are one of those things that feel boring until they&#8217;re the only thing that matters. Treat them like infrastructure, not like library configuration, and they&#8217;ll stay boring. Which is exactly what you want.</p>]]></content:encoded></item><item><title><![CDATA[Production-Grade AI Agents That Won't Break at 3 AM]]></title><description><![CDATA[Most AI agent tutorials stop right before the part where everything falls apart. This one doesn't.]]></description><link>https://diegom7s.com/p/production-grade-ai-agents-that-wont</link><guid isPermaLink="false">https://diegom7s.com/p/production-grade-ai-agents-that-wont</guid><dc:creator><![CDATA[Diêgo]]></dc:creator><pubDate>Sun, 26 Apr 2026 20:13:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a797f990-4ac1-46d0-b946-602058a94f25_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every AI agent demo I&#8217;ve seen works perfectly. The agent calls a tool, gets a response, formats it nicely, done. Fifteen seconds, clean terminal output, applause.</p><p>Then you deploy it. And it calls the same tool four times in a loop because the LLM hallucinated a retry instruction. Or it silently eats an error and returns a confident, completely wrong answer to your user. Or it runs for 47 minutes burning tokens on a task that should take 10 seconds.</p><p>I&#8217;ve been building agent-based systems for the past few months, and the gap between &#8220;works in my notebook&#8221; and &#8220;runs in production without waking me up&#8221; is enormous. 
This is my attempt to write down what I&#8217;ve actually learned about closing that gap. Some of this I&#8217;m confident about. Some of it I&#8217;m still figuring out.</p><h2><strong>The problem nobody talks about in agent tutorials</strong></h2><p>AI agents are stateful, non-deterministic processes that make decisions at runtime. That sentence sounds obvious, but it has consequences that most tutorials skip.</p><p>A traditional API endpoint receives a request, does some work, returns a response. The work is predictable. You can write tests for it. You can set timeouts. You know the blast radius.</p><p>An agent is different. It decides what to do next based on LLM output, which means you can&#8217;t fully predict the execution path. It might call one tool or five. It might finish in 2 seconds or loop for a minute. It might encounter an error from an external API and decide (on its own) to retry, or to try a completely different approach, or to give up and hallucinate an answer.</p><p>This is why durability matters so much for agents. Not durability in the &#8220;survives a server restart&#8221; sense (though that too), but durability in the broader sense: the agent should behave predictably even when the world around it doesn&#8217;t.</p><h2><strong>Step 1: Put boundaries on everything</strong></h2><p>Before you think about orchestration patterns or fancy frameworks, the single most useful thing you can do is constrain your agent&#8217;s behavior.</p><p>I mean this literally. Set hard limits on:</p><ul><li><p>Maximum number of LLM calls per task (I usually start with 10 and adjust)</p></li><li><p>Maximum wall-clock time per agent run</p></li><li><p>Maximum tokens spent per run</p></li><li><p>Maximum number of tool invocations</p></li></ul><p>Without these, a confused agent will happily burn through your entire monthly API budget in one run. I&#8217;ve seen it happen. Not to me, thankfully. 
Okay, once to me.</p><p>Here&#8217;s what a simple bounded agent loop looks like in TypeScript:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">async function runAgent(task: string, tools: Tool[], options: AgentOptions) {
  const maxSteps = options.maxSteps ?? 10;
  const maxDurationMs = options.maxDurationMs ?? 30_000;
  const startTime = Date.now();
  const messages: Message[] = [{ role: "user", content: task }];
  let steps = 0;

  while (steps &lt; maxSteps) {
    if (Date.now() - startTime &gt; maxDurationMs) {
      return { status: "timeout", steps, messages };
    }

    const response = await callLLM(messages, tools);
    messages.push(response);
    steps++;

    if (response.toolCalls &amp;&amp; response.toolCalls.length &gt; 0) {
      for (const call of response.toolCalls) {
        const result = await executeToolWithTimeout(call, 5000);
        messages.push({ role: "tool", content: result, toolCallId: call.id });
      }
    } else {
      return { status: "complete", steps, messages };
    }
  }

  return { status: "max_steps_exceeded", steps, messages };
}</code></pre></div><p>Nothing fancy. But notice the return type always includes <code>status</code>. That's the first principle: <strong>every agent run should terminate with an explicit status, not just a response.</strong> You need to know whether it finished, timed out, or hit a limit. This is the thing that makes the difference between "it worked" and "I can monitor and alert on it."</p><h2><strong>Step 2: Make tool execution the reliability boundary</strong></h2><p>Your agent is only as reliable as its tools. And tools fail. APIs return 500s, databases time out, rate limits kick in.</p><p>The pattern I&#8217;ve found most useful: wrap every tool in its own error boundary, with its own timeout, and return structured results regardless of success or failure. The LLM is surprisingly good at handling &#8220;this tool failed with error X&#8221; if you give it that information cleanly. What it&#8217;s terrible at is handling a thrown exception that kills the entire agent loop.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">async function executeToolWithTimeout(
  call: ToolCall,
  timeoutMs: number
): Promise&lt;string&gt; {
  const controller = new AbortController();
  const timer = setTimeout(() =&gt; controller.abort(), timeoutMs);

  try {
    const tool = toolRegistry.get(call.name);
    if (!tool) {
      return JSON.stringify({
        error: true,
        message: `Unknown tool: ${call.name}`,
      });
    }

    // Note: the timeout only takes effect if the tool honors the AbortSignal.
    const result = await tool.execute(call.arguments, {
      signal: controller.signal,
    });
    return JSON.stringify({ error: false, data: result });
  } catch (err) {
    const message =
      err instanceof Error ? err.message : "Tool execution failed";
    return JSON.stringify({ error: true, message });
  } finally {
    clearTimeout(timer);
  }
}</code></pre></div><p>The key insight: <strong>never throw from tool execution.</strong> Always return a structured result. Let the LLM decide what to do with failures. This is one of those things I&#8217;m quite certain about after watching agents in production for a while.</p><h2><strong>Step 3: Think about durability for long-running agents</strong></h2><p>Short agents that finish in a few seconds? The pattern above is probably enough. But once agents start running for minutes, or need to survive server restarts, or coordinate with other agents, you need something more. This is where the concept of durable execution comes in. If the process dies after step 3 of 7, you should be able to resume from step 3 instead of starting over.</p><p>I think this matters more than most people realize. In serverless environments especially, your function might get killed by the platform after a timeout. Without checkpointing, that&#8217;s a complete waste of every token and API call that already happened.</p><p>The principle is straightforward even if you don&#8217;t use a specific durability framework. After each significant step (LLM call, tool result, decision point), persist the agent&#8217;s state somewhere. A database, a queue, a file. Whatever your infrastructure supports. Then build your agent loop to accept a &#8220;resume from&#8221; parameter.</p><p>I&#8217;m not going to pretend I&#8217;ve nailed this perfectly. My current approach is to store the full message history after each step in Postgres, with a run ID and step number. If the process crashes, a recovery worker picks up incomplete runs and resumes them. 
It&#8217;s not elegant but it works.</p><h2><strong>Step 4: Parallel agents are powerful and dangerous</strong></h2><p><a href="https://blog.pragmaticengineer.com/new-trend-programming-by-kicking-off-parallel-ai-agents/">The Pragmatic Engineer</a> blog recently covered an interesting trend: developers kicking off multiple AI agents in parallel to work on different parts of a codebase simultaneously. The idea is that instead of one agent doing everything sequentially, you split the work and let multiple agents tackle sub-tasks at the same time.</p><p>I&#8217;ve been experimenting with this and it&#8217;s genuinely useful. But it introduces failure modes that sequential agents don&#8217;t have.</p><p>The obvious one: what happens when agent 3 out of 5 fails? Do you retry just that one? Do you cancel all of them? Does the output of agent 3 depend on agents 1 and 2?</p><p>Here&#8217;s the pattern I&#8217;ve settled on for parallel agent work:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">interface AgentTask {
  id: string;
  prompt: string;
  tools: Tool[];
  dependsOn?: string[];
}

async function runParallelAgents(tasks: AgentTask[]): Promise&lt;Map&lt;string, AgentResult&gt;&gt; {
  const results = new Map&lt;string, AgentResult&gt;();
  const pending = new Map(tasks.map((t) =&gt; [t.id, t]));

  while (pending.size &gt; 0) {
    const ready: AgentTask[] = [];

    for (const task of pending.values()) {
      const depsResolved = (task.dependsOn ?? []).every(
        (dep) =&gt; results.has(dep) &amp;&amp; results.get(dep)!.status === "complete"
      );
      if (depsResolved) ready.push(task);
    }

    if (ready.length === 0 &amp;&amp; pending.size &gt; 0) {
      for (const [id] of pending) {
        results.set(id, { status: "blocked", steps: 0, messages: [] });
        pending.delete(id);
      }
      break;
    }

    const batchResults = await Promise.allSettled(
      ready.map(async (task) =&gt; {
        const depContext = (task.dependsOn ?? [])
          .map((dep) =&gt; results.get(dep))
          .filter(Boolean);
        const contextualPrompt = buildContextualPrompt(task.prompt, depContext);
        const result = await runAgent(contextualPrompt, task.tools, {
          maxSteps: 10,
          maxDurationMs: 30_000,
        });
        return { id: task.id, result };
      })
    );

    // Promise.allSettled preserves input order, so index i maps back to ready[i].
    batchResults.forEach((settled, i) =&gt; {
      const taskId = ready[i].id;
      if (settled.status === "fulfilled") {
        results.set(taskId, settled.value.result);
      } else {
        // A rejected run is recorded as an explicit error instead of being dropped.
        results.set(taskId, { status: "error", steps: 0, messages: [] });
      }
      pending.delete(taskId);
    });
  }

  return results;
}</code></pre></div><p>Notice the dependency graph. Some agents can run in parallel, but others depend on earlier results. The orchestrator resolves dependencies, runs independent tasks concurrently, and handles failures without killing the entire batch.</p><p>I&#8217;m going to be honest: the error handling here is something I&#8217;m still iterating on. The &#8220;blocked&#8221; status when dependencies can&#8217;t be resolved feels like the right thing, but I haven&#8217;t tested it under enough real scenarios to be certain.</p><h2><strong>Step 5: Observe everything, trust nothing</strong></h2><p>Remember the observability point from step 1? It comes back here, and it&#8217;s even more important with agents than with normal services.</p><p>For every agent run, I log:</p><ul><li><p>Total steps taken</p></li><li><p>Total tokens consumed (prompt and completion separately)</p></li><li><p>Wall-clock duration</p></li><li><p>Which tools were called and how many times</p></li><li><p>The terminal status (complete, timeout, max_steps, error)</p></li><li><p>Whether the agent retried any tool calls</p></li></ul><p>This is how you catch the patterns that kill you. &#8220;Hey, the invoice-processing agent has been averaging 8 steps for the past week, but today it&#8217;s averaging 14.&#8221; That&#8217;s your early warning. Something changed in the data, or the LLM is behaving differently, or a downstream API is returning errors that cause retries.</p><p>Without these metrics, you&#8217;ll find out when your token bill arrives. Or when a user complains. Or at 3 AM.</p><p>One thing I keep going back to: the bounded execution from step 1 is what makes observability useful. If an agent can run unbounded, your metrics are meaningless because the variance is infinite. 
Boundaries give you a normal range to compare against.</p><h2><strong>Step 6: Test the failure modes, not just the happy path</strong></h2><p>This is the part most people skip and it&#8217;s the part that matters most.</p><p>Your tests for an agent system should include:</p><ul><li><p>What happens when the LLM returns malformed tool calls?</p></li><li><p>What happens when a tool times out on every invocation?</p></li><li><p>What happens when the agent hits its step limit without completing the task?</p></li><li><p>What happens when two parallel agents try to modify the same resource?</p></li><li><p>What happens when the LLM decides to call a tool that doesn&#8217;t exist?</p></li></ul><p>I write these as integration tests with a mock LLM that returns predefined sequences. It&#8217;s not perfect because you can&#8217;t predict every weird thing a real LLM will do. But it catches the structural failures: the ones where your orchestration logic breaks, not where the LLM says something dumb.</p><p>For the LLM-says-something-dumb cases, I rely on the boundaries from step 1 and the observability from step 5. You can&#8217;t test for every hallucination. But you can make sure hallucinations don&#8217;t cause unbounded damage.</p><h2><strong>What I&#8217;d do differently starting from scratch</strong></h2><p>If I were building a new agent system today, I&#8217;d start with the boring stuff first. Timeouts, structured tool results, status tracking, logging. Then I&#8217;d add the actual agent logic on top.</p><p>Most teams do it the other way around. They get the agent working, it&#8217;s exciting, it does cool things. Then they spend three months retrofitting all the production-hardening stuff. I&#8217;ve done this. It&#8217;s painful. The guardrails are much easier to build when you design around them from the start.</p><p>I&#8217;d also think carefully about whether I actually need agents at all. 
A lot of problems that people solve with agents can be solved with a well-structured prompt and a single LLM call. Agents add complexity. Every step in an agent loop is a place where things can go wrong. If your task doesn&#8217;t require dynamic tool selection or multi-step reasoning, a simpler approach is almost always better.</p><p>That said, when you do need agents (and there are real cases where you do), building them with durability in mind from day one will save you more headaches than any framework or library choice.</p><h2><strong>Where I&#8217;m still figuring things out</strong></h2><p>I don&#8217;t have a great answer for agent memory yet. For short tasks, passing the full message history works fine. For agents that run across multiple sessions or need to remember things from days ago, I&#8217;m experimenting with summarization and retrieval patterns, but nothing feels solid yet.</p><p>I also don&#8217;t have strong opinions on agent frameworks. There are a lot of them. Some seem good, some seem like thin wrappers around API calls with a lot of abstraction for abstraction&#8217;s sake. I&#8217;ve been writing my own orchestration code because it helps me understand the failure modes, but I could be wrong that this is the best use of my time.</p><p>And multi-agent coordination where agents communicate with each other, not just run in parallel, is something I&#8217;ve read about more than I&#8217;ve built. Projects like <a href="https://github.com/nex-crm/wuphf">Wuphf</a> (which uses Git and Markdown files as a shared knowledge base between agents) are interesting because they solve the coordination problem through a shared artifact instead of direct communication. That feels right to me, but I haven&#8217;t tested it enough to recommend it.</p><p>The honest summary: if you get the basics right (boundaries, structured tool results, observability, explicit status tracking), you can build agent systems that run in production without constant babysitting. 
The fancy orchestration patterns matter less than you&#8217;d think. The boring reliability patterns matter more.</p><p>Build the guardrails first. Then let the agents loose inside them.</p>]]></content:encoded></item><item><title><![CDATA[How to actually get better output from AI coding assistants]]></title><description><![CDATA[The patterns that separate productive AI-assisted development from expensive trial and error.]]></description><link>https://diegom7s.com/p/how-to-actually-get-better-output</link><guid isPermaLink="false">https://diegom7s.com/p/how-to-actually-get-better-output</guid><dc:creator><![CDATA[Diêgo]]></dc:creator><pubDate>Wed, 22 Apr 2026 21:51:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3b9a57d3-e70b-4f1d-b3b6-f87f488baa9c_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most people treat AI coding assistants like a smarter autocomplete. Type a prompt, get some code, edit it until it works, repeat. And that works fine for small stuff: generating a utility function, explaining an unfamiliar API, drafting a quick test.</p><p>But when you try to use AI on real production work, things with actual constraints, team conventions, and code that has to survive contact with other engineers, that approach falls apart fast. The output is technically correct but doesn&#8217;t fit anywhere. You end up rewriting most of it anyway.</p><p>I&#8217;ve been reading through some of the most useful practical writing I&#8217;ve found on this topic, and I want to walk through three patterns that actually change the output quality in a meaningful way. Not theory. Patterns you can start using today.</p><h2>Start with context, not prompts</h2><p>Here&#8217;s a pattern I see constantly: engineers write detailed prompts but give the AI no context about where the code will live. No project structure. No team conventions. No architectural decisions. 
Just a description of the feature they want.</p><p>The AI fills in the blanks. And it&#8217;s good at that. But it fills them in with generic, statistically average choices, not your choices.</p><p>The fix is something called <strong>knowledge priming</strong> (or context engineering if you want to sound fancy). Before you start a session, you feed the AI the information it needs to make decisions that match your codebase.</p><p>This can be as simple as pasting your team&#8217;s style guide into the conversation. Or pointing the AI at a representative file from your codebase and saying &#8220;write new code that looks like this.&#8221; Or, and this is where it gets more structured, maintaining a document that lives in your repo and gets included in every AI session automatically.</p><p>The <a href="https://martinfowler.com/articles/reduce-friction-ai/encoding-team-standards.html">Encoding Team Standards</a> piece from Martin Fowler&#8217;s site gets into exactly this. The idea is to make your team&#8217;s conventions explicit and machine-readable, so you&#8217;re not re-explaining them every session. Things like: how you name variables, how you handle errors, what packages you prefer, what patterns you avoid. Not a vague &#8220;we care about clean code.&#8221; Specific, concrete rules the AI can actually follow.</p><p>This matters more than most people realize. The AI isn&#8217;t being sloppy when it ignores your conventions. It genuinely doesn&#8217;t know them. Give it the information and the output quality shifts noticeably.</p><h2>Build a harness before you build a feature</h2><p>This one changed how I think about AI-assisted development entirely.</p><p>The instinct when working with a coding agent is to ask it to build the thing you need. 
But what usually happens is the agent generates something, you&#8217;re not sure if it&#8217;s right, you ask for changes, the changes break something else, and you spend more time debugging than you would have writing it yourself.</p><p>The pattern that actually works is to build the <strong>harness first</strong>.</p><p>A harness here isn&#8217;t a testing framework in the traditional sense. It&#8217;s a set of constraints (tests, type contracts, lint rules, example inputs and outputs) that define what correct looks like before any implementation exists. You give the agent something to run against. It can iterate on its own output, catch its own mistakes, and come back to you with something that already passes your criteria.</p><p>This is the core of what <a href="https://martinfowler.com/articles/harness-engineering.html">harness engineering</a> describes. Instead of reviewing AI output by reading it and hoping you catch the bugs, you create an automated feedback mechanism. The agent fails fast, locally, on your constraints, not in production, not in code review.</p><p>Here&#8217;s what this looks like in practice. Say you&#8217;re asking an agent to implement a data transformation function. Before you write the prompt, you write:</p><ul><li><p>A few unit tests covering the expected behavior</p></li><li><p>Type signatures that constrain the inputs and outputs</p></li><li><p>Maybe a couple of edge cases you know are tricky</p></li></ul><p>Then you give the agent the tests and tell it to make them pass. Not &#8220;build me a function that does X.&#8221; Give it something to aim at.</p><p>The difference in output quality is real. The agent has a ground truth to orient around. It stops guessing at what &#8220;correct&#8221; means and starts solving a specific, verifiable problem.</p><p>This also makes code review faster. When an agent&#8217;s output comes with passing tests, you&#8217;re not starting from scratch when evaluating it. You&#8217;re asking: are these tests sufficient? 
That&#8217;s a much smaller question.</p><h2>Create a feedback loop, not a conversation</h2><p>Most AI-assisted coding sessions look like a conversation. You ask for something, you get something, you give feedback, you get a revision. Back and forth.</p><p>That works. But it doesn&#8217;t scale, and it doesn&#8217;t improve over time. Every session starts from zero. Every mistake is one you have to catch yourself.</p><p>The <a href="https://martinfowler.com/articles/reduce-friction-ai/feedback-flywheel.html">Feedback Flywheel</a> pattern is about turning that conversation into something self-improving. The idea is to capture the feedback you&#8217;re giving the AI (the corrections, the style notes, the &#8220;no, not like that&#8221; moments) and encode them back into the context the AI starts with next time.</p><p>So say you&#8217;re working with an agent and it keeps generating code with a pattern you don&#8217;t use. You correct it. That correction disappears when the session ends. But if you take that correction and add it to your team standards document, it&#8217;s part of the context the next session starts with. You stop correcting the same mistake.</p><p>Over time, this compounds. Your AI sessions need progressively less corrective work, because the common mistakes are already ruled out before the session begins. The flywheel is slow at first (encoding one convention at a time is tedious) but it pays back quickly.</p><p>The practical steps for this are roughly:</p><p>1. Run a session, collect corrections and feedback</p><p>2. Group the feedback into categories (naming, patterns, architecture, style)</p><p>3. Rewrite the corrections as rules, not commentary: &#8220;use X&#8221; not &#8220;avoid Y because...&#8221;</p><p>4. Add those rules to a shared context document that every session gets</p><p>This is also how teams start sharing AI productivity gains. 
If one engineer figures out a better prompt structure or catches a common mistake pattern, it shouldn&#8217;t stay in their head. It should go into the shared context, where everyone benefits automatically.</p><h2><strong>A note on what these patterns have in common</strong></h2><p>Looking at all three, the thread is the same: <strong>friction reduction through upfront investment</strong>.</p><p>Knowledge priming requires writing down your conventions explicitly. Harness engineering requires writing tests before implementation. The feedback flywheel requires capturing corrections and updating shared context. None of these feel productive in the moment. They all slow down the first session.</p><p>But they&#8217;re the difference between AI assistance that compounds and AI assistance that plateaus.</p><p>The engineers I&#8217;ve seen get the most out of these tools aren&#8217;t the ones with the cleverest prompts. They&#8217;re the ones treating the AI&#8217;s working environment with the same care they&#8217;d treat their own. Good tooling, good constraints, good feedback mechanisms.</p><p>The AI isn&#8217;t going to ask for any of this. It will work with whatever you give it. The question is whether what you give it is enough for it to do good work.</p><h2><strong>Where to start</strong></h2><p>If you&#8217;re not doing any of this yet, pick one:</p><ul><li><p><strong>Easiest</strong>: Write a short conventions document for your project. Three to five specific rules the AI should follow. Paste it at the start of every session for a week and notice what changes.</p></li><li><p><strong>Higher impact</strong>: Before your next feature task, write two or three tests that describe the expected behavior. Use those as the prompt. See if you spend less time revising.</p></li><li><p><strong>Long game</strong>: After your next AI session, write down the corrections you made. Find the most common one. 
Turn it into a rule in your conventions document.</p></li></ul><p>None of this requires new tools. It&#8217;s a shift in how you set up the work before the AI touches it.</p><p>That&#8217;s the whole idea. AI coding assistants are powerful, but they&#8217;re not magic. They&#8217;re tools that reflect the quality of their inputs. Make the inputs better and the outputs follow.</p>]]></content:encoded></item><item><title><![CDATA[Kubernetes Core Concepts Explained with a Golang Example]]></title><description><![CDATA[This article breaks down Kubernetes fundamentals through a hands-on approach. We'll explore key concepts while deploying a Go application using Kind (Kubernetes in Docker).]]></description><link>https://diegom7s.com/p/kubernetes-core-concepts-explained</link><guid isPermaLink="false">https://diegom7s.com/p/kubernetes-core-concepts-explained</guid><dc:creator><![CDATA[Diêgo]]></dc:creator><pubDate>Mon, 10 Nov 2025 23:46:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7fc87e14-3bab-43c6-a750-60253fb29a67_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Why Use Kind?</h3><p>Kind is an excellent tool for setting up a local Kubernetes environment. 
It offers:</p><ul><li><p>Kubernetes clusters that run inside Docker containers.</p></li><li><p>Quick and simple setup for local development.</p></li><li><p>A convenient environment for beginners and experienced developers to experiment with Kubernetes features.</p></li></ul><h3>Who Is This Article For?</h3><p><strong>Kubernetes Beginners:</strong> Developers getting started with Kubernetes who want a practical, hands-on introduction.</p><p><strong>Experienced Developers:</strong> Those who prefer a &#8220;deploy first&#8221; approach, setting up containers and Kubernetes clusters locally before moving to cloud infrastructure.</p><h3>What You&#8217;ll Learn</h3><p>This article breaks down Kubernetes core concepts step by step. Each section explains a key Kubernetes component and how it interacts with the others, using practical examples.</p><h3>Base Project</h3><p>To keep things practical and focused on Kubernetes concepts, we&#8217;ll use a simple Go application. You can find the complete code here: <a href="https://github.com/diegom7s/gst-app.git">gst-app</a></p><h3>Containerization and Kind: Building and Managing Our Kubernetes Environment</h3><p>Containerization has transformed how applications are built, shipped, and deployed. By isolating applications and their dependencies into lightweight, self-contained packages, containers ensure consistent behavior across different environments, from development to production.
Let&#8217;s explore containerization fundamentals and how we use Kind (Kubernetes in Docker) to set up our Kubernetes environment.</p><h4>Containerization: A Modern Approach to Application Deployment</h4><p><strong>What is Containerization?</strong></p><p>Containerization involves packaging an application and its dependencies into a &#8220;container&#8221;&#8212;a lightweight, portable, self-sufficient environment that runs consistently across various infrastructures.</p><p><strong>Key Benefits:</strong></p><ul><li><p><strong>Isolation:</strong> Containers provide isolated environments, preventing conflicts between applications running on the same host.</p></li><li><p><strong>Portability:</strong> Containers run on any system supporting the container runtime, ensuring consistent deployment across development, testing, and production.</p></li><li><p><strong>Scalability:</strong> Containers can easily scale up or down, making them ideal for dynamic, cloud-native applications.</p></li></ul><p><strong>Example: Dockerfile Configuration</strong></p><p>In our project, <code>dockerfile.todo</code> defines the Docker image for the <code>todo-api</code> service:</p><pre><code>FROM golang:1.23.0 AS build_todo-api

ENV CGO_ENABLED=0 GOOS=linux GOARCH=amd64
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

RUN go build -o todo-api ./main.go 

FROM alpine:3.18
RUN apk --no-cache add postgresql-client
RUN addgroup -g 1000 -S todo &amp;&amp; \
    adduser -u 1000 -h /app -G todo -S todo

WORKDIR /app
COPY --from=build_todo-api --chown=todo:todo /app/todo-api /app/todo-api
USER todo
EXPOSE 8000

CMD [&quot;./todo-api&quot;]

LABEL org.opencontainers.image.title=&quot;todo-api&quot; \
      org.opencontainers.image.authors=&quot;Di&#234;go &lt;diegomagalhaes.contact@gmail.com&gt;&quot; \
      org.opencontainers.image.source=&quot;https://github.com/diegom7s-dev/gst-app&quot; \
      org.opencontainers.image.version=&quot;1.0.0&quot;</code></pre><p>This Dockerfile uses a two-stage build:</p><ol><li><p><strong>Build Stage</strong> (<code>FROM golang:1.23.0 AS build_todo-api</code>): Compiles the application in a clean Go environment, so the Go toolchain and source code stay out of the final image.</p></li><li><p><strong>Runtime Stage</strong> (<code>FROM alpine:3.18</code>): Copies the compiled binary to a minimal Alpine Linux image, providing a lightweight runtime environment with only essential dependencies like <code>postgresql-client</code>.</p></li></ol><p>By separating build and runtime stages, we optimize the image for both size and security, following containerization best practices.</p><h4>Kind: Simulating a Kubernetes Cluster in Docker</h4><p><strong>What is Kind?</strong></p><p>Kind (Kubernetes in Docker) is a tool for running local Kubernetes clusters using Docker containers as nodes. It&#8217;s excellent for local development and testing, allowing developers to create multi-node clusters without needing multiple physical or virtual machines.</p><p><strong>Example: Kind Configuration</strong></p><p>The <code>kind.config.yaml</code> file defines our Kind cluster configuration:</p><pre><code>kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  # Todo-Api
  - containerPort: 8000
    hostPort: 8000
  # Postgres
  - containerPort: 5432
    hostPort: 5432</code></pre><p>This configuration creates a single-node Kind cluster with a control plane role. It maps ports 8000 and 5432 from the host to the container, allowing us to access services running inside the cluster (the <code>todo-api</code> on port 8000 and PostgreSQL on port 5432) directly from our local machine.</p><h4>Integrating Containerization with Kind: Building and Running the Service</h4><p><strong>Makefile Setup for Automation</strong></p><p>Our project&#8217;s Makefile automates various tasks related to building and deploying the <code>todo-api</code> service using Docker and Kind:</p><pre><code># Define dependencies
GOLANG          := golang:1.22.2
ALPINE          := alpine:3.18
KIND            := kindest/node:v1.27.3
POSTGRES        := postgres:15.4
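# Assumed definitions: the targets in this excerpt use these variables, but
# the excerpt does not show where they are set. The values are inferred from
# the cluster name, image tag, namespace, and labels mentioned elsewhere in
# this article and may differ from the real Makefile.
KIND_CLUSTER    := sgt-kind-cluster
VERSION         := 0.0.1
SERVICE_IMAGE   := todo-api:$(VERSION)
NAMESPACE       := simple-go-todo
APP             := todo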

# Building containers
service:
    docker build \
        -f infra/docker/dockerfile.todo \
        -t $(SERVICE_IMAGE) \
        --build-arg BUILD_REF=$(VERSION) \
        --build-arg BUILD_DATE=`date -u +&quot;%Y-%m-%dT%H:%M:%SZ&quot;` \
        .

# Running from within k8s/kind
dev-up:
    kind create cluster \
        --image $(KIND) \
        --name $(KIND_CLUSTER) \
        --config infra/k8s/dev/kind/kind.config.yaml

    kubectl config use-context kind-$(KIND_CLUSTER)
    kubectl wait --timeout=120s --namespace=local-path-storage --for=condition=Available deployment/local-path-provisioner
    kind load docker-image $(POSTGRES) --name $(KIND_CLUSTER)</code></pre><p><strong>Key Makefile Targets:</strong></p><ul><li><p><strong>service:</strong> Builds the Docker image for the <code>todo-api</code> service using the Dockerfile at <code>infra/docker/dockerfile.todo</code>.</p></li><li><p><strong>dev-up:</strong> Creates a Kind cluster using the specified configuration file and loads the PostgreSQL image into the cluster.</p></li></ul><p>By leveraging Docker and Kind, we get a streamlined development workflow that approximates a production environment (within limitations): we can build, deploy, and test our Go application in a local Kubernetes cluster.</p><h3>Essential Kubernetes Components: What They Are and How to Use Them</h3><p>Understanding Kubernetes core components is fundamental to effectively deploying and managing applications. Let&#8217;s explore the key components that form the foundation of Kubernetes, using our Go application as a practical example.</p><h4>1. Nodes: The Worker Machines in Kubernetes Clusters</h4><p><strong>What are Nodes?</strong></p><p>Nodes are the worker machines in Kubernetes clusters. They&#8217;re responsible for running containerized applications and providing the computational resources needed to keep your applications running smoothly. Nodes can be physical servers or virtual machines, depending on your cluster configuration.</p><p><strong>Architectural Role:</strong></p><ul><li><p><strong>Runtime Environment:</strong> Nodes serve as the execution environment for your Pods.
Each node runs at least a kubelet (an agent responsible for communicating with the Kubernetes control plane), a container runtime (like containerd or CRI-O), and kube-proxy (which maintains network rules on nodes).</p></li><li><p><strong>Resource Management:</strong> Nodes provide CPU, memory, storage, and network resources for running containers. The scheduler places each Pod on a node with enough free capacity for the resource requests specified in its configuration.</p></li></ul><p><strong>Node Components:</strong></p><ul><li><p><strong>Kubelet:</strong> An agent running on each node that ensures the containers described in each Pod are running. It continuously monitors Pod status and communicates with the Kubernetes API server to maintain the desired state.</p></li><li><p><strong>Container Runtime:</strong> The software responsible for running containers. Popular runtimes include containerd and CRI-O; Docker Engine can also be used through the cri-dockerd adapter. Kubernetes supports any runtime implementing the Kubernetes Container Runtime Interface (CRI).</p></li><li><p><strong>Kube-proxy:</strong> A network proxy running on each node that maintains the network rules implementing Kubernetes Services, so that traffic sent to a Service reaches one of its backing Pods, whichever node that Pod runs on.</p></li></ul><p><strong>Example in Our Project:</strong></p><p>In our project, nodes are represented by Docker containers running Kubernetes when using Kind. Each node in a Kind cluster is a Docker container, which makes it possible to simulate even a multi-node Kubernetes cluster locally.</p><p>While we don&#8217;t have a specific YAML manifest to define nodes (since nodes are managed by the control plane), we rely on them to provide the necessary environment for our Pods and services.
For example, when deploying the PostgreSQL database or the <code>todo-api</code> application, Kubernetes schedules these Pods on available nodes, utilizing their computational resources.</p><p><strong>Key Concepts Related to Nodes:</strong></p><ul><li><p><strong>Node Affinity and Anti-Affinity:</strong> Kubernetes provides mechanisms to control how Pods are scheduled on nodes. Node affinity allows you to define rules that attract Pods to certain nodes, while Pod anti-affinity keeps matching Pods off the same node, spreading them across nodes for improved fault tolerance.</p></li><li><p><strong>Taints and Tolerations:</strong> A taint repels Pods from a node unless the Pod declares a matching toleration. For example, a node can be tainted to allow only specific workloads, like those requiring GPUs, ensuring only compatible Pods are scheduled there.</p></li></ul><p><strong>Understanding Node Management:</strong></p><ul><li><p><strong>Node Status:</strong> Each node maintains a status providing essential information like node health, capacity (CPU, memory, etc.), and conditions (e.g., Ready, DiskPressure, MemoryPressure).</p></li><li><p><strong>Node Maintenance:</strong> Nodes can be marked as unschedulable (cordoned) when needing maintenance, preventing new Pods from being scheduled while existing Pods continue running. Draining the node additionally evicts the existing Pods so they are rescheduled elsewhere.</p></li></ul><h4>2. Pods: The Fundamental Building Block of Kubernetes</h4><p><strong>What are Pods?</strong></p><p>Pods are the smallest deployable units in Kubernetes, representing a single instance of a running process. A Pod can encapsulate one or more containers that share the same network namespace and storage. Containers within a Pod can communicate using localhost and share storage volumes.</p><p><strong>Architectural Role:</strong></p><ul><li><p><strong>Ephemeral Nature:</strong> Pods are designed to be ephemeral.
When a Pod managed by a controller (such as a Deployment or StatefulSet) fails, the controller creates a new Pod to replace it rather than repairing the existing one.</p></li><li><p><strong>Container Co-location:</strong> Containers that need to share resources (like storage or networking) or must always be deployed together are grouped in a single Pod.</p></li></ul><p><strong>Example in Our Project:</strong></p><p>In our project, the StatefulSet configuration in <code>dev-database.yaml</code> defines a Pod template for running a PostgreSQL container:</p><pre><code>apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
  namespace: simple-go-todo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: postgres
        image: 'postgres:15.4'
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data</code></pre><p>This configuration ensures a Pod running a PostgreSQL database is created and maintained, with persistent storage mounted at <code>/var/lib/postgresql/data</code>.</p><h4>3. Deployments: Managing Your Application&#8217;s Desired State</h4><p><strong>What are Deployments?</strong></p><p>Deployments are abstractions that manage Pods and ReplicaSets. They provide declarative updates, ensuring the specified number of Pods is always running, and handle tasks like scaling, rolling updates, and rollbacks.</p><p><strong>Architectural Role:</strong></p><ul><li><p><strong>Scalability and Resilience:</strong> Deployments enable horizontal scaling of applications (increasing the number of replicas) to handle increased traffic or workload.</p></li><li><p><strong>Rolling Updates and Rollbacks:</strong> Support zero-downtime updates by incrementally updating Pods with new application versions and can roll back to a previous version if needed.</p></li></ul><p><strong>Example in Our Project:</strong></p><p>The <code>base-service.yaml</code> file specifies a Deployment for our todo application:</p><pre><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo
  namespace: simple-go-todo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: todo
  template:
    metadata:
      labels:
        app: todo
    spec:
      containers:
      - name: todo-api
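        # Assumption: this block is not shown in the original excerpt. The Service
        # for this app uses `targetPort: todo-api`, which only resolves if the
        # container declares a port with that name. Port 8000 matches the
        # EXPOSE line in the Dockerfile.
        ports:
        - name: todo-api
          containerPort: 8000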
        image: service-image</code></pre><p>This Deployment manages the lifecycle of the <code>todo-api</code> Pod, ensuring one instance is always running and can be scaled as needed.</p><h4>4. Services: Stable Networking for Your Pods</h4><p><strong>What are Services?</strong></p><p>Services provide stable network endpoints for accessing Pods within a Kubernetes cluster. They abstract network access to Pods, enabling communication within the cluster and with external clients.</p><p><strong>Service Types:</strong></p><ul><li><p><strong>ClusterIP:</strong> Exposes the Service on an internal cluster IP, accessible only within the cluster</p></li><li><p><strong>NodePort:</strong> Exposes the Service on a static port on each node&#8217;s IP</p></li><li><p><strong>LoadBalancer:</strong> Provisions an external IP to load balance traffic across nodes</p></li><li><p><strong>ExternalName:</strong> Maps a Service to an external DNS name</p></li></ul><p><strong>Architectural Role:</strong></p><ul><li><p><strong>Decoupling:</strong> Services decouple clients from the underlying Pod IP addresses, which can change if Pods are recreated or rescheduled</p></li><li><p><strong>Service Discovery:</strong> They provide a consistent interface for service discovery, allowing other applications to reliably discover and communicate with Pods</p></li></ul><p><strong>Example in Our Project:</strong></p><p>The <code>dev-todo-patch-service.yaml</code> creates a Service for the <code>todo-api</code>:</p><pre><code>apiVersion: v1
kind: Service
metadata:
  name: todo-api
  namespace: simple-go-todo
spec:
  type: ClusterIP
  ports:
  - name: todo-api
    port: 8000
    targetPort: todo-api</code></pre><p>This Service gives other workloads in the cluster a stable IP and port for reaching the <code>todo-api</code>.</p><div><hr></div><h4>5. ConfigMaps and Secrets: Managing Configuration and Sensitive Data</h4><p><strong>What are ConfigMaps and Secrets?</strong></p><ul><li><p><strong>ConfigMaps:</strong> Store non-sensitive configuration data in key-value pairs</p></li><li><p><strong>Secrets:</strong> Store sensitive data like passwords, OAuth tokens, and SSH keys. Values are base64-encoded, which is an encoding rather than encryption</p></li></ul><p><strong>Architectural Role:</strong></p><ul><li><p><strong>Separation of Configuration and Code:</strong> Enable separating configuration from application code, making applications portable and easier to manage</p></li><li><p><strong>Secure and Flexible Management:</strong> Secrets keep sensitive data separate from ordinary configuration so that access to it can be restricted, while ConfigMaps provide a flexible way to manage configurations without hardcoding values</p></li></ul><p><strong>Example in Our Project:</strong></p><p>We use a ConfigMap to configure PostgreSQL settings in <code>dev-database.yaml</code>:</p><pre><code>apiVersion: v1
kind: ConfigMap
metadata:
  name: pghbaconf
  namespace: simple-go-todo
data:
  pg_hba.conf: |
    local   all             all                                     trust
    # IPv4 local connections:
    host    all             all             0.0.0.0/0               trust
    # IPv6 local connections:
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    local   replication     all                                     trust
    host    replication     all             0.0.0.0/0               trust
    host    replication     all             ::1/128                 trust</code></pre><p>This ConfigMap stores PostgreSQL&#8217;s access control configuration, mounted as a file in the Pod.</p><h4>6. StatefulSets: Managing Stateful Applications</h4><p><strong>What are StatefulSets?</strong></p><p>StatefulSets manage the deployment and scaling of a set of Pods, providing guarantees about the ordering and uniqueness of these Pods.</p><p><strong>Architectural Role:</strong></p><ul><li><p><strong>Stateful Application Management:</strong> Ideal for managing stateful applications where each Pod must have a unique identity and stable persistent storage</p></li><li><p><strong>Stable Network Identity and Storage:</strong> Ensures each Pod has a unique, stable network identity and can maintain persistent storage across restarts</p></li></ul><p><strong>Example in Our Project:</strong></p><p>The <code>dev-database.yaml</code> file uses a StatefulSet to deploy a PostgreSQL instance:</p><pre><code>apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
  namespace: simple-go-todo
spec:
  selector:
    matchLabels:
      app: database
  replicas: 1
  template:
    metadata:
      labels:
        app: database</code></pre><p>This configuration provides stable identity and persistent storage for our PostgreSQL database.</p><h3>Kubernetes&#8217; Declarative Model</h3><p>Kubernetes uses a declarative model where you define your application&#8217;s desired state, and Kubernetes continuously works to maintain that state. By defining your application components as YAML manifests, you can easily manage and scale your applications in a Kubernetes cluster. This approach contrasts with imperative models where each step is executed manually, offering a more scalable and resilient way to manage applications.</p><p>Understanding these Kubernetes objects and their architecture is crucial to leveraging Kubernetes&#8217; full potential. In our project, we apply these concepts to deploy a Go application, providing a practical example of how each component fits into the overall architecture.</p><h3>Putting It Into Practice: Running the Project with Makefile Commands</h3><p>In this final section, we&#8217;ll walk through the step-by-step process of building, deploying, and running our Go application using the commands defined in the Makefile. This will provide a comprehensive understanding of how each command contributes to the overall deployment process and ensure everything works correctly in your Kubernetes environment.</p><p>The Makefile simplifies the workflow by automating repetitive tasks. Let&#8217;s break down the main commands and what happens when you execute each one.</p><h4>1. Installing Dependencies</h4><p>The first step in setting up our environment is installing all necessary dependencies. The Makefile provides a target to install these dependencies using Homebrew (feel free to adapt this for your preferred package manager):</p><pre><code>dev-brew:
    brew update
    brew list kind || brew install kind
    brew list kubectl || brew install kubectl
    brew list kustomize || brew install kustomize
    brew list pgcli || brew install pgcli</code></pre><p>This step ensures all necessary tools are available on your machine to interact with the Kubernetes cluster and manage configurations.</p><h4>2. Pulling Docker Images</h4><p>Before building our custom Docker image, we need to ensure we have the necessary base images:</p><pre><code>dev-docker:
    docker pull $(GOLANG)
    docker pull $(ALPINE)
    docker pull $(KIND)
    docker pull $(POSTGRES)</code></pre><p>This target pulls the specified Docker images for Go, Alpine, the Kind node, and PostgreSQL. These images are the foundation for building our custom application image and running our local Kubernetes cluster.</p><h4>3. Building the Docker Image</h4><p>The Makefile includes a target to build the Docker image for our <code>todo-api</code> service:</p><pre><code>service:
    docker build \
        -f infra/docker/dockerfile.todo \
        -t $(SERVICE_IMAGE) \
        --build-arg BUILD_REF=$(VERSION) \
        --build-arg BUILD_DATE=`date -u +&quot;%Y-%m-%dT%H:%M:%SZ&quot;` \
        .</code></pre><p><strong>Docker Image Build:</strong> This command builds the Docker image using the Dockerfile at <code>infra/docker/dockerfile.todo</code>. It tags the image with the name held in the <code>SERVICE_IMAGE</code> variable (like <code>todo-api:0.0.1</code>).</p><p><strong>Build Arguments:</strong> <code>BUILD_REF</code> (set from the <code>VERSION</code> variable) and <code>BUILD_DATE</code> are passed as build arguments so the Dockerfile can embed versioning information and build metadata in the image.</p><p>The resulting Docker image contains the compiled Go application, ready to be deployed to our Kubernetes cluster.</p><h4>4. Creating the Kind Cluster</h4><p>To simulate a Kubernetes environment locally, we use Kind to create a new cluster:</p><pre><code>dev-up:
    kind create cluster \
        --image $(KIND) \
        --name $(KIND_CLUSTER) \
        --config infra/k8s/dev/kind/kind.config.yaml

    kubectl config use-context kind-$(KIND_CLUSTER)
    kubectl wait --timeout=120s --namespace=local-path-storage --for=condition=Available deployment/local-path-provisioner
    kind load docker-image $(POSTGRES) --name $(KIND_CLUSTER)</code></pre><ul><li><p><strong>Creating the Kind Cluster:</strong> The <code>kind create cluster</code> command creates a new Kubernetes cluster named <code>sgt-kind-cluster</code> using the specified Kind node image (<code>kindest/node:v1.27.3</code>) and configuration file (<code>kind.config.yaml</code>).</p></li><li><p><strong>Setting Kubernetes Context:</strong> <code>kubectl config use-context</code> switches the current Kubernetes context to the new Kind cluster, allowing subsequent <code>kubectl</code> commands to interact with it.</p></li><li><p><strong>Waiting for Storage Provisioner:</strong> The <code>kubectl wait</code> command waits until the <code>local-path-provisioner</code> deployment is available, ensuring the cluster is ready to provision storage volumes.</p></li><li><p><strong>Loading Docker Image into Cluster:</strong> <code>kind load docker-image</code> loads the PostgreSQL Docker image into the Kind cluster, making it available for our application.</p></li></ul><h4>5. Deploying the Application to Kubernetes</h4><p>With the cluster configured and images loaded, we can now deploy our application and its dependencies.</p><p><strong>Using Kustomize to Manage Kubernetes Configurations</strong></p><p><strong>What is Kustomize?</strong></p><p>Kustomize is a Kubernetes-native tool that allows you to customize Kubernetes resource configurations without modifying the original YAML files. It&#8217;s especially useful for managing different environments (like development, testing, and production) from a common base of configuration files. Using Kustomize, we can automatically generate customized manifests for our cluster by applying specific overlays that adjust configurations as needed.</p><pre><code>dev-apply:
    kustomize build infra/k8s/dev/database | kubectl apply -f -
    kubectl rollout status --namespace=$(NAMESPACE) --watch --timeout=120s sts/database
    
    kustomize build infra/k8s/dev/service | kubectl apply -f -
    kubectl wait pods --namespace=$(NAMESPACE) --selector app=$(APP) --timeout=120s --for=condition=Ready</code></pre><ul><li><p><strong>Apply Database Configuration:</strong> <code>kustomize build</code> generates Kubernetes manifests for the database configuration from base YAML files. <code>kubectl apply -f -</code> applies these configurations to the cluster, creating necessary resources (e.g., StatefulSet for PostgreSQL)</p></li><li><p><strong>Wait for Database Deployment:</strong> The <code>kubectl rollout status</code> command waits for the PostgreSQL StatefulSet to be fully deployed and running before proceeding</p></li><li><p><strong>Apply Service Configuration:</strong> The process repeats for the <code>todo-api</code> service, ensuring the service and its dependencies are deployed to the cluster</p></li><li><p><strong>Wait for Pods to Be Ready:</strong> <code>kubectl wait pods</code> ensures all Pods associated with the <code>todo-api</code> application are running and ready before completing the deployment process</p></li></ul><h4>6. Testing Application Endpoints</h4><p>Finally, we can use the Makefile to test our REST API endpoints and verify everything is working as expected:</p><pre><code>test_all: create get_all get_one update delete</code></pre><p>By following these steps, you can successfully build, deploy, and test your Go application in a local Kubernetes cluster using Docker and Kind. The Makefile automates much of this process, making it easier to manage and reducing the risk of errors.</p><h3>Conclusion</h3><p>By combining containerization, Kubernetes, and Kind, we can create a powerful and flexible local development environment that closely resembles a production setup (within limitations). This approach enables efficient development, testing, and iteration, ensuring your applications are robust, scalable, and ready for deployment in real-world environments.</p>]]></content:encoded></item></channel></rss>