Blog
Writing
Writing about the systems behind the projects: AI workflows, private-cloud gremlins, and the process of turning weird ideas into real builds.
Latest posts
Featured first, then freshest.
Build Notes
Featured: Building Weird Ideas Into Real Systems
A note on turning chaotic ideas into scoped, shippable systems without sanding off the fun parts.
- Product Thinking
- Systems
- Architecture
- Creativity
AI Development
Featured: AI-Assisted Development Without Losing the Architecture
How I think about using AI agents to move faster while keeping systems modular, reviewable, and sane.
- AI
- Claude
- Architecture
- TypeScript
- Validation
Systems Thinking
Authentication Is a Relay Race
Federated and proxied authentication isn't one check — it's a request handed between many parties, any of which can reject it, drop it, or pass it on. If you can't see which leg failed, you can't debug it.
- Systems Thinking
- Networking
- Authentication
- Debugging
Engineering Practice
Know What Your Reload Actually Reloads
A reload almost never applies everything — some changes a hot reload picks up while others silently need a full restart, and if you don't know which is which you'll swear you applied a change you didn't.
- Engineering Practice
- Operations
- Configuration
- Debugging
Systems Thinking
Multi-Tenancy Is a Day-One Decision
Tenant isolation, per-tenant billing and audit, and a white-label boundary are architecture you design before the first line, not a feature you bolt on later. Retrofitting tenancy into a single-tenant app is a rewrite.
- Systems Thinking
- Architecture
- SaaS
- Design
Infrastructure
Restore in Another Failure Domain, or It Isn't Disaster Recovery
A backup you can only restore in the same place it came from isn't disaster recovery — it's a convenience copy. Real DR means rehearsing a restore into a different failure domain, with everything that requires.
- Infrastructure
- Backups
- Disaster Recovery
- Reliability
Engineering Practice
Validate Config When You Load It, Not When You Use It
Configuration that's only resolved at the moment it's used fails at the worst possible time — mid-request, in production. Resolve and validate config at startup so a bad value fails fast and loud instead of late and quiet.
- Engineering Practice
- Configuration
- Reliability
- Operations
Security
Don't Trust the Client With State You Didn't Sign
Sometimes the architecture forces you to route state through an untrusted client — a redirect, a cookie, a token. When you do, the client is a courier you can't trust, so the payload has to be signed or encrypted and validated on the way back.
- Security
- Architecture
- Web
- Cryptography
Engineering Practice
Routing Rules Are Data, Not Code
When the decisions a system makes change far more often than the machinery that executes them, the decisions belong in a data table — so a routing change is an edited row, not a code deploy.
- Engineering Practice
- Architecture
- Configuration
- Design
Engineering Practice
Scope Shared Code by Who's Allowed to See It
Deduplicating shared code is good, but "shared with everyone" is its own risk. The durable pattern is one canonical source, scoped by audience so nothing leaks where it shouldn't, deployed as a boring flat artifact.
- Engineering Practice
- Architecture
- Code Reuse
- Design
Infrastructure
Some Failures Only Clear on a Cold Boot
A warm reboot doesn't reset everything. When hardware gets wedged in a stuck state, a driver reload or a soft restart can leave the fault in place — and a full power-drain is the one move that actually clears it.
- Infrastructure
- Hardware
- Homelab
- Debugging
AI Development
Your AI Assistant Has a Meter Running
AI coding tools are a metered utility, not a flat-rate magic box. Treating their usage like a cloud bill — budgeted, attributed, and matched to the job — is the difference between leverage and a surprise.
- AI Development
- Automation
- Cost
- Workflows
Systems Thinking
A Plugin Is a Contract and a Blast Radius
Plugin architectures buy you modularity, but each plugin is also a trust boundary and a failure boundary — and the success, failure, and error paths around every invocation are the part people forget to design.
- Systems Thinking
- Architecture
- Reliability
- Design
Systems Thinking
Choose Storage by How It Fails, Not by Its Spec Sheet
A feature list like "all-flash" tells you how storage behaves on a good day. What actually matters is how it behaves degraded, under load, and on the worst device you bought — and those don't fit on the spec sheet.
- Systems Thinking
- Infrastructure
- Storage
- Reliability
- Homelab
Engineering Practice
Diagnose Read-Only Before You Touch Anything
You can learn almost everything about a broken system without changing a single thing — and building the whole picture from read-only commands first is what keeps a diagnosis from becoming a second outage.
- Engineering Practice
- Operations
- Debugging
- Reliability
Engineering Practice
Know How Long Your State Lives
Every variable has a lifetime — this request, this session, the whole process, or a cache somewhere — and most state bugs come from treating one lifetime as if it were another.
- Engineering Practice
- Architecture
- State Management
- Debugging
Security
Public DNS Is Free Reconnaissance
Every public DNS record you publish is a fact you've handed to anyone curious about your infrastructure. Deciding whether to expose an origin, proxy it, or tunnel it is a security choice, not a convenience one.
- Security
- Networking
- DNS
- Infrastructure
Security
A Public Network Treats Every Device as Hostile
The moment a network is open to people you don't control — a guest SSID, a room of shared machines, a venue Wi-Fi — every device on it is potentially hostile, and the design has to start from that assumption.
- Security
- Networking
- Segmentation
- Infrastructure
Security
Automation Needs an Identity, and That's a Credential You Now Own
Before automation can do anything privileged it has to act as someone — and that machine identity is a real credential with a lifecycle, not a side effect you can skip.
- Security
- Automation
- PKI
- Operations
AI Development
Save Your Prompts — They're Tools, Not Chat Messages
A prompt that reliably does a job is reusable tooling. Name it, version it, keep it next to the work it drives, and make it a system prompt instead of retyping it every session.
- AI Development
- Automation
- Prompting
- Workflows
Engineering Practice
Structured Logs Beat Clever Sentences
A log line is a message to a future reader — usually you at 3am, often a query engine. Design logs for that reader with stable messages, structured context, and deliberate levels instead of concatenating values into prose.
- Engineering Practice
- Observability
- Logging
- Operations
Infrastructure
The Slowest Replica Sets Your Write Latency
In synchronously replicated storage a write isn't finished until every copy acknowledges it, so the slowest device in the cluster sets write latency for everyone — and reads will happily hide that from you.
- Infrastructure
- Storage
- Performance
- Homelab
- Reliability
Infrastructure
Lift-and-Shift Is a Risk Assessment, Not a Dockerfile
Containerizing an old application looks like writing a Dockerfile, but the real work is finding every assumption the app makes about its host — a hardware-bound license, inbound connections, files it rewrites itself — and proving the container won't break them.
- Infrastructure
- Docker
- Legacy
- Migration
Automation
Adopting a Script Into Your Platform Is a Refactor, Not a Copy-Paste
Folding a pile of loose one-off scripts into a maintained automation system isn't a move-the-files job — it's a refactor that gates each action, strips hardcoded credentials, sources artifacts properly, and makes everything idempotent and reversible.
- Automation
- Ansible
- Security
- Engineering Practice
Systems Thinking
A Zero-Downtime Upgrade Is Quorum Math and a One-Way Door
Upgrading a replicated datastore without downtime comes down to two things — keeping a quorum alive while you take one node at a time, and knowing exactly where rollback stops being "revert" and becomes "restore from backup."
- Systems Thinking
- Databases
- Reliability
- Automation
Automation
Deploy With the Access You Actually Have
When I couldn't get the rights to set a VM's network the official way, cloud-init's guestinfo datasource let each clone self-configure with nothing but edit-settings and power-on — a lesson in routing around a permission you're never going to be granted.
- Automation
- Infrastructure
- Provisioning
- Cloud-Init
Systems Thinking
A Shared Name Is Not a Shared File
Before you consolidate a helper that's been copied into a dozen projects, prove the copies are actually identical — a shared filename is not evidence of shared content, and the copies have usually already drifted.
- Systems Thinking
- Engineering Practice
- Refactoring
- Source Control
Engineering Practice
Your Load Generator Is Part of the Experiment
A benchmark measures the thing generating load as much as the system under test — so isolate the client, change one variable at a time, and never let a single generator feed many targets at once.
- Engineering Practice
- Benchmarking
- Performance
- Operations
Automation
AI Agents for Small Business: Useful Automation or Expensive Chaos Machine?
A practical guide to AI agents for small businesses: what they are, when to use them, when not to, and how to build safe human-in-the-loop workflows with logging, approvals, and guardrails.
- AI Agents
- AI Automation
- Small Business
- Workflow Automation
- Systems Design
- Consulting
Automation
AI Automation for Small Business: How to Start Without Building a Haunted House
A practical guide to AI automation for small businesses: where to start, what to automate first, what to avoid, and how to build workflows that are secure, useful, and maintainable.
- AI Automation
- Small Business
- Workflow Automation
- Systems Design
- Consulting
Software Consulting
API Integrations for Small Business: How to Connect the Tools That Run Your Operations
A practical guide to API integrations for small businesses: how to connect SaaS tools, automate workflows, sync data, reduce manual entry, and build maintainable business systems.
- API Integrations
- Small Business
- Workflow Automation
- SaaS Integration
- Systems Integration
- Internal Tools
- Consulting
Software Consulting
Custom Software vs SaaS: When a Small Business Should Build Instead of Buy
A practical guide for small businesses deciding between SaaS tools, automation platforms, and custom software. Learn when to buy, when to build, and how to avoid expensive software mistakes.
- Custom Software
- Small Business
- SaaS
- Internal Tools
- Systems Design
- Consulting
Consulting
Fractional CTO for Small Business: When You Need Technical Leadership Before a Full-Time Hire
A practical guide for small businesses and founders deciding when to bring in fractional CTO support, technical consulting, architecture review, automation strategy, or software planning.
- Fractional CTO
- Technical Consulting
- Software Architecture
- Small Business
- Technology Strategy
- Systems Design
Software Consulting
Internal Tools for Small Business: When Spreadsheets and SaaS Stop Being Enough
A practical guide to internal tools for small businesses: when spreadsheets stop working, when SaaS gets messy, and how custom dashboards, workflows, and lightweight software can improve operations.
- Internal Tools
- Small Business
- Custom Software
- Workflow Automation
- Dashboards
- Operations
- Consulting
Infrastructure
Going IPv6-Only Is a DNS Problem First
When you stand up an IPv6-only host, the first thing to break usually isn't your application — it's name resolution, because the resolver and the services behind the names have to be reachable over v6 too.
- Infrastructure
- Networking
- IPv6
- DNS
Websites
Service Landing Pages for Small Businesses: How to Turn What You Do Into Pages That Rank and Convert
A practical guide to building small business service landing pages that explain your offer, support SEO, build trust, and turn visitors into leads.
- Service Pages
- Technical SEO
- Small Business
- Website Strategy
- Landing Pages
- Conversion
Websites
Slow Website? A Practical Speed Optimization Checklist for Small Businesses
A practical website speed optimization checklist for small businesses covering images, JavaScript, fonts, third-party scripts, Core Web Vitals, hosting, caching, analytics, and maintainability.
- Website Performance
- Speed Optimization
- Core Web Vitals
- Technical SEO
- Small Business
- Consulting
Security
Small Business Cybersecurity Checklist: Practical Security Without Enterprise Theater
A practical small business cybersecurity checklist covering MFA, passwords, backups, email security, device management, vendor risk, access control, logging, and incident readiness.
- Cybersecurity
- Small Business
- IT Security
- Security Audit
- Risk Management
- Consulting
Operations
Small Business Technology Audit: Find the Mess Before It Finds You
A practical small business technology audit guide covering SaaS tools, accounts, access, security, automations, websites, backups, documentation, vendors, and workflow risk.
- Technology Audit
- Small Business
- IT Systems
- SaaS Audit
- Cybersecurity
- Workflow Automation
- Consulting
Websites
Small Business Website Maintenance Checklist: What to Check After Launch
A practical website maintenance checklist for small businesses covering updates, backups, forms, analytics, SEO, performance, security, content, and monthly review tasks.
- Website Maintenance
- Small Business
- Technical SEO
- Website Security
- Website Performance
- Consulting
Websites
Static Website vs WordPress: What Small Businesses Should Choose
A practical guide for small businesses deciding between a static website and WordPress, covering speed, security, SEO, maintenance, editing, cost, and when each option actually makes sense.
- Static Websites
- WordPress
- Small Business
- Web Performance
- Technical SEO
- Web Development
Operations
Technical Documentation for Small Business: Runbooks, SOPs, and Knowledge Bases That Actually Help
A practical guide to technical documentation for small businesses: how to create SOPs, IT runbooks, workflow docs, knowledge bases, and operational notes that reduce risk and make systems easier to maintain.
- Technical Documentation
- SOPs
- Runbooks
- Small Business
- Knowledge Base
- IT Operations
Websites
Website Redesign SEO Checklist: How to Rebuild Without Throwing Away What Works
A practical website redesign SEO checklist for small businesses planning a rebuild, migration, or platform change without losing useful pages, traffic, redirects, metadata, analytics, or trust.
- Website Redesign
- Technical SEO
- Website Migration
- Small Business
- Web Performance
- Consulting
Engineering Practice
Query the Data Without Standing Up the System
When you only need to run queries against a system's data, you usually don't need to run the system — a search is a wire protocol plus a matching engine, and the matching engine often ships as a library you can point straight at a file.
- Engineering Practice
- Tooling
- LDAP
- Debugging
Operations
Small Business Backup and Disaster Recovery Checklist: Recover Without Panic
A practical backup and disaster recovery checklist for small businesses — critical systems, ransomware-resistant backups, restore testing, RPO/RTO in plain English, documentation, and business continuity.
- Backups
- Disaster Recovery
- Small Business
- IT Operations
- Business Continuity
- Consulting
Infrastructure
The Last sysctl Drop-In Wins
A kernel tuning value I set kept reverting — not ignored but overridden — because /etc/sysctl.d drop-ins load in filename order and the last file to set a key wins.
- Infrastructure
- Linux
- Performance
- Operations
Infrastructure
Your CI Runner Is a Production Server in Disguise
A CI runner feels like invisible plumbing until it fills its disk, leaks credentials, or drifts out of date — it's a real server with state, identity, and a lifecycle, so treat it like one.
- Infrastructure
- CI/CD
- Operations
- Security
Field Notes
Field Note: Automation Needs a Dry-Run Mode
A field note on the cheapest safety feature in automation — a preview that shows exactly what would happen before anything does. Build the dry run first, and the blast radius stays understandable.
- Field Notes
- Automation
- Operations
- Reliability
Field Notes
Field Note: Small Slices Beat Big Rewrites
A field note on resisting the full rewrite — especially now that AI makes throwing out working code feel cheap. Replace systems in small, reversible slices instead of one heroic leap.
- Field Notes
- AI
- Refactoring
- Process
Automation
Automation Needs a Panic Button
Automation is only trustworthy when a human can understand it, audit it, pause it, and recover from it. Dry runs, logs, rollback, and a manual override are the price of trust.
- Automation
- Operations
- Reliability
- AI
- Least Privilege
Engineering Practice
Documentation Is Infrastructure
Docs aren't a side quest. For secure systems, automation, homelabs, and AI-built projects, documentation is part of the system itself.
- Documentation
- Security
- Automation
- Operations
- Systems
Systems Thinking
Gaming Taught Me Systems Thinking
Competitive team games teach communication, role clarity, feedback loops, and incident review — and those lessons map onto engineering better than they have any right to.
- Gaming
- Systems
- Teams
- Communication
- Community
Infrastructure
Homelabs Teach the Messy Parts
A homelab hands you the operational failures that polished cloud dashboards quietly handle for you — and those messy parts are exactly the lessons worth having.
- Homelab
- Infrastructure
- Operations
- Networking
- DNS
Security
Security Is Architecture, Not Decoration
Security works best when it's built into boundaries, deploy paths, and secrets handling from the start — not painted on at the end.
- Security
- Architecture
- Least Privilege
- Secrets
- Operations
AI Development
Small Slices Beat Big Bang AI
AI agents earn their keep when work is cut into reviewable, testable slices. Big-bang prompts make chaos; small slices make momentum.
- AI
- Workflow
- Pull Requests
- Validation
- Process
Automation
A Query That Matches Nothing Is a Silent Bug
An automation run reported success and changed nothing — its XPath matched zero nodes because the file had a default namespace it didn't account for. Selectors that find nothing don't error, they no-op, so any automation that edits via a selector has to assert it actually found the target.
- Automation
- Debugging
- XML
- Reliability
Engineering Practice
Tag Your Telemetry Where It's Born
Once log events from different environments pool together downstream, you can't reliably tell them apart again — so stamp the origin onto each event at the source, where the fact is still known for certain.
- Engineering Practice
- Observability
- Logging
- Operations
Infrastructure
Replication Across Failure Domains, Not Just Machines
Three copies of your data sitting in one room is one fire away from zero copies — what protects you isn't the replica count, it's whether the copies live in things that can fail independently.
- Infrastructure
- Reliability
- Backups
- Architecture
Infrastructure
Stop Poking Holes in Your Firewall to Reach Your Own Stuff
Reaching a self-hosted service from anywhere usually doesn't require an inbound port at all — an outbound tunnel plus authentication is a smaller attack surface than a port-forward, and a name that resolves still isn't a service you can reach.
- Infrastructure
- Networking
- Security
- Homelab
Engineering Practice
Build Automation That Doesn't Assume One Distro
Porting an automation suite from one Linux family to another is mostly mechanical until it isn't — the real work is the paths, package names, and service quirks you hardcoded without noticing, plus a few sharp edges that only appear on the new OS.
- Engineering Practice
- Ansible
- Linux
- Automation
Engineering Practice
Migrating Live Data Without Touching the Callers
When external callers depend on a function by name and you can't change their queries, you migrate by changing what the function does behind a stable signature — in reversible steps, with one explicit point of no return.
- Engineering Practice
- Databases
- Migration
- Reliability
Build Notes
Same Input, Different Result? Look Outside Your Code
A CI job that fails before it runs a line of your code, then passes on the very next run with nothing changed, is telling you the problem lives in the environment — and a retry is a mitigation, not a fix.
- Build Notes
- CI/CD
- Operations
- Debugging
Security
Certificates Should Renew Themselves
A TLS certificate you renew by hand is an outage with a calendar invite — the durable fix is to treat certificates as a declarative, auto-renewing lifecycle with the issuing identity kept in a secret store, not a runbook.
- Security
- PKI
- Automation
- TLS
Security
Environment Variables Are Not a Secrets Manager
Moving a password out of a config file into an environment variable is a real improvement over committing plaintext, but a process environment is readable by anyone with host access — it's a step up, not a vault.
- Security
- Secrets
- Configuration
- Operations
Automation
When the Failure Happens Before Your Code Runs
A CI job that dies before it executes a line of your code isn't your bug — it's infrastructure, and the fix is telling transient failures apart from real ones instead of chasing your own tail.
- Automation
- CI/CD
- Reliability
- Docker
Engineering Practice
Teach Your Scripts Where They're Running
If a script needs to behave differently depending on the network it's on, it should detect its environment from ranked, reliable signals — not from the most convenient value, which is usually the one that lies.
- Engineering Practice
- Networking
- Automation
- Scripting
Systems Thinking
Config Management Is Not a Scheduler
Provisioning, scheduling, and orchestration are three different jobs, and the moment you make your configuration tool moonlight as a cron you've invented a single point of failure for work that was supposed to be reliable.
- Systems Thinking
- Automation
- Operations
- Reliability
Engineering Practice
Write the Risk Brief for the Person Who Signs the Check
A technical risk only gets acted on when someone with budget understands it, so the brief that lands isn't the one with the most findings — it's the one that leads with the ask, frames impact in business terms, lays out options with costs, and makes "do nothing" an explicit signed decision.
- Engineering Practice
- Communication
- Security
- Risk
Automation
Automatic Updates That Can't Reboot Aren't Done
Default unattended upgrades on a Linux server quietly do less than people assume — they skip non-security updates, never reboot, and can deadlock for months on a prompt no one is there to answer.
- Automation
- Linux
- Operations
- Reliability
Systems Thinking
Findings Without Owners Are Just Complaints
A list of everything wrong with a system is noise — a useful assessment ranks each problem by real risk, pairs it with a one-line fix and a named owner, and sequences the work so the safest changes go first.
- Systems Thinking
- Engineering Practice
- Operations
- Reliability
Engineering Practice
Test Fixtures Deserve a Coverage Map
A folder full of saved test inputs isn't a test suite until you know which behavior each one exercises and, more importantly, which behaviors nothing covers at all.
- Engineering Practice
- Testing
- Quality
- Documentation
Infrastructure
Backups Are a Restore Problem
The size of your data is the least interesting thing about backing it up — what matters is whether you can restore, whether you'd notice a silently failing backup, and whether the thing you backed up was even consistent.
- Infrastructure
- Backups
- Reliability
- Operations
Systems Thinking
A Null Result Is Still a Result
I ran a benchmark to prove a tuning change made things faster, and it came back inconclusive — twice. The honest finding wasn't a speedup; it was that the test couldn't show one, because the workload never exercised the thing I'd tuned. Reporting that is more valuable than torturing the data into a story.
- Systems Thinking
- Performance
- Benchmarking
- Honesty
Infrastructure
A Golden Image Has to Forget Who It Was
Cloning a configured machine into a reusable template only works if you first wipe its identity — machine-id, SSH host keys, hostname, and NIC pinning — because anything you leave behind gets duplicated onto every clone.
- Infrastructure
- Linux
- Provisioning
- Homelab
Automation
"It Ran" Is Not "It Worked"
Some runtimes have no compiler to catch your mistakes, so you have to define what "it worked" means yourself — and a single signal like "it didn't crash" is the easiest one to fool yourself with.
- Engineering Practice
- Testing
- Automation
- Reliability
Systems Thinking
A Knowledge System That Compounds Instead of Accumulates
Most note-taking optimizes for capture, which is the easy half — the value comes from a system built around using notes, organized by what they're for and connected so the right one resurfaces when you can act on it.
- Systems Thinking
- Knowledge Management
- Workflow
- Writing
AI Development
An AI Agent Needs Context, Not Just a Connection
Wiring an AI agent to a real, heavily customized system is the easy part; making it correct means encoding the system's domain model as durable context, caching what's stable, and only going live for what actually changes.
- AI Development
- AI Agents
- MCP
- Context Engineering
Engineering Practice
Stop Extrapolating, Build the Rehearsal
When the runtime of a big operation is the thing you're most unsure about, an estimate scaled up from a tiny sample is a guess wearing a number's clothes — build a production-scale rehearsal instead, and watch the real bottleneck show up somewhere you didn't expect.
- Engineering Practice
- Performance
- Testing
- Databases
Automation
A Runbook Is Only Valid in the Environment It Was Written For
Inherited procedures carry invisible assumptions about the world they were written in — internet access, a reachable mirror, a particular OS — and those assumptions break silently the moment you run the runbook somewhere different.
- Automation
- Operations
- Documentation
- Runbooks
Systems Thinking
Trust the System of Record, Not the Ticket
A ticket can be confidently, specifically wrong — so before you act on the IPs, hostnames, or targets it hands you, verify them against the systems that actually know the truth, and triangulate across more than one.
- Systems Thinking
- Operations
- Debugging
- Infrastructure
Security
Where Encryption Belongs in Your Stack
Disk encryption and field-level encryption both say "the data is encrypted," but they defend against completely different attackers — so the right question isn't whether to encrypt, it's which layer, against whom.
- Security
- Encryption
- Threat Modeling
- Databases
Build Notes
Keeping Plaintext Out of Git History With Clean/Smudge Filters
Git can transparently encrypt files on commit and decrypt them on checkout using clean/smudge filters — a genuinely useful trick for shared lab secrets, with one security trade-off you have to understand before you trust it.
- Build Notes
- Git
- Security
- Tooling
Security
Build Ops Tools That Are Safe by Construction
When a tool can change production, safety shouldn't live in the operator's caution — it should live in the shape of the tool: read-only by default, the dangerous verbs quarantined and clearly named, and confirmation that scales with the blast radius.
- Security
- Operations
- Tooling
- Automation
AI Development
Give Your Agent Durable Context, Not a Fresh Briefing Every Time
An AI agent that re-learns your conventions every session is wasting most of its usefulness — the leverage comes from durable, versioned context files, scoped tools, and explicit rules for what's cached versus always-live.
- AI Development
- AI Agents
- Workflow
- Automation
Automation
Pay the Expensive Setup Cost Once
A tool that rebuilt the same slow SSH tunnel on every run taught me to make expensive setup persistent and shared — reused across calls, health-checked before it's trusted, and rebuilt only when it's actually dead.
- Automation
- SSH
- Performance
- Tooling
AI Development
Compile Your Notes, Don't Re-Read Them
Most AI knowledge setups re-derive the same understanding from raw sources on every question — a better pattern has the model distill sources into an interlinked wiki once, so knowledge compounds instead of being rediscovered each time.
- AI Development
- Knowledge Management
- AI Agents
- Workflow
Systems Thinking
Measure the Metric That Drives the Load, Not the One That's Easy to Count
Capacity and cost decisions go wrong when you size against the obvious number — host count, row count, user count — instead of the metric that actually drives the work. Find the real driver first.
- Systems Thinking
- Performance
- Capacity Planning
- Infrastructure
Security
Vulnerability Management Is a Triage Problem, Not a Scanning Problem
Running the scanner is the easy part — the actual work is turning a weekly firehose of findings into a short list of what's genuinely exploitable, while not re-investigating the same accepted risks every single week.
- Security
- Vulnerability Management
- Operations
- Patching
Security
Govern AI Tools Like Production Access, Because That's What They Are
An AI agent wired into a production system is privileged production access wearing a friendlier interface — so it deserves the same controls you'd demand of any other account that can read and change real data.
- Security
- AI Agents
- Governance
- Operations
Engineering Practice
Write the Ticket You'd Want to Receive
An issue tracker is a communication tool pretending to be a database — the conventions that make it work are less about fields and more about writing each ticket for the person who picks it up cold.
- Engineering Practice
- Collaboration
- Workflow
- Documentation
Engineering Practice
Migrating a Wiki Is a Translation Problem, Not a Copy-Paste
Moving docs from one system to another looks like export-then-import, but the value is in translating structure — turning the source's macros, links, and panels into the target's idioms instead of dumping raw markup.
- Engineering Practice
- Documentation
- Migration
- Knowledge Management
Security
When You Can't Patch It, the Network Is Your Only Control
Some systems can't be patched — the vendor is gone, the runtime is a decade old, the fixes were never written. For those, security stops being about updates and becomes entirely about what can reach them, so segmentation is the control that does the real work.
- Security
- Networking
- Legacy
- Architecture
Systems Thinking
Decision Records Capture the Why, Including the Ones You Reverse
Code shows what you decided; it never shows why, or what you rejected, or that you once chose the opposite for good reasons — a lightweight decision log captures the reasoning so later-you isn't guessing.
- Systems Thinking
- Documentation
- Architecture
- Engineering Practice
AI Development
Put a Gateway in Front of Your LLMs
Routing every team's LLM calls through one internal gateway instead of letting each app hit providers directly buys central auth, cost control, model flexibility, and an audit trail — at the price of a chokepoint you have to operate well.
- AI Development
- LLM
- Architecture
- Operations
Security
When There's No Patch, It's a Decision, Not a Ticket
A compliance scan kept flagging vulnerabilities that had no fix — the patches were never written for that end-of-life OS. Re-scanning forever doesn't help. When no patch exists, the honest move isn't another ticket, it's a decision — accept the risk in writing, or replace the platform.
- Security
- Vulnerability Management
- Risk
- Operations
Infrastructure
When a Scan Can't See the Whole Network
A ping or ARP sweep tells you which hosts are awake right now, not which addresses are actually owned — so a scan is a useful sanity check, never your source of truth for IP allocation.
- Infrastructure
- Networking
- Homelab
- IPAM
Engineering Practice
Writing the Runbook Is the Test
Writing a step-by-step install or operations guide is the fastest way to find the holes in your own understanding — if you can't write the rollback step, you don't actually have a rollback, you have a hope.
- Engineering Practice
- Documentation
- Operations
- Runbooks
Systems Thinking
Containers Rehearse the Logic, VMs Rehearse the Risk
When I needed to rehearse a risky database migration, containers were the fast, cheap way to debug the logic — but the parts that could actually hurt in production only exist on a real VM, so the honest answer was to use both.
- Systems Thinking
- Containers
- Docker
- Infrastructure
Engineering Practice
A Skeleton That Compiles Is Not a Feature
Scaffolding that maps cleanly to the spec, wires up the entrypoints, and passes its checks feels like progress — but a structure full of debug placeholders is the easy 20%, and mistaking it for done is how projects look finished while doing nothing.
- Engineering Practice
- Project Management
- Automation
- Honesty
Automation
Install Is One Verb; the Lifecycle Has Six
A deployment tool that only knows how to install leaves operators improvising the dangerous parts — upgrade, rollback, the dry run they wish they'd done. Real operational tooling ships a first-class entrypoint for every verb in a system's life, not just the first one.
- Automation
- Operations
- Deployment
- Ansible
Infrastructure
A Ping Is Not a Health Check
A load balancer happily reported a service healthy because it could ping the host — while the actual service was broken. A health check that doesn't exercise the real thing is theater, and the most common reason teams fall back to ping is that the real check rotted.
- Infrastructure
- Monitoring
- Reliability
- Load Balancing
Automation
Automate the Gathering, Not the Judgment
The right split for a recurring analysis task is to automate the tedious, error-prone gathering and formatting completely — and deliberately leave the judgment call to a human, because that's the part that needs context the machine doesn't have.
- Automation
- Operations
- Workflows
- Security
Infrastructure
A Reverse Proxy Can't Fix an App That Assumes It Owns the Root
Serving several apps under one hostname with path prefixes is clean in theory, but a static site or SPA that bakes its base URL in at build time can't be reliably proxied under a subpath — so give each one its own origin instead.
- Infrastructure
- Networking
- Nginx
- Web
AI Development
You Probably Don't Need a Vector Database
The reflex for "let an AI answer questions over my stuff" is to reach for embeddings, a vector database, and a RAG pipeline. For a lot of cases there's a simpler architecture — have the model compile your sources into a structured, interlinked set of notes once, and let it read those.
- AI Development
- Knowledge Management
- RAG
- Architecture
Security
Encrypted-in-Git Secrets: a Useful Hack With a Sharp Edge
Git's clean/smudge filters can keep plaintext in your working tree and ciphertext in history — a genuinely useful trick for shared lab credentials, as long as you're honest that it is not a substitute for a real secrets manager.
- Security
- Git
- Secrets
- Encryption
Infrastructure
Relabeling a Stateful System's Topology Is a Migration, Not an Edit
Changing where data lives in a distributed database — its datacenter labels, replication, sharding — looks like a few config tweaks but is really a streaming data migration with restarts, and the quick four-step version always glosses over the dangerous middle.
- Infrastructure
- Databases
- Distributed Systems
- Migrations
Systems Thinking
A Manual Fallback Has a Volume Ceiling
"Reconcile it by hand" is a perfectly good safety net at low volume and a quiet liability at scale — manual fallbacks have a throughput ceiling, so know the number before you lean on one.
- Systems Thinking
- Reliability
- Operations
- Scaling
Systems Thinking
Keep a Map of Your Environments
You can't reason about a fleet you can't enumerate — and environments drift into legend unless someone maintains a living map of what exists, where it is, and which thing talks to which.
- Systems Thinking
- Infrastructure
- Operations
- Documentation
Engineering Practice
Your Hot Path Is Rebuilding the Same Thing Every Call
A decryption function on a database read path was secretly rebuilding all its lookup tables from scratch on every call — hoisting that one-time setup out of the hot path made it roughly fifteen times faster without touching the algorithm.
- Engineering Practice
- Performance
- Optimization
- Cryptography
AI Development
Build Your Agent's Tools Like Real Services
The tool servers you expose to an AI agent are production services, not scripts — they need real auth, 12-factor config, typed errors, rate limiting, and observability. The "AI" part is the least interesting thing about them, and treating them like toys is how they fail in production.
- AI Development
- MCP
- Architecture
- APIs
Engineering Practice
When the Error Blames the Wrong Thing
An installer died with a missing-class error that sent me hunting for a corrupt download. The real cause was a full disk — the error was a symptom three steps downstream. The fastest debugging move is often to suspect the boring resource before the scary stack trace.
- Engineering Practice
- Debugging
- Operations
- Troubleshooting
Security
Encryption Is a Threat-Model Question, Not a Checkbox
Disk, column, and application-level encryption defend against completely different attackers — so "is the data encrypted?" is the wrong question, and "encrypted against whom?" is the one that actually picks the layer.
- Security
- Encryption
- Architecture
- Databases
Engineering Practice
Build a Local Test Harness First
When a system fails late and softly and the real platform is slow to spin up, the highest-leverage thing you can build isn't the feature — it's a local harness that boots the real engine, fires a real request, and reads the real result in seconds.
- Engineering Practice
- Testing
- Docker
- Developer Experience
Infrastructure
Failover Hides the First Failure
Automatic failover is supposed to save you, and it does — but it also silently absorbs the first outage, so you cruise along on your last healthy path with no idea you're one failure from down. Monitor each path, not just the service.
- Infrastructure
- Reliability
- Monitoring
- Resilience
Engineering Practice
Make the Error Visible Where People Are Looking
A CI job failed with a useless generic message while the real diagnostics sat in a log file the harness never read. The fix wasn't better error messages — it was teaching the tool to surface the cause where people actually look. A reporter is only as good as the log it reads.
- Engineering Practice
- CI/CD
- Observability
- Tooling
Engineering Practice
Files Are Not Modules
Splitting a monolith into separate files feels like modularity, but if they still share global state and ship as one deployable, you've just rearranged a monolith — real modularity is isolation of state, build, test, release, and ownership.
- Engineering Practice
- Architecture
- Modularity
- Refactoring
Engineering Practice
Artifacts Should Know Where They Came From
A build artifact with no provenance is a liability waiting to happen — stamp every published package with the commit, branch, build time, and pipeline that produced it, so any artifact can be traced back to the exact source that made it.
- Engineering Practice
- CI/CD
- Build Notes
- Supply Chain
Automation
Environments Should Differ by Config, Not Code
The moment dev, staging, and prod start diverging in your code instead of your data, you've lost the guarantee that what you tested is what you shipped — keep one code path and let declared configuration carry every environment difference.
- Automation
- Infrastructure
- Configuration
- Ansible
Systems Thinking
If Publishing Is Hard, Your Docs Go Stale
Documentation rots in direct proportion to how much friction sits between writing it and publishing it. The fix isn't discipline, it's plumbing — make publishing an automatic side effect of saving your work, and the docs stay current because staying current stopped being a chore.
- Systems Thinking
- Documentation
- Automation
- CI/CD
Infrastructure
Eventual Consistency Has Homework
A replicated database had every replica quietly drifting apart because the anti-entropy repair that reconciles them had never once run. "Eventually consistent" isn't a property you get for free — it's a promise that depends on maintenance somebody has to schedule.
- Infrastructure
- Databases
- Distributed Systems
- Reliability
Engineering Practice
Monorepo or Polyrepo? Pick Both
The monorepo-versus-polyrepo debate is a false binary — a thin git superproject that pins a set of independent repos as submodules gives you independent lifecycles and one coherent checkout at the same time.
- Engineering Practice
- Git
- Architecture
- Source Control
Engineering Practice
Reuse the Platform's Primitives Instead of Reinventing Them
Before you build a backup, sync, or dedup system, check what the platform already gives you — the best tools wrap native primitives like atomic snapshots and immutable files rather than competing with them, and that's usually the right layer to build at too.
- Engineering Practice
- Architecture
- Backups
- Design
Engineering Practice
Log Why It Failed, Not Just That It Failed
A legacy system I dug into got one thing exactly right: every failure logged a structured reason (a severity, a category, and a specific cause) plus a tag pointing at the exact code that emitted it. That turns a log from a wall of "denied" into something you can build dashboards and triage on.
- Engineering Practice
- Logging
- Observability
- Debugging
Automation
First-Boot Automation Races the Network
The hardest part of "configure itself on first boot" isn't the configuration — it's that the machine wakes up faster than its network does, so the automation has to wait for the network to be truly ready and then prove it before doing anything.
- Automation
- Networking
- Provisioning
- Reliability
Security
The Change That Locks You Out Flips a Default From Allow to Deny
The scariest config changes aren't the ones that obviously break something — they're the ones that silently convert an implicit allow-all into an explicit allow-list, so the thing you added quietly excludes everything you didn't.
- Security
- Linux
- Operations
- SSH
Security
A TLS Handshake Failure Is a Trust Problem, Not a Network One
A mutual-TLS connection kept failing with a certificate alert, and the instinct was to blame firewalls and connectivity. The real issue was a trust mismatch — the server was validating the client's cert against the wrong CA. Read which side rejected which certificate, and the cause is obvious.
- Security
- TLS
- Debugging
- PKI
Infrastructure
Find Out Who's Using It Before You Touch It
A database cluster everyone called a non-prod test rig turned out to have a live external client writing to it around the clock. The label was wrong, and only the connection data told the truth. Before maintenance on anything shared, verify who's actually connected — don't trust the name on the box.
- Infrastructure
- Databases
- Operations
- Investigation
Security
Patch the Class, Not the CVE
A weekly vulnerability report looks like hundreds of unique problems, but it's really a handful of recurring classes — and triaging by class, leading with the genuinely remote-exploitable, beats playing whack-a-mole with individual CVE numbers.
- Security
- Vulnerability Management
- Operations
- Systems Thinking
Engineering Practice
When You Can't Unit-Test, Make Two Signals Agree
Some systems give you no unit test and no static checker — only a running instance and its logs. The way to trust a result there is to require two independent signals to agree, because either one alone can fool you.
- Engineering Practice
- Testing
- Reliability
- Debugging
Engineering Practice
Inherited Defaults Are Decisions Nobody Made
A database I reviewed was running an old garbage collector copied forward from a previous major version, hugepages left at the distro default, and a memory-map limit far below what it needed. None of it was chosen — it was inherited. Stale defaults masquerade as decisions, and auditing them is its own kind of work.
- Engineering Practice
- Configuration
- Performance
- Operations
Engineering Practice
You Can't Rewrite What You Haven't Mapped
Before a 25,000-line undocumented rules engine could be rewritten into something modular, someone had to reconstruct what it actually did — label by label, exit code by exit code. The undocumented behavior of the old system is the real specification, and skipping the mapping step is how rewrites quietly change behavior.
- Engineering Practice
- Legacy
- Refactoring
- Architecture
Security
Access Control Is a Pipeline, Not a Switch
Granting someone access feels like flipping a switch, but in most real environments it's a multi-stage pipeline — create, own, assign, link, sync — and the change usually isn't live until some later event actually re-reads it.
- Security
- Access Control
- Operations
- Identity
Security
A Scanner's False Positives Are a Pattern
A vulnerability scanner's false positives aren't random noise — they cluster into a few predictable patterns rooted in how the scanner detects, and cataloging those patterns is what stops you from re-investigating the same non-issue every week.
- Security
- Vulnerability Management
- Operations
- Engineering Practice
Systems Thinking
Give Each Organizing Tool One Job
Folders, tags, and links each do something different, and the mess starts when you make two of them encode the same thing — assign each one a distinct job and lean on links as you scale, instead of building ever-deeper folder trees.
- Systems Thinking
- Knowledge Management
- Organization
- Note-Taking
Engineering Practice
A Note Isn't an Asset Until You Use It
Capturing information feels productive and mostly isn't — a note only earns its place when it feeds a decision, a piece of writing, a conversation, or an action. Optimize your knowledge system for output, not accumulation.
- Engineering Practice
- Knowledge Management
- Writing
- Productivity
Engineering Practice
Error Codes Are an API Contract
The numeric status codes one system returns to another are a real API contract — and when their meaning lives only in the receiver's lookup table, you've got a contract that one party can break without anyone noticing.
- Engineering Practice
- APIs
- Integration
- Architecture
Security
A Shared Service Account Is a Single Point of Contention
When several consumers integrate through one shared service account — especially a legacy one that allows a single session and locks out after a few bad logins — they don't just share access, they compete for it, and one fat-fingered password takes everyone down.
- Security
- Integration
- Authentication
- Architecture
Security
Getting a Certificate Is a Supply Chain, Not a Download
Requesting a TLS certificate in an enterprise isn't a download — it's a multi-party process with approvals, metadata that has to be exactly right, formats that differ per consumer, and a handoff to whoever installs it. That friction is precisely why certificates lapse, and why automating it matters.
- Security
- PKI
- Certificates
- Operations
Infrastructure
Your Backup Job Needs a Smoke Alarm
A backup that fails silently is worse than no backup, because it sells you false confidence. The job has to verify that it actually worked and shout when it didn't, with the checks living in whatever actually runs the backup.
- Infrastructure
- Backups
- Reliability
- Operations
Systems Thinking
Organize by What You'll Do With It, Not What It's About
Filing notes and files by topic feels natural and quietly fails, because the same subject spans active work and dead archives — organizing by actionability instead keeps what's live in front of you and what's done out of the way.
- Systems Thinking
- Knowledge Management
- Productivity
- Organization
Automation
Teach Your Scripts Which Network They're On
If a tool needs to behave differently depending on the network it's running from, don't rely on a human to remember which one that is — have it detect its environment from ranked, reliable signals, and know which signals lie.
- Automation
- Networking
- Scripting
- Reliability
Engineering Practice
Profile Before You Tune
Tuning without a baseline isn't optimization, it's superstition — the only way to know a change helped is a controlled experiment that holds everything constant except the one variable you're testing.
- Engineering Practice
- Performance
- Benchmarking
- Systems Thinking
Engineering Practice
A Config Language Is Still Code
The domain-specific languages we use for policy, config, and rules usually have no static checker and fail late and softly at runtime — which is exactly why they need more engineering discipline than "real" code, not less.
- Engineering Practice
- Languages
- Reliability
- Code Review
Security
IPv6 Has No NAT to Hide Behind
A lot of "security" on IPv4 networks is really just an accident of NAT — and the moment you give a host a routable IPv6 address, that accidental privacy is gone and your firewall intent has to be explicit.
- Security
- Networking
- IPv6
- Architecture
Systems Thinking
The Schema Is the Real Documentation
When you inherit or integrate with an unfamiliar system, the data model tells you the truth the wiki won't — what it actually stores, what's really sensitive, and where it's secretly mid-migration.
- Systems Thinking
- Databases
- Architecture
- Engineering Practice
Engineering Practice
When in Doubt, Suspect the Cache
A huge share of "but it works on my machine" and "I already changed that" confusion is a stale cache lying to you — so when reality and your expectation disagree, clear the cache and check the authoritative source before you debug anything else.
- Engineering Practice
- Debugging
- DNS
- Networking
Engineering Practice
Git Submodules Pin, They Don't Sync
Most submodule pain comes from one wrong mental model — treating a submodule like a live link to another repo's branch, when it's really a pin to one exact commit that only moves when you move it.
- Engineering Practice
- Git
- Source Control
- Tooling
Infrastructure
Your Monitoring System Is Production Too
When you size a monitoring server, host count is the wrong number to plan around — the real driver is how many values per second you collect, and the system that watches everything else is itself a write-heavy production database you have to capacity-plan.
- Infrastructure
- Monitoring
- Capacity Planning
- Operations
Systems Thinking
A Silent Fallback Is Worse Than a Crash
When a system hits something it can't handle, falling back to a different working-looking state is more dangerous than failing loudly — because the only signal that anything went wrong is the one it just swallowed.
- Systems Thinking
- Reliability
- Networking
- Debugging
Infrastructure
Notes From a Private Cloud Gremlin
Private cloud, homelab, containers, and the unreasonable joy of making infrastructure actually yours.
- Infrastructure
- Docker
- PKI
- Homelab
- Security