The Vibe Coding Automated Programming Revolution
Software engineering is fundamentally changing. AI coding agents like Claude Code have put within reach the vision of simply telling a computer what to do in natural language and having it work out of the box - the kind of natural-language computer interface that Star Trek: The Next Generation and other science fiction promised decades ago. This approach emphasizes directness and velocity over precision. You describe what you want, the AI generates something close, you iterate until it feels right.
However, as we embrace more stochastic, probabilistic code generation, the need for deterministic, pedantic verification systems has become more critical, not less. The definition of good ergonomics in software tools has fundamentally changed. "Annoying and verbose" type systems that once felt overly burdensome are now precisely what makes automated programming viable for production systems.
The Old Ergonomics: Flexibility vs. Safety
Python/JS: Maximal Expressiveness
For decades, languages like Python, Ruby, and JavaScript defined "ergonomic" programming. Their design decisions clearly prioritized:
- Rapid prototyping: Write code without upfront type declarations or boilerplate
- Low friction: No compiler to satisfy, just run your code and see what happens
- Expressiveness: Duck typing lets you write generic code without ceremony
- Approachability: New developers could be productive quickly
For many programmers, the goal is "the shortest path to getting something running," and Python and JavaScript are great choices for that because of their flexibility.
This made perfect sense when humans were the primary authors. We hold the full context of what we just wrote, and that knowledge grows over time, so we can make informed decisions about what to add or remove. We intuitively understand types from variable names and usage patterns. We catch obvious mistakes as we go. The cognitive overhead of static types felt like friction against our natural workflow. Python became the language of ML precisely because of these properties. When you're iterating on model architectures or experimenting with data transformations, you want to think about the problem, not satisfy a type checker.
The Deal We Made with the Devil
But this flexibility came with costs that we grudgingly accepted:
- Runtime surprises: `AttributeError`, `TypeError`, `KeyError` appearing in production
- Refactoring anxiety: Changing a function signature requires manually hunting down all call sites
- Integration fragility: Connecting components means carefully reading documentation and hoping you used the library and the language correctly
- Testing burden: You need comprehensive test coverage just to catch the kinds of mistakes a compiler would have flagged
We built elaborate defensive strategies: extensive unit tests, runtime assertions, careful code reviews, gradual typing systems like TypeScript and mypy. But fundamentally, we relied on human vigilance to catch what the language wouldn't.
This was manageable when humans wrote all the code. We could be careful. We understood our codebases. We had context.
Rust: Maximal Safety
Consider the common complaints about Rust today. Developers constantly cite "fighting the borrow checker" as a major pain point. Lifetime annotations are confusing. The compiler is pedantic about ownership and mutability in ways that feel unnecessarily restrictive. You write what seems like perfectly reasonable code and the compiler rejects it with cryptic error messages about borrowed values not living long enough. Simple tasks that take five minutes in Python require 30 minutes of appeasing the type system. The learning curve is brutal.
All of this is true when a human is writing Rust character by character. The cognitive overhead of tracking ownership, satisfying lifetime requirements, and structuring code to make the borrow checker happy is immense. You're trying to solve a problem while simultaneously managing low-level memory safety invariants.
But with AI-assisted development, this dynamic inverts completely. The LLM doesn't experience cognitive overhead from the borrow checker. It will happily try 20 different lifetime annotations until one works (provided you're willing to pay for the tokens). It doesn't get frustrated when the compiler complains about moving a value that's still borrowed. It just generates a new version. The "fighting the borrow checker" experience - which exhausts humans - is exactly what LLMs are built for: rapid iteration against deterministic feedback.
```rust
// You: "create a user service that fetches users and validates email"
// AI generates, borrow checker complains, AI regenerates, repeat 15 times...
// You see only the final result:

fn get_user(id: UserId) -> Result<User, Error> {
    // ... implementation
}

fn validate_email(user: User) -> bool {
    // ... implementation
}

fn process_user(id: UserId) -> Result<(), Error> {
    let user = get_user(id)?;

    // AI accidentally tries:
    //   validate_email(user.email)  // compile error: expected `User`, found `String`
    // AI immediately fixes:
    validate_email(user); // ✓ compiles

    Ok(())
}
```
The compiler catches that the agent initially mixed up user.email and user. No runtime error in production, no unit test needed for this specific case. The type system caught it mechanically, the AI fixed it automatically, and you never saw the 15 iterations it took to satisfy the borrow checker.
The complaints about Rust's difficulty are really complaints about the human experience of satisfying its strict requirements. But those strict requirements are precisely what make it ideal for AI-generated code. The pain points for humans become verification points for AI.
Declarative Infrastructure: The Nix Example
The same pattern extends beyond code to systems management. Declarative, reproducible infrastructure tools like Nix represent another case where what seemed like excessive rigor for human operators becomes ideal for AI agents.
Traditional imperative systems administration - running sequences of apt install, systemctl enable, editing config files, hoping everything works - is error-prone and fragile. It works when experienced sysadmins carefully maintain mental models of system state and manually verify each step. But it's a nightmare to automate reliably because there's no verification that the system ended up in the desired state. Tools like Ansible improved on raw shell scripts by adding idempotency and a declarative veneer, but they still operate imperatively under the hood - checking current state, computing diffs, applying changes. When an Ansible playbook fails halfway through, you're left with partial state and have to reason about what succeeded and what didn't.
Nix flips this model. You declare what you want - Python 3.11, PostgreSQL 15, these exact library versions, these configuration files - and Nix handles the how. More importantly, Nix provides strong guarantees: builds are reproducible, rollbacks are atomic, and different configurations can coexist without conflict. The system either successfully reaches the declared state or fails clearly, with no partial-success states. However, many decry Nix's obtuse syntax and steep learning curve; plenty of developers take one look at Nix and decide apt install is good enough. They want to think imperatively, not wrestle with recursive attribute sets and mkDerivation functions.
Consider asking an AI to write a bash script that installs dependencies, configures services, and sets up an environment. The AI has to reason about: What packages are already installed? What version? Will this conflict with something else? Did the previous step succeed? Is the service already running? Every system is unique. The AI generates plausible commands that might work, but verification requires actually running them and seeing what breaks.
With Nix, the AI writes a declarative configuration. The Nix evaluator immediately catches errors. When you build it, you get exactly what was declared, reproducibly, every time. The AI handles the ceremony of Nix syntax and package resolution while the human specifies intent. The declarative substrate provides verification that what gets built matches what was intended.
With AI agents, you can have it both ways. You express your intent imperatively and naturally: "Install PostgreSQL 15, create a database called 'myapp', set up connection pooling with 20 max connections, and configure it to start on boot." The AI translates this into proper Nix configuration handling the syntax, looking up the right package attributes, structuring the configuration options correctly, managing the boilerplate. You think imperatively, the AI writes declaratively, and Nix verifies the result.
This inverts the traditional adoption barrier. Instead of having to learn Nix before you can use it, you can use Nix from day one by expressing what you want in plain language. The weird syntax and verbose boilerplate become the AI's problem, not yours. You get the benefits of declarative, reproducible infrastructure without having to master the tool itself.
The same applies to all the declarative tools that have better properties but worse ergonomics: Terraform's HCL, Kubernetes manifests, etc. They're all better than imperative alternatives for reproducibility and verification, but they're harder to learn and more verbose to write. AI bridges this gap: you work imperatively, the AI generates declaratively, and the declarative substrate provides verification.
Why This Matters Now: Enter Stochastic Code Generation
Now instead of writing code character by character, we describe what we want and receive entire functions, classes, or modules. This is incredible for productivity, but it introduces new failure modes:
- Plausible but wrong: AI generates code that looks right but has subtle bugs
- Inconsistent assumptions: Different sections of AI-generated code may make incompatible assumptions about data types
- Naming mismatches: Field names, parameter names, method names might be close but not quite right
- Silent type mismatches: Passing a `string` where an `int` was expected, or vice versa
- Lost context: AI doesn't always maintain type consistency across the codebase
When a human writes user.email, they remember that user came from a database query two lines ago and has specific fields. When AI generates it, it's making a probabilistic guess based on prior context for that particular session. That guess is often right, but not always.
The same flexibility that made Python ergonomic for human-written code becomes treacherous for AI-generated code. The AI can't "just know" the types the way a human author might, just as a fresh new grad thrown into a codebase needs time to understand its subtle behavior and the nuances of the language the codebase exploits (or abuses). Agents without that long-term, persistent experience do the best they can: they sample from a distribution of valid completions.
TypeScript's Evolution
TypeScript's adoption trajectory shows us a preview of this future. When it launched in 2012, many JavaScript developers saw it as "unnecessary" ceremony. JavaScript was already working fine; why add all this type-annotation boilerplate? The tooling was immature. The type definitions for popular libraries were incomplete or wrong. Developers complained about having to write interface declarations for every data structure and explicitly type every function parameter.
The turning point came as codebases scaled. When teams grew beyond five people, when code lived longer than six months, when people joined and left organizations, the lack of static checking and other mechanistic verification became a liability. You couldn't rely on humans having perfect context about all the nuances of the codebase. A new hire looks at a function that returns data, has no idea what shape that data has, and makes an incorrect assumption that ships a catastrophic bug into production.
TypeScript solved this by making the implicit explicit. Types serve as documentation that the compiler enforces. When someone changes a function signature, every call site gets flagged. When you hover over a variable in your editor, you see exactly what it is. The "unnecessary ceremony" turned out to be essential scaffolding for teams and time scales.
Now with AI coding assistants, TypeScript's value proposition has intensified. LLMs generate more accurate code when they have type information to guide them, using types as contracts to ensure correctness. The types that made human collaboration scalable now make human-AI collaboration reliable.
Why Runtime Errors Are Worse for AI-Generated Code
Running code to discover errors is orders of magnitude slower than compile-time checking. Compile errors come back in milliseconds; runtime errors require executing code, potentially setting up test environments, databases, or external dependencies. An LLM that generates code, runs it, and hits a TypeError on line 47 has to regenerate and run again, and it may need several such cycles to flush out all the runtime errors that static analysis would have caught immediately. In AI development in particular, training runs can last weeks or months, so either you spend significant effort verifying each piece of AI-generated code up front, or you pray that two weeks in you won't be looking at a crashed training run.
Runtime errors also leave your system in an unknown state. Did the database transaction commit? Did the file get written? Did the message get sent? The AI has no memory of what side effects occurred before the error, making it hard to reason about what needs to be cleaned up or retried.
Runtime errors can also depend on execution paths, data, and timing. The same AI-generated code might work with one set of inputs and fail with another; it might work in development and fail in production. The LLM might successfully generate code that passes initial tests but contains latent bugs that only appear under specific conditions. Type systems catch these categories of errors unconditionally.
Finally, in dynamically typed languages, one type error can cascade into many runtime errors. The AI fixes the first error, runs again, hits a second error caused by the same underlying type confusion, fixes that, runs again, hits a third. With static types, all related errors are caught simultaneously, and fixing the type definition fixes all of them at once. AI coding assistants amplify both the productivity gains of flexible languages and their risks.
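To make that cascade difference concrete, here is a minimal sketch (a hypothetical `User` type, not from any particular codebase): one field's type changes, and every stale numeric use of it is flagged in a single `cargo check` rather than surfacing as one runtime error at a time.

```rust
struct User {
    // The LLM changed this field from `u64` to `String` to match a new spec.
    id: String,
    email: String,
}

fn shard_for(user: &User) -> u64 {
    // First attempt: `user.id % 16` -- rejected at compile time, since `%`
    // is not defined for `String`. Every other place that still treats `id`
    // as a number is flagged in the same compile, in milliseconds, with no
    // database or test environment required.
    user.id.parse::<u64>().unwrap_or(0) % 16
}

fn main() {
    let user = User { id: "42".into(), email: "ada@example.com".into() };
    println!("shard {} for {}", shard_for(&user), user.email);
}
```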
The New Ergonomics: Flexibility + Safety
This is where formally verified and compiler-verified systems become essential. Languages like Rust, OCaml, Haskell, and even modern TypeScript provide something critical: automated, mechanical verification that generated code is internally consistent.
The new ergonomic ideal is this: be loose and vibey with intent (ideas, features, specifications) but strict and pedantic with verification. This is where previously "annoying" tools like static analyzers, linters, and type checkers play a crucial role. Good software written with these agentic tools needs to be self-verifying. Ideally, in addition to using a programming language with strong type checking, the codebase itself should impose additional contract layers to ensure that new code integrates as expected.
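As one illustration of such a contract layer, here is a hedged sketch (a hypothetical `Email` newtype and `send_welcome` function, not a prescribed design): the constructor is the only way to produce the value, so any new AI-generated code that tries to pass an unvalidated string where an `Email` is expected fails to type-check instead of failing in production.

```rust
/// Contract layer: the only way to obtain an `Email` is through `parse`,
/// so every function that accepts an `Email` can assume the invariant holds.
pub struct Email(String);

impl Email {
    pub fn parse(raw: &str) -> Result<Email, String> {
        // Deliberately simplistic check; the point is the choke point, not the regex.
        if raw.contains('@') && !raw.starts_with('@') && !raw.ends_with('@') {
            Ok(Email(raw.to_owned()))
        } else {
            Err(format!("invalid email: {raw}"))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// New code has to go through the contract: passing a raw `String` here
// is a compile error, not a production incident.
pub fn send_welcome(to: &Email) {
    println!("sending welcome mail to {}", to.as_str());
}

fn main() {
    let email = Email::parse("ada@example.com").expect("valid email");
    send_welcome(&email);
}
```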
Why Pedantic Systems Are Perfect for AI-Assisted Coding
The pairing of AI code generation with strong type systems exploits a fundamental asymmetry in capabilities. LLMs are remarkably good at iterating rapidly through compiler errors without fatigue, pattern matching against type signatures to generate syntactically valid code, and exploring large search spaces of possible implementations. They maintain perfect patience through dozens of compile-fix cycles, never getting frustrated or bored. An LLM will happily regenerate code 20 times to satisfy the borrow checker or fix a type mismatch.
But LLMs are comparatively poor at catching subtle semantic bugs that compile correctly, understanding deep business logic and domain invariants, reasoning comprehensively about edge cases and error conditions, or maintaining consistency in intent across a large codebase. These require contextual understanding and judgment that current models lack.
This creates a clear division of labor. Strong type systems and compiler verification provide quick, theoretical guarantees about structural correctness, exactly the kind of mechanical feedback that LLMs can iterate against efficiently. The compiler provides immediate, precise feedback that guides the generative process toward valid implementations. The LLM fights through the pedantic mechanical details that would exhaust a human developer.
Meanwhile, those same type systems won't catch that you're calculating the discount incorrectly, that you forgot to handle the case where a user account is suspended, or that your algorithm has the wrong complexity for the scale you need. These subtle implementation issues require human review and domain knowledge. This points to what next-generation programming interfaces must solve: surfacing business logic and domain invariants clearly and explicitly, rather than allowing them to be scattered implicitly throughout the codebase. When the critical logic is concentrated and visible, humans can effectively review what matters. The type system details, the mechanical plumbing, and the syntactic correctness become increasingly hidden from the human developer, the same way most people don't read the assembly for their programs. The type system's main role now is to give the LLM feedback while it implements new business logic. Part of this work will also be building higher-order contract verification into the codebase, so that as new additions are made, conflicts are surfaced prior to deployment.
This suggests a future where programming interfaces have two distinct layers: a dense, type-rich substrate that the LLM interacts with directly, and a higher-level interface where humans specify intent, business rules, and constraints. The human never sees the 40 iterations the LLM took to satisfy the borrow checker. They see the business logic they specified, and tools that help them verify it does what they intended. The type system becomes infrastructure - essential, but largely invisible to the human operator.
This isn't just speculation - this pattern is already being deployed in production systems where correctness is life-or-death critical.
Real Deployments: Stochastic Generation + Deterministic Verification Today
Dual Verification in Safety-Critical Systems
This pattern of dual-system verification isn't hypothetical; it's already deployed in autonomous vehicles. NVIDIA's DRIVE platform exemplifies this architecture. The system runs two parallel stacks: an end-to-end AI-based autonomous driving system (increasingly powered by models like Alpamayo, which uses vision-language-action models to reason about complex driving scenarios) and a traditional safety verification system.
The AI system is generative and stochastic, making probabilistic decisions about steering, acceleration, and navigation based on sensor inputs. It can handle novel situations and reason through edge cases like navigating a traffic light outage, but its decision-making process is fundamentally opaque. The safety system, in contrast, uses traditional formal verification methods that check whether the AI's proposed actions violate safety constraints.
The result is exactly the pattern we've been discussing: a stochastic generative system (the AI driver) paired with deterministic verification (the safety stack). The AI explores the solution space creatively, making real-time decisions. The formal verification system checks those decisions against hard constraints such as occupancy, speed limits, and traffic rules. The AI can propose novel solutions to complex problems, but only actions that pass formal verification get executed.
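The shape of that loop, as a hedged sketch (hypothetical types and constraint values, not NVIDIA's actual APIs): a stochastic component proposes candidate actions, a deterministic checker vetoes anything that violates hard constraints, and only verified actions are executed.

```rust
// Hypothetical types for illustration only.
#[derive(Clone, Copy, Debug)]
struct Action {
    speed_mps: f32,
    steering_deg: f32,
}

struct WorldState {
    speed_limit_mps: f32,
    obstacle_ahead_m: f32,
}

/// Deterministic safety check: pure, auditable, no learned components.
fn verify(action: &Action, world: &WorldState) -> Result<(), &'static str> {
    if action.speed_mps > world.speed_limit_mps {
        return Err("exceeds speed limit");
    }
    if world.obstacle_ahead_m < 10.0 && action.speed_mps > 2.0 {
        return Err("too fast for obstacle distance");
    }
    if action.steering_deg.abs() > 35.0 {
        return Err("steering out of bounds");
    }
    Ok(())
}

/// Stochastic proposer: stand-in for the learned driving policy.
fn propose(world: &WorldState) -> Vec<Action> {
    vec![
        Action { speed_mps: world.speed_limit_mps + 3.0, steering_deg: 0.0 },
        Action { speed_mps: 1.5, steering_deg: 5.0 },
    ]
}

fn next_action(world: &WorldState) -> Option<Action> {
    // Only proposals that pass deterministic verification are ever executed.
    propose(world).into_iter().find(|a| verify(a, world).is_ok())
}

fn main() {
    let world = WorldState { speed_limit_mps: 13.0, obstacle_ahead_m: 8.0 };
    println!("{:?}", next_action(&world));
}
```

The proposer can be arbitrarily clever or opaque; the checker stays small, pure, and auditable.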
This is the template for AI-assisted development writ large. The generative system proposes, the verification system validates, and humans review the semantic correctness at a high level. When your car's autonomy stack makes a decision, you want the same confidence you get from a type-checked program: the mechanical details have been verified, so you can focus on whether the high-level behavior matches your intent.
Formal Methods and Theorem Proving
For critical systems, we're seeing interest in even stronger verification: formal methods like dependent types (Idris, Agda), proof assistants (Coq, Lean), and specification languages (TLA+, Alloy).
These tools let you specify invariants and properties that must hold, and mechanically verify that your code satisfies them. This was previously seen as too expensive for most software, but when AI can help generate both code and proofs, the cost-benefit calculation changes.
The recent breakthroughs in AI-assisted theorem proving demonstrate this isn't hypothetical. Google DeepMind's AlphaProof achieved silver-medal performance at the 2024 International Mathematical Olympiad by proving mathematical statements in Lean. More recently, they achieved gold-medal performance in 2025. These systems train by proving millions of problems, learning from each verified proof to tackle increasingly difficult challenges.
Tools like LeanDojo and Lean Copilot now make this accessible for developers. They provide AI-assisted theorem proving that integrates directly into the Lean workflow, with over 210,000 theorems already formalized in mathlib as of 2025. The AI generates proof candidates, the Lean proof assistant verifies them mechanically, and the process iterates until a valid proof is found.
This is the same pattern: you specify what properties your system should have (the theorem statement), AI explores the solution space (generating proof candidates), and formal verification ensures correctness (Lean checking each proof). The difference is that instead of catching type errors, you're proving algorithmic correctness, data structure invariants, or protocol safety properties.
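A toy version of that loop, sketched in Lean 4 (exact lemma names and tactics vary between Lean and mathlib versions): the human states the property, the AI proposes a proof script, and Lean either accepts it or points at exactly what remains unproven.

```lean
-- Human-stated property: reversing a list never changes its length.
-- The proof is the part the AI searches for; Lean checks it mechanically.
theorem reverse_preserves_length (xs : List Nat) :
    xs.reverse.length = xs.length := by
  simp
```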
Imagine: you vibe out the high-level requirements and properties your system must satisfy, AI generates an implementation and proofs that it satisfies those properties, and a proof checker verifies everything mechanically. You get the velocity of vibe coding with the confidence of formal verification.
The Cognitive Load Shift
The old model:
- Writing code: Low cognitive load (flexible, forgiving)
- Verifying code: High cognitive load (manual review, testing, debugging)
-- or --
- Writing code: High cognitive load (fighting to satisfy the compiler, very slow iteration time)
- Verifying code: Low(er) cognitive load (compiler does much of it automatically, but still some manual review, testing, debugging)
The new model:
- Writing code: Medium cognitive load (express intent clearly, but let AI handle details)
- Verifying code: Low(er) cognitive load (compiler does much of it automatically but still some manual review)
This is a fundamental shift in what programming ergonomics means.
What "Ergonomic" Means Now
Good software ergonomics used to mean:
- Minimal ceremony
- Flexibility
- Freedom to experiment
- Low friction for small changes
Good software ergonomics now means:
- Minimal ceremony for expressing intent
- Flexibility in approach, rigidity in contracts
- Freedom to experiment with implementation, strictness in interfaces
- Low friction for small changes with automated verification
The key insight is that we can have both velocity and safety by separating the stochastic layer (AI generation) from the deterministic layer (type checking and formal verification). Each does what it's good at.
Implications for Existing Languages
Today, tool choice should be guided by one principle: pick systems that can enforce correctness and safety, and let LLMs be the interface between creative, high-level design decisions (e.g., model architecture choices in a machine learning system) and a deterministic, self-verifying implementation. The complaint that Nix flakes are too hard to write carries less weight when the LLM writes them for you, Nix verifies the result, and all you have to say is "add postgres to my environment."
Python's Potential
Python is in an interesting position. It's the language of AI development, but in its default form it's poorly suited to AI-assisted application development. The solution could be more disciplined use of type hints and tools like mypy, Pydantic, and dataclasses. But this only goes so far, and it demands significant discipline from developers and extra effort from the LLMs to verify their output through those tools.
Projects like Mojo and efforts to improve Python's performance also point toward a future where Python gets stricter, not looser, because strictness enables better tooling and verification.
But what if we didn't retrofit existing languages? What if we designed a language from scratch for this new paradigm?
The Compiler's Job: Verification and Translation
The AI compiler should do several things:
- Translates intent to implementation: Natural language becomes properly structured code
- Generates verification artifacts: Adds phantom types, lifetime annotations, proofs
- Enforces contracts: Ensures all `requires` and `ensures` clauses are satisfied
- Iterates on errors: When the verification layer rejects code, the AI regenerates until it passes
- Optimizes silently: Chooses efficient data structures and algorithms within the constraints
- Builds internal verification tooling: Progressively adds verification to the application, testing new additions for the semantic correctness the type system can't guarantee
This last point is crucial. Good LLM-assisted design doesn't just generate code that satisfies a type checker - it also generates the verification infrastructure to check semantic properties. When you specify `ensure: vip_users_get_at_least(15%)` in your discount calculation constraint, the system should generate property-based tests that verify this holds across thousands of randomized inputs. When you declare `invariant: suspended_users_cannot_login`, the system should generate integration tests that attempt to authenticate suspended users and verify they're rejected.
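A hedged sketch of what that generated verification tooling might look like in Rust, using the proptest crate (the `User` type, `compute_discount` function, and 15% rule are hypothetical stand-ins for whatever the spec actually declares):

```rust
#[derive(Debug)]
struct User {
    vip: bool,
    lifetime_value: u64,
}

fn compute_discount(user: &User) -> f64 {
    if user.vip { 0.15 } else { 0.05 } // placeholder implementation under test
}

#[cfg(test)]
mod generated_checks {
    use super::*;
    use proptest::prelude::*;

    proptest! {
        // Generated from `ensure: vip_users_get_at_least(15%)`: exercised
        // against thousands of randomized inputs on every test run.
        #[test]
        fn vip_users_get_at_least_15_percent(lifetime_value in 0u64..10_000_000_000u64) {
            let user = User { vip: true, lifetime_value };
            prop_assert!(compute_discount(&user) >= 0.15);
        }
    }
}
```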
The LLM can generate both the implementation and the verification tooling in parallel, using the same declarative specifications. Your business logic constraints become executable tests. Your invariants become runtime assertions in debug builds and fuzzing targets. The type system catches structural errors, and the generated verification tooling catches semantic errors.
This is another advantage of working at the right abstraction level. When humans write tests, we often skip edge cases or forget to test invariants we assume are obvious. When an LLM generates verification tooling from explicit invariant declarations, it can be exhaustive. It doesn't get bored writing the hundredth property test. It will happily generate comprehensive test suites that check every stated constraint from every angle, because constraints are just another kind of specification to translate into code.
The human never sees most compiler errors. The AI iterates in the background until the generated code satisfies all the invariants and type constraints. What the human sees is:
✓ Constraint discount_calculation verified
✓ Invariant suspended_users_cannot_login enforced at type level
✓ Session lifetime safety proven
✓ All authentication paths preserve invariants
Or, if there's a semantic issue the type system can't catch:
⚠ Warning: discount_calculation may overflow for lifetime_value > $9,999,999,999
Suggestion: Add explicit bounds to user.lifetime_value type
⚠ Potential performance issue: authenticate() performs O(n) scan on user table
Suggestion: Add index on email column (migrations/add_email_index.sql)
Properties of AI-Native Languages
This two-layer architecture enables several key properties:
For humans:
- Write mostly declarative specifications
- See business logic clearly concentrated
- Get semantic feedback about domain issues
- The compiler is not your problem
For LLMs:
- Clear contracts to satisfy through types
- Rich type system to iterate against
- Mechanical verification of correctness
- Freedom to explore implementation space
For the system:
- Provably correct by construction
- Performance optimizations without sacrificing safety
- Refactoring is automated - change a constraint, AI updates all implementations
- Testing focuses on business logic, not mechanical correctness
The Gap Today
We're not there yet. Current AI coding tools work with languages designed for humans writing code character by character. But the bones of this vision exist: Rust's type system, Liquid Haskell's refinement types, Dafny's verification, even SQL's declarative nature. What's missing is the integration - a language where the human interface is intent and contracts, the machine interface is rich types and proofs, and AI bridges the gap.
Early evidence suggests this approach is viable. Anthropic's experience building a C compiler in Rust with Claude demonstrated exactly this dynamic: the LLM patiently iterated through Rust's strict type system and borrow checker, generating code until it satisfied all the constraints. The compiler's mechanical feedback guided the AI toward correct implementations, catching entire classes of bugs that would have been subtle runtime errors in C. The developers focused on architectural decisions and correctness of the parsing logic, while Claude fought the borrow checker.
Conclusion: Embracing the Pedantic
This isn't a compromise or a trade-off. It's a synthesis. We get the benefits of flexible, natural-language-driven development without sacrificing correctness or maintainability. We (humans) write less boilerplate than ever, but the underlying code is verified to be consistent.
The definition of ergonomic software development has changed. Languages and tools that felt heavyweight in the era of human-only coding are the lightweight choice in the era of AI-assisted coding. When you didn't write the code yourself, you want as many guarantees as possible. Hoping that an agent writes perfect Python every time is a gamble. The future of programming isn't choosing between velocity and safety - it's using formal systems to verify stochastic processes, and that combination is more ergonomic than either approach alone.