Pragmatic Summit: AI Revolutionizes Software Development

⚡

Key Takeaways

1At the Pragmatic Summit, the adoption of AI in software development was analyzed, highlighting the shift towards autonomous coding agents.

2Trust in the outputs of AIs, such as Opus 4.5, is crucial for developers, who compare it to the trust placed in professional teams.

3The automation of testing with coding agents, through tools like Showboat, is transforming the way developers validate code.

💡Why it matters — The increasing integration of AI in software development is redefining coding practices and raising questions about security and quality.

Phases of AI Adoption

At the Pragmatic Summit, the discussion began with an in-depth exploration of the stages that software developers go through when integrating AI-based coding tools. The journey often starts with the use of ChatGPT to get answers to specific questions. Gradually, developers move on to more advanced coding agents capable of generating code snippets. A significant turning point occurs when these agents start producing more code than the developer themselves. For the discussion's author, this key moment happened about six months ago.

A radical evolution was observed about three weeks ago: developers no longer read the generated code. For example, StrongDM announced last week an approach where neither writing nor reading code is necessary, raising questions about accountability and security, especially for a company specializing in security software.

Trusting AI Output

Another crucial topic addressed was the trust that can be placed in the results provided by AI. In the past, working in a large company involved trusting services developed by other teams, relying on their documentation without necessarily verifying their code. This analogy is used to illustrate the trust that can be placed in AI. Opus 4.5 is an example of a tool that has gained this trust, particularly for specific tasks like creating a JSON API. The author now feels very confident that for classes of problems already encountered, this tool will not make mistakes. When he asks the AI to build an API that queries a database and returns paginated data, he consistently receives the correct response.

Test-Driven Development with Agents

Test-driven development (TDD) is a method that has been revisited with the help of coding agents. Each coding session begins with running tests, often via the command uv run pytest. The use of red-green TDD is simplified by the agents, who understand this approach and apply it effectively. Although TDD may have seemed cumbersome in the past, automation by agents makes it more acceptable.

Some developers still neglect to write tests, which is considered a mistake. Tests, which were once perceived as a workload burden, are now facilitated by agents, making their absence unjustifiable. The author recently wrote more about TDD for coding agents in an article titled Red/green TDD. He admits to having hated test-first TDD throughout his career, finding it tedious and slowing down his work. However, letting the agents handle it is acceptable, even if it means they spend time on tests that do not work.

Manual Testing and Showboat

The importance of manual testing was emphasized, despite its apparent contradiction with automation. Automated tests do not always guarantee the proper functioning of a web server. To address this, tools like Showboat have been developed. Showboat generates a markdown document of the manual tests performed, providing an overview of the checks conducted and the results obtained. The author explains that he asks his agents to start the server in the background and use curl to test the created API. This method often uncovers new bugs that automated tests have not covered.

Compliance-Driven Development

A recent project demonstrated the effectiveness of compliance-driven development. In seeking to add file upload features to Datasette, a suite of tests was built for several web frameworks, including Go, Node.js, Django, and Starlette. This approach allowed for the creation of a compliant implementation, illustrating the power of this method. The author was able to reverse-engineer six implementations of a standard to obtain a new standard, which he then used to implement in Datasette.

Is Code Quality Important?

Code quality heavily depends on context. For ephemeral projects, quality may be secondary. However, for long-term projects, it becomes crucial. Coding agents can produce code of varying quality, and it is up to developers to decide whether to improve it or not. The agent can be used to perform refactorings that the developer might overlook due to time constraints. The author emphasizes that if an agent produces 2,000 lines of bad code and the developer chooses to ignore it, that is a deliberate choice. However, spotting a small refactoring at the end that would take an hour, he prefers to let the agent handle it while he takes the dog for a walk.

Code Patterns and Templates

Coding agents excel in consistency. By using predefined templates, they can maintain uniformity in the codebase. This facilitates continuous integration and ensures that new additions adhere to established standards. The author starts most of his projects by cloning a template, which places tests in the right spot and sets up GitHub continuous integration. Even having just one or two tests in the preferred style means that the agent will write tests in that same style. He highlights the importance of maintaining a high-quality codebase, as the agent will then add to it in a qualitative manner.

Prompt Injection and the Lethal Trifecta

The use of language models (LLMs) in software development externalizes certain decisions. However, these models are vulnerable to manipulations, similar to SQL injection. Unlike SQL injection, it is difficult to distinguish data from instructions in LLMs, making securing them more complex. The author named this problem in reference to SQL injection, as the initial issue lay in the combination of trusted text and untrusted text. However, he acknowledges that this name was a poor choice, as there is no reliable way to parameterize queries with LLMs.

Securing Coding Agents

The security of coding agents is a challenge, especially when they are run on local machines. Sandboxing is essential to limit potential damage in case of malfunction or malicious instructions. Solutions like Claude Code for the web, which operate in controlled environments, offer increased security by executing tasks in containers managed by third parties like Anthropic. The author uses Claude on his phone because it operates in a secure container, allowing him to launch a Linux VM, retrieve a git repository, and solve problems safely.