Test && Commit || Revert with AI

Training yourself to make LLM-sized changes

This week’s Works on My Machine demo video showcases an agent I’ve put together and have been programming with using the Test && Commit || Revert (TCR) process that Kent Beck shared back in 2018 in this Medium post: a process where, whenever the tests pass, you make a commit, but any time they fail, the code is reverted.

If you want to see more experiments like this, subscribe to Works on My Machine for weekly demos and insights.

What It Does

This agent is set up to run a fairly strict version of TCR. After you write some new tests, the following happens (a rough code sketch of the loop follows the list):

  1. An LLM examines those tests and makes changes to the existing implementation

  2. All of your tests are run

  3. If they pass, the LLM generates a commit message and makes a commit in your repository

  4. If they fail, all your changes are reverted and you have to start over with new tests
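
Here’s a minimal sketch of that loop in Ruby. The `llm_generate_patch` and `llm_commit_message` methods are hypothetical stand-ins for the LLM calls, and the test command is assumed to be `bundle exec rake test`; the agent’s actual logic lives in lib/tcr_agent/commands/go_command.rb.

```ruby
require "open3"

# Hypothetical stand-ins for the LLM calls -- not the tcr_agent API.
def llm_generate_patch
  # Ask an LLM to edit the implementation so the new tests pass.
end

def llm_commit_message
  "Make the new tests pass" # an LLM would summarize the diff here
end

def tcr_step
  llm_generate_patch                                         # 1. LLM updates the implementation

  _output, status = Open3.capture2e("bundle exec rake test") # 2. run all of the tests

  if status.success?
    system("git", "add", "-A")                               # 3. green: commit the change
    system("git", "commit", "-m", llm_commit_message)
  else
    system("git", "checkout", "--", ".")                     # 4. red: revert tracked changes...
    system("git", "clean", "-fd")                            #    ...and delete new files
  end
end
```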

Why It Matters

LLMs generate the best code when you give them a small, well-defined task to accomplish. The bigger and vaguer the task, the more likely they are to get it wrong or go off in a direction you don’t expect. An agent like this trains you both to make changes small enough that an LLM can get them right in one try and to define your problem more clearly.

There are two really interesting ideas I’d like to explore further with something like this.

First, there have been a lot of conversations about how, if LLMs take care of all the “junior” work, there won’t really be a way for newcomers to learn and grow in many industries. With tools like this, you can train on the kinds of things that experts in a field do with LLMs to get high-quality output quickly, building intuition in a tight feedback loop for what is going to work and what won’t. Could something like this be what education or entry-level positions of the future look like?

And second, I touch on this a bit in the video: tests don’t just have to serve as an automated way to verify that your code works. They can also be thought of as a structured way to define what you want your program to do. That means you can treat your tests as an “executable prompt,” or a prompt DSL that just happens to run on its own to verify your program does what it’s supposed to. Is something like Cucumber/Gherkin primed to make a comeback?
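
As a toy example of what that could look like, here’s a hypothetical Minitest file (not from the tcr_agent repo) for a `slugify` function that doesn’t exist yet. Read top to bottom, it’s a spec of the behavior you want; handed to the agent, the missing implementation is exactly what the LLM is asked to write.

```ruby
require "minitest/autorun"

# An "executable prompt": each test names a behavior we want and
# checks for it automatically. `slugify` is intentionally undefined --
# in the TCR loop, writing it is the LLM's job.
class SlugifyTest < Minitest::Test
  def test_downcases_and_hyphenates_words
    assert_equal "hello-world", slugify("Hello World")
  end

  def test_strips_punctuation
    assert_equal "whats-new", slugify("What's new?")
  end
end
```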

How To Get It

All the code for this agent is available at sublayerapp/tcr_agent, and the most relevant code is in lib/tcr_agent/commands/go_command.rb. There is also an older implementation in the agents folder that I tried for a while, but I found that triggering the run with a key command when I was ready was better than triggering on something like a file save. The code is still there, though, if you’d like to try an even more hardcore version of this!

If you’re interested in seeing a longer session of TCR with AI in action, the plan right now is to live-stream one on Sunday, March 9th around 3pm ET for about an hour. I’ll share the link on social media, but it will be on either the @sublayerscott YouTube channel or the @SublayerTeam channel, so keep an eye out.

As always, you can reach me here, on LinkedIn, Twitter, or the Sublayer Discord with any questions about how I have this running. Also! If you get a chance to try it out, I’d love to hear about your results and anything about it you would change!

Thanks for reading Works on My Machine! This post is public so feel free to share it.
