RALPH.md – a markdown format for autonomous agent loops

I keep setting up the same thing: an agent in a while loop, a few shell commands that run between iterations to check what changed, their output piped back into the prompt. Every time I set one up I rewrite the same scaffolding. So I made a format for it: a skill-like format in which a single markdown file defines what happens in the outer loop (run commands, assemble a prompt, pipe it to an agent), so the agent can focus on the inner loop.

AGENTS.md and Agent Skills are for the inner loop — they guide the agent during a session. RALPH.md is for the outer loop — it defines what runs between sessions.

The simplest ralph looks like this:

---
agent: claude -p
commands:
  - name: tests
    run: uv run pytest -x
args:
  - module
---

Fix the failing tests in {{ args.module }}.

{{ commands.tests }}

That's it. Run the tests, inject the output, pipe the prompt to the agent, repeat. The module arg becomes a --module flag on the CLI so you can point the same ralph at different parts of a codebase.
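To make the "inject the output" step concrete, here is a minimal sketch of placeholder substitution. It is not Ralphify's implementation: the `render` function, the regex, the `billing` module name, and the stand-in test output are all assumptions made for illustration. The one behaviour taken from the format itself is that unmatched placeholders resolve to empty.

```python
import re

# Matches {{ commands.<name> }} and {{ args.<name> }}; names may contain hyphens
# (for example "git-log"). The exact grammar here is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(commands|args)\.([A-Za-z0-9_-]+)\s*\}\}")


def render(body: str, commands: dict, args: dict) -> str:
    """Fill placeholders in a prompt body; unknown names become empty strings."""
    values = {"commands": commands, "args": args}

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        return values[kind].get(name, "")

    return PLACEHOLDER.sub(substitute, body)


body = "Fix the failing tests in {{ args.module }}.\n\n{{ commands.tests }}"
# "1 failed, 3 passed" stands in for real pytest output.
print(render(body, commands={"tests": "1 failed, 3 passed"}, args={"module": "billing"}))
```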

The format

A ralph is a self-contained directory. The only required file is RALPH.md - everything else is optional context:

bug-hunter/
├── RALPH.md              # the loop definition (required)
├── check-coverage.sh     # script used by a command (optional)
├── coding-guidelines.md  # context the agent loads on demand (optional)
└── test-data.json        # whatever else the loop needs (optional)
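As a rough illustration of what "the only required file is RALPH.md" means for a runner, the sketch below checks a directory for the file and splits it into frontmatter and prompt body at the `---` lines. The function names are hypothetical, the YAML itself is left unparsed to keep the example stdlib-only, and treating a file without frontmatter as all body is an assumption rather than documented behaviour.

```python
from pathlib import Path


def split_ralph(text: str) -> tuple:
    """Split RALPH.md text into (frontmatter, body) at the --- delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # Assumption: no frontmatter means the whole file is the prompt body.
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter opened with --- but never closed")


def load_ralph(directory: Path) -> tuple:
    """Read RALPH.md from a ralph directory, failing clearly if it is missing."""
    ralph_file = directory / "RALPH.md"
    if not ralph_file.is_file():
        raise FileNotFoundError(f"{directory} is not a ralph: no RALPH.md found")
    return split_ralph(ralph_file.read_text(encoding="utf-8"))
```

The frontmatter string would then go to a YAML parser of your choice; everything else in the directory is ignored by the loader and only matters to the commands or the agent.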

Here's a real one I use — a bug hunter with multiple commands and a focus arg:

---
agent: claude -p --dangerously-skip-permissions
commands:
  - name: tests
    run: uv run pytest -x
  - name: types
    run: uv run ty check
  - name: lint
    run: uv run ruff check .
  - name: git-log
    run: git log --oneline -10
args:
  - focus
---

# Bug Hunter

You are an autonomous bug-hunting agent running in a loop.
Each iteration starts with fresh context.
Your progress lives in the code and git.

## Test results

{{ commands.tests }}

## Type checking

{{ commands.types }}

## Lint

{{ commands.lint }}

## Recent commits

{{ commands.git-log }}

If tests, types, or lint are failing, fix that before hunting
for new bugs.

## Task

Find and fix a real bug in this codebase.
{{ args.focus }}

Each iteration:

1. **Read code** - pick a module and read it carefully. Look for
   edge cases, off-by-one errors, missing validation, incorrect
   error handling, race conditions, or logic errors.
2. **Write a failing test** - prove the bug exists with a test
   that fails on the current code.
3. **Fix the bug** - make the test pass with a minimal fix.
4. **Verify** - all existing tests must still pass.

## Rules

- One bug per iteration
- The bug must be real - do not invent hypothetical issues
- Always write a regression test before fixing
- Do not change unrelated code
- Commit with `fix: resolve <description>`

Four things

The whole format is four things:

agent - the command to run (anything that reads stdin)
commands - deterministic feedback commands that run between iterations
args - declared arguments to parametrize the ralph from the command line
prompt body - the markdown below the frontmatter, with {{ placeholders }} for command output and arguments

Each iteration runs the commands, injects their output wherever the prompt references {{ commands.<name> }}, resolves {{ args.<name> }} placeholders for ad-hoc steering, and pipes the assembled prompt to the agent. The agent does its thing, then the loop repeats. Fresh context every cycle.
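The cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Ralphify is built: the function names are made up, the placeholder fill is a naive exact-match replacement, and the loop runs a fixed number of iterations where a real runner might continue until interrupted.

```python
import shlex
import subprocess
import sys


def run_command(run: str) -> str:
    """Run one feedback command and return stdout plus stderr as text."""
    result = subprocess.run(shlex.split(run), capture_output=True, text=True)
    # A non-zero exit (failing tests, say) is still feedback, so it is not raised.
    return result.stdout + result.stderr


def build_prompt(body: str, outputs: dict, args: dict) -> str:
    """Fill placeholders by exact string replacement (spacing must match)."""
    for name, output in outputs.items():
        body = body.replace("{{ commands." + name + " }}", output)
    for name, value in args.items():
        body = body.replace("{{ args." + name + " }}", value)
    return body


def loop(agent: str, commands: list, body: str, args: dict, iterations: int) -> None:
    """Run the outer loop a fixed number of times (an assumption, see above)."""
    for i in range(iterations):
        print(f"iteration {i + 1}", file=sys.stderr)
        # Commands run fresh every iteration, so the prompt reflects current state.
        outputs = {c["name"]: run_command(c["run"]) for c in commands}
        prompt = build_prompt(body, outputs, args)
        # The agent is any command that reads the prompt from stdin.
        subprocess.run(shlex.split(agent), input=prompt, text=True)
```

Because each iteration starts a new agent process with a freshly assembled prompt, nothing carries over between cycles except what the agent wrote to disk or committed.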

Design decisions

Why a directory, not just a file? Same reason the Agent Skills format uses a directory. A RALPH.md on its own is enough for simple loops, but ralph loops often benefit from bundled resources: shell scripts for custom checks and context injection, and reference docs for progressive disclosure (coding-guidelines.md, architecture.md). Commands starting with ./ run relative to the ralph directory, so bundled scripts just work. The directory is then the unit of sharing.
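One way to read the `./` rule is sketched below: a command whose first token starts with `./` has that token anchored to the ralph directory, and every other command is left alone. The `resolve_command` helper is hypothetical, and the choice to rewrite only the executable path (rather than change the working directory) is an assumption about the intended behaviour.

```python
import shlex
from pathlib import Path


def resolve_command(run: str, ralph_dir: Path) -> list:
    """Return argv for a command, anchoring a leading ./ to the ralph directory."""
    argv = shlex.split(run)
    if argv and argv[0].startswith("./"):
        # Assumption: only the script path is rewritten; the working directory
        # stays wherever the loop was started (typically the project root).
        argv[0] = str(ralph_dir / argv[0][2:])
    return argv


ralph_dir = Path("ralphs/bug-hunter")
print(resolve_command("./check-coverage.sh", ralph_dir))  # bundled script
print(resolve_command("uv run pytest -x", ralph_dir))     # ordinary command, unchanged
```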

Why not just make it a skill? They look similar on the surface - both are directories with a markdown file and optional bundled resources. That similarity is intentional - the skill format has become familiar to a lot of people, and borrowing its shape makes ralphs easy to understand at a glance. But they serve different layers. A skill provides knowledge about reusable processes in the inner loop - the agent's session. A ralph steers the outer loop: it runs code between iterations to control the environment deterministically and, optionally, to inject context into the inner loop before the next iteration starts.

Try it

I'm building a tool called Ralphify to run ralphs in this format. Arguments declared in the frontmatter become flags on the command line, so a single ralph works across different contexts:

uv tool install ralphify

# point it at a directory containing a RALPH.md
ralph run ./ralphs/bug-hunter --focus "authentication and session handling"

# same ralph, different focus
ralph run ./ralphs/bug-hunter --focus "edge cases in the payment flow"

# or run it without args - unmatched placeholders just resolve to empty
ralph run ./ralphs/bug-hunter

Declare args: [focus] and you get --focus on the CLI. The value fills {{ args.focus }} in the prompt. One ralph, many use cases.
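The mapping from declared args to CLI flags could be done with `argparse`, roughly as follows. This is a sketch only: `build_parser` and the `ralph-sketch` program name are made up, and defaulting an omitted flag to an empty string is an assumption chosen to match the "unmatched placeholders resolve to empty" behaviour.

```python
import argparse


def build_parser(declared_args: list) -> argparse.ArgumentParser:
    """Build a CLI parser with one optional --<name> flag per declared arg."""
    parser = argparse.ArgumentParser(prog="ralph-sketch")
    parser.add_argument("path", help="directory containing a RALPH.md")
    for name in declared_args:
        # dest=name keeps the key identical to the placeholder name, even if
        # it contains a hyphen; default "" means an omitted flag fills nothing.
        parser.add_argument(f"--{name}", dest=name, default="")
    return parser


declared = ["focus"]
ns = build_parser(declared).parse_args(
    ["./ralphs/bug-hunter", "--focus", "authentication and session handling"]
)
# These values are what would fill {{ args.<name> }} in the prompt.
args = {name: getattr(ns, name) for name in declared}
print(args)
```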

Because ralphs are just directories in a git repo, anyone can share them. If a repo contains a directory with a RALPH.md, you can install it with agr:

# install a specific ralph from any GitHub repo
agr add owner/repo/ralph-name

# install all ralphs in a repo
agr add owner/repo

The ralphify examples are a good place to start — and the cookbook has more.

I'd love feedback

This is where my thinking landed, but I'm sure there are blind spots. If you're running agent loops - for coding, research, testing, or something I haven't thought of - I'd genuinely like to hear what you think.

Share a use case: open an issue describing how you'd use this, or how you already run agent loops. The weird, unexpected ones are the most useful.
Poke holes in the format: if something feels wrong or missing, I want to know.
Write a ralph and share it: if you try the format and build something interesting, I'd love to see it.

GitHub | Docs | PyPI