Loop
Try task → evaluator scores → if failed, reflect verbally → store reflection → retry with reflection in context.
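The loop above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `reflexion_loop`, `act`, `evaluate`, and `reflect` are hypothetical, and the toy functions stand in for real model and evaluator calls.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    act: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str, str], str],
    max_tries: int = 3,
) -> Tuple[str, bool, List[str]]:
    """Try, score, reflect on failure, store the reflection, retry."""
    reflections: List[str] = []
    attempt = ""
    for _ in range(max_tries):
        # Earlier reflections are passed in as extra context for this try.
        attempt = act(task, reflections)
        passed, feedback = evaluate(attempt)
        if passed:
            return attempt, True, reflections
        # On failure, store a verbal reflection for the next try.
        reflections.append(reflect(task, attempt, feedback))
    return attempt, False, reflections


# Toy stand-ins (assumptions): a real setup would call a model here.
def toy_act(task: str, reflections: List[str]) -> str:
    # Answers wrongly until at least one reflection is in context.
    return "4" if reflections else "5"


def toy_evaluate(attempt: str) -> Tuple[bool, str]:
    return attempt == "4", "expected 4, got " + attempt


def toy_reflect(task: str, attempt: str, feedback: str) -> str:
    return f"Previous answer {attempt!r} failed: {feedback}. Recheck the arithmetic."


answer, ok, notes = reflexion_loop("What is 2 + 2?", toy_act, toy_evaluate, toy_reflect)
```

In this toy run the first try fails, one reflection is stored, and the second try passes with that reflection in context.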
Advantage
Learns from failures without fine-tuning: model weights stay fixed, and adaptation happens at test time through the context.
Domains
Code generation with a test signal. Reasoning tasks with a verifier. Agent tool use where a reward signal is available.
Memory management
Accumulate reflections across many tries. Summarize older reflections when memory grows. Cap total memory size so it still fits in the context.
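One way to combine accumulation, summarization, and a size cap is sketched below. Everything here is an assumption for illustration: the `ReflectionMemory` class, its `max_items` and `max_chars` limits, and `naive_summarize`, which only truncates each entry and stands in for a model-written summary.

```python
from typing import Callable, List


def naive_summarize(reflections: List[str]) -> str:
    # Placeholder summarizer: keeps the first 40 characters of each entry.
    return "Summary: " + "; ".join(r[:40] for r in reflections)


class ReflectionMemory:
    def __init__(
        self,
        max_items: int = 3,
        max_chars: int = 300,
        summarize: Callable[[List[str]], str] = naive_summarize,
    ) -> None:
        self.max_items = max_items
        self.max_chars = max_chars
        self.summarize = summarize
        self.items: List[str] = []

    def size(self) -> int:
        return sum(len(r) for r in self.items)

    def add(self, reflection: str) -> None:
        self.items.append(reflection)
        if len(self.items) > self.max_items:
            # Fold everything except the newest reflection into one summary.
            self.items = [self.summarize(self.items[:-1]), self.items[-1]]
        # Hard cap: drop the oldest entries first.
        while len(self.items) > 1 and self.size() > self.max_chars:
            self.items.pop(0)
        # If a single entry is still too long, truncate it.
        if self.size() > self.max_chars:
            self.items[0] = self.items[0][: self.max_chars]

    def render(self) -> str:
        # Text block to place in the context before the next try.
        return "\n".join(f"- {r}" for r in self.items)


memory = ReflectionMemory(max_items=3, max_chars=300)
for i in range(1, 6):
    memory.add(f"Try {i}: forgot to handle edge case {i}.")
```

Keeping the newest reflection verbatim while summarizing older ones is a design choice in this sketch: the most recent failure is usually the most relevant to the next try.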