Roadmap

Our vision for the future of autonomous agent evaluation

This roadmap is continuously updated based on community feedback and emerging research. Want to see a feature? Join the discussion →

🎯 Our Vision

Build self-improving agents through combined code and LLM-as-judge evaluation. Drawing on research such as GEPA (reported to be up to 35x more sample-efficient than RL) and RULER reward modeling, iofold's deepagent writes its own evals using data science tools and backtesting, enabling agents to evolve without manual labeling.
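
To make the code + LLM-as-judge combination concrete, here is a minimal Python sketch. `call_llm` stands in for any chat-completion client, and the tool-call check, rubric, and 50/50 weighting are illustrative assumptions, not iofold's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    code_score: float   # deterministic, code-based checks (0.0-1.0)
    judge_score: float  # LLM-as-judge rubric score (0.0-1.0)

    @property
    def combined(self) -> float:
        # Illustrative 50/50 weighting; a real eval might weight differently.
        return 0.5 * self.code_score + 0.5 * self.judge_score

def evaluate(trajectory: str, required_tools: list[str],
             call_llm: Callable[[str], str]) -> EvalResult:
    # Code check: did the agent invoke every required tool?
    hits = sum(tool in trajectory for tool in required_tools)
    code_score = hits / max(len(required_tools), 1)

    # LLM-as-judge: grade the full trajectory against a short rubric.
    prompt = (
        "Rate this agent trajectory from 0 to 10 for correctness and "
        "helpfulness. Reply with a single number only.\n\n" + trajectory
    )
    judge_score = float(call_llm(prompt)) / 10.0
    return EvalResult(code_score, judge_score)
```

Because the code check is deterministic and cheap, it can gate or sanity-check the judge's score, which is where backtesting against historical traces comes in.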

Q4 2025

In Progress
  • GEPA (Genetic-Pareto Agent Evolution) for self-improving agents
  • Automated reward model generation with code + LLM-as-judge evals
  • RULER (Relative Universal LLM-Elicited Rewards) scoring (see the sketch after this list)
  • Full web dashboard with traces, evals, and agent comparison
  • Open source release under MIT license
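
RULER's core idea is relative scoring: rather than grading each rollout against an absolute rubric, a judge LLM compares a group of rollouts for the same task against one another, removing the need for labeled data or a hand-written reward function. A minimal sketch, assuming a generic `call_llm` chat-completion client; the prompt wording is illustrative, not RULER's actual prompt:

```python
import json
from typing import Callable

def ruler_scores(rollouts: list[str],
                 call_llm: Callable[[str], str]) -> list[float]:
    """Score a group of rollouts relative to each other (RULER-style).

    `call_llm` is a placeholder for any chat-completion client; the
    prompt below is an illustrative assumption.
    """
    numbered = "\n\n".join(
        f"[Rollout {i + 1}]\n{r}" for i, r in enumerate(rollouts)
    )
    prompt = (
        "All rollouts below attempt the same task. Compare them against "
        "each other and assign each a score from 0 to 1. Reply with a "
        f"JSON array of exactly {len(rollouts)} numbers.\n\n{numbered}"
    )
    scores = json.loads(call_llm(prompt))
    assert len(scores) == len(rollouts), "judge returned wrong count"
    return [float(s) for s in scores]
```

Because scores are relative within each group, the same judge can produce a usable reward signal across very different tasks without per-task calibration.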

Q1 2026

Planned
  • Simulated mailbox environment for email agent rollouts
  • User behavior modeling for multi-turn agent support
  • Multimodal support: document and image understanding

Q2 2026

Planned
  • Simulated CRM/ERP environments for sales and support agents
  • Multimodal support: voice agent evaluation
  • Bring your own RL gym / simulation environments
  • Multi-agent system evaluation

Q3 2026

Planned
  • Simulated browser environment for web automation agents
  • Multimodal support: video agent evaluation
  • Eval marketplace (share and discover evals)
  • Enterprise and self-hosted deployment options

Help Shape the Future

iofold is open source and community-driven. We want to hear from you: