Roadmap
Our vision for the future of autonomous agent evaluation
This roadmap is continuously updated based on community feedback and emerging research. Missing a feature you'd like to see? Join the discussion →
🎯 Our Vision
Build self-improving agents through combined code and LLM-as-judge evaluation. Drawing on research such as GEPA (reported to match or beat RL-based optimization with up to 35x fewer rollouts) and RULER reward modeling, iofold's deepagent writes its own evals using data science tooling and backtesting, letting agents improve without manual labeling.
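To make "code + LLM-as-judge" concrete, here is a minimal sketch of the pattern. The trace schema, judge prompt, and 0.7 pass threshold are hypothetical, not iofold's actual API; the judge is any text-in/text-out LLM call you supply.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    passed: bool
    score: float
    rationale: str

def code_check(trace: dict) -> bool:
    """Deterministic check: did the agent produce output in a sane
    number of steps? (Hypothetical trace schema:
    {'steps': [...], 'output': str}.)"""
    return bool(trace.get("output")) and len(trace.get("steps", [])) <= 20

JUDGE_PROMPT = """Rate the agent's answer from 0 to 1 for correctness and
helpfulness, given the task. Reply as '<score> | <one-line rationale>'.

Task: {task}
Answer: {answer}"""

def evaluate(trace: dict, task: str, judge: Callable[[str], str]) -> EvalResult:
    """Combine a cheap code check with an LLM-as-judge score.
    `judge` is any function that sends a prompt to an LLM and
    returns its text reply."""
    if not code_check(trace):
        return EvalResult(False, 0.0, "failed deterministic checks")
    reply = judge(JUDGE_PROMPT.format(task=task, answer=trace["output"]))
    score_text, _, rationale = reply.partition("|")
    score = float(score_text.strip())
    return EvalResult(score >= 0.7, score, rationale.strip())
```

Running the deterministic check first keeps judge calls, the expensive part, off traces that already fail basic structural checks.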
Q4 2025
In Progress
- GEPA (Genetic-Pareto Agent Evolution) for self-improving agents
- Automated reward model generation with code + LLM-as-judge evals
- RULER (Relative Universal LLM-Elicited Rewards) scoring, sketched after this list
- Full web dashboard with traces, evals, and agent comparison
- Open-source release under the MIT license
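For the RULER item above, a minimal sketch of relative scoring under assumptions: the judge sees a group of trajectories for the same task and scores them against each other, so no hand-written reward function or labeled data is needed. The prompt wording and JSON parsing here are illustrative, not RULER's published implementation.

```python
import json
from typing import Callable

RULER_PROMPT = """You will see {n} candidate trajectories for the same task.
Score each from 0 to 1 relative to the others (the best should score highest).
Return a JSON list of {n} numbers and nothing else.

Task: {task}

{trajectories}"""

def ruler_scores(task: str, trajectories: list[str],
                 judge: Callable[[str], str]) -> list[float]:
    """RULER-style relative scoring: one judge call ranks a whole group
    of rollouts, yielding rewards without per-example labels.
    `judge` is any text-in/text-out LLM call you supply."""
    body = "\n\n".join(f"[{i + 1}]\n{t}" for i, t in enumerate(trajectories))
    reply = judge(RULER_PROMPT.format(n=len(trajectories), task=task,
                                      trajectories=body))
    scores = json.loads(reply)
    assert len(scores) == len(trajectories), "judge returned wrong count"
    return [float(s) for s in scores]
```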
Q1 2026
Planned
- Simulated mailbox environment for email agent rollouts
- User behavior modeling for multi-turn agent support
- Multimodal support: document and image understanding
Q2 2026
Planned
- Simulated CRM/ERP environments for sales and support agents
- Multimodal support: voice agent evaluation
- Bring your own RL gym / simulation environments
- Multi-agent system evaluation
Q3 2026
Planned
- Simulated browser environment for web automation agents
- Multimodal support: video agent evaluation
- Eval marketplace (share and discover evals)
- Enterprise and self-hosted deployment options
Help Shape the Future
iofold is open source and community-driven. We want to hear from you:
- Vote on features in GitHub Discussions
- Propose new ideas in GitHub Issues
- Contribute code via pull requests
- Share your use cases in GitHub Discussions