We present an outcome-driven fine-tuning framework that enhances the forecasting capabilities of large language models (LLMs) without relying on human-curated reasoning samples.
Key Insight: Our method improves prediction accuracy by 7–10% over baseline models, bringing a 14B-parameter model on par with frontier models such as GPT-4o, while using fewer than 10K training samples.