Blockchain News

Together AI Launches DSGym Framework for Training Data Science AI Agents

by admin
01/26/2026
Rebeca Moen
Jan 26, 2026 23:09

Together AI’s DSGym framework benchmarks LLM agents on 90+ bioinformatics tasks and 92 Kaggle competitions. Its fine-tuned 4B-parameter model is competitive with much larger rivals on several benchmarks.




Together AI has released DSGym, a framework for evaluating and training AI agents that perform data science tasks autonomously. It includes more than 90 bioinformatics challenges and 92 Kaggle competition datasets, and it provides standardized benchmarks intended to address the fragmentation of existing evaluation methods.

The standout claim: Together AI’s 4-billion-parameter model, trained using DSGym’s synthetic trajectory generation, is competitive on certain benchmarks with models 50 times its size.

Benchmark Results Show Surprising Efficiency

The published benchmarks show a small fine-tuned model closing much of the gap to far larger ones. Together AI’s Qwen3-4B-DSGym-SFT-2k model, fine-tuned using the framework, scored 59.36% on QRData-Verified and 77.78% on DABStep-easy. That puts it ahead of the base Qwen3-4B-Instruct model (45.27% and 58.33% respectively) and makes it competitive with models such as DeepSeek-V3.1 and GPT-OSS-120B on several metrics.

Claude 4.5 Sonnet currently leads on the harder tasks, scoring 37.04% on DABStep-hard against 33.07% for the fine-tuned 4B model. A gap of under four percentage points is small given the large difference in model scale.

Kimi-K2-Instruct posted the highest QRData-Verified score at 63.68%, while GPT-4o achieved 92.26% on DAEval-Verified, which suggests that no single model leads across all task types.
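To make the comparisons above concrete, here is a minimal sketch that computes the gaps in percentage points. It uses only the scores quoted in this article; the dictionary layout and the helper name `gap_in_points` are illustrative choices, not part of DSGym.

```python
# Scores in percent, copied from the figures reported in this article.
# Nothing here is measured independently.
REPORTED = {
    "Qwen3-4B-Instruct": {"QRData-Verified": 45.27, "DABStep-easy": 58.33},
    "Qwen3-4B-DSGym-SFT-2k": {
        "QRData-Verified": 59.36,
        "DABStep-easy": 77.78,
        "DABStep-hard": 33.07,
    },
    "Claude 4.5 Sonnet": {"DABStep-hard": 37.04},
    "Kimi-K2-Instruct": {"QRData-Verified": 63.68},
}


def gap_in_points(leader: str, trailer: str, benchmark: str) -> float:
    """Return leader minus trailer on one benchmark, in percentage points."""
    return round(REPORTED[leader][benchmark] - REPORTED[trailer][benchmark], 2)


if __name__ == "__main__":
    tuned, base = "Qwen3-4B-DSGym-SFT-2k", "Qwen3-4B-Instruct"
    # Gain from fine-tuning on the two benchmarks where both scores are given.
    print(gap_in_points(tuned, base, "QRData-Verified"))
    print(gap_in_points(tuned, base, "DABStep-easy"))
    # Remaining gap to the leader on the hard split.
    print(gap_in_points("Claude 4.5 Sonnet", tuned, "DABStep-hard"))
```

On these figures, fine-tuning adds roughly 14 points on QRData-Verified and roughly 19 points on DABStep-easy, while the fine-tuned model trails Claude 4.5 Sonnet by about 4 points on DABStep-hard.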

Why This Matters for AI Development

DSGym tackles a real problem in the AI agent space. Current benchmarks suffer from inconsistent evaluation interfaces and limited task diversity, making it difficult to compare agent performance meaningfully. The framework’s modular architecture allows researchers to add new tasks, agent scaffolds, and tools without rebuilding from scratch.
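One common way to get this kind of extensibility is a registry that new tasks plug into while the evaluation loop stays unchanged. The sketch below illustrates that general pattern only. The article does not describe DSGym's actual interfaces, so every name here (`TASKS`, `register_task`, `evaluate`, the toy task and agent) is hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: task name -> function that checks an agent's answer.
# This is NOT DSGym's API, only an illustration of a plug-in design.
TASKS: Dict[str, Callable[[str], bool]] = {}


def register_task(name: str):
    """Decorator that adds a checker to the registry under a unique name."""
    def decorator(check: Callable[[str], bool]) -> Callable[[str], bool]:
        if name in TASKS:
            raise ValueError(f"task already registered: {name}")
        TASKS[name] = check
        return check
    return decorator


@register_task("mean_of_1_to_4")
def check_mean(answer: str) -> bool:
    # Toy task: the mean of 1, 2, 3, 4 is 2.5.
    return abs(float(answer) - 2.5) < 1e-6


def evaluate(agent: Callable[[str], str]) -> Dict[str, bool]:
    """Run one agent on every registered task through the same interface."""
    return {name: check(agent(name)) for name, check in TASKS.items()}


def toy_agent(task_name: str) -> str:
    # Stand-in for an LLM agent; it returns a fixed answer for any task.
    return "2.5"


if __name__ == "__main__":
    print(evaluate(toy_agent))
```

Adding a task in this design means writing one decorated function, and every agent is then scored on it through the same `evaluate` call.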

The execution-verified data synthesis pipeline is particularly notable. Rather than training on static datasets, the system generates synthetic training trajectories and validates them by actually executing their code, which reduces the garbage-in, garbage-out problem that hampers many AI training pipelines.
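The idea of execution-verified filtering can be sketched in a few lines: run each candidate trajectory's code and keep it only if it executes cleanly and produces the expected answer. This is a simplified illustration under stated assumptions, not Together AI's pipeline. The `Trajectory` fields, the function names, and the exact-match check on standard output are all hypothetical, and a real system would run untrusted code in a sandbox rather than a plain subprocess.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trajectory:
    # Hypothetical record of one synthetic solution attempt.
    task_id: str
    code: str             # Python source the agent produced
    expected_answer: str  # reference answer for the task


def run_code(code: str, timeout_s: float = 10.0) -> Optional[str]:
    """Execute code in a fresh interpreter; return stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def keep_verified(candidates: List[Trajectory]) -> List[Trajectory]:
    """Keep only trajectories whose executed output equals the expected answer."""
    return [t for t in candidates if run_code(t.code) == t.expected_answer]


if __name__ == "__main__":
    candidates = [
        Trajectory("t1", "print(sum([1, 2, 3]))", "6"),       # correct
        Trajectory("t2", "print(sum([1, 2, 3]) + 1)", "6"),   # wrong answer
        Trajectory("t3", "raise RuntimeError('boom')", "6"),  # crashes
    ]
    print([t.task_id for t in keep_verified(candidates)])
```

Only trajectories that survive this filter would be used as training data, so code that crashes or yields the wrong result never reaches the fine-tuning set.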

For companies building AI-powered data analysis tools, DSGym provides a standardized way to measure progress. The bioinformatics focus (DSBio) and prediction task coverage (DSPredict) extend beyond generic coding benchmarks into domain-specific applications where AI agents could deliver real productivity gains.

What’s Next

The framework is positioned as an evolving testbed rather than a static benchmark suite. Together AI has emphasized its extensibility, suggesting the company will continue adding task categories and evaluation metrics. With AI agent development accelerating across the industry, a common evaluation standard could help separate genuine capability improvements from benchmark gaming, though that is always easier said than done.

© 2025 Blockchainews. All Rights Reserved
