arXiv AI recent: Towards Verifiable Agentic Data Science: Solving Irregular TSQA Via Tool-Grounded Reasoning
Researchers introduced IRTS-ToolBench, a benchmark for irregular time series question answering (TSQA). IRTS-ToolBench consists of 1,700 questions across 10 task types and 13 domains. The...
Time series data in real-world deployments is often irregular, with asynchronous observations, informative missing values, and varying sampling frequencies. Existing TSQA benchmarks mostly assume regularly sampled inputs, leaving a gap in understanding LLM and AI agent performance under irregular...
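
To make "varying sampling frequencies" concrete, here is a minimal Python sketch of one way to tell a regularly sampled series from an irregular one by inspecting the gaps between observation times. It is not taken from the paper or from IRTS-ToolBench: the function names `sampling_gaps` and `is_irregular`, the `tolerance` parameter, and the example timestamps are all hypothetical, chosen only for illustration.

```python
def sampling_gaps(timestamps):
    """Return the gaps between consecutive observation times, after sorting."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def is_irregular(timestamps, tolerance=1e-9):
    """Treat a series as irregular when its sampling gaps are not all equal.

    `tolerance` is an assumed slack for floating-point noise, not a value
    from the source.
    """
    gaps = sampling_gaps(timestamps)
    if not gaps:
        # Zero or one observation: there is no gap to compare.
        return False
    return max(gaps) - min(gaps) > tolerance


# Hypothetical observation times (in hours) for two signals.
regular_times = [0.0, 1.0, 2.0, 3.0, 4.0]    # fixed one-hour sampling
irregular_times = [0.0, 0.5, 3.2, 3.3, 9.0]  # event-driven sampling

print(is_irregular(regular_times))
print(is_irregular(irregular_times))
```

A check like this only captures uneven spacing within a single series. Asynchronous observations across several variables and informative missing values, which the summary also mentions, would need additional handling.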