10 Matching Annotations
  1. Last 7 days
    1. ritic model to evaluate each step of the interaction based on three dimensions: logical soundness, tool-call accuracy, and informational gain.

      turn level critic

    2. In addition, we train a more capable CoT Reconstruction model to generate cleaner and more faithful reasoning traces from refined answers

      trained a COT generating model.

    3. easoning-focused models often struggle with long-horizon interactions (e.g., deep search) [ 17], while code or agent specialized models typically lack robust general reasoning abilities

      the problem it solved.

    1. 3.1.3 Factor Backtesting. To establish the ground truth for factor behavior, we perform a backtest on the entire factor pool U over the historical window. For each factor 𝑖, we obtain a quantitative performance vector 𝑃𝑖 , which includes key metrics such as returns, volatility, and decay characteristics. This dataset serves as the ob- jective basis for linking market memory with factor effectiveness

      full shit. In sample, leakage.

  2. Jun 2026