6 Matching Annotations
  1. Last 7 days
    1. Next, hierarchical active perception collects targeted evidence via coarse-to-fine spatial crops

      Minor item: there is no arrow from the bottom-left HAP (2) to EBA (3); is this intended? The figure is hard to follow in its current state: step 1 splits into two step-2 blocks, and only one of them connects to step 3. The visual could be made easier to follow.

    2. Finally, while TIR-Flow significantly elevates reasoning ceilings, the iterative nature of System-2 active perception entails a modest trade-off in inference speed compared to static, single-pass baselines.

      Speed is mentioned as a limitation, but the paper does not include any measurements of it.

    3. 4 Experiment

      A computational analysis is missing. The study should also compare this work against other methods (e.g., SmartSight/STTM) in terms of computational performance and overhead.

    4. 47.6

      The maximum reported for SmartSight is 51.9 with Video-R1, versus 47.6 with Qwen2.5-VL. The authors may have chosen the Qwen2.5-VL result for a fair comparison, but it is not SmartSight's best performance in any case. Needs review/revision. Also, SmartSight reports 56.2 vs. 60.2 with Video-R1 and Qwen2.5-VL, respectively.
