4 Matching Annotations
  1. Sep 2023
    1. Due to the inherent difficulty of tool learning, even the most sophisticated LLM, i.e.,GPT-4, has a low pass rate for complex instructions

      low pass rate similar to code generation. however, code generation take advantage of pre-training, tool usage is not

    1. When an answer has multiple spans,the numeracy-focused F1 performs a one-to-one alignmentgreedily based on the bag-of-word overlap on the set spansto ensure every current span can get the highest F1 value,then compute micro-average F1 over each span

      what's the meaning?