multi-tool usage
partially included in the meaning of diversity?
prompt ChatGPT to generate diverse instructions for these APIs
how?
Due to the inherent difficulty of tool learning, even the most sophisticated LLM, i.e., GPT-4, has a low pass rate for complex instructions
low pass rate, similar to code generation. however, code generation takes advantage of pre-training, while tool usage does not
When an answer has multiple spans, the numeracy-focused F1 performs a one-to-one alignment greedily based on the bag-of-word overlap on the set spans to ensure every current span can get the highest F1 value, then compute micro-average F1 over each span
what's the meaning?
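one possible reading, as a minimal Python sketch. The function names (`tokens`, `overlap`, `span_f1`, `greedy_align`, `multi_span_f1`) are hypothetical and not from the paper. Assumptions: spans are tokenized by lowercasing and splitting on whitespace; "greedily" means repeatedly taking the highest-scoring unused (predicted, gold) pair; "micro-average" means pooling token counts over all aligned pairs before computing precision and recall, with unmatched spans counting only in the denominators. The numeracy-specific handling of numbers is left out.

```python
from collections import Counter


def tokens(span):
    # Assumed tokenization: lowercase, split on whitespace.
    return span.lower().split()


def overlap(pred, gold):
    # Bag-of-words overlap: size of the multiset intersection of tokens.
    return sum((Counter(tokens(pred)) & Counter(tokens(gold))).values())


def span_f1(pred, gold):
    # Token-level F1 between one predicted span and one gold span.
    common = overlap(pred, gold)
    if common == 0:
        return 0.0
    precision = common / len(tokens(pred))
    recall = common / len(tokens(gold))
    return 2 * precision * recall / (precision + recall)


def greedy_align(preds, golds):
    # Score every (pred, gold) pair, then pick pairs from best to worst,
    # using each span at most once (one-to-one alignment).
    pairs = sorted(
        ((span_f1(p, g), i, j)
         for i, p in enumerate(preds)
         for j, g in enumerate(golds)),
        key=lambda t: -t[0],
    )
    used_pred, used_gold, aligned = set(), set(), []
    for score, i, j in pairs:
        if score > 0 and i not in used_pred and j not in used_gold:
            aligned.append((i, j))
            used_pred.add(i)
            used_gold.add(j)
    return aligned


def multi_span_f1(preds, golds):
    # Micro-average: pool overlapping tokens across aligned pairs, and
    # divide by all predicted / all gold tokens (unmatched spans included).
    aligned = greedy_align(preds, golds)
    true_pos = sum(overlap(preds[i], golds[j]) for i, j in aligned)
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(len(tokens(p)) for p in preds)
    recall = true_pos / sum(len(tokens(g)) for g in golds)
    return 2 * precision * recall / (precision + recall)
```

for example, with predictions `["25 yards", "Tom Brady"]` and gold spans `["Tom Brady", "25 yard line"]`, "Tom Brady" is paired with "Tom Brady" and "25 yards" with "25 yard line", so 3 tokens overlap out of 4 predicted and 5 gold tokens.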