13 Matching Annotations
  1. Jan 2026
    1. In July reasoning models from both OpenAI and Google Gemini achieved gold medal performance in the International Math Olympiad, a prestigious mathematical competition held annually (bar 1980) since 1959. This was notable because the IMO poses challenges that are designed specifically for that competition. There’s no chance any of these were already in the training data! It’s also notable because neither of the models had access to tools; their solutions were generated purely from their internal knowledge and token-based reasoning capabilities.

      International Math Olympiad-style questions can be answered by OpenAI and Gemini models without tools and without the challenges being in their training data.

  2. Jun 2024
    1. there is essentially this benchmark called the MATH benchmark, a set of difficult mathematics problems from high school math competitions, and when the benchmark was released in 2021, GPT-3 only got 5%

      for - stats - AI - evolution - Math benchmark

      stats - AI - evolution - MATH benchmark
        • 2021: GPT-3 scored 5%
        • 2022: scored 50%
        • 2024: Gemini 1.5 Pro scored 90%

  3. Oct 2023
    1. Beyond just audio recordings, so for that reason two of our senior researchers, Benjamin Hoffman and Maddie Cusimano, have also developed a bio-logger benchmark dataset, and so a bio-logger is an animal-borne tag like the one in the image on the right here, and these produce very valuable data because they can inform us about animal ecophysiology and allow us to improve conservation by monitoring animal movements and behaviors with very high resolution
      • for: BEBE, Bio-logger Ethogram Benchmark
    2. BEANS, and this is a benchmark of animal sounds, and it's a collection of audio recordings from more than 250 species, and this large aggregate dataset is a way to test tools for classification and detection, and these are outstanding problems in bioacoustics that we desperately need solutions to
      • for: BEANS, Benchmark of Animal Sounds
  4. Sep 2023
      • for: animal communication, AI - animal communication, bioacoustic

      • title: BEANS: The Benchmark of Animal Sounds

      • author

        • Masato Hagiwara
        • Benjamin Hoffman
        • Jen-Yu Liu
        • Maddie Cusimano
      • Abstract

        • The use of machine learning (ML) based techniques has become increasingly popular in the field of bioacoustics over the last years.
        • Fundamental requirements for the successful application of ML based techniques are curated, agreed upon, high-quality datasets and benchmark tasks to be learned on a given dataset.
        • However, the field of bioacoustics so far lacks such public benchmarks which cover multiple tasks and species to measure the performance of ML techniques in a controlled and standardized way and that allows for benchmarking newly proposed techniques to existing ones.
        • Here, we propose BEANS (the BEnchmark of ANimal Sounds), a collection of bioacoustics tasks and public datasets, specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics.
        • The benchmark proposed here consists of two common tasks in bioacoustics:
          • classification and
          • detection.
        • It includes 12 datasets covering various species, including
          • birds,
          • land and marine mammals,
          • anurans, and insects.
        • In addition to the datasets, we also present the performance of a set of standard ML methods as the baseline for task performance.
        • The benchmark and baseline code is made publicly available at
        • in the hope of establishing a new standard dataset for ML-based bioacoustic research.
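
The two task types named in the abstract, classification and detection, can be illustrated with a minimal scoring sketch. The abstract does not say which metrics the benchmark uses, so the choices here (plain accuracy for classification, frame-level F1 for detection) are assumptions, and the function names and toy labels are hypothetical and not taken from the benchmark's code.

```python
def accuracy(y_true, y_pred):
    """Classification: fraction of clips whose predicted label matches the reference."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def detection_f1(true_frames, pred_frames):
    """Detection: micro-averaged F1 over frames.

    Each element is the set of labels active in one time frame
    (an assumed representation, chosen for illustration).
    """
    if len(true_frames) != len(pred_frames):
        raise ValueError("frame sequences must have the same length")
    tp = fp = fn = 0
    for t, p in zip(true_frames, pred_frames):
        tp += len(t & p)  # labels correctly detected in this frame
        fp += len(p - t)  # labels predicted but not present
        fn += len(t - p)  # labels present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, made up for this example.
clf_true = ["bird", "frog", "bat", "bird"]
clf_pred = ["bird", "frog", "bird", "bird"]
det_true = [{"whale"}, set(), {"whale", "dolphin"}]
det_pred = [{"whale"}, {"dolphin"}, {"whale"}]

clf_score = accuracy(clf_true, clf_pred)
det_score = detection_f1(det_true, det_pred)
print(f"classification accuracy: {clf_score:.2f}")
print(f"detection F1: {det_score:.2f}")
```

Reporting one score per dataset and task, as in this sketch, is what lets a newly proposed method be compared against the baselines on equal terms.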
  5. Dec 2022
  6. Feb 2020
  7. Jan 2020
  8. Apr 2017