110 Matching Annotations
  1. Last 7 days
    1. SubQ 1M-Preview scores 95% accuracy, compared to 94.8% for Claude Opus 4.6

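      On the RULER 128K benchmark, SubQ 1M-Preview reaches 95% accuracy, slightly ahead of Claude Opus 4.6 at 94.8%. This data point suggests that SubQ has reached frontier-level long-context understanding while breaking through the performance bottleneck of traditional quadratic-scaling models.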

  2. May 2026
  3. Apr 2026
    1. Claude Opus 4.7 is a solid upgrade with no regressions for Vercel. It's phenomenal on one-shot coding tasks, more correct and complete than Opus 4.6, and noticeably more honest about its own limits.

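      Its outstanding performance on one-shot coding tasks and its honest recognition of its own limits show progress in both accuracy and self-awareness. Accurately assessing its own capabilities is essential for building reliable AI systems.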

    1. Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.

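      A prediction accuracy of 88% is an impressive data point: it indicates that ADeLe can not only explain existing performance but also reliably predict how a model will perform on new tasks. This accuracy far exceeds that of traditional methods and gives a strong predictive tool for deploying AI systems reliably; it may be an important breakthrough in AI evaluation.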

    2. Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.

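      Surprisingly, the ADeLe method can predict an AI model's performance on new tasks with about 88% accuracy, including for advanced large models such as GPT-4o and Llama-3.1. This predictive power far exceeds traditional evaluation methods and is a major step forward for AI performance evaluation, letting researchers anticipate more reliably how models will behave on unseen tasks.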

    1. TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction

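      Most people assume that KV cache compression involves an unavoidable trade-off between accuracy and efficiency, but the authors' TriAttention method matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or a 10.7x memory reduction. This result challenges the field's prevailing efficiency-accuracy trade-off paradigm and suggests that the traditional limit can be broken with new methods.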

    2. TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction

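      Most people assume that heavily compressing the KV cache must sacrifice reasoning accuracy, but the authors claim that TriAttention achieves a 10.7x memory reduction while keeping the same reasoning accuracy as Full Attention. This result challenges the industry's assumed trade-off between KV compression and accuracy.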

    1. We've seen customers go from 10-20% field accuracy with a frontier model to 99-100% just by switching to using Reducto's Deep Extract.

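      Most people assume that going from a frontier model to near-perfect accuracy requires a fundamental technical breakthrough or large amounts of training data. But the author claims that simply switching to Deep Extract raises accuracy from 10-20% to 99-100%. A gain of this size runs against the improvement curve the industry usually expects and hints that existing methods may have fundamental flaws.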

  4. Mar 2026
    1. Aortic dissection typically presents acutely with sudden, severe tearing chest or back pain, often described as lancinating in quality. [5-6] Approximately 50% of patients with thoracic aortic aneurysm may progress to dissection without timely intervention. [5] In contrast, thoracic aortic aneurysm is usually asymptomatic and discovered incidentally during physical examination or imaging for other indications. [5]

  5. Sep 2025
  6. Feb 2025
  7. Jan 2025
  8. Nov 2024
    1. we now realize the base pairs come to join each other up together as the system unravels and forms a new pair of DNA molecules well up to a point it does and that point is known to be accurate to about one in 10,000 base pairs now if you and I wrote an article and there was only one typo in a 10,000w article we'd be very pleased but this is nowhere near enough for a DNA sequence of three billion base pairs there would be half a million at least of Errors

      for - DNA replication accuracy - 1 in 10,000 - too high for successful replication - another higher level mechanism to correct for these errors - need a whole body for that - Denis Noble
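
      The arithmetic behind this point can be sketched as below. This is a minimal illustration, not part of the talk: the function name `expected_errors` is hypothetical, and the inputs are the round figures quoted above (about three billion base pairs, about one error per 10,000). With those round figures the product comes to roughly 300,000 uncorrected errors per copy, the same order of magnitude as the spoken estimate.

```python
def expected_errors(genome_length: int, error_rate: float) -> float:
    """Expected number of copying errors, assuming each base pair
    is miscopied independently with probability error_rate."""
    if genome_length < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("genome_length must be >= 0 and error_rate in [0, 1]")
    return genome_length * error_rate


# Round figures taken from the quoted passage (assumed, not measured here).
GENOME_BASE_PAIRS = 3_000_000_000
ERROR_RATE = 1 / 10_000

errors = expected_errors(GENOME_BASE_PAIRS, ERROR_RATE)
print(f"Expected errors per copy: about {round(errors):,}")
```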

  9. Oct 2024
    1. In the beginning of the film, a message appears that states the film encompasses historical facts as well as free personal impressions about Muhammad. Accordingly, some of the film's events did not actually take place in real life, but are indeed similar to events in Muhammad's biography.[6] Majidi stated that the objective behind presenting these scenes is to show that the whole existence could feel Muhammad's presence as well as his mercy.

      Interesting. This is a general problem with historical movies. There is almost no such thing as objectivity. By making a movie, you make choices: you select what makes the cut and what doesn't. By doing so, you form a certain image of the prophet, in this case. The free personal impressions of Majidi are in fact a way to represent a certain image of Muhammad.

      From what I have read, Majidi is blamed for putting forward a Shi'ite Muhammad in the movie. Perhaps his free personal impressions are expressed in this regard?

  10. Sep 2024
  11. Apr 2024
    1. The initial focus is on the learner’s home language (it’s currently being piloted with grade 3 isiZulu-speaking learners at a school in Soweto, Johannesburg). English is introduced gradually as a target language. The language and speech technology has been developed to provide linguistic accuracy and is grounded in teaching principles.

      This application is for Grade 3 and up. It doesn't solve the problem I identified, which is that by Grade 2 most learners can't read for meaning. Stepping in early is key, so there is still viability for an application like mine.

  12. Mar 2024
    1. The development of the card system and its more universal adoption within recent years is undoubtedly due in the main to the development in modern business and factory organisation; it may be regarded as an offspring of manufacture in quantities. (Massenfabrikation, Gross-industrie.) The recognised principle in manufacture in quantities is maximum of output with minimum of labour. The means to attain this end is specialisation, which in its turn yields greater precision and accuracy as its result. All this is equally applicable to the card system, and the last factor, greater precision and accuracy, is one of its most conspicuous claims.

      Julius Kaiser contemporaneously posits that mass manufacture and maximizing efficiency (greater output for minimum input) are the primary drivers of card index system use in the early 20th century. These also improve both precision and accuracy in handling information which allow for better company or factory operation, which would have been rising concerns for businesses and manufacturing operations at the rise of scientific management during the time period.

  13. Feb 2024
  14. Jan 2024
    1. Accuracy of the slide rule. From the discussion of § 2 it appears that we read four figures of a result on one part of the scale and three figures on the remaining part. Assuming that the error of a reading is one tenth of the smallest interval following the left-hand index of D, we conclude that the error is roughly 1 in 1000 or one tenth of one per cent. The effect of the assumed error in judging a distance is inversely proportional to the length of the rule. Hence we associate with a 10-inch slide rule an error of one tenth of one per cent, with a 20-inch slide rule an error of one twentieth of one per cent or 1 part in 2000, and with the Thacher Cylindrical slide rule an error of a hundredth of one per cent or one part in 10,000. The accuracy obtainable with the 10-inch slide rule is sufficient for many practical purposes; in any case the slide rule result serves as a check.

      The accuracy of most 10 inch slide rules is approximately 1 in 1000 or one tenth of one percent.

      Because the error in judging a distance is inversely proportional to the length of a slide rule, longer slide rules have proportionally smaller errors: a 10 inch slide rule has an error of 1 in 1000, a 20 inch rule has an error of 1 in 2000, and larger rules can be accurate to within 1 in 10,000 or better.
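
      The inverse proportionality can be sketched in a few lines. This is an illustration under the passage's own assumption (1 part in 1000 for a 10-inch rule); the function name `relative_error` and its parameters are hypothetical. The 100-inch entry is simply the effective scale length this model implies for 1 part in 10,000; it is derived from the proportionality, not stated in the source.

```python
def relative_error(length_inches: float,
                   base_length: float = 10.0,
                   base_error: float = 1e-3) -> float:
    """Relative reading error of a slide rule scale, assuming the error
    is inversely proportional to scale length and equals base_error
    (1 part in 1000) at base_length (10 inches)."""
    if length_inches <= 0:
        raise ValueError("length_inches must be positive")
    return base_error * base_length / length_inches


# 10 and 20 inches come from the passage; 100 inches is the effective
# length implied by this model for an error of 1 part in 10,000.
for length in (10, 20, 100):
    err = relative_error(length)
    print(f"{length}-inch scale: about 1 part in {round(1 / err):,}")
```

      Doubling the scale length halves the relative error, which is why the longer rules in the passage gain one extra part in accuracy per unit of added length.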

  15. Aug 2022
  16. May 2022
  17. Apr 2022
  18. Mar 2022
  19. Dec 2021
    1. Tom Moultrie. (2021, December 12). Given the comedic misinterpretation of the South African testing data offered by @BallouxFrancois (and many others!) last night ... I offer some tips having contributed to the analysis of the testing data for the @nicd_sa since April last year. (1/6) [Tweet]. @tomtom_m. https://twitter.com/tomtom_m/status/1469954015932915718

  20. Nov 2021
  21. Oct 2021
  22. Sep 2021
  23. Jul 2021
  24. Jun 2021
  25. May 2021
  26. Mar 2021
  27. Feb 2021
  28. Nov 2020
  29. Oct 2020
  30. Sep 2020
    1. Leuker, C., Hertwig, R., Gumenik, K., Eggeling, L. M., Hechtlinger, S., Kozyreva, A., Samaan, L., & Fleischhut, N. (2020). Wie informiert sich die Bevölkerung in Deutschland rund um das Coronavirus? Umfrage zu vorherrschenden Themen und Gründen, dem Umgang mit Fehlinformationen, sowie der Risikowahrnehmung und dem Wissen der Bevölkerung rund um das Coronavirus (Version 5, p. 966670) [Application/pdf]. Max-Planck-Institut für Bildungsforschung. https://doi.org/10.17617/2.3247925

  31. Aug 2020
  32. Jul 2020
  33. Jun 2020
  34. May 2020
    1. Mei, X., Lee, H.-C., Diao, K., Huang, M., Lin, B., Liu, C., Xie, Z., Ma, Y., Robson, P. M., Chung, M., Bernheim, A., Mani, V., Calcagno, C., Li, K., Li, S., Shan, H., Lv, J., Zhao, T., Xia, J., … Yang, Y. (2020). Artificial intelligence for rapid identification of the coronavirus disease 2019 (COVID-19). MedRxiv, 2020.04.12.20062661. https://doi.org/10.1101/2020.04.12.20062661

  35. Apr 2020
  36. Mar 2019
  37. Nov 2018
    1. Is Robustness the Cost of Accuracy? -- A Comprehensive Study on the Robustness of 18 Deep Image Classification Models

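      A great paper: information-rich, with lots of figures. Really eye-catching.

      It examines the robustness and accuracy of 18 models. There are many conclusions, for example: model architecture is an important factor in both robustness and accuracy (which seems obvious); adding "depth" on top of a similar architecture improves robustness only slightly; some models (the VGG family) show strong adversarial-example transferability...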

  38. Aug 2018
    1. Similarly, theories of informant accuracy posit that those with accurate domain knowledge provide more reliable responses with less error than those without such knowledge (Romney et al., 1986; Romney and Weller, 1984; Sudman et al., 1996; Weller and Romney, 1988). Their observations will cluster around a single “truth” while inaccurate observations (i.e. error) will be randomly scattered around the truth; that is, error is inhomogeneous and does not typically converge around a small number of data points.

      Informant accuracy is also new to me.

      Get these papers. This framework might help strengthen the information quality/data validity as a broader notion of my sociotemporal representations research.

  39. May 2017
  40. Jan 2017
    1. AI criticism is also limited by the accuracy of human labellers, who must carry out a close reading of the ‘training’ texts before the AI can kick in. Experiments show that readers tend to take longer to process events that are distant in time or separated by a time shift (such as ‘a day later’).
  41. Oct 2013