71 Matching Annotations
  1. Last 7 days
    1. Composer 2.5 is exceptionally intelligent & up to 10x more efficient than similarly capable models.

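      Cursor claims that its Composer 2.5 model is up to 10x more efficient than similarly capable models. This is a bold claim, but it comes without concrete benchmark data or a stated basis for comparison. Some optimization may well exist, but a 10x improvement needs more detailed verification.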

    2. Pulled the trigger today & switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models. Saves us millions of $ & we're actually seeing an _increase_ in performance on many core use cases.

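      Lindy switched entirely to DeepSeek v4, saving millions of dollars while performance on many core use cases improved. The case illustrates the significant economic advantage of moving from closed models to open-source ones, but it gives no specific savings figure and no concrete performance data points.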

    1. Kimi K2.6, the best-performing open-source model, achieves just 3.8% on Diamond, 16% on Main and 37% on Extended.

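      There is a significant gap between open-source and closed-source models: the best open-source model trails substantially at all three difficulty levels. Its 37% on the Extended set is still well below Claude Opus's 51.8%, which highlights the challenge open-source models face on code quality evaluation, though they also lack training data on the scale available to commercial models.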

    2. Claude Opus 4.8, achieves a score of only 13.4%. Other models score significantly lower: GPT-5.5 receives 6.3%, Gemini 3.1 Pro 4.7%, and others even less.

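      These scores show that current state-of-the-art AI models perform poorly on production-grade code quality evaluation; even the best model reaches only 13.4%. This suggests AI code generation still has large room for improvement, but without an absolute scoring standard it is hard to judge what the score actually means.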

  2. May 2026
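    1. At 2 billion parameters, compared against autoregressive models of the same size and the 100-billion-parameter LLaDA2.0, the continuous approach's scaling curve is healthy and effective.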

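      This is an important model-scale comparison. A 2-billion-parameter continuous model that can rival a 100-billion-parameter model suggests the continuous-space paradigm has a large advantage in parameter efficiency. It hints that future AI models may stop chasing parameter count alone and turn toward more efficient architecture design, with far-reaching effects on how the industry allocates resources and chooses technical directions.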

    1. Team, $24.99/user/month: a shared workspace with admin controls and more storage.

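      The Team plan is priced at $24.99 per user per month, about 67% higher than the individual plan. The price difference reflects the value of the team collaboration features, including admin controls and more storage. Compared with the team plans of other AI tools on the market, this price is mid-range, suggesting that Mistral is trying to balance price and value to attract small and medium-sized business customers.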

    1. Over the past six months, OpenAI forward deployed engineers and researchers along with Thrive Holdings' engineers collaborated to build Tax AI

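      A six-month development cycle indicates a long-term, complex project. The phrase 'forward deployed engineers' indicates that the OpenAI team worked in an embedded fashion, which helps it understand actual business needs better. This kind of cross-company collaboration may become a standard way of building AI applications for specialized domains.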

    1. V4-Flash by default for cheap iteration; /pro lifts a single turn to V4-Pro

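      This data point mentions two model tiers: V4-Flash is the default for cheap iteration, and the /pro command lifts a single turn to V4-Pro. Although the tiers are named, no concrete comparison of their performance, capability, or cost is given. Tiered pricing of this kind is common in AI tools, but the lack of detail makes it hard to evaluate.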

    1. search over millions of model configurations to jointly optimize over perceptual quality and on-device runtime

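      Searching over millions of model configurations indicates large-scale experimentation and optimization, which strengthens the credibility of the results. However, the article gives no specifics on the search method, optimization algorithm, or compute resources, which makes it hard to assess the efficiency and rigor of the process.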

    1. When we looked, use of “goblin” in ChatGPT had risen by 175% after the launch of GPT‑5.1, while “gremlin” had risen by 52%.

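      The striking figures show that a seemingly harmless preference can spread quickly within a model, which highlights the importance of monitoring model behavior and responding promptly when it changes.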

  3. Apr 2026
    1. 🔹 **DeepSeek-V4-Flash:** 284B total / 13B active params. Your fast, efficient, and economical choice.

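      DeepSeek-V4-Flash is markedly smaller than the Pro version: 284 billion total parameters and 13 billion active. Its ratio of active to total parameters is about 4.6%, slightly higher than the Pro version's. This design lets it respond faster and cost less while maintaining performance, which suits applications that need quick responses.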

    2. 🔹 **DeepSeek-V4-Pro:** 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.

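      This gives the specific parameter figures for DeepSeek-V4-Pro: 1.6 trillion total parameters and 49 billion active. That scale far exceeds most open-source models and approaches the top closed-source models. The ratio of active to total parameters is about 3%, which indicates sparse activation and may be the key to its balance of performance and efficiency.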
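The two ratios quoted in these notes can be checked with a few lines of arithmetic. The sketch below uses only the parameter counts from the announcement; the function and variable names are illustrative and not part of any DeepSeek tooling.

```python
def active_ratio(active_params: float, total_params: float) -> float:
    """Return the share of parameters active per token, as a percentage."""
    return 100.0 * active_params / total_params


# Figures quoted in the announcement, in billions of parameters.
models = {
    "DeepSeek-V4-Flash": (13, 284),
    "DeepSeek-V4-Pro": (49, 1600),  # 1.6T total
}

ratios = {name: active_ratio(a, t) for name, (a, t) in models.items()}
for name, pct in ratios.items():
    print(f"{name}: {pct:.1f}% of parameters active")
```

Rounded to one decimal place this gives 4.6% for Flash and 3.1% for Pro, consistent with the "about 4.6%" and "about 3%" in the notes.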

    1. Two variants are available: **Sakana Fugu Mini 🐟**, optimized with latency in mind, and **Sakana Fugu Ultra 🐡**, the full orchestration system, optimized for performance for demanding tasks.

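      The article mentions two variants, Mini (optimized for latency) and Ultra (optimized for performance), but gives no concrete metrics for the difference, such as the percentage reduction in latency or the gain in throughput. Without quantified figures it is hard to assess how the two variants differ in real-world use.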

    2. GPQAD | 94.4 | 90.9 | 92.7 | 92.4 | **95.1**
       LCBv6 | 90.3 | 92.1 | 92.4 | 90.4 | **93.2**
       SWEPro | 48.4 | 51.2 | _53.4_ | 51.3 | **54.2**

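      The comparison table shows Sakana Fugu Ultra ahead of its competitors on all three benchmarks: 95.1% on GPQAD (above Gemini 3.1's 94.4%), 93.2% on LCBv6 (above GPT 5.4's 92.1%), and 54.2% on SWEPro (above Opus 4.6's 53.4%). These figures suggest its multi-model orchestration strategy does deliver a performance gain, although its lead over the best competing score in each row is under one point (0.7, 0.8, and 0.8 respectively).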

    1. The best-performing model across these three metrics was a pair of independent linear trends: one for reasoning models and one for non-reasoning models.

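      This model-selection result (best on 100% of the three metrics) suggests that splitting models into reasoning and non-reasoning groups gives the best predictive model. It provides strong statistical evidence that reasoning capability may be a key driver of accelerating AI progress. However, the article does not explain in detail how a reasoning model is defined, which may affect the reliability of the result.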

    2. Reasoning models show both a one-off jump in performance and a roughly 2-3x faster trend compared to non-reasoning models.

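      This is an important performance comparison: reasoning models are improving 2 to 3 times faster than non-reasoning models. That is a significant acceleration, and it hints that the breakthrough in reasoning capability may mark a turning point in AI development. However, the article provides no specific benchmark data to support the multiple, so it should be treated with caution.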

    3. The best-performing model across these three metrics was a pair of independent linear trends: one for reasoning models and one for non-reasoning models.

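      This finding suggests that the development trajectories of reasoning and non-reasoning models do differ significantly. The model with separate linear trends performed best on all three metrics, beating the alternatives in 100% of cases, which provides strong statistical evidence for the argument that AI capabilities are accelerating.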
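To make "a pair of independent linear trends" concrete, here is a minimal sketch that fits one least-squares line per group and compares the slopes. The data points are made up for illustration and are not taken from the article; `fit_line` and the variable names are likewise hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# SYNTHETIC (time, score) pairs, invented purely for this example.
non_reasoning = [(0, 10.0), (1, 11.0), (2, 12.0), (3, 13.0)]
reasoning = [(2, 20.0), (3, 22.5), (4, 25.0), (5, 27.5)]

slope_nr, intercept_nr = fit_line(non_reasoning)
slope_r, intercept_r = fit_line(reasoning)

# The "one-off jump" shows up as a gap between the two lines at the same time,
# the "faster trend" as the ratio of the slopes.
jump_at_t2 = (slope_r * 2 + intercept_r) - (slope_nr * 2 + intercept_nr)
speedup = slope_r / slope_nr
print(f"jump at t=2: {jump_at_t2:.1f}, slope ratio: {speedup:.1f}x")
```

With real benchmark data, the comparison the article describes would also require scoring this two-line model against alternatives such as a single line; the sketch covers only the fitting step.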

    1. Reddit, Shutterstock, and News Corp are making hundreds of millions a year licensing their high-quality data to companies training AI, and those contracts are growing about 20 percent annually, according to their quarterly filings.

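      This figure reveals the large economic value of the AI training data market and shows that high-quality data has become a strategic asset for AI companies. Traditional content companies are turning into 'input companies' for AI, a shift that changes their business models and redefines the central place of data in the AI ecosystem.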

    1. SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself.

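      Most people assume that models with different architectures have different failure modes and weaknesses. The authors find instead that, regardless of architecture and parameter scale, SOTA models show highly consistent failure patterns on the same hard samples. This suggests the performance bottleneck comes from shared deficiencies in training data rather than from architectural differences, a finding that challenges the conventional case for model diversity.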

    2. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods including models with over 200× more parameters.

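      Most people assume that a larger model architecture necessarily brings better performance. Using only data engineering and training-strategy optimization, and keeping the 1.2B-parameter architecture unchanged, the authors surpass existing models with over 200 times more parameters. This challenges the industry consensus that 'bigger is better' and demonstrates the importance of data quality.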

  4. Apr 2024
    1. The consensus is reached in the same way as for transactions i.e. using hashgraph consensus algorithm. The only difference is, that the concerning events in the hashgraph now contain other type of data instead of transactions

      Not necessarily; how received events are stored is an implementation detail. One could keep them in a separate array on the side, which can be as efficient as an array of pointers to events, where the index into the array is the event's position in the total order.
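The storage idea in this note can be sketched as follows. This is a minimal illustration that assumes the consensus algorithm has already decided the order and hands events over one at a time; `Event`, `OrderedLog`, and their fields are hypothetical names, not part of any hashgraph implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    creator: str
    payload: bytes  # transactions or any other type of data


@dataclass
class OrderedLog:
    """Side array: index i holds the event at position i in the total order."""
    events: List[Event] = field(default_factory=list)

    def append_ordered(self, event: Event) -> int:
        """Store the next event in consensus order and return its position."""
        self.events.append(event)
        return len(self.events) - 1

    def at(self, position: int) -> Event:
        """Look up an event by its position in the total order."""
        return self.events[position]


log = OrderedLog()
first = log.append_ordered(Event("alice", b"tx-1"))
second = log.append_ordered(Event("bob", b"other-data"))
```

A Python list stores references to the event objects, so this is effectively the array of pointers the note describes, and lookup by position is a constant-time index.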

  5. Aug 2023
  6. Jul 2023
    1. Conceptual data model: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model.
  7. Jan 2023
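    1. The claim that individual learning may depend on the behavior of others highlights the importance of viewing the learning environment as a system involving multiple interacting participants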
  8. Aug 2022
  9. Apr 2022
    1. ReconfigBehSci. (2022, January 24). @STWorg @FraserNelson @GrahamMedley no worse- he took Medley’s comment that Sage model the scenarios the government asks them to consider to mean that they basically set out to find the justification for what the government already wanted to do. Complete failure to distinguish between inputs and outputs of a model [Tweet]. @SciBeh. https://twitter.com/SciBeh/status/1485625862645075970

    1. Dr Nisreen Alwan 🌻. (2020, March 14). Our letter in the Times. ‘We request that the government urgently and openly share the scientific evidence, data and modelling it is using to inform its decision on the #Covid_19 public health interventions’ @richardhorton1 @miriamorcutt @devisridhar @drannewilson @PWGTennant https://t.co/YZamKCheXH [Tweet]. @Dr2NisreenAlwan. https://twitter.com/Dr2NisreenAlwan/status/1238726765469749248

  10. Feb 2022
  11. Jan 2022
  12. Jul 2021
  13. May 2021
  14. Mar 2021
  15. Nov 2020
    1. We love dbt because of the values it embodies. Individual transformations are SQL SELECT statements, without side effects. Transformations are explicitly connected into a graph. And support for testing is first-class. dbt is hugely enabling for an important class of users, adapting software engineering principles to a slightly different domain with great ergonomics. For users who already speak SQL, dbt’s tooling is unparalleled.

      When using [[dbt]], the [[transformations]] are [[SQL statements]], which is something our team already knows.

    1. The attribution data model: In reality, it’s impossible to know exactly why someone converted to being a customer. The best thing that we can do as analysts, is provide a pretty good guess. In order to do that, we’re going to use an approach called positional attribution. This means, essentially, that we’re going to weight the importance of various touches (customer interactions with a brand) based on their position (the order they occur in within the customer’s lifetime). To do this, we’re going to build a table that represents every “touch” that someone had before becoming a customer, and the channel that led to that touch.

      One of the goals of an [[attribution data model]] is to understand why someone [[converted]] to being a customer. This is impossible to do accurately, but this is where analysis comes in.

      There are several [[approaches to attribution]]; one of them is [[positional attribution]].

      [[positional attribution]] means weighting the importance of touch points (customer interactions) based on their position within the customer lifetime.
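A minimal sketch of positional attribution, under stated assumptions: the 40/20/40 split (40% to the first touch, 40% to the last, the rest shared evenly by the middle touches) is one common choice and is not specified in the quoted passage, and the function names and channel labels are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def positional_weights(n_touches: int, first: float = 0.4, last: float = 0.4) -> List[float]:
    """Weight each touch by its position; weights sum to 1 (assumed 40/20/40 scheme)."""
    if n_touches <= 0:
        return []
    if n_touches == 1:
        return [1.0]
    if n_touches == 2:
        total = first + last
        return [first / total, last / total]
    middle = (1.0 - first - last) / (n_touches - 2)
    return [first] + [middle] * (n_touches - 2) + [last]


def attribute(touches: List[str]) -> Dict[str, float]:
    """Sum positional weights per channel for one customer's ordered touches."""
    credit: Dict[str, float] = defaultdict(float)
    for channel, weight in zip(touches, positional_weights(len(touches))):
        credit[channel] += weight
    return dict(credit)


# One customer's touches, in the order they occurred before converting.
credit = attribute(["paid_search", "email", "email", "direct"])
print(credit)
```

In a dbt project the same logic would live in a SQL SELECT over the touches table; Python is used here only to show the weighting itself.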

  16. Oct 2020
  17. Aug 2020
  18. Jul 2020
  19. May 2020
  20. Apr 2020
  21. Feb 2020
  22. Jan 2020
    1. The Web Annotation Data Model specification describes a structured model and format to enable annotations to be shared and reused across different hardware and software platforms.

      The publication of this web standard nearly three years ago was a game changer, but the game is still in progress. I look forward to true testing of interoperable open annotation. The future potential is unlimited!
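As an illustration of the structured format the specification describes, the sketch below builds one annotation as a Python dictionary and serializes it to JSON. The property names (`@context`, `type`, `body`, `target`, `source`, `selector`) follow the Web Annotation Data Model; the `id`, the source URL, and the quoted text are hypothetical placeholders.

```python
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",  # hypothetical identifier
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "A note about the highlighted passage.",
        "format": "text/plain",
    },
    # A target that narrows a source with a selector is a Specific Resource.
    "target": {
        "type": "SpecificResource",
        "source": "http://example.org/page1",  # hypothetical page
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "structured model and format",
        },
    },
}

serialized = json.dumps(annotation, indent=2)
print(serialized)
```

Because the result is plain JSON-LD, any platform that implements the model could in principle read the same annotation, which is the interoperability the note looks forward to testing.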

  23. Nov 2019
  24. Sep 2019
    1. On the other hand, a resource may be generic in that as a concept it is well specified but not so specifically specified that it can only be represented by a single bit stream. In this case, other URIs may exist which identify a resource more specifically. These other URIs identify resources too, and there is a relationship of genericity between the generic and the relatively specific resource.

      I was not aware of this page when the Web Annotations WG was working through its specifications. The term "Specific Resource" used in the Web Annotations Data Model Specification always seemed adequate, but now I see that it was actually quite a good fit.

  25. Apr 2019
  26. Sep 2016
    1. The importance of models may need to be underscored in this age of “big data” and “data mining”. Data, no matter how big, can only tell you what happened in the past. Unless you’re a historian, you actually care about the future — what will happen, what could happen, what would happen if you did this or that. Exploring these questions will always require models. Let’s get over “big data” — it’s time for “big modeling”.
  27. Feb 2015