16 Matching Annotations
  1. Sep 2025
    1. Inference complexity comparison with different Transformers. Here T denotes the sequence length, d the feature dimension, c is MEGA’s chunk size of quadratic attention, and s is the size of a local window for AFT.

      Time and Space Complexity for Attention Mechanisms

    2. linear attention mechanism

      Softmax Function - converts a vector of raw prediction scores into probabilities for each class (all non-negative and summing to 1). https://www.geeksforgeeks.org/deep-learning/the-role-of-softmax-in-neural-networks-detailed-explanation-and-applications/

      Linear Attention = approximation of softmax attention. The softmax similarity is replaced by a dot product of kernel feature maps, so each step of the update equations becomes an addition to a running state. https://linear-transformers.com/ https://haileyschoelkopf.github.io/blog/2024/linear-attn/

      Also note that flash attention (a newer, more memory-efficient approach) is briefly mentioned at the end of this paper and discussed more in the sequel paper.
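
      To make the softmax and linear attention notes above concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not the method of the annotated paper: the feature map `phi` (elu(x) + 1) is one common choice from the linear-attention literature, and all function names are made up for this example. It shows that the causal kernelized attention computed with an explicit similarity table gives the same output as the recurrent form that only adds to running sums.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def phi(x):
    """Assumed feature map: elu(x) + 1, elementwise (always positive)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_linear_attention_quadratic(Q, K, V):
    """Reference form: explicit pairwise similarities, O(T^2) time."""
    dv = len(V[0])
    out = []
    for t in range(len(Q)):
        fq = phi(Q[t])
        weights = [dot(fq, phi(K[i])) for i in range(t + 1)]
        norm = sum(weights)
        out.append([sum(w * V[i][j] for i, w in enumerate(weights)) / norm
                    for j in range(dv)])
    return out

def causal_linear_attention_recurrent(Q, K, V):
    """Recurrent form: running sums updated by addition, O(T) time."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # S = sum_i phi(k_i) v_i^T
    z = [0.0] * d                       # z = sum_i phi(k_i)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
        fq = phi(q)
        norm = dot(fq, z)
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / norm
                    for b in range(dv)])
    return out
```

      Note that the kernelized output is not numerically equal to softmax attention; it swaps the exponential similarity for `phi(q) · phi(k)`, which is what allows the fixed-size state `S`, `z`.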

    3. Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.

      RWKV - Receptance Weighted Key Value = hybrid approach. It trains in parallel like a Transformer but runs inference efficiently like an RNN.

    4. Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability.

      Transformers' memory and compute scale quadratically with sequence length. RNNs' memory and compute scale linearly, but performance is not as good as Transformers due to parallelization and scalability limits.
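
      As a rough illustration of the scaling described in this note (a sketch only: it counts token-pair interactions and recurrent steps, ignores the feature dimension and constant factors, and uses made-up function names):

```python
def attention_pairs(T):
    """Token-pair interactions in full self-attention: grows as T^2."""
    return T * T

def recurrent_steps(T):
    """Sequential state updates in an RNN: grows as T."""
    return T

# Doubling the sequence length quadruples the attention work
# but only doubles the recurrent work.
for T in (1024, 2048, 4096):
    print(T, attention_pairs(T), recurrent_steps(T))
```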

  2. Apr 2021
    1. While the global middle class is expected to expand threefold by 2030, there is increasing pressure on essential business resources, which are growing at a slower rate of 1.5 times.

      The middle class expands threefold during digital transformation, while essential business resources grow only 1.5 times. Is the slower growth because resources are being used more efficiently?

    2. The emergence of this flexible, global enterprise requires organisations to manage a dynamic ecosystem of talent and enable next-generation digital business processes that prove to be effective, even when distributed across various places and time zones. The 2020 pandemic has certainly fast tracked this transition in some respects, at least in the short term

      The pandemic fast-tracked the transition to incorporating the internet into company/organization business and hiring practices.

    3. Uber, the world’s largest taxi company, owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate…

      These companies provide a platform for the services rather than owning the underlying assets, and they use data to make the processes more efficient.

    4. The aggressive use of data is transforming business models, facilitating new products and services, creating new processes, generating greater utility, and ushering in a new culture of management

      Highlights the use of data to increase efficiency in business models and technology.

    5. digital transformation refers to the adoption of digital technology to transform services or businesses.

      Does this include new services and businesses that have sprung up, such as streaming/content creation, blogs/podcasts, etc?

    1. precision-guided missile strike against one of the older TSMC foundries in Taiwan to send a message.  China could announce it will destroy one foundry each week until TSMC agrees to sell only to China. Even destroying all the TSMC foundries in Taiwan would still be net win for China:

      Ridiculous claim with no evidence. Just fear mongering without any credibility.

    2. First, it could nationalize TSMC’s two less-advanced foundries in mainland China.

      Really important because although these 2 foundries don't have cutting-edge technology, their semiconductor output still makes up most of TSMC's market right now.

    3. and has gone from making zero to 16 percent of the world’s chips, though today their quality is low.

      Not sure if this is talking about transistor density or yield, but it's probably transistor density. Samsung, TSMC, and Intel dominate the field (although Intel is still on 7 nanometer).

    4. China uses 61 percent of the world’s chips in products for both its domestic and export markets, importing around $310 billion worth in 2018.

      Usage is increasing due to a larger middle class and growing demand from infrastructure such as the internet, surveillance (Hikvision), cell phone processors, electric vehicles, etc.