16 Matching Annotations

Sep 2025
arxiv.org

RWKV: Reinventing RNNs for the Transformer Era

6
1. aalichao 23 Sep 2025
  
  in Public
  
  the Pile
  
  Open-source LLM training dataset created by EleutherAI, with which several of this paper's authors are affiliated.
2. aalichao 23 Sep 2025
  
  in Public
  
  Inference complexity comparison with different Transformers. Here T denotes the sequence length, d the feature dimension, c is MEGA’s chunk size of quadratic attention, and s is the size of a local window for AFT.
  
  Time and Space Complexity for Attention Mechanisms
3. aalichao 23 Sep 2025
  
  in Public
  
  vanishing gradient problem
  
  Gradients shrink exponentially as they are backpropagated through many layers or time steps, so the earliest weights eventually stop changing.
4. aalichao 22 Sep 2025
  
  in Public
  
  linear attention mechanism
  
  Softmax function: converts a vector of raw predictions (logits) into probabilities for each class. https://www.geeksforgeeks.org/deep-learning/the-role-of-softmax-in-neural-networks-detailed-explanation-and-applications/
  
  Linear attention = approximation of softmax attention. The softmax similarity is replaced by a dot product of kernel feature maps, so the update equations reduce to one addition per step. https://linear-transformers.com/ https://haileyschoelkopf.github.io/blog/2024/linear-attn/
  
  Also note that flash attention (a newer, more memory-efficient approach) is briefly mentioned at the end of this paper and discussed more in the sequel paper.
  
  Softmax, Linear Attention, Flash Attention
5. aalichao 22 Sep 2025
  
  in Public
  
  Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
  
  RWKV (Receptance Weighted Key Value) = hybrid approach: parallelizable training like Transformers, efficient inference like RNNs.
  
  RWKV
6. aalichao 22 Sep 2025
  
  in Public
  
  Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scala-
  
  Transformers' memory and compute scale quadratically with sequence length. RNNs' memory and compute scale linearly, but performance is not as good as Transformers due to parallelization and scalability limits.
  
  Transformers vs Recurrent Neural Networks
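
The notes above on softmax, linear attention, and the quadratic versus linear scaling can be made concrete with a small pure-Python sketch. It is not RWKV's actual update rule. It is a generic kernel-feature-map linear attention, and the feature map `phi` (ELU + 1), the function names, and the toy inputs are all assumptions chosen for illustration. The point it shows: the same causal linear attention can be computed either with an explicit score per earlier token (work grows with T squared) or as a recurrence that keeps only running sums (constant work per step).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def phi(x):
    # Assumed feature map: elu(x) + 1, which is always positive.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def causal_softmax_attention(qs, ks, vs):
    # Standard causal attention: token t scores every token 0..t.
    out = []
    for t, q in enumerate(qs):
        scale = math.sqrt(len(q))
        w = softmax([dot(q, ks[i]) / scale for i in range(t + 1)])
        out.append([sum(w[i] * vs[i][j] for i in range(t + 1))
                    for j in range(len(vs[0]))])
    return out

def causal_linear_attention_parallel(qs, ks, vs):
    # Same shape of computation, but the similarity is phi(q) . phi(k).
    out = []
    for t, q in enumerate(qs):
        fq = phi(q)
        scores = [dot(fq, phi(ks[i])) for i in range(t + 1)]
        z = sum(scores)
        out.append([sum(scores[i] * vs[i][j] for i in range(t + 1)) / z
                    for j in range(len(vs[0]))])
    return out

def causal_linear_attention_recurrent(qs, ks, vs):
    # RNN-style form: carry only two running sums between steps.
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # sum of phi(k_i) v_i^T
    z = [0.0] * d_k                        # sum of phi(k_i)
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out

# Toy inputs (hypothetical values): T = 4 tokens, d = 2.
qs = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.9], [1.0, -1.0]]
ks = [[0.2, 0.1], [-0.6, 0.4], [0.3, 0.3], [0.0, -0.7]]
vs = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]

parallel = causal_linear_attention_parallel(qs, ks, vs)
recurrent = causal_linear_attention_recurrent(qs, ks, vs)
max_gap = max(abs(a - b) for p, r in zip(parallel, recurrent)
              for a, b in zip(p, r))
print("largest difference between the two forms:", max_gap)
```

The two linear-attention functions agree up to floating-point rounding, which is why such models can be trained over a whole sequence at once and still run token by token at inference with a fixed-size state. The softmax version has no equivalent fixed-size state, because the exponential of a dot product does not factor into separate terms for the query and the key.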

Tags

Transformers vs Recurrent Neural Networks

RWKV

Linear Attention

Softmax

Flash Attention

Annotators

aalichao

URL

arxiv.org/pdf/2305.13048
Apr 2021
www2.deloitte.com

What is digital economy? | Deloitte Malta | Technology

6
1. aalichao 07 Apr 2021
  
  in Public
  
  While the global middle class is expected to expand threefold by 2030, there is increasing pressure on essential business resources, which are growing at a slower rate of 1.5 times.
  
  The middle class expands greatly during digital transformation, while the resources it relies on grow at a slower rate. Is that because resource use is becoming more efficient?
2. aalichao 07 Apr 2021
  
  in Public
  
  The emergence of this flexible, global enterprise requires organisations to manage a dynamic ecosystem of talent and enable next-generation digital business processes that prove to be effective, even when distributed across various places and time zones. The 2020 pandemic has certainly fast tracked this transition in some respects, at least in the short term
  
  The pandemic fast-tracked the incorporation of the internet into companies' and organizations' business and hiring practices.
3. aalichao 07 Apr 2021
  
  in Public
  
  Uber, the world’s largest taxi company, owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate…
  
  These companies provide a platform for the services rather than owning the underlying assets, and they use data to make the processes more efficient.
4. aalichao 07 Apr 2021
  
  in Public
  
  The aggressive use of data is transforming business models, facilitating new products and services, creating new processes, generating greater utility, and ushering in a new culture of management
  
  Highlights the use of data to increase efficiency in business models and technology.
5. aalichao 07 Apr 2021
  
  in Public
  
  digital transformation refers to the adoption of digital technology to transform services or businesses.
  
  Does this include new services and businesses that have sprung up, such as streaming/content creation, blogs/podcasts, etc.?
6. aalichao 07 Apr 2021
  
  in Public
  
  A unicorn is a privately held startup company whose valuation is over $1 billion
  
  Definition of unicorn: private startup valued at > $1 billion.

Annotators

aalichao

URL

www2.deloitte.com/mt/en/pages/technology/articles/mt-what-is-digital-economy.html
warontherocks.com

The Chip Wars of the 21st Century - War on the Rocks

4
1. aalichao 07 Apr 2021
  
  in Public
  
  precision-guided missile strike against one of the older TSMC foundries in Taiwan to send a message. China could announce it will destroy one foundry each week until TSMC agrees to sell only to China. Even destroying all the TSMC foundries in Taiwan would still be net win for China:
  
  Ridiculous claim with no evidence. Just fear mongering without any credibility.
2. aalichao 07 Apr 2021
  
  in Public
  
  First, it could nationalize TSMC’s two less-advanced foundries in mainland China.
  
  Really important because although these two foundries don't have cutting-edge technology, their semiconductor output still makes up most of TSMC's market right now.
3. aalichao 07 Apr 2021
  
  in Public
  
  and has gone from making zero to 16 percent of the world’s chips, though today their quality is low.
  
  Not sure if this is talking about transistor density or yield, but it's probably transistor density. Samsung, TSMC, and Intel dominate the field (although Intel is still on 7 nanometer).
4. aalichao 07 Apr 2021
  
  in Public
  
  China uses 61 percent of the world’s chips in products for both its domestic and export markets, importing around $310 billion worth in 2018.
  
  Usage is increasing due to a larger middle class and growing demand from infrastructure such as internet, surveillance (Hikvision), cell phone processors, electric vehicles, etc.

Annotators

aalichao

URL

warontherocks.com/2020/06/the-chip-wars-of-the-21st-century/

aalichao

Annotations: 16

Joined: April 7, 2021
