411 Matching Annotations

Apr 2025
arxiv.org arxiv.org

1810.04805.pdf

1
1. mark.crowley 04 Apr 2025
  
  in Public
  
  could be very harmful when applying fine-tuning based approaches to token-level tasks such as question answering, where it is crucial to incorporate context from both directions.
  
  were they right or wrong, in hindsight?
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/1810.04805
Mar 2025
arxiv.org arxiv.org

2402.17238v1.pdf

1
1. mark.crowley 28 Mar 2025
  
  in Public
  
  JOURNAL OF IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 14, NO. 8, AUGUST 2023
  
  Zhen Yang. “Does Negative Sampling Matter? A Review with Insights into its Theory and Applications”. JOURNAL OF IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 14, NO. 8, AUGUST 2023
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2402.17238v1
arxiv.org arxiv.org

1706.03762.pdf

1
1. mark.crowley 28 Mar 2025
  
  in Public
  
  $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
  
  Scaled Dot-Product Attention Formula
  
  transformers
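A minimal sketch of the scaled dot-product attention formula highlighted above, written with plain Python lists so it runs without any libraries. The function names and the list-of-lists layout are assumptions made for illustration; real implementations use batched tensor operations, masking and multiple heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q is n x d_k, K is m x d_k, V is m x d_v (lists of lists)."""
    d_k = len(K[0])
    d_v = len(V[0])
    output = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return output


# A zero query scores every key equally, so the output is the mean of the values.
attended = scaled_dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

The division by the square root of d_k keeps the scores from growing with the key dimension, which would otherwise push the softmax towards a one-hot output.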
Visit annotations in context

Tags

transformers

Annotators

mark.crowley

URL

arxiv.org/pdf/1706.03762
proceedings.mlr.press proceedings.mlr.press

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

3
1. mark.crowley 05 Mar 2025
  
  in Public
  
  Examples of mistakes where we can use attention to gain intuition into what the model saw.
  
  Perhaps the best use of this approach is for looking for mistakes or understanding why a model does badly on certain data instances.
  
  attention interpretability
2. mark.crowley 05 Mar 2025
  
  in Public
  
  Visualization of the attention for each generated word. The rough visualizations obtained by upsampling the attention weights and smoothing. (top) “soft” and (bottom) “hard” attention (note that both models generated the same captions in this example)
  
  In a trained model each word correlates to strong responses from certain parts of the image.
  
  attention
3. mark.crowley 05 Mar 2025
  
  in Public
  
  Our model learns a words/image alignment.
  
  The high level view of the attention model.
  
  attention
Visit annotations in context

Tags

attention

interpretability

Annotators

mark.crowley

URL

proceedings.mlr.press/v37/xuc15.pdf
arxiv.org arxiv.org

1602.04938v3.pdf

1
1. mark.crowley 05 Mar 2025
  
  in Public
  
  Raw data and explanation of a bad model’s prediction in the “Husky vs Wolf” task
  
  Famous example of how a supervised model can overfit to extraneous data. Attention can help to discover these cases.
  
  attention
Visit annotations in context

Tags

attention

Annotators

mark.crowley

URL

arxiv.org/abs/1602.04938
Feb 2025
openreview.net openreview.net

13411_OpenRCA_Can_Large_Langua.pdf

1
1. mark.crowley 24 Feb 2025
  
  in Public
  
  Open source dataset for using LLMs to locate root causes of failures in microservices
  
  proj-causal-rca proj-gm-causal
Visit annotations in context

Tags

proj-causal-rca

proj-gm-causal

Annotators

mark.crowley

URL

openreview.net/pdf
dl.acm.org dl.acm.org

DARTS: Distributed Architecture for Real-Time, Resilient and AI-Compressed Workflows

1
1. mark.crowley 21 Feb 2025
  
  in Public
  
  we apply Isolation Forest Method [17], an unsupervised machine learning algorithm to filter out anomalies from the sensor data.
  
  They mention iForest, but cite iMondrian Forest. Which do they use?
Visit annotations in context

Annotators

mark.crowley

URL

dl.acm.org/doi/pdf/10.1145/3524053.3542742
www.diva-portal.org www.diva-portal.org

FULLTEXT01.pdf

2
1. mark.crowley 20 Feb 2025
  
  in Public
  
  One problem that came up was that the online iForest solution did produce similar results as the Scikit-learn libraries
  
  which online iforest solution?
2. mark.crowley 20 Feb 2025
  
  in Public
  
  The ROC curve in the figures shows us that the solution is performing well, in some cases even better than the solution with the regular iForest
  
  got better results than iforest in some cases.
Visit annotations in context

Annotators

mark.crowley

URL

diva-portal.org/smash/get/diva2:1707555/FULLTEXT01.pdf
arxiv.org arxiv.org

2003.03692v2.pdf

1
1. mark.crowley 20 Feb 2025
  
  in Public
  
  H. Ma, B. Ghojogh, M. N. Samad, D. Zheng and M. Crowley, "Isolation Mondrian Forest for Batch and Online Anomaly Detection," 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 2020, pp. 3051-3058, doi: 10.1109/SMC42975.2020.9283073.
  
  The algorithm fuses two ideas, "isolation" from ensemble trees methods for anomaly detection and "Mondrian forests" which can learn flexible regression models from streaming data.
  
  anomaly-detection
Visit annotations in context

Tags

anomaly-detection

Annotators

mark.crowley

URL

arxiv.org/pdf/2003.03692
hal.science hal.science

Strategic Foundation Models

1
1. mark.crowley 18 Feb 2025
  
  in Public
  
  Interesting position paper about how to have useful discussion about AGI and Foundation models.
  
  artificial-intelligence large-language-models foundation-models agi
Visit annotations in context

Tags

artificial-intelligence

foundation-models

large-language-models

agi

Annotators

mark.crowley

URL

hal.science/hal-04925309v1/document
epoch.ai epoch.ai

How has DeepSeek improved the Transformer architecture?

1
1. mark.crowley 02 Feb 2025
  
  in Public
  
  Detailed explanation of what the DeepSeek model does differently to improve performance and training time over ChatGPT.
  
  large-language-models transformers deepseek chat-gpt
Visit annotations in context

Tags

deepseek

transformers

large-language-models

chat-gpt

Annotators

mark.crowley

URL

epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
Jan 2025
openreview.net openreview.net

74_Mapping_Social_Choice_Theor.pdf

1
1. mark.crowley 31 Jan 2025
  
  in Public
  
  MAPPING SOCIAL CHOICE THEORY TO RLHF Jessica Dai and Eve Fleisig ICLR Workshop on Reliable and Responsible Foundation Models 2024
  
  Nice overview of how social choice theory has been connected to RLHF and AI alignment ideas.
  
  #ai-morality align rlhf llm #reinforcement-learning
Visit annotations in context

Tags

rlhf

llm

#reinforcement-learning

align

#ai-morality

Annotators

mark.crowley

URL

openreview.net/pdf
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 29 Jan 2025
  
  in Public
  
  We sample minibatches of sequence length K from the dataset. The prediction head corresponding to the input token $s_t$ is trained to predict $a_t$ – either with cross-entropy loss for discrete actions or mean-squared error for continuous actions – and the losses for each timestep are averaged
  
  How the training loss is computed.
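A rough sketch of the loss described in the quote: one loss per timestep (cross-entropy for discrete actions, mean-squared error for continuous ones), averaged over the sequence. The function names and the plain-list inputs are assumptions for illustration, and the model that produces the predictions is left out entirely.

```python
import math


def mse_loss(predicted_action, target_action):
    """Mean-squared error between two continuous action vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted_action, target_action)) / len(predicted_action)


def cross_entropy_loss(logits, action_index):
    """Negative log-probability of the target discrete action under softmax(logits)."""
    m = max(logits)
    log_partition = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_partition - logits[action_index]


def sequence_loss(predictions, actions, discrete):
    """Average the per-timestep loss over a sequence of length K."""
    per_step = [
        cross_entropy_loss(p, a) if discrete else mse_loss(p, a)
        for p, a in zip(predictions, actions)
    ]
    return sum(per_step) / len(per_step)


continuous = sequence_loss([[1.0, 2.0], [0.0, 0.0]], [[1.0, 2.0], [1.0, 1.0]], discrete=False)
discrete = sequence_loss([[0.0, 0.0]], [0], discrete=True)
```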
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
en.wikipedia.org en.wikipedia.org

Gradient boosting - Wikipedia

2
1. mark.crowley 24 Jan 2025
  
  in Public
  
  indicator notation,
  
  in class our indicator notation looks different, we'd write:
  
  $$h_m(x) = \sum_{j=1}^{J_m} b_{jm}\mathbb{I}_{R_{jm}}(x)$$
  
  457b25
2. mark.crowley 24 Jan 2025
  
  in Public
  
  The tree partitions the input space into $J_m$ disjoint regions $R_{1m}, \ldots, R_{J_m m}$ and predicts a constant value in each region.
  
  457b25
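To make the indicator notation concrete, here is a small sketch of a tree as a piecewise-constant function: a sum of region constants times indicators, where exactly one indicator is 1 for any input. The one-dimensional interval regions, the constants and the helper name are made up for illustration.

```python
def make_tree_predictor(regions, values):
    """regions: half-open intervals [low, high) that partition the real line.
    values: the constant b_jm predicted in each region."""

    def h(x):
        # Sum over regions of b_jm * indicator(x in R_jm).
        return sum(b * (1 if low <= x < high else 0) for (low, high), b in zip(regions, values))

    return h


regions = [(float("-inf"), 0.0), (0.0, 2.0), (2.0, float("inf"))]
values = [-1.0, 0.5, 3.0]
h_m = make_tree_predictor(regions, values)
```

Because the regions are disjoint and cover the whole input space, the sum collapses to a single constant for every x.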
Visit annotations in context

Tags

457b25

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Gradient_boosting
www.nvidia.com www.nvidia.com

NVIDIA Redefines Game AI With ACE Autonomous Game Characters

4
1. mark.crowley 13 Jan 2025
  
  in Public
  
  developers can use similarity searches to “remember” relevant information to the current prompt:
  
  Similarity Search: similarity to the search text? to the situation plus the text? This is an interesting task.
2. mark.crowley 13 Jan 2025
  
  in Public
  
  One of the best sources of information in the game world is the game itself. Game state can be transcribed into text so that a SLM can reason about the game world
  
  What is the mapping between the Game World and the Real world?
3. mark.crowley 13 Jan 2025
  
  in Public
  
  with the help of generative AI, and large language models trained on trillions of sentences describing how humans react to the world, we can start to simulate human-like decision making.
  
  So their training basis is real human responses to the world scraped from internet data. Question: Do they limit themselves to conversations about "taking actions in the world"? How would they define that dataset?
4. mark.crowley 13 Jan 2025
  
  in Public
  
  to incorporate ACE autonomous game characters into their titles.
  
  So, would all autonomous game characters have the same strategies and personalities that arise from NVIDIA implementation, regardless of the game or platform? Interesting.
Visit annotations in context

Annotators

mark.crowley

URL

nvidia.com/en-us/geforce/news/nvidia-ace-autonomous-ai-companions-pubg-naraka-bladepoint/
Nov 2024
thegradient.pub thegradient.pub

Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

2
1. mark.crowley 27 Nov 2024
  
  in Public
  
  Figure 1: Mathematics can illuminate the ways that ReLU-based neural networks shatter input space into countless polygonal regions, in each of which the model behaves like a linear map [2, 3, 4]. These decompositions create beautiful patterns. (Figure made with SplineCam [5]).
  
  Fascinating! What RELU does.
2. mark.crowley 27 Nov 2024
  
  in Public
  
  Great review of mathematical patterns and insights about recent ML research, and discussion of how the often complicated relationship between math and ML progress is playing out in the LLM era.
Visit annotations in context

Annotators

mark.crowley

URL

thegradient.pub/shape-symmetry-structure/
arxiv.org arxiv.org

2201.02628v1.pdf

1
1. mark.crowley 08 Nov 2024
  
  in Public
  
  Like many existing option discovery methods, we too make the assumption that all options are available everywhere, i.e., $\forall s \in S, \forall \omega \in \Omega : s \in I_\omega$. However, we show that our approach ends up relaxing this assumption, in effect, and provides an elegant way to learn distinct initiation sets for options
  
  A common assumption in option discovery: every option can be initiated in every state.
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2201.02628
www.datacamp.com www.datacamp.com

An Introduction to the Mamba LLM Architecture: A New Paradigm in Machine Learning

2
1. mark.crowley 06 Nov 2024
  
  in Public
  
  The introduction of Transformers, such as GPT-4, took the field of natural language processing (NLP) and established benchmarks for several natural language tasks. Longer sequences have long been a thorn in the side of transformers as they significantly hamper their efficiency. This deficiency is where Mamba excels. Namely, mamba can process lengthy sequences more quickly than transformers and does so more simply due to its unique architecture.
  
  Focus of mamba is on efficiently modelling long range dependencies, and allowing transitions to vary over "time"
2. mark.crowley 06 Nov 2024
  
  in Public
  
  good article on Mamba architecture vs transformers
Visit annotations in context

Annotators

mark.crowley

URL

datacamp.com/tutorial/introduction-to-the-mamba-llm-architecture
arxiv.org arxiv.org

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

1
1. mark.crowley 06 Nov 2024
  
  in Public
  
  We simplify prior deep sequence model architectures by combining the design of prior SSM architectures (Dao, Fu, Saab, et al. 2023) with the MLP block of Transformers into a single block, leading to a simple and homogenous architecture design (Mamba) incorporating selective state spaces
  
  So the main idea is to merge the SSM block and the Transformer MLP block into a single, homogeneous block?
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2312.00752
arxiv.org arxiv.org

2012.00152v1.pdf

1
1. mark.crowley 02 Nov 2024
  
  in Public
  
  Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
  
  "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine" by Pedro Domingos, 2020.
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2012.00152
Oct 2024
plato.stanford.edu plato.stanford.edu

Scientific Objectivity

2
1. mark.crowley 25 Oct 2024
  
  in Public
  
  the meaning of observational concepts is influenced by theoretical assumptions and presuppositions. For example, the concepts “mass” and “length” have different meanings in Newtonian and relativistic mechanics; so does the concept “temperature” in thermodynamics and statistical mechanics
  
  Good example of how everything in a theory that we think is truth relates to the concepts of our paradigm.
2. mark.crowley 25 Oct 2024
  
  in Public
  
  This position has been adopted by Karl R. Popper, Rudolf Carnap and other leading figures in (broadly) empiricist philosophy of science. Many philosophers have argued that the relation between observation and theory is way more complex and that influences can actually run both ways (e.g., Duhem 1906 [1954]; Wittgenstein 1953 [2001]). The most lasting criticism, however, was delivered by Thomas S. Kuhn (1962 [1970]) in his book “The Structure of Scientific Revolutions”.
  
  Competing views about the relation between observations, reality and truth. Popper argues that observations help us distinguish which theories are true or not, bringing us ever closer to a truer scientific theory. Wittgenstein argues this can go both ways. Kuhn argues that observations are couched in the language of our paradigm, so everything is relative to that.
  
  Consistency-learning Science Truth
Visit annotations in context

Tags

Consistency-learning

Truth

Science

Annotators

mark.crowley

URL

plato.stanford.edu/entries/scientific-objectivity/
openreview.net openreview.net

1661_implementation_matters_in_deep.pdf

2
1. mark.crowley 11 Oct 2024
  
  in Public
  
  the rewards are divided through by the standard deviation of a rolling discounted sum of the reward
  
  big reward shaping
2. mark.crowley 11 Oct 2024
  
  in Public
  
  we find that they dramatically affect the performance of PPO. To demonstrate this, we start by performing a full ablation study on the four optimizations mentioned above
  
  All these little optimizations in the implementation of PPO have a big impact on its performance.
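The reward scaling quoted in the first annotation above can be sketched as follows: keep a rolling discounted sum of rewards, track its standard deviation online, and divide each incoming reward by that value. The class names, the use of Welford's algorithm for the running variance, and the default discount are assumptions for illustration and are not taken from any PPO codebase.

```python
import math


class RunningStd:
    """Welford's online algorithm for the variance of a stream of numbers."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Fall back to 1.0 until there are at least two samples.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 1.0


class RewardScaler:
    """Divide rewards by the std of a rolling discounted sum of rewards."""

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.discounted_sum = 0.0
        self.stats = RunningStd()

    def scale(self, reward):
        self.discounted_sum = self.gamma * self.discounted_sum + reward
        self.stats.update(self.discounted_sum)
        return reward / (self.stats.std() + self.eps)

    def reset_episode(self):
        # Restart the discounted sum at episode boundaries; keep the statistics.
        self.discounted_sum = 0.0


scaler = RewardScaler(gamma=0.5)
first = scaler.scale(1.0)
second = scaler.scale(1.0)
```

Note that the reward is only divided, not mean-shifted, so its sign is preserved.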
Visit annotations in context

Annotators

mark.crowley

URL

openreview.net/pdf
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-balanced-chamfer-distance-as-a-comprehensive-metric-for-point-cloud-completion-Paper.pdf

2
1. mark.crowley 09 Oct 2024
  
  in Public
  
  In brief, DCD takes a step from CD and attempts to provide a rationale bridge towards EMD for a better sense of point distribution rather than being blinded by its nearest neighbour. Compared with EMD, it is not only more efficient but also stricter with local structures. A balanced distribution and good preservation of detailed structures are both important factors for the visual quality of the completion result.
  
  DCD is an improvement on CD towards the very expensive EMD method.
2. mark.crowley 09 Oct 2024
  
  in Public
  
  Chamfer Distance between two point sets S1 and S2 is defined as
  
  need to do both directions because of the minimization
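A small sketch of why both directions are needed. It uses one common form of the Chamfer Distance (mean squared nearest-neighbour distance, summed over the two directions); whether the paper squares the distances or normalizes in exactly this way is an assumption here, and the function names are illustrative.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def one_sided(A, B):
    """Mean over points in A of the squared distance to the nearest point in B."""
    return sum(min(squared_distance(a, b) for b in B) for a in A) / len(A)


def chamfer_distance(S1, S2):
    # Each direction only checks that its own points have a close match,
    # so the two terms are summed to penalize unmatched points on either side.
    return one_sided(S1, S2) + one_sided(S2, S1)


S1 = [(0.0, 0.0)]
S2 = [(0.0, 0.0), (3.0, 4.0)]
forward = one_sided(S1, S2)
backward = one_sided(S2, S1)
total = chamfer_distance(S1, S2)
```

In this example the forward term is zero because the single point in S1 has an exact match, and only the backward term notices the extra point in S2.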
Visit annotations in context

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper/2021/file/f3bd5ad57c8389a8a1a541a76be463bf-Paper.pdf
arxiv.org arxiv.org

1612.00593.pdf

1
1. mark.crowley 09 Oct 2024
  
  in Public
  
  Using input transformation gives a 0.8% performance boost.
  
  but what is the input transformation?
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/1612.00593v2
www.cs.mcgill.ca www.cs.mcgill.ca

rnn_nips.pdf

1
1. mark.crowley 04 Oct 2024
  
  in Public
  
  Q(λ) is a variant of Q-learning where eligibility traces are used to calculate the TD error. As mentioned previously, the backwards view of traces is traditionally used
  
  The version of TD(lambda) they are using.
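For reference, a tabular sketch of one backward-view Q(λ) update in the Watkins style, where traces are cut after a non-greedy action. This is a textbook variant chosen for illustration; whether the paper uses exactly this variant is not stated in the quote, and the function signature and hyperparameter defaults are assumptions.

```python
from collections import defaultdict


def q_lambda_step(Q, E, s, a, r, s_next, a_next, actions, alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one backward-view update to the value table Q and the trace table E."""
    greedy_next = max(actions, key=lambda b: Q[(s_next, b)])
    next_is_greedy = Q[(s_next, a_next)] == Q[(s_next, greedy_next)]
    # TD error towards the greedy value of the next state.
    delta = r + gamma * Q[(s_next, greedy_next)] - Q[(s, a)]
    # Accumulating trace for the state-action pair just visited.
    E[(s, a)] = E.get((s, a), 0.0) + 1.0
    for key in list(E):
        # Every recently visited pair shares the TD error in proportion to its trace.
        Q[key] += alpha * delta * E[key]
        # Decay the trace, or cut it if the next action is exploratory.
        E[key] = gamma * lam * E[key] if next_is_greedy else 0.0
    return delta


Q = defaultdict(float)
E = {}
td_error = q_lambda_step(Q, E, s=0, a=0, r=1.0, s_next=1, a_next=0, actions=[0, 1])
```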
Visit annotations in context

Annotators

mark.crowley

URL

cs.mcgill.ca/~jmerhe1/rnn_nips.pdf
Sep 2024
agupubs.onlinelibrary.wiley.com agupubs.onlinelibrary.wiley.com

A Global Probability‐Of‐Fire (PoF) Forecast

1
1. mark.crowley 16 Sep 2024
  
  in Public
  
  to provide a forecast Probability of Fire (PoF) on a given day within a 9 by 9 km grid cell
  
  Their regression task.
Visit annotations in context

Annotators

mark.crowley

URL

agupubs.onlinelibrary.wiley.com/doi/pdfdirect/10.1029/2023GL107929
Aug 2024
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-direct-multi-view-multi-person-3d-pose-estimation-Paper.pdf

3
1. mark.crowley 16 Aug 2024
  
  in Public
  
  To effectively fuse the multi-view information, we propose a geometrically-guided projective attention mechanism. Instead of applying full attention to densely aggregate features across spaces and views, it projects the estimated 3D joint into 2D anchor points for different views, and then selectively fuses the multi-view local features near to these anchors to precisely refine the 3D joint location. we propose to encode the camera rays into the multi-view feature representations via a novel RayConv operation to integrate multi-view positional information into the projective attention. In this way, the strong multi-view geometrical priors can be exploited by projective attention to obtain more accurate 3D pose estimation.
  
  Definition:: projective attention
  
  It takes into account the 3D space the points live in, and the rays of light that explain their 2D projections.
  
  projective-attention attention
2. mark.crowley 15 Aug 2024
  
  in Public
  
  MvP : "Direct Multi-view Multi-person 3D Pose Estimation" Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng
  
  Influential paper on learning consistent skeletal models of human pose from multiview images
  
  transformers pedestrian-detection multi-view attention projective-attention consistency-learning
3. mark.crowley 15 Aug 2024
  
  in Public
  
  MvP designs a novel geometrically guided attention mechanism, called projective attention, to more precisely fuse the cross-view information for each joint.
  
  question: what is projective attention?
Visit annotations in context

Tags

pedestrian-detection

transformers

projective-attention

consistency-learning

attention

multi-view

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/6da9003b743b65f4c0ccd295cc484e57-Paper.pdf
openaccess.thecvf.com openaccess.thecvf.com

Multiple View Geometry Transformers for 3D Human Pose Estimation

4
1. mark.crowley 15 Aug 2024
  
  in Public
  
  Since the number of queries is larger than the actual number of people, we train an MLP-based classifier fβ (.) to predict a score for each query based on the appearance term to remove the “empty” ones.
  
  Initially there are more queries than there are actual pedestrians. A classifier is trained to prune out the non-people.
2. mark.crowley 15 Aug 2024
  
  in Public
  
  Really interesting and innovative method for using multiview perspective data to learn human pose and pedestrian detection.
  
  transformers pedestrian-detection hierarchical-modelling consistency-learning
3. mark.crowley 15 Aug 2024
  
  in Public
  
  We adopt a hierarchical query embedding scheme proposed in [36] to reduce the number of learnable parameters.
  
  A hierarchical scheme to reduce the number of learnable parameters. If you know something about the structure of the model, exploiting it is a good thing!
4. mark.crowley 15 Aug 2024
  
  in Public
  
  Most closely related to our work, MvP [36] extends DETR for multi-view 3D human pose estimation.
  
  mostly based on [36]
Visit annotations in context

Tags

transformers

pedestrian-detection

hierarchical-modelling

consistency-learning

Annotators

mark.crowley

URL

openaccess.thecvf.com/content/CVPR2024/papers/Liao_Multiple_View_Geometry_Transformers_for_3D_Human_Pose_Estimation_CVPR_2024_paper.pdf
arxiv.org arxiv.org

2312.15133v1.pdf

1
1. mark.crowley 02 Aug 2024
  
  in Public
  
  APU-LDI: Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling
  
  @inproceedings{li2024LDI, title={Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling}, author={Li, Shujuan and Zhou, Junsheng and Ma, Baorui and Liu, Yu-Shen and Han, Zhizhong}, booktitle={Proceedings of the AAAI Conference on Artificial Intelligence}, year={2024} }
  
  proj-csafirelidar lidar-surface-extraction feature-extraction
Visit annotations in context

Tags

feature-extraction

proj-csafirelidar

lidar-surface-extraction

Annotators

mark.crowley

URL

arxiv.org/pdf/2312.15133
www.researchgate.net www.researchgate.net

Correlational_Neural_Networks.pdf

2
1. mark.crowley 01 Aug 2024
  
  in Public
  
  We propose a variant of autoencoders which can work with two views of the data, while being explicitly trained to achieve all
  
  The goal is to build an autoencoder that learns a common representation of a single object when given multiple perspectives during training.
2. mark.crowley 01 Aug 2024
  
  in Public
  
  "Correlational neural networks" - looking at learning from multiple perspectives of the same thing to increase representation learning.
  
  @article{chandar2016neuralcompjour, author = {Chandar, Sarath and Khapra, Mitesh M and Larochelle, Hugo and Ravindran, Balaraman}, date-added = {2024-08-01 10:47:30 -0400}, date-modified = {2024-08-01 10:50:01 -0400}, journal = {Neural Computation}, keywords = {correlation-learning, machine-learning, inductive-bias, autoencoders}, number = {2}, pages = {257--285}, pdf = {https://www.researchgate.net/profile/Balaraman-Ravindran/publication/275588055_Correlational_Neural_Networks/links/55ed84d308ae21d099c75c00/Correlational-Neural-Networks.pdf}, publisher = {MIT Press}, title = {Correlational neural networks}, venue-short = {NeuralCompJour}, volume = {28}, year = {2016}}
  
  correlation-learning representation-learning autoencoders
Visit annotations in context

Tags

autoencoders

representation-learning

correlation-learning

Annotators

mark.crowley

URL

researchgate.net/profile/Balaraman-Ravindran/publication/275588055_Correlational_Neural_Networks/links/55ed84d308ae21d099c75c00/Correlational-Neural-Networks.pdf
Jul 2024
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 29 Jul 2024
  
  in Public
  
  This suggests that in scenarios with relatively low amounts of data, Decision Transformer can outperform %BC by using all trajectories in the dataset to improve generalization, even if those trajectories are dissimilar from the return conditioning target. Our results indicate that Decision Transformer can be more effective than simply performing imitation learning on a subset of the dataset. On the tasks we considered, Decision Transformer either outperforms or is competitive to %BC, without the confound of having to select the optimal subset
  
  So it seems like it isn't just behaviour cloning. It works better than it with smaller amounts of training data, so it generalizes well. But with large amounts of data, copying human behaviour may be good enough.
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2206.08853.pdf

2
1. mark.crowley 29 Jul 2024
  
  in Public
  
  Formally, the learned reward function can be defined as $\Phi_R : (G, V) \to \mathbb{R}$ that maps a language goal G and a video snippet V to a scalar reward. An ideal $\Phi_R$ should return a high reward if the behavior depicted in the video faithfully follows the language description, and a low reward otherwise
  
  The essential idea of MineCLIP: a function trained on YouTube videos of Minecraft that takes in a description of an activity and a video purporting to complete it, and returns a reward score for how well the video matches the description.
2. mark.crowley 29 Jul 2024
  
  in Public
  
  Agents developed in popular RL benchmarks [119, 146] often rely on meticulously crafted dense and task-specific reward functions to guide random explorations. However, these rewards are hard or even infeasible to define for our diverse and open-ended tasks in MINEDOJO. To address this challenge, our key insight is to learn a dense, language-conditioned reward function from in-the-wild YouTube videos and their transcripts. Therefore, we introduce MINECLIP, a contrastive video-language model that learns to correlate video snippets and natural language descriptions (Fig. 4). MINECLIP is multi-task by design, as it is trained on open-vocabulary and diverse English transcripts
  
  Designing a reward function is expensive and difficult. They learn one from the rich dataset they have for this domain.
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2206.08853
discovery.ucl.ac.uk discovery.ucl.ac.uk

agz_unformatted_nature (1).pdf

2
1. mark.crowley 23 Jul 2024
  
  in Public
  
  simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts.
  
  Simpler tree search: a single network evaluates positions, with no Monte Carlo rollouts.
2. mark.crowley 23 Jul 2024
  
  in Public
  
  new reinforcement learning algorithm that incorporates lookahead search inside the training loop, resulting in rapid improvement and precise and stable learning
  
  lookahead still happens, but now is inside the training loop
Visit annotations in context

Annotators

mark.crowley

URL

discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf
en.wikipedia.org en.wikipedia.org

AlphaGo Zero - Wikipedia

1
1. mark.crowley 23 Jul 2024
  
  in Public
  
  Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions.
  
  AlphaGo Zero was given only the raw board position and the rules of Go, with no hand-crafted features or special-case handling.
Visit annotations in context

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/AlphaGo_Zero
www.nature.com www.nature.com

Mastering the game of Go with deep neural networks and tree search

7
1. mark.crowley 23 Jul 2024
  
  in Public
  
  We use a reward function r(s) that is zero for all non-terminal time steps t < T. The outcome $z_t = \pm r(s_T)$ is the terminal reward at the end of the game from the perspective of the current player at time step t: +1 for winning and −1 for losing.
  
  reward function is as simple and sparse as possible, using the only thing you know for certain, whether you won or lost the game.
2. mark.crowley 23 Jul 2024
  
  in Public
  
  Using no search at all, the RL policy network won 85% of games against Pachi
  
  The strongest open-source solver up to that point was Pachi, a Monte Carlo tree search program.
  
  SL policies could beat it 10-20% of the time.
  
  Training the RL policy to beat the SL policy got them to an 85% win rate against Pachi, using no search at all.
3. mark.crowley 23 Jul 2024
  
  in Public
  
  When played head-to-head, the RL policy network won more than 80% of games against the SL policy network.
  
  First test for the RL policy was to beat the SL Policy.
4. mark.crowley 23 Jul 2024
  
  in Public
  
  The network predicted expert moves on a held out test set with an accuracy of 57.0% using all input features, and 55.7% using only raw board position and move history as inputs, compared to the state-of-the-art from other research groups of 44.4% at date of submission
  
  The SL policy is a CNN, trained on expert games, that predicts the next move. Performance is better than previous work but still under 60% accuracy.
5. mark.crowley 22 Jul 2024
  
  in Public
  
  The summary paper for AlphaGo.
  
  ece457c reinforcement-learning
6. mark.crowley 22 Jul 2024
  
  in Public
  
  Monte Carlo tree search in AlphaGo.
  
  Showing how monte carlo tree search works in alphago
7. mark.crowley 22 Jul 2024
  
  in Public
  
  The policy network
  
  nice view of the policy and value networks in action
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

nature.com/articles/nature16961.pdf
en.wikipedia.org en.wikipedia.org

Monte Carlo tree search - Wikipedia

5
1. mark.crowley 23 Jul 2024
  
  in Public
  
  the search attempts to prune sequences which are less relevant. In some cases, a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar"
  
  disadvantage 1 : which is present in any pruning algorithm
2. mark.crowley 23 Jul 2024
  
  in Public
  
  it achieves better results than classical algorithms in games with a high branching factor
  
  advantage 2
3. mark.crowley 23 Jul 2024
  
  in Public
  
  pure Monte Carlo tree search does not need an explicit evaluation function
  
  advantage 1
4. mark.crowley 22 Jul 2024
  
  in Public
  
  Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT
  
  The UCB algorithm for bandits comes back again as UCT to form the basis for model estimation via MCTS
  
  reinforcement-learning ece457c
5. mark.crowley 22 Jul 2024
  
  in Public
  
  The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with high average win rate and the exploration of moves with few simulations.
  
  Tree search makes this tradeoff very clear, how many paths will you explore before you stop and use the knowledge you already have?
  
  ece457c reinforcement-learning
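The exploration/exploitation tradeoff in child selection can be sketched with the UCB1 score that UCT applies at each tree node: average win rate plus an exploration bonus that shrinks as a child is visited more often. The dictionary layout, the function names and the choice of c = sqrt(2) are assumptions for illustration.

```python
import math


def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Average win rate (exploitation) plus a visit-count bonus (exploration)."""
    if visits == 0:
        return float("inf")  # always try an unvisited child first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children):
    """children: list of dicts with 'wins' and 'visits'. Returns the index to descend into."""
    parent_visits = sum(child["visits"] for child in children)
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i]["wins"], children[i]["visits"], parent_visits),
    )


unexplored_pick = select_child([{"wins": 9, "visits": 10}, {"wins": 0, "visits": 0}])
exploit_pick = select_child([{"wins": 9, "visits": 10}, {"wins": 1, "visits": 10}])
```

With equal visit counts the bonus terms cancel and the child with the higher win rate is chosen; an unvisited child always wins the comparison.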
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Monte_Carlo_tree_search
beamlab.org beamlab.org

Deep Learning 101 - Part 1: History and Background

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  A really nice visual history of the development of Deep Learning, the cornerstone of modern AI and ML.
  
  ece457c
Visit annotations in context

Tags

ece457c

Annotators

mark.crowley

URL

beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html
en.wikipedia.org en.wikipedia.org

Alpha–beta pruning - Wikipedia

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  An illustration of alpha–beta pruning. The grayed-out subtrees don't need to be explored (when moves are evaluated from left to right), since it is known that the group of subtrees as a whole yields the value of an equivalent subtree or worse, and as such cannot influence the final result. The max and min levels represent the turn of the player and the adversary, respectively.
  
  Alpha-Beta pruning comes down to being smart about searching the tree of possible future game states to be more efficient about rollouts.
  
  ece457c
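A compact sketch of alpha-beta pruning on a game tree given as nested lists, where a number is a leaf value and a list is an internal node. The tree encoding and the `visited` argument (used only to show which leaves get evaluated) are assumptions for illustration.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), visited=None):
    """Return the minimax value of node, skipping subtrees that cannot affect it."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing player above would never allow this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player above already has a better option
    return value


tree = [[3, 5], [2, 9]]
seen = []
best = alphabeta(tree, True, visited=seen)
```

In this tree the leaf 9 is never evaluated: once the second subtree is known to be worth at most 2, it cannot beat the 3 already guaranteed by the first subtree.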
Visit annotations in context

Tags

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Alpha–beta_pruning
en.wikipedia.org en.wikipedia.org

Minimax - Wikipedia, the free encyclopedia

1
1. mark.crowley 22 Jul 2024
  
  in Public
  
  For example, the chess computer Deep Blue (the first one to beat a reigning world champion, Garry Kasparov at that time) looked ahead at least 12 plies, then applied a heuristic evaluation function.[6]
  
  Deep Blue used a kind of minimax algorithm to beat Garry Kasparov at chess, with at least a 12-ply lookahead.
  
  ece457c
Visit annotations in context

Tags

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/Minimax
en.wikipedia.org en.wikipedia.org

AlphaZero - Wikipedia

3
1. mark.crowley 22 Jul 2024
  
  in Public
  
  Comparing Monte Carlo tree search searches, AlphaZero searches just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variation.[1]
  
  The model allows it to be selective about what rollouts to do during MCTS
2. mark.crowley 22 Jul 2024
  
  in Public
  
  Wikipedia: AlphaZero
  
  ece457c reinforcement-learning
3. mark.crowley 22 Jul 2024
  
  in Public
  
  AZ has hard-coded rules for setting search hyperparameters.
  
  that's interesting...
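  One way to picture how a network can make MCTS selective (annotation 1) is a PUCT-style selection rule: each child is scored by its average value plus an exploration bonus weighted by the network's prior. This is a hedged sketch, not AlphaZero's actual code; the constant `c_puct=1.5` and the `+ 1` inside the square root are assumptions chosen for illustration.

```python
import math

def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Return the index of the child with the highest Q + U score.

    priors:       network prior probability for each child move
    visit_counts: how many times each child has been visited so far
    value_sums:   sum of backed-up values for each child
    """
    total = sum(visit_counts)
    best_index, best_score = None, -math.inf
    for i, (p, n, w) in enumerate(zip(priors, visit_counts, value_sums)):
        q = w / n if n > 0 else 0.0                       # average value so far
        u = c_puct * p * math.sqrt(total + 1) / (1 + n)   # prior-weighted bonus
        score = q + u
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

  With no visits yet, the move with the largest prior is tried first; a heavily visited move gradually loses its bonus, so the search budget goes to the few moves the network considers promising.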
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

en.wikipedia.org/wiki/AlphaZero
openreview.net openreview.net

1661_implementation_matters_in_deep.pdf

3
1. mark.crowley 16 Jul 2024
  
  in Public
  
  Overall, our results highlight the necessity of designing deep RL methods in a modular manner.
  
  Modularity is more important than a big idea.
2. mark.crowley 16 Jul 2024
  
  in Public
  
  It turns out that the clipping mechanism is not necessary to achieve high performance – we find that PPO-NOCLIP performs uniformly better than PPO-M, despite the latter employing the core PPO clipping mechanism.
  
  maybe clipping isn't so important?
3. mark.crowley 16 Jul 2024
  
  in Public
  
  We find that varying the use of code-level optimizations impacts performance significantly more than varying whether the PPO or TRPO step is used.
  
  Writing better code had a bigger impact than the difference in the algorithm!
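  For reference when reading annotation 2, here is a small sketch of what the clipping mechanism does to a single sample, following the standard clipped surrogate from the PPO paper. The function name and the `clip_eps=None` switch for the unclipped case are my own illustration; this is not the code-level optimizations studied in the paper.

```python
def surrogate(ratio, advantage, clip_eps=None):
    """Per-sample PPO surrogate objective.

    ratio:     pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage of the action
    clip_eps:  clipping range; None returns the unclipped objective
    """
    unclipped = ratio * advantage
    if clip_eps is None:
        return unclipped
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the two estimates.
    return min(unclipped, clipped_ratio * advantage)
```

  For a positive advantage the clip caps how much credit a large ratio can earn; for a negative advantage the unclipped (worse) value is kept, so clipping only ever removes incentive to move far from the old policy.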
Visit annotations in context

Annotators

mark.crowley

URL

openreview.net/pdf
arxiv.org arxiv.org

1707.06347.pdf

1
1. mark.crowley 16 Jul 2024
  
  in Public
  
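  The theory justifying TRPO actually suggests using a penalty instead of a constraint, i.e., solving the unconstrained optimization problem $\underset{\theta}{\text{maximize}}\ \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}\hat{A}_t - \beta\,\mathrm{KL}\left[\pi_{\theta_\text{old}}(\cdot \mid s_t), \pi_\theta(\cdot \mid s_t)\right]\right]$ (5) for some coefficient β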
  
  This parameter $\beta$ is a bit mysterious. PPO generally works very well, but setting $\beta$ is tricky, and the choice influences other parts of the algorithm.
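  To make the role of β concrete, here is a hedged sketch of the penalized objective in the quoted Eq. (5), together with the adaptive rule the PPO paper describes for its KL-penalty variant (halve β when the observed KL is well under the target, double it when well over). Function names and the sample-based averaging are my own; the per-sample KL values are assumed to be computed elsewhere.

```python
def kl_penalized_objective(ratios, advantages, kls, beta):
    """Sample mean of ratio * advantage - beta * KL, as in the quoted Eq. (5)."""
    terms = [r * a - beta * kl for r, a, kl in zip(ratios, advantages, kls)]
    return sum(terms) / len(terms)

def adapt_beta(beta, observed_kl, target_kl):
    """Adaptive-KL heuristic: shrink or grow beta to track a target KL."""
    if observed_kl < target_kl / 1.5:
        return beta / 2.0   # policy barely moved, so loosen the penalty
    if observed_kl > target_kl * 1.5:
        return beta * 2.0   # policy moved too far, so tighten the penalty
    return beta
```

  The adaptive rule trades the problem of picking β for the problem of picking a target KL, which is arguably easier to reason about but is still a hyperparameter.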
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/1707.06347
huggingface.co huggingface.co

Illustrating Reinforcement Learning from Human Feedback (RLHF)

7
1. mark.crowley 16 Jul 2024
  
  in Public
  
  PPO has been around for a relatively long time
  
  7 years is a long time apparently!
2. mark.crowley 16 Jul 2024
  
  in Public
  
  At this point in the RLHF system, we have an initial language model that can be used to generate text and a preference model that takes in any text and assigns it a score of how well humans perceive it.
  
  The basic input/output structure needed.
3. mark.crowley 16 Jul 2024
  
  in Public
  
  By comparing model outputs in head-to-head matchups, an Elo system can be used to generate a ranking of the models and outputs relative to each-other. These different methods of ranking are normalized into a scalar reward signal for training.
  
  Humans act as judges of the relative performance of two LLMs; the resulting scores become the reward signal.
4. mark.crowley 16 Jul 2024
  
  in Public
  
  rankings are used to compare the outputs of multiple models and create a much better regularized dataset.
  
  so human beings are used to gather the reward information, but we don't trust them to actually give us the reward value itself
5. mark.crowley 16 Jul 2024
  
  in Public
  
  Human annotators are used to rank the generated text outputs from the LM.
  
  Ranking the text outputs.
6. mark.crowley 16 Jul 2024
  
  in Public
  
  The underlying goal is to get a model or system that takes in a sequence of text, and returns a scalar reward which should numerically represent the human preference.
  
  the main idea
7. mark.crowley 16 Jul 2024
  
  in Public
  
  OpenAI fine-tuned on human-generated text that was “preferable” and Anthropic generated their initial LM for RLHF by distilling an original LM on context clues for their “helpful, honest, and harmless” criteria
  
  (optional) produce human augmented data to demonstrate a better sentence response.
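  Annotation 3 mentions turning head-to-head human comparisons into a ranking with an Elo system. A minimal sketch of a standard Elo update follows; the K-factor of 32 and the 400-point scale are the conventional chess values and are assumptions here, since the blog post does not say which constants any RLHF pipeline uses.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one human comparison."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta
```

  The annotator only ever says which output is better; the scalar comes out of the rating system, which matches the point in annotation 4 about not asking humans for the reward value directly.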
Visit annotations in context

Annotators

mark.crowley

URL

huggingface.co/blog/rlhf
arxiv.org arxiv.org

2403.07691.pdf

1
1. mark.crowley 16 Jul 2024
  
  in Public
  
  2024 paper arguing that other methods beyond PPO could be better for "value alignment" of LLMs
  
  reinforcement-learning ppo ece457c
Visit annotations in context

Tags

reinforcement-learning

ppo

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/2403.07691
arxiv.org arxiv.org

Deep Reinforcement Learning that Matters

2
1. mark.crowley 15 Jul 2024
  
  in Public
  
  Through experimental methods focusing on PG methods for continuous control, we investigate problems with reproducibility in deep RL. We find that both intrinsic (e.g. random seeds, environment properties) and extrinsic sources (e.g. hyperparameters, codebases) of non-determinism can contribute to difficulties in reproducing baseline algorithms. Moreover, we find that highly varied results due to intrinsic sources bolster the need for using proper significance analysis. We propose several such methods and show their value on a subset of our experiments.
  
  Their findings: random seeds matter (unfortunately).
2. mark.crowley 15 Jul 2024
  
  in Public
  
  Paper "Deep Reinforcement Learning that Matters" on evaluating RL algorithms.
  
  reinforcement-learning ece457c
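  As an illustration of the kind of significance analysis the abstract calls for, here is a sketch of a percentile bootstrap confidence interval on the difference in mean return between two algorithms run over several seeds. The return values below are made-up numbers for illustration only, and the percentile bootstrap is just one reasonable choice, not necessarily the exact procedure used in the paper.

```python
import random
import statistics

def bootstrap_mean_diff_ci(returns_a, returns_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(returns_a) - mean(returns_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each algorithm's seeds with replacement.
        sample_a = [rng.choice(returns_a) for _ in returns_a]
        sample_b = [rng.choice(returns_b) for _ in returns_b]
        diffs.append(statistics.mean(sample_a) - statistics.mean(sample_b))
    diffs.sort()
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Hypothetical final returns from five seeds per algorithm (made-up numbers).
algo_a = [3100.0, 2800.0, 3500.0, 2950.0, 3300.0]
algo_b = [3000.0, 2500.0, 3600.0, 2700.0, 3200.0]
low, high = bootstrap_mean_diff_ci(algo_a, algo_b)
```

  If the interval straddles zero, the seeds alone could plausibly explain the gap between the two algorithms, which is the practical worry behind "random seeds matter".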
Visit annotations in context

Tags

reinforcement-learning

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/1709.06560
Feb 2024
arxiv.org arxiv.org

2205.08192.pdf

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  T. Herlau, "Moral Reinforcement Learning Using Actual Causation," 2022 2nd International Conference on Computer, Control and Robotics (ICCCR), Shanghai, China, 2022, pp. 179-185, doi: 10.1109/ICCCR54399.2022.9790262. keywords: {Digital control;Ethics;Costs;Philosophical considerations;Toy manufacturing industry;Reinforcement learning;Forestry;Causality;Reinforcement learning;Actual Causation;Ethical reinforcement learning}
  
  ai-ethics ai-morality reinforcement-learning
Visit annotations in context

Tags

ai-morality

ai-ethics

reinforcement-learning

Annotators

mark.crowley

URL

arxiv.org/pdf/2205.08192.pdf
pdf.sciencedirectassets.com pdf.sciencedirectassets.com

Can model-free reinforcement learning explain deontological moral judgments?

1
1. mark.crowley 18 Feb 2024
  
  in Public
  
  Can model-free reinforcement learning explain deontological moral judgments? Alisabeth Ayars, University of Arizona, Dept. of Psychology, Tucson, AZ, USA
  
  ai-morality ai-ethics reinforcement-learning
Visit annotations in context

Tags

ai-morality

ai-ethics

reinforcement-learning

Annotators

mark.crowley

URL

pdf.sciencedirectassets.com/271061/1-s2.0-S0010027716X00030/1-s2.0-S0010027716300300/am.pdf
www.mdpi.com www.mdpi.com

A New Metric for Quantifying Burn Severity: The Relativized Burn Ratio

1
1. mark.crowley 16 Feb 2024
  
  in Public
  
  Parks, S.A.; Dillon, G.K.; Miller, C. A New Metric for Quantifying Burn Severity: The Relativized Burn Ratio. Remote Sens. 2014, 6, 1827-1844. https://doi.org/10.3390/rs6031827
  
  Widely used model for #fire-severity prediction for forest wildfires in Canada and USA.
  
  forest-wildfire
Visit annotations in context

Tags

forest-wildfire

Annotators

mark.crowley

URL

mdpi.com/2072-4292/6/3/1827
esajournals.onlinelibrary.wiley.com esajournals.onlinelibrary.wiley.com

Living on the edge: trailing edge forests at risk of fire-facilitated conversion to non-forest

2
1. mark.crowley 16 Feb 2024
  
  in Public
  
  Briefly, these gridded datasets were built using an observed, satellite-derived measure of fire severity (Parks et al. 2014) and statistical models in which the probability of stand-replacing fire was modeled as a function of fuel, topography, climate, and weather. For a subset of ecoregions in our study area (Colorado Plateau, AZ–NM Mountains, and Apache Highlands), Parks et al. (2018b) also produced gridded datasets representing the probability of stand-replacing fire under extreme fire weather conditions.
  
  prior work on predicting fire severity using a fixed model
2. mark.crowley 16 Feb 2024
  
  in Public
  
  Paper using fire risk prediction model.
  
  forest-wildfire
Visit annotations in context

Tags

forest-wildfire

Annotators

mark.crowley

URL

esajournals.onlinelibrary.wiley.com/doi/10.1002/ecs2.2651
Jan 2024
arxiv.org arxiv.org

2401.05566.pdf

1
1. mark.crowley 26 Jan 2024
  
  in Public
  
  Hubinger et al. "SLEEPER AGENTS: TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING". arXiv:2401.05566v3. Jan 17, 2024.
  
  Very disturbing and interesting results from a team of researchers from Anthropic and elsewhere.
  
  large-language-models transformers rdgrp rdgrp-w24
Visit annotations in context

Tags

rdgrp

transformers

rdgrp-w24

large-language-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2401.05566
arxiv.org arxiv.org

NGBoost: Natural Gradient Boosting for Probabilistic Prediction

1
1. mark.crowley 11 Jan 2024
  
  in Public
  
  You know XGBoost, but do you know NGBoost? I'd passed over this one, mentioned to me by someone wanting confidence intervals in their classification models. This could be an interesting paper to add to the ML curriculum.
  
  toread ece657a machine-learning ensemble-methods tree-based-methods boosting
Visit annotations in context

Tags

ece657a

tree-based-methods

machine-learning

toread

boosting

ensemble-methods

Annotators

mark.crowley

URL

arxiv.org/pdf/1910.03225.pdf
cdn.openai.com cdn.openai.com

gpt-4-system-card.pdf

1
1. mark.crowley 06 Jan 2024
  
  in Public
  
  GPT-4 System Card, OpenAI, March 23, 2023
  
  chat-gpt large-language-models openai system-cards transformers toread reading_group_crowley
Visit annotations in context

Tags

transformers

system-cards

large-language-models

openai

chat-gpt

toread

reading_group_crowley

Annotators

mark.crowley

URL

cdn.openai.com/papers/gpt-4-system-card.pdf
Nov 2023
proceedings.neurips.cc proceedings.neurips.cc

NeurIPS-2021-offline-reinforcement-learning-as-one-big-sequence-modeling-problem-Paper.pdf

17
1. mark.crowley 27 Nov 2023
  
  in Public
  
  denote dimensions 0 through i − 1 of the state
  
  Very odd/interesting! dimensions are independent but we are doing them in order?
2. mark.crowley 27 Nov 2023
  
  in Public
  
  τ<t to denote a trajectory from timesteps 0 through t − 1
  
  τ<t is shorthand for everything before timestep t: s_{t-1}, a_{t-1}, etc.
3. mark.crowley 27 Nov 2023
  
  in Public
  
  lower-diagonal attention mask
  
  why lower-diagonal?
4. mark.crowley 27 Nov 2023
  
  in Public
  
  Transformer architectures feature a “causal” attention mask to ensure that predictions only depend on previous tokens in a sequence
  
  Causal is in quotes here for a good reason. It is called a causal attention mask in the LLM literature, but it only has to do with the probability of the next token/word. It isn't attached to the meaning of the words at all.
5. mark.crowley 27 Nov 2023
  
  in Public
  
  We can use this directly as a goal-reaching method by conditioning on a desired final state sT .
  
  interesting, goal directed RL cast as a sequence of samples from conditional probabilities
6. mark.crowley 27 Nov 2023
  
  in Public
  
  If we set the predicted sequence length to be the action dimension, our approach corresponds exactly to the simplest form of behavior cloning with an autoregressive policy
  
  why is that? because the sample from the actions will be a proper sample? why would the sequence length ever be larger then?
7. mark.crowley 27 Nov 2023
  
  in Public
  
  Pθ (· | x)
  
  where does the distribution come from initially? empirical?
8. mark.crowley 27 Nov 2023
  
  in Public
  
  Uniform discretization has the advantage that it retains information about Euclidean distance in the original continuous space, which may be more reflective of the structure of a problem than the training data distribution.
  
  always important to consider whether the relative distances between points matter
9. mark.crowley 27 Nov 2023
  
  in Public
  
  modeling considerations are concerned less with architecture design and more with how to represent trajectory data – potentially consisting of continuous states and actions – for processing by a discrete-token architecture
  
  They don't care what kind of transformer is being used, they are interested in how to get SASASASA into the right form.
  
  good question: what about continuous states and/or actions?
10. mark.crowley 27 Nov 2023
  
  in Public
  
  Concurrently with our work, Chen et al. (2021) also proposed an RL approach centered around sequence prediction, focusing on reward conditioning as opposed to the beam-search-based planning used by the Trajectory Transformer.
  
  This is the Decision Transformer paper we read last week
11. mark.crowley 27 Nov 2023
  
  in Public
  
  Modeling the states and actions jointly already provides a bias toward generating in-distribution actions, which avoids the need for explicit pessimism
  
  pessimism is a popular method to avoid (overfitting?) of the learned dynamics to what you saw. Since transformers maintain a huge context, this isn't needed; the predictions will always be tied to the same situations as in the training data
12. mark.crowley 27 Nov 2023
  
  in Public
  
  model-based RL
  
  learn the dynamics, then optimize via RL
13. mark.crowley 27 Nov 2023
  
  in Public
  
  estimate conditional distributions over actions
  
  policy as a distribution over actions
14. mark.crowley 27 Nov 2023
  
  in Public
  
  While such works demonstrate the importance of such models for representing memory (Oh et al., 2016), they still rely on standard RL algorithmic advances to improve performance
  
  is the sequence modeling for just learning the model or is it deeper?
15. mark.crowley 27 Nov 2023
  
  in Public
  
  The Trajectory Transformer is a substantially more reliable long-horizon predictor than conventional dynamics models
  
  So the TT becomes a new type of model-based RL
16. mark.crowley 27 Nov 2023
  
  in Public
  
  When decoded with a modified beam search procedure that biases trajectory samples according to their cumulative reward,
  
  so beam search is just a decoder of the learned dynamics that optimizes for reward?
17. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
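  On the "why lower-diagonal?" question in annotation 3: if rows index the position doing the predicting and columns index the positions it may look at, then "only previous tokens" means column j is allowed only when j ≤ i, which is exactly the lower triangle. A small pure-Python sketch (names are mine, and real implementations do this on tensors):

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked-out positions."""
    allowed = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0
            for s, keep in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(3)
weights = masked_softmax([0.0, 0.0, 0.0], mask[1])
```

  Position 1 spreads its attention over positions 0 and 1 and gives exactly zero weight to position 2, so nothing from the future can leak into its prediction.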
Visit annotations in context

Tags

transformers

reinforcement-learning

rdgrp-f23

Annotators

mark.crowley

URL

proceedings.neurips.cc/paper_files/paper/2021/file/099fe6b0b444c23836c4a5d07346082b-Paper.pdf
proceedings.mlr.press proceedings.mlr.press

janner22a.pdf

1
1. mark.crowley 24 Nov 2023
  
  in Public
  
  Reading this one on Nov 27, 2023 for the reading group.
  
  rdgrp-f23 reinforcement-learning transformers
Visit annotations in context

Tags

transformers

reinforcement-learning

rdgrp-f23

Annotators

mark.crowley

URL

proceedings.mlr.press/v162/janner22a/janner22a.pdf
arxiv.org arxiv.org

2106.01345.pdf

6
1. mark.crowley 13 Nov 2023
  
  in Public
  
  K = 50 for Pong, K = 30 for others
  
  **Q:** where did these numbers come from?
2. mark.crowley 13 Nov 2023
  
  in Public
  
  loss = mean (( a_preds - a )**2)
  
  supervised learning for RL task
3. mark.crowley 13 Nov 2023
  
  in Public
  
  We feed the last K timesteps into Decision Transformer, for a total of 3K tokens (one for each modality: return-to-go, state, or action)
  
  Data:
  
  - K timesteps with three tokens per timestep
    - return-to-go token
    - state token embedding
    - action token
  - token embedding for each token
    - linear (or convolutional) layer to learn
    - normalize
  - timestep embedding
    - embedding of the time index itself, adjusting for 3x?
    - question: added or concatenated? Is the timestep embedding applied to the raw tokens or to the embedding?
4. mark.crowley 13 Nov 2023
  
  in Public
  
  Does Decision Transformer perform behavior cloning on a subset of the data?
  
  good questions
5. mark.crowley 13 Nov 2023
  
  in Public
  
  we use the GPT architecture [ 9 ], which modifies the transformer architecture with a causal self-attention mask to enable autoregressive generation, replacing the summation/softmax over the n tokens with only the previous tokens in the sequence (j ∈ [1, i]).
  
  This sentence is working hard.
6. mark.crowley 13 Nov 2023
  
  in Public
  
  this allows the layer to assign “credit” by implicitly forming state-return associations via similarity of the query and key vectors (maximizing the dot product)
  
  that's a different way of thinking about what's happening in a transformer.
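  The data layout from annotations 2 and 3 can be sketched in a few lines: compute returns-to-go, interleave (return-to-go, state, action) for the last K timesteps to get 3K tokens, and score predicted actions with the quoted mean-squared error. This is a plain-Python illustration with names of my own choosing; embeddings and the transformer itself are left out.

```python
def returns_to_go(rewards):
    """Sum of rewards from each timestep to the end of the trajectory."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def interleave_last_k(rewards, states, actions, k):
    """Build the (return-to-go, state, action) token list for the last k steps."""
    rtg = returns_to_go(rewards)
    tokens = []
    for t in range(max(0, len(rewards) - k), len(rewards)):
        tokens.append(("rtg", rtg[t]))
        tokens.append(("state", states[t]))
        tokens.append(("action", actions[t]))
    return tokens

def mse_loss(a_preds, a_true):
    """loss = mean((a_preds - a)**2), as in the quoted pseudocode."""
    return sum((p - a) ** 2 for p, a in zip(a_preds, a_true)) / len(a_true)

tokens = interleave_last_k([1.0, 0.0, 2.0], ["s0", "s1", "s2"], ["a0", "a1", "a2"], k=2)
```

  Nothing here optimizes reward: the desired return is just another input token, and training is ordinary supervised regression on the action tokens.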
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
arxiv.org arxiv.org

2305.15486.pdf

15
1. mark.crowley 06 Nov 2023
  
  in Public
  
  We then use a similar QA summarization framework as Wu et al. (2023) which produces QA dialogue on game mechanics
  
  Q: what was the main focus of this paper?
  
  A: "Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals"
  
  Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual
2. mark.crowley 06 Nov 2023
  
  in Public
  
  LaTeX source code
  
  Q: why are they using the source code and not the text output?
3. mark.crowley 06 Nov 2023
  
  in Public
  
  all prior works require expert or human generated example trajectories
  
  Training the LLMs using generated trajectories.
4. mark.crowley 06 Nov 2023
  
  in Public
  
  Wu et al. (2023) proposes a summary (Read) and reasoning (Reward) through a QA prompting framework with an open-source QA LLM Tafjord and Clark (2021). The framework demonstrates the possibility of an using real-world human-written manuals to improve RL performance on popular games, despite limiting the interaction types to only “hit”. Our framework handles all 17 kinds of interactions available in the game. Moreover, our framework makes use of information on tech-tree dependencies, and suggestions on desired policies extracted from the academic paper
  
  Main paper they are based on.
5. mark.crowley 06 Nov 2023
  
  in Public
  
  Indicate their priority out of 5
  
  Q: Where does "priority" even come from for the LLM for a domain like this? What prior knowledge and biases are built in here?
6. mark.crowley 06 Nov 2023
  
  in Public
  
  The visual descriptor takes the last two gameplay screens as input, and outputs their descriptions in language (dt, dt−1)
  
  Q: so does the language it uses internally keep changing?
7. mark.crowley 06 Nov 2023
  
  in Public
  
  Answer to the final question qa is mapped to environment action using sub-string matching.
  
  Q: is this explained in more detail anywhere?
8. mark.crowley 06 Nov 2023
  
  in Public
  
  Experimentally, we find that prompting the LLM with only the direct parents of a question greatly reduces the context length, and helps LLM to focus on the most relevant contextual information
  
  Interesting: What is being given up here? You need to cut or summarize context at some point for sure. But when?
9. mark.crowley 06 Nov 2023
  
  in Public
  
  model-based methods like DreamerV2 Hafner et al. (2020); DreamerV3 Hafner et al. (2023)
  
  Summary: how do these methods work?
10. mark.crowley 06 Nov 2023
  
  in Public
  
  We add the prompt “DO NOT answer in LaTeX.” to all of Qgame to prevent the LLM from outputting the list in LaTeX format
  
  Does GPT-3.5 understand LaTeX that well?
11. mark.crowley 06 Nov 2023
  
  in Public
  
  in an environment where control tasks are less required
  
  Q: what do they mean by this?
12. mark.crowley 06 Nov 2023
  
  in Public
  
  zero-shot LLM-based (GPT-4) policy
  
  What does "zero-shot" mean when it involves an LLM?
13. mark.crowley 06 Nov 2023
  
  in Public
  
  we promote and regulate in-context chain-of-thought reasoning in LLMs to solve complex games. The reasoning module is a directed acyclic graph (DAG), with questions as nodes and dependencies as edges. For example, the question “For each action, are the requirements met?" depends on the question “What are the top 5 actions?", creating an edge from the latter to the former. For each environment step, we traverse the DAG computing LLM answers for each node in the topological order of the graph. The final node of the DAG is a question about the best action to take and the LLM answer for the question is directly translated to environment action
  
  seems sensible
14. mark.crowley 06 Nov 2023
  
  in Public
  
  deciding the paragraphs that are relevant for playing the game
  
  this could be very subjective
15. mark.crowley 06 Nov 2023
  
  in Public
  
  the environment is OOD to them.
  
  Translation: the Crafter game is too new for GPT to know about
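  The mechanics in annotations 7, 8 and 13 (questions as DAG nodes, answered in topological order with only the direct parents as context, final answer mapped to an action by sub-string matching) can be sketched with the standard library. Everything specific here is an assumption for illustration: `fake_llm` stands in for a real model call, the third question and its edge are invented, and the action names are placeholders rather than the paper's list.

```python
from graphlib import TopologicalSorter

def answer_questions(parents, ask):
    """Answer every question after its parents, passing only parent answers.

    parents: dict mapping each question to the list of questions it depends on
    ask:     callable (question, context_dict) -> answer string
    """
    answers = {}
    order = list(TopologicalSorter(parents).static_order())
    for question in order:
        context = {p: answers[p] for p in parents.get(question, [])}
        answers[question] = ask(question, context)
    return answers, order

def match_action(answer, action_names):
    """Return the first action name that occurs in the answer, else None."""
    lowered = answer.lower()
    for name in action_names:
        if name.lower() in lowered:
            return name
    return None

def fake_llm(question, context):
    # Stand-in for a real LLM call; reports how much context it was given.
    return f"{question} (saw {len(context)} parent answers)"

top5 = "What are the top 5 actions?"
reqs = "For each action, are the requirements met?"
best = "What is the best action to take?"  # hypothetical final node
dag = {top5: [], reqs: [top5], best: [reqs]}
answers, order = answer_questions(dag, fake_llm)
```

  Passing only the direct parents keeps each prompt short, which is the trade-off questioned in annotation 8: anything an ancestor said that the parent did not repeat is lost to the child.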
Visit annotations in context

Annotators

mark.crowley

URL

arxiv.org/pdf/2305.15486.pdf
Oct 2023
arxiv.org arxiv.org

2308.13067.pdf

21
1. mark.crowley 30 Oct 2023
  
  in Public
  
  In a nutshell, the CHT seems to disprove the scaling hypothesis. Or does it? In this work, we argue that foundation models might be exploiting a “loop hole” in the CHT. Namely, what happens if the causal assumptions (which are required, by the CHT, for causal inference) are represented in observational data itself?
  
  Are LLMs exploiting a loophole in Pearl's ladder?
  
  It's not really a loophole; it's just an observational dataset that explicitly contains answers to your interventional queries.
2. mark.crowley 30 Oct 2023
  
  in Public
  
  Plato. Republic: Allegory of the cave, 375 BC
  
  ok, you win.
3. mark.crowley 30 Oct 2023
  
  in Public
  
  Same Implication, Different Representations
  
  Big Question: they cover text and experiment, but what about embodied experience? What is its role? We believe in causality for very visceral (i.e. physical and unavoidable) reasons as human beings.
  
  e.g. we touch a hot stove and then it hurts
4. mark.crowley 30 Oct 2023
  
  in Public
  
  we expect $P(Y_{X \leftarrow 1} = 1) = P(Y = 1)$ since intervening on X will not change Y
  
  Q: is that correct? wouldn't you need to show the $X\leftarrow 0$ case to demonstrate this?
5. mark.crowley 30 Oct 2023
  
  in Public
  
  the probability of a high number of Nobel laureates if the given chocolate consumption were to be high.
  
  example of an L2 interventional query.
  
  Q: For this query $P(Y_{X\leftarrow 1}=1)$, wouldn't the more correct English translation be:
  
  "The probability of having a high number of Nobel laureates if high chocolate consumption was made mandatory."
6. mark.crowley 30 Oct 2023
  
  in Public
  
  We call these concepts ‘meta’ since they are one level above ‘regular’, simple SCM in the sense that they encode information about answering causal questions in another SCM.
  
  keep reading this sentence until it makes sense...or argue why it doesn't make sense
7. mark.crowley 30 Oct 2023
  
  in Public
  
  More intriguingly, it does not matter where that L2 fact comes from since the formulation is independent of whether the model learns the fact and simply requires that the model knows about the fact. We state our second key insight as
  
  good point to remember, we don't need to learn everything, some knowledge can be encoded directly, a priori.
8. mark.crowley 30 Oct 2023
  
  in Public
  
  Example 1 serves to show how the rather abstract definition of an SCM can be made tangible to communicate what we believe about our observed data and more so the underlying data generating process.
  
  Does everyone agree that it's crystal clear now? (maybe not...)
9. mark.crowley 30 Oct 2023
  
  in Public
  
  The Pearl’s Causal Hierarchy
  
  An important theoretical framework to read up on if you aren't familiar with it.
10. mark.crowley 30 Oct 2023
  
  in Public
  
  It is clear how the observed correlation in this case corresponds to a direct causation according to
  
  We should draw these models out
11. mark.crowley 30 Oct 2023
  
  in Public
  
  These models are castles in the air. They have no foundations whatsoever.” discrediting the models for lacking any identifiable notion to causality.
  
  discussion: Do we really need to just pick one of these options?
12. mark.crowley 30 Oct 2023
  
  in Public
  
  Our explanation for this is that they are not only ‘stochastic parrots’ as already suggested by Bender et al. (2021) but sometimes also ‘causal parrots’ since they will also encounter correlations over causal facts during training in their vast oceans of textual data.
  
  Q: what was Bender's argument exactly?
13. mark.crowley 30 Oct 2023
  
  in Public
  
  parameterized variants of SCMs such as the neural ones presented in (Xia et al., 2021
  
  to read: this sounds interesting
  
  toread
14. mark.crowley 30 Oct 2023
  
  in Public
  
  y meta SCM
  
  Q: definition needed
15. mark.crowley 30 Oct 2023
  
  in Public
  
  However, this conclusion is arguably nothing new, as most people would agree, and this is partly so because such obtained knowledge has been embedded as textual articles into encyclopedias such as Wikipedia, which are freely accessible
  
  Bit strange: this sounds like they are saying people know this because of wikipedia, rather than from lived experience.
16. mark.crowley 30 Oct 2023
  
  in Public
  
  $\mathbb{P}_{\mathbf{E}}$ denotes the exogenous distribution
  
  Q: Can we get a definition of this?
17. mark.crowley 30 Oct 2023
  
  in Public
  
  to our real world intuition since there is a bidirected edge X ↔ Y ∈ G(M2) with E3 being the underlying confounder
  
  **Intuition:** whatever explains GDP, which we call E3, also explains X and Y.
18. mark.crowley 28 Oct 2023
  
  in Public
  
  The following block paragraph serves as a summary
  
  question: where does this paragraph come from? who wrote it?
19. mark.crowley 28 Oct 2023
  
  in Public
  
  we take the former perspective pro causal AI/ML. We argue that the questions around causality can fuel research also on questions of recent debates such as how much ‘real’ progress towards AGI has been made since the advent of large scale models
  
  I would agree with this stance!
20. mark.crowley 28 Oct 2023
  
  in Public
  
  countering opinions start to speak out against causal AI/ML (Bishop, 2021)
  
  Should we read this paper as well? Is there an updated paper or opinion piece from these researchers about why causal AI/ML isn't needed?
  
  causality
21. mark.crowley 25 Oct 2023
  
  in Public
  
  Zecevic, Willig, Singh Dhami and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". In Transactions on Machine Learning Research, Aug, 2023.
  
  transformers large-language-models nlp reading_group_crowley rdgrp-f23
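  A toy simulation may help with annotations 4, 5 and 17: when a hidden confounder E3 drives both X and Y and there is no edge from X to Y, conditioning on X = 1 shifts the probability of Y, while intervening on X leaves it at P(Y = 1). The structural equations and all probabilities below are made up for illustration; they are not the paper's chocolate and Nobel laureate numbers.

```python
import random

def sample(rng, do_x=None):
    """One draw from a toy SCM: E3 -> X and E3 -> Y, with no edge X -> Y."""
    e3 = rng.random() < 0.5                      # hidden confounder
    if do_x is None:
        x = rng.random() < (0.9 if e3 else 0.1)  # X listens to E3
    else:
        x = do_x                                 # intervention cuts E3 -> X
    y = rng.random() < (0.8 if e3 else 0.2)      # Y listens only to E3
    return x, y

def estimate(n=100_000, seed=0):
    """Return estimates of P(Y=1), P(Y=1 | X=1) and P(Y=1 | do(X=1))."""
    rng = random.Random(seed)
    observed = [sample(rng) for _ in range(n)]
    p_y = sum(y for _, y in observed) / n
    with_x1 = [y for x, y in observed if x]
    p_y_given_x1 = sum(with_x1) / len(with_x1)
    intervened = [sample(rng, do_x=True) for _ in range(n)]
    p_y_do_x1 = sum(y for _, y in intervened) / n
    return p_y, p_y_given_x1, p_y_do_x1

p_y, p_y_given_x1, p_y_do_x1 = estimate()
```

  Under these made-up numbers the true values are P(Y=1) = 0.5, P(Y=1 | X=1) = 0.74 and P(Y=1 | do(X=1)) = 0.5. On the question in annotation 4, calling `sample(rng, do_x=False)` would give the same 0.5, which is the fuller demonstration that X has no effect on Y.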
Visit annotations in context

Tags

transformers

large-language-models

toread

causality

nlp

reading_group_crowley

rdgrp-f23

Annotators

mark.crowley

URL

arxiv.org/pdf/2308.13067.pdf
arxiv.org arxiv.org

RoBERTa: A Robustly Optimized BERT Pretraining Approach

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Introduction of the RoBERTa improved analysis and training approach to BERT NLP models.
  
  large-language-models nlp transformers rdgrp-s23 reading_group_crowley
Visit annotations in context

Tags

transformers

large-language-models

nlp

reading_group_crowley

rdgrp-s23

Annotators

mark.crowley

URL

arxiv.org/pdf/1907.11692
arxiv.org arxiv.org

2106.01345.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  (Chen, NeurIPS, 2021) Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, and Mordatch. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv preprint arXiv:2106.01345v2, June, 2021.
  
  Quickly became a very influential paper with a new idea of how to learn generative models for action prediction, trained on sequences of returns-to-go, states, and actions from demonstration trajectories. No optimization of actions or rewards; instead, the target return is an input.
  
  reinforcement-learning transformers generative-models minecraft minerl rdgrp-f23 reading_group_crowley
Visit annotations in context

Tags

minerl

transformers

minecraft

generative-models

reinforcement-learning

reading_group_crowley

rdgrp-f23

Annotators

mark.crowley

URL

arxiv.org/pdf/2106.01345
proceedings.mlr.press proceedings.mlr.press

kallus20a.pdf

2
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Kallus, N. (2020). DeepMatch: Balancing deep covariate representations for causal inference using adversarial training. In H. Daumé III & A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119 (pp. 5067–5077). PMLR.
  
  causal-inference
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Using adversarial deep learning approaches to get a better correction for causal inference from observational data.
  
  causal-inference
Visit annotations in context

Tags

causal-inference

Annotators

mark.crowley

URL

proceedings.mlr.press/v119/kallus20a/kallus20a.pdf
arxiv.org arxiv.org

2303.02186.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "Causal Deep Learning" Authors:Jeroen Berrevoets, Krzysztof Kacprzyk, Zhaozhi Qian, Mihaela van der Schaar
  
  Very general and ambitious approach for representing the full continuous conceptual spectrum of Pearl's Causal Ladder, and the ability to model and learn parts of it from data.
  
  Discussed by Prof. van der Schaar at the ICML 2023 workshop on Counterfactuals.
  
  causal-inference causality to-file
Visit annotations in context

Tags

to-file

causality

causal-inference

Annotators

mark.crowley

URL

arxiv.org/pdf/2303.02186.pdf
www.nature.com www.nature.com

Scientific discovery in the age of artificial intelligence

16
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Performing optimization in the latent space can more flexibly model underlying data distributions than mechanistic approaches in the original hypothesis space. However, extrapolative prediction in sparsely explored regions of the hypothesis space can be poor. In many scientific disciplines, hypothesis spaces can be vastly larger than what can be examined through experimentation. For instance, it is estimated that there are approximately $10^{60}$ molecules, whereas even the largest chemical libraries contain fewer than $10^{10}$ molecules [12,159]. Therefore, there is a pressing need for methods to efficiently search through and identify high-quality candidate solutions in these largely unexplored regions.
  
  Question: how does this notion of hypothesis space relate to causal inference and reasoning?
  
  causality causal-inference ai-for-science question
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Wang et al. "Scientific discovery in the age of artificial intelligence", Nature, 2023.
  
  A paper about the current state of using AI/ML for scientific discovery, connected with the AI4Science workshops at major conferences.
  
  (NOTE: since Springer/Nature don't allow public pdfs to be linked without a paywall, we can't use hypothesis directly on the pdf of the paper, this link is to the website version of it which is what we'll use to guide discussion during the reading group.)
  
  machine-learning deep-learning ai-for-science artificial-intelligence reading_group_crowley rdgrp-f23
3. mark.crowley 25 Oct 2023
  
  in Public
  
  Petersen, B. K. et al. Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations (2020).
  
  Description: Reinforcement learning uses neural networks to generate a mathematical expression sequentially by adding mathematical symbols from a predefined vocabulary and using the learned policy to decide which notation symbol to be added next. The mathematical formula is represented as a parse tree. The learned policy takes the parse tree as input to determine what leaf node to expand and what notation (from the vocabulary) to add.
  
  rdgrp-f23 to-read
4. mark.crowley 25 Oct 2023
  
  in Public
  
  Reinforcement learning uses neural networks to generate a mathematical expression sequentially by adding mathematical symbols from a predefined vocabulary and using the learned policy to decide which notation symbol to be added next [140]. The mathematical formula is represented as a parse tree. The learned policy takes the parse tree as input to determine what leaf node to expand and what notation (from the vocabulary) to add
  
  very interesting approach
  
  to-read
5. mark.crowley 25 Oct 2023
  
  in Public
  
  In chemistry, models such as simplified molecular-input line-entry system (SMILES)-VAE [155] can transform SMILES strings, which are molecular notations of chemical structures in the form of a discrete series of symbols that computers can easily understand, into a differentiable latent space that can be optimized using Bayesian optimization techniques (Fig. 3c).
  
  This could be useful for chemistry research for robotic labs.
  
  proj-chemgymrl to-read
6. mark.crowley 25 Oct 2023
  
  in Public
  
  Neural operators are guaranteed to be discretization invariant, meaning that they can work on any discretization of inputs and converge to a limit upon mesh refinement. Once neural operators are trained, they can be evaluated at any resolution without the need for re-training. In contrast, the performance of standard neural networks can degrade when data resolution during deployment changes from model training.
  
  Look this up: anyone familiar with this? sounds complicated but very promising for domains with a large range of resolutions (medical-imaging, wildfire-management)
  
  medical-imaging forest-wildfire to-read
7. mark.crowley 23 Oct 2023
  
  in Public
  
  Standard neural network models can be inadequate for scientific applications as they assume a fixed data discretization. This approach is unsuitable for many scientific datasets collected at varying resolutions and grids.
  
  Is discretized resolution of neural networks an issue for science?
8. mark.crowley 23 Oct 2023
  
  in Public
  
  generating hypotheses
  
  Are any of the "generated hypotheses" more general than a molecular shape? Are they full hypothetical explanations for a problem? (yes)
9. mark.crowley 23 Oct 2023
  
  in Public
  
  Applications of symbolic regression in physics use grammar VAEs [150]. These models represent discrete symbolic expressions as parse trees using context-free grammar and map the trees into a differentiable latent space. Bayesian optimization is then employed to optimize the latent space for symbolic laws while ensuring that the expressions are syntactically valid. In a related study, Brunton and colleagues [151] introduced a method for differentiating symbolic rules by assigning trainable weights to predefined basis functions. Sparse regression was used to select a linear combination of the basis functions that accurately represented the dynamic system while maintaining compactness. Unlike equivariant neural networks, which use a predefined inductive bias to enforce symmetry, symmetry can be discovered as the characteristic behaviour of a domain. For instance, Liu and Tegmark [152] described asymmetry as a smooth loss function and minimized the loss function to extract previously unknown symmetries. This approach was applied to uncover hidden symmetries in black-hole waveform datasets, revealing unexpected space–time structures that were historically challenging to find.
  
  This seems very important, even though I only understand half of it. My question is, can similar approaches be used to apply to planning in complex domains or to meaning and truth in language?
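To make the "sparse regression over predefined basis functions" idea in the quoted passage more concrete, here is a rough sketch using sequentially thresholded least squares. That particular algorithm, the function name `stlsq`, the polynomial basis and the toy dynamics are all assumptions chosen for illustration; the passage does not say which sparse solver was used.

```python
import numpy as np


def stlsq(theta, target, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares.

    theta:  (n_samples, n_basis) matrix of basis functions evaluated on the data
    target: (n_samples,) values to explain, e.g. measured time derivatives
    Returns one coefficient per basis function, most of them exactly zero.
    """
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(coef) < threshold
        coef[small] = 0.0            # prune basis functions with tiny weights
        keep = ~small
        if not keep.any():
            break
        # Refit using only the basis functions that survived pruning.
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef


# Hypothetical system: dx/dt = -2 x + 0.5 x^3, observed without noise.
x = np.linspace(-2.0, 2.0, 200)
dxdt = -2.0 * x + 0.5 * x**3

# Candidate basis functions: 1, x, x^2, x^3.
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
coef = stlsq(theta, dxdt)
print(dict(zip(["1", "x", "x^2", "x^3"], coef.round(3))))
```

The compactness mentioned in the passage corresponds to the pruning step: only the `x` and `x^3` terms keep non-zero weights for this toy system.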
  
  question
10. mark.crowley 23 Oct 2023
  
  in Public
  
  to address the difficulties that scientists care about, the development and evaluation of AI methods must be done in real-world scenarios, such as plausibly realizable synthesis paths in drug design [217,218], and include well calibrated uncertainty estimators to assess the model’s reliability before transitioning it to real-world implementation
  
  It's important to move beyond toy models.
11. mark.crowley 23 Oct 2023
  
  in Public
  
  However, current transfer-learning schemes can be ad hoc, lack theoretical guidance [213] and are vulnerable to shifts in underlying distributions [214]. Although preliminary attempts have addressed this challenge [215,216], more exploration is needed to systematically measure transferability across domains and prevent negative transfer.
  
  There is still a lot of work to do to know how to best use human knowledge to guide learning systems and how to reuse models in different domains.
12. mark.crowley 23 Oct 2023
  
  in Public
  
  Another approach for using neural networks to solve mathematical problems is transforming a mathematical formula into a binary sequence of symbols. A neural network policy can then probabilistically and sequentially grow the sequence one binary character at a time [6]. By designing a reward that measures the ability to refute the conjecture, this approach can find a refutation to a mathematical conjecture without prior knowledge about the mathematical problem.
  
  A nice idea to learn a formula of symbols which can be evaluated logically for truth. But do they mention more general approaches such as using SAT solvers for this task? See Vijay Ganesh's work.
  
  question satisfiability
13. mark.crowley 23 Oct 2023
  
  in Public
  
  foresighted
  
  is "foresighted" a word?
14. mark.crowley 23 Oct 2023
  
  in Public
  
  AI methods have become invaluable when hypotheses involve complex objects such as molecules. For instance, in protein folding, AlphaFold2 [10] can predict the 3D atom coordinates of proteins from amino acid sequences with atomic accuracy, even for proteins whose structure is unlike any of the proteins in the training dataset.
  
  This is an important category, but it can't apply to all fields and will have a limit to what it can do to move science forward. It's also very dependent on vast computing resources.
15. mark.crowley 23 Oct 2023
  
  in Public
  
  Transformer architectures
  
  Question: what is the inductive bias of Transformers for NLP? Can we define the symmetries that are implicitly leveraged in the architecture?
16. mark.crowley 23 Oct 2023
  
  in Public
  
  Such pretrained models [96,97,98] with a broad understanding of a scientific domain are general-purpose predictors that can be adapted for various tasks, thereby improving label efficiency and surpassing purely supervised methods [8].
  
  Pre-trained models: these are obviously important and powerful, they almost always work better than training from scratch.
  
  general-purpose predictors: However, we should be suspicious of accepting this claim that they are general purpose predictors. Why?
  
  Have all of the scenarios been tested?
  
  Does the system have a general underlying model?
  
  Is there some bias in the training and testing data?
  
  Example: - You pretrain a model on the motion of objects on a plane, such as a pool table. You learn a very good model to predict movement. - Now, does it work if the table is curved, or even has bumps and imperfections? - Now train it on 3D Newtonian examples: will it predict relativistic effects? (No)
Visit annotations in context

Tags

artificial-intelligence

to-read

causal-inference

satisfiability

deep-learning

machine-learning

reading_group_crowley

rdgrp-f23

medical-imaging

question

proj-chemgymrl

causality

forest-wildfire

ai-for-science

Annotators

mark.crowley

URL

nature.com/articles/s41586-023-06221-2
arxiv.org arxiv.org

Untitled document

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  [ Bengio, The Consciousness Prior, Arxiv, 2018]
  
  causal-inference causality to-read
Visit annotations in context

Tags

causality

to-read

causal-inference

Annotators

mark.crowley

URL

arxiv.org/pdf/1709.08568.pdf
arxiv.org arxiv.org

Causal Deep Learning

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Causal Deep Learning Authors:Jeroen Berrevoets, Krzysztof Kacprzyk, Zhaozhi Qian, Mihaela van der Schaar
  
  Very general and ambitious approach for representing the full continuous conceptual spectrum of Pearl's Causal Ladder, and the ability to model and learn parts of this from data.
  
  Discussed by Prof. van der Schaar at ICML2023 workshop on Counterfactuals.
  
  causal-inference causality icml2023 to-file
Visit annotations in context

Tags

icml2023

causality

to-file

causal-inference

Annotators

mark.crowley

URL

arxiv.org/abs/2303.02186
arxiv.org arxiv.org

Estimating causal effects with optimization-based methods: A review and empirical comparison

2
1. mark.crowley 25 Oct 2023
  
  in Public
  
  (Cousineau, Verter, Murphy and Pineau, 2023) "Estimating causal effects with optimization-based methods: A review and empirical comparison"
  
  causal-inference
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Bias-variance trade-off
  
  The Bias - Variance Tradeoff!
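For reference, the standard decomposition behind this trade-off can be written as follows. The notation is an assumption and is not taken from the annotated paper: data are generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the fitted estimator, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More flexible estimators typically lower the bias term while raising the variance term, which is the trade-off the highlight refers to.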
  
  ece657a
Visit annotations in context

Tags

ece657a

causal-inference

Annotators

mark.crowley

URL

arxiv.org/pdf/2203.00097.pdf
oid.wharton.upenn.edu oid.wharton.upenn.edu

Microsoft Word - A Review of Empirical Operations Management over the Last Two Decades Fi....docx

3
1. mark.crowley 25 Oct 2023
  
  in Public
  
  To avoid such bias, a fundamental aspect in the research design of studies of causal inference is the identification strategy: a clear definition of the sources of variation in the data that can be used to estimate the causal effect of interest.
  
  To avoid making false conclusions, studies must identify all the sources of variation. Is this even possible in most cases?
  
  causality causal-inference
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Matching: This approach seeks to replicate a balanced experimental design using observational data by finding close matches between pairs or groups of units and separating out the ones that received a specified treatment from those that did not, thus defining the control groups.
  
  Matching approach to dealing with sampling bias. Basically, use some intrinsic (or other) metric about the situations to cluster them so that "similar" situations are dealt with similarly. Analysis is then carried out on those clusters. The number of clusters has to be defined, and some method, such as k-means, is often used. The result depends a lot on the similarity metric, the clustering approach and other assumptions.
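A small sketch of the pair-matching variant mentioned in the highlighted definition, assuming one numeric covariate and nearest-neighbour matching with replacement. The function name `matched_effect` and the toy data are made up for illustration, and the clustering variant described in the note above would look different.

```python
def matched_effect(units):
    """Estimate the average effect on treated units by nearest-neighbour matching.

    units: list of (covariate, treated, outcome) tuples.
    Each treated unit is paired with the control unit whose covariate is
    closest (controls may be reused), and the outcome gaps are averaged.
    """
    treated = [u for u in units if u[1]]
    control = [u for u in units if not u[1]]
    gaps = []
    for covariate, _, outcome in treated:
        match = min(control, key=lambda c: abs(c[0] - covariate))
        gaps.append(outcome - match[2])
    return sum(gaps) / len(gaps)


def naive_effect(units):
    """Difference of raw group means, ignoring the covariate entirely."""
    treated = [u[2] for u in units if u[1]]
    control = [u[2] for u in units if not u[1]]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy data: outcome = 2 * covariate + 3 if treated. Treated units sit at
# higher covariate values, so the raw comparison is biased upwards.
units = [
    (3.0, True, 9.0), (4.0, True, 11.0),
    (1.0, False, 2.0), (2.0, False, 4.0), (3.0, False, 6.0), (4.0, False, 8.0),
]
print(naive_effect(units), matched_effect(units))
```

On this toy data the raw difference of means is 5.0, while matching recovers the built-in effect of 3.0. As the note says, the answer depends heavily on the similarity metric, here simply the absolute difference in one covariate.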
  
  causal-inference
3. mark.crowley 25 Oct 2023
  
  in Public
  
  Terwiesch, 2022 - "A Review of Empirical Operations Management over the Last Two Decades". Listed as an important review of methods for addressing biases in operations management by explicitly addressing causality.
  
  causality causal-inference
Visit annotations in context

Tags

causality

causal-inference

Annotators

mark.crowley

URL

oid.wharton.upenn.edu/wp-content/uploads/2018/09/A-Review-of-Empirical-Operations-Management-over-the-Last-Two-Decades-Fi....pdf
openreview.net openreview.net

Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Shayan Shirahmad Gale Bagi, Zahra Gharaee, Oliver Schulte, and Mark Crowley. Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting. In International Conference on Machine Learning (ICML). Honolulu, Hawaii, USA. Jul, 2023.
  
  causality causal-inference deep-learning machine-learning icml icml2023
Visit annotations in context

Tags

machine-learning

causal-inference

deep-learning

causality

icml2023

icml

Annotators

mark.crowley

URL

openreview.net/pdf
arxiv.org arxiv.org

2301.05169.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning" Yuejiang Liu1, 2,* YUEJIANG.LIU@EPFL.CH Alexandre Alahi2 ALEXANDRE.ALAHI@EPFL.CH Chris Russell1 CMRUSS@AMAZON.DE Max Horn1 HORNMAX@AMAZON.DE Dominik Zietlow1 ZIETLD@AMAZON.DE Bernhard Sch ̈olkopf1, 3 BS@TUEBINGEN.MPG.DE Francesco Locatello1 LOCATELF@AMAZON.DE
  
  causality causal-inference open-dataset dataset student-shayan
Visit annotations in context

Tags

dataset

student-shayan

causal-inference

open-dataset

causality

Annotators

mark.crowley

URL

arxiv.org/pdf/2301.05169.pdf
arxiv.org arxiv.org

2305.15486.pdf

2
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning". Arxiv preprint arXiv:2305.15486v2, May, 2023.
  
  reinforcement-learning nlp large-language-models chatgpt minecraft evaluation-methods rdgrp-f23
2. mark.crowley 25 Oct 2023
  
  in Public
  
  Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training.
  
  Them's fighten' words!
  
  I haven't read it yet, but we're putting it on the list for this fall's reading group. Seriously, a strong result with a very strong implied claim. They are careful to say it's from their empirical results; very worth a look. I suspect that the amount of implicit knowledge in the papers, text and DAG is helping to do this.
  
  The Big Question: is their comparison to RL baselines fair? Are they being trained from scratch? What does a fair comparison of any from-scratch model (RL or supervised) mean when compared to an LLM approach (or any approach using a foundation model), when that model is not really from scratch?
  
  reinforcement-learning rdgrp-f23 reading_group_crowley nlp larg deep-learning self-supervised supervised-learning evaluation-methods
Visit annotations in context

Tags

minecraft

supervised-learning

deep-learning

reinforcement-learning

self-supervised

rdgrp-f23

evaluation-methods

chatgpt

large-language-models

larg

nlp

reading_group_crowley

Annotators

mark.crowley

URL

arxiv.org/pdf/2305.15486.pdf
link.springer.com link.springer.com

Untitled document

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Chapter 21 "Adversarial Autoencoders" from our book "Elements of Dimensionality Reduction and Manifold Learning", Springer 2023.
  
  manifold-book manifold-learning dimensionality-reduction representation-learning autoencoders
Visit annotations in context

Tags

dimensionality-reduction

manifold-learning

autoencoders

representation-learning

manifold-book

Annotators

mark.crowley

URL

link.springer.com/content/pdf/10.1007/978-3-031-10602-6_21.pdf
assets.pubpub.org assets.pubpub.org

71652816875953.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Discussion of the paper:
  
  Ghojogh B, Ghodsi A, Karray F, Crowley M. Theoretical Connection between Locally Linear Embedding, Factor Analysis, and Probabilistic PCA. Proceedings of the Canadian Conference on Artificial Intelligence [Internet]. 2022 May 27; Available from: https://caiac.pubpub.org/pub/7eqtuyyc
  
  CanAI2022 dimensionality-reduction manifold-learning machine-learning university-waterloo
Visit annotations in context

Tags

dimensionality-reduction

manifold-learning

machine-learning

university-waterloo

CanAI2022

Annotators

mark.crowley

URL

assets.pubpub.org/zbfq7fzb/71652816875953.pdf
www.gatesnotes.com www.gatesnotes.com

The Age of AI has begun

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "The Age of AI has begun : Artificial intelligence is as revolutionary as mobile phones and the Internet." Bill Gates, March 21, 2023. GatesNotes
  
  aig chatgpt large-language-models
Visit annotations in context

Tags

chatgpt

aig

large-language-models

Annotators

mark.crowley

URL

gatesnotes.com/The-Age-of-AI-Has-Begun
www.inc.com www.inc.com

Bill Gates Says We're Witnessing a 'Stunning' New Technology Age. 5 Ways You Must Prepare Now

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Minda Zetlin. "Bill Gates Says We're Witnessing a 'Stunning' New Technology Age. 5 Ways You Must Prepare Now". Inc.com, March 2023.
  
  chatgpt openai large-language-models
Visit annotations in context

Tags

chatgpt

large-language-models

openai

Annotators

mark.crowley

URL

inc.com/minda-zetlin/bill-gates-says-were-witnessing-a-stunning-new-technology-age-5-ways-to-prepare.html
openai.com openai.com

New AI classifier for indicating AI-written text

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text.
  
  This is true of any of these LLM models actually for any task.
  
  chatgpt
Visit annotations in context

Tags

chatgpt

Annotators

mark.crowley

URL

openai.com/blog/new-ai-classifier-for-indicating-ai-written-text
arxiv.org arxiv.org

2212.05032.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Feng, 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
  
  Shared and found via: Gowthami Somepalli @gowthami@sigmoid.social Mastodon > Gowthami Somepalli @gowthami StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance by using a constituency tree or a scene graph.
  
  chatgpt large-language-models nlp transformers ece657a
Visit annotations in context

Tags

ece657a

transformers

large-language-models

chatgpt

nlp

Annotators

mark.crowley

URL

arxiv.org/pdf/2212.05032.pdf
arxiv.org arxiv.org

2203.02155.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Training language models to follow instructions with human feedback
  
  Original paper for discussion of the Reinforcement Learning from Human Feedback (RLHF) algorithm.
  
  large-language-models reinforcement-learning chatgpt
Visit annotations in context

Tags

chatgpt

reinforcement-learning

large-language-models

Annotators

mark.crowley

URL

arxiv.org/pdf/2203.02155
arxiv.org arxiv.org

2209.07550.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  [Kapturowski, DeepMind, Sep 2022] "Human-level Atari 200x Faster"
  
  Improving the 2020 Agent57 performance to be more efficient.
  
  Arxiv: https://arxiv.org/abs/2209.07550
  
  reinforcement-learning atari-games ece457c to-read
Visit annotations in context

Tags

reinforcement-learning

to-read

atari-games

ece457c

Annotators

mark.crowley

URL

arxiv.org/pdf/2209.07550.pdf
cdn.openai.com cdn.openai.com

Language Models are Unsupervised Multitask Learners

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  GPT-2 Introduction paper
  
  Language Models are Unsupervised Multitask Learners A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, (2019).
  
  large-language-models nlp machine-learning transformers gpt reading_group_crowley rdgrp-s23
Visit annotations in context

Tags

transformers

large-language-models

rdgrp-s23

machine-learning

nlp

reading_group_crowley

gpt

Annotators

mark.crowley

URL

cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
arxiv.org arxiv.org

1706.03762.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "Attention is All You Need" Foundational paper introducing the Transformer Architecture.
  
  transformers reading_group_crowley rdgrp-s23 large-language-models nlp
Visit annotations in context

Tags

transformers

large-language-models

nlp

reading_group_crowley

rdgrp-s23

Annotators

mark.crowley

URL

arxiv.org/pdf/1706.03762
papers.nips.cc papers.nips.cc

NeurIPS-2020-language-models-are-few-shot-learners-Paper.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  GPT-3 introduction paper
  
  large-language-models nlp machine-learning transformers gpt reading_group_crowley rdgrp-s23
Visit annotations in context

Tags

transformers

large-language-models

rdgrp-s23

machine-learning

nlp

reading_group_crowley

gpt

Annotators

mark.crowley

URL

papers.nips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
arxiv.org arxiv.org

2105.03322.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  "Are Pre-trained Convolutions Better than Pre-trained Transformers?"
  
  transformers deep-learning nlp large-language-models reading_group_crowley rdgrp-s23
Visit annotations in context

Tags

transformers

large-language-models

deep-learning

nlp

reading_group_crowley

rdgrp-s23

Annotators

mark.crowley

URL

arxiv.org/pdf/2105.03322.pdf
arxiv.org arxiv.org

2201.08239.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  LaMDA: Language Models for Dialog Application
  
  "LaMDA: Language Models for Dialog Application" Meta's introduction of LaMDA v1 Large Language Model.
  
  transformers reading_group_crowley rdgrp-s23 large-language-models nlp
Visit annotations in context

Tags

transformers

large-language-models

nlp

reading_group_crowley

rdgrp-s23

Annotators

mark.crowley

URL

arxiv.org/pdf/2201.08239.pdf
osf.io osf.io

Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  Benyamin Ghojogh, Ali Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey"
  
  reading_group_crowley transformers rdgrp-s23 nlp large-language-models
Visit annotations in context

Tags

transformers

large-language-models

nlp

reading_group_crowley

rdgrp-s23

Annotators

mark.crowley

URL

osf.io/m6gcn/

Mark Crowley

Associate Professor at the University of Waterloo.

Research and teaching on topics in Artificial Intelligence, Machine Learning and Reinforcement Learning.

Reading group links: https://markcrowley.ca/reading-groups/

Annotations: 411

Joined: April 4, 2020

Location: Waterloo, Canada

Link: markcrowley.ca

ORCID: 0000-0003-3921-4762
