40 Matching Annotations
  1. Last 7 days
    1. the wind speed component u is often not available or its use is restricted in most meteorological satellite imagery or NWP

      is this correct? wind speed can be derived from the two wind components, can't it? (see the sketch below)
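      a quick sanity check, assuming u and v are the standard eastward/northward components (variable names here are illustrative, not from the paper):

      ```python
      import numpy as np

      # Wind speed follows directly from the two horizontal components
      # u (eastward) and v (northward); direction is the bearing the wind
      # blows FROM, in the usual meteorological convention.
      u, v = 3.0, 4.0                                    # m/s, toy values
      speed = np.hypot(u, v)                             # sqrt(u**2 + v**2) -> 5.0 m/s
      direction = (180.0 + np.degrees(np.arctan2(u, v))) % 360.0
      print(speed, direction)
      ```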

    2. the authors themselves have pointed out that these models tend to produce blurred predictions, highlighting a critical issue.

      should evaluate GenCast or NeuralGCM as a contrasting family of models

    3. Graphcastnet

      should be GraphCast only

    4. Data-driven Weather Forecasting

      missing ClimaX, Stormer, FengWu, Keisler et al., GenCast, NeuralGCM

    5. ERA5 provides data at six-hour intervals, includes 13 pressure levels, and covers 62 meteorological variables

      wrong; the full ERA5 has many more variables and pressure levels (the full archive has 37 pressure levels)

    6. Table 1: Summary of Characteristics for Various Evaluation and ACE Metrics

      this table is confusing, the tick implies these are good characteristics while they are not

    7. how perceptual the image was (how well the pixel distributions matched), regardless of how well the model predicted the actual weather.

      unclear why the two are different

      the authors are judging the different forecasts based on their perception as well

    1. Experiments

      this section lacks important experimental details

      are the models fine-tuned on HRRR or simply evaluated on it?

      FuXi and Pangu work on a 0.25° grid; how did you adapt them to the 3 km-resolution data?

      why use only July through the end of 2020 for evaluation instead of the whole year?

    2. Dataset Construction

      to clarify: all training and evaluation are done with the HRRR dataset, and the extreme events collected from the storm database and the Storm Prediction Center are only used to define which areas at which timestamps count as extreme events?

    3. Advancement in Models

      missing many models:

      GraphCast, ClimaX, FengWu, Stormer, GenCast, NeuralGCM, Keisler et al.

  2. Jul 2024
    1. we use the encoder-decoder model in Figure 3 to calculate the feature-wise generation probability using mean squared error (MSE) between $X'$ and its generation $\bar{X}'$.

      MSE between forecast and its VAE reconstruction?
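      if that reading is right, the score is presumably the usual reconstruction-error criterion; a minimal sketch under that assumption (encoder/decoder names are placeholders, not the paper's code):

      ```python
      import torch

      def generation_score(x_prime, encoder, decoder):
          """Feature-wise reconstruction MSE, read as an inverse generation probability.

          x_prime: (batch, features) tensor; encoder/decoder stand in for the
          trained modules from Figure 3.
          """
          x_bar = decoder(encoder(x_prime))            # the generation \bar{X}'
          # low per-feature error ~ high generation probability ~ normal event
          return ((x_prime - x_bar) ** 2).mean(dim=0)
      ```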

    2. Table 2: Forecasting Error (MAE,

      this metric gives no insight into the actual performance

      should report a separate metric for each weather variable at its native scale

    3. Deployment of SGM for Image Segmentation

      this has nothing to do with the other two tasks and datasets

      and the description is super vague with no information at all, again.

    4. As long as this encoder-decoder model can capture the latent distribution for normal events, then the generation probability of a piece of time series data can be utilized as the condition for detecting anomaly patterns. This is because the extreme values are identified with a remarkably low generation probability.

      this does not make sense at all

    5. Deployment of SGM for Time Series Forecasting and Anomaly Detection

      super vague and confusing

      no clear description of the notation (what is H? what is X?) or of the model architecture; is the model predicting the future or simply reconstructing?

      encoder-decoder: where is the encoder and where is the decoder? the text then mentions a sequence-to-sequence model; which is it, eventually?

      what is Granger Causality and how is it used? what role does it play in the model? (see the sketch after this list)

      only mentioning the architecture (GNN) at the end with no detail at all

      encoder-decoder, sequence-to-sequence, Structural Equation Model, Granger Causality --> just trying to bring up unnecessarily complicated concepts
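      for reference, Granger causality tests whether past values of one series improve prediction of another beyond the latter's own past; a minimal, self-contained check with statsmodels (standard usage, not the paper's code):

      ```python
      import numpy as np
      from statsmodels.tsa.stattools import grangercausalitytests

      rng = np.random.default_rng(0)
      x = rng.normal(size=500)
      y = np.roll(x, 1) + 0.1 * rng.normal(size=500)  # y follows x with a one-step lag

      # Tests whether the second column (x) Granger-causes the first (y),
      # for every lag up to maxlag; small p-values reject "no causality".
      data = np.column_stack([y, x])
      results = grangercausalitytests(data, maxlag=2)
      ```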

    6. Given the image data $X \in \mathbb{R}^{N \times D \times T}$, we aim to predict the crop type of each pixel $y \in \mathbb{R}^N$, as shown in Figure 2.

      not related to the other two tasks?

    7. $i \in \{1, \ldots, N\}$ can be the number of spatial locations (e.g., counties), $d \in \{1, \ldots, D\}$ can be the dimension of weather features (e.g., temperature and humidity), and $t \in \{1, \ldots, T\}$ can be the timestamp (e.g., hour).

      not a common setting for weather forecasting

    8. For each chip, we retrieve 3 satellite images from the NASA HLS dataset evenly distributed in time from March to September 2022 to capture the ground view at different stages of the season

      how is this dataset aligned with the others temporally?

    9. The geographic distribution of 238 selected counties in the United States of America is shown in Figure 1,

      so this is a point-wise dataset

      this is not standard in the literature; who will benefit from it?

    10. Related Work

      missing ChaosBench

      possibly worth mentioning large-scale models in climate science that benefit from this effort

    1. Our approach extends this by applying learnable scalar weights to each attention head

      limited novelty? instead of a plain sum you use a weighted sum? (see the sketch below)
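      to make the limited-novelty question concrete, the change seems to amount to replacing a uniform combination of heads with a weighted one; a sketch under that reading (not the paper's code):

      ```python
      import torch
      import torch.nn as nn

      class WeightedHeadSum(nn.Module):
          """Combine per-head outputs with one learnable scalar per head."""
          def __init__(self, num_heads):
              super().__init__()
              self.w = nn.Parameter(torch.ones(num_heads))

          def forward(self, head_outputs):
              # head_outputs: (batch, heads, seq, dim)
              # the uniform baseline would be head_outputs.sum(dim=1)
              return (self.w.view(1, -1, 1, 1) * head_outputs).sum(dim=1)
      ```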

    2. when a transformer is trained with prompts containing up to $T_{\text{train}}$ examples, its ICL performance significantly deteriorates for prompts of length $T > T_{\text{train}}$.

      not always true; TNP can generalize. maybe this holds for autoregressive models because the positional embeddings beyond $T_{\text{train}}$ are never trained? (see the toy sketch below)
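      one way that explanation would play out: with a learned absolute positional table of size $T_{\text{train}}$, positions beyond it are never trained and indexing them outright fails (toy sketch, assuming such a table):

      ```python
      import torch
      import torch.nn as nn

      T_train, d_model = 64, 128
      pos_emb = nn.Embedding(T_train, d_model)   # rows for positions 0..63 only

      positions = torch.arange(100)              # a prompt of length 100 > T_train
      pos_emb(positions[:T_train])               # fine
      # pos_emb(positions) raises IndexError: positions 64..99 have no embedding row
      ```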

    3. current approaches often fail to generalize to synthetic tasks consisting of a range of linear and nonlinear functions.

      TNP can generalize

    4. Not limited to linguistic tasks, it has also been demonstrated that transformers can in-context learn a general class of functions $\mathcal{F}$ [7].

      missing references to TNP

    1. Experiments

      the experiment section is very confusing without any experiment setup information

      where are the zero-shot and few-shot settings mentioned in the introduction?

      how is POM adapted to a new objective? via fine-tuning?

    2. The final loss function (line 10) is determined by computing the average of the loss functions for all tasks. Subsequently, in line 12, we update $\theta$ using a gradient-based optimizer, such as Adam [44].

      this is difficult to believe

      how can we expect the same set of parameters to learn from many tasks?

      for example, if two tasks send conflicting gradient signals, they cancel out and the model effectively learns nothing

      for example, for a certain x, task i requires the model to produce x′ while task j requires a different output --> conflicting gradient signals (see the toy example below)
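      a toy numerical version of the cancellation argument (one scalar parameter, two quadratic task losses pulling in opposite directions):

      ```python
      # Task i wants theta = +1, task j wants theta = -1.
      theta = 0.0
      grad_i = 2.0 * (theta - 1.0)     # d/dtheta of (theta - 1)^2  -> -2.0
      grad_j = 2.0 * (theta + 1.0)     # d/dtheta of (theta + 1)^2  -> +2.0
      avg_grad = 0.5 * (grad_i + grad_j)
      print(avg_grad)                  # 0.0: the averaged loss gives no signal at theta = 0
      ```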

    3. Design of POM

      overall, this section is overly mathy, with unnecessary equations

      it distracts from the idea of the proposed method

    4. $Z_t \in \mathbb{R}^{N \times 3}$ is the population information used by LCM.

      again, how did the author decide this?

    5. $H_t = [h_t^1, h_t^2, \cdots, h_t^N]$ serves as LMM's input, encapsulating population information.

      how did the author decide this?

    6. When all individuals engage in information exchange, the algorithm's convergence may suffer, diversity could diminish, and susceptibility to local optima increases. To address this, we introduce a mask operation during both training and testing phases, where the probability of setting each element in $\hat{S}_t$ to 0 is $r_{\text{mask}}$.

      there should be an ablation study here
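      as described, the mask seems to be element-wise Bernoulli dropout on $\hat{S}_t$; a sketch under that reading (tensor names are placeholders):

      ```python
      import torch

      def apply_mask(S_hat, r_mask):
          """Independently zero each element of S_hat with probability r_mask."""
          keep = (torch.rand_like(S_hat) > r_mask).float()
          return S_hat * keep
      ```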

    7. Equation (6) details the computation of $\hat{S}_t$.

      how did you come up with this? is this simply attention? if so, why present it here? (see the comparison below)
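      for comparison, if Equation (6) is standard scaled dot-product attention, it reduces to the textbook form below (a sketch, not the paper's code):

      ```python
      import torch
      import torch.nn.functional as F

      def scaled_dot_product_attention(Q, K, V):
          d_k = Q.shape[-1]
          scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
          return F.softmax(scores, dim=-1) @ V
      ```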

    8. $S_t = \mathrm{LMM}(H_t \mid \theta_1)$

      $S_t$ is produced by a neural network with parameters $\theta_1$ and input $H_t$, where $H_t$ is derived from the current population

    9. LCM

      what is LCM?

    10. LMM

      what is LMM?

    11. Pretrained Optimization Model

      difficult to read without a background section on population-based optimization

    1. Table 1: Selection of results for global forecasting, including geopotential at

      should show a line graph here instead

    2. we found it crucial to fine-tune on rolled out forecasts of multiple time steps. This improves stability and performance for longer lead times. In the final fine-tuning we include also a Continuous Ranked Probability Score (CRPS) loss term [19, 26].

      how? should perform an ablation study here as well (a sketch of the presumed rollout loop is below)
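      the rollout fine-tuning presumably looks like the autoregressive loop below (a sketch under that assumption; `model`, `targets`, and the `crps_loss` term are placeholders, not the paper's code):

      ```python
      import torch

      def rollout_loss(model, x0, targets, crps_loss, alpha=0.1):
          """Accumulate loss over a multi-step autoregressive rollout,
          feeding each prediction back in, plus a weighted CRPS term."""
          state, loss = x0, 0.0
          for y_true in targets:                     # future states, in order
              state = model(state)                   # one autoregressive step
              loss = loss + torch.mean((state - y_true) ** 2)
              loss = loss + alpha * crps_loss(state, y_true)
          return loss / len(targets)
      ```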

    3. Table 2: Selection of results for LAM forecasting, including geopotential at

      why, in this case, is the GraphCast architecture doing better than the proposed architecture?

    4. Weather Forecasting with Hierarchical Graph Neural Networks

      is the architecture important to the performance, or could we just use a U-Net architecture?

    5. GraphCast*+SWAG

      why not compare with perturbed GraphCast and GenCast?