24 Matching Annotations
  1. Jan 2025
  2. notebooksharing.space
    1. CHM_smooth = CHM %>%
         # any values below 0, set to 0
         terra::clamp(lower = 0, values = TRUE) %>%
         # fill in NA values with their neighbors' min value
         terra::focal(w = 3, fun = "min", na.policy = "only", na.rm = TRUE, expand = FALSE) %>%
         ifel(!is.na(crop(DTM_smooth, .)) & is.na(.), 0, .) %>%
         # apply a median filter only to cells where there are big changes in Z value
         ifel(terra::terrain(., "TRI") > 5,
              terra::focal(., w = 3, fun = "median", na.policy = "omit", na.rm = TRUE, expand = FALSE),
              .,
              filename = paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif"),
              overwrite = TRUE)

      I only had a look at the output product, but this currently seems to replace NA values by 0 values. It might be easier to debug/understand without the tidyverse pipe/concatenation (cf. also my comment below).
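
      To illustrate, a de-piped version of the same steps might look like the sketch below (assuming the same CHM, DTM_smooth, and root_dir objects), with each intermediate inspectable to see where NA cells become 0:

      ```r
      library(terra)

      # clamp negative heights to 0
      chm <- clamp(CHM, lower = 0, values = TRUE)

      # fill NA cells with the minimum of their 3x3 neighbourhood
      chm <- focal(chm, w = 3, fun = "min", na.policy = "only", na.rm = TRUE, expand = FALSE)

      # cells that are NA in the CHM but not NA in the DTM become 0
      chm <- ifel(!is.na(crop(DTM_smooth, chm)) & is.na(chm), 0, chm)

      # median-filter only the rough cells (TRI > 5)
      chm_med <- focal(chm, w = 3, fun = "median", na.policy = "omit", na.rm = TRUE, expand = FALSE)
      chm_smooth <- ifel(terrain(chm, "TRI") > 5, chm_med, chm)

      writeRaster(chm_smooth, paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif"), overwrite = TRUE)
      ```

      Each intermediate object can be plotted or summarised on its own, which makes it much easier to locate the step where NA values are converted.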

    2. This approach refits each model, leaving out one hectare of field plot data at a time (four subplots).

      I know there are limitations to what is doable with field plot data, and I think this is a good start. However, I would either increase the spatial fold size to 4 ha, or do a buffered leave-one-out cross-validation, i.e. remove each 0.25 ha pixel together with its entire neighbourhood of 8 cells (9 * 0.25 ha = 2.25 ha). Realistically, AGB will be correlated over distances larger than this, so testing a 5 x 5 window (25 * 0.25 ha = 6.25 ha) or a 7 x 7 window would be even better.

      Alternatively, for plots as large as the 50 ha BCI plot, it would also make sense to subdivide into ten 5 ha or even five 10 ha folds, and see how well the models do there.
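
      A minimal sketch of the buffered leave-one-out idea, assuming the subplots sit on a regular grid and that AGB_df carries hypothetical grid indices `row` and `col` (Z_p99 is just an example predictor):

      ```r
      # buffered leave-one-out CV: hold out one 0.25 ha cell plus its neighbourhood
      buffered_loo <- function(df, half_width = 1) {
        pred <- rep(NA_real_, nrow(df))
        for (i in seq_len(nrow(df))) {
          in_buffer <- abs(df$row - df$row[i]) <= half_width &
                       abs(df$col - df$col[i]) <= half_width
          fit <- lm(AGB_per_ha ~ Z_p99, data = df[!in_buffer, ])
          pred[i] <- predict(fit, newdata = df[i, , drop = FALSE])
        }
        pred
      }
      # half_width = 1 -> 3x3 buffer (2.25 ha), half_width = 2 -> 5x5 buffer (6.25 ha)
      ```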

    3. LAI: leaf area index above 5 m; derived from canopy cover adjusted for scan angle

      LAI is probably dependent on scanner properties (some scanners will penetrate deeper than others), and due to the non-linear transformation it can also come with high noise levels (inflating small differences in cover). It's an interesting ecological metric, but I am wondering whether it is needed for AGB modelling. I am not sure it will massively improve models, and if there is a relationship, one could use cover directly (and log-transform it, if this creates a better model).

    4. z_count: the number of returns.

      Return/point density is very heavily dependent on instrumentation, because some lasers are more powerful than others. A more reliable statistic is pulse density, which can be approximated by the return density of the first or last returns. I would replace z_count by "number of first returns" or "number of last returns".
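
      In lidR, the swap could look like this sketch (assuming a LAS object `las` and the same 50 m resolution):

      ```r
      library(lidR)
      # count first returns per 50 m pixel instead of all returns;
      # first-return density approximates pulse density
      n_first_map <- pixel_metrics(las, ~list(n_first = sum(ReturnNumber == 1L)), res = 50)
      ```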

    5. Generate canopy metrics as predictors for modeling of AGB at a pixel resolution of 50 m

      Point cloud properties to some extent depend on laser instruments, and raster products are usually more robust (although within limits). I would consider:

      • computing the equivalent statistics directly from the CHM, i.e. mean canopy height from the CHM, 99th percentile from the CHM, etc.
      • computing the point cloud statistics only from first returns, because first returns tend to be more comparable across instruments than all returns (cf. also my point below about z_count).

      We recently came up with a few recommendations regarding this (Fischer et al. 2024, MEE)
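
      The CHM-based equivalents could be computed with terra, e.g. the following sketch (assuming a 1 m CHM, so fact = 50 yields 50 m pixels):

      ```r
      library(terra)
      # equivalent statistics from the CHM raster instead of the point cloud
      chm_mean <- aggregate(CHM, fact = 50, fun = "mean", na.rm = TRUE)
      chm_p99  <- aggregate(CHM, fact = 50,
                            fun = function(x) quantile(x, 0.99, na.rm = TRUE))
      ```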

    6. transform_with(dtm_smooth, "-") +

      Subtracting the best/corrected DTM makes a lot of sense to me.

      However, I think this might create an inconsistency between the point cloud product and the raster products, because there will be points classified as "ground", but also far away from the smoothed raster DTM (potentially several m). This is unproblematic if only raster products are used in subsequent steps, but could lead to odd behaviour if point cloud statistics and raster products are combined. If the raster DTM is the best ground classification, then I would run a filter here again and remove the "ground" label from all points > 0.5 and < -0.5 m from the raster surface to make sure that both products are consistent.

    7. Scan angles +/- 15 degrees from nadir

      Filtering to scan angles < 15 or < 20 is also a great idea, but I generally would not do this at this stage. Several reasons:

      • ground detection: ground classification is probably not very sensitive to scan angle, and the potential biases should be outweighed by the improved ground sampling in areas with complex terrain
      • scan angle definitions: scan angle is not always consistently defined (aircraft roll corrections or not), so this might inadvertently exclude the wrong angles, e.g., https://groups.google.com/g/lastools/c/H-qicsOxUhE/m/3YjrF-uLAAAJ

      I would suggest creating a CHM with and without this filter to propagate errors/uncertainty, and creating a scan angle map that could be used as a predictor.
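
      The scan angle map could be produced with lidR, e.g. (assuming a LAS object `las` with the standard ScanAngleRank attribute):

      ```r
      library(lidR)
      # mean absolute scan angle per 50 m pixel, usable as a model predictor
      angle_map <- pixel_metrics(las, ~list(mean_abs_angle = mean(abs(ScanAngleRank))),
                                 res = 50)
      ```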

    8. AGB_df %>%

      This is a matter of personal preference, but I find very long tidyverse pipes hard to read/debug. If something goes wrong somewhere inside the pipe, I usually struggle to find the reason. I would suggest splitting this up into smaller steps with intermediate outputs, or even doing this with base R.

      Laura and I have discussed this issue, and I think that it should be a larger discussion in the TIG. There are lots of new proposals coming in for GEO-TREES sites; some have only one large forest plot, while others have many scattered 1-ha plots.

      One way to deal with this could be to use Bayesian hierarchical models fitted across sites. When parameterized with random effects, they will pool information to correct extreme estimates and fill in information that is locally missing. For example, there could be a default hierarchical model that is trained across already existing sites in the GEO-TREES network or beyond, which could then be locally updated with new information.
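
      A hypothetical sketch of such a model with the brms package (all column and object names assumed for illustration):

      ```r
      library(brms)
      # hierarchical model pooling across sites: site-level random intercepts
      # and slopes shrink extreme site estimates toward the population mean
      fit <- brm(
        AGB_per_ha ~ log(Z_p99) + (1 + log(Z_p99) | site),
        data = agb_all_sites, family = gaussian()
      )
      # a new site then inherits the population-level estimates and can be
      # updated with its own field plots
      ```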

    10. best model

      Which model is best is going to depend a lot on the type of validation. I would bet that the RF model becomes poorer and poorer with a more rigorous spatial cross-validation. I would either provide predictions for the same models across all sites or choose one default model fitted across sites.

    11. Negative AGB values.

      One way is to not rely too much on RF, but to use more robust alternatives:

      • simple linear regressions
      • Bayesian hierarchical models, potentially fitted across sites with random effects (or updated with data from other sites)

    12. Goodness-of-fit for model predictions of AGB density and AGB density from field census data, for 0.25-ha subplots.

      minor point: as far as I can see, these are not yet cross-validation results. From a communication perspective I would probably not even create these plots, because they create the (misleading) impression that RF models are great. Plotting a linear regression like this may make sense, because it is fairly robust, but the bias/MAE/RMSE of random forests are not usually calculated on the training samples, due to overfitting. A better figure here would show non-spatial cross-validation results, i.e., simple leave-one-out cross-validation.
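
      For the linear model, leave-one-out CV is a short sketch (same AGB_df; Z_p99 and Volume as example predictors from the model call below):

      ```r
      # simple (non-spatial) leave-one-out cross-validation
      loo_pred <- sapply(seq_len(nrow(AGB_df)), function(i) {
        fit <- lm(AGB_per_ha ~ Z_p99 + Volume, data = AGB_df[-i, ])
        predict(fit, newdata = AGB_df[i, , drop = FALSE])
      })
      rmse <- sqrt(mean((loo_pred - AGB_df$AGB_per_ha)^2))
      bias <- mean(loo_pred - AGB_df$AGB_per_ha)
      ```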

    13. lin = step(lm(AGB_per_ha ~ Z_p99 + Volume + LAI_5m + Z_CV + Slope + Aspect, data = AGB_df), direction = "backward", trace = 0)

      When plot sizes are standardized, CHM-derived volume is perfectly correlated with (or just a transformation of) mean canopy height. I would use mean canopy height for interpretability. Also, cf. comment above (with reference Labriere et al. 2018) and Piotr's comment below. AGB_per_ha ~ log(Volume) would make more sense to me.
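
      A minimal sketch of the suggested form, using the same AGB_df and the Volume predictor from the original call:

      ```r
      # simpler, more interpretable power-law-style model; cf. Labriere et al. 2018
      lin_log <- lm(AGB_per_ha ~ log(Volume), data = AGB_df)
      summary(lin_log)
      ```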

    14. defensible approaches

      I fully agree. To make the models defensible, I would remove most of the predictor variables, unless there is a clear justification for why they should be important. I imagine that biomass might well vary with aspect and slope (wind exposure, etc.), but I am not sure a contiguous 50 ha plot - as amazing as it is - can robustly assess this relationship. I would go for simple models predicting AGB from the log() of canopy height or volume (e.g. Labriere et al. 2018)

    15. apply a Laplacian filter to replace spikes and pits with median values

      How robust are these processing steps to variation in pulse density / laser penetration into the canopy?

    16. paste0(root_dir, "/output/_1_classified_points/*_ground.las")

      super minor change, but for future code adaptability, it may make sense to replace this by something like file.path(dir_list[1], "*_ground.las").

      I won't mention this for the other write_las operations, but this would apply across the entire code.

    17. cloth_resolution = 0.3

      The default parameters in lidR seem very different (cf. below) - it would be good to motivate the choices here, ideally by testing their robustness to different scan configurations and picking the most robust one. I have only limited experience with the CSF algorithm, but I remember that the defaults produced much more stable results than other configurations I tried.

      Defaults:

      • class_threshold = 0.5
      • cloth_resolution = 0.5
      • rigidness = 1L
      • iterations = 500L
      • time_step = 0.65
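
      For comparison, the defaults written out as an explicit lidR call (sketch; `las` assumed):

      ```r
      library(lidR)
      # ground classification with the cloth simulation filter at lidR defaults
      las <- classify_ground(las, algorithm = csf(
        class_threshold  = 0.5,
        cloth_resolution = 0.5,
        rigidness        = 1L,
        iterations       = 500L,
        time_step        = 0.65
      ))
      ```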

    18. -drop_withheld"

      This step could also drop classes 7 and 18, which are the common noise classifications used by providers. An additional field could be provided for special noise classes (e.g. extended classification in newer LAS files)
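
      As a sketch, the combined filter string when reading with lidR (file name assumed; -drop_class is a standard LASlib filter):

      ```r
      library(lidR)
      # drop withheld points plus the standard noise classes
      # (7 = low point/noise, 18 = high noise in the LAS 1.4 spec)
      las <- readLAS(file, filter = "-drop_withheld -drop_class 7 18")
      ```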

    19. Points above the lowest elevation expected at the site, to exclude belowground noise

      This is a great idea and could also be implemented for top canopy height (i.e., no tree > 125 m, etc.).

      A potential problem: Some ALS scans come with a vertical reference to an ellipsoid, some with a vertical reference to a geoid, and the differences in elevation can easily be 50 m, so this requires careful manual inputs / documentation of what is meant.

    20. root_dir

      Super minor point, but for usability it might help to have a consistent naming convention for paths. I would suggest:

      • dir_root instead of root_dir
      • dir_output_list instead of dir_list
      • dir_input instead of path_als