24 Matching Annotations
  1. Jan 2025
  2. notebooksharing.space
    1. CHM_smooth = CHM %>%
         # any values below 0, set to 0
         terra::clamp(lower = 0, values = TRUE) %>%
         # fill in NA values with their neighbors' min value
         terra::focal(w = 3, fun = "min", na.policy = "only", na.rm = TRUE, expand = FALSE) %>%
         ifel(!is.na(crop(DTM_smooth, .)) & is.na(.), 0, .) %>%
         # apply a median filter only to cells where there are big changes in Z value
         ifel(terra::terrain(., "TRI") > 5,
              terra::focal(., w = 3, fun = "median", na.policy = "omit", na.rm = TRUE, expand = FALSE),
              .,
              filename = paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif"),
              overwrite = TRUE)

      I only had a look at the output product, but this currently seems to replace NA values by 0 values. It might be easier to debug/understand without the tidyverse pipe/concatenation (cf. also my comment below).
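
      To illustrate, a de-piped version of the same steps might look like the sketch below (assuming the same CHM, DTM_smooth, and root_dir objects), with each intermediate inspectable to see where NA cells become 0:

      ```r
      library(terra)

      # clamp negative heights to 0
      chm <- clamp(CHM, lower = 0, values = TRUE)

      # fill NA cells with the minimum of their 3x3 neighbourhood
      chm <- focal(chm, w = 3, fun = "min", na.policy = "only", na.rm = TRUE, expand = FALSE)

      # cells that are NA in the CHM but not NA in the DTM become 0
      chm <- ifel(!is.na(crop(DTM_smooth, chm)) & is.na(chm), 0, chm)

      # median-filter only the rough cells (TRI > 5)
      chm_med <- focal(chm, w = 3, fun = "median", na.policy = "omit", na.rm = TRUE, expand = FALSE)
      chm_smooth <- ifel(terrain(chm, "TRI") > 5, chm_med, chm)

      writeRaster(chm_smooth, paste0(root_dir, "/output/_4_CHM_raster/CHM_smooth.tif"), overwrite = TRUE)
      ```

      Each intermediate object can be plotted or summarised on its own, which makes it much easier to locate the step where NA values are converted.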

    2. This approach refits each model, leaving out one hectare of field plot data at a time (four subplots).

      I know there are limitations to what is doable with field plot data, and I think this is a good start. However, I would either increase the spatial fold size to 4 ha, or do a buffered leave-one-out cross-validation, i.e. remove each 0.25 ha pixel together with its entire neighbourhood of 8 cells (9 * 0.25 ha = 2.25 ha). Realistically, AGB will be correlated over distances larger than this, so testing a 5 x 5 window (25 * 0.25 ha = 6.25 ha) or a 7 x 7 window would be even better.

      Alternatively, for plots as large as the 50 ha BCI plot, it would also make sense to subdivide into ten 5 ha or even five 10 ha folds, and see how well the models do there.
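
      A minimal sketch of the buffered leave-one-out idea, assuming the subplots sit on a regular grid and that AGB_df carries hypothetical grid indices `row` and `col` (Z_p99 is just an example predictor):

      ```r
      # buffered leave-one-out CV: hold out one 0.25 ha cell plus its neighbourhood
      buffered_loo <- function(df, half_width = 1) {
        pred <- rep(NA_real_, nrow(df))
        for (i in seq_len(nrow(df))) {
          in_buffer <- abs(df$row - df$row[i]) <= half_width &
                       abs(df$col - df$col[i]) <= half_width
          fit <- lm(AGB_per_ha ~ Z_p99, data = df[!in_buffer, ])
          pred[i] <- predict(fit, newdata = df[i, , drop = FALSE])
        }
        pred
      }
      # half_width = 1 -> 3x3 buffer (2.25 ha), half_width = 2 -> 5x5 buffer (6.25 ha)
      ```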

    3. LAI: leaf area index above 5 m; derived from canopy cover adjusted for scan angle

      LAI is probably dependent on scanner properties (some scanners will penetrate deeper than others), and due to the non-linear transformation it can also come with high noise levels (inflating small differences in cover). It's an interesting ecological metric, but I am wondering whether it is needed for AGB modelling. I am not sure it will massively improve models, and if there is a relationship, one could use cover directly (and log-transform it, if this creates a better model).

    4. z_count: the number of returns.

      Return/point density is very heavily dependent on instrumentation, because some lasers are more powerful than others. A more reliable statistic is pulse density, which can be approximated by the return density of the first or last returns. I would replace z_count by "number of first returns" or "number of last returns".
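
      In lidR, the swap could look like this sketch (assuming a LAS object `las` and the same 50 m resolution):

      ```r
      library(lidR)
      # count first returns per 50 m pixel instead of all returns;
      # first-return density approximates pulse density
      n_first_map <- pixel_metrics(las, ~list(n_first = sum(ReturnNumber == 1L)), res = 50)
      ```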

    5. Generate canopy metrics as predictors for modeling of AGB at a pixel resolution of 50 m

      Point cloud properties to some extent depend on laser instruments, and raster products are usually more robust (although within limits). I would consider:

      • computing the equivalent statistics directly from the CHM, i.e. mean canopy height from the CHM, 99th percentile from the CHM, etc.
      • computing the point cloud statistics only from first returns, because first returns tend to be more comparable across instruments than all returns (cf. also my point below about z_count).

      We recently came up with a few recommendations regarding this (Fischer et al. 2024, MEE)
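
      The CHM-based equivalents could be computed with terra, e.g. the following sketch (assuming a 1 m CHM, so fact = 50 yields 50 m pixels):

      ```r
      library(terra)
      # equivalent statistics from the CHM raster instead of the point cloud
      chm_mean <- aggregate(CHM, fact = 50, fun = "mean", na.rm = TRUE)
      chm_p99  <- aggregate(CHM, fact = 50,
                            fun = function(x) quantile(x, 0.99, na.rm = TRUE))
      ```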

    6. transform_with(dtm_smooth, "-") +

      Subtracting the best/corrected DTM makes a lot of sense to me.

      However, I think this might create an inconsistency between the point cloud product and the raster products, because there will be points classified as "ground", but also far away from the smoothed raster DTM (potentially several m). This is unproblematic if only raster products are used in subsequent steps, but could lead to odd behaviour if point cloud statistics and raster products are combined. If the raster DTM is the best ground classification, then I would run a filter here again and remove the "ground" label from all points > 0.5 and < -0.5 m from the raster surface to make sure that both products are consistent.

    7. Scan angles +/- 15 degrees from nadir

      Filtering to scan angles < 15 or < 20 is also a great idea, but I generally would not do this at this stage. Several reasons:

      • ground detection: ground classification is probably not very sensitive to scan angle, and the potential biases should be outweighed by the improved ground sampling in areas with complex terrain
      • scan angle definitions: scan angle is not always consistently defined (aircraft roll corrections or not), so this might inadvertently exclude the wrong angles, e.g., https://groups.google.com/g/lastools/c/H-qicsOxUhE/m/3YjrF-uLAAAJ

      I would suggest creating a CHM with and without this filter to propagate errors/uncertainty, and creating a scan angle map that could be used as a predictor.
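
      The scan angle map could be produced with lidR, e.g. (assuming a LAS object `las` with the standard ScanAngleRank attribute):

      ```r
      library(lidR)
      # mean absolute scan angle per 50 m pixel, usable as a model predictor
      angle_map <- pixel_metrics(las, ~list(mean_abs_angle = mean(abs(ScanAngleRank))),
                                 res = 50)
      ```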

    8. AGB_df %>%

      This is a matter of personal preference, but I find very long tidyverse pipes hard to read/debug. If something goes wrong somewhere inside the pipe, I usually struggle to find the reason. I would suggest splitting this up into smaller steps with intermediate outputs, or even doing this with base R.

      Laura and I have discussed this issue, and I think that it should be a larger discussion in the TIG. There are lots of new proposals coming in for GEO-TREES sites; some have only one large forest plot, while others have many scattered 1-ha plots.

      One way to deal with this could be to use Bayesian hierarchical models fitted across sites. When parameterized with random effects, they will pool information to correct extreme estimates and fill in information that is locally missing. For example, there could be a default hierarchical model that is trained across already existing sites in the GEO-TREES network or beyond, which could then be locally updated with new information.
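
      A hypothetical sketch of such a model with the brms package (all column and object names assumed for illustration):

      ```r
      library(brms)
      # hierarchical model pooling across sites: site-level random intercepts
      # and slopes shrink extreme site estimates toward the population mean
      fit <- brm(
        AGB_per_ha ~ log(Z_p99) + (1 + log(Z_p99) | site),
        data = agb_all_sites, family = gaussian()
      )
      # a new site then inherits the population-level estimates and can be
      # updated with its own field plots
      ```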

    10. best model

      Which model is best is going to depend a lot on the type of validation. I would bet that the RF model becomes poorer and poorer with a more rigorous spatial cross-validation. I would either provide predictions for the same models across all sites or choose one default model fitted across sites.

    11. Negative AGB values.

      One way is to not rely too much on RF, but to use more robust alternatives:

      • simple linear regressions
      • Bayesian hierarchical models, potentially fitted across sites with random effects (or updated with data from other sites)

    12. Goodness-of-fit for model predictions of AGB density and AGB density from field census data, for 0.25-ha subplots.

      minor point: as far as I can see, these are not yet cross-validation results. From a communication perspective I would probably not even create these plots, because they create the (misleading) impression that RF models are great. Plotting a linear regression like this may make sense, because it is fairly robust, but the bias/MAE/RMSE of random forests are not usually calculated on the training samples, due to overfitting. A better figure here would show non-spatial cross-validation results, i.e., simple leave-one-out cross-validation.
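
      For the linear model, leave-one-out CV is a short sketch (same AGB_df; Z_p99 and Volume as example predictors from the model call below):

      ```r
      # simple (non-spatial) leave-one-out cross-validation
      loo_pred <- sapply(seq_len(nrow(AGB_df)), function(i) {
        fit <- lm(AGB_per_ha ~ Z_p99 + Volume, data = AGB_df[-i, ])
        predict(fit, newdata = AGB_df[i, , drop = FALSE])
      })
      rmse <- sqrt(mean((loo_pred - AGB_df$AGB_per_ha)^2))
      bias <- mean(loo_pred - AGB_df$AGB_per_ha)
      ```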

    13. lin = step(lm(AGB_per_ha ~ Z_p99 + Volume + LAI_5m + Z_CV + Slope + Aspect, data = AGB_df), direction = "backward", trace = 0)

      When plot sizes are standardized, CHM-derived volume is perfectly correlated with (or just a transformation of) mean canopy height. I would use mean canopy height for interpretability. Also, cf. comment above (with reference Labriere et al. 2018) and Piotr's comment below. AGB_per_ha ~ log(Volume) would make more sense to me.
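
      A minimal sketch of the suggested form, using the same AGB_df and the Volume predictor from the original call:

      ```r
      # simpler, more interpretable power-law-style model; cf. Labriere et al. 2018
      lin_log <- lm(AGB_per_ha ~ log(Volume), data = AGB_df)
      summary(lin_log)
      ```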

    14. defensible approaches

      I fully agree. To make the models defensible, I would remove most of the predictor variables, unless there is a clear justification for why they should be important. I imagine that biomass might well vary with aspect and slope (wind exposure, etc.), but I am not sure a contiguous 50 ha plot - as amazing as it is - can robustly assess this relationship. I would go for simple models predicting AGB from the log() of canopy height or volume (e.g. Labriere et al. 2018)

    15. apply a Laplacian filter to replace spikes and pits with median values

      How robust are these processing steps to variation in pulse density / laser penetration into the canopy?

    16. paste0(root_dir, "/output/_1_classified_points/*_ground.las")

      super minor change, but for future code adaptability, it may make sense to replace this by something like file.path(dir_list[1], "*_ground.las").

      I won't mention this for the other write_las operations, but this would apply across the entire code.

    17. cloth_resolution = 0.3

      The default parameters in lidR seem very different (cf. below) - it would be good to motivate the choices here, ideally by testing their robustness to different scan configurations and picking the most robust one. I have only limited experience with the CSF algorithm, but I remember that the defaults produced much more stable results than other configurations I tried.

      Defaults:

      • class_threshold = 0.5
      • cloth_resolution = 0.5
      • rigidness = 1L
      • iterations = 500L
      • time_step = 0.65
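
      For comparison, the defaults written out as an explicit lidR call (sketch; `las` assumed):

      ```r
      library(lidR)
      # ground classification with the cloth simulation filter at lidR defaults
      las <- classify_ground(las, algorithm = csf(
        class_threshold  = 0.5,
        cloth_resolution = 0.5,
        rigidness        = 1L,
        iterations       = 500L,
        time_step        = 0.65
      ))
      ```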

    18. -drop_withheld"

      This step could also drop classes 7 and 18, which are the common noise classifications used by providers. An additional field could be provided for special noise classes (e.g. extended classification in newer LAS files)
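
      As a sketch, the combined filter string when reading with lidR (file name assumed; -drop_class is a standard LASlib filter):

      ```r
      library(lidR)
      # drop withheld points plus the standard noise classes
      # (7 = low point/noise, 18 = high noise in the LAS 1.4 spec)
      las <- readLAS(file, filter = "-drop_withheld -drop_class 7 18")
      ```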

    19. Points above the lowest elevation expected at the site, to exclude belowground noise

      This is a great idea and could also be implemented for top canopy height (i.e., no tree > 125 m, etc.).

      A potential problem: Some ALS scans come with a vertical reference to an ellipsoid, some with a vertical reference to a geoid, and the differences in elevation can easily be 50 m, so this requires careful manual inputs / documentation of what is meant.

    20. root_dir

      Super minor point, but for usability it might help to have a consistent naming convention for paths. I would suggest:

      • dir_root instead of root_dir
      • dir_output_list instead of dir_list
      • dir_input instead of path_als