  1. May 2026
    1. What does the bins argument in geom_histogram() do?

      bins — sets the number of bars. R figures out the width automatically based on your data range.

      binwidth — sets the width of each bar in the actual units of your variable. R figures out how many bars are needed.

      So if your data ranges from 0–100:

      bins = 10 → 10 bars, each automatically about 10 units wide
      binwidth = 10 → bars exactly 10 units wide, with about 10 bars created automatically

      The results are nearly the same here, but they diverge when your data range is irregular. binwidth is generally preferred because it's more interpretable: saying "each bar represents 5 years" means more than "I want 20 bars."
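A minimal sketch of the comparison above, assuming ggplot2 is installed. The data frame toy and its column value are hypothetical stand-ins for data ranging from 0 to 100. The exact bar count and width depend on how ggplot2 aligns the bin edges, so treat the numbers it prints as something to inspect rather than a fixed rule.

```r
library(ggplot2)

# Hypothetical toy data: 101 evenly spaced values covering 0 to 100.
toy <- data.frame(value = seq(0, 100, by = 1))

# bins fixes the number of bars; ggplot2 derives the width.
p_bins <- ggplot(toy, aes(x = value)) +
  geom_histogram(bins = 10)

# binwidth fixes the width in data units; ggplot2 derives the number of bars.
p_width <- ggplot(toy, aes(x = value)) +
  geom_histogram(binwidth = 10)

# layer_data() returns one row per bar, with its edges in xmin and xmax.
d_bins <- layer_data(p_bins)
d_width <- layer_data(p_width)

nrow(d_bins)                          # number of bars when bins = 10
unique(d_width$xmax - d_width$xmin)   # bar width when binwidth = 10
```

Printing d_bins$xmax - d_bins$xmin shows the width ggplot2 chose for bins = 10, which is a quick way to see where the two arguments diverge.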

    2. How are the following two plots different? Which aesthetic, color or fill, is more useful for changing the color of bars?

      color only changes the OUTLINE of the bars (here, to red). fill colors the entire interior of each bar. So fill is the more useful aesthetic for changing the color of bars.
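A short sketch of the two plots, assuming the ggplot2 and palmerpenguins packages are installed. The exercise's original code is not reproduced in these notes, so the choice of species on the x-axis is an assumption.

```r
library(ggplot2)
library(palmerpenguins)

# color set as a constant: only the bar outlines turn red.
p_outline <- ggplot(penguins, aes(x = species)) +
  geom_bar(color = "red")

# fill set as a constant: the interior of every bar turns red.
p_filled <- ggplot(penguins, aes(x = species)) +
  geom_bar(fill = "red")

p_outline
p_filled
```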

    3. Make a bar plot of species of penguins, where you assign species to the y aesthetic. How is this plot different?

      The bars would be horizontal instead: species sits on the y-axis and the counts run along the x-axis. The information shown is otherwise the same.
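A minimal sketch of the horizontal version, assuming ggplot2 and palmerpenguins are installed:

```r
library(ggplot2)
library(palmerpenguins)

# Mapping species to y instead of x flips the bars to horizontal;
# geom_bar() still counts the rows in each species.
p_horizontal <- ggplot(penguins, aes(y = species)) +
  geom_bar()

p_horizontal
```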

    4. Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?

      bill_depth_mm should be mapped to color. Since the plot has one continuous smooth line, the color is naturally mapped at the geom level, inside geom_point(): geom_point(aes(color = bill_depth_mm)).
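One way the plot could be recreated, as a sketch. The target figure is not reproduced in these notes, so the x and y variables (flipper_length_mm and body_mass_g) are assumptions; only the color mapping comes from the answer above.

```r
library(ggplot2)
library(palmerpenguins)

# x and y are assumed; bill_depth_mm is mapped to color only in the
# points layer, so geom_smooth() receives no color mapping of its own.
p_depth <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = bill_depth_mm)) +
  geom_smooth()

p_depth
```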

    5. Will these two graphs look different? Why/why not?

      No. One plot sets the data and mappings once at the global level, and the other repeats the same data and mappings at the local level in each geom. Both yield the same result, but global is just easier.
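The exercise's code is not shown in these notes, so the two versions below are an assumed reconstruction of the idea: the same data and mappings given once globally, or repeated in every geom.

```r
library(ggplot2)
library(palmerpenguins)

# Global: data and mappings are set once and inherited by both layers.
p_global <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  geom_smooth()

# Local: the same data and mappings are repeated inside each geom.
p_local <- ggplot() +
  geom_point(
    data = penguins,
    mapping = aes(x = flipper_length_mm, y = body_mass_g)
  ) +
  geom_smooth(
    data = penguins,
    mapping = aes(x = flipper_length_mm, y = body_mass_g)
  )

# Compare the computed data of the points layer in both plots.
all.equal(layer_data(p_global, 1), layer_data(p_local, 1))
```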

    6. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions. ggplot( data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island) ) + geom_point() + geom_smooth(se = FALSE)

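      Since method = "lm" is not specified, geom_smooth() fits a smooth curve (loess by default for a dataset this small) instead of a straight regression line, so everything looks more curvy. With se = FALSE, no confidence band is drawn. You get three separate trend lines because color = island is specified in the global mapping, so both the points and the curves are stratified by island.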
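A sketch contrasting the exercise's code with a straight-line version, assuming ggplot2 and palmerpenguins are installed. The method = "lm" variant is added here only for comparison and is not part of the exercise.

```r
library(ggplot2)
library(palmerpenguins)

# The exercise's plot: default smoother, one curve per island.
p_curvy <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island)
) +
  geom_point() +
  geom_smooth(se = FALSE)

# Comparison only: method = "lm" draws straight regression lines instead.
p_straight <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island)
) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_curvy
p_straight
```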

    7. Make a scatterplot of bill_depth_mm vs. bill_length_mm. That is, make a scatterplot with bill_depth_mm on the y-axis and bill_length_mm on the x-axis. Describe the relationship between these two variables.

      Made. Overall it looks almost like there is no relationship, or a weak negative one, and the overall trend line is negative. But if you look at each individual species, the relationship is positive. This is an example of confounding as per Simpson's Paradox: each species sits in its own distinct region of the plot, so pooling them reverses the trend. The overall negative trend is being confounded by species. Species-specific stratification is most important here, as the crude overall trend can lie.
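The reversal can be checked numerically. This is a minimal sketch using correlations as a rough stand-in for the trend lines, assuming the palmerpenguins package is installed; the names bills, overall and by_species are introduced here for illustration.

```r
library(palmerpenguins)

# Drop rows with missing bill measurements before computing correlations.
keep <- !is.na(penguins$bill_length_mm) & !is.na(penguins$bill_depth_mm)
bills <- penguins[keep, ]

# Crude (pooled) correlation across all species.
overall <- cor(bills$bill_length_mm, bills$bill_depth_mm)

# Stratified: one correlation per species.
by_species <- sapply(
  split(bills, bills$species),
  function(d) cor(d$bill_length_mm, d$bill_depth_mm)
)

overall      # negative
by_species   # positive within every species
```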

    8. What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE

      na.rm controls whether R warns you about missing (NA) values when it plots. By default it's FALSE, which means R will plot everything it can but show you a warning message like "Removed 2 rows containing missing values" — you've already seen this in your console earlier.

      Setting it to TRUE just silences that warning. It still removes the NA rows from the plot either way — the only difference is whether R tells you about it or not.

      ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(na.rm = TRUE)

      Run that, then run it without na.rm = TRUE and compare your console output. The plot looks identical, but without it you get the warning message, and with it you don't.

      For your epi work, I'd actually recommend leaving it as FALSE (the default). You want to know when data is missing — missing data is a big deal in epidemiology and silencing warnings about it is a bad habit. But the exercise just wants you to know the argument exists.

    9. Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().

      In labs(), add one more argument, separated from the others by a comma: caption = "Data come from the palmerpenguins package."
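A minimal sketch, assuming the previous exercise's plot is the flipper length vs. body mass scatterplot from the na.rm answer above:

```r
library(ggplot2)
library(palmerpenguins)

# The caption is one more argument to labs(); it appears below the plot.
p_caption <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE) +
  labs(caption = "Data come from the palmerpenguins package.")

p_caption
```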

    10. Why does the following give an error and how would you fix it? ggplot(data = penguins) + geom_point()

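      The error comes from missing aesthetics: geom_point() requires x and y, and no mapping is supplied in either ggplot() or geom_point(). Fix it by adding a mapping, for example:

      ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) + geom_point()

      In RStudio, if you split the call across lines, also take care to put the plus at the end of a line, after the function, not at the start of the next line.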
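A sketch of the failure and one possible fix, assuming ggplot2 and palmerpenguins are installed. The variables used in the fix are an arbitrary choice; any valid x and y mapping would do.

```r
library(ggplot2)
library(palmerpenguins)

# No x or y is mapped anywhere, so building the plot fails.
broken <- ggplot(data = penguins) +
  geom_point()

# Capture the error instead of stopping the script.
err <- tryCatch(
  {
    ggplot_build(broken)
    NULL
  },
  error = function(e) e
)
print(err)

# Fix: supply the aesthetics geom_point() requires.
fixed <- ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

fixed
```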

    11. What happens if you make a scatterplot of species vs. bill_depth_mm? What might be a better choice of geom?

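      Without even needing to try: species has 3 distinct classes, so a scatterplot just stacks the points into three overlapping strips and does not represent the data properly. Instead, try a geom that summarizes a numeric variable within each category, i.e., geom_boxplot(). geom_bar() would only count the penguins in each species and would not show bill depth.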
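A minimal sketch of both versions, assuming ggplot2 and palmerpenguins are installed:

```r
library(ggplot2)
library(palmerpenguins)

# Scatterplot: the points pile up in three strips, one per species.
p_scatter <- ggplot(penguins, aes(x = species, y = bill_depth_mm)) +
  geom_point(na.rm = TRUE)

# Boxplot: one summary of bill depth per species.
p_box <- ggplot(penguins, aes(x = species, y = bill_depth_mm)) +
  geom_boxplot(na.rm = TRUE)

p_scatter
p_box
```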

    12. What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.

      A numeric variable indicating the bill depth of a penguin in millimeters (mm).

    13. When aesthetic mappings are defined in ggplot(), at the global level, they’re passed down to each of the subsequent geom layers of the plot. However, each geom function in ggplot2 can also take a mapping argument, which allows for aesthetic mappings at the local level that are added to those inherited from the global level. Since we want points to be colored based on species but don’t want the lines to be separated out for them, we should specify color = species for geom_point() only.

      When you put something inside ggplot(aes(...)), it applies to everything you add after it: every geom_point(), geom_line(), geom_smooth(), all of them. That's the "global" level. In other words, with a GLOBAL APPLICATION OF COLOR = SPECIES, color is applied to BOTH THE POINTS AND THE SMOOTH CURVE, creating 3 different lines, one per species.

      When you put something inside a specific geom like geom_point(aes(...)), it only applies to that one layer. That's the "local" level. As a result, putting color = species here keeps the different colors for the various points, since this is the points layer, BUT the "local color" applies only to the points, so the smooth line stays a single curve.
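The two levels can be compared side by side. This is a sketch under assumptions: the x and y variables and method = "lm" are illustrative choices, not taken from these notes.

```r
library(ggplot2)
library(palmerpenguins)

# Global: color = species is inherited by both layers,
# so geom_smooth() draws one line per species.
p_global <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm")

# Local: color = species applies to the points only,
# so geom_smooth() draws a single line through all penguins.
p_local <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm")

# Count how many separate lines the smooth layer computes in each plot.
length(unique(layer_data(p_global, 2)$group))
length(unique(layer_data(p_local, 2)$group))
```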