83 Matching Annotations
  1. Last 7 days
    1. Author Leann Kim and Kass Traieh

      You can actually supply multiple authors!

    2. the

      I think I'd remove this

    3. Introduction

      I would just combine your introduction and background here. Your intro isn't long enough to really have its own section. Just make it the starting paragraph.

    4. our research question:

      remove

    5. Historically, homeownership has been a cornerstone of wealth accumulation for American families as it provides stability and financial security across generations.

      Do you have a source or citation for this?

    6. Research has shown that high levels of student debt can delay or reduce the likelihood of homeownership.

      Needs a citation (or a few).

    7. can lead to more informed policies, enhanced educational programs, and a more equitable and prosperous future

      Some more specific examples here might be nice.

    8. proposed

      It's not just proposed! Because you did it!

    9. recent

      2024 (I think, or whatever year it actually is)

    10. generations

      generations? Or just generation?

    11. This scarcity was exacerbated by the fact that the raw data needed for our analysis was not available on any of the websites we consulted, including government portals and academic repositories.

      Isn't this kinda saying the same thing? The data was scarce, and thus it was hard to find...?

    12. predominantly aggregated

      in what ways?

    13. Figure 1: Original Tables

      Captions should be more expansive to describe exactly what you want a reader to be getting out of the image.

      Also, I can think of 0 situations where you need year to be a bigint. A smallint in fact would be MORE than enough.
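
      To make the point concrete, here is a quick Python check of the ranges involved. It assumes the usual 2-byte signed smallint and 8-byte signed bigint found in most SQL databases; your actual schema is not shown here, so treat this as an illustration only.

```python
# Range check: does a calendar year fit in a 2-byte signed integer?
# Assumption: smallint is 16-bit signed and bigint is 64-bit signed.
SMALLINT_MAX = 2**15 - 1   # 32767
BIGINT_MAX = 2**63 - 1     # 9223372036854775807


def fits_smallint(value: int) -> bool:
    """Return True if value is representable as a 16-bit signed integer."""
    return -(2**15) <= value <= SMALLINT_MAX


# Years that appear in the report's discussion.
years = [1977, 1989, 2022]
all_fit = all(fits_smallint(y) for y in years)
print(all_fit, SMALLINT_MAX)
```

      Any realistic year sits far below 32767, so the smaller type loses nothing.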

    14. Furthermore, many of the existing studies and reports that addressed related topics did not provide explicit citations or source details for their data, making it difficult to trace the origins and verify the reliability of the information used in previous research.

      You only cited one or two things above. If you have a bigger trove of existing research on this, you should probably cite them.

    15. Original Tables

      This is not a subsection by itself.

    16. Data Dictionary - Features, Descriptions, and Sources (Click to Expand)

      This is really nice, but introduce it in the text. Don't just leave it up to a reader to randomly decide to click on it. Let them know it is there and what it contains.

    17. aggregated, ensuring no sensitive information was included within the datasets.

      I'm not sure just aggregating data means no sensitive information is available. It does probably mean that no personally identifying information is available though.

    18. Data Cleaning

      A lot of this feels like information that you would explain before showing your ERD?

    19. ease of joining the two tables together.

      Probably still need to explain how you were doing that.

    20. the join conditions account for overlapping age ranges

      I'd have to think of exactly how to show it, but a visual showing how you joined on the different overlapping regions might be very useful here.

    21. Additionally, overlaps or gaps in age ranges might lead to duplication of data, which would affect the overall quality and reliability of the dataset.

      Oh! Are you getting duplicated data? How much? Which overlaps are resulting in that duplication?
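
      One way to answer these questions is to run the range join on a small example and count how often each row shows up in the result. The sketch below is only an illustration: the bracket labels and bounds are hypothetical, and I am assuming the join keeps every pair of rows whose age ranges overlap, which may not match your exact join condition.

```python
from collections import Counter

# Hypothetical age brackets from two sources (label, min_age, max_age), inclusive.
debt_rows = [("18-24", 18, 24), ("25-34", 25, 34), ("35-44", 35, 44)]
home_rows = [("under 35", 18, 34), ("35 to 44", 35, 44)]


def overlaps(lo_a, hi_a, lo_b, hi_b):
    """Two inclusive ranges overlap when each one starts before the other ends."""
    return lo_a <= hi_b and lo_b <= hi_a


# Range join: keep every (debt, home) pair whose age ranges overlap.
joined = [
    (d[0], h[0])
    for d in debt_rows
    for h in home_rows
    if overlaps(d[1], d[2], h[1], h[2])
]

# Count appearances of each source row in the joined result.
# A count above 1 means the join duplicated that row.
debt_counts = Counter(d for d, _ in joined)
home_counts = Counter(h for _, h in joined)
duplicated_home = {k: n for k, n in home_counts.items() if n > 1}
duplicated_debt = {k: n for k, n in debt_counts.items() if n > 1}
print(joined)
print(duplicated_home, duplicated_debt)
```

      Reporting the equivalent counts for your real tables would tell a reader how much duplication there is and which brackets cause it.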

    22. Final Data Dictionary

      All sections should start with text introducing the section, not a figure or table, etc.

    23. integrate

      assign?

    24. Due to the limitations of our dataset,

      In our dataset,

    25. we created a formula to manage overlaps by comparing the minimum and maximum birth years to determine the closest generational boundary alignment for each row. We also created new columns for minimum and maximum graduation years. This was done by assuming the average college graduation age to be 21 and adding this to the minimum and maximum birth years.

      Barring a visual, a little example in here might be useful to confirm that a reader understands what you did and could replicate it themselves.
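
      Something along these lines would do. Only the graduation age of 21 comes from your text; the generation boundaries and the example row are placeholders I made up, and "pick the generation sharing the most birth years" is my guess at what "closest generational boundary alignment" means, so swap in your actual rule.

```python
GRADUATION_AGE = 21  # stated in the report

# Assumed birth-year boundaries (inclusive); NOT taken from the report.
GENERATIONS = {
    "Baby Boomer": (1946, 1964),
    "Generation X": (1965, 1980),
    "Millennial": (1981, 1996),
}


def graduation_years(min_birth, max_birth):
    """Add the assumed graduation age to both ends of the birth-year range."""
    return min_birth + GRADUATION_AGE, max_birth + GRADUATION_AGE


def closest_generation(min_birth, max_birth):
    """Pick the generation that shares the most birth years with the row."""
    def shared_years(bounds):
        lo, hi = bounds
        return max(0, min(hi, max_birth) - max(lo, min_birth) + 1)

    return max(GENERATIONS, key=lambda name: shared_years(GENERATIONS[name]))


# Hypothetical row whose birth years straddle two generations.
row = (1978, 1987)
grad = graduation_years(*row)
gen = closest_generation(*row)
print(grad, gen)
```

      Here the row shares three birth years with Generation X and seven with Millennials, so it lands in the Millennial group with graduation years 1999 to 2008.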

    26. However, despite these measures, the potential for bias remains a concern, as survey respondents may not fully represent the entire population, with certain demographics potentially being underrepresented.

      Coming from the census data, surely they have some data on how well their sample mimics the true population?

    27. a less reliable source

      I would just say that you were unable to trace back this data to the original source, and thus are having to rely on this particular analyst as trustworthy and reputable.

    28. Methods

      Introduce each section with at least a short paragraph of text. Here you could be explaining what the following subsections will be discussing.

    29. Summary

      You devolve a bit into a subheading swamp here. Ensure that you are only using headings when you have a significant amount of content in that section. Otherwise paragraphs are perfectly fine!

    30. onsequently, we decided to exclude data for Generation Z from our analysi

      Is this where the 50 rows vanished from? Or is this on top of that?

    31. Materials List (Software used)

      I think it is useful to state this up front as you did, but it needs some text introduction, and then I'd put everything in a table with the library and what you used that library for.

      Also, did you do everything in Python here?

    32. was refined to 92 rows

      what did we lose 50 rows to?

    33. homeownership data only dates back to 1989, it presents a challenge in directly comparing the homeownership rates of Baby Boomers with our graduation debt trend.

      Oof, yeah it does

    34. slope values of each generation

      Not 100% clear on what you mean by this. I could see this going a variety of ways.

    35. debt_at_grad, avg_debt_to_income, avg_start_salary, percent_mortgage, percent_education_loan, and generation_order

      while these names are pretty decent, if you are going to be relying on these heavily going forward, you should probably take the time to explain each.

    36. Where: to = total_owner ts = total_surveyed hr = homeownership_rate

      usually this would just be done in sentence structure. And then you need to follow it up with what you were using this equation for.
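
      For instance, in sentence form it might read as below. I am assuming the rate is simply owners divided by respondents, which is what the variable names suggest; adjust it if you scale by 100 or define it differently.

```latex
The homeownership rate $hr$ is the share of surveyed respondents who own their home,
\begin{equation}
  hr = \frac{to}{ts},
\end{equation}
where $to$ is the number of owners (\texttt{total\_owner}) and $ts$ is the number of
respondents surveyed (\texttt{total\_surveyed}).
```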

    37. formula

      I'd usually number important equations.

    38. Shows No Strong Curvature of Patterns

      No need to continue the caps! Also, you should mention your slight parabolic tendency here. And maybe change "Fitted values" to "home ownership percentage" (because that is what it is, isn't it?).

    39. These assumptions include linearity, independence, homoscedasticity, normality, and lack of multicollinearity.

      At the end of this: "We investigate each in the following sections."

    40. very slight parabolic tendency present

      Agreed

    41. we conclude that there is no statistically significant evidence of autocorrelation

      Good!

    42. In practice, DW values close to 2 (within a range of about 1.5 to 2.5) are often considered acceptable.

      Source/citation?
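
      Alongside the citation, it could help to show how the statistic is computed so the "close to 2" benchmark makes sense to a reader. A minimal sketch using the standard Durbin-Watson definition, with made-up residuals rather than yours:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).

    The statistic ranges from 0 to 4: values near 2 suggest no first-order
    autocorrelation, values near 0 positive, values near 4 negative.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residual sequences to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step
constant = [1.0, 1.0, 1.0, 1.0]       # identical neighbours

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```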

    43. no pronounced pattern of variance around the horizontal line, indicating that the homoscedasticity assumption is generally met.

      Would another way to evaluate this be to bin the fitted values and then look at the variance of each of the bins to see how consistent it is?
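
      A rough sketch of what I have in mind, using only the standard library. The fitted values and residuals are invented, and the choice of three equal-count bins is arbitrary:

```python
from statistics import pvariance


def binned_residual_variance(fitted, residuals, n_bins=3):
    """Sort by fitted value, split into n_bins groups of roughly equal size,
    and return the residual variance within each group."""
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // n_bins
    variances = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover rows.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        group = [r for _, r in pairs[start:end]]
        variances.append(pvariance(group))
    return variances


# Invented data: residual spread grows for the largest fitted values.
fitted = [60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0]
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0]

variances = binned_residual_variance(fitted, residuals)
print(variances)
```

      If the per-bin variances are of similar size, that supports homoscedasticity; in this invented example the last bin is about four times the others, which would be a warning sign.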

    44. Feature Value debt_at_grad 26.534943 avg_start_salary 21.435287 avg_debt_to_income 28.121863 percent_mortgage 1.411813 percent_education_loan 7.590111 generation_order 7.403381

      Tables should be labeled and captioned just like Figures (except with Table)

    45. Around the Horizontal Line

      no caps!

    46. Figure 4

      If you have multiple things on a plot, I think it is important to include a legend. Though good on you mentioning the 45 deg line in your caption.

    47. R-squared Adjusted R-squared Mean Absolute Error (MAE) Mean Squared Error (MSE) Root Mean Squared Error (RMSE) F-statistic

      You better be ready to explain why you are looking at each of these if you are going to list them here.
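
      As a reminder of what each one measures, here is a small sketch with the textbook definitions for an ordinary least squares model with an intercept. The numbers are invented and this is not your pipeline, just the formulas written out:

```python
from math import sqrt


def regression_metrics(y_true, y_pred, n_predictors):
    """Return the listed fit metrics from observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # average size of a miss
    mse = sum(e ** 2 for e in errors) / n   # penalises large misses more
    rmse = sqrt(mse)                        # MSE back in the units of y
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                # share of variance explained
    dof = n - n_predictors - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / dof   # R-squared penalised for predictors
    f_stat = (r2 / n_predictors) / ((1 - r2) / dof)  # overall significance
    return {"mae": mae, "mse": mse, "rmse": rmse,
            "r2": r2, "adj_r2": adj_r2, "f_stat": f_stat}


# Invented observed and predicted homeownership rates.
y_true = [60.0, 62.0, 64.0, 66.0, 68.0]
y_pred = [61.0, 62.0, 63.0, 66.0, 68.0]
metrics = regression_metrics(y_true, y_pred, n_predictors=1)
print(metrics)
```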

    48. Random Forest models, being ensemble methods based on decision trees, are less sensitive to multicollinearity and do not require the assumptions of linearity, homoscedasticity, or normality of residuals.

      Very nice.

    49. 2.41e+05

      Write this out in the more classic human readable form.
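
      If the value is coming straight out of Python, a format specifier will do this for you. A tiny sketch (the variable names are mine):

```python
value = 2.41e+05

# Thousands separator, no decimal places.
readable = f"{value:,.0f}"
print(readable)  # 241,000
```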

    50. relatively high

      compared to what?

    51. Although the other assumptions of the model are satisfied, the presence of multicollinearity necessitates cautious interpretation of our results.

      Can you tell what they are collinear with? Should you have dropped some of these variables?
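
      A quick way to check is to rank the pairwise correlations between predictors. The sketch below uses a hand-rolled Pearson correlation and invented numbers; only the column names come from your report, so the ranking it prints says nothing about your actual data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Invented values for three of the report's predictors.
data = {
    "debt_at_grad": [10.0, 15.0, 20.0, 25.0, 30.0],
    "avg_start_salary": [30.0, 34.0, 41.0, 45.0, 50.0],
    "percent_mortgage": [40.0, 38.0, 42.0, 39.0, 41.0],
}

# Rank predictor pairs from most to least correlated (absolute value).
pairs = sorted(
    ((abs(pearson(data[a], data[b])), a, b) for a, b in combinations(data, 2)),
    reverse=True,
)
for r, a, b in pairs:
    print(f"{a} vs {b}: |r| = {r:.3f}")
```

      Pairs near the top of the list are the candidates for dropping or combining.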

    52. library to train and test the dataset.

      with what parameters? Using the same 70/30 split?

    53. R-squared Adjusted R-squared Mean Absolute Error (MAE) Mean Squared Error (MSE) Root Mean Squared Error (RMSE)

      Same comment as above. You better be ready to use and explain each of these if you are bringing them up here.

    54. Rising Average Debt Levels at Graduation Across Successive Generations.

      Don't just state what it is, but what a reader should conclude from looking at this image.

    55. Figure 5

      What happened in 1977-78??

    56. Generation Slope

      Presumably this is the slope of each of the above sections, but you don't introduce this part at all other than mentioning it in a much earlier section. Introduce it, explain why it is interesting, etc. Why are you interested in slope here over raw average amount?

      Also, this is a table, and should be listed and referenced as such.

    57. Coefficient Results

      Why its own section? Is this not still about the model fit?

    58. approximately 65% by 2022. This recent increase might be due to various factors, including economic recovery and an increase in housing demand.

      This is super interesting to me, and not necessarily what I'd have expected to see. So home ownership rates are at their highest ever despite also seemingly at their most expensive ever? And with the most debt ever?

    59. A Reflection of Economic Conditions and Financial Barriers.

      no caps.

      And reiterate some of what you had in the text about what a reader should observe on this image.

    60. The linear regression model demonstrates a relatively good fit, with an R-squared value of 0.708, indicating that 70.8% of the variability in homeownership rates is explained by the predictors. The adjusted R-squared of 0.681 suggests that after accounting for the number of predictors in the model, about 68.1% of the variance is explained, which is still substantial. The F-statistic for the model is 26.67 with a p-value of 6.99e-16, indicating that the model as a whole is statistically significant and provides a good fit for the data.

      Well, you discussed some of them, but you still left off about half of them. Which isn't necessarily terrible, but you shouldn't make as big of a deal about what metrics you will be looking at earlier in that case.

    61. Metric Value

      Table!

    62. This counterintuitive finding may be attributable to factors such as high multicollinearity, sampling issues, or specific economic conditions that could influence the observed relationship.

      Could it be an indicator that people who get a college education, despite the loans, end up earning more and thus still have the capacity to afford a home?

    63. The coefficient

      Just mentioning it here, but you have a lot of very small paragraphs in this section. I suspect several can be combined, and would make reading this flow a bit better.

    64. not provide a definitive answer

      Is there something that would give you a definitive answer?

    65. Feature Coefficient P-Value

      A) This is a table. B) I'd strongly suggest ordering these in some way to make this table easier to understand at a glance.

    66. Feature Importance Results Feature Importance

      Introduce and explain? Also, pretty sure this shouldn't be its own section, as it is part of the model fit.

      Also, as far as the table is concerned, you have this exact data represented visually below, which I tend to think is the better way to showcase it. Add the exact numbers if you want to the table, but otherwise I'd go with that instead of this table.

    67. Model test statistics: Metric Value

      Table!

    68. We excluded percent_mortgage from this visualization due to its dominance as the most significant feature.

      ok, good comment

    69. substantial

      Is substantial the correct word here? It is the dominant of what you have remaining, but I'd argue you already excluded the real substantial player...

    70. potentially delaying their entry into homeownership

      And yet home ownership is up! So it is older individuals that are buying all the homes?

    71. Figure 7

      Reference this properly in Quarto so that it links to the figure.

    72. percent_education_loan and debt_at_grad

      You have been monospace font-ing these variables the entire paper. Don't change it up now!

      Also, can you remind the reader what the difference between these two is? Because they sound very similar.

    73. significantly

      Be careful throwing this word around unless you directly showed this

    74. The prominence of these variables implies that higher levels of student loan debt could be a considerable barrier to homeownership.

      I think you have this breakdown, but nowhere in this article do I recall seeing how home ownership is distributed across the generations. That feels like very useful context for understanding what you are trying to lay out here!

    75. has minimal impact on their likelihood of owning a home when compared to more direct financial factors, such as student loan debt.

      I think it is just mostly collinear with these more direct measures

    76. This result indicates that, according to the model, the generational sequence alone does not substantially influence homeownership rates. While generational trends and experiences might shape broader economic and social patterns, the specific cohort to which an individual belongs appears less relevant in this context than the tangible financial burdens they face.

      You have a lot of tiny paragraphs here again that can (and should) probably be combined.

    77. ebt at graduation (debt_at_grad) is a significant factor influencing homeownership rates. The data shows a clear increase in student loan debt across generations, with Millennials bearing the highest levels, which is likely contributing to the challenges they face in achieving homeownership. Despite the substantial debt load carried by recent graduates, the impact of debt at graduation on homeownership is relatively moderate.

      You say it is significant, and then you downplay it to relatively moderate? Which is it?

    78. In addressing our research question—how do differences in student loan debt between older and current generations affect homeownership trends?—our study offers both insightful and nuanced perspectives. The historical context and the analysis reveal a complex relationship between student loan debt and homeownership rates. Our findings indicate that student loan debt at graduation (debt_at_grad) is a significant factor influencing homeownership rates. The data shows a clear increase in student loan debt across generations, with Millennials bearing the highest levels, which is likely contributing to the challenges they face in achieving homeownership. Despite the substantial debt load carried by recent graduates, the impact of debt at graduation on homeownership is relatively moderate. Through linear regression and Random Forest analyses, we confirmed that debt_at_grad has a less pronounced but still notable effect. Percent_education_loan, while less influential than mortgage debt, also plays a role, suggesting that education loan debt does impact homeownership trends to a degree. Interestingly, generation_order, representing the sequence of generations, was found to be the least significant predictor of homeownership rates. This indicates that, within the context of our model, generational cohort alone does not substantially explain variations in homeownership rates compared to the financial burden of student loans and mortgages.

      This almost feels like the start of a conclusion, and then you can talk about other things?

    79. It is crucial to acknowledge that student loan debt is not the sole determinant of homeownership trends. Inflation, for instance, has affected purchasing power and contributed to rising housing costs, which can make homeownership more challenging for many.

      Now this feels like the start of a new paragraph

    80. References

      Look up how to properly cite websites in APA format. Or use a BibTeX entry and have Quarto do it for you.

    81. According to a study by the Federal Reserve Board, a $1,000 increase in student loan debt is associated with a 1.8 percent decrease in the homeownership rate for public four-year college graduates, resulting in a delay in purchasing a home.

      This feels extremely relevant to your study here, and thus should probably be brought up much earlier (and you need a citation)

    82. Furthermore and as addressed above, despite our efforts to mitigate bias, the potential for underrepresentation of certain demographics in the survey samples remains a concern. Student loan debt can exacerbate racial disparities. Racial wealth and income gaps are rooted in historical discriminatory housing policies - meaning that Black students, in particular, may face greater financial risks in pursuing higher education. Not adequately capturing the experiences of these underrepresented groups can affect our research and must be taken into consideration when evaluating our results.

      This also feels like a new paragraph

    83. We can reference this in our conclusion/limitations: https://housingmatters.urban.org/articles/how-student-loan-debt-affects-racial-homeownership-gap “Extensive evidence underscores how debt affects mortgage eligibility and credit score, erecting clear barriers to homeownership. A study by the Federal Reserve Board found that a $1,000 increase in student loan debt lowers the homeownership rate by about 1.8 percent for public four-year college goers; this amounts to an average delay in about four months for attaining homeownership. Student loan debt may reproduce and exacerbate the racial homeownership gap. Enduring racial disparities in wealth and income—which were, in part, created through decades of racist and discriminatory housing policies that blocked wealth building for many families—mean a greater proportion of Black students need to take on a greater and more enduring financial risk to pursue higher education. Therefore, reducing the impact of student loans on mortgage eligibility could be a critical component of ensuring a more equitable housing landscape.”

      What just happened down here?