63 Matching Annotations
  1. Sep 2023
    1. Dungeons & Dragons

      our dice have more sides so you can get even wilder, but it isn't odd at all since all the dice are even.

    2. “conditions”

      Given the conditions there is a low probability that there is a high chance that this sentence makes no sense.

    3. “Given”

      Given that this word is the statistical operative word it is likely that it will appear 100% of the time in my future writing.

    1. A frequency polygon for 642 psychology test scores shown in Figure 2.2.8 was constructed from the frequency table shown in Table 2.2.2.

      The graph below makes good use of grouping the data into buckets and then showing the number of items in each.
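
The bucketing described in this note can be sketched in Python. The scores, the bin width of 10, and the function name `bin_counts` are hypothetical, chosen only for illustration; they are not the 642 test scores from the text.

```python
from collections import Counter

def bin_counts(scores, width):
    """Count how many scores fall into each interval [lower, lower + width)."""
    counts = Counter((s // width) * width for s in scores)
    return dict(sorted(counts.items()))

# Hypothetical whole-number scores, not taken from the textbook's data set.
scores = [35, 42, 47, 51, 58, 58, 63, 77]

for lower, n in bin_counts(scores, 10).items():
    # For whole-number scores, the bucket starting at `lower` ends at lower + 9.
    print(f"{lower}-{lower + 9}: {n}")
```

Each bucket's count would become one point on a frequency polygon, or one bar in a histogram.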

    2. Frequency polygons are also a good choice for displaying cumulative frequency distributions

      Good use of data, and the comparisons.
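
A cumulative frequency distribution is a running total of the class frequencies. A minimal sketch, assuming hypothetical class frequencies (the list below is illustrative and does not come from the text):

```python
from itertools import accumulate

def cumulative(frequencies):
    """Return the running total of a list of class frequencies."""
    return list(accumulate(frequencies))

# Hypothetical class frequencies, ordered from the lowest interval to the highest.
class_frequencies = [1, 2, 3, 1, 1]

print(cumulative(class_frequencies))  # → [1, 3, 6, 7, 8]
```

The last cumulative value always equals the total number of observations, which is a quick sanity check on the table.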

    3. The histogram makes it plain that most of the scores are in the middle of the distribution, with fewer scores in the extremes. You can also see that the distribution is not symmetric: the scores extend to the right farther than they do to the left. The distribution is therefore said to be skewed. (We'll have more to say about shapes of distributions in Chapter 3.)

      I will say this is almost a perfect example of a normal distribution!

    4. Interval's Lower Limit Interval's Upper Limit Class Frequency

      Interval mapping is important to show ranges in a quantitative data set.

    5. Notice that each stem value is split into five parts: 0-1, 2-3, 4-5, 6-7, and 8-9.

      Here is the confusing part, and the part that takes the most explanation. But it is a good way of keeping the large number in order.

    6. Observe that the figure contains a row headed by “0” and another headed by “-0.” The stem of 0 is for numbers between 0 and 9, whereas the stem of -0 is for numbers between 0 and -9.

      Good way to notate, but this could get confusing really quickly.

    7. Thus, the value 43.2 is rounded to 43 and represented with a stem of 4 and a leaf of 3. Similarly, 42.9 is rounded to 43. To represent negative numbers, we simply use negative stems.

      This causes the data to lose some precision, but if exact values are not important this makes sense.
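
The rounding and the negative stems from the two highlights above can be sketched together. The helper name `stem_and_leaf` is hypothetical. Note that Python's built-in `round` sends exact halves to the nearest even number, which may differ from the textbook's rounding convention.

```python
def stem_and_leaf(value):
    """Round to a whole number, then split it into (stem label, leaf digit)."""
    rounded = round(value)
    sign = "-" if rounded < 0 else ""
    stem, leaf = divmod(abs(rounded), 10)
    # The stem is kept as text so that "-0" and "0" remain distinct rows.
    return f"{sign}{stem}", leaf

print(stem_and_leaf(43.2))  # → ('4', 3)
print(stem_and_leaf(42.9))  # → ('4', 3)
print(stem_and_leaf(-4))    # → ('-0', 4)
```

Both 43.2 and 42.9 end up as the same stem and leaf, which is exactly the loss of precision noted above.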

    8. First, the data are limited to whole numbers that can be represented with a one-digit stem and a one-digit leaf.

      Whole number only storage for these.

    9. Figure 2.2.4: Back-to-back stem and leaf display. The left side shows the 1998 TD data and the right side shows the 2000 TD data.

      Somehow this is less confusing than the one-sided graph.

    10. To make this clear, let us examine Figure 2.2.2 more closely. In the top row, the four leaves to the right of stem 3 are 2, 3, 3, and 7. Combined with the stem, these leaves represent the numbers 32, 33, 33, and 37, which are the numbers of TD passes for the first four teams in Figure 2.2.1.

      I can see how this is an effective data storage device. But I would say that the unintuitive nature of the presentation made this confusing.

    11. The numbers to the right of the bar are leaves, and they represent the 1’s digits. Every leaf in the graph therefore stands for the result of adding the leaf to 10 times its stem.

      Each individual number, or the second digit (the ones place) of each.
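
The rule quoted above (value = leaf + 10 × stem) can be checked against the stem-3 row described in these notes. The function name `decode` is a hypothetical choice for this sketch.

```python
def decode(stem, leaves):
    """Rebuild the original values from one stem and its leaves."""
    return [10 * stem + leaf for leaf in leaves]

# Stem 3 with leaves 2, 3, 3, 7, as quoted from the text.
print(decode(3, [2, 3, 3, 7]))  # → [32, 33, 33, 37]
```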

    12. A stem of 3, for example, can be used to represent the 10’s digit in any of the numbers from 30 to 39.

      First digit (the tens place) and bucket range.

    13. Scatter plots are used to show the relationship between two variables.

      i.e. correlational data?

    1. Finally, we note that it is a serious mistake to use a line graph when the X-axis contains merely qualitative variables. A line graph is essentially a bar graph with the tops of the bars represented by points joined by lines (the rest of the bar is suppressed).

      Line graphs should only be used for quantitative data.

    2. Figure 2.1.5: A redrawing of Figure 2.1.2 with a lie factor greater than 8.

      This does convey the same information but personally I just don't think it shows the whole story. Plus at a glance the chart doesn't have the same initial impact of data conveyance.

    3. For example, 3-dimensional bar charts such as the one shown in Figure 2.1.4 are usually not as effective as their two-dimensional counterparts.

      good example, and good note. Don't use a Z axis unless you are displaying Z data!

    4. Figure 2.1.3: A bar chart of the number of people playing different card games on Sunday and Wednesday.

      Good chart, but maybe there is a use of color that could be better. Like a selection of black and red, the two card colors?

    5. Bar charts can also be used to represent frequencies of different categories.

      What is key about the bar chart is the at-a-glance view of the N value for each item. Additionally, this chart should have the percentages as labels on each item.

    6. For example, if just 5 people had been interviewed by Apple Computers, and 3 were former Windows users, it would be misleading to display a pie chart with the Windows slice showing 60%.

      You should solve this by adding a note with the N value next to the percentage or in a legend on the side.
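
One way to act on this note is to print the sample size next to every percentage. A minimal sketch, assuming a hypothetical helper named `percent_label`; the label format is my own choice, not something the text prescribes.

```python
def percent_label(count, total):
    """Format a percentage together with the sample size it is based on."""
    pct = 100 * count / total
    return f"{pct:.0f}% (n = {total})"

# The example from the text: 3 former Windows users out of 5 people interviewed.
print(percent_label(3, 5))  # → 60% (n = 5)
```

With the N value attached, a reader can see at once that the 60% rests on only five responses.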

    7. Pie charts are effective for displaying the relative frequencies of a small number of categories. They are not recommended, however, when you have a large number of categories.

      Anything over 5 will probably start to muddy the chart.

    8. The pie chart in Figure 2.1.1 shows the results of the iMac study. In a pie chart, each category is represented by a slice of the pie. The area of the slice is proportional to the percentage of responses in the category. This is simply the relative frequency multiplied by 100. Although most iMac purchasers were Macintosh owners, Apple was encouraged by the 12% of purchasers who were former Windows users, and by the 17% of purchasers who were buying a computer for the first time.

      This is probably the best visual way to present this data. It is clear, and the best part of a pie chart is that the step of showing that each percentage is part of a whole is done for you by the pie metaphor.

    9. 355

      To me, it makes sense that the largest portion of the buyers are upgrading from a previous Mac. They know the product and the system. But I am more surprised that the share of new computer owners is higher than the share of Windows users who switched.

    10. All of the graphical methods shown in this section are derived from frequency tables. Table 1 shows a frequency table for the results of the iMac study; it shows the frequencies of the various response categories. It also shows the relative frequencies, which are the proportion of responses in each category. For example, the relative frequency for “none” of 0.17 = 85/500.

      The table below is also not a very good way to represent the data. At least in my opinion, the main issue is that the X and Y axes should be flipped. This would make the table more readable.
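
The relative frequencies in the quoted passage can be reproduced from the counts. In this sketch only the 85 out of 500 for "none" is stated directly in the text. The count of 60 for Windows is inferred from the 12% figure, and 355 for Macintosh is an assumption based on the number highlighted earlier in these notes.

```python
def relative_frequencies(counts):
    """Divide each category count by the total number of responses."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

counts = {
    "none": 85,        # stated in the text (85/500)
    "Windows": 60,     # inferred: 12% of 500
    "Macintosh": 355,  # assumed from the highlighted "355"
}

for category, rel in relative_frequencies(counts).items():
    # Relative frequency times 100 gives the percentage used for the pie chart.
    print(f"{category}: {rel:.2f} ({rel * 100:.0f}%)")
```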

    11. The key point about the qualitative data that occupy us in the present section is that they do not come with a pre-established ordering (the way numbers are ordered).

      Because this data is qualitative data it is easy enough to place the data into the buckets we want.

  2. Aug 2023
    1. Non-Experimental Designs

      In this type of design there is a relationship between the two variables, but it is a correlational link rather than a causal one. So we study how the outcomes change as the variables change.

    2. Quasi-Experimental Designs

      There are some situations where there isn't a way to randomly assign the candidates a treatment. Therefore these experiment designs do not count on random assignment as something they need. Instead they focus on the changes to the dependent variable by applying the independent variable to all of the sample.

    3. Experimental Designs

      As the name implies, this type of study is about experimenting with one of the variables at play. This could be a medication, a meditation regimen, or some other form of practice which the researchers believe is affecting the dependent variable.

    1. To solidify your understanding of sampling bias, consider the following example. Try to identify the population and the sample, and then reflect on whether the sample is likely to yield the information desired.

      In the example below, the population is the whole class while the sample is the students in the front row. The issue with selecting only the front row is that students who sit there might be the most engaged students, meaning they will likely be the highest performers.
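
The contrast between the front-row sample and a random sample can be sketched as follows. The roster of 30 students, the seat order, and the helper name `simple_random_sample` are all hypothetical.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical class roster, listed in seating order (front row first).
students = [f"student_{i}" for i in range(1, 31)]

front_row = students[:5]  # convenience sample: only the front row
random_five = simple_random_sample(students, 5, seed=0)

print(front_row)
print(random_five)
```

The front-row sample always picks the same kind of student, while the random draw gives every seat in the room the same chance of being selected.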

    2. We are interested in examining how many math classes have been taken on average by current graduating seniors at American colleges and universities during their four years in school.

      First you will need to select a representative sample of universities, or a single university to use as a test case.

    3. It is not practical to ask every single American how he or she feels about the fairness of the voting procedures. Instead, we query a relatively small number of Americans, and draw inferences about the entire country from their responses. The Americans actually queried constitute our sample of the larger population of all Americans.

      Due to the nature of the population being studied, we will need to gather a representative sample of all voters in the United States, with a large N value.

    1. For example, experimental subjects may be asked to rate their level of pain, how much they like a consumer product, their attitudes about capital punishment, their confidence in an answer to a test question.

      I believe these would be ordinal scale variables as they are ranking them on a predetermined scale.

    2. Ratio scales

      Ratio scales are like all of the previous scales combined into one, with one important distinction: this scale has a true zero value, meaning that someone on this scale can score a zero and it is not an error.

    3. Interval scales

      Each interval on the scale represents the same amount. This means that you can compare the differences between intervals as if they are equivalent.

    4. Ordinal scales

      This is a weighted scale, meaning that each of the variables has a corresponding score and we are trying to determine where the respondents' answers fall on that scale.

    5. Nominal scales

      Un-ordered list of data, all equal in a nominal scale.

    6. The essential point about nominal scales is that they do not imply any ordering among the responses.

      All elements are equal as they are rungs on the scale. Thus this does not imply that higher responses carry more weight or vice versa.

    7. Other variables such as “time to respond to a question” are continuous variables since the scale is continuous and not made up of discrete steps

      Variables that fall on a sliding scale, and can fall within a range of different numbers which include fractions of the variable.

    8. Variables such as number of children in a household are called discrete variables since the possible scores are discrete points on the scale. For example, a household could have three children or six children, but not 4.53 children

      These variables cannot be split into fractions due to the nature of the variable. While the math might say that it was 2.3 children, this doesn't make any sense because 0.3 of a child wouldn't be a child.

    9. Some examples of quantitative variables are height, weight, and shoe size

      Numbers, something that is numeric in origin.

    10. Qualitative variables are those that express a qualitative attribute such as hair color, eye color, religion, favorite movie, gender, and so on.

      Something that is measurable but it is not a numeric value. A way to think of this is a categorical variable

    11. If an experiment were comparing five types of diets, then the independent variable (type of diet) would have 5 levels

      5 different elements are being tested against each other and a control.

    12. control

      This means that they participate in the trial but are given something that does not contain the independent variable.

    13. experimental

      Meaning that they are given the independent variable whatever it might be. Thus they are part of the testing group.

    14. systematically

      I would like to know why they didn't say that it "was not statistically significant" but instead chose to say that it was not systematic.

    15. 39,000 women aged 45 and up

      Good N value.

    16. However, a study published in the Journal of the National Cancer Institute suggests this is false

      A single study does not mean that there is definitive proof, but it does tell us that there is a relationship to be studied. And it is our job to try to recreate the study to see if we get a different result.

    17. Does beta-carotene protect against cancer?

      Hypothesis being tested.

    18. Although all supplemented rats showed improvement

      The next question to be studied would be whether the difference between the powders is significant enough to be considered effective compared to just having a healthy diet.

    19. blueberry, strawberry, or spinach powder

      These are the independent variables giving us three different trials that the researcher is running.

    20. Can blueberries slow down aging?

      Hypothesis, this is the question that is being studied.

    21. In this example, relief from depression is called a dependent variable. In general, the independent variable is manipulated by the experimenter and its effects on the dependent variable are measured.

      Dependent because the value of the variable will change based on the effectiveness of the anti-depressant.

    22. In this case, the variable is “type of antidepressant.” When a variable is manipulated by an experimenter, it is called an independent variable

      Independent, because the researcher is changing it between each trial of the experiment.

    1. Many of the numbers thrown about in this way do not represent careful statistical analysis

      Not only that, but they are hiding their calculations and data set from the public. Meaning we can't tell if they are cherry picking data, or showing us a correlation and presenting it as a causation.

    2. People tend to be more persuasive when they look others directly in the eye and speak loudly and quickly.

      Up until the speak loudly part this seems like it could be true. But I don't think people like being shouted at.

    3. Almost 85% of lung cancers in men and 45% in women are tobacco-related.

      This seems like it is misleading, because unless there is a disparity between the rate at which men and women smoke then there is something else this data is not telling us.

    4. If you cannot distinguish good from faulty reasoning, then you are vulnerable to manipulation and to decisions that are not in your best interest.

      Marketing is the best example of this, as it is pointed out below. Statistics, unlike most information, is not clear-cut, because two people can use the same data set to show different outcomes.

    1. many students view statistics as a math class, which is actually not true

      Statistics, the English of math. As a non-math person I agree.

    2. In addition, the statistic provided does not rule out the possibility that the number of interracial marriages has seen dramatic fluctuations over the years and this year is not the highest.

      There should be a dramatic fluctuation when the US legalized the practice, and presumably a drop in the following years to a steady rate. But to know this we would need all historical data.

    3. The more churches in a city, the more crime there is. Thus, churches lead to crime.

      Correlation not causation.

    4. This effect is called a history effect

      Not the same, but this feels similar to recency bias, where they are looking at the most recent data and giving it more value because it is recent. If they were to look at previous years they would see a corresponding bump in the summer, and then they could adjust accordingly.