PART II
ChatGPT wouldn’t make a graph for me… but then again I didn’t log in.
Data set 1:
I’d reorganize these data into a horizontal bar graph, although a vertical bar graph would also work. x-axis label = “Percentage of students ‘satisfied’ or ‘very satisfied’” with tick marks from 0 to 100. y-axis label = “Major”, with a unique bar for each of the four majors given in the table.
Title: “Percentage of Students Satisfied With Their Major”.
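A minimal matplotlib sketch of that layout; the major names and percentages below are placeholders since the table’s actual values aren’t reproduced here.

```python
import matplotlib.pyplot as plt

# Placeholder majors and satisfaction percentages -- the real values
# come from the assignment's table, not reproduced here.
majors = ["Major A", "Major B", "Major C", "Major D"]
percent_satisfied = [72, 85, 64, 91]

fig, ax = plt.subplots()
ax.barh(majors, percent_satisfied)                 # horizontal bars, one per major
ax.set_xlim(0, 100)                                # tick marks from 0 to 100
ax.set_xlabel('Percentage of students "satisfied" or "very satisfied"')
ax.set_ylabel("Major")
ax.set_title("Percentage of Students Satisfied With Their Major")
plt.tight_layout()
plt.show()
```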
Data set 2:
Since this data set compares two groups across time (12 months), I’d reformulate this information into a line graph with one line for Detroit and another line for Austin. The lines would need different colors and/or patterns (e.g., one solid line, one dashed). Using both, or at least the pattern difference, is more accessible to color-blind readers and is hence recommended. The legend would contain a line segment for each of the two groups followed by “Detroit” or “Austin”. The x-axis label = “Month” with tick marks for Sept through Aug. The y-axis label = “Energy used (average kilowatts per hour)” with tick marks from 0 to ~40-50 (depending on the intended audience / any conventional technicalities for reporting energy usage).
Title = “Monthly Energy Usage in Detroit and Austin Buildings”
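A rough matplotlib sketch of that line graph; the monthly values are placeholders standing in for the data set, and the solid/dashed distinction follows the accessibility note above.

```python
import matplotlib.pyplot as plt

# Placeholder monthly averages for the two buildings; the real numbers
# come from the assignment's data set.
months = ["Sept", "Oct", "Nov", "Dec", "Jan", "Feb",
          "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
detroit = [30, 32, 38, 44, 46, 45, 40, 34, 28, 25, 24, 26]
austin = [28, 25, 22, 24, 25, 24, 26, 28, 33, 40, 44, 42]

fig, ax = plt.subplots()
# Solid vs. dashed lines keep the two series distinguishable for
# color-blind readers, as discussed above.
ax.plot(months, detroit, linestyle="-", marker="o", label="Detroit")
ax.plot(months, austin, linestyle="--", marker="s", label="Austin")
ax.set_xlabel("Month")
ax.set_ylabel("Energy used (average kilowatts per hour)")
ax.set_ylim(0, 50)
ax.set_title("Monthly Energy Usage in Detroit and Austin Buildings")
ax.legend()
plt.tight_layout()
plt.show()
```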
Data set 3:
Since multiple groups are being compared without regard to time, I would use a clustered bar graph with beef, meat, and poultry clustered together under two categories labeled underneath the x-axis: “Average calories” and “Average sodium content (in mg)”. The y-axis label = “Average content” with tick marks from 0 to 500. The sample size can be in parentheses after each type in the legend. Title = “Average Calories and Sodium Content of Different Hotdog Meat”
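A sketch of that clustered (grouped) bar layout in matplotlib; the averages and sample sizes below are placeholders for the table’s real numbers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values; the real averages and sample sizes come from the
# assignment's hotdog data set.
categories = ["Average calories", "Average sodium content (in mg)"]
values = {
    "Beef (n=20)": [155, 400],
    "Meat (n=17)": [160, 420],
    "Poultry (n=17)": [120, 460],
}

x = np.arange(len(categories))   # one cluster per category
width = 0.25                     # width of each bar within a cluster

fig, ax = plt.subplots()
for i, (hotdog_type, vals) in enumerate(values.items()):
    ax.bar(x + i * width, vals, width, label=hotdog_type)

ax.set_xticks(x + width)         # center the tick under each cluster
ax.set_xticklabels(categories)
ax.set_ylabel("Average content")
ax.set_ylim(0, 500)
ax.set_title("Average Calories and Sodium Content of Different Hotdog Meat")
ax.legend()                      # sample sizes shown in parentheses in the legend
plt.tight_layout()
plt.show()
```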
Data set 4:
Since two groups are compared with respect to time, I would reformulate this data set into a line graph with one line for “Revolution helmet” and another for “Standard helmet”. Each line would have a unique color and pattern labeled in a legend. The x-axis label = “Interval (weeks)” with tick marks for 1, 2, 3, and 4. The y-axis label = “Return to Play Rates (%)” with tick marks from 0 to 100.
Title of the graph = “Return to Play Rates After Experiencing a Concussion When Wearing a Revolution vs. Standard Helmet”
These data were collected to answer the question of which type of hotdog is healthiest. Hotdogs were analyzed for both calories and sodium content.
I’m not convinced calories and sodium content are good measurements of “healthy”; more context would need to support the decision to use these two metrics.
Table 2: Popularity of the top-selling motorcycle brands among registered motorcycle owners in Pittsburgh. Harley-Davidson is the most popular.
could include units of measurement and a short description of how the data was collected
Figure 4: Harley-Davidson is the most popular motorcycle.
Other than numbering the figure, this example includes only the optional part of a caption, the summary of the main story of the visualization. The caption also needs to identify the units of measurement used, the groups analyzed, and any other additional information crucial to understanding the figure (which is hard to discern without more context than is given here).
Table 1: Memory usage of five different web browsers with one tab open.
could include units of measurement, identify each of the five groups / web browsers tested, and more clearly state that one tab is open per web browser.
Figure 3: Levelized costs (in dollars per megawatt hours of electricity) of five different power plants.
could include a summary of the main story since five groups is a lot to quickly analyze by just looking at the figure.
Figure 2 shows the average test scores of male and female students.
needs to identify the units of measurement
Figure 1: Average GPA (on 4-point scale) by hours studied per week of students at University X.
adequate but could include a summary / main takeaway from the data
of poultry, beef, and meat hotdogs.
identifies the groups analyzed
A score of 1 = elementary school level and 20 = graduate school level.
identifies the low and high end points of an arbitrary scale
grams CO2 per gallon
includes units of measurement
speed
doesn’t specify average typing speed
Figure 3.11a.
The y-axis could be rescaled so it extends to 5, making it clearer what “1 = poor; 5 = excellent” in the caption refers to. Additionally, rescaling the y-axis would more accurately show that both the control and experimental groups are less effective than they currently appear.
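A tiny illustration of that rescaling, assuming hypothetical group means on the 1-5 scale.

```python
import matplotlib.pyplot as plt

# Hypothetical group means on the 1-5 rating scale from the caption.
groups = ["Control", "Experimental"]
mean_rating = [2.1, 2.6]

fig, ax = plt.subplots()
ax.bar(groups, mean_rating)
ax.set_ylim(0, 5)   # extend the axis to the top of the scale (1 = poor; 5 = excellent)
ax.set_ylabel("Mean rating (1 = poor; 5 = excellent)")
plt.tight_layout()
plt.show()
```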
Your audience is not technical. You are not expected to present your data to a technical person
redundant
Therefore, pie graphs are most appropriate for nontechnical contexts where readers only care about broad-stroke differences and are not concerned with precision.
mostly redundant but probably the most concise description of when to use pie graphs. update: also reiterated in the chapter summary
What are the pros and cons of the pie graphs in Figure 3.9 versus the same data in Table 3.3?
Fig. 3.9 could work if there were only one pie graph for one type of employee, with each section (e.g., “Walk”) labeled with its respective percentage (e.g., “40%”). As it is now, Fig. 3.9 takes a long time to read and overcomplicates the data. Table 3.3 does a much better job of comparing each transportation method between the two types of employees. In fact, more columns, one for each additional type of employee, could be added to the table without overcomplicating the presentation of the data.
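A quick sketch of what that single labeled pie might look like; only “Walk” at 40% comes from the note above, and the remaining modes and shares are placeholders standing in for Table 3.3.

```python
import matplotlib.pyplot as plt

# Placeholder commute-mode shares for one employee type; only "Walk" at
# 40% is taken from the note above.
labels = ["Walk", "Bike", "Drive", "Public transit"]
shares = [40, 10, 35, 15]

fig, ax = plt.subplots()
ax.pie(shares, labels=labels, autopct="%1.0f%%")  # label each slice with its percentage
ax.set_title("How One Type of Employee Commutes")
plt.show()
```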
Therefore — just like dessert pie — pie graphs should be used sparingly and saved for special occasions, rarely acting as a main course.
great analogy
If our goal is to rank items, a bar graph or a table is usually a better choice than a pie graph. However, pie graphs will continue to be used because they are engaging and colorful, and many writers use them to add visual appeal to presentations and documents.
Tension between visual appeal and technical utility
F. Gasoline prices dropped by over one-third between 2012–2015.
4
E. Shorter women receive more unsolicited emails than taller women do (especially women above 5'10"), while the opposite trend is true for men.
1
Violates:
- 1. The y-axis is different for women and men.
- 2. Arguably for women, and especially for men, the difference in unsolicited emails vs. height is negligible, or at least far smaller than the line graph implies.
- 3. The demographic (e.g., locale, profession, race) of the participant pool is unclear other than the distinction between women and men.
- 4. Given #2, the conclusions from the data are exaggerated.
D. Revenue has steadily increased over the last decade.
3
Violates:
- 3. How Cumulative Annual Revenue is calculated could be included.
C. Racist attitudes toward dating appear to have improved slightly from 2008 to 2014.
2
Violates:
- 1. The blue and green trend lines contradict each other.
- 3. While percentages are given, the graph doesn’t answer the question “out of how many?” Additionally, “racial attitudes” is too broad a category to effectively frame the data presented.
B. The number of units sold increased from 2019 to 2022.
1
Violates:
- 1. Awful visual presentation of the data.
- 2. The number of units sold looks lower in 2022 than in 2019, and if not, the difference seems negligible. Further, the number of units sold decreased between 2019 and 2022.
- 4. For the reasons in #1, the conclusions are exaggerated.
A. Male students spoke more than their female peers.
2
Violates:
- 2. 16 vs. 22 may not be a large difference in a class of 100+ students.
- 3. How long each student speaks also contributes to total speaking time.
Remember, a good starting place to assess data is by asking: Out of how many?
This statement implies a preference for rates and proportions/percentages rather than a total count… are there scenarios when the latter (total count) could be preferable?
striking the balance between telling a story and taking care not to overstate that story.
the goal
Return your attention briefly to Figure 2.1 at the beginning of this chapter. If we were to describe this graph’s story as “Figure 2.1 shows that daily Twitter use shortens relationships,” we would be exaggerating what the data can tell us in order to create a sensational story. The data in Figure 2.1 can tell us that there is a correlation between Twitter use and relationship length, but not that one causes the other. It is just as likely that failed relationships drive people to post on Twitter as it is that Twitter use drives people out of relationships. Or there could be a third factor connected to both Twitter use and relationships that is the real cause here. For instance, both daily Twitter use and short relationships could be caused by some underlying personality factor that is the real explanation for the data in Figure 2.1.
This is still just speculation that has no weight without one or more supporting data sets, such as the reason for the breakup and whether one or both parties wished the relationship had been longer or shorter than it was.
Such manipulation seems deliberately deceptive and would hurt her credibility.
Is there a standard in industry to show sales data over an interval of at least one or two years to avoid this kind of manipulation?
What factors would contribute to its acceptability?
Showing both graphs, then arguing (if the writer wants to) that, when zoomed in, the 2-miles-per-hour decrease from 2012 to 2013 is more significant to the game result than it looks from afar. That argument would probably need to be supported by more data, such as comparing average knuckleball velocities across different players and different (especially opposing) teams.
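A minimal side-by-side sketch of the zoomed vs. full-scale view; the velocities below are hypothetical, keeping only the roughly 2 mph drop from 2012 to 2013 described in the example.

```python
import matplotlib.pyplot as plt

# Hypothetical average knuckleball velocities (mph); only the ~2 mph
# drop from 2012 to 2013 is taken from the chapter's example.
years = [2010, 2011, 2012, 2013]
velocity = [76.5, 76.8, 77.0, 75.0]

fig, (zoomed, full) = plt.subplots(1, 2, figsize=(8, 3))

zoomed.plot(years, velocity, marker="o")
zoomed.set_ylim(74, 78)          # tight scale: the drop looks dramatic
zoomed.set_title("Zoomed y-axis")

full.plot(years, velocity, marker="o")
full.set_ylim(0, 100)            # full scale: the same drop looks minor
full.set_title("Full-scale y-axis")

for ax in (zoomed, full):
    ax.set_xticks(years)
    ax.set_xlabel("Season")
    ax.set_ylabel("Avg. knuckleball velocity (mph)")

plt.tight_layout()
plt.show()
```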
What determines whether data presentation is ethical or not largely depends on whether a writer is attempting to deceive a reader into seeing a story that the data do not support.
Perhaps with less openness to interpretation in academic and/or specialized circles (e.g., a specific sport) vs. a general public audience?
Here are four common ways writers lose credibility by telling inaccurate or unethical stories about their data:
There could be other motivations or causes of misrepresenting data, such as not understanding the audience or some other incompetence, in which case the loss of credibility is still earned. Nonetheless, sometimes readers are at fault for misinterpreting the data. What metric(s) (e.g., an average reader’s reaction / understanding) could be used to determine whether the fault for misinterpreted data lies with the writer and/or the reader? In what situations does assessing fault have utility?
D. Average Hours Studied per Week, GPA, and Degree Completion of Students in Three Different Majors.
On average, humanities students study the second-fewest hours but have the highest average GPA and the highest 5-year degree completion rate; business students study the fewest hours with the second-highest average GPA and second-highest 5-year degree completion rate; and science students study the most hours per week with the lowest average GPA and the lowest 5-year degree completion rate.
C. Times to Complete Three Different Virtual Store Functions by Adults over 60 Years and under 30 Years.
Adults over 60 years took longer to log in and complete the order form compared to users under 30 years, while both demographics spent roughly the same amount of time searching the virtual store.
B. Recurrence of Gastric Ulcers for the Year after Successful Healing with Ranitidine Alone or Triple Therapy Plus Ranitidine.
Triple Therapy Plus Ranitidine significantly decreases the recurrence of gastric ulcers compared to healing with Ranitidine alone.
Readers want to know how Twitter use seems to affect relationships
Wouldn’t this require a different set of data points?
They want to know which country dominated the Olympics.
Similar to the Twitter example, they could also ask the how / why question and/or explicitly define “dominate” (depending on the writers’ purpose), but that would also need another data set to explain with credibility beyond mere speculation.
Can you guess which version was favored by U.S. media outlets?
The first table (2.1a), which ranks the countries from highest to lowest total number of medals won (in which case the US is first), rather than the second table (2.1b), which ranks countries from highest to lowest number of gold medals won (in which case the US comes in second behind China).
The first — and more interesting — story is that daily Twitter users average shorter relationships than others.
Is the split significant? And what if it’s a bad / toxic relationship — wouldn’t leaving sooner then be considered “good”? So shouldn’t a metric of reasons for splitting be added (which would be a mix of qualitative and quantitative data)?
Are any of your revisions more or less trustworthy than others? Do any cast doubt on your credibility?
I added “reported” to my revisions, which casts doubt on the statistics themselves. I don’t think this hinders my credibility, although that also depends on my purpose and audience, which are unclear in #3.
How might you rewrite it to minimize the importance of depression?
78.7% of women and 87.3% of men (hence the majority) have not experienced depression in their lifetime.
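(These figures are simply the complements of the reported rates cited below: 100% - 21.3% = 78.7% for women, and 100% - 12.7% = 87.3% for men.)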
What if you wanted to encourage more research into men’s experiences of depression?
Only 12.7% of men report experiencing depression in their lifetime, compared to 21.3% of women reporting depressive symptoms.
How might you rewrite this statistic to encourage a woman’s organization to support depression research?
Of the many women who experience depression in their lifetime, only 21.3% report it.
there is an emotional component to how we present data.
There's also an emotional component to what data you collect.
See if you can figure out why this memo did not sufficiently alarm those in charge of the launch.
A stronger and clearer conclusion should've been made for #1.
Point #2 should've had quantitative and qualitative / relative probabilities to better express the risk. There was also too much reliance on the primary seal not failing.
MTI engineers had warned their superiors about the seals and even estimated a 1 in 100 probability of flight failure and loss of life on the Challenger.
Wow... Is this a common cause of catastrophic failure: superiors ignoring warnings from their subordinates?