Specifically, they underscore the need for co-adaptive systems that can evolve along with users' mental models and definitions of labels.
any sentence that describes explicit design implications
This requires the development of interfaces and visualizations that demystify the generated data, allowing systematic variation and coverage across the concept space.
any sentence that describes explicit design implications
By visualizing these consistent pattern rules, users may better understand the behavior of the model through inference projection [26].
any sentence that describes explicit design implications
By incorporating theories such as Structural Alignment Theory and Variation Theory, it aims to support the learning of both the human and the model.
any sentence that describes explicit design implications
Such studies could determine whether these non-alignable comparisons enhance user performance and elicit deeper insights in human-AI collaborative systems.
any sentence that describes explicit design implications
Most previous research in counterfactual generation has focused on the model side by either generating counterfactuals to improve the model's performance or explaining its behaviors post hoc.
any single sentence that compares and contrasts this work with prior work.
Variation Theory provides the conceptual basis for generating structurally consistent differences, while Structural Alignment Theory (SAT) enhances the user's ability to recognize and process these differences.
return any single sentence that describes an explicit or implicit connection to theory
While SAT-based rendering supported human sensemaking in both Gero et al. [29] and Mocha, we also show that the combination of VT and SAT supports the model's learning.
any single sentence that compares and contrasts this work with prior work.
This finding is consistent with previous work that supports users' sense-making of text, e.g., by modulating text saliency. Specifically, Gu et al. [32] and Gero et al. [29] both found improved reading efficiency and comprehension with saliency-modulating text renderings.
any single sentence that compares and contrasts this work with prior work.
In decision making, SAT argues that people tend to focus on alignable differences—features that can be directly compared—rather than on differences that cannot be easily aligned.
return any single sentence that describes an explicit or implicit connection to theory
Structural Alignment Theory (SAT) [27] is a cognitive theory that explains how people make sense of concepts by comparing relational structures between two items.
return any single sentence that describes an explicit or implicit connection to theory
Specifically, we use Variation Theory of learning [44] which states that for learning to occur, some aspects that define the concept being learned must vary while others are held constant.
return any single sentence that describes an explicit or implicit connection to theory
According to SAT, humans compare two similar entities by trying to find structural alignments between them, and then comparing corresponding elements, with a special focus on differing aligned elements.
return any single sentence that describes an explicit or implicit connection to theory
VT posits that human learning occurs when learners experience variation across critical and superficial aspects of a concept—through exposure to contrasting examples that systematically vary along different critical and superficial feature dimensions.
return any single sentence that describes an explicit or implicit connection to theory
To analyze the annotation efficiency, we first conducted a Kruskal-Wallis rank sum test [39] to determine if there were statistically significant differences in annotation time across the three conditions, because our data violated the homogeneity of variances assumption, making non-parametric methods more appropriate.
return any single sentence that describes data analysis done on data collected by the authors when running human subjects experiments.
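As a rough illustration of the test named in the highlight above, the sketch below computes a tie-corrected Kruskal-Wallis H statistic for exactly three groups using only the standard library. The function names and the sample timings are hypothetical, not taken from the paper; the p-value uses the chi-square approximation, which for two degrees of freedom reduces to exp(-H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Tie-corrected H and approximate p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Standard tie correction: 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # Chi-square survival function with df = 2 is exp(-x / 2).
    return h, math.exp(-h / 2)


# Hypothetical annotation times (seconds) for three conditions.
h_stat, p_value = kruskal_wallis_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With groups this small the chi-square approximation is coarse; an exact or permutation-based p-value would be the more careful choice.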
benchmarking AI against programming eval: discoverability, interpretation, predictability
Correspondence concerning this article should be addressed to Iza Ray Korsmit (iza.korsmit@mail.mcgill.ca).
reference to Montreal the city or any institution or author based there
The experimental protocol was certified for ethical compliance by the McGill University Research Ethics Board II.
reference to Montreal the city or any institution or author based there
IRK was supported by funding from the Prins Bernhard Cultuurfonds (The Netherlands). This project was also funded by a Canadian Social Sciences and Humanities Research Council Insight Grant (435-2021-0224), a Social Sciences and Humanities Research Council Partnership Grant (895-2018-1023), and a Canada Research Chair (950-231872) to SMc.
reference to Montreal the city or any institution or author based there
Part of this research was presented at the Society for Music Perception and Cognition Conference, Portland, Oregon (2022). The authors would like to thank Bennett K. Smith for programming the experimental interface and assisting with the experiment execution on Prolific, and Philippe Macnab-Seguin for creating the chromatic scales for the second experiment.
reference to Montreal the city or any institution or author based there
Iza Ray Korsmit, Marcel Montrey, Alix Yok Tin Wong-Min, & Stephen McAdams McGill University, Montreal, Canada
reference to Montreal the city or any institution or author based there
Grimaud and Eerola (2022) compared instrument ensembles of strings, woodwinds, and brass in a study where participants either rated the emotions they perceived or manipulated musical parameters to produce a certain emotion. They found that strings were associated with increased anger and fear, woodwinds with decreased anger and fear, and brass with decreased fear, in the cases of both emotion perception and production. For the other emotions (joy, sadness, calmness, power, surprise), however, results were less consistent between perception and production, indicating that the emotion-instrument association may also depend on context of the task.
makes an explicit connection between a music theory concept and cognition
following the constructionist idea that an individual's personality and background will influence the affect they perceive or feel, we considered several sources of individual differences as moderating factors in the effects of instrument family, pitch height, and affect locus.
makes an explicit connection between a music theory concept and cognition
This research follows a constructionist approach to musical affect (Cespedes-Guevara & Eerola, 2018). That is, although we are interested in the "bottom-up" influence of certain musical features on musical affect, we believe these cannot be adequately evaluated without considering the "top-down" effects of context and individual differences that are present when affects are constructed. The perception or induction of affect does not merely arise in response to a stimulus but is also formed in relation to the individual and the context.
makes an explicit connection between a music theory concept and cognition
the context of a task (like perception, production, or induction) may change the effect of musical features.
makes an explicit connection between a music theory concept and cognition
Furthermore, as the method of reporting on perceived and induced affect may influence the construction of an affect (e.g., facilitating categorical perception), we also compare two different affect representations.
makes an explicit connection between a music theory concept and cognition
Cognitive surrender
A paper that came out this year asked: if you’re working with AI a lot, and you’re using it as a machine to answer all of your questions, what happens with System 1 and System 2?
Cognitive surrender: what happens to System 1 and System 2 if you offload to AI to get any answers? (Is this diff from other cognitive tools, like writing and Plato's rejection of it?)
The paper is https://doi.org/10.31234/osf.io/yk25n_v1 and it posits AI offloading as System 3. That is an interesting perspective. Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender, by Shaw and Nave, 2026.
2. Validator Another basic role for AI is validating your understanding. To do this, you ask it to review your notes for errors or gaps, do basic fact checking, or critique your reasoning. Again, you can do this via the chat interface, but I also experimented with passing my notes in Obsidian using the Copilot plugin and in Emacs using gptel. Example: After reading The Epic of Gilgamesh, I wrote a note in Obsidian summarizing its plot. When I asked ChatGPT to critique my summary, it pointed out that I’d given the central character a redemption arc that isn’t present in the text. I’m so accustomed to the standard hero’s journey, that I projected it onto the book — and an LLM helped me correct this ‘hallucination.’ Suggested prompt: Here are my notes on [WORK]. What important ideas did I miss or underemphasize? Don’t rewrite my notes — just flag the gaps.
Role 2 validator of one's understanding, also seen as basic. Might be a good complement to e.g. turning some of my notes into [[Anki]] card decks or combine in another way w spaced repetition. [[Spaced repetition 20201012201559]] [[Connecting my PKM to Anki]]
[[Jorge Arango p]] talk at PKM Summit 2026 robots in the garden, a perspective on PKM and the use of gen AI in it
list of scenarios in which AI agents will a) work against you b) be used against you at scale.
For the record, my posts aren’t written or conceived with an LLM, although I know an increasing number of people who use one to write a first draft and then edit. I’m not a fan. The whole point of the web — its beauty — is that it’s unrelentingly human and diverse.
A good case for disfavoring the use of AI/LLMs to write first drafts of blog posts. Implicit I believe is a distinction between using external tools to edit/proofread a human-written draft vs editing/proofreading a machine draft (granting I do not use these tools for either). Related to points I raised in Re; On AI in response to: A Positive Technologist Identity (2/4).
Given the high prevalence of such sounds in everyday life, having misophonia can have large negative effects on one's functioning in personal, academic, and work environments.
any sentences referring to misophonia verbatim
Although there are many idiosyncrasies in what may trigger a person with misophonia, the most common triggers are created by other humans, such as the sound of someone chewing, clearing their throat, tapping their foot, or typing on a keyboard.
any sentences referring to misophonia verbatim
Misophonia is a psychological disorder that is characterized by severe aversive responses to specific environmental sounds (i.e., triggers).
any sentences referring to misophonia verbatim
This indicates that misophonia is not a purely auditory processing disorder but is also influenced by a top-down process of source identification.
any sentences referring to misophonia verbatim
an fMRI study found that people with misophonia show increased response in the anterior insular cortex (AIC) in response to misophonic sounds, compared to control participants and other unpleasant or neutral sounds (Kumar et al., 2017).
any sentences referring to misophonia verbatim
Both the subjective judgment of aversiveness and the physiological measure of skin conductance response (SCR) increase when people with misophonia are presented with triggers (Edelstein et al., 2013).
any sentences referring to misophonia verbatim
The disorder is not yet recognized by the Diagnostic and Statistical Manual − 5th version (DSM-5; American Psychiatric Association, 2013), but there has been an increasing amount of research on the characterization and treatment of misophonia (Vitoratou et al., 2021; see also Brout et al., 2018, for a review).
any sentences referring to misophonia verbatim
Composers and music researchers had previously analyzed and annotated 65 movements from the Classical, Romantic, and early Modern repertoire in terms of the Taxonomy of Orchestral Grouping Effects (McAdams et al., 2022).
please find any claims that depend on citations referring to works by any of the present authors
In a study by McAdams and Goodchild (2018), orchestral simulations created with OrchSim were compared perceptually to commercial recordings and were shown to be of high quality.
please find any claims that depend on citations referring to works by any of the present authors
These results confirm with orchestral excerpts the findings of studies on isolated tones with dyads or triads of instruments in which the presence of impulsive instruments reduces the perception of blend (Lembke et al., 2019; Reuter, 1996; Tardieu & McAdams, 2012).
please find any claims that depend on citations referring to works by any of the present authors
This provides additional support for its significant contribution to blend in Fischer et al. (2021).
please find any claims that depend on citations referring to works by any of the present authors
Lembke et al. (2017) demonstrated that combinations of sustained and impulsive instruments blend less well.
please find any claims that depend on citations referring to works by any of the present authors
structuring by affecting sequential grouping through the segregation of auditory streams played by different instruments and segmental grouping through timbral contrasts (McAdams et al., 2022).
please find any claims that depend on citations referring to works by any of the present authors
Several other spectral and spectrotemporal descriptors were found to play a role in blend perception in orchestral works by Fischer et al. (2021). These include spectral flatness and spectral crest (different measures of the degree to which the spectrum is denser or has more emergence of spectral components), and spectral variation (the degree of variation of the spectral shape over time).
please find any claims that depend on citations referring to works by any of the present authors
Fischer et al. (2021) studied the blends of multi-instrument streams in the context of orchestral stream segregation in predominantly Romantic orchestral excerpts. They found that within-family instrument combinations blended better than between-family combinations. They demonstrated the role played by overlap in timbre correlates of spectral flatness (a measure of the tonalness/noisiness or density of the spectrum), spectral skewness (related to the shape of the spectral envelope), and spectral variation (evolution of the spectral envelope over time), as well as cues derived from the scores such as onset synchrony and the consonance of concurrent pitch relations.
please find any claims that depend on citations referring to works by any of the present authors
McAdams et al. (2022) distinguish two subcategories of these two types of blend: stable and transforming.
please find any claims that depend on citations referring to works by any of the present authors
Lembke, S.-A., Parker, K., Narmour, E., & McAdams, S. (2019). Acoustical correlates of perceptual blend in timbre dyads and triads. Musicae Scientiae, 23(2), 250–274.
please find any claims that depend on citations referring to works by any of the present authors
Several other spectral and spectrotemporal descriptors were found to play a role in blend perception in orchestral works by Fischer et al. (2021).
please find any claims that depend on citations referring to works by any of the present authors
Lembke and McAdams (2015) found that the degree of spectral overlap between constituent sounds enhanced blend perception.
please find any claims that depend on citations referring to works by any of the present authors
Tardieu and McAdams (2012) extended this work with combinations of unison sustained and impulsive instruments (including pitched percussion and string pizzicati).
please find any claims that depend on citations referring to works by any of the present authors
McAdams et al. (2022) introduce other notions related to blend as well.
please find any claims that depend on citations referring to works by any of the present authors
This in turn creates an effect of perceptual unity (McAdams, 1989).
please find any claims that depend on citations referring to works by any of the present authors
Four important concurrent grouping cues predict the perceptual fusion of sound events (McAdams, 1984): onset synchrony, harmonicity, coherent frequency behavior, and common spatial origin.
please find any claims that depend on citations referring to works by any of the present authors
To this, McAdams et al. (2022) have added segmental grouping (chunking into perceptual units).
please find any claims that depend on citations referring to works by any of the present authors
When the sudden drop to a pianissimo occurred towards the ending of the piece, the perceived arousal responses of CHM and WM dropped slightly but rose again immediately to end on a high arousal. These two groups of listeners appear to have anticipated a return to a loud and majestic close and therefore kept their arousal responses higher than those of the NM.
please highlight anything related to music performance practice
CHM, who are more experienced with the instruments and compositional techniques used in Chinese orchestral music, might have had an idea of which features figure more prominently in the communication of particular intentions, and therefore would have more information available for their judgments.
please highlight anything related to music performance practice
The perception of affective intentions in music is influenced by the degree of familiarity listeners have with a musical tradition, the content implicated in the music, and the complex sonic environment created by the composer's creation and the musicians' interpretation.
please highlight anything related to music performance practice
The version that participants heard was a premiere of the work by the Taipei Chinese Orchestra.
please highlight anything related to music performance practice
The communication of emotions or affect takes place when listeners perceive emotional meaning that is expressed by performers in music (Juslin, 2013a, 2013b).
please highlight anything related to music performance practice
An understanding of a tonal schema with its associations to happiness and sadness has been consistently found to influence listeners who have grown up in a culture of Western music.
please highlight anything related to music theory
Its mirmode function estimates the modal strength of the music in terms of "majorness" and "minorness."
please highlight anything related to music theory
The perception of consonance also plays an important role in the music listening process—combinations of tones that are consonant are perceived as more positively valenced than dissonant ones (Harrison & Pearce, 2020).
please highlight anything related to music theory
Musical structures such as pitch relations are perceptually salient and provide important information for listeners (e.g., Gabrielsson & Lindstrom, 2010; Krumhansl, 1998; Krumhansl & Kessler, 1982).
please highlight anything related to music theory
Iqa' (plural iqa'at) is used to describe a rhythmic cycle. Iqa'at are made up of two different basic building blocks, the dum and tak, onomatopoeias derived from the sound produced on membranophones such as the darabuka.
please highlight anything related to music theory
H5. Being more culturally bound, musical cues that are learned, such as modal structures, metrical relations, and so on, will exert a greater influence on listeners' perceived valence ratings than on their arousal ratings.
please highlight anything related to music theory
using SemanticCommit, we recorded all instances of edits, check for conflicts, make change, local, and global resolution actions using telemetry.
sentences describing methods the authors used; one sentence at a time
A sub-task was considered a failure if the participant was unable to complete it within the time limit.
sentences describing methods the authors used; one sentence at a time
Finally, we conducted an informal interview about their experience.
sentences describing methods the authors used; one sentence at a time
After both tasks were completed, participants filled out a final survey to compare the two conditions.
sentences describing methods the authors used; one sentence at a time
We chose GPT-4o for performance and latency reasons, as it performed optimally against our evals.
sentences describing methods the authors used; one sentence at a time
We also ran evaluations of model latency and classification performance under varying false positive rates for the following LLMs by OpenAI: GPT-4o, GPT-4o-mini, and o3-mini.
sentences describing methods the authors used; one sentence at a time
For each task, participants were tasked with integrating three new pieces of information into the memory, one at a time ("sub-tasks").
sentences describing methods the authors used; one sentence at a time
We ensured each list was 30 items long as our pilot studies suggested this was long enough that manual detection starts to become unwieldy (users need to scroll up and down the document), but short enough that participants could become familiar in a short period.
sentences describing methods the authors used; one sentence at a time
We adapted two intent specifications from our evals: Mars Game Design Document and Financial Advice AI Agent Memory, as these tasks mapped to the two paradigmatic types covered in Sections 2 and 2.1 (design documents, and AI memory of the user).
sentences describing methods the authors used; one sentence at a time
We recruited 12 participants (7 female, 5 male) through the mailing lists of two research universities and one multinational technology company.
sentences describing methods the authors used; one sentence at a time
We chose OpenAI's ChatGPT Canvas as a baseline for five reasons: (i) it is a popular, commercially available tool, hence it is likely familiar to users; (ii) it provides a document editing view, where users can select text and ask GPT to rewrite it, or chat with an AI to make global edits; (iii) it employs a similar class of model (GPT-4o); (iv) it supports similar editing features as SemanticCommit like inline text selection, conflict highlighting, and a diff view, while adding free-form editing; and (v) similar interfaces like Anthropic Artifacts tended to rewrite the specification entirely, and did not offer Canvas's "diff" view to allow for a fair comparison.
sentences describing methods the authors used; one sentence at a time
With participant consent, we recorded audio and screen-casts, and participants were encouraged to think aloud.
sentences describing methods the authors used; one sentence at a time
Four coauthors created the evals, and two coauthors manually double-checked all conflicts, a process that took several days.
sentences describing methods the authors used; one sentence at a time
We ran one pilot study with five users of our card-based interface, and a second with four users of a revised interface.
sentences describing methods the authors used; one sentence at a time
Our explorations went through substantial iterations and prompt prototyping over a period of eight months, evolving in response to two pilot studies and progressing from a card-based interface to a list of texts.
sentences describing methods the authors used; one sentence at a time
We iterated on prompts using ChainForge [5] by setting up an evaluation pipeline against our datasets, which allowed us to observe the effects of prompt changes and model choices.
sentences describing methods the authors used; one sentence at a time
To measure statistical significance, we used Mann–Whitney–Wilcoxon tests and report the p-values.
sentences describing methods the authors used; one sentence at a time
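For readers unfamiliar with the test named above, here is a minimal sketch of a two-sided Mann–Whitney U test with a normal approximation. It is an assumption-laden illustration: the highlight does not say which variant, correction, or software the authors used, and this version applies neither a continuity correction nor a tie correction to the variance. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(xs, ys):
    """Return (U for xs, two-sided p-value via normal approximation)."""
    # U counts the pairs in which an x exceeds a y; ties count as half.
    u = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )
    n1, n2 = len(xs), len(ys)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical ratings from two conditions.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

For samples as small as a 12-participant study, an exact p-value is usually preferable to the normal approximation shown here.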
For qualitative analysis, the first author performed open coding on participant responses and audio transcripts to identify themes, which were used to interpret the qualitative results.
sentences describing methods the authors used; one sentence at a time
In the post-task surveys, we collected self-reported NASA Task Load Index (TLX) scores, Likert-scale ratings for ease of use, and responses on how well the AI helped participants identify, understand, and resolve semantic conflicts.
sentences describing methods the authors used; one sentence at a time
Each condition had a time limit of 15 minutes, after which the participant completed a post-task survey.
sentences describing methods the authors used; one sentence at a time
Before each task, participants received a tutorial on the assigned tool and were given five minutes to explore it using a test document.
sentences describing methods the authors used; one sentence at a time
Both the order of task assignment and tool assignment were counterbalanced and randomly assigned.
sentences describing methods the authors used; one sentence at a time
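One way such counterbalancing could be set up is sketched below, assuming two tools and two tasks so that there are four order combinations, each used equally often and then shuffled across participants. The highlight only says the orders were counterbalanced and randomly assigned; the function name, the seed, and the exact scheme are assumptions for illustration.

```python
import itertools
import random


def counterbalanced_assignments(participant_ids, tools, tasks, seed=0):
    """Map each participant to an ordered list of (tool, task) sessions."""
    # Every combination of tool order and task order is one "cell".
    cells = [
        (tool_order, task_order)
        for tool_order in itertools.permutations(tools)
        for task_order in itertools.permutations(tasks)
    ]
    # Repeat the cells evenly, then pad if the count does not divide evenly.
    reps, remainder = divmod(len(participant_ids), len(cells))
    schedule = cells * reps + cells[:remainder]

    # Randomize which participant receives which cell (seeded for repeatability).
    random.Random(seed).shuffle(schedule)
    return {
        pid: list(zip(tool_order, task_order))
        for pid, (tool_order, task_order) in zip(participant_ids, schedule)
    }


plan = counterbalanced_assignments(
    participant_ids=list(range(1, 13)),
    tools=("SemanticCommit", "ChatGPT Canvas"),
    tasks=("Mars Game Design Document", "Financial Advice AI Agent Memory"),
)
```

With 12 participants and four cells, each tool order and each task order appears exactly six times in first position.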
We conducted a controlled within-subjects study with mixed methods, comparing SemanticCommit with a baseline interface.
sentences describing methods the authors used; one sentence at a time
We run end-to-end on our four eval datasets using GPT-4o and GPT-4o-mini and report the mean ± stddev for accuracy, precision, recall, and F1 scores for the three approaches in Figure 5.
sentences describing methods the authors used; one sentence at a time
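The reported quantities can be sketched as follows, assuming binary conflict labels (1 = conflict) and several repeated runs per configuration. The function names and toy labels are hypothetical, and the use of the sample standard deviation is an assumption, since the highlight does not say which estimator was used.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def summarize(runs):
    """Return {metric: (mean, sample stddev)} over at least two runs."""
    return {
        name: (mean([r[name] for r in runs]), stdev([r[name] for r in runs]))
        for name in runs[0]
    }


# Two toy runs with made-up labels.
runs = [
    binary_metrics([1, 1, 0, 0], [1, 0, 1, 0]),
    binary_metrics([1, 0], [1, 0]),
]
summary = summarize(runs)
```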
We compare our end-to-end system against two simpler methods: (i) DropAllDocs, which adds all documents to the context for conflict classification; and (ii) InkSync [56] which generates a JSON list of string-replace operations.
sentences describing methods the authors used; one sentence at a time
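To make the idea of "a JSON list of string-replace operations" concrete, here is a minimal sketch of applying such a list to a document. The key names `original` and `replacement` are hypothetical and are not claimed to match InkSync's actual schema; the rule of skipping operations whose target is missing or ambiguous is likewise an illustrative design choice.

```python
import json


def apply_replacements(document, ops_json):
    """Apply string-replace operations; return (new_document, skipped_ops)."""
    skipped = []
    for op in json.loads(ops_json):
        # Only apply an operation when its target occurs exactly once,
        # so an edit never lands on an unintended passage.
        if document.count(op["original"]) == 1:
            document = document.replace(op["original"], op["replacement"])
        else:
            skipped.append(op)
    return document, skipped


ops = json.dumps([
    {"original": "four wheels", "replacement": "six wheels"},
    {"original": "rover", "replacement": "lander"},  # ambiguous: two matches
])
text, skipped = apply_replacements(
    "The rover has four wheels. The rover is red.", ops
)
```

Returning the skipped operations lets a caller surface them to the user rather than silently dropping or misapplying them.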
In order to minimize relevance assessment issues, we apply a PageRank-based relevance ranking over the KG, akin to HippoRAG [36].
sentences describing methods the authors used; one sentence at a time
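A PageRank-based relevance ranking over a graph can be sketched with a small personalized PageRank, where the random walk restarts at "seed" nodes tied to the incoming information. This is a generic textbook version under stated assumptions: the graph, node names, damping factor, and iteration count are all hypothetical, and the authors' actual variant is not specified in the highlight.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Score nodes of an undirected graph by proximity to the seed nodes."""
    seeds = set(seeds)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Restart distribution: uniform over seeds, zero elsewhere.
    restart = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if neighbors[n]:
                share = damping * rank[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
            else:
                # Isolated node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical chain of entities; the new information mentions "a".
scores = personalized_pagerank(
    [("a", "b"), ("b", "c"), ("c", "d")], seeds=["a"]
)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed receive more of the restart mass, so ranking by score gives a graded notion of relevance instead of a hard cutoff.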
We implement the back-end using a knowledge-graph (KG) RAG architecture [36] consisting of two phases: pre-processing and inference.
sentences describing methods the authors used; one sentence at a time
Through a within-subjects study with 12 participants comparing SemanticCommit to a chat-with-document baseline (OpenAI Canvas), we find differences in workflow: half of our participants adopted a workflow of impact analysis when using SemanticCommit, where they would first flag conflicts without AI revisions then resolve conflicts locally, despite having access to a global revision feature.
sentences describing methods the authors used; one sentence at a time
These semantic conflicts require dedicated support to detect, visualize, and resolve. Semantic conflict resolution interfaces must go beyond visualizing what changes were made, to what changes could be made, where they should be made, and what the effects might be. This resembles feedforward: affordances that help the user foresee the impact of an action [67, 93].
sentences describing connections to theory; one sentence at a time
Today with LLMs, we are less limited by this constraint, and solutions to the problem of human-machine communication might be better found in cybernetics theory [9] than static formalism.
sentences describing connections to theory; one sentence at a time
This reflects the principle of feedforward [67, 93] in communication theory—"a needed prescription or plan for a feedback, to which the actual feedback may or may not confirm" [79]—where a communicator provides "the context of what one was planning to talk about" [64, p. 179-80] in order to "pre-test the impact of [its output]" on the listener [34, p. 65].
sentences describing connections to theory; one sentence at a time
We consider common sequences of chunk roles to be alignable structures that could be used to support users in identifying structural similarities and differences across sentences in different abstracts, in line with Structure-Mapping Theory [17].
sentences that mention theory, explicitly or implicitly; one sentence at a time
Like prior Structural Mapping Theory (SMT)-informed work in text corpora representation, AbstractExplorer's features have enabled some users to see more of both the overview and the details at the same time, facilitating abstraction without losing context.
sentences that mention theory, explicitly or implicitly; one sentence at a time
Our work demonstrates that designs informed by Structure-Mapping Theory can support users in navigating, making use of, and engaging with variation present in information.
sentences that mention theory, explicitly or implicitly; one sentence at a time
According to SMT, this generalization depends on most documents having some shared implicit structure.
sentences that mention theory, explicitly or implicitly; one sentence at a time
This ordering prioritizes dominant structural patterns (largest groups first) while exposing fine-grained variations (via length-sorted triplets), mirroring how humans compare sentences, if SMT is an accurate description in this domain of comparative close reading.
sentences that mention theory, explicitly or implicitly; one sentence at a time
Structural mappings between objects are part of the cognitive process of comparison according to Structure-Mapping Theory [17], and juxtaposition can help humans recognize particular possible structural mappings between objects [75].
sentences that mention theory, explicitly or implicitly; one sentence at a time
In SMT terminology, rendering and arranging according to corresponding chunks reify "commonalities in structure," while variations within corresponding chunks are "alignable differences" that users are predicted to notice.
sentences that mention theory, explicitly or implicitly; one sentence at a time
The prior SMT-informed tools in Section 2.3 for both code and natural language corpora suggest that the cognitive process of comparing texts may be no exception to the cognitive processes SMT predicts.
sentences that mention theory, explicitly or implicitly; one sentence at a time
SMT posits that visual alignment helps people perceive relational similarities and differences more clearly, thereby improving their ability to make meaningful comparisons and understand underlying patterns [28, 38, 47].
sentences that mention theory, explicitly or implicitly; one sentence at a time
SMT provides a framework for understanding how humans compare two or more objects by finding common structural alignments between objects.
sentences that mention theory, explicitly or implicitly; one sentence at a time
Structural Mapping Theory (SMT) is a long-standing, well-vetted theory from Cognitive Science that describes how humans attend to and try to compare objects by finding mental representations of them that can be structurally mapped to each other (analogies).
sentences that mention theory, explicitly or implicitly; one sentence at a time
This SMT-informed approach, which AbstractExplorer shares, tries to give this mental machinery "a leg up," letting users perhaps skip some steps by accepting reified cross-document relationships identified by the computer.
sentences that mention theory, explicitly or implicitly; one sentence at a time
The human perceptual, comparative mental machinery that SMT describes is part of what enables humans to form more abstract structured mental models from concrete examples, among other critical knowledge tasks.
sentences that mention theory, explicitly or implicitly; one sentence at a time
These examples of text-centric lossless techniques do not abstract away or summarize; they strategically re-organize and re-render the existing text to help enhance readers' own perceptual cognition, informed by Structural Mapping Theory (SMT) [17].
sentences that mention theory, explicitly or implicitly; one sentence at a time
Theory (SMT) to facilitate seeing both the overview and the details at the same time, facilitating abstraction without losing context.
sentences that mention theory, explicitly or implicitly; one sentence at a time
We process this data in a three-stage pipeline (Figure 6). In the first stage, Sentence Segmentation and Categorization, abstracts are split into individual sentences using the NLTK package, and each sentence is classified into one of the five pre-defined aspects as listed in Section 4.1.1. Classification is performed by prompting an LLM (see prompt used in Appendix D.1) with the sentence and its full abstract.
sentence relating to methodology
Then, we segment sentences within each aspect into grammar-preserving chunks (see prompt used in Appendix D.2). This results in grammatically coherent chunks that are the basis of structure patterns. After identifying chunk boundaries, we again prompt an LLM to generate labels for chunks in a human-in-the-loop approach: starting from an initial set of labels for chunk roles, when a new label is generated, a researcher from the research team examines the new label and merges it with existing labels if appropriate, controlling for the total number of labels.
sentence relating to methodology
In this study, we allowed participants to experience views of same-aspect sentences (Section 4.1.1) with different combinations of highlighting, ordering, and alignment (as described in Section 4.1.2 and Section 4.1.4) enabled or disabled, in order to understand which features, and which combinations of them, most effectively supported users' ability to skim and read laterally across documents.
sentence relating to methodology
Inspired by GP-TSM [24], AbstractExplorer first segments sentences into grammar-preserving chunks—segments that respect grammatical boundaries, i.e., an LLM judges that the sentence can be truncated at that chunk boundary without breaking the grammatical integrity of the preceding text. Each chunk is then classified by an LLM as having one of nine pre-defined roles, each of which has its own assigned color.
sentence relating to methodology
AbstractExplorer classifies sentences into five pre-defined aspects common in CHI abstracts: Problem Domain, Gaps in Prior Work, Methodology/Contribution, Results/Findings, and Discussion/Conclusion.
sentence relating to methodology
We conducted a qualitative analysis of user study transcripts and survey responses using a Grounded Theory approach [8]. First, the lead researcher collected a list of participants' behaviors, approaches, reflections on their experience, and feedback about the interface. The researcher then systematically coded this data, revisiting the data multiple times and refining the codes to ensure consistency and coherence. Through this process, high-level themes were identified and organized using affinity diagramming. Once the thematic structure was finalized, the researcher gathered supporting evidence for each theme and synthesized the findings, which were reviewed by the research team to ensure agreement on the results.
sentence describing how analysis was performed on data collected by the authors of this paper
Activity log data, which revealed how participants actually used the interface, echoed the above findings. According to the log data, participants spent most of their reading time (66.31%) with vertical alignment on the second element in structure pairs, followed by alignment on the first element (29.19%), and left-justified alignment (5.13%). Highlighting usage showed a similar preference: 91.13% of time with all chunks highlighted, 8.25% with partial highlighting, and minimal time (0.63%) without highlights.
sentence describing how analysis was performed on data collected by the authors of this paper
In this section, we present findings on how AbstractExplorer supports comparative close reading at scale by integrating quantitative survey responses and log data with qualitative analysis of transcripts and open-ended responses. The qualitative analysis process is described in detail in Appendix H.
sentence describing how analysis was performed on data collected by the authors of this paper
Throughout the two tasks, we also collected detailed interaction logs including counts of user-defined aspects created, duration of highlighting usage, and time allocation across the three possible alignment options.
sentence describing how analysis was performed on data collected by the authors of this paper
Both gaze data and the semi-structured interviews revealed that lower NFC participants were more willing to be guided by the three features and took advantage of them consciously.
sentence describing how analysis was performed on data collected by the authors of this paper
Using a two-tailed Mann-Whitney U Test, we found that participants who reported their lowest perceived cognitive load when all three features were enabled had significantly lower NFC than participants who reported their lowest cognitive load when skimming with no features enabled, i.e., in the baseline interface (p=0.03).
sentence describing how analysis was performed on data collected by the authors of this paper
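As an illustration of the two-tailed Mann-Whitney U test named in the excerpt above, the following is a minimal sketch and not the authors' analysis code. It uses the normal approximation with a tie correction and no continuity correction, which is only a rough guide for small samples; an exact test or an established statistics library (for example `scipy.stats.mannwhitneyu`) would normally be preferred. The function names and the sample values are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation is identical
        return u, 1.0

    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
    return u, p


# Hypothetical NFC scores for two groups of participants.
u_stat, p_value = mann_whitney_u([4.2, 4.8, 5.0, 5.2], [5.5, 5.8, 6.0, 6.3])
print(u_stat, round(p_value, 3))
```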
The raw NASA-TLX score is the sum of all 6 NASA-TLX questions after reversing the appropriate questions.
sentence describing how analysis was performed on data collected by the authors of this paper
To compute a participant's NFC score, we averaged their responses to the six questions, each ranging from 1 to 7, after reversing the appropriate questions.
sentence describing how analysis was performed on data collected by the authors of this paper
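The scoring rule in the excerpt above (reverse the appropriate items, then average six responses on a 1 to 7 scale) can be sketched as follows. This is an illustrative sketch only: the excerpt does not say which items are reverse-keyed, so the `reversed_items` indices below are an assumption, and the function names are hypothetical.

```python
def reverse_score(response, low=1, high=7):
    """Flip a response on a low..high scale (1 <-> 7, 2 <-> 6, ...)."""
    return low + high - response


def nfc_score(responses, reversed_items):
    """Average six 1-7 responses after flipping the reverse-keyed items.

    `reversed_items` holds the 0-based indices of reverse-keyed questions.
    Which questions are reverse-keyed is an assumption of this sketch.
    """
    if len(responses) != 6:
        raise ValueError("expected exactly six responses")
    if any(not 1 <= r <= 7 for r in responses):
        raise ValueError("responses must lie between 1 and 7")
    adjusted = [
        reverse_score(r) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical participant, with questions 2 and 4 assumed reverse-keyed.
print(nfc_score([6, 2, 5, 3, 7, 1], reversed_items={1, 3}))
```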
For simplicity of analysis, we denote participants with NFC scores above the overall median NFC of 5.42 (IQR = 0.583) as higher NFC, and all others as lower NFC.
sentence describing how analysis was performed on data collected by the authors of this paper
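The median split described in the excerpt above can be sketched in a few lines. This is an illustrative sketch: the participant IDs and scores are made up, and the function name is hypothetical. Scores strictly above the median are labeled higher, and everything else (including scores equal to the median) is labeled lower, matching the "lower NFC otherwise" wording.

```python
from statistics import median


def split_by_median(scores):
    """Label each participant 'higher' if strictly above the median, else 'lower'."""
    cutoff = median(scores.values())
    return {
        pid: ("higher" if score > cutoff else "lower")
        for pid, score in scores.items()
    }


# Hypothetical NFC scores keyed by participant ID.
groups = split_by_median({"P1": 4.0, "P2": 5.5, "P3": 6.0, "P4": 5.0})
print(groups)
```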
To contrast participants' gaze patterns in each condition, we used a Tobii Pro Spark eye-tracker placed below the desktop monitor used by all subjects; Tobii Pro Lab software recorded each participant's gaze over time in each condition.
sentence describing how analysis was performed on data collected by the authors of this paper
We collected 80 sentences from our abstracts dataset labeled by our system as "Methodology/Contribution." Participants viewed the same 80 sentences in each condition—often with a different subset of sentences initially visible due to ordering changes—but only had two minutes to look at them in each condition.
sentence describing how analysis was performed on data collected by the authors of this paper
After obtaining an expanded set of high-level chunk labels, we assign them to each of the sentence chunks by using LLMs in a multiclass classification few-shot learning task, with the initial labels and assignment as examples (see prompt used in Appendix D.3).
sentence describing how analysis was performed on data collected by the authors of this paper
Then, we segment sentences within each aspect into grammar-preserving chunks (see prompt used in Appendix D.2). This results in grammatically coherent chunks that are the basis of structure patterns. After identifying chunk boundaries, we again prompt an LLM to generate labels for chunks in a human-in-the-loop approach: starting from an initial set of labels for chunk roles, when a new label is generated, a researcher from the research team examines the new label and merges it with existing labels if appropriate, controlling for the total number of labels.
sentence describing how analysis was performed on data collected by the authors of this paper
We process this data in a three-stage pipeline (Figure 6). In the first stage, Sentence Segmentation and Categorization, abstracts are split into individual sentences using the NLTK package, and each sentence is classified into one of the five pre-defined aspects as listed in Section 4.1.1. Classification is performed by prompting an LLM (see prompt used in Appendix D.1) with the sentence and its full abstract.
sentence describing how analysis was performed on data collected by the authors of this paper
After the interviews, we analyzed the data using the process described in Appendix B.
sentence describing how analysis was performed on data collected by the authors of this paper
We conducted a study with 12 blind SR users.
sentence about testing
We conducted a collaborative, user-centered design study with a team of scientific researchers.
sentence about testing
We conducted quantitative (n=70) and qualitative (n=30) studies with healthcare experts.
sentence about testing
We conducted a qualitative study with 16 blind and visually impaired (BI) developers.
sentence about testing
We conducted role-playing exercises with 24 US journalists.
sentence about testing
We conducted a study with 32 blind SR users.
sentence about testing
We conducted a collaborative, user-centered design study with a team of scientific researchers.
sentence about testing
We conducted quantitative (N=79) and qualitative (N=93) studies with healthcare experts.
sentence about testing
We conducted a qualitative study with 35 blind and visually impaired (VI) developers.
sentence about testing
We conducted a study with 32 blind US users.
sentence about testing