47 Matching Annotations
  1. Apr 2023
    1. Identify the chief complaint and pertinent past medical history

      This is an interesting, but perhaps idealistic, view of what clinicians are supposed to do when presented with a patient. I can't imagine that all of these happen at a conscious level, let alone at a tactile level, for every single patient.

  2. Aug 2022
    1. It would be difficult to assign, or even expect, a single “correct” diagnosis at this stage, the emphasis being on considering the most appropriate set of diagnoses (“high frequency, highly plausible, high impact”).

      Because of the setting that Isabel is in, even clinicians don't know the "correct" diagnosis at the point of care.

    2. suggestions were not weighted based on how reasonable or appropriate they were

      But why - because Isabel's source for suggestions is automatically irrelevant? Interesting

    3. searches the underlying knowledge base in response to clinical features input in free text and displays diseases with matching textual patterns arranged by body system rather than in order of clinical probability. Thus, ISABEL functions primarily as a reminder tool to suggest 8 to 10 diagnostic hypotheses to clinicians, rather than acting as an “oracle.”

      Wow that's fascinating - they go for sheer quantity of suggestions, but filtered to the arbitrary (?) but probably cognitively reasonable 8 to 10 diagnoses

    4. the quality of a diagnostic hypotheses plan—irrespective of whose efforts it represented (system or user).

      This sounds great - what's the problem with that?

    5. In this setting, it was not essential that the system possessed a high degree of diagnostic accuracy, so long as its suggestions positively influenced users' diagnostic reasoning

      I believe this is where we fall. We rely on the practitioner to be a human filter. That automatically sounds like we are adding to, not subtracting from, their cognitive load.

    6. More sophisticated measures of system performance proposed by Berner et al.14,22 also studied the ranking of diagnostic hypotheses in a system's list and other discrete indicators of diagnostic quality, such as relevance and comprehensiveness, generated by comparing the DDSS diagnostic hypothesis set to a “gold standard” set generated by expert clinicians.

      Omg this is perfect 🤩 - this is exactly the study I wanted to run

    7. Early studies expected the DDSS to be able to predict the “correct” diagnosis in a diagnostic dilemma.

      I don't know that we suggest the "correct" diagnoses. We suggest different possible diagnoses. Are we still a CDSS?

    8. Few studies have been able to convincingly show changes in physician behavior or improved patient outcomes

      These are two separate claims. Can the systems change physician behavior? Do those changes positively impact patients?

    1. In an early project, we integrated, revised, and expanded the ten heuristics by Nielsen [54] and the eight golden rules by Shneiderman [56] to form 14 principles customized for the health domain [57].

      Where can I find other researchers using these heuristics? Was it wise or worth it to make the heuristics list longer and harder to compare against standards?

    1. Enter HPI (History of Present illness). 2. Enter PMI (Present Medical Illness). 3. Document Social History. 4. Document Family History. 5. Enter Vital Signs. 6. Enter Order Consult. 7. Document Coding of the Procedures. 8. Entering the Lab Order. 9. Document Instructions – Other Therapies. 10. Order Radiology Study. 11. Document Comments in A/P Diagnosis. 12. Review Coding of Medical Encounter. 13. Document Follow-up Plan. 14. Associate Orders/Medication/Labs.

      Do we have the same prototypical usage?

    2. In this study, we used the Keystroke Level Modeling (KLM) to estimate time on task, task steps, and mental effort for fourteen prototypical use cases for the AHLTA EHR.

      What is Keystroke Level Modeling and does it involve special technology?
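
      Partial answer to my own question: KLM is the Card, Moran, and Newell technique of estimating task time by summing standard operator times (keystroke, point, home, mental preparation) - no special technology needed, just a spreadsheet or a few lines of code. A minimal sketch, using the commonly cited operator averages; the task fragment is one I made up, not one of the fourteen AHLTA use cases.

      ```python
      # Minimal KLM sketch: estimate task time by summing standard operator times.
      # Operator durations are the commonly cited KLM averages (seconds); the
      # example sequence is hypothetical, not taken from the AHLTA study.

      KLM_OPERATORS = {
          "K": 0.28,  # keystroke or button press
          "P": 1.10,  # point with the mouse to a target
          "H": 0.40,  # home hands between keyboard and mouse
          "M": 1.35,  # mental preparation (recall, decide, verify)
      }

      def klm_estimate(sequence):
          """Return (total seconds, step count) for a string of operator codes."""
          total = sum(KLM_OPERATORS[op] for op in sequence)
          return total, len(sequence)

      # Hypothetical "enter a vital sign" fragment: think, point to the field,
      # home to the keyboard, type three digits, think, point to the save button.
      seconds, steps = klm_estimate("MPH" + "K" * 3 + "MP")
      print(f"{steps} steps, ~{seconds:.1f} s")
      ```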

    3. There are many other types of representation analysis, some of which are being developed and evaluated in our EHR Usability Lab at the National Center for Cognitive Informatics and Decision Making in Healthcare.

      What is this National Center for Cognitive Informatics and Decision Making in Healthcare?!

    4. This method starts with the identification of the system hierarchy of an EHR system. The system hierarchy was created by visually inspecting the user interface items from top to bottom and left to right.

      Hierarchy as the first step was unexpected

    5. The Designer Model has 60 functions and it was obtained through a complete system walkthrough. The User Model has 80 functions and it was developed by conducting interviews and surveys with end users. The Activity Model has 97 functions and it was developed by doing a field study that involved many sessions of shadowing and observation (for details, see [45]) of the end users in the clinics.

      A heuristic analysis or cognitive walkthrough could reveal lots from the Designer model
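
      If I'm reading the method right, the usefulness analysis mostly comes down to set comparisons between the three models (designed vs. wanted vs. actually performed). A quick sketch under that assumption; the function names are invented placeholders, not the paper's 60/80/97 lists.

      ```python
      # Comparing the three TURF function models as plain sets. The interesting
      # categories fall out of simple set operations; names are placeholders.

      designer = {"write_rx", "order_lab", "code_visit", "print_summary"}
      user     = {"write_rx", "order_lab", "document_hpi"}
      activity = {"write_rx", "order_lab", "document_hpi", "phone_refill"}

      overhead_candidates = designer - (user | activity)   # built but never wanted or used
      missing_functions   = (user | activity) - designer   # wanted or performed but not built
      core_functions      = designer & user & activity     # keep and polish

      print("overhead candidates:", overhead_candidates)
      print("missing functions:  ", missing_functions)
      print("core functions:     ", core_functions)
      ```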

    6. In this example, the Goal is “treating high glucose level in a pre-diabetic patient”; the operation is “writing a medication prescription”, the objects for this operation include patient name, doctor’s name, diagnosis, medication name, dosage, frequency, duration, route, etc.; the constraints include the dependency relations between operation and objects (e.g., operation “write a medication prescription” and the objects “Metformin” and “500 mg”), between objects (e.g., the object “glucose level” and the object “Metformin”), and between operations (e.g., the operation “write a prescription” and the operation “modify problem list”)

      Ok just kidding, this works a lot differently than expected
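
      Here's one way the example could be written down as data - the goal, operations, objects, and constraints straight from the quote. The dict-and-tuple shape is my own shorthand, not TURF notation.

      ```python
      # The prescription example from the text as a machine-readable structure.
      # Field names and the (kind, from, to) constraint encoding are my own.
      from collections import Counter

      ontology = {
          "goal": "treat high glucose level in a pre-diabetic patient",
          "operations": ["write a medication prescription", "modify problem list"],
          "objects": ["patient name", "doctor's name", "diagnosis", "medication name",
                      "dosage", "frequency", "duration", "route"],
          "constraints": [
              ("operation-object", "write a medication prescription", "Metformin"),
              ("operation-object", "write a medication prescription", "500 mg"),
              ("object-object", "glucose level", "Metformin"),
              ("operation-operation", "write a prescription", "modify problem list"),
          ],
      }

      # Operations act on objects under the constraints to achieve the goal;
      # counting constraints by kind shows the structure is easy to query.
      print(Counter(kind for kind, _, _ in ontology["constraints"]))
      ```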

    7. Operations are performed on the objects under the constraints to achieve the goals.

      Operations are series of tasks, objects are the icons and tools, constraints and goals are what they sound like

    8. it tells us the inherent complexity of the work, it separates work context (physical, social, organizational, etc.) from the inherent nature of the work; and it supports identification of overhead activities that are non-essential for the work but introduced solely due to the way the system is implemented.

      Separates the work that exists in reality from the work that the system introduces. Wonderful - great way to look at it especially in an EHR

    9. The ontology of work domain is the basic structure of the work that the system together with its human users will perform.

      This is not the definition of ontology that I'm used to seeing

    10. User analysis is the process of identifying the types of users and the characteristics of each type of users. For the EHR domain, types of users include physicians at various levels (e.g., attending, fellow, resident, medical student, etc.) and in various specialty areas (family practice, intensive care, dermatology, surgery, etc.), nurses at various specializations, medical technicians, medical staff, patients and family members, and so on.

      Ayyyy there's already a standard user classification. Wonder if this is thorough enough for us

    11. The essence of usability is the representation effect. Representation effect is the phenomenon that different representations of a common abstract structure (e.g., a work domain ontology, see Section 3.2.1 for details) can generate dramatically different representational efficiencies, task difficulties, and behavioral outcomes

      The same real thing represented in different ways can yield very different results
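
      A toy illustration (mine, not the paper's): the same abstract structure - a number - in Arabic versus Roman representation turns "which is larger?" into either a one-step task or a translate-then-compare task.

      ```python
      # Same abstract structure (a number), two external representations,
      # very different task difficulty for "which is larger?". Toy example.

      ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

      def roman_to_int(s):
          """Translate a Roman numeral back to the shared abstract structure."""
          total = 0
          for ch, nxt in zip(s, list(s[1:]) + [None]):
              value = ROMAN[ch]
              total += -value if nxt and ROMAN[nxt] > value else value
          return total

      # Arabic representation: the comparison is a single step.
      print(1987 > 1949)

      # Roman representation: the comparison only gets easy after translating
      # each numeral back into the abstract structure it stands for.
      print(roman_to_int("MCMLXXXVII") > roman_to_int("MCMXLIX"))
      ```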

    12. Successful applications should be both useful and usable, and they should be considered together because they are not independent, as demonstrated by Butler et al. [35]

      How on earth do you measure how USEFUL something is? Interesting

    13. Landauer argued that successful applications should be not only usable, their functionality should also be useful

      TikTok is highly usable, highly useless

    14. Satisfaction under TURF is similar to satisfaction under ISO definition of usability. Under TURF, satisfaction refers to the subjective impression of how useful, usable, and likable the system is to a user.

      This would be very important to the marketing team and (maybe) leadership.

    15. Task steps refer to the number of steps (both mental steps such as recalling a drug name from memory and physical steps such as clicking a button on the screen) needed to complete a task

      I am fairly certain that someone has listed these out already for a study on common EHR tasks such as suggested diagnosis acceptance.

    16. Under TURF, a system is usable if it is easy to learn, efficient to use, and error-tolerant. How usable a system is can be measured by learnability, efficiency, and error tolerance.

      This seems inherently different from the definition of TURF above

    17. The evaluation was performed by three independent evaluators whose results were integrated into a master list of all violations. Then each evaluator independently rated each violation for its severity on a scale of 1 to 4 (1 = cosmetic; 2 = minor; 3 = major; 4 = catastrophic) and their ratings were then averaged, as shown in Fig. 9.

      Seems like a wonderfully simple study that we could do
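
      The aggregation step is simple enough to sketch: pool every violation into a master list, have each evaluator rate it 1 to 4, then average. The violation names and ratings below are invented.

      ```python
      # Averaging independent severity ratings (1 = cosmetic ... 4 = catastrophic).
      # Violations and scores are invented examples.
      from statistics import mean

      ratings = {
          "medication list hides active orders":    [4, 3, 4],
          "inconsistent date formats across tabs":  [2, 2, 1],
          "no confirmation on order discontinue":   [3, 4, 3],
      }

      for violation, scores in sorted(ratings.items(),
                                      key=lambda kv: mean(kv[1]), reverse=True):
          print(f"{mean(scores):.1f}  {violation}")
      ```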

    18. In an early project, we integrated, revised, and expanded the ten heuristics by Nielsen [54] and the eight golden rules by Shneiderman [56] to form fourteen principles customized for the health domain [57]. We have since applied the fourteen principles to a variety of healthcare domains [57], [58], [59], [60].

      Ooooh perfect - a new and untested heuristics set to use in a new and untested domain!

      Honestly, it is probably wise to have a more specific set of criteria for healthcare settings

    19. 1. High affordance: Operation can be perceived by using external cues in the interface. 2. Medium affordance: Operation can be perceived by external cues in the interface and internal knowledge of the application. 3. Low affordance: Operation can be perceived mainly by using internal knowledge of the application.

      Can someone figure out what this button does on the first try?

    20. ISO defines usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.”

      This seems like a much better definition to be honest - what is ISO though?

    21. useful, usable, and satisfying

      Interesting - reliable and efficient and correct are not captured here. Are they prioritizing satisfaction over success? Is that wise in a healthcare or decision support system?

    22. Designing and implementing EHR is not so much an IT project as a human project about usability, workflow, patient safety, and organizational change [8], [11], [18], [22], [23], [24], [25], [26], [27]

      Yes. Changes to the EHR seem to require changes to the daily lives and routines of practices. We are pushing organizational change. EHR success is heavily correlated with intensive training.

      What is the company's favorite type of practice? Who are we designing for? Why? Do we know?

    23. usability issues, which did not receive significant attention in the EHR community until recently

      It's been said elsewhere that usability did not become a priority or even a possibility in the EHR field until recently. First compliance, then proliferation, now usability. This has hard-baked some issues into EHR systems.

      I wonder what EHR systems outside of the US are designed like?

    24. However, there are huge gaps between the status quo and the potential of EHR, primarily due to cognitive, financial, security/privacy, technological, social/cultural, and workforce challenges [8], [9], [10], [11].

      That's a lot of different types of challenges o.o. Only some of those are in the direct control of a vendor organization

    1. You didn’t need to write down your opinions about anything; you were able to vent your spleen in the post-session debriefing with the researcher, and that was recorded on video tape so it could be transcribed and analyzed at some later stage.

      Capturing deep data through conversation rather than writing is a great practice for researchers.

    2. After all, if SUS indicated that people didn’t rate a system very highly on usability, we had video tapes of the sessions they spent interacting with the system to go to

      I see. SUS acts as a red flag. Then, you can go back and see why the user is struggling.

    3. But why is there the rigmarole around converting the scores to be between 0 and 4, then multiplying everything by 2.5? This was a marketing strategy within DEC, rather than anything scientific. Project managers, product managers, and engineers were more likely to understand a scale that went from 0 to 100 than one that went from 10 to 50, and the important thing was to be able to grab their attention in the short space of time they were likely to spend thinking about usability, without having to go into a detailed explanation. (Also, where differences in perceived usability were achieved, having a scale of 0 to 100 was likely to make the differences be perceived by team members as being greater than on a smaller scale—not that it makes any difference when it came to actual analysis.)

      This is pretty clever manipulation of an output. Methodologically nothing wrong with it, but great for shareability.
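
      The arithmetic being described, for reference: shift each of the ten Likert items onto a 0-4 scale (odd items: response minus 1; even items: 5 minus response), sum, then multiply by 2.5 to land on 0-100. The example responses are invented.

      ```python
      # Standard SUS scoring as described above. Responses are a made-up example.

      def sus_score(responses):
          """responses: ten Likert answers (1-5), item 1 first."""
          assert len(responses) == 10
          adjusted = [(r - 1) if i % 2 == 0 else (5 - r)   # i is 0-based
                      for i, r in enumerate(responses)]
          return sum(adjusted) * 2.5

      print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
      ```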

    4. The normative data collected by Bangor, Kortum, and Miller (2008) and Sauro (2011) provided the basis for positioning SUS scores as percentiles, providing a more meaningful basis for interpreting SUS scores

      SUS scores are NOT percentages; they are percentiles. That is to say, they are grades, not scales. Getting 50 percent on a test doesn't mean you know 50 percent of the subject; it means you failed.

    5. Tullis and Stetson’s (2004) research showed that using SUS enables you to get a measure of the perceived usability of a system with a small sample (say, 8-12 users) and be fairly confident that you’ve got a good assessment of how people see your system or product. As Figure 2 shows, using SUS means that you reach a “correct” conclusion quicker, and that you reach a greater level of consistency between respondents sooner than you do with other questionnaires.

      Get your hands on a dozen users with each evaluation taking 20 to 40 minutes and you can get a reliable SUS score. Interesting
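
      To make "fairly confident with 8-12 users" concrete, here is the ordinary small-sample confidence interval on a mean SUS score. The scores are invented and the t-interval is generic statistics, nothing SUS-specific.

      ```python
      # Mean SUS score with a 95% t-interval for a small sample (n = 10).
      # Scores are invented; 2.262 is the t critical value for 9 d.f.
      from statistics import mean, stdev
      from math import sqrt

      scores = [72.5, 80.0, 65.0, 77.5, 85.0, 70.0, 75.0, 82.5, 67.5, 77.5]
      n, m, s = len(scores), mean(scores), stdev(scores)
      half_width = 2.262 * s / sqrt(n)

      print(f"mean SUS = {m:.1f}, 95% CI = ({m - half_width:.1f}, {m + half_width:.1f})")
      ```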

    6. SUS is not diagnostic. That is, it does not tell you what makes a system usable or not.

      How is it possible to not be diagnostic at all? Are there any widely comparable evaluation tools that are diagnostic, other than NN Group's heuristic evaluations?

    7. SUS scores have a modest correlation with task performance, but it is not surprising that people’s subjective assessments may not be consistent with whether or not they were successful using a system. Subjective assessments of usability are only one component of the overall construct of usability.

      People may love a thing and do terribly on the task it supports. Conversely, people may hate the thing but succeed consistently. For our product, what is most important? For our stakeholders, I should say, what is more important?

    8. You might want to decide, for example, whether it would be better to use a website or a voice-based system to access a bank account. SUS allows you to make that comparison, at least as far as perceived usability goes. Because SUS is pretty much technology-neutral, you can continue to use it as technology evolves over the years, and you don’t have to continually reinvent questionnaires.

      Interesting. In our increasingly multimodal world, where every service can interact via voice, IoT, and more, SUS would be valuable. Is it possible or advisable to make more specific questionnaires while still preserving the technological agnosticism of SUS?

    9. Because of the efforts of many researchers over the years, it’s now possible to choose SUS and be confident it is a valid and reliable measuring tool, be able to make a comparison between the scores you achieve and some normative standards (and thus with other systems or products), and to have some idea not only of whether people like your system, but also whether they would recommend it to others.

      This is the journey that a standardized usability scoring system needs to go through, huh? It has to be validated and tweaked via peer review. TURF is maybe not old enough to have gone through these tweaks, yet it might have more interesting comparators using it.

    10. In my previous research, to do with the design of information displays and decision support systems, especially where they were concerned with the support of fault diagnosis for operators of continuous industrial processes (e.g., Brooke & Duncan, 1981, 1983) or for decision making by doctors (Sheldon, Brooke, & Rector, 1985)

      Whaaaaaaat? The person who made SUS started out investigating decision making in doctors and other support systems?! That's awesome. These old papers might be fascinating

    11. Eventually, about a decade after I first created it, I contributed a chapter describing SUS to a book on usability engineering in industry (Brooke, 1996). Since then its use has increased exponentially. It has now been cited in (at the time of writing) more than 1,200 publications and has probably been used in many more evaluations that have not been published. It has been incorporated into commercial usability evaluation toolkits such as Morae, and I have recently seen several publications refer to it as an “industry standard”—although it has never been through any formal standardization process.

      Would be nuts to see something you scraped together for work proliferate worldwide and become an "industry standard". Maybe it's pioneer's bias because UX was so young. But still, nuts.