{"total": 3, "rows": [{"consumer": "00000000-0000-0000-0000-000000000000", "id": "hAQdXaQ9RleboePC6fyzUQ", "created": "2015-03-29T15:59:31.991566+00:00", "updated": "2015-04-14T19:15:27.062779+00:00", "user": "acct:Inquistor@hypothes.is", "uri": "https://twitter.com/KenNeumeister/status/582190731592396800", "text": "**Accidental Data Recognitions**\n\n@KenNeumeister suggested in this tweet:\n\n<blockquote class=\"twitter-tweet\" data-conversation=\"none\" lang=\"en\"><p><a href=\"https://twitter.com/dbarthjones\">@dbarthjones</a>  intentional re-identification is different from accidental re-identification. accidents frequent. intention is hard.</p>&mdash; Ken Neumeister (@KenNeumeister) <a href=\"https://twitter.com/KenNeumeister/status/582190731592396800\">March 29, 2015</a></blockquote>\n<script async src=\"//platform.twitter.com/widgets.js\" charset=\"utf-8\"></script>\n\nthat accidental data re-identifications occur frequently. His comment was surprising to me because I've never experienced an accidental or \"spontaneous\" data re-identification in 25 years of working on health data de-identification. Our Twitter exchange about this was interesting and I learned that his take on this may differ from mine because he apparently works in a different data domain than I do, but it started me thinking.\n\nPart of the reason for my having never encountered a single spontaneous re-identification in over two decades of opportunity for this to happen might be that I'd ended my experience providing patient care in a clinical setting before I began working in the area of data de-identification. 
I don't doubt that spontaneous health data re-identifications might occasionally occur for clinicians who work with de-identified data for patients that they've personally seen (or maybe even with their recognizing patients that have been treated in the units/offices where they work).\n\nA couple of points are probably worth mentioning though about such spontaneous clinical environment data \"re-identifications\". First, if they occur, they would typically only impact a single person and certainly not any large numbers of persons. In any large healthcare data set this should presumably constitute a very small re-identification risk among the total number of persons at risk of re-identification. Secondly, and more importantly, I'm not really sure we should consider such spontaneous events as \"re-identifications\". I'd propose that \"spontaneous data recognitions\" would be a more illuminating name for such events.\n\nWhy does the distinction I'm making matter? Well, because it speaks to both the real source of potential privacy concern and the likelihood of privacy harms resulting from such an event. The de-identified data isn't the source of the critical information needed to enable possible privacy concerns here. The knowledge of a patient's details enabling a spontaneous recognition of the patient within a de-identified data set hasn't come from the data set -- it comes from external information that the data viewer already possessed independent of the data. \n\nSure, on occasion we might suppose that the data \"recognizer\" might also learn something new about the patient from the data which wasn't already known to them. 
But with respect to the possibility of resulting privacy harms from such events, we need to remind ourselves that the likelihood of the newly revealed information being of great sensitivity or otherwise capable of producing possible privacy harms for the patient will likely be a comparatively rare occurrence. This is because it will most likely have been the case that the patient was recognizable primarily due to their having some atypical / rare characteristics enabling their spontaneous recognition. There might be additional info within the data that the recognizer didn't already know, but initial recognitions will be rare events, and additional revelations resulting in privacy harms will be rarer still. This is basic probabilistic reasoning (in spite of the problem that even very smart people will sometimes avoid \"doing the math\" if they've been poked with a scary scenario first). See http://blogs.law.harvard.edu/infolaw/2014/11/21/the-antidote-for-anecdata-a-little-science-can-separate-data-privacy-facts-from-folklore/\n\nBut the probabilistic reality remains that when a final outcome B can only stem from an initial rare event A, and B is itself a rare (or even moderately frequent) event given A, occurrences of B will be rarer still. 
\n\nWe can add to this probabilistic assurance the fact that, in general (at least for this context of spontaneous health data recognitions), having sufficiently detailed knowledge that a person has atypical characteristics (or a rare medical condition) which could enable their spontaneous recognition usually comes from having already been placed in a general position of trust with respect to participating in the patient's care, or otherwise having already had some sort of trusted relationship with the patient.\n\nSo in the same way that we generally trust doctors, nurses and other healthcare providers with our personal health information, the same trust - and the ethics that enable such trust - should rightly be expected to be in place for sensitive information revealed through spontaneous medical care data recognition events.\n\nThe very same reasoning also generally applies even for the context of non-medically trained personnel (statisticians, data analysts, etc.) accessing de-identified medical data. With properly de-identified data, spontaneous data recognitions are only likely to occur extremely rarely for those cases where there isn't already some sort of existing relationship with the person being recognized that enabled the detailed knowledge required to realize the recognition. \n\nYes, on extremely rare occasions, statisticians and data analysts might spontaneously recognize their family members, neighbors, or workplace friends/acquaintances or even themselves within properly de-identified health data, but the number of people for whom they will know sufficient details to allow this to occur will be very small in any large and properly de-identified data set. And when this does rarely occur, resultant privacy harms will likely also be rare because, in general, the knowledge of the details through which a spontaneous re-identification could occur will have been obtained through having an existing relationship with the individual who has been recognized. 
Put in statistical terms, we are helped in this specific context of spontaneous recognitions by the fact that knowledge of our intimate details that are needed to enable such recognitions is typically correlated with having some sort of trust relationship with us.\n\nI'd further argue though that we should also be supplementing these inherently probabilistic protections by consistently providing ethical training and mentorship to all personnel who access de-identified medical data. In my training as an HIV epidemiologist, I received plenty of meaningful mentorship about privacy ethics, but had admittedly little course-based training on these issues. This is an area where we can and should do more.\n\nHowever, I think it's helpful to recognize that although accidental or spontaneous re-identification might occur on occasion, there's good reason to understand that they are likely to be quite rare and when they do occur, they most often will be unlikely to pose an influential source of privacy harms.", "tags": [], "group": "__world__", "moderation_status": "APPROVED", "permissions": {"read": ["group:__world__"], "admin": ["acct:Inquistor@hypothes.is"], "update": ["acct:Inquistor@hypothes.is"], "delete": ["acct:Inquistor@hypothes.is"]}, "target": [{"source": "https://twitter.com/KenNeumeister/status/582190731592396800", "selector": [{"type": "RangeSelector", "endOffset": 138, "startOffset": 21, "endContainer": "/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]", "startContainer": "/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]"}, {"type": "TextQuoteSelector", "exact": "intentional re-identification is different from accidental re-identification.  accidents frequent. intention is hard."}, {"type": "FragmentSelector", "value": ""}]}], "document": {"title": ["Ken Neumeister on Twitter: \"@dbarthjones @sib313 intentional re-identification is different from accidental re-identification.  accidents frequent. 
intention is hard.\""]}, "links": {"html": "https://hypothes.is/a/hAQdXaQ9RleboePC6fyzUQ", "incontext": "https://hyp.is/hAQdXaQ9RleboePC6fyzUQ/twitter.com/KenNeumeister/status/582190731592396800", "json": "https://hypothes.is/api/annotations/hAQdXaQ9RleboePC6fyzUQ"}, "actions": [], "mentions": [], "user_info": {"display_name": null}, "flagged": false, "hidden": false}, {"consumer": "00000000-0000-0000-0000-000000000000", "id": "Kmzaols6RmGvAEkVbwDWrQ", "created": "2015-04-14T18:53:42.321762+00:00", "updated": "2015-04-14T18:53:42.321778+00:00", "user": "acct:dwhly@hypothes.is", "uri": "https://twitter.com/KenNeumeister/status/582190731592396800", "text": "test3", "tags": [], "group": "__world__", "moderation_status": "APPROVED", "permissions": {"read": ["group:__world__"], "admin": ["acct:dwhly@hypothes.is"], "update": ["acct:dwhly@hypothes.is"], "delete": ["acct:dwhly@hypothes.is"]}, "target": [{"source": "https://twitter.com/KenNeumeister/status/582190731592396800", "selector": [{"type": "RangeSelector", "endOffset": 118, "startOffset": 110, "endContainer": "/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]", "startContainer": "/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]"}, {"end": 1121, "type": "TextPositionSelector", "start": 1113}, {"type": "TextQuoteSelector", "exact": "frequent", "prefix": "al re-identification. accidents", "suffix": ". intention is hard. 0 retweets"}, {"type": "FragmentSelector", "value": ""}]}], "document": {"title": ["Ken Neumeister on Twitter: \"@dbarthjones @sib313 intentional re-identification is different from accidental re-identification.  accidents frequent. 
intention is hard.\""]}, "links": {"html": "https://hypothes.is/a/Kmzaols6RmGvAEkVbwDWrQ", "incontext": "https://hyp.is/Kmzaols6RmGvAEkVbwDWrQ/twitter.com/KenNeumeister/status/582190731592396800", "json": "https://hypothes.is/api/annotations/Kmzaols6RmGvAEkVbwDWrQ"}, "actions": [], "mentions": [], "user_info": {"display_name": "Dan Whaley"}, "flagged": false, "hidden": false}, {"consumer": "00000000-0000-0000-0000-000000000000", "id": "_fqplC-TT2eJRXMmIPl0zQ", "created": "2015-04-14T18:45:02.769334+00:00", "updated": "2015-04-14T18:45:02.769350+00:00", "user": "acct:dwhly@hypothes.is", "uri": "https://twitter.com/KenNeumeister/status/582190731592396800", "text": "re-test1", "tags": [], "group": "__world__", "moderation_status": "APPROVED", "permissions": {"read": ["group:__world__"], "admin": ["acct:dwhly@hypothes.is"], "update": ["acct:dwhly@hypothes.is"], "delete": ["acct:dwhly@hypothes.is"]}, "target": [{"source": "https://twitter.com/KenNeumeister/status/582190731592396800", "selector": [{"type": "RangeSelector", "endOffset": 138, "startOffset": 21, "endContainer": "/div[2]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/p[1]", "startContainer": "/div[2]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/p[1]"}, {"type": "TextQuoteSelector", "exact": "intentional re-identification is different from accidental re-identification.  accidents frequent. intention is hard."}, {"type": "FragmentSelector", "value": ""}]}], "document": {"title": ["Ken Neumeister on Twitter: \"@dbarthjones @sib313 intentional re-identification is different from accidental re-identification.  accidents frequent. intention is hard.\""]}, "links": {"html": "https://hypothes.is/a/_fqplC-TT2eJRXMmIPl0zQ", "incontext": "https://hyp.is/_fqplC-TT2eJRXMmIPl0zQ/twitter.com/KenNeumeister/status/582190731592396800", "json": "https://hypothes.is/api/annotations/_fqplC-TT2eJRXMmIPl0zQ"}, "actions": [], "mentions": [], "user_info": {"display_name": "Dan Whaley"}, "flagged": false, "hidden": false}]}