Friday, January 24, 2020

Your Hospital Records Might Be Public

Much of your private data is available online, but you probably already know that. Google tracks your every move, Facebook sells your data, and your car may even be monitoring you now. Okay, but this data is in the hands of private companies, right? Not everyone can see it. Unfortunately, health care providers, financial institutions, and educational institutions have been publicly releasing their records for use in academic research, because these records are very useful for developing machine learning algorithms.

Before data is anonymized



This seems like a major privacy concern and sounds very dangerous. Luckily, these institutions were on the same page as the philosopher James Moor and decided to address the ethics first, before feeding this data to machine learning algorithms. To protect the people in these records, they decided to anonymize the data: before releasing the records publicly, they would remove the pieces of information that could most easily identify the people in them.
Anonymized data
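As a rough sketch of that removal step (the records, field names, and the choice of which fields count as direct identifiers here are all invented for illustration):

```python
# Hypothetical patient records; all names, values, and field names are
# invented for illustration.
records = [
    {"name": "Alice Smith", "zip": "02139", "birthdate": "1961-07-31",
     "sex": "F", "diagnosis": "asthma"},
    {"name": "Bob Jones", "zip": "02138", "birthdate": "1945-03-12",
     "sex": "M", "diagnosis": "diabetes"},
]

# Fields stripped out before public release.
DIRECT_IDENTIFIERS = {"name"}

def drop_identifiers(record):
    """Return a copy of the record without direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

anonymized = [drop_identifiers(r) for r in records]
print(anonymized[0])  # no "name" field remains
```

Note that fields like ZIP code, birthdate, and sex survive this step; as the next example shows, that is exactly where the trouble starts.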

At first glance this seems very secure: reading the anonymized table, it doesn't seem possible to match a disease to a person. With modern techniques, however, re-identification is becoming easier and easier. This is a limitation that Moor recognized in the ethics-first approach to new technologies. In 1997, a researcher, Latanya Sweeney, identified Massachusetts Governor William Weld's hospital records in data that had been released for research purposes. She was able to do this by linking the released hospital records with the voter rolls of Cambridge, MA.
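The attack described above amounts to a join on the fields both datasets share. A minimal sketch, with entirely invented records (the hospital rows, voter-roll rows, and quasi-identifier fields are assumptions for illustration):

```python
# Sketch of a linkage attack: join an "anonymized" hospital release
# against a public voter roll on the fields they share. All data here
# is invented for illustration.
hospital = [
    {"zip": "02139", "birthdate": "1945-07-31", "sex": "M", "diagnosis": "stroke"},
    {"zip": "02139", "birthdate": "1972-01-05", "sex": "F", "diagnosis": "flu"},
]
voter_roll = [
    {"name": "William Weld", "zip": "02139", "birthdate": "1945-07-31", "sex": "M"},
    {"name": "Jane Doe", "zip": "02139", "birthdate": "1972-01-05", "sex": "F"},
]

# Quasi-identifiers: not names, but in combination often unique.
QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")

def key(row):
    return tuple(row[q] for q in QUASI_IDENTIFIERS)

# Index voters by their quasi-identifier tuple, then match hospital rows.
voters_by_key = {key(v): v["name"] for v in voter_roll}
reidentified = [
    {"name": voters_by_key[key(h)], "diagnosis": h["diagnosis"]}
    for h in hospital if key(h) in voters_by_key
]
print(reidentified)  # names matched back to diagnoses
```

No single quasi-identifier gives the attacker much, but the combination of ZIP code, birthdate, and sex is unique for a large share of the population, which is what makes this join so effective.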

To prevent re-identification, companies generalize the data even further before releasing it, but this comes at a cost: generalization loses information, which makes it harder to do research with the data. It is unethical to release sensitive data that can be traced back to specific people, but we need to find a middle ground that protects the people in the data while still releasing data that is useful for machine learning research.
Generalized data
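One common way to formalize "generalized enough" is k-anonymity: every combination of quasi-identifier values must appear in at least k records. A minimal sketch of the generalization step, with invented records and an assumed coarsening scheme (truncated ZIPs, age decades):

```python
from collections import Counter

# Sketch of generalization for k-anonymity: coarsen quasi-identifiers
# until each combination covers several people. Data is invented.
records = [
    {"zip": "02139", "age": 34, "diagnosis": "flu"},
    {"zip": "02138", "age": 37, "diagnosis": "asthma"},
    {"zip": "02141", "age": 62, "diagnosis": "stroke"},
    {"zip": "02142", "age": 65, "diagnosis": "diabetes"},
]

def generalize(record):
    """Truncate the ZIP to 3 digits and bucket the age into a decade."""
    decade = (record["age"] // 10) * 10
    return {
        "zip": record["zip"][:3] + "**",
        "age": f"{decade}-{decade + 9}",
        "diagnosis": record["diagnosis"],  # kept: this is what research needs
    }

generalized = [generalize(r) for r in records]

# Check k-anonymity: the smallest quasi-identifier group size is k.
groups = Counter((r["zip"], r["age"]) for r in generalized)
k = min(groups.values())
print(f"dataset is {k}-anonymous")
```

The trade-off from the paragraph above is visible here: a linkage attack can no longer single out one row, but a researcher studying, say, outcomes by exact age has also lost that signal.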

6 comments:


  1. Your post intrigued me because in my EECS 485 (Web Systems) class, we learned about database privacy and k-anonymity. Unfortunately, the problem with k-anonymity is that generalization loses information just as you mentioned with medical research. This is one of the only posts I have seen where the technology usage is “proactive,” so it would be interesting to hear more on whether you think they did a good job of identifying all of the ethical issues or how they could improve this anonymization such as telling users about the potential risk of their data being used.

  2. Medical privacy is definitely a massive concern we face today. I wish you clarified a bit more on the application of anonymized data - you said they're useful for "machine-learning algorithms", what exactly do you mean by this? If you're talking about health insurance companies using this anonymized data to predict life expectancy, that's its own issue. Off the top of my head, I can't see why a bank would want access to medical data, especially if it's anonymized (in a bank's case, they can just see if their client has paid hospital bills recently). Can you elaborate a bit more?

  3. You provided useful insight on hospital data, and I agree that we need to find a middle ground for protecting both privacy and research capabilities. Your exposition of hospital data was adequately robust, but your explanation of Moor's paper could be bolstered. I suggest giving a bit of background information on Moor and who he is in the first paragraph, as this would allow you to further connect his arguments to your own in the second paragraph. Also, I got a bit confused in the third paragraph when you started talking about new technology's increased ability to match data to people, then describe it as "a limitation that Moor recognized" in the next sentence. Expanding on how this extension of technology is also a limitation would be helpful.

  4. This comment has been removed by the author.

  5. As someone who has little to no knowledge of medical privacy, this blog post did a good job explaining to me how data can be anonymized and generalized for research purposes. I like how this post is clearly structured and each paragraph has its own points to make. And I find the visual aids to be useful. However, I noticed that you have only mentioned Moor’s “ethics first approach” and then quickly moved on to the next example. If I were someone who has never done the reading before, I would not have been able to know what exactly you were referring to, and what the approach entails. Therefore, I think it would be useful for you to incorporate more in-depth analysis of Moor’s work to help make your message clearer.

  6. I found your post to be very captivating; it really hooked me as a reader and as someone who is interested in health systems. I like how you linked to other students' blog posts within your post and how you backed up your main claim with the example of the 1997 medical records. It would've been even more helpful if you linked to a story about it. I do think you could have elaborated on the readings a bit more and given a sentence or two about the general scope of Moor's ideas. This would be very helpful if someone outside of this class were reading this. Other than that, I really like your post and thought this was a unique example!

