
How were the records de-identified?

April 26, 2017 | de-identified records



1 Answer


The records were de-identified semi-automatically. Each record went through one automatic pass and two manual passes. The manual passes followed the automatic pass and ran in parallel with each other. A third manual annotator resolved the disagreements between the two manual passes and finalized the identification of private health information. We then replaced the identified private health information in several ways. For names of doctors and patients, we drew random names from the US Census Bureau names dictionary, so the surrogate names in the records look like real names but do not belong to the actual patients. We made no effort to preserve co-reference; that is, for each occurrence of Dr. “John Smith”, we drew a new name from the US Census Bureau dictionary. For phone numbers, ID numbers, and ages, we generated surrogates by replacing each digit with a random digit and each letter with a random letter. For dates, we generated random dates as surrogates. F
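The surrogate-generation rules described in the answer can be illustrated with a rough Python sketch. This is not the original tooling. The short name lists stand in for the US Census Bureau names dictionary, the function names are hypothetical, and preserving letter case, keeping punctuation, and the date range are all assumptions made for illustration.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical placeholders for the US Census Bureau names dictionary.
FIRST_NAMES = ["James", "Mary", "Robert", "Linda"]
LAST_NAMES = ["Smith", "Johnson", "Williams", "Brown"]


def surrogate_name(rng):
    """Draw a fresh random name on every call, so co-reference is not kept."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def surrogate_code(value, rng):
    """Replace each digit with a random digit and each letter with a random letter.

    Assumption: letter case is preserved and other characters (dashes,
    spaces) are copied unchanged so the surrogate keeps the original format.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def surrogate_date(rng, start=date(1990, 1, 1), end=date(2010, 12, 31)):
    """Pick a uniformly random date in [start, end]; the range is an assumption."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


if __name__ == "__main__":
    rng = random.Random()
    print(surrogate_name(rng))                  # a different name on each call
    print(surrogate_code("555-123-4567", rng))  # same shape, random digits
    print(surrogate_code("MRN-AB1234", rng))    # letters and digits both replaced
    print(surrogate_date(rng))
```

Because `surrogate_name` draws independently on every call, two mentions of the same doctor will usually receive different surrogate names, which matches the answer's statement that co-reference was not preserved.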