Each medical image, including mammograms, is stored in the standard DICOM format. In addition to the pixel data (the image), the DICOM header stores metadata in fields known as “tags”, which hold a large amount of information, including patient-identifiable information. To preserve patient confidentiality, all header data is de-identified following the guidelines in DICOM Supplement 142 Annex A1. This supplement is an international standard for de-identifying data in medical image files.
In total, 222 DICOM tags are “nulled” (removed or obfuscated according to DICOM specifications). In some instances, obfuscation rather than removal is necessary to preserve the integrity of the DICOM file. A full list of all tags nulled can be found here.
Pseudonymisation of DICOM tags
Pseudonymisation is necessary to allow us to send screening images back to expert radiologists or radiographers to annotate. Our pseudonymisation procedures have received ethical approval and were extensively reviewed by staff at the clinical sites.
Table 1 details the tags that are pseudonymised. The pseudonym lookup tables are held on secure servers at the clinical collection sites, and access is restricted to the data managers.
DICOM Tag | Description | Pseudonymisation |
---|---|---|
0010,0010 | PatientName | Replaced with an auto-incrementing number |
0010,0020 | PatientID | Changed to be the same as PatientName |
0010,0030 | PatientBirthDate | Year retained; day and month each set to '01' |
N.B.: The auto-incrementing number can be global (i.e. shared across all institutes) or local (each institute has its own auto-incrementing number).
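The rules in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual implementation: a plain dict keyed by (group, element) stands in for the DICOM header, the `Pseudonymiser` class and its `apply` method are hypothetical names, and keying the lookup table on the original PatientID is an assumption, since the text does not say how the lookup is keyed.

```python
from itertools import count

# Tags from Table 1, written as (group, element) pairs.
PATIENT_NAME = (0x0010, 0x0010)
PATIENT_ID = (0x0010, 0x0020)
PATIENT_BIRTH_DATE = (0x0010, 0x0030)


class Pseudonymiser:
    """Hypothetical helper: hands out auto-incrementing pseudonyms."""

    def __init__(self, start=1):
        self._counter = count(start)
        # Lookup table: original PatientID -> pseudonym (assumed keying).
        self.lookup = {}

    def pseudonym_for(self, original_id):
        # Reuse the existing pseudonym so one patient keeps one number.
        if original_id not in self.lookup:
            self.lookup[original_id] = str(next(self._counter))
        return self.lookup[original_id]

    def apply(self, header):
        """Return a pseudonymised copy; the input header is not modified."""
        out = dict(header)
        pseudonym = self.pseudonym_for(header[PATIENT_ID])
        out[PATIENT_NAME] = pseudonym
        out[PATIENT_ID] = pseudonym  # same value as PatientName
        birth = header.get(PATIENT_BIRTH_DATE, "")
        if len(birth) == 8:  # DICOM dates are YYYYMMDD
            out[PATIENT_BIRTH_DATE] = birth[:4] + "0101"
        return out
```

In this sketch, a single `Pseudonymiser` instance shared by every institute would correspond to the global numbering, and one instance per institute to the local numbering.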
Additional De-identification
We have received ethical approval to share our collected de-identified data with other researchers to help stimulate further research that will benefit breast cancer screening. Before images and data are shared with third parties, additional de-identification is implemented in accordance with DICOM Supplement 142 Annex A, including the removal of manufacturers' private tags.
Table 2 lists the additional tags removed or obfuscated before sharing with third parties.
DICOM tag | Description |
---|---|
0008,0020 | StudyDate |
0008,0021 | SeriesDate |
0008,0022 | AcquisitionDate |
0008,0023 | ContentDate |
0008,0050 | AccessionNumber |
0008,0090 | ReferringPhysicianName |
0008,1010 | StationName |
0008,1070 | OperatorsName |
0020,0010 | StudyID |
0020,4000 | ImageComments |
0040,0244 | PerformedProcedureStepStartDate |
0018,700A | DetectorID |
0018,700C | DateOfLastDetectorCalibration |
0400,0100 | Digital Signature UID |
0020,000E | SeriesInstanceUID |
0040,A124 | UID |
0000,1000 | Affected SOP Instance UID |
0020,9161 | Concatenation UID |
0008,010D | Context Group Extension Creator UID |
0008,9123 | Creator Version UID |
0018,1002 | Device UID |
0020,9164 | Dimension Organization UID |
300A,0013 | Dose Reference UID |
0070,031A | Fiducial UID |
0020,0052 | Frame Of Reference UID |
0008,0014 | Instance Creator UID |
0008,3010 | Irradiation Event UID |
0028,1214 | Large Palette Color Lookup Table UID |
0002,0003 | Media Storage SOP Instance UID |
0028,1199 | Palette Color Lookup Table UID |
3006,0024 | Referenced Frame Of Reference UID |
0040,4023 | Referenced General Purpose Scheduled Procedure Step Transaction UID |
0008,1155 | Referenced SOP Instance UID |
0004,1511 | Referenced SOP Instance UID In File |
3006,00C2 | Related Frame Of Reference UID |
0000,1001 | Requested SOP Instance UID |
0008,0018 | SOP Instance UID |
0088,0140 | Storage Media File Set UID |
0020,000D | Study Instance UID |
0020,0200 | Synchronization Frame Of Reference UID |
0040,DB0D | Template Extension Creator UID |
0040,DB0C | Template Extension Organization UID |
0008,1195 | Transaction UID |
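The additional step can be sketched as follows. This is a minimal illustration under assumptions, not the project's actual code: a plain dict keyed by (group, element) stands in for the DICOM header, only a handful of the Table 2 tags are listed, and the function names are hypothetical. The text does not say which tags are removed and which are obfuscated, so the split used here (dates and identifiers removed, UIDs replaced) is an assumption, as is the hash-based UID replacement. Private tags are recognised by their odd group number, which is how DICOM defines them.

```python
import hashlib

# Assumed split: these Table 2 tags are removed outright...
REMOVE = {
    (0x0008, 0x0020),  # StudyDate
    (0x0008, 0x0050),  # AccessionNumber
    (0x0008, 0x1010),  # StationName
}
# ...while these UIDs are replaced so that files still reference each other.
REMAP_UID = {
    (0x0008, 0x0018),  # SOP Instance UID
    (0x0020, 0x000D),  # Study Instance UID
    (0x0020, 0x000E),  # SeriesInstanceUID
}


def is_private(tag):
    """Private (manufacturer) tags have an odd group number."""
    group, _element = tag
    return group % 2 == 1


def remap_uid(uid):
    """Derive a replacement UID deterministically from the original.

    Assumption: a 128-bit value under the '2.25.' root, taken from an
    unsalted SHA-256 for brevity. A real pipeline would more likely use a
    keyed hash or a stored lookup table.
    """
    digest = hashlib.sha256(uid.encode("ascii")).hexdigest()
    return "2.25." + str(int(digest[:32], 16))


def deidentify_for_sharing(header):
    """Return a copy of the header that is safe to pass to third parties."""
    out = {}
    for tag, value in header.items():
        if tag in REMOVE or is_private(tag):
            continue  # drop the tag entirely
        out[tag] = remap_uid(value) if tag in REMAP_UID else value
    return out
```

Because `remap_uid` is deterministic, every image from the same study receives the same replacement Study Instance UID, so the study and series grouping survives de-identification.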