De-identification

Each medical image, including mammograms, is stored in the standard DICOM format. In addition to the pixel data (the image), the DICOM header stores metadata in fields known as “tags”, which hold a large amount of information, including patient-identifiable information. To preserve patient confidentiality, all header data is de-identified following the guidelines in DICOM Supplement 142 Annex A1. This supplement is an international standard for de-identification of data in medical image files.

In total, 222 DICOM tags are “nulled” (removed or obfuscated according to the DICOM specifications). In some instances a tag is obfuscated rather than removed, because removing it would break the integrity of the DICOM file. A full list of all tags nulled can be found here.
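The difference between removing and obfuscating a tag can be sketched as follows. This is a minimal illustration, not the project's implementation: the header is modelled as a plain dictionary keyed by (group, element), `null_tags` is a hypothetical helper, and which tag is removed and which is blanked is chosen arbitrarily for the example rather than taken from the list of 222 tags.

```python
Tag = tuple[int, int]  # (group, element), e.g. (0x0008, 0x0080)


def null_tags(header: dict[Tag, str], remove: set[Tag], blank: set[Tag]) -> dict[Tag, str]:
    """Return a copy of header with `remove` tags dropped and `blank` tags emptied."""
    cleaned = {}
    for tag, value in header.items():
        if tag in remove:
            continue  # element is dropped entirely
        # element stays in the header, but its value is emptied if listed in `blank`
        cleaned[tag] = "" if tag in blank else value
    return cleaned


# Illustrative header; the tag choices below are assumptions for the example.
header = {
    (0x0008, 0x0080): "Example Hospital",  # InstitutionName
    (0x0008, 0x0081): "1 Example Street",  # InstitutionAddress
    (0x0028, 0x0010): "3328",              # Rows (image geometry, left untouched)
}
cleaned = null_tags(
    header,
    remove={(0x0008, 0x0081)},
    blank={(0x0008, 0x0080)},
)
```

Blanking keeps the element present with an empty value, which is one way a file can stay structurally valid when the standard requires the element to exist.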

Pseudonymisation of DICOM tags

Pseudonymisation is necessary to allow us to send screening images back to expert radiologists or radiographers to annotate. Our pseudonymisation procedures have received ethical approval and were extensively reviewed by staff at the clinical sites.

Table 1 details the tags that are pseudonymised. The pseudonym lookup tables are held on secure servers at the clinical collection sites, and access is restricted to the data managers.

DICOM Tag    Description       Pseudonymisation
0010,0010    PatientName       Replaced with an auto-incrementing number
0010,0020    PatientID         Set to the same value as PatientName
0010,0030    PatientBirthDate  Year retained; day and month set to '01'

N.B.: The auto-incrementing number can be global (i.e. shared across all institutes) or local (each institute has its own auto-incrementing number).
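The three rules in Table 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the header is modelled as a dictionary keyed by tag keyword, the `Pseudonymiser` class and both functions are hypothetical names, the original PatientID is assumed to identify a patient uniquely, and the birth-date rule is read as keeping the year while setting both month and day to '01'.

```python
from itertools import count


class Pseudonymiser:
    """Assigns each real patient ID an auto-incrementing pseudonym."""

    def __init__(self, start: int = 1):
        self._next = count(start)
        # real ID -> pseudonym; this is the lookup table that stays on site
        self.lookup: dict[str, str] = {}

    def pseudonym_for(self, real_id: str) -> str:
        # Reuse the existing pseudonym so repeat visits map to the same number.
        if real_id not in self.lookup:
            self.lookup[real_id] = str(next(self._next))
        return self.lookup[real_id]


def coarsen_birth_date(da: str) -> str:
    """Keep the year of a DICOM date (YYYYMMDD); set month and day to '01'."""
    return da[:4] + "0101"


def pseudonymise(header: dict[str, str], pseudonymiser: Pseudonymiser) -> dict[str, str]:
    """Return a copy of header with the three Table 1 tags pseudonymised."""
    out = dict(header)
    pseudonym = pseudonymiser.pseudonym_for(header["PatientID"])
    out["PatientName"] = pseudonym
    out["PatientID"] = pseudonym  # same value as PatientName
    out["PatientBirthDate"] = coarsen_birth_date(header["PatientBirthDate"])
    return out
```

In this sketch a global scheme would share one `Pseudonymiser` instance across all institutes, while a local scheme would create one instance per institute.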

Additional De-identification

We have received ethical approval to share our collected de-identified data with other researchers, to help stimulate further research that will benefit breast cancer screening. Before images and data are shared with third parties, additional de-identification is applied in accordance with DICOM Supplement 142 Annex A, including removal of manufacturers' private tags.

Table 2 lists the additional tags that are removed or obfuscated before sharing with third parties.

DICOM Tag    Description
0008,0020    StudyDate
0008,0021    SeriesDate
0008,0022    AcquisitionDate
0008,0023    ContentDate
0008,0050    AccessionNumber
0008,0090    ReferringPhysicianName
0008,1010    StationName
0008,1070    OperatorsName
0020,0010    StudyID
0020,4000    ImageComments
0040,0244    PerformedProcedureStepStartDate
0018,700A    DetectorID
0018,700C    DateOfLastDetectorCalibration
0400,0100    Digital Signature UID
0020,000E    SeriesInstanceUID
0040,A124    UID
0000,1000    Affected SOP Instance UID
0020,9161    Concatenation UID
0008,010D    Context Group Extension Creator UID
0008,9123    Creator Version UID
0018,1002    Device UID
0020,9164    Dimension Organization UID
300A,0013    Dose Reference UID
0070,031A    Fiducial UID
0020,0052    Frame Of Reference UID
0008,0014    Instance Creator UID
0008,3010    Irradiation Event UID
0028,1214    Large Palette Color Lookup Table UID
0002,0003    Media Storage SOP Instance UID
0028,1199    Palette Color Lookup Table UID
3006,0024    Referenced Frame Of Reference UID
0040,4023    Referenced General Purpose Scheduled Procedure Step Transaction UID
0008,1155    Referenced SOP Instance UID
0004,1511    Referenced SOP Instance UID In File
3006,00C2    Related Frame Of Reference UID
0000,1001    Requested SOP Instance UID
0008,0018    SOP Instance UID
0088,0140    Storage Media File Set UID
0020,000D    Study Instance UID
0020,0200    Synchronization Frame Of Reference UID
0040,DB0D    Template Extension Creator UID
0040,DB0C    Template Extension Organization UID
0008,1195    Transaction UID
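The additional step before sharing can be sketched as below. In DICOM, private (manufacturer) data elements are those with an odd group number, so they can be recognised from the tag alone. Everything else here is an assumption made for illustration: the header is a plain dictionary keyed by (group, element), only a handful of the tags from Table 2 are included, and the source does not say how UIDs are obfuscated. Replacing each UID with a value derived from a hash under the "2.25" root is one common approach, shown here because it keeps images from the same study linked to each other.

```python
import hashlib

Tag = tuple[int, int]  # (group, element)

# A few entries from Table 2, for illustration only.
REMOVE: set[Tag] = {
    (0x0008, 0x0020),  # StudyDate
    (0x0008, 0x0050),  # AccessionNumber
    (0x0008, 0x1010),  # StationName
}
REPLACE_UID: set[Tag] = {
    (0x0008, 0x0018),  # SOP Instance UID
    (0x0020, 0x000D),  # Study Instance UID
    (0x0020, 0x000E),  # Series Instance UID
}


def is_private(tag: Tag) -> bool:
    """Private data elements have an odd group number."""
    return tag[0] % 2 == 1


def replacement_uid(uid: str) -> str:
    """Derive a stable replacement UID (assumed scheme, not from the source)."""
    digest = hashlib.sha256(uid.encode("ascii")).digest()
    # 128 bits rendered in decimal: at most 39 digits, so the UID fits in 64 chars.
    return "2.25." + str(int.from_bytes(digest[:16], "big"))


def strip_for_sharing(header: dict[Tag, str]) -> dict[Tag, str]:
    """Return a copy with private and listed tags removed and UIDs replaced."""
    out = {}
    for tag, value in header.items():
        if is_private(tag) or tag in REMOVE:
            continue
        out[tag] = replacement_uid(value) if tag in REPLACE_UID else value
    return out
```

Because the replacement is deterministic, two images that shared a Study Instance UID before this step still share one afterwards.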

Documents