Medical Image Analysis with Deep Learning, Part 4

This is the fourth installment of this series; it covers medical images and their components, medical image formats, and format conversions. The goal is to develop knowledge to help us with our ultimate goal — medical image analysis with deep learning.

By Taposh Roy, Kaiser Permanente.


The Nvidia GTC 2017 conference was an excellent showcase of deep-learning work in health care. Deep learning experts such as Ian Goodfellow, Jeremy Howard, and others shared their perspectives; top medical schools (Mount Sinai, NYU, Massachusetts General Hospital, etc.) and the winners of the Kaggle Data Science Bowl (lung cancer) explained their modeling strategies. Coming back to our series: in the last article we covered basic deep learning on text and image data. In this article we will focus on medical images and their formats.

This article is structured into three parts: medical images and their components, medical image formats, and format conversions.


Medical Images & Components

A very good resource for this discussion is the paper by Michele Larobina and Loredana Murino from the Institute of Biostructures and Bioimaging (IBB), Italy, part of the National Research Council (CNR), the largest public research institution in Italy. Another good reference is the paper "Working with the DICOM and NIfTI Data Standards in R".

What is a medical image? A medical image is a representation of the internal structure or function of an anatomic region. It takes the form of an array of picture elements called pixels (2-dimensional) or voxels (3-dimensional). It is a discrete representation resulting from a sampling or reconstruction process that maps numerical values to positions in space. The number of pixels used to describe the field of view of a given acquisition modality expresses the detail with which the anatomy or function can be depicted. What the numerical value of a pixel expresses depends on the imaging modality, the acquisition protocol, the reconstruction, and, eventually, the post-processing. (Source: Link)

Medical Image Components


Medical images have four key components — pixel depth, photometric interpretation, metadata, and pixel data. Together these determine the size and resolution of the image.

Pixel depth (also called bit depth or color depth) is the number of bits used to encode the information of each pixel. For example, an 8-bit raster can have 256 unique values, ranging from 0 to 255.
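As a quick sanity check, the number of distinct values a pixel can represent is 2 raised to the bit depth:

```python
def distinct_values(bit_depth):
    # Each additional bit doubles the number of representable values.
    return 2 ** bit_depth

print(distinct_values(8))   # 256, matching the 8-bit example above
print(distinct_values(16))  # 65536
```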


Photometric interpretation specifies how the pixel data should be interpreted for correct display, as either a monochrome or a color image. To indicate whether color information is stored in the image pixel values, we introduce the concept of samples per pixel, also known as the number of channels. Monochrome images have one sample per pixel and no color information stored in the image; a scale of shades of gray from black to white is used to display them. The number of gray shades clearly depends on the number of bits used to store the sample, which in this case coincides with the pixel depth. Clinical radiological images, such as x-ray computed tomography (CT) and magnetic resonance (MR) images, have a grayscale photometric interpretation. Nuclear medicine images, such as positron emission tomography (PET) and single photon emission tomography (SPECT) images, are typically displayed with a color map or color palette. [Source: link]

Metadata is information that describes the image. It may seem strange, but in any file format there is always information associated with the image beyond the pixel data. This metadata is typically stored at the beginning of the file as a header and contains at least the image matrix dimensions, the spatial resolution, the pixel depth, and the photometric interpretation.

Pixel data is the section where the numerical values of the pixels are stored. According to the data type, pixel data are stored as integers or floating-point numbers, using the minimum number of bytes required to represent the values.

Image size = header size (includes metadata) + rows × columns × pixel depth × number of frames
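The formula can be sketched in code; the numbers below are illustrative, not from any particular scanner:

```python
def image_size_bytes(header_bytes, rows, cols, pixel_depth_bits, frames=1):
    # Pixel depth is given in bits, so divide by 8 to get bytes per pixel.
    return header_bytes + rows * cols * (pixel_depth_bits // 8) * frames

# e.g. a single 512 x 512 slice with 16-bit pixels and a 1-KB header:
print(image_size_bytes(1024, 512, 512, 16))  # 525312 bytes
```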


Medical Image Formats

There are six predominant formats for radiology images — DICOM (Digital Imaging and Communications in Medicine), NIfTI (Neuroimaging Informatics Technology Initiative), PAR/REC (Philips MRI scanner format), ANALYZE (Mayo Medical Imaging), NRRD (Nearly Raw Raster Data), and MINC.

Medical Image formats as of May 2017

Of these six, DICOM and NIfTI are the most popular.


DICOM Format Basics

DICOM stands for Digital Imaging and Communications in Medicine. It is a standard created by the National Electrical Manufacturers Association (NEMA) that defines how to handle, store, print, and transmit information in medical imaging. These are the files you can expect straight off a scanner or from a hospital PACS (picture archiving and communication system).

It includes a file format and a network communications protocol that uses TCP/IP to communicate between entities that are capable of receiving image and patient data in DICOM format.

A DICOM file consists of a header and the image data in the same file (*.dcm). The size of the header depends on how much information is provided; it contains fields such as the patient ID, patient name, and modality, and it also defines how many frames are contained and at what resolution. Image viewers use this information to display the image. For a single acquisition there will be many DICOM files.


A Python library for reading DICOM files is pydicom; refer to the code sample in part 1 of this article.
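Even without pydicom, DICOM files are easy to recognize: a DICOM Part 10 file begins with a 128-byte preamble followed by the four characters "DICM". A minimal stdlib-only check (the file path is hypothetical):

```python
def is_dicom(path):
    """Return True if the file carries the DICOM Part 10 magic word."""
    with open(path, "rb") as f:
        f.seek(128)                  # skip the 128-byte preamble
        return f.read(4) == b"DICM"  # the magic word right after it
```

This only tests for the magic word; actually decoding the header tags and pixel data is what pydicom does for you.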

An R package for reading DICOM data is "oro.dicom".

Using oro.dicom package to read an Uncompressed DICOM File


NIFTI Format Basics

NIfTI was originally created for neuroimaging: the format was envisioned by the Neuroimaging Informatics Technology Initiative (NIfTI) as a replacement for the ANALYZE 7.5 format. It has its origin in neuroimaging but can be used in other fields as well. A major feature is that the format contains two affine coordinate definitions, which relate each voxel index (i, j, k) to a spatial location (x, y, z).
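To illustrate the affine idea with plain Python (the matrix below is made up, not read from a real file): a 4×4 affine maps a voxel index (i, j, k) to a spatial position (x, y, z), here with 2 mm isotropic voxels and a shifted origin:

```python
def voxel_to_world(affine, i, j, k):
    """Apply a 4x4 affine (nested lists) to a voxel index, in homogeneous coordinates."""
    v = (i, j, k, 1)
    return tuple(sum(affine[r][c] * v[c] for c in range(4)) for r in range(3))

affine = [
    [2.0, 0.0, 0.0, -90.0],   # 2 mm spacing along each axis,
    [0.0, 2.0, 0.0, -126.0],  # origin translated so that voxel
    [0.0, 0.0, 2.0, -72.0],   # (45, 63, 36) lands at (0, 0, 0)
    [0.0, 0.0, 0.0, 1.0],
]
print(voxel_to_world(affine, 45, 63, 36))  # (0.0, 0.0, 0.0)
```

In practice nibabel exposes exactly such a matrix for each NIfTI image, so you never build it by hand.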

A Python library for reading NIfTI files is nibabel; an R package for reading NIfTI data is "oro.nifti".

Differences between DICOM and NIFTI

The main difference between DICOM and NIfTI is that the raw image data in NIfTI is saved as a 3-D image, whereas DICOM stores 2-D image slices. This makes NIfTI preferable to DICOM for some machine learning applications, because the data is modeled as a 3-D image and handling a single NIfTI file is easier than handling several hundred DICOM files: NIfTI stores at most two files per 3-D image, as opposed to dozens in DICOM.


NRRD Format Basics

The flexible NRRD format consists of a single header file and image file(s), which can be separate or combined. An NRRD header accurately represents N-dimensional raster information for scientific visualization and medical image processing. The National Alliance for Medical Image Computing (NA-MIC) has developed a way of using the NRRD format to represent diffusion-weighted image (DWI) volumes and diffusion tensor images (DTI). NRRD DWI and NRRD DTI data can be read into 3D Slicer to visually confirm that the orientation of the tensors is consistent with expected neuroanatomy. [link]

The general format of an NRRD file (with attached header) is:
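As an illustrative sketch (the field values below are made up, not from the NRRD specification verbatim): an NRRD header is plain text — a magic line such as NRRD0004, followed by "field: value" lines, with a blank line separating the header from the raw data. A tiny parser:

```python
HEADER = """NRRD0004
type: short
dimension: 3
sizes: 256 256 150
encoding: raw
"""

def parse_nrrd_header(text):
    lines = text.strip().splitlines()
    magic, fields = lines[0], {}
    for line in lines[1:]:
        if not line or line.startswith("#"):  # skip blank and '#' comment lines
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return magic, fields

magic, fields = parse_nrrd_header(HEADER)
print(magic, fields["sizes"])  # NRRD0004 256 256 150
```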



MINC Format Basics

MINC stands for Medical Imaging NetCDF. Development of the MINC file format started in 1992 at the Montreal Neurological Institute (MNI), and active work continues at McGill's Brain Imaging Centre (BIC). The first version of the MINC format (MINC1) was based on the standard Network Common Data Format (NetCDF).

MINC2 switched from NetCDF to Hierarchical Data Format version 5 (HDF5). HDF5 supports an unlimited variety of data types and is designed for flexible, efficient I/O with high-volume, complex data. These added features help MINC2 work with large and complex datasets.

Below are some comparisons of the headers of these formats, obtained from research papers.

Source: Medical Image Formats, Springer Publication 2014.


Format Conversions


A popular tool for automatically converting DICOM to NIfTI is dcm2nii. A Python library to read and write NIfTI files is nibabel. The Python 2 library "dcmstack" allows a series of DICOM images to be stacked into multi-dimensional arrays; these arrays can be written out as NIfTI files with an optional header extension (the DcmMeta extension) containing a summary of all the metadata from the source DICOM files. A newer library, dicom2nifti, is available for Python 3. I would also recommend that the reader check out the nipy project.


The MINC team at the BIC has developed a tool to convert DICOM images to MINC. The program is written in C, and the GitHub repo is here.


The MINC team at the BIC has also developed a tool, nii2mnc, to convert NIfTI or ANALYZE images to MINC. A list of BIC conversion tools, including nii2mnc, is here.



As we have seen, there are several formats for storing medical images and using them for deep learning. Our goal is to use the format that gives us all the features our convolutional neural network (CNN) needs to predict accurately.

In the next article we will discuss how to segment the lungs from a CT scan using one of the formats.


Bio: Taposh Roy leads the innovation team in Kaiser Permanente's Decision Support group. He works with research, technology, and business leaders to derive insights from data.

Original. Reposted with permission.