使用pydicom和SimpleITK预处理dicom文件_dcm改层厚-程序员宅基地

技术标签： paper

注意：我安装了pydicom之后需要安装gdcm依赖，但我不能成功import gdcm，所以在下面的代码中都可能同时（混合）使用了pydicom和SimpleITK包读取的图像数据来做预处理。

可以成功import gdcm的同学请直接移步参考：https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial

（一）简介与可视化

对于医学CT图像来说，每个人有若干张（切片）.dcm类型的图像存入同一个文件夹中，分别是不同角度/部位的医学成像，可以用mango/小赛之类的看图软件看一下，也可以用matlab可视化：

clear all;
clc;
close all;
file_path='.\images\001\';
img_path_list=dir(strcat(file_path,'*.dcm'));%获取该文件夹中所有dcm格式的图像
img_num=length(img_path_list);%获取图像总数量
imagename=img_path_list(1).name;%图像名
imagemax=dicominfo(strcat(file_path,imagename));  %读取图像
imweigth=imagemax.Rows;     %DICOM水平像素数量
imhigh=imagemax.Columns;    %DICOM垂直像素数量
imagemax1=dicomread(strcat(file_path,imagename));  %读取图像
I=imagemax1;

for j=2:img_num;  %逐一读取图像
    imagename=img_path_list(j).name;%图像名
    imagemax=dicomread(strcat(file_path,imagename));  %读取图像
    j
            I=imfuse(imagemax,I); %投影图像对应点取两幅图像合成值
end
I=rgb2gray(I);
m=max(max(I));
imshow(I,[]); %显示投影图像
imwrite(I,'001.png','png'); %保存投影图像

（二）读取

3D的CT不像2D图像可以直接读取预处理使用，可以借助python中的pydicom或者SimpleITK包来处理dicom。

SimpleITK

首先要将每个人的CT存储为一个大小为N*W*H的3D数组，其中N是这个人的.dcm切片图像数量，W*H是每张图的尺寸。

import SimpleITK as sitk

directorypath = './images/001/'#
def get_img_array(directorypath):
    reader = sitk.ImageSeriesReader()
    reader.MetaDataDictionaryArrayUpdateOn()
    reader.LoadPrivateTagsOn()
    series_IDs = sitk.ImageSeriesReader.GetGDCMSeriesIDs(directorypath)
    series_ID = series_IDs[0]
    dicom_names = reader.GetGDCMSeriesFileNames(directorypath,series_ID)
    reader.SetFileNames(dicom_names)
    image3D = reader.Execute()
    imgArray = sitk.GetArrayFromImage(image3D)
    return image3D, imgArray

如果想要获取某个切片slice_index的详细信息：

slice_index = 1 # for example
key = reader.GetMetaDataKeys(slice_index)
print(key)
for key in reader.GetMetaDataKeys(slice_index):
    value = reader.GetMetaData(slice_index,key)
    print("({0}) = = \"{1}\"".format(key, value))

pydicom

（1）如果使用pydicom读取目录中的多个文件（它们都是DICOM文件），不能直接：

dicom.read_dicomdir(CT_files_dir)

这样会报错：

PermissionError: [Errno 13] Permission denied:

参考：https://github.com/pydicom/pydicom/issues/322

dirname = '/some/path'
files = os.listdir(dirname)
ds_list = [dicom.read_file(os.path.join(dirname, filename)) for filename in files]

（2）如果缺失提示meta information，读取时使用“,force=True”：

ds_list = [dicom.read_file(os.path.join(dirname, filename), force=True) for filename in files]

（3）如果有的图像缺失部分meta info，则无法获取ImagePositionPatient。

此时可以注释掉第二行#slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))：

def load_scan(path):
    slices = [pydicom.read_file(path + '/' + s,force=True) for s in os.listdir(path)]
    #slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
    try:
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except:
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
        s.SliceThickness = slice_thickness     
    return slices

（三）预处理

（1）将dcm图像的值转换为HU值

CT扫描的测量单位是亨斯菲尔德单位(HU)，这是一种辐射密度的测量。CT扫描仪经过仔细校准，以准确地测量这一点。

HU（Hounsfiled Unit）值，反映了组织对X射线吸收程度。以水的吸收程度作为参考，即水的HU=0，衰减系数大于水的为正值，小于水的为负值。并以骨皮质和空气的HU值为上限和下限。

为了站在医生的角度看问题，所以必须将dcm图像的值转换为HU值。

def get_pixels_hu(slices):
    image = np.stack([s.pixel_array for s in slices])
    # Convert to int16 (from sometimes int16), 
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)

    # Set outside-of-scan pixels to 0
    # The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    
    # Convert to Hounsfield units (HU)
    for slice_number in range(len(slices)):
        
        intercept = slices[slice_number].RescaleIntercept
        slope = slices[slice_number].RescaleSlope
        
        if slope != 1:
            image[slice_number] = slope * image[slice_number].astype(np.float64)
            image[slice_number] = image[slice_number].astype(np.int16)
            
        image[slice_number] += np.int16(intercept)
    
    return np.array(image, dtype=np.int16)

因为我安装之后无法成功import gdcm，所以采用迂回战术，即同时使用SimpleITK和pydicom读取的图像sitkarray和slices：

def get_pixels_hu(sitkarray, slices):
    #sys.exit(0)
    print(np.array(slices).shape) #maybe different from 'num'
    [num,w,h] = sitkarray.shape
    #print(sitkarray[0])
    image = np.stack([sitkarray[i] for i in range(num)])
    # Convert to int16 (from sometimes int16), 
    # should be possible as values should always be low enough (<32k)
    image = image.astype(np.int16)
    # Set outside-of-scan pixels to 0
    # The intercept is usually -1024, so air is approximately 0
    image[image == -2000] = 0
    # Convert to Hounsfield units (HU)
    for slice_number in range(num): 
        #print(slices[slice_number].RescaleIntercept,slice_number)  
        intercept = slices[slice_number].RescaleIntercept
        slope = slices[slice_number].RescaleSlope 
        if slope != 1:
            image[slice_number] = slope * image[slice_number].astype(np.float64)
            image[slice_number] = image[slice_number].astype(np.int16)    
        image[slice_number] += np.int16(intercept)
    return np.array(image, dtype=np.int16)

（2）重采样

扫描的像素间距如果是[2.5,0.5,0.5]，这表示切片之间的距离为2.5毫米。对于不同的扫描，这项参数可能是不同的。

处理这一问题的一种常见方法是将整个数据集重新采样到一定的各向同性分辨率。

重采样到一个同构分辨率1mm *1mm *1mm，以消除扫描仪分辨率的差异：

def resample(sitk_img3D, image, scan, new_spacing=[1,1,1]):
    # Determine current pixel spacing
    #spacing = np.array(sitk_img3D.GetSpacing())
    spacing = np.array([scan[0].SliceThickness] + scan[0].PixelSpacing, dtype=np.float32) 
    #print(spacing)
    #spacing = np.array([scan[0].SliceThickness] + scan[0].PixelSpacing, dtype=np.float32)
    resize_factor = spacing / new_spacing
    new_real_shape = image.shape * resize_factor
    new_shape = np.round(new_real_shape)
    real_resize_factor = new_shape / image.shape
    new_spacing = spacing / real_resize_factor
    image = scipy.ndimage.interpolation.zoom(image, real_resize_factor, mode='nearest')
    return image, new_spacing

（3）归一化

两个优点：

1）加快了梯度下降求最优解的速度；

2）有可能提高精度。

MIN_BOUND = -1000.0
MAX_BOUND = 400.0

def normalize(image):
    image = (image - MIN_BOUND) / (MAX_BOUND - MIN_BOUND)
    image[image>1] = 1.
    image[image<0] = 0.
    return image

（4）中心化（零均值化）

意义：数据中心化和标准化在回归分析中是取消由于量纲不同、自身变异或者数值相差较大所引起的误差。
原理：数据标准化：是指数值减去均值，再除以标准差；

数据中心化：是指变量减去它的均值。

目的：通过中心化和标准化处理，得到均值为0，标准差为1的服从标准正态分布的数据。

PIXEL_MEAN = 0.25 # To determine this mean you simply average all images in the whole dataset. If that sounds like a lot of work, we found this to be around 0.25 in the LUNA16 competition.
def zero_center(image):
    image = image - PIXEL_MEAN
    return image

函数调用：

sitk_img3D, sitk_img = get_img_array(directorypath)
dicom_img = load_scan(directorypath)
img_pixels = get_pixels_hu(sitk_img, dicom_img)
pix_resampled, spacing = resample(sitk_img3D, img_pixels, dicom_img, [1,1,1])

参考：

https://www.cnblogs.com/jxblog/p/12010354.html

https://github.com/pydicom/pydicom/issues/322

https://www.jianshu.com/p/f98635abac65

https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial

https://blog.csdn.net/weixin_37536336/article/details/109386431

https://blog.csdn.net/csdnxiekai/article/details/109448128

本文链接：https://blog.csdn.net/MIKASA3/article/details/113063418

原作者删帖不实内容删帖广告或垃圾文章投诉

智能推荐

机器学习模型评分总结（sklearn）_model.score-程序员宅基地

文章浏览阅读1.5w次，点赞10次，收藏129次。文章目录目录模型评估评价指标1.分类评价指标acc、recall、F1、混淆矩阵、分类综合报告1.准确率方式一：accuracy_score方式二：metrics2.召回率3.F1分数4.混淆矩阵5.分类报告6.kappa scoreROC1.ROC计算2.ROC曲线3.具体实例2.回归评价指标3.聚类评价指标1.Adjusted Rand index 调整兰德系数2.Mutual Informa..._model.score

Apache虚拟主机配置mod_jk_apache mod_jk 虚拟-程序员宅基地

文章浏览阅读344次。因工作需要，在Apache上使用，重新学习配置mod_jk1. 分别安装Apache和Tomcat：2. 编辑httpd-vhosts.conf： LoadModule jk_module modules/mod_jk.so #加载mod_jk模块 JkWorkersFile conf/workers.properties #添加worker信息 JkLogFil_apache mod_jk 虚拟

Android ConstraintLayout2.0 过度动画MotionLayout MotionScene3_android onoffsetchanged-程序员宅基地

文章浏览阅读335次。待老夫kotlin大成,扩展:MotionLayout 与 CoordinatorLayout,DrawerLayout,ViewPager 的交互众所周知,MotionLayout 的动画是有完成度的即Progress ,他在0-1之间变化,一.CoordinatorLayout 与AppBarLayout 交互时,其实就是监听 offsetliner 这个偏移量的变化同样..._android onoffsetchanged

【转】多核处理器的工作原理及优缺点_多核处理器怎么工作-程序员宅基地

文章浏览阅读8.3k次，点赞3次，收藏19次。【转】多核处理器的工作原理及优缺点《处理器关于多核概念与区别多核处理器工作原理及优缺点》原文传送门　　摘要：目前关于处理器的单核、双核和多核已经得到了普遍的运用，今天我们主要说说关于多核处理器的一些相关概念，它的工作与那里以及优缺点而展开的分析。1、多核处理器　　多核处理器是指在一枚处理器中集成两个或多个完整的计算引擎（内核），此时处理器能支持系统总线上的多个处理器，由总..._多核处理器怎么工作

个人小结---eclipse/myeclipse配置lombok_eclispe每次运行个新项目都需要重新配置lombok吗-程序员宅基地

文章浏览阅读306次。1. eclipse配置lombok 拷贝lombok.jar到eclipse.ini同级文件夹下，编辑eclipse.ini文件，添加: -javaagent:lombok.jar2. myeclipse配置lombok myeclipse像eclipse配置后，定义对象后，直接访问方法，可能会出现飘红的报错。如果出现报错，可按照以下方式解决。 ..._eclispe每次运行个新项目都需要重新配置lombok吗

【最新实用版】Python批量将pdf文本提取并存储到txt文件中_python批量读取文字并批量保存-程序员宅基地

文章浏览阅读1.2w次，点赞31次，收藏126次。#注意：笔者在2021/11/11当天调试过这个代码是可用的，由于pdfminer版本的更新，网络上大多数的语法没有更新，我也是找了好久的文章才修正了我的代码，仅供学习参考。1、把pdf文件移动到本代码文件的同一个目录下，笔者是在pycharm里面运行的项目，下图中的x1文件夹存储了我需要转换成文本文件的所有pdf文件。然后要在此目录下创建一个存放转换后的txt文件的文件夹，如图中的txt文件夹。2、编写代码（1）导入所需库# coding:utf-8import ..._python批量读取文字并批量保存

随便推点

vite build-程序员宅基地

文章浏览阅读4.3k次。vite在开发阶段采用的是按需加载的方式，不会将所有文件打包。但是生产环境的部署是需要进行打包的，这里它使用的是rollup打包方式。对于代码切割的需求，使用原生动态导入，因此打包后支持新浏览器，对IE的兼容性不是很好，但是可以用对应的polyfill解决。使用esbuild来处理需要pre-undle的在cli.ts的build命令中引入build.ts调用doBuild方法，在这个方法中配置打包参数（input output plugin等）调用buildHtmlPlugin解析文件入口in_vite build