TY - GEN
T1 - Adaptive Region-Oriented Masked Vision Retentive Network for Predicting Macrovascular Invasion in Hepatocellular Carcinoma
AU - Takahashi, Kengo
AU - Inamori, Ryusei
AU - Ichiji, Kei
AU - Zhang, Zhang
AU - Zeng, Yuwen
AU - Homma, Noriyasu
N1 - Publisher Copyright: © 2025 SPIE.
PY - 2025
Y1 - 2025
AB - This study aimed to develop the Adaptive Region-Oriented Masked Vision Retentive Network (AROMA ViR), a model that efficiently learns the morphological structures of the liver, to predict macrovascular invasion (MI) in hepatocellular carcinoma (HCC). Retrospective CT images were obtained from the University of Texas MD Anderson Cancer Center, in accordance with The Cancer Imaging Archive data usage policy and restrictions. The image dataset comprised 51,968 arterial-phase slices from 105 patients with HCC. After applying exclusion criteria, we split the data patient-wise into training, validation, and test datasets in a 6:2:2 ratio. AROMA ViR was designed to enhance the relevant areas in the retention map by incorporating spatial information on the liver parenchyma, tumor, and portal vein. In the retention encoders, the model applied causal masks specialized for the liver shape in each slice image. We compared the proposed model with Residual Network 101, Vision Transformer, and Vision Retentive Network. We calculated the area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR). We also obtained accuracy, sensitivity, specificity, and F1 score at the cutoff determined by Youden’s index. AROMA ViR pretrained on ImageNet achieved an AUC-ROC of 0.860, an AUC-PR of 0.790, an accuracy of 0.853, a sensitivity of 0.719, a specificity of 0.902, and an F1 score of 0.578.
KW - causal mask
KW - computed tomography
KW - convolutional neural network
KW - hepatocellular carcinoma
KW - macrovascular invasion
KW - promising mask
KW - retentive network
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=105004409890&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105004409890&partnerID=8YFLogxK
U2 - 10.1117/12.3045625
DO - 10.1117/12.3045625
M3 - Conference contribution
AN - SCOPUS:105004409890
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2025
A2 - Astley, Susan M.
A2 - Wismuller, Axel
PB - SPIE
T2 - Medical Imaging 2025: Computer-Aided Diagnosis
Y2 - 17 February 2025 through 20 February 2025
ER -