Yuncong Feng, Yeming Cong, Shuaijie Xing, Hairui Wang, Zihang Ren, Xiaoli Zhang
Transformer-based networks are becoming indispensable in medical image segmentation. However, most Transformer-based methods overlook how features at different scales affect encoding efficiency, and they neglect the fusion and further processing of multi-scale features. Because features at different scales are not learned, noise interference is not effectively suppressed. Consequently, these methods cannot clearly distinguish the target region from the surrounding tissue, which degrades segmentation quality. The problem is especially pronounced when multiple regions must be segmented, because the boundaries of the various organs and tissues are poorly identified. To enhance learning diversity, we propose the Global Context Transformer (GCFormer), a medical image segmentation network that combines a Transformer with a CNN. GCFormer adopts a novel multi-scale feature processing mechanism that encodes and decodes features at different scales appropriately, thereby improving segmentation efficiency. A Global Token Generator (GTG) module filters and partitions multi-scale features to extract useful information. The encoder incorporates a Pass Down module to fuse multi-scale information, while the decoder uses a Cat module to efficiently concatenate features of different scales. Experimental results indicate that the proposed method outperforms other mainstream methods in segmentation performance.
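The abstract does not specify how the Cat module aligns and merges features. As a rough illustration only, the following sketch shows one common way to concatenate multi-scale feature maps in a decoder: upsample every map to the finest resolution with nearest-neighbour interpolation, then stack the maps along the channel axis. The function names (`upsample_nearest`, `concat_multiscale`), the (C, H, W) layout, the integer scale factors and the stage sizes are all hypothetical assumptions, not GCFormer's actual implementation.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    # Repeat each row, then each column, `factor` times along the spatial axes.
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)


def concat_multiscale(features: list[np.ndarray]) -> np.ndarray:
    """Hypothetical multi-scale concatenation (an assumption, not the paper's Cat module).

    Every map is brought to the spatial size of the largest (finest) map by
    nearest-neighbour upsampling; all maps are then stacked along the channel axis.
    """
    target_h = max(f.shape[1] for f in features)
    target_w = max(f.shape[2] for f in features)
    aligned = []
    for f in features:
        _, h, w = f.shape
        # This simple sketch only handles equal integer factors in both spatial dims.
        if target_h % h or target_w % w or target_h // h != target_w // w:
            raise ValueError(f"scale of {f.shape} does not divide target {(target_h, target_w)}")
        aligned.append(upsample_nearest(f, target_h // h))
    return np.concatenate(aligned, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical encoder stages at 1/4, 1/8 and 1/16 of a 224x224 input.
    feats = [
        rng.standard_normal((64, 56, 56)),
        rng.standard_normal((128, 28, 28)),
        rng.standard_normal((256, 14, 14)),
    ]
    fused = concat_multiscale(feats)
    print(fused.shape)  # (448, 56, 56)
```

Nearest-neighbour upsampling is used here only because it needs no dependencies beyond NumPy. A practical decoder might instead use bilinear interpolation or learned transposed convolutions, and would typically follow the concatenation with a convolution that mixes the stacked channels.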