Attention-Enhanced Swin Transformers for Robust Brain Tumor Classification Under Patient-Level Data Splitting

Abstract

Contemporary brain tumor classification systems report accuracies exceeding 98%, yet such figures are often inflated by flawed dataset partitioning. Conventional image-level splitting allows scans from the same patient to appear in both the training and test sets, so networks can memorize patient-specific features rather than learn tumor patterns. We enforce patient-level separation, in which all scans from a given patient are confined to a single partition. Evaluating five architectures under patient-level splitting reveals accuracy degradations of up to 3.71% relative to image-level splitting, indicating that data leakage is widespread. We propose an Attention-Enhanced Swin Transformer that integrates hierarchical windowed attention with Convolutional Block Attention Modules (CBAM). Our architecture achieves 96.82% accuracy with a degradation of only 1.23%, the smallest gap among the evaluated methods. Gradient-weighted class activation mapping (Grad-CAM) confirms that the model attends to tumor regions rather than anatomical artifacts, supporting reliable feature extraction for trustworthy medical AI.
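The patient-level separation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(patient_id, image_path)` record layout, the 20% test fraction, and the toy data are all assumptions chosen for illustration. The idea is to shuffle and partition patient identifiers rather than individual images, so that no patient contributes scans to both partitions.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) records so each patient is in one partition.

    Hypothetical helper: the record layout and default fraction are assumptions.
    """
    # Group images by patient so that whole patients are assigned, not images.
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patient IDs deterministically, then reserve a fraction for testing.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [(p, img) for p, img in records if p not in test_ids]
    test = [(p, img) for p, img in records if p in test_ids]
    return train, test


def shared_patients(train, test):
    """Return the patient IDs present in both partitions (empty means no leakage)."""
    return {p for p, _ in train} & {p for p, _ in test}


# Toy data (hypothetical): 10 patients with 3 scans each.
records = [(f"patient_{i // 3}", f"scan_{i}.png") for i in range(30)]
train, test = patient_level_split(records)
leaked = shared_patients(train, test)
```

By construction `leaked` is empty here, whereas shuffling the 30 images directly would typically place scans from one patient on both sides of the split. The same `shared_patients` check can be run on any existing split to audit it for leakage.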