Deep Delta Vision Mamba: A Lightweight State Space Architecture with Deep Delta Learning for Efficient Remote Sensing

Abstract

Real-time land cover classification on autonomous satellites requires models that are accurate, lightweight, and computationally efficient under strict hardware constraints. Vision Transformers and large convolutional neural networks achieve state-of-the-art benchmark results, but the quadratic cost of self-attention in the former and the large parameter counts of both make them ill-suited to edge deployment. We propose Deep Delta Vision Mamba (DDV-Mamba), a hierarchical model built on two principled components. First, we extend the Deep Delta operator from the one-dimensional setting to two-dimensional feature maps: each DDV block selectively erases redundant spectral information along a learned projection direction and writes discriminative information through an SSM-gated pathway. Second, an SSM-inspired gated aggregation module replaces self-attention with depthwise convolution and channel-wise gating, capturing global context at linear rather than quadratic complexity. Evaluated on EuroSAT, DDV-Mamba reaches 96.95% accuracy with 5.08M parameters at 510 frames per second.
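The two components named in the abstract can be illustrated with a small plain-Python sketch. It is not the DDV-Mamba implementation and makes several assumptions. First, it assumes the Deep Delta operator takes the rank-1 form x' = x + β k (v − k·x), applied to each pixel's channel vector x with a unit-norm direction k. Second, the gated write value v = sigmoid(w_gate·x)·(w_val·x) and the parameterisation β = 2·sigmoid(w_beta·x) are hypothetical stand-ins for the SSM-gated pathway. Third, the mean-pooled sigmoid channel gate is one hypothetical choice of channel-wise gating. All function and parameter names are illustrative.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(k):
    # Scale a nonzero direction to unit length.
    n = math.sqrt(dot(k, k))
    return [x / n for x in k]

def delta_update(x, k, beta, v):
    """Rank-1 delta update of one channel vector x (assumed form).

    Erases a beta-scaled share of x's component along unit direction k and
    writes the scalar v along k: x' = x + beta * k * (v - k.x).
    """
    coeff = beta * (v - dot(k, x))
    return [xi + coeff * ki for xi, ki in zip(x, k)]

def ddv_delta_2d(fmap, k_raw, w_val, w_gate, w_beta):
    """Apply the delta update at every pixel of a C x H x W feature map.

    Hypothetical parameterisation: v = sigmoid(w_gate.x) * (w_val.x) stands in
    for the SSM-gated write pathway; beta = 2 * sigmoid(w_beta.x) lies in (0, 2).
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = normalize(k_raw)
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            x = [fmap[c][i][j] for c in range(C)]
            v = sigmoid(dot(w_gate, x)) * dot(w_val, x)
            beta = 2.0 * sigmoid(dot(w_beta, x))
            x_new = delta_update(x, k, beta, v)
            for c in range(C):
                out[c][i][j] = x_new[c]
    return out

def gated_aggregation(fmap, kernels):
    """Depthwise 3x3 convolution (zero padding), then a channel-wise sigmoid
    gate computed from each channel's global mean (hypothetical gate choice).
    Work per channel is proportional to H * W, i.e. linear in the pixel count.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for c in range(C):
        conv = [[0.0] * W for _ in range(H)]
        for i in range(H):
            for j in range(W):
                s = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            s += kernels[c][di + 1][dj + 1] * fmap[c][ii][jj]
                conv[i][j] = s
        gate = sigmoid(sum(map(sum, fmap[c])) / (H * W))
        out.append([[gate * val for val in row] for row in conv])
    return out

# Usage on a toy 2-channel, 2x2 feature map with an identity depthwise kernel.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [1.0, -1.0]]]
y = ddv_delta_2d(fmap, k_raw=[1.0, 1.0], w_val=[0.3, -0.2],
                 w_gate=[0.1, 0.1], w_beta=[0.0, 0.0])
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
z = gated_aggregation(y, kernels=[identity, identity])
```

Under this assumed form, β = 1 replaces the component of x along k with v exactly, and β = 2 with v = 0 reflects x across the hyperplane orthogonal to k. Components orthogonal to k are never modified, which is what makes the erase selective. The aggregation step touches each pixel a constant number of times per channel, so its cost scales with H·W rather than with (H·W)² as pairwise self-attention does.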

Key Methodologies & Contributions

Code & Resources