NeurIPS

Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies

Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce …

Unleashing Multispectral Video's Potential in Semantic Segmentation: A Semi-supervised Viewpoint and New UAV-View Benchmark

DVSOD: RGB-D Video Salient Object Detection

Joint Semantic Mining for Weakly Supervised RGB-D Salient Object Detection

Training saliency detection models with weak supervisions, e.g., image-level tags or captions, is appealing as it removes the costly demand of per-pixel annotations. Despite the rapid progress of RGB-D saliency detection in fully-supervised setting, …