We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. …
Recently, Meta AI Research approaches a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B). Without a doubt, the emergence of SAM will yield significant benefits for a wide array of …
Robust and reliable semantic segmentation in complex scenes is crucial for many real-life applications such as autonomous safe driving and nighttime rescue. In most approaches, it is typical to make use of RGB images as input. They however work well …
This paper systematically addresses the depth-related side effects via the designed calibration strategy towards boosting saliency detection accuracy.
This paper proposes a principled research investigation on exploiting the rich agreement information among multiple raters for improving the calibrated performance.
Isn't it about time to help judges with the challenging task of evaluating athletes' performances in sports with extreme poses? To tackle this problem and inspired by human judges' grading schema, we propose a virtual refereeing network to evaluate …
A hierarchical recurrent network structure is developed to simultaneously encodes local contexts of individual frames and global contexts of the sequence.