CVPR | Vision and Learning Lab @ UAlberta

BOOTPLACE: Bootstrapped Object Placement with Detection Transformers

In this paper, we tackle the copy-paste image-to-image composition problem with a focus on object placement learning. Prior methods have leveraged generative models to reduce the reliance on dense supervision. However, this often limits their …

CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design

We present a novel approach for indoor scene synthesis, which learns to arrange decomposed cuboid primitives to represent 3D objects within a scene. Unlike conventional methods that use bounding boxes to determine the placement and scale of 3D …

MoMask: Generative Masked Modeling of 3D Human Motions

We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. …

Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications

Recently, Meta AI Research released a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B). Without a doubt, the emergence of SAM will yield significant benefits for a wide array of …

Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline

Robust and reliable semantic segmentation in complex scenes is crucial for many real-life applications such as safe autonomous driving and nighttime rescue. Most approaches use RGB images as input. They, however, work well …

Calibrated RGB-D Salient Object Detection

This paper systematically addresses depth-related side effects through a dedicated calibration strategy to boost saliency detection accuracy.

Learning Calibrated Medical Image Segmentation via Multi-rater Agreement Modeling

This paper presents a principled investigation into exploiting the rich agreement information among multiple raters to improve calibrated performance.

FALCONS: FAst Learner-grader for CONtorted poses in Sports

Isn't it about time to help judges with the challenging task of evaluating athletes' performances in sports with extreme poses? To tackle this problem and inspired by human judges' grading schema, we propose a virtual refereeing network to evaluate …

Towards Natural and Accurate Future Motion Prediction of Humans and Animals

A hierarchical recurrent network structure is developed to simultaneously encode the local contexts of individual frames and the global context of the sequence.