EgoAvatar, Whole-Body 3D Gaussian Avatar, and GaussianBody
Multi-Person Text-to-Motion Synthesis
Autoregressive Image Generation without Vector Quantization
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
IMAGEBIND: One Embedding Space To Bind Them All
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Achieving Human Level Competitive Robot Table Tennis
Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds
Improving Semantic Correspondence with Viewpoint Guided Spherical Maps
Animate Anyone & MusePose
Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Language Model Beats Diffusion – Tokenizer is Key to Visual Generation
Learning Physically Simulated Tennis Skills from Broadcast Videos
SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation
Introduction to Sora
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Reasoning with Foundation Models
SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape
DiffMesh: A Motion-aware Diffusion-like Framework for Human Mesh Recovery from Videos
Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset
When it comes to HUGE
Masked Autoencoders Are Scalable Vision Learners
Non-local Neural Networks, CVPR 2018
Transform and Tell: Entity-Aware News Image Captioning, CVPR 2020
DGPose: Deep Generative Models for Human Body Analysis, IJCV 2020
Learning to Estimate 3D Human Pose and Shape from a Single Color Image, CVPR 2018
Video Object Segmentation with Episodic Graph Memory Networks, ECCV 2020
MonoPerfCap: Human Performance Capture from Monocular Video, TOG 2018 & LiveCap: Real-time Human Performance Capture from Monocular Video, TOG 2019
Listen to Look: Action Recognition by Previewing Audio, CVPR 2020
EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera, CVPR 2020
DeepCap: Monocular Human Performance Capture Using Weak Supervision, CVPR 2020
Long-term Human Motion Prediction with Scene Context, ECCV 2020
Deep Multi-View Learning via Task-Optimal CCA, ICLR 2020
Learning from Demonstration in the Wild, ICRA 2019
Structured Prediction Helps 3D Human Motion Modelling, ICCV 2019
Generative Adversarial Minority Oversampling, ICCV 2019
Tracking by Instance Detection: A Meta-Learning Approach, CVPR 2020
RigNet: Neural Rigging for Articulated Characters, SIGGRAPH 2020
D3S – A Discriminative Single Shot Segmentation Tracker, CVPR 2020
4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras, CVPR 2020
Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals, CVPR 2020
Momentum Contrast for Unsupervised Visual Representation Learning, CVPR 2020
VIBE: Video Inference for Human Body Pose and Shape Estimation, CVPR 2020
Timeception for Complex Action Recognition, CVPR 2019