CV

Updated on 2025.07.21

Basics

Name Woojin Cho
Label Augmented Reality / Computer Vision Researcher
Email woojin.cho@kaist.ac.kr
Url https://UVR-WJCHO.github.io
Summary I am a researcher with a background in mechanical engineering and a Ph.D. in Culture Technology from KAIST, specializing in augmented reality, computer vision, and machine learning. My work focuses on natural hand-object interactions, leveraging advanced tracking, optimization, and datasets to enhance immersive computing and real-world applications.

Work

  • 2025.03 - Present
    KAIST Post Metaverse Research Center. Post-doctoral Researcher
    Computer vision and machine learning research for the metaverse: hand-object interaction and 3D avatar interaction.
    • Mixed Reality
    • Real-time Computer Vision
    • 3D Avatar Interaction
  • 2019.08 - 2020.02
    CMU Intensive Program in Artificial Intelligence
    Short-term project management; attended CMU lectures on computer vision and artificial intelligence.
    • Computer Vision
    • Artificial Intelligence
  • 2017.03 - 2025.02
    KAIST UVR Lab. Research Assistant
    Multiple projects on augmented and virtual reality, working with computer vision and machine learning: real-time methods, optimization, tracking, dataset acquisition, GANs, GCNs, etc. Hands-on experience with existing head-mounted displays and their practical use.
    • Augmented Reality
    • 3D Tracking
    • Computer Vision
    • Machine Learning

Publications

  • 2024.10.01
    Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics
    European Conference on Computer Vision (ECCV) 2024
    Existing datasets for 3D hand-object interaction are limited in data cardinality, in the variety of interaction scenarios, or in annotation quality. In this work, we present a comprehensive new training dataset for hand-object interaction called HOGraspNet. It is the only real dataset that captures full grasp taxonomies, providing grasp annotations and wide intra-class variations. Using grasp taxonomies as atomic actions, their combinations in space and time can represent complex hand activities around objects. We select 22 rigid objects from the YCB dataset and 8 other compound objects using shape and size taxonomies, ensuring coverage of all hand grasp configurations. The dataset includes diverse hand shapes from 99 participants aged 10 to 74, continuous video frames, and 1.5M sparsely sampled RGB-Depth frames with annotations. It offers labels for 3D hand and object meshes, 3D keypoints, contact maps, and grasp labels. Accurate hand and object 3D meshes are obtained by fitting the hand parametric model (MANO) and the hand implicit function (HALO) to multi-view RGBD frames, with a MoCap system used only for the objects. Note that HALO fitting does not require any parameter tuning, enabling scalability to the dataset's size with accuracy comparable to MANO. We evaluate HOGraspNet on relevant tasks: grasp classification and 3D hand pose estimation. The results show performance variations based on grasp type and object class, indicating the potential importance of the interaction space captured by our dataset. The provided data aims at learning universal shape priors or foundation models for 3D hand-object interaction. Our dataset and code are available at https://hograspnet2024.github.io/.
  • 2024.08.01
    Temporally enhanced graph convolutional network for hand tracking from an egocentric camera
    Virtual Reality
    We propose a robust 3D hand tracking system for diverse hand actions, including hand-object interaction, that takes a single color image and the previous pose prediction as input. We observe that existing methods exploit temporal information deterministically in motion space, failing to address realistically diverse hand motions. Prior methods also paid little attention to the balance between efficiency and robustness, i.e., the trade-off between runtime and accuracy. The Temporally Enhanced Graph Convolutional Network (TE-GCN) uses a two-stage framework to encode temporal information adaptively. The system achieves this balance by adopting an adaptive GCN, which effectively learns the spatial dependencies between hand mesh vertices. Furthermore, the system leverages the previous prediction by estimating its relevance to the current image features through an attention mechanism. The proposed method achieves state-of-the-art balanced performance on challenging benchmarks and demonstrates robust results on various hand motions in real scenes. Moreover, the hand tracking system is integrated into a recent HMD with an off-loading framework, achieving a real-time framerate while maintaining high performance. Our study improves the usability of high-performance hand tracking; the approach generalizes to other algorithms and supports everyday use of HMDs. Our code with the HMD project will be available at https://github.com/UVR-WJCHO/TEGCN_on_Hololens2. (A minimal sketch of the graph-convolution and temporal-attention ideas follows this publication list.)
  • 2023.10.16
    RC-SMPL: Real-time cumulative SMPL-based avatar body generation
    2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)
    We present a novel method for avatar body generation that cumulatively updates the texture and normal map in real time. Multiple images or videos have been broadly adopted to create detailed 3D human models that capture more realistic user identities in both Augmented Reality (AR) and Virtual Reality (VR) environments. However, this approach has a higher spatiotemporal cost because it requires a complex camera setup and extensive computational resources. For lightweight reconstruction of personalized avatar bodies, we design a system that progressively captures texture and normal values using a single RGBD camera to generate the widely accepted 3D parametric body model, SMPL-X. Quantitatively, our system maintains real-time performance while delivering reconstruction quality comparable to the state-of-the-art method. Moreover, user studies reveal the benefits of real-time avatar creation and its applicability in various collaborative scenarios. By enabling the production of high-fidelity avatars at a lower cost, our method provides a more general way to create personalized avatars in AR/VR applications, thereby fostering more expressive self-representation in the metaverse. (A sketch of the cumulative texture update follows this publication list.)
  • 2020.11.09
    Bare-hand depth inpainting for 3D tracking of hand interacting with object
    2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)
    We propose a 3D hand tracking system that uses bare-hand depth inpainting from an RGB-depth image for a hand interacting with an object. The effectiveness of most existing hand-object tracking methods is impeded by insufficient data, which do not cover hand regions occluded by the object, and by their reliance on information inferred by assuming a specific object type. We generate a sufficiently accurate bare-hand depth image of a hand interacting with an object using a conditional generative adversarial network, which is trained with synthesized 2D silhouettes of the object to learn the morphology of the hand. We evaluate the proposed approach using a hierarchical particle filter-based hand tracker and show that applying the bare-hand tracker to a hand-object interaction dataset achieves state-of-the-art performance. Generalizing this work will enable more natural visual-tactile interaction in various wearable augmented reality applications. (A sketch of the conditional-GAN setup follows this publication list.)
  • 2018.10.16
    Tracking an object-grabbing hand using occluded depth reconstruction
    2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)
    We propose a method that is effective in tracking 3D hand poses occluded by a real object. Since existing model-based tracking methods rely only on observed images to estimate hand joints, tracking generally fails when the hand joints are largely invisible. This problem becomes more prevalent when the tracked hand is grabbing an object, as occlusion by the object makes it harder to find a proper correspondence between the hand model and the observation. The proposed method utilizes the occluded part of the hand as additional information for model-based tracking. The occluded depth information is reconstructed according to the geometry of the object, and model-based tracking is performed using particle swarm optimization (PSO). We demonstrate that the reconstructed depth information improves the performance of tracking an object-grabbing hand. (A sketch of the PSO step follows this publication list.)
  • 2017.10.09
    BoostHand: Distance-free object manipulation system with switchable non-linear mapping for augmented reality classrooms
    2017 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)
    In this paper, we propose BoostHand, a freehand, distance-free object-manipulation system that supports simple trigger gestures using Leap Motion. In AR classrooms, both lecturers and students need to use virtual teaching materials without spatial restrictions while handling virtual objects easily, regardless of distance. To provide efficient and accurate handling of AR classroom objects, our system requires only simple, intuitive freehand gestures to control the user's virtual hands in an enlarged control space shared among users. We modified the Go-Go interaction technique [5] by adding simple trigger gestures, and we evaluated its performance against gaze-assisted selection (GaS). Our proposed system enables both lecturers and students to use virtual teaching materials easily from their remote positions. (A sketch of the switchable non-linear mapping follows this publication list.)
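
The sketches below illustrate, under stated assumptions, a few of the mechanisms described in the publications above; they are minimal, hedged examples, not the published implementations. This first one relates to the TE-GCN entry: a graph convolution over hand-mesh vertices with a learnable ("adaptive") adjacency, plus attention that fuses the previous pose prediction with current image features. The layer sizes, fusion scheme, and feature dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of two ideas from the TE-GCN entry:
# an adaptive graph convolution and attention-based temporal fusion.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, num_vertices):
        super().__init__()
        # Learnable adjacency lets the network discover vertex dependencies
        # beyond the fixed mesh topology.
        self.adj = nn.Parameter(torch.eye(num_vertices))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                        # x: (B, V, in_dim)
        x = torch.softmax(self.adj, dim=-1) @ x  # aggregate neighbor features
        return torch.relu(self.proj(x))

class TemporalFusion(nn.Module):
    """Attend from current image features to the previous pose prediction."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, img_feat, prev_pose_feat):  # both (B, V, dim)
        fused, _ = self.attn(query=img_feat, key=prev_pose_feat, value=prev_pose_feat)
        return img_feat + fused                   # residual fusion

# Toy usage: 778 MANO mesh vertices, 64-d per-vertex features.
B, V, D = 2, 778, 64
gcn = AdaptiveGraphConv(D, D, V)
fusion = TemporalFusion(D)
img_feat, prev_feat = torch.randn(B, V, D), torch.randn(B, V, D)
out = gcn(fusion(img_feat, prev_feat))            # (B, V, D) refined features
```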
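
For the RC-SMPL entry, a minimal sketch of a confidence-weighted cumulative texture update: each new RGBD frame contributes only where texels were observed, and the stored map is a running weighted average. The blending rule and confidence handling are assumptions rather than the published implementation.

```python
# Sketch of a cumulative (running, confidence-weighted) texture update.
import numpy as np

def update_texture(texture, weight, frame_texels, frame_confidence):
    """
    texture:          (H, W, 3) accumulated texture map
    weight:           (H, W)    accumulated per-texel confidence
    frame_texels:     (H, W, 3) colors back-projected from the current frame
    frame_confidence: (H, W)    visibility/quality this frame (0 if unseen)
    """
    new_weight = weight + frame_confidence
    blend = np.divide(frame_confidence, np.maximum(new_weight, 1e-8))[..., None]
    texture = (1.0 - blend) * texture + blend * frame_texels
    return texture, new_weight

# Toy usage on a 4x4 texture: later frames only refine observed texels.
tex, w = np.zeros((4, 4, 3)), np.zeros((4, 4))
tex, w = update_texture(tex, w, np.random.rand(4, 4, 3), np.random.rand(4, 4))
```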
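
For the bare-hand depth inpainting entry, a minimal sketch of the conditional-GAN setup: the generator receives the observed (occluded) hand depth together with a 2D object-silhouette mask and regresses the bare-hand depth, trained with an L1 term plus an adversarial term. The tiny networks and the loss weight are illustrative assumptions; the discriminator's own training step is omitted for brevity.

```python
# Sketch of a conditional-GAN generator step for bare-hand depth inpainting.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

# Generator: (occluded depth, object silhouette) -> bare-hand depth
generator = nn.Sequential(conv_block(2, 32), conv_block(32, 32),
                          nn.Conv2d(32, 1, 3, padding=1))
# Discriminator judges (condition, depth) pairs, as in a conditional GAN.
discriminator = nn.Sequential(conv_block(3, 32), nn.Conv2d(32, 1, 3, padding=1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()

def generator_step(occluded_depth, silhouette, bare_hand_depth_gt):
    cond = torch.cat([occluded_depth, silhouette], dim=1)   # (B, 2, H, W)
    fake = generator(cond)
    score = discriminator(torch.cat([cond, fake], dim=1))   # patch scores
    loss = 100.0 * l1(fake, bare_hand_depth_gt) + bce(score, torch.ones_like(score))
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()

# Toy usage with random tensors standing in for real training pairs.
B, H, W = 2, 64, 64
generator_step(torch.rand(B, 1, H, W), torch.rand(B, 1, H, W), torch.rand(B, 1, H, W))
```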
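
For the object-grabbing-hand tracking entry, a minimal sketch of a generic particle swarm optimization (PSO) step over a pose vector. A real tracker would score each particle by comparing a rendered hand against the occlusion-reconstructed depth image; the stand-in objective below is purely illustrative.

```python
# Generic PSO loop: each particle is a candidate pose vector.
import numpy as np

def pso_optimize(score_fn, dim, n_particles=64, iters=30, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(n_particles, dim))     # candidate poses
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_score = np.array([score_fn(p) for p in x])
    gbest = pbest[pbest_score.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        score = np.array([score_fn(p) for p in x])
        improved = score < pbest_score
        pbest[improved], pbest_score[improved] = x[improved], score[improved]
        gbest = pbest[pbest_score.argmin()].copy()
    return gbest

# Stand-in objective: distance to a "true" pose (26 illustrative parameters).
true_pose = np.linspace(-0.5, 0.5, 26)
best = pso_optimize(lambda p: np.sum((p - true_pose) ** 2), dim=26)
```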
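
For the BoostHand entry, a minimal sketch of a Go-Go-style switchable non-linear reach mapping: near the body the virtual hand follows the real hand 1:1; beyond a threshold the reach is amplified non-linearly, and a trigger gesture toggles the boost. The threshold and gain values are illustrative assumptions, not the published parameters.

```python
# Sketch of a switchable Go-Go-style non-linear hand mapping.
import numpy as np

def map_virtual_hand(hand_pos, torso_pos, boost_enabled, threshold=0.35, gain=12.0):
    """hand_pos, torso_pos: 3D positions in meters; returns virtual hand position."""
    offset = hand_pos - torso_pos
    dist = np.linalg.norm(offset)
    if not boost_enabled or dist <= threshold:
        return hand_pos                                   # 1:1 mapping near the body
    direction = offset / dist
    virtual_dist = dist + gain * (dist - threshold) ** 2  # quadratic extension
    return torso_pos + direction * virtual_dist

# Example: with the trigger gesture active, a 0.6 m reach maps to 1.35 m.
print(map_virtual_hand(np.array([0.0, 0.0, 0.6]), np.zeros(3), boost_enabled=True))
```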

Education

  • 2019.03 - 2025.02
    PhD, Culture Technology
    Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  • 2017.03 - 2019.02
    MS, Culture Technology
    Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  • 2013.03 - 2017.02
    BS, Mechanical Engineering
    Korea Advanced Institute of Science and Technology, Daejeon, South Korea

Skills

  • Research Areas: 3D Tracking, Computer Vision, Augmented Reality, Machine Learning, Dataset Acquisition
  • Programming Languages & Tools: Python, C#, C++, MATLAB, Unity

Languages

  • Korean: Native speaker
  • English: Fluent