Yongzhi

Team Leader, Senior Machine Learning Engineer, and Applied Researcher with 13 years of combined experience across top-tier industry and academia. Proven track record of shipping scalable AI, including five high-impact products for TikTok and CapCut that reached billions of users (2022 ByteDance-Style Award). Formerly the Team Lead for Gaming Scene-AIGC at Tencent (2023 Sole Outstanding Performance). Currently leading the development of a world-class, fully AI-driven moderation system for TikTok Live, leveraging MLLMs, SFT, and RL to achieve super-human accuracy for billions of global users.

TikTok Live content moderation using MLLM

Background:
Our team of 12 engineers was tasked with building an automated AI moderation system to fully replace human reviewers within one year. The scope was massive: we needed to cover 90 different content policies. We had to choose between two competing architectures:

  1. OPOM (One Policy One Model): Safe, isolated, but maintenance-heavy.
  2. AIO (All In One): Scalable, efficient, but high technical risk and unproven generalization.

Objective:
The goal was 100% automation before the deadline. Initially, the group voted for OPOM to minimize short-term risk. I had strong reservations because I foresaw scalability issues, but I practiced ‘disagree and commit.’ I aligned with the team’s decision to start with OPOM, while assigning myself to monitor the efficiency metrics.

Development:

  1. Identifying the Bottleneck: While implementing OPOM for the first 2 months, I validated my concerns. The process was incredibly labor-intensive: feature engineering had to be repeated for each policy, so effort scaled linearly with the number of policies and the projected timeline overshot the deadline.
  2. Proposing a Strategic Pilot (The Pivot):
    • I didn’t just argue theoretically. I proposed a hybrid strategy to leadership: Keep 10 engineers on the ‘safe’ OPOM path to ensure coverage for top policies, but allow me to lead a small ‘strike team’ (myself + 1 engineer) to pilot the AIO solution on 20 long-tail policies.
    • This reduced the project risk while allowing me to prove the concept.
  3. Navigating Ambiguity & Perseverance:
    • The first 3 months of AIO were brutal. Progress was slow due to the lack of foundational infrastructure for multi-task learning.
    • Despite the pressure and lack of immediate results, I insisted on building a robust shared architecture rather than quick hacks. I focused on solving the ‘negative transfer’ issues between policies.

Results:

The pipeline of project All-in-one (AIO) AIM

The methodology of project All-in-one (AIO) AIM

Policy decoupling using Multi-head architecture

Policy decoupling using Multi-LoRA architecture
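
As a rough illustration of the multi-LoRA idea named above, the sketch below keeps one shared weight matrix and adds a separate low-rank adapter per policy, so that updating one policy's adapter cannot change the output for another policy. This is not the AIO implementation; the class name `MultiLoRALinear`, the policy names `"spam"` and `"violence"`, and all dimensions are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class MultiLoRALinear:
    """Shared weight W plus one low-rank adapter (B_p, A_p) per policy.

    Output for policy p is W x + B_p (A_p x). Hypothetical sketch only.
    """

    def __init__(self, d_in, d_out, rank, policies, seed=0):
        rng = random.Random(seed)
        # Shared backbone weight, used by every policy.
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Common LoRA initialisation: A small random, B zero,
        # so every adapter starts as a no-op.
        self.A = {p: [[rng.gauss(0, 0.01) for _ in range(d_in)]
                      for _ in range(rank)] for p in policies}
        self.B = {p: [[0.0] * rank for _ in range(d_out)] for p in policies}

    def forward(self, x, policy):
        base = matvec(self.W, x)
        delta = matvec(self.B[policy], matvec(self.A[policy], x))
        return [b + d for b, d in zip(base, delta)]


layer = MultiLoRALinear(d_in=4, d_out=3, rank=2, policies=["spam", "violence"])
x = [1.0, 2.0, 3.0, 4.0]
base = matvec(layer.W, x)

# Simulate an update that touches only the "spam" adapter.
layer.B["spam"][0][0] = 1.0
out_spam = layer.forward(x, "spam")
out_violence = layer.forward(x, "violence")
```

In this toy setup `out_violence` still equals the shared-backbone output, while `out_spam` shifts only through its own adapter. That isolation is what makes per-policy adapters one plausible way to limit negative transfer between policies.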

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User’s Casual Sketches

3D scene generation using panoramic RGBD diffusion models

RGBD diffusion

Dense 3D reconstruction and scene understanding for VR headset

Tencent, Canberra, Australia
Senior researcher in computer vision, Team lead of Mixed Reality (MR) group

Figures: SimpleRecon; structure-aware virtual camera locations; input video, multi-view stereo reconstruction, and 3D scene understanding.

General plane detection

Figure panels: input point cloud; plane detection.

AR (Augmented reality) Cloud

Structure from motion (SFM) of 3D line map

Key contributions:

  1. A novel 3D mapping pipeline.
  2. Multi-view triangulation using Plücker representation.
  3. No Manhattan assumption.
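
For readers unfamiliar with the Plücker representation mentioned in the list above, here is a minimal sketch of how a 3D line is encoded as a direction `d` and a moment `m = p × d`, which always satisfy `d · m = 0`. It illustrates the representation only, not the multi-view triangulation pipeline itself; the function names and the example points are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plucker_from_points(p, q):
    """Plücker coordinates (d, m) of the line through points p and q."""
    d = tuple(qi - pi for pi, qi in zip(p, q))  # direction
    m = cross(p, d)                             # moment, independent of the point chosen on the line
    return d, m


def point_line_distance(x, d, m):
    """Distance from point x to the line (d, m): |x × d - m| / |d|."""
    r = tuple(a - b for a, b in zip(cross(x, d), m))
    return math.sqrt(dot(r, r)) / math.sqrt(dot(d, d))


# Hypothetical example: the line through (0, 0, 1) and (1, 0, 1).
d, m = plucker_from_points((0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
dist_origin = point_line_distance((0.0, 0.0, 0.0), d, m)
dist_on_line = point_line_distance((5.0, 0.0, 1.0), d, m)
```

Because the moment does not depend on which point of the line is used, two observations of the same line can be compared directly, which is one reason this parameterisation is convenient for line mapping.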
Figures: high-precision map for AR; the pipeline of the proposed 3D line mapping approach; 3D point cloud; 3D line cloud; reprojections.

Visual positioning system combining features of point and line

Key contributions:

I proposed a new pose-refinement approach that combines deep features of points and lines. The first contribution is a structure-aware line detector and descriptor network that jointly matches lines and junctions locally. The second is a fused PnPL-based pose estimator that combines line matching, junction matching, and vanishing points. Compared with using points only, localization accuracy (within 1 m) improved from 91% to 96%.

Figures: VPS (Visual Positioning Service); the pipeline of the proposed line-based pose verification and refinement.

SuperPoint

I improved an open-source implementation of SuperPoint so that it achieves performance similar to the official model. The MagicLeap model reaches a recall of 0.42, whereas the pretrained TF_SP and PyTorch_SP models both reach only about 0.145. My changes raised the recall of PyTorch_SP to 0.41.

3D surface detection from a single view

Multiple 3D surfaces are detected from a single view.

Virtual materials are then wrapped onto the detected 3D surfaces.

Intelligent advertisement placement

Scan-to-BIM

FloorDet

CorDet