Jianhao Zheng, Liyuan Zhu, Zihan Zhu, Iro Armeni
Computer Vision and Pattern Recognition (CVPR) 2026
Paper Coming Soon | Video | Project Page
WildPose is a unified monocular camera pose-estimation framework for in-the-wild videos, including dynamic scenes, static scenes, and low-ego-motion sequences. It combines 3D-aware features from a frozen MASt3R backbone with differentiable dense bundle adjustment and learned motion masks for robust trajectory estimation.
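The intuition behind the motion masks is that pixels on moving objects should not drive the pose optimization. A minimal toy sketch of that weighting idea in PyTorch (illustrative only; downweight_dynamic and the tensor shapes are hypothetical, not the WildPose API):

import torch

def downweight_dynamic(confidence, motion_mask):
    # Toy version of the idea: residuals on likely-moving pixels get low
    # weight, so pose optimization relies on the static background.
    return confidence * (1.0 - motion_mask)

confidence = torch.rand(1, 64, 64)   # stand-in for feature-matching confidence
motion_mask = torch.rand(1, 64, 64)  # stand-in for a learned motion mask (1 = dynamic)
weights = downweight_dynamic(confidence, motion_mask)
print(weights.shape)  # torch.Size([1, 64, 64])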
TODO:
- Add run and evaluation scripts for the Sintel dataset.
- Add scripts for custom videos.
- Release training code.
The following setup follows the working notes in log of installation.md. The tested environment uses Python 3.10, PyTorch 2.5.1 with CUDA 12.4 wheels, MASt3R, lietorch, and the local CUDA extension built from setup.py.
- Clone the repository and initialize submodules.
git clone --recursive https://github.com/GradientSpaces/WildPose.git
cd WildPose
git submodule update --init --recursive
- Create and activate the conda environment.
conda create -n wildpose python=3.10 ninja mkl mkl-include -c conda-forge -y
conda activate wildpose
- Install PyTorch and torch-scatter.
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.0+cu124.html
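Optionally, verify that the CUDA-enabled wheels were picked up before compiling anything (a quick sanity check; the exact version string may differ on your machine):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

On this setup it should print something like 2.5.1+cu124 12.4 True.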
- Install local dependencies.
# Optional: set this to match your GPU architectures before compiling CUDA extensions.
export TORCH_CUDA_ARCH_LIST="7.5;8.6;8.9;9.0"
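# Not sure which values to list? You can query the compute capability of the
# visible GPU first (assumes the CUDA-enabled torch install from the step above):
# python -c "import torch; print(torch.cuda.get_device_capability())"  # e.g. (8, 6) -> "8.6"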
pip install --no-build-isolation thirdparty/lietorch
pip install --no-build-isolation -e thirdparty/mast3r
- Build WildPose's CUDA backend and install Python requirements.
pip install --no-build-isolation .
pip install -r requirements.txt
- Check the installation.
python - <<'PY'
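# Each import below checks one installed component; droid_backends is the
# local CUDA extension built from setup.py by the pip install step above.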
import torch
import lietorch
import mast3r
import droid_backends
print("CUDA available:", torch.cuda.is_available())
PY
Create a pretrained/ directory and download both checkpoints from the WildPose Hugging Face repository:
mkdir -p pretrained/
wget https://huggingface.co/gradient-spaces/WildPose/resolve/main/wildpose_v0.pth -P pretrained/
wget https://huggingface.co/gradient-spaces/WildPose/resolve/main/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P pretrained/
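Optionally, confirm that both checkpoints load (a minimal check assuming they are standard torch.save archives; their exact contents are not documented here):

python - <<'PY'
import torch
# Load on CPU; no GPU is needed for this check.
for path in ["pretrained/wildpose_v0.pth",
             "pretrained/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth"]:
    ckpt = torch.load(path, map_location="cpu")
    size = len(ckpt) if isinstance(ckpt, dict) else "?"
    print(path, "->", type(ckpt).__name__, "with", size, "top-level entries")
PY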
Download the dynamic benchmark datasets:
bash scripts_downloading/download_dynamic_all.sh
This downloads the Wild-SLAM Mocap, Bonn Dynamic, and TUM RGB-D dynamic sequences used by scripts_run/run_dynamic_all.sh into:
datasets/
Wild_SLAM_Mocap/
Bonn/
TUM_RGBD/
Run all dynamic benchmarks:
bash scripts_run/run_dynamic_all.sh
You can also run one benchmark group at a time:
bash scripts_run/run_dynamic_all.sh wild_slam_mocap
bash scripts_run/run_dynamic_all.sh bonn
bash scripts_run/run_dynamic_all.sh tum_rgbd
Results are saved under the output/ directory. Each benchmark has one folder, and each scene/sequence has its own run folder:
output/
<benchmark>/
<scene>/
cfg.yaml
video.npz
traj/
est_poses_full.txt
metrics_full_traj.txt
metrics_kf_traj.txt
full_traj_2d.png
kf_traj_2d.png
before_final_ba/
metrics_kf_traj.txt
kf_traj_2d.png
For example, a Wild-SLAM Mocap run writes to output/Wild_SLAM_Mocap/crowd/, while a 7-Scenes run writes to output/7scenes/chess/. The estimated full trajectory is stored in TUM format at traj/est_poses_full.txt, and the main pose metrics are summarized in traj/metrics_full_traj.txt.
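Because est_poses_full.txt is in TUM trajectory format, each line is one pose of the form timestamp tx ty tz qx qy qz qw, so it can be parsed with a few lines of NumPy (a small usage sketch; the path is the Wild-SLAM Mocap example above):

import numpy as np

# TUM trajectory format: timestamp tx ty tz qx qy qz qw per line; '#' starts a comment.
traj = np.loadtxt("output/Wild_SLAM_Mocap/crowd/traj/est_poses_full.txt")
timestamps = traj[:, 0]
positions = traj[:, 1:4]    # translation (tx, ty, tz)
quaternions = traj[:, 4:8]  # orientation quaternion (qx, qy, qz, qw)
print(len(traj), "poses; first position:", positions[0])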
Download the static benchmark datasets:
bash scripts_downloading/download_static_all.sh
This downloads the 7-Scenes and TUM RGB-D static benchmark sequences used by scripts_run/run_static_all.sh into:
datasets/
7-scenes/
TUM_RGBD/
Run all static benchmarks:
bash scripts_run/run_static_all.sh
You can also run one benchmark group at a time:
bash scripts_run/run_static_all.sh seven_scenes
bash scripts_run/run_static_all.sh tum_ablation
This repository is initialized from WildGS-SLAM, and the README structure follows its release style. We thank the WildGS-SLAM authors for releasing their codebase.
WildPose also builds on ideas and components from DROID-SLAM, lietorch, MASt3R/DUSt3R, GO-SLAM/GlORIE-SLAM-style factor graph optimization, MoGe, and standard RGB-D trajectory evaluation tools. We thank the authors of these projects for making their work publicly available.
This work is supported by the Center for Integrated Facility Engineering (CIFE) and the Stanford Robotics Center (SRC). We also thank Stanford's Marlowe and Sherlock clusters for providing GPU computing resources for model training and evaluation.
If you find this code or paper useful, please cite:
@inproceedings{Zheng2026WildPose,
author = {Zheng, Jianhao and Zhu, Liyuan and Zhu, Zihan and Armeni, Iro},
title = {WildPose: A Unified Framework for Robust Pose Estimation in the Wild},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}
For questions, comments, and bug reports, please contact Jianhao Zheng.
