This example visualizes results from the paper "Depth Pro: Sharp Monocular Metric Depth in Less Than a Second" (arXiv). It runs inference on each frame of the provided video and logs the predicted depth map to Rerun.
Depth Pro is a fast, zero-shot monocular depth estimation model developed by Apple. It produces highly detailed and sharp depth maps at 2.25 megapixels in just 0.3 seconds on a standard GPU. The model uses a multi-scale vision transformer architecture that captures both global context and fine-grained detail, enabling it to accurately predict metric depth without requiring camera intrinsics such as focal length or principal point. Additionally, the model can estimate the focal length of the camera used to capture the image, which is also visualized in this example.
This example uses the open-source code and model weights provided by the authors.
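Below is a minimal sketch of the per-frame inference and logging loop, assuming the `depth_pro` package from the authors' ml-depth-pro repository and the Rerun Python SDK (`rerun-sdk`). The input video path and the Rerun entity paths are illustrative, not taken from the example itself.

```python
import cv2
import depth_pro
import rerun as rr

rr.init("depth_pro_video", spawn=True)

# Load the Depth Pro model and its preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

capture = cv2.VideoCapture("input_video.mp4")  # hypothetical input path
frame_idx = 0
while True:
    ok, bgr = capture.read()
    if not ok:
        break
    rr.set_time_sequence("frame", frame_idx)

    # OpenCV decodes frames as BGR; the model and Rerun expect RGB.
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    rr.log("camera/image", rr.Image(rgb))

    # Zero-shot metric depth estimation; no camera intrinsics are required.
    prediction = model.infer(transform(rgb))
    depth_m = prediction["depth"].detach().cpu().numpy()  # metric depth in meters
    focal_px = float(prediction["focallength_px"])        # estimated focal length in pixels

    # Log the estimated intrinsics and the depth map (values already in meters).
    rr.log("camera", rr.Pinhole(focal_length=focal_px, width=rgb.shape[1], height=rgb.shape[0]))
    rr.log("camera/depth", rr.DepthImage(depth_m, meter=1.0))

    frame_idx += 1

capture.release()
```

Logging the predicted focal length as a `Pinhole` archetype lets the Rerun viewer back-project the depth image into a 3D point cloud, which is how the estimated intrinsics become visible in the visualization.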
This is an external example. Check the repository for more information.
You can try the example on a Hugging Face Space here.