AFUN predictions across diverse scenes. Pick a scene below to see AFUN's predictions for every language query in that scene, side by side. Points inside the predicted affordance mask are highlighted in red, and the trajectory is colored from yellow (contact) to blue (end). Drag to orbit; scroll to zoom.
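The coloring scheme described above could be reproduced roughly as follows. This is a minimal sketch, not the viewer's actual code: the function names `color_points` and `color_trajectory` are hypothetical, the gray base color for points outside the mask is an assumption, and the yellow-to-blue gradient is assumed to be a linear interpolation in RGB, which the page does not specify.

```python
import numpy as np

# RGB colors in [0, 1]. GRAY is an assumed base color for unmasked points.
RED = np.array([1.0, 0.0, 0.0])
GRAY = np.array([0.6, 0.6, 0.6])
YELLOW = np.array([1.0, 1.0, 0.0])
BLUE = np.array([0.0, 0.0, 1.0])


def color_points(mask, base=GRAY, highlight=RED):
    """Return an (N, 3) color array: `highlight` where mask is True, else `base`."""
    mask = np.asarray(mask, dtype=bool)
    colors = np.tile(base, (mask.shape[0], 1))
    colors[mask] = highlight
    return colors


def color_trajectory(num_waypoints, start=YELLOW, end=BLUE):
    """Return (num_waypoints, 3) colors blended linearly from `start` to `end`."""
    t = np.linspace(0.0, 1.0, num_waypoints)[:, None]
    return (1.0 - t) * start + t * end
```

Calling `color_points(mask)` on a per-point boolean affordance mask and `color_trajectory(len(waypoints))` on a predicted trajectory gives per-point and per-waypoint colors that can be passed to any point-cloud viewer.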
Without any robot-specific finetuning, AFUN predicts a precise functional mask and 3D motion that the robot uses to plan and execute manipulation in the real world. The same model generalizes across object categories, language instructions, and embodiments, suggesting a practical path toward open-world affordance models that unify functionality perception with executable action.
@misc{wang2026afun,
title = {{AFUN}: Towards an Affordance Foundation Model for Functionality Understanding},
author = {Wang, Zhaoning and Zhong, Yi and Fu, Jiawei and Christensen, Henrik I. and Gao, Jun},
year = {2026},
eprint = {2606.02551},
archivePrefix = {arXiv},
primaryClass = {cs.RO},
url = {https://arxiv.org/abs/2606.02551}
}