1 National University of Singapore 2 NUS Guangzhou Research Translation and Innovation Institute
3 Nanjing University 4 Peking University 5 Shanghai Jiao Tong University
* denotes equal contribution † corresponding author
Garment folding is a common yet challenging task in robotic manipulation. The deformability of garments leads to a vast state space and complex dynamics and complicates precise fine-grained manipulation. Previous approaches often rely on predefined key points or demonstrations, constraining their generalizability across diverse garment categories. This paper presents a framework, MetaFold, that disentangles task planning from action prediction, learning each independently to enhance model generalization. It employs language-guided point cloud trajectory generation for task planning and a low-level foundation model for action prediction. This structure facilitates multi-category learning, enabling the model to adapt flexibly to various user instructions and folding tasks. Experimental results demonstrate our proposed framework's superiority.
@misc{chen2025metafoldlanguageguidedmulticategorygarment,
title={MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model},
author={Haonan Chen and Junxiao Li and Ruihai Wu and Yiwei Liu and Yiwen Hou and Zhixuan Xu and Jingxiang Guo and Chongkai Gao and Zhenyu Wei and Shensi Xu and Jiaqi Huang and Lin Shao},
year={2025},
eprint={2503.08372},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2503.08372},
}