MetaFold

MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model

¹ National University of Singapore ² NUS Guangzhou Research Translation and Innovation Institute
³ Nanjing University ⁴ Peking University ⁵ Shanghai Jiao Tong University
* denotes equal contribution; † denotes corresponding author

Abstract

Garment folding is a common yet challenging task in robotic manipulation. Because garments are deformable, they have a vast state space and complex dynamics, which makes precise, fine-grained manipulation difficult. Previous approaches often rely on predefined key points or demonstrations, which limits how well they generalize across diverse garment categories. This paper presents MetaFold, a framework that disentangles task planning from action prediction and learns each independently to improve generalization. It uses language-guided point cloud trajectory generation for task planning and a low-level foundation model for action prediction. This structure supports multi-category learning and lets the model adapt flexibly to varied user instructions and folding tasks. Experimental results demonstrate our proposed framework's superiority.

Pipeline Overview

[Figure: Overview of the MetaFold framework]

**Overview**: Garment folding trajectory data is generated with heuristic methods in the DiffClothAI simulation environment and then annotated with language descriptions **(Green)**. The trajectory generation model takes a point cloud from any given frame and the corresponding language description as inputs and generates the subsequent trajectory **(Orange)**. The generated trajectory is fed into the ManiFoundation model, which estimates contact points and force directions so that the robot can carry out the folding actions. This process is refined iteratively through a feedback loop **(Blue)**.
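The closed loop described in the overview can be illustrated with a minimal Python sketch. It is not the MetaFold or ManiFoundation code. Every name in it (`make_toy_generator`, `predict_action`, `toy_execute`, `run_closed_loop`) is hypothetical, and each learned component is replaced by a hand-written toy rule: the "trajectory generator" interpolates toward a hard-coded goal and ignores the instruction, the "action model" pushes the point that has to move farthest, and the "environment" moves only that point by a fixed step. The sketch only shows how planning and action prediction stay separate while the loop re-plans from each new observation.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Cloud = List[Point]
Action = Tuple[Point, Point]  # (contact point, unit force direction)


def make_toy_generator(goal: Cloud, horizon: int = 3):
    """Toy stand-in for a language-conditioned trajectory generator (assumption)."""
    def generate(cloud: Cloud, instruction: str) -> List[Cloud]:
        # The instruction is accepted but ignored here; a learned model would
        # infer the goal from it. We interpolate toward a hard-coded goal.
        frames = []
        for k in range(1, horizon + 1):
            t = k / horizon
            frames.append([tuple(a + t * (b - a) for a, b in zip(p, g))
                           for p, g in zip(cloud, goal)])
        return frames
    return generate


def predict_action(current: Cloud, target: Cloud) -> Optional[Action]:
    """Toy stand-in for the action model: choose the point that must move
    farthest and push it along its displacement direction."""
    best_i, best_d = 0, 0.0
    for i, (p, q) in enumerate(zip(current, target)):
        d = math.dist(p, q)
        if d > best_d:
            best_i, best_d = i, d
    if best_d < 1e-9:
        return None  # current frame already matches the target frame
    p, q = current[best_i], target[best_i]
    direction = tuple((b - a) / best_d for a, b in zip(p, q))
    return p, direction


def toy_execute(cloud: Cloud, action: Action, step: float = 0.25) -> Cloud:
    """Toy environment: move the point nearest the contact by a fixed step."""
    contact, direction = action
    i = min(range(len(cloud)), key=lambda k: math.dist(cloud[k], contact))
    moved = tuple(c + step * d for c, d in zip(cloud[i], direction))
    return cloud[:i] + [moved] + cloud[i + 1:]


def run_closed_loop(cloud, instruction, generate, predict, execute,
                    tol: float = 0.15, max_steps: int = 50):
    """Re-plan from the latest observation, act once, and repeat."""
    for step in range(max_steps):
        frames = generate(cloud, instruction)       # task planning
        remaining = max(math.dist(p, q) for p, q in zip(cloud, frames[-1]))
        if remaining < tol:
            return cloud, step
        action = predict(cloud, frames[0])          # action prediction
        if action is None:
            return cloud, step
        cloud = execute(cloud, action)              # feedback: new observation
    return cloud, max_steps


# Four points on a line; the toy goal mirrors the left half onto the right half.
start = [(-1.0, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
goal = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
final, steps = run_closed_loop(start, "fold the left half onto the right half",
                               make_toy_generator(goal), predict_action, toy_execute)
print(steps, final)
```

The tolerance in `run_closed_loop` is deliberately larger than half the environment's step size, so the toy loop cannot oscillate around the goal. Because the planner and the action predictor are passed in as separate callables, either stand-in can be replaced without touching the loop.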

BibTeX

@misc{chen2025metafoldlanguageguidedmulticategorygarment,
  title={MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model},
  author={Haonan Chen and Junxiao Li and Ruihai Wu and Yiwei Liu and Yiwen Hou and Zhixuan Xu and Jingxiang Guo and Chongkai Gao and Zhenyu Wei and Shensi Xu and Jiaqi Huang and Lin Shao},
  year={2025},
  eprint={2503.08372},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2503.08372},
}