We introduce Mesh Silksong, a compact and efficient mesh representation tailored to generating polygon meshes in an auto-regressive manner akin to weaving silk. Existing mesh tokenization methods always produce token sequences with repeated vertex tokens, wasting network capacity. In contrast, our approach tokenizes a mesh by visiting each vertex only once, reducing the redundancy of the token sequence by 50% and achieving a state-of-the-art compression rate of approximately 22%. Furthermore, Mesh Silksong produces polygon meshes with superior geometric properties, including manifold topology, watertight detection, and consistent face normals, which are critical for practical applications. Experimental results demonstrate the effectiveness of our approach, showcasing not only intricate mesh generation but also significantly improved geometric integrity.
Meshes generated conditioned on point clouds.
The mesh vertex with order \( i\) in layer \( L\) is denoted as \( \mathcal{V}_i^L\). Given a start half-edge \(\mathcal{V}_1^0\)-\(\mathcal{V}_1^1\), each vertex's layer is obtained directly from its graph distance to \(\mathcal{V}_1^0\) via a BFS traversal. After layering, the order of the vertices in layer \( L+1\) is derived from the local order of the vertices in layer \( L\). The order of the vertices in the first layer follows from the start half-edge, and since the order in layer \( L+1\) can be derived from layer \( L\), the order of all remaining vertices is determined by induction.
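For concreteness, the layering step amounts to a plain BFS over the vertex graph. The sketch below is a minimal illustration of this idea; the function name `assign_layers` and its arguments are our own naming for exposition, not part of any released code.

```python
from collections import deque

def assign_layers(num_vertices, edges, start_vertex):
    """Assign each vertex to a layer by its BFS graph distance to the start vertex.

    `edges` is a list of (u, v) vertex-index pairs and `start_vertex`
    plays the role of V_1^0 of the start half-edge.
    """
    # Build an undirected adjacency list from the edge set.
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    layer = [-1] * num_vertices          # -1 marks unvisited vertices
    layer[start_vertex] = 0
    queue = deque([start_vertex])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if layer[v] == -1:           # first visit fixes the layer
                layer[v] = layer[u] + 1
                queue.append(v)
    return layer                         # layer[i] = graph distance to start_vertex
```

For example, `assign_layers(4, [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)], 0)` returns `[0, 1, 1, 2]`, i.e. the start vertex forms layer 0, its neighbors layer 1, and so on.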
All topological connections can be represented by two types of 0-1 matrices: the self-layer matrix \( \mathcal{S}_L\) and the between-layer matrix \( \mathcal{B}_L\). Suppose layer \( L\) contains \( M\) vertices and layer \( L-1\) contains \( N\) vertices. Then \( \mathcal{S}_L \), with shape \( M \times M\), represents the connections between vertices within layer \( L\), while \( \mathcal{B}_L \), with shape \( M \times N\), represents the connections between vertices in layer \( L\) and vertices in layer \( L-1\). We use two different algorithms to compress these matrices into tokens efficiently.
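Assuming each vertex already carries its layer index and its (0-based) local order within that layer, the two matrix families can be assembled as in the sketch below. `build_layer_matrices` and its arguments are illustrative names, and the subsequent compression of the matrices into tokens is omitted here.

```python
import numpy as np

def build_layer_matrices(edges, layer, order):
    """Build the self-layer matrices S_L and between-layer matrices B_L.

    `layer[i]` is the layer index of vertex i, `order[i]` its local order
    within that layer. Returns dicts mapping layer index L to S_L (M x M)
    and, for L >= 1, B_L (M x N).
    """
    num_layers = max(layer) + 1
    # Count vertices per layer to size the matrices.
    counts = [0] * num_layers
    for L in layer:
        counts[L] += 1

    S = {L: np.zeros((counts[L], counts[L]), dtype=np.uint8) for L in range(num_layers)}
    B = {L: np.zeros((counts[L], counts[L - 1]), dtype=np.uint8) for L in range(1, num_layers)}

    for u, v in edges:
        if layer[u] == layer[v]:                  # connection inside one layer
            S[layer[u]][order[u], order[v]] = 1
            S[layer[u]][order[v], order[u]] = 1
        else:                                     # connection between adjacent layers
            hi, lo = (u, v) if layer[u] > layer[v] else (v, u)
            B[layer[hi]][order[hi], order[lo]] = 1
    return S, B
```

Note that under a BFS layering, two connected vertices differ in layer by at most one, so every edge falls into exactly one of the two cases above.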
For a mesh with \( N_C\) connected components, the algorithm above is executed \( N_C\) times. To obtain the final token representation of a mesh, we use the special tokens "C" and "E" to distinguish different connected components, and the token "U" to separate different layers within the same connected component. Each vertex's coordinate tokens and its corresponding matrix tokens are packed together so that the autoregressive model can learn local patterns more easily.
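The sketch below shows one plausible way such a sequence could be assembled. The exact placement of the "C" and "E" tokens and the helper functions `quantize`, `encode_self`, and `encode_between` are assumptions made for illustration, not the released tokenizer.

```python
def serialize_mesh(components, quantize, encode_self, encode_between):
    """Assemble a token sequence from per-component, per-layer vertex data.

    `components` is a list of connected components, each a list of layers,
    each layer a list of (coords, s_row, b_row) tuples. `quantize` maps a
    vertex coordinate to coordinate tokens; `encode_self` / `encode_between`
    compress a vertex's row of S_L / B_L into tokens.
    """
    tokens = []
    for component in components:
        tokens.append("C")                        # marks a new connected component
        for layer_index, layer in enumerate(component):
            if layer_index > 0:
                tokens.append("U")                # separates layers within a component
            for coords, s_row, b_row in layer:
                # Pack each vertex's coordinate tokens with its matrix tokens
                # so the autoregressive model sees them as one local pattern.
                tokens.extend(quantize(coords))
                tokens.extend(encode_self(s_row))
                tokens.extend(encode_between(b_row))
        tokens.append("E")                        # marks the end of the component
    return tokens
```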
Qualitative comparison of EdgeRunner*, TreeMeshGPT, BPT, DeepMesh, and our method. The * denotes that the method was faithfully trained on the same dataset as ours. The connected components of our generated meshes are individually colored, since our model is connected-component aware through the token "C" and can capture details even for minute connected components.
(1) For the tree-traversal methods EdgeRunner* and TreeMeshGPT, the manifold topology of the generated meshes is guaranteed; however, our method demonstrates more robust generation and produces finer details.
(2) We achieve visual results comparable to the local-patch methods BPT and DeepMesh. However, these methods cannot generate meshes with manifold topology, which hinders practical application. Non-manifold edges are colored red for BPT and DeepMesh.
@misc{song2025meshsilksongautoregressivemesh,
  title         = {Mesh Silksong: Auto-Regressive Mesh Generation as Weaving Silk},
  author        = {Gaochao Song and Zibo Zhao and Haohan Weng and Jingbo Zeng and Rongfei Jia and Shenghua Gao},
  year          = {2025},
  eprint        = {2507.02477},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.02477},
}