We introduce Mesh Silksong, a compact and efficient mesh representation tailored to generating polygon meshes in an auto-regressive manner akin to weaving silk. Existing mesh tokenization methods always produce token sequences with repeated vertex tokens, wasting network capacity. In contrast, our approach tokenizes a mesh by visiting each vertex only once, reducing the redundancy of the token sequence by 50% and achieving a state-of-the-art compression rate of approximately 22%. Furthermore, Mesh Silksong produces polygon meshes with superior geometric properties, including manifold topology, watertight detection, and consistent face normals, which are critical for practical applications. Experimental results demonstrate the effectiveness of our approach, showcasing not only intricate mesh generation but also significantly improved geometric integrity.
Meshes generated conditioned on point clouds.
The mesh vertex with order \( i\) in layer \( L\) is denoted as \( \mathcal{V}_i^L\). Given a start half-edge \(\mathcal{V}_1^0\)-\(\mathcal{V}_1^1\), each vertex's layer is obtained directly from its graph distance to \(\mathcal{V}_1^0\) via a BFS traversal. After layering, the order of the vertices in layer \( L+1\) is derived from the local order of the vertices in layer \( L\). The order of the vertices in the first layer follows from the start half-edge, and since the order in layer \( L+1\) can be derived from layer \( L\), the order of all remaining vertices is determined by induction.
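For concreteness, the layering step amounts to a plain BFS over the vertex graph. The sketch below is a minimal illustration of this idea; the function name `assign_layers` and its arguments are our own naming for exposition, not part of any released code.

```python
from collections import deque

def assign_layers(num_vertices, edges, start_vertex):
    """Assign each vertex to a layer by its BFS graph distance to the start vertex.

    `edges` is a list of (u, v) vertex-index pairs and `start_vertex`
    plays the role of V_1^0 of the start half-edge.
    """
    # Build an undirected adjacency list from the edge set.
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    layer = [-1] * num_vertices          # -1 marks unvisited vertices
    layer[start_vertex] = 0
    queue = deque([start_vertex])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if layer[v] == -1:           # first visit fixes the layer
                layer[v] = layer[u] + 1
                queue.append(v)
    return layer                         # layer[i] = graph distance to start_vertex
```

For example, `assign_layers(4, [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)], 0)` returns `[0, 1, 1, 2]`, i.e. the start vertex forms layer 0, its neighbors layer 1, and so on.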
All topological connections can be represented by two types of 0-1 matrices: the self-layer matrix \( \mathcal{S}_L\) and the between-layer matrix \( \mathcal{B}_L\). Suppose layer \( L\) contains \( M\) vertices and layer \( L-1\) contains \( N\) vertices. Then \( \mathcal{S}_L \), with shape \( M \times M\), represents the connections between vertices within layer \( L\), while \( \mathcal{B}_L \), with shape \( M \times N\), represents the connections between vertices in layer \( L\) and vertices in layer \( L-1\). We use two different algorithms to compress these matrices into tokens efficiently.
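Assuming each vertex already carries its layer index and its (0-based) local order within that layer, the two matrix families can be assembled as in the sketch below. `build_layer_matrices` and its arguments are illustrative names, and the subsequent compression of the matrices into tokens is omitted here.

```python
import numpy as np

def build_layer_matrices(edges, layer, order):
    """Build the self-layer matrices S_L and between-layer matrices B_L.

    `layer[i]` is the layer index of vertex i, `order[i]` its local order
    within that layer. Returns dicts mapping layer index L to S_L (M x M)
    and, for L >= 1, B_L (M x N).
    """
    num_layers = max(layer) + 1
    # Count vertices per layer to size the matrices.
    counts = [0] * num_layers
    for L in layer:
        counts[L] += 1

    S = {L: np.zeros((counts[L], counts[L]), dtype=np.uint8) for L in range(num_layers)}
    B = {L: np.zeros((counts[L], counts[L - 1]), dtype=np.uint8) for L in range(1, num_layers)}

    for u, v in edges:
        if layer[u] == layer[v]:                  # connection inside one layer
            S[layer[u]][order[u], order[v]] = 1
            S[layer[u]][order[v], order[u]] = 1
        else:                                     # connection between adjacent layers
            hi, lo = (u, v) if layer[u] > layer[v] else (v, u)
            B[layer[hi]][order[hi], order[lo]] = 1
    return S, B
```

Note that under a BFS layering, two connected vertices differ in layer by at most one, so every edge falls into exactly one of the two cases above.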
For a mesh with \( N_C\) connected components, the algorithm above is executed \( N_C\) times. To obtain the final token representation of a mesh, we use the special tokens "C" and "E" to distinguish different connected components, and the token "U" to separate different layers within the same connected component. Each vertex's coordinate tokens and its corresponding matrix tokens are packed together so that the autoregressive model can learn local patterns more easily.
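The sketch below shows one plausible way such a sequence could be assembled. The exact placement of the "C" and "E" tokens and the helper functions `quantize`, `encode_self`, and `encode_between` are assumptions made for illustration, not the released tokenizer.

```python
def serialize_mesh(components, quantize, encode_self, encode_between):
    """Assemble a token sequence from per-component, per-layer vertex data.

    `components` is a list of connected components, each a list of layers,
    each layer a list of (coords, s_row, b_row) tuples. `quantize` maps a
    vertex coordinate to coordinate tokens; `encode_self` / `encode_between`
    compress a vertex's row of S_L / B_L into tokens.
    """
    tokens = []
    for component in components:
        tokens.append("C")                        # marks a new connected component
        for layer_index, layer in enumerate(component):
            if layer_index > 0:
                tokens.append("U")                # separates layers within a component
            for coords, s_row, b_row in layer:
                # Pack each vertex's coordinate tokens with its matrix tokens
                # so the autoregressive model sees them as one local pattern.
                tokens.extend(quantize(coords))
                tokens.extend(encode_self(s_row))
                tokens.extend(encode_between(b_row))
        tokens.append("E")                        # marks the end of the component
    return tokens
```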
Qualitative comparison of EdgeRunner*, TreeMeshGPT, BPT, DeepMesh, and our method. The * denotes that the method was faithfully trained on the same dataset as ours. The connected components of our generated meshes are individually colored, since our model is connected-component aware through the token "C" and can capture details even for minute connected components.
(1) For the tree-traversal methods EdgeRunner* and TreeMeshGPT, the manifold topology of the generated meshes is guaranteed; however, our method demonstrates more robust generation and produces finer details.
(2) We achieve visual results comparable to the local-patch methods BPT and DeepMesh. However, these methods cannot generate meshes with manifold topology, which hinders practical application. Non-manifold edges are colored red for BPT and DeepMesh.
@misc{song2025meshsilksongautoregressivemesh,
  title         = {Mesh Silksong: Auto-Regressive Mesh Generation as Weaving Silk},
  author        = {Gaochao Song and Zibo Zhao and Haohan Weng and Jingbo Zeng and Rongfei Jia and Shenghua Gao},
  year          = {2025},
  eprint        = {2507.02477},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2507.02477},
}