TetraDiffusion

TetraDiffusion: Tetrahedral Diffusion Models
for 3D Shape Generation

Nikolai Kalischek ^*,1

Torben Peters ^*,1

Jan Dirk Wegner ²

Konrad Schindler ¹

¹ ETH Zürich ² University of Zürich
^* equal contribution

TetraDiffusion operates on a tetrahedral partitioning of 3D space to enable efficient, high-resolution 3D shape generation. Our model introduces operators for convolution and transpose convolution that act directly on the tetrahedral partition, and seamlessly includes additional attributes such as color. It enables rapid sampling of detailed 3D objects in nearly real-time with unprecedented resolution, works on standard consumer hardware, and delivers superior results.

Abstract

Probabilistic denoising diffusion models (DDMs) have set a new standard for 2D image generation. Extending DDMs for 3D content creation is an active field of research. Here, we propose TetraDiffusion, a diffusion model that operates on a tetrahedral partitioning of 3D space to enable efficient, high-resolution 3D shape generation. Our model introduces operators for convolution and transpose convolution that act directly on the tetrahedral partition, and seamlessly includes additional attributes such as color. Remarkably, TetraDiffusion enables rapid sampling of detailed 3D objects in nearly real-time with unprecedented resolution. It's also adaptable for generating 3D shapes conditioned on 2D images. Compared to existing 3D mesh diffusion techniques, our method is up to 200 times faster in inference speed, works on standard consumer hardware, and delivers superior results.

Method

The project introduces a novel approach to 3D convolution called "Tetra Convolution", designed to work with displacement and signed distance fields in a tetrahedral grid. Unlike regular 3D convolutions that struggle with varying neighborhoods and connectivity, Tetra Convolution uses the tetrahedral A15 lattice structure for collision-free spatial ordering of vertices. This unique method outperforms traditional graph convolutions and eliminates the need for searching points under the kernel as in KPConv. It employs a discrete binning process for local edge orientations to achieve efficient collision-free ordering, resulting in a significantly smaller basis of reference directions compared to 3D voxel convolutions (16 vs 27). The TetraConv layer computes a weighted sum over each vertex and its neighborhood, accounting for varying neighborhood sizes with zero padding. This structure enables flexible down- and upsampling on the tetrahedral grid, essential for convolutional learning. Additionally, the project introduces "Grid Pruning", a technique to efficiently prune the tetrahedral grid, which is typically sparsely populated. This method allows for targeted deletion of unused vertices and connections, leading to optimized memory and speed for training and inference.

Finally, the project presents TetraDiffusion, a process for diffusing features like deformation vectors and color directly in the tetrahedral space. The network integrates standard diffusion model architecture with tetrahedral convolution operators, group normalization, SiLU activations, and attention layers. This approach promises more efficient and effective processing of 3D data, especially in applications involving complex spatial relationships.

	Training	Inference
GET3D [1]	13.3	0.10	11.3	0.83
MeshDiffusion [2]	76.6	0.5	29.2 (22.6^†)	714.3 (526.3^†)
Ours*	12.0	2.8	7.4	3.4
Ours	20.8	1.0	9.7	11.2
Ours_hr*	20.9	1.2	11.7	9.1
Ours_hr	78.2	0.3	42.1	33.3
Memory consumption and computing time of different generative shape models. ^† We implement 16-bit inference in MD for fair comparison. Our methods labelled with a * are pruned versions, highlighting the efficiency boost of our tetrahedral formulation.

Training

Inference

Method

GPU (GB)

Speed (it/s)

GPU (GB)

Speed (s/shape)

GET3D [1]

13.3

0.10

11.3

0.83

MeshDiffusion [2]

76.6

0.5

29.2 (22.6^†)

714.3 (526.3^†)

Ours*

12.0

2.8

7.4

3.4

Ours

20.8

1.0

9.7

11.2

Ours_hr*

20.9

1.2

11.7

9.1

Ours_hr

78.2

0.3

42.1

33.3

Memory consumption and computing time of different generative shape models. ^† We implement 16-bit inference in MD for fair comparison. Our methods labelled with a * are pruned versions, highlighting the efficiency boost of our tetrahedral formulation.

Zoom-ins

Our efficient architecture is capable of generating meshes in unprecedented resolution, capturing intricate details such as chains, fine rims, brake disks and brake handles. Importantly, these details are represented in the geometry of the mesh and not in texture maps. Diffusing per vertex texture information enforces the strong geometry.

Hover over the images to enable magnifying glass or tap the image on mobile.

References

[1] Gao, Jun, et al. "Get3d: A generative model of high quality 3d textured shapes learned from images." Advances In Neural Information Processing Systems 35 (2022): 31841-31854.
[2] Liu, Zhen, et al. "MeshDiffusion: Score-based Generative 3D Mesh Modeling." The Eleventh International Conference on Learning Representations. 2022.

@article{kalischek2023tetradiffusion, title={TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation}, author={Kalischek, Nikolai and Peters, Torben and Wegner, Jan D and Schindler, Konrad}, journal={arXiv preprint arXiv:2211.13220}, year={2023} }

TetraDiffusion: Tetrahedral Diffusion Models
for 3D Shape Generation

tl;dr

Abstract

Method

Additional qualitative results

Interpolations

Zoom-ins

Inner structures

References

Cite