NVIDIA Edify 3D: Scalable High-Quality 3D Asset Generation

Edify 3D can efficiently generate high-quality 3D assets from text prompts or reference images.

Research findings

The results are visualized with PBR renderings, base albedo colors, and surface normals, further verifying the quality of the generated assets. Example prompts include:

A full backpack with space tools hanging from it.

A phonograph made of wood and gold. 

An orange factory robot arm. 

Edify 3D topology and application

Quadrilateral mesh topology

Edify 3D generates quadrilateral meshes with adaptive, well-ordered topology. This design makes the models easier to edit and render while integrating seamlessly into standard 3D workflows, providing high visual fidelity and flexibility. Thanks to this property, users can easily adjust and optimize the generated models.

Application scenarios: 3D scene generation

Assets generated by Edify 3D can be combined to build complete 3D scenes, meeting diverse content-creation demands.

Generation pipeline

  1. Multi-view RGB synthesis: Starting from a text description, the system generates multi-view RGB images of the specified object through a multi-view diffusion model.
  2. Multi-view normal synthesis: A second diffusion model conditions on the RGB images to produce the corresponding surface normal maps.
  3. Reconstruction: The reconstruction model combines the RGB images and normal maps to predict a neural 3D representation as latent tokens. Isosurface extraction and mesh post-processing then produce the object's geometry.
  4. Texture generation: The multi-view images are upscaled and reprojected onto the texture map to form high-quality textures (the whole pipeline is sketched in code after this list).
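
To make the data flow concrete, here is a minimal runnable sketch of the four stages. Every function in it is a hypothetical stand-in (Edify 3D exposes no such API), and the placeholder implementations only mirror the shapes of the data passed between stages.

    # Hypothetical sketch of the Edify 3D pipeline (stage order and data flow only).
    # None of these functions are a real Edify 3D API; bodies are placeholders.

    import numpy as np

    def sample_cameras(n):
        """Evenly spaced azimuths at a fixed elevation (placeholder poses)."""
        return [{"azimuth": 360.0 * i / n, "elevation": 20.0} for i in range(n)]

    def generate_multiview_rgb(prompt, cameras):   # stage 1: multi-view diffusion
        return [np.zeros((1024, 1024, 3)) for _ in cameras]

    def generate_normals(rgbs, prompt, cameras):   # stage 2: normal synthesis
        return [np.zeros_like(img) for img in rgbs]

    def reconstruct_mesh(rgbs, normals, cameras):  # stage 3: reconstruction model
        return {"vertices": np.zeros((0, 3)), "faces": np.zeros((0, 4), int)}

    def bake_textures(rgbs, normals, mesh):        # stage 4: upscale + reproject
        return {"albedo": np.zeros((2048, 2048, 3))}

    def edify3d_generate(prompt, num_views=8):
        cameras = sample_cameras(num_views)
        rgbs = generate_multiview_rgb(prompt, cameras)
        normals = generate_normals(rgbs, prompt, cameras)
        mesh = reconstruct_mesh(rgbs, normals, cameras)
        mesh["textures"] = bake_textures(rgbs, normals, mesh)
        return mesh

    mesh = edify3d_generate("A phonograph made of wood and gold.")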

Multi-view diffusion model

Edify 3D's multi-view diffusion models are fine-tuned from text-to-image diffusion models. Given a text prompt and camera directions, these models generate the appearance of an object from multiple angles. The main variants of the model include:

  1. Base model: Generates the RGB appearance of the object.
  2. ControlNet: Generates surface normals conditioned on the RGB images and text.
  3. Upscaler: Generates high-resolution outputs conditioned on the textures and surface normals.

The architecture supports cross-view attention, while lightweight MLPs encode the camera poses and inject them alongside the diffusion time-step embedding.
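
As a rough illustration of this conditioning scheme, the PyTorch sketch below applies self-attention across the view axis and adds an MLP-encoded camera pose to the diffusion time-step embedding. All module names, dimensions, and the pose parameterization are assumptions for illustration, not the actual Edify 3D architecture.

    import torch
    import torch.nn as nn

    class CrossViewBlock(nn.Module):
        """Toy block: attention over views + pose/time conditioning (assumed sizes)."""
        def __init__(self, dim=256, pose_dim=12, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Lightweight MLP embedding the camera pose (e.g. flattened 3x4 extrinsics).
            self.pose_mlp = nn.Sequential(nn.Linear(pose_dim, dim), nn.SiLU(), nn.Linear(dim, dim))
            self.time_mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

        def forward(self, x, pose, t_emb):
            # x: (batch, views, tokens, dim); pose: (batch, views, pose_dim); t_emb: (batch, dim)
            b, v, n, d = x.shape
            # Pose embedding is added together with the time-step embedding, one per view.
            cond = self.pose_mlp(pose) + self.time_mlp(t_emb)[:, None, :]  # (b, v, d)
            x = x + cond[:, :, None, :]
            # Cross-view attention: fold the token axis into the batch, attend across views.
            x = x.permute(0, 2, 1, 3).reshape(b * n, v, d)
            x = x + self.attn(x, x, x, need_weights=False)[0]
            return x.reshape(b, n, v, d).permute(0, 2, 1, 3)

    x = torch.randn(2, 4, 64, 256)          # 2 samples, 4 views, 64 spatial tokens
    pose = torch.randn(2, 4, 12)            # flattened camera extrinsics (assumed)
    t_emb = torch.randn(2, 256)             # diffusion time-step embedding
    out = CrossViewBlock()(x, pose, t_emb)  # same shape as x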

  • View scaling: Increasing the number of training views makes the generated images more natural and consistent. During inference, the model can sample arbitrary view angles while preserving multi-view consistency, improving the coverage and quality of downstream 3D reconstruction (a generic view-sampling helper is sketched below).
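
To make "sampling arbitrary view angles" concrete, here is a small generic helper (not from Edify 3D; the look-at convention and radius are assumptions) that builds camera-to-world poses at arbitrary azimuth and elevation on a sphere around the object.

    import numpy as np

    def lookat_pose(azimuth_deg, elevation_deg, radius=2.0):
        """Camera-to-world pose looking at the origin from a sphere of given radius."""
        az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
        eye = radius * np.array([np.cos(el) * np.cos(az),
                                 np.cos(el) * np.sin(az),
                                 np.sin(el)])
        forward = -eye / np.linalg.norm(eye)   # camera looks at the origin
        right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)
        pose = np.eye(4)
        pose[:3, 0], pose[:3, 1], pose[:3, 2], pose[:3, 3] = right, up, -forward, eye
        return pose  # 4x4 camera-to-world matrix (OpenGL-style, -Z forward)

    # Arbitrary views: e.g. 16 random azimuths within a plausible elevation band.
    rng = np.random.default_rng(0)
    poses = [lookat_pose(a, rng.uniform(-10, 40)) for a in rng.uniform(0, 360, 16)]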

Reconstruction model

The reconstruction model takes multi-view images as input to generate 3D mesh geometry, textures, and material maps, showing strong generalization to unseen objects (including 2D outputs synthesized by the diffusion models). Its main steps:

  1. Triplane prediction: The reconstruction model predicts latent triplane representations from the RGB and normal maps.
  2. Neural decoding: MLPs decode the sampled triplane features to produce the neural SDF and PBR properties.
  3. Isosurface extraction: Converts the neural SDF into a 3D mesh through isosurface extraction.
  4. Mesh post-processing: Includes quadrilateral mesh optimization, UV mapping, and PBR property baking, ultimately producing editable, design-ready, high-quality assets (steps 1-3 are sketched in code after this list).
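
Here is a minimal sketch of steps 1-3 under stated assumptions: triplane features are bilinearly sampled at query points, a small MLP decodes them into SDF values, and scikit-image's marching cubes extracts the isosurface. All shapes and module sizes are invented, and the triplane here is randomly initialized rather than predicted by a reconstruction transformer; Edify 3D's actual extraction and quad remeshing are more involved.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from skimage.measure import marching_cubes

    class TriplaneSDF(nn.Module):
        """Toy triplane field: sample three feature planes, decode SDF with an MLP."""
        def __init__(self, channels=16, res=64):
            super().__init__()
            # Three axis-aligned feature planes (XY, XZ, YZ); learned in practice,
            # randomly initialized here.
            self.planes = nn.Parameter(torch.randn(3, channels, res, res) * 0.1)
            self.mlp = nn.Sequential(nn.Linear(3 * channels, 64), nn.SiLU(), nn.Linear(64, 1))

        def forward(self, pts):
            # pts: (N, 3) query points in [-1, 1]^3
            feats = []
            for i, (a, b) in enumerate([(0, 1), (0, 2), (1, 2)]):  # XY, XZ, YZ planes
                grid = pts[:, [a, b]].view(1, -1, 1, 2)            # (1, N, 1, 2)
                f = F.grid_sample(self.planes[i:i+1], grid, align_corners=True)
                feats.append(f.squeeze(0).squeeze(-1).t())         # (N, C)
            return self.mlp(torch.cat(feats, dim=-1)).squeeze(-1)  # (N,) SDF values

    # Evaluate the SDF on a dense grid and extract an isosurface.
    field = TriplaneSDF()
    r = 48
    axis = torch.linspace(-1, 1, r)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
    with torch.no_grad():
        sdf = field(grid.reshape(-1, 3)).reshape(r, r, r).numpy()
    # A random field may not cross zero, so use a level inside the value range.
    verts, faces, normals, _ = marching_cubes(sdf, level=float(sdf.mean()))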

Model performance and scalability

  • View scaling: As the number of input views increases, the performance of the reconstruction model improves significantly, allowing it to generate higher-precision results when more views are provided.
  • Token scaling: Increasing the number of triplane tokens likewise improves reconstruction quality, demonstrating the model's adaptability to different computational budgets.

Edify 3D provides a complete solution from text to 3D assets, with precise, natural-looking results that are widely applicable to artistic design and 3D development.

Trial

https://build.nvidia.com/shutterstock/edify-3d

The figure below shows 3D assets created using the Shutterstock 3D AI Generator, rendered and arranged as a tiled image. Images courtesy of Shutterstock.