Meta Uses Gaussian Splatting to Achieve High-Fidelity Head Relighting for Codec Avatars


Source: XR Navigation Network

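(XR Navigation Network, December 9, 2023) Since its debut, 3D Gaussian Splatting has quickly attracted the industry's attention. Its main advantages are that it supports conventional rasterization while maintaining high reconstruction quality, and that it optimizes quickly.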

Since publicly introducing its Codec Avatar project in 2019, which aims to create photorealistic virtual digital humans, Meta has been actively exploring a range of optimization methods. In a study released a few days ago, the team began using Gaussian Splatting to improve avatar realism, with a focus on relighting.

Relighting avatars is very challenging. Our visual perception is highly sensitive to facial expressions, so to convince the visual system, every part of the head must be modeled in sufficient detail and rendered consistently with its environment. This synthesis often has to run in real time.

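Relighting animatable human heads in real time with convincing detail remains a major challenge, for three reasons. First, the human head is made of highly complex and diverse materials that exhibit different scattering and reflectance properties. For example, skin produces complex reflections because of its microgeometry and significant subsurface scattering; hair exhibits out-of-plane scattering with multiple reflections because of its translucent fiber structure; and the eyes have multiple layers of highly reflective membranes. Overall, no single material representation can accurately capture all of this, especially in real time.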

Second, accurately tracking and modeling the underlying geometry in motion is extremely difficult, because deforming regions do not always contain enough visual cues for tracking. Finally, the real-time requirement severely constrains algorithm design. Traditionally, each gain in realism comes at an exponentially increasing cost in light transport and motion tracking.

Meta's goal is to design a learning framework that builds real-time renderable head avatars with accurate scattering and reflection under illumination of any spatial frequency. Given detailed measurements captured with a light stage, physically based rendering methods can generalize to novel lighting. However, extending such methods to dynamic performance capture and to non-skin parts such as hair and eyeballs remains challenging.

At the same time, obtaining sufficiently accurate geometry and material parameters is a laborious process. More recently, neural relighting methods have sidestepped the need for precise geometry and material modeling: using only approximate geometry, represented by meshes, volumetric primitives, or neural fields, they rely on neural networks to directly model the relationship between the input (i.e., lighting) and the output (i.e., outgoing radiance).

Although these results are impressive, existing methods are held back by geometry and appearance representations that lack expressiveness. In particular, no existing method achieves all-frequency reflections on hair and eyes, and submillimeter structures such as individual hair strands are often blurred, making rendered hair look less photorealistic.

To address these problems, Meta proposes three contributions:

A drivable avatar based on 3D Gaussians that efficiently renders complex geometric details
A relightable appearance model based on learned radiance transfer that supports real-time global light transport and all-frequency reflections
A relightable explicit eye model that, for the first time, disentangles gaze control from other facial movements and achieves all-frequency eye reflections in a fully data-driven way


The researchers' geometric representation is based on 3D Gaussians and can be rendered in real time via splatting. To make the avatar drivable, the team uses a 2D convolutional neural network to decode the 3D Gaussian parameters in the shared UV space of a template head.
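To make "rendered in real time via splatting" concrete, the sketch below shows the standard 3D Gaussian Splatting compositing step for a single pixel: each Gaussian, already projected to a 2D screen-space footprint, contributes its color weighted by its opacity and the transmittance left by nearer Gaussians. This is a minimal illustration, not Meta's renderer (real implementations run as tile-based GPU rasterizers); the class name `ProjectedGaussian`, its fields, and the projection step it assumes has already happened are all hypothetical. The 0.99 alpha clamp and early termination mirror what common 3DGS rasterizers do.

```python
import math
from dataclasses import dataclass


@dataclass
class ProjectedGaussian:
    """A 3D Gaussian already projected to screen space (hypothetical minimal fields)."""
    mean: tuple[float, float]          # 2D screen-space center (pixels)
    cov: tuple[float, float, float]    # 2D covariance [[a, b], [b, c]] stored as (a, b, c)
    depth: float                       # view-space depth, used only for sorting
    opacity: float                     # learned opacity in [0, 1]
    color: tuple[float, float, float]  # RGB radiance after appearance evaluation


def footprint(g: ProjectedGaussian, px: float, py: float) -> float:
    """Unnormalized 2D Gaussian falloff exp(-0.5 * d^T Sigma^-1 d) at pixel (px, py)."""
    a, b, c = g.cov
    det = a * c - b * b
    inv_a, inv_b, inv_c = c / det, -b / det, a / det  # inverse of the 2x2 covariance
    dx, dy = px - g.mean[0], py - g.mean[1]
    return math.exp(-0.5 * (inv_a * dx * dx + 2.0 * inv_b * dx * dy + inv_c * dy * dy))


def composite_pixel(gaussians, px, py):
    """Front-to-back alpha compositing: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for g in sorted(gaussians, key=lambda g: g.depth):  # nearest Gaussian first
        alpha = min(0.99, g.opacity * footprint(g, px, py))  # clamp as common 3DGS rasterizers do
        for k in range(3):
            color[k] += transmittance * alpha * g.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return tuple(color), transmittance
```

Because the loop sorts by depth itself, the input list can arrive in any order; in a drivable avatar, the per-Gaussian positions, covariances, opacities, and colors would come from the decoder for the current frame.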

They encode driving signals such as facial expressions in a self-supervised manner, similar to traditional codec avatars. This enables temporally coherent tracking of a moving head, including complex geometric details such as hair.

For appearance, inspired by precomputed radiance transfer, Meta introduces a relightable appearance model based on learnable radiance transfer, composed of diffuse spherical harmonics and specular spherical Gaussians. The diffuse radiance transfer of each 3D Gaussian is parameterized with dynamic spherical harmonic coefficients that the model learns. This transfer pre-convolves visibility and global light transport, including interreflections and subsurface scattering.
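The key property of precomputed radiance transfer is that, once both the lighting and the transfer function are expressed in a spherical harmonic basis, diffuse shading reduces to a dot product of two coefficient vectors. The sketch below illustrates that idea for a single color channel using only SH bands 0 and 1. It is an assumption-laden simplification: in the model described above, the transfer coefficients are per Gaussian, per color channel, of higher order, and decoded by a network, whereas here they are fixed numbers, and the function names are hypothetical.

```python
import math

# Real spherical-harmonic normalization constants for bands l = 0 and l = 1.
SH_C0 = 0.5 / math.sqrt(math.pi)          # ~0.282095
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))  # ~0.488603


def sh_basis(direction):
    """Real SH basis [Y00, Y1-1, Y10, Y11] evaluated at a unit direction (x, y, z)."""
    x, y, z = direction
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]


def project_directional_light(direction, intensity):
    """SH coefficients of a distant light: projecting a delta gives intensity * Y_k(direction)."""
    return [intensity * b for b in sh_basis(direction)]


def add_lights(*light_coeffs):
    """Lighting is linear in SH space, so combined lights are summed coefficient-wise."""
    return [sum(c) for c in zip(*light_coeffs)]


def diffuse_radiance(transfer, light_coeffs):
    """PRT-style diffuse shading: outgoing radiance = dot(transfer, lighting coefficients)."""
    return sum(t * l for t, l in zip(transfer, light_coeffs))
```

Since visibility and multi-bounce effects are folded into the transfer vector ahead of time, changing the lighting only changes `light_coeffs`, which is why this style of model can be relit cheaply at runtime.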

For specular reflection, the researchers introduce a new spherical Gaussian parameterization. It includes view-dependent visibility and efficiently approximates the combined effect of occlusion, Fresnel, and geometric attenuation without explicitly estimating each contribution.
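A spherical Gaussian (SG) is a lobe on the unit sphere, G(ω) = a · exp(λ(μ·ω − 1)), with axis μ, sharpness λ, and amplitude a; it also has a closed-form integral over the sphere, 2πa/λ · (1 − e^(−2λ)), which is part of why SGs are popular for real-time lighting. The sketch below evaluates an SG and its integral. The single scalar `visibility` factor in `specular_lobe` is a hypothetical stand-in for the learned view-dependent visibility described above; how Meta actually parameterizes that term is not specified in this article.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_gaussian(omega, axis, sharpness, amplitude):
    """G(omega) = amplitude * exp(sharpness * (dot(axis, omega) - 1)); unit vectors assumed."""
    return amplitude * math.exp(sharpness * (dot(axis, omega) - 1.0))


def sg_integral(sharpness, amplitude):
    """Closed-form integral of an SG over the sphere: 2*pi*a/lambda * (1 - exp(-2*lambda))."""
    return 2.0 * math.pi * amplitude / sharpness * (1.0 - math.exp(-2.0 * sharpness))


def specular_lobe(omega, axis, sharpness, amplitude, visibility):
    """Hypothetical: scale the lobe by one view-dependent visibility value in [0, 1],
    standing in for the combined occlusion, Fresnel, and geometric attenuation."""
    return visibility * spherical_gaussian(omega, axis, sharpness, amplitude)
```

Larger `sharpness` values give narrower lobes, which is what lets SGs represent sharp, high-frequency highlights that low-order spherical harmonics cannot.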

The team's specular spherical Gaussian lobe is aligned with the reflection vector, which is computed from the view direction and the normal associated with each Gaussian. Importantly, spherical Gaussians support all-frequency reflections in real time under high-resolution lighting. Both the diffuse and specular representations satisfy the linearity of light transport, so they support real-time rendering under point lights and environment lighting without additional training.
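Two ideas in this paragraph can be checked in a few lines: aligning the lobe with the mirror reflection of the view direction about the normal, r = 2(n·v)n − v, and relighting under several lights by simply summing per-light responses, which is valid because light transport is linear in the incident lighting. The sketch below assumes distant (directional) lights given as `(direction, intensity)` pairs and one fixed lobe per Gaussian; the function names and that light model are illustrative assumptions, not the paper's code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def reflect(view_dir, normal):
    """Mirror the unit direction toward the viewer about the unit normal: r = 2(n.v)n - v."""
    d = dot(normal, view_dir)
    return tuple(2.0 * d * n - v for n, v in zip(normal, view_dir))


def sg(omega, axis, sharpness, amplitude):
    """Spherical Gaussian: amplitude * exp(sharpness * (dot(axis, omega) - 1))."""
    return amplitude * math.exp(sharpness * (dot(axis, omega) - 1.0))


def specular(normal, view_dir, lights, sharpness, amplitude):
    """Specular response of one Gaussian under distant lights given as (direction, intensity).

    The lobe axis is the reflection vector, so the highlight peaks when a light sits
    exactly in the mirror direction. Summing over lights relies on linearity.
    """
    r = normalize(reflect(normalize(view_dir), normalize(normal)))
    return sum(i * sg(normalize(d), r, sharpness, amplitude) for d, i in lights)
```

An environment map can be handled the same way in principle, by treating it as many lights or by using SG closed-form integrals, which is the sense in which no retraining is needed for new lighting.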

In addition, the proposed learnable radiance transfer handles global light transport and all-frequency reflections for the eyes, skin, and hair with a single unified representation, which greatly simplifies learning while enabling extremely high-fidelity relighting.

To reproduce corneal reflections, the team's relightable Gaussian avatar incorporates an explicit eye model that allows direct control of the eyeballs and yields better disentanglement. In addition, the appearance model naturally supports relighting the eyes with all-frequency reflections, which is critical for realism in natural environments.
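One way to picture "explicit control of the eyeball" is that gaze becomes its own input, a rotation applied to the eyeball's Gaussians about the eye center, rather than something entangled in the facial expression code. The sketch below rotates Gaussian centers by hypothetical yaw and pitch angles; the rotation convention, the function names, and the idea of posing eyeballs this way are illustrative assumptions rather than details from the article. A fuller version would also rotate each Gaussian's orientation, which is omitted here.

```python
import math


def matmul(a, b):
    """3x3 matrix product a @ b using nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))


def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def gaze_rotation(yaw, pitch):
    """R = R_y(yaw) @ R_x(pitch): pitch is applied first, then yaw (hypothetical convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = ((cy, 0.0, sy), (0.0, 1.0, 0.0), (-sy, 0.0, cy))
    rx = ((1.0, 0.0, 0.0), (0.0, cp, -sp), (0.0, sp, cp))
    return matmul(ry, rx)


def pose_eyeball(points, center, yaw, pitch):
    """Rotate eyeball Gaussian centers rigidly about the eye center by the gaze angles."""
    rot = gaze_rotation(yaw, pitch)
    posed = []
    for p in points:
        local = tuple(pi - ci for pi, ci in zip(p, center))   # move to eye-centered frame
        rotated = apply(rot, local)                           # apply gaze rotation
        posed.append(tuple(r + c for r, c in zip(rotated, center)))  # move back
    return posed
```

Because the eyeball moves rigidly, the rest of the face can be decoded independently of gaze, which is the kind of disentanglement the explicit eye model is meant to provide.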

Experiments show that combining the 3D Gaussian geometry with the relightable appearance model outperforms every other combination tested.

Overall, the team's Relightable Gaussian Codec Avatars is a novel appearance and geometry representation for relightable head avatars that supports real-time rendering. Experiments show that, with the proposed radiance transfer basis of spherical harmonics and spherical Gaussians, hair, skin, and eyes can now be relit with high fidelity under all-frequency illumination in real time. The team further shows that choosing a geometric representation based on 3D Gaussian Splatting is critical for accurate hair reconstruction and relighting. Compared with existing real-time renderable geometry and appearance models, the approach delivers significant quality improvements, both qualitatively and quantitatively.