A differentiable 3D renderer with Pytorch, Tensorflow and Matlab interfaces
DEODR (for Discontinuity-Edge-Overdraw based Differentiable Renderer) is a differentiable 3D mesh renderer written in C with Python and Matlab bindings. The python code provides interfaces with Pytorch and Tensorflow. It provides a differentiable rendering function and its associated reverse mode differentiation function (a.k.a adjoint function) that provides derivatives of a loss defined on the rendered image with respect to the lightning, the 3D vertices positions and the vertices colors. The core triangle rasterization procedures and their adjoint are written in C for speed, while the vertices normals computation and camera projections are computed in either Python (numpy, pytorch or tensorflow) or Matlab in order to gain flexibility and improve the integration with automatic differentiation libraries. The core C++ differentiable renderer has been implemented in 2008 and described in [1,2]. Unlike most other differentiable renderers (except the recent SoftRas [8] and to some extend the differentiable ray/path tracing methods in [10] and [13]), the rendering is differentiable along the occlusion boundaries and no had-hoc approximation is needed in the backpropagation pass to deal with discontinuities along occlusion boundaries. This is achieved by using a differentiable antialiasing method called Discontinuity-Edge-Overdraw [3] that progressively blends the colour of the front triangle with the back triangle along occlusion boundaries.
Some unsupported features:
Keeping the rendering differentiable everywhere when using texture is challenging: if you use textured triangles you will need to make sure there are no adjacent triangles in the 3D mesh that are simultaneously visible while being disconnected in the UV map, i.e. that there is no visible seam. Otherwise the rendering will not in general be continuous with respect to the 3D vertices positions due to the texture discontinuity along the seam. Depending on the shape of your object, you might not be able to define a continuous UV mapping over the entire mesh and you will need to define the UV texture coordinates in a very specific manner described in Figure 3 in [1], with some constraints on the texture intensities so that the continuity of the rendering is still guaranteed along edges between disconnected triangles in the UV map after texture bilinear interpolation. Note that an improved version of that approach is also described in [8].
For python 3.6,3.7 and python 3.8 under Windows and Linux you can use
pip install deodr
Otherwise or if that does not work you can try
pip install git+https://github.com/martinResearch/DEODR.git
Simply download the zip file, decompress it, run compile.m.
For the hand fitting example you will also need to download my Matlab automatic differentiation toolbox from here add the decompressed folder in your matlab path
Example of fitting a hand mesh to a depth sensor image deodr/examples/depthimagehand_fitting.py
Example of fitting a hand mesh to a RGB sensor image deodr/examples/rgbimagehand_fitting.py
Example of fitting a hand mesh to several RGB sensor images deodr/examples/rgbmultiviewhand.py
Example of a simple triangle soup fitting Matlab/examples/trianglesoupfitting.m
Example of fitting a hand mesh to a RGB sensor image Matlab/examples/hand_fitting.m. For this example you will also need to download the automatic differentiation toolbox from https://github.com/martinResearch/MatlabAutoDiff
This code implements the core of the differentiable renderer described in [1,2] and has been mostly written in 2008-2009. It is anterior to OpenDR and is to my knowledge the first differentiable renderer that deals with vertices displacements to appear in the literature. It renders a set of triangles with either a texture that is bilinearly interpolated and shaded or with interpolated RGB colours. In contrast with most renderers, the intensity of each pixel in the rendered image is continuous and differentiable with respect to the vertices positions even along occlusion boundaries. This is achieved by using a differentiable antialiasing method called Discontinuity-Edge-Overdraw [3] that progressively blends the colour of the front triangle with the back triangle along occlusion boundaries, using a linear combination of the front and back triangles with a mixing coefficient that varies continuously as the reprojected vertices move in the image (see [1,2] for more details). This allows us to capture the effect of change of visibility along occlusion boundaries in the gradient of the loss in a principled manner by simply applying the chain rule of derivatives to our differentiable rendering function. Note that this code does not provide explicitly the sparse Jacobian of the rendering function (where each row would correspond to the color intensity of a pixel of the rendered image, like done in [4]) but it provides the vector-Jacobian product operator, which corresponds to the backward function in PyTorch.
This can be used to do efficient analysis-by-synthesis computer vision by minimizing the function E that corresponds to the sum or the squared pixel intensities differences between a rendered image and a reference observed image I_{o} with respect to the scene parameters we aim to estimate.
Using D(i,j) as the adjoint i.e. derivative of the error (for example the squared residual) between observed pixel intensity and synthetized pixel intensity at location (i,j)
We can use DEODR to obtain the gradient with respect to the 2D vertices locations and their colors i.e :
and
In combination with an automatic differentiation tool, this core function allows one to obtain the gradient of the error function with respect to the parameters of a complex 3D scene one aims to estimate.
The rendering function implemented in C++ can draw an image given a list of 2D projected triangles with the associated vertices depths (i.e 2.5D scene), where each triangle can have
We provide a functions in Matlab and Python to obtain the 2.5D representation of the scene from a textured 3D mesh, a camera and a simple lighting model. This function is used in the example in which we fit a 3D hand model to an image. We kept this function as minimalist as possible as we did not intend to rewrite an entire rendering pipeline but to focus on the part that is difficult to differentiate and that cannot be differentiated easily using automatic differentiation.
Our code provides two methods to handle discontinuities at the occlusion boundaries
The choice of the method is done through the Boolean parameter antialiaseError. Both approaches lead to a differentiable error function after summation of the residuals over the pixels and both lead to similar gradients. The difference is subtle and is only noticeable at the borders after convergence on synthetic antialiased data. The first methods can potentially provide more flexibility for the design of the error function as one can for example use a non-local image loss by comparing image moments instead of comparing pixel per pixel.
Note: In order to keep the code minimal and well documented, I decided not to provide here the Matlab code to model the articulated hand and the code to update the texture image from observation used in [1]. The hand fitting example provided here does not relies on a underlying skeleton but on a regularization term that penalizes non-rigid deformations. Some Matlab code for Linear Blend Skinning can be found here. Using a Matlab implementation of the skinning equations would allow the use of the Matlab automatic differentiation toolbox provided here to compute the Jacobian of the vertices positions with respect to the hand pose parameters.
BSD 2-clause "Simplified" license.
If you use any part of this work please cite the following:
Model-based 3D Hand Pose Estimation from Monocular Video. M. de la Gorce, N. Paragios and David Fleet. PAMI 2011 pdf
@article{deLaGorce:2011:MHP:2006854.2007005, author = {de La Gorce, Martin and Fleet, David J. and Paragios, Nikos}, title = {Model-Based 3D Hand Pose Estimation from Monocular Video}, journal = {IEEE Trans. Pattern Anal. Mach. Intell.}, issue_date = {September 2011}, volume = {33}, number = {9}, month = sep, year = {2011}, issn = {0162-8828}, pages = {1793--1805}, numpages = {13}, url = {http://dx.doi.org/10.1109/TPAMI.2011.33}, doi = {10.1109/TPAMI.2011.33}, acmid = {2007005}, publisher = {IEEE Computer Society}, address = {Washington, DC, USA}, }
OpenDR 4 is an open source differentiable renderer written in python and make publicly available in 2014. OpenDR calls OpenGL and relies an a python automatic differentiation toolbox by the same author called chumpy. Like in our code OpenDR uses a intermediate 2.5D representation of the scene using a set of 2D projected triangles. In contrast to our code OpenDR does not provide a continuous loss function as there is not continuous antialiasing formulation at the occlusion boundaries and the minimised function will have jumps when a pixel at the boundary switch between the front of back object. By providing a continuous differentiable error function using edge-overdraw antialiasing and its exact gradient, our method can lead to better a convergence of continuous optimisation methods..
DIRT (MIT licence) is an open source differentiable renderer that uses approximations in the gradient computation similar OpenDR but that is interfaced with tensorflow. It makes considerable effort to return correctly-behaving derivatives even in cases of self-occlusion, where most other differentiable renderers can fail.
Neural 3D Mesh Renderer (MIT Licence). Method published in [6]. This method consists in a differentiable render whose gradients are designed to be used in neural networks.
tf_mesh_renderer (Apache License 2.0). A differentiable, 3D mesh renderer using TensorFlow. Unlike other differentiable renderer it does not provides suppport for occlusion boundaries in the gradient computation and thus is inadequate for many applications.
Code accompanying the paper [7] github. It renders only silhouettes.
redner Method published in [10]. It is a differentiable path-tracer that can propagate gradients through indirect illumination.
Differentiable Surface Splatting Method publihed in [11]. It is a differentiable implementation of the surface splatting method that allows to render point clouds. The advantage of this approach over mesh based methods is that there is no predefined connectivity associated to the set of points, which allows topological changes during minimization.
DIB-Render Method published in [12]. This method removes discontinuies along the projected mesh and background boundary (external silhouette) by rendering background pixels colors using a weighted sum of the background colour and nearby projected faces. However, unlike our method, it does not remove discontinuities along self-occlusions.
Differentiable path tracing Method published in [13]
PyTorch3D's renderer. The method implemented keeps a list of the K nearest faces intersecting the ray corresponding to each pixel in the image (as opposed to traditional rasterization which returns only the index of the closest face in the mesh per pixel). The top K face properties can then be aggregated using different methods (such as the sigmoid/softmax approach proposed by Li et at in SoftRasterizer). Note however that the K face selection is not a continuous operation and thus may potentially lead to discontinuities in the rendered pixel intensities with respect to the shape parameters.
Tensorflow graphics. Library from google that allows differentiable rendering. At the date of march 2020 discontinuities along occlusion boundaries are not handled yet in a differentiable manner. As a result fitting using a loss defined with the rendered image may not converge well in some scenario.
SDFDiff.[15] Differentiable Implicit surface rendering using signed distance functions. This approach seems not to deal with depth discontinuity along the silhouette and self occlusions in a differentiable way, and thus the change in visibility in these locations is not captured in the back-propagated gradients which will hamper convergence, especially in scenarios where the silhouette is an important signal for the fitting..
DIST[16] Differentiable Implicit surface rendering using signed distance functions. The depth discontinuity along the object/background boundary is taken into account in the computation of the silhouette mask only, and is not taken into account in the rendered color, depth or normal images. This could hamper convergence of the surface fitting methods that uses these images in the loss for concave objects that exhibit self occlusion in the chosen camera view point(s).
Nvdiffrast[17] Fast and modular differentiable renderer implemented using OpenGL and cuda with Pytorch and tensoflow interfaces. The method uses deferred rendering which yields great flexibility for the user by allowing the user to write a custom pixel shader using pytorch . An antialiasing step is applied after deferred rendering in order to blend the colors along self-occlusion edges using weights that depend on the distance of the adjacent pixel centers to the line segment. The colors used for blending of along the self occlusion edges are extracted from the nearest points with integer coordinates in the adjacent triangles. This is not a continuous operation and thus may result in small temporal color jumps along the self-occlusion when the edges line segment crosses a pixel center. This may degrade the quality of the gradients as a first order approximations of the change in the loss function w.r.t scene parameters. Moreover the method used to detect self occlusion edges is done in the image space using faces indices which is not very reliable when triangles are small, which introduces noise in the gradient that might be biased. In contrast, our anti-aliasing method uses the color of the nearest point on the occlusing triangle edge (euclidean projection), which does not lead to discontinuity when edges cross pixel centers, and our silhouette edges detection is done in the object space and thus is reliable regardless of the triangles size during rasterization.
[1] Model-based 3D Hand Pose Estimation from Monocular Video. M. de la Gorce, N. Paragios and David Fleet. PAMI 2011 paper
[2] Model-based 3D Hand Pose Estimation from Monocular Video. M. de la Gorce. PhD thesis. Ecole centralde de Paris 2009. paper
[3] Discontinuity edge overdraw P.V. Sander and H. Hoppe, J.Snyder and S.J. Gortler. SI3D 2001 paper
[4] OpenDR: An Approximate Differentiable Renderer Loper, Matthew M. and Black, Michael J. ECCV 2014 paper code online documentation
[5] A Morphable Model For The Synthesis Of 3D Faces. Volker Blanz and Thomas Vetter. SIGGRAPH 99. paper
[6] Neural 3D Mesh Renderer. Hiroharu Kato and Yoshitaka Ushiku and Tatsuya Harada.CoRR 2017 paper
[7] End-to-end 6-DoF Object Pose Estimation through Differentiable Rasterization Andrea Palazzi, Luca Bergamini, Simone Calderara, Rita Cucchiara. Second Workshop on 3D Reconstruction Meets Semantics (3DRMS) at ECCVW 2018. paper
[8] Mesh Color textures Cem Yuksel. Proceedings of High Performance Graphics 2017. paper
[9] Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning. Shichen Liu, Tianye Li, Weikai Chen and Hao Li. ICCV 2019. paper
[10] Differentiable Monte Carlo Ray Tracing through Edge Sampling. zu-Mao Li, Miika Aittala, Frédo Durand, Jaakko Lehtinen. SIGGRAPH Asia 2018. project page
[11] Differentiable Surface Splatting for Point-based Geometry Processing. Felice Yifan, Serena Wang, Shihao Wu, Cengiz Oztireli and Olga Sorkine-Hornung. SIGGRAPH Asia 2019. paper and video
[12] Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer Wenzheng Chen, Jun Gao, Huan Ling, Edward J. Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler. NeurIPS 2019. paper
[13] Reparameterizing discontinuous integrands for differentiable rendering. Guillaume Loubet, Nicolas Holzschuch and Wenzel Jakob. SIGGRAPH Asia 2019. project page
[14] TensorFlow Graphics: Computer Graphics Meets Deep Learning. Valentin, Julien and Keskin, Cem and Pidlypenskyi, Pavel and Makadia, Ameesh and Sud, Avneesh and Bouaziz, Sofien. 2019
[15] SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization. Yue Jiang, Dantong Ji, Zhizhong Han, Matthias Zwicker. CVPR 2020. code
[16] DIST: Rendering Deep Implicit Signed Distance Function with Differentiable Sphere Tracing. Shaohui Liu1, Yinda Zhang, Songyou Peng, Boxin Shi, Marc Pollefeys, Zhaopeng Cui. CVPR 2020. code
[17] Modular Primitives for High-Performance Differentiable Rendering. Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, Timo Aila. paper.