Researchers at Google Research have proposed a new approach to neural rendering of volumetric scenes built from ordinary photographs. They showed that training on raw camera data, without in-camera preprocessing, preserves a much wider dynamic range and more information for noise reduction, so the method can not only synthesize shots from new angles but also suppress noise and produce HDR images, says N+1.
In 2020, a group of American researchers, which included Jonathan Barron, the lead of the new work, presented the NeRF neural rendering method, which showed excellent results and gained popularity among other researchers. Unlike most neural network algorithms, a NeRF model is not trained on a large, varied dataset so that it works well in arbitrary conditions; instead, it is trained on several tens or hundreds of photographs of one scene or object taken from different angles. Thanks to this, the model memorizes that particular scene very well and can generate images of it from new angles while preserving the shapes of objects, reflections, transparency and other properties. When generating a new image, the model is queried with a point in space and a viewing direction, and returns the density and color of space at that point; sampling many such points along a ray and compositing the results produces one pixel, and the same is repeated for the rest.
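As an illustration of that compositing step, here is a minimal sketch of volume rendering along a single ray; the `nerf_mlp` function stands in for a trained model and is an assumption for the example, not the paper's actual code.

```python
import numpy as np

def render_ray(nerf_mlp, origin, direction, near=2.0, far=6.0, n_samples=64):
    """Composite densities and colors sampled along one ray into a pixel color.

    `nerf_mlp(points, view_dir)` is a stand-in for the trained model: it takes
    3D sample points and a viewing direction and returns (densities, rgb).
    """
    # Sample points along the ray between the near and far bounds
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction           # (n_samples, 3)

    density, rgb = nerf_mlp(points, direction)          # (n_samples,), (n_samples, 3)

    # Distances between consecutive samples (the last one is effectively infinite)
    deltas = np.append(np.diff(t), 1e10)

    # Classic volume rendering: alpha-composite the samples front to back
    alpha = 1.0 - np.exp(-density * deltas)
    transmittance = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = alpha * transmittance

    return (weights[:, None] * rgb).sum(axis=0)         # final pixel color
```

Repeating this query for every pixel of a virtual camera yields a full rendered image of the memorized scene from a new viewpoint.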
Previously, NeRF models were trained on regular photographs that had already been preprocessed inside the camera. The group of researchers from Google Research led by Barron suggested training NeRF models on raw data in RAW format instead. Such photographs are noisier and have not yet been demosaiced (debayered), the step in which an algorithm interpolates the full color of each pixel: on the sensor, each photodiode sits behind a filter of one of the three primary colors, so it registers the intensity of only that one color. However, raw photographs contain the most faithful, unaveraged data, and in a wider dynamic range, notes NIX Solutions.
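To make the demosaicing step concrete, here is a toy sketch of an RGGB Bayer mosaic and a simple interpolation by normalized convolution; it illustrates the general idea only and is not the pipeline used by real cameras or by the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def bayer_mosaic(rgb):
    """Simulate an RGGB sensor: each pixel records only one color channel."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red photodiodes
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue
    return mosaic

def simple_demosaic(mosaic):
    """Fill in the two missing channels of every pixel from measured neighbors."""
    h, w = mosaic.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1
    g_mask = 1 - r_mask - b_mask

    kernel = np.array([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
    out = np.zeros((h, w, 3))
    for c, mask in enumerate((r_mask, g_mask, b_mask)):
        # normalized convolution: weighted average of the measured values of this channel
        out[..., c] = convolve(mosaic * mask, kernel) / np.maximum(convolve(mask, kernel), 1e-8)
    return out
```

The point of training on raw data is precisely to skip this kind of interpolation, along with tone mapping and other in-camera processing that averages away information.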
In fact, the researchers relied on the same approach that smartphones already use for low-light photography: the camera takes several very noisy pictures and then merges them into a single frame that combines the useful information from the individual noisy shots. Since NeRF was originally designed to build a single representation of a scene from many separate frames, the researchers assumed it would likewise be able to extract the useful information from noisy photographs, and they were right.
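The core idea behind merging a noisy burst can be shown with a toy example: averaging N frames with independent noise reduces the noise level by roughly a factor of sqrt(N) (real burst pipelines also align the frames first, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "true" scene and several noisy captures of it
clean = rng.uniform(0.0, 1.0, size=(64, 64))
noisy_frames = [clean + rng.normal(0.0, 0.1, clean.shape) for _ in range(16)]

# Merging the burst: averaging independent noise shrinks its standard deviation
merged = np.mean(noisy_frames, axis=0)

print("noise of a single frame:", np.std(noisy_frames[0] - clean))  # ~0.10
print("noise after merging 16 frames:", np.std(merged - clean))     # ~0.025
```

A NeRF trained on a noisy burst plays a similar role: the shared scene representation is fit to all frames at once, so the per-frame noise averages out while the scene content is retained.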
The authors trained models for several scenes on hundreds of shots taken from different angles, and then compared them with machine-learning-based denoising algorithms. It turned out that a NeRF model trained on a single scene shows results comparable to algorithms trained on huge datasets. In addition, the authors showed that the models make it possible to adjust exposure, create HDR images and change focus after the shots have been taken.
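Because the scene is reconstructed in the linear space of raw measurements, a virtual exposure can be chosen after rendering. The sketch below illustrates the idea with a toy gamma tone-mapping curve; the curve and parameter names are assumptions for the example, not the tone mapping used in the paper.

```python
import numpy as np

def tonemap(linear_rgb, exposure=1.0, gamma=2.2):
    """Apply a chosen exposure and a simple gamma curve to a linear (HDR) rendering."""
    exposed = np.clip(linear_rgb * exposure, 0.0, 1.0)
    return exposed ** (1.0 / gamma)

# Render the linear image once, then produce brighter and darker versions of the same shot:
# bright = tonemap(linear_render, exposure=4.0)
# dark   = tonemap(linear_render, exposure=0.5)
```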