diff --git a/README.md b/README.md
index f044c82..3ebc65e 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,84 @@ CUDA Denoiser For CUDA Path Tracer

 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Matt Elser
+  * [LinkedIn](https://www.linkedin.com/in/matt-elser-97b8151ba/), [twitter](twitter.com/__mattelser__)
+* Tested on: Windows 10, i3-10100F @ 3.6GHz 16GB, GeForce 1660 Super 6GB

-### (TODO: Your README)
-
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+### Features
+
+![example denoise image](img/256example.png)
+
+This is an implementation of A-Trous wavelet denoising as described in
+[Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering
+by Dammertz et al.](https://jo.dreggn.org/home/2010_atrous.pdf).
+
+*NOTE*: this repo only contains the denoise features added to a minimal
+pathtracer. To see this denoiser in a more featured pathtracer (from which all
+sample images were generated), check out [my pathtracer
+repo](https://github.com/mattelser/Project3-CUDA-Path-Tracer).
+
+A-Trous wavelet denoising works by approximating a Gaussian blur kernel and
+incorporating data from "Gbuffers". These Gbuffers hold data gathered from the
+scene, which for this implementation includes:
+- surface normal ![surface normal gbuffer](img/normalGbuffer.png)
+- surface position ![position gbuffer](img/positionGbuffer.png)
+- image color
+
+### Performance
+Comparing render times between the full-featured pathtracer with no denoise
+code and the full-featured pathtracer running the denoise kernel, no notable
+performance difference was found across multiple iteration counts and multiple
+resolutions for small filter sizes (e.g. a 15x15 pixel kernel). Large filter
+sizes (e.g. an 80x80 pixel kernel) slowed render times by ~5%, but large
+filters are generally undesirable since they cause visual artifacts. Note the
+banding around the edges of the room in the following image, rendered with an
+80x80 filter.
+![artifacting image](img/artifacting.png)
+
+Scenes that produce less noise (e.g. those with larger light sources) also
+produce better source images for the denoiser.
+
+### Visual Improvement
+The following comparison shows three columns. The
+first is an untouched pathtracer render, the second is a denoised version of
+that render, and the third is the visual difference between them (the stronger
+the red, the greater the difference at that pixel). Denoise parameters were kept
+constant for all renders.
+![denoise comparison](img/comparisonChart.png)
+
+As the above image shows, the denoiser is capable of improving even an
+extremely noisy image. Also, as the iterations increase, the denoiser does not
+noticeably reduce image quality. The fact that the difference images get
+progressively less red indicates that the source image and the denoised image
+are converging towards one agreed-upon image.
+
+The visual gains of the denoiser are most dramatic with the extremely noisy
+low-iteration images, but the most practical gains are available at slightly
+higher iteration counts. Notice how little visual difference exists between
+the following two images:
+![iteration comparison](img/qualcomp.png)
+The left image took twice as much computation to achieve approximately the same
+result!
+
+### Known limitations
+
+The gbuffers only take into account surface info, which makes them less effective
+for reflective and refractive surfaces. Note in the gbuffers above that the
+reflective/refractive spheres show the same position and normals as a diffuse
+sphere instead of the position or normal of what they reflect/refract. One way
+to solve this would be to adjust the gbuffer based on the shader and one or
+more bounced rays.
+
+The denoiser does not work well with depth of field. The gbuffers are rebuilt
+each iteration, but with depth of field the camera rays change each iteration.
+The shifting rays lead to inconsistent results from iteration to iteration, and
+whereas the un-denoised image sums these iterations and converges, the denoiser
+does not. The image below shows the normal gbuffer with depth of field enabled.
+The dark pixels are unexpected and may indicate a bug.
+![noisy gbuffer](img/DOFNormalGbuffer.png)
+One way to fix this would be to have the gbuffers accumulate over
+time. This fix could pair well with the previously mentioned fix for
+reflective/refractive surfaces.
diff --git a/img/256example.png b/img/256example.png
new file mode 100644
index 0000000..161d1da
Binary files /dev/null and b/img/256example.png differ
diff --git a/img/DOFNormalGbuffer.png b/img/DOFNormalGbuffer.png
new file mode 100644
index 0000000..a6a8b10
Binary files /dev/null and b/img/DOFNormalGbuffer.png differ
diff --git a/img/artifacting.png b/img/artifacting.png
new file mode 100644
index 0000000..24ed9fe
Binary files /dev/null and b/img/artifacting.png differ
diff --git a/img/comparisonChart.png b/img/comparisonChart.png
new file mode 100644
index 0000000..0429065
Binary files /dev/null and b/img/comparisonChart.png differ
diff --git a/img/normalGbuffer.png b/img/normalGbuffer.png
new file mode 100644
index 0000000..2f92386
Binary files /dev/null and b/img/normalGbuffer.png differ
diff --git a/img/positionGbuffer.png b/img/positionGbuffer.png
new file mode 100644
index 0000000..dbb660f
Binary files /dev/null and b/img/positionGbuffer.png differ
diff --git a/img/qualcomp.png b/img/qualcomp.png
new file mode 100644
index 0000000..d27a089
Binary files /dev/null and b/img/qualcomp.png differ
diff --git a/src/main.cpp b/src/main.cpp
index 4092ae4..d738cbd 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -23,13 +23,14 @@ int ui_iterations = 0;
 int startupIterations = 0;
 int lastLoopIterations = 0;
 bool ui_showGbuffer = false;
-bool ui_denoise = false;
+bool ui_denoise = true;
-int ui_filterSize = 80;
+int ui_filterSize = 15;
 float ui_colorWeight = 0.45f;
 float ui_normalWeight = 0.35f;
 float ui_positionWeight = 0.2f;
 bool ui_saveAndExit = false;
+
 static bool camchanged = true;
 static float dtheta = 0, dphi = 0;
 static glm::vec3 cammove;
@@ -40,6 +41,7 @@ glm::vec3 ogLookAt; // for recentering the camera
 Scene *scene;
 RenderState *renderState;
+DenoiseSettings *denoiseSettings;
 int iteration;
 int width;
@@ -62,6 +64,14 @@ int main(int argc, char** argv) {
     // Load scene file
     scene = new Scene(sceneFile);

+    // Set up denoise settings to pass data from UI
+    denoiseSettings = new DenoiseSettings();
+    denoiseSettings->denoise = &ui_denoise;
+    denoiseSettings->filterSize = &ui_filterSize;
+    denoiseSettings->colorWeight = &ui_colorWeight;
+    denoiseSettings->normalWeight = &ui_normalWeight;
+    denoiseSettings->positionWeight = &ui_positionWeight;
+
     // Set up camera stuff from loaded path tracer settings
     iteration = 0;
     renderState = &scene->state;
@@ -69,6 +79,8 @@ int main(int argc, char** argv) {
     width = cam.resolution.x;
     height = cam.resolution.y;

+    renderState->denoiseSettings = denoiseSettings;
+
     ui_iterations = renderState->iterations;
     startupIterations = ui_iterations;

@@ -167,7 +179,11 @@ void runCuda() {

         if (ui_showGbuffer) {
             showGBuffer(pbo_dptr);
-        } else {
+        }
+        else if (ui_denoise) {
+            showDenoise(pbo_dptr, iteration);
+        }
+        else {
             showImage(pbo_dptr, iteration);
         }
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index 23e5f90..238ea27 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -67,28 +67,52 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
     }
 }
 __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;

     if (x < resolution.x && y < resolution.y) {
         int index = x + (y * resolution.x);
-        float timeToIntersect = gBuffer[index].t * 256.0;
-
         pbo[index].w = 0;
-        pbo[index].x = timeToIntersect;
-        pbo[index].y = timeToIntersect;
-        pbo[index].z = timeToIntersect;
+        pbo[index].x = gBuffer[index].normal.x * 256.0f;
+        pbo[index].y = gBuffer[index].normal.y * 256.0f;
+        pbo[index].z = gBuffer[index].normal.z * 256.0f;
     }
 }

 static Scene * hst_scene = NULL;
+static DenoiseSettings * denoiseSettings = NULL;
 static glm::vec3 * dev_image = NULL;
 static Geom * dev_geoms = NULL;
 static Material * dev_materials = NULL;
 static PathSegment * dev_paths = NULL;
 static ShadeableIntersection * dev_intersections = NULL;
 static GBufferPixel* dev_gBuffer = NULL;
+static glm::vec3 * dev_dnImage = NULL;
 // TODO: static variables for device memory, any extra info you need, etc
 // ...
@@ -113,6 +137,8 @@ void pathtraceInit(Scene *scene) {

     cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));

+    cudaMalloc(&dev_dnImage, pixelcount * sizeof(glm::vec3));
+    cudaMemset(dev_dnImage, 0, pixelcount * sizeof(glm::vec3));
     // TODO: initialize any extra device memory you need

     checkCUDAError("pathtraceInit");
@@ -125,6 +151,7 @@ void pathtraceFree() {
     cudaFree(dev_materials);
     cudaFree(dev_intersections);
     cudaFree(dev_gBuffer);
+    cudaFree(dev_dnImage);
     // TODO: clean up any extra device memory you created

     checkCUDAError("pathtraceFree");
@@ -281,7 +308,10 @@ __global__ void generateGBuffer (
     int idx = blockIdx.x * blockDim.x + threadIdx.x;
     if (idx < num_paths)
     {
-        gBuffer[idx].t = shadeableIntersections[idx].t;
+        gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal;
+        gBuffer[idx].position = getPointOnRay(pathSegments[idx].ray,
+            shadeableIntersections[idx].t);
     }
 }

@@ -297,6 +327,97 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterationPaths)
     }
 }

+__global__ void denoise(int n,
+    GBufferPixel* gbuff,
+    glm::vec3* image,
+    glm::vec3 * dnImage,
+    int step,
+    int imageWidth,
+    float normalWeight,
+    float posWeight,
+    float colorWeight)
+{
+    int index = (blockIdx.x * blockDim.x) + threadIdx.x;
+
+    if (index < n)
+    {
+        glm::vec3 colSum = glm::vec3(0.0f);
+        float wSum = 0.0f;
+        // hardcode a 5x5 Gaussian filter
+        float GaussianFilter[5][5] = { {1,  4,  6,  4, 1},
+                                       {4, 16, 24, 16, 4},
+                                       {6, 24, 36, 24, 6},
+                                       {4, 16, 24, 16, 4},
+                                       {1,  4,  6,  4, 1} };
+
+        // a way to convert from 2d pixel space to the 1d pixel array we have
+        int uStepIm = 1;
+        int vStepIm = imageWidth;
+
+        // the relative offset from the center pixel in the image,
+        // e.g. -2, -2 is two pixels left and two pixels up in screenspace
+        int imStartX = -2;
+        int imStartY = -2;
+
+        // store the gbuffer values for the center pixel of our filter,
+        // i.e. the one we're actually calculating the color for
+        glm::vec3 centralNorm = gbuff[index].normal;
+        glm::vec3 centralPos = gbuff[index].position;
+        glm::vec3 centralCol = image[index];
+
+        // the cell count in 2d, starting in the upper left corner of
+        // our 5x5 filter
+        for (int y = 0; y < 5; y++) {
+            for (int x = 0; x < 5; x++) {
+                int imX = (imStartX + x) * uStepIm * step;
+                int imY = (imStartY + y) * vStepIm * step;
+
+                // i is the index for 1d representations of our 2d
+                // data, i.e. the beauty pass and the gbuffer
+                int i = index + imX + imY;
+                // i can be out of bounds along the edges of the image;
+                // the column check also prevents taps from wrapping
+                // around to the previous/next row
+                int col = (index % imageWidth) + (imStartX + x) * step;
+                if (i < 0 || i >= n || col < 0 || col >= imageWidth) {
+                    continue;
+                }
+
+                // get the Gaussian value for this pixel
+                float gVal = GaussianFilter[y][x];
+
+                // get the gbuffer values for this pixel
+                glm::vec3 nVal = gbuff[i].normal;
+                glm::vec3 pVal = gbuff[i].position;
+                glm::vec3 cVal = image[i];
+
+                // get the distance of the gbuffer values
+                // from our central pixel
+                float nDist = max(glm::length(centralNorm - nVal) / (step * step), 0.0f);
+                float pDist = glm::length(centralPos - pVal);
+                float cDist = glm::length(centralCol - cVal);
+
+                // turn the distances into edge-stopping weights
+                // using our input values
+                float nw = min(exp(-1.0f * nDist / normalWeight), 1.0f);
+                float pw = min(exp(-1.0f * pDist / posWeight), 1.0f);
+                float cw = min(exp(-1.0f * cDist / colorWeight), 1.0f);
+
+                // combine into the overall weight
+                float w = nw * pw * cw;
+
+                colSum += cVal * w * gVal;
+                wSum += w * gVal;
+            }
+        }
+
+        dnImage[index] = colSum / wSum;
+    }
+}
+
 /**
  * Wrapper for the __global__ call that sets up the kernel calls and does a ton
  * of memory management
@@ -399,6 +520,25 @@ void pathtrace(int frame, int iter) {
     finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);

     ///////////////////////////////////////////////////////////////////////////

+    if (*hst_scene->state.denoiseSettings->denoise) {
+
+        float nWeight = pow(*hst_scene->state.denoiseSettings->normalWeight, 2);
+        float pWeight = pow(*hst_scene->state.denoiseSettings->positionWeight, 2);
+        float cWeight = pow(*hst_scene->state.denoiseSettings->colorWeight, 2);
+
+        int steps = *hst_scene->state.denoiseSettings->filterSize / 5;
+        for (int step = 1; step <= steps; step++) {
+            denoise<<<numBlocksPixels, blockSize1d>>>(num_paths,
+                dev_gBuffer,
+                dev_image,
+                dev_dnImage,
+                step,
+                cam.resolution.x,
+                nWeight,
+                pWeight,
+                cWeight);
+        }
+    }

     // CHECKITOUT: use dev_image as reference if you want to implement saving denoised images.
     // Otherwise, screenshots are also acceptable.
@@ -421,6 +561,17 @@ void showGBuffer(uchar4* pbo) {
     gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
 }

+void showDenoise(uchar4* pbo, int iter) {
+    const Camera &cam = hst_scene->state.camera;
+    const dim3 blockSize2d(8, 8);
+    const dim3 blocksPerGrid2d(
+        (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+        (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+    // CHECKITOUT: process the gbuffer results and send them to the OpenGL buffer for visualization
+    sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_dnImage);
+}
+
 void showImage(uchar4* pbo, int iter) {
     const Camera &cam = hst_scene->state.camera;
     const dim3 blockSize2d(8, 8);
diff --git a/src/pathtrace.h b/src/pathtrace.h
index 9e12f44..420e5c4 100644
--- a/src/pathtrace.h
+++ b/src/pathtrace.h
@@ -8,3 +8,4 @@ void pathtraceFree();
 void pathtrace(int frame, int iteration);
 void showGBuffer(uchar4 *pbo);
 void showImage(uchar4 *pbo, int iter);
+void showDenoise(uchar4 *pbo, int iter);
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index da7e558..f9ff885 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -51,12 +51,21 @@ struct Camera {
     glm::vec2 pixelLength;
 };

+struct DenoiseSettings {
+    bool * denoise;
+    int * filterSize;
+    float * colorWeight;
+    float * normalWeight;
+    float * positionWeight;
+};
+
 struct RenderState {
     Camera camera;
     unsigned int iterations;
     int traceDepth;
     std::vector<glm::vec3> image;
     std::string imageName;
+    DenoiseSettings *denoiseSettings;
 };

 struct PathSegment {
@@ -78,5 +87,7 @@ struct ShadeableIntersection {

 // CHECKITOUT - a simple struct for storing scene geometry information per-pixel.
 // What information might be helpful for guiding a denoising filter?
 struct GBufferPixel {
-    float t;
+    glm::vec3 normal;
+    glm::vec3 position;
 };
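As a footnote to the patch above: the `denoise` kernel is, at heart, one pass of the à-trous filter — a 5x5 B3-spline kernel with holes of size `step`, modulated by edge-stopping weights. The sketch below reimplements a single pass on the CPU for a flat grayscale image, keeping only the color weight (no normal/position gbuffers). It is an illustrative reimplementation under those simplifying assumptions, not code from this repo; `atrousPass` and `edgeWeight` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Edge-stopping weight: taps that differ strongly from the center
// pixel contribute less, which preserves edges while blurring noise.
float edgeWeight(float dist, float sigma) {
    return std::min(std::exp(-dist / sigma), 1.0f);
}

// One a-trous pass over a grayscale image using the 5x5 B3-spline
// kernel (separable [1 4 6 4 1] / 16), with holes of size `step`.
std::vector<float> atrousPass(const std::vector<float>& img,
                              int width, int height,
                              int step, float colorSigma) {
    static const float k[5] = {1.f/16, 4.f/16, 6.f/16, 4.f/16, 1.f/16};
    std::vector<float> out(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float center = img[y * width + x];
            float sum = 0.f, wSum = 0.f;
            for (int j = -2; j <= 2; ++j) {
                for (int i = -2; i <= 2; ++i) {
                    int sx = x + i * step, sy = y + j * step;
                    if (sx < 0 || sx >= width || sy < 0 || sy >= height)
                        continue;  // skip taps that fall off the image
                    float v = img[sy * width + sx];
                    float w = edgeWeight(std::fabs(center - v), colorSigma)
                            * k[i + 2] * k[j + 2];
                    sum += v * w;
                    wSum += w;
                }
            }
            // the center tap is always in bounds, so wSum > 0
            out[y * width + x] = sum / wSum;
        }
    }
    return out;
}
```

Running the pass repeatedly with step = 1, 2, 4, ... approximates the widening Gaussian described by Dammertz et al. A constant image is a fixed point of the filter, which makes a convenient sanity check.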