83 changes: 78 additions & 5 deletions README.md
@@ -3,11 +3,84 @@ CUDA Denoiser For CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Matt Elser
* [LinkedIn](https://www.linkedin.com/in/matt-elser-97b8151ba/), [twitter](twitter.com/__mattelser__)
* Tested on: Windows 10, i3-10100F @ 3.6GHz 16GB, GeForce GTX 1660 Super 6GB

### (TODO: Your README)

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
### Features

![example denoise image](img/256example.png)

This is an implementation of A-Trous wavelet denoising as described in
[Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering
by Dammertz et al.](https://jo.dreggn.org/home/2010_atrous.pdf).

*NOTE* This repo only contains the denoising features added to a minimal
pathtracer. To see this denoiser in a more fully featured pathtracer (from which
all sample images were generated), check out [my pathtracer
repo](https://github.com/mattelser/Project3-CUDA-Path-Tracer).

A-Trous wavelet denoising approximates a large Gaussian blur by repeatedly
applying a small (here 5x5) kernel with increasing spacing between taps, while
edge-stopping weights computed from "G-buffers" preserve edges (see the sketch
below the list). The G-buffers are per-pixel data gathered from the scene, and
for this implementation include:
- surface normal ![surface normal gbuffer](img/normalGbuffer.png)
- surface position ![position gbuffer](img/positionGbuffer.png)
- image color
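
As a rough sketch of how the weighting works (a simplified view of the `denoise`
kernel added in `src/pathtrace.cu`; the real kernel additionally scales the normal
term by the current step size), each filter tap's contribution is the product of a
5x5 Gaussian kernel value and three edge-stopping weights derived from the
G-buffer differences:

```cuda
#include <glm/glm.hpp>

// Simplified sketch of the per-tap edge-stopping weight. dNormal/dPosition/dColor
// are the differences between the center pixel's G-buffer values and the tap's;
// sigmaN/sigmaP/sigmaC are the UI-controlled normal/position/color weights.
__device__ float tapWeight(glm::vec3 dNormal, glm::vec3 dPosition, glm::vec3 dColor,
                           float sigmaN, float sigmaP, float sigmaC)
{
    float nw = fminf(expf(-glm::length(dNormal)   / (sigmaN * sigmaN)), 1.0f);
    float pw = fminf(expf(-glm::length(dPosition) / (sigmaP * sigmaP)), 1.0f);
    float cw = fminf(expf(-glm::length(dColor)    / (sigmaC * sigmaC)), 1.0f);
    return nw * pw * cw; // the caller multiplies this by the 5x5 Gaussian value
}
```

Each successive pass reuses this weighting with a larger spacing between the 25
taps, so a handful of passes approximate a much larger blur while pixels whose
normals, positions, or colors differ sharply from the center contribute very little.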

### Performance
Comparing render times between the full-featured pathtracer with no denoise
code and the same pathtracer running the denoise kernel, no notable
performance difference was found across multiple iteration counts and multiple
resolutions for small filter sizes (e.g. a 15x15 pixel kernel). Large filter
sizes (e.g. an 80x80 pixel kernel) slowed render times by ~5%, but large
filters are generally undesirable since they cause visual artifacts. Note the
banding around the edges of the room in the following image, rendered with an
80x80 filter.
![artifacting image](img/artifacting.png)

Scenes that produce less noise (e.g. those with larger light sources) also
produce better source images for the denoiser.

### Visual Improvement
The following comparison shows three columns: the
first is an untouched pathtracer render, the second is a denoised version of
that render, and the third is the visual difference between them (the stronger
the red, the greater the difference at that pixel). Denoise parameters were kept
constant for all renders.
![denoise comparison](img/comparisonChart.png)

As the above image shows, the denoiser is capable of improving even an
extremely noisy image. Also, as the iterations increase, the denoiser does not
noticeably reduce image quality. The fact that the difference images get
progressively less red indicates that the source image and the denoised image
are converging towards the same result.

The visual gains of the denoiser are most dramatic for the extremely noisy
low-iteration images, but the most practical benefit comes at slightly
higher iteration counts. Notice how little visual difference exists between
the following two images:
![iteration comparison](img/qualcomp.png)
The left image took twice as much computation to achieve approximately the same
result!

### Known limitations

The gbuffers only take surface info into account, which makes the denoiser less
effective for reflective and refractive surfaces. Note in the gbuffers above
that the reflective/refractive spheres show the same position and normals as a
diffuse sphere instead of the position or normal of what they reflect/refract.
One way to solve this would be to adjust the gbuffer based on the shader and one
or more bounced rays, as sketched below.
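
As a purely hypothetical sketch of that idea (nothing below exists in the current
code; `Material::hasReflective` and the `intersectScene` helper are assumptions),
the G-buffer pass could follow one specular bounce before recording position and
normal:

```cuda
// Hypothetical fragment that would live inside generateGBuffer: for mirror-like
// materials, record the G-buffer values of the surface seen through the
// reflection rather than those of the specular surface itself.
// Material::hasReflective and intersectScene() are assumed, not existing code.
if (material.hasReflective > 0.0f) {
    Ray bounced;
    bounced.origin    = getPointOnRay(pathSegments[idx].ray, isect.t);
    bounced.direction = glm::reflect(pathSegments[idx].ray.direction, isect.surfaceNormal);
    ShadeableIntersection bouncedIsect = intersectScene(bounced); // assumed helper
    gBuffer[idx].normal   = bouncedIsect.surfaceNormal;
    gBuffer[idx].position = getPointOnRay(bounced, bouncedIsect.t);
}
```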

The denoiser does not work well with depth of field. The gbuffers are rebuilt
each iteration, but with depth of field the camera rays change each iteration.
The shifting rays lead to inconsistent results from iteration to iteration, and
whereas the un-denoised image sums these iterations and converges, the denoiser
does not. The image below shows the normal gbuffer with depth of field enabled.
The dark pixels are unexpected and may indicate a bug.
![noisy gbuffer](img/DOFNormalGbuffer.png)
One way to fix this would be to have the gbuffers accumulate over
time. This fix could pair well with the previously mentioned fix for
reflective/refractive surfaces.
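
A minimal sketch of that accumulation (hypothetical, not part of this change; it
assumes the kernel is passed the current iteration number) would blend each
iteration's sample into a running average instead of overwriting it:

```cuda
// Hypothetical sketch: average G-buffer normals over iterations so that
// jittered depth-of-field rays converge instead of replacing one another.
__global__ void accumulateGBufferNormals(int num_paths, int iter,
                                         ShadeableIntersection* isects,
                                         GBufferPixel* gBuffer)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_paths) return;

    glm::vec3 sample = isects[idx].surfaceNormal;
    if (iter == 1) {
        gBuffer[idx].normal = sample;  // first iteration: just store the sample
    } else {
        // running average over iterations 1..iter
        gBuffer[idx].normal =
            (gBuffer[idx].normal * float(iter - 1) + sample) / float(iter);
    }
}
```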

Binary file added img/256example.png
Binary file added img/DOFNormalGbuffer.png
Binary file added img/artifacting.png
Binary file added img/comparisonChart.png
Binary file added img/normalGbuffer.png
Binary file added img/positionGbuffer.png
Binary file added img/qualcomp.png
22 changes: 19 additions & 3 deletions src/main.cpp
@@ -23,13 +23,14 @@ int ui_iterations = 0;
int startupIterations = 0;
int lastLoopIterations = 0;
bool ui_showGbuffer = false;
bool ui_denoise = false;
int ui_filterSize = 80;
bool ui_denoise = true; // false;
int ui_filterSize = 15;
float ui_colorWeight = 0.45f;
float ui_normalWeight = 0.35f;
float ui_positionWeight = 0.2f;
bool ui_saveAndExit = false;


static bool camchanged = true;
static float dtheta = 0, dphi = 0;
static glm::vec3 cammove;
@@ -40,6 +41,7 @@ glm::vec3 ogLookAt; // for recentering the camera

Scene *scene;
RenderState *renderState;
DenoiseSettings *denoiseSettings;
int iteration;

int width;
@@ -62,13 +64,23 @@ int main(int argc, char** argv) {
// Load scene file
scene = new Scene(sceneFile);

// Set up denoise settings to pass data from UI
denoiseSettings = new DenoiseSettings();
denoiseSettings->denoise = &ui_denoise;
denoiseSettings->filterSize = &ui_filterSize;
denoiseSettings->colorWeight = &ui_colorWeight;
denoiseSettings->normalWeight = &ui_normalWeight;
denoiseSettings->positionWeight = &ui_positionWeight;

// Set up camera stuff from loaded path tracer settings
iteration = 0;
renderState = &scene->state;
Camera &cam = renderState->camera;
width = cam.resolution.x;
height = cam.resolution.y;

renderState->denoiseSettings = denoiseSettings;

ui_iterations = renderState->iterations;
startupIterations = ui_iterations;

@@ -167,7 +179,11 @@ void runCuda() {

if (ui_showGbuffer) {
showGBuffer(pbo_dptr);
} else {
}
else if (ui_denoise) {
showDenoise(pbo_dptr, iteration);
}
else {
showImage(pbo_dptr, iteration);
}

163 changes: 157 additions & 6 deletions src/pathtrace.cu
@@ -67,28 +67,52 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
}
}

////Kernel that writes the image to the OpenGL PBO directly.
//__global__ void sendDenoiseToPBO(uchar4* pbo, glm::ivec2 resolution,
// int iter, glm::vec3* image) {
// int x = (blockIdx.x * blockDim.x) + threadIdx.x;
// int y = (blockIdx.y * blockDim.y) + threadIdx.y;
//
// if (x < resolution.x && y < resolution.y) {
// int index = x + (y * resolution.x);
// glm::vec3 pix = image[index];
//
// glm::ivec3 color;
// color.x = glm::clamp((int) (pix.x * 255.0), 0, 255);
// color.y = glm::clamp((int) (pix.y * 255.0), 0, 255);
// color.z = glm::clamp((int) (pix.z * 255.0), 0, 255);
//
// // Each thread writes one pixel location in the texture (textel)
// pbo[index].w = 0;
// pbo[index].x = color.x;
// pbo[index].y = color.y;
// pbo[index].z = color.z;
// }
//}

__global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;

if (x < resolution.x && y < resolution.y) {
int index = x + (y * resolution.x);
float timeToIntersect = gBuffer[index].t * 256.0;

//float timeToIntersect = gBuffer[index].t * 256.0;
pbo[index].w = 0;
pbo[index].x = timeToIntersect;
pbo[index].y = timeToIntersect;
pbo[index].z = timeToIntersect;
pbo[index].x = gBuffer[index].normal.x * 256.0f;
pbo[index].y = gBuffer[index].normal.y * 256.0f;
pbo[index].z = gBuffer[index].normal.z * 256.0f;
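// note: normal components can be negative and are not remapped to [0, 255]
// before being written to the PBO, so negative components display as dark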
}
}

static Scene * hst_scene = NULL;
static DenoiseSettings * denoiseSettings = NULL;
static glm::vec3 * dev_image = NULL;
static Geom * dev_geoms = NULL;
static Material * dev_materials = NULL;
static PathSegment * dev_paths = NULL;
static ShadeableIntersection * dev_intersections = NULL;
static GBufferPixel* dev_gBuffer = NULL;
static glm::vec3 * dev_dnImage = NULL;
// TODO: static variables for device memory, any extra info you need, etc
// ...

@@ -113,6 +137,8 @@ void pathtraceInit(Scene *scene) {

cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));

cudaMalloc(&dev_dnImage, pixelcount * sizeof(glm::vec3));
cudaMemset(dev_dnImage, 0, pixelcount * sizeof(glm::vec3));
// TODO: initialize any extra device memory you need

checkCUDAError("pathtraceInit");
@@ -125,6 +151,7 @@ void pathtraceFree() {
cudaFree(dev_materials);
cudaFree(dev_intersections);
cudaFree(dev_gBuffer);
cudaFree(dev_dnImage);
// TODO: clean up any extra device memory you created

checkCUDAError("pathtraceFree");
@@ -281,7 +308,10 @@ __global__ void generateGBuffer (
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < num_paths)
{
gBuffer[idx].t = shadeableIntersections[idx].t;
//gBuffer[idx].t = shadeableIntersections[idx].t;
gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal;
gBuffer[idx].position = getPointOnRay(pathSegments[idx].ray,
shadeableIntersections[idx].t);
}
}

@@ -297,6 +327,97 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati
}
}

__global__ void denoise(int n,
GBufferPixel* gbuff,
glm::vec3* image,
glm::vec3 * dnImage,
int step,
int imageWidth,
float normalWeight,
float posWeight,
float colorWeight)
{
int index = (blockIdx.x * blockDim.x) + threadIdx.x;

if (index < n)
{
glm::vec3 colSum = glm::vec3(0.0f);
float wSum = 0.0f;
// hardcode a 5x5 Gaussian filter
float GaussianFilter[5][5] = { {1, 4, 6, 4, 1},
{4, 16, 24, 16, 4},
{6, 24, 36, 24, 6},
{4, 16, 24, 16, 4},
{1, 4, 6, 4, 1} };

// a way to convert from 2d pixel space to the 1d pixel array we have
int uStepIm = 1;
int vStepIm = imageWidth;

// the relative offset from the center pixel in the image
// e.x. -2, -2 is two pixels left and two pixels up in screenspace
int imStartX = -2;
int imStartY = -2;


// store the gbuffer values for the center pixel of our filter
// i.e. the one we're actually calculating the color for
glm::vec3 centralNorm = gbuff[index].normal;
glm::vec3 centralPos = gbuff[index].position;
glm::vec3 centralCol = image[index];

// the cell count in 2d, starting in the upper left corner of
// our 5x5 filter
for (int y = 0; y < 5; y++) {
for (int x = 0; x < 5; x++) {
int imX = (imStartX + x) * uStepIm * step;
int imY = (imStartY + y) * vStepIm * step;

// i is the index for 1d representations of our 2d
// data, i.e. the beauty pass and the gbuffer
int i = index + imX + imY;
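// note: the image is stored as a flat 1d array, so a tap that runs past the
// left or right edge of a row wraps onto the neighboring row; the bounds
// check below only catches indices past the top or bottom of the image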
if (i < 0 || i >= n) {
// i can be out of bounds along the edges of the image
continue;
}

// get the Gaussian value for this pixel
float gVal = GaussianFilter[y][x];

// get the gbuffer values for this pixel
glm::vec3 nVal = gbuff[i].normal;
glm::vec3 pVal = gbuff[i].position;
glm::vec3 cVal = image[i];

// get the distance of the gbuffer values
// from our central pixel
float nDist = max(glm::length(centralNorm - nVal) / (step * step), 0.0f);
float pDist = glm::length(centralPos - pVal);
float cDist = glm::length(centralCol - cVal);

// get the weights based on these distances
// and our input values
float nw = min(exp(-1.0f * nDist / normalWeight), 1.0f);
float pw = min(exp(-1.0f * pDist / posWeight), 1.0f);
float cw = min(exp(-1.0f * cDist / colorWeight), 1.0f);

// combine into the overall weight for this tap
float w = nw * pw * cw;

colSum += cVal * w * gVal;
wSum += w * gVal;
}
}

dnImage[index] = colSum / wSum;
}
}

/**
* Wrapper for the __global__ call that sets up the kernel calls and does a ton
* of memory management
@@ -399,6 +520,25 @@ void pathtrace(int frame, int iter) {
finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);

///////////////////////////////////////////////////////////////////////////
if (*hst_scene->state.denoiseSettings->denoise){

float nWeight = pow(*hst_scene->state.denoiseSettings->normalWeight, 2);
float pWeight = pow(*hst_scene->state.denoiseSettings->positionWeight, 2);
float cWeight = pow(*hst_scene->state.denoiseSettings->colorWeight, 2);

int steps = *hst_scene->state.denoiseSettings->filterSize / 5;
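// each pass widens the 5x5 kernel by 'step' pixels between taps, so the
// passes together cover roughly a filterSize x filterSize footprint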
for (int step = 1; step <= steps; step++) {
denoise <<<numBlocksPixels, blockSize1d>>>(num_paths,
dev_gBuffer,
dev_image,
dev_dnImage,
step,
cam.resolution.x,
nWeight,
pWeight,
cWeight);
}
}

// CHECKITOUT: use dev_image as reference if you want to implement saving denoised images.
// Otherwise, screenshots are also acceptable.
@@ -421,6 +561,17 @@ void showGBuffer(uchar4* pbo) {
gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
}

void showDenoise(uchar4* pbo, int iter) {
const Camera &cam = hst_scene->state.camera;
const dim3 blockSize2d(8, 8);
const dim3 blocksPerGrid2d(
(cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
(cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);

// send the denoised image to the OpenGL PBO for visualization
sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_dnImage);
}

void showImage(uchar4* pbo, int iter) {
const Camera &cam = hst_scene->state.camera;
const dim3 blockSize2d(8, 8);
1 change: 1 addition & 0 deletions src/pathtrace.h
@@ -8,3 +8,4 @@ void pathtraceFree();
void pathtrace(int frame, int iteration);
void showGBuffer(uchar4 *pbo);
void showImage(uchar4 *pbo, int iter);
void showDenoise(uchar4 *pbo, int iter);