264 changes: 259 additions & 5 deletions README.md

# CUDA Denoiser For CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* Nick Moon
* [LinkedIn](https://www.linkedin.com/in/nick-moon1/), [personal website](https://nicholasmoon.github.io/)
* Tested on: Windows 10, AMD Ryzen 9 5900HS @ 3.0GHz 32GB, NVIDIA RTX 3060 Laptop 6GB (Personal Laptop)

**This project is an implementation of the Edge-Avoiding À-Trous Wavelet Transform for Fast Global
Illumination Filtering.
This denoising algorithm uses a form of Gaussian blurring to smooth noisy regions of
the render while detecting edges using G-buffer values stored during path tracing.
The result is edge-aware denoising that preserves object boundaries.**

## RESULTS


| Denoised 1 SPP | Denoised 100 SPP | Denoised 1000 SPP |
| ----------- | ----------- | ----------- |
| ![](img/results/render_denoised_1.PNG) | ![](img/results/render_denoised_100.PNG) | ![](img/results/render_denoised_1000.PNG) |

| Original 1 SPP | Original 100 SPP | Original 1000 SPP |
| ----------- | ----------- | ----------- |
| ![](img/results/render_1.PNG) | ![](img/results/render_100.PNG) | ![](img/results/render_1000.PNG) |

Adding denoising to these renders incurred only a constant additional ~27 ms of runtime, no
matter how many path-tracing iterations were used!


## IMPLEMENTATION

### Gaussian Blur and Filtering

As a brief introduction, the core of the denoising algorithm is the filter (or kernel): a grid of
values that describes a weighting around a center pixel ```p```.
For example, with a 5x5
kernel ```k``` centered at pixel ```p```, the middle element ```k[2][2]``` is multiplied by the color
at ```p```, each surrounding kernel entry is multiplied by the corresponding neighboring pixel, and the
25 weighted values are summed to produce the filtered value at ```p```.
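
As a concrete sketch of this weighted sum (illustrative only; the function and buffer names below are assumptions, not this project's actual code), a 5x5 convolution at one pixel might look like:

```cuda
// Minimal sketch of a 5x5 convolution at one pixel.
// `image`, `kernel5x5`, `width`, and `height` are assumed names.
__device__ float3 convolve5x5(const float3* image, const float* kernel5x5,
                              int x, int y, int width, int height) {
    float3 sum = make_float3(0.f, 0.f, 0.f);
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            // Clamp neighbor coordinates at the image border.
            int nx = min(max(x + dx, 0), width - 1);
            int ny = min(max(y + dy, 0), height - 1);
            float w  = kernel5x5[(dy + 2) * 5 + (dx + 2)];
            float3 c = image[ny * width + nx];
            sum.x += w * c.x;
            sum.y += w * c.y;
            sum.z += w * c.z;
        }
    }
    return sum; // assumes the 25 kernel weights sum to 1
}
```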

Below is an example of a kernel generated with the Gaussian function (from Wikipedia):

![](img/figures/gaussiankernel.PNG)

Applying the kernel to every pixel in an image produces a blur (generated here with the Krita art application), as in the image below:

| Original | Blurred |
| ----------- | ----------- |
| ![](img/results/iteration_1.PNG) | ![](img/figures/gaussian_blur.PNG) |



### À-Trous Wavelet Transform

The À-Trous Wavelet Transform described in the paper is a filter similar to the Gaussian kernel,
but more efficient. Instead of using a kernel whose size grows quadratically with the number of
pixels to be sampled, the À-Trous Wavelet Transform reuses the same small kernel
(a 5x5 kernel in this project) and performs multiple iterations of filtering,
with exponentially growing offsets between the sampled pixels each time. This allows a larger
neighborhood of pixels to be sampled without significantly increasing the amount of computation
required. An illustration of this is shown in the figure below:

![](img/figures/kerneloffsets.PNG)

Below is a demonstration of the À-Trous Wavelet Transform applied to a noisy path-traced
Cornell box render, without the edge detection described in the next section:

| Kernel Size 1 (1 iter) | Kernel Size 4 (3 iter) | Kernel Size 16 (5 iter) | Kernel Size 64 (7 iter) |
| ----------- | ----------- | ----------- | ----------- |
| ![](img/results/no_edge_detection_filter1.PNG) | ![](img/results/no_edge_detection_filter3.PNG) | ![](img/results/no_edge_detection_filter5.PNG) | ![](img/results/no_edge_detection_filter7.PNG) |

As can be seen, this looks very similar to the pure Gaussian blur from the previous section:
it simply blurs the entire screen, and it would be a stretch to call it "denoising".

Specifically, the offset between sampled pixels at iteration ```i``` of the kernel is ```2^i```.

To implement this blurring operation, I needed the 5x5 blur filter, a 5x5 array of filter
offsets, and two vec3 buffers that store the color information between blur passes and are
eventually written to the OpenGL PBO to be rendered.
Two buffers are needed in order to ping-pong between them across iterations of the
denoising kernel, as sketched below.
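
A rough host-side sketch of this ping-pong loop follows (the kernel name ```atrousKernel```, the launch configuration, and the sigma handling are assumptions, not this repo's exact code). Note how cheap the growing footprint is: with step sizes 1, 2, 4, 8, 16, five levels of the 5x5 kernel cover roughly a 125x125-pixel neighborhood while sampling only 25 pixels per level (125 total) per output pixel.

```cuda
#include <utility> // std::swap

// Assumed kernel signature: one 5x5 a-trous filtering pass at the given step width.
__global__ void atrousKernel(const float3* in, float3* out, int width, int height,
                             int stepWidth, float sigma_rt);

// Host-side sketch of the iterative a-trous denoise with ping-pong buffers.
void denoise(float3* dev_colorA, float3* dev_colorB,
             int width, int height, int numLevels, float sigma_rt) {
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    float3* readBuf  = dev_colorA; // holds the current (noisy) color
    float3* writeBuf = dev_colorB; // receives this level's filtered result

    for (int level = 0; level < numLevels; ++level) {
        int stepWidth = 1 << level; // offsets of 1, 2, 4, 8, ... pixels between taps
        atrousKernel<<<grid, block>>>(readBuf, writeBuf, width, height,
                                      stepWidth, sigma_rt);
        sigma_rt *= 0.5f; // halve the color weight each level, as the paper's authors suggest
        std::swap(readBuf, writeBuf); // ping-pong: this level's output feeds the next level
    }
    // After the loop, readBuf points at the final denoised image.
}
```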

### Edge Detection

#### G-Buffer

The edge detection process uses the positions and normals at the intersections of the
camera rays associated with each pixel. To have this information available during the
post-process denoising pass, we need to create a geometry buffer (G-buffer) that stores the relevant
information at each pixel.
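
One possible per-pixel layout for such a G-buffer is sketched below (field names are assumptions; the struct in this repo may be laid out differently):

```cuda
#include <glm/glm.hpp>

// Illustrative G-buffer entry, one per pixel, filled on the first bounce.
struct GBufferTexel {
    glm::vec3 position; // world-space position of the primary ray's intersection
    glm::vec3 normal;   // world-space surface normal at that intersection
    float     depth;    // distance along the primary ray (used for the depth view below)
};

// Device-side array with width * height entries, e.g.:
// GBufferTexel* dev_gBuffer;
```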

Below you can see a visualization of the data collected in this G-Buffer for a simple scene:

| Position Buffer | Normal Buffer | Depth Buffer |
| ----------- | ----------- | ----------- |
| ![](img/results/pos_buffer.PNG) | ![](img/results/nor_buffer.PNG) | ![](img/results/depth_buffer.PNG) |

#### Edge Detection with Weights

Edge detection is performed using the source path-traced image (per-pixel color),
the per-pixel world-space intersection positions, and the per-pixel world-space normals.
For a pixel ```p```, the squared distance between ```p```'s position, normal, and
color and those of one of ```p```'s neighbors (the neighbor selected by the current filter tap)
is calculated. Weighting terms for these three components (color, position, and normal)
are then computed with an exponential falloff, and the three weights are multiplied together
into a combined weight for this pixel comparison. The combined weight is
multiplied by the filter value and the neighbor's color to get that tap's contribution.
The weight is also accumulated by multiplying it by the filter value and adding it to a
running sum of weights. At the end, the accumulated color is divided by
the accumulated weight to yield the final, denoised color at ```p```.
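
In code, the per-tap weighting and accumulation described above might look like the following sketch (variable names such as ```sigma_c```, ```cum_color```, and ```cum_w``` are assumptions, not this repo's identifiers):

```cuda
#include <glm/glm.hpp>
#include <math.h>

// Edge-avoiding weight and accumulation for a single 5x5 filter tap.
__host__ __device__ inline void accumulateTap(
    const glm::vec3& c_p, const glm::vec3& c_q,   // color at center p and at neighbor q
    const glm::vec3& n_p, const glm::vec3& n_q,   // world-space normals
    const glm::vec3& x_p, const glm::vec3& x_q,   // world-space positions
    float h,                                      // filter (kernel) value at this tap
    float sigma_c, float sigma_n, float sigma_x,  // user-tunable color/normal/position biases
    glm::vec3& cum_color, float& cum_w)           // running sums over all 25 taps
{
    // Squared distances between p and its neighbor q in each domain.
    float dc = glm::dot(c_p - c_q, c_p - c_q);
    float dn = glm::dot(n_p - n_q, n_p - n_q);
    float dx = glm::dot(x_p - x_q, x_p - x_q);

    // Exponential falloff per domain, multiplied into one combined weight.
    float w_c = expf(-dc / (sigma_c * sigma_c));
    float w_n = expf(-dn / (sigma_n * sigma_n));
    float w_x = expf(-dx / (sigma_x * sigma_x));
    float w   = w_c * w_n * w_x;

    cum_color += h * w * c_q; // this tap's weighted color contribution
    cum_w     += h * w;       // accumulate the weight for final normalization
}

// After looping over all 25 taps: denoisedColor = cum_color / cum_w;
```

Larger sigma values make the corresponding exponential fall off more slowly, which is why increasing a bias produces more blurring in that domain, as discussed next.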

When computing the weight values for position, normal, and color, user-tunable bias values are
also included. These let the artist increase or decrease the influence of the individual
components. Increasing the color bias causes a greater amount of blur.
Increasing the normal bias causes more smoothing across object boundaries where the per-pixel
normals change sharply. Increasing the position bias causes more smoothing across
object boundaries where one object sits in front of another. Increasing the normal and position
biases together with the color bias causes the edge detection to fail (at least for the
test scene), because the normal and position values then have almost no influence on the amount of
blurring between objects.

## Visual Analysis


### Filter Size
Below is a visual comparison of different filter sizes with edge detection enabled. Each render
uses only 20 samples per pixel, with a very high color weight.

| Kernel Offset = 1 | Kernel Offset = 2 | Kernel Offset = 4 |
| ----------- | ----------- | ----------- |
| ![](img/results/kernel_size_1_iter_1.PNG) | ![](img/results/kernel_size_2_iter_1.PNG) | ![](img/results/kernel_size_3_iter_1.PNG) |

| Kernel Offset = 8 | Kernel Offset = 16 | Kernel Offset = 32 |
| ----------- | ----------- | ----------- |
| ![](img/results/kernel_size_4_iter_1.PNG) | ![](img/results/kernel_size_5_iter_1.PNG) | ![](img/results/kernel_size_6_iter_1.PNG) |

| Kernel Offset = 64 | Kernel Offset = 128 | Kernel Offset = 256 |
| ----------- | ----------- | ----------- |
| ![](img/results/kernel_size_7_iter_1.PNG) | ![](img/results/kernel_size_8_iter_1.PNG) | ![](img/results/kernel_size_9_iter_1.PNG) |

I would not say that visual quality scales uniformly with filter size. Around a kernel offset
of 16 is where the visible changes slow down significantly, so much so that I cannot
really make them out by eye. Additionally, changing the kernel offset from 1 to 2 does not
make a large visual impact, but offsets 4, 8, and 16 produce large changes. This is due to a mixture of
the large magnitude of the noise in the source image, our eyes' perception of that noise and
of how it changes from iteration to iteration, and the fact that the color weight of the denoising
algorithm is cut in half at each denoising kernel call, which the authors use to help preserve
small-scale detail.

### Different Material Types

| 1 Iteration Not Denoised | 1 Iteration Denoised | 5000 Iterations Not Denoised |
| ----------- | ----------- | ----------- |
| ![](img/results/matcom_1iter.PNG) | ![](img/results/matcomp_denoised.PNG) | ![](img/results/matcomp_5000iter.PNG) |

As can be seen from the comparison above, the denoising algorithm struggles a bit with specular
materials. While the edge detection handles the edges of the specular material quite well, the
reflection on the surface appears rough, more like a microfacet material. This does not
go away completely until hundreds of path-tracing iterations have been accumulated. Unlike the
specular surface, the diffuse surface already scatters light randomly in all directions, so
the smudging and blurring is much less apparent. At one iteration, the denoised diffuse sphere
almost looks good enough to be considered converged.

### Different Scenes

| | 1 Iteration | 100 Iterations |
| ----------- | ----------- | ----------- |
| Smaller Ceiling Light | ![](img/results/cornell_1iter_denoised.PNG) | ![](img/results/cornell_100iter_denoised.PNG) |
| Larger Ceiling Light | ![](img/results/iteration_1_denoised.PNG) | ![](img/results/iteration_100_denoised.PNG) |

As can be seen above, the denoising algorithm struggles much more with the small-light scene
than with the large-light scene. This is because the small-light scene naturally samples the
light fewer times than the large-light scene: a ray sent in a random direction is much more
likely to hit the larger light. This in turn hurts the denoising,
because at low path-tracing iteration counts more pixels are black and
there is much more variance between pixels. Both of these are bad for blurring, because
blurring needs at least some minimum amount of useful information to work with, or the result
looks splotchy (much like low-iteration photon mapping).

## Performance Analysis

### Convergence
Below are renders at different sample counts per pixel, before and after denoising:

| 1 Iteration | 5 Iterations | 10 Iterations |
| ----------- | ----------- | ----------- |
| Original | Original | Original |
| ![](img/results/iteration_1.PNG) | ![](img/results/iteration_5.PNG) | ![](img/results/iteration_10.PNG) |
| Denoised | Denoised | Denoised |
| ![](img/results/iteration_1_denoised.PNG) | ![](img/results/iteration_5_denoised.PNG) | ![](img/results/iteration_10_denoised.PNG) |

| 50 Iterations | 100 Iterations | 500 Iterations |
| ----------- | ----------- | ----------- |
| Original | Original | Original |
| ![](img/results/iteration_50.PNG) | ![](img/results/iteration_100.PNG) | ![](img/results/iteration_500.PNG) |
| Denoised | Denoised | Denoised |
| ![](img/results/iteration_50_denoised.PNG) | ![](img/results/iteration_100_denoised.PNG) | ![](img/results/iteration_500_denoised.PNG) |

| 1000 Iterations | 5000 Iterations |
| ----------- | ----------- |
| Original | Original |
| ![](img/results/iteration_1000.PNG) | ![](img/results/iteration_5000.PNG) |
| Denoised | Denoised |
| ![](img/results/iteration_1000_denoised.PNG) | ![](img/results/iteration_5000_denoised.PNG) |


For the denoised results, I would say that 500 iterations is about where the image becomes
"acceptably smooth". By that I mean that at 500 iterations, not only do the background colors
look smoothed out to a near-converged state, but the specular sphere also no longer looks smudged.
In comparison, the non-denoised render only looks "acceptably smooth" somewhere between 1000 and
5000 iterations; anything before that shows the familiar path-tracing noise pattern.
Here are the diff images for 500 iterations with and without denoising, compared against
the 5000-iteration result (with no denoising):

| | Original | Denoised |
| ----------- | ----------- | ----------- |
| Render | ![](img/results/iteration_500.PNG) | ![](img/results/iteration_500_denoised.PNG) |
| Diff from 5000 iter | ![](img/results/diff_500.PNG) | ![](img/results/diff_500_denoised.PNG) |

### Varying Filter Size

![](img/figures/denoise_runtime_vs_pt_iter.png)

As can be seen from the graph above, the time taken by the denoising algorithm is the same
regardless of the number of path-tracing iterations; the denoising cost depends only on the
resolution of the image and the size of the convolution filter. In addition, each increase in
filter size adds only a constant amount of runtime, roughly equal to the cost of filter size 1.
This is because each additional level of the denoising kernel samples the same number of pixels
(the 25 taps of the 5x5 kernel), thanks to the À-Trous wavelet scheme. This means the algorithm
scales very well regardless of the number of samples taken.

### Path Tracing vs Denoising

![](img/figures/pathtracing_v_denoising.png)

As can be seen from the figure above, and given that the denoising runtime is independent of the
number of path-tracing iterations (it is constant for a given filter size), the percentage of total
time spent denoising versus actual path tracing falls off quickly as iterations accumulate. In my
measurements, a single path-tracing iteration takes roughly as long as the denoising pass, so with
```iter``` path-tracing iterations the denoising step accounts for roughly ```1 / (iter + 1)``` of
the total runtime: about 9% at 10 iterations and about 1% at 100 iterations.

### Render Resolution

![](img/figures/resolution.png)

As can be seen from the figure above, the runtime of the denoising algorithm grows roughly
quadratically as resolution increases (with width equal to height), which is what we expect.
Although the kernel size is constant across these data points, the number of pixels the GPU
must process grows quadratically with the side length; for example, going from 800x800 to
1600x1600 quadruples the pixel count.

## References

Edge-Avoiding À-Trous Wavelet Transform for Fast Global Illumination Filtering (Dammertz et al., HPG 2010):

Paper: https://jo.dreggn.org/home/2010_atrous.pdf

Presentation: https://www.highperformancegraphics.org/previous/www_2010/media/RayTracing_I/HPG2010_RayTracing_I_Dammertz.pdf

Wikipedia Gaussian Blur: https://en.wikipedia.org/wiki/Gaussian_blur

Filter used in the paper: https://www.eso.org/sci/software/esomidas/doc/user/18NOV/volb/node317.html

Gaussian blur from Krita Application: https://krita.org/en/
Binary file added img/figures/denoise_runtime_vs_pt_iter.png
Binary file added img/figures/gaussian_blur.PNG
Binary file added img/figures/gaussiankernel.PNG
Binary file added img/figures/kerneloffsets.PNG
Binary file added img/figures/pathtracing_v_denoising.png
Binary file added img/figures/resolution.png
Binary file added img/results/ceiling_light.PNG
Binary file added img/results/cornell_100iter_denoised.PNG
Binary file added img/results/cornell_1iter_denoised.PNG
Binary file added img/results/depth_buffer.PNG
Binary file added img/results/diff_1000_denoised.PNG
Binary file added img/results/diff_500.PNG
Binary file added img/results/diff_500_denoised.PNG
Binary file added img/results/iteration_1.PNG
Binary file added img/results/iteration_10.PNG
Binary file added img/results/iteration_100.PNG
Binary file added img/results/iteration_1000.PNG
Binary file added img/results/iteration_1000_denoised.PNG
Binary file added img/results/iteration_100_denoised.PNG
Binary file added img/results/iteration_10_denoised.PNG
Binary file added img/results/iteration_1_denoised.PNG
Binary file added img/results/iteration_5.PNG
Binary file added img/results/iteration_50.PNG
Binary file added img/results/iteration_500.PNG
Binary file added img/results/iteration_5000.PNG
Binary file added img/results/iteration_5000_denoised.PNG
Binary file added img/results/iteration_500_denoised.PNG
Binary file added img/results/iteration_50_denoised.PNG
Binary file added img/results/iteration_5_denoised.PNG
Binary file added img/results/kernel_size_10_iter_20.PNG
Binary file added img/results/kernel_size_1_iter_1.PNG
Binary file added img/results/kernel_size_1_iter_20.PNG
Binary file added img/results/kernel_size_2_iter_1.PNG
Binary file added img/results/kernel_size_2_iter_20.PNG
Binary file added img/results/kernel_size_3_iter_1.PNG
Binary file added img/results/kernel_size_3_iter_20.PNG
Binary file added img/results/kernel_size_4_iter_1.PNG
Binary file added img/results/kernel_size_4_iter_20.PNG
Binary file added img/results/kernel_size_5_iter_1.PNG
Binary file added img/results/kernel_size_5_iter_20.PNG
Binary file added img/results/kernel_size_6_iter_1.PNG
Binary file added img/results/kernel_size_6_iter_20.PNG
Binary file added img/results/kernel_size_7_iter_1.PNG
Binary file added img/results/kernel_size_7_iter_20.PNG
Binary file added img/results/kernel_size_8_iter_1.PNG
Binary file added img/results/kernel_size_8_iter_20.PNG
Binary file added img/results/kernel_size_9_iter_1.PNG
Binary file added img/results/kernel_size_9_iter_20.PNG
Binary file added img/results/matcom_1iter.PNG
Binary file added img/results/matcomp_5000iter.PNG
Binary file added img/results/matcomp_denoised.PNG
Binary file added img/results/no_edge_detection_filter1.PNG
Binary file added img/results/no_edge_detection_filter3.PNG
Binary file added img/results/no_edge_detection_filter5.PNG
Binary file added img/results/no_edge_detection_filter7.PNG
Binary file added img/results/nor_buffer.PNG
Binary file added img/results/plain_blur_i1.PNG
Binary file added img/results/plain_blur_i10.PNG
Binary file added img/results/plain_blur_i2.PNG
Binary file added img/results/plain_blur_i3.PNG
Binary file added img/results/plain_blur_i4.PNG
Binary file added img/results/plain_blur_i5.PNG
Binary file added img/results/plain_blur_i6.PNG
Binary file added img/results/plain_blur_i7.PNG
Binary file added img/results/plain_blur_i8.PNG
Binary file added img/results/plain_blur_i9.PNG
Binary file added img/results/pos_buffer.PNG
Binary file added img/results/render_1.PNG
Binary file added img/results/render_100.PNG
Binary file added img/results/render_1000.PNG
Binary file added img/results/render_denoised_1.PNG
Binary file added img/results/render_denoised_100.PNG
Binary file added img/results/render_denoised_1000.PNG
6 changes: 3 additions & 3 deletions scenes/cornell.txt

@@ -51,11 +51,11 @@ EMITTANCE 0
 // Camera
 CAMERA
 RES 800 800
-FOVY 45
-ITERATIONS 5000
+FOVY 19.5
+ITERATIONS 100
 DEPTH 8
 FILE cornell
-EYE 0.0 5 10.5
+EYE 0.0 5 19.0
 LOOKAT 0 5 0
 UP 0 1 0

6 changes: 3 additions & 3 deletions scenes/cornell_ceiling_light.txt

@@ -51,11 +51,11 @@ EMITTANCE 0
 // Camera
 CAMERA
 RES 800 800
-FOVY 45
-ITERATIONS 10
+FOVY 19.5
+ITERATIONS 5000
 DEPTH 8
 FILE cornell
-EYE 0.0 5 10.5
+EYE 0.0 5 19.0
 LOOKAT 0 5 0
 UP 0 1 0
