77 changes: 48 additions & 29 deletions INSTRUCTION.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ more recent papers on denoising, such as "Spatiotemporal Variance-Guided Filteri
## Part 2 - A-trous wavelet filter

Implement the A-trous wavelet filter from the paper. :shrug:

* Refraction (e.g. glass/water) [PBRT 8.2] with Fresnel effects using [Schlick's approximation](https://en.wikipedia.org/wiki/Schlick's_approximation) or more accurate methods [PBRT 8.5]. You can use `glm::refract` for Snell's law.
It's always good to break down techniques into steps that you can individually verify.
Such a breakdown for this paper could include:
1. add UI controls to your project - we've done this for you in this base code, but see `Base Code Tour`
Expand All @@ -111,11 +111,11 @@ Such a breakdown for this paper could include:
1. use the G-Buffers to preserve perceived edges
1. tune parameters to see if they respond in ways that you expect
1. test more advanced scenes
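
The "use the G-Buffers to preserve perceived edges" step above can be sketched on the host before porting it to a kernel. This is a minimal sketch in the spirit of the edge-stopping weights described in the Dammertz et al. paper, not the base code's implementation: the `Vec3` and `Sample` types and the `phi*` tuning parameters are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types; a real implementation would likely use glm::vec3.
struct Vec3 { float x, y, z; };
struct Sample { Vec3 color, normal, position; };

// Squared Euclidean distance between two 3-vectors.
inline float dist2(Vec3 a, Vec3 b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Edge-stopping weight for one buffer: 1 when the two values match,
// falling toward 0 as they diverge. phi controls how fast it falls off.
inline float edgeWeight(Vec3 p, Vec3 q, float phi) {
    assert(phi > 0.0f);
    return std::fmin(std::exp(-dist2(p, q) / phi), 1.0f);
}

// Combined weight for a neighbor q of pixel p: the product of the color,
// normal and position weights. The filter would multiply this by the
// kernel coefficient of the tap before accumulating.
inline float combinedWeight(const Sample& p, const Sample& q,
                            float phiColor, float phiNormal, float phiPos) {
    return edgeWeight(p.color, q.color, phiColor) *
           edgeWeight(p.normal, q.normal, phiNormal) *
           edgeWeight(p.position, q.position, phiPos);
}
```

A neighbor with a very different normal or position gets a weight near zero, so it contributes almost nothing to the blurred result; this is one way to check that the sliders respond as expected.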
* Physically-based depth-of-field (by jittering rays within an aperture). [PBRT 6.2.3]

## Base Code Tour

* Overview write-up of the feature
This base code is derived from Project 3. Some notable differences:

* How might this feature be optimized beyond your current implementation?
* `src/pathtrace.cu` - we've added functions `showGBuffer` and `showImage` to help you visualize G-Buffer info and your denoised results. There's also a `generateGBuffer` kernel on the first bounce of `pathtrace`.
* `src/sceneStructs.h` - there's a new `GBufferPixel` struct
* the term G-buffer is more common in the world of rasterizing APIs like OpenGL or WebGL, where many G-buffers may be needed due to limited pixel channels (RGB, RGBA)
Expand All @@ -124,7 +124,7 @@ This base code is derived from Project 3. Some notable differences:
* `src/main.h` and `src/main.cpp` - we've added a bunch of `ui_` variables - these connect to the UI sliders in `src/preview.cpp`, and let you toggle between `showGBuffer` and `showImage`, among other things.
* `scenes` - we've added `cornell_ceiling_light.txt`, which uses a much larger light and fewer iterations. This can be a good scene to start denoising with, since even in the first iteration many rays will terminate at the light.
* As usual, be sure to search across the project for `CHECKITOUT` and `TODO`

* If a primitive spans more than one leaf cell in the datastructure, it is sufficient for this project to count the primitive in each leaf cell.
Note that the image saving functionality isn't hooked up to gbuffers or denoised images yet - you may need to do this yourself, but doing so will be considerably more usable than screenshotting every image.

There's also a couple specific git commits that you can look at for guidance on how to add some of these changes to your own pathtracer, such as `imgui`. You can view these changes on the command line using `git diff [commit hash]`, or on github, for example: https://github.com/CIS565-Fall-2020/Project4-CUDA-Denoiser/commit/0857d1f8f477a39a9ba28a1e0a584b79bd7ec466
Expand All @@ -140,11 +140,18 @@ The point of denoising is to reduce the number of samples-per-pixel/pathtracing
* how denoising influences the number of iterations needed to get an "acceptably smooth" result
* how denoising at different resolutions impacts runtime
* how varying filter sizes affect performance

This project uses GLM for linear algebra.
In addition to the above, you should also analyze your denoiser on a qualitative level:
* how visual results vary with filter size -- does the visual quality scale uniformly with filter size?
* how effective/ineffective is this method with different material types
* how do results compare across different scenes - for example, between `cornell.txt` and `cornell_ceiling_light.txt`. Does one scene produce better denoised results? Why or why not?
On NVIDIA cards pre-Fermi (pre-DX12), you may have issues with mat4-vec4 multiplication. If you have one of these cards, be careful! If you have issues, you might need to grab `cudamat4` and `multiplyMV` from the [Fall 2014 project](https://github.com/CIS565-Fall-2014/Project3-Pathtracer).

Let us know if you need to do this.

### Scene File Format

This project uses a custom scene description format. Scene files are flat text files that describe all geometry, materials, lights, cameras, and render settings inside of the scene. Items in the format are delimited by new lines, and comments can be added using C-style `// comments`.

Note that "acceptably smooth" is somewhat subjective - we will leave the means for image comparison up to you, but image diffing tools may be a good place to start, and can help visually convey differences between two images.

Expand All @@ -157,29 +164,39 @@ The following extra credit items are listed roughly in order of level-of-effort,
## G-Buffer optimization

When starting out with gbuffers, it's probably easiest to start storing per-pixel positions and normals as glm::vec3s. However, this can be a decent amount of per-pixel data, which must be read from memory.

Implement methods to store positions and normals more compactly. Two places to start include:
* storing Z-depth instead of position, and reconstructing position based on pixel coordinates and an inverted projection matrix
* oct-encoding normals: http://jcgt.org/published/0003/02/01/paper.pdf

Be sure to provide performance comparison numbers between optimized and unoptimized implementations.
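
As a starting point for the oct-encoding option above, here is a minimal host-side sketch following the octahedral mapping described in the linked JCGT survey. The `Vec2`/`Vec3` types and the function names are hypothetical; the sketch keeps the two encoded components as floats, and any further quantization (for example to 16-bit values) is left as an assumption for the reader to explore.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types; a real implementation would likely use glm.
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Sign function that returns +1 for zero, so folding never collapses to 0.
inline float signNotZero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Map a non-zero normal to two components in [-1, 1].
inline Vec2 octEncode(Vec3 n) {
    float l1 = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    assert(l1 > 0.0f);
    // Project onto the octahedron |x| + |y| + |z| = 1.
    Vec2 p{n.x / l1, n.y / l1};
    if (n.z <= 0.0f) {
        // Fold the lower hemisphere over the diagonals of the square.
        return Vec2{(1.0f - std::fabs(p.y)) * signNotZero(p.x),
                    (1.0f - std::fabs(p.x)) * signNotZero(p.y)};
    }
    return p;
}

// Inverse mapping: recover a unit-length normal from the two components.
inline Vec3 octDecode(Vec2 e) {
    Vec3 v{e.x, e.y, 1.0f - std::fabs(e.x) - std::fabs(e.y)};
    if (v.z < 0.0f) {
        // Undo the fold applied to the lower hemisphere.
        float ox = (1.0f - std::fabs(v.y)) * signNotZero(v.x);
        float oy = (1.0f - std::fabs(v.x)) * signNotZero(v.y);
        v.x = ox;
        v.y = oy;
    }
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{v.x / len, v.y / len, v.z / len};
}
```

A round trip such as `octDecode(octEncode(n))` should return the original normal up to floating-point error, which makes it easy to verify the encoding independently of the denoiser.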

## Comparing A-trous and Gaussian filtering

Dammertz et al. mention in their section 2.2 that A-trous filtering is a means for approximating Gaussian filtering. Implement Gaussian filtering and compare with A-trous to see if one method is significantly faster. Also note any visual differences in your results.
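
For the Gaussian side of that comparison, a small helper that builds normalized 1D weights may be a useful first step. This is only a sketch under assumptions: the function name, the `radius` and `sigma` parameters, and the choice of a separable 1D kernel are illustrative and not taken from the base code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian weights for taps at offsets -radius..radius.
// A 2D filter can apply these weights along rows and then along columns.
inline std::vector<float> gaussianWeights(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> w(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        w[i + radius] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += w[i + radius];
    }
    // Normalize so the filter preserves overall brightness.
    for (float& v : w) v /= sum;
    return w;
}
```

Unlike A-trous, a dense Gaussian of the same footprint reads every pixel inside the window, which is the cost difference the comparison is meant to expose.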

## Shared Memory Filtering
Be sure to provide performance comparison numbers between optimized and unoptimized implementations.

* Use of any third-party code must be approved by asking on our Piazza.
* If it is approved, all students are welcome to use it. Generally, we approve use of third-party code that is not a core part of the project. For example, for the path tracer, we would approve using a third-party library for loading models, but would not approve copying and pasting a CUDA function for doing refraction.
Filtering techniques can be somewhat memory-expensive - for each pixel, the technique reads several neighboring pixels to compute a final value. This only gets more expensive with the additional data in G-Buffers, so these techniques are likely to benefit from shared memory.

* Sell your project.
* Assume the reader has a little knowledge of path tracing - don't go into
detail explaining what it is. Focus on your project.
* Don't talk about it like it's an assignment - don't say what is and isn't
## Implement Temporal Sampling
Be sure to provide performance comparison numbers between implementations with and without shared memory.
Also pay attention to how shared memory use impacts the block size for your kernels, and how this may change as the filter width changes.

## Implement Temporal Sampling
This will require additional buffers, as well as reprojection code to move samples from where they were in a previous frame to the current frame.
* You will not be graded on how fast your path tracer runs, but getting close to
real-time is always nice!
* If you have a fast GPU renderer, it is very good to show case this with a
video to show interactivity. If you do so, please include a link!

High-performance raytracers in dynamic applications (like games, or real-time visualization engines) now often use temporal sampling, borrowing and repositioning samples from previous frames so that each frame effectively only computes 1 sample-per-pixel but can denoise from many frames.

This will require additional buffers, as well as reprojection code to move samples from where they were in a previous frame to the current frame.
* Stream compaction helps most after a few bounces. Print and plot the
If you have modified any of the `CMakeLists.txt` files at all (aside from the list of `SOURCE_FILES`), mention it explicitly. Beware of any build issues discussed on the Piazza.
* Compare scenes which are open (like the given cornell box) and closed
(i.e. no light can escape the scene). Again, compare the performance effects

The title should be "Project 4: YOUR NAME".
terminate, so what might you expect?
* For optimizations that target specific kernels, we recommend using
stacked bar graphs to convey total execution time and improvements in
individual kernels. For example:

Note that our basic pathtracer doesn't do animation, so you will also need to implement some kind of dynamic aspect in your scene - this may be as simple as an automated panning camera, or as complex as translating models.

Expand All @@ -188,24 +205,26 @@ See https://research.nvidia.com/publication/2017-07_Spatiotemporal-Variance-Guid
Submission
===

If you have modified any of the `CMakeLists.txt` files at all (aside from the list of `SOURCE_FILES`), mention it explicitly. Beware of any build issues discussed on the Piazza.

If you have modified any of the `CMakeLists.txt` files at all (aside from the
* [Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering](https://jo.dreggn.org/home/2010_atrous.pdf)
* [Spatiotemporal Variance-Guided Filtering](https://research.nvidia.com/publication/2017-07_Spatiotemporal-Variance-Guided-Filtering%3A)
* [A Survey of Efficient Representations for Independent Unit Vectors](http://jcgt.org/published/0003/02/01/paper.pdf)
* ocornut/imgui - https://github.com/ocornut/imgui
Open a GitHub pull request so that we can see that you have finished.
The title should be "Project 3: YOUR NAME".

The title should be "Project 4: YOUR NAME".
The template of the comment section of your pull request is attached below, you can do some copy and paste:

* [Repo Link](https://link-to-your-repo)
* (Briefly) Mentions features that you've completed. Especially those bells and whistles you want to highlight
* Feature 0
* Feature 1
* ...
* Feature 0
* Feature 1
* ...
* Feedback on the project itself, if any.

References
===

* [Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering](https://jo.dreggn.org/home/2010_atrous.pdf)
* [Spatiotemporal Variance-Guided Filtering](https://research.nvidia.com/publication/2017-07_Spatiotemporal-Variance-Guided-Filtering%3A)
* [A Survey of Efficient Representations for Independent Unit Vectors](http://jcgt.org/published/0003/02/01/paper.pdf)
* ocornut/imgui - https://github.com/ocornut/imgui
* [PBRT] Physically Based Rendering, Second Edition: From Theory To Implementation. Pharr, Matt and Humphreys, Greg. 2010.
* Antialiasing and Raytracing. Chris Cooksey and Paul Bourke, http://paulbourke.net/miscellaneous/aliasing/
* [Sampling notes](http://graphics.ucsd.edu/courses/cse168_s14/) from Steve Rotenberg and Matteo Mannino, University of California, San Diego, CSE168: Rendering Algorithms
Binary file added Performance Analysis.xlsx
Binary file not shown.
70 changes: 65 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,71 @@ CUDA Denoiser For CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Yuxuan Zhu
* [LinkedIn](https://www.linkedin.com/in/andrewyxzhu/)
* Tested on: Windows 10, i7-7700HQ @ 2.80GHz 16GB, GTX 1050 4096MB (Personal Laptop)

### (TODO: Your README)
## Demo

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
![Demo](img/Final.JPG)

## Introduction

This is a GPU-based path tracer with denoising capability. Edge-aware filtering dramatically improves image quality at low sample counts and reduces the number of samples per pixel required to generate an acceptably smooth image. The denoiser is based on the paper Edge-Avoiding À-Trous Wavelet Transform for fast Global Illumination Filtering. For the demo scene, we are able to get a smooth image with just 50 iterations.


## Performance Analysis

The path tracer applies denoising once, after all iterations have finished, since it can be treated as a post-processing step and this is more efficient than denoising every iteration. The time denoising takes is therefore constant with respect to the number of iterations, and its amortized per-iteration cost approaches 0 as the number of iterations goes up. The runtime of denoising without optimization is on the order of 10ms. I decided to put all the filter weights and offsets in CUDA constant memory, which allows for broadcasting and faster access.

Denoising drastically reduces the number of iterations needed to get an acceptably smooth image. For the simple Cornell box scene, the number of iterations required decreases from roughly 200 to roughly 20.

Original (200 iter) | Denoised (20 iter)
:-------------------------:|:-------------------------:
![1iter](img/200original.JPG) | ![1iterDenoise](img/20denoise.JPG)

**Denoising and Resolution**

For a constant filter size of 20x20, increasing the resolution increases the denoising runtime. Denoising runtime is roughly quadratic with respect to the horizontal resolution. This makes sense because, for a square image, the pixel count grows with the square of the side length, and the runtime is essentially linear in the number of pixels.

![resolution](img/resolution.png)

**Denoising and Filter Size**

For a fixed resolution of 800x800, increasing the filter size increases the denoising runtime. Denoising runtime is sublinear with respect to the filter side length. This also makes sense because we don't sample every pixel in the area defined by the filter. Instead we take a limited number of samples, and the larger the filter, the sparser the samples. A larger kernel size also produces blurrier images, since we are taking weighted averages of pixels over a larger area.
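
The sparse sampling can be made concrete with a small tap count. The sketch below assumes a 5x5 kernel whose taps are spaced `2^level` pixels apart, and one plausible convention for mapping a filter size to a number of levels (keep adding levels until the widest level's footprint reaches the filter size). The function names and that mapping are assumptions for illustration and may differ from what this implementation does.

```cpp
#include <cassert>

// Side length in pixels spanned by a 5x5 kernel whose taps are 2^level apart.
inline int footprint(int level) { return 4 * (1 << level) + 1; }

// Number of A-trous levels needed so the widest level spans filterSize pixels.
inline int atrousLevels(int filterSize) {
    assert(filterSize >= 1);
    int levels = 1;
    while (footprint(levels - 1) < filterSize) ++levels;
    return levels;
}

// Reads per pixel: 25 taps for every level.
inline int atrousTaps(int filterSize) { return 25 * atrousLevels(filterSize); }

// Reads per pixel for a dense filter covering the same window.
inline int denseTaps(int filterSize) { return filterSize * filterSize; }
```

Under these assumptions a 20x20 filter costs 100 reads per pixel instead of 400, and a 100x100 filter costs 150 instead of 10000, so the work grows with the logarithm of the filter size rather than its area.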

![filter](img/filtersize.png)


**Qualitative Comparison**

As shown below, there are diminishing returns on applying denoising as the number of samples per pixel increases. The denoised images produced from inputs with fewer samples are slightly blurrier, since there is a lot of noise, which requires extra smoothing. There is sometimes a tradeoff between reducing noise and preserving detail. This tradeoff is even more prominent if we are using a purely Gaussian kernel.

Original (1/2/4/8/16/1000 iter) | Denoised (1/2/4/8/16/1000 iter)
:-------------------------:|:-------------------------:
![1iter](img/1iter.JPG) | ![1iterDenoise](img/1iterDenoise.JPG)
![1iter](img/2iter.JPG) | ![1iterDenoise](img/2iterDenoise.JPG)
![1iter](img/4iter.JPG) | ![1iterDenoise](img/4iterDenoise.JPG)
![1iter](img/8iter.JPG) | ![1iterDenoise](img/8iterDenoise.JPG)
![1iter](img/16iter.JPG) | ![1iterDenoise](img/16iterDenoise.JPG)
![1iter](img/1000iter.JPG) | ![1iterDenoise](img/1000iterDenoise.JPG)

As shown below, the larger the kernel size, the smoother the image is. However, sometimes the finer details will also be smoothed out. Denoising works best for diffuse surfaces like walls. It is less effective for materials like mirror spheres, because the colors on the mirror sphere are less correlated due to reflection. It is also more difficult to determine edges on a reflective sphere surface.

Original | 5x5 Filter | 20x20 Filter | 60x60 Filter | 100x100 Filter
:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:|:-------------------------:
![0filter](img/0filter.JPG) | ![1filter](img/5filter.JPG) | ![20filter](img/20filter.JPG) | ![60filter](img/60filter.JPG) | ![100filter](img/100filter.JPG)

Denoising also performs differently on different scenes, for two main reasons. The first is that different scenes have different material compositions, which may make denoising difficult. The second is that scenes with bigger lights usually have less noise, since light rays are more likely to terminate at a light source. Denoising also performs better in this case.

Small Light | Big Light | Lots of Reflection
:-------------------------:|:-------------------------:|:-------------------------:
![smallLight](img/cornell.JPG) | ![bigLight](img/20filter.JPG) | ![lotsofReflection](img/custom.JPG)

## Bloopers

![Blooper](img/blooper1.JPG)

This image looks badly overexposed. It is caused by the program not normalizing the ray traced image by the number of iterations.
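
The fix amounts to dividing the accumulated color by the iteration count before the image is filtered or displayed. A minimal host-side sketch, where the `Color` type and the function name are hypothetical and not taken from this repository:

```cpp
#include <cassert>
#include <vector>

// Hypothetical RGB type; a real implementation would likely use glm::vec3.
struct Color { float r, g, b; };

// Convert a buffer of radiance summed over `iterations` samples into
// per-pixel averages. Skipping this step leaves values that grow with the
// iteration count, which shows up as an overexposed image.
inline std::vector<Color> normalizeAccumulated(const std::vector<Color>& accum,
                                               int iterations) {
    assert(iterations > 0);
    const float n = static_cast<float>(iterations);
    std::vector<Color> out;
    out.reserve(accum.size());
    for (const Color& c : accum) {
        out.push_back(Color{c.r / n, c.g / n, c.b / n});
    }
    return out;
}
```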

8 changes: 5 additions & 3 deletions cmake/CUDAComputesList.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,8 @@ IF( CUDA_COMPUTE_20
OR CUDA_COMPUTE_70
OR CUDA_COMPUTE_72
OR CUDA_COMPUTE_75
OR CUDA_COMPUTE_80
OR CUDA_COMPUTE_86
)
SET(FALLBACK OFF)
ELSE()
Expand All @@ -70,8 +72,8 @@ LIST(LENGTH COMPUTES_DETECTED_LIST COMPUTES_LEN)
IF(${COMPUTES_LEN} EQUAL 0 AND ${FALLBACK})
MESSAGE(STATUS "You can use -DCOMPUTES_DETECTED_LIST=\"AB;XY\" (semicolon separated list of CUDA Compute versions to enable the specified computes")
MESSAGE(STATUS "Individual compute versions flags are also available under CMake Advance options")
LIST(APPEND COMPUTES_DETECTED_LIST "30" "50" "60" "70")
MESSAGE(STATUS "No computes detected. Fall back to 30, 50, 60 70")
LIST(APPEND COMPUTES_DETECTED_LIST "30" "50" "60" "70" "80")
MESSAGE(STATUS "No computes detected. Fall back to 30, 50, 60, 70, 80")
ENDIF()

LIST(LENGTH COMPUTES_DETECTED_LIST COMPUTES_LEN)
Expand All @@ -90,7 +92,7 @@ MACRO(SET_COMPUTE VERSION)
ENDMACRO(SET_COMPUTE)

# Iterate over compute versions. Create variables and enable computes if needed
FOREACH(VER 20 30 32 35 37 50 52 53 60 61 62 70 72 75)
FOREACH(VER 20 30 32 35 37 50 52 53 60 61 62 70 72 75 80 86)
OPTION(CUDA_COMPUTE_${VER} "CUDA Compute Capability ${VER}" OFF)
MARK_AS_ADVANCED(CUDA_COMPUTE_${VER})
IF(${CUDA_COMPUTE_${VER}})
96 changes: 48 additions & 48 deletions cmake/FindGLFW.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -20,66 +20,66 @@
include(FindPackageHandleStandardArgs)

if (WIN32)
# Find include files
find_path(
GLFW_INCLUDE_DIR
NAMES GLFW/glfw3.h
PATHS
$ENV{PROGRAMFILES}/include
${GLFW_ROOT_DIR}/include
DOC "The directory where GLFW/glfw.h resides")
# Find include files
find_path(
GLFW_INCLUDE_DIR
NAMES GLFW/glfw3.h
PATHS
$ENV{PROGRAMFILES}/include
${GLFW_ROOT_DIR}/include
DOC "The directory where GLFW/glfw.h resides")

# Use glfw3.lib for static library
if (GLFW_USE_STATIC_LIBS)
set(GLFW_LIBRARY_NAME glfw3)
else()
set(GLFW_LIBRARY_NAME glfw3dll)
endif()
# Use glfw3.lib for static library
if (GLFW_USE_STATIC_LIBS)
set(GLFW_LIBRARY_NAME glfw3)
else()
set(GLFW_LIBRARY_NAME glfw3dll)
endif()

# Find library files
find_library(
GLFW_LIBRARY
NAMES ${GLFW_LIBRARY_NAME}
PATHS
$ENV{PROGRAMFILES}/lib
${GLFW_ROOT_DIR}/lib)
# Find library files
find_library(
GLFW_LIBRARY
NAMES ${GLFW_LIBRARY_NAME}
PATHS
$ENV{PROGRAMFILES}/lib
${GLFW_ROOT_DIR}/lib)

unset(GLFW_LIBRARY_NAME)
unset(GLFW_LIBRARY_NAME)
else()
# Find include files
find_path(
GLFW_INCLUDE_DIR
NAMES GLFW/glfw.h
PATHS
/usr/include
/usr/local/include
/sw/include
/opt/local/include
DOC "The directory where GL/glfw.h resides")
# Find include files
find_path(
GLFW_INCLUDE_DIR
NAMES GLFW/glfw.h
PATHS
/usr/include
/usr/local/include
/sw/include
/opt/local/include
DOC "The directory where GL/glfw.h resides")

# Find library files
# Try to use static libraries
find_library(
GLFW_LIBRARY
NAMES glfw3
PATHS
/usr/lib64
/usr/lib
/usr/local/lib64
/usr/local/lib
/sw/lib
/opt/local/lib
${GLFW_ROOT_DIR}/lib
DOC "The GLFW library")
# Find library files
# Try to use static libraries
find_library(
GLFW_LIBRARY
NAMES glfw3
PATHS
/usr/lib64
/usr/lib
/usr/local/lib64
/usr/local/lib
/sw/lib
/opt/local/lib
${GLFW_ROOT_DIR}/lib
DOC "The GLFW library")
endif()

# Handle REQUIRED argument, define *_FOUND variable
find_package_handle_standard_args(GLFW DEFAULT_MSG GLFW_INCLUDE_DIR GLFW_LIBRARY)

# Define GLFW_LIBRARIES and GLFW_INCLUDE_DIRS
if (GLFW_FOUND)
set(GLFW_LIBRARIES ${OPENGL_LIBRARIES} ${GLFW_LIBRARY})
set(GLFW_INCLUDE_DIRS ${GLFW_INCLUDE_DIR})
set(GLFW_LIBRARIES ${OPENGL_LIBRARIES} ${GLFW_LIBRARY})
set(GLFW_INCLUDE_DIRS ${GLFW_INCLUDE_DIR})
endif()

# Hide some variables