diff --git a/README.md b/README.md
index f044c821..dc1a5c29 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,77 @@ CUDA Denoiser For CUDA Path Tracer
**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Shixuan Fang
+* Tested on: Windows 11, i7-12700kf, RTX 3080Ti
-### (TODO: Your README)
+# Project Overview
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+| Original | Denoised |
+| ------------- |:-------------:|
+| (image) | (image) |
+In this project, I implemented a path tracing denoiser based on the A-trous wavelet filter.
+
+
+
+The A-trous filter approximates a Gaussian blur by iteratively applying sparse blurs of increasing size, which significantly reduces the number of pixel reads compared with a single large dense kernel.
+
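As a rough illustration of the sparse blur described above, the sketch below builds the 25 tap weights from the 1D kernel `{1/16, 1/4, 3/8, 1/4, 1/16}` that appears in the `ATrousDenoise` kernel of this change, and shows how tap offsets spread out as the step width grows. The helper names (`tapWeight`, `tapOffset`, `kernelSum`) are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>

// 1D weights (1/16, 1/4, 3/8, 1/4, 1/16), matching the kernel array in ATrousDenoise.
constexpr std::array<float, 5> kWeights1D = {0.0625f, 0.25f, 0.375f, 0.25f, 0.0625f};

// 2D weight of tap (i, j): the outer product of the 1D weights.
float tapWeight(int i, int j) {
    assert(i >= 0 && i < 5 && j >= 0 && j < 5);
    return kWeights1D[i] * kWeights1D[j];
}

// Pixel offset of tap index i (0..4) at a given step width.
// The tap count stays at 5 per axis; only the spacing between taps grows.
int tapOffset(int i, int stepWidth) {
    assert(i >= 0 && i < 5 && stepWidth >= 1);
    return (i - 2) * stepWidth;
}

// Sum of all 25 weights. A sum of 1 means a constant image passes through unchanged.
float kernelSum() {
    float sum = 0.0f;
    for (int i = 0; i < 5; ++i) {
        for (int j = 0; j < 5; ++j) {
            sum += tapWeight(i, j);
        }
    }
    return sum;
}
```

Under these assumptions `kernelSum()` evaluates to 1, and at step width 4 the outermost taps sit 8 pixels from the center while each pixel still reads only 25 samples.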
+## G-Buffer
+
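+To denoise the result with the A-trous filter while preserving edges, we need extra per-pixel information beyond the path traced image. The G-buffer stores the surface normal and the position of the first intersection along each camera ray.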
+
+|Normal|Position|
+|------|:-------:|
+| (image) | (image) |
+
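To make the role of the G-buffer concrete, here is a simplified host-side sketch of an edge-stopping weight. It assumes a minimal `Vec3` struct in place of `glm::vec3`, and the function names are hypothetical. The kernel in this change additionally divides the normal distance by the squared step width, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal stand-in for glm::vec3 (assumption for this sketch).
struct Vec3 {
    float x, y, z;
};

// Squared Euclidean distance between two vectors.
float squaredDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Edge-stopping weight in (0, 1]: 1 when the two values match,
// falling toward 0 as they diverge. phi sets how fast it falls off.
float edgeWeight(const Vec3& center, const Vec3& sample, float phi) {
    assert(phi > 0.0f);
    return std::min(std::exp(-squaredDistance(center, sample) / (phi * phi)), 1.0f);
}

// Combined weight of one tap: a neighbor contributes only if its color,
// normal, and position are all close to those of the center pixel.
float combinedWeight(const Vec3& c0, const Vec3& c1, float colorPhi,
                     const Vec3& n0, const Vec3& n1, float normalPhi,
                     const Vec3& p0, const Vec3& p1, float positionPhi) {
    return edgeWeight(c0, c1, colorPhi) *
           edgeWeight(n0, n1, normalPhi) *
           edgeWeight(p0, p1, positionPhi);
}
```

With the default normal weight of 0.35 from `main.cpp`, two pixels whose normals point in opposite directions get a weight that is effectively zero, so the blur does not cross that edge.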
+## Performance Analysis
+
+- how much time denoising adds to renders
+
+
+
+As seen in this figure, denoising adds only 2-3 ms to the render, which is small enough to be practical for real-time rendering.
+
+- how denoising influences the number of iterations needed to get an "acceptably smooth" result
+
+|2spp|10spp|Ground Truth|
+|------|:-------:|:-------------:|
+| (image) | (image) | (image) |
+
+As seen in this table, 10 spp with denoising is acceptably smooth compared to the ground truth (5000 spp).
+
+- how denoising at different resolutions impacts runtime
+
+
+
+As seen in this chart, the denoising runtime increases with screen resolution. This makes sense: the number of A-trous passes depends only on the filter size, but every pass has to process every pixel, so more pixels mean more work per pass and a longer denoising time.
+
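The resolution scaling can be estimated with a small host-side sketch. It assumes one thread per pixel, 8x8 thread blocks, and 25 taps per pixel per pass, as in the kernels of this change. The function names are hypothetical, and the resolutions in the usage note are only example inputs.

```cpp
#include <cassert>
#include <cstdint>

// Number of 8x8 thread blocks needed to cover the image (rounded up per axis).
std::uint64_t gridBlocks(int width, int height) {
    assert(width > 0 && height > 0);
    const std::uint64_t bx = (static_cast<std::uint64_t>(width) + 7) / 8;
    const std::uint64_t by = (static_cast<std::uint64_t>(height) + 7) / 8;
    return bx * by;
}

// Image reads issued by one pass: every pixel reads a 5x5 set of taps.
std::uint64_t readsPerPass(int width, int height) {
    assert(width > 0 && height > 0);
    return static_cast<std::uint64_t>(width) * static_cast<std::uint64_t>(height) * 25u;
}
```

For example, going from 800x800 to 1200x1200 raises the reads per pass from 16,000,000 to 36,000,000, a factor of 2.25, so a roughly proportional increase in denoising time is expected if the kernel is bound by memory reads.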
+- how varying filter sizes affect performance
+
+
+
+As seen in this chart, the denoising runtime also increases with filter size. A larger filter requires more A-trous passes (the pass count grows with the base-2 logarithm of the filter size), and each extra pass adds to the denoising time.
+
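The relation between filter size and pass count can be sketched on the host as follows. It mirrors the `ceil(log2(stepWidth))` loop bound in `Denoise_Image`; the function names are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Number of A-trous passes for a given filter size: ceil(log2(filterSize)).
int passCount(float filterSize) {
    assert(filterSize >= 1.0f);
    return static_cast<int>(std::ceil(std::log2(filterSize)));
}

// Step widths of the successive passes: 1, 2, 4, ...
std::vector<int> stepWidths(float filterSize) {
    std::vector<int> steps;
    const int passes = passCount(filterSize);
    for (int i = 0; i < passes; ++i) {
        steps.push_back(1 << i);
    }
    return steps;
}

// Reads per pixel over all passes, at 25 taps per pass.
int sparseReadsPerPixel(float filterSize) {
    return 25 * passCount(filterSize);
}
```

Under this formula a filter size of 100 needs 7 passes, or 175 reads per pixel, whereas a dense 100x100 kernel would need 10,000. A filter size of 1 yields zero passes, so the image is left unfiltered.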
+- how visual results vary with filter size
+
+|1|16|32|100|
+|------|:-------:|:--------:|:--------:|
+| (image) | (image) | (image) | (image) |
+
+This table shows the denoised image at different filter sizes. The visual quality improves from 1 to 16 and from 16 to 32, but from 32 to 100 there is little difference.
+
+- how effective/ineffective is this method with different material types
+
+|diffuse|specular|
+|------|:-------:|
+| (image) | (image) |
+
+As seen in this table, A-trous works better on diffuse objects in my opinion. On specular objects it blurs the reflections, which results in a very blurry surface. This is because the position and normal buffers cannot separate edges inside a reflection that lies on a single surface.
+
+- how do results compare across different scenes
+
+|cornell|cornell_ceiling|
+|------|:-------:|
+| (image) | (image) |
+
+As seen in this table, with the same parameters, the cornell_ceiling scene looks better and smoother than the cornell scene. This is because it has a larger light source, which makes path tracing converge faster than with a small light source. A better path traced input image leads to a better denoised image.
diff --git a/img/1.jpg b/img/1.jpg
new file mode 100644
index 00000000..82be6048
Binary files /dev/null and b/img/1.jpg differ
diff --git a/img/1000.jpg b/img/1000.jpg
new file mode 100644
index 00000000..4107e44c
Binary files /dev/null and b/img/1000.jpg differ
diff --git a/img/10spp.png b/img/10spp.png
new file mode 100644
index 00000000..f7e20a92
Binary files /dev/null and b/img/10spp.png differ
diff --git a/img/1200.jpg b/img/1200.jpg
new file mode 100644
index 00000000..4fcbe91f
Binary files /dev/null and b/img/1200.jpg differ
diff --git a/img/16.jpg b/img/16.jpg
new file mode 100644
index 00000000..31cb0b97
Binary files /dev/null and b/img/16.jpg differ
diff --git a/img/2spp.png b/img/2spp.png
new file mode 100644
index 00000000..436c856f
Binary files /dev/null and b/img/2spp.png differ
diff --git a/img/32.jpg b/img/32.jpg
new file mode 100644
index 00000000..3443bcfc
Binary files /dev/null and b/img/32.jpg differ
diff --git a/img/80.jpg b/img/80.jpg
new file mode 100644
index 00000000..9796d11b
Binary files /dev/null and b/img/80.jpg differ
diff --git a/img/800.jpg b/img/800.jpg
new file mode 100644
index 00000000..b0fc6481
Binary files /dev/null and b/img/800.jpg differ
diff --git a/img/cornell.png b/img/cornell.png
new file mode 100644
index 00000000..337e2932
Binary files /dev/null and b/img/cornell.png differ
diff --git a/img/diffuse.png b/img/diffuse.png
new file mode 100644
index 00000000..b5bec78f
Binary files /dev/null and b/img/diffuse.png differ
diff --git a/img/gauss.png b/img/gauss.png
new file mode 100644
index 00000000..6d88d9ff
Binary files /dev/null and b/img/gauss.png differ
diff --git a/img/noise.png b/img/noise.png
new file mode 100644
index 00000000..163c3199
Binary files /dev/null and b/img/noise.png differ
diff --git a/img/normal.jpg b/img/normal.jpg
new file mode 100644
index 00000000..e9880534
Binary files /dev/null and b/img/normal.jpg differ
diff --git a/img/pos.jpg b/img/pos.jpg
new file mode 100644
index 00000000..95392571
Binary files /dev/null and b/img/pos.jpg differ
diff --git a/img/truch.png b/img/truch.png
new file mode 100644
index 00000000..bb70aa8a
Binary files /dev/null and b/img/truch.png differ
diff --git a/scenes/cornell_ceiling_light.txt b/scenes/cornell_ceiling_light.txt
index 15af5f19..ad7b6435 100644
--- a/scenes/cornell_ceiling_light.txt
+++ b/scenes/cornell_ceiling_light.txt
@@ -48,6 +48,16 @@ REFR 0
REFRIOR 0
EMITTANCE 0
+// Diffuse blue
+MATERIAL 5
+RGB .35 .35 .85
+SPECEX 0
+SPECRGB 0 0 0
+REFL 0
+REFR 0
+REFRIOR 0
+EMITTANCE 0
+
// Camera
CAMERA
RES 800 800
diff --git a/src/main.cpp b/src/main.cpp
index 4092ae4a..3ee37a10 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -30,6 +30,7 @@ float ui_normalWeight = 0.35f;
float ui_positionWeight = 0.2f;
bool ui_saveAndExit = false;
+
static bool camchanged = true;
static float dtheta = 0, dphi = 0;
static glm::vec3 cammove;
@@ -70,6 +71,7 @@ int main(int argc, char** argv) {
height = cam.resolution.y;
ui_iterations = renderState->iterations;
+ ui_iterations = 5000;
startupIterations = ui_iterations;
glm::vec3 view = cam.view;
@@ -165,7 +167,11 @@ void runCuda() {
pathtrace(frame, iteration);
}
- if (ui_showGbuffer) {
+ if (ui_denoise) {
+ Denoise_Image(ui_colorWeight, ui_normalWeight, ui_positionWeight, ui_filterSize);
+ showDenoisedImage(pbo_dptr, iteration);
+ }
+ else if (ui_showGbuffer) {
showGBuffer(pbo_dptr);
} else {
showImage(pbo_dptr, iteration);
diff --git a/src/main.h b/src/main.h
index 06d311a8..45afcb8c 100644
--- a/src/main.h
+++ b/src/main.h
@@ -42,6 +42,7 @@ extern float ui_normalWeight;
extern float ui_positionWeight;
extern bool ui_saveAndExit;
+
void runCuda();
void keyCallback(GLFWwindow *window, int key, int scancode, int action, int mods);
void mousePositionCallback(GLFWwindow* window, double xpos, double ypos);
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index 23e5f909..19720abe 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -14,8 +14,14 @@
#include "intersections.h"
#include "interactions.h"
+#include "device_launch_parameters.h"
+
#define ERRORCHECK 1
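+#define gauss 1 // 1: plain Gaussian blur, 0: edge-aware A-trous denoiser
+#define stride 25 // scale factor used when visualizing G-buffer positions
+#define show_pos 0 // 1: show G-buffer positions, 0: show G-buffer normals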
+
#define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
void checkCUDAErrorFn(const char *msg, const char *file, int line) {
@@ -67,7 +73,7 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
}
}
-__global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
+__global__ void gbufferToPBO_Normal(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -75,10 +81,27 @@ __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* g
int index = x + (y * resolution.x);
float timeToIntersect = gBuffer[index].t * 256.0;
+ glm::vec3 nor = gBuffer[index].normal;
pbo[index].w = 0;
- pbo[index].x = timeToIntersect;
- pbo[index].y = timeToIntersect;
- pbo[index].z = timeToIntersect;
+ pbo[index].x = glm::clamp(abs((int)(nor.x * 255.f)), 0, 255);
+ pbo[index].y = glm::clamp(abs((int)(nor.y * 255.f)), 0, 255);
+ pbo[index].z = glm::clamp(abs((int)(nor.z * 255.f)), 0, 255);
+ }
+}
+
+__global__ void gbufferToPBO_Position(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
+ int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+ int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+ if (x < resolution.x && y < resolution.y) {
+ int index = x + (y * resolution.x);
+
+ glm::vec3 position = gBuffer[index].position;
+ pbo[index].w = 0;
+ pbo[index].x = glm::clamp(abs(position.x * stride), 0.f, 255.f);
+ pbo[index].y = glm::clamp(abs(position.y * stride), 0.f, 255.f);
+ pbo[index].z = glm::clamp(abs(position.z * stride), 0.f, 255.f);
+
}
}
@@ -90,7 +113,19 @@ static PathSegment * dev_paths = NULL;
static ShadeableIntersection * dev_intersections = NULL;
static GBufferPixel* dev_gBuffer = NULL;
// TODO: static variables for device memory, any extra info you need, etc
-// ...
+
+static glm::vec3* dev_denoised_image = NULL;
+static glm::vec3* dev_denoised_buffer = NULL;
+static float* dev_gauss_kernal = NULL;
+
+//https://www.geeksforgeeks.org/gaussian-filter-generation-c/
+static float gauss_kernel[25] = {
+0.00296902, 0.0133062, 0.0219382, 0.0133062, 0.00296902,
+0.0133062, 0.0596343, 0.0983203, 0.0596343, 0.0133062,
+0.0219382, 0.0983203, 0.162103, 0.0983203, 0.0219382,
+0.0133062, 0.0596343, 0.0983203, 0.0596343, 0.0133062,
+0.00296902, 0.0133062, 0.0219382, 0.0133062, 0.00296902,
+};
void pathtraceInit(Scene *scene) {
hst_scene = scene;
@@ -114,6 +149,11 @@ void pathtraceInit(Scene *scene) {
cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));
// TODO: initialize any extra device memeory you need
+ cudaMalloc(&dev_denoised_image, pixelcount * sizeof(glm::vec3));
+ cudaMalloc(&dev_denoised_buffer, pixelcount * sizeof(glm::vec3));
+
+ cudaMalloc(&dev_gauss_kernal, 25 * sizeof(float));
+ cudaMemcpy(dev_gauss_kernal, gauss_kernel, 25 * sizeof(float), cudaMemcpyHostToDevice);
checkCUDAError("pathtraceInit");
}
@@ -126,6 +166,9 @@ void pathtraceFree() {
cudaFree(dev_intersections);
cudaFree(dev_gBuffer);
// TODO: clean up any extra device memory you created
+ cudaFree(dev_denoised_image);
+ cudaFree(dev_denoised_buffer);
+ cudaFree(dev_gauss_kernal);
checkCUDAError("pathtraceFree");
}
@@ -282,6 +325,9 @@ __global__ void generateGBuffer (
if (idx < num_paths)
{
gBuffer[idx].t = shadeableIntersections[idx].t;
+ gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal;
+ gBuffer[idx].position = getPointOnRay(pathSegments[idx].ray, shadeableIntersections[idx].t);
+
}
}
@@ -356,42 +402,42 @@ void pathtrace(int frame, int iter) {
// --- PathSegment Tracing Stage ---
// Shoot ray into scene, bounce between objects, push shading chunks
- // Empty gbuffer
- cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel));
+ // Empty gbuffer
+ cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel));
// clean shading chunks
cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
- bool iterationComplete = false;
+ bool iterationComplete = false;
while (!iterationComplete) {
- // tracing
- dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
- computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
- depth
- , num_paths
- , dev_paths
- , dev_geoms
- , hst_scene->geoms.size()
- , dev_intersections
- );
- checkCUDAError("trace one bounce");
- cudaDeviceSynchronize();
-
- if (depth == 0) {
- generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer);
- }
-
- depth++;
-
- shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> (
- iter,
- num_paths,
- dev_intersections,
- dev_paths,
- dev_materials
- );
- iterationComplete = depth == traceDepth;
+ // tracing
+ dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+ computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
+ depth
+ , num_paths
+ , dev_paths
+ , dev_geoms
+ , hst_scene->geoms.size()
+ , dev_intersections
+ );
+ checkCUDAError("trace one bounce");
+ cudaDeviceSynchronize();
+
+ if (depth == 0) {
+ generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer);
+ }
+
+ depth++;
+
+ shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> (
+ iter,
+ num_paths,
+ dev_intersections,
+ dev_paths,
+ dev_materials
+ );
+ iterationComplete = depth == traceDepth;
}
// Assemble this iteration and apply it to the image
@@ -409,6 +455,109 @@ void pathtrace(int frame, int iter) {
checkCUDAError("pathtrace");
}
+__global__ void ATrousDenoise(float c_phi, float n_phi, float p_phi, glm::ivec2 resolution, int stepWidth, GBufferPixel* gBuffer,
+ glm::vec3* pt_image, glm::vec3* denoised_image)
+{
+ int x = blockDim.x * blockIdx.x + threadIdx.x;
+ int y = blockDim.y * blockIdx.y + threadIdx.y;
+ if (x < resolution.x && y < resolution.y) {
+ float cum_w = 0.0f;
+ glm::vec3 sum(0.f, 0.f, 0.f);
+
+ int index = y * resolution.x + x;
+ glm::vec3 nval = gBuffer[index].normal;
+ glm::vec3 pval = gBuffer[index].position;
+ glm::vec3 cval = pt_image[index];
+
+ float kernal[5] = { 0.0625, 0.25, 0.375, 0.25, 0.0625 };
+ for (int i = 0; i < 5; i++) {
+ for (int j = 0; j < 5; j++) {
+ glm::ivec2 offset;
+ offset.x = x + (i - 2) * stepWidth;
+ offset.y = y + (j - 2) * stepWidth;
+ offset = glm::clamp(offset, glm::ivec2(0, 0), glm::ivec2(resolution.x - 1, resolution.y - 1));
+
+ int tmp = offset.y * resolution.x + offset.x;
+
+ glm::vec3 ctmp = pt_image[tmp];
+
+ glm::vec3 t = cval - ctmp;
+
+ float dist2 = glm::dot(t, t);
+ float c_w = min(exp(-dist2 / (c_phi* c_phi)), 1.0f);
+
+ glm::vec3 ntmp = gBuffer[tmp].normal;
+ t = nval - ntmp;
+ dist2 = max(glm::dot(t, t) / (stepWidth * stepWidth), 0.0f);
+ float n_w = min(exp(-dist2 / (n_phi* n_phi)), 1.f);
+
+ glm::vec3 ptmp = gBuffer[tmp].position;
+ t = pval - ptmp;
+ dist2 = glm::dot(t, t);
+ float p_w = min(exp(-dist2 / (p_phi * p_phi)), 1.f);
+
+ float filter = kernal[i] * kernal[j]; // 5x5 tap weight: outer product of the 1D kernel
+ float weight = c_w * n_w * p_w;
+ sum += ctmp * weight * filter;
+ cum_w += weight * filter;
+
+ }
+ }
+ denoised_image[index] = sum / cum_w;
+ }
+}
+
+__global__ void GaussBlur(glm::ivec2 resolution, int stepWidth, float* gauss_kernal,
+ glm::vec3* pt_image, glm::vec3* denoised_image)
+{
+ int x = blockDim.x * blockIdx.x + threadIdx.x;
+ int y = blockDim.y * blockIdx.y + threadIdx.y;
+ if (x < resolution.x && y < resolution.y) {
+ int index = y * resolution.x + x;
+ glm::vec3 sum(0.f, 0.f, 0.f);
+ for (int i = 0; i < 5; i++) {
+ for (int j = 0; j < 5; j++) {
+ glm::ivec2 offset;
+ offset.x = x + (i - 2) * stepWidth;
+ offset.y = y + (j - 2) * stepWidth;
+ offset = glm::clamp(offset, glm::ivec2(0, 0), glm::ivec2(resolution.x - 1, resolution.y - 1));
+
+ int tmp = offset.y * resolution.x + offset.x;
+
+ glm::vec3 ctmp = pt_image[tmp];
+ float weight = gauss_kernal[j * 5 + i];
+ sum += weight * ctmp;
+ }
+ }
+ denoised_image[index] = sum;
+ }
+}
+
+void Denoise_Image(float c_phi, float n_phi, float p_phi, float stepWidth)
+{
+ Camera& cam = hst_scene->state.camera;
+ glm::ivec2 resolution = cam.resolution;
+
+ const dim3 blockSize2d(8, 8);
+ const dim3 blocksPerGrid2d(
+ (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+ (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+ cudaMemcpy(dev_denoised_image, dev_image, resolution.x * resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToDevice);
+
+ int iteration = ceil(glm::log2(stepWidth));
+ for (int i = 0; i < iteration; i++) {
+#if gauss
+ GaussBlur<<<blocksPerGrid2d, blockSize2d>>>(resolution, 1 << i, dev_gauss_kernal, dev_denoised_image, dev_denoised_buffer);
+#else
+ ATrousDenoise<<<blocksPerGrid2d, blockSize2d>>>(c_phi, n_phi, p_phi, resolution, 1 << i, dev_gBuffer, dev_denoised_image, dev_denoised_buffer);
+
+#endif
+ std::swap(dev_denoised_buffer, dev_denoised_image);
+ }
+ cudaMemcpy(hst_scene->state.image.data(), dev_denoised_image, resolution.x * resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToHost);
+}
+
// CHECKITOUT: this kernel "post-processes" the gbuffer/gbuffers into something that you can visualize for debugging.
void showGBuffer(uchar4* pbo) {
const Camera &cam = hst_scene->state.camera;
@@ -418,7 +567,13 @@ void showGBuffer(uchar4* pbo) {
(cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
// CHECKITOUT: process the gbuffer results and send them to OpenGL buffer for visualization
- gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
+#if show_pos
+ gbufferToPBO_Position<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
+
+#else
+ gbufferToPBO_Normal<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
+
+#endif
}
void showImage(uchar4* pbo, int iter) {
@@ -431,3 +586,14 @@ const Camera &cam = hst_scene->state.camera;
// Send results to OpenGL buffer for rendering
sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image);
}
+
+void showDenoisedImage(uchar4* pbo, int iter) {
+ const Camera& cam = hst_scene->state.camera;
+ const dim3 blockSize2d(8, 8);
+ const dim3 blocksPerGrid2d(
+ (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+ (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+ // Send results to OpenGL buffer for rendering
+ sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_denoised_image);
+}
diff --git a/src/pathtrace.h b/src/pathtrace.h
index 9e12f440..362237d8 100644
--- a/src/pathtrace.h
+++ b/src/pathtrace.h
@@ -8,3 +8,5 @@ void pathtraceFree();
void pathtrace(int frame, int iteration);
void showGBuffer(uchar4 *pbo);
void showImage(uchar4 *pbo, int iter);
+void showDenoisedImage(uchar4* pbo, int iter);
+void Denoise_Image(float c_phi, float n_phi, float p_phi, float stepWidth);
\ No newline at end of file
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index da7e558a..131f2118 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -79,4 +79,6 @@ struct ShadeableIntersection {
// What information might be helpful for guiding a denoising filter?
struct GBufferPixel {
float t;
+ glm::vec3 normal;
+ glm::vec3 position;
};