diff --git a/README.md b/README.md
index f044c821..dc1a5c29 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,77 @@ CUDA Denoiser For CUDA Path Tracer
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Shixuan Fang
+* Tested on: Windows 11, i7-12700kf, RTX 3080Ti
-### (TODO: Your README)
+# Project Overview
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+| Original | Denoised |
+| ------------- |:-------------:|
+|||
+In this project, I implemented a path tracing denoiser using the A-trous wavelet filter.
+
+
+
+A-trous approximates a Gaussian blur by iteratively applying sparse blurs of increasing size, which significantly reduces the number of pixel reads.
+
+## G-Buffer
+
+To denoise with the A-trous filter while preserving edges, we need information beyond the path-traced image: a G-buffer that stores the per-pixel surface normal and position.
+
+|Normal|Position|
+|------|:-------:|
+| | |
+
+## Performance Analysis
+
+- how much time denoising adds to renders
+
+![800](https://user-images.githubusercontent.com/54868517/197315609-766b58ce-e6d0-4e43-9315-0e191bcae323.jpg)
+
+As seen in this figure, denoising adds only 2-3 ms to the render, which is low enough for real-time rendering.
+
+- how denoising influences the number of iterations needed to get an "acceptably smooth" result
+
+|2spp|10spp|Ground Truth|
+|------|:-------:|:-------------:|
+| | | |
+
+As seen in this table, 10spp with denoising is acceptably smooth compared to the ground truth (5000spp).
+
+- how denoising at different resolutions impacts runtime
+
+![image](https://user-images.githubusercontent.com/54868517/197312998-79594b67-bc01-4701-b9e7-e6a04f5094f3.png)
+
+As seen in this chart, the runtime of denoising increases with screen resolution. This makes sense: every A-trous pass has more pixels to process, so denoising takes longer.
+
+- how varying filter sizes affect performance
+
+![image](https://user-images.githubusercontent.com/54868517/197314664-59f5fd76-cf51-473b-9c6d-76e3ab028a6c.png)
+
+As seen in this chart, the runtime of denoising also increases with filter size, because a larger filter requires more A-trous iterations.
+
+- how visual results vary with filter size
+
+|1|16|32|100|
+|------|:-------:|:--------:|:--------:|
+| | | | |
+
+This table shows how the denoised image looks with filter sizes of 1, 16, 32, and 100. The visual quality improves from 1 to 16 and from 16 to 32, but from 32 to 100 there isn't much difference.
+
+- how effective/ineffective is this method with different material types
+
+|diffuse|specular|
+|------|:-------:|
+| ||
+
+As seen in this table, A-trous works better on diffuse objects in my opinion. On specular objects it blurs the reflection, which results in a very blurry surface, because the position and normal buffers cannot separate the edges that appear within a reflection on the same surface.
+
+- how do results compare across different scenes
+
+|cornell|cornell_ceiling|
+|------|:-------:|
+| ||
+
+As seen in this table, with the same parameters, the cornell_ceiling scene looks better and smoother than the cornell scene. This is because it has a larger light source, so path tracing converges faster than with a small light source. A better path-traced image gives a better denoised image.
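The relationship between filter size, pass count, and the sparse sampling pattern can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not the project's kernel: `atrousPassCount` and `atrousOffsets` are hypothetical helper names, and the sketch assumes the pass count is `ceil(log2(filterSize))` with a 5-tap kernel whose spacing doubles each pass (`1 << pass`), as in `Denoise_Image`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: number of A-trous passes for a given filter size.
// Computes the smallest n with (1 << n) >= filterSize, which equals
// ceil(log2(filterSize)) for filterSize >= 1.
int atrousPassCount(int filterSize) {
    assert(filterSize >= 1);
    int passes = 0;
    while ((1 << passes) < filterSize) {
        ++passes;
    }
    return passes;
}

// Hypothetical helper: 1D tap offsets of the 5-tap kernel in a given pass.
// The same offsets are used along x and y, so each pass reads 5 * 5 = 25 pixels.
std::vector<int> atrousOffsets(int pass) {
    assert(pass >= 0);
    const int step = 1 << pass;  // spacing doubles every pass
    std::vector<int> offsets;
    for (int k = -2; k <= 2; ++k) {
        offsets.push_back(k * step);
    }
    return offsets;
}
```

Under these assumptions a filter size of 16 takes 4 passes, or 100 reads per pixel in total, and only the pass count changes with the filter size, not the number of reads per pass.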
diff --git a/img/1.jpg b/img/1.jpg
new file mode 100644
index 00000000..82be6048
Binary files /dev/null and b/img/1.jpg differ
diff --git a/img/1000.jpg b/img/1000.jpg
new file mode 100644
index 00000000..4107e44c
Binary files /dev/null and b/img/1000.jpg differ
diff --git a/img/10spp.png b/img/10spp.png
new file mode 100644
index 00000000..f7e20a92
Binary files /dev/null and b/img/10spp.png differ
diff --git a/img/1200.jpg b/img/1200.jpg
new file mode 100644
index 00000000..4fcbe91f
Binary files /dev/null and b/img/1200.jpg differ
diff --git a/img/16.jpg b/img/16.jpg
new file mode 100644
index 00000000..31cb0b97
Binary files /dev/null and b/img/16.jpg differ
diff --git a/img/2spp.png b/img/2spp.png
new file mode 100644
index 00000000..436c856f
Binary files /dev/null and b/img/2spp.png differ
diff --git a/img/32.jpg b/img/32.jpg
new file mode 100644
index 00000000..3443bcfc
Binary files /dev/null and b/img/32.jpg differ
diff --git a/img/80.jpg b/img/80.jpg
new file mode 100644
index 00000000..9796d11b
Binary files /dev/null and b/img/80.jpg differ
diff --git a/img/800.jpg b/img/800.jpg
new file mode 100644
index 00000000..b0fc6481
Binary files /dev/null and b/img/800.jpg differ
diff --git a/img/cornell.png b/img/cornell.png
new file mode 100644
index 00000000..337e2932
Binary files /dev/null and b/img/cornell.png differ
diff --git a/img/diffuse.png b/img/diffuse.png
new file mode 100644
index 00000000..b5bec78f
Binary files /dev/null and b/img/diffuse.png differ
diff --git a/img/gauss.png b/img/gauss.png
new file mode 100644
index 00000000..6d88d9ff
Binary files /dev/null and b/img/gauss.png differ
diff --git a/img/noise.png b/img/noise.png
new file mode 100644
index 00000000..163c3199
Binary files /dev/null and b/img/noise.png differ
diff --git a/img/normal.jpg b/img/normal.jpg
new file mode 100644
index 00000000..e9880534
Binary files /dev/null and b/img/normal.jpg differ
diff --git a/img/pos.jpg b/img/pos.jpg
new file mode 100644
index 00000000..95392571
Binary files /dev/null and b/img/pos.jpg differ
diff --git a/img/truch.png b/img/truch.png
new file mode 100644
index 00000000..bb70aa8a
Binary files /dev/null and b/img/truch.png differ
diff --git a/scenes/cornell_ceiling_light.txt b/scenes/cornell_ceiling_light.txt
index 15af5f19..ad7b6435 100644
--- a/scenes/cornell_ceiling_light.txt
+++ b/scenes/cornell_ceiling_light.txt
@@ -48,6 +48,16 @@ REFR 0
 REFRIOR 0
 EMITTANCE 0
+// Diffuse blue
+MATERIAL 5
+RGB .35 .35 .85
+SPECEX 0
+SPECRGB 0 0 0
+REFL 0
+REFR 0
+REFRIOR 0
+EMITTANCE 0
+
 // Camera
 CAMERA
 RES 800 800
diff --git a/src/main.cpp b/src/main.cpp
index 4092ae4a..3ee37a10 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -30,6 +30,7 @@ float ui_normalWeight = 0.35f;
 float ui_positionWeight = 0.2f;
 bool ui_saveAndExit = false;
+
 static bool camchanged = true;
 static float dtheta = 0, dphi = 0;
 static glm::vec3 cammove;
@@ -70,6 +71,7 @@ int main(int argc, char** argv) {
     height = cam.resolution.y;
     ui_iterations = renderState->iterations;
+    ui_iterations = 5000;
     startupIterations = ui_iterations;
     glm::vec3 view = cam.view;
@@ -165,7 +167,11 @@ void runCuda() {
         pathtrace(frame, iteration);
     }
-    if (ui_showGbuffer) {
+    if (ui_denoise) {
+        Denoise_Image(ui_colorWeight, ui_normalWeight, ui_positionWeight, ui_filterSize);
+        showDenoisedImage(pbo_dptr, iteration);
+    }
+    else if (ui_showGbuffer) {
         showGBuffer(pbo_dptr);
     } else {
         showImage(pbo_dptr, iteration);
diff --git a/src/main.h b/src/main.h
index 06d311a8..45afcb8c 100644
--- a/src/main.h
+++ b/src/main.h
@@ -42,6 +42,7 @@ extern float ui_normalWeight;
 extern float ui_positionWeight;
 extern bool ui_saveAndExit;
+
 void runCuda();
 void keyCallback(GLFWwindow *window, int key, int scancode, int action, int mods);
 void mousePositionCallback(GLFWwindow* window, double xpos, double ypos);
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index 23e5f909..19720abe 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -14,8 +14,14 @@
 #include "intersections.h"
 #include "interactions.h"
+#include "device_launch_parameters.h"
+
 #define ERRORCHECK 1
+#define gauss 1
+#define stride 25
+#define show_pos 0
+
 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
 #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
 void checkCUDAErrorFn(const char *msg, const char *file, int line) {
@@ -67,7 +73,7 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
     }
 }
-__global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
+__global__ void gbufferToPBO_Normal(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -75,10 +81,27 @@ __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* g
         int index = x + (y * resolution.x);
         float timeToIntersect = gBuffer[index].t * 256.0;
+        glm::vec3 nor = gBuffer[index].normal;
         pbo[index].w = 0;
-        pbo[index].x = timeToIntersect;
-        pbo[index].y = timeToIntersect;
-        pbo[index].z = timeToIntersect;
+        pbo[index].x = glm::clamp(abs((int)(nor.x * 255.f)), 0, 255);
+        pbo[index].y = glm::clamp(abs((int)(nor.y * 255.f)), 0, 255);
+        pbo[index].z = glm::clamp(abs((int)(nor.z * 255.f)), 0, 255);
+    }
+}
+
+__global__ void gbufferToPBO_Position(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
+    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+
+    if (x < resolution.x && y < resolution.y) {
+        int index = x + (y * resolution.x);
+
+        glm::vec3 position = gBuffer[index].position;
+        pbo[index].w = 0;
+        pbo[index].x = glm::clamp(abs(position.x * stride), 0.f, 255.f);
+        pbo[index].y = glm::clamp(abs(position.y * stride), 0.f, 255.f);
+        pbo[index].z = glm::clamp(abs(position.z * stride), 0.f, 255.f);
+
     }
 }
@@ -90,7 +113,19 @@ static PathSegment * dev_paths = NULL;
 static ShadeableIntersection * dev_intersections = NULL;
 static GBufferPixel* dev_gBuffer = NULL;
 // TODO: static variables for device memory, any extra info you need, etc
-// ...
+
+static glm::vec3* dev_denoised_image = NULL;
+static glm::vec3* dev_denoised_buffer = NULL;
+static float* dev_gauss_kernal = NULL;
+
+//https://www.geeksforgeeks.org/gaussian-filter-generation-c/
+static float gauss_kernel[25] = {
+0.00296902, 0.0133062, 0.0219382, 0.0133062, 0.00296902,
+0.0133062, 0.0596343, 0.0983203, 0.0596343, 0.0133062,
+0.0219382, 0.0983203, 0.162103, 0.0983203, 0.0219382,
+0.0133062, 0.0596343, 0.0983203, 0.0596343, 0.0133062,
+0.00296902, 0.0133062, 0.0219382, 0.0133062, 0.00296902,
+};
 void pathtraceInit(Scene *scene) {
     hst_scene = scene;
@@ -114,6 +149,11 @@ void pathtraceInit(Scene *scene) {
     cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));
     // TODO: initialize any extra device memeory you need
+    cudaMalloc(&dev_denoised_image, pixelcount * sizeof(glm::vec3));
+    cudaMalloc(&dev_denoised_buffer, pixelcount * sizeof(glm::vec3));
+
+    cudaMalloc(&dev_gauss_kernal, 25 * sizeof(float));
+    cudaMemcpy(dev_gauss_kernal, gauss_kernel, 25 * sizeof(float), cudaMemcpyHostToDevice);
     checkCUDAError("pathtraceInit");
 }
@@ -126,6 +166,9 @@ void pathtraceFree() {
     cudaFree(dev_intersections);
     cudaFree(dev_gBuffer);
     // TODO: clean up any extra device memory you created
+    cudaFree(dev_denoised_image);
+    cudaFree(dev_denoised_buffer);
+    cudaFree(dev_gauss_kernal);
     checkCUDAError("pathtraceFree");
 }
@@ -282,6 +325,9 @@ __global__ void generateGBuffer (
     if (idx < num_paths) {
         gBuffer[idx].t = shadeableIntersections[idx].t;
+        gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal;
+        gBuffer[idx].position = getPointOnRay(pathSegments[idx].ray, shadeableIntersections[idx].t);
+
     }
 }
@@ -356,42 +402,42 @@ void pathtrace(int frame, int iter) {
     // --- PathSegment Tracing Stage ---
     // Shoot ray into scene, bounce between objects, push shading chunks
-    // Empty gbuffer
-    cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel));
+    // Empty gbuffer
+    cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel));
     // clean shading chunks
     cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
-    bool iterationComplete = false;
+    bool iterationComplete = false;
     while (!iterationComplete) {
-        // tracing
-        dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
-        computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
-            depth
-            , num_paths
-            , dev_paths
-            , dev_geoms
-            , hst_scene->geoms.size()
-            , dev_intersections
-            );
-        checkCUDAError("trace one bounce");
-        cudaDeviceSynchronize();
-
-        if (depth == 0) {
-            generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer);
-        }
-
-        depth++;
-
-        shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> (
-            iter,
-            num_paths,
-            dev_intersections,
-            dev_paths,
-            dev_materials
-            );
-        iterationComplete = depth == traceDepth;
+        // tracing
+        dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+        computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
+            depth
+            , num_paths
+            , dev_paths
+            , dev_geoms
+            , hst_scene->geoms.size()
+            , dev_intersections
+            );
+        checkCUDAError("trace one bounce");
+        cudaDeviceSynchronize();
+
+        if (depth == 0) {
+            generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer);
+        }
+
+        depth++;
+
+        shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> (
+            iter,
+            num_paths,
+            dev_intersections,
+            dev_paths,
+            dev_materials
+            );
+        iterationComplete = depth == traceDepth;
     }
     // Assemble this iteration and apply it to the image
@@ -409,6 +455,109 @@ void pathtrace(int frame, int iter) {
     checkCUDAError("pathtrace");
 }
+__global__ void ATrousDenoise(float c_phi, float n_phi, float p_phi, glm::ivec2 resolution, int stepWidth, GBufferPixel* gBuffer,
+    glm::vec3* pt_image, glm::vec3* denoised_image)
+{
+    int x = blockDim.x * blockIdx.x + threadIdx.x;
+    int y = blockDim.y * blockIdx.y + threadIdx.y;
+    if (x < resolution.x && y < resolution.y) {
+        float cum_w = 0.0f;
+        glm::vec3 sum(0.f, 0.f, 0.f);
+
+        int index = y * resolution.x + x;
+        glm::vec3 nval = gBuffer[index].normal;
+        glm::vec3 pval = gBuffer[index].position;
+        glm::vec3 cval = pt_image[index];
+
+        float kernal[5] = { 0.0625, 0.25, 0.375, 0.25, 0.0625 };
+        for (int i = 0; i < 5; i++) {
+            for (int j = 0; j < 5; j++) {
+                glm::ivec2 offset;
+                offset.x = x + (i - 2) * stepWidth;
+                offset.y = y + (j - 2) * stepWidth;
+                offset = glm::clamp(offset, glm::ivec2(0, 0), glm::ivec2(resolution.x - 1, resolution.y - 1));
+
+                int tmp = offset.y * resolution.x + offset.x;
+
+                glm::vec3 ctmp = pt_image[tmp];
+
+                glm::vec3 t = cval - ctmp;
+
+                float dist2 = glm::dot(t, t);
+                float c_w = min(exp(-dist2 / (c_phi * c_phi)), 1.0f);
+
+                glm::vec3 ntmp = gBuffer[tmp].normal;
+                t = nval - ntmp;
+                dist2 = max(glm::dot(t, t) / (stepWidth * stepWidth), 0.0f);
+                float n_w = min(exp(-dist2 / (n_phi * n_phi)), 1.f);
+
+                glm::vec3 ptmp = gBuffer[tmp].position;
+                t = pval - ptmp;
+                dist2 = glm::dot(t, t);
+                float p_w = min(exp(-dist2 / (p_phi * p_phi)), 1.f);
+
+                float filter = kernal[i] * kernal[j]; // 2D weight from the separable 1D B3-spline taps
+                float weight = c_w * n_w * p_w;
+                sum += ctmp * weight * filter;
+                cum_w += weight * filter;
+
+            }
+        }
+        denoised_image[index] = sum / cum_w;
+    }
+}
+
+__global__ void GaussBlur(glm::ivec2 resolution, int stepWidth, float* gauss_kernal,
+    glm::vec3* pt_image, glm::vec3* denoised_image)
+{
+    int x = blockDim.x * blockIdx.x + threadIdx.x;
+    int y = blockDim.y * blockIdx.y + threadIdx.y;
+    if (x < resolution.x && y < resolution.y) {
+        int index = y * resolution.x + x;
+        glm::vec3 sum(0.f, 0.f, 0.f);
+        for (int i = 0; i < 5; i++) {
+            for (int j = 0; j < 5; j++) {
+                glm::ivec2 offset;
+                offset.x = x + (i - 2) * stepWidth;
+                offset.y = y + (j - 2) * stepWidth;
+                offset = glm::clamp(offset, glm::ivec2(0, 0), glm::ivec2(resolution.x - 1, resolution.y - 1));
+
+                int tmp = offset.y * resolution.x + offset.x;
+
+                glm::vec3 ctmp = pt_image[tmp];
+                float weight = gauss_kernal[j * 5 + i];
+                sum += weight * ctmp;
+            }
+        }
+        denoised_image[index] = sum;
+    }
+}
+
+void Denoise_Image(float c_phi, float n_phi, float p_phi, float stepWidth)
+{
+    Camera& cam = hst_scene->state.camera;
+    glm::ivec2 resolution = cam.resolution;
+
+    const dim3 blockSize2d(8, 8);
+    const dim3 blocksPerGrid2d(
+        (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+        (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+    cudaMemcpy(dev_denoised_image, dev_image, resolution.x * resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToDevice);
+
+    int iteration = ceil(glm::log2(stepWidth));
+    for (int i = 0; i < iteration; i++) {
+#if gauss
+        GaussBlur <<<blocksPerGrid2d, blockSize2d>>> (resolution, 1 << i, dev_gauss_kernal, dev_denoised_image, dev_denoised_buffer);
+#else
+        ATrousDenoise <<<blocksPerGrid2d, blockSize2d>>> (c_phi, n_phi, p_phi, resolution, 1 << i, dev_gBuffer, dev_denoised_image, dev_denoised_buffer);
+
+#endif
+        std::swap(dev_denoised_buffer, dev_denoised_image);
+    }
+    cudaMemcpy(hst_scene->state.image.data(), dev_denoised_image, resolution.x * resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToHost);
+}
+
 // CHECKITOUT: this kernel "post-processes" the gbuffer/gbuffers into something that you can visualize for debugging.
 void showGBuffer(uchar4* pbo) {
     const Camera &cam = hst_scene->state.camera;
@@ -418,7 +567,13 @@ void showGBuffer(uchar4* pbo) {
         (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
     // CHECKITOUT: process the gbuffer results and send them to OpenGL buffer for visualization
-    gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
+#if show_pos
+    gbufferToPBO_Position <<<blocksPerGrid2d, blockSize2d>>> (pbo, cam.resolution, dev_gBuffer);
+
+#else
+    gbufferToPBO_Normal <<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
+
+#endif
 }
 void showImage(uchar4* pbo, int iter) {
@@ -431,3 +586,14 @@ const Camera &cam = hst_scene->state.camera;
     // Send results to OpenGL buffer for rendering
     sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image);
 }
+
+void showDenoisedImage(uchar4* pbo, int iter) {
+    const Camera& cam = hst_scene->state.camera;
+    const dim3 blockSize2d(8, 8);
+    const dim3 blocksPerGrid2d(
+        (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
+        (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);
+
+    // Send results to OpenGL buffer for rendering
+    sendImageToPBO <<<blocksPerGrid2d, blockSize2d>>> (pbo, cam.resolution, iter, dev_denoised_image);
+}
diff --git a/src/pathtrace.h b/src/pathtrace.h
index 9e12f440..362237d8 100644
--- a/src/pathtrace.h
+++ b/src/pathtrace.h
@@ -8,3 +8,5 @@ void pathtraceFree();
 void pathtrace(int frame, int iteration);
 void showGBuffer(uchar4 *pbo);
 void showImage(uchar4 *pbo, int iter);
+void showDenoisedImage(uchar4* pbo, int iter);
+void Denoise_Image(float c_phi, float n_phi, float p_phi, float stepWidth);
\ No newline at end of file
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index da7e558a..131f2118 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -79,4 +79,6 @@ struct ShadeableIntersection {
 // What information might be helpful for guiding a denoising filter?
 struct GBufferPixel {
     float t;
+    glm::vec3 normal;
+    glm::vec3 position;
 };
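
Review note: the edge-stopping weights in `ATrousDenoise` all share one form, `exp(-d^2 / phi^2)` clamped to at most 1, evaluated on the color, normal, and position differences. The host-side sketch below illustrates that form only. It assumes each buffer is weighted with its own phi, uses a plain `Vec3` struct in place of `glm::vec3`, and omits the step-width scaling the kernel applies to the normal term; `squaredDistance`, `edgeStoppingWeight`, and `combinedWeight` are hypothetical names that do not appear in the change.

```cpp
#include <algorithm>
#include <cmath>

// Stand-in for glm::vec3 so the sketch has no external dependencies.
struct Vec3 {
    float x, y, z;
};

// Squared Euclidean distance between two vectors.
float squaredDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Edge-stopping weight exp(-d^2 / phi^2), clamped to at most 1.
// Identical samples get weight 1; very different samples get a weight near 0.
float edgeStoppingWeight(const Vec3& center, const Vec3& sample, float phi) {
    const float d2 = squaredDistance(center, sample);
    return std::min(std::exp(-d2 / (phi * phi)), 1.0f);
}

// Combined weight for one tap: product of color, normal, and position terms.
float combinedWeight(float colorWeight, float normalWeight, float positionWeight) {
    return colorWeight * normalWeight * positionWeight;
}
```

Because the three terms are multiplied, a tap that differs strongly in any one buffer contributes almost nothing, which is how the filter avoids blurring across geometric edges.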