83 changes: 78 additions & 5 deletions README.md
@@ -3,11 +3,84 @@ CUDA Denoiser For CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Matt Elser
* [LinkedIn](https://www.linkedin.com/in/matt-elser-97b8151ba/), [twitter](twitter.com/__mattelser__)
* Tested on: Windows 10, i3-10100F @ 3.6GHz 16GB, GeForce GTX 1660 Super 6GB

### (TODO: Your README)

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
### Features

![example denoise image](img/256example.png)

This is an implementation of A-Trous wavelet denoising as described in
[Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering
by Dammertz et al.](https://jo.dreggn.org/home/2010_atrous.pdf).

*NOTE* This repo only contains the denoising features added to a minimal
pathtracer. To see this denoiser in a more fully featured pathtracer (from which
all sample images were generated), check out [my pathtracer
repo](https://github.com/mattelser/Project3-CUDA-Path-Tracer).

A-Trous wavelet denoising approximates a large Gaussian blur by repeatedly
applying a small (here 5x5) kernel with increasing spacing between taps, while
edge-stopping weights computed from "G-buffers" preserve edges (see the sketch
below the list). The G-buffers are per-pixel data gathered from the scene, and
for this implementation include:
- surface normal ![surface normal gbuffer](img/normalGbuffer.png)
- surface position ![position gbuffer](img/positionGbuffer.png)
- image color
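
As a rough sketch of how the weighting works (a simplified view of the `denoise`
kernel added in `src/pathtrace.cu`; the real kernel additionally scales the normal
term by the current step size), each filter tap's contribution is the product of a
5x5 Gaussian kernel value and three edge-stopping weights derived from the
G-buffer differences:

```cuda
#include <glm/glm.hpp>

// Simplified sketch of the per-tap edge-stopping weight. dNormal/dPosition/dColor
// are the differences between the center pixel's G-buffer values and the tap's;
// sigmaN/sigmaP/sigmaC are the UI-controlled normal/position/color weights.
__device__ float tapWeight(glm::vec3 dNormal, glm::vec3 dPosition, glm::vec3 dColor,
                           float sigmaN, float sigmaP, float sigmaC)
{
    float nw = fminf(expf(-glm::length(dNormal)   / (sigmaN * sigmaN)), 1.0f);
    float pw = fminf(expf(-glm::length(dPosition) / (sigmaP * sigmaP)), 1.0f);
    float cw = fminf(expf(-glm::length(dColor)    / (sigmaC * sigmaC)), 1.0f);
    return nw * pw * cw; // the caller multiplies this by the 5x5 Gaussian value
}
```

Each successive pass reuses this weighting with a larger spacing between the 25
taps, so a handful of passes approximate a much larger blur while pixels whose
normals, positions, or colors differ sharply from the center contribute very little.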

### Performance
Comparing render times between the full-featured pathtracer with no denoise
code and the same pathtracer running the denoise kernel, no notable
performance difference was found across multiple iteration counts and multiple
resolutions for small filter sizes (e.g. a 15x15 pixel kernel). Large filter
sizes (e.g. an 80x80 pixel kernel) slowed render times by ~5%, but large
filters are generally undesirable since they cause visual artifacts. Note the
banding around the edges of the room in the following image, rendered with an
80x80 filter.
![artifacting image](img/artifacting.png)

Scenes that produce less noise (e.g. those with larger light sources) also
produce better source images for the denoiser.

### Visual Improvement
The following comparison shows three columns: the
first is an untouched pathtracer render, the second is a denoised version of
that render, and the third is the visual difference between them (the stronger
the red, the greater the difference at that pixel). Denoise parameters were kept
constant for all renders.
![denoise comparison](img/comparisonChart.png)

As the above image shows, the denoiser is capable of improving even an
extremely noisy image. Also, as the iterations increase, the denoiser does not
noticeably reduce image quality. The fact that the difference images get
progressively less red indicates that the source image and the denoised image
are converging towards the same result.

The visual gains of the denoiser are most dramatic for the extremely noisy
low-iteration images, but the most practical benefit comes at slightly
higher iteration counts. Notice how little visual difference exists between
the following two images:
![iteration comparison](img/qualcomp.png)
The left image took twice as much computation to achieve approximately the same
result!

### Known limitations

The gbuffers only take surface info into account, which makes the denoiser less
effective for reflective and refractive surfaces. Note in the gbuffers above
that the reflective/refractive spheres show the same position and normals as a
diffuse sphere instead of the position or normal of what they reflect/refract.
One way to solve this would be to adjust the gbuffer based on the shader and one
or more bounced rays, as sketched below.
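
As a purely hypothetical sketch of that idea (nothing below exists in the current
code; `Material::hasReflective` and the `intersectScene` helper are assumptions),
the G-buffer pass could follow one specular bounce before recording position and
normal:

```cuda
// Hypothetical fragment that would live inside generateGBuffer: for mirror-like
// materials, record the G-buffer values of the surface seen through the
// reflection rather than those of the specular surface itself.
// Material::hasReflective and intersectScene() are assumed, not existing code.
if (material.hasReflective > 0.0f) {
    Ray bounced;
    bounced.origin    = getPointOnRay(pathSegments[idx].ray, isect.t);
    bounced.direction = glm::reflect(pathSegments[idx].ray.direction, isect.surfaceNormal);
    ShadeableIntersection bouncedIsect = intersectScene(bounced); // assumed helper
    gBuffer[idx].normal   = bouncedIsect.surfaceNormal;
    gBuffer[idx].position = getPointOnRay(bounced, bouncedIsect.t);
}
```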

The denoiser does not work well with depth of field. The gbuffers are rebuilt
each iteration, but with depth of field the camera rays change each iteration.
The shifting rays lead to inconsistent results from iteration to iteration, and
whereas the un-denoised image sums these iterations and converges, the denoiser
does not. The image below shows the normal gbuffer with depth of field enabled.
The dark pixels are unexpected and may indicate a bug.
![noisy gbuffer](img/DOFNormalGbuffer.png)
One way to fix this would be to have the gbuffers accumulate over
time. This fix could pair well with the previously mentioned fix for
reflective/refractive surfaces.
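
A minimal sketch of that accumulation (hypothetical, not part of this change; it
assumes the kernel is passed the current iteration number) would blend each
iteration's sample into a running average instead of overwriting it:

```cuda
// Hypothetical sketch: average G-buffer normals over iterations so that
// jittered depth-of-field rays converge instead of replacing one another.
__global__ void accumulateGBufferNormals(int num_paths, int iter,
                                         ShadeableIntersection* isects,
                                         GBufferPixel* gBuffer)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_paths) return;

    glm::vec3 sample = isects[idx].surfaceNormal;
    if (iter == 1) {
        gBuffer[idx].normal = sample;  // first iteration: just store the sample
    } else {
        // running average over iterations 1..iter
        gBuffer[idx].normal =
            (gBuffer[idx].normal * float(iter - 1) + sample) / float(iter);
    }
}
```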

Binary file added img/256example.png
Binary file added img/DOFNormalGbuffer.png
Binary file added img/artifacting.png
Binary file added img/comparisonChart.png
Binary file added img/normalGbuffer.png
Binary file added img/positionGbuffer.png
Binary file added img/qualcomp.png
22 changes: 19 additions & 3 deletions src/main.cpp
@@ -23,13 +23,14 @@ int ui_iterations = 0;
int startupIterations = 0;
int lastLoopIterations = 0;
bool ui_showGbuffer = false;
bool ui_denoise = false;
int ui_filterSize = 80;
bool ui_denoise = true; // false;
int ui_filterSize = 15;
float ui_colorWeight = 0.45f;
float ui_normalWeight = 0.35f;
float ui_positionWeight = 0.2f;
bool ui_saveAndExit = false;


static bool camchanged = true;
static float dtheta = 0, dphi = 0;
static glm::vec3 cammove;
@@ -40,6 +41,7 @@ glm::vec3 ogLookAt; // for recentering the camera

Scene *scene;
RenderState *renderState;
DenoiseSettings *denoiseSettings;
int iteration;

int width;
@@ -62,13 +64,23 @@ int main(int argc, char** argv) {
// Load scene file
scene = new Scene(sceneFile);

// Set up denoise settings to pass data from UI
denoiseSettings = new DenoiseSettings();
denoiseSettings->denoise = &ui_denoise;
denoiseSettings->filterSize = &ui_filterSize;
denoiseSettings->colorWeight = &ui_colorWeight;
denoiseSettings->normalWeight = &ui_normalWeight;
denoiseSettings->positionWeight = &ui_positionWeight;

// Set up camera stuff from loaded path tracer settings
iteration = 0;
renderState = &scene->state;
Camera &cam = renderState->camera;
width = cam.resolution.x;
height = cam.resolution.y;

renderState->denoiseSettings = denoiseSettings;

ui_iterations = renderState->iterations;
startupIterations = ui_iterations;

@@ -167,7 +179,11 @@ void runCuda() {

if (ui_showGbuffer) {
showGBuffer(pbo_dptr);
} else {
}
else if (ui_denoise) {
showDenoise(pbo_dptr, iteration);
}
else {
showImage(pbo_dptr, iteration);
}

163 changes: 157 additions & 6 deletions src/pathtrace.cu
@@ -67,28 +67,52 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution,
}
}

////Kernel that writes the image to the OpenGL PBO directly.
//__global__ void sendDenoiseToPBO(uchar4* pbo, glm::ivec2 resolution,
// int iter, glm::vec3* image) {
// int x = (blockIdx.x * blockDim.x) + threadIdx.x;
// int y = (blockIdx.y * blockDim.y) + threadIdx.y;
//
// if (x < resolution.x && y < resolution.y) {
// int index = x + (y * resolution.x);
// glm::vec3 pix = image[index];
//
// glm::ivec3 color;
// color.x = glm::clamp((int) (pix.x * 255.0), 0, 255);
// color.y = glm::clamp((int) (pix.y * 255.0), 0, 255);
// color.z = glm::clamp((int) (pix.z * 255.0), 0, 255);
//
// // Each thread writes one pixel location in the texture (textel)
// pbo[index].w = 0;
// pbo[index].x = color.x;
// pbo[index].y = color.y;
// pbo[index].z = color.z;
// }
//}

__global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* gBuffer) {
int x = (blockIdx.x * blockDim.x) + threadIdx.x;
int y = (blockIdx.y * blockDim.y) + threadIdx.y;

if (x < resolution.x && y < resolution.y) {
int index = x + (y * resolution.x);
float timeToIntersect = gBuffer[index].t * 256.0;

//float timeToIntersect = gBuffer[index].t * 256.0;
pbo[index].w = 0;
pbo[index].x = timeToIntersect;
pbo[index].y = timeToIntersect;
pbo[index].z = timeToIntersect;
pbo[index].x = gBuffer[index].normal.x * 256.0f;
pbo[index].y = gBuffer[index].normal.y * 256.0f;
pbo[index].z = gBuffer[index].normal.z * 256.0f;
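// note: normal components can be negative and are not remapped to [0, 255]
// before being written to the PBO, so negative components display as dark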
}
}

static Scene * hst_scene = NULL;
static DenoiseSettings * denoiseSettings = NULL;
static glm::vec3 * dev_image = NULL;
static Geom * dev_geoms = NULL;
static Material * dev_materials = NULL;
static PathSegment * dev_paths = NULL;
static ShadeableIntersection * dev_intersections = NULL;
static GBufferPixel* dev_gBuffer = NULL;
static glm::vec3 * dev_dnImage = NULL;
// TODO: static variables for device memory, any extra info you need, etc
// ...

@@ -113,6 +137,8 @@ void pathtraceInit(Scene *scene) {

cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel));

cudaMalloc(&dev_dnImage, pixelcount * sizeof(glm::vec3));
cudaMemset(dev_dnImage, 0, pixelcount * sizeof(glm::vec3));
// TODO: initialize any extra device memory you need

checkCUDAError("pathtraceInit");
@@ -125,6 +151,7 @@ void pathtraceFree() {
cudaFree(dev_materials);
cudaFree(dev_intersections);
cudaFree(dev_gBuffer);
cudaFree(dev_dnImage);
// TODO: clean up any extra device memory you created

checkCUDAError("pathtraceFree");
@@ -281,7 +308,10 @@ __global__ void generateGBuffer (
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < num_paths)
{
gBuffer[idx].t = shadeableIntersections[idx].t;
//gBuffer[idx].t = shadeableIntersections[idx].t;
gBuffer[idx].normal = shadeableIntersections[idx].surfaceNormal;
gBuffer[idx].position = getPointOnRay(pathSegments[idx].ray,
shadeableIntersections[idx].t);
}
}

@@ -297,6 +327,97 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati
}
}

__global__ void denoise(int n,
GBufferPixel* gbuff,
glm::vec3* image,
glm::vec3 * dnImage,
int step,
int imageWidth,
float normalWeight,
float posWeight,
float colorWeight)
{
int index = (blockIdx.x * blockDim.x) + threadIdx.x;

if (index < n)
{
glm::vec3 colSum = glm::vec3(0.0f);
float wSum = 0.0f;
// hardcode a 5x5 Gaussian filter
float GaussianFilter[5][5] = { {1, 4, 6, 4, 1},
{4, 16, 24, 16, 4},
{6, 24, 36, 24, 6},
{4, 16, 24, 16, 4},
{1, 4, 6, 4, 1} };

// a way to convert from 2d pixel space to the 1d pixel array we have
int uStepIm = 1;
int vStepIm = imageWidth;

// the relative offset from the center pixel in the image
// e.x. -2, -2 is two pixels left and two pixels up in screenspace
int imStartX = -2;
int imStartY = -2;


// store the gbuffer values for the center pixel of our filter
// i.e. the one we're actually calculating the color for
glm::vec3 centralNorm = gbuff[index].normal;
glm::vec3 centralPos = gbuff[index].position;
glm::vec3 centralCol = image[index];

// the cell count in 2d, starting in the upper left corner of
// our 5x5 filter
for (int y = 0; y < 5; y++) {
for (int x = 0; x < 5; x++) {
int imX = (imStartX + x) * uStepIm * step;
int imY = (imStartY + y) * vStepIm * step;

// i is the index for 1d representations of our 2d
// data, i.e. the beauty pass and the gbuffer
int i = index + imX + imY;
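// note: the image is stored as a flat 1d array, so a tap that runs past the
// left or right edge of a row wraps onto the neighboring row; the bounds
// check below only catches indices past the top or bottom of the image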
if (i < 0 || i >= n) {
// i can be out of bounds along the edges of the image
continue;
}

// get the Gaussian value for this pixel
float gVal = GaussianFilter[y][x];

// get the gbuffer values for this pixel
glm::vec3 nVal = gbuff[i].normal;
glm::vec3 pVal = gbuff[i].position;
glm::vec3 cVal = image[i];

// get the distance of the gbuffer values
// from our central pixel
float nDist = max(glm::length(centralNorm - nVal) / (step * step), 0.0f);
float pDist = glm::length(centralPos - pVal);
float cDist = glm::length(centralCol - cVal);

// get the weights based on these distances
// and our input values
float nw = min(exp(-1.0f * nDist / normalWeight), 1.0f);
float pw = min(exp(-1.0f * pDist / posWeight), 1.0f);
float cw = min(exp(-1.0f * cDist / colorWeight), 1.0f);

// combine into the overall weight for this tap
float w = nw * pw * cw;

colSum += cVal * w * gVal;
wSum += w * gVal;
}
}

dnImage[index] = colSum / wSum;
}
}

/**
* Wrapper for the __global__ call that sets up the kernel calls and does a ton
* of memory management
@@ -399,6 +520,25 @@ void pathtrace(int frame, int iter) {
finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);

///////////////////////////////////////////////////////////////////////////
if (*hst_scene->state.denoiseSettings->denoise){

float nWeight = pow(*hst_scene->state.denoiseSettings->normalWeight, 2);
float pWeight = pow(*hst_scene->state.denoiseSettings->positionWeight, 2);
float cWeight = pow(*hst_scene->state.denoiseSettings->colorWeight, 2);

int steps = *hst_scene->state.denoiseSettings->filterSize / 5;
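// each pass widens the 5x5 kernel by 'step' pixels between taps, so the
// passes together cover roughly a filterSize x filterSize footprint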
for (int step = 1; step <= steps; step++) {
denoise <<<numBlocksPixels, blockSize1d>>>(num_paths,
dev_gBuffer,
dev_image,
dev_dnImage,
step,
cam.resolution.x,
nWeight,
pWeight,
cWeight);
}
}

// CHECKITOUT: use dev_image as reference if you want to implement saving denoised images.
// Otherwise, screenshots are also acceptable.
@@ -421,6 +561,17 @@ void showGBuffer(uchar4* pbo) {
gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer);
}

void showDenoise(uchar4* pbo, int iter) {
const Camera &cam = hst_scene->state.camera;
const dim3 blockSize2d(8, 8);
const dim3 blocksPerGrid2d(
(cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x,
(cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y);

// send the denoised image to the OpenGL PBO for visualization
sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_dnImage);
}

void showImage(uchar4* pbo, int iter) {
const Camera &cam = hst_scene->state.camera;
const dim3 blockSize2d(8, 8);
1 change: 1 addition & 0 deletions src/pathtrace.h
@@ -8,3 +8,4 @@ void pathtraceFree();
void pathtrace(int frame, int iteration);
void showGBuffer(uchar4 *pbo);
void showImage(uchar4 *pbo, int iter);
void showDenoise(uchar4 *pbo, int iter);