diff --git a/README.md b/README.md index f044c82..a94f1ae 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,76 @@ CUDA Denoiser For CUDA Path Tracer **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Lindsay Smith +* [LinkedIn](https://www.linkedin.com/in/lindsay-j-smith/), [personal website](https://lindsays-portfolio-d6aa5d.webflow.io/). +* Tested on: Windows 10, i7-11800H 144Hz 16GB RAM, GeForce RTX 3060 512GB SSD (Personal Laptop) -### (TODO: Your README) +This project builds upon the [Pathtracer](https://github.com/lsmith24/Project3-CUDA-Path-Tracer) I previously implemented by adding a denoiser to provide clearer images. The algorithm used +to do this denoising is detailed in ["Edge-Avoiding À-Trous Wavelet Transform for fast Global Illumination Filtering"](https://jo.dreggn.org/home/2010_atrous.pdf). +The idea is to blur the image while preserving the edges. This will give the appearance of a denoised image because large noisy areas will get smoothed over, but the edges +and shapes will remain intact. We can use the position, normals, and color of the scene elements to apply weights to the blur that will determine how much denoising should be applied. Although a true Gaussian blur is effective, it requires a large amount of computation. By implementing the À-Trous Wavelet transform we are able to spread out +the coordinates of the Gaussian kernel. This results in a larger blur and far fewer iterations than a true Gaussian. -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +Here we can see the difference between the images where there is no denoising, a blur with no edge detection, and the blur with edge detection. All of these images +went through 10 iterations of the pathtracer. 
+| raw pathtraced image | simple blur | blur guided by G-buffers | +|---|---|---| +|![](img/noDenoise10samples.png)|![](img/myBlur.png)|![](img/myDenoised.png)| + +We can also see here the visualization of the position and normals of the objects. This gives us some more insight into how the per-pixel normals and positions are +able to help us with the weights for edge detection. + +| per-pixel normals | per-pixel position | +|---|---| +|![](img/myNormals.png)|![](img/myPosition.png)| + +# Performance Analysis + +By timing both the regular pathtracer and the denoiser, we can see how varying different aspects of each affects runtime. The first thing I looked at was filter size. Increasing the filter size increases the time the denoiser takes to run, which makes sense because the denoiser must consider a larger area of the image each time. The graph follows a logarithmic curve because the number of À-Trous iterations grows with the log2 of the filter size. + +![](img/filterGraph.png) + +The next thing I looked at was how the resolution impacts performance. As the resolution increases, the runtime also increases. This makes sense because a higher-resolution image has more pixels for the pathtracer and denoiser to process on each iteration. + +![](img/resolutionGraph.png) + +I also found that on average the denoiser adds about 4 ms to the time it takes to produce the image. Interestingly, though, we can produce nice images much faster overall with the denoiser, because the number of pathtracer iterations can be brought down substantially. This is hard to quantify precisely, because the number of iterations needed to produce a good image depends heavily on the complexity of the scene. + +## Qualitative Analysis + +Another interesting thing to note is that the appearance of our image does not scale uniformly with filter size.
The difference in the images when using a large filter size +becomes essentially non-existent. We can see from these images that the difference between filter sizes of 20 and 50 is much more substantial than the difference between filter sizes of 100 and 120. + +| filter size = 20 | filter size = 50 | filter size = 100 | filter size = 120 | +|---|---|---|---| +|![](img/filter20.png)|![](img/filter50.png)|![](img/filter100.png)|![](img/filter120.png)| + +If we continue to increase the filter size even further, we see no difference at all. + +| filter size = 200 | filter size = 300 | +|---|---| +|![](img/filter200.png)|![](img/filter300.png)| + +The rendered images also show that certain materials are much more compatible with the denoiser than others. For example, a diffuse surface works +very well with the denoiser because the surface is essentially one color: blurring it is effective at making it look better, and very little information is lost. This is not the case, however, with refractive and specular materials. These materials generally create distinct reflections of light or of the surrounding scene, and when those reflections get blurred out by the denoiser, the objects no longer truly look like the material they are supposed to be. This is especially noticeable in the refractive objects, where the contrast between the object and the reflections is not as great and the entire interior of the object gets blurred. It no longer looks like glass because the visual cues that make it read as refractive are lost. + +It is also interesting to note that a more complicated scene takes more than 10 pathtracer iterations to produce a good denoised image. To obtain this denoised result I needed 100 iterations of the pathtracer first. Of course, this is still significantly fewer iterations than the 5,000 we used to produce images with just the Pathtracer.
We can see the difference that more iterations of the pathtracer make in these images. Both use a filter size of 200. + +| iterations = 10 | iterations = 100 | +|---|---| +|![](img/iter10.png)|![](img/iter100.png)| + +We can also see how the amount of light in the scene impacts the effectiveness of the denoiser. A darker scene makes it much more difficult for the denoiser to produce a good image. The same filter size, iterations, and weights were used to produce both of these images, but the lack of light in the second one makes it appear much noisier. + +| Good Lighting | Poor Lighting | +|---|---| +|![](img/goodLighting.png)|![](img/badLighting.png)| + +## Bloopers + +Incorrect clamp value for position and normals + +| Position | Normals | +|---|---| +|![](img/positionBlooper.png)|![](img/NormalsBlooper.png)| diff --git a/img/NormalsBlooper.png b/img/NormalsBlooper.png new file mode 100644 index 0000000..298df3a Binary files /dev/null and b/img/NormalsBlooper.png differ diff --git a/img/anti_aliasing_cover.png b/img/anti_aliasing_cover.png new file mode 100644 index 0000000..2d9b733 Binary files /dev/null and b/img/anti_aliasing_cover.png differ diff --git a/img/badLighting.png b/img/badLighting.png new file mode 100644 index 0000000..afa66bf Binary files /dev/null and b/img/badLighting.png differ diff --git a/img/filter100.png b/img/filter100.png new file mode 100644 index 0000000..8176d2d Binary files /dev/null and b/img/filter100.png differ diff --git a/img/filter120.png b/img/filter120.png new file mode 100644 index 0000000..99c7893 Binary files /dev/null and b/img/filter120.png differ diff --git a/img/filter20.png b/img/filter20.png new file mode 100644 index 0000000..ec8a15e Binary files /dev/null and b/img/filter20.png differ diff --git a/img/filter200.png b/img/filter200.png new file mode 100644 index 0000000..db2f68c Binary files /dev/null and b/img/filter200.png differ diff --git a/img/filter300.png b/img/filter300.png new file mode 100644 index 
0000000..dd6a8c4 Binary files /dev/null and b/img/filter300.png differ diff --git a/img/filter50.png b/img/filter50.png new file mode 100644 index 0000000..ae2d42e Binary files /dev/null and b/img/filter50.png differ diff --git a/img/filterGraph.png b/img/filterGraph.png new file mode 100644 index 0000000..0600532 Binary files /dev/null and b/img/filterGraph.png differ diff --git a/img/goodLighting.png b/img/goodLighting.png new file mode 100644 index 0000000..024d22a Binary files /dev/null and b/img/goodLighting.png differ diff --git a/img/highPosWeight.png b/img/highPosWeight.png new file mode 100644 index 0000000..af6a549 Binary files /dev/null and b/img/highPosWeight.png differ diff --git a/img/iter10.png b/img/iter10.png new file mode 100644 index 0000000..8f08089 Binary files /dev/null and b/img/iter10.png differ diff --git a/img/iter100.png b/img/iter100.png new file mode 100644 index 0000000..f0e09bd Binary files /dev/null and b/img/iter100.png differ diff --git a/img/iter30.png b/img/iter30.png new file mode 100644 index 0000000..2e8f094 Binary files /dev/null and b/img/iter30.png differ diff --git a/img/myBlur.png b/img/myBlur.png new file mode 100644 index 0000000..a6a3b3f Binary files /dev/null and b/img/myBlur.png differ diff --git a/img/myDenoised.png b/img/myDenoised.png new file mode 100644 index 0000000..c00f3e0 Binary files /dev/null and b/img/myDenoised.png differ diff --git a/img/myNormals.png b/img/myNormals.png new file mode 100644 index 0000000..d6a77fe Binary files /dev/null and b/img/myNormals.png differ diff --git a/img/myPosition.png b/img/myPosition.png new file mode 100644 index 0000000..aac6a33 Binary files /dev/null and b/img/myPosition.png differ diff --git a/img/noDenoise10samples.png b/img/noDenoise10samples.png new file mode 100644 index 0000000..4e867dc Binary files /dev/null and b/img/noDenoise10samples.png differ diff --git a/img/normals.png b/img/normalsGiven.png similarity index 100% rename from img/normals.png rename to 
img/normalsGiven.png diff --git a/img/positionBlooper.png b/img/positionBlooper.png new file mode 100644 index 0000000..34e4d11 Binary files /dev/null and b/img/positionBlooper.png differ diff --git a/img/resolutionGraph.png b/img/resolutionGraph.png new file mode 100644 index 0000000..6187f16 Binary files /dev/null and b/img/resolutionGraph.png differ diff --git a/scenes/cornell.txt b/scenes/cornell.txt index 83ff820..3518f12 100644 --- a/scenes/cornell.txt +++ b/scenes/cornell.txt @@ -52,7 +52,7 @@ EMITTANCE 0 CAMERA RES 800 800 FOVY 45 -ITERATIONS 5000 +ITERATIONS 30 DEPTH 8 FILE cornell EYE 0.0 5 10.5 diff --git a/src/interactions.h b/src/interactions.h index 144a9f5..d65fddd 100644 --- a/src/interactions.h +++ b/src/interactions.h @@ -1,14 +1,17 @@ #pragma once #include "intersections.h" +#include "glm/glm.hpp" +#include "glm/gtx/norm.hpp" +// CHECKITOUT /** * Computes a cosine-weighted random direction in a hemisphere. * Used for diffuse lighting. */ __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere( - glm::vec3 normal, thrust::default_random_engine &rng) { + glm::vec3 normal, thrust::default_random_engine& rng) { thrust::uniform_real_distribution u01(0, 1); float up = sqrt(u01(rng)); // cos(theta) @@ -23,9 +26,11 @@ glm::vec3 calculateRandomDirectionInHemisphere( glm::vec3 directionNotNormal; if (abs(normal.x) < SQRT_OF_ONE_THIRD) { directionNotNormal = glm::vec3(1, 0, 0); - } else if (abs(normal.y) < SQRT_OF_ONE_THIRD) { + } + else if (abs(normal.y) < SQRT_OF_ONE_THIRD) { directionNotNormal = glm::vec3(0, 1, 0); - } else { + } + else { directionNotNormal = glm::vec3(0, 0, 1); } @@ -41,22 +46,119 @@ glm::vec3 calculateRandomDirectionInHemisphere( } /** - * Simple ray scattering with diffuse and perfect specular support. + * Scatter a ray with some probabilities according to the material properties. + * For example, a diffuse surface scatters in a cosine-weighted hemisphere. 
+ * A perfect specular surface scatters in the reflected ray direction. + * In order to apply multiple effects to one surface, probabilistically choose + * between them. + * + * The visual effect you want is to straight-up add the diffuse and specular + * components. You can do this in a few ways. This logic also applies to + * combining other types of materials (such as refractive). + * + * - Always take an even (50/50) split between each effect (a diffuse bounce + * and a specular bounce), but divide the resulting color of either branch + * by its probability (0.5), to counteract the chance (0.5) of the branch + * being taken. + * - This way is inefficient, but serves as a good starting point - it + * converges slowly, especially for pure-diffuse or pure-specular. + * - Pick the split based on the intensity of each material color, and divide + * branch result by that branch's probability (whatever probability you use). + * + * This method applies its changes to the Ray parameter `ray` in place. + * It also modifies the color `color` of the ray in place. + * + * You may need to change the parameter list for your purposes!
*/ + +__host__ __device__ +float schlickEquation(float ior, float n, float cosTheta) { + float r0 = (n - ior) / (n + ior); + r0 = r0 * r0; + return r0 + (1.f - r0) * glm::pow(1.f - cosTheta, 5.f); +} + +__host__ __device__ +void refractScatter(PathSegment& path, const Material& m, glm::vec3 intersect, glm::vec3 normal, thrust::default_random_engine& rng) { + thrust::uniform_real_distribution<float> u01(0, 1); + float num = u01(rng); + float n = 1.f; + float probability; + glm::vec3 normal2 = normal; + float ior = m.indexOfRefraction; + + float cosTheta = glm::clamp(glm::dot(path.ray.direction, normal), -1.f, 1.f); + + if (cosTheta >= 0.f) { + // ray exits the object: flip the normal and swap the indices of refraction + normal2 = -normal; + n = ior; + ior = 1.f; + } + else { + cosTheta = glm::abs(cosTheta); + } + + glm::vec3 reflect = glm::normalize(glm::reflect(path.ray.direction, normal2)); + float x = n / ior; + float sinTheta = glm::sqrt(glm::max(0.f, 1.f - cosTheta * cosTheta)); + + if (x * sinTheta < 1.f) { + // Schlick's approximation gives the reflection probability + probability = schlickEquation(ior, n, cosTheta); + + if (num < probability) { + path.ray.direction = reflect; + } + else { + path.ray.direction = glm::refract(path.ray.direction, normal2, x); + } + } + else { + // total internal reflection + path.ray.direction = reflect; + } + + path.ray.origin = intersect + (path.ray.direction * 0.01f); + path.color *= m.specular.color; +} + __host__ __device__ void scatterRay( - PathSegment & pathSegment, - glm::vec3 intersect, - glm::vec3 normal, - const Material &m, - thrust::default_random_engine &rng) { - glm::vec3 newDirection; - if (m.hasReflective) { - newDirection = glm::reflect(pathSegment.ray.direction, normal); - } else { - newDirection = 
calculateRandomDirectionInHemisphere(normal, rng); - } - - pathSegment.ray.direction = newDirection; - pathSegment.ray.origin = intersect + (newDirection * 0.0001f); + PathSegment& pathSegment, + glm::vec3 intersect, + glm::vec3 normal, + const Material& m, + thrust::default_random_engine& rng) { + + glm::vec3 dir_diffuse = calculateRandomDirectionInHemisphere(normal, rng); + glm::vec3 dir_specular = glm::normalize(glm::reflect(pathSegment.ray.direction, normal)); //not sure if have to normalize here + + //specular + if (m.hasReflective > 0) { + pathSegment.ray.direction = dir_specular; + pathSegment.ray.origin = intersect + 0.0001f * normal; + pathSegment.color *= m.specular.color; + } + else if (m.hasRefractive > 0) { + //refractive (glass, water, etc) + refractScatter(pathSegment, m, intersect, normal, rng); + } + else { + //diffuse + pathSegment.ray.direction = dir_diffuse; + pathSegment.ray.origin = intersect + 0.0001f * normal; + pathSegment.color *= m.color; + } } diff --git a/src/intersections.h b/src/intersections.h index c3e81f4..58a14db 100644 --- a/src/intersections.h +++ b/src/intersections.h @@ -6,6 +6,8 @@ #include "sceneStructs.h" #include "utilities.h" +#define BOUNDING_BOX 0 + /** * Handy-dandy hash function that provides seeds for random number generation. */ @@ -19,6 +21,7 @@ __host__ __device__ inline unsigned int utilhash(unsigned int a) { return a; } +// CHECKITOUT /** * Compute a point at parameter value `t` on ray `r`. * Falls slightly short so that it doesn't intersect the object it's hitting. @@ -34,6 +37,7 @@ __host__ __device__ glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) { return glm::vec3(m * v); } +// CHECKITOUT /** * Test intersection between a ray and a transformed cube. Untransformed, * the cube ranges from -0.5 to 0.5 in each axis and is centered at the origin. @@ -44,9 +48,9 @@ __host__ __device__ glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) { * @return Ray parameter `t` value. -1 if no intersection. 
*/ __host__ __device__ float boxIntersectionTest(Geom box, Ray r, - glm::vec3 &intersectionPoint, glm::vec3 &normal, bool &outside) { + glm::vec3& intersectionPoint, glm::vec3& normal, bool& outside) { Ray q; - q.origin = multiplyMV(box.inverseTransform, glm::vec4(r.origin , 1.0f)); + q.origin = multiplyMV(box.inverseTransform, glm::vec4(r.origin, 1.0f)); q.direction = glm::normalize(multiplyMV(box.inverseTransform, glm::vec4(r.direction, 0.0f))); float tmin = -1e38f; @@ -87,6 +91,7 @@ __host__ __device__ float boxIntersectionTest(Geom box, Ray r, return -1; } +// CHECKITOUT /** * Test intersection between a ray and a transformed sphere. Untransformed, * the sphere always has radius 0.5 and is centered at the origin. @@ -97,7 +102,7 @@ __host__ __device__ float boxIntersectionTest(Geom box, Ray r, * @return Ray parameter `t` value. -1 if no intersection. */ __host__ __device__ float sphereIntersectionTest(Geom sphere, Ray r, - glm::vec3 &intersectionPoint, glm::vec3 &normal, bool &outside) { + glm::vec3& intersectionPoint, glm::vec3& normal, bool& outside) { float radius = .5; glm::vec3 ro = multiplyMV(sphere.inverseTransform, glm::vec4(r.origin, 1.0f)); @@ -121,10 +126,12 @@ __host__ __device__ float sphereIntersectionTest(Geom sphere, Ray r, float t = 0; if (t1 < 0 && t2 < 0) { return -1; - } else if (t1 > 0 && t2 > 0) { + } + else if (t1 > 0 && t2 > 0) { t = min(t1, t2); outside = true; - } else { + } + else { t = max(t1, t2); outside = false; } @@ -139,3 +146,18 @@ __host__ __device__ float sphereIntersectionTest(Geom sphere, Ray r, return glm::length(r.origin - intersectionPoint); } + +__host__ __device__ float triangleIntersectionTest(Geom triangle, Ray r, glm::vec3& intersectionPoint, glm::vec3& normal, bool& outside) { + glm::vec3 pt1 = glm::vec3(triangle.transform * glm::vec4(triangle.triangle.pt1.pos, 1.0f)); + glm::vec3 pt2 = glm::vec3(triangle.transform * glm::vec4(triangle.triangle.pt2.pos, 1.0f)); + glm::vec3 pt3 = glm::vec3(triangle.transform * 
glm::vec4(triangle.triangle.pt3.pos, 1.0f)); + + glm::vec3 inter; + bool intersects = glm::intersectRayTriangle(r.origin, r.direction, pt1, pt2, pt3, inter); + if (!intersects) return -1.f; + + float z = 1.0f - inter.x - inter.y; + intersectionPoint = inter.x * pt1 + inter.y * pt2 + z * pt3; + normal = glm::normalize(glm::cross(pt2 - pt1, pt3 - pt1)); + return inter.z; +} diff --git a/src/main.cpp b/src/main.cpp index 4092ae4..4e6b572 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -1,11 +1,15 @@ #include "main.h" #include "preview.h" #include +#include #include "../imgui/imgui.h" #include "../imgui/imgui_impl_glfw.h" #include "../imgui/imgui_impl_opengl3.h" +#define DENOISE 1 +#define TIMER 0 + static std::string startTimeString; // For camera controls @@ -23,8 +27,8 @@ int ui_iterations = 0; int startupIterations = 0; int lastLoopIterations = 0; bool ui_showGbuffer = false; -bool ui_denoise = false; -int ui_filterSize = 80; +bool ui_denoise = true; +int ui_filterSize = 200; float ui_colorWeight = 0.45f; float ui_normalWeight = 0.35f; float ui_positionWeight = 0.2f; @@ -38,13 +42,17 @@ float zoom, theta, phi; glm::vec3 cameraPosition; glm::vec3 ogLookAt; // for recentering the camera -Scene *scene; -RenderState *renderState; +Scene* scene; +RenderState* renderState; int iteration; int width; int height; +#if TIMER +float averageTime = 0; +#endif + //------------------------------- //-------------MAIN-------------- //------------------------------- @@ -57,7 +65,7 @@ int main(int argc, char** argv) { return 1; } - const char *sceneFile = argv[1]; + const char* sceneFile = argv[1]; // Load scene file scene = new Scene(sceneFile); @@ -65,7 +73,7 @@ int main(int argc, char** argv) { // Set up camera stuff from loaded path tracer settings iteration = 0; renderState = &scene->state; - Camera &cam = renderState->camera; + Camera& cam = renderState->camera; width = cam.resolution.x; height = cam.resolution.y; @@ -122,13 +130,13 @@ void saveImage() { void runCuda() { if 
(lastLoopIterations != ui_iterations) { - lastLoopIterations = ui_iterations; - camchanged = true; + lastLoopIterations = ui_iterations; + camchanged = true; } if (camchanged) { iteration = 0; - Camera &cam = renderState->camera; + Camera& cam = renderState->camera; cameraPosition.x = zoom * sin(phi) * sin(theta); cameraPosition.y = zoom * cos(theta); cameraPosition.z = zoom * cos(phi) * sin(theta); @@ -144,7 +152,7 @@ void runCuda() { cameraPosition += cam.lookAt; cam.position = cameraPosition; camchanged = false; - } + } // Map OpenGL buffer object for writing from CUDA on a single GPU // No data is moved (Win & Linux). When mapped to CUDA, OpenGL should not use this buffer @@ -154,7 +162,7 @@ void runCuda() { pathtraceInit(scene); } - uchar4 *pbo_dptr = NULL; + uchar4* pbo_dptr = NULL; cudaGLMapBufferObject((void**)&pbo_dptr, pbo); if (iteration < ui_iterations) { @@ -162,13 +170,41 @@ void runCuda() { // execute the kernel int frame = 0; + +#if TIMER + std::chrono::high_resolution_clock::time_point timer_start = std::chrono::high_resolution_clock::now(); +#endif pathtrace(frame, iteration); +#if TIMER + std::chrono::high_resolution_clock::time_point timer_end = std::chrono::high_resolution_clock::now(); + std::chrono::duration period = timer_end - timer_start; + float prev_cpu_time = static_cast(period.count()); + averageTime = (averageTime * (iteration - 1) + prev_cpu_time) / (iteration); + cout << "Iterations:" << iteration << ", Time: " << prev_cpu_time << ", Average Time" << averageTime << endl; +#endif } if (ui_showGbuffer) { - showGBuffer(pbo_dptr); - } else { - showImage(pbo_dptr, iteration); + showGBuffer(pbo_dptr); + } + else if (iteration == ui_iterations) { + //showDenoised(pbo_dptr, iteration, ui_colorWeight, ui_normalWeight, ui_positionWeight, ui_filterSize); + if (ui_denoise) { +#if TIMER + std::chrono::high_resolution_clock::time_point timer_start = std::chrono::high_resolution_clock::now(); +#endif + showDenoised(pbo_dptr, iteration, 
ui_colorWeight, ui_normalWeight, ui_positionWeight, ui_filterSize); +#if TIMER + std::chrono::high_resolution_clock::time_point timer_end = std::chrono::high_resolution_clock::now(); + std::chrono::duration period = timer_end - timer_start; + float prev_cpu_time = static_cast(period.count()); + cout << "Denoising time:" << prev_cpu_time << endl; +#endif + } + + } + else { + showImage(pbo_dptr, iteration); } // unmap buffer object @@ -184,59 +220,61 @@ void runCuda() { void keyCallback(GLFWwindow* window, int key, int scancode, int action, int mods) { if (action == GLFW_PRESS) { - switch (key) { - case GLFW_KEY_ESCAPE: - saveImage(); - glfwSetWindowShouldClose(window, GL_TRUE); - break; - case GLFW_KEY_S: - saveImage(); - break; - case GLFW_KEY_SPACE: - camchanged = true; - renderState = &scene->state; - Camera &cam = renderState->camera; - cam.lookAt = ogLookAt; - break; - } + switch (key) { + case GLFW_KEY_ESCAPE: + saveImage(); + glfwSetWindowShouldClose(window, GL_TRUE); + break; + case GLFW_KEY_S: + saveImage(); + break; + case GLFW_KEY_SPACE: + camchanged = true; + renderState = &scene->state; + Camera& cam = renderState->camera; + cam.lookAt = ogLookAt; + break; + } } } void mouseButtonCallback(GLFWwindow* window, int button, int action, int mods) { - if (ImGui::GetIO().WantCaptureMouse) return; - leftMousePressed = (button == GLFW_MOUSE_BUTTON_LEFT && action == GLFW_PRESS); - rightMousePressed = (button == GLFW_MOUSE_BUTTON_RIGHT && action == GLFW_PRESS); - middleMousePressed = (button == GLFW_MOUSE_BUTTON_MIDDLE && action == GLFW_PRESS); + if (ImGui::GetIO().WantCaptureMouse) return; + leftMousePressed = (button == GLFW_MOUSE_BUTTON_LEFT && action == GLFW_PRESS); + rightMousePressed = (button == GLFW_MOUSE_BUTTON_RIGHT && action == GLFW_PRESS); + middleMousePressed = (button == GLFW_MOUSE_BUTTON_MIDDLE && action == GLFW_PRESS); } void mousePositionCallback(GLFWwindow* window, double xpos, double ypos) { - if (xpos == lastX || ypos == lastY) return; // 
otherwise, clicking back into window causes re-start - if (leftMousePressed) { - // compute new camera parameters - phi -= (xpos - lastX) / width; - theta -= (ypos - lastY) / height; - theta = std::fmax(0.001f, std::fmin(theta, PI)); - camchanged = true; - } - else if (rightMousePressed) { - zoom += (ypos - lastY) / height; - zoom = std::fmax(0.1f, zoom); - camchanged = true; - } - else if (middleMousePressed) { - renderState = &scene->state; - Camera &cam = renderState->camera; - glm::vec3 forward = cam.view; - forward.y = 0.0f; - forward = glm::normalize(forward); - glm::vec3 right = cam.right; - right.y = 0.0f; - right = glm::normalize(right); - - cam.lookAt -= (float) (xpos - lastX) * right * 0.01f; - cam.lookAt += (float) (ypos - lastY) * forward * 0.01f; - camchanged = true; - } - lastX = xpos; - lastY = ypos; + if (xpos == lastX || ypos == lastY) return; // otherwise, clicking back into window causes re-start + if (leftMousePressed) { + // compute new camera parameters + phi -= (xpos - lastX) / width; + theta -= (ypos - lastY) / height; + theta = std::fmax(0.001f, std::fmin(theta, PI)); + camchanged = true; + } + else if (rightMousePressed) { + zoom += (ypos - lastY) / height; + zoom = std::fmax(0.1f, zoom); + camchanged = true; + } + else if (middleMousePressed) { + renderState = &scene->state; + Camera& cam = renderState->camera; + glm::vec3 forward = cam.view; + forward.y = 0.0f; + forward = glm::normalize(forward); + glm::vec3 right = cam.right; + right.y = 0.0f; + right = glm::normalize(right); + + cam.lookAt -= (float)(xpos - lastX) * right * 0.01f; + cam.lookAt += (float)(ypos - lastY) * forward * 0.01f; + camchanged = true; + } + lastX = xpos; + lastY = ypos; } + + diff --git a/src/pathtrace.cu b/src/pathtrace.cu index 23e5f90..dcb5de8 100644 --- a/src/pathtrace.cu +++ b/src/pathtrace.cu @@ -4,6 +4,10 @@ #include #include #include +#include +#include +#include +#include #include "sceneStructs.h" #include "scene.h" @@ -15,10 +19,25 @@ #include 
"interactions.h" #define ERRORCHECK 1 +#define ANTI_ALIASING 0 +#define CACHE_BOUNCE 0 +#define SORT_MATERIALS 0 +#define DEPTH_OF_FIELD 0 +#define DIRECT_LIGHTING 0 + +#define LENS_RADIUS 0.07 +#define FOCAL_DISTANCE 5 + +#define GBUF_MODE 0 //change this to be 0 for pos or 1 for nor +#define POS 0 +#define NOR 1 + +#define TIMER 0 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__) #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__) -void checkCUDAErrorFn(const char *msg, const char *file, int line) { + +void checkCUDAErrorFn(const char* msg, const char* file, int line) { #if ERRORCHECK cudaDeviceSynchronize(); cudaError_t err = cudaGetLastError(); @@ -45,8 +64,7 @@ thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int de } //Kernel that writes the image to the OpenGL PBO directly. -__global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution, - int iter, glm::vec3* image) { +__global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution, int iter, glm::vec3* image) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; @@ -55,9 +73,9 @@ __global__ void sendImageToPBO(uchar4* pbo, glm::ivec2 resolution, glm::vec3 pix = image[index]; glm::ivec3 color; - color.x = glm::clamp((int) (pix.x / iter * 255.0), 0, 255); - color.y = glm::clamp((int) (pix.y / iter * 255.0), 0, 255); - color.z = glm::clamp((int) (pix.z / iter * 255.0), 0, 255); + color.x = glm::clamp((int)(pix.x / iter * 255.0), 0, 255); + color.y = glm::clamp((int)(pix.y / iter * 255.0), 0, 255); + color.z = glm::clamp((int)(pix.z / iter * 255.0), 0, 255); // Each thread writes one pixel location in the texture (textel) pbo[index].w = 0; @@ -75,61 +93,159 @@ __global__ void gbufferToPBO(uchar4* pbo, glm::ivec2 resolution, GBufferPixel* g int index = x + (y * resolution.x); float timeToIntersect = gBuffer[index].t * 256.0; + glm::vec3 pos = glm::clamp(glm::abs(gBuffer[index].pos * 
20.0f), 0.0f, 255.0f); + glm::vec3 nor = glm::clamp(glm::abs(gBuffer[index].nor * 255.0f), 0.0f, 255.0f); + pbo[index].w = 0; - pbo[index].x = timeToIntersect; - pbo[index].y = timeToIntersect; - pbo[index].z = timeToIntersect; + +#if GBUF_MODE == POS + pbo[index].x = pos.x; + pbo[index].y = pos.y; + pbo[index].z = pos.z; +#elif GBUF_MODE == NOR + pbo[index].x = nor.x; + pbo[index].y = nor.y; + pbo[index].z = nor.z; +#endif + } } -static Scene * hst_scene = NULL; -static glm::vec3 * dev_image = NULL; -static Geom * dev_geoms = NULL; -static Material * dev_materials = NULL; -static PathSegment * dev_paths = NULL; -static ShadeableIntersection * dev_intersections = NULL; +static Scene* hst_scene = NULL; +static glm::vec3* dev_image = NULL; +static Geom* dev_geoms = NULL; +static Material* dev_materials = NULL; +static PathSegment* dev_paths = NULL; +static ShadeableIntersection* dev_intersections = NULL; static GBufferPixel* dev_gBuffer = NULL; +static glm::vec3* dev_denoise1 = NULL; +static glm::vec3* dev_denoise2 = NULL; + +#if TIMER +cudaEvent_t start, stop; +float totalTime = 0.0; +bool countStart = true; +#endif + // TODO: static variables for device memory, any extra info you need, etc // ... 
+static ShadeableIntersection* dev_first_bounce = NULL; +#if DIRECT_LIGHTING +static Geom* dev_lights = NULL; +#endif -void pathtraceInit(Scene *scene) { +void pathtraceInit(Scene* scene) { hst_scene = scene; - const Camera &cam = hst_scene->state.camera; + const Camera& cam = hst_scene->state.camera; const int pixelcount = cam.resolution.x * cam.resolution.y; cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3)); cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3)); - cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment)); + cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment)); - cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom)); - cudaMemcpy(dev_geoms, scene->geoms.data(), scene->geoms.size() * sizeof(Geom), cudaMemcpyHostToDevice); + cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom)); + cudaMemcpy(dev_geoms, scene->geoms.data(), scene->geoms.size() * sizeof(Geom), cudaMemcpyHostToDevice); - cudaMalloc(&dev_materials, scene->materials.size() * sizeof(Material)); - cudaMemcpy(dev_materials, scene->materials.data(), scene->materials.size() * sizeof(Material), cudaMemcpyHostToDevice); + cudaMalloc(&dev_materials, scene->materials.size() * sizeof(Material)); + cudaMemcpy(dev_materials, scene->materials.data(), scene->materials.size() * sizeof(Material), cudaMemcpyHostToDevice); - cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection)); - cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); + cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection)); + cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); cudaMalloc(&dev_gBuffer, pixelcount * sizeof(GBufferPixel)); + cudaMalloc(&dev_denoise1, pixelcount * sizeof(glm::vec3)); + cudaMemset(dev_denoise1, 0, pixelcount * sizeof(glm::vec3)); + + cudaMalloc(&dev_denoise2, pixelcount * sizeof(glm::vec3)); + cudaMemset(dev_denoise2, 0, pixelcount * sizeof(glm::vec3)); + // TODO: initialize any extra device memeory 
you need +#if CACHE_BOUNCE || SORT_MATERIALS + cudaMalloc(&dev_first_bounce, pixelcount * sizeof(ShadeableIntersection)); + cudaMemset(dev_first_bounce, 0, pixelcount * sizeof(ShadeableIntersection)); +#endif + +#if DIRECT_LIGHTING + cudaMalloc(&dev_lights, scene->lights.size() * sizeof(Geom)); + cudaMemcpy(dev_lights, scene->lights.data(), scene->lights.size() * sizeof(Geom), cudaMemcpyHostToDevice); +#endif + +#if TIMER + cudaEventCreate(&start); + cudaEventCreate(&stop); +#endif checkCUDAError("pathtraceInit"); } void pathtraceFree() { cudaFree(dev_image); // no-op if dev_image is null - cudaFree(dev_paths); - cudaFree(dev_geoms); - cudaFree(dev_materials); - cudaFree(dev_intersections); + cudaFree(dev_paths); + cudaFree(dev_geoms); + cudaFree(dev_materials); + cudaFree(dev_intersections); cudaFree(dev_gBuffer); + cudaFree(dev_denoise1); + cudaFree(dev_denoise2); + // TODO: clean up any extra device memory you created +#if CACHE_BOUNCE || SORT_MATERIALS + cudaFree(dev_first_bounce); +#endif +#if DIRECT_LIGHTING + cudaFree(dev_lights); +#endif checkCUDAError("pathtraceFree"); } +__host__ __device__ +glm::vec3 pointOnPlane(Geom light, thrust::default_random_engine& rng) { + thrust::uniform_real_distribution<float> u01(0, 1); + glm::vec2 pt(u01(rng), u01(rng)); + glm::vec3 planePt = glm::vec3((pt - glm::vec2(0.5f)), 0.f); + return glm::vec3(light.transform * glm::vec4(planePt, 1.f)); +} + + +__host__ __device__ +glm::vec3 convertDisk(const glm::vec2& v) { + float x = v.x; + float y = v.y; + float phi, r; + float a = 2 * x - 1.f; + float b = 2 * y - 1.f; + + if (a > -b) { + if (a > b) { + r = a; + phi = (PI / 4) * (b / a); + } + else { + r = b; + phi = (PI / 4) * (2 - (a / b)); + } + } + else { + if (a < b) { + r = -a; + phi = (PI / 4) * (4 + (b / a)); + } + else { + r = -b; + if (b != 0.f) { + phi = (PI / 4) * (6 - (a / b)); + } + else { + phi = 0; + } + } + } + return glm::vec3(cosf(phi) * r, sinf(phi) * r, 0); +} + /** * Generate PathSegments with rays from the
camera through the screen into the * scene, which is the first bounce of rays. @@ -140,190 +256,397 @@ void pathtraceFree() { */ __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, PathSegment* pathSegments) { - int x = (blockIdx.x * blockDim.x) + threadIdx.x; - int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < cam.resolution.x && y < cam.resolution.y) { + int index = x + (y * cam.resolution.x); + PathSegment& segment = pathSegments[index]; + + segment.ray.origin = cam.position; + segment.color = glm::vec3(1.0f, 1.0f, 1.0f); - if (x < cam.resolution.x && y < cam.resolution.y) { - int index = x + (y * cam.resolution.x); - PathSegment & segment = pathSegments[index]; + //antialiasing by jittering the ray + float x_aa = x; + float y_aa = y; + thrust::default_random_engine random = makeSeededRandomEngine(iter, index, traceDepth); - segment.ray.origin = cam.position; - segment.color = glm::vec3(1.0f, 1.0f, 1.0f); - segment.ray.direction = glm::normalize(cam.view - - cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f) - - cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f) - ); +#if ANTI_ALIASING + thrust::uniform_real_distribution<float> u01(-0.55f, 0.55f); + x_aa += u01(random); + y_aa += u01(random); +#endif + segment.ray.direction = glm::normalize(cam.view + - cam.right * cam.pixelLength.x * ((float)x_aa - (float)cam.resolution.x * 0.5f) + - cam.up * cam.pixelLength.y * ((float)y_aa - (float)cam.resolution.y * 0.5f) + ); + + //for depth of field +#if DEPTH_OF_FIELD + thrust::uniform_real_distribution<float> u02(0, 1); + glm::vec3 sample = convertDisk(glm::vec2(u02(random), u02(random))); + glm::vec3 lens = (float)LENS_RADIUS * sample; + glm::vec3 pt = segment.ray.origin + lens; + glm::vec3 fp = segment.ray.origin + (float)FOCAL_DISTANCE * segment.ray.direction; + + segment.ray.origin = pt; +
segment.ray.direction = glm::normalize(fp - pt); - segment.pixelIndex = index; - segment.remainingBounces = traceDepth; - } +#endif + segment.pixelIndex = index; + segment.remainingBounces = traceDepth; + } } +// TODO: +// computeIntersections handles generating ray intersections ONLY. +// Generating new rays is handled in your shader(s). +// Feel free to modify the code below. __global__ void computeIntersections( - int depth - , int num_paths - , PathSegment * pathSegments - , Geom * geoms - , int geoms_size - , ShadeableIntersection * intersections - ) + int depth + , int num_paths + , PathSegment* pathSegments + , Geom* geoms + , int geoms_size + , ShadeableIntersection* intersections +) { - int path_index = blockIdx.x * blockDim.x + threadIdx.x; - - if (path_index < num_paths) - { - PathSegment pathSegment = pathSegments[path_index]; - - float t; - glm::vec3 intersect_point; - glm::vec3 normal; - float t_min = FLT_MAX; - int hit_geom_index = -1; - bool outside = true; - - glm::vec3 tmp_intersect; - glm::vec3 tmp_normal; - - // naive parse through global geoms - - for (int i = 0; i < geoms_size; i++) - { - Geom & geom = geoms[i]; - - if (geom.type == CUBE) - { - t = boxIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); - } - else if (geom.type == SPHERE) - { - t = sphereIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); - } - - // Compute the minimum t from the intersection tests to determine what - // scene geometry object was hit first. 
- if (t > 0.0f && t_min > t) - { - t_min = t; - hit_geom_index = i; - intersect_point = tmp_intersect; - normal = tmp_normal; - } - } - - if (hit_geom_index == -1) - { - intersections[path_index].t = -1.0f; - } - else - { - //The ray hits something - intersections[path_index].t = t_min; - intersections[path_index].materialId = geoms[hit_geom_index].materialid; - intersections[path_index].surfaceNormal = normal; - } - } + int path_index = blockIdx.x * blockDim.x + threadIdx.x; + + if (path_index < num_paths) + { + PathSegment pathSegment = pathSegments[path_index]; + + float t; + glm::vec3 intersect_point; + glm::vec3 normal; + float t_min = FLT_MAX; + int hit_geom_index = -1; + bool outside = true; + + glm::vec3 tmp_intersect; + glm::vec3 tmp_normal; + + // naive parse through global geoms + + for (int i = 0; i < geoms_size; i++) + { + Geom& geom = geoms[i]; + + if (geom.type == CUBE) + { + t = boxIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); + } + else if (geom.type == SPHERE) + { + t = sphereIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); + } + else if (geom.type == TRIANGLE) { + t = triangleIntersectionTest(geom, pathSegment.ray, tmp_intersect, tmp_normal, outside); + } + // TODO: add more intersection tests here... triangle? metaball? CSG? + + // Compute the minimum t from the intersection tests to determine what + // scene geometry object was hit first. 
+ if (t > 0.0f && t_min > t) + { + t_min = t; + hit_geom_index = i; + intersect_point = tmp_intersect; + normal = tmp_normal; + } + } + + if (hit_geom_index == -1) + { + intersections[path_index].t = -1.0f; + } + else + { + //The ray hits something + intersections[path_index].t = t_min; + intersections[path_index].materialId = geoms[hit_geom_index].materialid; + intersections[path_index].surfaceNormal = normal; + } + } } -__global__ void shadeSimpleMaterials ( - int iter - , int num_paths - , ShadeableIntersection * shadeableIntersections - , PathSegment * pathSegments - , Material * materials - ) +// shade for direct lighting +__global__ void shadeDirectLighting( + int iter + , int num_paths + , ShadeableIntersection* shadeableIntersections + , PathSegment* pathSegments + , Material* materials + , Geom* lights + , int num +) { - int idx = blockIdx.x * blockDim.x + threadIdx.x; - if (idx < num_paths) - { - ShadeableIntersection intersection = shadeableIntersections[idx]; - PathSegment segment = pathSegments[idx]; - if (segment.remainingBounces == 0) { - return; + int idx = blockIdx.x * blockDim.x + threadIdx.x; + if (idx < num_paths) + { + if (pathSegments[idx].remainingBounces <= 0) { + return; + } + + ShadeableIntersection intersection = shadeableIntersections[idx]; + thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, pathSegments[idx].remainingBounces); + PathSegment path = pathSegments[idx]; + + if (path.remainingBounces != 2 && path.remainingBounces > 0 && intersection.t > 0.f) { + + thrust::uniform_real_distribution<float> u01(0, 1); + + Material material = materials[intersection.materialId]; + glm::vec3 materialColor = material.color; + + // If the material indicates that the object was a light, "light" the ray + if (material.emittance > 0.0f) { + pathSegments[idx].color *= (materialColor * material.emittance); + pathSegments[idx].remainingBounces = 0; + } + else if (pathSegments[idx].remainingBounces == 1) { + pathSegments[idx].remainingBounces
-= 1; + pathSegments[idx].color = glm::vec3(0.0f); + } + else { + pathSegments[idx].remainingBounces -= 1; + scatterRay(pathSegments[idx], pathSegments[idx].ray.origin + pathSegments[idx].ray.direction * intersection.t, intersection.surfaceNormal, + material, rng); + } + + } + else if (path.remainingBounces == 2 && intersection.t > 0.f) { + Material material = materials[intersection.materialId]; + glm::vec3 materialColor = material.color; + + // If the material indicates that the object was a light, "light" the ray + if (material.emittance > 0.0f) { + pathSegments[idx].color *= (materialColor * material.emittance); + pathSegments[idx].remainingBounces = 0; + } + else { + scatterRay(path, path.ray.origin + path.ray.direction * intersection.t, intersection.surfaceNormal, material, rng); + thrust::uniform_real_distribution<float> u01(0, 1); + float r = u01(rng); + int lightIdx = 0; + if (num != 0) { + lightIdx = glm::min((int)glm::floor(r * num), num - 1); + } + glm::vec3 lightPt = pointOnPlane(lights[lightIdx], rng); + path.ray.direction = glm::normalize(lightPt - path.ray.origin); + path.remainingBounces--; + pathSegments[idx] = path; // write the modified local copy back, otherwise the shadow ray is lost + } + } + else { + pathSegments[idx].color = glm::vec3(0.0f); + pathSegments[idx].remainingBounces = 0; + } } +} + - if (intersection.t > 0.0f) { // if the intersection exists...
- segment.remainingBounces--; - // Set up the RNG - thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, segment.remainingBounces); - - Material material = materials[intersection.materialId]; - glm::vec3 materialColor = material.color; - - // If the material indicates that the object was a light, "light" the ray - if (material.emittance > 0.0f) { - segment.color *= (materialColor * material.emittance); - segment.remainingBounces = 0; - } - else { - segment.color *= materialColor; - glm::vec3 intersectPos = intersection.t * segment.ray.direction + segment.ray.origin; - scatterRay(segment, intersectPos, intersection.surfaceNormal, material, rng); - } - // If there was no intersection, color the ray black. - // Lots of renderers use 4 channel color, RGBA, where A = alpha, often - // used for opacity, in which case they can indicate "no opacity". - // This can be useful for post-processing and image compositing. - } else { - segment.color = glm::vec3(0.0f); - segment.remainingBounces = 0; +__global__ void shadeFakeMaterial( + int iter + , int num_paths + , ShadeableIntersection* shadeableIntersections + , PathSegment* pathSegments + , Material* materials +) +{ + int idx = blockIdx.x * blockDim.x + threadIdx.x; + if (idx < num_paths) + { + if (pathSegments[idx].remainingBounces <= 0) { + return; + } + + ShadeableIntersection intersection = shadeableIntersections[idx]; + if (intersection.t > 0.0f) { // if the intersection exists... + // Set up the RNG + // LOOK: this is how you use thrust's RNG! Please look at + // makeSeededRandomEngine as well. 
+ thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, pathSegments[idx].remainingBounces); + thrust::uniform_real_distribution<float> u01(0, 1); + + Material material = materials[intersection.materialId]; + glm::vec3 materialColor = material.color; + + // If the material indicates that the object was a light, "light" the ray + if (material.emittance > 0.0f) { + pathSegments[idx].color *= (materialColor * material.emittance); + pathSegments[idx].remainingBounces = 0; + } + else if (pathSegments[idx].remainingBounces == 1) { + pathSegments[idx].remainingBounces -= 1; + pathSegments[idx].color = glm::vec3(0.0f); + } + else { + pathSegments[idx].remainingBounces -= 1; + scatterRay(pathSegments[idx], pathSegments[idx].ray.origin + pathSegments[idx].ray.direction * intersection.t, intersection.surfaceNormal, + material, rng); + } + // If there was no intersection, color the ray black. + } + else { + pathSegments[idx].color = glm::vec3(0.0f); + pathSegments[idx].remainingBounces = 0; + } } +} - pathSegments[idx] = segment; - } +// GBUF + DENOISING +__global__ void generateGBuffer( + int num_paths, + ShadeableIntersection* shadeableIntersections, + PathSegment* pathSegments, + GBufferPixel* gBuffer) { + int idx = blockIdx.x * blockDim.x + threadIdx.x; + if (idx < num_paths) + { + ShadeableIntersection& si = shadeableIntersections[idx]; + Ray& ray = pathSegments[idx].ray; + gBuffer[idx].t = si.t; + gBuffer[idx].pos = ray.origin + si.t * ray.direction; + gBuffer[idx].nor = si.surfaceNormal; + } } -__global__ void generateGBuffer ( - int num_paths, - ShadeableIntersection* shadeableIntersections, - PathSegment* pathSegments, - GBufferPixel* gBuffer) { - int idx = blockIdx.x * blockDim.x + threadIdx.x; - if (idx < num_paths) - { - gBuffer[idx].t = shadeableIntersections[idx].t; - } +__global__ void denoiseInit(int iteration, glm::ivec2 resolution, glm::vec3* denoise1, glm::vec3* image) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y *
blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + int idx = x + resolution.x * y; + glm::vec3 pixel = image[idx]; + + denoise1[idx].x = pixel.x / iteration; + denoise1[idx].y = pixel.y / iteration; + denoise1[idx].z = pixel.z / iteration; + } +} + +__global__ void denoiseToPBO(uchar4* pbo, glm::ivec2 resolution, glm::vec3* denoised) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + int idx = x + resolution.x * y; + glm::vec3 pixel = denoised[idx]; + + pbo[idx].w = 0; + pbo[idx].x = glm::clamp((int)(pixel.x * 255.0), 0, 255); + pbo[idx].y = glm::clamp((int)(pixel.y * 255.0), 0, 255); + pbo[idx].z = glm::clamp((int)(pixel.z * 255.0), 0, 255); + } +} + +__global__ void denoiseImage(glm::ivec2 resolution, glm::vec3* denoise1, glm::vec3* denoise2, GBufferPixel* gBuffer, + float cPhi, float nPhi, float pPhi, int step) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < resolution.x && y < resolution.y) { + int idx = x + resolution.x * y; + glm::vec3 c = denoise1[idx]; + glm::vec3 n = gBuffer[idx].nor; + glm::vec3 p = gBuffer[idx].pos; + glm::vec3 sum = glm::vec3(0.f); + float cumW = 0.f; + + float kernel1D[5] = { 0.0625f, 0.25f, 0.375f, 0.25f, 0.0625f }; // 5-tap B3-spline kernel (from recitation slides) + + for (int i = -2; i <= 2; i++) { + for (int j = -2; j <= 2; j++) { + float kernel2D = kernel1D[i + 2] * kernel1D[j + 2]; + int xCoord = glm::clamp(x + step * i, 0, resolution.x - 1); + int yCoord = glm::clamp(y + step * j, 0, resolution.y - 1); + int curIdx = xCoord + yCoord * resolution.x; + + glm::vec3 cCur = denoise1[curIdx]; + glm::vec3 nCur = gBuffer[curIdx].nor; + glm::vec3 pCur = gBuffer[curIdx].pos; + + float cDist = glm::dot(cCur - c, cCur - c); + float nDist = glm::dot(nCur - n, nCur - n); + float pDist = glm::dot(pCur - p, pCur - p); + + float cWeight = glm::min(glm::exp(-cDist /
cPhi), 1.f); + float nWeight = glm::min(glm::exp(-nDist / nPhi), 1.f); + float pWeight = glm::min(glm::exp(-pDist / pPhi), 1.f); + + float weight = cWeight * nWeight * pWeight; + + if (weight != 0) { + sum += cCur * weight * kernel2D; + cumW += weight * kernel2D; + } + else { + sum += cCur * kernel2D; + cumW += kernel2D; + } + } + } + denoise2[idx] = sum / cumW; + } } // Add the current iteration's output to the overall image -__global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterationPaths) +__global__ void finalGather(int nPaths, glm::vec3* image, PathSegment* iterationPaths) { - int index = (blockIdx.x * blockDim.x) + threadIdx.x; + int index = (blockIdx.x * blockDim.x) + threadIdx.x; - if (index < nPaths) - { - PathSegment iterationPath = iterationPaths[index]; - image[iterationPath.pixelIndex] += iterationPath.color; - } + if (index < nPaths) + { + PathSegment iterationPath = iterationPaths[index]; + image[iterationPath.pixelIndex] += iterationPath.color; + } } +struct should_end { + __host__ __device__ + bool operator()(const PathSegment& pathSegment) { + return (pathSegment.remainingBounces > 0); // true while the path is still alive + } +}; + +struct compare_materials { + __host__ __device__ + bool operator()(const ShadeableIntersection& m1, const ShadeableIntersection& m2) { + return (m1.materialId > m2.materialId); + } +}; + /** * Wrapper for the __global__ call that sets up the kernel calls and does a ton * of memory management */ void pathtrace(int frame, int iter) { const int traceDepth = hst_scene->state.traceDepth; - const Camera &cam = hst_scene->state.camera; + const Camera& cam = hst_scene->state.camera; const int pixelcount = cam.resolution.x * cam.resolution.y; - // 2D block for generating ray from camera + // 2D block for generating ray from camera const dim3 blockSize2d(8, 8); const dim3 blocksPerGrid2d( - (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, - (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + (cam.resolution.x + blockSize2d.x -
1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); - // 1D block for path tracing - const int blockSize1d = 128; + // 1D block for path tracing + const int blockSize1d = 128; /////////////////////////////////////////////////////////////////////////// - // Pathtracing Recap: + // Recap: // * Initialize array of path rays (using rays that come out of the camera) // * You can pass the Camera object to that kernel. // * Each path ray must carry at minimum a (ray, color) pair, // * where color starts as the multiplicative identity, white = (1, 1, 1). // * This has already been done for you. - // * NEW: For the first depth, generate geometry buffers (gbuffers) // * For each depth: // * Compute an intersection in the scene for each path ray. // A very naive version of this has been implemented for you, but feel @@ -331,103 +654,167 @@ void pathtrace(int frame, int iter) { // Currently, intersection distance is recorded as a parametric distance, // t, or a "distance along the ray." t = -1.0 indicates no intersection. // * Color is attenuated (multiplied) by reflections off of any object - // * Stream compact away all of the terminated paths. + // * TODO: Stream compact away all of the terminated paths. // You may use either your implementation or `thrust::remove_if` or its // cousins. // * Note that you can't really use a 2D kernel launch any more - switch // to 1D. - // * Shade the rays that intersected something or didn't bottom out. + // * TODO: Shade the rays that intersected something or didn't bottom out. // That is, color the ray by performing a color computation according // to the shader, then generate a new ray to continue the ray path. // We recommend just updating the ray's PathSegment in place. // Note that this step may come before or after stream compaction, // since some shaders you write may also cause a path to terminate. 
- // * Finally: - // * if not denoising, add this iteration's results to the image - // * TODO: if denoising, run kernels that take both the raw pathtraced result and the gbuffer, and put the result in the "pbo" from opengl - - generateRayFromCamera <<<blocksPerGrid2d, blockSize2d>>>(cam, iter, traceDepth, dev_paths); - checkCUDAError("generate camera ray"); - - int depth = 0; - PathSegment* dev_path_end = dev_paths + pixelcount; - int num_paths = dev_path_end - dev_paths; - - // --- PathSegment Tracing Stage --- - // Shoot ray into scene, bounce between objects, push shading chunks - - // Empty gbuffer - cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel)); - - // clean shading chunks - cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); - - bool iterationComplete = false; - while (!iterationComplete) { - - // tracing - dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d; - computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> ( - depth - , num_paths - , dev_paths - , dev_geoms - , hst_scene->geoms.size() - , dev_intersections - ); - checkCUDAError("trace one bounce"); - cudaDeviceSynchronize(); - - if (depth == 0) { - generateGBuffer<<<numblocksPathSegmentTracing, blockSize1d>>>(num_paths, dev_intersections, dev_paths, dev_gBuffer); - } - - depth++; - - shadeSimpleMaterials<<<numblocksPathSegmentTracing, blockSize1d>>> ( - iter, - num_paths, - dev_intersections, - dev_paths, - dev_materials - ); - iterationComplete = depth == traceDepth; - } - - // Assemble this iteration and apply it to the image - dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d; - finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths); + // * Finally, add this iteration's results to the image. This has been done + // for you.
+ + // TODO: perform one iteration of path tracing + + generateRayFromCamera <<<blocksPerGrid2d, blockSize2d>>> (cam, iter, traceDepth, dev_paths); + checkCUDAError("generate camera ray"); + + int depth = 0; + PathSegment* dev_path_end = dev_paths + pixelcount; + int num_paths = dev_path_end - dev_paths; + + // --- PathSegment Tracing Stage --- + // Shoot ray into scene, bounce between objects, push shading chunks + cudaMemset(dev_gBuffer, 0, pixelcount * sizeof(GBufferPixel)); + +#if TIMER + cudaEventRecord(start); +#endif + + bool iterationComplete = false; + while (!iterationComplete) { + + // tracing + dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d; + //cache first bounce + if (CACHE_BOUNCE && !ANTI_ALIASING && depth == 0 && iter != 1) { + thrust::copy(thrust::device, dev_first_bounce, dev_first_bounce + num_paths, dev_intersections); + + //sort by material + if (SORT_MATERIALS) { + thrust::sort_by_key(thrust::device, dev_intersections, dev_intersections + num_paths, dev_paths, compare_materials()); + } + } + else { + + // clean shading chunks + cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection)); + + computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (depth, num_paths, dev_paths, dev_geoms, hst_scene->geoms.size(), dev_intersections); + checkCUDAError("trace one bounce"); + cudaDeviceSynchronize(); + + if (depth == 0) { + generateGBuffer <<<numblocksPathSegmentTracing, blockSize1d>>> (num_paths, dev_intersections, dev_paths, dev_gBuffer); + } + + //cache first bounce + if (CACHE_BOUNCE && !ANTI_ALIASING && depth == 0 && iter == 1) { + thrust::copy(thrust::device, dev_intersections, dev_intersections + num_paths, dev_first_bounce); + } + //sort by material + if (SORT_MATERIALS) { + thrust::sort_by_key(thrust::device, dev_intersections, dev_intersections + num_paths, dev_paths, compare_materials()); + } + } + + depth++; + +#if DIRECT_LIGHTING + shadeDirectLighting <<<numblocksPathSegmentTracing, blockSize1d>>> (iter, num_paths, dev_intersections, dev_paths, + dev_materials, dev_lights, hst_scene->lights.size()); +#else +
shadeFakeMaterial <<<numblocksPathSegmentTracing, blockSize1d>>> (iter, num_paths, dev_intersections, dev_paths, dev_materials); +#endif + + //dev_path_end = thrust::stable_partition(thrust::device, dev_paths, dev_path_end, should_end()); + //num_paths = dev_path_end - dev_paths; + + + /*if (num_paths == 0 || depth > traceDepth) { + iterationComplete = true; + }*/ + + if (depth > traceDepth) { + iterationComplete = true; + } + + } + +#if TIMER + cudaEventRecord(stop); + cudaEventSynchronize(stop); + float t; + cudaEventElapsedTime(&t, start, stop); + //std::cout << "CUDA TIME: " << t << std::endl; + + if (countStart && iter > 20) { + totalTime += t; + std::cout << totalTime / iter << std::endl; + countStart = false; + } +#endif + + // Assemble this iteration and apply it to the image + dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d; + finalGather <<<numBlocksPixels, blockSize1d>>> (pixelcount, dev_image, dev_paths); /////////////////////////////////////////////////////////////////////////// - // CHECKITOUT: use dev_image as reference if you want to implement saving denoised images. - // Otherwise, screenshots are also acceptable. + // Send results to OpenGL buffer for rendering + // sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image); + // Retrieve image from GPU cudaMemcpy(hst_scene->state.image.data(), dev_image, - pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToHost); + pixelcount * sizeof(glm::vec3), cudaMemcpyDeviceToHost); checkCUDAError("pathtrace"); } -// CHECKITOUT: this kernel "post-processes" the gbuffer/gbuffers into something that you can visualize for debugging.
void showGBuffer(uchar4* pbo) { - const Camera &cam = hst_scene->state.camera; + const Camera& cam = hst_scene->state.camera; const dim3 blockSize2d(8, 8); const dim3 blocksPerGrid2d( - (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, - (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); // CHECKITOUT: process the gbuffer results and send them to OpenGL buffer for visualization - gbufferToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, dev_gBuffer); + gbufferToPBO <<<blocksPerGrid2d, blockSize2d>>> (pbo, cam.resolution, dev_gBuffer); } void showImage(uchar4* pbo, int iter) { -const Camera &cam = hst_scene->state.camera; + const Camera& cam = hst_scene->state.camera; const dim3 blockSize2d(8, 8); const dim3 blocksPerGrid2d( - (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, - (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); // Send results to OpenGL buffer for rendering - sendImageToPBO<<<blocksPerGrid2d, blockSize2d>>>(pbo, cam.resolution, iter, dev_image); + sendImageToPBO <<<blocksPerGrid2d, blockSize2d>>> (pbo, cam.resolution, iter, dev_image); +} + +void showDenoised(uchar4* pbo, int iter, float cPhi, float nPhi, float pPhi, int ui_filterSize) { + const Camera& cam = hst_scene->state.camera; + const dim3 blockSize2d(8, 8); + const dim3 blocksPerGrid2d( + (cam.resolution.x + blockSize2d.x - 1) / blockSize2d.x, + (cam.resolution.y + blockSize2d.y - 1) / blockSize2d.y); + + denoiseInit <<<blocksPerGrid2d, blockSize2d>>> (iter, cam.resolution, dev_denoise1, dev_image); + + int numSteps = (int)glm::round(glm::log2((float)ui_filterSize)); + for (int i = 0; i < numSteps; i++) { + // a-trous pass i with tap spacing 2^i (a step of 0 would sample only the center pixel) + denoiseImage <<<blocksPerGrid2d, blockSize2d>>> (cam.resolution, dev_denoise1, dev_denoise2, dev_gBuffer, cPhi, nPhi, pPhi, 1 << i); + std::swap(dev_denoise1, dev_denoise2); + } + denoiseToPBO <<<blocksPerGrid2d, blockSize2d>>> (pbo, cam.resolution, dev_denoise1); + cudaDeviceSynchronize(); +
cudaMemcpy(hst_scene->state.image.data(), dev_denoise1, + cam.resolution.x * cam.resolution.y * sizeof(glm::vec3), cudaMemcpyDeviceToHost); + cudaDeviceSynchronize(); } diff --git a/src/pathtrace.h b/src/pathtrace.h index 9e12f44..d611d59 100644 --- a/src/pathtrace.h +++ b/src/pathtrace.h @@ -8,3 +8,4 @@ void pathtraceFree(); void pathtrace(int frame, int iteration); void showGBuffer(uchar4 *pbo); void showImage(uchar4 *pbo, int iter); +void showDenoised(uchar4* pbo, int iter, float cPhi, float nPhi, float pPhi, int ui_filterSize); diff --git a/src/sceneStructs.h b/src/sceneStructs.h index da7e558..110e60b 100644 --- a/src/sceneStructs.h +++ b/src/sceneStructs.h @@ -10,6 +10,19 @@ enum GeomType { SPHERE, CUBE, + TRIANGLE, +}; + +struct Point { + glm::vec3 pos; + glm::vec3 nor; + glm::vec2 uv; +}; + +struct Triangle { + Point pt1; + Point pt2; + Point pt3; }; struct Ray { @@ -26,6 +39,7 @@ struct Geom { glm::mat4 transform; glm::mat4 inverseTransform; glm::mat4 invTranspose; + Triangle triangle; }; struct Material { @@ -60,23 +74,23 @@ struct RenderState { }; struct PathSegment { - Ray ray; - glm::vec3 color; - int pixelIndex; - int remainingBounces; + Ray ray; + glm::vec3 color; + int pixelIndex; + int remainingBounces; }; // Use with a corresponding PathSegment to do: // 1) color contribution computation // 2) BSDF evaluation: generate a new ray struct ShadeableIntersection { - float t; - glm::vec3 surfaceNormal; - int materialId; + float t; + glm::vec3 surfaceNormal; + int materialId; }; -// CHECKITOUT - a simple struct for storing scene geometry information per-pixel. -// What information might be helpful for guiding a denoising filter? struct GBufferPixel { - float t; -}; + float t; + glm::vec3 pos; + glm::vec3 nor; +}; \ No newline at end of file
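A note on the à-trous schedule driven by `showDenoised` and `denoiseImage` in the diff above: every pass reuses the same 5x5 B3-spline kernel but doubles the spacing between taps, so `n` passes approximate the footprint of one wide Gaussian at only 25 taps per pixel per pass, and the separable kernel weights must sum to 1 so a flat, edge-free region passes through unchanged. A host-side C++ sketch (the helper names `atrousSteps`, `footprint`, and `kernelSum` are ours, not from the project):

```cpp
#include <cassert>
#include <cmath>

// Passes run for a given filter size, mirroring the
// (int)round(log2(filterSize)) used by showDenoised.
int atrousSteps(int filterSize) {
    return (int)std::lround(std::log2((double)filterSize));
}

// Diameter covered after n passes with tap spacings 1, 2, ..., 2^(n-1):
// a 5-tap pass at spacing s reaches 2*s pixels from the center, so the
// total radius is 2*(2^n - 1) and the diameter is 4*(2^n - 1) + 1.
int footprint(int numSteps) {
    return 4 * ((1 << numSteps) - 1) + 1;
}

// The separable B3-spline weights from denoiseImage; the 5x5 outer
// product must sum to 1 for the filter to preserve overall brightness.
double kernelSum() {
    double k[5] = { 0.0625, 0.25, 0.375, 0.25, 0.0625 };
    double sum = 0.0;
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            sum += k[i] * k[j];
    return sum;
}
```

This is also why the filter-size cost curve in the README flattens out: work grows with `atrousSteps(filterSize)`, i.e. logarithmically in the filter width.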
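The `convertDisk` helper added in the diff is a concentric square-to-disk mapping used to pick a lens sample for depth of field. A host-side sketch with the same branch structure (the function name `squareToDisk` is ours), handy for checking that every point of the unit square lands inside the unit disk and that the square's center maps to the lens center:

```cpp
#include <cassert>
#include <cmath>

// Concentric square-to-disk mapping: (u, v) in [0,1]^2 -> (x, y) with
// x^2 + y^2 <= 1. Each quadrant of the square picks a radius from one
// coordinate and an angle from the ratio of the two.
void squareToDisk(float u, float v, float& outX, float& outY) {
    const float PI_4 = 0.78539816f; // pi / 4
    float a = 2.f * u - 1.f;
    float b = 2.f * v - 1.f;
    float r, phi;
    if (a > -b) {
        if (a > b) { r = a;  phi = PI_4 * (b / a); }
        else       { r = b;  phi = PI_4 * (2.f - a / b); }
    } else {
        if (a < b) { r = -a; phi = PI_4 * (4.f + b / a); }
        else       { r = -b; phi = (b != 0.f) ? PI_4 * (6.f - a / b) : 0.f; }
    }
    outX = r * std::cos(phi);
    outY = r * std::sin(phi);
}
```

Compared with naive polar sampling, this mapping avoids clumping samples near the disk center, which keeps the lens blur even.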
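The edge-stopping test in `denoiseImage` can be read channel by channel: color, normal, and position each contribute `min(exp(-dist / phi), 1)`, where `dist` is a squared difference and `phi` the matching UI slider. Identical pixels get weight 1, and the weight decays smoothly as the difference grows relative to `phi`, which is what keeps the blur from crossing edges. A one-function host-side sketch (the name `edgeWeight` is ours):

```cpp
#include <cassert>
#include <cmath>

// Per-channel edge-stopping weight used by denoiseImage: dist is a
// squared difference between the center pixel and a kernel tap, and
// phi controls how quickly the weight falls off across an edge.
float edgeWeight(float squaredDist, float phi) {
    return std::fmin(std::exp(-squaredDist / phi), 1.f);
}
```

Raising a `phi` slider therefore flattens the corresponding weight toward 1 and lets the blur leak across that kind of edge.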