My Raytracer History

Post your questions, suggestions and experiences regarding image manipulation, 3D modeling and level editing for the Irrlicht engine here.
BlindSide
Admin
Posts: 2821
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

My Raytracer History

Post by BlindSide »

I posted this on another forum, but I thought you guys might find it interesting. It details my work-in-progress raytracer driver for Irrlicht, which I haven't touched in a long time now.
Just thought I'd share something.

When I first started writing my raytracer I used a very simple test scene. (I recommend you keep the scene you have now as a reference for when you improve your performance.)

This is what it looked like (but without the soft shadows; I can't seem to find the original image):

Image

Now I'll walk you through a timeline of my modifications and performance changes throughout the development cycle of the raytracer. I'd like to call this "My journey to realtime". All of this was conducted on a single-core 1.6 GHz AMD Turion laptop with 320 MB of RAM.

First I started off with a very simple raytracer: shoot a ray for every pixel on the screen, loop over every single triangle in every mesh and intersect it with the ray, then do simple lighting and read the texture at those coordinates. Everything was pure C++, no SSE, etc. I also blitted to the screen after every scanline so I could watch the image update in realtime. This took approximately 8 seconds to render the above image at 512x512 (without the shadows).
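The brute-force approach described above can be sketched roughly like this. This is not the original driver code, just a minimal illustration using the standard Möller–Trumbore ray/triangle test; all the names here are made up:

```cpp
#include <cmath>
#include <vector>

struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 cross(const Vec3& o) const {
        return {y * o.z - z * o.y, z * o.x - x * o.z, x * o.y - y * o.x};
    }
    float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
};

struct Triangle { Vec3 a, b, c; };

// Moller-Trumbore ray/triangle intersection.
// Returns the hit distance t along the ray, or -1 on a miss.
float intersect(const Vec3& orig, const Vec3& dir, const Triangle& tri) {
    const float eps = 1e-6f;
    Vec3 e1 = tri.b - tri.a, e2 = tri.c - tri.a;
    Vec3 p = dir.cross(e2);
    float det = e1.dot(p);
    if (std::fabs(det) < eps) return -1.0f;   // ray parallel to triangle plane
    float inv = 1.0f / det;
    Vec3 t = orig - tri.a;
    float u = t.dot(p) * inv;                 // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return -1.0f;
    Vec3 q = t.cross(e1);
    float v = dir.dot(q) * inv;               // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    float dist = e2.dot(q) * inv;
    return dist > eps ? dist : -1.0f;
}

// Brute force, as in the first version: test the ray against every
// triangle in the scene and keep the nearest hit.
float nearestHit(const Vec3& orig, const Vec3& dir,
                 const std::vector<Triangle>& tris) {
    float best = -1.0f;
    for (const Triangle& tri : tris) {
        float t = intersect(orig, dir, tri);
        if (t > 0.0f && (best < 0.0f || t < best)) best = t;
    }
    return best;
}
```

At one ray per pixel this is O(pixels × triangles) per frame, which is exactly why the acceleration structures below matter.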

After this I did some tweaks here and there and added a simple KD-tree and KD-tree builder. My buggy KD-tree code produced lots of artifacts, missing triangles, etc., but the render time suddenly dropped to 3 seconds for the same image.

After this I removed the constant blitting to the screen, which significantly reduced the render time to 1 second with the KD-tree. I did a few more tweaks to the KD-tree algorithm (early exit, first hit, etc.) and got rid of a few artifacts, but some rendering problems remained, and the time only got down to something like 900 ms, which isn't a big improvement overall.

After this I got fed up with the KD-tree and min/max. Sure, the traversal loop only needed like 2 or 3 lines of code, but what's the point when memory bandwidth is usually the biggest bottleneck? So I moved to a BVH. Instantly my render artifacts disappeared, but my render time jumped back up to 1.5 seconds.
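The BVH build itself is a bigger topic, but the inner loop of any BVH traversal is a ray/box "slab" test along these lines (a sketch, not the actual driver code; names are invented):

```cpp
#include <utility>

// Axis-aligned bounding box, stored per BVH node.
struct AABB { float min[3], max[3]; };

// Slab test: returns true if the ray orig + t*dir, with t in [0, tMax],
// hits the box. invDir holds 1/dir per component, precomputed once per ray
// so traversal needs no divisions.
bool hitAABB(const float orig[3], const float invDir[3],
             const AABB& box, float tMax) {
    float tNear = 0.0f, tFar = tMax;
    for (int i = 0; i < 3; ++i) {
        // Entry/exit distances for this pair of parallel planes.
        float t0 = (box.min[i] - orig[i]) * invDir[i];
        float t1 = (box.max[i] - orig[i]) * invDir[i];
        if (t0 > t1) std::swap(t0, t1);
        if (t0 > tNear) tNear = t0;
        if (t1 < tFar)  tFar = t1;
        if (tNear > tFar) return false;  // slab intervals don't overlap: miss
    }
    return true;
}
```

Traversal then just walks the tree, skipping any subtree whose box fails this test, and only runs the triangle test on leaves. Unlike a KD-tree, each BVH node's box fully bounds its contents, which is why the splitting artifacts vanished.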

I finally bit the bullet and implemented SSE intrinsics. I wrote a simple wrapper vector class that could be switched with a compile-time define to use either SSE intrinsics or plain FPU code (for platforms that didn't support SSE), and an SoA vector wrapper on top of that which made dealing with 4 vectors at a time easy. As revealed by several threads on here, this isn't an efficient way to implement SSE (especially since I was using MSVC, which is known for producing especially bad code in this case), but I was happy: with 4-ray packets and this SSE vector class traversing a BVH, my render time dropped to 500 ms. I finally felt like I was getting really close.
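The compile-switched wrapper idea might look something like this (a sketch under assumptions; the class and member names are made up, not from the actual driver):

```cpp
// One lane per ray in a 2x2 packet. The same interface compiles to SSE
// intrinsics where available, or to a plain scalar loop otherwise.
#if defined(__SSE__) || defined(_M_X64) || defined(__x86_64__)
  #define RT_USE_SSE 1
  #include <xmmintrin.h>
#endif

struct Float4 {
#ifdef RT_USE_SSE
    __m128 v;
    Float4() : v(_mm_setzero_ps()) {}
    explicit Float4(float s) : v(_mm_set1_ps(s)) {}
    Float4(__m128 x) : v(x) {}
    Float4 operator+(Float4 o) const { return Float4(_mm_add_ps(v, o.v)); }
    Float4 operator*(Float4 o) const { return Float4(_mm_mul_ps(v, o.v)); }
    float lane(int i) const {
        alignas(16) float f[4];
        _mm_store_ps(f, v);
        return f[i];
    }
#else
    float v[4];
    Float4() { for (int i = 0; i < 4; ++i) v[i] = 0.0f; }
    explicit Float4(float s) { for (int i = 0; i < 4; ++i) v[i] = s; }
    Float4 operator+(Float4 o) const {
        Float4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = v[i] + o.v[i];
        return r;
    }
    Float4 operator*(Float4 o) const {
        Float4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = v[i] * o.v[i];
        return r;
    }
    float lane(int i) const { return v[i]; }
#endif
};

// SoA layout: x, y, z each hold one component for four rays at once,
// so a single dot() evaluates four ray dot products in parallel.
struct Vec3x4 {
    Float4 x, y, z;
    Float4 dot(const Vec3x4& o) const {
        return x * o.x + y * o.y + z * o.z;
    }
};
```

The inefficiency BlindSide mentions is real: wrapping each operator this way leans entirely on the compiler to keep values in registers, and MSVC of that era often spilled them between calls.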

After this I read a few papers and came across Wald et al.'s excellent paper on using BVHs with dynamic geometry (not the main Wald paper on interactive raytracing, but quite popular too). Following it as a guide, I implemented frustum traversal, early hit exit, storing ray IDs, and all sorts of little tweaks described in the paper, plus some I thought of myself while staring blankly at the code. My render time after adding frustums went down to around 300 ms, then 250 ms, and finally, after much tweaking and optimizing, 180 ms (~5 fps!). This number I recall very distinctly, because I couldn't find any easy way to make it faster than that. Soon after, I got a new computer, and that beast could render the same scene with the same code at a steady 20 fps (AMD Phenom 9950 with 2 GB of RAM). But this laptop's rendering speed I could not forget, as it was the first time I ray traced at an "interactive" frame rate. :D

Haha, sorry that got a bit long, but I hope there's something useful in there to help you reach your desired performance goals!
ShadowMapping for Irrlicht!: Get it here
Need help? Come on the IRC!: #irrlicht on irc://irc.freenode.net
wITTus
Posts: 167
Joined: Tue Jun 24, 2008 7:41 pm
Location: Germany

Post by wITTus »

I understand now why you stopped the development. You simply wait a few more years until you've got a better machine and 100 fps. :lol:
Generated Documentation for BlindSide's irrNetLite.
"When I heard birds chirping, I knew I didn't have much time left before my mind would go." - clinko
Lonesome Ducky
Competition winner
Posts: 1123
Joined: Sun Jun 10, 2007 11:14 pm

Post by Lonesome Ducky »

Very interesting. Do you still have it lying around on your hard drive somewhere?
lulzfish
Posts: 34
Joined: Sat Aug 15, 2009 8:19 pm

Post by lulzfish »

That's quite awesome.
Any idea how the rendering time would scale if you added another couple of scene nodes?
Is the somber dream of the Grim Reaper a shade Darker Than Black?
torleif
Posts: 188
Joined: Mon Jun 30, 2008 4:53 am

Post by torleif »

Nice! I'd love to see it put into the Irrlicht engine for some fun :p
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

You got a Phenom... I think you "might have" forgotten about multiple cores (I think you didn't, but just in case you did). You could split the 512x512 image into four 256x256 quads and process them in parallel so all your CPU cores stay busy.
I think if you have a quad core, you could have almost quadrupled your fps to 80. You could get ready for eight cores by splitting the 512x512 into eight 256x128 segments across 8 threads.

Also, some rays might fail more often or trace quicker than others in one quad, and the CPU core given the easier quad could end up sitting around doing nothing. A solution to that would be asynchronous rendering/ray tracing, meaning each of the four threads has its own image buffer. The first thread to finish its portion of the screen updates the camera, events, and node movement + animation (note: the other threads work on copies of these, so changing the values while they are still ray tracing should have no impact).
Then it starts tracing the new frame while the others finish the last one. If the thread(s) tracing the new frame finish before the old frame is done, they store their finished parts into buffers and "help out" finishing the old frame. When the last thread is done, the threads that are already one frame ahead present their buffers to the screen and start on the next.
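One common way to get both the split and the load balancing described above is to have threads pull small tiles from a shared counter instead of owning one fixed quad each. A rough sketch (renderFrame, renderTile, and the tile size are hypothetical, not from BlindSide's code):

```cpp
#include <algorithm>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Tile-based work splitting: the frame is cut into small tiles and worker
// threads grab the next unclaimed tile from an atomic counter until none
// remain. A thread that gets cheap tiles simply takes more of them, so no
// core idles. renderTile(x0, y0, x1, y1) is a stand-in for the actual
// per-tile ray tracing.
void renderFrame(int width, int height, int tileSize,
                 const std::function<void(int, int, int, int)>& renderTile) {
    int tilesX = (width + tileSize - 1) / tileSize;   // round up
    int tilesY = (height + tileSize - 1) / tileSize;
    int total = tilesX * tilesY;
    std::atomic<int> next(0);

    auto worker = [&]() {
        for (;;) {
            int t = next.fetch_add(1);        // claim the next tile
            if (t >= total) return;           // no tiles left
            int x0 = (t % tilesX) * tileSize;
            int y0 = (t / tilesX) * tileSize;
            renderTile(x0, y0,
                       std::min(x0 + tileSize, width),    // clamp edge tiles
                       std::min(y0 + tileSize, height));
        }
    };

    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

This dynamic scheduling is essentially what TBB's parallel loops do automatically, which is why the fixed-quad imbalance devsh worries about rarely bites in practice.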

Personally, I think you could take a hybrid approach to ray tracing.

Render the scene into multiple render targets (MRTs):

1st - eye vectors (the E vector, as they call it in specular highlighting) reflected about the surface normals (GLSL reflect(E, normal))
2nd - color (the diffuse texture)
3rd - the depth buffer

Then in software (on the CPU), download the images from VRAM and reconstruct each pixel's world position, and there you have your bounced-off rays. The hardest part is still ahead: finding where they intersect other geometry. I would recommend a concept similar to matrices, only you don't construct one; you multiply by it as if it were an orthogonal 3x3 matrix with no scale.
This way you have your two-bounce raytracer.
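One way to read the reconstruction step: if the depth buffer stores distance along the view ray (an assumption made for simplicity; a hardware z-buffer stores nonlinear depth and needs the inverse projection matrix instead), each pixel's world position falls out of the camera basis directly. A hypothetical sketch, not devsh's actual setup:

```cpp
#include <cmath>

struct V3 {
    float x, y, z;
    V3 operator+(const V3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    V3 operator*(float s) const { return {x * s, y * s, z * s}; }
};

// Rebuild a pixel's world position on the CPU from a depth value, assuming
// depth is the distance along the view ray. fovTan is tan(vertical_fov / 2);
// right/up/fwd are the camera's orthonormal basis vectors.
V3 reconstructWorldPos(int px, int py, int width, int height,
                       float fovTan, const V3& camPos,
                       const V3& right, const V3& up, const V3& fwd,
                       float depth) {
    float aspect = float(width) / float(height);
    // Pixel center mapped to normalized device coordinates in [-1, 1].
    float ndcX = ((px + 0.5f) / width) * 2.0f - 1.0f;
    float ndcY = 1.0f - ((py + 0.5f) / height) * 2.0f;  // screen y points down
    // Un-normalized view ray through this pixel.
    V3 dir = fwd + right * (ndcX * fovTan * aspect) + up * (ndcY * fovTan);
    float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    // March 'depth' units along the normalized ray from the camera.
    return camPos + dir * (depth / len);
}
```

With the world position and the reflected-E target from the first pass, each pixel yields an origin and direction for a secondary ray, which is where the CPU (or OpenCL/CUDA) tracing would take over.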

Of course the render time would be 2 seconds and there would be absolutely no use to it, but it could be taken a step further with OpenCL or NVIDIA CUDA, so you wouldn't have to download the MRTs to RAM (which is what cuts your fps to 3). In OpenCL or CUDA you can use the texture directly in video RAM, so there's no download overhead.
Valmond
Posts: 308
Joined: Thu Apr 12, 2007 3:26 pm

Post by Valmond »

It would be cool to check out though (if you have a 'lil demo).

And total respect :!:
BlindSide
Admin
Posts: 2821
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

Post by BlindSide »

devsh, I used Intel TBB to schedule the multi-core stuff; it automatically balances the load across cores. This is a pure software renderer, no GPUs were harmed in the making. I will look into OpenCL etc. when the technology matures; I still feel it's not ready.

I posted some demos here a few months ago.

I was thinking of releasing a header/lib only package for testing sometime but I just haven't gotten around to it as I'm busy with other stuff.
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

NVIDIA OptiX is what you need... it will give you an interactive FPS for sure :)
bitplane
Admin
Posts: 3204
Joined: Mon Mar 28, 2005 3:45 am
Location: England
Contact:

Post by bitplane »

*ignores devsh*

That's pretty interesting, BlindSide. Any plans to support multi-threading? Machines are getting faster, but core counts are now increasing faster than clock speeds.
Submit bugs/patches to the tracker!
Need help right now? Visit the chat room
BlindSide
Admin
Posts: 2821
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

Post by BlindSide »

Looks like you ignored my post too, because I just mentioned that I use TBB for multithreading. :P

Yeah, NVIDIA OptiX is nice, but it's not officially released yet. I'll be sure to check it out eventually.
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

I think that in 4 years it will be the end of "drawing engines"; it will be sorrowful...
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

Just as TV killed radio, and video killed cinema... So no, it simply won't happen. Maybe most high-end games will provide a ray-tracing engine, but handhelds and mobile phones will still require rasterization engines, because no more than a handful of cores will be available there. And what will happen in 15 years is not foreseeable.
BlindSide
Admin
Posts: 2821
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

Post by BlindSide »

Yes, but cellphones tend to have very small screens. That's why you see so many raycasting (not raytracing) engines on cellphones: the performance scales very well with smaller screen sizes (and very badly with larger ones :lol: ).
Mel
Competition winner
Posts: 2292
Joined: Wed May 07, 2008 11:40 am
Location: Granada, Spain

Post by Mel »

I think that in 4 years it will be the end of "drawing engines", it will be sorrow...
I don't think so. The raytracing algorithm accounts, for every pixel, for all the lights that pixel can see. So either you use a small number of lights, or simple scenes, or else you run into trouble as the number of objects and lights grows. Even with the optimizations, the complexity of raytracing increases with each light and object you add, so an engine based only on raytracing would be very limiting.
"There is nothing truly useless, it always serves as a bad example". Arthur A. Schmitt