Re: Call overhead stats

Posted: Mon Sep 24, 2012 2:56 pm
by hendu
This doesn't have much to do with shaders; it's about reducing the CPU overhead of Irrlicht.
I.e., Irrlicht was doing things it didn't need to do.


Books on optimization? Drepper has written several papers on the topic; they're freely available on his homepage.

I recommend the optimization tutorials 1 and 2, and "What every programmer should know about memory".

Re: Call overhead stats

Posted: Sat Nov 17, 2012 8:19 pm
by ACE247
Just to check in again, are any of these changes actually going to be applied to svn? I see they are just hanging there on the tracker, or are the main irr devs too busy currently?

Re: Call overhead stats

Posted: Sun Nov 18, 2012 12:55 am
by hybrid
Yes, we had a major release just a few days ago. Development will start soon again.

Re: Call overhead stats

Posted: Sat Dec 08, 2012 1:36 am
by devsh
Somebody could start off with an SSE3 implementation of vector3df, or at least matrix4 with 16-byte alignment (Declspec!)

and drop a lot of setTransform() calls

Re: Call overhead stats

Posted: Sat Dec 08, 2012 9:48 pm
by hendu
devsh wrote: "SSE3 implementation of vector3df"
I've tried that. It was slower than the current, fully inlinable template version.
devsh wrote: "and drop a lot of setTransform() calls"
I don't remember many of those being made in vain. The issue was that calling setTransform(identitymatrix) did not go through the fast path of glLoadIdentity, but instead uploaded the identity matrix every time.
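A minimal sketch of that fast path, for illustration only. It is not the actual Irrlicht driver code: fakeLoadIdentity and fakeLoadMatrix are hypothetical stand-ins for glLoadIdentity and glLoadMatrixf so that it compiles without a GL context, and setTransformSketch is a made-up name.

```cpp
// Hypothetical stand-ins for the GL calls; a real driver would call
// glLoadIdentity() and glLoadMatrixf() here instead of counting.
static int g_loadIdentityCalls = 0;
static int g_loadMatrixCalls = 0;
static void fakeLoadIdentity() { ++g_loadIdentityCalls; }
static void fakeLoadMatrix(const float*) { ++g_loadMatrixCalls; }

static const float kIdentity[16] = {
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1
};

// Exact element-wise comparison: a matrix that is only "almost" identity
// simply takes the normal upload path, which is still correct.
static bool isIdentity(const float* m) {
    for (int i = 0; i < 16; ++i) {
        if (m[i] != kIdentity[i]) return false;
    }
    return true;
}

// Assumed shape of the fix: take the cheap call when the matrix is identity,
// and upload all 16 floats only when it is not.
void setTransformSketch(const float* m) {
    if (isIdentity(m)) {
        fakeLoadIdentity();
    } else {
        fakeLoadMatrix(m);
    }
}
```

The 16 comparisons are CPU-side only, so the check is presumably cheap next to a driver call, but that is an assumption worth measuring.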

Re: Call overhead stats

Posted: Sun Dec 09, 2012 12:32 am
by hybrid
Yeah, these fixes will be the first ones to go into the engine once I come back to working on it.

Re: Call overhead stats

Posted: Sun Dec 09, 2012 3:23 pm
by devsh
I think a number of us tried to do an SSE vector but never actually used SSE correctly.

Re: Call overhead stats

Posted: Sun Dec 09, 2012 3:55 pm
by hendu
Oh, the actual calculations were faster. But vector3df's are very short-lived: you usually create one, do one or two calculations, and extract the components.

And the vector packing and unpacking overhead was more than what was gained from the faster calculation.
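To make the packing and unpacking concrete, here is a rough sketch (x86 only, plain SSE intrinsics). Vec3, addScalar and addSse are hypothetical names, not the vector3df code that was benchmarked, and what the compiler really emits depends on flags and inlining.

```cpp
#include <xmmintrin.h>  // SSE intrinsics, x86/x86-64 only

// Hypothetical three-float vector, standing in for vector3df.
struct Vec3 { float x, y, z; };

// Scalar version: three adds, trivially inlinable, no data shuffling.
inline Vec3 addScalar(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// SSE version: a single add instruction, but wrapped in pack/unpack steps.
inline Vec3 addSse(const Vec3& a, const Vec3& b) {
    // Pack: _mm_set_ps takes lanes from highest to lowest, so x ends up in lane 0.
    __m128 va = _mm_set_ps(0.0f, a.z, a.y, a.x);
    __m128 vb = _mm_set_ps(0.0f, b.z, b.y, b.x);
    __m128 sum = _mm_add_ps(va, vb);
    // Unpack: store all four lanes, then copy the three we care about.
    float out[4];
    _mm_storeu_ps(out, sum);
    return Vec3{out[0], out[1], out[2]};
}
```

For a single add the extra moves around _mm_add_ps can outweigh the saved arithmetic, which matches the result above. SSE tends to pay off when data stays in registers across many operations.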

Re: Call overhead stats

Posted: Mon Dec 10, 2012 1:09 am
by Nadro
This week I'll be back to Irrlicht development (I've been very busy lately), so patches from this thread will be integrated into the core soon.

Re: Call overhead stats

Posted: Thu Dec 13, 2012 6:56 pm
by Nadro
I merged into trunk the patches related to resetting materials, depth and matrices. I also sent a commit that improves shader constant handling. The OpenGL get-location call overhead is also fixed in this revision.

Now a constant ID is returned by getVertex/PixelShaderConstantID. After the shader is created, you should call this method only once for each constant. Once you have a constant ID, you pass it into setVertex/PixelShaderConstant.
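The usage pattern could look roughly like this. FakeShaderServices is a hypothetical mock, not the real Irrlicht interface, and its method signatures are assumptions. Only the two method names come from the description above; TintCallback and "uTint" are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock of the shader services; it only counts calls.
class FakeShaderServices {
public:
    explicit FakeShaderServices(std::vector<std::string> names)
        : names_(std::move(names)), lookups_(0), uploads_(0) {}

    // Name lookup: the comparatively expensive call, meant to run once.
    int getVertexShaderConstantID(const char* name) {
        ++lookups_;
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (names_[i] == name) return static_cast<int>(i);
        }
        return -1;  // unknown constant
    }

    // Upload by ID: no string handling on the per-frame path.
    bool setVertexShaderConstant(int id, const float* values, int count) {
        (void)values;
        (void)count;
        if (id < 0 || id >= static_cast<int>(names_.size())) return false;
        ++uploads_;
        return true;
    }

    int lookups() const { return lookups_; }
    int uploads() const { return uploads_; }

private:
    std::vector<std::string> names_;
    int lookups_;
    int uploads_;
};

// Hypothetical per-material callback that resolves the ID on first use
// and reuses it on every later frame.
class TintCallback {
public:
    TintCallback() : resolved_(false), tintId_(-1) {}

    void onSetConstants(FakeShaderServices& services) {
        if (!resolved_) {
            tintId_ = services.getVertexShaderConstantID("uTint");
            resolved_ = true;
        }
        const float tint[4] = {1.0f, 0.5f, 0.25f, 1.0f};
        services.setVertexShaderConstant(tintId_, tint, 4);
    }

private:
    bool resolved_;
    int tintId_;
};
```

Calling onSetConstants once per frame then costs one name lookup in total, plus one upload per frame.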

Re: Call overhead stats

Posted: Fri Dec 14, 2012 5:51 pm
by hendu
I see, thanks.

Note that the get*ConstantID function is still a bit inefficient - I would add a sort right after all names have been added, and then use binary_search in get*ConstantID. Still, having these on the user's side does free resources.
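A sketch of what I mean, with hypothetical names (ConstantEntry, ConstantTable); this is not the engine code. It uses std::lower_bound rather than std::binary_search, since binary_search only returns a bool and we need the matching entry.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: a uniform name and the location the driver reported.
struct ConstantEntry {
    std::string name;
    int location;
};

static bool byName(const ConstantEntry& a, const ConstantEntry& b) {
    return a.name < b.name;
}

class ConstantTable {
public:
    // Called for each uniform found after linking.
    void add(const std::string& name, int location) {
        entries_.push_back(ConstantEntry{name, location});
    }

    // Called once, right after all names have been added.
    void finalize() {
        std::sort(entries_.begin(), entries_.end(), byName);
    }

    // Binary search by name; returns the stored location, or -1 if absent.
    int find(const std::string& name) const {
        const ConstantEntry key{name, 0};
        std::vector<ConstantEntry>::const_iterator it =
            std::lower_bound(entries_.begin(), entries_.end(), key, byName);
        if (it != entries_.end() && it->name == name) return it->location;
        return -1;
    }

private:
    std::vector<ConstantEntry> entries_;
};
```

The returned value here is the stored location rather than the array index, so sorting does not change any ID that was already handed out.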

I do wonder whether user pushback will be big - after all, all other wrappers accept names.

Re: Call overhead stats

Posted: Sat Dec 15, 2012 12:15 pm
by Nadro
Hmmm... we only have to call get*ConstantID once, and I think the combination of sort + binary_search will be more expensive than the standard search method (the differences should be really small). Of course binary_search would be useful if get*ConstantID were called many times, but that's not required.

Of course I can add sort + binary_search to an interface, but I'm not sure if this is needed.

What about performance changes? I checked example no. 10 and saw a few more FPS than before :) I think that for heavily shader-based apps, the boost should be more visible.

Re: Call overhead stats

Posted: Sat Dec 15, 2012 1:09 pm
by hendu
The sort would only happen once, after shader linking?

Re: Call overhead stats

Posted: Sat Dec 15, 2012 1:25 pm
by Nadro
Yep, I know, but the time spent on the sort will be equal to one standard search pass for get*ConstantID, and after the sort you still need the binary search. That's why I compared sort + binary search vs. the standard search: both combinations should be called only once.

Re: Call overhead stats

Posted: Sat Dec 15, 2012 3:19 pm
by hendu
Huh, are we talking about the same thing?

Current cost for ten uniforms: 10 linear searches = 100 work units
Proposed cost: 1 sort + 10 binary searches = 10 + 23 = 33 work units