Call overhead stats
Re: Call overhead stats
This doesn't have much to do with shaders, it's about reducing the CPU overhead of irrlicht.
Ie, irrlicht was doing things it wouldn't have needed to do.
Books on optimization? Drepper has written several papers on the topic, they're freely available on his homepage.
I recommend the optimization tutorials 1 and 2, and "What every programmer should know about memory".
Re: Call overhead stats
Just to check in again: are any of these changes actually going to be applied to svn? I see they are just hanging there on the tracker - or are the main irr devs too busy at the moment?
Re: Call overhead stats
Yes, we had a major release just a few days ago. Development will start again soon.
Re: Call overhead stats
Somebody could start off with an SSE3 implementation of vector3df, or at least a matrix4 with 16-byte alignment (declspec!),
and drop a lot of setTransform() calls.
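For illustration, a minimal sketch of what such an aligned SSE3 type could look like (not an existing Irrlicht class - vec4_sse and its layout are made up here, and GCC would use __attribute__((aligned(16))) instead of __declspec):

#include <pmmintrin.h> // SSE3 intrinsics

// 16-byte aligned 4-float vector; the 4th lane is padding so a vector3df fits one register.
struct __declspec(align(16)) vec4_sse
{
    __m128 v;

    vec4_sse() : v(_mm_setzero_ps()) {}
    vec4_sse(float x, float y, float z) : v(_mm_set_ps(0.0f, z, y, x)) {}

    // One SSE addition handles all components at once.
    vec4_sse operator+(const vec4_sse& o) const
    {
        vec4_sse r;
        r.v = _mm_add_ps(v, o.v);
        return r;
    }
};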
Re: Call overhead stats
"SSE3 implementation of vector3df"
I've tried that. It was slower than the current, fully inline-able template version.

"and drop a lot of setTransform() calls"
I don't remember there being a lot of those in vain. The issue with those was calling setTransform(identitymatrix) not going through the fast path of glLoadIdentity, but instead uploading the identity matrix every time.
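The fast path in question is basically this (only a sketch of the idea, not the actual driver code; setWorldTransform is a made-up name, but core::matrix4 really does have isIdentity() and pointer()):

#include <irrlicht.h>
#include <GL/gl.h>

// Skip the full 16-float upload when the matrix is the identity.
void setWorldTransform(const irr::core::matrix4& mat)
{
    if (mat.isIdentity())
        glLoadIdentity();              // cheap fast path
    else
        glLoadMatrixf(mat.pointer());  // uploads the whole matrix every time
}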
Re: Call overhead stats
Yeah, these fixes will be the first ones to go into the engine once I come back to working on it.
Re: Call overhead stats
I think a number of us tried to do an SSE vector, but never actually used SSE correctly.
Re: Call overhead stats
Oh, the actual calculations were faster. But vector3df's are very short-lived: you usually create one, do one or two calculations, and extract the components.
And the vector packing and unpacking overhead was more than what was gained from the faster calculation.
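The pack/unpack cost looks roughly like this (illustrative only; addVec3 is a made-up name):

#include <pmmintrin.h>

// Typical short-lived vector3df use: scalar loads into the SSE register ("packing"),
// one cheap SSE operation, then scalar extraction again ("unpacking").
// The packing and unpacking easily cost more than the add saves.
void addVec3(const float a[3], const float b[3], float out[3])
{
    __m128 va = _mm_set_ps(0.0f, a[2], a[1], a[0]); // pack a
    __m128 vb = _mm_set_ps(0.0f, b[2], b[1], b[0]); // pack b
    __m128 vs = _mm_add_ps(va, vb);                 // the actual, fast calculation

    float tmp[4];
    _mm_storeu_ps(tmp, vs);                         // unpack
    out[0] = tmp[0]; out[1] = tmp[1]; out[2] = tmp[2];
}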
Re: Call overhead stats
This week I'll be back to Irrlicht development (I've been very busy lately), so the patches from this thread will be integrated into the core soon.
Library helping with network requests, tasks management, logger etc in desktop and mobile apps: https://github.com/GrupaPracuj/hermes
Re: Call overhead stats
I merged into trunk the patches related to resetting materials, depth and matrices. I also sent a commit which improves shader constant handling; the OpenGL get-location call overhead is fixed in this revision as well.
A constant ID is now returned by getVertex/PixelShaderConstantID. After the shader is created, you should call this method only once for each constant. Once you have a constant ID, you pass it into setVertex/PixelShaderConstant.
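If I read the change right, the usage pattern in a shader callback becomes something like this (a sketch based on the description above, exact signatures may differ slightly; "mWorldViewProj" is just an example name):

#include <irrlicht.h>

class MyShaderCallBack : public irr::video::IShaderConstantSetCallBack
{
    irr::s32 WorldViewProjID;
    bool FirstUpdate;

public:
    MyShaderCallBack() : WorldViewProjID(-1), FirstUpdate(true) {}

    virtual void OnSetConstants(irr::video::IMaterialRendererServices* services, irr::s32 userData)
    {
        if (FirstUpdate) // query the ID only once, after the shader has been created
        {
            WorldViewProjID = services->getVertexShaderConstantID("mWorldViewProj");
            FirstUpdate = false;
        }

        irr::core::matrix4 worldViewProj; // fill with the real transform as usual...
        services->setVertexShaderConstant(WorldViewProjID, worldViewProj.pointer(), 16);
    }
};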
Re: Call overhead stats
I see, thanks.
Note that the get*ConstantID function is still a bit inefficient - I would add a sort right after all names have been added, and then use binary_search in get*ConstantID. Still, having these on the user's side does free resources.
I do wonder whether user pushback will be big - after all, all other wrappers accept names.
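Something along these lines, in plain C++ for illustration (a sketch of the suggestion, not the actual driver code; UniformEntry/UniformTable are made-up names):

#include <algorithm>
#include <string>
#include <vector>

// Collect the uniform names once after linking, sort them,
// then resolve IDs with a binary search instead of a linear scan.
struct UniformEntry
{
    std::string name;
    int location; // as returned by glGetUniformLocation at link time

    bool operator<(const UniformEntry& o) const { return name < o.name; }
};

struct UniformTable
{
    std::vector<UniformEntry> entries;

    void finishedAddingNames() // call once, right after all names have been added
    {
        std::sort(entries.begin(), entries.end());
    }

    int getConstantID(const char* name) const
    {
        UniformEntry key;
        key.name = name;
        std::vector<UniformEntry>::const_iterator it =
            std::lower_bound(entries.begin(), entries.end(), key);
        if (it != entries.end() && it->name == key.name)
            return (int)(it - entries.begin());
        return -1; // not found
    }
};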
Re: Call overhead stats
Hmmm... we have to call get*ConstantID only once, and I think the sort + binary_search combination will be more expensive than the standard search method (the differences should be really small). Of course binary_search would be useful if get*ConstantID were called many times, but that isn't required.
Of course I can add sort + binary_search to the interface, but I'm not sure this is needed.
As for performance changes: I checked example no. 10 and saw a few FPS more than before. I think that for heavily shader-based apps the boost should be more visible.
Re: Call overhead stats
The sort would only happen once, after shader linking?
Re: Call overhead stats
Yep, I know, but the time spent on the sort will be about equal to one standard search pass of get*ConstantID, and after the sort you still need the binary searches. That's why I compared sort + binary search against the standard search - both combinations are run only once.
Re: Call overhead stats
Huh, are we talking about the same thing?
Current cost for ten uniforms: 10 linear searches = 100 work units
Proposed cost: 1 sort + 10 binary searches = 10 + 23 = 33 work units