Call overhead stats

fmx

Re: Call overhead stats

Post by fmx »

The uniform cache and these misc material state optimisations should REALLY help to boost performance on mobile platforms.
Irrlicht has far too much happening under the hood at the moment, which is why I opted for my own rendering engine for commercial iOS projects.
I'd love to use Irrlicht more though, can't wait for the next couple of Irrlicht versions to come out.

Keep up the great work guys :D
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

You also get big fps gains by optimizing your shaders with Mesa's GLSL optimizer, as most mobile drivers suck. Wouldn't surprise me in the least if state setting was done badly there, too.
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

Making that change didn't seem to break anything, as long as the correct lastMaterial was also passed (the current call passes the current material as both the current and the old material, obviously breaking the comparison). Perhaps even 1.8 material?

Patch: https://sourceforge.net/tracker/?func=d ... tid=540678

Will remeasure the call overhead.
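
To illustrate why the lastMaterial argument matters, here is a minimal sketch of state diffing. The `Material` struct, `applyMaterial` and the call counter are hypothetical stand-ins, not Irrlicht's actual code, and the GL calls are only named in comments:

```cpp
// Hypothetical, trimmed-down stand-in for a material; not Irrlicht's SMaterial.
struct Material {
    bool depthWrite;
    bool backfaceCulling;
    int  blendMode;
};

// Counts how many state-setting calls would have been issued.
static int g_stateCalls = 0;

// Sends only the states that differ between `last` and `current`.
// With resetAll = true every state is re-sent, whether it changed or not.
void applyMaterial(const Material& current, const Material& last, bool resetAll) {
    if (resetAll || current.depthWrite != last.depthWrite)
        ++g_stateCalls;  // real code: glDepthMask(...)
    if (resetAll || current.backfaceCulling != last.backfaceCulling)
        ++g_stateCalls;  // real code: glEnable/glDisable(GL_CULL_FACE)
    if (resetAll || current.blendMode != last.blendMode)
        ++g_stateCalls;  // real code: glBlendFunc(...)
}
```

If the same material is passed as both `current` and `last`, the comparison can never see a difference, so the caller is forced into `resetAll = true` to stay correct. Passing the real previous material lets the reset be dropped, and only changed states cost a call.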
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

Total number of calls in one frame: 17557 (~17.5k!)
Average call took: 0.260 usec
Total CPU time spent in these calls: 4559 usec (~4.5ms)

----

All calls, without draw calls took: 2598.2 usec (~2.6ms)

----

Call count: http://pastebin.com/RjirLzDR

----

So in total, removing that unnecessary state reset saved 2.5k gl calls, totalling 0.5ms savings.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany

Re: Call overhead stats

Post by hybrid »

Yeah, sounds like a bug then, or at least a very suboptimal implementation. I'll keep this thread here, though, for general discussions of these optimizations. The patch will be considered from the tracker anyway, so we don't need another thread in the bug forum.
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

Added a nice uniform cache. It removed 215 calls to GetUniformLocation, that is, all of them for this late frame. The total calls dropped to 17342 for the frame.

It has no collisions with my app, where the largest shader has 14 uniforms. Will see how easily it patches to svn trunk.
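
The patch itself is not shown in this thread, so the following is only a rough sketch of the general idea of a uniform location cache, not hendu's implementation. It assumes a `std::unordered_map` keyed by uniform name, and the `Lookup` callback is a hypothetical stand-in for glGetUniformLocation so the sketch runs without a GL context:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Caches uniform locations for one shader program so the driver is asked
// at most once per uniform name.
class UniformLocationCache {
public:
    // Stand-in for glGetUniformLocation(program, name).
    using Lookup = std::function<int(const std::string&)>;

    explicit UniformLocationCache(Lookup lookup) : lookup_(std::move(lookup)) {}

    int location(const std::string& name) {
        auto it = cache_.find(name);
        if (it != cache_.end())
            return it->second;          // cache hit: no driver call
        const int loc = lookup_(name);  // cache miss: one driver call
        cache_.emplace(name, loc);      // a result of -1 (not found) is cached too
        return loc;
    }

private:
    Lookup lookup_;
    std::unordered_map<std::string, int> cache_;
};
```

A map that chains on collisions cannot return the wrong location, at the cost of hashing the name on every lookup.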
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

Backporting the SetWrapMode skip check removed another 1466 calls. This one was already in trunk.
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

Backported another active texture check in setbasicrendermodes, a couple k calls gone.

Then for active matters, I had 249 calls to glDepthMask. So I added tracking for that, now it's 2 - a drop of 247 calls.

Current gl call count is at 11183, 56% of where we started from two days ago.

---

CPU overhead currently stands at 3.9ms total, 2.0ms without draw calls. About one ms shaved off of each.
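
The glDepthMask tracking mentioned above is a redundant-state filter. A minimal sketch, assuming a hypothetical `DepthMaskTracker` wrapper (not the actual patch) that counts calls instead of touching GL:

```cpp
// Remembers the last depth-mask value and skips calls that would not change it.
class DepthMaskTracker {
public:
    void set(bool enabled) {
        if (known_ && enabled == current_)
            return;        // same value as last time: skip the GL call
        current_ = enabled;
        known_ = true;
        ++issued_;         // real code would call glDepthMask(enabled) here
    }

    // Number of calls that actually reached the driver.
    int issued() const { return issued_; }

private:
    bool current_ = true;
    bool known_ = false;   // driver state is unknown until the first call
    int  issued_ = 0;
};
```

The tracker is only valid if every depth-mask change goes through it; any code that calls glDepthMask directly would leave the cached value stale.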
Nadro
Posts: 1648
Joined: Sun Feb 19, 2006 9:08 am
Location: Warsaw, Poland

Re: Call overhead stats

Post by Nadro »

For the uniform cache, a better solution is to add an integer member to SUniformInfo, called e.g. CachedLocation, with an initial value of -1. In setPixelShaderConstant we would then use it like this:

Code: Select all

GLint Location = UniformInfo[i].CachedLocation;
 
if (Location < 0)
{
    if (Program2)
        Location = Driver->extGlGetUniformLocation(Program2,name);
    else
        Location = Driver->extGlGetUniformLocationARB(Program,name);
 
    UniformInfo[i].CachedLocation = Location;
}

With this implementation the interface stays cleaner, and it is also faster, because we access the value directly instead of iterating over a const char*.

Thanks for all patches and interest in the subject.
Library helping with network requests, tasks management, logger etc in desktop and mobile apps: https://github.com/GrupaPracuj/hermes
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

You're right there, let's do it that way.

edit: Patch updated.

Though, now that I took a look at the uniforminfo, that is a linear search. For every uniform update, there are up to N string compares, where N is how many uniforms you have.

Should probably be replaced with a map (RB tree). Or sorted after we know that all uniforms have been added there.
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

Updated patch with sort + binary search. Yay SVN and no way to do incremental patches.
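
For reference, a minimal sketch of the sort + binary search approach using the standard library. The `UniformInfo` struct and function names here are hypothetical stand-ins, not the code from the patch:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-uniform record discussed above.
struct UniformInfo {
    std::string name;
    int cachedLocation;  // -1 until the location has been resolved
};

// Sort once, after all uniforms have been collected.
void sortUniforms(std::vector<UniformInfo>& uniforms) {
    std::sort(uniforms.begin(), uniforms.end(),
              [](const UniformInfo& a, const UniformInfo& b) { return a.name < b.name; });
}

// Returns the index of `name`, or -1 if absent. The vector must be sorted.
int findUniform(const std::vector<UniformInfo>& uniforms, const std::string& name) {
    auto it = std::lower_bound(uniforms.begin(), uniforms.end(), name,
                               [](const UniformInfo& u, const std::string& n) { return u.name < n; });
    if (it == uniforms.end() || it->name != name)
        return -1;
    return static_cast<int>(it - uniforms.begin());
}
```

This brings each lookup down from up to N string compares to about log2(N), which only pays off once a shader has more than a handful of uniforms.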
Nadro
Posts: 1648
Joined: Sun Feb 19, 2006 9:08 am
Location: Warsaw, Poland

Re: Call overhead stats

Post by Nadro »

Thanks. I also have an idea for extending the cache system: eliminate the search for constant values (many values, like texture IDs, light colors etc., are constant for a given shader and don't change between frames):

Code: Select all

const s32 i = UniformInfo.binary_search(target);

I think we could extend IMaterialRendererServices with, e.g.:

Code: Select all

/* This function removes the uniform from the UniformInfo list and moves it to CachedUniformInfo. Each CachedUniformInfo element also stores the float/int/bool value assigned to its uniform. In COpenGLSLMaterialRenderer::OnRender, all CachedUniformInfo elements are uploaded to the GPU. */
virtual bool addCachedPixelShaderConstant(const c8* name, const s32* ints, int count) = 0;

To use addCachedPixelShaderConstant efficiently, we should also extend IShaderConstantSetCallBack, e.g. with:

Code: Select all

/* This function is called only once, after the material is created; addCachedPixelShaderConstant calls belong here. */
virtual void OnSetCachedConstants(video::IMaterialRendererServices* services, s32 userData) = 0;

With a smaller UniformInfo array, setting values for dynamic uniforms will be faster (fewer elements for binary_search to go through). I think this is a good optimization for shaders with many uniforms, and it doesn't complicate the existing interface too much. I will prepare a patch soon. What do you think about this optimization? Personally I think that hendu's patches that don't change the existing interface could be applied to v1.8, but I'm a relatively new developer on the team, so Hybrid and CuteAlien should decide about it :)
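
A rough sketch of how the proposed split between cached and dynamic uniforms could behave. `ShaderUniforms`, `CachedUniform` and their members are hypothetical names for illustration only; no patch exists yet, and the GPU upload is replaced by a counter:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for a uniform whose value is stored with it.
struct CachedUniform {
    std::string name;
    std::vector<int> values;
};

class ShaderUniforms {
public:
    void addDynamic(const std::string& name) { dynamic_.push_back(name); }

    // Moves `name` from the dynamic list to the cached list and stores its value.
    // Returns false if the uniform is not in the dynamic list.
    bool addCachedConstant(const std::string& name, const int* ints, int count) {
        auto it = std::find(dynamic_.begin(), dynamic_.end(), name);
        if (it == dynamic_.end())
            return false;
        dynamic_.erase(it);
        cached_.push_back(CachedUniform{name, std::vector<int>(ints, ints + count)});
        return true;
    }

    // Called once per render: every cached value is uploaded without a name search.
    int uploadCached() const {
        int uploads = 0;
        for (const CachedUniform& u : cached_) {
            (void)u;    // real code would call glUniform*() with u.values here
            ++uploads;
        }
        return uploads;
    }

    // Size of the list that per-frame lookups still have to search.
    std::size_t dynamicCount() const { return dynamic_.size(); }

private:
    std::vector<std::string> dynamic_;
    std::vector<CachedUniform> cached_;
};
```

Note that in this sketch the cached values are re-uploaded on every render, so it saves the name lookup but not the upload call itself.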
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Call overhead stats

Post by hendu »

I don't think it's irr's place to decide that. It should be up to the app to decide when to upload what.

For example, in my app I send some uniforms only once (texture sampler slots), some per frame (matrices), some per phase, etc etc.
Setting up one system would be far too limiting, and trying to enumerate all possibilities in an API would be impossible.