Call overhead stats
Re: Call overhead stats
The uniform cache and these misc material state optimisations should REALLY help to boost performance on mobile platforms.
Irrlicht has far too much happening under the hood at the moment, which is why I opted for my own rendering engine for commercial iOS projects.
I'd love to use Irrlicht more though; can't wait for the next couple of Irrlicht versions to come out.
Keep up the great work, guys.
Re: Call overhead stats
You also get big FPS gains by optimizing your shaders with Mesa's GLSL optimizer, as most mobile drivers suck. It wouldn't surprise me in the least if state setting was done badly there, too.
Re: Call overhead stats
Making that change didn't seem to break anything, as long as the correct lastMaterial is also passed (the current call passes the current material as both the current and the old material, obviously breaking the comparison). Perhaps even 1.8 material?
Patch: https://sourceforge.net/tracker/?func=d ... tid=540678
Will remeasure the call overhead.
Re: Call overhead stats
Total number of calls in one frame: 17557 (~17.5k!)
Average call took: 0.260 usec
Total CPU time spent in these calls: 4559 usec (~4.5ms)
----
All calls, without draw calls took: 2598.2 usec (~2.6ms)
----
Call count: http://pastebin.com/RjirLzDR
----
So in total, removing that unnecessary state reset saved 2.5k gl calls, totalling 0.5ms savings.
Re: Call overhead stats
Yeah, sounds like a bug then, or at least a very suboptimal implementation. I'll keep this thread here, though, for general discussion of these optimizations. The patch will be considered from the tracker anyway, so we don't need another thread in the bug forum.
Re: Call overhead stats
Added a nice uniform cache. It removed 215 calls to GetUniformLocation, that is, all of them for this late frame. The total calls dropped to 17342 for the frame.
It has no collisions in my app, where the largest shader has 14 uniforms. Will see how easily it patches to SVN trunk.
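To illustrate the general idea of a uniform location cache, here is a minimal sketch. It is not the patch from the tracker: FakeProgram, UniformLocationCache and the uniform names are hypothetical, and the driver query (glGetUniformLocation in real code) is replaced by a stand-in that only counts calls. A std::unordered_map is assumed for the name lookup.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the driver query (glGetUniformLocation in real code).
// It counts calls so the effect of the cache can be observed.
struct FakeProgram {
    std::unordered_map<std::string, int> locations; // name -> location
    int queryCount = 0;

    int getUniformLocation(const std::string& name) {
        ++queryCount;
        auto it = locations.find(name);
        return it == locations.end() ? -1 : it->second;
    }
};

// Caches name -> location for one program, so each name is queried at most once.
class UniformLocationCache {
public:
    explicit UniformLocationCache(FakeProgram& program) : Program(program) {}

    int location(const std::string& name) {
        auto it = Cache.find(name);
        if (it != Cache.end())
            return it->second;      // cache hit: no driver call
        const int loc = Program.getUniformLocation(name);
        Cache.emplace(name, loc);   // also remembers -1 (unknown uniform)
        return loc;
    }

private:
    FakeProgram& Program;
    std::unordered_map<std::string, int> Cache;
};

// Usage sketch: three lookups of two names cost only two driver queries.
int demoDriverQueries() {
    FakeProgram program;
    program.locations = {{"mWorldViewProj", 0}, {"mTexture", 1}};
    UniformLocationCache cache(program);
    cache.location("mWorldViewProj");
    cache.location("mTexture");
    const int again = cache.location("mWorldViewProj"); // served from the cache
    assert(again == 0);
    return program.queryCount;
}
```

After the first frame every lookup is a cache hit, which is consistent with all GetUniformLocation calls disappearing from a later frame.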
Re: Call overhead stats
Backporting the SetWrapMode skip check removed another 1466 calls. This one was already in trunk.
Re: Call overhead stats
Backported another active texture check in setbasicrendermodes, a couple k calls gone.
Then for active matters, I had 249 calls to glDepthMask. So I added tracking for that, now it's 2 - a drop of 247 calls.
Current gl call count is at 11183, 56% of where we started from two days ago.
---
CPU overhead currently stands at 3.9ms total, 2.0ms without draw calls. About one ms shaved off of each.
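The kind of tracking described for glDepthMask can be sketched roughly as follows. This is an illustration under assumptions, not the actual change: DepthMaskTracker and fakeGlDepthMask are hypothetical names, and the GL entry point is replaced by a counter so the example runs without a GL context.

```cpp
#include <cassert>

// Hypothetical stand-in for the GL entry point; it only counts how often it is called.
static int g_driverCalls = 0;
static void fakeGlDepthMask(bool /*enabled*/) { ++g_driverCalls; }

// Remembers the last depth-mask value sent to the driver and skips repeats.
class DepthMaskTracker {
public:
    void set(bool enabled) {
        if (Known && enabled == Current)
            return;             // state already set, skip the driver call
        fakeGlDepthMask(enabled);
        Current = enabled;
        Known = true;
    }

    // Call when code outside the tracker may have changed the GL state.
    void invalidate() { Known = false; }

private:
    bool Current = false;
    bool Known = false;         // false until the first set(); no driver default is assumed
};

// Usage sketch: 249 requests, of which only two actually change the value.
int demoDepthMaskCalls() {
    g_driverCalls = 0;
    DepthMaskTracker tracker;
    for (int i = 0; i < 249; ++i)
        tracker.set(i < 200);   // true for the first 200 requests, false afterwards
    assert(g_driverCalls <= 249);
    return g_driverCalls;
}
```

The request pattern in the demo is made up; it only mirrors the 249 to 2 drop mentioned above. The important part is invalidate(), since a tracker like this goes stale if anything else touches the state behind its back.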
Re: Call overhead stats
A better way to cache uniforms is to add an integer value to SUniformInfo, called e.g. CachedLocation, with an initial value of -1. In setPixelShaderConstant we would then just use it like this:
Code: Select all
GLint Location = UniformInfo[i].CachedLocation;
if (Location < 0)
{
    if (Program2)
        Location = Driver->extGlGetUniformLocation(Program2, name);
    else
        Location = Driver->extGlGetUniformLocationARB(Program, name);
    UniformInfo[i].CachedLocation = Location;
}
With this implementation the interface stays cleaner and the lookup is faster, because we access the value directly instead of iterating over a const char*.
Thanks for all the patches and the interest in the subject.
Library helping with network requests, tasks management, logger etc in desktop and mobile apps: https://github.com/GrupaPracuj/hermes
Re: Call overhead stats
You're right there, let's do it that way.
edit: Patch updated.
Though, now that I've taken a look at the UniformInfo lookup, it is a linear search. For every uniform update there are up to N string compares, where N is the number of uniforms you have.
Should probably be replaced with a map (RB tree). Or sorted after we know that all uniforms have been added there.
Re: Call overhead stats
Updated patch with sort + binary search. Yay SVN and no way to do incremental patches.
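For readers who want to see the sort + binary search approach in isolation, here is a minimal sketch using the standard library. It is not the code from the patch: UniformEntry, sortUniforms and findUniform are hypothetical names, and std::sort with std::lower_bound stands in for whatever Irrlicht's own array type provides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an entry in the uniform list.
struct UniformEntry {
    std::string name;
    int location;
};

static bool byName(const UniformEntry& a, const UniformEntry& b) {
    return a.name < b.name;
}

// Sort once, after all uniforms of a program have been collected.
void sortUniforms(std::vector<UniformEntry>& uniforms) {
    std::sort(uniforms.begin(), uniforms.end(), byName);
}

// Binary search on the sorted list; returns the index, or -1 if the name is unknown.
int findUniform(const std::vector<UniformEntry>& sorted, const std::string& name) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), name,
        [](const UniformEntry& e, const std::string& n) { return e.name < n; });
    if (it == sorted.end() || it->name != name)
        return -1;
    return static_cast<int>(it - sorted.begin());
}

// Usage sketch: look up one uniform by name and return its location.
int demoFindLocation() {
    std::vector<UniformEntry> uniforms = {
        {"mWorld", 7}, {"mInvWorld", 3}, {"mLightColor", 5}};
    sortUniforms(uniforms);
    assert(std::is_sorted(uniforms.begin(), uniforms.end(), byName));
    const int index = findUniform(uniforms, "mLightColor");
    return index < 0 ? -1 : uniforms[index].location;
}
```

With the list sorted, a lookup needs about log2(N) string compares instead of up to N, at the cost of one sort after the uniforms have been added.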
Re: Call overhead stats
Thanks. I also have an idea for extending the cache system: eliminate the search for constant values (many values, like texture IDs, light colors etc., are constant for a given shader and don't change in subsequent frames):
Code: Select all
const s32 i = UniformInfo.binary_search(target);
I think we can extend IMaterialRendererServices by e.g.:
Code: Select all
/* This function removes the uniform from the UniformInfo list and moves it to CachedUniformInfo. Each CachedUniformInfo element also stores the float/int/bool value assigned to that uniform. In the COpenGLSLMaterialRenderer::OnRender method all CachedUniformInfo elements should be uploaded to the GPU. */
virtual bool addCachedPixelShaderConstant(const c8* name, const s32* ints, int count) = 0;
To use addCachedPixelShaderConstant efficiently we should also extend IShaderConstantSetCallBack, e.g. by:
Code: Select all
/* This function is called only once, after the material is created; the addCachedPixelShaderConstant calls should go here. */
virtual void OnSetCachedConstants(video::IMaterialRendererServices* services, s32 userData) = 0;
When the UniformInfo array is smaller, setting values for dynamic uniforms will be faster (fewer elements for binary_search to go through). I think this is a good optimization for shaders with many uniforms, and it doesn't complicate the existing interface too much. I will prepare a patch soon. What do you think about this optimization? Personally I think that Handu's patches that don't change the existing interface can be applied to v1.8, but I'm a relatively new developer on the team, so Hybrid and CuteAlien should decide about it.
Re: Call overhead stats
I don't think it's irr's place to decide that. It should be up to the app to decide when to upload what.
For example, in my app I send some uniforms only once (texture sampler slots), some per frame (matrices), some per phase, etc etc.
Setting up one system would be far too limiting, and trying to enumerate all possibilities in an API would be impossible.