Call overhead stats
Re: Call overhead stats
This doesn't have much to do with shaders, it's about reducing the CPU overhead of irrlicht.
Ie, irrlicht was doing things it wouldn't have needed to do.
Books on optimization? Drepper has written several papers on the topic, they're freely available on his homepage.
I recommend the optimization tutorials 1 and 2, and "What every programmer should know about memory".
Re: Call overhead stats
Just to check in again: are any of these changes actually going to be applied to svn? I see they are just hanging there on the tracker - or are the main irr devs too busy at the moment?
Re: Call overhead stats
Yes, we had a major release just a few days ago. Development will start again soon.
Re: Call overhead stats
Somebody could start off with an SSE3 implementation of vector3df, or at least a matrix4 with 16-byte alignment (declspec!),
and drop a lot of setTransform() calls.
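For illustration, a minimal sketch of what such an aligned SSE3 type could look like (not an existing Irrlicht class - vec4_sse and its layout are made up here, and GCC would use __attribute__((aligned(16))) instead of __declspec):

#include <pmmintrin.h> // SSE3 intrinsics

// 16-byte aligned 4-float vector; the 4th lane is padding so a vector3df fits one register.
struct __declspec(align(16)) vec4_sse
{
    __m128 v;

    vec4_sse() : v(_mm_setzero_ps()) {}
    vec4_sse(float x, float y, float z) : v(_mm_set_ps(0.0f, z, y, x)) {}

    // One SSE addition handles all components at once.
    vec4_sse operator+(const vec4_sse& o) const
    {
        vec4_sse r;
        r.v = _mm_add_ps(v, o.v);
        return r;
    }
};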
Re: Call overhead stats
"SSE3 implementation of vector3df"
I've tried that. It was slower than the current, fully inline-able template version.

"and drop a lot of setTransform() calls"
I don't remember there being a lot of those in vain. The issue with those was calling setTransform(identitymatrix) not going through the fast path of glLoadIdentity, but instead uploading the identity matrix every time.
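The fast path in question is basically this (only a sketch of the idea, not the actual driver code; setWorldTransform is a made-up name, but core::matrix4 really does have isIdentity() and pointer()):

#include <irrlicht.h>
#include <GL/gl.h>

// Skip the full 16-float upload when the matrix is the identity.
void setWorldTransform(const irr::core::matrix4& mat)
{
    if (mat.isIdentity())
        glLoadIdentity();              // cheap fast path
    else
        glLoadMatrixf(mat.pointer());  // uploads the whole matrix every time
}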
Re: Call overhead stats
Yeah, these fixes will be the first ones to go into the engine once I come back to working on it.
Re: Call overhead stats
I think a number of us tried to do an SSE vector, but never actually used SSE correctly.
Re: Call overhead stats
Oh, the actual calculations were faster. But vector3df's are very short-lived: you usually create one, do one or two calculations, and extract the components.
And the vector packing and unpacking overhead was more than what was gained from the faster calculation.
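The pack/unpack cost looks roughly like this (illustrative only; addVec3 is a made-up name):

#include <pmmintrin.h>

// Typical short-lived vector3df use: scalar loads into the SSE register ("packing"),
// one cheap SSE operation, then scalar extraction again ("unpacking").
// The packing and unpacking easily cost more than the add saves.
void addVec3(const float a[3], const float b[3], float out[3])
{
    __m128 va = _mm_set_ps(0.0f, a[2], a[1], a[0]); // pack a
    __m128 vb = _mm_set_ps(0.0f, b[2], b[1], b[0]); // pack b
    __m128 vs = _mm_add_ps(va, vb);                 // the actual, fast calculation

    float tmp[4];
    _mm_storeu_ps(tmp, vs);                         // unpack
    out[0] = tmp[0]; out[1] = tmp[1]; out[2] = tmp[2];
}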
Re: Call overhead stats
This week I'll be back to Irrlicht development (I've been very busy lately), so the patches from this thread will be integrated into the core soon.
Library helping with network requests, tasks management, logger etc in desktop and mobile apps: https://github.com/GrupaPracuj/hermes
Re: Call overhead stats
I merged into trunk the patches related to resetting materials, depth and matrices. I also sent a commit which improves shader constant handling; the OpenGL get-location call overhead is fixed in this revision as well.
A constant ID is now returned by getVertex/PixelShaderConstantID. After the shader is created, you should call this method only once for each constant. Once you have a constant ID, you pass it into setVertex/PixelShaderConstant.
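If I read the change right, the usage pattern in a shader callback becomes something like this (a sketch based on the description above, exact signatures may differ slightly; "mWorldViewProj" is just an example name):

#include <irrlicht.h>

class MyShaderCallBack : public irr::video::IShaderConstantSetCallBack
{
    irr::s32 WorldViewProjID;
    bool FirstUpdate;

public:
    MyShaderCallBack() : WorldViewProjID(-1), FirstUpdate(true) {}

    virtual void OnSetConstants(irr::video::IMaterialRendererServices* services, irr::s32 userData)
    {
        if (FirstUpdate) // query the ID only once, after the shader has been created
        {
            WorldViewProjID = services->getVertexShaderConstantID("mWorldViewProj");
            FirstUpdate = false;
        }

        irr::core::matrix4 worldViewProj; // fill with the real transform as usual...
        services->setVertexShaderConstant(WorldViewProjID, worldViewProj.pointer(), 16);
    }
};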
Re: Call overhead stats
I see, thanks.
Note that the get*ConstantID function is still a bit inefficient - I would add a sort right after all names have been added, and then use binary_search in get*ConstantID. Still, having these on the user's side does free resources.
I do wonder whether user pushback will be big - after all, all other wrappers accept names.
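Something along these lines, in plain C++ for illustration (a sketch of the suggestion, not the actual driver code; UniformEntry/UniformTable are made-up names):

#include <algorithm>
#include <string>
#include <vector>

// Collect the uniform names once after linking, sort them,
// then resolve IDs with a binary search instead of a linear scan.
struct UniformEntry
{
    std::string name;
    int location; // as returned by glGetUniformLocation at link time

    bool operator<(const UniformEntry& o) const { return name < o.name; }
};

struct UniformTable
{
    std::vector<UniformEntry> entries;

    void finishedAddingNames() // call once, right after all names have been added
    {
        std::sort(entries.begin(), entries.end());
    }

    int getConstantID(const char* name) const
    {
        UniformEntry key;
        key.name = name;
        std::vector<UniformEntry>::const_iterator it =
            std::lower_bound(entries.begin(), entries.end(), key);
        if (it != entries.end() && it->name == key.name)
            return (int)(it - entries.begin());
        return -1; // not found
    }
};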
Re: Call overhead stats
Hmmm... we have to call get*ConstantID only once, and I think the sort + binary_search combination will be more expensive than the standard search method (the differences should be really small). Of course binary_search would be useful if get*ConstantID were called many times, but that isn't required.
Of course I can add sort + binary_search to the interface, but I'm not sure this is needed.
As for performance changes: I checked example no. 10 and saw a few FPS more than before. I think that for heavily shader-based apps the boost should be more visible.
Re: Call overhead stats
The sort would only happen once, after shader linking?
Re: Call overhead stats
Yep, I know, but the time spent on the sort will be about equal to one standard search pass of get*ConstantID, and after the sort you still need the binary searches. That's why I compared sort + binary search against the standard search - both combinations are run only once.
Re: Call overhead stats
Huh, are we talking about the same thing?
Current cost for ten uniforms: 10 linear searches = 100 work units
Proposed cost: 1 sort + 10 binary searches = 10 + 23 = 33 work units