Strange problem with GLSL

Posted: Mon Feb 25, 2008 12:43 pm
by arras
I ran into a strange problem, which I can demonstrate with this test shader:

vertex shader code (file "shader_v.glsl"):

Code:

uniform int Lighting;

void main()
{
   vec4 color = vec4(0,0,0,1);

   int a = 0;
   int b = 5;

   if(Lighting == 0)
   {

      for(int i=0; i<5; i++)
      {
         color.r += 0.1;
      }

   }

   gl_FrontColor = color;

   gl_Position = ftransform();
}
fragment shader code (file "shader_f.glsl"):

Code:

void main()
{
   gl_FragColor = gl_Color;
}
And finally, simple Irrlicht code that applies the shader to a sphere:

Code:

#include <irrlicht.h>
using namespace irr;

int main()
{
   IrrlichtDevice *Device = createDevice(video::EDT_OPENGL,
      core::dimension2d<s32>(640, 480), 16, false, false, false, NULL);

   video::IVideoDriver* driver = Device->getVideoDriver();
   scene::ISceneManager* smgr = Device->getSceneManager();
   gui::IGUIEnvironment* guienv = Device->getGUIEnvironment();

   video::IGPUProgrammingServices *services = driver->getGPUProgrammingServices();

   s32 CMT_SHADER = services->addHighLevelShaderMaterialFromFiles(
		"shader_v.glsl", "main", video::EVST_VS_1_1,
      "shader_f.glsl", "main", video::EPST_PS_1_1,
      0, video::EMT_SOLID);

   scene::ISceneNode *node = smgr->addSphereSceneNode(5, 16);
   node->setMaterialType((video::E_MATERIAL_TYPE)CMT_SHADER);

   scene::ICameraSceneNode* camera = smgr->addCameraSceneNode(0, core::vector3df(0,0,-20));

   gui::IGUIStaticText *text = guienv->addStaticText(L"", core::rect<s32>(10,10,200,200));

   while(Device->run())
   {
      driver->beginScene(true, true, video::SColor(0,100,100,150));

      core::stringw tmp("FPS: ");
      tmp += driver->getFPS();
      text->setText(tmp.c_str());

      smgr->drawAll();
      guienv->drawAll();

      driver->endScene();
   }

   Device->drop();

   return 0;
}
It does nothing special; it's just for testing my problem, which seems somehow related to combining a uniform variable with if() and for() loops. The code runs fine like this.

However, if I replace the vertex shader code with:

Code:

uniform int Lighting;

void main()
{
   vec4 color = vec4(0,0,0,1);

   int a = 0;
   int b = 5;

   if(Lighting == 0)
   {

      for(int i=0; i<b; i++)
      {
         color.r += 0.1;
      }

   }

   gl_FrontColor = color;

   gl_Position = ftransform();
}
FPS suddenly drops from 1900 to 44! That is something I don't understand. I spent some hours reading the GLSL docs and testing, but did not find any reason for such behavior.

Here is another combination of code which causes the same FPS drop:

Code:

uniform int Lighting;

void main()
{
   vec4 color = vec4(0,0,0,1);

   int a = Lighting;
   int b = 5;

   if(a == 0)
   {

      for(int i=0; i<b; i++)
      {
         color.r += 0.1;
      }

   }

   gl_FrontColor = color;

   gl_Position = ftransform();
}
???!

On the other hand, this code works fine again:

Code:

uniform int Lighting;

void main()
{
   vec4 color = vec4(0,0,0,1);

   int a = 0;
   int b = 5;

   if(a == 0)
   {

      for(int i=0; i<b; i++)
      {
         color.r += 0.1;
      }

   }

   gl_FrontColor = color;

   gl_Position = ftransform();
}
I don't see any reason for such huge FPS differences. Does anybody have an idea of what's going on? ...Or can somebody at least verify the same behavior on their machine?

Posted: Mon Feb 25, 2008 1:13 pm
by xDan
Hmm, is your graphics card quite old?

With my old card, if I put over a certain amount of code in it (e.g. too many loop iterations, as the loops apparently get unrolled) then it can't handle it and goes into software mode... Which is indeed very slow.

Maybe assigning a uniform variable to "a" takes some more instructions than assigning a constant 0...

You could try decreasing the number of iterations of the loop, see if that helps.

Posted: Mon Feb 25, 2008 1:18 pm
by BlindSide
Very simple: you are using a non-const value for the for loop, so the driver has to recompile the shader EVERY FRAME. That's the reason for your performance drop.

Shader compilers like to do stuff like loop unrolling, to make the shader performance better. That's why, when a uniform value is changed, it has to recompile the entire shader.

I can see that in the last example "b" works fine without recompiling. That is probably because for some reason the compiler decided not to use loop unrolling on that particular shader. They most likely decide whether to do this based on many complex factors. The "if(a == 0)" will probably be removed by the compiler as it is redundant, so maybe it thought that since it is a less complex shader than the others, it does not need such serious optimisations.

Look into the driver docs to see if any vendors provide an extension to explicitly disable loop unrolling in shader compilation.

Cheers

Posted: Mon Feb 25, 2008 2:17 pm
by arras
xDan >> Yes, the card is an ATI Mobility Radeon X700, which is not the newest model out there, but decreasing the number of for() loop iterations does nothing. Basically there is no difference whether b = 5 or 1 or even 0.

BlindSide >> To hell with it! This really complicates things, since what I need is to check whether Lighting is on for the material and then run the light color computation code as many times as there are lights in the scene. I have to look for some workaround...

What is really strange is that this code works fine:

Code:

uniform int Lighting;

void main()
{
   vec4 color = vec4(0,0,0,1);

   for(int i=0; i<Lighting; i++)
   {
      color.r += 0.1;
   }

   gl_FrontColor = color;

   gl_Position = ftransform();
}
It's only when the for() loop is placed inside an if() statement...

Also, I noticed that putting some code into a function causes a small drop in speed ...which is also strange, since the compiler should just replace the function call with the code ...at least I think so.

Posted: Mon Feb 25, 2008 2:24 pm
by arras
Now this whole thing makes me think that the best approach would be to make different shaders for every state ...like Lighting on/off and different fog settings, and then run the one that is appropriate for the rendered material's settings. Which may of course result in a dozen or more shader files.
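
One way to get a shader per state without a dozen hand-written files is to generate the variants from a single source by prepending #define lines. The following is only a minimal C++ sketch under stated assumptions: ShaderFeatures, buildShaderSource and the macro names USE_LIGHTING, LIGHT_COUNT and USE_FOG are made up for illustration and are not part of Irrlicht, and the GLSL body is assumed to have no #version line (which would have to stay first in the source).

```cpp
#include <cassert>
#include <string>

// Hypothetical feature flags; the names are illustrative, not Irrlicht API.
struct ShaderFeatures
{
    bool lighting;   // whether the per-light loop is compiled in at all
    int  lightCount; // loop bound baked into the source as a constant
    bool fog;        // whether the fog code is compiled in
};

// Prepends #define lines to a shared GLSL body, so each variant is compiled
// with constant conditions instead of branching on uniforms at run time.
std::string buildShaderSource(const ShaderFeatures& f, const std::string& body)
{
    assert(f.lightCount >= 0);

    std::string src;
    if (f.lighting)
    {
        src += "#define USE_LIGHTING\n";
        src += "#define LIGHT_COUNT " + std::to_string(f.lightCount) + "\n";
    }
    if (f.fog)
        src += "#define USE_FOG\n";

    return src + body;
}
```

Inside the GLSL body, the lighting code would then sit between #ifdef USE_LIGHTING and #endif and loop with i < LIGHT_COUNT. The generated string could be handed to addHighLevelShaderMaterial(), which takes shader source text rather than file names.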

Posted: Tue Feb 26, 2008 12:48 am
by Luben
just put the shader functions in the same file but with different names, like Lighting_Main, NoLighting_Main. Related code in the same file, yatta!

Posted: Wed Feb 27, 2008 8:03 pm
by arras
OK, I just came across a bunch of articles and forum threads which deal with this problem. Here are two representative ones: http://www.gamedev.net/community/forums ... 1&#2287633
http://www.opengl.org/discussion_boards ... ber=234684
So the way to go is to make a different shader for each situation, as I already thought ...this will result in a really monstrous solution if I want my material to be able to mimic every situation (number and type of lights, fog with different settings). That's really bad...
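
The number of possible combinations grows quickly, but usually only a few of them are ever rendered. A possible way to tame this, sketched below in C++, is to build each variant lazily the first time its state is requested and cache the result. Everything here is an assumption for illustration: variantKey, VariantCache and the bit layout are hypothetical, and the compile callback stands in for whatever actually creates the material.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Packs the render state into one integer key. The layout is an assumption:
// bit 0 = fog on/off, bits 1 and up = light count (0..8).
inline std::uint32_t variantKey(int lightCount, bool fog)
{
    assert(lightCount >= 0 && lightCount <= 8);
    return (static_cast<std::uint32_t>(lightCount) << 1) | (fog ? 1u : 0u);
}

// Builds a shader variant only the first time a given state is requested.
class VariantCache
{
public:
    // 'compile' is a hypothetical callback returning a material id for a key.
    explicit VariantCache(std::function<int(std::uint32_t)> compile)
        : compile_(std::move(compile)) {}

    int get(std::uint32_t key)
    {
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, compile_(key)).first; // compile once
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::function<int(std::uint32_t)> compile_;
    std::map<std::uint32_t, int> cache_;
};
```

With this shape, a scene that only ever uses three lights with and without fog compiles two variants, no matter how many combinations exist on paper.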

Posted: Fri Feb 29, 2008 2:06 am
by BlindSide
That's OK, I have a solution for you. I'll post it in the code snippets forum soon.

Posted: Fri Feb 29, 2008 8:19 am
by arras
OK, I'll look at it. Meanwhile I did some tests with a fixed number of lights and found a really large drop in performance. I tested with 3 lights in the scene: one point, one spot and one directional. While the test ran at 104 FPS with EMT_SOLID, the shader crawled at 33 FPS. And that's still only the same stuff the fixed pipeline does, but without fog and texture. If I wanted to run some more advanced stuff like per-pixel lighting or bump mapping, it would slow rendering down to death. Not to mention running the full complement of 8 lights in the scene.
I found that the fixed pipeline handles several lights quite well. For example, it dropped only 4-6 FPS between rendering the scene with 1 and then 3 lights (in the above scene). How the hell does the fixed pipeline do it? ...Lighting calculation is probably one of the more expensive ones.


I am really starting to think that rendering my tiles with ONE_TEXTURE_BLEND several times (that is, rendering the geometry in a few passes) would be cheaper than using shaders!

Has anybody ever made a complex shader which can actually be used in something other than a more or less simple demo?

Posted: Fri Feb 29, 2008 9:04 am
by BlindSide
I'm interested to see what this shader of yours looks like.

Posted: Fri Feb 29, 2008 10:05 am
by arras
Well, I tried to optimize it up and down, so it has changed a little bit, but here it is:
vertex shader

Code:

uniform int LightType0, LightType1, LightType2;

void main()
{
   int LightType[3];

   LightType[0] = LightType0;
   LightType[1] = LightType1;
   LightType[2] = LightType2;

   vec3 position = vec3(gl_ModelViewMatrix * gl_Vertex);
   vec3 normal = normalize(gl_NormalMatrix * gl_Normal);
   vec3 cameraVector = normalize(-position);

   gl_FrontColor = vec4(0,0,0,1);

   for(int i=0; i<3; i++)
   {
      vec3 lightVector = gl_LightSource[i].position.xyz - position;
      float distance = length(lightVector);
      lightVector = normalize(lightVector);
      float diffusePower = max(0.0, dot(normal, lightVector));
      vec3 halfVector = normalize(lightVector + cameraVector);
      float specularPower = max( 0.0, dot(normal, halfVector) );
      specularPower = pow( specularPower, gl_FrontMaterial.shininess );

      vec4 color = gl_FrontLightProduct[i].ambient;

      // point light
      if(LightType[i] == 0)
      {
         color += gl_FrontLightProduct[i].diffuse * diffusePower;
         color += gl_FrontLightProduct[i].specular * specularPower;
      }

      // spot light
      if(LightType[i] == 1)
      {
         float spotEffect = dot( normalize(gl_LightSource[i].spotDirection), -lightVector );
         if(spotEffect > gl_LightSource[i].spotCosCutoff)
         {
            spotEffect = pow(spotEffect, gl_LightSource[i].spotExponent);
            color += gl_FrontLightProduct[i].diffuse * diffusePower * spotEffect;
            color += gl_FrontLightProduct[i].specular * specularPower * spotEffect;
         }
      }

      // directional light
      if(LightType[i] == 2)
      {
         lightVector = normalize(gl_LightSource[i].position.xyz);
         distance = 0.0; // float literal: GLSL 1.10 has no implicit int-to-float conversion

         diffusePower = max(0.0, dot(normal, lightVector));
         color += gl_FrontLightProduct[i].diffuse * diffusePower;

         halfVector = normalize(lightVector + cameraVector);
         specularPower = max( 0.0, dot(normal, halfVector) );
         specularPower = pow( specularPower, gl_FrontMaterial.shininess );
         color += gl_FrontLightProduct[i].specular * specularPower;
      }

      // attenuation
      float attenuation = 1.0 / (gl_LightSource[i].constantAttenuation +
         gl_LightSource[i].linearAttenuation * distance +
         gl_LightSource[i].quadraticAttenuation * distance * distance);

      color = color * attenuation;
      gl_FrontColor += color;
   }

   // final color
   gl_FrontColor += gl_FrontLightModelProduct.sceneColor;

   gl_Position = ftransform();
}
fragment shader

Code:

void main()
{
   gl_FragColor = gl_Color;
}
Here is the whole test project (source code) if you are interested:
ShaderTester.zip

Posted: Fri Feb 29, 2008 10:51 am
by ebo
Try removing ifs ...
Branches are really slow, as they cause shader threads to diverge (see CUDA docs)

Posted: Fri Feb 29, 2008 10:53 am
by arras
I know, all branches are executed then one result is picked.
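
If every branch is evaluated anyway, the "pick one result" step can be written as plain arithmetic with 0/1 masks instead of if(). The sketch below shows the idea on the CPU in C++, with stand-ins for the GLSL built-ins step() and mix(). The helper names glslStep, glslMix, spotTerm and selectByMask are made up for this example, and whether the rewrite is actually faster depends on the hardware and driver.

```cpp
#include <cassert>
#include <cmath>

// CPU-side stand-ins for the GLSL built-ins step() and mix().
inline float glslStep(float edge, float x) { return x < edge ? 0.0f : 1.0f; }
inline float glslMix(float a, float b, float t) { return a * (1.0f - t) + b * t; }

// Spot term without an if(): the cone test becomes a 0/1 mask.
// Note: step() also passes x == edge, whereas the shader tested with '>'.
inline float spotTerm(float spotEffect, float cosCutoff, float exponent)
{
    assert(exponent >= 0.0f);
    float inside = glslStep(cosCutoff, spotEffect);
    // fmax() keeps pow() away from negative bases; the mask zeroes them anyway.
    return inside * std::pow(std::fmax(spotEffect, 0.0f), exponent);
}

// Picks between two already-computed results using a 0/1 selector.
inline float selectByMask(float ifZero, float ifOne, float mask)
{
    return glslMix(ifZero, ifOne, mask);
}
```

In the shader, the same pattern would multiply the diffuse and specular contributions by the mask, so lights outside the spot cone simply add zero.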