Painting. ITexture::Lock() seems too slow

Mloren · Post by **Mloren** » Tue Dec 07, 2010 9:20 am

Part of the program I'm making involves a paint-like interface.
You can select one of a bunch of different painting tools and use them to draw; it works pretty much like MS Paint.

To do this, I have an IGUIImage which can be scrolled.
I have a pixel array and when the user is painting I get the mouse position, alter the appropriate pixels, call lock() on the IGUIImage's texture, copy the pixel array across and then call unlock().

The image that you can paint on is a huge 2048x2048 and I lock() and unlock() it once per frame.

This causes my framerate to drop from 60 to 15.
It's the lock() and unlock() that's doing it: if I comment those out, the framerate goes back to 60.

I know the image is big and that lock() and unlock() are bound to be slow operations, but this still seems like way too big of a performance hit.
Also, the CPU isn't above 50%, so it must be the GPU that's struggling. My video card is not slow (GeForce 8 series), so I can't blame that.

Does anyone have a suggestion for how I could do this better? I need to keep the framerate at 60.

I'm also wondering if lock()/unlock() could be optimized. I expect them to be slow, but not that slow.

hybrid · Post by **hybrid** » Tue Dec 07, 2010 9:51 am

Yeah, I've already thought about adding a 'write-only' flag for the lock. This would reduce the data travel by 50% (right now the data is copied from GPU to CPU, changed, and copied back from CPU to GPU). Another optimization would be to change only parts of the texture. So far, we always upload the whole texture. We could introduce a rect parameter to lock, which would upload only that portion. Both seem possible and should reduce the transfer costs significantly.
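
To put rough numbers on the two optimizations, here is a minimal sketch. It assumes a 32-bit texture format (4 bytes per pixel), which the thread does not state, and `bytesTransferred` and `LockMode` are hypothetical names for illustration, not Irrlicht API. It only counts bytes moved per lock()/unlock() pair; it says nothing about driver overhead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not part of Irrlicht.
enum class LockMode { ReadWrite, WriteOnly };

// Bytes moved between CPU and GPU for one lock()/unlock() pair
// over a width x height region.
std::uint64_t bytesTransferred(std::uint32_t width, std::uint32_t height,
                               std::uint32_t bytesPerPixel, LockMode mode)
{
    assert(bytesPerPixel > 0);
    const std::uint64_t oneWay =
        static_cast<std::uint64_t>(width) * height * bytesPerPixel;
    // Read-write: download GPU -> CPU, then upload CPU -> GPU.
    // Write-only: the download is skipped, only the upload remains.
    return mode == LockMode::ReadWrite ? 2 * oneWay : oneWay;
}
```

Under that assumption, a full 2048x2048 read-write lock moves 32 MiB per frame, a write-only lock moves 16 MiB, and a write-only lock restricted to a 64x64 rectangle would move only 16 KiB.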

greenya · Post by **greenya** » Wed Dec 08, 2010 7:08 pm

Mloren wrote:Does anyone have a suggestion for how I could do this better? I need to keep the framerate at 60.

Instead of one single 2048x2048 piece, you can use several smaller pieces, like 512x512 or even 256x256. You will have a two-dimensional array of these pieces; for example, if you use 256x256 parts, you need an 8x8 array (64 pieces). So each time you need to draw something at x,y -- you calculate the destination piece and lock/unlock only that piece.
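
The "calculate destination piece" step might look like the sketch below. `TileCoord` and `locateTile` are hypothetical names, and the code assumes a square image whose size is an exact multiple of the tile size, as in the 2048/256 example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; none of this is Irrlicht API.
struct TileCoord {
    std::uint32_t tileX, tileY;   // which tile in the grid
    std::uint32_t localX, localY; // pixel position inside that tile
};

// Map an image-space pixel (x, y) to its tile and the offset within it.
TileCoord locateTile(std::uint32_t x, std::uint32_t y,
                     std::uint32_t imageSize, std::uint32_t tileSize)
{
    assert(tileSize > 0 && imageSize % tileSize == 0);
    assert(x < imageSize && y < imageSize);
    return TileCoord{ x / tileSize, y / tileSize, x % tileSize, y % tileSize };
}
```

For instance, with 256x256 tiles the pixel (300, 1000) falls in tile (1, 3) at local position (44, 232), so only that one texture needs to be locked. A brush that straddles a tile border touches up to four tiles, each of which would need its own lock/unlock.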

slavik262 · Post by **slavik262** » Wed Dec 08, 2010 8:25 pm

hybrid wrote:Yeah, I've already thought about adding a 'write-only' flag for the lock. This would reduce the data travel by 50% (right now the data is copied from GPU to CPU, changed, and copied back from CPU to GPU). Another optimization would be to change only parts of the texture. So far, we always upload the whole texture. We could introduce a rect parameter to lock, which would upload only that portion. Both seem possible and should reduce the transfer costs significantly.

This would be extremely useful. How easily could this be implemented?

bitplane · Post by **bitplane** » Wed Dec 08, 2010 10:12 pm

I would "write" to the texture by drawing into a transparent render target overlay using the hardware, then flatten it into a single image every so often.

hybrid · Post by **hybrid** » Wed Dec 08, 2010 11:49 pm

slavik262 wrote:
hybrid wrote:Yeah, I've already thought about adding a 'write-only' flag for the lock. This would reduce the data travel by 50% (right now the data is copied from GPU to CPU, changed, and copied back from CPU to GPU). Another optimization would be to change only parts of the texture. So far, we always upload the whole texture. We could introduce a rect parameter to lock, which would upload only that portion. Both seem possible and should reduce the transfer costs significantly.
This would be extremely useful. How easily could this be implemented?

The write-only flag should be pretty simple: just replace the bool flag with an enum value. The rectangle region could be harder. I have to check for driver support and think about how to store the areas and data properly. But both things should be possible within a few changes.

Mloren · Post by **Mloren** » Thu Dec 09, 2010 1:55 am

bitplane wrote:I would "write" to the texture by drawing into a transparent render target overlay using the hardware, then flatten it into a single image every so often.

I'm not sure what you mean by "render target overlay".
Is this something that Irrlicht lets you do, or would I have to go directly to the hardware for this?

hybrid · Post by **hybrid** » Wed Dec 15, 2010 9:24 pm

hybrid wrote:
slavik262 wrote:
hybrid wrote:Yeah, I've already thought about adding a 'write-only' flag for the lock. This would reduce the data travel by 50% (right now the data is copied from GPU to CPU, changed, and copied back from CPU to GPU). Another optimization would be to change only parts of the texture. So far, we always upload the whole texture. We could introduce a rect parameter to lock, which would upload only that portion. Both seem possible and should reduce the transfer costs significantly.
This would be extremely useful. How easily could this be implemented?
The write-only flag should be pretty simple: just replace the bool flag with an enum value. The rectangle region could be harder. I have to check for driver support and think about how to store the areas and data properly. But both things should be possible within a few changes.

Argh. It seems that either I don't understand Direct3D, or it has neither WRITE_ONLY nor easy rectangle locking. The latter exists, but it moves the lock region into a new image window: only the locked rectangle is returned as an image. This introduces some additional complexity because the other drivers behave differently. So I guess this will only work when we change the lock to return an IImage instead of a raw pointer.
Anyway, the write_only flag is implemented in SVN for OpenGL now.

nespa · Post by **nespa** » Wed Dec 15, 2010 9:49 pm

My method to increase the painting speed:

We have 2 textures, the source texture and the destination texture.
The destination texture must be visible, so it must be rendered every frame. The source texture does not need to be visible.
My method:

Outside of the render loop:
1. Lock the source texture to get a start pointer to its pixels;

In the render loop:
Start main loop
2. Lock the destination texture;
3. Call a function to blend all the pixels in the rectangle area from the 2 textures (src and dest) and write them to the dest texture;
4. Unlock the destination texture;
..........Draw (Render)
End main loop
5. Unlock the source texture;

I used this method in Clady3dTerrainEditor and in CladyColorMapEditor.
DirectX is faster than OpenGL in my tests.
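
The blend function in step 3 is not shown in the post. A minimal sketch of what such a function could look like follows, operating on the raw pointers obtained from the two locks. It assumes both buffers hold 32-bit A8R8G8B8 pixels with the same row width, and that the source alpha drives the blend; `blendChannel` and `blendRect` are hypothetical names, and nespa's actual blend may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Blend one 8-bit channel: (src*alpha + dst*(255-alpha)) / 255, rounded.
static std::uint32_t blendChannel(std::uint32_t src, std::uint32_t dst,
                                  std::uint32_t alpha)
{
    return (src * alpha + dst * (255u - alpha) + 127u) / 255u;
}

// Hypothetical helper: alpha-blends src over dst inside the half-open
// rectangle [x0, x1) x [y0, y1). Both buffers are assumed to be A8R8G8B8
// with `width` pixels per row (no row padding).
void blendRect(const std::uint32_t* src, std::uint32_t* dst, std::size_t width,
               std::size_t x0, std::size_t y0, std::size_t x1, std::size_t y1)
{
    assert(src && dst && x0 <= x1 && x1 <= width && y0 <= y1);
    for (std::size_t y = y0; y < y1; ++y) {
        for (std::size_t x = x0; x < x1; ++x) {
            const std::uint32_t s = src[y * width + x];
            const std::uint32_t d = dst[y * width + x];
            const std::uint32_t a = s >> 24; // source alpha
            const std::uint32_t r = blendChannel((s >> 16) & 0xFFu, (d >> 16) & 0xFFu, a);
            const std::uint32_t g = blendChannel((s >> 8) & 0xFFu, (d >> 8) & 0xFFu, a);
            const std::uint32_t b = blendChannel(s & 0xFFu, d & 0xFFu, a);
            // The destination is kept fully opaque in this sketch.
            dst[y * width + x] = 0xFF000000u | (r << 16) | (g << 8) | b;
        }
    }
}
```

Restricting the loop to the brush rectangle keeps the CPU work small, but the lock/unlock on the destination still transfers whatever the driver decides to transfer.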

Mloren · Post by **Mloren** » Thu Dec 16, 2010 12:14 am

nespa wrote:My method to increase the painting speed:

We have 2 textures, the source texture and the destination texture.
The destination texture must be visible, so it must be rendered every frame. The source texture does not need to be visible.
My method:

Outside of the render loop:
1. Lock the source texture to get a start pointer to its pixels;

In the render loop:
Start main loop
2. Lock the destination texture;
3. Call a function to blend all the pixels in the rectangle area from the 2 textures (src and dest) and write them to the dest texture;
4. Unlock the destination texture;
..........Draw (Render)
End main loop
5. Unlock the source texture;

I used this method in Clady3dTerrainEditor and in CladyColorMapEditor.
DirectX is faster than OpenGL in my tests.

This still involves locking and unlocking the destination texture every frame, which is the bottleneck for me, probably just because the texture is so large (2048x2048).
I've solved this using a variation of greenya's suggestion: I have multiple textures in strips, 2048x64 pixels each. I just go down the screen, updating them as necessary, almost like scanlines on a TV.
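
One way to track which strips need updating is sketched below. This is an assumption about how the bookkeeping could be done, not the code actually used here; `DirtyStrips` is a hypothetical class, and the `upload` callback stands in for the lock/copy/unlock of one 2048x64 strip texture.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kImageHeight = 2048;
constexpr std::uint32_t kStripHeight = 64;
constexpr std::uint32_t kStripCount = kImageHeight / kStripHeight; // 32 strips

// Hypothetical bookkeeping class; not Irrlicht API.
class DirtyStrips {
public:
    // Mark every strip touched by a brush covering rows [yTop, yBottom].
    void markRows(std::uint32_t yTop, std::uint32_t yBottom)
    {
        assert(yTop <= yBottom && yBottom < kImageHeight);
        for (std::uint32_t s = yTop / kStripHeight; s <= yBottom / kStripHeight; ++s)
            dirty_.set(s);
    }

    // Call upload(stripIndex) for each dirty strip, top to bottom,
    // clear all flags, and return how many strips were uploaded.
    template <typename UploadFn>
    std::uint32_t flush(UploadFn upload)
    {
        std::uint32_t uploaded = 0;
        for (std::uint32_t s = 0; s < kStripCount; ++s) {
            if (dirty_.test(s)) {
                upload(s);
                ++uploaded;
            }
        }
        dirty_.reset();
        return uploaded;
    }

private:
    std::bitset<kStripCount> dirty_;
};
```

Calling `markRows` from the paint handler and `flush` once per frame means a frame with no painting uploads nothing, and a small brush stroke uploads one or two strips instead of the whole image.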

disks86 · Post by **disks86** » Fri Dec 24, 2010 6:59 pm

Alright, so maybe I just don't understand, but why do we have to lock at all? I'm pretty sure in SDL you can blit any SDL surface to any other surface, whether it is in video memory or system memory. So my question is this: can we have a copyTo method like the one in IImage that can target another ITexture, and for that matter another one in IImage that can target an ITexture? Both would include a rectangle parameter, of course.

Guess I never realized how much of a pain dealing with textures was before now.

http://www.gamedev.net/community/forums ... 1&#2646821

Mloren · Post by **Mloren** » Sun Dec 26, 2010 1:23 am

disks86 wrote:Alright, so maybe I just don't understand, but why do we have to lock at all? I'm pretty sure in SDL you can blit any SDL surface to any other surface, whether it is in video memory or system memory.

The lock command ensures that the GPU isn't using the texture while the CPU is writing to it. There's no guarantee that the CPU and GPU are doing anything in sync, so even if you can write directly to the GPU's version of the texture, there's no guarantee it would be safe to do so.
The GPU could be half way through rendering the texture when you change it which would result in weird graphical corruption for a frame where it was half the old texture and half the new one.

nathanf534 · Post by **nathanf534** » Sun Dec 26, 2010 4:02 am

Mloren wrote:The GPU could be half way through rendering the texture when you change it which would result in weird graphical corruption for a frame where it was half the old texture and half the new one.

If that's the only side effect of skipping such an expensive operation, does it really matter? If you are drawing at about 60 fps, you wouldn't really notice this.

Mloren · Post by **Mloren** » Sun Dec 26, 2010 4:04 am

nathanf534 wrote:
The GPU could be half way through rendering the texture when you change it which would result in weird graphical corruption for a frame where it was half the old texture and half the new one.
If that's the only side effect of skipping such an expensive operation, does it really matter? If you are drawing at about 60 fps, you wouldn't really notice this.

Yes, you would, especially if the old texture and new texture were very different. Also, I'm not even sure it's possible to change the GPU texture mid-render; it may just corrupt memory and crash. I'm not sure about that.

hybrid · Post by **hybrid** » Sun Dec 26, 2010 1:30 pm

lock and unlock on a texture ensure that the data is sync'ed between CPU and GPU. This involves a lot of data copying, which might be unnecessary as stated above. So far, you could only avoid the upload on unlock (read_only). In SVN/trunk we also have the write_only mode, but so far only for OpenGL. What you cannot do (and what is not possible in general on all graphics cards) is to copy one texture to another without copying it to the CPU. I am planning for such a method, which would be fast on most cards, but which might still involve a full copy-out copy-in via CPU on some cards. This would hold true for SDL as well.