Why is it so slow when I scale the render?

Noiecity
Posts: 363
Joined: Wed Aug 23, 2023 7:22 pm

Why is it so slow when I scale the render?

Post by Noiecity »

I tried rendering with Burning's Video using example 14. I get 4000 to 5000 fps at 128x128, but when I scale the result (after rendering at 128x128) through the Irrlicht pipeline, it drops to about 140 fps. Why does it drop so much? I have examples using a simple rasterizer and DirectDraw, and those still get very high fps when scaling (over 3000).

Irrlicht, however, is very slow at scaling the render. You should get 3000 fps or more at 128x128 with Burning's Video.

ddraw example:
https://drive.google.com/file/d/1bzO6fn ... sp=sharing

Code: Select all

/** Example 014 Win32 Window

This example only runs under MS Windows and demonstrates that Irrlicht can
render inside a win32 window. MFC and .NET Windows.Forms windows are possible,
too.

In the beginning, we create a windows window using the windows API. I'm not
going to explain this code, because it is windows specific. See the MSDN or a
windows book for details.
*/

#include <irrlicht.h>
#ifndef _IRR_WINDOWS_
#error Windows only example
#else
#include <windows.h> // this example only runs with windows
#include <iostream>
#include <cstdio> // for snprintf
#include "driverChoice.h"

using namespace irr;

#pragma comment(lib, "irrlicht.lib")

HWND hOKButton;
HWND hFpsText;  // Handle FPS text
HWND hWnd;

static LRESULT CALLBACK CustomWndProc(HWND hWnd, UINT message,
		WPARAM wParam, LPARAM lParam)
{
	switch (message)
	{
	case WM_COMMAND:
		{
			HWND hwndCtl = (HWND)lParam;
			int code = HIWORD(wParam);

			if (hwndCtl == hOKButton)
			{
				DestroyWindow(hWnd);
				PostQuitMessage(0);
				return 0;
			}
		}
		break;
	case WM_DESTROY:
		PostQuitMessage(0);
		return 0;

	}

	return DefWindowProc(hWnd, message, wParam, lParam);
}


/*
   Now ask for the driver and create the Windows specific window.
*/
int main()
{
	// ask user for driver
	video::E_DRIVER_TYPE driverType=driverChoiceConsole();
	if (driverType==video::EDT_COUNT)
		return 1;

	printf("Select the render window (some dead window may exist too):\n"\
		" (a) Window with button (via CreationParam)\n"\
		" (b) Window with button (via beginScene)\n"\
		" (c) Own Irrlicht window (default behavior)\n"\
		" (otherKey) exit\n\n");

	char key;
	std::cin >> key;
	if (key != 'a' && key != 'b' && key != 'c')
		return 1;

	HINSTANCE hInstance = 0;
	// create dialog

	const char* Win32ClassName = "CIrrlichtWindowsTestDialog";

	WNDCLASSEX wcex;
	wcex.cbSize			= sizeof(WNDCLASSEX);
	wcex.style			= CS_HREDRAW | CS_VREDRAW;
	wcex.lpfnWndProc	= (WNDPROC)CustomWndProc;
	wcex.cbClsExtra		= 0;
	wcex.cbWndExtra		= DLGWINDOWEXTRA;
	wcex.hInstance		= hInstance;
	wcex.hIcon			= NULL;
	wcex.hCursor		= LoadCursor(NULL, IDC_ARROW);
	wcex.hbrBackground	= (HBRUSH)(COLOR_WINDOW);
	wcex.lpszMenuName	= 0;
	wcex.lpszClassName	= Win32ClassName;
	wcex.hIconSm		= 0;

	RegisterClassEx(&wcex);

	DWORD style = WS_SYSMENU | WS_BORDER | WS_CAPTION |
		WS_CLIPCHILDREN | WS_CLIPSIBLINGS | WS_MAXIMIZEBOX | WS_MINIMIZEBOX | WS_SIZEBOX;

	int windowWidth = 440;
	int windowHeight = 380;

	hWnd = CreateWindow( Win32ClassName, "Irrlicht Win32 window example",
		style, 100, 100, windowWidth, windowHeight,
		NULL, NULL, hInstance, NULL);

	RECT clientRect;
	GetClientRect(hWnd, &clientRect);
	windowWidth = clientRect.right;
	windowHeight = clientRect.bottom;

	// create ok button

	hOKButton = CreateWindow("BUTTON", "OK - Close", WS_CHILD | WS_VISIBLE | BS_TEXT,
		windowWidth - 160, windowHeight - 40, 150, 30, hWnd, NULL, hInstance, NULL);

	// create some text

	CreateWindow("STATIC", "This is Irrlicht running inside a standard Win32 window.\n"\
		"Also mixing with MFC and .NET Windows.Forms is possible.",
		WS_CHILD | WS_VISIBLE, 20, 20, 400, 40, hWnd, NULL, hInstance, NULL);

	// fps text
	hFpsText = CreateWindow("STATIC", "FPS: 0", WS_CHILD | WS_VISIBLE,
		windowWidth - 160, windowHeight - 70, 150, 20, hWnd, NULL, hInstance, NULL);

	// create window to put irrlicht in

	HWND hIrrlichtWindow = CreateWindow("BUTTON", "",
			WS_CHILD | WS_VISIBLE | BS_OWNERDRAW,
			50, 80, 128, 128, hWnd, NULL, hInstance, NULL);
	video::SExposedVideoData videodata((key=='b')?hIrrlichtWindow:0);

	/*
	So now that we have some window, we can create an Irrlicht device
	inside of it. We use Irrlicht createEx() function for this. We only
	need the handle (HWND) to that window, set it as windowsID parameter
	and start up the engine as usual. That's it.
	*/
	// create irrlicht device in the button window

	irr::SIrrlichtCreationParameters param;
	param.DriverType = driverType;
	if (key=='a')
		param.WindowId = reinterpret_cast<void*>(hIrrlichtWindow);

	irr::IrrlichtDevice* device = irr::createDeviceEx(param);
	if (!device)
		return 1;

	// setup a simple 3d scene

	irr::scene::ISceneManager* smgr = device->getSceneManager();
	video::IVideoDriver* driver = device->getVideoDriver();

	if (driverType==video::EDT_OPENGL)
	{
		HDC HDc=GetDC(hIrrlichtWindow);
		PIXELFORMATDESCRIPTOR pfd={0};
		pfd.nSize=sizeof(PIXELFORMATDESCRIPTOR);
		int pf = GetPixelFormat(HDc);
		DescribePixelFormat(HDc, pf, sizeof(PIXELFORMATDESCRIPTOR), &pfd);
		pfd.dwFlags |= PFD_DOUBLEBUFFER | PFD_SUPPORT_OPENGL | PFD_DRAW_TO_WINDOW;
		pfd.cDepthBits=16;
		pf = ChoosePixelFormat(HDc, &pfd);
		SetPixelFormat(HDc, pf, &pfd);
		videodata.OpenGLWin32.HDc = HDc;
		videodata.OpenGLWin32.HRc=wglCreateContext(HDc);
		wglShareLists((HGLRC)driver->getExposedVideoData().OpenGLWin32.HRc, (HGLRC)videodata.OpenGLWin32.HRc);
	}
	scene::ICameraSceneNode* cam = smgr->addCameraSceneNode();
	cam->setTarget(core::vector3df(0,0,0));

	scene::ISceneNodeAnimator* anim =
		smgr->createFlyCircleAnimator(core::vector3df(0,15,0), 30.0f);
	cam->addAnimator(anim);
	anim->drop();

	scene::ISceneNode* cube = smgr->addCubeSceneNode(20);
	cube->setMaterialFlag( video::EMF_LIGHTING, false );
	/*
	cube->setMaterialTexture(0, driver->getTexture("../../media/wall.bmp"));
	cube->setMaterialTexture(1, driver->getTexture("../../media/water.jpg"));
	cube->setMaterialFlag( video::EMF_LIGHTING, false );
	cube->setMaterialType( video::EMT_REFLECTION_2_LAYER );

	smgr->addSkyBoxSceneNode(
	driver->getTexture("../../media/irrlicht2_up.jpg"),
	driver->getTexture("../../media/irrlicht2_dn.jpg"),
	driver->getTexture("../../media/irrlicht2_lf.jpg"),
	driver->getTexture("../../media/irrlicht2_rt.jpg"),
	driver->getTexture("../../media/irrlicht2_ft.jpg"),
	driver->getTexture("../../media/irrlicht2_bk.jpg"));

	// show and execute dialog*/

	ShowWindow(hWnd , SW_SHOW);
	UpdateWindow(hWnd);

	// do message queue

	/*
	Now the only thing missing is the drawing loop using
	IrrlichtDevice::run(). We do this as usual. But instead of this, there
	is another possibility: You can also simply use your own message loop
	using GetMessage, DispatchMessage and whatever. Calling
	Device->run() will cause Irrlicht to dispatch messages internally too.
	You need not call Device->run() if you want to do your own message
	dispatcher loop, but Irrlicht will not be able to fetch user input
	then and you have to do it on your own using the window messages,
	DirectInput, or whatever.
	*/

	while (device->run())
	{
		driver->beginScene(true, true, 0, videodata);
		smgr->drawAll();
		driver->endScene();

		// Update FPS every second
		static u32 lastFpsUpdate = 0;
		u32 now = device->getTimer()->getTime();
		if (now - lastFpsUpdate > 1000)
		{
			lastFpsUpdate = now;
			s32 fps = driver->getFPS();
			char fpsStr[64];
			snprintf(fpsStr, sizeof(fpsStr), "FPS: %d", fps);
			SetWindowTextA(hFpsText, fpsStr);
		}
	}

	/*
	The alternative, own message dispatching loop without Device->run()
	would look like this:
	*/

	/*MSG msg;
	while (true)
	{
		if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
		{
			TranslateMessage(&msg);
			DispatchMessage(&msg);

			if (msg.message == WM_QUIT)
				break;
		}

		// advance virtual time
		device->getTimer()->tick();

		// draw engine picture
		driver->beginScene(true, true, 0, (key=='c')?hIrrlichtWindow:0);
		smgr->drawAll();
		driver->endScene();
	}*/

	device->closeDevice();
	device->drop();

	return 0;
}
#endif // if windows

/*
That's it, Irrlicht now runs in your own windows window.
**/
But after scaling the render, you only get 100 FPS...

Code: Select all

#include <irrlicht.h>

using namespace irr;

#ifdef _IRR_WINDOWS_
#pragma comment(lib, "Irrlicht.lib")
#endif

// Window settings
const u32 WINDOW_WIDTH = 640;
const u32 WINDOW_HEIGHT = 480;

// Render texture settings (virtual resolution)
const u32 RENDER_WIDTH = 128;   // Width of the render target texture
const u32 RENDER_HEIGHT = 128;  // Height of the render target texture

/**
 * Manages a render-to-texture (RTT) system.
 * Handles creating the render target, rendering to it, and then drawing the
 * resulting texture onto the screen with proper aspect ratio and centering.
 */
class RenderTextureManager {
private:
    IrrlichtDevice* device;
    video::IVideoDriver* driver;
    video::ITexture* renderTexture;  // The off-screen render target
    u32 renderWidth;
    u32 renderHeight;
    f32 aspectRatio;                // Calculated as width/height

public:
    /**
     * Constructor.
     * @param dev    Pointer to the Irrlicht device.
     * @param width  Desired width of the render texture.
     * @param height Desired height of the render texture.
     */
    RenderTextureManager(IrrlichtDevice* dev, u32 width, u32 height)
        : device(dev), renderWidth(width), renderHeight(height) {
        driver = device->getVideoDriver();

        // Compute aspect ratio for later scaling
        aspectRatio = (f32)renderWidth / (f32)renderHeight;

        // Create a render target texture with the specified dimensions
        renderTexture = driver->addRenderTargetTexture(
            core::dimension2d<u32>(renderWidth, renderHeight), "RTT");
    }

    /**
     * Destructor. Cleans up the render target texture.
     */
    ~RenderTextureManager() {
        if (renderTexture) {
            driver->removeTexture(renderTexture);
        }
    }

    /**
     * Sets the render target to the internal texture and clears it.
     * Also forces the viewport to exactly match the render texture size.
     * @param clearColor Background color used when clearing the texture.
     */
    void beginRenderToTexture(const video::SColor& clearColor = video::SColor(255, 100, 101, 140)) {
        driver->setRenderTarget(renderTexture, true, true, clearColor);

        // CRITICAL: the viewport must be set to the exact dimensions of the render target.
        driver->setViewPort(core::rect<s32>(0, 0, renderWidth, renderHeight));
    }

    /**
     * Restores the render target to the primary screen (backbuffer).
     * Also resets the viewport to cover the whole window.
     */
    void endRenderToTexture() {
        driver->setRenderTarget(0); // Switch back to main screen

        // Restore full screen viewport
        core::dimension2d<u32> screenSize = driver->getScreenSize();
        driver->setViewPort(core::rect<s32>(0, 0, screenSize.Width, screenSize.Height));
    }

    /**
     * Draws the render texture onto the screen.
     * The image is scaled to fit the screen while preserving its aspect ratio.
     */
    void drawToScreen() {
        core::dimension2d<u32> screenSize = driver->getScreenSize();

        // Compute destination size that maintains the original aspect ratio
        u32 destHeight = screenSize.Height;
        u32 destWidth = (u32)(destHeight * aspectRatio);

        // If the computed width exceeds the screen width, recalculate using width as base
        if (destWidth > screenSize.Width) {
            destWidth = screenSize.Width;
            destHeight = (u32)(destWidth / aspectRatio);
        }

        // Center the image on screen
        s32 destX = (screenSize.Width - destWidth) / 2;
        s32 destY = (screenSize.Height - destHeight) / 2;

        // Draw the rendered texture onto the screen, scaled and centered
        driver->draw2DImage(renderTexture,
                            core::rect<s32>(destX, destY, destX + destWidth, destY + destHeight),
                            core::rect<s32>(0, 0, renderWidth, renderHeight),
                            0, 0, true);

    }

    // Getters
    u32 getRenderWidth() const { return renderWidth; }
    u32 getRenderHeight() const { return renderHeight; }
    f32 getAspectRatio() const { return aspectRatio; }
    video::ITexture* getRenderTexture() const { return renderTexture; }

    void setRenderSize(u32 width, u32 height) {
        if (renderTexture) {
            driver->removeTexture(renderTexture);
        }

        renderWidth = width;
        renderHeight = height;
        aspectRatio = (f32)width / (f32)height;

        renderTexture = driver->addRenderTargetTexture(
            core::dimension2d<u32>(width, height), "RTT");
    }
};

int main() {
    // Create Irrlicht Device
    IrrlichtDevice* device = createDevice(video::EDT_DIRECT3D9,
                                         core::dimension2d<u32>(WINDOW_WIDTH, WINDOW_HEIGHT),
                                         16, false, false, false, 0);
    if (!device) return 1;

    video::IVideoDriver* driver = device->getVideoDriver();
    scene::ISceneManager* smgr = device->getSceneManager();
    gui::IGUIEnvironment* guienv = device->getGUIEnvironment();

    // Extra scope to control RenderTextureManager's lifetime
    {
        RenderTextureManager rtManager(device, RENDER_WIDTH, RENDER_HEIGHT);

        // Setup FPS Camera
        scene::ICameraSceneNode* camera = smgr->addCameraSceneNodeFPS();
        camera->setPosition(core::vector3df(0, 0, -80));
        camera->setAspectRatio(rtManager.getAspectRatio());
        camera->setFOV(53.4f * core::DEGTORAD);

        // Simple Scene: A cube
        scene::ISceneNode* cube1 = smgr->addCubeSceneNode(15);
        if (cube1) {
            cube1->setPosition(core::vector3df(0, 0, 0));
            cube1->setMaterialFlag(video::EMF_LIGHTING, false);
        }

        gui::IGUIFont* font = guienv->getBuiltInFont();
        core::stringw infoText;
        device->setWindowCaption(L"Configurable Render to Texture - FPS Counter");

        s32 lastFPS = -1;

        // Main rendering loop
        while (device->run()) {
            if (device->isWindowActive()) {
                camera->setAspectRatio(rtManager.getAspectRatio());

                // beginScene() must be called before any rendering,
                // including the render-to-texture pass.
                driver->beginScene(true, true, video::SColor(255, 0, 0, 0));

                // 1. Render scene to the small texture
                rtManager.beginRenderToTexture();
                smgr->drawAll();
                rtManager.endRenderToTexture();

                // 2. Draw the texture, scaled, to the screen
                
                rtManager.drawToScreen();

                // Get current FPS
                s32 fps = driver->getFPS();

                driver->endScene();

                // Update window title only when FPS changes (optional)
                if (lastFPS != fps) {
                    core::stringw title = L"Irrlicht RTT - FPS: ";
                    title += fps;
                    device->setWindowCaption(title.c_str());
                    lastFPS = fps;
                }
            }
        }
    } // rtManager destructor is called here, driver is still valid

    device->drop(); // Safe to release device
    return 0;
}
Even when I scale using OpenGL or Direct3D 9, I still only get about 900 fps, which is low compared to the 3000 fps I get with a CPU rasterizer.

It's not that Irrlicht is slower overall. In fact it is faster; it just slows down when scaling the render.

I really want to use Burning's Video, render at 128x128, and scale with a trilinear filter, but this blocks me completely. If I get 4000 or 5000 fps at 128x128, I would expect about 800 or 900 fps when scaling to almost any resolution.

Re: Why is it so slow when I scale the render?

Post by Noiecity »

I can confirm that the scaling is extremely slow. I made a version with GDI and got 4 times more fps, and GDI does not use graphics acceleration. If I used DirectDraw I would probably get about 2000 fps.

Code: Select all

// main_gdi_blit.cpp
// Compile with: cl /EHsc /I<irrlicht_include_path> main_gdi_blit.cpp /link Irrlicht.lib user32.lib gdi32.lib
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <vector>
#include <cstring>      // for memcpy
#include <cstdio>
#include <iostream>
#include <irrlicht.h>


#ifdef _MSC_VER
#pragma comment(lib, "Irrlicht.lib")
#endif

using namespace irr;
using namespace core;
using namespace scene;
using namespace video;
using namespace io;
using namespace gui;

// -------------------------------------------------------------
// Finds a visible window belonging to the current process
// (more reliable than FindWindow by title)
// -------------------------------------------------------------
struct EnumData {
    DWORD pid;
    HWND result;
};

static BOOL CALLBACK EnumWindowsProc(HWND hwnd, LPARAM lParam)
{
    EnumData* d = reinterpret_cast<EnumData*>(lParam);
    DWORD pid = 0;
    GetWindowThreadProcessId(hwnd, &pid);
    if (pid != d->pid) return TRUE;
    if (!IsWindowVisible(hwnd)) return TRUE;
    // Ignore windows without a title (probably not the main window)
    int len = GetWindowTextLengthW(hwnd);
    if (len == 0) return TRUE;
    d->result = hwnd;
    return FALSE; // stop enumeration
}

static HWND FindWindowForCurrentProcess()
{
    EnumData data;
    data.pid = GetCurrentProcessId();
    data.result = NULL;
    EnumWindows(EnumWindowsProc, reinterpret_cast<LPARAM>(&data));
    return data.result;
}

// -------------------------------------------------------------
// Main program
// -------------------------------------------------------------
int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    // Size of the small render target you want to render
    const u32 TEX_W = 128;
    const u32 TEX_H = 128;

    // Create Irrlicht device (software driver for maximum compatibility)
    IrrlichtDevice* device = createDevice(
        video::EDT_BURNINGSVIDEO,
        dimension2d<u32>(640, 480),
        32,            // color depth of the window framebuffer
        false, false, false, 0);

    if (!device) {
        MessageBoxA(NULL, "Could not create IrrlichtDevice.", "Error", MB_ICONERROR);
        return 1;
    }

    device->setWindowCaption(L"Render 128x128 scaled with GDI");

    // Wait a moment and locate the HWND of the process's window
    // (sometimes the HWND is not fully initialized right after device creation)
    Sleep(50);
    HWND hWnd = FindWindowForCurrentProcess();
    if (!hWnd) {
        // fallback: try GetActiveWindow (less reliable)
        hWnd = GetActiveWindow();
    }

    IVideoDriver* driver = device->getVideoDriver();
    ISceneManager* smgr = device->getSceneManager();

    // --- simple test scene ---
    IAnimatedMesh* mesh = smgr->getMesh("../../media/sydney.md2");
    if (!mesh) {
        // if the mesh is not found, you could create something simple (a test node)
        // but here we prefer to alert and exit so you place your media.
        MessageBoxA(NULL, "Could not find ../../media/sydney.md2. Place the mesh in that path or adjust the code.", "Warning", MB_ICONWARNING);
        device->drop();
        return 1;
    }

    IAnimatedMeshSceneNode* node = smgr->addAnimatedMeshSceneNode(mesh);
    if (node) {
        node->setMaterialFlag(EMF_LIGHTING, false);
        node->setMD2Animation(scene::EMAT_STAND);
        ITexture* tex = driver->getTexture("../../media/sydney.bmp");
        if (tex) node->setMaterialTexture(0, tex);
    }

    smgr->addCameraSceneNode(0, vector3df(0,30,-40), vector3df(0,5,0));

    // Create a 128x128 render target
    ITexture* rt = driver->addRenderTargetTexture(dimension2d<u32>(TEX_W, TEX_H), "RT_GDI");
    if (!rt) {
        MessageBoxA(NULL, "Could not create 128x128 render target.", "Error", MB_ICONERROR);
        device->drop();
        return 1;
    }

    // Temporary tightly-packed buffer (reused)
    std::vector<uint8_t> tmp;
    tmp.resize((size_t)TEX_W * (size_t)TEX_H * 4); // 32 bpp

    // Prepare BITMAPINFO (32bpp top-down)
    BITMAPINFO bmi;
    ZeroMemory(&bmi, sizeof(bmi));
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = (LONG)TEX_W;
    bmi.bmiHeader.biHeight = -((LONG)TEX_H); // top-down
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;
    bmi.bmiHeader.biSizeImage = 0;

    // FPS variables (optional)
    u32 lastTime = device->getTimer()->getRealTime();
    u32 frames = 0;

    // Main loop
    while (device->run())
    {
        // 1) Render to the render target (clear RT)
        driver->setRenderTarget(rt, true, true, video::SColor(255,0,0,0)); // alpha=255 (opaque), black
        smgr->drawAll();
        driver->setRenderTarget(0); // switch back to backbuffer

        // 2) Readback from RT -> tmp (row by row respecting pitch)
        //    In older Irrlicht versions, lock() takes a bool: true = read-only
        void* pixels = rt->lock(true);   // <--- CHANGE HERE: use true instead of ETLM_READ_ONLY
        if (pixels)
        {
            // real pitch (bytes per row) from the texture
            u32 pitch = rt->getPitch(); // bytes per row
            // color format in case you want to use different logic per format
            ECOLOR_FORMAT fmt = rt->getColorFormat(); // e.g. ECF_A8R8G8B8

            // Copy row by row respecting pitch
            uint8_t* src = reinterpret_cast<uint8_t*>(pixels);
            const size_t dstRowBytes = (size_t)TEX_W * 4u;
            for (u32 y = 0; y < TEX_H; ++y) {
                // CHANGE: use &tmp[0] instead of tmp.data()
                uint8_t* dstRow = &tmp[0] + (size_t)y * dstRowBytes;
                uint8_t* srcRow = src + (size_t)y * (size_t)pitch;
                // copy only min(dstRowBytes, pitch) bytes for safety
                size_t copyBytes = dstRowBytes;
                if (copyBytes > (size_t)pitch) copyBytes = (size_t)pitch;
                memcpy(dstRow, srcRow, copyBytes);
                // If pitch > dstRowBytes, ignore the padding (not needed)
            }

            rt->unlock();

            // 3) Draw to the window with StretchDIBits (GDI scaling)
            HDC hdc = GetDC(hWnd);
            if (hdc)
            {
                // get current client size
                RECT rc;
                GetClientRect(hWnd, &rc);
                int winW = rc.right - rc.left;
                int winH = rc.bottom - rc.top;
                if (winW > 0 && winH > 0) {
                    StretchDIBits(
                        hdc,
                        0, 0, winW, winH,       // dest rect (scale to whole window)
                        0, 0, TEX_W, TEX_H,     // src rect from tightly-packed image (top-down)
                        &tmp[0],                 // <--- CHANGE HERE: &tmp[0] instead of tmp.data()
                        &bmi,
                        DIB_RGB_COLORS,
                        SRCCOPY
                    );
                }
                ReleaseDC(hWnd, hdc);
            }
        }
        else
        {
            // lock returned NULL: the driver does not allow readback from render target
            // (this can happen on some drivers/hardware). For debugging:
            OutputDebugStringA("Warning: rt->lock() returned NULL. Cannot readback this frame.\n");
            // You could fall back to driver->createScreenShot(...) or draw with draw2DImage.
        }

        // 4) Update title with FPS (optional)
        frames++;
        u32 currentTime = device->getTimer()->getRealTime();
        if (currentTime - lastTime >= 1000)
        {
            stringw windowText = L"Render 128x128 scaled with GDI - FPS: ";
            windowText += frames;
            device->setWindowCaption(windowText.c_str());
            lastTime = currentTime;
            frames = 0;
        }

        // slight sleep to avoid flickering on some drivers (optional)
        // Sleep(0);
    }

    device->drop();
    return 0;
}
I used Irrlicht 1.6.1 instead of 1.9.0 for this, though, out of laziness.

Edit: here is the Irrlicht 1.9.0 version (switch the project from console to GUI):

Code: Select all

// main_gdi_blit.cpp
// Compile with: cl /EHsc /I<irrlicht_include_path> main_gdi_blit.cpp /link Irrlicht.lib user32.lib gdi32.lib
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <vector>
#include <cstring>      // for memcpy
#include <cstdio>
#include <iostream>
#include <irrlicht.h>

#ifdef _MSC_VER
#pragma comment(lib, "Irrlicht.lib")
#endif

using namespace irr;
using namespace core;
using namespace scene;
using namespace video;
using namespace io;
using namespace gui;

// -------------------------------------------------------------
// Finds a visible window belonging to the current process
// (more reliable than FindWindow by title)
// -------------------------------------------------------------
struct EnumData {
    DWORD pid;
    HWND result;
};

static BOOL CALLBACK EnumWindowsProc(HWND hwnd, LPARAM lParam)
{
    EnumData* d = reinterpret_cast<EnumData*>(lParam);
    DWORD pid = 0;
    GetWindowThreadProcessId(hwnd, &pid);
    if (pid != d->pid) return TRUE;
    if (!IsWindowVisible(hwnd)) return TRUE;
    // Ignore windows without a title (probably not the main window)
    int len = GetWindowTextLengthW(hwnd);
    if (len == 0) return TRUE;
    d->result = hwnd;
    return FALSE; // stop enumeration
}

static HWND FindWindowForCurrentProcess()
{
    EnumData data;
    data.pid = GetCurrentProcessId();
    data.result = NULL;
    EnumWindows(EnumWindowsProc, reinterpret_cast<LPARAM>(&data));
    return data.result;
}

// -------------------------------------------------------------
// Main program
// -------------------------------------------------------------
int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    // Size of the small render target you want to render
    const u32 TEX_W = 128;
    const u32 TEX_H = 128;

    // Create Irrlicht device (software driver for maximum compatibility)
    IrrlichtDevice* device = createDevice(
        video::EDT_BURNINGSVIDEO,
        dimension2d<u32>(640, 480),
        32,            // color depth of the window framebuffer
        false, false, false, 0);

    if (!device) {
        MessageBoxA(NULL, "Could not create IrrlichtDevice.", "Error", MB_ICONERROR);
        return 1;
    }

    device->setWindowCaption(L"Render 128x128 scaled with GDI");

    // Wait a moment and locate the HWND of the process's window
    // (sometimes the HWND is not fully initialized right after device creation)
    Sleep(50);
    HWND hWnd = FindWindowForCurrentProcess();
    if (!hWnd) {
        // fallback: try GetActiveWindow (less reliable)
        hWnd = GetActiveWindow();
    }

    IVideoDriver* driver = device->getVideoDriver();
    ISceneManager* smgr = device->getSceneManager();

    // --- simple test scene ---
    IAnimatedMesh* mesh = smgr->getMesh("../../media/sydney.md2");
    if (!mesh) {
        // if the mesh is not found, you could create something simple (a test node)
        // but here we prefer to alert and exit so you place your media.
        MessageBoxA(NULL, "Could not find ../../media/sydney.md2. Place the mesh in that path or adjust the code.", "Warning", MB_ICONWARNING);
        device->drop();
        return 1;
    }

    IAnimatedMeshSceneNode* node = smgr->addAnimatedMeshSceneNode(mesh);
    if (node) {
        node->setMaterialFlag(EMF_LIGHTING, false);
        node->setMD2Animation(scene::EMAT_STAND);
        ITexture* tex = driver->getTexture("../../media/sydney.bmp");
        if (tex) node->setMaterialTexture(0, tex);
    }

    smgr->addCameraSceneNode(0, vector3df(0,30,-40), vector3df(0,5,0));

    // Create a 128x128 render target
    ITexture* rt = driver->addRenderTargetTexture(dimension2d<u32>(TEX_W, TEX_H), "RT_GDI");
    if (!rt) {
        MessageBoxA(NULL, "Could not create 128x128 render target.", "Error", MB_ICONERROR);
        device->drop();
        return 1;
    }

    // Temporary tightly-packed buffer (reused)
    std::vector<uint8_t> tmp;
    tmp.resize((size_t)TEX_W * (size_t)TEX_H * 4); // 32 bpp

    // Prepare BITMAPINFO (32bpp top-down)
    BITMAPINFO bmi;
    ZeroMemory(&bmi, sizeof(bmi));
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = (LONG)TEX_W;
    bmi.bmiHeader.biHeight = -((LONG)TEX_H); // top-down
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;
    bmi.bmiHeader.biSizeImage = 0;

    // FPS variables (optional)
    u32 lastTime = device->getTimer()->getRealTime();
    u32 frames = 0;

    // Main loop
    while (device->run())
    {
        // 1) Render to the render target (clear RT)
        driver->setRenderTarget(rt, true, true, video::SColor(255,0,0,0)); // alpha=255 (opaque), black
        smgr->drawAll();
        driver->setRenderTarget(0); // switch back to backbuffer

        // 2) Readback from RT -> tmp (row by row respecting pitch)
        //    In Irrlicht 1.9.0, lock() takes an E_TEXTURE_LOCK_MODE parameter.
        void* pixels = rt->lock(video::ETLM_READ_ONLY);   // <--- UPDATED for Irrlicht 1.9.0
        if (pixels)
        {
            // real pitch (bytes per row) from the texture
            u32 pitch = rt->getPitch(); // bytes per row
            // color format in case you want to use different logic per format
            ECOLOR_FORMAT fmt = rt->getColorFormat(); // e.g. ECF_A8R8G8B8

            // Copy row by row respecting pitch
            uint8_t* src = reinterpret_cast<uint8_t*>(pixels);
            const size_t dstRowBytes = (size_t)TEX_W * 4u;
            for (u32 y = 0; y < TEX_H; ++y) {
                // Use &tmp[0] which is portable in C++98/03
                uint8_t* dstRow = &tmp[0] + (size_t)y * dstRowBytes;
                uint8_t* srcRow = src + (size_t)y * (size_t)pitch;
                // copy only min(dstRowBytes, pitch) bytes for safety
                size_t copyBytes = dstRowBytes;
                if (copyBytes > (size_t)pitch) copyBytes = (size_t)pitch;
                memcpy(dstRow, srcRow, copyBytes);
                // If pitch > dstRowBytes, ignore the padding (not needed)
            }

            rt->unlock();

            // 3) Draw to the window with StretchDIBits (GDI scaling)
            HDC hdc = GetDC(hWnd);
            if (hdc)
            {
                // get current client size
                RECT rc;
                GetClientRect(hWnd, &rc);
                int winW = rc.right - rc.left;
                int winH = rc.bottom - rc.top;
                if (winW > 0 && winH > 0) {
                    StretchDIBits(
                        hdc,
                        0, 0, winW, winH,       // dest rect (scale to whole window)
                        0, 0, TEX_W, TEX_H,     // src rect from tightly-packed image (top-down)
                        &tmp[0],                 // <--- &tmp[0] instead of tmp.data()
                        &bmi,
                        DIB_RGB_COLORS,
                        SRCCOPY
                    );
                }
                ReleaseDC(hWnd, hdc);
            }
        }
        else
        {
            // lock returned NULL: the driver does not allow readback from render target
            // (this can happen on some drivers/hardware). For debugging:
            OutputDebugStringA("Warning: rt->lock() returned NULL. Cannot readback this frame.\n");
            // You could fall back to driver->createScreenShot(...) or draw with draw2DImage.
        }

        // 4) Update title with FPS (optional)
        frames++;
        u32 currentTime = device->getTimer()->getRealTime();
        if (currentTime - lastTime >= 1000)
        {
            stringw windowText = L"Render 128x128 scaled with GDI - FPS: ";
            windowText += frames;
            device->setWindowCaption(windowText.c_str());
            lastTime = currentTime;
            frames = 0;
        }

        // slight sleep to avoid flickering on some drivers (optional)
        // Sleep(0);
    }

    device->drop();
    return 0;
}
Last edited by Noiecity on Sun Mar 01, 2026 2:48 pm, edited 1 time in total.
Irrlicht is love, Irrlicht is life, long live to Irrlicht
CuteAlien
Admin
Posts: 9969
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: Why is it so slow when I scale the render?

Post by CuteAlien »

Sorry, I didn't get around to checking it out today. Though there's probably not much I can tell you about the software renderer anyway. Just... fps fluctuations sound bigger when the base fps is high, because what you actually lose is time per frame. Going from 5000 to 140 fps costs about 7 ms per frame, which is the same cost that takes you from 100 fps to roughly 59 fps.
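
The fps figures in this thread are easier to compare once they are converted to time per frame. The following is only a small arithmetic sketch; the function names are made up for illustration and are not part of Irrlicht.

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when the rate drops from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}

// Frame rate you end up with after adding a fixed per-frame cost to a base rate.
double fpsWithAddedCost(double baseFps, double costMs)
{
    assert(costMs >= 0.0);
    return 1000.0 / (frameTimeMs(baseFps) + costMs);
}
```

For example, `addedCostMs(5000.0, 140.0)` is about 6.9 ms, while `addedCostMs(100.0, 99.0)` is about 0.1 ms. A fixed cost of that size matters a lot for an empty scene and much less once the scene itself takes several milliseconds to render.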

Calling device->setWindowCaption might at such high framerates already affect speed even if it's only called once per second (it's a super slow function).

Anyway - if you want to profile things and find out what might be slowing it down, use a profiler. Modern Visual Studio has one built in. Otherwise use Very Sleepy on Windows, which is super simple to use (you just start it, click the application you want to profile and then wait a while).

Last - make sure you compile in release for profiling. Profiling a debug build tells you nothing meaningful. (edit: But enable "generate debug information" in linker-debugging)
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
CuteAlien
Admin
Posts: 9969
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: Why is it so slow when I scale the render?

Post by CuteAlien »

I did a quick profile with your second example (which uses EDT_DIRECT3D9) and by far the most time (~75%) is spent in CD3D9Driver::endScene, in other words flipping the back/front buffer. Around 15% is lost to the timer using SetThreadAffinityMask. Under 4% goes into rendering the scene graph (around half of that in driver calls to send the cube to the card), and a tiny bit of time is also spent setting up render states for a 2D drawing call.

But - pretty much none of that matters if you use a larger scene. The reason those show up so much is that you do nothing else.

edit: Must admit I'm not sure Irrlicht uses SetThreadAffinityMask correctly, setting/resetting it for every timer check. Sounds more like this should only be set once, though maybe there was some risk involved with that. But I'll have to test with some real scene to see if that makes any difference. (edit2: Nope, with a real scene that no longer matters)
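
If no profiler is at hand, a rough breakdown like the one above can be approximated by timing sections of the frame loop by hand. This is a minimal sketch that assumes a C++11 compiler; `SectionStats` and `ScopedTimer` are hypothetical helper names, not Irrlicht classes, and a sampling profiler will still give a more reliable picture.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock seconds per named section of the frame loop.
class SectionStats {
public:
    void add(const std::string& name, double seconds) {
        assert(seconds >= 0.0);
        totals_[name] += seconds;
    }

    // Sum of all recorded time, in seconds.
    double total() const {
        double sum = 0.0;
        for (const auto& entry : totals_) sum += entry.second;
        return sum;
    }

    // Fraction (0..1) of all recorded time spent in the given section.
    double share(const std::string& name) const {
        const double sum = total();
        const auto it = totals_.find(name);
        if (sum <= 0.0 || it == totals_.end()) return 0.0;
        return it->second / sum;
    }

private:
    std::map<std::string, double> totals_;
};

// Measures the lifetime of one scope and adds it to a SectionStats entry.
class ScopedTimer {
public:
    ScopedTimer(SectionStats& stats, const std::string& name)
        : stats_(stats), name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        stats_.add(name_, elapsed.count());
    }

private:
    SectionStats& stats_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

In a render loop you would wrap a call such as `driver->endScene()` in its own block with a `ScopedTimer` at the top, then print `share("endScene")` once per second. At thousands of frames per second the timers themselves add measurable overhead, so treat the numbers as rough.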
Noiecity
Posts: 363
Joined: Wed Aug 23, 2023 7:22 pm
Contact:

Re: Why is it so slow when I scale the render?

Post by Noiecity »

Yes, but even so you lose a lot of fps, dear CuteAlien. With the burningsvideo version that uses GDI (GDI only does the scaling) I got 4 times more fps, that is 400-500 fps, while with scaling through the Irrlicht pipeline I got 100 fps. The problem is the scaling: as soon as scaling comes in, everything is lost. With burningsvideo it went from 5000 fps to 90-100, whereas with GDI it reached 400-500.
I was looking for a way to save resources, not only with burningsvideo, but also so that I could use more shaders without such a large performance penalty. At low resolutions the cost of shaders is very low, although it depends on the implementation. I also get a good frame rate with d3d9 when scaling, better than with burningsvideo, but I still lose about half of the fps... Maybe I am being too optimistic, or maybe I am being unfair when comparing it with DirectDraw. After all, they do not use the same color format.

I'm impressed that burningsvideo already has shadow volumes and normal maps implemented as of 1.8.5...

Image

SoftwareDriver2_compile_config.h

Code: Select all

#ifdef BURNINGVIDEO_RENDERER_BEAUTIFUL
	#define SOFTWARE_DRIVER_2_PERSPECTIVE_CORRECT
	#define SOFTWARE_DRIVER_2_SUBTEXEL
	//#define SOFTWARE_DRIVER_2_BILINEAR
	#define SOFTWARE_DRIVER_2_LIGHTING
	#define SOFTWARE_DRIVER_2_USE_VERTEX_COLOR
	#define SOFTWARE_DRIVER_2_32BIT
	//#define SOFTWARE_DRIVER_2_MIPMAPPING
	#define SOFTWARE_DRIVER_2_USE_WBUFFER
	#define SOFTWARE_DRIVER_2_TEXTURE_TRANSFORM
	#define SOFTWARE_DRIVER_2_TEXTURE_MAXSIZE		0
#endif
However, you can get a similar filter (to the bilinear one disabled above) at the scaling stage by changing
SetStretchBltMode(hdc, COLORONCOLOR);
to
SetStretchBltMode(hdc, HALFTONE);

But it's slow... so I left it at that.
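
To make the difference concrete: COLORONCOLOR does no blending, so an enlarged image looks blocky, much like point sampling, while HALFTONE averages colors and therefore costs more. The sketch below is a plain nearest-neighbour stretch written for illustration only. It is not GDI's actual implementation, and `upscaleNearest` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Point-sampled (nearest-neighbour) stretch of a tightly packed 32-bit image.
std::vector<std::uint32_t> upscaleNearest(const std::vector<std::uint32_t>& src,
                                          std::size_t srcW, std::size_t srcH,
                                          std::size_t dstW, std::size_t dstH)
{
    assert(srcW > 0 && srcH > 0 && src.size() == srcW * srcH);
    std::vector<std::uint32_t> dst(dstW * dstH);
    for (std::size_t y = 0; y < dstH; ++y) {
        // Source row that this destination row samples from.
        const std::size_t sy = y * srcH / dstH;
        for (std::size_t x = 0; x < dstW; ++x) {
            // Source column that this destination column samples from.
            const std::size_t sx = x * srcW / dstW;
            dst[y * dstW + x] = src[sy * srcW + sx];
        }
    }
    return dst;
}
```

Each destination pixel costs one read and one write, with no arithmetic on the color channels, which is why this kind of stretch stays cheap even for a full-window target.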

Code: Select all

// main_gdi_blit.cpp
// Compile with: cl /EHsc /I<irrlicht_include_path> main_gdi_blit.cpp /link Irrlicht.lib user32.lib gdi32.lib
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <vector>
#include <cstring>      // for memcpy
#include <stdint.h>     // for uint8_t (not guaranteed to come in via the other headers)
#include <cstdio>
#include <iostream>
#include <irrlicht.h>
#include <cmath>        // for sinf, cosf

#ifdef _MSC_VER
#pragma comment(lib, "Irrlicht.lib")
#endif

using namespace irr;
using namespace core;
using namespace scene;
using namespace video;
using namespace io;
using namespace gui;

// -------------------------------------------------------------
// Finds a visible window belonging to the current process
// (more reliable than FindWindow by title)
// -------------------------------------------------------------
struct EnumData {
    DWORD pid;
    HWND result;
};

static BOOL CALLBACK EnumWindowsProc(HWND hwnd, LPARAM lParam)
{
    EnumData* d = reinterpret_cast<EnumData*>(lParam);
    DWORD pid = 0;
    GetWindowThreadProcessId(hwnd, &pid);
    if (pid != d->pid) return TRUE;
    if (!IsWindowVisible(hwnd)) return TRUE;
    // Ignore windows without a title (probably not the main window)
    int len = GetWindowTextLengthW(hwnd);
    if (len == 0) return TRUE;
    d->result = hwnd;
    return FALSE; // stop enumeration
}

static HWND FindWindowForCurrentProcess()
{
    EnumData data;
    data.pid = GetCurrentProcessId();
    data.result = NULL;
    EnumWindows(EnumWindowsProc, reinterpret_cast<LPARAM>(&data));
    return data.result;
}

// -------------------------------------------------------------
// Main program
// -------------------------------------------------------------
int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    // Size of the small render target you want to render
    const u32 TEX_W = 200;
    const u32 TEX_H = 200;

    // Create Irrlicht device (software driver for maximum compatibility)
    IrrlichtDevice* device = createDevice(
        video::EDT_BURNINGSVIDEO,
        dimension2d<u32>(640, 480),
        32,            // color depth of the window framebuffer
        false, true, true, 0);

    if (!device) {
        MessageBoxA(NULL, "Could not create IrrlichtDevice.", "Error", MB_ICONERROR);
        return 1;
    }

    device->setWindowCaption(L"Render 128x128 scaled with GDI");

    // Wait a moment and locate the HWND of the process's window
    // (sometimes the HWND is not fully initialized right after device creation)
    Sleep(50);
    HWND hWnd = FindWindowForCurrentProcess();
    if (!hWnd) {
        // fallback: try GetActiveWindow (less reliable)
        hWnd = GetActiveWindow();
    }

    IVideoDriver* driver = device->getVideoDriver();
    ISceneManager* smgr = device->getSceneManager();
    driver->setTextureCreationFlag(ETCF_CREATE_MIP_MAPS, false);
    // --- simple test scene ---
    IAnimatedMesh* mesh = smgr->getMesh("../../media/sydney.md2");
    if (!mesh) {
        // if the mesh is not found, you could create something simple (a test node)
        // but here we prefer to alert and exit so you place your media.
        MessageBoxA(NULL, "Could not find ../../media/sydney.md2. Place the mesh in that path or adjust the code.", "Warning", MB_ICONWARNING);
        device->drop();
        return 1;
    }

    IAnimatedMeshSceneNode* node = smgr->addAnimatedMeshSceneNode(mesh);
    if (node) {
        node->setMaterialFlag(EMF_LIGHTING, true);
        node->setMD2Animation(scene::EMAT_STAND);
        node->setMaterialFlag(EMF_BILINEAR_FILTER, false);
        ITexture* tex = driver->getTexture("../../media/sydney.bmp");
        if (tex) node->setMaterialTexture(0, tex);

        // --- Add volumetric shadows to Sydney ---
        node->addShadowVolumeSceneNode();          // Enables stencil shadow
        // (Optional) Adjust shadow color (semi-transparent)
        smgr->setShadowColor(video::SColor(100,0,0,0));
    }

    // --- Add a low-height cube at Sydney's feet ---
    // Scale it to be a wide platform (XZ scale) and very thin (Y scale)
    // Center it vertically just below Sydney (assume Sydney is at y=0)
    f32 cubeHeight = 2.5f;          // cube height
    f32 cubeWidth = 5.0f;           // width (X)
    f32 cubeDepth = 5.0f;           // depth (Z)
    ISceneNode* cube = smgr->addCubeSceneNode(
        1.0f,                     // base size 1x1x1
        0,                        // parent
        -1,                       // id
        vector3df(0, -cubeHeight/2, 0), // position centered at Y = -height/2 so that its top face is at Y=0
        vector3df(0,0,0),         // rotation
        vector3df(cubeWidth, cubeHeight, cubeDepth) // scale
    );
    if (cube) {
        // Give it a color or texture to make it visible
        cube->setMaterialFlag(video::EMF_LIGHTING, true);
        cube->setPosition(core::vector3df(0.0f,-30.5f,0.0f));
        cube->setScale(core::vector3df(25.0f,15.0f,25.0f));
        cube->setMaterialFlag(video::EMF_NORMALIZE_NORMALS, true);
        // Optional: assign a ground texture
        video::ITexture* groundTex = driver->getTexture("../../media/rockwall.jpg"); // change it for the one you have
        if (groundTex)
            cube->setMaterialTexture(0, groundTex);
        else
            cube->getMaterial(0).EmissiveColor = video::SColor(255,80,80,80); // dark gray
    }

    // --- Create the rotating light ---
    ILightSceneNode* rotatingLight = smgr->addLightSceneNode(
        0,                          // parent (none)
        vector3df(0, 20, 0),        // initial position (will be updated later)
        SColorf(1.0f, 1.0f, 1.0f),  // diffuse white color
        60.0f);                     // radius of influence
    if (rotatingLight)
    {
        // Adjust light parameters (optional)
        rotatingLight->getLightData().DiffuseColor = SColorf(1.0f, 1.0f, 1.0f);
        rotatingLight->getLightData().SpecularColor = SColorf(0.5f, 0.5f, 0.5f);
        rotatingLight->getLightData().Attenuation = vector3df(1.0f, 0.000001f, 0.1f);
        // Keep it as point light (default)
    }

    ICameraSceneNode* cam = smgr->addCameraSceneNode(0, vector3df(0,30,-70), vector3df(0,5,0));
    cam->setFOV(54.0f * core::DEGTORAD);

    // Create the small render target (TEX_W x TEX_H)
    ITexture* rt = driver->addRenderTargetTexture(dimension2d<u32>(TEX_W, TEX_H), "RT_GDI");
    if (!rt) {
        MessageBoxA(NULL, "Could not create the render target.", "Error", MB_ICONERROR);
        device->drop();
        return 1;
    }

    // Temporary tightly-packed buffer (reused)
    std::vector<uint8_t> tmp;
    tmp.resize((size_t)TEX_W * (size_t)TEX_H * 4); // 32 bpp

    // Prepare BITMAPINFO (32bpp top-down)
    BITMAPINFO bmi;
    ZeroMemory(&bmi, sizeof(bmi));
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = (LONG)TEX_W;
    bmi.bmiHeader.biHeight = -((LONG)TEX_H); // top-down
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;
    bmi.bmiHeader.biSizeImage = 0;

    // FPS variables (optional)
    u32 lastTime = device->getTimer()->getRealTime();
    u32 frames = 0;

    // Main loop
    while (device->run())
    {
        // --- Update rotating light position ---
        if (rotatingLight)
        {
            u32 t = device->getTimer()->getTime(); // milliseconds since start
            float angle = t * 0.002f;               // angle in radians: 0.002 rad per ms = 2 rad per second
            float radius = 15.0f;                    // distance from center
            float height = 5.0f;                     // light height
            rotatingLight->setPosition(vector3df(
                radius * cosf(angle),
                height,
                radius * sinf(angle)
            ));
        }

        // 1) Render to the render target (clear RT)
        driver->setRenderTarget(rt, true, true, video::SColor(255,0,0,0)); // alpha=255 (opaque), black
        smgr->drawAll();
        driver->setRenderTarget(0); // switch back to backbuffer

        // 2) Readback from RT -> tmp (row by row respecting pitch)
        //    In older Irrlicht versions, lock() takes a bool instead: pass true for read-only
        void* pixels = rt->lock(video::ETLM_READ_ONLY);   // read-only access
        if (pixels)
        {
            // real pitch (bytes per row) from the texture
            u32 pitch = rt->getPitch(); // bytes per row
            // color format in case you want to use different logic per format
            ECOLOR_FORMAT fmt = rt->getColorFormat(); // e.g. ECF_A8R8G8B8

            // Copy row by row respecting pitch
            uint8_t* src = reinterpret_cast<uint8_t*>(pixels);
            const size_t dstRowBytes = (size_t)TEX_W * 4u;
            for (u32 y = 0; y < TEX_H; ++y) {
                // Use &tmp[0], which is portable in C++98/03 (tmp.data() needs C++11)
                uint8_t* dstRow = &tmp[0] + (size_t)y * dstRowBytes;
                uint8_t* srcRow = src + (size_t)y * (size_t)pitch;
                // copy only min(dstRowBytes, pitch) bytes for safety
                size_t copyBytes = dstRowBytes;
                if (copyBytes > (size_t)pitch) copyBytes = (size_t)pitch;
                memcpy(dstRow, srcRow, copyBytes);
                // If pitch > dstRowBytes, ignore the padding (not needed)
            }

            rt->unlock();

            // 3) Draw to the window with StretchDIBits (GDI scaling)
            HDC hdc = GetDC(hWnd);
            if (hdc)
            {
                // get current client size
                RECT rc;
                GetClientRect(hWnd, &rc);
                int winW = rc.right - rc.left;
                int winH = rc.bottom - rc.top;
                SetStretchBltMode(hdc, COLORONCOLOR);
                if (winW > 0 && winH > 0) {
                    StretchDIBits(
                        hdc,
                        0, 0, winW, winH,       // dest rect (scale to whole window)
                        0, 0, TEX_W, TEX_H,     // src rect from tightly-packed image (top-down)
                        &tmp[0],                 // &tmp[0] instead of tmp.data() (C++98/03)
                        &bmi,
                        DIB_RGB_COLORS,
                        SRCCOPY
                    );
                }
                ReleaseDC(hWnd, hdc);
            }
        }
        else
        {
            // lock returned NULL: the driver does not allow readback from render target
            // (this can happen on some drivers/hardware). For debugging:
            OutputDebugStringA("Warning: rt->lock() returned NULL. Cannot readback this frame.\n");
            // You could fall back to driver->createScreenShot(...) or draw with draw2DImage.
        }

        // 4) Update title with FPS (optional)
        frames++;
        u32 currentTime = device->getTimer()->getRealTime();
        if (currentTime - lastTime >= 1000)
        {
            stringw windowText = L"Render 128x128 scaled with GDI - FPS: ";
            windowText += frames;
            device->setWindowCaption(windowText.c_str());
            lastTime = currentTime;
            frames = 0;
        }

        // sleep ~16 ms per frame to keep CPU usage low (optional; this caps the frame rate)
        Sleep(16);
    }

    device->drop();
    return 0;
}
With Sleep(16) it uses only 5% to 10% CPU, which is very light. Without the Sleep I was getting more than 100 fps in the example in the gif; with Sleep(16) I was getting a smooth 32 fps, with hundreds of things open...

By the way, when I compiled without mipmaps and without the bilinear filter I got better performance, although I had to recompile the Irrlicht .dll to do it.

For a retro game this style is incredible, unique, and compatible with practically any computer that runs Windows, although as you said, it only works well in small scenes (I think you were referring to something else).
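
A note on the readback step in the listing above: the locked texture may have padding at the end of each row (pitch larger than width * 4), while StretchDIBits is given a tightly packed image here, so rows are copied one at a time. The same idea as a standalone helper, as a sketch only. `packRows` is a hypothetical name, and the code assumes C++11 for `std::vector::data()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies 'height' rows out of a padded image into a tightly packed buffer.
// 'pitch' is the distance in bytes between the starts of two source rows.
std::vector<std::uint8_t> packRows(const std::uint8_t* src, std::size_t pitch,
                                   std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel)
{
    const std::size_t rowBytes = width * bytesPerPixel;
    assert(src != nullptr && rowBytes > 0 && pitch >= rowBytes);
    std::vector<std::uint8_t> out(rowBytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Skip the padding: read from y * pitch, write to y * rowBytes.
        std::memcpy(out.data() + y * rowBytes, src + y * pitch, rowBytes);
    }
    return out;
}
```

When pitch equals the row size, the loop degenerates into one contiguous copy, which is the fast path the multithreaded listing further down takes with a single memcpy.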
CuteAlien
Admin
Posts: 9969
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: Why is it so slow when I scale the render?

Post by CuteAlien »

Yeah, it's just a constant cost per frame. So the more your game does, the smaller the role that factor plays. Well, there's no magic trick I can do to make Irrlicht faster.
And rendering fullscreen with a software renderer will indeed be slow.

Btw, GDI StretchDIBits should generally be hardware accelerated. It depends a bit on your system and drivers, but I think most graphics cards still have support for it. So a GDI vs. Direct3D comparison makes more sense. Pure software is unlikely to catch up.
Noiecity
Posts: 363
Joined: Wed Aug 23, 2023 7:22 pm
Contact:

Re: Why is it so slow when I scale the render?

Post by Noiecity »

Thank you, Mr. CuteAlien. I think that in most cases pixels end up being drawn by the graphics card anyway (RGB and pixel coordinate). I think the only difference is that GDI uses system resources. For example, GPU-Z shows my graphics card at 2% GPU usage, and it stays the same when I open the program, whereas using DirectDraw pushes it almost to the maximum unless I add a Sleep.

Certainly, for large resolutions burningsvideo is not viable. I did try it on a more modern computer at 4K, with more elements, and it ran at 800 fps... but on an older computer it can probably only run well at 1366x768 at most, or similar.

I'm trying to get an orthographic view working in Irrlicht for isometric graphics, but I give up, I can't make it work. If I render one cube, everything is fine, but when I add a second cube the z-order seems to be inverted, and if the model is not closed it gets even worse...

I was hoping to replicate StarCraft 1, but with dynamic lighting, or Sacred 1...
Noiecity
Posts: 363
Joined: Wed Aug 23, 2023 7:22 pm
Contact:

Re: Why is it so slow when I scale the render?

Post by Noiecity »

Okay, I have seen that Irrlicht 1.9.0 already has optimizations. I could not remove the mipmaps in this version haha, but I was still able to improve performance.
On my old 2.7 GHz computer, it ran at 300 fps even at 1920x1080 (and it stayed stable).

Before, at least in 1.8.5, unscaled rendering got more fps up to approximately 512x512; beyond that it started to drop, and at 1366x768 it was already unbearable. Render to texture, on the other hand, is constant: it barely suffers from scaling and loses very few fps because of it.

Code: Select all

// main_multithread_fullscene.cpp
// Multithreaded Irrlicht rendering with GDI display
// This program creates two threads:
// - RenderThread: uses Irrlicht to render a 3D scene to a 128x128 render target,
//   then copies the pixels into a shared buffer.
// - DisplayThread: takes the latest frame from the shared buffer and displays it
//   in a main window using GDI StretchDIBits.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <irrlicht.h>
#include <cmath>
#include <cstdio>       // for sprintf
#include <cstring>      // for memcpy
#include <process.h>


#ifdef _MSC_VER
#pragma comment(lib, "Irrlicht.lib")
#endif

using namespace irr;
using namespace core;
using namespace scene;
using namespace video;
using namespace io;

// Structure for data shared between the two threads
struct SharedData {
    u8 buffer1[128 * 128 * 4];          // First framebuffer (BGRA 32-bit)
    u8 buffer2[128 * 128 * 4];          // Second framebuffer (double buffering)
    volatile int currentWriteBuffer;     // Index of buffer being written (0 or 1)
    volatile int lastReadyBuffer;        // Index of buffer ready for display, -1 if none
    CRITICAL_SECTION cs;                 // Critical section to protect buffer indices
    HANDLE renderDoneEvent;               // Event signaled when a new frame is ready
    volatile bool quit;                   // Flag to signal threads to exit
    HWND hTargetWnd;                      // Main window where the image is displayed
    HWND hIrrWnd;                         // Hidden window used by Irrlicht as a rendering surface
};

// Rendering thread: runs Irrlicht, renders the scene to a texture, and fills the shared buffer
unsigned int __stdcall RenderThread(void* param) {
    SharedData* shared = (SharedData*)param;

    // Create Irrlicht device inside the hidden window
    SIrrlichtCreationParameters params;
    params.DriverType = video::EDT_BURNINGSVIDEO;   // Software renderer, fast and simple
    params.WindowSize = dimension2d<u32>(128, 128);
    params.Bits = 32;
    params.Fullscreen = false;
    params.Stencilbuffer = true;                     // Required for stencil shadows
    params.Vsync = false;
    params.WindowId = reinterpret_cast<void*>(shared->hIrrWnd); // Render into hidden window

    IrrlichtDevice* device = createDeviceEx(params);
    if (!device) {
        MessageBoxA(shared->hTargetWnd, "Error creating Irrlicht device", "Error", MB_ICONERROR);
        return 1;
    }

    IVideoDriver* driver = device->getVideoDriver();
    ISceneManager* smgr = device->getSceneManager();
    driver->setTextureCreationFlag(ETCF_CREATE_MIP_MAPS, false);

    // --- Build the full 3D scene (similar to Irrlicht examples) ---

    // Load the Sydney model
    IAnimatedMesh* mesh = smgr->getMesh("../../media/sydney.md2");
    if (!mesh) {
        MessageBoxA(shared->hTargetWnd, "sydney.md2 not found, using fallback cube", "Warning", MB_ICONWARNING);
        // Fallback: a simple cube to have something visible
        scene::ISceneNode* fallback = smgr->addCubeSceneNode(20.0f);
        if (fallback) fallback->setMaterialFlag(video::EMF_LIGHTING, false);
    } else {
        IAnimatedMeshSceneNode* node = smgr->addAnimatedMeshSceneNode(mesh);
        if (node) {
            node->setMaterialFlag(EMF_LIGHTING, true);
            node->setMD2Animation(scene::EMAT_STAND);
            node->setMaterialFlag(EMF_BILINEAR_FILTER, false);
            ITexture* tex = driver->getTexture("../../media/sydney.bmp");
            if (tex) node->setMaterialTexture(0, tex);

            // --- Volumetric shadows using stencil buffer ---
            scene::IShadowVolumeSceneNode * shadVol = node->addShadowVolumeSceneNode();
            smgr->setShadowColor(video::SColor(100, 0, 0, 0)); // Semi‑transparent shadow
        }
    }

    // --- Ground cube (like in the original example) ---
    ISceneNode* cube = smgr->addCubeSceneNode(1.0f);
    if (cube) {
        cube->setMaterialFlag(video::EMF_LIGHTING, true);
        cube->setPosition(core::vector3df(0.0f, -30.5f, 0.0f));
        cube->setScale(core::vector3df(25.0f, 15.0f, 25.0f));
        cube->setMaterialFlag(video::EMF_NORMALIZE_NORMALS, true);

        // Try to load a texture for the ground
        video::ITexture* groundTex = driver->getTexture("../../media/rockwall.jpg");
        if (groundTex)
            cube->setMaterialTexture(0, groundTex);
        else
            cube->getMaterial(0).EmissiveColor = video::SColor(255, 80, 80, 80); // dark gray
    }

    // --- Rotating light (same as original) ---
    ILightSceneNode* rotatingLight = smgr->addLightSceneNode(
        0,
        vector3df(0, 20, 0),
        SColorf(1.0f, 1.0f, 1.0f),
        60.0f);
    if (rotatingLight) {
        rotatingLight->getLightData().DiffuseColor = SColorf(1.0f, 1.0f, 1.0f);
        rotatingLight->getLightData().SpecularColor = SColorf(0.5f, 0.5f, 0.5f);
        rotatingLight->getLightData().Attenuation = vector3df(1.0f, 0.000001f, 0.00001f);
    }

    // --- Camera (same as original) ---
    ICameraSceneNode* cam = smgr->addCameraSceneNode(
        0,
        vector3df(0, 30, -70),
        vector3df(0, 5, 0));
    cam->setFOV(54.0f * core::DEGTORAD);

    // --- Create a 128x128 render target ---
    ITexture* rt = driver->addRenderTargetTexture(dimension2d<u32>(128, 128), "RT");
    if (!rt) {
        MessageBoxA(shared->hTargetWnd, "Failed to create render target", "Error", MB_ICONERROR);
        device->drop();
        return 1;
    }

    // Main render loop
    while (!shared->quit && device->run()) {
        // Update rotating light position over time
        if (rotatingLight) {
            u32 t = device->getTimer()->getTime();
            float angle = t * 0.002f;               // same speed as original
            float radius = 15.0f;
            float height = 5.0f;
            rotatingLight->setPosition(vector3df(
                radius * cosf(angle),
                height,
                radius * sinf(angle)
            ));
        }

        // Render scene to the render target (clear with black)
        driver->setRenderTarget(rt, true, true, video::SColor(255, 0, 0, 0));
        smgr->drawAll();
        driver->setRenderTarget(0);

        // Read pixels from the render target
        void* pixels = rt->lock(video::ETLM_READ_ONLY);
        if (pixels) {
            u32 pitch = rt->getPitch();                 // actual row pitch in bytes
            int writeIdx = shared->currentWriteBuffer;  // buffer to write into
            u8* dst = (writeIdx == 0) ? shared->buffer1 : shared->buffer2;
            u8* src = (u8*)pixels;

            // Copy the pixel data, handling possible pitch mismatch
            if (pitch == 128 * 4) {
                memcpy(dst, src, 128 * 128 * 4);
            } else {
                for (u32 y = 0; y < 128; ++y) {
                    memcpy(dst + y * 128 * 4, src + y * pitch, 128 * 4);
                }
            }
            rt->unlock();

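            // Update shared buffer indices (protected by critical section).
            // Note: only the indices are protected, not the pixel data. If the display
            // thread is still drawing a buffer when this thread comes around to write
            // it again, that frame can tear.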
            EnterCriticalSection(&shared->cs);
            shared->lastReadyBuffer = writeIdx;          // mark this buffer as ready
            shared->currentWriteBuffer = 1 - writeIdx;    // flip to the other buffer
            LeaveCriticalSection(&shared->cs);

            // Signal the display thread that a new frame is available
            SetEvent(shared->renderDoneEvent);
        }

        // Small sleep to avoid hogging the CPU (optional)
        // Sleep(0);
    }

    device->drop();
    return 0;
}

// Display thread: waits for a new frame and draws it to the main window using GDI
unsigned int __stdcall DisplayThread(void* param) {
    SharedData* shared = (SharedData*)param;

    // Bitmap info for 32-bit top-down DIB (negative height indicates top-down)
    BITMAPINFO bmi = {0};
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = 128;
    bmi.bmiHeader.biHeight = -128;      // top-down
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    DWORD lastTime = GetTickCount();
    int frames = 0;
    char title[256];

    while (!shared->quit) {
        // Wait for a new frame (timeout 100 ms to check quit flag)
        DWORD wait = WaitForSingleObject(shared->renderDoneEvent, 100);
        if (wait == WAIT_OBJECT_0) {
            int readIdx;
            EnterCriticalSection(&shared->cs);
            readIdx = shared->lastReadyBuffer;   // get the latest ready buffer
            shared->lastReadyBuffer = -1;         // mark it as consumed
            LeaveCriticalSection(&shared->cs);

            if (readIdx != -1) {
                u8* src = (readIdx == 0) ? shared->buffer1 : shared->buffer2;

                HDC hdc = GetDC(shared->hTargetWnd);
                if (hdc) {
                    RECT rc;
                    GetClientRect(shared->hTargetWnd, &rc);
                    SetStretchBltMode(hdc, COLORONCOLOR);
                    // Stretch the 128x128 image to the whole client area
                    StretchDIBits(hdc,
                        0, 0, rc.right, rc.bottom,
                        0, 0, 128, 128,
                        src,
                        &bmi,
                        DIB_RGB_COLORS,
                        SRCCOPY);
                    ReleaseDC(shared->hTargetWnd, hdc);
                }

                // Update FPS counter in window title once per second
                frames++;
                DWORD now = GetTickCount();
                if (now - lastTime >= 1000) {
                    sprintf(title, "MultiThread Full Scene - FPS: %d", frames);
                    SetWindowTextA(shared->hTargetWnd, title);
                    lastTime = now;
                    frames = 0;
                }
            }
        }
    }
    return 0;
}

// Main window procedure
LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam) {
    switch (msg) {
        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        case WM_KEYDOWN:
            if (wParam == VK_ESCAPE) DestroyWindow(hWnd);
            return 0;
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE, LPSTR, int nCmdShow) {
    // Register the main window class
    const char CLASS_NAME[] = "MultiThreadFullScene";
    WNDCLASSA wc = {};
    wc.lpfnWndProc = WndProc;
    wc.hInstance = hInstance;
    wc.lpszClassName = CLASS_NAME;
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
    RegisterClassA(&wc);

    // Create the main window (where the rendered image will be shown)
    HWND hMainWnd = CreateWindowExA(0, CLASS_NAME, "MultiThread Full Scene (Sydney with shadows)",
        WS_OVERLAPPEDWINDOW | WS_VISIBLE,
        CW_USEDEFAULT, CW_USEDEFAULT, 660, 520,
        NULL, NULL, hInstance, NULL);
    if (!hMainWnd) return 1;

    // Create a hidden window for Irrlicht to render into
    HWND hIrrWnd = CreateWindowExA(0, "STATIC", "",
        WS_POPUP, 0, 0, 128, 128, NULL, NULL, hInstance, NULL);
    ShowWindow(hIrrWnd, SW_HIDE); // Hide it

    // Initialize shared data
    SharedData shared = {};
    shared.currentWriteBuffer = 0;
    shared.lastReadyBuffer = -1;
    shared.hTargetWnd = hMainWnd;
    shared.hIrrWnd = hIrrWnd;
    shared.quit = false;
    InitializeCriticalSection(&shared.cs);
    shared.renderDoneEvent = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, initially unsignaled

    // Create the two threads
    HANDLE hRender = (HANDLE)_beginthreadex(NULL, 0, RenderThread, &shared, 0, NULL);
    HANDLE hDisplay = (HANDLE)_beginthreadex(NULL, 0, DisplayThread, &shared, 0, NULL);

    if (!hRender || !hDisplay) {
        MessageBoxA(hMainWnd, "Error creating threads", "Error", MB_ICONERROR);
        return 1;
    }

    // Main message loop
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }

    // Shutdown: signal threads to quit and wait for them
    shared.quit = true;
    SetEvent(shared.renderDoneEvent); // wake up display thread if waiting
    WaitForSingleObject(hRender, 3000);
    WaitForSingleObject(hDisplay, 3000);
    CloseHandle(hRender);
    CloseHandle(hDisplay);
    CloseHandle(shared.renderDoneEvent);
    DeleteCriticalSection(&shared.cs);
    DestroyWindow(hIrrWnd);

    return 0;
}
I used two threads this time, one for rendering and one for scaling. The two threads don't really solve anything, even though it looks like they do. I don't even know how I arrived at this solution, but it works.
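
For readers who want the idea without the Win32 details: the render/display split above boils down to a two-buffer handoff. The following is only a minimal sketch in portable C++11, not the code from this post. `FrameExchange` and all of its member names are hypothetical, and it uses `std::mutex` and `std::condition_variable` where the listing above uses a `CRITICAL_SECTION` and an auto-reset event.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the SharedData struct used in the post.
class FrameExchange {
public:
    explicit FrameExchange(std::size_t pixelCount) {
        buffers_[0].assign(pixelCount, 0);
        buffers_[1].assign(pixelCount, 0);
    }

    // Render thread only: the buffer that may be drawn into right now.
    std::vector<std::uint32_t>& writeBuffer() { return buffers_[writeIndex_]; }

    // Render thread only: mark the current buffer as finished and switch
    // to the other one for the next frame.
    void publish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lastReady_ = writeIndex_;
            writeIndex_ = 1 - writeIndex_;
            fresh_ = true;
        }
        frameReady_.notify_one();
    }

    // Display thread: block until a new frame or shutdown, then copy the
    // frame out while holding the lock so the renderer cannot swap onto it.
    // Returns false once stop() was called and no fresh frame is pending.
    bool waitAndCopy(std::vector<std::uint32_t>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        frameReady_.wait(lock, [this] { return fresh_ || quit_; });
        if (!fresh_) return false;
        out = buffers_[lastReady_];
        fresh_ = false;
        return true;
    }

    // Any thread: wake the display thread so it can exit.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        frameReady_.notify_all();
    }

private:
    std::vector<std::uint32_t> buffers_[2];
    int writeIndex_ = 0;   // written only by the render thread, under the lock
    int lastReady_ = -1;   // -1 until the first frame is published
    bool fresh_ = false;   // true while an unseen frame is waiting
    bool quit_ = false;
    std::mutex mutex_;
    std::condition_variable frameReady_;
};
```

The render thread would loop over `writeBuffer()` and `publish()`, and the display thread would loop over `waitAndCopy()` and then scale the copy. Copying under the lock costs one extra 128x128 copy per frame, but it keeps the renderer from overwriting a frame that is still being read.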

Without the shadow volume I get a constant 1000+ fps at all resolutions.

The shadow volume could be replaced by something faster, such as cascaded shadow maps.

If you could try this method with larger scenes, I would be grateful if you told me about your experience.

I compiled it with Code::Blocks 12 on Windows XP.
Last edited by Noiecity on Sun Mar 01, 2026 2:49 pm, edited 1 time in total.
Irrlicht is love, Irrlicht is life, long live to Irrlicht
Noiecity
Posts: 363
Joined: Wed Aug 23, 2023 7:22 pm

Re: Why is it so slow when I scale the render?

Post by Noiecity »

I now drop from 5000 fps to 1000 instead, which is progress. All of this would be moot if screens had larger pixels and a low native resolution: then there would be no performance drop from scaling, lmao.
You can buy a Banana Pi M1:
https://www.aliexpress.com/item/1005007982009032.html
You connect two of these to your Banana Pi M1:
https://www.aliexpress.com/item/32845686589.html

Using HDMI you transmit the image through this controller:
https://www.aliexpress.com/item/32845295658.html

Then you connect the screens to that board.

You run Irrlicht with Burning's Video or OpenGL, and voilà: a real gaming experience with a real refresh rate of 3840 Hz (although you will see at most 60 different images per second; the panel basically flashes 3840 times per second).

At night you feel a real "bloom"... I have no proof, but I have no doubts either.

And if you want to buy the Banana Pi M1 with a case and charger:
https://www.aliexpress.com/item/1005009715834706.html
----
(or a simple LED panel about the size of a laptop):
https://www.aliexpress.com/item/10000413660185.html
(In fact, you only need a Banana Pi M1 and an HDMI cable to run Irrlicht on a monitor with an HDMI input, using the scaling technique with Burning's Video. The Banana Pi M1 also supports OpenGL, but I have not tested that.)
I scale up from 128x128, but with OpenGL you could simply render at 512x512 and scale up to 4K resolutions.
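
To make the cost of that scaling step concrete, here is a minimal sketch of a nearest-neighbor upscale on the CPU. It is an illustration only and not what Irrlicht, Burning's Video, or my ddraw example does internally; `upscaleNearest` and its parameters are hypothetical names, and the image is assumed to be 32-bit pixels stored row by row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbor upscale of a 32-bit image stored row by row.
// Returns a zero-filled image if the source dimensions are invalid.
std::vector<std::uint32_t> upscaleNearest(const std::vector<std::uint32_t>& src,
                                          std::size_t srcW, std::size_t srcH,
                                          std::size_t dstW, std::size_t dstH)
{
    std::vector<std::uint32_t> dst(dstW * dstH);
    if (srcW == 0 || srcH == 0 || src.size() < srcW * srcH)
        return dst;

    // The source column for each destination column is the same on every
    // row, so compute that mapping once instead of once per pixel.
    std::vector<std::size_t> srcX(dstW);
    for (std::size_t x = 0; x < dstW; ++x)
        srcX[x] = x * srcW / dstW;

    for (std::size_t y = 0; y < dstH; ++y) {
        const std::uint32_t* row = &src[(y * srcH / dstH) * srcW];
        std::uint32_t* out = &dst[y * dstW];
        for (std::size_t x = 0; x < dstW; ++x)
            out[x] = row[srcX[x]];
    }
    return dst;
}
```

The inner loop runs once per destination pixel, so the work depends on the output size and not on the 128x128 source. A 1920x1080 target has 2,073,600 pixels against 16,384 in the source, which is over 100 times as many pixel writes per frame as the render target itself.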
-----
You would no longer pay a penalty for scaling, and you would keep a very high frame rate (even if you only see 60 frames per second, the processor would be producing more). You would no longer worry about shaders reducing performance, because their cost would become negligible, and you would no longer need bloom filters, since bloom would come for free with the LEDs.

It is something I would like to build together with someone, but I am poor: an Irrlicht game console.

I would use my discovery of a different angle for each texture based on the render, with the UV map based on the camera perspective. Not only would it be fast, it would even be very realistic.

If anyone wants to achieve that... I am available.



Edit: You could even build your own VR glasses on a budget, since flexible LED panels are available:

https://youtu.be/pFnTVNhYD6E?si=RLAxGZGpxzvtzcbY

In the VR helmet you could use polarizing optical glass, which would even improve the colors. Besides, the LEDs do not damage the eyes due to pollution, but they can blur your vision in the long term or cause fatigue.
Last edited by Noiecity on Sun Mar 01, 2026 2:21 pm, edited 2 times in total.
Irrlicht is love, Irrlicht is life, long live to Irrlicht
CuteAlien
Admin
Posts: 9969
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: Why is it so slow when I scale the render?

Post by CuteAlien »

OK, that's getting off-topic (and most of the links don't work in my country, but then again I'm not a hardware tinkerer anyway). And my screen resolution is also 1920x1080.

The code... has that AI-bot feeling, and I think I mentioned in another thread that it's not possible for me to keep up with evaluating code like that. You can post x versions in a day - I may spend my spare time trying to understand one version every few days (this is real lifetime for me - not some bot running and doing the understanding for me, but actual hours taken from my evenings/weekends). Basically, once you no longer code it yourself you won't understand it, and no one will be able to help you. Just telling AI to try another thing and then asking an open-source maintainer to spend time on it is extremely unkind!!! Spend your own time on it and you _will_ learn this stuff. And in my experience it's the _only_ way to learn. And yes, it takes time - one of my first games was a snake clone I did in my spare time. The ASCII version took one afternoon, so I decided to do it with real graphics and sound. It took a (hobby) year (and I had a friend doing the graphics). But it taught me lots of the basics of game programming. If I had had AI back then and let it do it for me, I would have gotten the same results, maybe faster. But afterwards I would have been as stupid as before. Don't fall into that trap!

Inverted z-order in a cube usually has causes like backface culling not being disabled or z-buffer flags being set wrong. Though games back in the days of Starcraft and Sacred generally faked the isometric look with bitmaps (I don't remember exactly, I'm not sure I'd still manage to get Sacred working these days, and I never played Starcraft).
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
Noiecity
Posts: 363
Joined: Wed Aug 23, 2023 7:22 pm

Re: Why is it so slow when I scale the render?

Post by Noiecity »

Thanks, you guessed the situation perfectly, lmao. I'm just really grateful to Irrlicht and the work behind it, and it bothers me that it's so underrated. Whenever I can, I try to get someone to make a game with Irrlicht, but they always complain about the blurry effects I try to achieve.

And yes, I think those games use sprites, at least for the environments, since I remember it had dynamic lighting, and the monsters and characters looked like low-poly models on top of them. Starcraft must be all sprites... I always try to program and learn C++, but I always end up going back to basics. I'm defeated by how well-optimized Irrlicht is, and I come back humiliated, haha.

I also always check the forum to read the posts... because I learn a lot.

Thanks.
Irrlicht is love, Irrlicht is life, long live to Irrlicht