A few years ago, when I was testing Irrlicht on an old notebook to see how it behaved, I noticed in one of the tests that a program can completely skip a condition (if) when the CPU is under too much stress. It even happened to me with the Irrlicht engine itself: certain collisions passed straight through, ignoring my collision check.
Well, this is similar to "branch misprediction", except that it is not a "logical" error, either in the program or in the processor failing to work out where the written instruction is headed. What happens when the CPU is stressed is known as a "hot-spot".
A "hot-spot" occurs when an area of the semiconductor gets too hot, which can produce errors when evaluating the logic, such as skipping a condition or producing a result that differs from the expected one.
When the processor temperature is measured, the reading is a general temperature, but the actual temperature varies by zone. One zone may be much hotter than another, and this is very difficult to manage under high thermal stress.
This can make a modern game on a modern engine such as Unreal Engine 5, where power consumption is higher, prone to crashes, glitchy animations, glitchy collisions, etc., no matter how many workarounds you implement.
There is a study on this subject, which can be summarized as: hot-spots "can cause serious problems, such as drastic loss of performance, incorrect circuit operation and reduced device lifetime. Traditional cooling solutions and design tools are no longer sufficient to mitigate these effects.":
https://sites.tufts.edu/tcal/files/2021 ... C_2021.pdf
The following study addresses "Thermally induced soft errors or delay faults capable of causing data corruption or incorrect execution.":
https://scholarworks.umass.edu/bitstrea ... 2/download
It is worth noting that these studies propose solutions, but the problem persists, as seen in current chips.
A thermal hot-spot can cause a logic condition to be "skipped", even though there is no apparent error in the code or design.
At higher temperatures, MOSFET transistors:
- Conduct worse (lower electron/carrier mobility).
- Take longer to change state (0→1 or 1→0).
This results in internal signal delays. A delayed signal can:
- Arrive late at a logic gate.
- Not arrive before the clock edge.
- Cause the circuit to register erroneous data.
For example:
Code:
    if (x > 5) {
        doSomething();
    }
With a late signal, the processor can:
- Misevaluate the result.
- Not trigger the control signal.
- Never execute the doSomething() block, even though x was greater than 5.
This is not a software bug. It is a physical bug: the logical signal did not change in time.
Processors use synchronous logic based on clock edges.
If a signal arrives late because of the hot-spot:
- The logical condition is not reached.
- A comparison, an instruction, a conditional operation, etc. is skipped.
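To make the clock-edge argument a bit more concrete, here is a minimal sketch of a setup-time check. It is only a toy model: the TimingPath struct, the function names, the linear temperature coefficient and every number are assumptions made up for illustration, not data from a real chip or from the studies linked above.

```cpp
#include <cassert>

// Hypothetical timing model. All fields and values are illustrative assumptions.
struct TimingPath {
    double delay_at_25c_ns;  // combinational path delay at 25 degrees C
    double delay_per_degc;   // fractional delay increase per degree C (assumed linear)
    double setup_time_ns;    // time the data must be stable before the clock edge
};

// Path delay at a given local (hot-spot) temperature under the assumed linear model.
double pathDelayNs(const TimingPath& p, double temp_c) {
    return p.delay_at_25c_ns * (1.0 + p.delay_per_degc * (temp_c - 25.0));
}

// Slack = clock period - (path delay + setup time).
// Negative slack means the signal arrives too late for the clock edge.
double slackNs(const TimingPath& p, double clock_period_ns, double temp_c) {
    assert(clock_period_ns > 0.0);
    return clock_period_ns - (pathDelayNs(p, temp_c) + p.setup_time_ns);
}

// True when the signal is stable in time to be captured correctly.
bool meetsTiming(const TimingPath& p, double clock_period_ns, double temp_c) {
    return slackNs(p, clock_period_ns, temp_c) >= 0.0;
}
```

With the made-up values TimingPath{0.20, 0.002, 0.03} and a 0.25 ns clock period, meetsTiming returns true at 60 degrees C (about 0.244 ns needed) and false at 90 degrees C (about 0.256 ns needed), which is the "signal did not arrive before the clock edge" case described above.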
Not only can a condition be skipped; in some cases the values themselves can change at the moment they are read. This is known as a "bit flip", i.e. a register cell changes its value due to heat (soft error):
https://semiengineering.com/heat-relate ... c-designs/
https://dramsec.ethz.ch/papers/mathur-dramsec22.pdf
https://arxiv.org/abs/2110.10291
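As a small illustration of why a single bit flip matters for the x > 5 condition from the example earlier, the sketch below simulates the flip in software with an XOR. The helper names flipBit and conditionHolds are hypothetical, and real soft errors happen in hardware, not through code like this.

```cpp
#include <cassert>
#include <cstdint>

// Simulate a soft error by flipping one bit of a 32-bit value.
// 'bit' is the bit position, 0 being the least significant bit.
std::uint32_t flipBit(std::uint32_t value, unsigned bit) {
    assert(bit < 32);
    return value ^ (std::uint32_t{1} << bit);
}

// The same condition as in the example: is x greater than 5?
bool conditionHolds(std::uint32_t x) {
    return x > 5;
}
```

For x = 6 (binary 110), flipping bit 2 gives 2 (binary 010), so conditionHolds goes from true to false and doSomething() would not run. Flipping bit 0 instead gives 7, and the condition still holds, so whether the error is visible depends on which bit is hit.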