Unrestrained high rendering frame rates cause high gpu utilization (inability to cap to 60) #1107

Open
Cleod9 opened this issue Oct 10, 2022 · 9 comments

Comments

@Cleod9

Cleod9 commented Oct 10, 2022

So I'm having trouble finding a way to cap the rendering frame rate of our game to 60 in a consistent, reliable way regardless of the rendering target (hlsdl, hldx, etc.).

For example, here is what happens with vsync on and a 165Hz monitor (as expected with Heaps out of the box):

[Screenshot: frame rate with vsync on and a 165Hz monitor]

And with vsync off it will easily exceed 2000fps at over 90% GPU utilization. For this reason we want to introduce a hard cap at 60, since there is no visual benefit to rendering unchanged frames (we don't process subframes or use delta times).

So far I've experimented a bit with hxd.Timer, in addition to playing with the sleep timer in SDL (#941), but I still haven't been able to get the frame rate under control. The SDL sleep tweak was the closest to a solution, but the frame rate appears wobbly/unstable, and that change is also SDL-specific. I should also note that Dead Cells appears to exhibit similar behavior.

I understand that vsync is meant to sync with the monitor refresh rate and that turning it off lifts that restriction; however, it's not uncommon for deterministic 2D games to lock to a hard 60. In a perfect world I'd love to just see the game hold at 60 when vsync is off. If anyone has any guidance on the matter it would be greatly appreciated!

@MSGhero
Contributor

MSGhero commented Oct 10, 2022

Are you opposed to using delta time? I've overridden most of hxd.App to force update and render calls onto dt increments (with room to improve), with the real fps being something quite high.

@Cleod9
Author

Cleod9 commented Oct 10, 2022

@MSGhero Thanks for the reply. We actually already do something similar with a dt accumulator, using deepnightLibs' Process.hx: https://github.com/deepnight/deepnightLibs/blob/35bbf4368cd6a4b160a1818bc61b977fe90cecd3/src/dn/Process.hx

Gameplay-wise things run as smooth as butter, but like you said the real FPS can get incredibly high, which seems like wasted resources (and a burning hot GPU). If there were some way to simply interleave rendering between the logic updates, I feel like our issue would be solved. But I fear that leveraging delta time wouldn't address the main issue with the high frame rates, since our goal is to cap the rate of render calls. From my understanding, the main purpose of delta-based update loops is to smooth out gameplay visuals across varying machine performance, whereas we want to avoid wasting resources by not drawing at all when there are no actual graphical changes. So I imagine this problem would still exist in Heaps even in a delta-time world (and other Heaps games likely exhibit the same problem, Dead Cells being a prime example).
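The "interleave rendering between logic updates" idea can be sketched as a fixed-timestep accumulator that reports how many logic ticks are due, so a frame with zero ticks can skip rendering entirely. This is a minimal sketch under stated assumptions: `FixedStepClock`, `advance` and `maxSteps` are hypothetical names, not part of Heaps or deepnightLibs.

```haxe
// Hypothetical helper (not part of Heaps or deepnightLibs): a fixed-timestep
// accumulator that tells the caller how many logic ticks are due this frame.
class FixedStepClock {
	public var step(default, null):Float; // seconds per logic tick, e.g. 1/60
	public var maxSteps(default, null):Int; // per-frame cap to avoid a "spiral of death"

	var accumulator = 0.0;

	public function new(tickRate:Float, maxSteps = 5) {
		this.step = 1 / tickRate;
		this.maxSteps = maxSteps;
	}

	// Add the real time elapsed since the previous frame and return how many
	// fixed logic updates should run now. Zero means nothing changed, so the
	// caller can skip rendering this frame.
	public function advance(elapsed:Float):Int {
		accumulator += elapsed;
		var ticks = 0;
		while (accumulator >= step && ticks < maxSteps) {
			accumulator -= step;
			ticks++;
		}
		// If the per-frame cap was hit, drop the backlog instead of carrying it over.
		if (ticks == maxSteps && accumulator >= step)
			accumulator = 0;
		return ticks;
	}
}
```

The caller would run its fixed logic update `ticks` times and only render when `ticks > 0`. Note that this caps render calls at the tick rate but does not by itself stop the main loop from spinning on the CPU, which is why the thread goes on to look at sleeping.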

@MSGhero
Contributor

MSGhero commented Oct 10, 2022

I see your point. I was thinking a level higher, about modulating scene.render() calls, but your focus probably goes deeper, into how SDL or hlsdl handles it.

@Cleod9
Author

Cleod9 commented Oct 11, 2022

Yeah, it seems like I'm getting a similar response from members of the Haxe Discord as well. If tweaking hlsdl/hldx is the only solution I'm ok with it, but I'm just a little surprised there is no exposed mechanism to constrain the rendering loop at a higher level.

@ncannasse
Member

ncannasse commented Oct 11, 2022 via email

@Cleod9
Author

Cleod9 commented Oct 11, 2022

Thanks Nicolas, we may try something like that as well if nothing else pans out. Even though ideally I'd like to see a solid, precisely locked 60fps, adding sleeps would address the core issue of constraining render calls to an upper limit.

@Cleod9
Author

Cleod9 commented Oct 16, 2022

Update:

So I was able to solve (or at least work around) the issue after discovering the source of my struggles with Sys.sleep(). It turns out that the accuracy of the function differs between SDL and DX (I assume something is going on behind the scenes that differs between the two render targets).

For example, here is a printout of the seconds elapsed by a 1ms sleep, measured every frame:

Sys.sleep(0.001) in DirectX:

0.0143139362335205
0.0146241188049316
0.0143618583679199
0.0145838260650635
0.014082670211792

Sys.sleep(0.001) in SDL:

0.00111913681030273
0.00150227546691895
0.00146150588989258
0.00152826309204102
0.00146985054016113

As you can see above, DX is off by roughly 14x. What's especially strange is that if you go one level of precision further, to 0.1ms, DX returns far sooner (after about 1.4 to 2.9 microseconds, which is below the requested 100 microseconds), but it is still off compared to SDL:

Sys.sleep(0.0001) in DirectX:

1.43051147460938e-06
2.86102294921875e-06
2.38418579101563e-06
1.66893005371094e-06
1.43051147460938e-06

Sys.sleep(0.0001) in SDL:

4.76837158203125e-07
7.15255737304688e-07
4.76837158203125e-07
9.5367431640625e-07
4.76837158203125e-07

So DirectX can return sooner than 1ms, but its timing is not consistent with SDL's precision. It's unclear to me why, since it seems to work fine if I sleep for a whole number of seconds, like 1 second.
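The exact measurement code isn't shown in the thread, but a harness along these lines could reproduce the comparison on another machine. It is a sketch under stated assumptions: `SleepProbe`, `measure` and `report` are hypothetical names, it assumes a sys target such as HashLink where `Sys.sleep` is available, and it times each call with `haxe.Timer.stamp()`.

```haxe
// Hypothetical benchmark helper (not from the original post): measures how long
// Sys.sleep actually blocks, timed with haxe.Timer.stamp().
// Requires a sys target such as HashLink (Sys.sleep is not available on JS).
class SleepProbe {
	// Returns the elapsed seconds for each of `samples` calls to Sys.sleep(requested).
	public static function measure(requested:Float, samples:Int):Array<Float> {
		var results:Array<Float> = [];
		for (i in 0...samples) {
			var start = haxe.Timer.stamp();
			Sys.sleep(requested);
			results.push(haxe.Timer.stamp() - start);
		}
		return results;
	}

	// Prints each sample next to its ratio to the requested duration
	// (the DirectX 1ms numbers quoted above correspond to a ratio of about 14).
	public static function report(requested:Float, samples:Int):Void {
		for (elapsed in measure(requested, samples))
			Sys.println('requested=$requested elapsed=$elapsed ratio=${elapsed / requested}');
	}
}
```

Calling `SleepProbe.report(0.001, 5)` once under each render target would give output comparable to the lists above.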

As for my workaround, a spin lock seems to be the only solution in DX. Not great for the CPU usage but definitely an improvement as far as resource consumption is concerned.
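A common middle ground between a plain sleep and a pure spin lock is to sleep through most of the wait and spin only for the last stretch. The sketch below is one possible shape for the `preciseSleep()` helper named in the patch pseudo-code; `PreciseSleep` and `spinMargin` are hypothetical names, and the margin value is an assumption that has to be tuned per target.

```haxe
// Hypothetical implementation of the preciseSleep() helper named in the patch
// pseudo-code: sleep through most of the wait, then spin until the deadline.
class PreciseSleep {
	// Assumed worst-case oversleep of Sys.sleep on the current target. The
	// DirectX numbers above overshoot a 1ms request by ~13ms, so on that setup
	// this would need to be roughly 15ms (i.e. mostly spinning); tune per platform.
	public static var spinMargin = 0.002;

	public static function sleep(seconds:Float):Void {
		var deadline = haxe.Timer.stamp() + seconds;
		var coarse = seconds - spinMargin;
		if (coarse > 0)
			Sys.sleep(coarse); // cheap on CPU, assumed to overshoot by at most spinMargin
		while (haxe.Timer.stamp() < deadline) {} // busy-wait: precise, but keeps one core busy
	}
}
```

With a small margin on a target whose sleep is accurate (the SDL numbers above), almost all of the wait is spent sleeping; with a large margin it degrades gracefully into the pure spin lock described here.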

The fix I introduced locally involved replacing the present() logic here:

cur.driver.present();

Pseudo-code below:

// Related helper vars (fields on the class that owns the main loop)
// desiredFrameRate = 120.0; // Set this to the rough value you want the frame rate capped at (0 = unlimited)
// updateFrameRate = 60.0; // Set this to match the game update loop (required to skip frames when rendering fewer frames than the update tick rate)
// timeAccumulator = 0.0; // Tracks accumulated frame time (compared against desiredRefreshTime to decide when to redraw)
// static var lastFrame = 0.0; // For tracking elapsed time of the update loop
// vsync is assumed to hold the current vertical sync setting; preciseSleep() is a helper you provide
if (!vsync && desiredFrameRate > 0) {
	var desiredRefreshTime:Float = 1 / desiredFrameRate;
	var now = haxe.Timer.stamp();
	var spent = now - lastFrame;
	timeAccumulator += lastFrame == 0 ? desiredRefreshTime : spent;
	// If some time remains before the target frame time is reached, sleep it off
	if (spent < desiredRefreshTime) {
		// Sleep to cap the frame rate (but don't sleep longer than one update tick)
		var sleepSecs:Float = desiredRefreshTime - spent;
		if (sleepSecs < 1 / updateFrameRate) {
			preciseSleep(sleepSecs); // Either Sys.sleep() or a spin lock depending on target
		}

		// Update time accumulator
		spent = haxe.Timer.stamp() - now;
		timeAccumulator += spent;
	}
	// Only render if enough time has passed to satisfy the refresh rate (e.g. to render below the update rate, some frames must be skipped)
	if (timeAccumulator >= desiredRefreshTime) {
		// Use a while loop to keep the leftover fraction of an interval
		while (timeAccumulator - desiredRefreshTime > 0) {
			timeAccumulator -= desiredRefreshTime;
		}
		cur.driver.present();
	}
	lastFrame = haxe.Timer.stamp();
} else {
	// VSync on or no cap: always redraw
	cur.driver.present();
}
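The accumulator part of the patch can be pulled out into an engine-independent class so it can be checked without Heaps, a GPU, or real sleeping. This is a sketch, not the author's code: `PresentGate` and `tick` are hypothetical names, timestamps are passed in by the caller, a negative sentinel replaces the `lastFrame == 0` check, and sleeping is left to the caller.

```haxe
// Hypothetical, engine-independent version of the present() accumulator.
// Timestamps are passed in, so the gating logic can be tested without Heaps,
// a GPU, or real sleeping; the caller still does the sleep/spin itself.
class PresentGate {
	var refreshTime:Float; // 1 / desiredFrameRate, in seconds
	var accumulator = 0.0; // accumulated time not yet consumed by a present
	var lastFrame = -1.0; // negative means "no frame seen yet"

	public function new(desiredFrameRate:Float) {
		refreshTime = 1 / desiredFrameRate;
	}

	// Call once per loop iteration with the current time (e.g. haxe.Timer.stamp()).
	// Returns true when this iteration should call driver.present().
	public function tick(now:Float):Bool {
		// The first frame counts as one full interval so it always presents.
		accumulator += lastFrame < 0 ? refreshTime : now - lastFrame;
		lastFrame = now;
		if (accumulator < refreshTime)
			return false;
		// Consume every whole interval that has elapsed and carry only the
		// leftover fraction into the next tick.
		while (accumulator >= refreshTime)
			accumulator -= refreshTime;
		return true;
	}
}
```

Driving a gate created with `new PresentGate(30)` from a loop ticking every 1/60 s presents on 30 of 60 ticks, matching the "30fps in a 60fps game" case; a gate at 120 presents on every tick, since the loop itself is the bottleneck.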

It's a little verbose and could be improved, but it gets the job done. If vsync is off, rendering will respect the frame rate cap. It also accounts for rendering fewer frames than the game's native frame rate (e.g. 30fps in a 60fps game). Hopefully, if anyone else runs into this problem, this code will point them in the right direction!

@trethaller
Contributor

I don't see how DX/GL could affect Sys.sleep. Aren't these results different because of vertical sync?

@trethaller
Copy link
Contributor

trethaller commented Dec 7, 2022

At Shiro we currently do a soft spin lock to mitigate high framerates:

		var targetDT = 1 / targetFPS;
		var safeTime = 1.0 / 1000.0;
		while(haxe.Timer.stamp() - lastFrame < targetDT - safeTime) { }

The 1ms safety margin is there to prevent VSync from missing a frame, and it seems to roughly work (with VSync disabled you usually go from 60fps to something that fluctuates around 66fps).
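The soft spin lock can be wrapped as a reusable helper, together with the arithmetic for the upper bound it implies: with VSync off, each frame waits at least `targetDT - safeTime`, so the rate is capped at 1 / (1/60 − 0.001) ≈ 63.8fps for a 60fps target. That bound is slightly below the ~66fps reported, so treat it as a rough guide. `FrameCap`, `waitForNextFrame` and `effectiveCap` are hypothetical names; the loop body is the snippet quoted above.

```haxe
// Sketch of the soft spin lock quoted above, wrapped as a reusable helper.
// Names (FrameCap, waitForNextFrame, effectiveCap) are hypothetical.
class FrameCap {
	// Spin until (targetDT - safeTime) seconds have passed since lastFrame,
	// then return the new frame timestamp for the caller to store.
	public static function waitForNextFrame(lastFrame:Float, targetFPS:Float, safeTime = 0.001):Float {
		var targetDT = 1 / targetFPS;
		while (haxe.Timer.stamp() - lastFrame < targetDT - safeTime) {}
		return haxe.Timer.stamp();
	}

	// Upper bound on the frame rate this wait allows when VSync is off:
	// 1 / (targetDT - safeTime), about 63.8 for 60 FPS and a 1 ms margin.
	public static inline function effectiveCap(targetFPS:Float, safeTime:Float):Float {
		return 1 / (1 / targetFPS - safeTime);
	}
}
```

The margin trades a slightly higher uncapped rate for not missing a VSync deadline when VSync is on, which is the choice described in the comment above.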


4 participants