Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TileMap update_bitmask_region() poor performance #25731

Closed
danboo opened this issue Feb 9, 2019 · 6 comments
Closed

TileMap update_bitmask_region() poor performance #25731

danboo opened this issue Feb 9, 2019 · 6 comments

Comments

@danboo
Copy link
Contributor

danboo commented Feb 9, 2019

Godot version:
3.1 beta 3

OS/device including version:
Windows 10

Issue description:
Working with a large (millions of cells) TileMap I found that updating the autotile bitmask using the easiest, built-in method was slower than I could afford. After trying some alternate methods I found faster ways to update, but the result was unintuitive.

On my computer with a 1000x1000 TileMap I got the following results:

  1. tilemap.update_bitmask_region() - 7.1 seconds

    This is letting the engine update the entire TileMap by calling the method with no arguments. I'd actually expect this to be the fastest way to update the tilemap since it isn't iterating through a GDScript loop and calling into engine code. It is wholly contained in the C++ code.

  2. tilemap.update_bitmask_region(Vector2(min_x, min_y), Vector2(max_x, max_y)) - 1.7 seconds

    I thought this would have the same performance as the default call in example 1, just with explicit bounds.

  3. tilemap.update_bitmask_area(Vector2(col_i, row_i)) - 0.9 seconds

    In this method I iterate through the rows and columns in GDScript, calling this method for every 3rd cell. I thought this would be the slowest method since it has the overhead of the GDScript loop and method calls. The gist of it is:

	for i in range(0, dimensions.y+1, 3):
		for j in range(0, dimensions.x+1, 3):
			tilemap.update_bitmask_area(Vector2(j,i))

I'm not surprised that updating a million tiles takes a measurable amount of time, just that the method of iterating and calling methods in GDScript was faster than the single-call methods that stay within the C++ code. It seems like implementing method 3 in C++ might even be that much more efficient.

Steps to reproduce:
Install the referenced project and use the 3 buttons to try each method.

Minimal reproduction project:
https://github.com/danboo/Godot-SlowTileMapBitmaskUpdate

@zzz-assault
Copy link
Contributor

The issue is still the same in 3.2.1

@sian2005
Copy link

Yeah this is pretty laggy, it takes my pc 7 seconds to update the bitmask on a 500 by 500 tiles area.

@zzz-assault
Copy link
Contributor

Just had a look onto the source code (I'm no C++ programmer, but it's understandable). Regarding the huge difference between point 1 and 2, this is related to a fallback if no vector2 (start & end) are given, which is using the tilemap.update_bitmask_area() for all tiles, therefor resulting in a nested double loop in the already existing double loop of tilemap.update_bitmask_region().

Triggered by the second part of the if statement (p_end.x == p_start.x && p_end.y == p_start.y).

void TileMap::update_bitmask_region(const Vector2 &p_start, const Vector2 &p_end) {

	if ((p_end.x < p_start.x || p_end.y < p_start.y) || (p_end.x == p_start.x && p_end.y == p_start.y)) {

		int i;

		Array a = get_used_cells();

		for (i = 0; i < a.size(); i++) {

			// update_bitmask_area() in order to update cells adjacent to the

			// current cell, since ordering in array may not be reliable

			Vector2 vector = (Vector2)a[i];

			update_bitmask_area(Vector2(vector.x, vector.y));

		}

		return;

	}

@zzz-assault
Copy link
Contributor

zzz-assault commented Apr 26, 2020

For 2 a short workaround / performance trick :)

As long as the tilemap is not loaded to scene tree (add child) it's thread save, which means, we can use multi threading via GD-Script (THREAD). This also applies to the bitmask function, so we can use the below code, to boost the performance by the nr. of available CPU threads.

Sample Code for use of THREAD:

const map_size : Vector2 = Vector2(1024, 1024)
var map_threads : Array = []
var threads_finished_count : int = 0
var sample_tilemap : TileMap = load("res://...")

func _ready() -> void:
	for i in range (10):
		map_threads.append(Thread.new())
	_update_bitmask(sample_tilemap)

func _update_bitmask(sample_tilemap : TileMap) -> void:
	threads_finished_count = 0
	for i in range (map_threads.size()):
		var thread_data : Array = [] as Array
		thread_data.append(Tilemap)
		thread_data.append(i)
		map_threads[i].start(self, "_update_bitmask_part", thread_data)

func _update_bitmask_part(thread_data : Array) -> void:
	thread_data[0].update_bitmask_region(Vector2(0, int(map_size.y / map_threads.size() * (thread_data[1]))), Vector2(map_size.x, int(map_size.y / map_threads.size() * (thread_data[1] + 1))))
	call_deferred("_update_bitmask_finished", thread_data)
	
func _update_bitmask_finished(thread_data : Array) -> void:
	map_threads[thread_data[1]].wait_to_finish()
	threads_finished_count += 1
	if threads_finished_count == map_threads.size():
		get_parent().add_child(thread_data[0])

This is copied out of my project (slightly simplified), I'm sure it would be easier to show, but thats basicly how it works :)

@zzz-assault
Copy link
Contributor

Regarding issue 1, I thought about the code and I think the "update_bitmask_area()" in the fallback makes no sense (it will update the cell + all adjacent cells), but with "get_used_cells()" we will anyhow get all cells which have a tile used and also the ordering is irrelevant, as "update_cell_bitmask()" is not affected by the ordering of the array. Breaking it down, the use of "update_bitmask_area()" is unnecessarily updating plenty of cells mutliple times -> therefore the high computing time!

So I would recommend to change "update_bitmask_area()" in the fallback to "update_cell_bitmask()", then the computing time between issue 1 & 2 should be the same.

@danboo
Copy link
Contributor Author

danboo commented May 7, 2020

I had a chance to pull and build the latest 3.2 branch and confirmed that the performance of method 1 was much improved. On my build method 1, at 1.1 seconds, was actually faster than method 2, at 1.6 seconds, implying that fetching and iterating over the array is faster than the double nested for loop.

Thanks @zzz-assault !

@danboo danboo closed this as completed May 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants