Conversation

@Spartan322 (Member) commented Oct 31, 2025

This should slightly improve the cache-friendliness of the pixel search, at the memory cost of what is practically a copy of the province pixel map. I did not benchmark performance, but on Linux built with GCC the relative cost of load_map_images appears to shrink.

@Spartan322 Spartan322 marked this pull request as draft October 31, 2025 00:19
@schombert

This is not more cache friendly. Creating the new copy loads all of the existing data into the cache, and then the new copy occupies cache space as well. So you pay the same cost as the existing version -- loading the same data into the cache (if it wasn't there already) -- plus you consume additional cache space for the copy, possibly evicting something useful in the process.

@Spartan322 (Member Author)

Well, the profiling I have actually done shows a consistent decrease in the relative time spent in load_map_images with this as the only change; the data is processed into a linear allocation, which should inherently speed up cache access.

@schombert

The version prior to your change also has the data in a linear allocation -- see the implementation of the `colour_at` function and the implementation of `get_pixel_index_from_pos`, which is `return pos.x + pos.y * dims.x;`. So the line

```cpp
const colour_t province_colour = colour_at(province_data, pixel_index);
```

already resolved to

```cpp
idx = get_pixel_index_from_pos(pos) * 3;
province_colour = { colour_data[idx + 2], colour_data[idx + 1], colour_data[idx] };
```

which is the exact same lookup pattern in terms of indexing into a contiguous linear allocation as the new version, but without making an additional copy.

@schombert

One difference between the two is that colour_t stores an internal 32-bit integer, so the copy stores 4 bytes per pixel where the source data stores 3. The copy therefore wastes an additional byte per pixel, making it less cache friendly. However, maybe spacing the data in 4-byte strides instead of 3-byte strides enabled the compiler to perform some optimization on the copy that it couldn't on the original version.

@Spartan322 (Member Author)

Regardless, the improvement, while notably consistent, is still fairly marginal, and we only perform this calculation once. It did shrink the function in the flame graph, but I'm still undecided whether the improvement is worth it, hence leaving this as a draft; it would probably be better to process it with SIMD operations.
