-
Notifications
You must be signed in to change notification settings - Fork 54
[MESA]: WA the state cache flush issue #106
base: master
Are you sure you want to change the base?
Conversation
Building of 32bit mesa with meson causes issue: "implicit declaration of function ‘__builtin_ia32_clflush’". Fixed by adding msse2 compilation flag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107843 Fixes: 314879f (i965: Fix asynchronous mappings on !LLC platforms.) Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> (cherry picked from commit 97fcccb) [Andres Gomez: resolve trivial conflicts] Signed-off-by: Andres Gomez <[email protected]> Conflicts: src/intel/tools/meson.build
There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: 1561e49 ("v3d: Emit the VCM_CACHE_SIZE packet.") (cherry picked from commit a91b158)
The Vulkan 1.1.81 spec says: "It is legal for offset.x + extent.width or offset.y + extent.height to exceed the dimensions of the framebuffer - the scissor test still applies as defined above. Rasterization does not produce fragments outside of the framebuffer, so such fragments never have the scissor test performed on them." Elsewhere, the Vulkan 1.1.81 spec says: "The application must ensure (using scissor if necessary) that all rendering is contained within the render area, otherwise the pixels outside of the render area become undefined and shader side effects may occur for fragments outside the render area. The render area must be contained within the framebuffer dimensions." Unfortunately, there's some room for interpretation here as to what the consequences are of having the render area set to exactly the framebuffer dimensions and having a scissor that is larger than the framebuffer. Given that GL and other APIs provide automatic clipping to the framebuffer, it makes sense that applications would assume that Vulkan does this as well. It costs us very little to play it safe and just clamp client-provided scissors to the framebuffer dimensions. Fortunately, the user is required to provide us with at least one scissor so we don't need to handle the case where they don't. Fixes: fb2a5ce "anv: Emit DRAWING_RECTANGLE once at driver..." Reviewed-by: Kenneth Graunke <[email protected]> (cherry picked from commit 465e5a8)
VI uses addrlib so it's unaffected. Cc: 18.1 18.2 <[email protected]> Tested-by: Dieter Nützel <[email protected]> (cherry picked from commit a1b9a00)
… on SI/CI Cc: 18.2 <[email protected]> Tested-by: Dieter Nützel <[email protected]> (cherry picked from commit d4e5228)
Cc: 18.1 18.2 <[email protected]> Tested-by: Dieter Nützel <[email protected]> (cherry picked from commit da72b62)
important for debugging Cc: 18.1 18.2 <[email protected]> Tested-by: Dieter Nützel <[email protected]> (cherry picked from commit 662db03)
Cc: 18.2 <[email protected]> Tested-by: Dieter Nützel <[email protected]> (cherry picked from commit a5f35aa)
Acked-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 34a17a4)
Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 6f00785)
Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> CC: 18.2 <[email protected]> (cherry picked from commit f6e09db)
Since commit af3685d various OpenGL applications regressed on the classic mesa radeon driver. Signed-off-by: Christopher Egert <[email protected]> CC: 18.1 18.2 <[email protected]> Signed-off-by: Marek Olšák <[email protected]> (cherry picked from commit 51995f6)
This fixes the situation where we'd send a shader with just the header and no data. piglit/glsl-max-varyings test was causing this to happen, and the renderer fix was breaking it. v2: drop fprintf Fixes: a8987b8 "virgl: add driver for virtio-gpu 3D (v2)" Reviewed-by: Erik Faye-Lund <[email protected]> (cherry picked from commit 240af61)
Building of 32bit mesa with meson causes linkage issue: "undefined reference to `util_get_process_name'" Fixed by adding link-with mesa_util for xmlconfig primary. v2: Removed '[]', commit message corrected. v3: Reverted changes in gbm and glx libraries. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107843 Fixes: 2e1e651 "util: extract get_process_name from xmlconfig.c" Cc: Marek Olšák <[email protected]> Cc: Dylan Baker <[email protected]> Signed-off-by: Sergii Romantsov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> (cherry picked from commit bbe551f)
We don't need to wait before drawing to the fake front buffer, as front buffer rendering by definition is allowed to produce artifacts. Fixes hangs in some cases when re-using the fake front buffer, due to it still being busy (i.e. in use for presentation). Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/106404 Bugzilla: https://bugs.freedesktop.org/107757 Tested-by: Olivier Fourdan <[email protected]> Reviewed-by: Thomas Hellstrom <[email protected]> (cherry picked from commit aefac10)
Otherwise they are not exported. CC: 18.2 <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected] Signed-off-by: Samuel Pitoiset <[email protected]> (cherry picked from commit d4bf954)
It's actually just the opposite. This fixes the new Sascha conditionalrender demo. CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit abdf396)
Bumping to 64 should be safe enough. Fixes some crashes with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit b9f6521)
This was wrong for descriptor #0 when all of them are indirect. This is because indirect_offset was 0 and we emitted a "normal" descriptor pointer for nothing. While we are at it remove radv_userdata_info::indirect_offset which is useless. CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit aa30205)
LLVM 6 isn't affected. Fixes GPU hangs with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 063264d)
Let say, we first bind a graphics pipeline that needs indirect descriptors sets. The userdata pointers will be emitted at draw time. Then if we bind a compute pipeline that doesn't need any indirect descriptors, the driver will re-emit them for all grpahics stages. To avoid this to happen, just check the bind point type. CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 748f4cc)
Indirect descriptors only need one entry, we don't have to emit a location for every descriptors. Fixes GPU hangs with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 9de062e)
When using Freecad, I was getting intermittent segfaults inside of mesa. I traced it down to this path in st_cb_drawpixels.c where the result of pipe_transfer_map wasn't being checked. In my case, it was returning NULL because nouveau_bo_new returned ENOENT. I'm by no means a mesa developer, but this patch solves the problem for me and seems reasonable enough. v2: Marek - also unmap the PBO and release the texture, and call the make_texture function sooner for less cleanup Cc: 18.1 18.2 <[email protected]> (cherry picked from commit 936e0dc)
…macro Fixes the following building error, happening when building both intel and broadcom: Gen Header: libmesa_broadcom_genxml_32 <= v3d_packet_v21_pack.h FAILED: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h /bin/bash -c "python external/mesa/src/broadcom/cle/gen_pack_header.py \ external/mesa/src/broadcom/cle/v3d_packet_v21.xml \ > gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h" Traceback (most recent call last): File "external/mesa/src/broadcom/cle/gen_pack_header.py", line 626, in <module> p = Parser(sys.argv[2]) IndexError: list index out of range header-gen macro is already defined by Intel genxml building rules and the existing header-gen does not have the $(PRIVATE_VER) argument, infact the bash command line logged in the building error is missing exactly $(PRIVATE_VER) argument Renaming the macro as pack-header-gen in src/broadcom/Android.genxml.mk solves the building error, another possible way is to keep the gen rules commands expanded and not use the macros. Fixes: 7f80a9f ("vc4: Introduce XML-based packet header generation like Intel's.") Cc: "18.2" <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Mauro Rossi <[email protected]> (cherry picked from commit 3341429)
Fixes the following building error: In file included from external/mesa/src/broadcom/cle/v3d_decoder.c:38: In file included from external/mesa/src/broadcom/cle/v3d_packet_helpers.h:29: external/mesa/src/gallium/auxiliary/util/u_math.h:42:10: fatal error: 'pipe/p_compiler.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: 5b10216 ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Mauro Rossi <[email protected]> (cherry picked from commit 9158e0b)
Fixes the following building error in vc4 build: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_render_cl.c:34: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_drv.h:27: In file included from external/mesa/src/gallium/drivers/vc4/vc4_simulator_validate.h:34: In file included from external/mesa/src/gallium/drivers/vc4/vc4_context.h:39: In file included from external/mesa/src/gallium/drivers/vc4/vc4_cl.h:56: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'cle/v3d_packet_helpers.h' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: 5b10216 ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <[email protected]> Acked-by: Eric Anholt <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Mauro Rossi <[email protected]> (cherry picked from commit cc3b99b)
Otherwise using 32 user SGPRs would be broken. CC: <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> (cherry picked from commit d97c892)
Apparently for compute there are only 16 instead of the 32 for the graphics path. Fixes dEQP-VK.binding_model.descriptorset_random.sets16.noarray.ubolimitlow.sbolimitlow.imglimitlow.noiub.comp.0 CC: <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> (cherry picked from commit 0dd8189)
If we end up never taking the loop that writes ret, we can end up with an uninitialized value, and if we're *really* unlucky, that value can be -1, causing us to go down an error-path instead of a success path. This was obviously not intended, so let's just initialize this to zero. Noticed by Valgrind: Conditional jump or move depends on uninitialised value(s) at 0xBA640A0: virgl_drm_winsys_resource_cache_create (virgl_drm_winsys.c:348) by 0xBA62FCF: virgl_buffer_create (virgl_buffer.c:170) by 0xBA605AC: virgl_resource_create (virgl_resource.c:60) by 0xBCF816F: bufferobj_data (st_cb_bufferobjects.c:344) by 0xBCF816F: st_bufferobj_data (st_cb_bufferobjects.c:390) by 0xBB7E836: vbo_use_buffer_objects (vbo_exec_api.c:1136) by 0xBCFCC6E: st_create_context_priv (st_context.c:414) by 0xBCFD3CD: st_create_context (st_context.c:590) by 0xBBB30CA: st_api_create_context (st_manager.c:896) by 0xB981E76: dri_create_context (dri_context.c:155) by 0xB97BDCE: driCreateContextAttribs (dri_util.c:473) by 0x5288331: dri3_create_context_attribs (dri3_glx.c:309) by 0x5264D64: glXCreateContextAttribsARB (create_context.c:78) Fixes: a8987b8 ("virgl: add driver for virtio-gpu 3D (v2)") Signed-off-by: Erik Faye-Lund <[email protected]> (cherry picked from commit eaa7185)
Those operations do not map to actual hardware instructions, therefore those should always be lowered to 32-bit instructions. Fixes: 009c54a "nv50/ir: Split 64-bit integer MAD/MUL operations" Signed-off-by: Pierre Moreau <[email protected]> Reviewed-by: Karol Herbst <[email protected]> Signed-off-by: Karol Herbst <[email protected]> (cherry picked from commit 21b92b3)
Gen9 hardware requires some workarounds to disable preemption depending on the type of primitive being emitted. We implement this by adding a new atom that tracks BRW_NEW_PRIMITIVE. Whenever it happens, we check the current type of primitive and enable/disable object preemption. For now, we just ignore blorp. The only primitive it emits is 3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can safely leave preemption enabled when it happens. Or it will be disabled by a previous 3DPRIMITIVE, which should be fine too. Signed-off-by: Rafael Antognolli <[email protected]> Cc: Kenneth Graunke <[email protected]> (am from https://patchwork.freedesktop.org/patch/210952/)
Apparently, we're supposed to look at the texture object's built-in sampler object's sRGB decode setting in order to decide whether to decode/downsample/re-encode, or simply downsample as-is. Previously, I had always done the decoding/encoding. Fixes SKQP's Skia_Unit_Tests.SRGBMipMaps test. Reviewed-by: Tapani Pälli <[email protected]> (cherry picked from commit 337a808)
…pport Fixes Skqp's unitTest_EGLImageTest test. For Intel platforms, we support external textures only for EGLImages created with EGL_EXT_image_dma_buf_import. This restriction seems to be Intel specific and not present for other platforms. While running SKQP test - unitTest_EGLImageTest, GL_INVALID is sent to the test because of this restriction. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105301 Signed-off-by: Aditya Swarup <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Chad Versace <[email protected]> (cherry picked from commit a5c39ed)
…ort it on gen8+ Dual source blending behaviour is undefined when shader doesn't have second color output, dismissing fragment in such situation leads to a hang on gen8+ if depth test in enabled. Since blending cannot be gracefully fixed in such case and the result is undefined - blending is simply disabled. v2 (Kenneth Graunke): - Listen to BRW_NEW_FS_PROG_DATA in 3DSTATE_PS_BLEND - Also whack BLEND_STATE[] to keep the two in sync, since we're not sure exactly which copy of the redundant info the hardware will use. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107088 Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (cherry picked from commit eca4a65)
(cover letter https://patchwork.freedesktop.org/series/51006/) FROMLIST: i965: SIMD32 heuristics debug flag Added a new DEBUG_HEUR32 flag to INTEL_DEBUG flags for enabling SIMD32 selection heuristics. (am from https://patchwork.freedesktop.org/patch/256764/) FROMLIST: i965: SIMD32 heuristics control data Added a new structure for holding SIMD32 heuristics control data. The control data itself will be fetched from drirc. (am from https://patchwork.freedesktop.org/patch/256806/) FROMLIST: i965: SIMD32 heuristics control data from drirc To be able to test the heuristics with different parameters, they can be controlled via environment variables through drirc. (am from https://patchwork.freedesktop.org/patch/256788/) FROMLIST: mesa: Helper functions for counting set bits in a mask (am from https://patchwork.freedesktop.org/patch/256765/) FROMLIST: i965/fs: Save the instruction count of each dispatch width The SIMD32 selection heuristics will use this information for deciding whether SIMD32 shaders should be used. (am from https://patchwork.freedesktop.org/patch/256793/) FROMLIST: i965/fs: SIMD32 selection heuristic based on grouped texture fetches The function goes through the compiled shader and checks how many grouped texture fetches there are. This is a simple heuristic which gets rid of most of the regressions when enabling SIMD32 shaders but still retains some of the benefits. (am from https://patchwork.freedesktop.org/patch/256798/) FROMLIST: i965/fs: Enable all SIMD32 heuristics There are three simple heuristics for SIMD32 shader enabling: - How many MRTs does the shader write into? - How many grouped texture fetches does the shader have? - How many instructions does the SIMD32 shader have compared to the SIMD16 shader? For testing purposes, the heuristics can be controlled via these environment variables: simd32_heuristic_mrt_check - Enables MRT write check - Default: true simd32_heuristic_max_mrts - How many MRT writes the heuristic allows - Default: 1 simd32_heuristic_grouped_check - Enables grouped texture fetch check - Default: true simd32_heuristic_grouped_sends - How many grouped texture fetches the heuristic allows - Default: 6 simd32_heuristic_inst_check - Enables SIMD32 vs. SIMD16 instruction count check - Default: true simd32_heuristic_inst_ratio - SIMD32 vs. SIMD16 instruction count ratio the heuristic allows - Default: 2.3 SIMD32 shaders will not be compiled also when SIMD16 compilation fails or spills. (am from https://patchwork.freedesktop.org/patch/256766/)
Hi @tpalli, does this change look good for you? |
@renchenglei Why it include so many commits here? totally 44 files changed |
@js0701 This may be caused by one new round of mesa rebase, @YangShi-Intel will help rebase the PR. :) |
On Gordenpeak platform, this state flush function have problem. It will cause GPU read some garbage data sometime. So add time wait after cache flush. After stress test, It can keep GPU read the right state data now. Will continue to optimize this solution. Signed-off-by: Yang Shi <[email protected]>
Yeah, this is the best solution we have for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Autobuild started from pull-request-approved on this PR.
FAILURE: CheckBug Bad comments/Bugs
For more information, see: /absp/builders/celadon-autobuild/builds/89
device->no_hw = physical_device->no_hw; | ||
device->lost = false; | ||
device->needwait = false; | ||
sprintf(procPath,"/proc/%d/comm",getpid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does skqp populate device->instance->app_info.app_name? If so, that's what we should run the strcmp against and avoid the fread.
char procPath[WAIT_DETECT_MAX_LEN]={0}; | ||
char procName[WAIT_DETECT_MAX_LEN]={0}; | ||
int count = 0; | ||
int need = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused?
034aaf2
to
31938ce
Compare
a69fa2f
to
6959b05
Compare
On Gordenpeak platform, this state flush function have problem.
It will cause GPU read some garbage data sometime.
So add time wait after cache flush.
After stress test, It can keep GPU read the right state data now.
Will continue to optimize this solution.
Signed-off-by: Yang Shi [email protected]