6158 – Random crashes(BT attached)

Status:	NEW

Alias:	None

Product:	Sanctum 2
Classification:	Unclassified
Component:	everything
Version:	unspecified
Hardware:	PC Linux

Importance:	P3 normal
Assignee:	Ryan C. Gordon
QA Contact:	Ryan C. Gordon

URL:

Depends on:
Blocks:

Reported:	2014-05-21 01:19 EDT by slacker
Modified:	2021-10-19 12:03:39 EDT
CC List:	2 users

See Also:

Attachments
gdb bt (4.71 KB, text/plain) 2014-08-01 05:13 EDT, Heiko
gdb bt full (7.58 KB, text/plain) 2014-08-02 05:13 EDT, Heiko
xdelta3 binary patch to fix the glDrawRangeElements (223 bytes, application/octet-stream) 2015-07-07 15:30 EDT, Heiko

Description slacker 2014-05-21 01:19:59 EDT

Could be related to some of the other crashes. They seem to be random (this one happened during a superheavy round in Outpost, but it has happened in other places) and are always in the renderer. I finally bothered to attach gdb, and here's the BT:

http://pastebin.com/4zWyG1Q3

specs
 * Slackware64 14.1
 * Nvidia GeForce GTX 780 Ti 3GB (also had the crashes with a GTX 590)
 * Intel i7-2600K
 * 16GB RAM
 
hope this helps

Comment 1 slacker 2014-05-21 20:29:16 EDT

Attached gdb to 3 more crashes; they all look the same. If I get a unique one, I'll post it.

Comment 2 Heiko 2014-08-01 05:12:45 EDT

I'm seeing halfway-random crashes as well on the current beta from 31.07.14. Most often I noticed them on Biolab (World 1) and ComTower (World 2). Though I'm on Mesa, the backtrace looks quite similar, i.e. crashing around ProcessBasePassMesh_LightMapped/FDrawTranslucentMeshAction.

Running the x86_64 build on Intel Quad, sufficient RAM and a Radeon HD6850 on Mesa r600g (OpenGL version string: 3.0 Mesa 10.3.0-devel (git-150ac07)). I also noticed the same problem with the x86_32 build. I've got a ~2GB coredump there...

Comment 3 Heiko 2014-08-01 05:13:23 EDT

Created attachment 3483
gdb bt

Comment 4 Heiko 2014-08-02 05:13:25 EDT

Created attachment 3484
gdb bt full

More debugging info in Mesa shows what looks like pointer corruption:
  ptr = 0x7fffbd6d1700 <error: Cannot access memory at address 0x7fffbd6d1700>
This is the pointer that memcpy fails on.

Running with glibc's malloc check enabled via
  env MALLOC_CHECK_=3 ./Binaries/linux-amd64/SanctumGame
not only showed no error, the game didn't even crash...


Valgrind shows some invalid reads, possibly at the crash location:
==15394== Invalid read of size 8
==15394==    at 0x4C2CCEE: memcpy@@GLIBC_2.14 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==15394==    by 0xC5DEAE1: u_upload_data (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0xC5E0768: u_vbuf_draw_vbo (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0xC49C3BE: st_draw_vbo (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0xC46F303: vbo_handle_primitive_restart (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0xC47064F: vbo_validated_drawrangeelements (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0xC4709A4: vbo_exec_DrawRangeElementsBaseVertex (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0xC470A5F: vbo_exec_DrawRangeElements (in /usr/lib64/mesa/r600g_dri.so)
==15394==    by 0x1A726B1: FOpenGLDynamicRHI::EndDrawIndexedPrimitiveUP() (in /Sanctum2/Binaries/linux-amd64/SanctumGame)
==15394==    by 0x1223756: FOcclusionQueryBatcher::Flush() (in /Sanctum2/Binaries/linux-amd64/SanctumGame)
==15394==    by 0x1225AFF: FSceneRenderer::BeginOcclusionTests() (in /Sanctum2/Binaries/linux-amd64/SanctumGame)
==15394==    by 0x1237557: FSceneRenderer::RenderDPGEnd(unsigned int, unsigned int, unsigned int&, unsigned int) (in /Sanctum2/Binaries/linux-amd64/SanctumGame)
==15394==  Address 0x83ac8790 is 0 bytes after a block of size 96 alloc'd
==15394==    at 0x4C28710: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==15394==    by 0x1A6FE34: ??? (in /Sanctum2/Binaries/linux-amd64/SanctumGame)
==15394==    by 0x1A725DD: FOpenGLDynamicRHI::BeginDrawIndexedPrimitiveUP(unsigned int, unsigned int, unsigned int, unsigned int, void*&, unsigned int, unsigned int, unsigned int, void*&) (in /Sanctum2/Binaries/linux-amd64/SanctumGame)
==15394==    by 0x122362D: FOcclusionQueryBatcher::Flush() (in /Sanctum2/Binaries/linux-amd64/SanctumGame)

Disassembly even shows the creation of that 8byte memory, iirc originating from appRealloc(void*, unsigned int, unsigned int)

However, my system isn't fast enough to run the game under valgrind properly.

Comment 5 Heiko 2014-09-10 06:27:55 EDT

Ok, screw that pointer corruption thought. It's just an mmap'ed pointer that gdb can't display. But I think I've found the problem where memcpy faults: Sanctum2 seems to issue per-vertex attribute updates/draws with only a subset of the vertex's attributes.

#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:36
#1  0x00007ffff0477170 in u_upload_data (upload=0xca2bec0, min_out_offset=0, size=15428, data=0x7fff85ac63e0, out_offset=0xb9fe584, outbuf=0xb9fe588) at util/u_upload_mgr.c:253
#2  0x00007ffff0479fb7 in u_vbuf_upload_buffers (mgr=0xb9fde30, start_vertex=0, num_vertices=203, start_instance=0, num_instances=1) at util/u_vbuf.c:989
[..]
#9  0x00007ffff02545b3 in vbo_exec_DrawRangeElements (mode=5, start=0, end=202, count=202, type=5123, indices=0x7fff85859e80) at ../../src/mesa/vbo/vbo_exec_array.c:1142
#10 0x0000000000feb34a in FMeshDrawingPolicy::DrawMesh(FMeshBatch const&, int) const ()
[..]

From within u_vbuf_upload_buffers()
(gdb) p mgr->ve->ve[0]
$68 = {src_offset = 0, instance_divisor = 0, vertex_buffer_index = 0, src_format = PIPE_FORMAT_R32G32B32_FLOAT}
(gdb) p mgr->ve->ve[1]
$64 = {src_offset = 12, instance_divisor = 0, vertex_buffer_index = 0, src_format = PIPE_FORMAT_R32G32B32_FLOAT}
(gdb) p mgr->ve->ve[2]
$65 = {src_offset = 36, instance_divisor = 0, vertex_buffer_index = 0, src_format = PIPE_FORMAT_R32G32_FLOAT}
(gdb) p mgr->ve->ve[3]
$66 = {src_offset = 60, instance_divisor = 0, vertex_buffer_index = 0, src_format = PIPE_FORMAT_R32G32B32A32_FLOAT}
(gdb) p mgr->ve->ve[4]
$67 = {src_offset = 44, instance_divisor = 0, vertex_buffer_index = 0, src_format = PIPE_FORMAT_R32G32B32A32_FLOAT}

(gdb) p mgr->vertex_buffer[0]
$69 = {stride = 76, buffer_offset = 0, buffer = 0x0, user_buffer = 0x7fff85ac63e0}

(gdb) p mgr->ve.src_format_size 
$59 = {12, 12, 8, 16, 16, 0 <repeats 27 times>}

The problem is that the vertex buffer's stride is 76, but the sizes of the vertex elements handed over sum to only 64 (I assume there's another attribute of size 12). So the user_buffer is accessed with an array stride of 76, which probably reads past its end (or crashes if it hits an unmapped memory region).
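
A quick arithmetic check of the values in the backtrace above, as a minimal C sketch (the helper names upload_size and vertex_extent are hypothetical and are not Mesa functions). With start=0 and end=202 the driver has to cover 203 vertices, and 203 * 76 bytes is exactly the size=15428 passed to u_upload_data. The listed elements also reach byte 76 of each vertex (offset 60 plus 16 bytes), so the 12 bytes missing from the sum of 64 would correspond to a gap at offsets 24 to 35.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes that must be readable from a user-pointer
 * vertex array when every index in [start, end] may be referenced.
 * One full stride is counted per vertex. */
size_t upload_size(size_t stride, unsigned start, unsigned end)
{
    assert(end >= start);
    return stride * ((size_t)(end - start) + 1);
}

/* Hypothetical helper: furthest byte within one vertex that any of the
 * n attributes touches, i.e. the maximum of offset + size. */
size_t vertex_extent(const unsigned *offsets, const unsigned *sizes, size_t n)
{
    size_t extent = 0;
    for (size_t i = 0; i < n; i++) {
        size_t end = (size_t)offsets[i] + sizes[i];
        if (end > extent)
            extent = end;
    }
    return extent;
}
```

upload_size(76, 0, 202) evaluates to 15428, matching the backtrace. If the application's buffer held only 202 vertices (an assumption, not something the trace shows), the read would run 76 bytes past its end.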

I'm no OpenGL expert, but I couldn't find a hint in the spec as to whether attrib subset updates are allowed. I just tested shortening the access to the user_buffer, which seemed to fix the crash and also the awkward graphics corruption when starting with the bloom filter disabled. I suspect the bloom post-processing does update all vertex attribs, not a subset.

Anyway, I'll try to come up with a useful and not that huge (api)trace.

Comment 6 Heiko 2014-09-20 06:12:13 EDT

It looks like Sanctum2 (and/or the UE3 port) calls glDrawRangeElements() with end (the maximum index) set to the size of the element array, rather than to the maximum used index (or element_array.size - 1). This probably results in "implementation-behavior":

"
void glDrawRangeElements( GLenum mode, GLuint start, 
   GLuint end, GLsizei count, GLenum type, void *indices );

Unlike the "Arrays" functions, the start and end parameters specify the minimum and maximum index values (from the element buffer) that this draw call will use (rather than a first and count-style). If you try to violate this restriction, you will get implementation-behavior (ie: rendering may work fine or you may get garbage).
" [1]

Mesa obeys the user hints of start and end, and thus references index n of an n-sized array, which results in major rendering corruption and eventually a crash.
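
For illustration, a minimal C sketch of how a caller could derive start and end from the index data itself instead of from the array size. The struct and function names are hypothetical, and this is not the game's or UE3's code. It assumes 16-bit indices, which matches type=5123 (GL_UNSIGNED_SHORT) in the backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: smallest and largest index actually used. */
struct index_range {
    unsigned min;
    unsigned max;
};

/* Hypothetical helper: scan a 16-bit index array once and return the
 * range of values it contains. count must be at least 1. */
struct index_range scan_index_range(const unsigned short *indices, size_t count)
{
    assert(indices != NULL && count > 0);
    struct index_range r = { indices[0], indices[0] };
    for (size_t i = 1; i < count; i++) {
        if (indices[i] < r.min)
            r.min = indices[i];
        if (indices[i] > r.max)
            r.max = indices[i];
    }
    return r;
}
```

The result would then be passed as the start and end arguments, e.g. glDrawRangeElements(mode, r.min, r.max, count, GL_UNSIGNED_SHORT, indices). An application that already knows its vertex count could simply pass vertex_count - 1 as end and skip the scan.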

Could possibly be related to XCOM: EU bug [2], which happens to also use UE3.


[1] https://www.opengl.org/wiki/Vertex_Rendering
[2] https://bugs.freedesktop.org/show_bug.cgi?id=80673

Comment 7 Heiko 2015-07-07 15:30:18 EDT

Created attachment 3542
xdelta3 binary patch to fix the glDrawRangeElements

Since the current build (1.4142?) still crashes, I looked into the issue again. Because putting a workaround into Mesa would be cumbersome, I made a binary patch by hacking the linux-amd64/SanctumGame binary. It seems to work without glitches thus far.

Attached as an xdelta3 patch: xdelta3 -d -s SanctumGame.orig SanctumGame.xdelta SanctumGame.fixed

Comment 8 James Le Cuirot 2021-10-19 12:03:39 EDT

For the record, this is still broken, but thankfully the patch still works. Many thanks for it!
