Bug 6669 - glib 2.76.0 breaks Openbox
Status: ASSIGNED
Alias: None
Product: Openbox
Classification: Unclassified
Component: general
Version: 3.6
Hardware: PC Linux
: P3 normal
Assignee: Dana Jansens
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-03-13 17:06 EDT by karabaja4
Modified: 2024-07-13 07:28:38 EDT
4 users (show)

See Also:


Attachments
Logs for the described bug reproduce (3.25 KB, text/x-log)
2023-03-13 17:06 EDT, karabaja4
patch (1.25 KB, patch)
2023-03-17 05:29 EDT, pldubouilh
Patch for Openbox bug 6669 (1.02 KB, patch)
2023-03-19 22:16 EDT, Aaron Rainbolt

Description karabaja4 2023-03-13 17:06:36 EDT
Created attachment 3645 [details]
Logs for the described bug reproduce

There is something in glib 2.76 that makes Openbox crash when focusing/minimizing/switching between windows.

The scenarios that I managed to reproduce it with are the following:

1) Run conky in "desktop" own_window_type, open any 3D game (e.g. openjk, ioquake3, any Wine game), and execute ToggleShowDesktop. Openbox will try to focus conky and will crash.

Alternatively, switch the windows with default Alt-Tab action (NextWindow + Focus, Raise, Unshade) with conky active. Same result.

2) Run any VirtualBox VM in a fullscreen (it needs to be fullscreen before power on).

I am attaching logs for cases 1) and 2), but there is nothing useful in them.

Reverting to glib 2.74.6 fixes the issues.
Change in glib that might be relevant: https://gitlab.gnome.org/GNOME/glib/-/issues/2937

Openbox 3.6.1-9
Arch Linux 6.1.18-1-lts
nvidia 1:525.89.02-9
xorg-server 21.1.7-1
Comment 1 karabaja4 2023-03-13 17:12:09 EDT
Note that 1) does not happen with own_window_type "override", and it only happens when switching/minimizing full screen apps such as games.
Comment 2 pldubouilh 2023-03-17 05:29:49 EDT
Created attachment 3646 [details]
patch
Comment 3 pldubouilh 2023-03-17 05:35:16 EDT
Also bumped on the issue - it crashed once or twice a day, but a guaranteed repro was when someone started screensharing with zoom :)
I wrote this patch this morning, it seems to work fine for now - I'll test further over the weekend and report if I see more crashes.

for background see:
https://gitlab.xfce.org/xfce/xfce4-session/-/issues/166
https://gitlab.gnome.org/GNOME/glib/-/issues/2941
https://bbs.archlinux.org/viewtopic.php?id=284299
Comment 4 Mikachu 2023-03-17 13:18:08 EDT
Hi, classic gnome developers as usual I see. I looked at your patch and the archlinux thread and see the problem with the code though. I think your patch is better than the code avoiding the list copy in the thread, which is n^2, while the copy is linear. It makes sense to me to just copy the list too since the goal is to visit every item in the list exactly once anyway. Perhaps the best thing is to just copy the items with layer FULLSCREEN, but i really doubt it matters in practice.
Comment 5 Mikachu 2023-03-17 13:27:40 EDT
I've committed it in the work branch (rebaseable) in my repo. If you want to be credited under another name than "pldubouilh" in the commit, let me know.
Comment 6 karabaja4 2023-03-17 14:34:54 EDT
@Mikachu, @pldubouilh: out of curiosity, could you elaborate what was the exact issue with the code and why copying the list fixes it, and how it relates to glib change with removing the slice allocator?
Comment 7 Aaron Rainbolt 2023-03-19 22:16:49 EDT
Created attachment 3647 [details]
Patch for Openbox bug 6669
Comment 8 Aaron Rainbolt 2023-03-19 22:17:36 EDT
We just ran into this bug in Ubuntu 23.04 after a glib update - the above attachment that I just posted has a patch I'm testing for it. I believe this is why the bug is happening and how the patch fixes it.

Openbox maintains a list of windows organized by stacking order from highest to lowest. This is a doubly-linked list.

There is a function in Openbox called client_calc_layer that uses a pointer into this list to walk through it while modifying it. When client_calc_layer modifies the list, it also messes up the pointer that it is actively using to walk through the list. The next element that Openbox loads contains a dangling pointer, and that promptly results in a segfault when Openbox tries to dereference that pointer.

To solve the problem, I added a patch that makes Openbox save a pointer to the list element *before* the current element before modifying the list. When the list is then modified, the still-valid pointer to the previous list element is used to get a new (and actually usable!) pointer to the current list element. The old, broken pointer is then overwritten by the new, working pointer, and the segfaults stop.
Comment 9 Mikachu 2023-03-20 15:12:59 EDT
I'm not sure if your patch is sufficient, the ->next pointer of the itPrev you save could still be the element that was removed by the client_calc_layer_internal call.
Comment 10 Aaron Rainbolt 2023-03-20 18:17:00 EDT
How so? From what I understand, "it" only gets corrupted because it's pointing to an object that is removed, so the pointers it points to are no longer correct. itPrev shouldn't get removed though, so the pointers it points to should still be pointing to valid places.

However, I am somewhat of a noob when it comes to C and am not deeply familiar with Openbox's inner workings, so I'm very open to being wrong and would love to have a better patch than this.
Comment 11 Adam Sampson 2024-07-12 21:14:11 EDT
The proposed copy-the-list patch (as taken by Mikachu and Debian) ends with "g_list_free(it)", which won't have any effect, as "it" is NULL at that point. Was that meant to be "g_list_free(list)"?
Comment 12 Mikachu 2024-07-13 07:28:38 EDT
Yeah that seems to be the case, pushed a fix for that on my work branch.