This was written some time ago. There’s some overlap with a Google+ post I wrote maybe around the same time. I haven’t reread it recently, so it may have mistakes or gaps.
I’ll now describe what should have been happening on X in the hope that it doesn’t go wrong on Wayland. I’m working on a demonstration, but only in my spare time and at a leisurely pace.
Chapter 4 of the dreaded ICCCM states:
“In general, the object of the X Version 11 design is that clients should, as
far as possible, do exactly what they would do in the absence of a window
manager, except for the following:
* Hinting to the window manager about the resources they would like to obtain
* Cooperating with the window manager by accepting the resources they are
allocated even if they are not those requested
* Being prepared for resource allocations to change at any time”
This means that X clients should already be raising and lowering their own windows as needed. When a window manager is present, ordinary client requests to raise and lower are instead redirected to it (as ConfigureRequest events), and from there the WM can enforce whatever policy is desired.
One sensible policy, for systems competing with Mac OS and MS Windows, is for a client to raise its windows when it has focus or in response to receiving focus, and never lower them. (If a palette, for example, should remain above a document, the palette should be raised rather than the document lowered.) Window managers using PointerRoot, sloppy, click-to-focus-but-only-raise-on-frame-control-clicks, or any other focus mode that would make such client raising undesirable could simply deny the ConfigureRequest.
Making drag-and-drop work as most users expect it to (or, perhaps more accurately, not work as they don’t expect it to) requires not only not raising on the ButtonPress that may begin a drag, but also not changing focus on it. On X this can be done with the globally active input model and a co-operating window manager. (I have not found one that does co-operate.) Actually, it’s easier than that: Forget the rest of the input models and use globally active for everything. The other models are mistakes; they are cruft from the history of computing. The globally active model is also the result of a mistake, as the ICCCM describes in Appendix B:
“There would be no need for WM_TAKE_FOCUS if the FocusIn event contained a timestamp and a previous-focus field. This could avoid the potential race condition. There is space in the event for this information; it should be added at the next protocol revision.”
XInput2 added the timestamp, but not the previous-focus field; I don’t believe that field is necessary to satisfy most user expectations.
To use the click-to-focus model of Mac OS and Windows, window managers follow some simple rules:
1) Ignore the client area.
2) Only set focus in response to:
a) proxied activation events (e.g. clicks on task lists),
b) global keybinding focus changes (e.g. Alt+Tab), or
c) global events that shift focus (e.g. changing workspaces).
Clients follow these rules:
1) Always accept focus, even if you must re-assign it.
2) Only set focus when you have a user-generated event with a timestamp or a granted focus event with a timestamp.
3) Always set focus when you receive events which the user would expect to transfer focus, such as a ButtonPress that can’t start a drag.
The idea behind WM_TAKE_FOCUS is that there are events which should set focus but which a client could not know about, and that the window manager should notify the client of such events while providing a timestamp. Clicks in the client area are events the client knows about and has timestamps for; they should not result in a WM_TAKE_FOCUS message and they should not be grabbed by a window manager.
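The two rule sets above can be restated as a pair of decision functions. This is only a sketch of the policy, not Xlib code: the trigger labels, the ClientEvent type, and both function names are made up for illustration, and what a client does after a drag-capable press (for example on the release) is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the three kinds of trigger in window-manager rule 2.
WM_FOCUS_TRIGGERS = frozenset({
    "proxied_activation",   # e.g. a click on a task list
    "global_keybinding",    # e.g. Alt+Tab
    "global_focus_shift",   # e.g. changing workspaces
})


def wm_should_set_focus(trigger: str) -> bool:
    """Window-manager rules: act only on the listed triggers.

    A click in a client area is not in the set, so it is ignored (rule 1).
    """
    return trigger in WM_FOCUS_TRIGGERS


@dataclass
class ClientEvent:
    kind: str                     # "ButtonPress", "WM_TAKE_FOCUS", ...
    timestamp: Optional[int]      # server time in ms, None if the event has none
    can_start_drag: bool = False  # only meaningful for a ButtonPress


def client_should_set_focus(ev: ClientEvent) -> bool:
    """Client rules 2 and 3: require a timestamp, then decide by event kind."""
    if ev.timestamp is None:
        return False                  # rule 2: nothing to pass to the server
    if ev.kind == "WM_TAKE_FOCUS":
        return True                   # focus granted by the window manager
    if ev.kind == "ButtonPress":
        return not ev.can_start_drag  # rule 3: a drag-capable press must wait
    return False                      # other event kinds are outside this sketch
```

Under this model a press on, say, a scrollbar sets focus immediately, while a press on a draggable icon leaves focus and stacking alone, which is what lets the drag proceed without the source window jumping in front of the drop target.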
Every window manager I’ve tried does this the wrong way. Too harsh? Every window manager I’ve tried has a focus policy which makes it impossible to implement drag-and-drop in accord with the expectations of most users. I believe the pattern is the same in all of them, and has even introduced other bugs which have resulted in other hacks. All of the window managers implementing click-to-focus grab the buttons on client windows or an ancestor thereof. When they receive a ButtonPress on the client area of a globally active input model window, they send a WM_TAKE_FOCUS client message, and they may or may not pass the ButtonPress through before or after the client message. The problem here is that for everything else to work correctly, clients must set focus when they receive the client message. The button grab itself has been the source of other problems, as I recall, though I don’t remember the specifics.
When I proposed changes along these lines 8 years ago, they were rejected for 2 reasons:
1) But clients shouldn’t do [what the ICCCM says they should do]!
2) But PointerRoot and sloppy won’t work!
For reason 1, I don’t know what to say. Perhaps something is lost in translation? Reason 2 perhaps deserves some elaboration. The simplest polite response is: This won’t interfere with that, and people using that don’t expect drag-and-drop to work in click-to-focus mode because they aren’t using click-to-focus mode. (The simplest impolite response is: So?) In the non-click-to-focus modes, the events which clients would receive prompting them to take focus do not occur until after they have already received focus. For example, a globally active window in those modes would have received a WM_TAKE_FOCUS message with the timestamp of the CrossingEvent, which must precede a ButtonEvent. (Using the appropriate timestamps should resolve any async problems.) Stacking doesn’t change from what I described at the beginning: the WM has selected the SubstructureRedirect for the root window, so it controls stacking.
A few other things are needed to get focus and stacking working on X as most users expect them to. So-called “focus-stealing prevention” is, from what I’ve seen, nothing of the sort; without a redirect for focus management, it won’t work. Bad clients will be bad clients, and good clients aren’t a problem.
Five things seem to have been missing for “focus-stealing prevention” unless that really does mean “something that doesn’t prevent what I want it to prevent, and does prevent what I don’t want it to prevent, both randomly”:
1) WMs should put newly mapped windows lower in the stack than the focused window or ignore crossing events caused by the mapping.
2) WMs should not assign focus to newly mapped windows. (No WM_TAKE_FOCUS either.)
3) Clients should obey the three rules listed earlier.
4) Every client launching a client should provide that client with a timestamp for the event causing the launch, which the launched client can then use to set focus. (Non-clients launching clients could not provide timestamps and so clients launched that way would only receive focus after direct user action.)
5) Every client launching a client should set focus after the next event which would normally do so, even if it already has focus, so that the server’s last-focus-change time is updated.
The last one allows, for example, a user to launch a client from a terminal and either wait for it, in which case it will receive focus, or not wait for it, in which case (because its focus-setting timestamp will be earlier than the last-focus-change time) it will not receive focus. Launching from one client and then switching to another requires no special focus setting; the switch is enough. I haven’t gotten this far in any of the code I’ve written, but I believe that rule 5 effectively requires terminals to (re-)set focus after at least every KeyPress of the Enter key, maybe more often. If I understand this correctly, the traffic should be low: the set-focus message goes to the server, the time is updated, nothing comes back because focus hasn’t actually changed, the terminal doesn’t have to wait for anything; it’s a lot like _NET_WM_USER_TIME, but simpler, less trafficky, and (in combination with everything else I’ve described) probably obviates the need for that property.
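The timestamp comparison that rule 5 relies on can be modelled in a few lines. The X protocol specifies that a SetInputFocus request has no effect if its time is earlier than the last-focus-change time or later than the current server time. The class and function names below are hypothetical, the millisecond values are invented, and the model ignores CurrentTime and 32-bit timestamp wraparound.

```python
class FocusServerModel:
    """Toy model of the server's SetInputFocus time check (not real Xlib)."""

    def __init__(self):
        self.now = 0                # current server time, in ms
        self.last_focus_change = 0  # time of the last accepted focus change
        self.focus = None           # name of the focused window

    def set_input_focus(self, window, time):
        # The request has no effect if it is stale or from the future.
        if time < self.last_focus_change or time > self.now:
            return False
        self.last_focus_change = time
        self.focus = window
        return True


def launch_from_terminal(user_keeps_typing):
    """Return which window ends up focused after a launch at t=1000."""
    server = FocusServerModel()
    server.now = 1000
    server.set_input_focus("terminal", 1000)      # Enter starts the launch
    if user_keeps_typing:
        server.now = 1200
        server.set_input_focus("terminal", 1200)  # a later Enter re-sets focus
    server.now = 1500
    server.set_input_focus("app", 1000)           # app uses the launch timestamp
    return server.focus
```

In this model, launch_from_terminal(False) leaves focus with "app" (the user waited), while launch_from_terminal(True) leaves it with "terminal" because the app's timestamp is now earlier than the last-focus-change time.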
All of that covers another focus and stacking problem: handling newly mapped windows.
I have played a little with getting launches from terminals to work right. Since there seems to be no way for the terminal to pass a timestamp into the shell it’s running, the solution I’ve devised is to cheat a little. A simple client sets a property on a window (creating the window if need be) and outputs the PropertyNotify timestamp it receives. The output is then passed to launched programs as an environment variable. E.g.
$ TIMESTAMP=`get-x-timestamp` myXclient
The timestamp is a few milliseconds later than it should be, but seems sufficient in practice. It is at least earlier, therefore closer to the relevant user event, than the timestamp a client could generate for itself through the property setting method. Better would be something like SIGWINCH: The terminal could use a Linux real-time signal (or similar on other systems) to pass a timestamp in the siginfo.
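On the receiving side, the launched program has to read that variable defensively, since it may be missing or malformed. A minimal sketch, assuming get-x-timestamp prints the time as a plain decimal number of milliseconds (the post does not specify its output format) and using a hypothetical helper name:

```python
import os


def launch_timestamp(environ=os.environ, name="TIMESTAMP"):
    """Return the launch timestamp from the environment, or None.

    Assumes a decimal 32-bit X server time; 0 is rejected because it is
    the protocol's CurrentTime placeholder rather than a real timestamp.
    """
    raw = environ.get(name)
    if raw is None:
        return None
    try:
        value = int(raw.strip())
    except ValueError:
        return None
    if not 0 < value <= 0xFFFFFFFF:
        return None
    return value
```

A client that gets None back would simply not set focus at startup and wait for direct user action, as with clients launched by non-clients.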
A final note about legacy support. Because most, if not all, apps do not use the globally active input model, window managers can easily distinguish between old-rules apps and new-rules apps. New-rules apps will have no trouble with an old-rules window manager; the user will just not get all the app’s features. I’ve not encountered a toolkit that uses the globally active input model, and at least Gtk+ cannot be coerced into it, so I’m fairly confident the input model alone (instead of some extra window property) distinguishes app types.