https://gitlab.synchro.net/main/sbbs/-/commit/50258e70bf63ba7a82af7515
Modified Files:
src/sbbs3/websrvr.cpp
Log Message:
websrvr: detect TLS client disconnect in session_check() (#1155)
session_check()'s is_tls branch treated a readable socket as "connected"
and latched session->tls_pending; once set, it returned "connected" on
every later call without re-probing the socket. But a peer's TLS
close_notify (and a FIN) arrive as readable bytes, so after an HTTPS
client hung up, session_check() reported it connected forever. The
JavaScript disconnect check in js_OperationCallback (ead5ccf16) relies on session_check(), so its abort never armed (offline_counter stayed 0): a badly-formed SSJS/XJS page that loops on mswait() without checking for disconnection (e.g. the webv4 user/system stats) ran forever, pinning its http_session thread, a MaxClients slot, and a CLOSE_WAIT socket -- a pile
of zombie HTTPS clients in sbbsctrl/MQTT and eventual MaxClients
exhaustion.
Why this only bit Windows: socket_check() (xpdev) has two paths. On
non-Windows builds it uses poll() (CFLAGS += -DPREFER_POLL, set only in build/Common.gmake, i.e. the GNU-make/Unix builds). poll() reports
POLLHUP when the peer closes its end -- even while there is still buffered
data to read -- and socket_check() returns false on POLLHUP before it
ever runs the readable/MSG_PEEK logic. So on Unix the close was detected, session_check() returned false, and tls_pending never latched. Windows (MSBuild) does not define PREFER_POLL and uses select(), which has no
POLLHUP equivalent: a closing TLS socket simply looks "readable"
(MSG_PEEK returns the encrypted close_notify bytes), so the latch was set
and the disconnect masked. The session_check() bug is platform-
independent; poll()/POLLHUP merely hid it everywhere except Windows.
Fix: drop the tls_pending liveness latch. Use peeked_valid (a decrypted
byte already buffered) as the readable fast-path, and when the raw socket
is readable, probe via cryptPopData(1 byte) -- which a raw MSG_PEEK
cannot do -- to tell apart application data (connected; the byte is
cached in session->peeked so the next sess_recv() returns it), CRYPT_ERROR_TIMEOUT (connected, no app data yet) and CRYPT_ERROR_COMPLETE
(peer closed -> disconnected). The probe is non-blocking (CRYPT_OPTION_NET_READTIMEOUT == 0, set at session setup) and runs in the session's own thread, so there is no concurrent reader. Also close the
socket in place in recvbufsocket() when session_check() reports a
disconnect (it previously relied on the latch returning true and the
following sess_recv() failing).
Latch introduced in d93478b918 (famous-15-sons); the readable-as-
connected + tls_pending set predates it (dbbfabf1b1, funky-27-foam).
Validated on a production Windows server: CLOSE_WAIT count ~22 -> 0,
sbbsctrl thread count 221 -> 25, and ran overnight with no zombie HTTPS clients.
Co-Authored-By: Claude Opus 4.8 (1M context) <
noreply@anthropic.com>
---
þ Synchronet þ Vertrauen þ Home of Synchronet þ [vert/cvs/bbs].synchro.net