feat(cli,aura-gui): v3.4.4 — graceful Shutdown via admin socket (#1)

Closes the long-standing "GUI Disconnect button doesn't actually kill aura"
bug. The previous kill path sent SIGTERM to sudo (our direct child) and
hoped sudo's signal forwarding would propagate to the aura child running
as root; in practice this is unreliable when the parent has no controlling
terminal (which Tauri-spawned children don't), so aura would survive the
"Disconnect" click with the TUN still up and the OS routes still installed.

## Implementation

Adds a `Shutdown` admin-socket request. The aura-cli main loops
(`client::run` and `server::run`) now `tokio::select!` between their normal
work (router.run() / accept loop) and a `tokio::sync::Notify` carried on
the shared `AdminState`. When an admin client posts `{"cmd":"shutdown"}`
the handler calls `state.shutdown.notify_one()`, the select! second arm
fires, the work future is dropped, `OsRouteGuard::Drop` rolls back the
installed system routes, and the process exits with `Ok(())` — clean exit
code 0, kernel reaps the TUN device, no orphan.

The whole round-trip is sub-500 ms in practice (the slow step is the
`route delete` invocations on macOS).

## What changed

* `aura-cli/src/admin.rs`: `Request::Shutdown` variant, `AdminState.shutdown:
  Arc<Notify>` field, handler that calls `notify_one()` + returns `Response::ok()`.
* `aura-cli/src/client.rs`: clones `admin_state.shutdown` before spawning the
  admin server task, then `tokio::select!`s between `router.run()` and
  `shutdown.notified()`. Whichever finishes first ends the function; OsRouteGuard
  Drop runs after.
* `aura-cli/src/server.rs`: same pattern around the `MultiServer::accept` loop —
  graceful exit on admin Shutdown leaves the accept loop, breaks, and the
  router_task is aborted on function return.
* `aura-cli/src/main.rs`: `aura shutdown --admin-socket <path>` subcommand for
  CLI control (also useful from launchd/systemd post-stop hooks).
* `aura-gui/src-tauri/src/admin.rs`: new `send_shutdown(path)` helper; factored
  out `round_trip()` for the common write-line + read-line pattern. Windows
  stub returns "not implemented".
* `aura-gui/src-tauri/src/cli_proc.rs`: `ClientHandle::kill` now tries admin
  Shutdown first (3 s poll for graceful exit), then SIGTERM to sudo (2 s),
  then SIGKILL as last resort. The admin path needs no sudo because the
  socket is already chmod 0666 from v3.4.1.

## Test

New `admin::tests::shutdown_request_fires_notify` unit test: spawns a
notified() waiter, calls `handle_request(Request::Shutdown)`, asserts the
waiter wakes within 200 ms. Combined with the existing 5 admin tests, all 6
pass.

`cargo test --workspace` — all green, `cargo clippy --workspace --all-targets
-- -D warnings` — clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
xah30
2026-05-29 22:18:00 +03:00
parent 96c30ff01c
commit 1f82bc41c0
6 changed files with 197 additions and 34 deletions
+50 -2
View File
@@ -44,7 +44,7 @@ use aura_tunnel::{PacketCounters, RouteAction, RouteTable};
use ipnetwork::IpNetwork;
use serde::{Deserialize, Serialize};
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::sync::RwLock;
use tokio::sync::{Notify, RwLock};
use crate::config::parse_action;
@@ -132,10 +132,20 @@ pub struct AdminState {
pub mirror: Arc<RuleMirror>,
/// Live tunnel statistics.
pub stats: Arc<Stats>,
/// Shutdown signal — when a `Shutdown` admin request arrives, the handler calls
/// `shutdown.notify_one()` and the main client / server loop's `tokio::select!` listening on
/// `shutdown.notified()` returns, letting `OsRouteGuard::Drop` run and the process exit
/// cleanly. This is the v3.4.4 fix for "GUI Disconnect button doesn't kill aura": sudo's
/// signal forwarding from a non-tty Tauri-spawned parent is unreliable, so instead of sending
/// SIGTERM through sudo we just talk to the already-chmod-666 admin socket the GUI process
/// can write to as its own user.
pub shutdown: Arc<Notify>,
}
impl AdminState {
/// Construct admin state from a shared table and stats, seeding the mirror from the given rules.
/// Construct admin state from a shared table and stats, seeding the mirror from the given
/// rules. Creates a fresh `shutdown` signal; clone the resulting `AdminState::shutdown` into
/// the main loop's `tokio::select!` to listen for `Shutdown` admin requests.
pub fn new(
routes: Arc<RwLock<RouteTable>>,
stats: Arc<Stats>,
@@ -146,6 +156,7 @@ impl AdminState {
routes,
mirror: Arc::new(RuleMirror::from_rules(cidrs, domains)),
stats,
shutdown: Arc::new(Notify::new()),
}
}
}
@@ -176,6 +187,13 @@ pub enum Request {
},
/// Query tunnel statistics.
Status,
/// v3.4.4: Ask the running client/server to shut down gracefully. The handler signals the
/// main `tokio::select!` loop via [`AdminState::shutdown`] and returns OK immediately; the
/// process then exits after running `OsRouteGuard::Drop` etc. The GUI uses this instead of
/// sending SIGTERM through sudo (sudo's signal-forwarding from a non-tty Tauri-spawned
/// parent is unreliable and the previous kill path would leave the aura child orphaned with
/// the TUN still up).
Shutdown,
}
/// One CIDR rule in a `route_list` response.
@@ -372,6 +390,16 @@ pub async fn handle_request(state: &AdminState, req: Request) -> Response {
..Response::ok()
}
}
Request::Shutdown => {
// v3.4.4: signal the main client/server loop via the shared `Notify`. We don't wait
// here — the request returns immediately so the GUI's send-Shutdown round-trip
// doesn't get stuck behind OsRouteGuard::Drop (which can take a second or two on
// macOS as it issues multiple `route delete` commands). The caller then watches the
// process pid: it exits cleanly within a few hundred ms.
tracing::info!("shutdown requested via admin socket");
state.shutdown.notify_one();
Response::ok()
}
}
}
@@ -760,4 +788,24 @@ mod tests {
#[cfg(windows)]
assert_eq!(DEFAULT_SOCKET, r"\\.\pipe\aura-admin");
}
/// v3.4.4: `Request::Shutdown` signals the shared `Notify` so a caller listening on
/// `state.shutdown.notified()` can wake up and exit cleanly. Confirms the wire <-> shutdown
/// link is wired correctly; the actual select! in `client::run` / `server::run` exercises
/// the Notify in integration tests / live runs.
#[tokio::test]
async fn shutdown_request_fires_notify() {
let st = state();
let notify = Arc::clone(&st.shutdown);
// Spawn a waiter — it should resolve as soon as the Shutdown handler fires.
let waiter = tokio::spawn(async move { notify.notified().await });
let resp = handle_request(&st, Request::Shutdown).await;
assert!(resp.ok, "shutdown returned !ok: {resp:?}");
// Bounded timeout — the notify_one() in the handler should be immediate.
let res = tokio::time::timeout(std::time::Duration::from_millis(200), waiter).await;
assert!(
res.is_ok(),
"shutdown waiter did not wake within 200ms; Notify wasn't signalled"
);
}
}
+18 -1
View File
@@ -308,6 +308,12 @@ pub async fn run(config_path: &Path, admin_socket: &str) -> anyhow::Result<()> {
cidr_mirror,
domains.clone(),
);
// v3.4.4: clone the shutdown signal so the main router-select below can listen for it. When
// the GUI sends `{"cmd":"shutdown"}` over the admin socket, the admin handler signals this
// Notify, the select! arm fires, router.run() future is dropped (releasing TUN, inbound
// tasks, etc), and then OsRouteGuard's Drop runs and rolls back the OS routes — all before
// process exit. No SIGTERM-through-sudo race.
let shutdown = Arc::clone(&admin_state.shutdown);
let admin_path = admin_socket.to_string();
tokio::spawn(async move {
if let Err(e) = admin::serve(&admin_path, admin_state).await {
@@ -419,7 +425,18 @@ pub async fn run(config_path: &Path, admin_socket: &str) -> anyhow::Result<()> {
// Wire the same atomic counters the admin socket reads (via the `Stats` clone above) into the
// router so `aura status` shows live tx/rx numbers.
let router = AuraRouter::with_stats(tun, routes, conn, Some(stats.counters()));
let run_result = router.run().await.context("router run loop");
// v3.4.4: race the router loop against the admin shutdown notify. Whichever one finishes
// first ends the function; OsRouteGuard's Drop on the `_os_routes_guard` binding runs after
// this returns, rolling back the system routes. Graceful disconnect via admin is now a
// single round-trip: GUI posts `{"cmd":"shutdown"}`, admin handler notifies, select! fires
// the second arm, router future is dropped, routes are reverted, process exits cleanly.
let run_result = tokio::select! {
r = router.run() => r.context("router run loop"),
_ = shutdown.notified() => {
tracing::info!("graceful shutdown via admin socket; rolling back OS routes");
Ok(())
}
};
// _os_routes_guard drops here, rolling back any installed system routes.
run_result
}
+19
View File
@@ -50,6 +50,13 @@ enum Command {
/// Query a running client/server for tunnel status via the admin socket.
Status(AdminConnArgs),
/// v3.4.4: Ask a running client/server to shut down gracefully via the admin socket. The
/// process runs its `OsRouteGuard::Drop` to roll back installed system routes before
/// exiting; the kernel reaps the TUN device on close. Used by the GUI's Disconnect button
/// (talks to the chmod-666 admin socket without needing sudo) and useful from a terminal
/// when systemctl / launchctl aren't appropriate.
Shutdown(AdminConnArgs),
/// Quick crypto micro-benchmarks (KEM keygen/encaps/decaps, full handshake, AEAD).
BenchCrypto,
@@ -339,6 +346,7 @@ async fn main() -> anyhow::Result<()> {
Command::Client(args) => client::run(&args.config, &args.admin_socket).await,
Command::Route(cmd) => run_route(cmd).await,
Command::Status(args) => run_status(&args.admin_socket).await,
Command::Shutdown(args) => run_shutdown(&args.admin_socket).await,
Command::BenchCrypto => bench::run(),
Command::ServerInit(args) => run_server_init(args),
Command::ProvisionClient(args) => run_provision_client(args),
@@ -580,6 +588,17 @@ async fn run_status(admin_socket: &str) -> anyhow::Result<()> {
Ok(())
}
/// v3.4.4: dispatch `aura shutdown` over the admin socket.
async fn run_shutdown(admin_socket: &str) -> anyhow::Result<()> {
let resp = admin::request(admin_socket, &Request::Shutdown).await?;
if !resp.ok {
anyhow::bail!("shutdown failed: {}", resp.error.unwrap_or_default());
}
println!("shutdown signal sent; the running client/server is rolling back its routes and \
exiting (typically <500 ms).");
Ok(())
}
/// Print a generic admin response (ok / error, with optional `removed`).
fn print_response(resp: admin::Response) {
if resp.ok {
+17 -3
View File
@@ -280,6 +280,11 @@ pub async fn run(config_path: &Path, admin_socket: &str) -> anyhow::Result<()> {
std::iter::empty(),
std::iter::empty(),
);
// v3.4.4: clone the shutdown signal so the accept loop below can break out of accept() when
// an admin `Shutdown` request arrives. Lets operators stop the server gracefully via
// `aura shutdown --admin-socket /run/aura-admin.sock` instead of `systemctl stop aura.service`
// when they want to test on a live host without disturbing the unit file.
let shutdown = Arc::clone(&admin_state.shutdown);
let admin_path = admin_socket.to_string();
tokio::spawn(async move {
if let Err(e) = admin::serve(&admin_path, admin_state).await {
@@ -376,9 +381,18 @@ pub async fn run(config_path: &Path, admin_socket: &str) -> anyhow::Result<()> {
// others on the same listening port. Non-UDP transports (TCP, QUIC) skip rendezvous in
// v3.1; only UDP is supported as a hop transport.
loop {
let next = {
let mut srv = server.lock().await;
srv.accept().await
let next = tokio::select! {
n = async {
let mut srv = server.lock().await;
srv.accept().await
} => n,
// v3.4.4: graceful shutdown via admin socket. Breaks out of the accept loop without
// waiting for the next connection. router_task.abort() + the NatGuard / mask-rotator
// Drop run on return.
_ = shutdown.notified() => {
tracing::info!("server shutdown requested via admin socket; exiting accept loop");
break;
}
};
let Some(accepted) = next else { break };
let peer_id = accepted.peer_id.clone();