feat(transport,cli,tunnel): v3.4 port auto-detect + bug fixes from live test

Live macOS test against the production server uncovered six bugs. Two of them
(#43 and #46) turned out to be a port collision with sing-box, not real bugs.
This commit fixes three (#41, #42, #44), adds v3.4 port discovery so the same
collision is handled transparently next time, and defers #45 to a follow-up.

## v3.4 server port-discovery

- Defaults moved off 443/444 to 8443/8443/8444 (TransportSection::default,
  ServerInitOpts, ProvisionClientOpts, CLI flags). 443 is heavily contested in
  practice (sing-box, Hysteria2, reverse proxies) and the previous default
  silently lost the bind when a co-tenant was already there.
- MultiServer::bind_with_outer_or_scan: scans forward up to
  DEFAULT_PORT_SCAN_MAX (20) candidates per transport when the requested port
  is occupied; QUIC keeps walking if it lands on the custom-UDP port.
- MultiServer::bound_addrs(): the actual addresses each transport bound to.
- Server logs the bound addresses and writes a runtime snapshot
  (server.toml.runtime.json) when they differ from the requested ones, so
  `aura sign-bridges` can re-sign the bridges manifest later.
- BridgeManifest gains an optional `endpoints: Vec<BridgeEndpoint>` field
  with per-transport ports. Backward-compatible: old v3.3 clients ignore the
  field and continue to use the v1 `bridges` line.
- `aura sign-bridges --endpoints HOST:tcp=N:quic=N:udp=N` to mint v3.4
  manifests; bridges line is auto-synthesised for v3.3 clients.
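
The forward scan described above can be sketched with std alone. This is a
minimal illustration, not the code in this commit: `scan_free_udp_port_excluding`,
its `exclude` parameter, and `pick_udp_and_quic` are hypothetical names, used
here to show one way a QUIC scan could skip the port already chosen for the
custom-UDP transport.

```rust
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical helper: probe `start.port()`, `start.port() + 1`, ... up to
/// `start.port() + max_scan` (inclusive), skipping `exclude` if given.
/// Returns the first address where a UDP probe bind succeeds.
fn scan_free_udp_port_excluding(
    start: SocketAddr,
    max_scan: u16,
    exclude: Option<u16>,
) -> Option<SocketAddr> {
    let first = start.port();
    let last = first.saturating_add(max_scan);
    // An inclusive u16 range cannot overflow, even when `last == u16::MAX`.
    (first..=last)
        .filter(|port| Some(*port) != exclude)
        .map(|port| SocketAddr::new(start.ip(), port))
        // The probe socket is dropped right after the check, so the real bind
        // can still lose a race against another process.
        .find(|cand| UdpSocket::bind(cand).is_ok())
}

/// Hypothetical usage: resolve the custom-UDP port first, then QUIC, which
/// must not land on the same UDP port when both use the same IP.
/// (Assumption: wildcard-vs-specific address overlap is ignored here.)
fn pick_udp_and_quic(
    udp: SocketAddr,
    quic: SocketAddr,
    max_scan: u16,
) -> Option<(SocketAddr, SocketAddr)> {
    let udp = scan_free_udp_port_excluding(udp, max_scan, None)?;
    let exclude = (udp.ip() == quic.ip()).then(|| udp.port());
    let quic = scan_free_udp_port_excluding(quic, max_scan, exclude)?;
    Some((udp, quic))
}
```

Passing the exclusion into the scan itself, rather than only checking the
starting port, also covers the case where QUIC starts below the UDP port and
walks up onto it.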

## Bug fixes from the live test

- macOS TUN naming (#41): the tun crate rejects names that don't match
  ^utun[0-9]+$. On macOS we now substitute `""` (kernel auto-assigns utunN),
  capture the assigned name via inner.tun_name(), and propagate it through to
  os_routes::OsRouteGuard::install — so `route add -interface utunN` uses
  the real interface, not "aura0".
- Packet counters (#42): Stats { tx_packets, rx_packets } are now actually
  bumped by the data path. `aura status` shows live numbers instead of
  permanent zeros.
- render_client_toml schema (#44): provisioner emits proper
  `[[tunnel.split.vpn]] cidr = "..."` / `[[tunnel.split.direct]]` blocks from
  new --vpn-cidrs / --direct-cidrs flags. The v3.3 `vpn_cidrs = [...]` flat
  array was silently ignored by serde, leaving users with `rules: 0` even
  when their CIDRs looked right.
- #43 / #46 (TCP/443 dial early-eof / no payload back): diagnosed as the
  sing-box port collision, not an Aura bug. The v3.4 port-scan path makes it
  go away — the server picks a free port and clients learn it from the
  manifest.
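
As an illustration of the macOS naming rule from #41, the `^utun[0-9]+$` check
can be written without a regex crate. This is a sketch under assumptions:
`is_utun_name` and `macos_tun_request_name` are hypothetical helpers, not
functions from this commit, and passing an already valid `utunN` name through
unchanged is a design choice made here for illustration only.

```rust
/// Returns true if `name` matches `^utun[0-9]+$`.
fn is_utun_name(name: &str) -> bool {
    match name.strip_prefix("utun") {
        Some(digits) => !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// Hypothetical helper: the name to request from the TUN layer on macOS.
/// An empty string stands for "let the kernel assign utunN"; the caller must
/// then read back the assigned name before installing routes.
fn macos_tun_request_name(configured: &str) -> &str {
    if is_utun_name(configured) {
        configured
    } else {
        ""
    }
}
```

With this sketch, a configured name such as `aura0` maps to `""`, and the
route installer has to use the name reported by the device afterwards.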

## Test coverage

Three new unit tests for the port-scanner (UDP busy, TCP busy, zero budget);
two new tests for v3.4 BridgeManifest round-trip with endpoints; one
integration test for the new `[[tunnel.split.vpn]]` rendering; tests for the
runtime-state file write/read round-trip; agent-added router-counter tests
in aura-tunnel/tests/routes.rs.

cargo test --workspace, cargo clippy --workspace -- -D warnings, and
cargo fmt --check all pass.

#45 (silent client exit when underlying QUIC transport breaks) is still
outstanding — needs deeper investigation; deferred to a follow-up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
xah30
2026-05-29 17:14:45 +03:00
parent a173ced9b2
commit ba8d6b796f
20 changed files with 1267 additions and 110 deletions
+199
@@ -193,8 +193,16 @@ pub struct MultiServer {
/// Live TCP server handle (shared with the accept loop), used by the mask rotator to update
/// the accept-time options. `None` when the TCP transport was not enabled.
tcp: Option<Arc<TcpServer>>,
/// v3.4: actual bound addresses for each transport. Differs from the originally requested
/// `Endpoints` when [`Self::bind_with_outer_or_scan`] had to walk past a busy port. Empty
/// (`None`) for transports that were disabled or failed to bind.
bound: Endpoints,
}
/// v3.4: default port-scan budget. When a transport's requested port is occupied,
/// [`MultiServer::bind_with_outer_or_scan`] walks forward this many candidates before giving up.
pub const DEFAULT_PORT_SCAN_MAX: u16 = 20;
impl MultiServer {
/// Bind and start accept loops for every transport whose address is set in `endpoints`.
/// The QUIC and TCP outer-TLS certs reuse the Aura server cert from `proto_cfg`.
@@ -251,10 +259,12 @@ impl MultiServer {
let (txc, rx) = mpsc::channel::<Accepted>(32);
let mut tasks = Vec::new();
let mut bound = Endpoints::default();
let udp_handle = if let Some(addr) = endpoints.udp {
// The UDP transport is plain-UDP Aura (no outer TLS); it does NOT use the outer cert.
let server = Arc::new(UdpServer::bind(addr, proto_cfg.clone(), udp)?);
bound.udp = server.local_addr().ok();
tasks.push(tokio::spawn(udp_accept_loop(
Arc::clone(&server),
txc.clone(),
@@ -271,6 +281,7 @@ impl MultiServer {
}
None => TcpServer::bind(addr, proto_cfg.clone(), tcp.clone()).await?,
});
bound.tcp = server.local_addr().ok();
tasks.push(tokio::spawn(tcp_accept_loop(
Arc::clone(&server),
txc.clone(),
@@ -289,6 +300,7 @@ impl MultiServer {
),
};
let server = AuraServer::bind(addr, oc, ok, proto_cfg.clone())?;
bound.quic = server.local_addr().ok();
tasks.push(tokio::spawn(quic_accept_loop(server, txc.clone())));
}
@@ -300,9 +312,119 @@ impl MultiServer {
tasks,
udp: udp_handle,
tcp: tcp_handle,
bound,
})
}
/// v3.4: like [`Self::bind_with_outer`], but if any transport's requested port is occupied
/// (returns `io::ErrorKind::AddrInUse`), scan forward up to `max_scan` candidates per
/// transport before failing. The actually-bound addresses are recorded in [`Self::bound_addrs`]
/// — they often differ from `endpoints` when the host has e.g. sing-box on the original port.
///
/// The UDP transport and QUIC must end up on different ports (both use UDP); if the scan
/// drives them into a collision, the second one keeps walking. TCP can share a port number
/// with either since it is a different protocol.
///
/// Per-transport policy:
/// * **Fatal bind error** (anything other than `AddrInUse`, or `AddrInUse` past the scan
/// budget) bubbles up and aborts the server — keeping behaviour consistent with v3.3.
/// * **No fallback for transports that were `None`** — they stay disabled.
///
/// # Errors
/// Same as [`Self::bind_with_outer`] after the scan-resolved endpoints are computed.
pub async fn bind_with_outer_or_scan(
mut endpoints: Endpoints,
proto_cfg: ServerConfig,
udp: UdpOpts,
tcp: TcpOpts,
outer_cert_pem: Option<&str>,
outer_key_pem: Option<&str>,
max_scan: u16,
) -> anyhow::Result<Self> {
// Pre-probe each transport's port. We bind plain std::net sockets with their default
// socket options to test availability, drop the probe, and pass the resolved port to the
// real bind. There is a brief race window between dropping the probe and the real bind;
// in a non-malicious environment that is acceptable, and if the race is lost the real
// bind simply returns AddrInUse (the caller can re-run the scan).
if let Some(addr) = endpoints.udp {
let resolved = scan_free_udp_port(addr, max_scan).ok_or_else(|| {
anyhow::anyhow!(
"no free UDP port in {}..{} for Aura custom-UDP transport",
addr.port(),
addr.port().saturating_add(max_scan)
)
})?;
if resolved != addr {
tracing::warn!(
requested = %addr,
actual = %resolved,
"UDP transport: requested port busy, scanned forward and picked a free one"
);
}
endpoints.udp = Some(resolved);
}
if let Some(addr) = endpoints.quic {
// QUIC must not collide with the custom-UDP port; if it does, start scanning from
// the next port.
let start = match endpoints.udp {
Some(udp_addr) if udp_addr.ip() == addr.ip() && udp_addr.port() == addr.port() => {
SocketAddr::new(addr.ip(), addr.port().saturating_add(1))
}
_ => addr,
};
let resolved = scan_free_udp_port(start, max_scan).ok_or_else(|| {
anyhow::anyhow!(
"no free UDP port in {}..{} for QUIC outer transport",
start.port(),
start.port().saturating_add(max_scan)
)
})?;
if resolved != addr {
tracing::warn!(
requested = %addr,
actual = %resolved,
"QUIC transport: requested port busy, scanned forward and picked a free one"
);
}
endpoints.quic = Some(resolved);
}
if let Some(addr) = endpoints.tcp {
let resolved = scan_free_tcp_port(addr, max_scan).ok_or_else(|| {
anyhow::anyhow!(
"no free TCP port in {}..{} for TCP outer transport",
addr.port(),
addr.port().saturating_add(max_scan)
)
})?;
if resolved != addr {
tracing::warn!(
requested = %addr,
actual = %resolved,
"TCP transport: requested port busy, scanned forward and picked a free one"
);
}
endpoints.tcp = Some(resolved);
}
Self::bind_with_outer(
endpoints,
proto_cfg,
udp,
tcp,
outer_cert_pem,
outer_key_pem,
)
.await
}
/// v3.4: the addresses each enabled transport actually bound to. After
/// [`Self::bind_with_outer_or_scan`], these may differ from the requested `Endpoints` if a
/// port had to be walked past a conflict. Transports that were not enabled remain `None`.
#[must_use]
pub fn bound_addrs(&self) -> &Endpoints {
&self.bound
}
/// Update the UDP accept-time options. The next [`Self::accept`] of a UDP connection will use
/// the new options; existing connections keep theirs. No-op if the UDP transport is disabled.
pub async fn set_udp_opts(&self, new_opts: UdpOpts) {
@@ -326,6 +448,42 @@ impl MultiServer {
}
}
/// Try `start.port()`, `start.port()+1`, ..., `start.port()+max_scan` until a UDP bind succeeds.
/// Returns the resolved [`SocketAddr`]; `None` if no candidate was free within the budget.
fn scan_free_udp_port(start: SocketAddr, max_scan: u16) -> Option<SocketAddr> {
let mut port = start.port();
let upper = port.saturating_add(max_scan);
while port <= upper {
let cand = SocketAddr::new(start.ip(), port);
if std::net::UdpSocket::bind(cand).is_ok() {
return Some(cand);
}
// Overflow guard: `upper` saturates at u16::MAX, so stop here instead of letting
// `port += 1` overflow.
if port == u16::MAX {
return None;
}
port += 1;
}
None
}
/// Try `start.port()`, `start.port()+1`, ..., `start.port()+max_scan` until a TCP bind succeeds.
fn scan_free_tcp_port(start: SocketAddr, max_scan: u16) -> Option<SocketAddr> {
let mut port = start.port();
let upper = port.saturating_add(max_scan);
while port <= upper {
let cand = SocketAddr::new(start.ip(), port);
if std::net::TcpListener::bind(cand).is_ok() {
return Some(cand);
}
if port == u16::MAX {
return None;
}
port += 1;
}
None
}
impl Drop for MultiServer {
fn drop(&mut self) {
for t in &self.tasks {
@@ -399,3 +557,44 @@ async fn quic_accept_loop(server: AuraServer, tx: mpsc::Sender<Accepted>) {
}
}
}
#[cfg(test)]
mod port_scan_tests {
use super::*;
/// When the requested port is occupied, the scan walks forward and returns a port within
/// the budget. We hold a real socket to simulate the busy condition.
#[test]
fn udp_scan_skips_busy_port() {
// Bind a blocker to an OS-assigned port and keep it open, then start scanning from that
// same port: the scanner must skip the busy port and find a free neighbour.
let blocker = std::net::UdpSocket::bind("127.0.0.1:0").expect("bind blocker");
let busy_addr = blocker.local_addr().expect("local_addr");
let resolved = scan_free_udp_port(busy_addr, 10).expect("scan must find a free port");
assert_ne!(resolved.port(), busy_addr.port(), "must skip the busy port");
assert!(resolved.port() > busy_addr.port());
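// saturating_add avoids a debug-build overflow panic when the OS hands out a port near u16::MAX.
assert!(resolved.port() <= busy_addr.port().saturating_add(10));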
drop(blocker);
}
#[test]
fn tcp_scan_skips_busy_port() {
let blocker = std::net::TcpListener::bind("127.0.0.1:0").expect("bind blocker");
let busy_addr = blocker.local_addr().expect("local_addr");
let resolved = scan_free_tcp_port(busy_addr, 10).expect("scan must find a free port");
assert_ne!(resolved.port(), busy_addr.port());
assert!(resolved.port() > busy_addr.port());
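// saturating_add avoids a debug-build overflow panic when the OS hands out a port near u16::MAX.
assert!(resolved.port() <= busy_addr.port().saturating_add(10));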
drop(blocker);
}
/// With a zero scan budget, a busy port yields `None` (no walk, no luck).
#[test]
fn scan_with_zero_budget_returns_none_on_busy_port() {
let blocker = std::net::UdpSocket::bind("127.0.0.1:0").expect("bind blocker");
let busy_addr = blocker.local_addr().expect("local_addr");
let resolved = scan_free_udp_port(busy_addr, 0);
assert_eq!(resolved, None);
drop(blocker);
}
}
+3 -1
@@ -72,7 +72,9 @@ pub mod tcp;
pub mod udp;
pub use conn::AuraConnection;
pub use dial::{dial, Accepted, DialConfig, Endpoints, MultiServer, TransportMode};
pub use dial::{
dial, Accepted, DialConfig, Endpoints, MultiServer, TransportMode, DEFAULT_PORT_SCAN_MAX,
};
pub use mimicry::{alpn_protocols, chrome_quic_transport_config, ALPN_H3, DEFAULT_SNI};
pub use padding::{
inject_padding_frames, next_bucket_for_profile, pad_to_bucket, pad_to_https_size,