feat(cli): v3.3 circuit rotation — background rebuild every N seconds

Adds RotatingCircuit: the multi-hop circuit is silently torn down and
rebuilt on a configurable interval (default off) so a long-running
client periodically rotates its on-wire path. Application packets never
see the swap.

- RotatingCircuit::new(hops, udp_opts, interval) seeds an initial
  CircuitConnection synchronously (errors surface), then spawns a
  background rotator that every `interval`:
    1. dial_circuit(&hops, udp_opts) -> next: CircuitConnection
    2. std::mem::replace inside Arc<RwLock<Arc<CircuitConnection>>>
    3. old Arc dropped when its last in-flight Arc clone is released
       (its Drop aborts forwarders / closes outers).
  send_packet/recv_packet grab a cheap snapshot of the current Arc
  before awaiting, so reads/writes never block under the rotator.
- [client.circuit] rotation_interval_secs: u64 (default 0 = disabled);
  serde(default) keeps old configs working. When 0, the path is exactly
  the v3.2 dial_circuit + optional CellPaddingConn wrap (back-compat).
- CellPaddingConn wraps RotatingCircuit on the OUTSIDE so every new
  circuit shares the same cell_size — on-wire size signature stays
  stable across rotations.
- Integration test
  multihop_rotation::rotating_circuit_swaps_inner_under_traffic:
  6 s of 100-ms ping/echo at interval=1.5s -> 37 sent, 37 received,
  2 rotations counted via test-only AtomicU64 counter.
- Synchronous-failure test confirms initial dial errors bubble up from
  ::new without spawning the rotator task.
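
A minimal std-only sketch of the snapshot-and-swap steps listed above. It
is an illustration, not the code in this commit: `FakeCircuit`, `snapshot`
and `rotate` are hypothetical names, and std::sync::RwLock stands in for
the tokio RwLock the real type uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `CircuitConnection`; its Drop bumps a shared counter.
struct FakeCircuit {
    id: u32,
    drops: Arc<AtomicUsize>,
}

impl Drop for FakeCircuit {
    fn drop(&mut self) {
        self.drops.fetch_add(1, Ordering::SeqCst);
    }
}

/// Take a cheap snapshot of the active circuit; the read guard is released on return.
fn snapshot(slot: &RwLock<Arc<FakeCircuit>>) -> Arc<FakeCircuit> {
    let guard = slot.read().unwrap();
    Arc::clone(&*guard)
}

/// Swap in `next` and hand the previous circuit back to the caller.
fn rotate(slot: &RwLock<Arc<FakeCircuit>>, next: FakeCircuit) -> Arc<FakeCircuit> {
    let mut guard = slot.write().unwrap();
    std::mem::replace(&mut *guard, Arc::new(next))
}

fn main() {
    let drops = Arc::new(AtomicUsize::new(0));
    let slot = RwLock::new(Arc::new(FakeCircuit { id: 1, drops: Arc::clone(&drops) }));

    // An "in-flight" call holds a snapshot of circuit 1.
    let in_flight = snapshot(&slot);

    // The rotator swaps in circuit 2 and releases its own handle to circuit 1.
    let old = rotate(&slot, FakeCircuit { id: 2, drops: Arc::clone(&drops) });
    drop(old);

    // Circuit 1 is still alive because the in-flight snapshot holds a clone.
    assert_eq!(in_flight.id, 1);
    assert_eq!(drops.load(Ordering::SeqCst), 0);
    // New callers already see circuit 2.
    assert_eq!(snapshot(&slot).id, 2);

    // Releasing the last clone runs the old circuit's Drop.
    drop(in_flight);
    assert_eq!(drops.load(Ordering::SeqCst), 1);
}
```

The lock is only held for the clone or the replace, never across the
actual send or receive, which is why the swap does not wait for in-flight
calls.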

Workspace: 297 tests passed (+4), clippy -D warnings clean, fmt clean.
293 baseline tests unchanged.
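
A sketch of how the `rotation_interval_secs` gate could be expressed,
assuming a hypothetical helper named `rotation_interval`; the actual CLI
wiring is not shown in this message. A zero interval must never reach
RotatingCircuit::new, since a zero-length sleep would make the rotator
re-dial back to back.

```rust
use std::time::Duration;

/// Hypothetical helper: map the configured seconds to an optional rotation interval.
/// `0` means "rotation disabled", matching the documented default.
fn rotation_interval(rotation_interval_secs: u64) -> Option<Duration> {
    (rotation_interval_secs > 0).then(|| Duration::from_secs(rotation_interval_secs))
}

fn main() {
    // Default config: no rotator is constructed and the plain v3.2 path is used.
    assert_eq!(rotation_interval(0), None);
    // A non-zero value enables rotation with that interval.
    assert_eq!(rotation_interval(90), Some(Duration::from_secs(90)));
}
```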

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
xah30
2026-05-27 21:25:05 +03:00
parent a070da0be9
commit 5e553b79df
5 changed files with 623 additions and 12 deletions
+187
@@ -39,7 +39,9 @@
//! companion mitigation for.
use std::net::SocketAddr;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;
use anyhow::{anyhow, bail, Context};
use async_trait::async_trait;
@@ -49,6 +51,7 @@ use aura_proto::{
};
use aura_transport::{UdpClient, UdpConnection, UdpOpts};
use tokio::net::UdpSocket;
use tokio::sync::RwLock;
use tokio::task::JoinHandle;
/// How long the client waits for each hop to reply with [`ControlKind::CircuitReady`] after
@@ -419,3 +422,187 @@ pub async fn dial_circuit_with_relay_name(
];
dial_circuit(&hop_cfgs, udp_opts).await
}
// ---- v3.3: RotatingCircuit ---------------------------------------------------------------------
//
// Every `interval` seconds the rotator silently rebuilds the entire N-hop circuit from scratch
// (new outer handshakes, new ExtendBridge envelopes, a fresh inner handshake to the exit) and
// atomically swaps the new [`CircuitConnection`] in for the old one. Any in-flight `send_packet`
// / `recv_packet` calls on the previous instance keep running on their own `Arc` clones until
// they complete or the OS-level socket dies; new sends/receives after the swap go through the
// fresh circuit. The old circuit is dropped — closing every outer connection and aborting every
// forwarder task — as soon as the last in-flight `Arc` is released.
//
// Identity rotation: because `dial_circuit` re-runs the full per-hop handshake every time, every
// relay sees a brand-new TLS session (different ephemeral key, fresh AEAD nonces). With per-hop
// client certs (v3.2) the certificate CN is also rotated. The exit only knows the client's
// stable cert CN; the relay only knows the previous and next IP — neither side can correlate
// activity across rotations to a single long-lived flow.
/// Parameters captured at construction time so the background rotator can rebuild the circuit
/// without re-reading the config. Immutable for the lifetime of the rotator.
struct RebuildParams {
/// Per-hop dial configs. Every [`dial_circuit`] call borrows the vector immutably, and
/// `RebuildParams` is never mutated after construction, so each rebuild sees the same hops.
hops: Vec<HopConfig>,
/// UDP transport options applied to every outer hop's [`aura_transport::UdpClient::connect`].
udp_opts: UdpOpts,
/// How long the rotator sleeps before each rebuild attempt. The sleep restarts only after the
/// previous attempt finishes (success or failure), so the effective period is `interval` plus
/// however long the dial took.
interval: Duration,
}
/// A [`PacketConnection`] wrapper that periodically rebuilds the underlying [`CircuitConnection`]
/// in the background. Every `send_packet` / `recv_packet` call delegates to the **currently active**
/// inner [`CircuitConnection`]; when a rebuild completes, the new circuit atomically replaces the
/// old one.
///
/// ## Lifecycle
///
/// * [`RotatingCircuit::new`] dials the initial circuit synchronously (so the caller can fail fast
/// if the entry hop is unreachable) and then spawns the background rotator.
/// * Every `interval` the rotator runs [`dial_circuit`] with the captured [`RebuildParams::hops`].
/// On success the new [`CircuitConnection`] replaces the previous one inside the [`RwLock`];
/// on failure the previous one is kept and the rotator logs a warning, then waits another
/// `interval` before retrying.
/// * [`Drop`] aborts the rotator task. The currently-active inner circuit is dropped through the
/// `Arc` chain, tearing down its forwarders and outer sockets.
///
/// ## Cell padding interaction
///
/// The CLI wires [`RotatingCircuit`] **inside** any [`crate::cells::CellPaddingConn`] — the
/// padding layer is applied to the rotator's `Arc<dyn PacketConnection>`, not to each individual
/// circuit. This means every rotation produces a circuit that carries cells of the **same**
/// `cell_size`, keeping the on-wire signature stable across rotations.
pub struct RotatingCircuit {
/// The currently-active circuit. Replaced on each successful rebuild.
///
/// `Arc<...>` so `send_packet` / `recv_packet` can grab a cheap clone, release the read-lock,
/// then await on the snapshot — any in-flight call on a *previous* inner does not block the
/// rotator's swap.
current: Arc<RwLock<Arc<CircuitConnection>>>,
/// Captured rebuild parameters. Wrapped in `Arc` so the rotator task can own a clone without
/// holding `&self`.
_rebuild: Arc<RebuildParams>,
/// Number of *successful* rotations completed since construction. Tests use this to assert
/// that the background rotator actually ran; production code does not depend on the value.
rotation_count: Arc<AtomicU64>,
/// Background rotator. Aborted on [`Drop`].
rotator_task: JoinHandle<()>,
}
impl Drop for RotatingCircuit {
fn drop(&mut self) {
// Stop the rotator first so it cannot replace `current` mid-drop.
self.rotator_task.abort();
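// The task's clone of `current` is released once the aborted task is dropped, and ours when
// `self` goes out of scope. When the last `Arc` (including any in-flight snapshot) is gone,
// the wrapped `CircuitConnection` drops, aborting every forwarder and closing every outer.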
}
}
impl RotatingCircuit {
/// Dial the initial N-hop circuit and start the background rotator.
///
/// `interval` MUST be greater than zero; the caller is expected to gate construction on a
/// non-zero `rotation_interval_secs`. If `dial_circuit` fails synchronously, the error
/// propagates and no background task is spawned.
///
/// # Errors
/// * The initial [`dial_circuit`] failed (entry hop unreachable, hop count invalid, etc.).
pub async fn new(
hops: Vec<HopConfig>,
udp_opts: UdpOpts,
interval: Duration,
) -> anyhow::Result<Self> {
let initial = dial_circuit(&hops, udp_opts)
.await
.context("RotatingCircuit: initial dial_circuit")?;
let current = Arc::new(RwLock::new(Arc::new(initial)));
let rebuild = Arc::new(RebuildParams {
hops,
udp_opts,
interval,
});
let rotation_count = Arc::new(AtomicU64::new(0));
let task_current = Arc::clone(&current);
let task_rebuild = Arc::clone(&rebuild);
let task_counter = Arc::clone(&rotation_count);
let rotator_task = tokio::spawn(async move {
rotator_loop(task_current, task_rebuild, task_counter).await;
});
Ok(Self {
current,
_rebuild: rebuild,
rotation_count,
rotator_task,
})
}
/// Number of successful rotations that have occurred since construction. Intended for tests;
/// production code MUST NOT depend on the exact value because rotations are timer-driven.
#[must_use]
pub fn rotation_count(&self) -> u64 {
self.rotation_count.load(Ordering::Relaxed)
}
/// The verified peer Common Name of the **currently-active** inner circuit's exit. This may
/// change across rotations only if `hops[N-1].proto_cfg.server_name` was changed — under
/// normal operation (immutable `RebuildParams`) it stays the same.
pub async fn peer_id(&self) -> Option<String> {
let snap = { self.current.read().await.clone() };
snap.peer_id().map(str::to_owned)
}
}
#[async_trait]
impl PacketConnection for RotatingCircuit {
async fn send_packet(&self, packet: &[u8]) -> anyhow::Result<()> {
// Snapshot the current circuit (cheap `Arc` clone) and release the read-lock immediately
// so the rotator's `write().await` can replace `current` while this send is in flight.
let conn = { self.current.read().await.clone() };
conn.send_packet(packet).await
}
async fn recv_packet(&self) -> anyhow::Result<Vec<u8>> {
let conn = { self.current.read().await.clone() };
conn.recv_packet().await
}
}
/// Background rotator: every `interval` rebuild the circuit and atomically swap it in.
///
/// Failure handling: a failed rebuild leaves the previous circuit in place and the rotator waits
/// the full `interval` before retrying. This avoids tight-loop hammering an unreachable entry
/// hop (a transient network glitch should not multiply the dial rate).
async fn rotator_loop(
current: Arc<RwLock<Arc<CircuitConnection>>>,
rebuild: Arc<RebuildParams>,
rotation_count: Arc<AtomicU64>,
) {
loop {
tokio::time::sleep(rebuild.interval).await;
match dial_circuit(&rebuild.hops, rebuild.udp_opts).await {
Ok(next) => {
let new_arc = Arc::new(next);
{
let mut slot = current.write().await;
// `std::mem::replace` returns the previous `Arc<CircuitConnection>`. It drops
// here at the end of this block — if no `send_packet`/`recv_packet` is still
// holding a snapshot, the old `CircuitConnection`'s `Drop` runs immediately
// (aborting forwarders, closing sockets).
let _old = std::mem::replace(&mut *slot, new_arc);
}
let n = rotation_count.fetch_add(1, Ordering::Relaxed) + 1;
tracing::info!(rotation = n, "circuit rotated successfully");
}
Err(e) => {
tracing::warn!(
error = %e,
"circuit rotation failed; keeping previous circuit active until next tick"
);
}
}
}
}
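
The failure policy of `rotator_loop` (swap on success, keep the previous circuit on failure, count only successful rotations) can be tried in isolation. The sketch below is std-only and synchronous; `FakeCircuit` and `apply_tick` are hypothetical names, and the dial outcomes are scripted rather than produced by `dial_circuit`.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `CircuitConnection`.
#[derive(Debug, PartialEq)]
struct FakeCircuit(u32);

/// Apply one tick of the rotator policy: swap on success, keep the previous circuit on failure.
/// Returns `true` when a rotation actually happened.
fn apply_tick(slot: &RwLock<Arc<FakeCircuit>>, dialed: Result<FakeCircuit, String>) -> bool {
    match dialed {
        Ok(next) => {
            // The previous Arc is dropped by the assignment while the write guard is held.
            *slot.write().unwrap() = Arc::new(next);
            true
        }
        // A failed dial leaves the slot untouched; the caller simply waits for the next tick.
        Err(_) => false,
    }
}

fn main() {
    let slot = RwLock::new(Arc::new(FakeCircuit(0)));
    let mut rotations = 0u64;

    // Scripted dial outcomes: success, failure, success.
    let outcomes = [
        Ok(FakeCircuit(1)),
        Err("entry hop unreachable".to_string()),
        Ok(FakeCircuit(2)),
    ];
    for outcome in outcomes {
        if apply_tick(&slot, outcome) {
            rotations += 1;
        }
    }

    // Only the two successful dials count, and the latest success is the active circuit.
    assert_eq!(rotations, 2);
    assert_eq!(**slot.read().unwrap(), FakeCircuit(2));
}
```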