feat(cli): v3.3 circuit rotation — background rebuild every N seconds
Adds RotatingCircuit: on a configurable interval (default off) a fresh
multi-hop circuit is dialed in the background and swapped in for the old
one, which is then torn down, so a long-running client periodically
rotates its on-wire path. Application packets never see the swap.
- RotatingCircuit::new(hops, udp_opts, interval) seeds an initial
CircuitConnection synchronously (errors surface), then spawns a
background rotator that every `interval`:
1. dial_circuit(&hops, udp_opts) -> next: CircuitConnection
2. std::mem::replace inside Arc<RwLock<Arc<CircuitConnection>>>
3. old Arc dropped when its last in-flight Arc clone is released
(its Drop aborts forwarders / closes outers).
send_packet/recv_packet clone the current Arc and release the read
lock before awaiting, so in-flight I/O never blocks the rotator's swap.
- [client.circuit] rotation_interval_secs: u64 (default 0 = disabled);
serde(default) keeps old configs working. When 0, the path is exactly
the v3.2 dial_circuit + optional CellPaddingConn wrap (back-compat).
- CellPaddingConn wraps RotatingCircuit on the OUTSIDE so every new
circuit shares the same cell_size — on-wire size signature stays
stable across rotations.
- Integration test
multihop_rotation::rotating_circuit_swaps_inner_under_traffic:
6 s of 100-ms ping/echo at interval=1.5s -> 37 sent, 37 received,
2 rotations counted via test-only AtomicU64 counter.
- Synchronous-failure test confirms initial dial errors bubble up from
::new without spawning the rotator task.
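
The snapshot-and-swap scheme from the first bullet can be sketched with only the standard library. This is an illustration under stated assumptions, not the code in this commit: `Slot`, `snapshot`, and `rotate` are hypothetical names, `std::sync::RwLock` stands in for `tokio::sync::RwLock`, and a `String` stands in for `CircuitConnection`.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `Arc<RwLock<Arc<CircuitConnection>>>`.
struct Slot<T> {
    current: RwLock<Arc<T>>,
}

impl<T> Slot<T> {
    fn new(initial: T) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Clone the inner `Arc`; the read lock is released when this function returns,
    /// before the caller does any slow work on the snapshot.
    fn snapshot(&self) -> Arc<T> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Swap in a new value and hand back the previous `Arc`.
    fn rotate(&self, next: T) -> Arc<T> {
        let mut guard = self.current.write().unwrap();
        std::mem::replace(&mut *guard, Arc::new(next))
    }
}

fn main() {
    let slot = Slot::new(String::from("circuit-0"));

    // A call that started before the rotation keeps its own snapshot.
    let in_flight = slot.snapshot();
    let old = slot.rotate(String::from("circuit-1"));

    assert_eq!(in_flight.as_str(), "circuit-0");
    assert_eq!(slot.snapshot().as_str(), "circuit-1");

    // The old value stays alive while any snapshot still references it.
    assert_eq!(Arc::strong_count(&old), 2);
    drop(in_flight);
    assert_eq!(Arc::strong_count(&old), 1);
}
```

Returning the previous `Arc` from `rotate` is a choice made here so the example can inspect reference counts; a caller that simply lets it drop gets the same teardown-on-last-release behaviour.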
Workspace: 297 tests passed (+4), clippy -D warnings clean, fmt clean.
293 baseline tests unchanged.
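
The `rotation_interval_secs` gate (0 = disabled) could be mapped to an optional interval before deciding whether to construct a RotatingCircuit at all. A minimal sketch: `rotation_interval` is a hypothetical helper, not a function from this commit.

```rust
use std::time::Duration;

/// Hypothetical helper: turn the configured seconds into an optional interval.
/// A value of 0 means rotation is disabled, so no interval is returned.
fn rotation_interval(rotation_interval_secs: u64) -> Option<Duration> {
    (rotation_interval_secs > 0).then(|| Duration::from_secs(rotation_interval_secs))
}

fn main() {
    // Disabled: the caller would keep the plain, non-rotating circuit path.
    assert!(rotation_interval(0).is_none());

    // Enabled: the caller would pass this Duration to the rotating wrapper.
    let every = rotation_interval(90).expect("non-zero seconds yield an interval");
    assert_eq!(every.as_secs(), 90);
}
```

Gating on `Option<Duration>` keeps a zero interval from ever reaching the rotator loop, which matters because the constructor documents `interval` as required to be greater than zero.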
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -39,7 +39,9 @@
//! companion mitigation for.

use std::net::SocketAddr;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

use anyhow::{anyhow, bail, Context};
use async_trait::async_trait;
@@ -49,6 +51,7 @@ use aura_proto::{
};
use aura_transport::{UdpClient, UdpConnection, UdpOpts};
use tokio::net::UdpSocket;
use tokio::sync::RwLock;
use tokio::task::JoinHandle;

/// How long the client waits for each hop to reply with [`ControlKind::CircuitReady`] after
@@ -419,3 +422,187 @@ pub async fn dial_circuit_with_relay_name(
    ];
    dial_circuit(&hop_cfgs, udp_opts).await
}

// ---- v3.3: RotatingCircuit ---------------------------------------------------------------------
//
// Every `interval` seconds the rotator silently rebuilds the entire N-hop circuit from scratch
// (new outer handshakes, new ExtendBridge envelopes, a fresh inner handshake to the exit) and
// atomically swaps the new [`CircuitConnection`] in for the old one. Any in-flight `send_packet`
// / `recv_packet` calls on the previous instance keep running on their own `Arc` clones until
// they complete or the OS-level socket dies; new sends/receives after the swap go through the
// fresh circuit. The old circuit is dropped (closing every outer connection and aborting every
// forwarder task) as soon as the last in-flight `Arc` is released.
//
// Identity rotation: because `dial_circuit` re-runs the full per-hop handshake every time, every
// relay sees a brand-new TLS session (different ephemeral key, fresh AEAD nonces). With per-hop
// client certs (v3.2) the certificate CN is also rotated. The exit only knows the client's
// stable cert CN; the relay only knows the previous and next IP, so neither side can correlate
// activity across rotations to a single long-lived flow.

/// Parameters captured at construction time so the background rotator can rebuild the circuit
/// without re-reading the config. Immutable for the lifetime of the rotator.
struct RebuildParams {
    /// Per-hop dial configs. Every [`dial_circuit`] call borrows this vector; it is never
    /// mutated after construction, so each rebuild attempt sees the same hops.
    hops: Vec<HopConfig>,
    /// UDP transport options applied to every outer hop's [`aura_transport::UdpClient::connect`].
    udp_opts: UdpOpts,
    /// How long the rotator sleeps before each rebuild attempt. The sleep restarts after every
    /// attempt, successful or not, so the next attempt starts `interval` after the previous one
    /// finished.
    interval: Duration,
}

/// A [`PacketConnection`] wrapper that periodically rebuilds the underlying [`CircuitConnection`]
/// in the background. Every `send_packet` / `recv_packet` call delegates to the **currently active**
/// inner [`CircuitConnection`]; when a rebuild completes, the new circuit atomically replaces the
/// old one.
///
/// ## Lifecycle
///
/// * [`RotatingCircuit::new`] dials the initial circuit synchronously (so the caller can fail fast
///   if the entry hop is unreachable) and then spawns the background rotator.
/// * Every `interval` the rotator runs [`dial_circuit`] with the captured [`RebuildParams::hops`].
///   On success the new [`CircuitConnection`] replaces the previous one inside the [`RwLock`];
///   on failure the previous one is kept and the rotator logs a warning, then waits another
///   `interval` before retrying.
/// * [`Drop`] aborts the rotator task. The currently-active inner circuit is dropped through the
///   `Arc` chain, tearing down its forwarders and outer sockets.
///
/// ## Cell padding interaction
///
/// The CLI wires [`RotatingCircuit`] **inside** any [`crate::cells::CellPaddingConn`]: the
/// padding layer is applied to the rotator's `Arc<dyn PacketConnection>`, not to each individual
/// circuit. This means every rotation produces a circuit that carries cells of the **same**
/// `cell_size`, keeping the on-wire signature stable across rotations.
pub struct RotatingCircuit {
    /// The currently-active circuit. Replaced on each successful rebuild.
    ///
    /// `Arc<...>` so `send_packet` / `recv_packet` can grab a cheap clone, release the read-lock,
    /// then await on the snapshot, so an in-flight call on a *previous* inner does not block the
    /// rotator's swap.
    current: Arc<RwLock<Arc<CircuitConnection>>>,
    /// Captured rebuild parameters. Wrapped in `Arc` so the rotator task can own a clone without
    /// holding `&self`.
    _rebuild: Arc<RebuildParams>,
    /// Number of *successful* rotations completed since construction. Tests use this to assert
    /// that the background rotator actually ran; production code does not depend on the value.
    rotation_count: Arc<AtomicU64>,
    /// Background rotator. Aborted on [`Drop`].
    rotator_task: JoinHandle<()>,
}

impl Drop for RotatingCircuit {
    fn drop(&mut self) {
        // Stop the rotator first so it cannot replace `current` mid-drop.
        self.rotator_task.abort();
        // `current`'s last `Arc` is released when `self` goes out of scope; that drops the
        // wrapped `CircuitConnection`, which in turn aborts every forwarder + closes every outer.
    }
}

impl RotatingCircuit {
    /// Dial the initial N-hop circuit and start the background rotator.
    ///
    /// `interval` MUST be greater than zero; the caller is expected to gate construction on a
    /// non-zero `rotation_interval_secs`. If `dial_circuit` fails synchronously, the error
    /// propagates and no background task is spawned.
    ///
    /// # Errors
    /// * The initial [`dial_circuit`] failed (entry hop unreachable, hop count invalid, etc.).
    pub async fn new(
        hops: Vec<HopConfig>,
        udp_opts: UdpOpts,
        interval: Duration,
    ) -> anyhow::Result<Self> {
        let initial = dial_circuit(&hops, udp_opts)
            .await
            .context("RotatingCircuit: initial dial_circuit")?;
        let current = Arc::new(RwLock::new(Arc::new(initial)));
        let rebuild = Arc::new(RebuildParams {
            hops,
            udp_opts,
            interval,
        });
        let rotation_count = Arc::new(AtomicU64::new(0));

        let task_current = Arc::clone(&current);
        let task_rebuild = Arc::clone(&rebuild);
        let task_counter = Arc::clone(&rotation_count);
        let rotator_task = tokio::spawn(async move {
            rotator_loop(task_current, task_rebuild, task_counter).await;
        });

        Ok(Self {
            current,
            _rebuild: rebuild,
            rotation_count,
            rotator_task,
        })
    }

    /// Number of successful rotations that have occurred since construction. Test-only helper:
    /// production code MUST NOT depend on the exact value because rotations are timer-driven.
    #[must_use]
    pub fn rotation_count(&self) -> u64 {
        self.rotation_count.load(Ordering::Relaxed)
    }

    /// The verified peer Common Name of the **currently-active** inner circuit's exit. This may
    /// change across rotations only if `hops[N-1].proto_cfg.server_name` was changed; under
    /// normal operation (immutable `RebuildParams`) it stays the same.
    pub async fn peer_id(&self) -> Option<String> {
        let snap = { self.current.read().await.clone() };
        snap.peer_id().map(str::to_owned)
    }
}

#[async_trait]
impl PacketConnection for RotatingCircuit {
    async fn send_packet(&self, packet: &[u8]) -> anyhow::Result<()> {
        // Snapshot the current circuit (cheap `Arc` clone) and release the read-lock immediately
        // so the rotator's `write().await` can replace `current` while this send is in flight.
        let conn = { self.current.read().await.clone() };
        conn.send_packet(packet).await
    }

    async fn recv_packet(&self) -> anyhow::Result<Vec<u8>> {
        let conn = { self.current.read().await.clone() };
        conn.recv_packet().await
    }
}

/// Background rotator: every `interval` rebuild the circuit and atomically swap it in.
///
/// Failure handling: a failed rebuild leaves the previous circuit in place and the rotator waits
/// the full `interval` before retrying. This avoids tight-loop hammering an unreachable entry
/// hop (a transient network glitch should not multiply the dial rate).
async fn rotator_loop(
    current: Arc<RwLock<Arc<CircuitConnection>>>,
    rebuild: Arc<RebuildParams>,
    rotation_count: Arc<AtomicU64>,
) {
    loop {
        tokio::time::sleep(rebuild.interval).await;
        match dial_circuit(&rebuild.hops, rebuild.udp_opts).await {
            Ok(next) => {
                let new_arc = Arc::new(next);
                {
                    let mut slot = current.write().await;
                    // `std::mem::replace` returns the previous `Arc<CircuitConnection>`. It drops
                    // here at the end of this block; if no `send_packet`/`recv_packet` is still
                    // holding a snapshot, the old `CircuitConnection`'s `Drop` runs immediately
                    // (aborting forwarders, closing sockets).
                    let _old = std::mem::replace(&mut *slot, new_arc);
                }
                let n = rotation_count.fetch_add(1, Ordering::Relaxed) + 1;
                tracing::info!(rotation = n, "circuit rotated successfully");
            }
            Err(e) => {
                tracing::warn!(
                    error = %e,
                    "circuit rotation failed; keeping previous circuit active until next tick"
                );
            }
        }
    }
}