fix(server): v3.6 implicit auto-NAT on Linux (root cause of full-VPN dying)

Symptoms: in default = "VPN" full-VPN mode, external internet was dead even though tunnel-internal ping (10.7.0.1) worked perfectly. The tunnel itself was assembled and AEAD-encrypted (see TEST_CASES.md), but packets sent through it died on the server side.

Root cause: the server's `[server.nat]` section was opt-in. On the production server (187.77.67.17), deployed before v2, the section is absent from /etc/aura/server.toml, so `aura server` never ran the iptables MASQUERADE plan. Packets egressed to the upstream router with src = 10.7.0.10 (RFC1918), which the provider's reverse-path filter dropped, so full-VPN clients saw "internet is dead". Tunnel-internal pool addresses worked because they don't need NAT.

Fix:

* `server.rs`: when `[server.nat]` is absent from server.toml AND we are on Linux, attempt auto-NAT with an auto-detected egress_iface. If detection or the iptables call fails we DON'T bail: we log a loud error and let the server come up so safe-mode clients keep working.
* `config.rs`: `ServerNatSection::default()` now defaults to `auto = true`. A bare `[server.nat]` header (no `auto =`) now means "yes, enable it" instead of the silent no-op it used to be.
* New tests for both the bare-header and the explicit `auto = false` opt-out paths.
* `docs/server_nat_fix.md`: step-by-step instructions for fixing the existing 187.77.67.17 server (binary upgrade vs. manual server.toml patch vs. fully manual sysctl + iptables).
* `docs/deployment.md`: replaces the "manual mandatory step" wording with the new auto-NAT story.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
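
The startup behaviour described in this message can be summarised as a small decision function. This is a minimal sketch under stated assumptions, not the code in `server.rs`: `NatDecision` and `decide_nat` are hypothetical names, and the config section is reduced to an `Option<bool>` that stands in for `Option<ServerNatSection>` and its `auto` flag.

```rust
/// Hypothetical outcome of the startup NAT decision (not a type from the Aura codebase).
#[derive(Debug, PartialEq, Eq)]
enum NatDecision {
    /// `[server.nat]` present with `auto = true`: apply NAT.
    ApplyExplicit,
    /// `[server.nat]` absent on Linux: try NAT with an auto-detected interface.
    ApplyImplicit,
    /// Leave the host's NAT configuration untouched.
    Skip,
}

/// `section_auto` is `None` when `[server.nat]` is missing and `Some(auto)` when it is present.
fn decide_nat(section_auto: Option<bool>, on_linux: bool) -> NatDecision {
    match (section_auto, on_linux) {
        (Some(true), _) => NatDecision::ApplyExplicit,
        (Some(false), _) => NatDecision::Skip, // explicit opt-out
        (None, true) => NatDecision::ApplyImplicit,
        (None, false) => NatDecision::Skip,
    }
}

fn main() {
    let cases = [(Some(true), true), (Some(false), true), (None, true), (None, false)];
    for (section_auto, on_linux) in cases {
        println!(
            "section={:?} linux={} -> {:?}",
            section_auto,
            on_linux,
            decide_nat(section_auto, on_linux)
        );
    }
}
```

Only the third case is new in v3.6. The sketch leaves out the failure handling, where a failed implicit attempt is logged and the server keeps running.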
@@ -264,24 +264,42 @@ impl ServerOuterCertSection {
 }
 
 /// `[server.nat]` section: v2 auto-NAT configuration. See [`crate::nat`] for the apply / rollback
-/// semantics. Optional — when the section is omitted the server makes no changes to the host's
-/// IP forwarding state, matching v1 behaviour.
-#[derive(Debug, Clone, Default, Deserialize)]
+/// semantics. Optional — when the section is omitted the server falls back to the v3.6
+/// **implicit auto-NAT** path on Linux (see [`crate::server`]): it tries `auto = true` with an
+/// auto-detected `egress_iface`, logging a clear notice. To opt out explicitly write
+/// `[server.nat]\nauto = false` (or upgrade to a config with `[server.nat] auto = true`
+/// and an explicit `egress_iface`).
+#[derive(Debug, Clone, Deserialize)]
+#[serde(default)]
 pub struct ServerNatSection {
-    /// Master switch. When `false` (or the section is omitted) the server does NOT touch the
-    /// host network — the operator is expected to have configured forwarding by hand. When
-    /// `true` the server applies the platform-appropriate set of commands at startup and
-    /// rolls them back on shutdown.
+    /// Master switch. **Defaults to `true`** so that an operator who writes `[server.nat]` at all
+    /// gets working NAT without having to also remember `auto = true`. Set it to `false`
+    /// explicitly to disable auto-NAT while still keeping the section (e.g. only to pin
+    /// `egress_iface` for documentation purposes).
+    #[serde(default = "default_true")]
     pub auto: bool,
     /// Name of the host interface traffic egresses through (e.g. `"eth0"` on Linux, `"en0"` on
-    /// macOS). REQUIRED when `auto = true` — there is no auto-detection in v1 (that is v3).
+    /// macOS). Optional since v3 — when empty the server auto-detects from the host's default
+    /// route via [`crate::os_routes::detect_default_egress_iface`]; only set this if the host
+    /// has multiple egresses or auto-detection fails.
     #[serde(default)]
     pub egress_iface: String,
     /// When `true`, every command is only logged (`would run: ...`) and not executed. Useful
     /// for verifying the plan without root privileges and for the unit tests.
     #[serde(default)]
     pub dry_run: bool,
 }
 
+impl Default for ServerNatSection {
+    fn default() -> Self {
+        Self {
+            auto: true,
+            egress_iface: String::new(),
+            dry_run: false,
+        }
+    }
+}
+
 /// `[tunnel]` section of `server.toml`.
 #[derive(Debug, Clone, Deserialize)]
 pub struct ServerTunnelSection {
@@ -1952,7 +1970,8 @@ pool_cidr = "10.7.0.0/24"
     }
 
     /// Backwards compat: an old server.toml without `[server.nat]` parses fine and exposes
-    /// `nat = None`. This preserves the v1 "operator configures NAT by hand" behaviour.
+    /// `nat = None`. v3.6 keeps the *type* the same (`Option<ServerNatSection>`) — the new
+    /// implicit-auto-NAT behaviour lives in [`crate::server::run`], not in the parser.
     #[test]
     fn server_nat_section_optional() {
         let s = r#"
@@ -1966,7 +1985,65 @@ key = "c"
 pool_cidr = "10.7.0.0/24"
 "#;
         let cfg = ServerConfigFile::parse(s).expect("parse minimal v1 server.toml");
-        assert!(cfg.server.nat.is_none(), "nat section absent by default");
+        assert!(cfg.server.nat.is_none(), "nat section absent in toml");
     }
 
+    /// v3.6: `ServerNatSection::default()` is now `auto = true` (was `false` in v1/v2). This
+    /// makes a bare `[server.nat]` section (no `auto =` field) work out of the box — the
+    /// operator who wrote the section evidently wants it enabled.
+    #[test]
+    fn server_nat_section_default_is_auto_true() {
+        let d = ServerNatSection::default();
+        assert!(d.auto, "v3.6 default: auto = true");
+        assert!(
+            d.egress_iface.is_empty(),
+            "v3.6 default: egress_iface empty (server.rs auto-detects)"
+        );
+        assert!(!d.dry_run, "v3.6 default: dry_run = false");
+    }
+
+    /// v3.6: an operator who writes a bare `[server.nat]` section without specifying `auto =`
+    /// gets `auto = true` (the new default). Egress is left empty so the runtime auto-detects.
+    #[test]
+    fn server_nat_section_bare_header_enables_auto() {
+        let s = r#"
+[server]
+name = "edge"
+[server.nat]
+[pki]
+ca_cert = "a"
+cert = "b"
+key = "c"
+[tunnel]
+pool_cidr = "10.7.0.0/24"
+"#;
+        let cfg = ServerConfigFile::parse(s).expect("parse server.toml with bare [server.nat]");
+        let nat = cfg.server.nat.as_ref().expect("section present");
+        assert!(nat.auto, "v3.6: bare [server.nat] defaults to auto = true");
+        assert!(nat.egress_iface.is_empty(), "egress empty -> runtime auto-detect");
+        assert!(!nat.dry_run);
+    }
+
+    /// v3.6 opt-out: writing `auto = false` explicitly keeps the historical v1/v2 behaviour
+    /// (server does not touch the host NAT). This is the explicit escape hatch for operators
+    /// who have already configured iptables / nftables by hand.
+    #[test]
+    fn server_nat_section_explicit_opt_out() {
+        let s = r#"
+[server]
+name = "edge"
+[server.nat]
+auto = false
+[pki]
+ca_cert = "a"
+cert = "b"
+key = "c"
+[tunnel]
+pool_cidr = "10.7.0.0/24"
+"#;
+        let cfg = ServerConfigFile::parse(s).expect("parse server.toml with auto = false");
+        let nat = cfg.server.nat.as_ref().expect("section present");
+        assert!(!nat.auto, "explicit auto = false is honoured");
+    }
+
     /// v3.2: `[transport.masks] palette = "russian"` parses into [`MaskPalette::Russian`] and
 
@@ -124,17 +124,28 @@ pub async fn run(config_path: &Path, admin_socket: &str) -> anyhow::Result<()> {
         "starting Aura server"
     );
 
-    // Auto-NAT: when [server.nat] auto = true, enable IP forwarding and add a MASQUERADE rule
-    // for the pool's CIDR through the configured egress interface. The returned guard is bound
-    // to the lifetime of `run()` so its Drop reverts the changes on shutdown / panic. When
-    // [server.nat] is omitted (the v1-compatible path) the operator is expected to have
-    // configured forwarding by hand and no guard is created.
+    // Auto-NAT: enable IP forwarding and a MASQUERADE rule for the pool's CIDR through the
+    // configured (or auto-detected) egress interface. The returned guard is bound to the lifetime
+    // of `run()` so its Drop reverts the changes on shutdown / panic.
+    //
+    // v3.6 changes the historical semantics: the section is now effectively *opt-out* rather than
+    // opt-in. The old "no [server.nat] section means do nothing" path turned out to be the root
+    // cause of "full-VPN mode ping works but external internet is dead" on existing servers —
+    // packets with src = pool IP went out unmasqueraded and the upstream router dropped them on
+    // return (private-address rev path filtering). The new behaviour:
+    //
+    //   * [server.nat] explicitly present + auto = true  -> apply NAT (with explicit or
+    //     auto-detected egress_iface). Same as v2.
+    //   * [server.nat] explicitly present + auto = false -> DO NOTHING. The operator opted out.
+    //   * [server.nat] omitted entirely on Linux -> implicit auto-NAT: try to apply with
+    //     auto-detected egress_iface. If detection fails we DO NOT bail — we log a loud warning
+    //     and continue (so safe-mode style clients still get tunnel-internal connectivity), but
+    //     full-VPN forward traffic will not work until the operator fixes the host.
+    //   * [server.nat] omitted on non-Linux -> v2 behaviour: do nothing.
     let _nat_guard: Option<NatGuard> = if let Some(nat) = cfg.server.nat.as_ref() {
         if nat.auto {
-            // v2: if `egress_iface` is not set in the config, fall back to auto-detection of the
-            // host's default-route interface. This makes `[server.nat] auto = true` work on
-            // typical single-NIC hosts without manual configuration. If detection also fails we
-            // fall back to the original hard error so the operator gets a clear message.
+            // Explicit auto-NAT path. If `egress_iface` is empty we still try auto-detection,
+            // matching v3 behaviour.
             let iface = if nat.egress_iface.trim().is_empty() {
                 match crate::os_routes::detect_default_egress_iface() {
                     Some(iface) => {
@@ -155,9 +166,50 @@ pub async fn run(config_path: &Path, admin_socket: &str) -> anyhow::Result<()> {
                     .context("enabling auto-NAT (see [server.nat] in server.toml)")?,
             )
         } else {
+            tracing::info!(target: "aura::nat",
+                "[server.nat] auto = false in server.toml; not touching host NAT");
             None
         }
+    } else if cfg!(target_os = "linux") {
+        // v3.6 implicit auto-NAT path. Anchored to Linux because the iptables/sysctl plan is
+        // Linux-specific (macOS would need pfctl; we don't ship macOS server in production).
+        match crate::os_routes::detect_default_egress_iface() {
+            Some(iface) => {
+                tracing::info!(target: "aura::nat",
+                    iface = %iface,
+                    pool = %resolved_pool.cidr,
+                    "v3.6 implicit auto-NAT: no [server.nat] section in server.toml — enabling \
+                     IPv4 forwarding + MASQUERADE on the host's default egress. Add \
+                     `[server.nat]\\nauto = false` to opt out."
+                );
+                match NatGuard::enable(&resolved_pool.cidr.to_string(), &iface, false) {
+                    Ok(g) => Some(g),
+                    Err(e) => {
+                        // Don't bail: the operator might be running as a non-root user that
+                        // cannot iptables, or in a container without NET_ADMIN. Tunnel-internal
+                        // traffic (pool <-> pool, used by safe-mode clients) still works without
+                        // NAT, so we keep the server up and just warn loudly.
+                        tracing::error!(target: "aura::nat", error = %e,
+                            "v3.6 implicit auto-NAT failed; full-VPN clients will see broken \
+                             external internet. Configure forwarding by hand (sysctl + iptables \
+                             MASQUERADE) or add [server.nat] auto = true with `egress_iface` set, \
+                             then restart the server. See docs/server_nat_fix.md.");
+                        None
+                    }
+                }
+            }
+            None => {
+                tracing::error!(target: "aura::nat",
+                    "v3.6 implicit auto-NAT: could not auto-detect the host's default-route \
+                     egress interface; full-VPN clients will NOT get external internet. Add \
+                     [server.nat] auto = true with an explicit egress_iface to server.toml, or \
+                     configure forwarding by hand. See docs/server_nat_fix.md.");
+                None
+            }
+        }
     } else {
+        tracing::info!(target: "aura::nat",
+            "[server.nat] absent and not running on Linux; leaving host NAT untouched");
         None
     };
 
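
The comment above says the guard's `Drop` reverts the changes on shutdown or panic. A minimal sketch of that RAII pattern follows, assuming a hypothetical `RollbackGuard` that only records commands in memory. The real `NatGuard` is not shown in this diff, so its internals, and in particular how it restores `ip_forward`, are assumptions here.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared in-memory command log, so this example never touches the host.
type CommandLog = Rc<RefCell<Vec<String>>>;

/// Hypothetical stand-in for a NAT guard (not the real `NatGuard`).
struct RollbackGuard {
    undo_steps: Vec<String>,
    log: CommandLog,
}

impl RollbackGuard {
    /// Records the apply commands and remembers how to undo them.
    fn enable(pool_cidr: &str, iface: &str, log: CommandLog) -> Self {
        let rule = format!("POSTROUTING -s {pool_cidr} -o {iface} -j MASQUERADE");
        let apply = [
            "sysctl -w net.ipv4.ip_forward=1".to_string(),
            format!("iptables -t nat -A {rule}"),
        ];
        for cmd in &apply {
            log.borrow_mut().push(format!("would run: {cmd}"));
        }
        // Undo steps are stored in apply order and replayed in reverse on drop.
        let undo_steps = vec![
            // Assumption: the previous sysctl value is restored, not forced to 0.
            "restore previous net.ipv4.ip_forward value".to_string(),
            format!("iptables -t nat -D {rule}"),
        ];
        RollbackGuard { undo_steps, log }
    }
}

impl Drop for RollbackGuard {
    fn drop(&mut self) {
        for step in self.undo_steps.iter().rev() {
            self.log.borrow_mut().push(format!("would undo: {step}"));
        }
    }
}

fn main() {
    let log: CommandLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _guard = RollbackGuard::enable("10.7.0.0/24", "eth0", Rc::clone(&log));
        // The server would run here; leaving the scope triggers the rollback.
    }
    for line in log.borrow().iter() {
        println!("{line}");
    }
}
```

Binding the guard to a named variable such as `_nat_guard` matters: a bare `_` pattern would drop it immediately and undo the NAT setup right after applying it.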
+19
-1
@@ -151,7 +151,22 @@ masquerade = true
 
 #### IP forwarding and NAT (so clients can reach the internet)
 
-In v1, configuring egress on the server side is a **mandatory manual step**. On Linux:
+**v3.6 and later:** this is configured **automatically** when `aura server` starts.
+If `server.toml` has a `[server.nat]` section with `auto = true` (which is what
+`aura server-init` writes), the server runs `sysctl net.ipv4.ip_forward=1` itself and
+installs a MASQUERADE rule on the right interface, then rolls both changes back
+on shutdown. If the section is missing entirely (a legacy pre-v2 config), the server
+still tries to enable NAT with an auto-detected egress interface (**implicit auto-NAT**)
+and says so loudly in the log.
+
+Opt-out, for operators who already manage the firewall themselves:
+
+```toml
+[server.nat]
+auto = false
+```
+
+**Legacy / manual path** (v1, or a setup with auto-NAT disabled):
 
 ```bash
 # 1) Enable IP forwarding.
@@ -167,6 +182,9 @@ sudo iptables -t nat -A POSTROUTING \
 
 Substitute your own `pool_cidr` and internet-facing interface name.
 
+The "existing pre-v3.6 server, full-VPN does not work" scenario is covered in detail
+in [`docs/server_nat_fix.md`](server_nat_fix.md).
+
 ### 2.5. Starting the server
 
 ```bash
 
@@ -0,0 +1,129 @@
# Fixing full-VPN on an existing server (v3.6)

If the `aura server` host was deployed before v3.6, clients in `default = "DIRECT"`
work (ping to `10.7.0.1` succeeds), but in `default = "VPN"` all external internet
goes dark. Root cause: the server has no SNAT/MASQUERADE configured for the pool
`10.7.0.0/24`. Packets leave for the internet with the private `src=10.7.0.10`, and the
replies are dropped by the provider (RFC1918 reverse-path filtering).
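
To make the root cause concrete, the sketch below checks whether a source address is a private pool address that would need masquerading before it leaves the host. It is an illustration only: `in_cidr` and `needs_masquerade` are hypothetical helpers, not functions from the Aura codebase.

```rust
use std::net::Ipv4Addr;

/// Returns true when `addr` lies inside `network/prefix_len` (prefix_len must be 0..=32).
fn in_cidr(addr: Ipv4Addr, network: Ipv4Addr, prefix_len: u32) -> bool {
    assert!(prefix_len <= 32, "IPv4 prefix length must be at most 32");
    // A /0 prefix shifts by 32 bits, which `checked_shl` reports as `None`; treat it as mask 0.
    let mask = u32::MAX.checked_shl(32 - prefix_len).unwrap_or(0);
    (u32::from(addr) & mask) == (u32::from(network) & mask)
}

/// A packet needs source NAT when its source is a private address from the tunnel pool.
fn needs_masquerade(src: Ipv4Addr, pool: Ipv4Addr, prefix_len: u32) -> bool {
    src.is_private() && in_cidr(src, pool, prefix_len)
}

fn main() {
    let pool = Ipv4Addr::new(10, 7, 0, 0);
    let client = Ipv4Addr::new(10, 7, 0, 10);
    let server_public = Ipv4Addr::new(187, 77, 67, 17);

    println!("{client} needs masquerade: {}", needs_masquerade(client, pool, 24));
    println!("{server_public} needs masquerade: {}", needs_masquerade(server_public, pool, 24));
}
```

Without a MASQUERADE rule, the client address in this example reaches the provider unchanged, which is the situation the paragraph above describes.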

In v3.6 `aura server` gained **implicit auto-NAT**: if `server.toml` has no
`[server.nat]` section, the server itself tries to set `ip_forward = 1` and install
a MASQUERADE rule on the auto-detected default interface. So **the simplest fix**
is to upgrade the binary on the server and restart it.

If that is not possible for some reason (no root access at upgrade time, a
non-standard network, a container without `NET_ADMIN`, and so on), there are two
alternatives.

---

## Option A. Upgrade the binary (recommended)

From a local machine (one with `ssh root@187.77.67.17` access):

```bash
# Build the release binary for the server's target architecture.
cargo build --release -p aura-cli --target x86_64-unknown-linux-gnu

# Upload it and swap it in.
scp target/x86_64-unknown-linux-gnu/release/aura root@187.77.67.17:/usr/local/bin/aura.new
ssh root@187.77.67.17 'systemctl stop aura.service \
  && mv /usr/local/bin/aura.new /usr/local/bin/aura \
  && systemctl start aura.service \
  && systemctl status aura.service --no-pager -n 30'
```

The output of `journalctl -u aura.service -n 30` should then contain lines like:

```
INFO aura::nat: v3.6 implicit auto-NAT: no [server.nat] section in server.toml —
                enabling IPv4 forwarding + MASQUERADE on the host's default egress.
                iface=eth0 pool=10.7.0.0/24
INFO aura::nat: running: sysctl -w net.ipv4.ip_forward=1
INFO aura::nat: running: iptables -t nat -A POSTROUTING -s 10.7.0.0/24 -o eth0 -j MASQUERADE
INFO aura::nat: auto-NAT applied (linux)
```

If those lines are present, full-VPN on the client should start working immediately,
with no changes to `client.toml` or `server.toml`.

---

## Option B. Add `[server.nat]` to `server.toml` by hand

If the binary upgrade has to wait, the minimal config patch is:

```toml
# /etc/aura/server.toml: append this block to the end of the file
[server.nat]
auto = true
egress_iface = "eth0"   # your internet-facing interface; usually eth0/ens3/enp1s0
dry_run = false
```

Then run `systemctl restart aura.service`. This works the same way on v2+ and on v3.6.

To find the interface name:

```bash
ip route show default | awk '{print $5; exit}'
```
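
The `awk '{print $5}'` one-liner assumes the usual `default via <gateway> dev <iface> ...` layout. For a route without a gateway, such as `default dev wg0 scope link`, field 5 is `link`, not the interface. A sketch of a parser keyed on the `dev` keyword is shown below. `parse_default_iface` is a hypothetical helper written for this illustration and is not the implementation of `crate::os_routes::detect_default_egress_iface`, which this document does not show.

```rust
/// Hypothetical helper: returns the interface named after `dev` on the first
/// `default` route line of `ip route show default` output.
fn parse_default_iface(ip_route_output: &str) -> Option<String> {
    ip_route_output
        .lines()
        // Only consider lines whose first token is `default`.
        .filter(|line| line.split_whitespace().next() == Some("default"))
        .find_map(|line| {
            let mut tokens = line.split_whitespace();
            // Advance to the `dev` keyword, then take the token right after it.
            tokens.find(|tok| *tok == "dev")?;
            tokens.next().map(str::to_owned)
        })
}

fn main() {
    // Sample outputs; the addresses are documentation-range placeholders.
    let with_gateway = "default via 192.0.2.1 dev eth0 proto dhcp metric 100\n";
    let without_gateway = "default dev wg0 scope link\n";

    println!("{:?}", parse_default_iface(with_gateway));
    println!("{:?}", parse_default_iface(without_gateway));
    println!("{:?}", parse_default_iface(""));
}
```

On a host with several default routes this sketch simply takes the first one, which is the case where setting `egress_iface` explicitly is the safer choice.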

---

## Option C. Configure NAT by hand, without Aura

If security policy says `aura server` must not touch nftables/iptables
(for example, the operator manages the firewall themselves), do everything by hand **and
explicitly disable implicit auto-NAT** with `[server.nat] auto = false`:

```bash
# 1. IP forwarding, persistent across reboots.
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-aura.conf
sudo sysctl --system

# 2. MASQUERADE: the operator picks the firewall framework (iptables/nftables/etc).
sudo iptables -t nat -A POSTROUTING -s 10.7.0.0/24 -o eth0 -j MASQUERADE
sudo apt-get install -y iptables-persistent && sudo netfilter-persistent save

# 3. Tell aura to stay out of it. `sudo tee -a` is used because a plain `>>`
#    redirect would run without root privileges.
sudo tee -a /etc/aura/server.toml >/dev/null <<'EOF'

[server.nat]
auto = false
EOF
sudo systemctl restart aura.service
```

---

## Verifying the fix

On the client (Mac):

```bash
# 1) Is the tunnel up? Expect 5/5 replies and an RTT of about 70 ms.
ping -c 5 10.7.0.1

# 2) Does external traffic really go through the VPN? The IP should be the server's (not the Mac's).
curl -sS https://ifconfig.co
curl -sS https://ifconfig.co/json | jq .ip,.country

# 3) Does DNS answer?
dig +short cloudflare.com
```

If `ifconfig.co` returns the server's IP (`187.77.67.17` in our case), full-VPN
really works. If it still returns the mobile carrier's previous IP, something else
is wrong and it is worth watching `journalctl -u aura.service -f` on the server.

## Where the problem came from

See `crates/aura-cli/src/server.rs` (the "Auto-NAT" comment around the
`cfg.server.nat` check) and `crates/aura-cli/src/nat.rs` (`linux_apply_plan`):
before v3.6 the `[server.nat]` section was opt-in. Without it the server did not
touch host networking at all, and the operator had to remember the manual `sysctl` + `iptables`
steps from `docs/deployment.md §2.4`. If the operator skipped them, the single-IP tunnel
worked (ping to the internal `10.7.0.1` needs no NAT), but full-VPN did not.
v3.6 flips the behaviour: NAT is now opt-out, which removes the main
cause of "VPN doesn't work" out of the box.