nero2

2025-12
  • network
  • Go

Overview#

nero2 is a high-performance network tunnel designed for all-port NAT forwarding, adding minimal CPU overhead and approximately 20 MB of additional memory.

Features#

nero2 is designed for high performance with low server resource consumption. Extensive optimizations were made throughout the design, which are introduced in later sections.

As the name implies, this is the second generation. The first generation was built in 2020, featuring TCP Fast Open (TFO) and a basic UDP NAT table. nero2 extends that foundation with all-port NAT forwarding and more advanced algorithms.

Modules

Users have full freedom of configuration. nero2 is built around composable modules — each component can be individually selected and tuned. For those who prefer a ready-to-run setup, a pre-tuned default configuration is provided that delivers solid performance out of the box.

Safety

To protect data in transit, we integrated Shadowsocks, which provides solid encryption with minimal overhead.

Although Shadowsocks has been criticized for weak obfuscation and identifiable traffic characteristics, maximum cryptographic strength is not our primary concern. Our focus is high-throughput transmission with optional encryption support. Strong encryption carries a non-trivial resource cost; clients with stricter confidentiality requirements can handle that at the packet level.

Functionality

nero2 supports map projection, which mirrors the IP a client connects to at the entry point onto the real destination — effectively acting as a reverse proxy.

                TProxy                                       map
                  ↓                 tcp | udp | ws            ↓
public --> IP 1 -----¦          (optional shadowsocks)      ¦---> Internal server 1
                     ¦                   ↓                  ¦
public --> IP 2 -----¦---> nero2 client --> nero2 server ---¦---> Internal server 2
                     ¦                                      ¦
public --> IP 3 -----¦                                      ¦---> Internal server 3

It can also operate as a point-to-point network tunnel:

 public ---> nero2 client --> nero2 server ---> destination

Protocol Design#

Transparent Proxy#

On the client side, TProxy rules implemented with nftables and TC/eBPF mark and intercept designated traffic without disrupting other applications or the routing table. The original source and destination IPs are forwarded to the server in prepended headers.

On the server side, a NAT table records the mapping between client-side source IPs and their allocated backend sessions, allowing each flow to be correctly identified and routed.

Private Protocol#

A custom protocol header is prepended to each frame to identify the traffic type and the transport mode in use.

UDP Multiplexing#

nero2 multiplexes UDP over shared stream channels. This significantly reduces the total connection count, lowers server resource usage, and prevents the kernel from being overwhelmed by a large number of simultaneous connections.

ACP (Adaptive Connection Pool)#

For stream transports, we maintain a pool of pre-dialed, idle TCP channels. When a new flow arrives, it immediately picks up a warm connection rather than going through a full SYN handshake.

without ACP:   flow arrives → TCP SYN → SYN-ACK → handshake → payload
with ACP:      flow arrives → handshake → payload
                (connection was already warm in the pool)

Pool sizing is managed by an EWMA-based algorithm that dynamically adjusts capacity — preventing over-allocation during idle periods while ensuring sufficient connections are available under load.

TCP Framed#

Similar to UDP multiplexing, multiple logical TCP sessions are framed and multiplexed over the same tunnel connection rather than opening one tunnel per request.

                ┌──                   ──┐
                │  [session ID] flow A  │
tunnel conn 1 ──┤  [session ID] flow B  ├───┐
                │  [session ID] flow C  │   │
                └──                   ──┘   │       ┌── flow A
                                            ├───────┼── flow B
                ┌──                   ──┐   │       └── flow C
                │  [session ID] flow A  │   │
tunnel conn 2 ──┤  [session ID] flow B  ├───┘
                │  [session ID] flow C  │
                └──                   ──┘

Each frame carries a session identifier, allowing the server to demultiplex back to the correct backend connection.

Transport Layer#

nero2 supports four transport modes. In addition to TCP and UDP, WS and WSS are available. WS is useful for disguising the tunnel channel as ordinary web traffic between client and server.

Performance#

nero2 delivers near-direct throughput (about a 1.5% penalty), adding under 2.5% CPU overhead and approximately 20 MB of memory, a fraction of WireGuard's 29% CPU penalty.

Optional Shadowsocks encryption is included at no meaningful performance cost.

TCP Mode#

Benchmarks were conducted in a real environment using iperf3. For comparison, iperf3 was run both directly and through WireGuard as a tunnel baseline.

Throughput                        Result       Penalty (%)
Direct                            7.417 Gbps   N/A
WireGuard                         0.537 Gbps   −92.8%
nero2 TCP plain                   7.066 Gbps   −4.7%
nero2 TCP plain, with ACP         7.308 Gbps   −1.5%
nero2 TCP Shadowsocks, with ACP   7.302 Gbps   −1.6%

Server Resources                  avg. CPU   CPU delta   avg. memory   memory delta
Direct                            44.63%     N/A         326.7 MB      N/A
WireGuard                         73.78%     +29.15%     319.8 MB      −6.9 MB
nero2 WS plain                    35.65%     −8.98%      352.3 MB      +25.6 MB
nero2 TCP plain                   43.46%     −1.17%      351.5 MB      +24.8 MB
nero2 TCP plain, with ACP         43.90%     −0.73%      343.3 MB      +16.6 MB
nero2 TCP Shadowsocks, with ACP   46.98%     +2.35%      343.8 MB      +17.1 MB

UDP Mode#

Throughput        Result       Penalty (%)
Direct            0.734 Gbps   N/A
WireGuard         0.567 Gbps   −22.7%
nero2 UDP plain   0.712 Gbps   −3.0%

Server Resources   avg. CPU   CPU delta   avg. memory   memory delta
Direct             47.77%     N/A         315.1 MB      N/A
WireGuard          61.10%     +13.33%     314.9 MB      N/A
nero2 UDP plain    45.79%     −1.98%      321.7 MB      +6.6 MB