network on Monsoon's Blog

Using GPU accessible VS Code Server on UIUC Delta

Sun, 22 Dec 2024 00:00:00 +0000

Why writing this blog post

Many UIUC students rely on Delta for the GPU resources their research needs. Delta provides 4 ssh-enabled login nodes and many computing nodes with GPUs. Usually, we must first ssh to a login node (with a password and a DUO 2FA OTP) and then use srun to request GPU resources to run our code. However, in my experience, this workflow comes with several problems:

Unstable network connection: The connection drops frequently when the network is poor. Each time VS Code Remote loses its connection, you must re-enter the password and the DUO 2FA OTP (which means unlocking your phone) to reconnect, which is annoying, time-consuming, and distracting.
Broken OnDemand Code Server: Although you can run VS Code Remote on the login nodes over ssh, they have no GPU for debugging, and the computing nodes are not accessible by ssh. The alternatives are OnDemand Jupyter Lab and Code Server. But Jupyter Lab’s functionality is limited, and the Code Server is broken: when I request a Code Server on the computing nodes, the system queues the request and then shows it as completed, without it ever entering a running state.

Because of these problems, debugging GPU programs on Delta is a struggle. That’s why I wrote this blog post: by running a private Code Server on a computing node and deploying a Cloudflare Tunnel reverse proxy, you can say goodbye to these annoying problems.

How to

My solution is based on an observation about Delta: all login nodes and computing nodes are in a trusted network. There are no firewalls between them, which means you can access any port on the computing nodes from the login nodes.

The main steps of my solution are simple:

Use srun to get a tty on the computing node (e.g., on gpua042 node).
Run a Code Server on the computing node. It will listen on 0.0.0.0:8080.
Reverse proxy gpua042:8080 to any port you have access to. There are two approaches:
- Use ssh -L to forward the port to your local machine.
- Use Cloudflare Tunnel to reverse proxy the port to a public domain. This approach is more stable in poor network conditions.

Run Code Server

Download the Code Server binary from the GitHub repository (e.g., code-server-4.96.2-linux-amd64.tar.gz) and extract it. On the computing node, run:


cd code-server-4.96.2-linux-amd64/bin

## no auth
./code-server --bind-addr 0.0.0.0:8080 --auth none

## if port is exposed to untrusted network, use password auth
## password can be modified in ~/.config/code-server/config.yaml
./code-server --bind-addr 0.0.0.0:8080

Access Code Server

SSH Port Forwarding

ssh -L can forward a local port to a remote port. Run:

ssh -L 127.0.0.1:8080:gpua042:8080 username@login.delta.ncsa.illinois.edu

Then open http://127.0.0.1:8080 in your browser, and enjoy the Code Server!
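
Since the node name changes with every srun allocation, it can help to generate the forwarding command instead of retyping it. The following is a minimal sketch, not part of the original setup: the helper name `build_forward_command` and its parameters are hypothetical, and it only assembles the same `ssh -L` invocation shown above.

```python
import shlex


def build_forward_command(node, port, user, login_host, local_port=None):
    """Build the `ssh -L` command that tunnels a local port to node:port.

    All names here are illustrative; only the `ssh -L` syntax itself
    comes from the post.
    """
    local_port = port if local_port is None else local_port
    # -L takes bind_address:local_port:target_host:target_port. The target
    # host is resolved on the login node, so the compute node name works.
    forward = f"127.0.0.1:{local_port}:{node}:{port}"
    return shlex.join(["ssh", "-L", forward, f"{user}@{login_host}"])


print(build_forward_command(
    "gpua042", 8080, "username", "login.delta.ncsa.illinois.edu"))
```

Passing a different `local_port` is useful when 8080 is already taken on your own machine; the browser URL then uses that local port instead.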

Cloudflare Tunnel

Cloudflare Tunnel is more stable when your computer suffers from a poor network connection, but it requires a domain name.

TODO

All About IPv6 Address Allocation

Sat, 12 Oct 2024 00:00:00 +0000

Preface

IPv4 has only one method of dynamic address allocation, namely DHCP, but IPv6 has two allocation methods, SLAAC and DHCPv6, and DHCPv6 additionally has the PD (Prefix Delegation) extension. These three allocation methods also interact with each other, which makes problems arising during IPv6 allocation far more common than with IPv4. Most tutorials you can find only solve problems superficially, are ambiguous about the underlying technical details, and do not fundamentally clarify the differences between IPv6 and IPv4.

This article aims to start from the relevant fundamental concepts and, in a “teach a man to fish” manner, explain how the three IPv6 address allocation methods work, helping to thoroughly resolve the tricky problems in IPv6 allocation.

IPv6 Fundamental Concepts

LLA (Link-Local Address) and EUI-64

LLA actually already existed in IPv4: when DHCP is not working properly, some operating systems assign a 169.254.0.0/16 address to the network interface for temporary point-to-point communication. But LLA is not important in IPv4, playing only an optional fallback role that appears only when DHCP fails. As a result, the vast majority of people (including the author) did not learn about the existence of LLA until IPv6 became widespread.

IPv6 LLA (fe80::/10) inherits the basic point-to-point communication function of IPv4 LLA, but goes further and underpins NDP (Neighbor Discovery Protocol) and SLAAC (Stateless Address Autoconfiguration). Understanding it is necessary to understand how SLAAC works.

For example, when two network ports are directly connected with a cable, they each automatically generate an IPv6 LLA, such as fe80::dfc2:d2aa:c86f:171e/64 and fe80::da8f:9d5b:57e3:c6a6/64, and each can ping the other’s LLA. On Linux, the ip -6 route command shows the automatically configured LLA route entry:

fe80::/64 dev eth0 proto kernel metric 1024 pref medium

IPv6 LLA is generated from the MAC address using a specific algorithm, namely EUI-64: the MAC address is split into two halves, ff:fe is inserted between them, and the universal/local bit (the seventh bit of the first byte) is flipped. For example, when the network port’s MAC address is 70:07:12:34:56:78, the generated EUI-64 is 7207:12ff:fe34:5678, and the LLA is fe80::7207:12ff:fe34:5678/64 (the fe80::/64 prefix followed by the EUI-64). The specific generation process is shown in the figure below:

Generally, routers do not forward traffic for LLA addresses; it is only used for point-to-point communication on the link.
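
The EUI-64 derivation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than anything a real network stack exposes: the function names `mac_to_eui64`, `format_eui64`, and `link_local_from_mac` are made up for this example, and only the standard `ipaddress` module is used.

```python
import ipaddress


def mac_to_eui64(mac: str) -> bytes:
    """Turn a 48-bit MAC address into a 64-bit (modified) EUI-64 identifier."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Flip the universal/local bit (0x02) of the first byte and insert
    # ff:fe between the two halves of the MAC address.
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]


def format_eui64(eui64: bytes) -> str:
    """Format the 8 bytes as four colon-separated 16-bit hex groups."""
    groups = (int.from_bytes(eui64[i:i + 2], "big") for i in range(0, 8, 2))
    return ":".join(f"{group:x}" for group in groups)


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Combine the fe80::/64 prefix with the EUI-64 interface identifier."""
    prefix = int(ipaddress.IPv6Address("fe80::"))
    return ipaddress.IPv6Address(prefix | int.from_bytes(mac_to_eui64(mac), "big"))


print(format_eui64(mac_to_eui64("70:07:12:34:56:78")))  # 7207:12ff:fe34:5678
print(link_local_from_mac("70:07:12:34:56:78"))         # fe80::7207:12ff:fe34:5678
```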

GUA (Global Unicast Address)

IPv6 GUA (2000::/3) can be mapped to the IPv4 concept of a “public IP”. In theory it is globally unique and can be used for communication over the public network. A well-designed network architecture should allow every device to obtain an IPv6 GUA, so as to maximize IPv6’s P2P communication advantage.

Private Addresses

fc00::/7 is defined as the IPv6 private address range, analogous to 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 in IPv4, used for LAN communication. Unlike LLA, it can be forwarded by routers.

Because IPv6 is designed so that every device worldwide can be assigned a GUA, the role of private addresses in IPv6 is greatly diminished. When it is not possible to assign a GUA to every device (as in some campus network environments), assigning IPv6 private addresses on the internal network can serve as an alternative, allowing internal devices to access IPv6.

Multicast

IPv6 multicast addresses (ff00::/8) are similar to IPv4 multicast addresses (224.0.0.0/4), used for one-to-many communication within a network segment. Both SLAAC and DHCPv6 rely on multicast to work. Commonly used multicast addresses include:

ff02::1: all nodes on the local link;
ff02::2: all routers on the local link.
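
To make the address ranges introduced so far concrete, here is a small sketch that classifies an IPv6 address by prefix using Python's standard `ipaddress` module. The `classify` helper and the category labels are illustrative; the prefixes are the standard ones (fe80::/10, fc00::/7, ff00::/8, 2000::/3).

```python
import ipaddress

# Checked in order; the ranges do not overlap, so order does not matter here.
RANGES = [
    ("link-local", ipaddress.IPv6Network("fe80::/10")),
    ("private", ipaddress.IPv6Network("fc00::/7")),
    ("multicast", ipaddress.IPv6Network("ff00::/8")),
    ("global unicast", ipaddress.IPv6Network("2000::/3")),
]


def classify(address: str) -> str:
    """Return the name of the range an IPv6 address falls into."""
    ip = ipaddress.IPv6Address(address)
    for name, network in RANGES:
        if ip in network:
            return name
    return "other"


for addr in ["fe80::1", "fd01::1:2", "ff02::2", "2001:db8::1", "::1"]:
    print(f"{addr:15} {classify(addr)}")
```

Note that 2001:db8::/32 is reserved for documentation, but it lies inside 2000::/3, so a purely prefix-based check like this one reports it as global unicast.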

NDP (Neighbor Discovery Protocol)

NDP works on top of ICMPv6 and is similar to IPv4 ARP. It is used to discover other nodes on the data link layer and their corresponding IPv6 addresses, to determine available routes, and to maintain reachability information about available paths and other active nodes. SLAAC works based on NDP. The message types involved are:

RS (Router Solicitation) and RA (Router Advertisement): used to configure IPv6 addresses and routes;
NS (Neighbor Solicitation) and NA (Neighbor Advertisement): used to find the MAC addresses of other devices on the link.

SLAAC (Stateless Address Autoconfiguration)

SLAAC is the IPv6 address allocation method defined in RFC 4862, and is also the recommended allocation method. In fact, Android only supports SLAAC for IPv6 allocation.

The most notable feature of SLAAC is that it is stateless, i.e. it does not require a centralized server responsible for allocation. Below, the author uses an example to illustrate the SLAAC process.

Suppose the lan0 port on the router is connected to the eth0 port on the host. The LLA of lan0 is fe80::1/64, and the MAC address of eth0 is 70:07:12:34:56:78. At the same time, the router holds the GUA prefix 2001:db8::/64, i.e. all GUAs under this subnet will be routed by the upstream router to this router’s wan port. The SLAAC process is as follows:

eth0 generates the EUI-64 7207:12ff:fe34:5678 and the LLA fe80::7207:12ff:fe34:5678/64 based on its MAC address;
The host performs DAD (Duplicate Address Detection) to ensure the LLA is unique on the local link. This is unrelated to address allocation, so it is omitted here; interested readers can look up the relevant material themselves;
The host sends an RS message via the eth0 LLA. The RS is sent to all routers on the local link using the multicast address ff02::2.
The router replies with an RA message to the eth0 LLA. The RA contains the prefix 2001:db8::/64, the validity period, the MTU, and other information.

The host receives the RA, combines the prefix and the EUI-64 into 2001:db8::7207:12ff:fe34:5678/64, assigns it to eth0, and adds the routing table entries:


2001:db8::/64 dev eth0 proto ra metric 1024 expires 2591993sec pref medium
default via fe80::1 dev eth0 proto static metric 1024 onlink pref medium

The host performs DAD on the new address and uses an NA message to announce its use of the new address to neighbors on the link.
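
The address formation step of the process above (combining the advertised prefix with the interface identifier) can be sketched as follows. This is only the arithmetic, not an implementation of SLAAC: the function name `slaac_address` is hypothetical, and the interface identifier is passed in as the 8 bytes of the EUI-64 from the example.

```python
import ipaddress


def slaac_address(prefix: str, interface_id: bytes) -> ipaddress.IPv6Interface:
    """Combine a /64 prefix from an RA with a 64-bit interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64 or len(interface_id) != 8:
        raise ValueError("SLAAC expects a /64 prefix and a 64-bit identifier")
    # The prefix fills the upper 64 bits, the identifier the lower 64 bits.
    value = int(network.network_address) | int.from_bytes(interface_id, "big")
    return ipaddress.IPv6Interface(f"{ipaddress.IPv6Address(value)}/64")


eui64 = bytes.fromhex("720712fffe345678")  # EUI-64 of 70:07:12:34:56:78
print(slaac_address("2001:db8::/64", eui64))  # 2001:db8::7207:12ff:fe34:5678/64
```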

SLAAC looks great, but it has an important flaw: it does not support distributing DNS information, so the host must obtain DNS through some other means (usually DHCPv6). There are two flag bits in the RA to address this problem:

M (Managed Address Configuration): address information can be obtained via DHCPv6;
O (Other Configuration): other information (such as DNS) can be obtained via DHCPv6.
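
How a host might interpret these two flags can be written as a small decision function. This is a deliberately simplified sketch: the function name and the returned labels are invented for illustration, and real hosts also take the per-prefix autonomous flag and local policy into account. When M is set, the O flag is redundant, because DHCPv6 then provides the other configuration as well.

```python
def plan_from_ra_flags(managed: bool, other: bool) -> dict:
    """Map the M and O flags of an RA to a (simplified) configuration plan."""
    if managed:
        # M=1: addresses come from stateful DHCPv6; O is redundant here.
        return {"address": "stateful DHCPv6", "other_config": "DHCPv6"}
    if other:
        # M=0, O=1: SLAAC for the address, stateless DHCPv6 for DNS etc.
        return {"address": "SLAAC", "other_config": "stateless DHCPv6"}
    # M=0, O=0: nothing is available via DHCPv6.
    return {"address": "SLAAC", "other_config": "not available via DHCPv6"}


for m, o in [(False, False), (False, True), (True, False), (True, True)]:
    print(f"M={int(m)} O={int(o)} ->", plan_from_ra_flags(m, o))
```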

The newer RFC 6106 supports distributing DNS information by adding RDNSS (Recursive DNS Server) and DNSSL (DNS Search List) to the RA. For the level of RDNSS support across operating systems, see Comparison of IPv6 support in operating systems. In practice, in the vast majority of cases you only need to configure IPv4 DNS (obtained via DHCPv4), so the RDNSS extension is not very meaningful.

The problem with the EUI-64-based SLAAC address configuration above is that the addresses it generates are fixed and predictable, which brings security and privacy concerns. The IPv6 SLAAC privacy extension defined in RFC 4941 solves this problem. During SLAAC it also generates random, periodically rotated addresses to address the privacy issue. At the same time, the EUI-64-generated address is also retained, for use by externally incoming connections. With the privacy extension enabled, the IPv6 addresses generated on Linux look like the following, for example (from top to bottom: the privacy address, the EUI-64 GUA, and the LLA):


2: eth0:  mtu 1500 qdisc cake state UP group default qlen 1000
    link/ether 70:07:12:34:56:78 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8::dead:beef:aaaa:bbbb/64 scope global temporary dynamic
       valid_lft 2591998sec preferred_lft 604798sec
    inet6 2001:db8::7207:12ff:fe34:5678/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 2591998sec preferred_lft 604798sec
    inet6 fe80::7207:12ff:fe34:5678/64 scope link
       valid_lft forever preferred_lft forever

DHCPv6

DHCPv6 operates in broadly the same way as DHCPv4: the host sends a multicast message to ff02::1:2 on UDP port 547, and the DHCPv6 server replies with address, DNS, and other information.

The difference is that DHCPv6 can run in either a stateful or a stateless mode, the distinction being whether or not an address is obtained. When used together with SLAAC, the host only needs to obtain DNS and other information from DHCPv6, so stateless DHCPv6 can be used.

DHCPv6 PD (Prefix Delegation)

PD is a DHCPv6 extension defined in RFC 3633. It is used to distribute IPv6 prefixes across a network.

With the PD extension enabled, the DHCPv6 server grants the host the right to use an IPv6 subnet prefix (such as 2001:db8::/56) and adds routing table entries to ensure that all addresses under this subnet are routed to the host that requested the prefix. The host can then further subdivide and allocate this subnet.

A typical use case for DHCPv6 PD is home ISP network access. The home gateway router requests an IPv6 prefix from the ISP DHCP server, and then distributes addresses from this subnet prefix within the home internal network via SLAAC.
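
To illustrate what "further subdividing" a delegated prefix means, the sketch below splits the example /56 into the /64 subnets that a home router could announce on its LAN interfaces via SLAAC. It uses only Python's standard `ipaddress` module; the helper name `split_delegated_prefix` is made up for this example.

```python
import ipaddress
from itertools import islice


def split_delegated_prefix(prefix: str, new_prefix: int = 64):
    """Yield the subnets of length new_prefix inside a delegated prefix."""
    return ipaddress.IPv6Network(prefix).subnets(new_prefix=new_prefix)


delegated = "2001:db8::/56"
# A /56 contains 2 ** (64 - 56) = 256 distinct /64 subnets.
print(sum(1 for _ in split_delegated_prefix(delegated)))
for subnet in islice(split_delegated_prefix(delegated), 3):
    print(subnet)
```

Each /64 produced this way is large enough for SLAAC, which is why ISPs usually delegate something shorter than a /64 when the customer may have more than one internal network.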

Conclusion

This article briefly introduced some of the concepts involved in IPv6 address allocation and explained how SLAAC, DHCPv6, and DHCPv6 PD work. In terms of simplifying address management, IPv6 can be said to have been rather unsuccessful: multiple standards coexist, and there are various combinations of them, which gives clients a non-trivial probability of failing to correctly obtain IPv6.

In practice, the three most common IPv6 allocation scenarios we encounter are:

Pure SLAAC: typical campus networks (education networks) fall into this category. In practice, the author has found cases where a misconfigured host on the internal network indiscriminately sends RAs, causing the IPv6 of all hosts on the entire internal network to be misconfigured. At the same time, in this mode, a router you connect yourself will no longer be able to distribute SLAAC GUAs to downstream devices, because the local-link multicast packets that SLAAC relies on cannot be forwarded by the router (this can be solved via IPv6 bridging or NAT6, which is not elaborated on here).
Pure DHCPv6: some enterprise internal networks use this mode, because DHCPv6 allows centralized management. The biggest problem with this mode is that Android does not support DHCPv6. But under other operating systems, this mode runs fairly stably.
SLAAC + DHCPv6 PD: this is the most common mode for home ISP network access. Most home routers are adapted for it and work out of the box.

Extracting Graph Topology from Image

Thu, 11 Jul 2024 00:00:00 +0000

The Problem

Now we have an image representing a graph, as shown in the figure below:

Suppose we already know the category of each pixel: background, node, or edge. How can we extract the graph topology from it and represent the graph by an adjacency matrix?

Challenges in Classical Algorithm

TODO

What about Neural Network?

We can use a simple algorithm to extract the position of each node. Suppose the position of a node is $\mathbf{P}(x,y)$, and there are $N$ nodes in total.

Then, the task is to fill in the $N\times N$ adjacency matrix with $0$ or $1$. As we can see, this can be converted into a binary classification problem.

We can train a neural network $\mathbf{f}$ that takes three inputs: the image $\mathbf{I}$ and the positions of a node pair $\left( \mathbf{P}_ 1, \mathbf{P}_ 2 \right)$. It outputs $O\in\{0,1\}$, indicating whether there is a direct connection between the node pair, i.e.,

$$O=\mathbf{f}(\mathbf{I}, \mathbf{P}_ 1, \mathbf{P}_ 2).$$

The dataset can be synthesized by a simple program, and we can use any classification network (e.g., EfficientNet) as our network architecture.

The problem is how to feed $\left( \mathbf{P}_ 1, \mathbf{P}_ 2 \right)$ into the network. We can add an additional “mask channel” to the image, where the pixels belonging to the two input nodes are marked as 1, and the others as 0. Finally, we input this 4-channel “image” into the network.
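
A minimal NumPy sketch of this mask-channel idea is shown below. It rests on assumptions that are not part of the post: the image is an (H, W, 3) array, and a hypothetical `node_labels` array of shape (H, W) stores the ID of the node each pixel belongs to (0 for background and edge pixels). The function name `build_network_input` is likewise illustrative.

```python
import numpy as np


def build_network_input(image, node_labels, node_a, node_b):
    """Append a mask channel marking the pixels of the two query nodes.

    image:       (H, W, 3) array.
    node_labels: (H, W) integer array, node ID per pixel, 0 if not a node
                 (an assumed representation, not defined in the post).
    """
    # 1 where the pixel belongs to either query node, 0 elsewhere.
    mask = np.isin(node_labels, [node_a, node_b]).astype(image.dtype)
    # Stack the mask as a fourth channel: (H, W, 3) -> (H, W, 4).
    return np.concatenate([image, mask[..., None]], axis=-1)


image = np.zeros((4, 4, 3), dtype=np.float32)
node_labels = np.zeros((4, 4), dtype=np.int64)
node_labels[0, 0] = 1  # node 1
node_labels[3, 3] = 2  # node 2
node_labels[1, 2] = 3  # node 3, not part of the queried pair

x = build_network_input(image, node_labels, 1, 2)
print(x.shape)  # (4, 4, 4)
```

With this layout, the same image produces a different input for every node pair, so the $N\times N$ adjacency matrix is filled by running the classifier once per pair.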

Other Notes

TODO

Building WireGuard VPN for Machine Learning Server Cluster

Mon, 29 Jan 2024 00:00:00 +0000

Motivation

A machine learning cluster needs a secure way to expose services to users, as well as to interconnect servers across the public network. For this, a VPN network needs to be deployed.

Deploying a VPN network requires considering the following factors:

Network topology: an appropriate topology must be chosen to minimize latency as much as possible;
User management: it should be easy to add or remove users and to authorize them;
Simplicity of use and maintenance.

Design

Network Topology

The network topology determines the latency.

The lowest-latency option is obviously full-mesh, i.e. every pair of peers has a direct P2P connection. However, the management complexity of this topology is $\mathcal{O}(n^2)$, and adding a new peer requires modifying the configuration files of all other peers. It also has to deal with the problems introduced by NAT, which requires some automated management software. I tried Netmaker and Headscale, but neither of them seemed able to correctly handle the complex network environment within the campus, such as the symmetric NAT used by various enterprise-grade routers, and the probability of successfully establishing P2P was very low.

In the end I chose a topology that combines full-mesh and hub-and-spoke. Since the number of servers and their IPs rarely change, manually configuring a full-mesh network among the servers is feasible. At the same time, a gateway server is provided as the hub for user access, and users only need to establish a connection with the gateway server. Since most users actually use the VPN within the campus, connecting to the on-campus gateway server and forwarding traffic through it does not introduce much additional latency. This structure balances latency and management complexity, and adding/removing and authorizing users only needs to be done on the gateway server.
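
The difference in management effort can be made concrete by counting the tunnels each topology requires. The sketch below is a back-of-the-envelope calculation, and the cluster size used (8 servers, 40 users) is a made-up example rather than the real deployment.

```python
def full_mesh_links(peers: int) -> int:
    """Number of pairwise tunnels when every peer connects to every other."""
    return peers * (peers - 1) // 2


def hybrid_links(servers: int, users: int) -> int:
    """Full mesh among servers, plus one tunnel per user to the gateway."""
    return full_mesh_links(servers) + users


servers, users = 8, 40  # hypothetical numbers for illustration
print(full_mesh_links(servers + users))  # 1128
print(hybrid_links(servers, users))      # 68
```

Adding a user in the hybrid design adds exactly one tunnel and touches only the gateway's configuration, whereas in a pure full mesh it would add one tunnel per existing peer.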

Protocol Choice

The popular OpenVPN and IPSec are both good enough, but the emerging WireGuard offers unparalleled configuration simplicity. On the server side, WireGuard can define a peer and a route with just a few lines of configuration; on the user side, since WireGuard uses key-pair-based authentication, a single configuration file is enough to join the VPN network, with no need to remember an additional password or perform a login operation.

Management Approach

For the sake of predictability and stability, I chose the manual configuration approach. The full-mesh network among servers does not need to be changed frequently once it is configured. User management, on the other hand, is implemented through a script: when a new user needs to be added, the script generates a key pair and allocates an IP, adds the public key and routing information to the gateway server’s peer list, then generates a configuration file containing the private key and the allocated IP, and sends it to the user.

Example of a user peer configuration on the gateway server:


[Peer]
PublicKey = 
AllowedIPs = 10.1.x.y/32
AllowedIPs = fd01::x:y/128
PersistentKeepalive = 25

Example of a user’s access configuration file:


[Interface]
PrivateKey = 
Address = 10.1.x.y/16
Address = fd01::x:y/64

[Peer]
PublicKey = 
AllowedIPs = 10.1.0.0/16  # route all VPN traffic to gateway server
AllowedIPs = fd01::/64
Endpoint = wg.ustcaigroup.xyz:51820  # gateway server is dual stack
# Endpoint = wg.ustcaigroup.xyz:51820  # IPv4
# Endpoint = wg.ustcaigroup.xyz:51820  # IPv6
PersistentKeepalive = 25
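
The user-management script itself is not shown in this post. The following is a hedged sketch of what its templating part could look like, under several assumptions: the function names are hypothetical, keys are passed in as strings (in practice they could come from `wg genkey` and `wg pubkey`), and `x` and `y` are inserted into the IPv4 and IPv6 addresses exactly as in the examples above.

```python
def next_free_host(used, x=0):
    """Pick the first unused (x, y) pair, a stand-in for real IP allocation."""
    for y in range(1, 255):
        if (x, y) not in used:
            return (x, y)
    raise RuntimeError(f"no free address left in 10.1.{x}.0/24")


def render_gateway_peer(public_key, x, y):
    """[Peer] section to append to the gateway server's configuration."""
    return "\n".join([
        "[Peer]",
        f"PublicKey = {public_key}",
        f"AllowedIPs = 10.1.{x}.{y}/32",
        f"AllowedIPs = fd01::{x}:{y}/128",
        "PersistentKeepalive = 25",
    ])


def render_user_config(private_key, gateway_public_key, x, y, endpoint):
    """Complete configuration file that is sent to the new user."""
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = 10.1.{x}.{y}/16",
        f"Address = fd01::{x}:{y}/64",
        "",
        "[Peer]",
        f"PublicKey = {gateway_public_key}",
        "AllowedIPs = 10.1.0.0/16",
        "AllowedIPs = fd01::/64",
        f"Endpoint = {endpoint}",
        "PersistentKeepalive = 25",
    ])


x, y = next_free_host({(0, 1), (0, 2)})
print(render_gateway_peer("USER_PUBLIC_KEY", x, y))
print(render_user_config("USER_PRIVATE_KEY", "GATEWAY_PUBLIC_KEY", x, y,
                         "wg.ustcaigroup.xyz:51820"))
```

Keeping the templates in one place means the gateway entry and the user's file are always generated from the same `(x, y)` allocation, which avoids mismatched AllowedIPs and Address values.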

Building Proxy Service for Team

Thu, 09 Nov 2023 00:00:00 +0000

This is an unfinished blog.

Preface

Due to Internet censorship in China (known as the GFW, Great Firewall, 防火长城), many websites (e.g. Google, Twitter) are blocked, and some websites (e.g. GitHub) suffer from connectivity issues. In China, circumventing Internet censorship is referred to as 翻墙 (literally “climbing over the wall”).

In China, a proxy is essential for accessing the Internet freely. Although various commercial options are available, they may not be suitable for everyone. Therefore, I have built a user-friendly and easy-to-maintain proxy system for my research group, as part of my responsibilities as a system administrator.

Target

Easy to use. Team members only need to do some simple configuration. The proxy client should be able to update its configuration automatically.
Stability.
Sufficient traffic, to download large datasets.
Low latency, to provide a good web browsing experience.
Low cost.
Easy to maintain. Frequent maintenance is unacceptable, and adding a new function should require only simple configuration changes.
Concealment. The cat-and-mouse game between the GFW and anti-censorship tools has been escalating. Ten years ago (2013), an OpenVPN client was all you needed to “go across the Great Wall and reach every corner of the world”. Now, you must use much more sophisticated solutions to prevent your “unusual” traffic from being detected by the GFW. According to GFW Report, the popular Shadowsocks (a proxy protocol which simply encrypts all traffic using a pre-shared key) was detected and blocked, and TLS-based proxies also encountered large-scale blocking in Oct 2022. The tools and protocols used must be concealed well enough to allow the service to run for a long time.

Available Resources

CERNET

Cloudflare WARP

VPS

Server in USTC

Anti-Censorship Tools

Adopted Solution

Deployment

Problems

Client Initialization

Compatibility

Conclusion