Using a GPU-Accessible VS Code Server on UIUC Delta

Why I Am Writing This Post: Many UIUC students rely on Delta to access GPU resources for their research. Delta provides 4 SSH-enabled login nodes and many compute nodes with GPUs. Usually, we must first SSH into a login node (with a password and a Duo 2FA OTP) and then use srun to request GPU resources to run our code. However, in my experience, we can run into many problems when using Delta: ...

2024-12-22 · 3 min · Monsoon

Everything About IPv6 Address Allocation

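Preface: IPv4 has only one dynamic address allocation method, DHCP, whereas IPv6 has two, SLAAC and DHCPv6, and DHCPv6 additionally has the PD (Prefix Delegation) extension. These three mechanisms also interact with one another, so far more problems arise during IPv6 address allocation than with IPv4. Most tutorials you can find online only fix the problem on the surface, stay vague about the underlying technical details, and never fundamentally clarify how IPv6 differs from IPv4, ...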

2024-10-12 · 7 min · Monsoon

Extracting Graph Topology from an Image

The Problem: We have an image representing a graph, as shown in the figure below. Suppose we already know the category of each pixel: background, node, or edge. How can we extract the graph topology from it and represent the graph as an adjacency matrix? Challenges in Classical Algorithms: TODO. What About Neural Networks? We can use a simple algorithm to extract the position of each node. Suppose the position of a node is $\mathbf{P}(x,y)$ and there are $N$ nodes in total. ...

2024-07-11 · 2 min · Monsoon
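The excerpt above mentions a simple algorithm for locating each node, but the post's own method is elided. The following is only a minimal sketch under stated assumptions: the per-pixel classes come as a 2D grid with a hypothetical encoding (0 = background, 1 = node, 2 = edge), each node is one 4-connected blob of node pixels, and its position $\mathbf{P}(x,y)$ is taken to be the blob's centroid. `node_positions` is an illustrative helper, not the author's code.

```python
from collections import deque

# Hypothetical pixel-class encoding (an assumption, not from the post).
BACKGROUND, NODE, EDGE = 0, 1, 2


def node_positions(labels):
    """Return the centroid (x, y) of each 4-connected blob of NODE pixels.

    labels: 2D list of ints where labels[y][x] is the class of pixel (x, y).
    Blobs are reported in row-major order of their first pixel.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    positions = []
    for y0 in range(h):
        for x0 in range(w):
            if labels[y0][x0] != NODE or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one node blob, summing coordinates.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            sx = sy = count = 0
            while queue:
                x, y = queue.popleft()
                sx += x
                sy += y
                count += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and labels[ny][nx] == NODE):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # The centroid of the blob serves as the node position P(x, y).
            positions.append((sx / count, sy / count))
    return positions
```

For example, `node_positions([[1, 1, 2, 2, 1], [1, 1, 0, 0, 1]])` finds two blobs joined by edge pixels, so $N = 2$. Building the adjacency matrix would additionally require tracing edge pixels between blobs, which this sketch does not attempt.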

Latency in LLM Serving

Preface: There have been many excellent works on LLM serving, mostly focused on improving throughput. In practical applications, however, latency is equally important. Few current works focus on reducing LLM serving latency, especially latency optimization under SLA constraints. This post attempts to summarize the basic concepts and problems in this direction and, based on an analysis of latency in LLM serving, to propose some novel research directions. ...

2024-07-07 · 4 min · Monsoon

How Quantization Works: From a Matrix Multiplication Perspective

Introduction: Quantization is a commonly used technique for accelerating NN inference. The main computational workloads in NNs come from convolution, linear layers, and attention, all of which are implemented with GEMM at the lower level. This post discusses the principles of quantization from the matrix multiplication perspective, explains why some quantization methods are impractical, and reviews several LLM quantization methods from this perspective. I define practical quantization as follows: ...

2024-03-06 · 8 min · Monsoon
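As a generic illustration of the matrix multiplication view (standard symmetric linear quantization, not necessarily the notation or the definition of "practical" used in the post), suppose $X \approx s_X X_q$ and $W \approx s_W W_q$, where $X_q, W_q$ are integer matrices and $s_X, s_W$ are scalar scales. The GEMM can then run entirely on integers and be rescaled once at the end:

$$
Y_{ij} = \sum_k X_{ik} W_{kj} \approx \sum_k \left(s_X X_{q,ik}\right)\left(s_W W_{q,kj}\right) = s_X s_W \sum_k X_{q,ik} W_{q,kj}
$$

The same factoring holds with a per-row scale $s_{X,i}$ for $X$ and a per-column scale $s_{W,j}$ for $W$, because neither depends on the reduction index $k$: $Y_{ij} \approx s_{X,i}\, s_{W,j} \sum_k X_{q,ik} W_{q,kj}$.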

NFS Performance Tuning

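Preface: This post is an NFS performance tuning guide for production environments on a 10 Gbps network, distilled from my own practice, with a particular focus on optimizing reads and writes of Lots of Small Files (LOSF). Tuning: Hardware. For network hardware, both bandwidth and latency matter. ...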

2024-02-16 · 4 min · Monsoon

[Paper Reading] ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs (arXiv'24)

This post is a write-up of the paper “ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs” (arXiv'24). Motivation: Some workloads (e.g., simulation engines for deep RL and dynamic DNNs) cannot fully utilize the massive parallelism of GPUs (see Figure 1). The main reason is that these workloads consist of many small kernels, each of which cannot fully utilize the GPU on its own, and these kernels are not executed concurrently even though most of them are independent and could in theory run concurrently. ...

2024-02-07 · 7 min · Monsoon

[Paper Reading] GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud (PACT'22)

This post is a write-up of the paper “GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud” (PACT'22). Motivation: This paper focuses on GPU sharing in cloud scenarios. Existing GPU sharing techniques fall into two categories. Time-sharing executes each concurrent VM on the full device in a round-robin fashion. Pros: simple and mature. Cons: a VM may still under-utilize the hardware within its time slice. ...

2024-02-07 · 11 min · Monsoon

Building a WireGuard VPN for a Machine Learning Server Cluster

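Motivation: A machine learning cluster needs a secure way to expose services to users and to interconnect servers across the public Internet, which calls for deploying a VPN. Deploying the VPN involves the following considerations: network topology (choose a suitable topology to minimize latency); user management (users can be added, removed, and authorized easily); and simplicity of use and maintenance. Design: Network Topology. The network topology determines latency. ...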

2024-01-29 · 2 min · Monsoon

Building a Storage System for a Machine Learning Server Cluster

This is an unfinished post.

2023-11-24 · 1 min · Monsoon

Custom PyTorch Operators on Ascend 910B

Environment: This post is based on Ascend 910B3 hardware and a software stack of CANN 7.0-RC1, PyTorch 1.11.0, and Ascend PyTorch Adapter v5.0.rc3-pytorch1.11.0. Things may differ slightly with other CANN and PyTorch versions. ...

2023-11-14 · 2 min · Monsoon

Building a Proxy Service for a Team

This is an unfinished post. Preface: Due to Internet censorship in China (the GFW, Great Firewall, 防火长城), many websites (e.g., Google, Twitter) are blocked, and others (e.g., GitHub) suffer connectivity issues. Circumventing this censorship is known in China as 翻墙 (literally "climbing over the wall"), and a proxy is essential for accessing the Internet freely. Although various commercial options exist, they may not suit everyone. Therefore, as part of my responsibilities as a system administrator, I built a user-friendly, easy-to-maintain proxy system for my research group. ...

2023-11-09 · 1 min · Monsoon

My TOEFL Experience

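Preface: As the exam that has caused me the most anxiety since the gaokao, TOEFL kept me in darkness for most of 2023, and it also consumed more of my time and money than any other exam. I started with a goal of 100 overall and 20 in Speaking, went through countless days of lost confidence, being overwhelmed by anxiety, and practicing speaking until my tongue was tied in knots, and finally saw a score I was satisfied with on November 3, 2023. ...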

2023-11-05 · 8 min · Monsoon

Catching a Mining Virus

Problem: On October 30, 2023, I received a warning from the data center administrator that the firewall had detected mining traffic sent from a server I manage. The “mining traffic” was a DNS request for bitcoin.sipa.be sent to 223.5.5.5. At first, I thought finding the virus process would be simple, just like my previous encounter with another mining virus. In that case, the attacker logged in to the server by cracking a weak SSH password, probably gained root privileges by exploiting a privilege escalation vulnerability (the server was running EOL Ubuntu 16.04), and then set up a cron job to run the miner. ...

2023-11-01 · 2 min · Monsoon

Using an SSH Reverse Tunnel to Log In to BitaHub Containers and Hold GPUs Long-Term

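Problem: Every year before CVPR, GPUs are in short supply and we have to borrow cards from elsewhere. USTC has BitaHub for on-campus users, but it suffers from the same pre-CVPR GPU scarcity. Its job-submission usage model is also very inconvenient: multi-GPU jobs often wait in long queues, and its data management is downright user-hostile. ...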

2023-10-20 · 2 min · Monsoon

Enabling QUIC in Nginx Alongside SNI-Based Routing

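Problem: Since version 1.25.0, Nginx's QUIC support has been merged into mainline, so users who want to try it can simply use the official nginx Docker image, which is very convenient. However, the nginx on my server uses SNI-based routing, driven by the needs of next-generation TLS-based proxy protocols such as Shadow TLS and Xray Reality. Unlike earlier protocols that could use gRPC/WebSocket as their transport, these protocols cannot have their TLS layer handled by nginx on their behalf. Yet for the best camouflage, using port 443/tcp is necessary (the whitelisted target websites being impersonated usually serve HTTPS only on 443/tcp). Therefore, port 443/tcp has to be shared. ...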

2023-09-26 · 2 min · Monsoon

Optimizing MKL Performance on AMD CPUs

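Problem: Our lab has some AMD EPYC 7713 servers. They were purchased because some people in the group run programs with very heavy CPU loads (I don't know what those workloads are or why they can't run on GPUs, and I don't have the energy to help with each one individually), and high-core-count AMD processors suit this need well. ...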

2023-06-19 · 2 min · Monsoon

VCB-Studio Technical Director Entry Test 2023 and My Answers

See the original publication page for more details. All my answer files can be browsed here, or you can download the zipped file (5.9 GB). Requirements: This is a test for candidates who wish to participate in the training class organized by VCB-Studio. Finish as many problems as you can, and then do the following: Pack your answers, result files, and necessary attachments into a zip/rar/7z file. The source files we provided and the intermediate files from your encoding should not be included. Register a Baidu Net Disk account (https://pan.baidu.com), upload the zipped file, and create a sharing link. Whether you like it or not, Baidu Net Disk has been the most effective way to share files within our team since day one. Other sharing methods will NOT be considered. Send the link via email to [email protected] before Monday, 23 Jan 2023, 23:59:59 Beijing Time (UTC+8). Late submissions will NOT be considered. Prepare a QQ account. The follow-up training courses will be conducted in the QQ group. You should complete the answers independently without any public discussion. Any form of plagiarism will NOT be tolerated. ...

2023-05-25 · 14 min · Monsoon

Hello World

My first blog post!

2023-03-29 · 1 min · Monsoon