使用阿里云机器，部署 ChatGLM2，并做一些简单的测试。

环境

规则上选择 gn 即为搭载NVIDIA的GPU计算型

操作系统：Ubuntu 20.04 64位

可用区：华东2（上海） M
操作系统：Ubuntu 20.04 64位
配置：16 核 60 GiB GiB ecs.gn7i-c16g1.4xlarge 20 Mbps
付费方式：按量付费
实例规格：ecs.gn7i-c16g1.4xlarge
网络类型：专有网络

# uname -a
Linux iZuf6784qjryroz25m3xqeZ 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

环境安全

开启 22 端口的 47.96.60.0/24 118.31.243.0/24
- 根据你登录的 webshell 的提示来自行添加相关网络段
开启服务器的端口，比如 5000 或 8000

安装

git clone https://github.com/THUDM/ChatGLM2-6B.git
cd ChatGLM2-6B
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
- 使用清华镜像进行加速安装

安装模型，（因为我是为了避免网络问题，已经提前下载好了），所以我手动做了处理

$ cd ChatGLM2-6B

# 在当前项目的根目录下，新建相关文件
# https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/?p=%2Fchatglm2-6b&mode=list
# https://huggingface.co/THUDM/chatglm2-6b
$ mkdir ./THUDM
$ mkdir ./THUDM/chatglm2-6b

# 把我下载好的模型，复制到当前项目的 THUDM/chatglm2-6b 目录下
$ cp ../chatglm2-6b/* ./THUDM/chatglm2-6b

或者手动修改模型的路径

tokenizer = AutoTokenizer.from_pretrained("./xxx/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./xxx/chatglm2-6b", trust_remote_code=True, device='cuda')

测试

$ python api.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.09it/s]
INFO:     Started server process [9997]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
[2023-07-03 21:15:10] ", prompt:"你是一个专业的物流问题分析工具，将以下用户反馈，抽象为一个6个字内的短语：三天了，从贵阳到黔西都没有到", response:"'"物流缓慢"'"
INFO:     127.0.0.1:35162 - "POST / HTTP/1.1" 200 OK

测试

curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"prompt":"你好"}' \
  http://127.0.0.1:8000

AliyunOS 或 CentOS 安装

镜像：Alibaba Cloud Linux 3.2104 LTS 64位（aliyun_3_x64_20G_alibase_20230516.vhd）
规格：ecs.gn6v-c8g1.2xlarge

下载模型

# 安装 git 和 lfs 工具
#  https://github.com/git-lfs/git-lfs/blob/main/INSTALLING.md
$ yum install git
$ yum install git-lfs

# 跳过下载大文件
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b
$ cd chatglm2-6b
# $ rm -rf pytorch_model*   // 以下下载命令会覆盖模型文件，可不做删除操作

从清华源下载模型（能看到进度比较直观，速度在 10m/s）

curl https://cloud.tsinghua.edu.cn/seafhttp/files/6376078a-a8ef-44df-bf63-c5dc5f9363b8/pytorch_model-00001-of-00007.bin > pytorch_model-00001-of-00007.bin
curl https://cloud.tsinghua.edu.cn/seafhttp/files/f7df9041-4e5d-493c-91d0-828d393f3f15/pytorch_model-00002-of-00007.bin > pytorch_model-00002-of-00007.bin


curl https://cloud.tsinghua.edu.cn/seafhttp/files/efbc8e81-dd3f-436b-8564-99b81f9806a2/pytorch_model-00003-of-00007.bin > pytorch_model-00003-of-00007.bin
curl https://cloud.tsinghua.edu.cn/seafhttp/files/832513b9-84c7-47a1-a23c-c7aa29a6d5c3/pytorch_model-00004-of-00007.bin > pytorch_model-00004-of-00007.bin
curl https://cloud.tsinghua.edu.cn/seafhttp/files/202a6845-bfee-4d89-a9b3-b8a6f653a65a/pytorch_model-00005-of-00007.bin > pytorch_model-00005-of-00007.bin
curl https://cloud.tsinghua.edu.cn/seafhttp/files/40a32844-3bd6-4d4b-a491-3c324a3dc3e7/pytorch_model-00006-of-00007.bin > pytorch_model-00006-of-00007.bin
curl https://cloud.tsinghua.edu.cn/seafhttp/files/e96ecca1-4a64-4702-a51f-9b273b629cc8/pytorch_model-00007-of-00007.bin > pytorch_model-00007-of-00007.bin

curl https://cloud.tsinghua.edu.cn/seafhttp/files/1b2c278a-eaaa-4a98-8606-7564a3d4f79b/tokenizer.model > tokenizer.model



echo "done"

# 注意
# pytorch_model.bin.index.json 貌似没被 clone 下来，需要手动上传一下，否则会出现
# OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory ../chatglm2-6b.

依赖安装

如果遇到在阿里云上出现 4.30.2 找不到的问题，可以先安装更新 python 到最新版本

$ pip install transformers==4.30.2
ERROR: No matching distribution found for transformers==4.30.2

参考 https://linuxstans.com/how-to-install-python-centos/ 安装 python3.11

# 更新工具
$ yum update
$ yum install openssl-devel bzip2-devel libffi-devel
$ yum groupinstall "Development Tools"

# 安装 python3.11
$ wget https://www.python.org/ftp/python/3.11.4/Python-3.11.4.tgz
$ tar -xzf Python-3.11.4.tgz
$ cd Python-3.11.4
$ ./configure --enable-optimizations
$ make altinstall

# 配置镜像源
$ pip3.11 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/

测试

将 THUDM/chatglm2-6b 改成相对路径 ./chatglm2-6b （或改成你自己的模型路径）即可，目录结构如下：

$ ll
drwxr-xr-x  3 root root     4096 Jul  4 10:56 auto_install
drwxr-xr-x  3 root root     4096 Jul  4 16:09 chatglm2-6b
drwxr-xr-x  6 root root     4096 Jul  4 16:18 ChatGLM2-6B
drwxr-xr-x 17 1000 1000     4096 Jul  4 15:26 Python-3.11.4
-rw-r--r--  1 root root 26526163 Jul  4 15:13 Python-3.11.4.tgz
-rw-r--r--  1 root root      324 Jul  4 11:47 test.py

最终的测试脚本

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

然后运行 python3.11 test.py 即可。

运行

修改 api.py 中的 THUDM/chatglm2-6b 改成相对路径 ../chatglm2-6b

$ cd ChatGLM2-6B
$ nohup python3.11 api.py &

镜像安装

手动安装过一次后，可以将实例以镜像的方式存储起来，后续重新启动，直接按镜像启动即可。

通过镜像创建实例后，需要重新装一下驱动。参考：https://help.aliyun.com/document_detail/163824.html?spm=a2c4g.286415.0.i2

配置

8核(vCPU)
32 GiB
5 Mbps
GPU：NVIDIA V100
ecs.gn6v-c8g1.2xlarge

$ wget https://cn.download.nvidia.com/tesla/525.105.17/NVIDIA-Linux-x86_64-525.105.17.run
$ chmod +x NVIDIA-Linux-x86_64-525.105.17.run
$ ./NVIDIA-Linux-x86_64-525.105.17.run
# 后续无脑同意确认即可

查看 GPU 内存

总体 CPU 在 10%，内存 2G，显存 12G 左右。

$ nvidia-smi
Tue Jul  4 21:34:32 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   34C    P0    58W / 300W |  12338MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     16148      C   python3.11                      12336MiB |
+-----------------------------------------------------------------------------+