From 4a9454a59637636d4e87485a13ed00bf82253e3c Mon Sep 17 00:00:00 2001 From: Hanye Date: Tue, 19 Mar 2024 20:17:13 +0800 Subject: [PATCH 1/8] create inference workload framework --- README.en.md | 2 +- README.md | 63 ++--- docs/DEVELOPER_ACCESS_DOC.md | 321 +++++++++++++++++++++++ huawei/pytorch/.gitkeep | 0 nvidia/pytorch/.gitkeep | 0 personal/.gitkeep | 0 tools/patch_tool/README.md | 66 +++++ tools/patch_tool/patch_config_example.sh | 16 ++ tools/patch_tool/patch_tool.sh | 85 ++++++ 9 files changed, 518 insertions(+), 35 deletions(-) create mode 100644 docs/DEVELOPER_ACCESS_DOC.md create mode 100644 huawei/pytorch/.gitkeep create mode 100644 nvidia/pytorch/.gitkeep create mode 100644 personal/.gitkeep create mode 100644 tools/patch_tool/README.md create mode 100644 tools/patch_tool/patch_config_example.sh create mode 100644 tools/patch_tool/patch_tool.sh diff --git a/README.en.md b/README.en.md index 19337fe..b815785 100644 --- a/README.en.md +++ b/README.en.md @@ -1,4 +1,4 @@ -# inference +# training #### Description {**When you're done, you can delete the content in this README and update the file with details for others getting started with your repository**} diff --git a/README.md b/README.md index 3705536..17aa62a 100644 --- a/README.md +++ b/README.md @@ -1,39 +1,34 @@ -# inference - -#### 介绍 -{**以下是 Gitee 平台说明,您可以替换此简介** -Gitee 是 OSCHINA 推出的基于 Git 的代码托管平台(同时支持 SVN)。专为开发者提供稳定、高效、安全的云端软件开发协作平台 -无论是个人、团队、或是企业,都能够用 Gitee 实现代码托管、项目管理、协作开发。企业项目请看 [https://gitee.com/enterprises](https://gitee.com/enterprises)} - -#### 软件架构 -软件架构说明 - - -#### 安装教程 - -1. xxxx -2. xxxx -3. xxxx - -#### 使用说明 - -1. xxxx -2. xxxx -3. xxxx - -#### 参与贡献 - +# AISBench 推理负载仓库 +## 名词定义 +|名词|定义| +| --- | --- | +|被测试设备|参与AI服务器性能测试的设备| +|Stubs通用包|被测试设备上运行的程序包,统一的性能测试启动入口,负责控制实际训练/推理程序的执行、与Tester、互联互通平台进行对接等| +|训练代码|能在被测试设备直接执行特定训练任务的代码| +|训练负载代码|能直接接入Stubs通用包,通过Stubs通用包启动特定训练任务的代码| +## 介绍 +AISBench推理负载仓库,包含了用于接入AISBench Stubs通用包的,来自各企业贡献者与个人贡献者贡献的推理负载代码。 +### 如何将我的训练代码接入AISBench Stubs通用包用于性能测试(如何将推理代码转换为推理负载代码)? +参考[Stubs被测试者接入使用文档](docs/DEVELOPER_ACCESS_DOC.md), **着重关注"业务代码接入Stubs"章节**。 +### 如何将我的训练负载代码贡献到本仓库? +#### 企业贡献者 +如果您是企业贡献者,请您在本仓库先建立企业目录(参考`nvidia/`和`huawei/`),在自己的企业目录中贡献训练负载代码。如果想要在仓库主页添加模型链接,参考"模型链接"章节`nvidia`和`huawei`的格式。 +#### 个人贡献者 +如果您是企业贡献者,请将你的代码贡献在`personal/`目录下。 +## 模型链接 +### huawei +#### pytorch +|model|link| +| --- | --- | + +### nvidia +#### pytorch +|model|link| +| --- | --- | + +## 参与贡献 1. Fork 本仓库 2. 新建 Feat_xxx 分支 3. 提交代码 4. 新建 Pull Request - -#### 特技 - -1. 使用 Readme\_XXX.md 来支持不同的语言,例如 Readme\_en.md, Readme\_zh.md -2. Gitee 官方博客 [blog.gitee.com](https://blog.gitee.com) -3. 你可以 [https://gitee.com/explore](https://gitee.com/explore) 这个地址来了解 Gitee 上的优秀开源项目 -4. [GVP](https://gitee.com/gvp) 全称是 Gitee 最有价值开源项目,是综合评定出的优秀开源项目 -5. Gitee 官方提供的使用手册 [https://gitee.com/help](https://gitee.com/help) -6. 
Gitee 封面人物是一档用来展示 Gitee 会员风采的栏目 [https://gitee.com/gitee-stars/](https://gitee.com/gitee-stars/) diff --git a/docs/DEVELOPER_ACCESS_DOC.md b/docs/DEVELOPER_ACCESS_DOC.md new file mode 100644 index 0000000..4cf354c --- /dev/null +++ b/docs/DEVELOPER_ACCESS_DOC.md @@ -0,0 +1,321 @@ +# Stubs被测试者接入使用文档 + +# 术语与定义 + +- Tester:测试者运行的程序,负责控制测试过程、维护测试数据信息、接收Stubs程序发送的测试数据等。 +- 测试者:组织和开展测试的机构企业团体和个人。 +- Stubs:被测试者设备上运行的程序,负责实际测试程序的执行、与Tester、互联互通平台进行对接等。 +- 被测试者:需要进行测试的厂商或者团体。 +- 测试服务器系统:测试者维护的,用于保存和管理测试环节的系统。包括测试数据维护、与被测试者的设备进行通信和测试、新测试项的注册和维护。 +- Test_id:测试项目的唯一标识符。用于测试过程与测试服务器系统的认证。 +- Loadgen负载生成器:推理测试使用,控制推理作业到达模式。 +- 业务代码:开发者原始的训练或推理任务代码。 +- 负载代码:运行在被测试者设备上,运行实际在业务代码中嵌入了logging打点接口的代码。 + +# Stubs介绍 + +本文主要介绍被测试者获取到Stubs软件包后,如何使用Stubs进行人工智能服务器系统的性能测试。支持无需联网直接对服务器进行**轻量化离线测试**和通过联网与测试者运行的Tester对接进行**在线测试**。 + +## Stubs程序包 + +Stubs为客户端主控模块,负责与Tester端通讯。Stubs程序包由平台提供给被测试者(测试厂商等),名称为`Ais-Benchmark-Stubs--.tar.gz`,其中`arch`是cpu架构、`version`是版本号。 + +Stubs程序包解压目录如下: + +```bash +root@ubuntu:/home/tool/Ais-Benchmark-Stubs-# tree +├── ais-bench-stubs # Stubs主程序,负责流程控制、通信与数据管理等 +├── code # 业务代码目录,运行期间只读,ais-bench-stubs会对code目录进行监控,测试期间若测试用户有操作行为,将记录并上报tester服务端,影响测试成绩判定 +│ └── benchmark.sh # 入口脚本,会被ais-bench-stubs调用,调用业务代码,被测试者需要通过编写该脚本,对接运行的训练和推理脚本 +├── config +│ ├── config.json # 测试配置文件,包含tester服务器信息、testerId等信息 +│ └── system.json # 被测试环境系统基本信息json文件,被测试者自行上传,比如硬件信息等 +├── dependencies # stubs的依赖组件 +│ ├── cluster # 分布式运行组件 +│ │ ├── ais_bench_cluster--py3-none-linux_.whl +│ │ ├── README.md +│ ├── loadgen # 负载生成器LoadGenerator模块是AISBench推理任务的必备的控制套件,负责控制被测试者负载代码的执行,并统计负载代码执行的各过程的性能数据。会根据不同的设置与参数,对被测试者负载执行不同的分发策略,以满足不同场景下的测试要求 +│ │ ├── include # LoadGenerator模块的C/C++接口头文件 +│ │ │ ├── c_api.h +│ │ │ ├── loadgenerator.h +│ │ │ ├── query_sample_lib.h +│ │ │ ├── settings.h +│ │ │ ├── system_under_test.h +│ │ │ └── utils.h +│ │ ├── libs # LoadGenerator模块lib +│ │ │ └── libloadgen.so +│ │ └── loadgen----.whl # LoadGenerator模块python接口需import的loadgen包的安装包,通过pip install 安装loadgen包 +│ └── logging # 测试结果传输模块 +│ ├── ais_bench_logging--py3-none-linux_.whl # 打点入口组件,设置相关业务运行参数并反馈测试结果 +│ └── README.md # logging指导文档 +├── log # 测试log日志。建议无需上传的日志文件,另建目录存放 +├── result # 测试结果文件。建议无需上传的结果文件,另建目录存放 +└── STUBS_PACKAGE_INTRO.md # Stubs被测试者接入使用文档 +``` + +Stubs程序包目录下有code、log、result三个受控目录,每次测试会将三个目录打包上传,不上传的文件,建议不要放在这三个目录。 + +benchmark.sh、config.json、system.json需要由被测试者配置,详见“**被测试者适配指导**”。 + +## 测试要求 + +- Stubs通用层软件运行在被测试者设备侧,为C++编译的二进制文件。需要安装g++。编译依赖的库包括pthread、dl等通用库。 +- 请将被测试者负载代码统一放在`Ais-Benchmark-Stubs-/code`目录下。 +- 请勿删除`Ais-Benchmark-Stubs-`目录下的code、log、result受控目录。 +- 请勿将无关文件放入`Ais-Benchmark-Stubs-/log`日志目录,有自定义日志需求的,另建目录存放。 +- 请勿将无关文件放入`Ais-Benchmark-Stubs-/result`目录,有自定义结果文件需求的,另建目录存放。 + +# 被测试者适配指导 + +当前Stubs工具支持无需联网直接对服务器进行**轻量化离线测试**和通过联网与测试者运行的Tester对接进行**在线测试**。根据实际需求选择其中一种方式对接即可。 + +## 对接流程(轻量化离线测试) + +1. 被测试者在被测试服务器上**准备业务代码相关资源**。 + + 1. **准备数据集** + 2. **下载模型** + 3. **准备业务代码** + +2. 被测试者从标准院门户网站**下载并解压Stub程序包**。 + +3. 被测试者**配置与Tester相关的配置文件**。 + + 1. 根据测试服务器实际属性配置(**system.json**) + 2. 修改配置文件(**config.json**) + +4. **对接Stub与业务代码**。 + + 1. 实现benchmark运行所需要的**benchmark.sh文件** + 2. **实现推理或训练脚本** + +5. 被测试者执行ais-bench-stubs test,开始测试。 + + 测试成功,打印相关提示,测试完毕,程序执行终止。 + +## 对接流程(在线测试) + +1. 被测试者线下报名,与测试者签订测试合同,同时确定相关测试信息参数。比如测试模型、测试场景、测试设备类型等信息。 + +2. 测试者登录测试服务器系统,进行注册Test_id。填入相关测试信息,完成注册。保存Test_id并登录,生成相关的配置文件config.json。 测试者将测试程序基准包、测试配置文件发送给被测试者。 + +3. 被测试者在被测试服务器上**准备业务代码相关资源**。 + + 1. **准备数据集** + 2. **下载模型** + 3. **准备业务代码** + +4. 被测试者从标准院门户网站**下载并解压Stub程序包**。 + +5. 被测试者**配置与Tester相关的配置文件**。 + + 1. 
根据测试服务器实际属性配置(**system.json**) + 2. 修改配置文件(**config.json**) + +6. **对接Stub与业务代码**。 + + 1. 实现benchmark运行所需要的**benchmark.sh文件** + 2. **实现推理或训练脚本** + +7. 被测试者执行ais-bench-stubs,开始测试。 + + 测试成功,打印相关提示,测试完毕,程序执行终止。 + +## 准备业务代码相关资源 + +### 准备数据集 + +参考《人工智能 服务器系统性能测试规范》,以resnet50 v1.5的训练和推理为例,此测试场景需要获取imagenet数据集,官方网站在http://www.image-net.org/challenges/LSVRC/2012/ + +请下载: + +- ILSVRC2012_img_train.tar 138G 训练集 +- ILSVRC2012_img_val.tar 6.28G 测试集 + +**推理场景**只需要下载ILSVRC2012_img_val.tar即可。 + +### 下载模型 + +以下模型仅推理场景需要准备,提供tensorflow和onnx模型,可根据实际场景选择。 + +| model | framework | accuracy | dataset | model link | model source | precision | notes | +| ------------- | ---------- | -------- | ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | --------- | ------------------------------------------------------------ | +| resnet50-v1.5 | tensorflow | 76.456% | imagenet2012 validation | [from zenodo](https://zenodo.org/record/2535873/files/resnet50_v1.pb) | [tensorflow](https://github.com/tensorflow/models/tree/master/official/resnet) | fp32 | NHWC. More information on resnet50 v1.5 can be found [here](https://github.com/tensorflow/models/tree/master/official/resnet). | +| resnet50-v1.5 | onnx | 76.456% | imagenet2012 validation | [from zenodo](https://zenodo.org/record/2592612/files/resnet50_v1.onnx) | [from zenodo](https://zenodo.org/record/2535873/files/resnet50_v1.pb) | fp32 | NCHW, tested on and onnxruntime | + +### 准备业务代码 + +业务代码需要被测试者准备好,**需要保证业务代码使用准备的数据集可以在被测试者的设备上正常运行**。 + + +## 下载并解压Stub包 + +下载Stub包后执行如下命令解压。 + +``` +root@ubuntu:/home/tool# tar xvf Ais-Benchmark-Stubs--.tar.gz +``` + +解压后的目录结构请参见“**Stub程序包**”。 + +## 配置与Tester相关的配置文件 +### 配置system.json + +system.json是Stub程序运行的基本配置文件,一般包括运行环境等信息,请根据测试环境进行配置。 + +system.json文件内容示例如下: +```json +{ + "os_info":"Not selected", + "nums_of_nodes": "Not selected", + "node_info": "Not selected", + "bandwidth_between_nodes": "Not selected", + "topology": "Not selected", + "comm_protocol_info": "Not selected", + "ML_framework_identifier": "Not selected", + "use_virtual" : "Not selected", + "virtual_comp_info": "Not selected", + "minibatch_size_changeable": "Not selected", + "minibatch_value": "Not selected", + "optimizer": "Not selected", + "use_mixed_precision": "Not selected", + "use_automl": "Not selected", + "use_para_training": "Not selected", + "is_param_update_async": "Not selected", + "use_sparse": "Not selected", + "use_quantization": "Not selected" +} + +``` + +### 配置config.json + +config.json是Stub程序运行的测试配置文件,每次测试均需根据实际情况进行配置。详细数据从测试者提供的config.json文件获取。 + +文件config.json内如如下: + +```bash +{ + "testid": "xxxxxxxx-xxxxxx", + "Mode": "inference", + "Model": "resnet50_v1.5", + "Divsion": "close", + "Scenario": "generic", + "test_object_type": "single", + "tester_server_ip": "127.0.0.1", + "tester_server_port": "10002", + "cert_path": "xxx/xxxx/xx.cert", + "hlhtHttps": "https://hlht.xixineis.com/api/tools/result/callback" +} +``` + + +配置项说明: + +| 分类 | 说明 | +| ------------------ | ------------------------------------------------ | +| testid | 测试ID,唯一标识。互联互通平台场景不需要配置。 | +| Mode | 测试分类,可取值"training"、“inference"。 | +| Model | 测试模型名称。 | +| Divsion | 测试模式,可取值"open"、“close"。 | +| Scenario | 测试场景,可取值"generic"、“specific"。 | +| test_object_type | 测试对象类型,可取值"single"、"cluster"、"hpc"。 | +| tester_server_ip | 测试服务器IP。 | +| tester_server_port | 测试服务器的服务端口。 | +|cert_path|来自tester的自签名证书(测试者通过其他方式将证书给到被测试者)。| +|hlhtHttps|互联互通平台的官网,不推荐修改。| + +## 业务代码接入Stubs +业务代码接入Stubs主要分为两步: +1. 
被测试者需要实现Stubs程序包中的`benchmark.sh`脚本,通过`benchmark.sh`脚本拉起测试运行的业务代码。 +2. 在业务代码中,需要嵌入logging模块的打点函数,统计准确率、耗时、能耗等信息。 +3. (可选)在负载仓库(AISBench/training 或 AISBench/inference)中需要保存的是: + - 实现了的`benchmark.sh`脚本 + - 打点后的业务代码(如果只想在负载仓库保留嵌入打点的差异,可以使用[patch工具](../tools/patch_tool/README.md)生成和导入差异文件) + +### 实现benchmark.sh + +Stub程序执行调用benchmark.sh,因此测试方训练或推理时需要实现该脚本。 + +示例benchmark.sh内容如下: + +```bash +#!/bin/bash + +currentDir=$(cd "$(dirname "$0")";pwd) +logSaveDir=xxx # 设置测试结果保存路径 +export PYTHONPATH=$currentDir:$PYTHONPATH + +# 该函数需要被测试者添加训练或推理脚本,用以将推理或训练的测试结果上传。 +function exec_infer() { + python3 $currentDir/infer.py + python3 −c "from ais_bench.logging import collect_report; collect_report(′inference′, [′$logSaveDir'])" +} + +function main() { + exec_infer +} + +main "$@" +exit $? +``` + +- 上述代码中调用logging模块进行打点,详见logging模块的使用指导。 +- 上传结果的代码实现可以在benchmark.sh中实现,也可以在训练和推理的脚本中实现,未做特殊约定。 + +### 在业务代码中嵌入logging打点接口 + +业务代码示例infer.py的代码如下: + +```python +from ais_bench.logging import * +from time import sleep +import os + +current_dir = os.path.dirname(__file__) +if __name__ == "__main__": + init("inference", current_dir) + + all_samples_num = 10000 # 之后训练流程处理的总samples数 + # ====load dataset==== + start("preprocess", 1) + sleep(2) + end("preprocess", 1) + + # ====run train==== + start("infer", 1000) + sleep(5) + end("infer", 1000) + + accuracy = 0.83 + event("accuracy", accuracy) # 设置正确率的点事件 + event("result", "OK") # 设置结果的点事件 + finish() # 结束打点接口 +``` + +训练或推理生成的日志和结果文件,平台需要的日志文件请放在`Ais-Benchmark-Stubs-/log`目录,结果文件请放在`Ais-Benchmark-Stubs-/result`目录,不需要的部分请自行安排目录存放。 + +## 测试执行 + +测试方做好测试准备后,被测试者可以在被测试服务器上执行以下步骤,进行测试: + +- 在线测试 + + ``` + ./Ais-Benchmark-Stubs-/ais-bench-stubs + ``` + + 全量进行指定模型的推理/训练测试,完成测试后,将测试结果数据上传至测试服务器系统。 + +- 轻量化离线测试 + + ``` + ./Ais-Benchmark-Stubs-/ais-bench-stubs test + ``` + + 进行专门testcase测试,完成测试后,直接打印测试结果。 + +# 参考文献 + +T/CESA 1169-2021 信息技术 人工智能服务器系统性能测试规范 +IEEE Std 2937-2022 IEEE Standard for Performance Benchmarking for Artificial Intelligence Server Systems \ No newline at end of file diff --git a/huawei/pytorch/.gitkeep b/huawei/pytorch/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/nvidia/pytorch/.gitkeep b/nvidia/pytorch/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/personal/.gitkeep b/personal/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/tools/patch_tool/README.md b/tools/patch_tool/README.md new file mode 100644 index 0000000..5090279 --- /dev/null +++ b/tools/patch_tool/README.md @@ -0,0 +1,66 @@ +# patch shell脚本工具使用说明 +## 工具使用背景 +如果开发者因为种种原因(例如原始的训练代码项目过于庞大)不希望将整个嵌入AISBench logging打点接口的训练代码项目的源码放在`AISBench/training`仓库中,可以考虑仅将嵌入打点接口造成的与原始的训练代码的差异提交到`AISBench/training`仓库中(patch文件)。需要使用嵌入了打点接口的训练代码时,需要将差异文件patch文件导入原始的训练代码即可。 +## 工具使用方法 +### 准备参数配置文件 +配置文件`xxx.sh`的内容参考[patch_config_example.sh](patch_config_example.sh)的内容: +```shell +#!/bin/bash +# 代码的git远程仓库信息 +git_url="https://github.com/organization/repo.git" +branch="master" +commit_id="c0f478fc517b1daec896f5c72bcea10b2ab83bd4" +base_code_subdir="repo/xxx" # git远程仓库中的代码路径,如果要用仓库的全部代码,就直接填repo名 + +# 生成 .patch文件所需信息(makepatch) +changed_code_dir="./code/" # 基于git远程仓库原始代码做过进一步修改(嵌入AISBench的打点接口)的代码 +dir_to_save_patch_file="./xxx/xxx/xx" # 保存生成的.patch文件的文件夹路径 +patch_file_name="example" # 生成的patch文件名(不带文件后缀) + +# 由 .patch文件修改git远程仓库拉取的原始代码所需信息(applypatch) +result_code_dir="./code" # 基于.patch文件将git远程仓库原始代码修改后保存的文件夹路径 +patch_file_path="./xx/xx/xxx.patch" # 传入的.patch文件的路径 +``` + +### patch工具使用方式1:环境变量指定配置文件路径 +使用前需要声明环境变量`AISBENCH_TRAIN_PATCH_CONFIG_PATH`指定配置文件路径。 +```bash 
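+# 将下面等号右侧填入配置文件的实际路径,例如 /home/user/my_patch_config.sh(示例路径,仅为假设)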
+export AISBENCH_TRAIN_PATCH_CONFIG_PATH=
+```
+
+#### 生成差异文件xxxx.patch
+运行patch_tool.sh:
+```bash
+bash patch_tool.sh makepatch
+```
+执行成功后,会在`$dir_to_save_patch_file`目录下生成名为`$patch_file_name.patch`的差异文件
+
+#### 用差异文件嵌入式修改原始代码
+运行patch_tool.sh:
+```bash
+bash patch_tool.sh applypatch
+```
+执行成功后会将原始代码仓库中`$base_code_subdir`路径下的代码嵌入修改,嵌入修改后的代码保存在`$result_code_dir`路径下。
+
+### patch工具使用方式2:shell脚本中引用配置文件
+#### 生成差异文件xxxx.patch
+写一个简单的shell脚本为例
+```shell
+#!/bin/bash
+my_config_file_path=/home/xxx/xx/config.sh # 配置文件的路径,最好写绝对路径
+. $my_config_file_path
+patch_tool_path=/home/xxx/xx/training/tools/patch_tool/patch_tool.sh # patch_tool脚本的路径
+bash $patch_tool_path makepatch
+```
+此shell脚本执行成功后,会在`$dir_to_save_patch_file`目录下生成名为`$patch_file_name.patch`的差异文件
+
+#### 用差异文件嵌入式修改原始代码
+写一个简单的shell脚本为例
+```shell
+#!/bin/bash
+my_config_file_path=/home/xxx/xx/config.sh # 配置文件的路径,最好写绝对路径
+. $my_config_file_path
+patch_tool_path=/home/xxx/xx/training/tools/patch_tool/patch_tool.sh # patch_tool脚本的路径
+bash $patch_tool_path applypatch # 调用基于patch文件修改原始代码的函数
+```
+此shell脚本执行成功后,会将原始代码仓库中`$base_code_subdir`路径下的代码嵌入修改,嵌入修改后的代码保存在`$result_code_dir`路径下。
\ No newline at end of file
diff --git a/tools/patch_tool/patch_config_example.sh b/tools/patch_tool/patch_config_example.sh
new file mode 100644
index 0000000..e3399ef
--- /dev/null
+++ b/tools/patch_tool/patch_config_example.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+# 代码的git远程仓库信息
+export git_url="https://github.com/organization/repo.git"
+export branch="master"
+export commit_id="c0f478fc517b1daec896f5c72bcea10b2ab83bd4"
+export base_code_subdir="repo/xxx" # git远程仓库中的代码路径,如果要用仓库的全部代码,就直接填repo名
+
+# 生成 .patch文件所需信息(makepatch)
+export changed_code_dir="./code/" # 基于git远程仓库原始代码做过进一步修改(嵌入AISBench的打点接口)的代码
+export dir_to_save_patch_file="./xxx/xxx/xx" # 保存生成的.patch文件的文件夹路径
+export patch_file_name="example" # 生成的patch文件名(不带文件后缀)
+
+# 由 .patch文件修改git远程仓库拉取的原始代码所需信息(applypatch)
+export result_code_dir="./code" # 基于.patch文件将git远程仓库原始代码修改后保存的文件夹路径
+export patch_file_path="./xx/xx/xxx.patch" # 传入的.patch文件的路径
+
diff --git a/tools/patch_tool/patch_tool.sh b/tools/patch_tool/patch_tool.sh
new file mode 100644
index 0000000..0c52509
--- /dev/null
+++ b/tools/patch_tool/patch_tool.sh
@@ -0,0 +1,85 @@
+#!/bin/bash
+declare -i ret_ok=0
+declare -i ret_error=1
+
+if [ "$AISBENCH_TRAIN_PATCH_CONFIG_PATH" != "" ];then
+    . 
$AISBENCH_TRAIN_PATCH_CONFIG_PATH # try import patch config file +fi +CUR_PATH=$PWD + +get_base_code_by_git(){ + git clone $git_url -b $branch || { echo "warn git clone failed"; return $ret_error; } + code_dir=${base_code_subdir%%/*} + cd ${code_dir} + git reset --hard $commit_id || { echo "warn git reset failed"; return $ret_error; } + cd - +} + +make_patch_file(){ + # check necessary attributes exist + [ -z $git_url ] && { echo "args git_url not exist";return $ret_error; } + [ -z $branch ] && { echo "args branch not exist";return $ret_error; } + [ -z $commit_id ] && { echo "args commit_id not exist";return $ret_error; } + [ -z $base_code_subdir ] && { echo "args base_code_subdir not exist";return $ret_error; } + [ -z $changed_code_dir ] && { echo "args changed_code_dir not exist";return $ret_error; } + [ -z $dir_to_save_patch_file ] && { echo "args dir_to_save_patch_file not exist";return $ret_error; } + [ -z $patch_file_name ] && { echo "args patch_file_name not exist";return $ret_error; } + + # make patch + cd $BUILD_TMP_PATH + get_base_code_by_git || { echo "warn git getcode failed"; return $ret_error; } + eval "cp -rf $base_code_subdir $BUILD_TMP_PATH/origin" + eval "cp -rf $changed_code_dir $BUILD_TMP_PATH/code" + diff -Nur --exclude='*.git*' origin code > $BUILD_TMP_PATH/$patch_file_name.patch + if [ ! -d $dir_to_save_patch_file ];then + mkdir -p $dir_to_save_patch_file + fi + cd .. + eval "cp $BUILD_TMP_PATH/$patch_file_name.patch $dir_to_save_patch_file/" +} + +apply_changes_to_code(){ + # check necessary attributes exist + [ -z $git_url ] && { echo "args git_url not exist";return $ret_error; } + [ -z $branch ] && { echo "args branch not exist";return $ret_error; } + [ -z $commit_id ] && { echo "args commit_id not exist";return $ret_error; } + [ -z $base_code_subdir ] && { echo "args base_code_subdir not exist";return $ret_error; } + [ -z $result_code_dir ] && { echo "args result_code_dir not exist";return $ret_error; } + [ -z $patch_file_path ] && { echo "args patch_file_path not exist";return $ret_error; } + abs_patch_file_path=$(realpath "$patch_file_path") + cd $BUILD_TMP_PATH + get_base_code_by_git || { echo "warn git getcode failed"; return 1; } + cp $base_code_subdir -rf $BUILD_TMP_PATH/origin + cp $base_code_subdir -rf $BUILD_TMP_PATH/code + + if [ -f $abs_patch_file_path ];then + patch -p0 < $abs_patch_file_path || { echo "warn patch pfile failed"; return $ret_error; } + else + echo "can not find patch file: $abs_patch_file_path" + return $ret_error + fi + cd .. + [ ! -d $result_code_dir ] || rm -rf $result_code_dir + mkdir $result_code_dir + cp $BUILD_TMP_PATH/code/* -rf $result_code_dir/ + return $ret_ok +} + +main(){ + run_mode=$1 + BUILD_TMP_PATH=$CUR_PATH/buildtmp + [ ! -d $BUILD_TMP_PATH ] || rm -rf $BUILD_TMP_PATH + mkdir -p $BUILD_TMP_PATH + if [ "$run_mode" == "makepatch" ];then + make_patch_file || { echo "warn make patch failed"; return $ret_error; } + elif [ "$run_mode" == "applypatch" ];then + apply_changes_to_code || { echo "warn make patch failed"; return $ret_error; } + else + echo "null run mode" + return $ret_error + fi + return $ret_ok +} + +main "$@" +exit $? 
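补充一个贯通两种模式的流程示意脚本(其中的配置文件路径、执行目录均为假设值,仅作流程参考,并非patch_tool.sh的固定用法):先在已嵌入打点接口代码的环境中用makepatch生成差异文件,之后在需要还原负载代码的环境中用applypatch将差异导入原始代码。

```bash
#!/bin/bash
# 流程示意:假设 my_patch_config.sh 已按 patch_config_example.sh 的格式填写好各变量(路径为假设值)
export AISBENCH_TRAIN_PATCH_CONFIG_PATH=/home/user/my_patch_config.sh

# 步骤1(通常在已嵌入打点接口代码的机器上执行):生成 .patch 差异文件
bash tools/patch_tool/patch_tool.sh makepatch || { echo "makepatch failed"; exit 1; }

# 步骤2(通常在需要还原负载代码的机器上执行):用 .patch 文件重建嵌入打点接口的代码
bash tools/patch_tool/patch_tool.sh applypatch || { echo "applypatch failed"; exit 1; }
```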
-- Gitee From 8cf14ee63966463b225a16672eda8437ae6a0a16 Mon Sep 17 00:00:00 2001 From: Hanye Date: Tue, 19 Mar 2024 20:19:47 +0800 Subject: [PATCH 2/8] create inference workload framework --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 17aa62a..f73e042 100644 --- a/README.md +++ b/README.md @@ -4,15 +4,15 @@ | --- | --- | |被测试设备|参与AI服务器性能测试的设备| |Stubs通用包|被测试设备上运行的程序包,统一的性能测试启动入口,负责控制实际训练/推理程序的执行、与Tester、互联互通平台进行对接等| -|训练代码|能在被测试设备直接执行特定训练任务的代码| -|训练负载代码|能直接接入Stubs通用包,通过Stubs通用包启动特定训练任务的代码| +|推理代码|能在被测试设备直接执行特定推理任务的代码| +|推理负载代码|能直接接入Stubs通用包,通过Stubs通用包启动特定推理任务的代码| ## 介绍 AISBench推理负载仓库,包含了用于接入AISBench Stubs通用包的,来自各企业贡献者与个人贡献者贡献的推理负载代码。 -### 如何将我的训练代码接入AISBench Stubs通用包用于性能测试(如何将推理代码转换为推理负载代码)? +### 如何将我的推理代码接入AISBench Stubs通用包用于性能测试(如何将推理代码转换为推理负载代码)? 参考[Stubs被测试者接入使用文档](docs/DEVELOPER_ACCESS_DOC.md), **着重关注"业务代码接入Stubs"章节**。 -### 如何将我的训练负载代码贡献到本仓库? +### 如何将我的推理负载代码贡献到本仓库? #### 企业贡献者 -如果您是企业贡献者,请您在本仓库先建立企业目录(参考`nvidia/`和`huawei/`),在自己的企业目录中贡献训练负载代码。如果想要在仓库主页添加模型链接,参考"模型链接"章节`nvidia`和`huawei`的格式。 +如果您是企业贡献者,请您在本仓库先建立企业目录(参考`nvidia/`和`huawei/`),在自己的企业目录中贡献推理负载代码。如果想要在仓库主页添加模型链接,参考"模型链接"章节`nvidia`和`huawei`的格式。 #### 个人贡献者 如果您是企业贡献者,请将你的代码贡献在`personal/`目录下。 ## 模型链接 -- Gitee From 4cb4c3dc602f6b3991786b3408873e0ac99daf3c Mon Sep 17 00:00:00 2001 From: Hanye Date: Tue, 19 Mar 2024 20:27:28 +0800 Subject: [PATCH 3/8] create inference workload framework --- README.en.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.en.md b/README.en.md index b815785..0f3660c 100644 --- a/README.en.md +++ b/README.en.md @@ -1,4 +1,4 @@ -# training +# inference #### Description {**When you're done, you can delete the content in this README and update the file with details for others getting started with your repository**} -- Gitee From 83ad2e3cbec0dd738a45ced9c3674c9cb156939e Mon Sep 17 00:00:00 2001 From: Hanye Date: Mon, 6 May 2024 09:15:34 +0800 Subject: [PATCH 4/8] readme fix --- tools/infer_tool/MANIFEST.in | 2 + tools/infer_tool/README.md | 767 +++++++++++++++ tools/infer_tool/__init__.py | 16 + tools/infer_tool/ais_bench/__init__.py | 0 tools/infer_tool/ais_bench/__main__.py | 18 + tools/infer_tool/ais_bench/infer/__init__.py | 0 tools/infer_tool/ais_bench/infer/__main__.py | 281 ++++++ .../ais_bench/infer/args_adapter.py | 96 ++ .../infer_tool/ais_bench/infer/args_check.py | 194 ++++ .../ais_bench/infer/backends/__init__.py | 30 + .../ais_bench/infer/backends/backend.py | 123 +++ .../infer/backends/backend_trtexec.py | 154 +++ .../ais_bench/infer/common/__init__.py | 0 .../ais_bench/infer/common/io_operations.py | 339 +++++++ .../ais_bench/infer/common/miscellaneous.py | 276 ++++++ .../infer/common/path_security_check.py | 293 ++++++ .../ais_bench/infer/common/utils.py | 274 ++++++ .../ais_bench/infer/infer_process.py | 753 +++++++++++++++ tools/infer_tool/ais_bench/infer/interface.py | 889 ++++++++++++++++++ tools/infer_tool/ais_bench/infer/registry.py | 103 ++ tools/infer_tool/ais_bench/infer/summary.py | 229 +++++ tools/infer_tool/requirements.txt | 3 + tools/infer_tool/setup.py | 51 + 23 files changed, 4891 insertions(+) create mode 100644 tools/infer_tool/MANIFEST.in create mode 100644 tools/infer_tool/README.md create mode 100644 tools/infer_tool/__init__.py create mode 100644 tools/infer_tool/ais_bench/__init__.py create mode 100644 tools/infer_tool/ais_bench/__main__.py create mode 100644 tools/infer_tool/ais_bench/infer/__init__.py create mode 100644 tools/infer_tool/ais_bench/infer/__main__.py create mode 
100644 tools/infer_tool/ais_bench/infer/args_adapter.py create mode 100644 tools/infer_tool/ais_bench/infer/args_check.py create mode 100644 tools/infer_tool/ais_bench/infer/backends/__init__.py create mode 100644 tools/infer_tool/ais_bench/infer/backends/backend.py create mode 100644 tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py create mode 100644 tools/infer_tool/ais_bench/infer/common/__init__.py create mode 100644 tools/infer_tool/ais_bench/infer/common/io_operations.py create mode 100644 tools/infer_tool/ais_bench/infer/common/miscellaneous.py create mode 100644 tools/infer_tool/ais_bench/infer/common/path_security_check.py create mode 100644 tools/infer_tool/ais_bench/infer/common/utils.py create mode 100644 tools/infer_tool/ais_bench/infer/infer_process.py create mode 100644 tools/infer_tool/ais_bench/infer/interface.py create mode 100644 tools/infer_tool/ais_bench/infer/registry.py create mode 100644 tools/infer_tool/ais_bench/infer/summary.py create mode 100644 tools/infer_tool/requirements.txt create mode 100644 tools/infer_tool/setup.py diff --git a/tools/infer_tool/MANIFEST.in b/tools/infer_tool/MANIFEST.in new file mode 100644 index 0000000..5a239c8 --- /dev/null +++ b/tools/infer_tool/MANIFEST.in @@ -0,0 +1,2 @@ +include ais_bench/evaluate/dataset/download.sh +include ais_bench/evaluate/dataset/*.json \ No newline at end of file diff --git a/tools/infer_tool/README.md b/tools/infer_tool/README.md new file mode 100644 index 0000000..7546417 --- /dev/null +++ b/tools/infer_tool/README.md @@ -0,0 +1,767 @@ + + +# ais_bench推理工具使用指南 + +## 简介 +本文介绍ais_bench推理工具,用来针对指定的推理模型运行推理程序,并能够测试推理模型的性能(包括吞吐率、时延)。 + +## 工具安装 + +### 环境和依赖 + +- 目前ais_bench推理工具支持trtexec和aclruntime推理后端,使用本工具时确保安装这两个后端,且这两个后端可以正常运行。 +- 安装Python3、Python包模块numpy、tqdm、wheel。 + +### 工具安装方式 + +ais_bench推理工具的安装方式包括:一键式编译安装和源代码编译安装。 + +**说明**: + +- 安装过程中会自动检查和安装python包依赖,确保安装环境要求网络畅通。 +- centos平台默认为gcc 4.8编译器,可能无法安装本工具,建议更新gcc编译器后再安装。 + +#### 一键式编译安装 + 在安装环境执行如下命令安装ais_bench推理程序包: + + ```bash + pip3 install -v 'git+https://gitee.com/aisbench/inference.git#egg=ais_bench&subdirectory=tools/infer_tool/' + ``` + + 说明:若为覆盖安装,请增加“--force-reinstall”参数强制安装,例如: + + ```bash + pip3 install -v --force-reinstall 'git+https://gitee.com/aisbench/inference.git#egg=ais_bench&subdirectory=tools/infer_tool/' + ``` + + 提示如下示例信息则表示安装成功: + + ```bash + Successfully installed ais_bench-{version} + ``` + + + +#### 源代码编译安装 +1. 从代码开源仓[Gitee](git+https://gitee.com/aisbench/inference.git#egg=ais_bench&subdirectory=tools/infer_tool/)克隆/下载工具压缩包“inference-master.zip”。 + +2. 将工具压缩包上传并解压至安装环境。 + +3. 从工具解压目录下进入tools/infer_tool/目录下,执行如下命令进行编译: + + ```bash + # 进入工具解压目录 + cd ${HOME}/tools/infer_tool/ + # 构建ais_bench推理程序包 + pip3 wheel ./ -v + ``` + + 其中,${HOME}为ais_bench推理工具包所在目录。 + + 分别提示如下信息则表示编译成功: + + ```bash + # 成功编译ais_bench推理程序包 + Successfully built ais-bench + ``` + +4. 
执行如下命令,进行安装。 + + ```bash + # 安装ais_bench推理程序 + pip3 install ./ais_bench-{version}-py3-none-any.whl + ``` + + {version}表示软件版本号,{python_version}表示Python版本号,{arch}表示CPU架构。 + + 说明:若为覆盖安装,请增加“--force-reinstall”参数强制安装,例如: + + ```bash + pip3 install ./ais_bench-{version}-py3-none-any.whl --force-reinstall + ``` + + 分别提示如下信息则表示安装成功: + + ```bash + # 成功安装ais_bench推理程序 + Successfully installed ais_bench-{version} + ``` + +## 使用方法 + +### 工具介绍 +ais_bench推理工具的使用方法主要通过命令行使用。 +#### 使用入口 + +ais_bench推理工具可以通过ais_bench可执行文件方式启动模型测试。启动方式如下: + +```bash +python3 -m ais_bench --model *.om +``` +其中,*为OM离线模型文件名。 + +#### 参数说明 + +ais_bench推理工具可以通过配置不同的参数,来应对各种测试场景以及实现其他辅助功能。 + +参数按照功能类别分为**基础功能参数**和**高级功能参数**: + +- **基础功能参数**:主要包括输入输入文件及格式、debug、推理次数、预热次数、指定运行设备以及帮助信息等。 +- **高级功能参数**:主要包括动态分档场景和动态Shape场景的ais_bench推理测试参数以及profiler或dump数据获取等。 + +**说明**:以下参数中,参数和取值之间可以用“ ”空格分隔也可以用“=”等号分隔。例如:--debug 1或--debug=0。 + +##### 基础功能参数 + +| 参数名 | 说明 | 是否必选 | +| --------------------- | ------------------------------------------------------------ | -------- | +| --model | 需要进行推理的离线模型文件。 | 是 | +| --input | 模型需要的输入。可指定输入文件所在目录或直接指定输入文件。支持输入文件格式为“NPY”、“BIN”。可输入多个文件或目录,文件或目录之间用“,”隔开。具体输入文件请根据模型要求准备。 若不配置该参数,会自动构造输入数据,输入数据类型由--pure_data_type参数决定。 | 否 | +| --pure_data_type | 纯推理数据类型。取值为:“zero”、“random”,默认值为"zero"。 未配置模型输入文件时,工具自动构造输入数据。设置为zero时,构造全为0的纯推理数据;设置为random时,为每一个输入生成一组随机数据。 | 否 | +| --output | 推理结果保存目录。配置后会创建“日期+时间”的子目录,保存输出结果。如果指定output_dirname参数,输出结果将保存到子目录output_dirname下。不配置输出目录时,仅打印输出结果,不保存输出结果。 | 否 | +| --output_dirname | 推理结果保存子目录。设置该值时输出结果将保存到*output/output_dirname*目录下。 配合output参数使用,单独使用无效。 例如:--output */output* --output_dirname *output_dirname* | 否 | +| --outfmt | 输出数据的格式。取值为:“NPY”、“BIN”、“TXT”,默认为”BIN“。 配合output参数使用,单独使用无效。 例如:--output */output* --outfmt NPY。 | 否 | +| --debug | 调试开关。可打印model的desc信息和其他详细执行信息。1或true(开启)、0或false(关闭),默认关闭。 | 否 | +| --run_mode | 推理执行前的数据加载方式:可取值:array(将数据转换成host侧的ndarray,再调用推理接口推理),files(将文件直接加载进device内,再调用推理接口推理),tensor(将数据加载进device内,再调用推理接口推理),full(将数据转换成host侧的ndarray,再将ndarray格式数据加载进device内,再调用推理接口推理),默认为array。 | 否 | +| --display_all_summary | 是否显示所有的汇总信息,包含h2d和d2h信息。1或true(开启)、0或false(关闭),默认关闭。 | 否 | +| --loop | 推理次数。默认值为1,取值范围为大于0的正整数。 profiler参数配置为true时,推荐配置为1。 | 否 | +| --warmup_count | 推理预热次数。默认值为1,取值范围为大于等于0的整数。配置为0则表示不预热。 | 否 | +| --device | 指定运行设备。根据设备实际的Device ID指定,默认值为0。多Device场景下,可以同时指定多个Device进行推理测试,例如:--device 0,1,2,3。 | 否 | +| --divide_input | 输入数据集切分开关,1或true(开启)、0或false(关闭),默认关闭。多Device场景下,打开时,工具会将数据集平分给这些Device进行推理。| 否 | +| --help | 工具使用帮助信息。 | 否 | + +##### 高级功能参数 + +| 参数名 | 说明 | 是否必选 | +| ------------------------ | ------------------------------------------------------------ | -------- | +| --dymBatch | 动态Batch参数,指定模型输入的实际Batch。
如模型转换时,设置--input_shape="data:-1,600,600,3;img_info:-1,3" --dynamic_batch_size="1,2,4,8",dymBatch参数可设置为:--dymBatch 2。 | 否 | +| --dymHW | 动态分辨率参数,指定模型输入的实际H、W。
如模型转换时,设置--input_shape="data:8,3,-1,-1;img_info:8,4,-1,-1" --dynamic_image_size="300,500;600,800",dymHW参数可设置为:--dymHW 300,500。 | 否 | +| --dymDims | 动态维度参数,指定模型输入的实际Shape。
如模型转换时,设置 --input_shape="data:1,-1;img_info:1,-1" --dynamic_dims="224,224;600,600",dymDims参数可设置为:--dymDims "data:1,600;img_info:1,600"。 | 否 | +| --dymShape | 动态Shape参数,指定模型输入的实际Shape。
如ATC模型转换时,设置--input_shape_range="input1:\[8\~20,3,5,-1\];input2:\[5,3\~9,10,-1\]",dymShape参数可设置为:--dymShape "input1:8,3,5,10;input2:5,3,10,10"。
动态Shape场景下,获取模型的输出size通常为0(即输出数据占内存大小未知),建议设置--outputSize参数。
例如:--dymShape "input1:8,3,5,10;input2:5,3,10,10" --outputSize "10000,10000" | 否 | +| --dymShape_range | 动态Shape的阈值范围。如果设置该参数,那么将根据参数中所有的Shape列表进行依次推理,得到汇总推理信息。
配置格式为:name1:1,3,200\~224,224-230;name2:1,300。其中,name为模型输入名,“\~”表示范围,“-”表示某一位的取值。
也可以指定动态Shape的阈值范围配置文件*.info,该文件中记录动态Shape的阈值范围。 | 否 | +| --outputSize | 指定模型的输出数据所占内存大小,多个输出时,需要为每个输出设置一个值,多个值之间用“,”隔开。
动态Shape场景下,获取模型的输出size通常为0(即输出数据占内存大小未知),需要根据输入的Shape,预估一个较合适的大小,配置输出数据占内存大小。
例如:--dymShape "input1:8,3,5,10;input2:5,3,10,10" --outputSize "10000,10000" | 否 | +| --auto_set_dymdims_mode | 自动设置动态Dims模式。1或true(开启)、0或false(关闭),默认关闭。
针对动态档位Dims模型,根据输入文件的Shape信息,自动设置Shape参数。注意输入数据只能为npy文件,因为bin文件不能读取Shape信息。
配合input参数使用,单独使用无效。
例如:--input 1.npy --auto_set_dymdims_mode 1 | 否 | +| --auto_set_dymshape_mode | 自动设置动态Shape模式。取值为:1或true(开启)、0或false(关闭),默认关闭。
针对动态Shape模型,根据输入文件的Shape信息,自动设置Shape参数。注意输入数据只能为npy文件,因为bin文件不能读取Shape信息。
配合input参数使用,单独使用无效。
例如:--input 1.npy --auto_set_dymshape_mode 1 | 否 | +| --profiler | profiler开关。1或true(开启)、0或false(关闭),默认关闭。
profiler数据在--output参数指定的目录下的profiler文件夹内。配合--output参数使用,单独使用无效。不能与--dump同时开启。| 否 | +| --profiler_rename | 调用profiler落盘文件文件名修改开关,开启后落盘的文件名包含模型名称信息。1或true(开启)、0或false(关闭),默认开启。配合--profiler参数使用,单独使用无效。 |否| +| --dump | dump开关。1或true(开启)、0或false(关闭),默认关闭。
dump数据在--output参数指定的目录下的dump文件夹内。配合--output参数使用,单独使用无效。不能与--profiler同时开启。 | 否 | +| --acl_json_path | acl.json文件路径,须指定一个有效的json文件。该文件内可配置profiler或者dump。当配置该参数时,--dump和--profiler参数无效。 | 否 | +| --batchsize | 模型batchsize。不输入该值将自动推导。当前推理模块根据模型输入和文件输出自动进行组Batch。参数传递的batchszie有且只用于结果吞吐率计算。自动推导逻辑为尝试获取模型的batchsize时,首先获取第一个参数的最高维作为batchsize; 如果是动态Batch的话,更新为动态Batch的值;如果是动态dims和动态Shape更新为设置的第一个参数的最高维。如果自动推导逻辑不满足要求,请务必传入准确的batchsize值,以计算出正确的吞吐率。 | 否 | +| --output_batchsize_axis | 输出tensor的batchsize轴,默认值为0。输出结果保存文件时,根据哪个轴进行切割推理结果,比如batchsize为2,表示2个输入文件组batch进行推理,那输出结果的batch维度是在哪个轴。默认为0轴,按照0轴进行切割为2份,但是部分模型的输出batch为1轴,所以要设置该值为1。 | 否 | +| --aipp_config|带有动态aipp配置的om模型在推理前需要配置的AIPP具体参数,以.config文件路径形式传入。当om模型带有动态aipp配置时,此参数为必填参数;当om模型不带有动态aipp配置时,配置此参数不影响正常推理。|否| +| --backend|指定trtexec开关。需要指定为trtexec。配合--perf参数使用,单独使用无效。|否| +| --perf|调用trtexec开关。1或true(开启)、0或false(关闭),默认关闭。配合--backend参数使用,单独使用无效。|否| +| --energy_consumption |能耗采集开关。1或true(开启)、0或false(关闭),默认关闭。需要配合--npu_id参数使用,默认npu_id为0。|否| +| --npu_id |指定npu_id,默认值为0。需要通过npu-smi info命令获取指定device所对应的npu id。配合--energy_consumption参数使用,单独使用无效。|否| +| --pipeline |指定pipeline开关,用于开启多线程推理功能。1或true(开启)、0或false(关闭),默认关闭。|否| +| --dump_npy |指定dump_npy开关,用于开启dump结果自动转换功能。1或true(开启)、0或false(关闭),默认关闭。需要配合--output和--dump/--acl_json_path参数使用,单独使用无效。|否| +| --threads |指定threads开关,用于设置多计算线程推理时计算线程的数量。默认值为1,取值范围为大于0的正整数。需要配合--pipeline 1参数使用,单独使用无效。|否| + +### 使用场景 + + #### 纯推理场景 + +默认情况下,构造全为0的数据送入模型推理。 + +示例命令如下: + +```bash +python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --outfmt BIN --loop 5 +``` + +#### 调试模式 +开启debug调试模式。 + +示例命令如下: + +```bash +python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --debug 1 +``` + +调试模式开启后会增加更多的打印信息,包括: +- 模型的输入输出参数信息 + + ```bash + input: + #0 input_ids (1, 384) int32 1536 1536 + #1 input_mask (1, 384) int32 1536 1536 + #2 segment_ids (1, 384) int32 1536 1536 + output: + #0 logits:0 (1, 384, 2) float32 3072 3072 + ``` + +- 详细的推理耗时信息 + + ```bash + [DEBUG] model aclExec cost : 2.336000 + ``` +- 模型输入输出等具体操作信息 + + #### 文件输入场景 + +使用--input参数指定模型输入文件,多个文件之间通过“,”进行分隔。 + +本场景会根据文件输入size和模型实际输入size进行对比,若缺少数据则会自动构造数据补全,称为组Batch。 + +示例命令如下: + +```bash +python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --input "./1.bin,./2.bin,./3.bin,./4.bin,./5.bin" +``` + + #### 文件夹输入场景 + +使用input参数指定模型输入文件所在目录,多个目录之间通过“,”进行分隔。 + +本场景会根据文件输入size和模型实际输入size进行组Batch。 + +```bash +python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --input "./" +``` + +模型输入需要与传入文件夹的个数一致。 + +例如,bert模型有三个输入,则必须传入3个文件夹,且三个文件夹分别对应模型的三个输入,顺序要对应。 +模型输入参数的信息可以通过开启调试模式查看,bert模型的三个输入依次为input_ids、 input_mask、 segment_ids,所以依次传入三个文件夹: + +- 第一个文件夹“./data/SQuAD1.1/input_ids",对应模型第一个参数"input_ids"的输入 +- 第二个文件夹"./data/SQuAD1.1/input_mask",对应第二个输入"input_mask"的输入 +- 第三个文件夹"./data/SQuAD1.1/segment_ids",对应第三个输入"segment_ids"的输入 + +```bash +python3 -m ais_bench --model ./save/model/BERT_Base_SQuAD_BatchSize_1.om --input ./data/SQuAD1.1/input_ids,./data/SQuAD1.1/input_mask,./data/SQuAD1.1/segment_ids +``` + + + +#### 多Device场景 + +多Device场景下,可以同时指定多个Device进行推理测试。 + +示例命令如下: + +```bash +python3 -m ais_bench --model ./pth_resnet50_bs1.om --input ./data/ --device 1,2 +``` + +输出结果依次展示每个Device的推理测试结果,示例如下: + +```bash +[INFO] -----------------Performance Summary------------------ +[INFO] NPU_compute_time (ms): min = 2.4769999980926514, max = 3.937000036239624, mean = 3.5538000106811523, median = 3.7230000495910645, percentile(99%) = 3.936680030822754 +[INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(3.5538000106811523): 281.38893494131406 +[INFO] 
------------------------------------------------------ +[INFO] -----------------Performance Summary------------------ +[INFO] NPU_compute_time (ms): min = 3.3889999389648438, max = 3.9230000972747803, mean = 3.616000032424927, median = 3.555000066757202, percentile(99%) = 3.9134000968933105 +[INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(3.616000032424927): 276.54867008654026 +[INFO] ------------------------------------------------------ +[INFO] unload model success, model Id is 1 +[INFO] unload model success, model Id is 1 +[INFO] end to destroy context +[INFO] end to destroy context +[INFO] end to reset device is 2 +[INFO] end to reset device is 2 +[INFO] end to finalize acl +[INFO] end to finalize acl +[INFO] multidevice run end qsize:4 result:1 +i:0 device_1 throughput:281.38893494131406 start_time:1676875630.804429 end_time:1676875630.8303885 +i:1 device_2 throughput:276.54867008654026 start_time:1676875630.8043878 end_time:1676875630.8326817 +[INFO] summary throughput:557.9376050278543 +``` + +其中结果最后展示每个Device推理测试的throughput(吞吐率)、start_time(测试启动时间)、end_time(测试结束时间)以及summary throughput(吞吐率汇总)。其他详细字段解释请参见本手册的“输出结果”章节。 + + #### 动态分档场景 + +主要包含动态Batch、动态HW(宽高)、动态Dims三种场景,需要分别传入dymBatch、dymHW、dymDims指定实际档位信息。 + +##### 动态Batch + +以档位1 2 4 8档为例,设置档位为2,本程序将获取实际模型输入组Batch,每2个输入为一组,进行组Batch。 + +```bash +python3 -m ais_bench --model ./resnet50_v1_dynamicbatchsize_fp32.om --input=./data/ --dymBatch 2 +``` + +##### 动态HW宽高 + +以档位224,224;448,448档为例,设置档位为224,224,本程序将获取实际模型输入组Batch。 + +```bash +python3 -m ais_bench --model ./resnet50_v1_dynamichw_fp32.om --input=./data/ --dymHW 224,224 +``` + +##### 动态Dims + +以设置档位1,3,224,224为例,本程序将获取实际模型输入组Batch。 + +```bash +python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --input=./data/ --dymDims actual_input_1:1,3,224,224 +``` + +##### 自动设置Dims模式(动态Dims模型) + +动态Dims模型输入数据的Shape可能是不固定的,比如一个输入文件Shape为1,3,224,224,另一个输入文件Shape为 1,3,300,300。若两个文件同时推理,则需要设置两次动态Shape参数,当前不支持该操作。针对该场景,增加auto_set_dymdims_mode模式,可以根据输入文件的Shape信息,自动设置模型的Shape参数。 + +```bash +python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --input=./data/ --auto_set_dymdims_mode 1 +``` + + + +#### 动态Shape场景 + +##### 动态Shape + +以ATC设置[1\~8,3,200\~300,200\~300],设置档位1,3,224,224为例,本程序将获取实际模型输入组Batch。 + +动态Shape的输出大小通常为0,建议通过outputSize参数设置对应输出的内存大小。 + +```bash +python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --dymShape actual_input_1:1,3,224,224 --outputSize 10000 +``` + +##### 自动设置Shape模式(动态Shape模型) + +动态Shape模型输入数据的Shape可能是不固定的,比如一个输入文件Shape为1,3,224,224 另一个输入文件Shape为 1,3,300,300。若两个文件同时推理,则需要设置两次动态Shape参数,当前不支持该操作。针对该场景,增加auto_set_dymshape_mode模式,可以根据输入文件的Shape信息,自动设置模型的Shape参数。 + +```bash +python3 -m ais_bench --model ./pth_resnet50_dymshape.om --outputSize 100000 --auto_set_dymshape_mode 1 --input ./dymdata +``` + +**注意该场景下的输入文件必须为npy格式,如果是bin文件将获取不到真实的Shape信息。** + +##### 动态Shape模型range测试模式 + +输入动态Shape的range范围。对于该范围内的Shape分别进行推理,得出各自的性能指标。 + +以对1,3,224,224 1,3,224,225 1,3,224,226进行分别推理为例,命令如下: + +```bash +python3 -m ais_bench --model ./pth_resnet50_dymshape.om --outputSize 100000 --dymShape_range actual_input_1:1,3,224,224~226 +``` + +#### 动态AIPP场景 +- 动态AIPP的介绍参考[ATC模型转换](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/63RC1alpha002/download)中"6.1 AIPP使能"章节。 +- 目前benchmark工具只支持单个input的带有动态AIPP配置的模型,只支持静态shape、动态batch、动态宽高三种场景,不支持动态shape场景。 +##### --aipp_config 输入的.config文件模板 +以resnet18模型所对应的一种aipp具体配置为例(actual_aipp_conf.config): +```cfg +[aipp_op] + input_format : RGB888_U8 + src_image_size_w : 256 + src_image_size_h : 256 + + crop : 1 + 
load_start_pos_h : 16 + load_start_pos_w : 16 + crop_size_w : 224 + crop_size_h : 224 + + padding : 0 + csc_switch : 0 + rbuv_swap_switch : 0 + ax_swap_switch : 0 + csc_switch : 0 + + min_chn_0 : 123.675 + min_chn_1 : 116.28 + min_chn_2 : 103.53 + var_reci_chn_0 : 0.0171247538316637 + var_reci_chn_1 : 0.0175070028011204 + var_reci_chn_2 : 0.0174291938997821 +``` +- .config文件`[aipp_op]`下的各字段名称及其取值范围参考[ATC模型转换](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/63RC1alpha002/download)中"6.1.9 配置文件模板"章节中"静态AIPP需设置,动态AIPP无需设置"部分,其中字段取值为为true、false的字段,在.config文件中取值对应为1、0。 +- .config文件`[aipp_op]`下的`input_format`、`src_image_size_w`、`src_image_size_h`字段是必填字段。 +- .config文件中字段的具体取值是否适配对应的模型,benchmark本身不会检测,在推理时acl接口报错不属于benchmark的问题 +##### 静态shape场景示例,以resnet18模型为例 +###### atc命令转换出带动态aipp配置的静态shape模型 +``` +atc --framework=5 --model=./resnet18.onnx --output=resnet18_bs4_dym_aipp --input_format=NCHW --input_shape="image:4,3,224,224" --soc_version=Ascend310 --insert_op_conf=dym_aipp_conf.aippconfig --enable_small_channel=1 +``` +- dym_aipp_conf.aippconfig的内容(下同)为: +``` +aipp_op{ + related_input_rank : 0 + aipp_mode : dynamic + max_src_image_size : 4000000 +} +``` +###### benchmark命令 +``` +python3 -m ais_bench --model resnet18_bs4_dym_aipp.om --aipp_config actual_aipp_conf.config +``` +##### 动态batch场景示例,以resnet18模型为例 +###### atc命令转换出带动态aipp配置的动态batch模型 +``` +atc --framework=5 --model=./resnet18.onnx --output=resnet18_dym_batch_aipp --input_format=NCHW --input_shape="image:-1,3,224,224" --dynamic_batch_size "1,2" --soc_version=Ascend310 --insert_op_conf=dym_aipp_conf.aippconfig --enable_small_channel=1 +``` +###### benchmark命令 +``` +python3 -m ais_bench --model resnet18_dym_batch_aipp.om --aipp_config actual_aipp_conf.config --dymBatch 1 +``` +##### 动态宽高场景示例,以resnet18模型为例 +###### atc命令转换出带动态aipp配置的动态宽高模型 +``` +atc --framework=5 --model=./resnet18.onnx --output=resnet18_dym_image_aipp --input_format=NCHW --input_shape="image:4,3,-1,-1" --dynamic_image_size "112,112;224,224" --soc_version=Ascend310 --insert_op_conf=dym_aipp_conf.aippconfig --enable_small_channel=1 +``` +###### benchmark命令 +``` +python3 -m ais_bench --model resnet18_dym_image_aipp.om --aipp_config actual_aipp_conf.config --dymHW 112,112 +``` + +#### trtexec场景 + +ais_bench支持onnx模型推理(集成trtexec),trtexec为NVIDIA TensorRT自带工具。用户使用ais_bench拉起trtexec工具进行推理性能测试,测试过程中实时输出trtexec日志,打印在控制台,推理性能测试完成后,将性能数据输出在控制台。 +##### 前置条件 +推理性能测试环境需要配置有GPU,安装CANN、CUDA及TensorRT,并且trtexec可以通过命令行调用到,安装方式可参考[TensorRT](https://github.com/NVIDIA/TensorRT)。 + +示例命令如下: + +```bash +python3 -m ais_bench --model pth_resnet50.onnx --backend trtexec --perf 1 +``` + +输出结果推理测试结果,示例如下: + +```bash +[INFO] [05/27/2023-12:05:31] [I] === Performance summary === +[INFO] [05/27/2023-12:05:31] [I] Throughput: 120.699 qps +[INFO] [05/27/2023-12:05:31] [I] Latency: min = 9.11414 ms, max = 11.7442 ms, mean = 9.81005 ms, median = 9.76404 ms, percentile(90%) = 10.1075 ms, percentile(95%) = 10.1624 ms, percentile(99%) = 11.4742 ms +[INFO] [05/27/2023-12:05:31] [I] Enqueue Time: min = 0.516296 ms, max = 0.598633 ms, mean = 0.531443 ms, median = 0.5271 ms, percentile(90%) = 0.546875 ms, percentile(95%) = 0.564575 ms, percentile(99%) = 0.580566 ms +[INFO] [05/27/2023-12:05:31] [I] H2D Latency: min = 1.55066 ms, max = 1.57336 ms, mean = 1.55492 ms, median = 1.55444 ms, percentile(90%) = 1.55664 ms, percentile(95%) = 1.55835 ms, percentile(99%) = 1.56458 ms +[INFO] [05/27/2023-12:05:31] [I] GPU Compute Time: min = 7.54407 ms, max = 10.1723 ms, mean = 8.23978 ms, median = 8.19409 ms, 
percentile(90%) = 8.5354 ms, percentile(95%) = 8.59131 ms, percentile(99%) = 9.90002 ms +[INFO] [05/27/2023-12:05:31] [I] D2H Latency: min = 0.0130615 ms, max = 0.0170898 ms, mean = 0.015342 ms, median = 0.0153809 ms, percentile(90%) = 0.0162354 ms, percentile(95%) = 0.0163574 ms, percentile(99%) = 0.0168457 ms +[INFO] [05/27/2023-12:05:31] [I] Total Host Walltime: 3.02405 s +[INFO] [05/27/2023-12:05:31] [I] Total GPU Compute Time: 3.00752 s +``` + +**字段说明** + +| 字段 | 说明 | +| --------------------- | ------------------------------------------------------------ | +| Throughput | 吞吐率。 | +| Latency | H2D 延迟、GPU 计算时间和 D2H 延迟的总和。这是推断单个执行的延迟。。 | +| min | 推理执行时间最小值。 | +| max | 推理执行时间最大值。 | +| mean | 推理执行时间平均值。 | +| median | 推理执行时间取中位数。 | +| percentile(99%) | 推理执行时间中的百分位数。 | +| H2D Latency | 单个执行的输入张量的主机到设备数据传输的延迟。 | +| GPU Compute Time | 为执行 CUDA 内核的 GPU 延迟。 | +| D2H Latency | 单个执行的输出张量的设备到主机数据传输的延迟。 | +| Total Host Walltime | 从第一个执行(预热后)入队到最后一个执行完成的主机时间。 | +| Total GPU Compute Time| 所有执行的 GPU 计算时间的总和。 | + +#### profiler或dump场景 + +支持以--acl_json_path、--profiler、--dump参数形式实现: ++ acl_json_path参数指定acl.json文件,可以在该文件中对应的profiler或dump参数。示例代码如下: + + + profiler + + ```bash + { + "profiler": { + "switch": "on", + "output": "./result/profiler" + } + } + ``` + + 更多性能参数配置请依据CANN包种类(商用版或社区版)分别参见《[CANN 商用版:开发工具指南/性能数据采集(acl.json配置文件方式)](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/devtools/auxiliarydevtool/atlasprofiling_16_0086.html)》和《[CANN 社区版:开发工具指南/性能数据采集(acl.json配置文件方式)](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/63RC1alpha002/developmenttools/devtool/atlasprofiling_16_0086.html)》中的参数配置详细描述 + + + dump + + ```bash + { + "dump": { + "dump_list": [ + { + "model_name": "{model_name}" + } + ], + "dump_mode": "output", + "dump_path": "./result/dump" + } + } + ``` + + 更多dump配置请参见《[CANN 开发工具指南](https://www.hiascend.com/document/detail/zh/canncommercial/60RC1/devtools/auxiliarydevtool/auxiliarydevtool_0002.html)》中的“精度比对工具>比对数据准备>推理场景数据准备>准备离线模型dump数据文件”章节。 + +- 通过该方式进行profiler采集时,如果配置了环境变量`export AIT_NO_MSPROF_MODE=1`,输出的性能数据文件需要参见《[CANN 开发工具指南/数据解析与导出/Profiling数据导出](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/devtools/auxiliarydevtool/atlasprofiling_16_0100.html)》,将性能数据解析并导出为可视化的timeline和summary文件。 +- 通过该方式进行profiler采集时,如果**没有**配置环境变量`AIT_NO_MSPROF_MODE=1`,benchmark会将acl.json中与profiler相关的参数解析成msprof命令,调用msprof采集性能数据,结果默认带有可视化的timeline和summary文件,msprof输出的文件含义参考[性能数据采集(msprof命令行方式)](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/devtools/auxiliarydevtool/atlasprofiling_16_0040.html)。 +- 如果acl.json文件中同时配置了profiler和dump参数,需要要配置环境变量`export AIT_NO_MSPROF_MODE=1`保证同时采集 + ++ profiler为固化到程序中的一组性能数据采集配置,生成的性能数据保存在--output参数指定的目录下的profiler文件夹内。 + + 该参数是通过调用ais_bench/infer/__main__.py中的msprof_run_profiling函数来拉起msprof命令进行性能数据采集的。若需要修改性能数据采集参数,可根据实际情况修改msprof_run_profiling函数中的msprof_cmd参数。示例如下: + + ```bash + msprof_cmd="{} --output={}/profiler --application=\"{}\" --model-execution=on --sys-hardware-mem=on --sys-cpu-profiling=off --sys-profiling=off --sys-pid-profiling=off --dvpp-profiling=on --runtime-api=on --task-time=on --aicpu=on".format( + msprof_bin, args.output, cmd) + ``` + + 该方式进行性能数据采集时,首先检查是否存在msprof命令: + + - 若命令存在,则使用该命令进行性能数据采集、解析并导出为可视化的timeline和summary文件。 + - 若命令不存在,则msprof层面会报错,benchmark层面不检查命令内容合法性。 + - 若环境配置了AIT_NO_MSPROF_MODE=1,则使用--profiler参数采集性能数据时调用的是acl.json文件。 + + msprof命令不存在或环境配置了AIT_NO_MSPROF_MODE=1情况下,采集的性能数据文件未自动解析,需要参见《[CANN 
开发工具指南](https://www.hiascend.com/document/detail/zh/canncommercial/60RC1/devtools/auxiliarydevtool/auxiliarydevtool_0002.html)》中的“性能分析工具>高级功能>数据解析与导出”章节,将性能数据解析并导出为可视化的timeline和summary文件。 + + 更多性能数据采集参数介绍请参见《[CANN 开发工具指南](https://www.hiascend.com/document/detail/zh/canncommercial/60RC1/devtools/auxiliarydevtool/auxiliarydevtool_0002.html)》中的“性能分析工具>高级功能>性能数据采集(msprof命令行方式)”章节。 + ++ acl_json_path优先级高于profiler和dump,同时设置时以acl_json_path为准。 + ++ profiler参数和dump参数,必须要增加output参数,指示输出路径。 + ++ profiler和dump可以分别使用,但不能同时启用。 + +示例命令如下: + +```bash +python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --acl_json_path ./acl.json +python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --dump 1 +python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --profiler 1 +``` + + #### 输出结果文件保存场景 + +默认情况下,ais_bench推理工具执行后不保存输出结果数据文件,配置相关参数后,可生成的结果数据如下: + +| 文件/目录 | 说明 | +| ---------------------------------------- | ------------------------------------------------------------ | +| {文件名}.bin、{文件名}.npy或{文件名}.txt | 模型推理输出结果文件。
文件命名格式:名称_输出序号.后缀。不指定input时(纯推理),名称固定为“pure_infer_data”;指定input时,名称以第一个输入的第一个名称命名;输出的序号从0开始按输出先后顺序排列;文件名后缀由--outfmt参数控制。
默认情况下,会在--output参数指定的目录下创建“日期+时间”的目录,并将结果文件保存在该目录下;当指定了--output_dirname时,结果文件将直接保存在--output_dirname参数指定的目录下。
指定--output_dirname参数时,多次执行工具推理会导致结果文件因同名而覆盖。 | +| xx_summary.json | 工具输出模型性能结果数据。默认情况下,“xx”以“日期+时间”命名;当指定了--output_dirname时,“xx”以--output_dirname指定的目录名称命名。
指定--output_dirname参数时,多次执行工具推理会导致结果文件因同名而覆盖。 | +| dump | dump数据文件目录。使用--dump开启dump时,在--output参数指定的目录下创建dump目录,保存dump数据文件。 | +| profiler | Profiler采集性能数据文件目录。使用--profiler开启性能数据采集时,在--output参数指定的目录下创建profiler目录,保存性能数据文件。 | + +- 仅设置--output参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result + ``` + + ```bash + result + |-- 2022_12_17-07_37_18 + │   `-- pure_infer_data_0.bin + `-- 2022_12_17-07_37_18_summary.json + ``` + +- 设置--input和--output参数。示例命令及结果如下: + + ```bash + # 输入的input文件夹内容如下 + ls ./data + 196608-0.bin 196608-1.bin 196608-2.bin 196608-3.bin 196608-4.bin 196608-5.bin 196608-6.bin 196608-7.bin 196608-8.bin 196608-9.bin + ``` + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --input ./data --output ./result + ``` + + ```bash + result/ + |-- 2023_01_03-06_35_53 + | |-- 196608-0_0.bin + | |-- 196608-1_0.bin + | |-- 196608-2_0.bin + | |-- 196608-3_0.bin + | |-- 196608-4_0.bin + | |-- 196608-5_0.bin + | |-- 196608-6_0.bin + | |-- 196608-7_0.bin + | |-- 196608-8_0.bin + | `-- 196608-9_0.bin + `-- 2023_01_03-06_35_53_summary.json + ``` + +- 设置--output_dirname参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --output_dirname subdir + ``` + + ```bash + result + |-- subdir + │   `-- pure_infer_data_0.bin + `-- subdir_summary.json + ``` + +- 设置--dump参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --dump 1 + ``` + + ```bash + result + |-- 2022_12_17-07_37_18 + │   `-- pure_infer_data_0.bin + |-- dump + `-- 2022_12_17-07_37_18_summary.json + ``` + +- 设置--profiler参数。示例命令及结果如下: + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --profiler 1 + ``` + + ```bash + result + |-- 2022_12_17-07_56_10 + │   `-- pure_infer_data_0.bin + |-- profiler + │   `-- PROF_000001_20221217075609326_GLKQJOGROQGOLIIB + `-- 2022_12_17-07_56_10_summary.json + ``` + +#### 多线程推理场景 + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --pipeline 1 + ``` + 在单线程推理的命令行基础上加上--pipeline 1即可开启多线程推理模式,实现计算-搬运的并行,加快端到端推理速度。 + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --pipeline 1 --threads 2 + ``` + 在多线程推理的命令行基础上加上--threads {$number of threads},即可开启多计算线程推理模式,实现计算-计算的并行,提高推理吞吐量。 + +#### dump数据自动转换场景 + + ```bash + python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --dump 1 --dump_npy 1 + ``` + 在dump场景上加上--dump_npy 1开启自动转换dump数据模式, 需要配合--dump或者--acl_json_path参数。 + + 转换后dump目录 + + ```bash + result/ + |-- 2023_01_03-06_35_53/ + |-- 2023_01_03-06_35_53_summary.json + `-- dump/ + |--20230103063551/ + |--20230103063551_npy/ + ``` + + +### 输出结果 + +ais_bench推理工具执行后,打屏输出结果示例如下: + +- display_all_summary=False时,打印如下: + + ```bash + [INFO] -----------------Performance Summary------------------ + [INFO] NPU_compute_time (ms): min = 0.6610000133514404, max = 0.6610000133514404, mean = 0.6610000133514404, median = 0.6610000133514404, percentile(99%) = 0.6610000133514404 + [INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(0.6610000133514404): 1512.8592735267011 + [INFO] ------------------------------------------------------ + ``` + +- display_all_summary=True时,打印如下: + + ```bash + [INFO] -----------------Performance Summary------------------ + [INFO] H2D_latency (ms): min = 0.05700000002980232, max = 0.05700000002980232, mean = 0.05700000002980232, median = 0.05700000002980232, percentile(99%) = 0.05700000002980232 + [INFO] NPU_compute_time (ms): min = 0.6650000214576721, max = 0.6650000214576721, mean = 
0.6650000214576721, median = 0.6650000214576721, percentile(99%) = 0.6650000214576721 + [INFO] D2H_latency (ms): min = 0.014999999664723873, max = 0.014999999664723873, mean = 0.014999999664723873, median = 0.014999999664723873, percentile(99%) = 0.014999999664723873 + [INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(0.6650000214576721): 1503.759349974173 + ``` + +通过输出结果可以查看模型执行耗时、吞吐率。耗时越小、吞吐率越高,则表示该模型性能越高。 + +**字段说明** + +| 字段 | 说明 | +| --------------------- | ------------------------------------------------------------ | +| H2D_latency (ms) | Host to Device的内存拷贝耗时。单位为ms。 | +| min | 推理执行时间最小值。 | +| max | 推理执行时间最大值。 | +| mean | 推理执行时间平均值。 | +| median | 推理执行时间取中位数。 | +| percentile(99%) | 推理执行时间中的百分位数。 | +| NPU_compute_time (ms) | NPU推理计算的时间。单位为ms。 | +| D2H_latency (ms) | Device to Host的内存拷贝耗时。单位为ms。 | +| throughput | 吞吐率。吞吐率计算公式:1000 *batchsize/npu_compute_time.mean | +| batchsize | 批大小。本工具不一定能准确识别当前样本的batchsize,建议通过--batchsize参数进行设置。 | + +## 扩展功能 + +### 接口开放 + +开放ais_bench推理工具inferface推理Python接口。 +接口文档参考[API使用说明](API_GUIDE.md) + +动态Shape推理: + +```bash +def infer_dymshape(): + device_id = 0 + session = InferSession(device_id, model_path) + ndata = np.zeros([1,3,224,224], dtype=np.float32) + + mode = "dymshape" + outputs = session.infer([ndata], mode, custom_sizes=100000) + print("outputs:{} type:{}".format(outputs, type(outputs))) + print("dymshape infer avg:{} ms".format(np.mean(session.sumary().exec_time_list))) +``` + +多线程推理: + +使用多线程推理接口时需要注意内存的使用情况,传入的input和预计output总和内存需要小于可用内存,否则程序将会异常退出。 + +```python +def infer_pipeline(): + device_id = 0 + session = InferSession(device_id, model_path) + + barray = bytearray(session.get_inputs()[0].realsize) + ndata = np.frombuffer(barray) + + outputs = session.infer([[ndata]]) + print("outputs:{} type:{}".format(outputs, type(outputs))) + + print("static infer avg:{} ms".format(np.mean(session.sumary().exec_time_list))) +``` + + +### 推理异常保存文件功能 + +当出现推理异常时,会写入算子执行失败的输入输出文件到**当前目录**下。同时会打印出当前的算子执行信息。利于定位分析。示例如下: + +```bash +python3 -m ais_bench --model ./test/testdata/bert/model/pth_bert_bs1.om --input ./random_in0.bin,random_in1.bin,random_in2.bin +``` + +```bash +[INFO] acl init success +[INFO] open device 0 success +[INFO] load model ./test/testdata/bert/model/pth_bert_bs1.om success +[INFO] create model description success +[INFO] get filesperbatch files0 size:1536 tensor0size:1536 filesperbatch:1 runcount:1 +[INFO] exception_cb streamId:103 taskId:10 deviceId: 0 opName:bert/embeddings/GatherV2 inputCnt:3 outputCnt:1 +[INFO] exception_cb hostaddr:0x124040800000 devaddr:0x12400ac48800 len:46881792 write to filename:exception_cb_index_0_input_0_format_2_dtype_1_shape_30522x768.bin +[INFO] exception_cb hostaddr:0x124040751000 devaddr:0x1240801f6000 len:1536 write to filename:exception_cb_index_0_input_1_format_2_dtype_3_shape_384.bin +[INFO] exception_cb hostaddr:0x124040752000 devaddr:0x12400d98e400 len:4 write to filename:exception_cb_index_0_input_2_format_2_dtype_3_shape_.bin +[INFO] exception_cb hostaddr:0x124040753000 devaddr:0x12400db20400 len:589824 write to filename:exception_cb_index_0_output_0_format_2_dtype_1_shape_384x768.bin +EZ9999: Inner Error! 
+EZ9999 The error from device(2), serial number is 17, there is an aicore error, core id is 0, error code = 0x800000, dump info: pc start: 0x800124080041000, current: 0x124080041100, vec error info: 0x1ff1d3ae, mte error info: 0x3022733, ifu error info: 0x7d1f3266f700, ccu error info: 0xd510fef0003608cf, cube error info: 0xfc, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124080017040, errorStr: The DDR address of the MTE instruction is out of range.[FUNC:PrintCoreErrorInfo] + +# ls exception_cb_index_0_* -lh +-rw-r--r-- 1 root root 45M Jan 7 08:17 exception_cb_index_0_input_0_format_2_dtype_1_shape_30522x768.bin +-rw-r--r-- 1 root root 1.5K Jan 7 08:17 exception_cb_index_0_input_1_format_2_dtype_3_shape_384.bin +-rw-r--r-- 1 root root 4 Jan 7 08:17 exception_cb_index_0_input_2_format_2_dtype_3_shape_.bin +-rw-r--r-- 1 root root 576K Jan 7 08:17 exception_cb_index_0_output_0_format_2_dtype_1_shape_384x768.bin +``` +如果有需要将生成的异常bin文件转换为npy文件,请使用[转换脚本convert_exception_cb_bin_to_npy.py](https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench/test/convert_exception_cb_bin_to_npy.py). +使用方法:python3 convert_exception_cb_bin_to_npy.py --input {bin_file_path}。支持输入bin文件或文件夹。 + + +## FAQ +[FAQ](FAQ.md) diff --git a/tools/infer_tool/__init__.py b/tools/infer_tool/__init__.py new file mode 100644 index 0000000..86c3446 --- /dev/null +++ b/tools/infer_tool/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from components.utils.parser import load_command_instance + +benchmark_cmd = load_command_instance('benchmark_sub_task') \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/__init__.py b/tools/infer_tool/ais_bench/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tools/infer_tool/ais_bench/__main__.py b/tools/infer_tool/ais_bench/__main__.py new file mode 100644 index 0000000..123cffc --- /dev/null +++ b/tools/infer_tool/ais_bench/__main__.py @@ -0,0 +1,18 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
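+# 说明:本入口通过 exec 加载并执行同目录下 infer/__main__.py 的内容,
+# 使 `python3 -m ais_bench` 直接进入推理主流程。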
+ + +import os +cur_path = os.path.dirname(os.path.realpath(__file__)) +exec(open(os.path.join(cur_path, "infer/__main__.py")).read()) \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/__init__.py b/tools/infer_tool/ais_bench/infer/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tools/infer_tool/ais_bench/infer/__main__.py b/tools/infer_tool/ais_bench/infer/__main__.py new file mode 100644 index 0000000..2359a58 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/__main__.py @@ -0,0 +1,281 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import argparse +import os +import re +from ais_bench.infer.infer_process import infer_process +from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter +from ais_bench.infer.args_check import ( + check_dym_string, check_dym_range_string, check_number_list, str2bool, check_positive_integer, + check_batchsize_valid, check_nonnegative_integer, check_device_range_valid, check_om_path_legality, + check_input_path_legality, check_output_path_legality, check_acl_json_path_legality, + check_aipp_config_path_legality +) + + +def get_args(): + parser = argparse.ArgumentParser() + parser.add_argument( + "--model", + "-m", + type=check_om_path_legality, + required=True, + help="The path of the om model" + ) + parser.add_argument( + "--input", + "-i", + type=check_input_path_legality, + default=None, + help="Input file or dir" + ) + parser.add_argument( + "--output", + "-o", + type=check_output_path_legality, + default=None, + help="Inference data output path. The inference results are output to \ + the subdirectory named current date under given output path" + ) + parser.add_argument( + "--output_dirname", + type=check_output_path_legality, + default=None, + help="Actual output directory name. \ + Used with parameter output, cannot be used alone. \ + The inference result is output to subdirectory named by output_dirname \ + under output path. such as --output_dirname 'tmp', \ + the final inference results are output to the folder of {$output}/tmp" + ) + parser.add_argument( + "--outfmt", + default="BIN", + choices=["NPY", "BIN", "TXT"], + help="Output file format (NPY or BIN or TXT)" + ) + parser.add_argument( + "--loop", + "-l", + type=check_positive_integer, + default=1, + help="The round of the PureInfer." 
+ ) + parser.add_argument( + "--debug", + type=str2bool, + default=False, + help="Debug switch,print model information" + ) + parser.add_argument( + "--device", + "-d", + type=check_device_range_valid, + default=0, + help="The NPU device ID to use.valid value range is [0, 255]" + ) + parser.add_argument( + "--dymBatch", + dest="dym_batch", + type=check_positive_integer, + default=0, + help="Dynamic batch size param,such as --dymBatch 2" + ) + parser.add_argument( + "--dymHW", + dest="dym_hw", + type=check_dym_string, + default=None, + help="Dynamic image size param, such as --dymHW \"300,500\"" + ) + parser.add_argument( + "--dymDims", + dest="dym_dims", + type=check_dym_string, + default=None, + help="Dynamic dims param, such as --dymDims \"data:1,600;img_info:1,600\"" + ) + parser.add_argument( + "--dymShape", + "--dym-shape", + dest="dym_shape", + type=check_dym_string, + default=None, + help="Dynamic shape param, such as --dymShape \"data:1,600;img_info:1,600\"" + ) + parser.add_argument( + "--outputSize", + dest="output_size", + type=check_number_list, + default=None, + help="Output size for dynamic shape mode" + ) + parser.add_argument( + "--auto_set_dymshape_mode", + type=str2bool, + default=False, + help="Auto_set_dymshape_mode" + ) + parser.add_argument( + "--auto_set_dymdims_mode", + type=str2bool, + default=False, + help="Auto_set_dymdims_mode" + ) + parser.add_argument( + "--batchsize", + type=check_batchsize_valid, + default=None, + help="Batch size of input tensor" + ) + parser.add_argument( + "--pure_data_type", + type=str, + default="zero", + choices=["zero", "random"], + help="Null data type for pure inference(zero or random)" + ) + parser.add_argument( + "--profiler", + type=str2bool, + default=False, + help="Profiler switch" + ) + parser.add_argument( + "--dump", + type=str2bool, + default=False, + help="Dump switch" + ) + parser.add_argument( + "--acl_json_path", + type=check_acl_json_path_legality, + default=None, + help="Acl json path for profiling or dump" + ) + parser.add_argument( + "--output_batchsize_axis", + type=check_nonnegative_integer, + default=0, + help="Splitting axis number when outputing tensor results, such as --output_batchsize_axis 1" + ) + parser.add_argument( + "--run_mode", + type=str, + default="array", + choices=["array", "files", "tensor", "full"], + help="Run mode" + ) + parser.add_argument( + "--display_all_summary", + type=str2bool, + default=False, + help="Display all summary include h2d d2h info" + ) + parser.add_argument( + "--warmup_count", + "--warmup-count", + type=check_nonnegative_integer, + default=1, + help="Warmup count before inference" + ) + parser.add_argument( + "--dymShape_range", + dest="dym_shape_range", + type=check_dym_range_string, + default=None, + help="Dynamic shape range, such as --dymShape_range \"data:1,600~700;img_info:1,600-700\"" + ) + parser.add_argument( + "--aipp_config", + type=check_aipp_config_path_legality, + default=None, + help="File type: .config, to set actual aipp params before infer" + ) + parser.add_argument( + "--energy_consumption", + type=str2bool, + default=False, + help="Obtain power consumption data for model inference" + ) + parser.add_argument( + "--npu_id", + type=check_nonnegative_integer, + default=0, + help="The NPU ID to use.valid value range is [0, 255]" + ) + parser.add_argument( + "--backend", + type=str, + default=None, + choices=["trtexec"], + help="Backend trtexec" + ) + parser.add_argument( + "--perf", + type=str2bool, + default=False, + help="Perf switch" + ) + 
parser.add_argument( + "--pipeline", + type=str2bool, + default=False, + help="Pipeline switch" + ) + parser.add_argument( + "--profiler_rename", + type=str2bool, + default=True, + help="Profiler rename switch" + ) + parser.add_argument( + "--dump_npy", + type=str2bool, + default=False, + help="dump data convert to npy" + ) + parser.add_argument( + "--divide_input", + type=str2bool, + default=False, + help="Input datas need to be divided to match multi devices or not, \ + --device should be list, default False" + ) + parser.add_argument( + '--threads', + dest='threads', + type=check_positive_integer, + default=1, + help="Number of threads for computing. \ + need to set --pipeline when setting threads number to be more than one." + ) + benchmark_args = parser.parse_args() + + return benchmark_args + + +if __name__ == "__main__": + args = get_args() + + args = AISBenchInferArgsAdapter(args.model, args.input, args.output, + args.output_dirname, args.outfmt, args.loop, args.debug, args.device, + args.dym_batch, args.dym_hw, args.dym_dims, args.dym_shape, args.output_size, + args.auto_set_dymshape_mode, args.auto_set_dymdims_mode, args.batchsize, args.pure_data_type, + args.profiler, args.dump, args.acl_json_path, args.output_batchsize_axis, args.run_mode, + args.display_all_summary, args.warmup_count, args.dym_shape_range, args.aipp_config, + args.energy_consumption, args.npu_id, args.backend, args.perf, args.pipeline, args.profiler_rename, + args.dump_npy, args.divide_input, args.threads) + ret = infer_process(args) + exit(ret) diff --git a/tools/infer_tool/ais_bench/infer/args_adapter.py b/tools/infer_tool/ais_bench/infer/args_adapter.py new file mode 100644 index 0000000..a7c24d9 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/args_adapter.py @@ -0,0 +1,96 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
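The adapter class defined just below carries the parsed options as plain attributes, in the same positional order the `__main__` block above uses to construct it. A minimal sketch of driving the benchmark programmatically instead of via the CLI (the model, input, and output paths are hypothetical placeholders; the remaining values are the CLI defaults listed above):

```python
# Sketch only: build the adapter by hand and run the same flow as `python3 -m ais_bench`.
from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter
from ais_bench.infer.infer_process import infer_process

args = AISBenchInferArgsAdapter(
    "models/sample_bs1.om",            # --model (placeholder path)
    "datasets/input_bins/",            # --input (placeholder path)
    "./output",                        # --output (placeholder path)
    None, "BIN", 1, False, 0,          # output_dirname, outfmt, loop, debug, device
    0, None, None, None, None,         # dymBatch, dymHW, dymDims, dymShape, outputSize
    False, False, None, "zero",        # auto_set_dymshape_mode, auto_set_dymdims_mode, batchsize, pure_data_type
    False, False, None, 0, "array",    # profiler, dump, acl_json_path, output_batchsize_axis, run_mode
    False, 1, None, None,              # display_all_summary, warmup_count, dymShape_range, aipp_config
    False, 0, None, False, False,      # energy_consumption, npu_id, backend, perf, pipeline
    True, False, False, 1,             # profiler_rename, dump_npy, divide_input, threads
)
ret = infer_process(args)              # returns an exit code, exactly as in __main__
```

Because the constructor is purely positional, the argument order must match `__main__` exactly; `get_all_args_dict` below rebuilds the equivalent command-line flags from these attributes (used when the tool re-invokes itself, e.g. for `--dymShape_range` sweeps).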
+ +class AISBenchInferArgsAdapter(): + def __init__(self, model, input_path, output, output_dirname, outfmt, loop, + debug, device, dym_batch, dym_hw, dym_dims, + dym_shape, output_size, auto_set_dymshape_mode, + auto_set_dymdims_mode, batchsize, pure_data_type, + profiler, dump, acl_json_path, output_batchsize_axis, + run_mode, display_all_summary, warmup_count, dym_shape_range, aipp_config, + energy_consumption, npu_id, backend, perf, pipeline, profiler_rename, + dump_npy, divide_input, threads): + self.model = model + self.input = input_path + self.output = output + self.output_dirname = output_dirname + self.outfmt = outfmt + self.loop = loop + self.debug = debug + self.device = device + self.dym_batch = dym_batch + self.dym_hw = dym_hw + self.dym_dims = dym_dims + self.dym_shape = dym_shape + self.output_size = output_size + self.auto_set_dymshape_mode = auto_set_dymshape_mode + self.auto_set_dymdims_mode = auto_set_dymdims_mode + self.batchsize = batchsize + self.pure_data_type = pure_data_type + self.profiler = profiler + self.dump = dump + self.acl_json_path = acl_json_path + self.output_batchsize_axis = output_batchsize_axis + self.run_mode = run_mode + self.display_all_summary = display_all_summary + self.warmup_count = warmup_count + self.dym_shape_range = dym_shape_range + self.aipp_config = aipp_config + self.energy_consumption = energy_consumption + self.npu_id = npu_id + self.backend = backend + self.perf = perf + self.pipeline = pipeline + self.profiler_rename = profiler_rename + self.dump_npy = dump_npy + self.divide_input = divide_input + self.threads = threads + + def get_all_args_dict(self): + args_dict = {} + args_dict.update({'--model':self.model}) + args_dict.update({'--input':self.input}) + args_dict.update({'--output':self.output}) + args_dict.update({'--output_dirname':self.output_dirname}) + args_dict.update({'--outfmt':self.outfmt}) + args_dict.update({'--loop':self.loop}) + args_dict.update({'--debug':self.debug}) + args_dict.update({'--device':self.device}) + args_dict.update({'--dymBatch':self.dym_batch}) + args_dict.update({'--dymHW':self.dym_hw}) + args_dict.update({'--dymDims':self.dym_dims}) + args_dict.update({'--dymShape':self.dym_shape}) + args_dict.update({'--outputSize':self.output_size}) + args_dict.update({'--auto_set_dymshape_mode':self.auto_set_dymshape_mode}) + args_dict.update({'--auto_set_dymdims_mode':self.auto_set_dymdims_mode}) + args_dict.update({'--batchsize':self.batchsize}) + args_dict.update({'--pure_data_type':self.pure_data_type}) + args_dict.update({'--profiler':self.profiler}) + args_dict.update({'--dump':self.dump}) + args_dict.update({'--acl_json_path':self.acl_json_path}) + args_dict.update({'--output_batchsize_axis':self.output_batchsize_axis}) + args_dict.update({'--run_mode':self.run_mode}) + args_dict.update({'--display_all_summary':self.display_all_summary}) + args_dict.update({'--warmup_count':self.warmup_count}) + args_dict.update({'--dymShape_range':self.dym_shape_range}) + args_dict.update({'--aipp_config':self.aipp_config}) + args_dict.update({'--energy_consumption':self.energy_consumption}) + args_dict.update({'--npu_id':self.npu_id}) + args_dict.update({'--perf':self.perf}) + args_dict.update({'--pipeline':self.pipeline}) + args_dict.update({'--profiler_rename':self.profiler_rename}) + args_dict.update({'--dump_npy':self.dump_npy}) + args_dict.update({'--divide_input':self.divide_input}) + args_dict.update({'--threads':self.threads}) + return args_dict \ No newline at end of file diff --git 
a/tools/infer_tool/ais_bench/infer/args_check.py b/tools/infer_tool/ais_bench/infer/args_check.py new file mode 100644 index 0000000..1093fa4 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/args_check.py @@ -0,0 +1,194 @@ +import os +import re +import argparse +from ais_bench.infer.common.path_security_check import FileStat + +OM_MODEL_MAX_SIZE = 10 * 1024 * 1024 * 1024 # 10GB +ACL_JSON_MAX_SIZE = 8 * 1024 # 8KB +AIPP_CONFIG_MAX_SIZE = 12.5 * 1024 # 12.5KB + + +def check_dym_string(value): + if not value: + return value + dym_string = value + regex = re.compile(r"[^_A-Za-z0-9,;:/.-]") + if regex.search(dym_string): + raise argparse.ArgumentTypeError(f"dym string \"{dym_string}\" is not a legal string") + return dym_string + + +def check_dym_range_string(value): + if not value: + return value + dym_string = value + regex = re.compile(r"[^_A-Za-z0-9,;:/.\-~]") + if regex.search(dym_string): + raise argparse.ArgumentTypeError(f"dym range string \"{dym_string}\" is not a legal string") + return dym_string + + +def check_number_list(value): + if not value: + return value + number_list = value + regex = re.compile(r"[^0-9,;]") + if regex.search(number_list): + raise argparse.ArgumentTypeError(f"number_list \"{number_list}\" is not a legal list") + return number_list + + +def str2bool(v): + if isinstance(v, bool): + return v + if v.lower() in ('yes', 'true', 't', 'y', '1'): + return True + elif v.lower() in ('no', 'false', 'f', 'n', '0'): + return False + else: + raise argparse.ArgumentTypeError('Boolean value expected true, 1, false, 0 with case insensitive.') + + +def check_positive_integer(value): + ivalue = int(value) + if ivalue <= 0: + raise argparse.ArgumentTypeError("%s is an invalid positive int value" % value) + return ivalue + + +def check_batchsize_valid(value): + # default value is None + if value is None: + return value + # input value no None + else: + return check_positive_integer(value) + + +def check_nonnegative_integer(value): + ivalue = int(value) + if ivalue < 0: + raise argparse.ArgumentTypeError("%s is an invalid nonnegative int value" % value) + return ivalue + + +def check_npu_id_range_vaild(value): + # if contain , split to int list + min_value = 0 + max_value = 2048 + if ',' in value: + ilist = [int(v) for v in value.split(',')] + for ivalue in ilist: + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("{} of npu_id:{} is invalid. valid value range is [{}, {}]".format( + ivalue, value, min_value, max_value)) + return ilist + else: + # default as single int value + ivalue = int(value) + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("npu_id:{} is invalid. valid value range is [{}, {}]".format( + ivalue, min_value, max_value)) + return ivalue + + +def check_device_range_valid(value): + # if contain , split to int list + min_value = 0 + max_value = 255 + try: + # Check if the value contains a comma; if so, split into a list of integers + if ',' in value: + ilist = [int(v) for v in value.split(',')] + for ivalue in ilist: + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("{} of device:{} is invalid. valid value range is [{}, {}]".format( + ivalue, value, min_value, max_value)) + return ilist + else: + # default as single int value + ivalue = int(value) + if ivalue < min_value or ivalue > max_value: + raise argparse.ArgumentTypeError("device:{} is invalid. 
valid value range is [{}, {}]".format( + ivalue, min_value, max_value)) + return ivalue + except ValueError: + raise argparse.ArgumentTypeError("Argument npu-id invalid input value: {}. " + "Please provide a valid integer or a comma-separated list of integers.".format(value)) + + + +def check_om_path_legality(value): + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_type(["om"]): + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_size(OM_MODEL_MAX_SIZE): + raise argparse.ArgumentTypeError(f"om path:{path_value} is illegal. Please check.") + return path_value + + +def check_input_path_legality(value): + if not value: + return value + inputs_list = value.split(',') + for input_path in inputs_list: + try: + file_stat = FileStat(input_path) + except Exception as err: + raise argparse.ArgumentTypeError(f"input path:{input_path} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"input path:{input_path} is illegal. Please check.") + return value + + +def check_output_path_legality(value): + if not value: + return value + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"weight path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal("write"): + raise argparse.ArgumentTypeError(f"output path:{path_value} is illegal. Please check.") + return path_value + + +def check_acl_json_path_legality(value): + if not value: + return value + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_type(["json"]): + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_size(ACL_JSON_MAX_SIZE): + raise argparse.ArgumentTypeError(f"acl json path:{path_value} is illegal. Please check.") + return path_value + + +def check_aipp_config_path_legality(value): + if not value: + return value + path_value = value + try: + file_stat = FileStat(path_value) + except Exception as err: + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. Please check.") from err + if not file_stat.is_basically_legal('read'): + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_type(["config"]): + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. Please check.") + if not file_stat.is_legal_file_size(AIPP_CONFIG_MAX_SIZE): + raise argparse.ArgumentTypeError(f"aipp config path:{path_value} is illegal. 
Please check.") + return path_value \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/backends/__init__.py b/tools/infer_tool/ais_bench/infer/backends/__init__.py new file mode 100644 index 0000000..e78cd1b --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/backends/__init__.py @@ -0,0 +1,30 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os + +from ais_bench.infer import registry + +BACKEND_REGISTRY = registry.Registry("BACKEND_REGISTRY") + +registry.import_all_modules_for_register( + os.path.dirname(os.path.abspath(__file__)), "ais_bench.infer.backends" +) + + +class BackendFactory: + @staticmethod + def create_backend(name): + return BACKEND_REGISTRY[name] \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/backends/backend.py b/tools/infer_tool/ais_bench/infer/backends/backend.py new file mode 100644 index 0000000..88476f0 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/backends/backend.py @@ -0,0 +1,123 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from __future__ import annotations + +from abc import ABC, abstractmethod +from typing import List, Any, Iterable, Union + +import attrs + + +@attrs.define +class AccuracyResult: + output: Any = None + label: Any = None + prediction: Any = None + + +@attrs.define +class PerformanceStats: + min: float = None + max: float = None + mean: float = None + median: float = None + percentile: float = None + + +@attrs.define +class PerformanceResult: + h2d_latency: PerformanceStats = None + compute_time: PerformanceStats = None + d2h_latency: PerformanceStats = None + host_wall_time: float = None + throughput: float = None + + +@attrs.define +class InferenceTrace: + h2d_start: float = None + h2d_end: float = None + compute_start: float = None + compute_end: float = None + d2h_start: float = None + d2h_end: float = None + + +class Backend(ABC): + """ + Backend interface + """ + + @property + @abstractmethod + def name(self) -> str: + """ + Each of the subclasses must implement this. + This is called to return the name of backend. 
+ """ + + @property + def model_extension(self) -> str: + return "model" + + def initialize(self) -> bool: + """ + init the resource of backend + """ + return True + + def finalize(self) -> None: + """ + release the resource of backend + """ + pass + + @abstractmethod + def load(self, model_path: str) -> Backend: + """ + Each of the subclases must implement this. + This is called to load a model. + """ + + @abstractmethod + def warm_up(self, dataloader: Iterable, iterations: int = 100) -> None: + """ + Each of the subclases must implement this. + This is called to warmup. + """ + + @abstractmethod + def predict( + self, dataloader: Iterable + ) -> Union[List[AccuracyResult], None]: + """ + Each of the subclasses must implement this. + This is called to inference a model + """ + + @abstractmethod + def build(self) -> None: + """ + Each of the subclasses must implement this. + This is called to build a model + """ + + @abstractmethod + def get_perf(self) -> PerformanceResult: + """ + Each of the subclasses must implement this. + This is called to get the performance of the model inference. + """ \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py b/tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py new file mode 100644 index 0000000..d1d9236 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/backends/backend_trtexec.py @@ -0,0 +1,154 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
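Before the concrete trtexec backend that follows, here is an illustrative sketch (the `dummy` name and its trivial behaviour are invented, not part of the patch) of how a backend implements the abstract `Backend` interface above and registers itself with `BACKEND_REGISTRY` from `backends/__init__.py`:

```python
# Hypothetical example: a minimal backend registered the same way as "trtexec".
from typing import Iterable, List

from ais_bench.infer.backends import backend, BACKEND_REGISTRY, BackendFactory


@BACKEND_REGISTRY.register("dummy")
class BackendDummy(backend.Backend):
    @property
    def name(self) -> str:
        return "dummy"

    def load(self, model_path: str) -> "BackendDummy":
        self.model_path = model_path        # no real engine to load in this sketch
        return self

    def warm_up(self, dataloader: Iterable, iterations: int = 100) -> None:
        pass                                # nothing to warm up

    def predict(self, dataloader: Iterable) -> List[backend.AccuracyResult]:
        # echo each sample back as its own "prediction"
        return [backend.AccuracyResult(output=x, prediction=x) for x in dataloader]

    def build(self) -> None:
        pass                                # no engine build step

    def get_perf(self) -> backend.PerformanceResult:
        return backend.PerformanceResult()  # empty stats


# Resolution goes through the registry, presumably the same path used for --backend trtexec:
runner = BackendFactory.create_backend("dummy")()
```

The factory simply indexes the registry, so registration at import time (triggered by `import_all_modules_for_register` in `backends/__init__.py`) is what makes a backend name resolvable.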
+ + +from __future__ import annotations + +import os +import sys +import logging +import subprocess +import re +from typing import Iterable, List, Dict, Any + +from ais_bench.infer.backends import backend, BACKEND_REGISTRY +from ais_bench.infer.backends.backend import AccuracyResult, PerformanceStats, PerformanceResult, InferenceTrace +from ais_bench.infer.common.utils import logger + + +class TrtexecConfig(object): + def __init__(self): + self.iterations = None + self.warmup = None + self.duration = None + self.batch = None + self.device = None + + +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + + +@BACKEND_REGISTRY.register("trtexec") +class BackendTRTExec(backend.Backend): + def __init__(self, config: Any = None) -> None: + super(BackendTRTExec, self).__init__() + self.config = TrtexecConfig() + self.convert_config(config) + self.model_path = "" + self.output_log = "" + self.trace = InferenceTrace() + + @property + def name(self) -> str: + return "trtexec" + + @property + def model_extension(self) -> str: + return "plan" + + def convert_config(self, config): + if config.loop is not None: + self.config.iterations = config.loop + if config.warmup_count is not None: + self.config.warmup_count = config.warmup_count + if config.batchsize is not None: + self.config.batch = config.batchsize + if config.device is not None: + self.config.device = config.device + + def load( + self, model_path: str, inputs: list = None, outputs: list = None + ) -> BackendTRTExec: + if os.path.exists(model_path): + logger.info("Load engine from file {}".format(model_path)) + self.model_path = model_path + else: + raise Exception("{} not exit".format(model_path)) + return self + + def parse_perf(self, data: List) -> PerformanceStats: + stats = PerformanceStats() + stats.min = float(data[0]) + stats.max = float(data[1]) + stats.mean = float(data[2]) + stats.median = float(data[3]) + stats.percentile = float(data[4]) + return stats + + def parse_log(self, log: str) -> PerformanceResult: + performance = PerformanceResult() + log_list = log.splitlines() + pattern_1 = re.compile(r"(?<=: )\d+\.?\d*") + pattern_2 = re.compile(r"(?<== )\d+\.?\d*") + for line in log_list: + if "Throughput" in line: + throughput = pattern_1.findall(line) + performance.throughput = float(throughput[0]) + elif "H2D Latency" in line: + h2d_latency = pattern_2.findall(line) + performance.h2d_latency = self.parse_perf(h2d_latency) + elif "GPU Compute Time: min" in line: + compute_time = pattern_2.findall(line) + performance.compute_time = self.parse_perf(compute_time) + elif "D2H Latency" in line: + d2h_latency = pattern_2.findall(line) + performance.d2h_latency = self.parse_perf(d2h_latency) + elif "Total Host Walltime" in line: + total_host_time = pattern_1.findall(line) + performance.host_wall_time = float(total_host_time[0]) + return performance + + def warm_up(self, dataloader: Iterable, iterations: int = 100) -> None: + pass + + def predict(self, dataloader: Iterable) -> List[AccuracyResult]: + pass + + def build(self) -> None: + pass + + def get_perf(self) -> PerformanceResult: + return self.parse_log(self.output_log) + + def run(self): + command = [ + "trtexec", + f"--onnx={self.model_path}", + f"--fp16", + ] + if self.config.duration is not None: + command.append(f"--duration={self.config.duration}") + if self.config.device is not None: + command.append(f"--device={self.config.device}") + if self.config.iterations is not None: + 
command.append(f"--iterations={self.config.iterations}") + if self.config.warmup is not None: + command.append(f"--warmUp={self.config.warmup}") + if self.config.batch is not None: + command.append(f"--batch={self.config.batch}") + + logger.info("Trtexec Build command: " + " ".join(command)) + process = subprocess.Popen( + command, stdout=subprocess.PIPE, shell=False + ) + + while process.poll() is None: + line = process.stdout.readline() + self.output_log += line.decode() + line = line.strip() + if line: + logger.info(line.decode()) + + return [] \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/common/__init__.py b/tools/infer_tool/ais_bench/infer/common/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tools/infer_tool/ais_bench/infer/common/io_operations.py b/tools/infer_tool/ais_bench/infer/common/io_operations.py new file mode 100644 index 0000000..f0e2be5 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/io_operations.py @@ -0,0 +1,339 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import os +import random +import time +import numpy as np + +from ais_bench.infer.summary import summary +from ais_bench.infer.common.utils import ( + get_file_content, + get_file_datasize, + get_fileslist_from_dir, + list_split, + logger, + save_data_to_files, +) + +PURE_INFER_FAKE_FILE = "pure_infer_data" +PURE_INFER_FAKE_FILE_ZERO = "pure_infer_data_zero" +PURE_INFER_FAKE_FILE_RANDOM = "pure_infer_data_random" +PADDING_INFER_FAKE_FILE = "padding_infer_fake_file" + + +def convert_real_files(files): + real_files = [] + for file in files: + if file == PURE_INFER_FAKE_FILE: + raise RuntimeError("not support pure infer") + elif file.endswith(".npy") or file.endswith(".NPY"): + raise RuntimeError("not support npy file:{}".format(file)) + elif file == PADDING_INFER_FAKE_FILE: + real_files.append(files[0]) + else: + real_files.append(file) + return real_files + + +def get_pure_infer_data(size, pure_data_type): + lst = [] + if pure_data_type == "random": + # random value from [0, 255] + lst = [random.randrange(0, 256) for _ in range(size)] + else: + # zero value, default + lst = [0 for _ in range(size)] + + barray = bytearray(lst) + ndata = np.frombuffer(barray, dtype=np.uint8) + return ndata + + +# get numpy array from files list combile all files +def get_narray_from_files_list(files_list, size, pure_data_type, no_combine_tensor_mode=False): + ndatalist = [] + file_path_switch = { + PURE_INFER_FAKE_FILE: pure_data_type, + PURE_INFER_FAKE_FILE_ZERO: "zero", + PURE_INFER_FAKE_FILE_RANDOM: "random", + } + for i, file_path in enumerate(files_list): + logger.debug("get tensor from filepath:{} i:{} of all:{}".format(file_path, i, len(files_list))) + if file_path_switch.get(file_path) is not None: + ndata = get_pure_infer_data(size, file_path_switch.get(file_path)) + elif file_path == PADDING_INFER_FAKE_FILE: + logger.debug("padding file use fileslist[0]:{}".format(files_list[0])) + ndata = 
get_file_content(files_list[0]) + elif file_path is None or not os.path.exists(file_path): + logger.error('filepath:{} not valid'.format(file_path)) + raise RuntimeError() + else: + ndata = get_file_content(file_path) + ndatalist.append(ndata) + if len(ndatalist) == 1: + return ndatalist[0] + else: + ndata = np.concatenate(ndatalist) + if not no_combine_tensor_mode and ndata.nbytes != size: + logger.error('ndata size:{} not match {}'.format(ndata.nbytes, size)) + raise RuntimeError() + return ndata + + +# get tensors from files list combile all files +def get_tensor_from_files_list(files_list, session, size, pure_data_type, no_combine_tensor_mode=False): + ndata = get_narray_from_files_list(files_list, size, pure_data_type, no_combine_tensor_mode) + tensor = session.create_tensor_from_arrays_to_device(ndata) + return tensor + + +# Obtain filesperbatch runcount information according to file information and input description information +# The strategy is as follows: Judge according to the realsize and file size of input 0. If the judgment fails, +# you need to force the desired value to be set +def get_files_count_per_batch(intensors_desc, fileslist, no_combine_tensor_mode=False): + # get filesperbatch + filesize = get_file_datasize(fileslist[0][0]) + tensorsize = intensors_desc[0].realsize + if no_combine_tensor_mode: + files_count_per_batch = 1 + else: + if filesize == 0 or tensorsize % filesize != 0: + logger.error('arg0 tensorsize: {} filesize: {} not match'.format(tensorsize, filesize)) + raise RuntimeError() + else: + files_count_per_batch = (int)(tensorsize / filesize) + if files_count_per_batch == 0: + logger.error('files count per batch is zero') + raise RuntimeError() + runcount = math.ceil(len(fileslist[0]) / files_count_per_batch) + + logger.info( + "get filesperbatch files0 size:{} tensor0size:{} filesperbatch:{} runcount:{}".format( + filesize, tensorsize, files_count_per_batch, runcount + ) + ) + return files_count_per_batch, runcount + + +# Obtain tensor information and files information according to the input filelist. Create intensor form files list +# len(files_list) should equal len(intensors_desc) +def create_infileslist_from_fileslist(fileslist, intensors_desc, no_combine_tensor_mode=False): + if len(intensors_desc) != len(fileslist): + logger.error('fileslist:{} intensor:{} not match'.format(len(fileslist), len(intensors_desc))) + raise RuntimeError() + files_count_per_batch, runcount = get_files_count_per_batch(intensors_desc, fileslist, no_combine_tensor_mode) + + files_perbatch_list = [ + list(list_split(fileslist[j], files_count_per_batch, PADDING_INFER_FAKE_FILE)) + for j in range(len(intensors_desc)) + ] + + infileslist = [] + for i in range(runcount): + infiles = [] + for j in range(len(intensors_desc)): + logger.debug( + "create infileslist i:{} j:{} runcount:{} lists:{} filesPerPatch:{}".format( + i, j, runcount, files_perbatch_list[j][i], files_count_per_batch + ) + ) + infiles.append(files_perbatch_list[j][i]) + infileslist.append(infiles) + return infileslist + + +# outapi. Obtain tensor information and files information according to the input filelist. 
+# Create intensor form files list +def create_intensors_from_infileslist( + infileslist, intensors_desc, session, pure_data_type, no_combine_tensor_mode=False +): + intensorslist = [] + for infiles in infileslist: + intensors = [] + for files, intensor_desc in zip(infiles, intensors_desc): + tensor = get_tensor_from_files_list( + files, session, intensor_desc.realsize, pure_data_type, no_combine_tensor_mode + ) + intensors.append(tensor) + intensorslist.append(intensors) + return intensorslist + + +def check_input_parameter(inputs_list, intensors_desc): + if len(inputs_list) == 0: + logger.error("Invalid args. Input args are empty") + raise RuntimeError() + if os.path.isfile(inputs_list[0]): + for index, file_path in enumerate(inputs_list): + realpath = os.readlink(file_path) if os.path.islink(file_path) else file_path + if not os.path.isfile(realpath): + logger.error( + "Invalid input args.--input:{} input[{}]:{} {} not exist".format( + inputs_list, index, file_path, realpath + ) + ) + raise RuntimeError() + elif os.path.isdir(inputs_list[0]): + if len(inputs_list) != len(intensors_desc): + logger.error( + "Invalid args. args input dir num:{0} not equal to model inputs num:{1}".format( + len(inputs_list), len(intensors_desc) + ) + ) + raise RuntimeError() + + for dir_path in inputs_list: + real_dir_path = os.readlink(dir_path) if os.path.islink(dir_path) else dir_path + if not os.path.isdir(real_dir_path): + logger.error("Invalid args. {} of input args is not a real dir path".format(real_dir_path)) + raise RuntimeError() + else: + logger.error("Invalid args. {} of --input is invalid".format(inputs_list[0])) + raise RuntimeError() + + +# outapi. get by input parameters of inputs_List. +def create_infileslist_from_inputs_list(inputs_list, intensors_desc, no_combine_tensor_mode=False): + check_input_parameter(inputs_list, intensors_desc) + fileslist = [] + inputlistcount = len(inputs_list) + intensorcount = len(intensors_desc) + if os.path.isfile(inputs_list[0]): + chunks = inputlistcount // intensorcount + fileslist = list(list_split(inputs_list, chunks, PADDING_INFER_FAKE_FILE)) + logger.debug( + "create intensors list file type inlistcount:{} intensorcont:{} chunks:{} files_size:{}".format( + inputlistcount, intensorcount, chunks, len(fileslist) + ) + ) + elif os.path.isdir(inputs_list[0]) and inputlistcount == intensorcount: + fileslist = [get_fileslist_from_dir(dir) for dir in inputs_list] + logger.debug( + "create intensors list dictionary type inlistcount:{} intensorcont:{} files_size:{}".format( + inputlistcount, intensorcount, len(fileslist) + ) + ) + else: + logger.error( + 'create intensors list filelists:{} intensorcont:{} error create'.format(inputlistcount, intensorcount) + ) + raise RuntimeError() + + infileslist = create_infileslist_from_fileslist(fileslist, intensors_desc, no_combine_tensor_mode) + if len(infileslist) == 0: + logger.error('create_infileslist_from_fileslist return infileslist size: {}'.format(len(infileslist))) + raise RuntimeError() + + return infileslist + + +def check_pipeline_fileslist_match_intensors(fileslist, intensors_desc): + # check intensor amount matched + if len(intensors_desc) != len(fileslist): + logger.error('fileslist:{} intensor:{} not match'.format(len(fileslist), len(intensors_desc))) + raise RuntimeError() + # check intensor size matched + for i, files in enumerate(fileslist): + filesize = get_file_datasize(files[0]) + tensorsize = intensors_desc[i].realsize + auto_mode = False + # auto_dim_mode & auto_shape_mode are exceptional cases + 
if intensors_desc[i].realsize == intensors_desc[i].size: + if any(dim <= 0 for dim in intensors_desc[i].shape): + auto_mode = True + if filesize != tensorsize and not auto_mode: + logger.error(f'tensor_num:{i} tensorsize:{tensorsize} filesize:{filesize} not match') + raise RuntimeError() + + +# 不组batch的情况 +def create_pipeline_fileslist_from_inputs_list(inputs_list, intensors_desc): + check_input_parameter(inputs_list, intensors_desc) + fileslist = [] + inputlistcount = len(inputs_list) + intensorcount = len(intensors_desc) + if os.path.isfile(inputs_list[0]): + chunks = inputlistcount // intensorcount + fileslist = list(list_split(inputs_list, chunks, PADDING_INFER_FAKE_FILE)) + logger.debug( + f"create intensors list file type inlistcount:{inputlistcount} \ + intensorcont:{intensorcount} chunks:{chunks} files_size:{len(fileslist)}" + ) + elif os.path.isdir(inputs_list[0]) and inputlistcount == intensorcount: + fileslist = [get_fileslist_from_dir(dir_) for dir_ in inputs_list] + logger.debug( + f"create intensors list dictionary type inlistcount:{inputlistcount} \ + intensorcont:{intensorcount} files_size:{len(fileslist)}" + ) + else: + logger.error('create intensors list filelists:{inputlistcount} intensorcont:{intensorcount} error create') + raise RuntimeError() + try: + check_pipeline_fileslist_match_intensors(fileslist, intensors_desc) + except Exception as err: + logger.error("fileslist and intensors not matched") + raise RuntimeError from err + infileslist = list(zip(*fileslist)) + return infileslist + + +def save_tensors_to_file(outputs, output_prefix, infiles_paths, outfmt, index, output_batchsize_axis): + files_count_perbatch = len(infiles_paths[0]) + infiles_perbatch = np.transpose(infiles_paths) + for i, out in enumerate(outputs): + ndata = np.array(out) + if output_batchsize_axis >= len(ndata.shape): + logger.error( + "error i:{0} ndata.shape:{1} len:{2} <= output_batchsize_axis:{3} is invalid".format( + i, ndata.shape, len(ndata.shape), output_batchsize_axis + ) + ) + raise RuntimeError() + if files_count_perbatch == 1 or ndata.shape[output_batchsize_axis] % files_count_perbatch == 0: + subdata = np.array_split(ndata, files_count_perbatch, output_batchsize_axis) + for j in range(files_count_perbatch): + sample_id = index * files_count_perbatch + j + if infiles_perbatch[j][0] == PADDING_INFER_FAKE_FILE: + logger.debug( + "sampleid:{} i:{} infiles:{} is padding fake file so continue".format( + sample_id, i, infiles_perbatch[j] + ) + ) + continue + file_path = os.path.join( + output_prefix, + "{}_{}.{}".format(os.path.basename(infiles_perbatch[j][0]).split('.')[0], i, outfmt.lower()), + ) + summary.add_sample_id_infiles(sample_id, infiles_perbatch[j]) + logger.debug( + "save func: sampleid:{} i:{} infiles:{} outfile:{} fmt:{} axis:{}".format( + sample_id, i, infiles_perbatch[j], file_path, outfmt, output_batchsize_axis + ) + ) + summary.append_sample_id_outfile(sample_id, file_path) + save_data_to_files(file_path, subdata[j]) + else: + logger.error( + 'save out files error array shape:{} filesinfo:{} files_count_perbatch:{} ndata.shape\ + {}:{}'.format( + ndata.shape, + infiles_paths, + files_count_perbatch, + output_batchsize_axis, + ndata.shape[output_batchsize_axis], + ) + ) + raise RuntimeError() diff --git a/tools/infer_tool/ais_bench/infer/common/miscellaneous.py b/tools/infer_tool/ais_bench/infer/common/miscellaneous.py new file mode 100644 index 0000000..21ce01c --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/miscellaneous.py @@ -0,0 +1,276 @@ +# Copyright (c) 
2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import os +import sys +import stat +import subprocess +import json +import itertools +import numpy as np + +from ais_bench.infer.common.utils import logger +from ais_bench.infer.common.path_security_check import ms_open, MAX_SIZE_LIMITE_CONFIG_FILE, MAX_SIZE_LIMITE_NORMAL_FILE +from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter + +PERMISSION_DIR = 0o750 + +ACL_JSON_CMD_LIST = [ + "output", + "storage_limit", + "ascendcl", + "runtime_api", + "hccl", + "task_time", + "aicpu", + "aic_metrics", + "l2", + "sys_hardware_mem_freq", + "lcc_profiling", + "dvpp_freq", + "host_sys", + "host_sys_usage", + "host_sys_usage_freq", + "sys_interconnection_freq", + "msproftx", +] + + +def get_modules_version(name): + try: + import pkg_resources + except ImportError as err: + raise Exception("importerror") from err + pkg = pkg_resources.get_distribution(name) + return pkg.version + + +def version_check(args): + try: + aclruntime_version = get_modules_version('aclruntime') + except Exception: + url = 'https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench' + logger.warning(f"can't find aclruntime, please visit {url} to install ais_bench(benchmark)" + "to install") + args.run_mode = "tensor" + if aclruntime_version != "0.0.2": + logger.warning( + f"aclruntime{aclruntime_version} version is lower please update \ + aclruntime follow any one method" + ) + # set old run mode to run ok + args.run_mode = "tensor" + + +def get_model_name(model): + path_list = model.split('/') + return path_list[-1][:-3] + + +def check_valid_acl_json_for_dump(acl_json_path, model): + with ms_open(acl_json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + acl_json_dict = json.load(f) + model_name_correct = get_model_name(model) + if acl_json_dict.get("dump") is not None: + # check validity of dump_list (model_name) + dump_list_val = acl_json_dict["dump"].get("dump_list") + if dump_list_val is not None: + if dump_list_val == [] or dump_list_val[0].get("model_name") != model_name_correct: + logger.warning( + "dump failed, 'model_name' is not set or set incorrectly. 
correct" + "'model_name' should be {}".format(model_name_correct) + ) + else: + logger.warning("dump failed, acl.json need to set 'dump_list' attribute") + # check validity of dump_path + dump_path_val = acl_json_dict["dump"].get("dump_path") + if dump_path_val is not None: + if os.path.isdir(dump_path_val) and os.access(dump_path_val, os.R_OK) and os.access(dump_path_val, os.W_OK): + pass + else: + logger.warning("dump failed, 'dump_path' not exists or has no read/write permission") + else: + logger.warning("dump failed, acl.json need to set 'dump_path' attribute") + # check validity of dump_op_switch + dump_op_switch_val = acl_json_dict["dump"].get("dump_op_switch") + if dump_op_switch_val is not None and dump_op_switch_val not in {"on", "off"}: + logger.warning("dump failed, 'dump_op_switch' need to be set as 'on' or 'off'") + # check validity of dump_mode + dump_mode_val = acl_json_dict["dump"].get("dump_mode") + if dump_mode_val is not None and dump_mode_val not in {"input", "output", "all"}: + logger.warning("dump failed, 'dump_mode' need to be set as 'input', 'output' or 'all'") + return + + +def get_acl_json_path(args): + """ + get acl json path. when args.profiler is true or args.dump is True, create relative acl.json , + default current folder + """ + if args.acl_json_path is not None: + check_valid_acl_json_for_dump(args.acl_json_path, args.model) + return args.acl_json_path + if not args.profiler and not args.dump: + return None + + output_json_dict = {} + if args.profiler: + out_profiler_path = os.path.join(args.output, "profiler") + + if not os.path.exists(out_profiler_path): + os.makedirs(out_profiler_path, PERMISSION_DIR) + output_json_dict = {"profiler": {"switch": "on", "aicpu": "on", "output": out_profiler_path, "aic_metrics": ""}} + elif args.dump: + out_dump_path = os.path.join(args.output, "dump") + + if not os.path.exists(out_dump_path): + os.makedirs(out_dump_path, PERMISSION_DIR) + + model_name = args.model.split("/")[-1] + output_json_dict = { + "dump": { + "dump_path": out_dump_path, + "dump_mode": "all", + "dump_list": [{"model_name": model_name.split('.')[0]}], + } + } + + out_json_file_path = os.path.join(args.output, "acl.json") + + OPEN_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC + OPEN_MODES = stat.S_IWUSR | stat.S_IRUSR + with ms_open(out_json_file_path, mode="w") as f: + json.dump(output_json_dict, f, indent=4, separators=(", ", ": "), sort_keys=True) + return out_json_file_path + + +def get_batchsize(session, args): + intensors_desc = session.get_inputs() + batchsize = intensors_desc[0].shape[0] + if args.dym_batch != 0: + batchsize = int(args.dym_batch) + elif args.dym_dims is not None or args.dym_shape is not None: + instr = args.dym_dims if args.dym_dims is not None else args.dym_shape + elems = instr.split(';') + for elem in elems: + tmp_idx = elem.rfind(':') + name = elem[:tmp_idx] + shapestr = elem[tmp_idx + 1 :] + if name == intensors_desc[0].name: + batchsize = int(shapestr.split(',')[0]) + return batchsize + + +def get_range_list(ranges): + elems = ranges.split(';') + info_list = [] + for elem in elems: + shapes = [] + tmp_idx = elem.rfind(':') + name = elem[:tmp_idx] + shapestr = elem[tmp_idx + 1 :] + for content in shapestr.split(','): + step = 1 + if '~' in content: + start = int(content.split('~')[0]) + end = int(content.split('~')[1]) + step = int(content.split('~')[2]) if len(content.split('~')) == 3 else 1 + ranges = [str(i) for i in range(start, end + 1, step)] + elif '-' in content: + ranges = content.split('-') + else: + start = 
int(content) + ranges = [str(start)] + shapes.append(ranges) + logger.debug("content:{} get range{}".format(content, ranges)) + shape_list = [','.join(s) for s in list(itertools.product(*shapes))] + info = ["{}:{}".format(name, s) for s in shape_list] + info_list.append(info) + logger.debug("name:{} shapes:{} info:{}".format(name, shapes, info)) + + res = [';'.join(s) for s in list(itertools.product(*info_list))] + logger.debug("range list:{}".format(res)) + return res + + +# get dymshape list from input_ranges +# input_ranges can be a string like "name1:1,3,224,224;name2:1,600" or file +def get_dymshape_list(input_ranges): + ranges_list = [] + if os.path.isfile(input_ranges): + with ms_open(input_ranges, mode="rt", max_size=MAX_SIZE_LIMITE_NORMAL_FILE, encoding='utf-8') as finfo: + line = finfo.readline() + while line: + line = line.rstrip('\n') + ranges_list.append(line) + line = finfo.readline() + else: + ranges_list.append(input_ranges) + + dymshape_list = [] + for ranges in ranges_list: + dymshape_list.extend(get_range_list(ranges)) + return dymshape_list + + +# get throughput from out log +def get_throughtput_from_log(out_log): + log_list = out_log.split('\n') + for log_txt in log_list: + if "throughput" in log_txt: + throughput = float(log_txt.split(' ')[-1]) + return "OK", throughput + return "Failed", 0 + + +def regenerate_dymshape_cmd(args: AISBenchInferArgsAdapter, dym_shape): + args_dict = args.get_all_args_dict() + cmd = sys.executable + " -m ais_bench" + for key, value in args_dict.items(): + if key == '--dymShape_range': + continue + if key == '--dymShape': + cmd = cmd + " " + f"{key}={dym_shape}" + continue + if value: + cmd = cmd + " " + f"{key}={value}" + cmd_list = cmd.split(' ') + return cmd_list + + +def dymshape_range_run(args: AISBenchInferArgsAdapter): + dymshape_list = get_dymshape_list(args.dym_shape_range) + results = [] + for dymshape in dymshape_list: + cmd = regenerate_dymshape_cmd(args, dymshape) + result = {"dymshape": dymshape, "cmd": cmd, "result": "Failed", "throughput": 0} + logger.debug("cmd:{}".format(cmd)) + p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + stdout, _ = p.communicate(timeout=10) + out_log = stdout.decode('utf-8') + print(out_log) # show original log of cmd + result["result"], result["throughput"] = get_throughtput_from_log(out_log) + logger.info("dymshape:{} end run result:{}".format(dymshape, result["result"])) + results.append(result) + + tlist = [result["throughput"] for result in results if result["result"] == "OK"] + logger.info("-----------------dyshape_range Performance Summary------------------") + logger.info("run_count:{} success_count:{} avg_throughput:{}".format(len(results), len(tlist), np.mean(tlist))) + results.sort(key=lambda x: x['throughput'], reverse=True) + for i, result in enumerate(results): + logger.info( + "{} dymshape:{} result:{} throughput:{}".format( + i, result["dymshape"], result["result"], result["throughput"] + ) + ) + logger.info("------------------------------------------------------") diff --git a/tools/infer_tool/ais_bench/infer/common/path_security_check.py b/tools/infer_tool/ais_bench/infer/common/path_security_check.py new file mode 100644 index 0000000..81ef7f4 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/path_security_check.py @@ -0,0 +1,293 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at + +# http://www.apache.org/licenses/LICENSE-2.0 + +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# this file is as same as components/utils/file_opem_check.py, because benchmark might be install without ait + +import os +import sys +import stat +import re +import logging + + +MAX_SIZE_UNLIMITE = -1 # 不限制,必须显式表示不限制,读取必须传入 +MAX_SIZE_LIMITE_CONFIG_FILE = 10 * 1024 * 1024 # 10M 普通配置文件,可以根据实际要求变更 +MAX_SIZE_LIMITE_NORMAL_FILE = 4 * 1024 * 1024 * 1024 # 4G 普通模型文件,可以根据实际要求变更 +MAX_SIZE_LIMITE_MODEL_FILE = 100 * 1024 * 1024 * 1024 # 100G 超大模型文件,需要确定能处理大文件,可以根据实际要求变更 + +PATH_WHITE_LIST_REGEX_WIN = re.compile(r"[^_:\\A-Za-z0-9/.-]") +PATH_WHITE_LIST_REGEX = re.compile(r"[^_A-Za-z0-9/.-]") + +PERMISSION_NORMAL = 0o640 # 普通文件 +PERMISSION_KEY = 0o600 # 密钥文件 +READ_FILE_NOT_PERMITTED_STAT = stat.S_IWGRP | stat.S_IWOTH +WRITE_FILE_NOT_PERMITTED_STAT = stat.S_IWGRP | stat.S_IWOTH | stat.S_IROTH | stat.S_IXOTH + +SOLUTION_LEVEL = 35 +SOLUTION_LEVEL_WIN = 45 +logging.addLevelName(SOLUTION_LEVEL, "\033[1;32m" + "SOLUTION" + "\033[0m") # green [SOLUTION] +logging.addLevelName(SOLUTION_LEVEL_WIN, "SOLUTION_WIN") +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + + +SOLUTION_BASE_URL = 'https://gitee.com/ascend/ait/wikis/ait_security_error_log_solution' +SOFT_LINK_SUB_URL = '/soft_link_error_log_solution' +PATH_LENGTH_SUB_URL = '/path_length_overflow_error_log_solution' +OWNER_SUB_URL = '/owner_or_ownergroup_error_log_solution' +PERMISSION_SUB_URL = '/path_permission_error_log_solution' +ILLEGAL_CHAR_SUB_URL = '/path_contain_illegal_char_error_log_solution' + + +def solution_log(content): + logger.log(SOLUTION_LEVEL, f"visit \033[1;32m {content} \033[0m for detailed solution") # green content + + +def solution_log_win(content): + logger.log(SOLUTION_LEVEL_WIN, f"visit {content} for detailed solution") + + +def is_legal_path_length(path): + if len(path) > 4096 and not sys.platform.startswith("win"): # linux total path length limit + logger.error(f"file total path{path} length out of range (4096), please check the file(or directory) path") + solution_log(SOLUTION_BASE_URL + PATH_LENGTH_SUB_URL) + return False + + if len(path) > 260 and sys.platform.startswith("win"): # windows total path length limit + logger.error(f"file total path{path} length out of range (260), please check the file(or directory) path") + solution_log_win(SOLUTION_BASE_URL + PATH_LENGTH_SUB_URL) + return False + + dirnames = path.split("/") + for dirname in dirnames: + if len(dirname) > 255: # linux single file path length limit + logger.error(f"file name{dirname} length out of range (255), please check the file(or directory) path") + solution_log(SOLUTION_BASE_URL + PATH_LENGTH_SUB_URL) + return False + return True + + +def is_match_path_white_list(path): + if PATH_WHITE_LIST_REGEX.search(path) and not sys.platform.startswith("win"): + logger.error(f"path:{path} contains illegal char, legal chars include A-Z a-z 0-9 _ - / .") + solution_log(SOLUTION_BASE_URL + ILLEGAL_CHAR_SUB_URL) + return False + if PATH_WHITE_LIST_REGEX_WIN.search(path) and sys.platform.startswith("win"): + logger.error(f"path:{path} contains illegal char, legal chars include 
A-Z a-z 0-9 _ - / . : \\") + solution_log_win(SOLUTION_BASE_URL + ILLEGAL_CHAR_SUB_URL) + return False + return True + + +def is_legal_args_path_string(path): + # only check path string + if not path: + return True + if not is_legal_path_length(path): + return False + if not is_match_path_white_list(path): + return False + return True + + +class OpenException(Exception): + pass + + +class FileStat: + def __init__(self, file) -> None: + if not is_legal_path_length(file) or not is_match_path_white_list(file): + raise OpenException(f"create FileStat failed") + self.file = file + self.is_file_exist = os.path.exists(file) + if self.is_file_exist: + self.file_stat = os.stat(file) + self.realpath = os.path.realpath(file) + else: + self.file_stat = None + + @property + def is_exists(self): + return self.is_file_exist + + @property + def is_softlink(self): + return os.path.islink(self.file) if self.file_stat else False + + @property + def is_file(self): + return stat.S_ISREG(self.file_stat.st_mode) if self.file_stat else False + + @property + def is_dir(self): + return stat.S_ISDIR(self.file_stat.st_mode) if self.file_stat else False + + @property + def file_size(self): + return self.file_stat.st_size if self.file_stat else 0 + + @property + def permission(self): + return stat.S_IMODE(self.file_stat.st_mode) if self.file_stat else 0o777 + + @property + def owner(self): + return self.file_stat.st_uid if self.file_stat else -1 + + @property + def group_owner(self): + return self.file_stat.st_gid if self.file_stat else -1 + + @property + def is_owner(self): + return self.owner == (os.geteuid() if hasattr(os, "geteuid") else 0) + + @property + def is_group_owner(self): + return self.group_owner in (os.getgroups() if hasattr(os, "getgroups") else [0]) + + @property + def is_user_or_group_owner(self): + return self.is_owner or self.is_group_owner + + @property + def is_user_and_group_owner(self): + return self.is_owner and self.is_group_owner + + def is_basically_legal(self, perm='none'): + if sys.platform.startswith("win"): + return self.check_windows_permission(perm) + else: + return self.check_linux_permission(perm) + + def check_linux_permission(self, perm='none'): + if not self.is_exists and perm != 'write': + logger.error(f"path: {self.file} not exist, please check if file or dir is exist") + return False + if self.is_softlink: + logger.error(f"path :{self.file} is a soft link, not supported, please import file(or directory) directly") + solution_log(SOLUTION_BASE_URL + SOFT_LINK_SUB_URL) + return False + if not self.is_user_or_group_owner and self.is_exists: + logger.error( + f"current user isn't path:{self.file}'s owner or ownergroup, make sure current user belong to file(or directory)'s owner or ownergroup" + ) + solution_log(SOLUTION_BASE_URL + OWNER_SUB_URL) + return False + if perm == 'read': + if self.permission & READ_FILE_NOT_PERMITTED_STAT > 0: + logger.error( + f"The file {self.file} is group writable, or is others writable, as import file(or directory), " + "permission should not be over 0o755(rwxr-xr-x)" + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + if not os.access(self.realpath, os.R_OK) or self.permission & stat.S_IRUSR == 0: + logger.error( + f"Current user doesn't have read permission to the file {self.file}, as import file(or directory), " + "permission should be at least 0o400(r--------) " + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + elif perm == 'write' and self.is_exists: + if self.permission & 
WRITE_FILE_NOT_PERMITTED_STAT > 0: + logger.error( + f"The file {self.file} is group writable, or is others writable, as export file(or directory), " + "permission should not be over 0o750(rwxr-x---)" + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + if not os.access(self.realpath, os.W_OK): + logger.error( + f"Current user doesn't have write permission to the file {self.file}, as export file(or directory), " + "permission should be at least 0o200(-w-------) " + ) + solution_log(SOLUTION_BASE_URL + PERMISSION_SUB_URL) + return False + return True + + def check_windows_permission(self, perm='none'): + if not self.is_exists and perm != 'write': + logger.error(f"path: {self.file} not exist, please check if file or dir is exist") + return False + if self.is_softlink: + logger.error(f"path :{self.file} is a soft link, not supported, please import file(or directory) directly") + solution_log(SOLUTION_BASE_URL + SOFT_LINK_SUB_URL) + return False + return True + + def is_legal_file_size(self, max_size): + if not self.is_file: + logger.error(f"path: {self.file} is not a file") + return False + if self.file_size > max_size: + logger.error(f"file_size:{self.file_size} byte out of max limit {max_size} byte") + return False + else: + return True + + def is_legal_file_type(self, file_types: list): + if not self.is_file and self.is_exists: + logger.error(f"path: {self.file} is not a file") + return False + for file_type in file_types: + if os.path.splitext(self.file)[1] == f".{file_type}": + return True + logger.error(f"path:{self.file}, file type not in {file_types}") + return False + + +def ms_open(file, mode="r", max_size=None, softlink=False, write_permission=PERMISSION_NORMAL, **kwargs): + file_stat = FileStat(file) + + if file_stat.is_exists and file_stat.is_dir: + raise OpenException(f"Expecting a file, but it's a folder. {file}") + + if "r" in mode: + if not file_stat.is_exists: + raise OpenException(f"No such file or directory {file}") + if max_size is None: + raise OpenException(f"Reading files must have a size limit control. {file}") + if max_size != MAX_SIZE_UNLIMITE and max_size < file_stat.file_size: + raise OpenException(f"The file size has exceeded the specifications and cannot be read. {file}") + + if "w" in mode: + if file_stat.is_exists and not file_stat.is_owner: + raise OpenException( + f"The file owner is inconsistent with the current process user and is not allowed to write. {file}" + ) + if file_stat.is_exists: + os.remove(file) + + if not softlink and file_stat.is_softlink: + raise OpenException(f"Softlink is not allowed to be opened. {file}") + + if "a" in mode: + if not file_stat.is_owner: + raise OpenException( + f"The file owner is inconsistent with the current process user and is not allowed to write. 
{file}" + ) + if file_stat.permission != (file_stat.permission & write_permission): + os.chmod(file, file_stat.permission & write_permission) + + flags = os.O_RDONLY + if "+" in mode: + flags = flags | os.O_RDWR + elif "w" in mode or "a" in mode or "x" in mode: + flags = flags | os.O_WRONLY + + if "w" in mode or "x" in mode: + flags = flags | os.O_TRUNC | os.O_CREAT + if "a" in mode: + flags = flags | os.O_APPEND | os.O_CREAT + return os.fdopen(os.open(file, flags, mode=write_permission), mode, **kwargs) diff --git a/tools/infer_tool/ais_bench/infer/common/utils.py b/tools/infer_tool/ais_bench/infer/common/utils.py new file mode 100644 index 0000000..a3a88f1 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/common/utils.py @@ -0,0 +1,274 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +import sys +import stat +import re +import uuid +from pickle import NONE +import logging +from random import sample +from string import digits, ascii_uppercase, ascii_lowercase +import json +import shutil +import shlex +import subprocess +import numpy as np +from ais_bench.infer.common.path_security_check import ( + ms_open, + MAX_SIZE_LIMITE_NORMAL_FILE, + MAX_SIZE_LIMITE_CONFIG_FILE, + FileStat, + is_legal_args_path_string, +) + +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + +PERMISSION_DIR = 0o750 +READ_WRITE_FLAGS = os.O_RDWR | os.O_CREAT +WRITE_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC +WRITE_MODES = stat.S_IWUSR | stat.S_IRUSR +MSACCUCMP_FILE_PATH = "tools/operator_cmp/compare/msaccucmp.py" +CANN_PATH = "/usr/local/Ascend/ascend-toolkit/latest" + + +# Split a List Into Even Chunks of N Elements +def list_split(list_a, n, padding_file): + for x in range(0, len(list_a), n): + every_chunk = list_a[x : n + x] + + if len(every_chunk) < n: + every_chunk = every_chunk + [padding_file for _ in range(n - len(every_chunk))] + yield every_chunk + + +def list_share(list_a, count, num, left): + head = 0 + for i in range(count): + if i < left: + every_chunk = list_a[head : head + num + 1] + head = head + num + 1 + else: + every_chunk = list_a[head : head + num] + head = head + num + yield every_chunk + + +def natural_sort(lst): + convert = lambda text: int(text) if text.isdigit() else text.lower() + alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)] + return sorted(lst, key=alphanum_key) + + +def get_fileslist_from_dir(dir_): + files_list = [] + + for f in os.listdir(dir_): + f_true_path = os.path.join(dir_, f) + f_stat = FileStat(f_true_path) + if not f_stat.is_basically_legal('read'): + raise RuntimeError(f'input data:{f_true_path} is illegal') + if f_stat.is_dir: + continue + if f.endswith(".npy") or f.endswith(".NPY") or f.endswith(".bin") or f.endswith(".BIN"): + files_list.append(os.path.join(dir_, f)) + + if len(files_list) == 0: + logger.error('{} of input args not find valid file,valid file 
format:[*.npy *.NPY *.bin *.BIN]'.format(dir_)) + raise RuntimeError() + files_list.sort() + return natural_sort(files_list) + + +def get_file_datasize(file_path): + if file_path.endswith(".NPY") or file_path.endswith(".npy"): + ndata = np.load(file_path) + return ndata.nbytes + else: + return os.path.getsize(file_path) + + +def get_file_content(file_path): + if file_path.endswith(".NPY") or file_path.endswith(".npy"): + return np.load(file_path) + else: + with ms_open(file_path, mode="rb", max_size=MAX_SIZE_LIMITE_NORMAL_FILE) as fd: + barray = fd.read() + return np.frombuffer(barray, dtype=np.int8) + + +def get_ndata_fmt(ndata): + if ndata.dtype == np.float32 or ndata.dtype == np.float16 or ndata.dtype == np.float64: + fmt = "%f" + else: + fmt = "%d" + return fmt + + +def save_data_to_files(file_path, ndata): + if file_path.endswith(".NPY") or file_path.endswith(".npy"): + with ms_open(file_path, mode="wb") as f: + np.save(f, ndata) + elif file_path.endswith(".TXT") or file_path.endswith(".txt"): + outdata = ndata.reshape(-1, ndata.shape[-1]) + fmt = get_ndata_fmt(outdata) + with ms_open(file_path, mode="wb") as f: + for i in range(outdata.shape[0]): + np.savetxt(f, np.c_[outdata[i]], fmt=fmt, newline=" ") + f.write(b"\n") + else: + with ms_open(file_path, mode="wb") as f: + ndata.tofile(f) + + +def create_fake_file_name(pure_data_type, index): + suffix = "_" + pure_data_type + "_" + str(index) + loop_max = 1000 + for _ in range(loop_max): + fname = os.path.join(os.getcwd(), "tmp-" + "".join(str(uuid.uuid4())) + suffix) + if not os.path.exists(fname): + return fname + raise RuntimeError(f'create_fake_file_name failed: inner error') + + +def get_dump_relative_paths(output_dir, timestamp): + if output_dir is None or timestamp is None: + return [] + dump_dir = os.path.join(output_dir, timestamp) + dump_relative_paths = [] + for subdir, _, files in os.walk(dump_dir): + if len(files) > 0: + dump_relative_paths.append(os.path.relpath(subdir, dump_dir)) + return dump_relative_paths + + +def get_msaccucmp_path(): + ascend_toolkit_path = os.environ.get("ASCEND_TOOLKIT_HOME") + if not is_legal_args_path_string(ascend_toolkit_path): + raise TypeError(f"ASCEND_TOOLKIT_HOME:{ascend_toolkit_path} is illegal") + if ascend_toolkit_path is None: + ascend_toolkit_path = CANN_PATH + msaccucmp_path = os.path.join(ascend_toolkit_path, MSACCUCMP_FILE_PATH) + return msaccucmp_path if os.path.exists(msaccucmp_path) else None + + +def make_dirs(path): + ret = 0 + if not os.path.exists(path): + try: + os.makedirs(path, PERMISSION_DIR) + except Exception as e: + logger.warning(f"make dir {path} failed") + ret = -1 + return ret + + +def create_tmp_acl_json(acl_json_path): + with ms_open(acl_json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + acl_json_dict = json.load(f) + tmp_acl_json_path, real_dump_path, tmp_dump_path = None, None, None + + # create tmp acl.json path + acl_json_path_list = acl_json_path.split("/") + acl_json_path_list[-1] = str(uuid.uuid4()) + "_" + acl_json_path_list[-1] + tmp_acl_json_path = "/".join(acl_json_path_list) + + # change acl_json_dict + if acl_json_dict.get("dump") is not None and acl_json_dict["dump"].get("dump_path") is not None: + real_dump_path = acl_json_dict["dump"]["dump_path"] + dump_path_list = real_dump_path.split("/") + if dump_path_list[-1] == "": + dump_path_list.pop() + dump_path_list.append(str(uuid.uuid4())) + tmp_dump_path = "/".join(dump_path_list) + acl_json_dict["dump"]["dump_path"] = tmp_dump_path + if make_dirs(tmp_dump_path) != 0: + 
tmp_dump_path = None + os.remove(tmp_acl_json_path) + tmp_acl_json_path = None + + if tmp_acl_json_path is not None: + with ms_open(tmp_acl_json_path, mode="w") as f: + json.dump(acl_json_dict, f) + + return tmp_acl_json_path, real_dump_path, tmp_dump_path + + +def convert_helper(output_dir, timestamp): # convert bin file in src path and output the npy file in dest path + ''' + before: + output_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + |--timestamp--... (移动过的bin file目录) + + after: + output_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + |--timestamp--... (移动过的bin file目录) + |--timestamp_npy--... (转换后npy保存的目录) + ''' + dump_relative_paths = get_dump_relative_paths(output_dir, timestamp) + msaccucmp_path = get_msaccucmp_path() + python_path = sys.executable + if python_path is None: + logger.error("convert_helper failed: python executable is not found. NPY file transfer failed.") + return + if msaccucmp_path is None: + logger.error("convert_helper failed: msaccucmp.py is not found. NPY file transfer failed.") + return + if dump_relative_paths == []: + logger.error("convert_helper failed: dump_relative_paths is empty. NPY file transfer failed.") + return + for dump_relative_path in dump_relative_paths: + dump_npy_path = os.path.join(output_dir, timestamp + "_npy", dump_relative_path) + real_dump_path = os.path.join(output_dir, timestamp, dump_relative_path) + convert_cmd = f"{python_path} {msaccucmp_path} convert -d {real_dump_path} -out {dump_npy_path}" + convert_cmd_list = shlex.split(convert_cmd) + ret = subprocess.call(convert_cmd_list, shell=False) + if ret != 0: + logger.error(f"convert_helper failed: cmd {convert_cmd} execute failed") + + +def move_subdir(src_dir, dest_dir): + # move the subdir in src_dir to dest_dir return dest_dir/subdir + # and remove the src_dir + ''' + before: + src_dir--2023***1--... (bin file存在的路径) + + dest_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + + after: + dest_dir--|--2023***2--... (原来可能存在的时间戳路径) + |--2023***3--... (原来可能存在的时间戳路径) + |--2023***1--... (bin file移动到新的目录下) + ''' + res_dest, res_subdir = None, None + subdirs = os.listdir(src_dir) + if len(subdirs) != 1: + logger.error( + "move_subdir failed: multiple or none directory under src dir %s. " "The reason might be dump failed.", + src_dir, + ) + else: + if os.path.exists(os.path.join(dest_dir, subdirs[0])): + logger.error("move_subdir failed: dest dir %s exists" % os.path.join(dest_dir, subdirs[0])) + else: + shutil.move(os.path.join(src_dir, subdirs[0]), os.path.join(dest_dir, subdirs[0])) + res_dest, res_subdir = dest_dir, subdirs[0] + return res_dest, res_subdir diff --git a/tools/infer_tool/ais_bench/infer/infer_process.py b/tools/infer_tool/ais_bench/infer/infer_process.py new file mode 100644 index 0000000..3ae6676 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/infer_process.py @@ -0,0 +1,753 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
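+# Module overview (summary of the code that follows): this module drives OM model
+# inference through InferSession. It builds the input file lists, performs warmup,
+# runs one of the loop run modes (array/files/full/tensor) or the pipeline mode, and
+# reports latency and throughput via summary. It also wraps msprof profiling runs
+# and multi-device execution.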
+ +import logging +import math +import os +import sys +import time +import json +import shutil +import copy +import shlex +import re +import subprocess +import fcntl +from multiprocessing import Pool +from multiprocessing import Manager +import numpy as np + +from tqdm import tqdm + +from ais_bench.infer.interface import InferSession, MemorySummary +from ais_bench.infer.common.io_operations import (create_infileslist_from_inputs_list, + create_pipeline_fileslist_from_inputs_list, + create_intensors_from_infileslist, + get_narray_from_files_list, + get_tensor_from_files_list, + convert_real_files, + PURE_INFER_FAKE_FILE_ZERO, + PURE_INFER_FAKE_FILE_RANDOM, + PURE_INFER_FAKE_FILE, save_tensors_to_file, + get_pure_infer_data) +from ais_bench.infer.summary import summary +from ais_bench.infer.common.miscellaneous import (dymshape_range_run, get_acl_json_path, version_check, + get_batchsize, ACL_JSON_CMD_LIST) +from ais_bench.infer.common.utils import (get_file_content, get_file_datasize, + get_fileslist_from_dir, list_split, list_share, + save_data_to_files, create_fake_file_name, logger, + create_tmp_acl_json, move_subdir, convert_helper) +from ais_bench.infer.common.path_security_check import is_legal_args_path_string +from ais_bench.infer.args_adapter import AISBenchInferArgsAdapter +from ais_bench.infer.backends import BackendFactory +from ais_bench.infer.common.path_security_check import ms_open, MAX_SIZE_LIMITE_CONFIG_FILE + +PERMISSION_DIR = 0o750 +logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='[%(levelname)s] %(message)s') +logger = logging.getLogger(__name__) + + +def set_session_options(session, args): + # 增加校验 + aipp_batchsize = -1 + if args.dym_batch != 0: + session.set_dynamic_batchsize(args.dym_batch) + aipp_batchsize = session.get_max_dym_batchsize() + elif args.dym_hw is not None: + hwstr = args.dym_hw.split(",") + session.set_dynamic_hw((int)(hwstr[0]), (int)(hwstr[1])) + elif args.dym_dims is not None: + session.set_dynamic_dims(args.dym_dims) + elif args.dym_shape is not None: + session.set_dynamic_shape(args.dym_shape) + else: + session.set_staticbatch() + + if args.batchsize is None: + args.batchsize = get_batchsize(session, args) + logger.info(f"try get model batchsize:{args.batchsize}") + + if not args.auto_set_dymshape_mode and not args.auto_set_dymdims_mode: + if args.batchsize < 0 and not args.dym_batch and not args.dym_dims and not args.dym_shape: + raise RuntimeError('dynamic batch om model detected, but dymbatch, dymdims or dymshape not set!') + + if aipp_batchsize < 0: + aipp_batchsize = args.batchsize + + # 确认模型只有一个动态 aipp input + if args.dym_shape is not None or args.auto_set_dymshape_mode: + aipp_input_exist = 0 + else: + aipp_input_exist = session.get_dym_aipp_input_exist() + logger.debug(f"aipp_input_exist: {aipp_input_exist}") + if (args.aipp_config is not None) and (aipp_input_exist == 1): + session.load_aipp_config_file(args.aipp_config, aipp_batchsize) + session.check_dym_aipp_input_exist() + elif (args.aipp_config is None) and (aipp_input_exist == 1): + logger.error("can't find aipp config file for model with dym aipp input , please check it!") + raise RuntimeError('aipp model without aipp config!') + elif (aipp_input_exist > 1): + logger.error(f"don't support more than one dynamic aipp input in model, \ + amount of aipp input is {aipp_input_exist}") + raise RuntimeError('aipp model has more than 1 aipp input!') + elif (aipp_input_exist == -1): + raise RuntimeError('aclmdlGetAippType failed!') + + # 设置custom out tensors size + if 
args.output_size is not None: + customsizes = [int(n) for n in args.output_size.split(',')] + logger.debug(f"set customsize:{customsizes}") + session.set_custom_outsize(customsizes) + + +def init_inference_session(args, acl_json_path): + session = InferSession(args.device, args.model, acl_json_path, args.debug, args.loop) + + set_session_options(session, args) + logger.debug(f"session info:{session.session}") + return session + + +def set_dymshape_shape(session, inputs): + shape_list = [] + intensors_desc = session.get_inputs() + for i, input_ in enumerate(inputs): + str_shape = [str(shape) for shape in input_.shape] + shapes = ",".join(str_shape) + dyshape = f"{intensors_desc[i].name}:{shapes}" + shape_list.append(dyshape) + dyshapes = ';'.join(shape_list) + logger.debug(f"set dymshape shape:{dyshapes}") + session.set_dynamic_shape(dyshapes) + summary.add_batchsize(inputs[0].shape[0]) + + +def set_dymdims_shape(session, inputs): + shape_list = [] + intensors_desc = session.get_inputs() + for i, input_ in enumerate(inputs): + str_shape = [str(shape) for shape in input_.shape] + shapes = ",".join(str_shape) + dydim = f"{intensors_desc[i].name}:{shapes}" + shape_list.append(dydim) + dydims = ';'.join(shape_list) + logger.debug(f"set dymdims shape:{dydims}") + session.set_dynamic_dims(dydims) + summary.add_batchsize(inputs[0].shape[0]) + + +def warmup(session, args, intensors_desc, infiles): + # prepare input data + infeeds = [] + for j, files in enumerate(infiles): + if args.run_mode == "tensor": + tensor = get_tensor_from_files_list(files, session, intensors_desc[j].realsize, + args.pure_data_type, args.no_combine_tensor_mode) + infeeds.append(tensor) + else: + narray = get_narray_from_files_list(files, intensors_desc[j].realsize, + args.pure_data_type, args.no_combine_tensor_mode) + infeeds.append(narray) + session.set_loop_count(1) + # warmup + for _ in range(args.warmup_count): + outputs = run_inference(session, args, infeeds, out_array=True) + + session.set_loop_count(args.loop) + + # reset summary info + summary.reset() + session.reset_summaryinfo() + MemorySummary.reset() + logger.info(f"warm up {args.warmup_count} done") + + +def run_inference(session, args, inputs, out_array=False): + if args.auto_set_dymshape_mode: + set_dymshape_shape(session, inputs) + elif args.auto_set_dymdims_mode: + set_dymdims_shape(session, inputs) + outputs = session.run(inputs, out_array) + return outputs + + +def run_pipeline_inference(session, args, infileslist, output_prefix, extra_session): + out = output_prefix if output_prefix is not None else "" + pure_infer_mode = False + if args.input is None: + pure_infer_mode = True + session.run_pipeline(infileslist, + out, + args.auto_set_dymshape_mode, + args.auto_set_dymdims_mode, + args.outfmt, + pure_infer_mode, + [s.session for s in extra_session]) + + +# tensor to loop infer +def infer_loop_tensor_run(session, args, intensors_desc, infileslist, output_prefix): + for i, infiles in enumerate(tqdm(infileslist, file=sys.stdout, desc='Inference tensor Processing')): + intensors = [] + for j, files in enumerate(infiles): + tensor = get_tensor_from_files_list(files, session, intensors_desc[j].realsize, + args.pure_data_type, args.no_combine_tensor_mode) + intensors.append(tensor) + outputs = run_inference(session, args, intensors) + session.convert_tensors_to_host(outputs) + if output_prefix is not None: + save_tensors_to_file( + outputs, output_prefix, infiles, + args.outfmt, i, args.output_batchsize_axis + ) + + +# files to loop iner +def 
infer_loop_files_run(session, args, intensors_desc, infileslist, output_prefix): + for i, infiles in enumerate(tqdm(infileslist, file=sys.stdout, desc='Inference files Processing')): + intensors = [] + for j, files in enumerate(infiles): + real_files = convert_real_files(files) + tensor = session.create_tensor_from_fileslist(intensors_desc[j], real_files) + intensors.append(tensor) + outputs = run_inference(session, args, intensors) + session.convert_tensors_to_host(outputs) + if output_prefix is not None: + save_tensors_to_file( + outputs, output_prefix, infiles, + args.outfmt, i, args.output_batchsize_axis + ) + + +# First prepare the data, then execute the reference, and then write the file uniformly +def infer_fulltensors_run(session, args, intensors_desc, infileslist, output_prefix): + outtensors = [] + intensorslist = create_intensors_from_infileslist(infileslist, intensors_desc, session, + args.pure_data_type, args.no_combine_tensor_mode) + + for inputs in tqdm(intensorslist, file=sys.stdout, desc='Inference Processing full'): + outputs = run_inference(session, args, inputs) + outtensors.append(outputs) + + for i, outputs in enumerate(outtensors): + session.convert_tensors_to_host(outputs) + if output_prefix is not None: + save_tensors_to_file( + outputs, output_prefix, infileslist[i], + args.outfmt, i, args.output_batchsize_axis + ) + + +# loop numpy array to infer +def infer_loop_array_run(session, args, intensors_desc, infileslist, output_prefix): + for i, infiles in enumerate(tqdm(infileslist, file=sys.stdout, desc='Inference array Processing')): + innarrays = [] + for j, files in enumerate(infiles): + narray = get_narray_from_files_list(files, intensors_desc[j].realsize, args.pure_data_type) + innarrays.append(narray) + outputs = run_inference(session, args, innarrays) + session.convert_tensors_to_host(outputs) + if args.output is not None: + save_tensors_to_file( + outputs, output_prefix, infiles, + args.outfmt, i, args.output_batchsize_axis + ) + + +def infer_pipeline_run(session, args, infileslist, output_prefix, extra_session): + logger.info(f"run in pipeline mode with computing threadsnumber:{args.threads}") + run_pipeline_inference(session, args, infileslist, output_prefix, extra_session) + + +def get_file_name(file_path: str, suffix: str, res_file_path: list) -> list: + """获取路径下的指定文件类型后缀的文件 + Args: + file_path: 文件夹的路径 + suffix: 要提取的文件类型的后缀 + res_file_path: 保存返回结果的列表 + Returns: 文件路径 + """ + for file in os.listdir(file_path): + + if os.path.isdir(os.path.join(file_path, file)): + get_file_name(os.path.join(file_path, file), suffix, res_file_path) + else: + res_file_path.append(os.path.join(file_path, file)) + # endswith:表示以suffix结尾。可根据需要自行修改;如:startswith:表示以suffix开头,__contains__:包含suffix字符串 + if suffix == '' or suffix is None: + return res_file_path + else: + return list(filter(lambda x: x.endswith(suffix), res_file_path)) + + +def get_legal_json_content(acl_json_path): + cmd_dict = {} + with ms_open(acl_json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + json_dict = json.load(f) + profile_dict = json_dict.get("profiler") + for option_cmd in ACL_JSON_CMD_LIST: + if profile_dict.get(option_cmd): + if option_cmd == "output" and not is_legal_args_path_string(profile_dict.get(option_cmd)): + raise Exception(f"output path in acl_json is illegal!") + cmd_dict.update({"--" + option_cmd.replace('_', '-'): profile_dict.get(option_cmd)}) + if (option_cmd == "sys_hardware_mem_freq"): + cmd_dict.update({"--sys-hardware-mem": "on"}) + if (option_cmd == 
"sys_interconnection_freq"): + cmd_dict.update({"--sys-interconnection-profiling": "on"}) + if (option_cmd == "dvpp_freq"): + cmd_dict.update({"--dvpp-profiling": "on"}) + return cmd_dict + + +def json_to_msprof_cmd(acl_json_path): + json_dict = get_legal_json_content(acl_json_path) + msprof_option_cmd = " ".join([f"{key}={value}" for key, value in json_dict.items()]) + return msprof_option_cmd + + +def regenerate_cmd(args:AISBenchInferArgsAdapter): + args_dict = args.get_all_args_dict() + cmd = sys.executable + " -m ais_bench" + for key, value in args_dict.items(): + if key == '--acl_json_path': + continue + if key == '--warmup_count': + cmd = cmd + " " + f"{key}={0}" + continue + if key == '--profiler': + cmd = cmd + " " + f"{key}={0}" + continue + if value: + cmd = cmd + " " + f"{key}={value}" + return cmd + + +def msprof_run_profiling(args, msprof_bin): + if args.acl_json_path is not None: + # acl.json to msprof cmd + args.profiler_rename = False + cmd = regenerate_cmd(args) + msprof_cmd = f"{msprof_bin} --application=\"{cmd}\" " + json_to_msprof_cmd(args.acl_json_path) + else: + # default msprof cmd + cmd = regenerate_cmd(args) + msprof_cmd = f"{msprof_bin} --output={args.output}/profiler --application=\"{cmd}\" --model-execution=on \ + --sys-hardware-mem=on --sys-cpu-profiling=off --sys-profiling=off --sys-pid-profiling=off \ + --dvpp-profiling=on --runtime-api=on --task-time=on --aicpu=on" \ + + ret = -1 + msprof_cmd_list = shlex.split(msprof_cmd) + logger.info(f"msprof cmd:{msprof_cmd} begin run") + if (args.profiler_rename): + p = subprocess.Popen(msprof_cmd_list, stdout=subprocess.PIPE, shell=False, bufsize=0) + flags = fcntl.fcntl(p.stdout, fcntl.F_GETFL) + fcntl.fcntl(p.stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK) + + get_path_flag = True + sub_str = "" + for line in iter(p.stdout.read, b''): + if not line: + continue + line = line.decode() + if (get_path_flag and line.find("PROF_") != -1): + get_path_flag = False + start_index = line.find("PROF_") + sub_str = line[start_index:(start_index + 46)] # PROF_XXXX的目录长度为46 + print(f'{line}', flush=True, end="") + p.stdout.close() + p.wait() + + output_prefix = os.path.join(args.output, "profiler") + output_prefix = os.path.join(output_prefix, sub_str) + hash_str = sub_str.rsplit('_')[-1] + file_name = get_file_name(output_prefix, ".csv", []) + file_name_json = get_file_name(output_prefix, ".json", []) + + model_name = os.path.basename(args.model).split(".")[0] + for file in file_name: + real_file = os.path.splitext(file)[0] + os.rename(file, real_file + "_" + model_name + "_" + hash_str + ".csv") + for file in file_name_json: + real_file = os.path.splitext(file)[0] + os.rename(file, real_file + "_" + model_name + "_" + hash_str + ".json") + ret = 0 + else: + ret = subprocess.call(msprof_cmd_list, shell=False) + logger.info(f"msprof cmd:{msprof_cmd} end run ret:{ret}") + return ret + + +def get_energy_consumption(npu_id): + cmd = f"npu-smi info -t power -i {npu_id}" + get_npu_id = subprocess.run(cmd.split(), shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE) + npu_id = get_npu_id.stdout.decode('gb2312') + power = [] + npu_id = npu_id.split("\n") + for key in npu_id: + if key.find("Power Dissipation(W)", 0, len(key)) != -1: + power = key[34:len(key)] + break + + return power + + +def convert(tmp_acl_json_path, real_dump_path, tmp_dump_path): + if real_dump_path is not None and tmp_dump_path is not None: + output_dir, timestamp = move_subdir(tmp_dump_path, real_dump_path) + convert_helper(output_dir, timestamp) + if 
tmp_dump_path is not None: + shutil.rmtree(tmp_dump_path) + if tmp_acl_json_path is not None: + os.remove(tmp_acl_json_path) + + +def main(args, index=0, msgq=None, device_list=None): + # if msgq is not None,as subproces run + if msgq is not None: + logger.info(f"subprocess_{index} main run") + + if args.debug: + logger.setLevel(logging.DEBUG) + + acl_json_path = get_acl_json_path(args) + tmp_acl_json_path = None + if args.dump_npy and acl_json_path is not None: + tmp_acl_json_path, real_dump_path, tmp_dump_path = create_tmp_acl_json(acl_json_path) + + session = init_inference_session(args, tmp_acl_json_path if tmp_acl_json_path is not None else acl_json_path) + # if pipeline is set and threads number is > 1, create a session pool for extra computing + extra_session = [] + if args.pipeline: + extra_session = [init_inference_session(args, tmp_acl_json_path if tmp_acl_json_path is not None\ + else acl_json_path) for _ in range(args.threads - 1)] + + intensors_desc = session.get_inputs() + if device_list is not None and len(device_list) > 1: + if args.output is not None: + if args.output_dirname is None: + timestr = time.strftime("%Y_%m_%d-%H_%M_%S") + output_prefix = os.path.join(args.output, timestr) + output_prefix = os.path.join(output_prefix, "device" + str(device_list[index]) + "_" + str(index)) + else: + output_prefix = os.path.join(args.output, args.output_dirname) + output_prefix = os.path.join(output_prefix, "device" + str(device_list[index]) + "_" + str(index)) + if not os.path.exists(output_prefix): + os.makedirs(output_prefix, PERMISSION_DIR) + os.chmod(args.output, PERMISSION_DIR) + logger.info(f"output path:{output_prefix}") + else: + output_prefix = None + else: + if args.output is not None: + if args.output_dirname is None: + timestr = time.strftime("%Y_%m_%d-%H_%M_%S") + output_prefix = os.path.join(args.output, timestr) + else: + output_prefix = os.path.join(args.output, args.output_dirname) + if not os.path.exists(output_prefix): + os.makedirs(output_prefix, PERMISSION_DIR) + os.chmod(args.output, PERMISSION_DIR) + logger.info(f"output path:{output_prefix}") + else: + output_prefix = None + + inputs_list = [] if args.input is None else args.input.split(',') + + # create infiles list accord inputs list + if len(inputs_list) == 0: + # Pure reference scenario. 
Create input zero data + if not args.pipeline: + infileslist = [[[PURE_INFER_FAKE_FILE] for _ in intensors_desc]] + else: + infileslist = [[]] + pure_file = PURE_INFER_FAKE_FILE_ZERO if args.pure_data_type == "zero" else PURE_INFER_FAKE_FILE_RANDOM + for _ in intensors_desc: + infileslist[0].append(pure_file) + else: + if not args.pipeline: + infileslist = create_infileslist_from_inputs_list(inputs_list, intensors_desc, args.no_combine_tensor_mode) + else: + infileslist = create_pipeline_fileslist_from_inputs_list(inputs_list, intensors_desc) + if not args.pipeline: + warmup(session, args, intensors_desc, infileslist[0]) + else: + # prepare for pipeline case + infiles = [] + for file in infileslist[0]: + infiles.append([file]) + warmup(session, args, intensors_desc, infiles) + for sess in extra_session: + warmup(sess, args, intensors_desc, infiles) + + if args.pipeline and (args.auto_set_dymshape_mode or args.auto_set_dymdims_mode): + for file_list in infileslist: + input_first = np.load(file_list[0]) + summary.add_batchsize(input_first.shape[0]) + + if msgq is not None: + # wait subprocess init ready, if time eplapsed, force ready run + logger.info(f"subprocess_{index} qsize:{msgq.qsize()} now waiting") + msgq.put(index) + time_sec = 0 + while True: + if msgq.qsize() >= args.subprocess_count: + break + time_sec = time_sec + 1 + if time_sec > 10: + logger.warning(f"subprocess_{index} qsize:{msgq.qsize()} time:{time_sec} s elapsed") + break + time.sleep(1) + logger.info(f"subprocess_{index} qsize:{msgq.qsize()} ready to infer run") + + start_time = time.time() + if args.energy_consumption: + start_energy_consumption = get_energy_consumption(args.npu_id) + if args.pipeline: + infer_pipeline_run(session, args, infileslist, output_prefix, extra_session) + else: + run_mode_switch = { + "array": infer_loop_array_run, + "files": infer_loop_files_run, + "full": infer_fulltensors_run, + "tensor": infer_loop_tensor_run + } + if run_mode_switch.get(args.run_mode) is not None: + run_mode_switch.get(args.run_mode)(session, args, intensors_desc, infileslist, output_prefix) + else: + raise RuntimeError(f'wrong run_mode:{args.run_mode}') + if args.energy_consumption: + end_energy_consumption = get_energy_consumption(args.npu_id) + end_time = time.time() + + multi_threads_mode = args.threads > 1 and args.pipeline + summary.add_args(sys.argv) + s = session.summary() + if multi_threads_mode: + summary.npu_compute_time_interval_list = s.exec_time_list + else: + summary.npu_compute_time_list = [end_time - start_time for start_time, end_time in s.exec_time_list] + summary.h2d_latency_list = MemorySummary.get_h2d_time_list() + summary.d2h_latency_list = MemorySummary.get_d2h_time_list() + summary.report(args.batchsize, output_prefix, args.display_all_summary, multi_threads_mode) + try: + if args.energy_consumption: + energy_consumption = ((float(end_energy_consumption) + float(start_energy_consumption)) / 2.0) \ + * (end_time - start_time) + logger.info(f"NPU ID:{args.npu_id} energy consumption(J):{energy_consumption}") + except AttributeError as err: + logger.error(f"Attribute Access Error: {err}") + raise RuntimeError("Error accessing an attribute, please verify if the NPU ID is correct. 
") from err + except Exception as err: + logger.error(f"Unexpected Error: {err}") + raise RuntimeError( + "Energy consumption append an unexpected error occurred, please check the input parameters.") from err + + if msgq is not None: + # put result to msgq + msgq.put([index, summary.infodict['throughput'], start_time, end_time]) + + session.free_resource() + for sess in extra_session: + sess.free_resource() + + InferSession.finalize() + + if args.dump_npy and acl_json_path is not None: + convert(tmp_acl_json_path, real_dump_path, tmp_dump_path) + + +def print_subproces_run_error(value): + logger.error(f"subprocess run failed error_callback:{value}") + + +def seg_input_data_for_multi_process(args, inputs, jobs): + inputs_list = [] if inputs is None else inputs.split(',') + if inputs_list is None: + return inputs_list + + fileslist = [] + if os.path.isfile(inputs_list[0]): + fileslist = inputs_list + elif os.path.isdir(inputs_list[0]): + for dir_path in inputs_list: + fileslist.extend(get_fileslist_from_dir(dir_path)) + else: + logger.error(f'error {inputs_list[0]} not file or dir') + raise RuntimeError() + + args.device = 0 + acl_json_path = get_acl_json_path(args) + session = init_inference_session(args, acl_json_path) + intensors_desc = session.get_inputs() + try: + chunks_elements = math.ceil(len(fileslist) / len(intensors_desc)) + except ZeroDivisionError as err: + logger.error("ZeroDivisionError: intensors_desc is empty") + raise RuntimeError("error zero division") from err + chunks = list(list_split(fileslist, chunks_elements, None)) + fileslist = [[] for _ in range(jobs)] + for _, chunk in enumerate(chunks): + try: + splits_elements = int(len(chunk) / jobs) + except ZeroDivisionError as err: + logger.error("ZeroDivisionError: intensors_desc is empty") + raise RuntimeError("error zero division") from err + splits_left = len(chunk) % jobs + splits = list(list_share(chunk, jobs, splits_elements, splits_left)) + for j, split in enumerate(splits): + fileslist[j].extend(split) + res = [] + for files in fileslist: + res.append(','.join(list(filter(None, files)))) + return res + + +def multidevice_run(args): + logger.info(f"multidevice:{args.device} run begin") + device_list = args.device + npu_id_list = args.npu_id + p = Pool(len(device_list)) + msgq = Manager().Queue() + args.subprocess_count = len(device_list) + splits = None + if (args.input is not None and args.divide_input): + jobs = args.subprocess_count + splits = seg_input_data_for_multi_process(args, args.input, jobs) + + for i, device in enumerate(device_list): + cur_args = copy.deepcopy(args) + cur_args.device = int(device) + if args.energy_consumption: + cur_args.npu_id = int(npu_id_list[i]) + if args.divide_input: + cur_args.input = None if splits is None else list(splits)[i] + p.apply_async(main, args=(cur_args, i, msgq, device_list), error_callback=print_subproces_run_error) + + p.close() + p.join() + result = 0 if 2 * len(device_list) == msgq.qsize() else 1 + logger.info(f"multidevice run end qsize:{msgq.qsize()} result:{result}") + tlist = [] + while msgq.qsize() != 0: + ret = msgq.get() + if type(ret) == list: + logger.info(f"i:{ret[0]} device_{device_list[ret[0]]} throughput:{ret[1]} \ + start_time:{ret[2]} end_time:{ret[3]}") + tlist.append(ret[1]) + logger.info(f'summary throughput:{sum(tlist)}') + return result + + +def args_rules(args): + if args.profiler and args.dump: + logger.error("parameter --profiler cannot be true at the same time as parameter --dump, please check them!\n") + raise RuntimeError('error bad 
parameters --profiler and --dump') + + if (args.profiler or args.dump) and (args.output is None): + logger.error("when dump or profiler, miss output path, please check them!") + raise RuntimeError('miss output parameter!') + + if not args.auto_set_dymshape_mode and not args.auto_set_dymdims_mode: + args.no_combine_tensor_mode = False + else: + args.no_combine_tensor_mode = True + + if args.profiler and args.warmup_count != 0 and args.input is not None: + logger.info("profiler mode with input change warmup_count to 0") + args.warmup_count = 0 + + if args.output is None and args.output_dirname is not None: + logger.error( + "parameter --output_dirname cann't be used alone. Please use it together with the parameter --output!\n") + raise RuntimeError('error bad parameters --output_dirname') + + if args.threads > 1 and not args.pipeline: + logger.info("need to set --pipeline when setting threads number to be more than one.") + args.threads = 1 + + return args + + +def acl_json_base_check(args): + if args.acl_json_path is None: + return args + json_path = args.acl_json_path + try: + with ms_open(json_path, mode="r", max_size=MAX_SIZE_LIMITE_CONFIG_FILE) as f: + json_dict = json.load(f) + except Exception as err: + logger.error(f"can't read acl_json_path:{json_path}") + raise Exception from err + if json_dict.get("profiler") is not None and json_dict.get("profiler").get("switch") == "on": + args.profiler = True + if json_dict.get("dump") is not None: + args.profiler = False + return args + + +def config_check(config_path): + if not config_path: + return + max_config_size = 12800 + if os.path.splitext(config_path)[1] != ".config": + logger.error(f"aipp_config:{config_path} is not a .config file") + raise TypeError(f"aipp_config:{config_path} is not a .config file") + config_size = os.path.getsize(config_path) + if config_size > max_config_size: + logger.error(f"aipp_config_size:{config_size} byte out of max limit {max_config_size} byte") + raise MemoryError(f"aipp_config_size:{config_size} byte out of max limit") + return + + +def backend_run(args): + backend_class = BackendFactory.create_backend(args.backend) + backend = backend_class(args) + backend.load(args.model) + backend.run() + perf = backend.get_perf() + logger.info(f"perf info:{perf}") + + +def infer_process(args:AISBenchInferArgsAdapter): + args = args_rules(args) + version_check(args) + args = acl_json_base_check(args) + + if args.perf: + backend_run(args) + return 0 + + if args.profiler: + # try use msprof to run + msprof_bin = shutil.which('msprof') + if msprof_bin is None: + logger.info("find no msprof continue use acl.json mode, result won't be parsed as csv") + elif os.getenv('AIT_NO_MSPROF_MODE') == '1': + logger.info("find AIT_NO_MSPROF_MODE set, continue use acl.json mode, result won't be parsed as csv") + else: + ret = msprof_run_profiling(args, msprof_bin) + return ret + + if args.dym_shape_range is not None and args.dym_shape is None: + # dymshape range run,according range to run each shape infer get best shape + dymshape_range_run(args) + return 0 + + if type(args.device) == list: + # args has multiple device, run single process for each device + ret = multidevice_run(args) + return ret + + main(args) + return 0 diff --git a/tools/infer_tool/ais_bench/infer/interface.py b/tools/infer_tool/ais_bench/infer/interface.py new file mode 100644 index 0000000..7719a8e --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/interface.py @@ -0,0 +1,889 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import time +import sys +from configparser import ConfigParser +from multiprocessing import Pool +from multiprocessing import Manager +import numpy as np +import aclruntime + + +SRC_IMAGE_SIZE_W_MIN = 2 +SRC_IMAGE_SIZE_W_MAX = 4096 +SRC_IMAGE_SIZE_H_MIN = 1 +SRC_IMAGE_SIZE_H_MAX = 4096 +RBUV_SWAP_SWITCH_OFF = 0 +RBUV_SWAP_SWITCH_ON = 1 +AX_SWAP_SWITCH_OFF = 0 +AX_SWAP_SWITCH_ON = 1 +CSC_SWITCH_OFF = 0 +CSC_SWITCH_ON = 0 +CSC_MATRIX_MIN = -32677 +CSC_MATRIX_MAX = 32676 +CROP_SWITCH_OFF = 0 +CROP_SWITCH_ON = 1 +LOAD_START_POS_W_MIN = 0 +LOAD_START_POS_W_MAX = 4095 +LOAD_START_POS_H_MIN = 0 +LOAD_START_POS_H_MAX = 4095 +CROP_POS_W_MIN = 1 +CROP_POS_W_MAX = 4096 +CROP_POS_H_MIN = 1 +CROP_POS_H_MAX = 4096 +PADDING_SWITCH_OFF = 0 +PADDING_SWITCH_ON = 1 +PADDING_SIZE_MIN = 0 +PADDING_SIZE_MAX = 32 +PIXEL_MEAN_CHN_MIN = 0 +PIXEL_MEAN_CHN_MAX = 255 +PIXEL_MIN_CHN_MIN = 0 +PIXEL_MIN_CHN_MAX = 255 +PIXEL_VAR_RECI_CHN_MIN = -65504 +PIXEL_VAR_RECI_CHN_MAX = 65504 + +TORCH_TENSOR_LIST = [ + 'torch.FloatTensor', 'torch.DoubleTensor', 'torch.HalfTensor', 'torch.BFloat16Tensor', + 'torch.ByteTensor', 'torch.CharTensor', 'torch.ShortTensor', 'torch.LongTensor', + 'torch.BoolTensor', 'torch.IntTensor' +] +NP_TYPE_LIST = [ + np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, + np.uint32, np.float16, np.float32, np.float64 +] + +logger = logging.getLogger(__name__) + + +class InferSession: + def __init__(self, device_id: int, model_path: str, acl_json_path: str = None, + debug: bool = False, loop: int = 1): + """ + init InferSession + + Args: + device_id: device id for npu device + model_path: om model path to load + acl_json_path: set acl_json_path to enable profiling or dump function + debug: enable debug log. Default: False + loop: loop count for one inference. 
Default: 1 + """ + self.device_id = device_id + self.model_path = model_path + self.loop = loop + self.options = aclruntime.session_options() + self.acl_json_path = acl_json_path + self.debug = debug + if acl_json_path is not None: + self.options.acl_json_path = self.acl_json_path + self.options.log_level = 1 if self.debug else 2 + self.options.loop = self.loop + self.session = aclruntime.InferenceSession(self.model_path, self.device_id, self.options) + self.outputs_names = [meta.name for meta in self.session.get_outputs()] + self.intensors_desc = self.session.get_inputs() + self.outtensors_desc = self.session.get_outputs() + self.infer_mode_switch = { + "static": self._static_prepare, + "dymbatch": self._dymbatch_prepare, + "dymhw": self._dymhw_prepare, + "dymdims": self._dymdims_prepare, + "dymshape": self._dymshape_prepare + } + + @staticmethod + def convert_tensors_to_host(tensors): + for tensor in tensors: + tensor.to_host() + + @staticmethod + def convert_tensors_to_arrays(tensors): + arrays = [] + for tensor in tensors: + # convert acltensor to numpy array + arrays.append(np.array(tensor)) + return arrays + + @staticmethod + def finalize(): + if hasattr(aclruntime.InferenceSession, 'finalize'): + aclruntime.InferenceSession.finalize() + + def get_inputs(self): + """ + get inputs info of model + """ + self.intensors_desc = self.session.get_inputs() + return self.intensors_desc + + def get_outputs(self): + """ + get outputs info of model + """ + self.outtensors_desc = self.session.get_outputs() + return self.outtensors_desc + + def set_loop_count(self, loop): + options = self.session.options() + options.loop = loop + + # 默认设置为静态batch + def set_staticbatch(self): + self.session.set_staticbatch() + + def set_dynamic_batchsize(self, dym_batch: str): + self.session.set_dynamic_batchsize(dym_batch) + + def set_dynamic_hw(self, w: int, h: int): + self.session.set_dynamic_hw(w, h) + + def get_max_dym_batchsize(self): + return self.session.get_max_dym_batchsize() + + def set_dynamic_dims(self, dym_dims: str): + self.session.set_dynamic_dims(dym_dims) + + def set_dynamic_shape(self, dym_shape: str): + self.session.set_dynamic_shape(dym_shape) + + def set_custom_outsize(self, custom_sizes): + self.session.set_custom_outsize(custom_sizes) + + def create_tensor_from_fileslist(self, desc, files): + return self.session.create_tensor_from_fileslist(desc, files) + + def create_tensor_from_arrays_to_device(self, arrays): + tensor = aclruntime.Tensor(arrays) + tensor.to_device(self.device_id) + return tensor + + def get_dym_aipp_input_exist(self): + return self.session.get_dym_aipp_input_exist() + + def check_dym_aipp_input_exist(self): + self.session.check_dym_aipp_input_exist() + + def load_aipp_config_file(self, config_file, batchsize): + cfg = ConfigParser() + cfg.read(config_file, 'UTF-8') + session_list = cfg.sections() + #多个aipp输入不支持 + if (session_list.count('aipp_op') != 1): + logger.error("nums of section aipp_op in .config file is not supported, please check it!") + raise RuntimeError('wrong aipp config file content!') + option_list = cfg.options('aipp_op') + if (option_list.count('input_format') == 1): + self.aipp_set_input_format(cfg) + else: + logger.error("can not find input_format in config file, please check it!") + raise RuntimeError('wrong aipp config file content!') + + if (option_list.count('src_image_size_w') == 1 and option_list.count('src_image_size_h') == 1): + self.aipp_set_src_image_size(cfg) + else: + logger.error("can not find src_image_size in config file, please check 
it!") + raise RuntimeError('wrong aipp config file content!') + self.session.aipp_set_max_batch_size(batchsize) + self.aipp_set_rbuv_swap_switch(cfg, option_list) + self.aipp_set_ax_swap_switch(cfg, option_list) + self.aipp_set_csc_params(cfg, option_list) + self.aipp_set_crop_params(cfg, option_list) + self.aipp_set_padding_params(cfg, option_list) + self.aipp_set_dtc_pixel_mean(cfg, option_list) + self.aipp_set_dtc_pixel_min(cfg, option_list) + self.aipp_set_pixel_var_reci(cfg, option_list) + + ret = self.session.set_dym_aipp_info_set() + return ret + + def aipp_set_input_format(self, cfg): + input_format = cfg.get('aipp_op', 'input_format') + legal_format = ["YUV420SP_U8", "XRGB8888_U8", "RGB888_U8", "YUV400_U8"] + if (legal_format.count(input_format) == 1): + self.session.aipp_set_input_format(input_format) + else: + logger.error("input_format in config file is illegal, please check it!") + raise RuntimeError('wrong aipp config file content!') + + def aipp_set_src_image_size(self, cfg): + src_image_size = list() + tmp_size_w = cfg.getint('aipp_op', 'src_image_size_w') + tmp_size_h = cfg.getint('aipp_op', 'src_image_size_h') + if (SRC_IMAGE_SIZE_W_MIN <= tmp_size_w <= SRC_IMAGE_SIZE_W_MAX): + src_image_size.append(tmp_size_w) + else: + logger.error("src_image_size_w in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + if (SRC_IMAGE_SIZE_H_MIN <= tmp_size_h <= SRC_IMAGE_SIZE_H_MAX): + src_image_size.append(tmp_size_h) + else: + logger.error("src_image_size_h in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_src_image_size(src_image_size) + + def aipp_set_rbuv_swap_switch(self, cfg, option_list): + if (option_list.count('rbuv_swap_switch') == 0): + self.session.aipp_set_rbuv_swap_switch(RBUV_SWAP_SWITCH_OFF) + return + tmp_rs_switch = cfg.getint('aipp_op', 'rbuv_swap_switch') + if (tmp_rs_switch == RBUV_SWAP_SWITCH_OFF or tmp_rs_switch == RBUV_SWAP_SWITCH_ON): + self.session.aipp_set_rbuv_swap_switch(tmp_rs_switch) + else: + logger.error("rbuv_swap_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + def aipp_set_ax_swap_switch(self, cfg, option_list): + if (option_list.count('ax_swap_switch') == 0): + self.session.aipp_set_ax_swap_switch(AX_SWAP_SWITCH_OFF) + return + tmp_as_switch = cfg.getint('aipp_op', 'ax_swap_switch') + if (tmp_as_switch == AX_SWAP_SWITCH_OFF or tmp_as_switch == AX_SWAP_SWITCH_ON): + self.session.aipp_set_ax_swap_switch(tmp_as_switch) + else: + logger.error("ax_swap_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + def aipp_set_csc_params(self, cfg, option_list): + if (option_list.count('csc_switch') == 0): + tmp_csc_switch = CSC_SWITCH_OFF + else: + tmp_csc_switch = cfg.getint('aipp_op', 'csc_switch') + + if (tmp_csc_switch == CSC_SWITCH_OFF): + tmp_csc_params = [0] * 16 + elif (tmp_csc_switch == CSC_SWITCH_ON): + tmp_csc_params = list() + tmp_csc_params.append(tmp_csc_switch) + options = [ + 'matrix_r0c0', 'matrix_r0c1', 'matrix_r0c2', 'matrix_r1c0', 'matrix_r1c1', 'matrix_r1c2', + 'matrix_r2c0', 'matrix_r2c1', 'matrix_r2c2', 'output_bias_0', 'output_bias_1', 'output_bias_2', + 'input_bias_0', 'input_bias_1', 'input_bias_2' + ] + for option in options: + tmp_csc_params.append(0 if option_list.count(option) == 0 else cfg.getint('aipp_op', option)) + + range_ok = True + for i in range(1, 9): + range_ok = 
range_ok and (CSC_MATRIX_MIN <= tmp_csc_params[i] <= CSC_MATRIX_MAX) + for i in range(10, 15): + range_ok = range_ok and (0 <= tmp_csc_params[i] <= 255) + if (range_ok is False): + logger.error("csc_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + else: + logger.error("csc_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_csc_params(tmp_csc_params) + + def aipp_set_crop_params(self, cfg, option_list): + if (option_list.count('crop') == 0): + tmp_crop_switch = CROP_SWITCH_OFF + else: + tmp_crop_switch = cfg.getint('aipp_op', 'crop') + + if (tmp_crop_switch == CROP_SWITCH_OFF): + tmp_crop_params = [0, 0, 0, 416, 416] + elif (tmp_crop_switch == CROP_SWITCH_ON): + tmp_crop_params = list() + tmp_crop_params.append(tmp_crop_switch) + tmp_crop_params.append( + 0 if option_list.count('load_start_pos_w') == 0 else cfg.getint('aipp_op', 'load_start_pos_w') + ) + tmp_crop_params.append( + 0 if option_list.count('load_start_pos_h') == 0 else cfg.getint('aipp_op', 'load_start_pos_h') + ) + tmp_crop_params.append( + 0 if option_list.count('crop_size_w') == 0 else cfg.getint('aipp_op', 'crop_size_w') + ) + tmp_crop_params.append( + 0 if option_list.count('crop_size_h') == 0 else cfg.getint('aipp_op', 'crop_size_h') + ) + + range_ok = True + range_ok = range_ok and (LOAD_START_POS_W_MIN <= tmp_crop_params[1] <= LOAD_START_POS_W_MAX) + range_ok = range_ok and (LOAD_START_POS_H_MIN <= tmp_crop_params[2] <= LOAD_START_POS_H_MAX) + range_ok = range_ok and (CROP_POS_W_MIN <= tmp_crop_params[3] <= CROP_POS_W_MAX) + range_ok = range_ok and (CROP_POS_H_MIN <= tmp_crop_params[4] <= CROP_POS_H_MAX) + if (range_ok is False): + logger.error("crop_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + else: + logger.error("crop_switch(crop) in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_crop_params(tmp_crop_params) + + def aipp_set_padding_params(self, cfg, option_list): + if (option_list.count('padding') == 0): + tmp_padding_switch = PADDING_SWITCH_OFF + else: + tmp_padding_switch = cfg.getint('aipp_op', 'padding') + + if (tmp_padding_switch == PADDING_SWITCH_OFF): + tmp_padding_params = [0] * 5 + elif (tmp_padding_switch == PADDING_SWITCH_ON): + tmp_padding_params = list() + tmp_padding_params.append(tmp_padding_switch) + tmp_padding_params.append( + 0 if option_list.count('padding_size_top') == 0 else cfg.getint('aipp_op', 'padding_size_top') + ) + tmp_padding_params.append( + 0 if option_list.count('padding_size_bottom') == 0 else cfg.getint('aipp_op', 'padding_size_bottom') + ) + tmp_padding_params.append( + 0 if option_list.count('padding_size_left') == 0 else cfg.getint('aipp_op', 'padding_size_left') + ) + tmp_padding_params.append( + 0 if option_list.count('padding_size_right') == 0 else cfg.getint('aipp_op', 'padding_size_right') + ) + + range_ok = True + for i in range(1, 5): + range_ok = range_ok and (PADDING_SIZE_MIN <= tmp_padding_params[i] <= PADDING_SIZE_MAX) + if (range_ok is False): + logger.error("padding_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + else: + logger.error("padding_switch in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + 
self.session.aipp_set_padding_params(tmp_padding_params) + + def aipp_set_dtc_pixel_mean(self, cfg, option_list): + tmp_mean_params = list() + tmp_mean_params.append( + 0 if option_list.count('mean_chn_0') == 0 else cfg.getint('aipp_op', 'mean_chn_0') + ) + tmp_mean_params.append( + 0 if option_list.count('mean_chn_1') == 0 else cfg.getint('aipp_op', 'mean_chn_1') + ) + tmp_mean_params.append( + 0 if option_list.count('mean_chn_2') == 0 else cfg.getint('aipp_op', 'mean_chn_2') + ) + tmp_mean_params.append( + 0 if option_list.count('mean_chn_3') == 0 else cfg.getint('aipp_op', 'mean_chn_3') + ) + + range_ok = True + for i in range(0, 4): + range_ok = range_ok and (PIXEL_MEAN_CHN_MIN <= tmp_mean_params[i] <= PIXEL_MEAN_CHN_MAX) + if (range_ok is False): + logger.error("mean_chn_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_dtc_pixel_mean(tmp_mean_params) + + def aipp_set_dtc_pixel_min(self, cfg, option_list): + tmp_min_params = list() + tmp_min_params.append( + 0 if option_list.count('min_chn_0') == 0 else cfg.getfloat('aipp_op', 'min_chn_0') + ) + tmp_min_params.append( + 0 if option_list.count('min_chn_1') == 0 else cfg.getfloat('aipp_op', 'min_chn_1') + ) + tmp_min_params.append( + 0 if option_list.count('min_chn_2') == 0 else cfg.getfloat('aipp_op', 'min_chn_2') + ) + tmp_min_params.append( + 0 if option_list.count('min_chn_3') == 0 else cfg.getfloat('aipp_op', 'min_chn_3') + ) + + range_ok = True + for i in range(0, 4): + range_ok = range_ok and (PIXEL_MIN_CHN_MIN <= tmp_min_params[i] <= PIXEL_MIN_CHN_MAX) + if (range_ok is False): + logger.error("min_chn_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_dtc_pixel_min(tmp_min_params) + + def aipp_set_pixel_var_reci(self, cfg, option_list): + tmp_reci_params = list() + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_0') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_0') + ) + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_1') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_1') + ) + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_2') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_2') + ) + tmp_reci_params.append( + 0 if option_list.count('var_reci_chn_3') == 0 else cfg.getfloat('aipp_op', 'var_reci_chn_3') + ) + + range_ok = True + for i in range(0, 4): + range_ok = range_ok and (PIXEL_VAR_RECI_CHN_MIN <= tmp_reci_params[i] <= PIXEL_VAR_RECI_CHN_MAX) + if (range_ok is False): + logger.error("var_reci_chn_params in config file out of range, please check it!") + raise RuntimeError('wrong aipp config file content!') + + self.session.aipp_set_pixel_var_reci(tmp_reci_params) + + def run(self, feeds, out_array=False): + if len(feeds) > 0 and isinstance(feeds[0], np.ndarray): + # if feeds is ndarray list, convert to baseTensor + inputs = [] + for array in feeds: + basetensor = aclruntime.BaseTensor(array.__array_interface__['data'][0], array.nbytes) + inputs.append(basetensor) + else: + inputs = feeds + outputs = self.session.run(self.outputs_names, inputs) + if out_array: + # convert to host tensor + self.convert_tensors_to_host(outputs) + # convert tensor to narray + return self.convert_tensors_to_arrays(outputs) + else: + return outputs + + def run_pipeline(self, infilelist, output, auto_shape=False, + auto_dims=False, outfmt="BIN", pure_infer_mode=False, extra_session=None): + infer_options = aclruntime.infer_options() 
+ infer_options.output_dir = output + infer_options.auto_dym_shape = auto_shape + infer_options.auto_dym_dims = auto_dims + infer_options.out_format = outfmt + infer_options.pure_infer_mode = pure_infer_mode + extra_session = [] if extra_session is None else extra_session + self.session.run_pipeline(infilelist, infer_options, extra_session) + + def reset_summaryinfo(self): + self.session.reset_sumaryinfo() + + def infer(self, feeds, mode='static', custom_sizes=100000, out_array=True): + ''' + Parameters: + feeds: input data + mode: static dymdims dymshape... + ''' + inputs = [] + shapes = [] + for feed in feeds: + if type(feed) is np.ndarray: + infer_input = feed + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + elif type(feed) in NP_TYPE_LIST: + infer_input = np.array(feed) + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append([feed.size]) + elif type(feed) is aclruntime.Tensor: + infer_input = feed + shapes.append(infer_input.shape) + elif hasattr(feed, 'type') and feed.type() in TORCH_TENSOR_LIST: + infer_input = feed.numpy() + if not feed.is_contiguous(): + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + else: + raise RuntimeError('type:{} invalid'.format(type(feed))) + inputs.append(infer_input) + + if self.infer_mode_switch.get(mode) is not None: + self.infer_mode_switch.get(mode)(shapes, custom_sizes) + else: + raise RuntimeError('wrong infer_mode:{}, only support \"static\",\"dymbatch\",\"dymhw\", \ + \"dymdims\",\"dymshape\"'.format(mode)) + + return self.run(inputs, out_array) + + def free_resource(self): + if hasattr(self.session, "free_resource"): + self.session.free_resource() + + def infer_pipeline(self, feeds_list, mode='static', custom_sizes=100000): + ''' + Parameters: + feeds_list: input data list + mode: static dymdims dymshape... 
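+            custom_sizes: custom output size(s) passed to set_custom_outsize; only used when mode is "dymshape"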
+ ''' + inputs_list = [] + shapes_list = [] + for feeds in feeds_list: + inputs = [] + shapes = [] + for feed in feeds: + if type(feed) is np.ndarray: + infer_input = feed + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shape = feed.shape + elif type(feed) in NP_TYPE_LIST: + infer_input = np.array(feed) + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shape = [feed.size] + elif type(feed) is aclruntime.Tensor: + infer_input = np.array(feed) + shape = infer_input.shape + elif hasattr(feed, 'type') and feed.type() in TORCH_TENSOR_LIST: + infer_input = feed.numpy() + infer_input = np.ascontiguousarray(infer_input) if not feed.is_contiguous() else infer_input + shape = infer_input.shape + else: + raise RuntimeError('type:{} invalid'.format(type(feed))) + basetensor = aclruntime.BaseTensor(infer_input.__array_interface__['data'][0], infer_input.nbytes) + inputs.append(basetensor) + shapes.append(shape) + inputs_list.append(inputs) + shapes_list.append(shapes) + if self.infer_mode_switch.get(mode) is not None and mode != "dymshape" and mode != "dymdims": + self.infer_mode_switch.get(mode)(shapes, custom_sizes) + elif mode == "dymshape": + if isinstance(custom_sizes, int): + custom_sizes = [custom_sizes] * len(self.get_outputs()) + elif not isinstance(custom_sizes, list): + raise RuntimeError('custom_sizes:{} type:{} invalid'.format( + custom_sizes, type(custom_sizes))) + self.session.set_custom_outsize(custom_sizes) + elif mode == "dymdims": + pass + else: + raise RuntimeError('wrong infer_mode:{}, only support \"static\",\"dymbatch\",\"dymhw\", \ + \"dymdims\",\"dymshape\"'.format(mode)) + outputs = self.session.run_pipeline(self.outputs_names, inputs_list, shapes_list, + mode == 'dymshape', mode == 'dymdims') + for i, output in enumerate(outputs): + outputs[i] = self.convert_tensors_to_arrays(output) + return outputs + + def inner_run(self, in_out_list, get_outputs=False, mem_copy=True): + ''' + Parameters: + in_out_list: relation between current input datas and last output datas + get_outputs: get outputs from device or not + mem_copy: the way inputs get data from outputs + ''' + if (get_outputs): + outputs = self.session.inner_run(in_out_list, self.outputs_names, get_outputs, mem_copy) + return outputs + else: + self.session.inner_run(in_out_list, self.outputs_names, get_outputs, mem_copy) + outputs = None + return outputs + + def first_inner_run(self, feeds, mode='static', custom_sizes=100000): + ''' + Parameters: + feeds: input data + mode: static dymdims dymshapes ... 
+ custom_sizes: must equal to the realsize of outputs + ''' + inputs = [] + shapes = [] + for feed in feeds: + if type(feed) is np.ndarray: + infer_input = feed + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + elif type(feed) in NP_TYPE_LIST: + infer_input = np.array(feed) + if not infer_input.flags.c_contiguous: + infer_input = np.ascontiguousarray(infer_input) + shapes.append([feed.size]) + elif hasattr(feed, 'type') and feed.type() in TORCH_TENSOR_LIST: + infer_input = feed.numpy() + if not feed.is_contiguous(): + infer_input = np.ascontiguousarray(infer_input) + shapes.append(infer_input.shape) + else: + raise RuntimeError('type:{} invalid'.format(type(feed))) + basetensor = aclruntime.BaseTensor(infer_input.__array_interface__['data'][0], infer_input.nbytes) + inputs.append(basetensor) + + if self.infer_mode_switch.get(mode) is not None: + self.infer_mode_switch.get(mode)(shapes, custom_sizes) + else: + raise RuntimeError('wrong infer_mode:{}, only support \"static\",\"dymbatch\",\"dymhw\", \ + \"dymdims\",\"dymshape\"'.format(mode)) + + return self.session.first_inner_run(self.outputs_names, inputs) + + def infer_iteration(self, feeds, in_out_list=None, iteration_times=1, mode='static', + custom_sizes=100000, mem_copy=True): + ''' + Parameters: + feeds: input datas + in_out_list: relation between current input datas and last output datas + iteration_times: inner iteration infer loop times + mode: static dymdims dymshape ... + custom_sizes: only dymshape needs + ''' + if not in_out_list: + in_out_list = [] + if len(in_out_list) != len(self.get_inputs()): + raise RuntimeError(f"inputs' amount and length of in_out_list not matched!") + if (iteration_times == 1): + outputs = self.infer(feeds, mode, custom_sizes) + return outputs + else: + self.first_inner_run(feeds, mode, custom_sizes) + for _ in range(iteration_times - 2): + self.inner_run(in_out_list, False, mem_copy) + outputs = self.inner_run(in_out_list, True, mem_copy) + # convert to host tensor + self.convert_tensors_to_host(outputs) + # convert tensor to narray + return self.convert_tensors_to_arrays(outputs) + + def summary(self): + return self.session.sumary() + + def _static_prepare(self, shapes, custom_sizes): + self.set_staticbatch() + + def _dymbatch_prepare(self, shapes, custom_sizes): + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + for i, shape in enumerate(shapes): + for j, dim in enumerate(shape): + if (indesc[i].shape[j] < 0): + self.set_dynamic_batchsize(dim) + return + if (indesc[i].shape[j] != dim): + raise RuntimeError("input datas and intensors dim not matched!") + raise RuntimeError("not a dymbatch model!") + + def _dymhw_prepare(self, shapes, custom_sizes): + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + for i, shape in enumerate(shapes): + if (indesc[i].shape[2] < 0 and indesc[i].shape[3] < 0): + self.set_dynamic_hw(shape[2], shape[3]) + return + raise RuntimeError("not a dymhw model!") + + def _dymdims_prepare(self, shapes, custom_sizes): + dym_list = [] + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + for i, shape in enumerate(shapes): + str_shape = [str(val) for val in shape] + dyshape = "{}:{}".format(indesc[i].name, ",".join(str_shape)) + dym_list.append(dyshape) + dyshapes = 
';'.join(dym_list) + self.session.set_dynamic_dims(dyshapes) + + def _dymshape_prepare(self, shapes, custom_sizes): + dym_list = [] + indesc = self.get_inputs() + if (len(shapes) != len(indesc)): + raise RuntimeError("input datas and intensors nums not matched!") + outdesc = self.get_outputs() + for i, shape in enumerate(shapes): + str_shape = [str(val) for val in shape] + dyshape = "{}:{}".format(indesc[i].name, ",".join(str_shape)) + dym_list.append(dyshape) + dyshapes = ';'.join(dym_list) + self.session.set_dynamic_shape(dyshapes) + if isinstance(custom_sizes, int): + custom_sizes = [custom_sizes] * len(outdesc) + elif not isinstance(custom_sizes, list): + raise RuntimeError('custom_sizes:{} type:{} invalid'.format( + custom_sizes, type(custom_sizes))) + self.session.set_custom_outsize(custom_sizes) + + +class MultiDeviceSession(): + def __init__(self, model_path: str, acl_json_path: str = None, debug: bool = False, loop: int = 1): + self.model_path = model_path + self.acl_json_path = acl_json_path + self.debug = debug + self.loop = loop + self.summary = {} + + @classmethod + def print_subprocess_run_error(cls, value): + logger.error(f"subprocess run failed error_callback:{value}") + + def summary(self): + return self.summary + + def infer(self, device_feeds:dict, mode='static', custom_sizes=100000): + ''' + Parameters: + device_feeds: device match [input datas1, input datas2...] (Dict) + ''' + subprocess_num = 0 + for _, device in device_feeds.items(): + subprocess_num += len(device) + p = Pool(subprocess_num) + outputs_queue = Manager().Queue() + for device_id, feeds in device_feeds.items(): + for feed in feeds: + p.apply_async( + self.subprocess_infer, + args=(outputs_queue, device_id, feed, mode, custom_sizes), + error_callback=self.print_subprocess_run_error + ) + p.close() + p.join() + result = 0 if 2 * len(device_feeds) == outputs_queue.qsize() else 1 + logger.info(f"multidevice run end qsize:{outputs_queue.qsize()} result:{result}") + outputs_dict = {} + self.summary.clear() + while outputs_queue.qsize() != 0: + ret = outputs_queue.get() + if type(ret) == list: + if (not outputs_dict.get(ret[0])): + outputs_dict.update({ret[0]: []}) + self.summary.update({ret[0]: []}) + outputs_dict.get(ret[0]).append(ret[1]) + self.summary.get(ret[0]).append((ret[3] - ret[2]) * 1000) + logger.info(f"device {ret[0]}, start_time:{ret[2]}, end_time:{ret[3]}") + return outputs_dict + + def infer_pipeline(self, device_feeds_list:dict, mode='static', custom_sizes=100000): + ''' + Parameters: + device_feeds: device match [input datas1, input datas2...] 
(Dict) + ''' + subprocess_num = 0 + for _, device in device_feeds_list.items(): + subprocess_num += len(device) + p = Pool(subprocess_num) + outputs_queue = Manager().Queue() + for device_id, feeds in device_feeds_list.items(): + for feed in feeds: + p.apply_async( + self.subprocess_infer_pipeline, + args=(outputs_queue, device_id, feed, mode, custom_sizes), + error_callback=self.print_subprocess_run_error + ) + p.close() + p.join() + result = 0 if 2 * len(device_feeds_list) == outputs_queue.qsize() else 1 + logger.info(f"multidevice run pipeline end qsize:{outputs_queue.qsize()} result:{result}") + outputs_dict = {} + self.summary.clear() + while outputs_queue.qsize() != 0: + ret = outputs_queue.get() + if type(ret) == list: + if (not outputs_dict.get(ret[0])): + outputs_dict.update({ret[0]: []}) + self.summary.update({ret[0]: []}) + outputs_dict.get(ret[0]).append(ret[1]) + self.summary.get(ret[0]).append((ret[3] - ret[2]) * 1000) + logger.info(f"device {ret[0]}, start_time:{ret[2]}, end_time:{ret[3]}") + return outputs_dict + + def infer_iteration(self, device_feeds:dict, in_out_list=None, iteration_times=1, mode='static', custom_sizes=None, mem_copy=True): + ''' + Parameters: + device_feeds: device match [input datas1, input datas2...] (Dict) + ''' + subprocess_num = 0 + for _, device in device_feeds.items(): + subprocess_num += len(device) + p = Pool(subprocess_num) + outputs_queue = Manager().Queue() + for device_id, feeds in device_feeds.items(): + for feed in feeds: + p.apply_async( + self.subprocess_infer_iteration, + args=(outputs_queue, device_id, feed, in_out_list, iteration_times, mode, custom_sizes, mem_copy), + error_callback=self.print_subprocess_run_error + ) + p.close() + p.join() + result = 0 if 2 * len(device_feeds) == outputs_queue.qsize() else 1 + logger.info(f"multidevice run iteration end qsize:{outputs_queue.qsize()} result:{result}") + outputs_dict = {} + self.summary.clear() + while outputs_queue.qsize() != 0: + ret = outputs_queue.get() + if type(ret) == list: + if (not outputs_dict.get(ret[0])): + outputs_dict.update({ret[0]: []}) + self.summary.update({ret[0]: []}) + outputs_dict.get(ret[0]).append(ret[1]) + self.summary.get(ret[0]).append((ret[3] - ret[2]) * 1000) + logger.info(f"device {ret[0]}, start_time:{ret[2]}, end_time:{ret[3]}") + return outputs_dict + + def subprocess_infer(self, outputs_queue, device_id, feeds, mode='static', custom_sizes=100000): + sub_session = InferSession( + device_id=device_id, + model_path=self.model_path, + acl_json_path=self.acl_json_path, + debug=self.debug, + loop=self.loop + ) + start_time = time.time() + outputs = sub_session.infer(feeds, mode, custom_sizes, out_array=True) + end_time = time.time() + outputs_queue.put([device_id, outputs, start_time, end_time]) + return + + def subprocess_infer_pipeline(self, outputs_queue, device_id, feeds_list, mode='static', custom_sizes=100000): + sub_session = InferSession( + device_id=device_id, + model_path=self.model_path, + acl_json_path=self.acl_json_path, + debug=self.debug, + loop=self.loop + ) + start_time = time.time() + outputs = sub_session.infer_pipeline(feeds_list, mode, custom_sizes) + end_time = time.time() + outputs_queue.put([device_id, outputs, start_time, end_time]) + return + + def subprocess_infer_iteration(self, outputs_queue, device_id, feeds, in_out_list=None, + iteration_times=1, mode='static', custom_sizes=None, mem_copy=True): + sub_session = InferSession( + device_id=device_id, + model_path=self.model_path, + acl_json_path=self.acl_json_path, + 
debug=self.debug, + loop=self.loop + ) + start_time = time.time() + outputs = sub_session.infer_iteration(feeds, in_out_list, iteration_times, mode, custom_sizes, mem_copy) + end_time = time.time() + outputs_queue.put([device_id, outputs, start_time, end_time]) + return + + +class MemorySummary: + @staticmethod + def get_h2d_time_list(): + if hasattr(aclruntime, 'MemorySummary'): + return aclruntime.MemorySummary().H2D_time_list + else: + return [] + + @staticmethod + def get_d2h_time_list(): + if hasattr(aclruntime, 'MemorySummary'): + return aclruntime.MemorySummary().D2H_time_list + else: + return [] + + @staticmethod + def reset(): + if hasattr(aclruntime, 'MemorySummary'): + aclruntime.MemorySummary().reset() diff --git a/tools/infer_tool/ais_bench/infer/registry.py b/tools/infer_tool/ais_bench/infer/registry.py new file mode 100644 index 0000000..60f4784 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/registry.py @@ -0,0 +1,103 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import logging +from typing import Any, Dict, Iterable, Iterator, Tuple +from ais_bench.infer.common.utils import logger + + +class Registry(Iterable[Tuple[str, Any]]): + """ + The registry that provides name -> object mapping, to support third-party + users' custom modules. + """ + def register(self, obj: Any = None) -> Any: + """ + Register the given object under the the name `obj.__name__`. + Can be used as either a decorator or not.See docstring of this class for usage. + """ + if callable(obj): + return add(None, obj) + + def add(name: str, obj: Any) -> Any: + self[name] = obj + return obj + + return lambda x: add(obj, x) + + def __init__(self, name: str) -> None: + """ + Args: + name (str): the name of this registry + """ + self._name: str = name + self._obj_map: Dict[str, Any] = {} + + def __setitem__(self, name: str, obj: Any) -> None: + if not callable(obj): + raise ValueError("Value of a Registry must be a callable!") + + if name is None: + name = obj.__name__ + + if name in self._obj_map: + raise ValueError( + f"An object named '{name}' was already registered in '{self._name}' registry!" 
+ ) + self._obj_map[name] = obj + + def __getitem__(self, name: str) -> Any: + return self._obj_map[name] + + def __call__(self, obj: Any) -> Any: + return self.register(obj) + + def __contains__(self, name: str) -> bool: + return name in self._obj_map + + def __repr__(self) -> str: + from tabulate import tabulate + + table_headers = ["Names", "Objects"] + table = tabulate( + self._obj_map.items(), headers=table_headers, tablefmt="fancy_grid" + ) + return "Registry of {}:\n".format(self._name) + table + + def __iter__(self) -> Iterator[Tuple[str, Any]]: + return iter(self._obj_map.items()) + + +def import_all_modules_for_register(module_paths, base_model_name): + import os + import importlib + + modules = [] + for _, _, files in os.walk(module_paths): + for filename in files: + if not filename.endswith(".py") or filename == "__init__.py": + continue + model_name = base_model_name + "." + filename.rsplit(".", 1)[0] + modules.append(model_name) + + errors = [] + for module in modules: + try: + importlib.import_module(module) + except ImportError as e: + errors.append((module, e)) + logger.info(f"import {module} error: {e}") + + return errors \ No newline at end of file diff --git a/tools/infer_tool/ais_bench/infer/summary.py b/tools/infer_tool/ais_bench/infer/summary.py new file mode 100644 index 0000000..65d1cb8 --- /dev/null +++ b/tools/infer_tool/ais_bench/infer/summary.py @@ -0,0 +1,229 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
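+
+# This module aggregates per-execution performance data: H2D/D2H copy latency and NPU
+# compute time are collected into lists, reduced to min/max/mean/median/percentile by
+# Summary.get_list_info(), and throughput is derived as 1000 * batchsize /
+# npu_compute_time.mean; Summary.report() can also persist the aggregated data to
+# "{output_prefix}_summary.json".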
+ + +import json +import os +import stat + +import numpy as np +from ais_bench.infer.common.utils import logger +from ais_bench.infer.common.path_security_check import ms_open + + +class ListInfo(object): + def __init__(self): + self.min = 0.0 + self.max = 0.0 + self.mean = 0.0 + self.median = 0.0 + self.percentile = 0.0 + + +class Result(object): + def __init__(self): + self.npu_compute_time = None + self.h2d_latency = None + self.d2h_latency = None + self.throughput = None + self.scale = None + self.batchsize = None + + +class Summary(object): + def __init__(self): + self.reset() + self.infodict = {"filesinfo": {}} + + @staticmethod + def merge_intervals(intervals): + intervals.sort(key=lambda x: x[0]) + merged = [] + for interval in intervals: + if not merged or merged[-1][1] < interval[0]: + merged.append(list(interval)) + else: + merged[-1][1] = max(merged[-1][1], interval[1]) + return merged + + @staticmethod + def get_list_info(work_list, percentile_scale, merge=False): + list_info = ListInfo() + if merge: # work_list is a 2-dim vector each element is a pair containing start and end time + n = len(work_list) + if n == 0: + raise RuntimeError(f'summary.get_list_info failed: inner error') + merged_intervals = Summary.merge_intervals(work_list) + sum_time = sum(end_time - start_time for start_time, end_time in merged_intervals) + list_info.mean = sum_time / n + + elif len(work_list) != 0: + list_info.min = np.min(work_list) + list_info.max = np.max(work_list) + list_info.mean = np.mean(work_list) + list_info.median = np.median(work_list) + list_info.percentile = np.percentile(work_list, percentile_scale) + + return list_info + + def reset(self): + self.h2d_latency_list = [] + self.d2h_latency_list = [] + self.npu_compute_time_list = [] + self.npu_compute_time_interval_list = [] + self._batchsizes = [] + + def add_batchsize(self, n: int): + self._batchsizes.append(n) + + def add_sample_id_infiles(self, sample_id, infiles): + if self.infodict["filesinfo"].get(sample_id) is None: + self.infodict["filesinfo"][sample_id] = {"infiles": [], "outfiles": []} + if len(self.infodict["filesinfo"][sample_id]["infiles"]) == 0: + for files in infiles: + self.infodict["filesinfo"][sample_id]["infiles"].append(files) + + def append_sample_id_outfile(self, sample_id, outfile): + if self.infodict["filesinfo"].get(sample_id) is None: + self.infodict["filesinfo"][sample_id] = {"infiles": [], "outfiles": []} + self.infodict["filesinfo"][sample_id]["outfiles"].append(outfile) + + def add_args(self, args): + self.infodict["args"] = args + + def record(self, result, multi_threads=False): + if multi_threads: + self.infodict['NPU_compute_time'] = { + "mean": result.npu_compute_time.mean, + "count": len(self.npu_compute_time_interval_list), + } + self.infodict['H2D_latency'] = {"mean": result.h2d_latency.mean, "count": len(self.h2d_latency_list)} + self.infodict['D2H_latency'] = {"mean": result.d2h_latency.mean, "count": len(self.d2h_latency_list)} + self.infodict['npu_compute_time_list'] = self.npu_compute_time_interval_list + else: + self.infodict['NPU_compute_time'] = { + "min": result.npu_compute_time.min, + "max": result.npu_compute_time.max, + "mean": result.npu_compute_time.mean, + "median": result.npu_compute_time.median, + "percentile({}%)".format(result.scale): result.npu_compute_time.percentile, + "count": len(self.npu_compute_time_list), + } + self.infodict['H2D_latency'] = { + "min": result.h2d_latency.min, + "max": result.h2d_latency.max, + "mean": result.h2d_latency.mean, + "median": 
result.h2d_latency.median, + "percentile({}%)".format(result.scale): result.h2d_latency.percentile, + "count": len(self.h2d_latency_list), + } + self.infodict['D2H_latency'] = { + "min": result.d2h_latency.min, + "max": result.d2h_latency.max, + "mean": result.d2h_latency.mean, + "median": result.d2h_latency.median, + "percentile({}%)".format(result.scale): result.d2h_latency.percentile, + "count": len(self.d2h_latency_list), + } + self.infodict['npu_compute_time_list'] = self.npu_compute_time_list + self.infodict['throughput'] = result.throughput + self.infodict['pid'] = os.getpid() + + def display(self, result, display_all_summary, multi_threads): + logger.info("-----------------Performance Summary------------------") + if multi_threads: + if display_all_summary is True: + logger.info("H2D_latency (ms): mean = {0}".format(result.h2d_latency.mean)) + logger.info("NPU_compute_time (ms): mean = {0}".format(result.npu_compute_time.mean)) + if display_all_summary is True: + logger.info("D2H_latency (ms): mean = {0}".format(result.d2h_latency.mean)) + else: + if display_all_summary is True: + logger.info( + "H2D_latency (ms): min = {0}, max = {1}, mean = {2}, median = {3}, percentile({4}%) = {5}".format( + result.h2d_latency.min, + result.h2d_latency.max, + result.h2d_latency.mean, + result.h2d_latency.median, + result.scale, + result.h2d_latency.percentile, + ) + ) + + logger.info( + "NPU_compute_time (ms): min = {0}, max = {1}, mean = {2}, median = {3}, percentile({4}%) = {5}".format( + result.npu_compute_time.min, + result.npu_compute_time.max, + result.npu_compute_time.mean, + result.npu_compute_time.median, + result.scale, + result.npu_compute_time.percentile, + ) + ) + if display_all_summary is True: + logger.info( + "D2H_latency (ms): min = {0}, max = {1}, mean = {2}, median = {3}, percentile({4}%) = {5}".format( + result.d2h_latency.min, + result.d2h_latency.max, + result.d2h_latency.mean, + result.d2h_latency.median, + result.scale, + result.d2h_latency.percentile, + ) + ) + logger.info( + "throughput 1000*batchsize.mean({})/NPU_compute_time.mean({}): {}".format( + result.batchsize, result.npu_compute_time.mean, result.throughput + ) + ) + logger.info("------------------------------------------------------") + + def report(self, batchsize, output_prefix, display_all_summary=False, multi_threads=False): + scale = 99 + + if self.npu_compute_time_list and self.npu_compute_time_interval_list: + logger.error("npu_compute_time_list and npu_compute_time_interval_list exits at the same time") + raise Exception + if self.npu_compute_time_list: + npu_compute_time = Summary.get_list_info(self.npu_compute_time_list, scale) + else: + npu_compute_time = Summary.get_list_info(self.npu_compute_time_interval_list, scale, True) + h2d_latency = Summary.get_list_info(self.h2d_latency_list, scale) + d2h_latency = Summary.get_list_info(self.d2h_latency_list, scale) + if self._batchsizes: + batchsize = sum(self._batchsizes) / len(self._batchsizes) + else: + pass + if npu_compute_time.mean == 0: + throughput = 0 + else: + throughput = 1000 * batchsize / npu_compute_time.mean + + result = Result() + result.npu_compute_time = npu_compute_time + result.d2h_latency = d2h_latency + result.h2d_latency = h2d_latency + result.throughput = throughput + result.scale = scale + result.batchsize = batchsize + + self.record(result, multi_threads) + self.display(result, display_all_summary, multi_threads) + + if output_prefix is not None: + with ms_open(output_prefix + "_summary.json", mode="w") as f: + 
json.dump(self.infodict, f) + + +summary = Summary() diff --git a/tools/infer_tool/requirements.txt b/tools/infer_tool/requirements.txt new file mode 100644 index 0000000..e6094bf --- /dev/null +++ b/tools/infer_tool/requirements.txt @@ -0,0 +1,3 @@ +numpy +tqdm +attrs >= 21.3.0 \ No newline at end of file diff --git a/tools/infer_tool/setup.py b/tools/infer_tool/setup.py new file mode 100644 index 0000000..df023b7 --- /dev/null +++ b/tools/infer_tool/setup.py @@ -0,0 +1,51 @@ +# Copyright (c) 2023-2023 Huawei Technologies Co., Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import subprocess +from setuptools import setup, find_packages # type: ignore + + +with open('requirements.txt', encoding='utf-8') as f: + required = f.read().splitlines() + +with open('README.md', encoding='utf-8') as f: + long_description = f.read() + +# 使用Git命令获取最新的提交哈希 +try: + git_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('utf-8').strip() +except Exception: + git_hash = "" +# 使用Git命令获取最新的提交日期和时间 +try: + git_date = subprocess.check_output(['git', 'show', '-s', '--format=%cd', 'HEAD']).decode('utf-8').strip() +except Exception: + git_date = "" + +setup( + name='ais_bench', + version='0.0.2', + description='ais_bench tool', + long_description=long_description, + url=f"https://gitee.com/ascend/tools/, commit id: {git_hash}, release_date: {git_date}", + release_date = git_date, + packages=find_packages(), + include_package_data=True, + keywords='ais_bench tool', + install_requires=required, + python_requires='>=3.7', + entry_points={ + 'benchmark_sub_task': ['benchmark=ais_bench.infer.main_cli:get_cmd_instance'], + }, + +) \ No newline at end of file -- Gitee From 4f54363d231af81f59bda579c386d53510b4ef86 Mon Sep 17 00:00:00 2001 From: Hanye Date: Mon, 6 May 2024 11:34:39 +0800 Subject: [PATCH 5/8] readme fix --- tools/infer_tool/README.md | 272 ++----------------------------------- 1 file changed, 14 insertions(+), 258 deletions(-) diff --git a/tools/infer_tool/README.md b/tools/infer_tool/README.md index 7546417..545123a 100644 --- a/tools/infer_tool/README.md +++ b/tools/infer_tool/README.md @@ -142,19 +142,11 @@ ais_bench推理工具可以通过配置不同的参数,来应对各种测试 | --outputSize | 指定模型的输出数据所占内存大小,多个输出时,需要为每个输出设置一个值,多个值之间用“,”隔开。
动态Shape场景下,获取模型的输出size通常为0(即输出数据占内存大小未知),需要根据输入的Shape,预估一个较合适的大小,配置输出数据占内存大小。
例如:--dymShape "input1:8,3,5,10;input2:5,3,10,10" --outputSize "10000,10000" | 否 | | --auto_set_dymdims_mode | 自动设置动态Dims模式。1或true(开启)、0或false(关闭),默认关闭。
针对动态档位Dims模型,根据输入文件的信息,自动设置Shape参数,注意输入数据只能为npy文件,因为bin文件不能读取Shape信息。<br>
配合--input参数使用,单独使用无效。<br>
例如:--input 1.npy --auto_set_dymdims_mode 1 | 否 | | --auto_set_dymshape_mode | 自动设置动态Shape模式。取值为:1或true(开启)、0或false(关闭),默认关闭。
针对动态Shape模型,根据输入文件的信息,自动设置Shape参数,注意输入数据只能为npy文件,因为bin文件不能读取Shape信息。<br>
配合--input参数使用,单独使用无效。<br>
例如:--input 1.npy --auto_set_dymshape_mode 1 | 否 | -| --profiler | profiler开关。1或true(开启)、0或false(关闭),默认关闭。
profiler数据在--output参数指定的目录下的profiler文件夹内。配合--output参数使用,单独使用无效。不能与--dump同时开启。| 否 | -| --profiler_rename | 调用profiler落盘文件文件名修改开关,开启后落盘的文件名包含模型名称信息。1或true(开启)、0或false(关闭),默认开启。配合--profiler参数使用,单独使用无效。 |否| -| --dump | dump开关。1或true(开启)、0或false(关闭),默认关闭。
dump数据在--output参数指定的目录下的dump文件夹内。配合--output参数使用,单独使用无效。不能与--profiler同时开启。 | 否 | -| --acl_json_path | acl.json文件路径,须指定一个有效的json文件。该文件内可配置profiler或者dump。当配置该参数时,--dump和--profiler参数无效。 | 否 | | --batchsize | 模型batchsize。不输入该值将自动推导。当前推理模块根据模型输入和文件输出自动进行组Batch。参数传递的batchszie有且只用于结果吞吐率计算。自动推导逻辑为尝试获取模型的batchsize时,首先获取第一个参数的最高维作为batchsize; 如果是动态Batch的话,更新为动态Batch的值;如果是动态dims和动态Shape更新为设置的第一个参数的最高维。如果自动推导逻辑不满足要求,请务必传入准确的batchsize值,以计算出正确的吞吐率。 | 否 | | --output_batchsize_axis | 输出tensor的batchsize轴,默认值为0。输出结果保存文件时,根据哪个轴进行切割推理结果,比如batchsize为2,表示2个输入文件组batch进行推理,那输出结果的batch维度是在哪个轴。默认为0轴,按照0轴进行切割为2份,但是部分模型的输出batch为1轴,所以要设置该值为1。 | 否 | -| --aipp_config|带有动态aipp配置的om模型在推理前需要配置的AIPP具体参数,以.config文件路径形式传入。当om模型带有动态aipp配置时,此参数为必填参数;当om模型不带有动态aipp配置时,配置此参数不影响正常推理。|否| | --backend|指定trtexec开关。需要指定为trtexec。配合--perf参数使用,单独使用无效。|否| | --perf|调用trtexec开关。1或true(开启)、0或false(关闭),默认关闭。配合--backend参数使用,单独使用无效。|否| -| --energy_consumption |能耗采集开关。1或true(开启)、0或false(关闭),默认关闭。需要配合--npu_id参数使用,默认npu_id为0。|否| -| --npu_id |指定npu_id,默认值为0。需要通过npu-smi info命令获取指定device所对应的npu id。配合--energy_consumption参数使用,单独使用无效。|否| | --pipeline |指定pipeline开关,用于开启多线程推理功能。1或true(开启)、0或false(关闭),默认关闭。|否| -| --dump_npy |指定dump_npy开关,用于开启dump结果自动转换功能。1或true(开启)、0或false(关闭),默认关闭。需要配合--output和--dump/--acl_json_path参数使用,单独使用无效。|否| | --threads |指定threads开关,用于设置多计算线程推理时计算线程的数量。默认值为1,取值范围为大于0的正整数。需要配合--pipeline 1参数使用,单独使用无效。|否| ### 使用场景 @@ -166,7 +158,7 @@ ais_bench推理工具可以通过配置不同的参数,来应对各种测试 示例命令如下: ```bash -python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --outfmt BIN --loop 5 +python3 -m ais_bench --model --output ./ --outfmt BIN --loop 5 ``` #### 调试模式 @@ -175,7 +167,7 @@ python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --outfmt BIN 示例命令如下: ```bash -python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --debug 1 +python3 -m ais_bench --model --output ./ --debug 1 ``` 调试模式开启后会增加更多的打印信息,包括: @@ -206,7 +198,7 @@ python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --debug 1 示例命令如下: ```bash -python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --input "./1.bin,./2.bin,./3.bin,./4.bin,./5.bin" +python3 -m ais_bench --model --input "./1.bin,./2.bin,./3.bin,./4.bin,./5.bin" ``` #### 文件夹输入场景 @@ -216,7 +208,7 @@ python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --input "./1.bin,./2.bin, 本场景会根据文件输入size和模型实际输入size进行组Batch。 ```bash -python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --input "./" +python3 -m ais_bench --model --input "./" ``` 模型输入需要与传入文件夹的个数一致。 @@ -229,7 +221,7 @@ python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --input "./" - 第三个文件夹"./data/SQuAD1.1/segment_ids",对应第三个输入"segment_ids"的输入 ```bash -python3 -m ais_bench --model ./save/model/BERT_Base_SQuAD_BatchSize_1.om --input ./data/SQuAD1.1/input_ids,./data/SQuAD1.1/input_mask,./data/SQuAD1.1/segment_ids +python3 -m ais_bench --model --input ./data/SQuAD1.1/input_ids,./data/SQuAD1.1/input_mask,./data/SQuAD1.1/segment_ids ``` @@ -241,7 +233,7 @@ python3 -m ais_bench --model ./save/model/BERT_Base_SQuAD_BatchSize_1.om --input 示例命令如下: ```bash -python3 -m ais_bench --model ./pth_resnet50_bs1.om --input ./data/ --device 1,2 +python3 -m ais_bench --model --input ./data/ --device 1,2 ``` 输出结果依次展示每个Device的推理测试结果,示例如下: @@ -280,7 +272,7 @@ i:1 device_2 throughput:276.54867008654026 start_time:1676875630.8043878 end_tim 以档位1 2 4 8档为例,设置档位为2,本程序将获取实际模型输入组Batch,每2个输入为一组,进行组Batch。 ```bash -python3 -m ais_bench --model ./resnet50_v1_dynamicbatchsize_fp32.om --input=./data/ --dymBatch 2 +python3 -m ais_bench --model --input=./data/ --dymBatch 
2 ``` ##### 动态HW宽高 @@ -288,7 +280,7 @@ python3 -m ais_bench --model ./resnet50_v1_dynamicbatchsize_fp32.om --input=./da 以档位224,224;448,448档为例,设置档位为224,224,本程序将获取实际模型输入组Batch。 ```bash -python3 -m ais_bench --model ./resnet50_v1_dynamichw_fp32.om --input=./data/ --dymHW 224,224 +python3 -m ais_bench --model --input=./data/ --dymHW 224,224 ``` ##### 动态Dims @@ -296,7 +288,7 @@ python3 -m ais_bench --model ./resnet50_v1_dynamichw_fp32.om --input=./data/ --d 以设置档位1,3,224,224为例,本程序将获取实际模型输入组Batch。 ```bash -python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --input=./data/ --dymDims actual_input_1:1,3,224,224 +python3 -m ais_bench --model --input=./data/ --dymDims actual_input_1:1,3,224,224 ``` ##### 自动设置Dims模式(动态Dims模型) @@ -304,11 +296,10 @@ python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --input=./data/ -- 动态Dims模型输入数据的Shape可能是不固定的,比如一个输入文件Shape为1,3,224,224,另一个输入文件Shape为 1,3,300,300。若两个文件同时推理,则需要设置两次动态Shape参数,当前不支持该操作。针对该场景,增加auto_set_dymdims_mode模式,可以根据输入文件的Shape信息,自动设置模型的Shape参数。 ```bash -python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --input=./data/ --auto_set_dymdims_mode 1 +python3 -m ais_bench --model --input=./data/ --auto_set_dymdims_mode 1 ``` - #### 动态Shape场景 ##### 动态Shape @@ -318,7 +309,7 @@ python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --input=./data/ -- 动态Shape的输出大小通常为0,建议通过outputSize参数设置对应输出的内存大小。 ```bash -python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --dymShape actual_input_1:1,3,224,224 --outputSize 10000 +python3 -m ais_bench --model --dymShape actual_input_1:1,3,224,224 --outputSize 10000 ``` ##### 自动设置Shape模式(动态Shape模型) @@ -326,7 +317,7 @@ python3 -m ais_bench --model resnet50_v1_dynamicshape_fp32.om --dymShape actual_ 动态Shape模型输入数据的Shape可能是不固定的,比如一个输入文件Shape为1,3,224,224 另一个输入文件Shape为 1,3,300,300。若两个文件同时推理,则需要设置两次动态Shape参数,当前不支持该操作。针对该场景,增加auto_set_dymshape_mode模式,可以根据输入文件的Shape信息,自动设置模型的Shape参数。 ```bash -python3 -m ais_bench --model ./pth_resnet50_dymshape.om --outputSize 100000 --auto_set_dymshape_mode 1 --input ./dymdata +python3 -m ais_bench --model --outputSize 100000 --auto_set_dymshape_mode 1 --input ./dymdata ``` **注意该场景下的输入文件必须为npy格式,如果是bin文件将获取不到真实的Shape信息。** @@ -338,77 +329,9 @@ python3 -m ais_bench --model ./pth_resnet50_dymshape.om --outputSize 100000 --a 以对1,3,224,224 1,3,224,225 1,3,224,226进行分别推理为例,命令如下: ```bash -python3 -m ais_bench --model ./pth_resnet50_dymshape.om --outputSize 100000 --dymShape_range actual_input_1:1,3,224,224~226 +python3 -m ais_bench --model --outputSize 100000 --dymShape_range actual_input_1:1,3,224,224~226 ``` -#### 动态AIPP场景 -- 动态AIPP的介绍参考[ATC模型转换](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/63RC1alpha002/download)中"6.1 AIPP使能"章节。 -- 目前benchmark工具只支持单个input的带有动态AIPP配置的模型,只支持静态shape、动态batch、动态宽高三种场景,不支持动态shape场景。 -##### --aipp_config 输入的.config文件模板 -以resnet18模型所对应的一种aipp具体配置为例(actual_aipp_conf.config): -```cfg -[aipp_op] - input_format : RGB888_U8 - src_image_size_w : 256 - src_image_size_h : 256 - - crop : 1 - load_start_pos_h : 16 - load_start_pos_w : 16 - crop_size_w : 224 - crop_size_h : 224 - - padding : 0 - csc_switch : 0 - rbuv_swap_switch : 0 - ax_swap_switch : 0 - csc_switch : 0 - - min_chn_0 : 123.675 - min_chn_1 : 116.28 - min_chn_2 : 103.53 - var_reci_chn_0 : 0.0171247538316637 - var_reci_chn_1 : 0.0175070028011204 - var_reci_chn_2 : 0.0174291938997821 -``` -- .config文件`[aipp_op]`下的各字段名称及其取值范围参考[ATC模型转换](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/63RC1alpha002/download)中"6.1.9 
配置文件模板"章节中"静态AIPP需设置,动态AIPP无需设置"部分,其中字段取值为为true、false的字段,在.config文件中取值对应为1、0。 -- .config文件`[aipp_op]`下的`input_format`、`src_image_size_w`、`src_image_size_h`字段是必填字段。 -- .config文件中字段的具体取值是否适配对应的模型,benchmark本身不会检测,在推理时acl接口报错不属于benchmark的问题 -##### 静态shape场景示例,以resnet18模型为例 -###### atc命令转换出带动态aipp配置的静态shape模型 -``` -atc --framework=5 --model=./resnet18.onnx --output=resnet18_bs4_dym_aipp --input_format=NCHW --input_shape="image:4,3,224,224" --soc_version=Ascend310 --insert_op_conf=dym_aipp_conf.aippconfig --enable_small_channel=1 -``` -- dym_aipp_conf.aippconfig的内容(下同)为: -``` -aipp_op{ - related_input_rank : 0 - aipp_mode : dynamic - max_src_image_size : 4000000 -} -``` -###### benchmark命令 -``` -python3 -m ais_bench --model resnet18_bs4_dym_aipp.om --aipp_config actual_aipp_conf.config -``` -##### 动态batch场景示例,以resnet18模型为例 -###### atc命令转换出带动态aipp配置的动态batch模型 -``` -atc --framework=5 --model=./resnet18.onnx --output=resnet18_dym_batch_aipp --input_format=NCHW --input_shape="image:-1,3,224,224" --dynamic_batch_size "1,2" --soc_version=Ascend310 --insert_op_conf=dym_aipp_conf.aippconfig --enable_small_channel=1 -``` -###### benchmark命令 -``` -python3 -m ais_bench --model resnet18_dym_batch_aipp.om --aipp_config actual_aipp_conf.config --dymBatch 1 -``` -##### 动态宽高场景示例,以resnet18模型为例 -###### atc命令转换出带动态aipp配置的动态宽高模型 -``` -atc --framework=5 --model=./resnet18.onnx --output=resnet18_dym_image_aipp --input_format=NCHW --input_shape="image:4,3,-1,-1" --dynamic_image_size "112,112;224,224" --soc_version=Ascend310 --insert_op_conf=dym_aipp_conf.aippconfig --enable_small_channel=1 -``` -###### benchmark命令 -``` -python3 -m ais_bench --model resnet18_dym_image_aipp.om --aipp_config actual_aipp_conf.config --dymHW 112,112 -``` #### trtexec场景 @@ -419,7 +342,7 @@ ais_bench支持onnx模型推理(集成trtexec),trtexec为NVIDIA TensorRT自 示例命令如下: ```bash -python3 -m ais_bench --model pth_resnet50.onnx --backend trtexec --perf 1 +python3 -m ais_bench --model --backend trtexec --perf 1 ``` 输出结果推理测试结果,示例如下: @@ -453,79 +376,6 @@ python3 -m ais_bench --model pth_resnet50.onnx --backend trtexec --perf 1 | Total Host Walltime | 从第一个执行(预热后)入队到最后一个执行完成的主机时间。 | | Total GPU Compute Time| 所有执行的 GPU 计算时间的总和。 | -#### profiler或dump场景 - -支持以--acl_json_path、--profiler、--dump参数形式实现: -+ acl_json_path参数指定acl.json文件,可以在该文件中对应的profiler或dump参数。示例代码如下: - - + profiler - - ```bash - { - "profiler": { - "switch": "on", - "output": "./result/profiler" - } - } - ``` - - 更多性能参数配置请依据CANN包种类(商用版或社区版)分别参见《[CANN 商用版:开发工具指南/性能数据采集(acl.json配置文件方式)](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/devtools/auxiliarydevtool/atlasprofiling_16_0086.html)》和《[CANN 社区版:开发工具指南/性能数据采集(acl.json配置文件方式)](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/63RC1alpha002/developmenttools/devtool/atlasprofiling_16_0086.html)》中的参数配置详细描述 - - + dump - - ```bash - { - "dump": { - "dump_list": [ - { - "model_name": "{model_name}" - } - ], - "dump_mode": "output", - "dump_path": "./result/dump" - } - } - ``` - - 更多dump配置请参见《[CANN 开发工具指南](https://www.hiascend.com/document/detail/zh/canncommercial/60RC1/devtools/auxiliarydevtool/auxiliarydevtool_0002.html)》中的“精度比对工具>比对数据准备>推理场景数据准备>准备离线模型dump数据文件”章节。 - -- 通过该方式进行profiler采集时,如果配置了环境变量`export AIT_NO_MSPROF_MODE=1`,输出的性能数据文件需要参见《[CANN 开发工具指南/数据解析与导出/Profiling数据导出](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/devtools/auxiliarydevtool/atlasprofiling_16_0100.html)》,将性能数据解析并导出为可视化的timeline和summary文件。 -- 
通过该方式进行profiler采集时,如果**没有**配置环境变量`AIT_NO_MSPROF_MODE=1`,benchmark会将acl.json中与profiler相关的参数解析成msprof命令,调用msprof采集性能数据,结果默认带有可视化的timeline和summary文件,msprof输出的文件含义参考[性能数据采集(msprof命令行方式)](https://www.hiascend.com/document/detail/zh/canncommercial/63RC1/devtools/auxiliarydevtool/atlasprofiling_16_0040.html)。 -- 如果acl.json文件中同时配置了profiler和dump参数,需要要配置环境变量`export AIT_NO_MSPROF_MODE=1`保证同时采集 - -+ profiler为固化到程序中的一组性能数据采集配置,生成的性能数据保存在--output参数指定的目录下的profiler文件夹内。 - - 该参数是通过调用ais_bench/infer/__main__.py中的msprof_run_profiling函数来拉起msprof命令进行性能数据采集的。若需要修改性能数据采集参数,可根据实际情况修改msprof_run_profiling函数中的msprof_cmd参数。示例如下: - - ```bash - msprof_cmd="{} --output={}/profiler --application=\"{}\" --model-execution=on --sys-hardware-mem=on --sys-cpu-profiling=off --sys-profiling=off --sys-pid-profiling=off --dvpp-profiling=on --runtime-api=on --task-time=on --aicpu=on".format( - msprof_bin, args.output, cmd) - ``` - - 该方式进行性能数据采集时,首先检查是否存在msprof命令: - - - 若命令存在,则使用该命令进行性能数据采集、解析并导出为可视化的timeline和summary文件。 - - 若命令不存在,则msprof层面会报错,benchmark层面不检查命令内容合法性。 - - 若环境配置了AIT_NO_MSPROF_MODE=1,则使用--profiler参数采集性能数据时调用的是acl.json文件。 - - msprof命令不存在或环境配置了AIT_NO_MSPROF_MODE=1情况下,采集的性能数据文件未自动解析,需要参见《[CANN 开发工具指南](https://www.hiascend.com/document/detail/zh/canncommercial/60RC1/devtools/auxiliarydevtool/auxiliarydevtool_0002.html)》中的“性能分析工具>高级功能>数据解析与导出”章节,将性能数据解析并导出为可视化的timeline和summary文件。 - - 更多性能数据采集参数介绍请参见《[CANN 开发工具指南](https://www.hiascend.com/document/detail/zh/canncommercial/60RC1/devtools/auxiliarydevtool/auxiliarydevtool_0002.html)》中的“性能分析工具>高级功能>性能数据采集(msprof命令行方式)”章节。 - -+ acl_json_path优先级高于profiler和dump,同时设置时以acl_json_path为准。 - -+ profiler参数和dump参数,必须要增加output参数,指示输出路径。 - -+ profiler和dump可以分别使用,但不能同时启用。 - -示例命令如下: - -```bash -python3 -m ais_bench --model ./resnet50_v1_bs1_fp32.om --acl_json_path ./acl.json -python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --dump 1 -python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --profiler 1 -``` - #### 输出结果文件保存场景 默认情况下,ais_bench推理工具执行后不保存输出结果数据文件,配置相关参数后,可生成的结果数据如下: @@ -632,25 +482,6 @@ python3 -m ais_bench --model /home/model/resnet50_v1.om --output ./ --profiler ``` 在多线程推理的命令行基础上加上--threads {$number of threads},即可开启多计算线程推理模式,实现计算-计算的并行,提高推理吞吐量。 -#### dump数据自动转换场景 - - ```bash - python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --dump 1 --dump_npy 1 - ``` - 在dump场景上加上--dump_npy 1开启自动转换dump数据模式, 需要配合--dump或者--acl_json_path参数。 - - 转换后dump目录 - - ```bash - result/ - |-- 2023_01_03-06_35_53/ - |-- 2023_01_03-06_35_53_summary.json - `-- dump/ - |--20230103063551/ - |--20230103063551_npy/ - ``` - - ### 输出结果 ais_bench推理工具执行后,打屏输出结果示例如下: @@ -690,78 +521,3 @@ ais_bench推理工具执行后,打屏输出结果示例如下: | D2H_latency (ms) | Device to Host的内存拷贝耗时。单位为ms。 | | throughput | 吞吐率。吞吐率计算公式:1000 *batchsize/npu_compute_time.mean | | batchsize | 批大小。本工具不一定能准确识别当前样本的batchsize,建议通过--batchsize参数进行设置。 | - -## 扩展功能 - -### 接口开放 - -开放ais_bench推理工具inferface推理Python接口。 -接口文档参考[API使用说明](API_GUIDE.md) - -动态Shape推理: - -```bash -def infer_dymshape(): - device_id = 0 - session = InferSession(device_id, model_path) - ndata = np.zeros([1,3,224,224], dtype=np.float32) - - mode = "dymshape" - outputs = session.infer([ndata], mode, custom_sizes=100000) - print("outputs:{} type:{}".format(outputs, type(outputs))) - print("dymshape infer avg:{} ms".format(np.mean(session.sumary().exec_time_list))) -``` - -多线程推理: - -使用多线程推理接口时需要注意内存的使用情况,传入的input和预计output总和内存需要小于可用内存,否则程序将会异常退出。 - -```python -def infer_pipeline(): - device_id = 0 - session = InferSession(device_id, model_path) - - barray = 
bytearray(session.get_inputs()[0].realsize) - ndata = np.frombuffer(barray) - - outputs = session.infer([[ndata]]) - print("outputs:{} type:{}".format(outputs, type(outputs))) - - print("static infer avg:{} ms".format(np.mean(session.sumary().exec_time_list))) -``` - - -### 推理异常保存文件功能 - -当出现推理异常时,会写入算子执行失败的输入输出文件到**当前目录**下。同时会打印出当前的算子执行信息。利于定位分析。示例如下: - -```bash -python3 -m ais_bench --model ./test/testdata/bert/model/pth_bert_bs1.om --input ./random_in0.bin,random_in1.bin,random_in2.bin -``` - -```bash -[INFO] acl init success -[INFO] open device 0 success -[INFO] load model ./test/testdata/bert/model/pth_bert_bs1.om success -[INFO] create model description success -[INFO] get filesperbatch files0 size:1536 tensor0size:1536 filesperbatch:1 runcount:1 -[INFO] exception_cb streamId:103 taskId:10 deviceId: 0 opName:bert/embeddings/GatherV2 inputCnt:3 outputCnt:1 -[INFO] exception_cb hostaddr:0x124040800000 devaddr:0x12400ac48800 len:46881792 write to filename:exception_cb_index_0_input_0_format_2_dtype_1_shape_30522x768.bin -[INFO] exception_cb hostaddr:0x124040751000 devaddr:0x1240801f6000 len:1536 write to filename:exception_cb_index_0_input_1_format_2_dtype_3_shape_384.bin -[INFO] exception_cb hostaddr:0x124040752000 devaddr:0x12400d98e400 len:4 write to filename:exception_cb_index_0_input_2_format_2_dtype_3_shape_.bin -[INFO] exception_cb hostaddr:0x124040753000 devaddr:0x12400db20400 len:589824 write to filename:exception_cb_index_0_output_0_format_2_dtype_1_shape_384x768.bin -EZ9999: Inner Error! -EZ9999 The error from device(2), serial number is 17, there is an aicore error, core id is 0, error code = 0x800000, dump info: pc start: 0x800124080041000, current: 0x124080041100, vec error info: 0x1ff1d3ae, mte error info: 0x3022733, ifu error info: 0x7d1f3266f700, ccu error info: 0xd510fef0003608cf, cube error info: 0xfc, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x124080017040, errorStr: The DDR address of the MTE instruction is out of range.[FUNC:PrintCoreErrorInfo] - -# ls exception_cb_index_0_* -lh --rw-r--r-- 1 root root 45M Jan 7 08:17 exception_cb_index_0_input_0_format_2_dtype_1_shape_30522x768.bin --rw-r--r-- 1 root root 1.5K Jan 7 08:17 exception_cb_index_0_input_1_format_2_dtype_3_shape_384.bin --rw-r--r-- 1 root root 4 Jan 7 08:17 exception_cb_index_0_input_2_format_2_dtype_3_shape_.bin --rw-r--r-- 1 root root 576K Jan 7 08:17 exception_cb_index_0_output_0_format_2_dtype_1_shape_384x768.bin -``` -如果有需要将生成的异常bin文件转换为npy文件,请使用[转换脚本convert_exception_cb_bin_to_npy.py](https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench/test/convert_exception_cb_bin_to_npy.py). 
-使用方法:python3 convert_exception_cb_bin_to_npy.py --input {bin_file_path}。支持输入bin文件或文件夹。 - - -## FAQ -[FAQ](FAQ.md) -- Gitee From 689d4fa84ee392446affd48e50227e4af2b68766 Mon Sep 17 00:00:00 2001 From: Hanye Date: Mon, 6 May 2024 11:38:49 +0800 Subject: [PATCH 6/8] readme fix --- tools/infer_tool/README.md | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/tools/infer_tool/README.md b/tools/infer_tool/README.md index 545123a..6690498 100644 --- a/tools/infer_tool/README.md +++ b/tools/infer_tool/README.md @@ -185,7 +185,7 @@ python3 -m ais_bench --model --output ./ --debug 1 - 详细的推理耗时信息 ```bash - [DEBUG] model aclExec cost : 2.336000 + [DEBUG] model exec cost : 2.336000 ``` - 模型输入输出等具体操作信息 @@ -247,14 +247,6 @@ python3 -m ais_bench --model --input ./data/ --device 1 [INFO] NPU_compute_time (ms): min = 3.3889999389648438, max = 3.9230000972747803, mean = 3.616000032424927, median = 3.555000066757202, percentile(99%) = 3.9134000968933105 [INFO] throughput 1000*batchsize.mean(1)/NPU_compute_time.mean(3.616000032424927): 276.54867008654026 [INFO] ------------------------------------------------------ -[INFO] unload model success, model Id is 1 -[INFO] unload model success, model Id is 1 -[INFO] end to destroy context -[INFO] end to destroy context -[INFO] end to reset device is 2 -[INFO] end to reset device is 2 -[INFO] end to finalize acl -[INFO] end to finalize acl [INFO] multidevice run end qsize:4 result:1 i:0 device_1 throughput:281.38893494131406 start_time:1676875630.804429 end_time:1676875630.8303885 i:1 device_2 throughput:276.54867008654026 start_time:1676875630.8043878 end_time:1676875630.8326817 -- Gitee From 8cdbfef3107ae646e9987f923ef6c0cb1246cd18 Mon Sep 17 00:00:00 2001 From: Hanye Date: Mon, 6 May 2024 16:48:25 +0800 Subject: [PATCH 7/8] conflict fix --- tools/infer_tool/MANIFEST.in | 2 -- 1 file changed, 2 deletions(-) delete mode 100644 tools/infer_tool/MANIFEST.in diff --git a/tools/infer_tool/MANIFEST.in b/tools/infer_tool/MANIFEST.in deleted file mode 100644 index 5a239c8..0000000 --- a/tools/infer_tool/MANIFEST.in +++ /dev/null @@ -1,2 +0,0 @@ -include ais_bench/evaluate/dataset/download.sh -include ais_bench/evaluate/dataset/*.json \ No newline at end of file -- Gitee From 9330558759111ed54d3779eae7fb4a65b41021f3 Mon Sep 17 00:00:00 2001 From: Hanye Date: Tue, 7 May 2024 09:48:13 +0800 Subject: [PATCH 8/8] conflict fix --- tools/infer_tool/README.md | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/tools/infer_tool/README.md b/tools/infer_tool/README.md index 6690498..3d336c5 100644 --- a/tools/infer_tool/README.md +++ b/tools/infer_tool/README.md @@ -87,7 +87,7 @@ ais_bench推理工具的安装方式包括:一键式编译安装和源代码 Successfully installed ais_bench-{version} ``` -## 使用方法 +## 使用方法(以接入aclruntime后端为例) ### 工具介绍 ais_bench推理工具的使用方法主要通过命令行使用。 @@ -96,9 +96,8 @@ ais_bench推理工具的使用方法主要通过命令行使用。 ais_bench推理工具可以通过ais_bench可执行文件方式启动模型测试。启动方式如下: ```bash -python3 -m ais_bench --model *.om +python3 -m ais_bench --model ``` -其中,*为OM离线模型文件名。 #### 参数说明 @@ -327,14 +326,14 @@ python3 -m ais_bench --model --outputSiz #### trtexec场景 -ais_bench支持onnx模型推理(集成trtexec),trtexec为NVIDIA TensorRT自带工具。用户使用ais_bench拉起trtexec工具进行推理性能测试,测试过程中实时输出trtexec日志,打印在控制台,推理性能测试完成后,将性能数据输出在控制台。 +ais_bench支持ONNX模型推理(集成trtexec),trtexec为NVIDIA TensorRT自带工具,作为推理后端。用户使用ais_bench拉起trtexec工具进行推理性能测试,测试过程中实时输出trtexec日志,打印在控制台,推理性能测试完成后,将性能数据输出在控制台。 ##### 前置条件 -推理性能测试环境需要配置有GPU,安装CANN、CUDA及TensorRT,并且trtexec可以通过命令行调用到,安装方式可参考[TensorRT](https://github.com/NVIDIA/TensorRT)。 
+推理性能测试环境需要配置有GPU,安装 CUDA及TensorRT,并且trtexec可以通过命令行调用到,安装方式可参考[TensorRT](https://github.com/NVIDIA/TensorRT)。 示例命令如下: ```bash -python3 -m ais_bench --model --backend trtexec --perf 1 +python3 -m ais_bench --model --backend trtexec --perf 1 ``` 输出结果推理测试结果,示例如下: @@ -356,17 +355,17 @@ python3 -m ais_bench --model --backend trtexec --p | 字段 | 说明 | | --------------------- | ------------------------------------------------------------ | | Throughput | 吞吐率。 | -| Latency | H2D 延迟、GPU 计算时间和 D2H 延迟的总和。这是推断单个执行的延迟。。 | +| Latency | H2D延迟、GPU计算时间和D2H延迟的总和。这是推断单个执行的延迟。 | | min | 推理执行时间最小值。 | | max | 推理执行时间最大值。 | | mean | 推理执行时间平均值。 | | median | 推理执行时间取中位数。 | | percentile(99%) | 推理执行时间中的百分位数。 | | H2D Latency | 单个执行的输入张量的主机到设备数据传输的延迟。 | -| GPU Compute Time | 为执行 CUDA 内核的 GPU 延迟。 | +| GPU Compute Time | 为执行CUDA内核的GPU延迟。 | | D2H Latency | 单个执行的输出张量的设备到主机数据传输的延迟。 | | Total Host Walltime | 从第一个执行(预热后)入队到最后一个执行完成的主机时间。 | -| Total GPU Compute Time| 所有执行的 GPU 计算时间的总和。 | +| Total GPU Compute Time| 所有执行的GPU计算时间的总和。 | #### 输出结果文件保存场景 @@ -423,7 +422,7 @@ python3 -m ais_bench --model --backend trtexec --p - 设置--output_dirname参数。示例命令及结果如下: ```bash - python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --output_dirname subdir + python3 -m ais_bench --model --output ./result --output_dirname subdir ``` ```bash @@ -436,7 +435,7 @@ python3 -m ais_bench --model --backend trtexec --p - 设置--dump参数。示例命令及结果如下: ```bash - python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --dump 1 + python3 -m ais_bench --model --output ./result --dump 1 ``` ```bash @@ -450,7 +449,7 @@ python3 -m ais_bench --model --backend trtexec --p - 设置--profiler参数。示例命令及结果如下: ```bash - python3 -m ais_bench --model ./pth_resnet50_bs1.om --output ./result --profiler 1 + python3 -m ais_bench --model --output ./result --profiler 1 ``` ```bash @@ -465,12 +464,12 @@ python3 -m ais_bench --model --backend trtexec --p #### 多线程推理场景 ```bash - python3 -m ais_bench --model ./pth_resnet50_bs1.om --pipeline 1 + python3 -m ais_bench --model --pipeline 1 ``` 在单线程推理的命令行基础上加上--pipeline 1即可开启多线程推理模式,实现计算-搬运的并行,加快端到端推理速度。 ```bash - python3 -m ais_bench --model ./pth_resnet50_bs1.om --pipeline 1 --threads 2 + python3 -m ais_bench --model --pipeline 1 --threads 2 ``` 在多线程推理的命令行基础上加上--threads {$number of threads},即可开启多计算线程推理模式,实现计算-计算的并行,提高推理吞吐量。 -- Gitee