
tanglei/ascend-deployer

forked from Ascend/ascend-deployer 
Jason_Hou committed on 2023-06-01 03:00 . !741 Unify the installation process

Overview

Functions

The offline installation tool provides automatic download and one-click installation of the OS components and Python third-party dependencies. It also supports the installation of the driver, firmware, and CANN software packages. The tools directory also contains the Device IP configuration script; for usage, see Device IP configuration.

Environment Requirements

Description of the supported operating system

========= ========== ================ ========================================
OS        Version    CPU Architecture Installation Type
========= ========== ================ ========================================
BCLinux   7.6        aarch64          A minimal image is installed by default.
BCLinux   7.6        x86_64           A minimal image is installed by default.
BCLinux   7.7        aarch64          A minimal image is installed by default.
CentOS    7.6        aarch64          A minimal image is installed by default.
CentOS    7.6        x86_64           A minimal image is installed by default.
CentOS    8.2        aarch64          A minimal image is installed by default.
CentOS    8.2        x86_64           A minimal image is installed by default.
Debian    9.9        aarch64          A minimal image is installed by default.
Debian    9.9        x86_64           A minimal image is installed by default.
Debian    10.0       x86_64           A minimal image is installed by default.
EulerOS   2.8        aarch64          A minimal image is installed by default.
EulerOS   2.9        aarch64          A minimal image is installed by default.
EulerOS   2.9        x86_64           A minimal image is installed by default.
Kylin     V10Tercel  aarch64          A minimal image is installed by default.
Kylin     V10Tercel  x86_64           A minimal image is installed by default.
Kylin     V10GFB     aarch64          A minimal image is installed by default.
Kylin     v10juniper aarch64          A minimal image is installed by default.
Linx      6          aarch64          A minimal image is installed by default.
OpenEuler 20.03LTS   aarch64          A minimal image is installed by default.
OpenEuler 20.03LTS   x86_64           A minimal image is installed by default.
Tlinux    2.4        aarch64          A server image is installed by default.
Tlinux    2.4        x86_64           A server image is installed by default.
UOS       20SP1      aarch64          A minimal image is installed by default.
UOS       20SP1      x86_64           A minimal image is installed by default.
UOS       20-1020e   aarch64          A minimal image is installed by default.
UOS       20-1021e   aarch64          A minimal image is installed by default.
UOS       20         aarch64          A minimal image is installed by default.
UOS       20         x86_64           A minimal image is installed by default.
Ubuntu    18.04.1/5  aarch64          A minimal image is installed by default.
Ubuntu    18.04.1/5  x86_64           A minimal image is installed by default.
Ubuntu    20.04.1    aarch64          A minimal image is installed by default.
Ubuntu    20.04.1    x86_64           A minimal image is installed by default.
========= ========== ================ ========================================

Description of supported hardware configuration

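========================== ========================= =========================
Central Inference Hardware Central Training Hardware Intelligent Edge Hardware
========================== ========================= =========================
A300-3000                  A300T-9000                A500 Pro-3000
A300-3010                  A800-9000                 Atlas200(EP)
Atlas 300I Pro             A800-9010
Atlas 300V Pro             Atlas 300T Pro
Atlas 300I Duo
A800-3000
A800-3010
Atlas 300V
Atlas 200I SoC A1
========================== ========================= =========================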

Precautions

  • If the offline installation tool cannot run because of environment variable settings, configure ASCENDPATH to match your environment.

  • If the gcc version of the system is lower than 7.3.0, the offline installation tool automatically installs gcc 7.3.0, which takes a long time. You can avoid this automatic upgrade by upgrading gcc manually in advance and configuring the environment variables.

  • By default, the offline installation tool downloads and installs Python-3.7.5 as the Python version used by the CANN packages. For details, see Python-3.7.5. You can select a different Python version by setting the ASCEND_PYTHON_VERSION environment variable or the ASCEND_PYTHON_VERSION configuration item in the downloader/config.ini file (the environment variable takes precedence when both are set). The selectable Python versions are 3.7.0 to 3.7.11 and 3.8.0 to 3.8.11. This tool has been fully adapted and tested only on Python-3.7.5, so changing the default configuration is strongly discouraged.

  • Basic commands such as tar, cd, ls, find, grep, chown, chmod, unzip, bzip2, and ssh must be installed in the OS. When installing an Ubuntu/Debian system, it is recommended to select [OpenSSH Server]/[SSH Server] at the [Software Selection] step so that the ssh command is not missing.

  • For the Kylin V10 GFB system, downloading the Kylin_V10Tercel_aarch64 system dependencies is sufficient.

  • The offline installation tool supports only the default environment after the OS image is successfully installed. Do not install or uninstall software after the OS is installed. If some system software has been uninstalled, causing inconsistency with the default system package, you need to manually configure the network and use tools such as apt, yum, and dnf to install and configure the missing software.

  • The offline installation tool can install only basic libraries to ensure that TensorFlow and PyTorch can run properly. If you need to run complex inference services or model training, the model code may contain libraries related to specific services. You need to install the libraries by yourself.

  • Apart from install.sh, start_download.sh, start_download_ui.bat, and start_download.bat, the files in the offline installation tool are not intended as user-facing interfaces or commands. Do not use them directly.

  • Do not store passwords in the inventory_file file.

  • For the Atlas 300T training card on CentOS 7.6 x86_64 with a lower-version kernel (below 4.5), either upgrade CentOS to 8.0 or later or apply a kernel patch. Otherwise, firmware installation may fail. For how to apply the kernel patch, see the reference link (https://support.huawei.com/enterprise/zh/doc/EDOC1100162133/b56ad5be).

  • For Atlas 300I Pro, Atlas 300V Pro, Atlas 300V, A300T-9000, and Atlas 300T Pro, the cus_npu_info variable must be set in inventory_file: 300i-pro for Atlas 300I Pro, 300v-pro for Atlas 300V Pro, 300v for Atlas 300V, 300t for A300T-9000, and 300t-pro for Atlas 300T Pro. Edit inventory_file in the following format:

    [ascend]
    localhost ansible_connection='local' cus_npu_info='300i-pro'  # Atlas 300I Pro
    ip_address_1 ansible_ssh_user='root' cus_npu_info='300v-pro'  # Atlas 300V Pro
    ip_address_2 ansible_ssh_user='root' cus_npu_info='300v'      # Atlas 300V
    ip_address_3 ansible_ssh_user='root' cus_npu_info='300t'      # A300T-9000
    ip_address_4 ansible_ssh_user='root' cus_npu_info='300t-pro'  # Atlas 300T Pro
    
  • The hardware configurations of the Atlas200 EP and the A300 inference cards (A300-3000, A300-3010, A800-3000, and A800-3010) cannot be distinguished from each other, so the following restrictions apply when using the Atlas200 EP. The Atlas200 EP and A300 inference card environments cannot be deployed in the same batch. If the deployed machine contains the Atlas200 EP, do not store the NPU package of the A300 inference card in the resources directory. If the deployed machine contains the A300 inference card, do not store the NPU package of the Atlas200 EP in the resources directory. Because of these restrictions, --download=CANN does not include the NPU package of the Atlas200 EP; prepare it yourself.

  • By default, OSs such as EulerOS do not allow the root user to log in remotely. Before using this tool, set PermitRootLogin to yes in the sshd_config file (the configuration method may differ between OSs; refer to the official OS documentation), and disable remote root login again after using this tool.

  • Ubuntu 18.04.1/5 supports the installation of cross-compilation components and the aarch64 architecture toolkit package.

  • After the dependencies of the Kylin V10 system are installed, wait for the system configuration to complete before using docker and other commands.

  • Before installing system dependencies, please confirm whether Docker is installed on the system. If it is installed, please uninstall it before installing system dependencies.

  • Users are advised to modify downloader/config and downloader/requirements.txt to ensure compliance with the security requirements of your organization.

  • After a default installation of the Tlinux system, the root directory has about 20 GB of total space. Do not place packages that exceed the available disk space in the resources directory; otherwise decompression or installation may fail.

  • BCLinux 7.6 does not include python3 by default, so the yum install python3 command is run before the download operation. Because the BCLinux 7.6 system source does not contain python3, modify the source configuration file by referring to the official BCLinux configuration file, or change “el7.6” to “el7.7” in “/etc/yum.repos.d/BCLinux-Base.repo” (run the sed -i 's/el7.6/el7.7/g' /etc/yum.repos.d/BCLinux-Base.repo command). After the installation, restore the original configuration.

  • tensorflow-1.15.0 aarch64, tensorflow-2.6.5 aarch64, torch-1.5.0/apex-0.1 aarch64/x86_64, and torch-1.8.1/apex-0.1/torch_npu-1.8.1 aarch64/x86_64 are not available for download. Place them in the resources/pylibs directory yourself; otherwise their installation is skipped.

  • Please strictly follow the official compilation specification when compiling tensorflow aarch64.

  • TensorFlow 1.15.0 is applicable only to python3.7, and TensorFlow 2.6.5 is applicable to python3.7, python3.8, and python3.9. Because of dependency conflicts, uninstall the installed version before installing the other one.

  • To use the automatic download function on Linux, configure the GUI in advance and then run the download command directly.

  • On EulerOS, Debian, and some other systems, installing the driver may trigger compilation of the driver source. In that case, install the kernel header packages that match the kernel version of the system (shown by the ‘uname -r’ command). See the description of the kernel header package below.

  • For security, it is recommended to harden the unzipped installation directory (ascend-deployer) and set its permissions so that only the owner can use it.

  • Description of the kernel header package

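======= =============================================================================== ==========================================
OS      Kernel header packages matching the system kernel version                       How to obtain
======= =============================================================================== ==========================================
EulerOS kernel-headers-<version>, kernel-devel-<version>                                Contact the OS vendor, or find it in the “devel_tools.tar.gz” tool component that comes with the corresponding OS
Debian  linux-headers-<version>, linux-headers-<version>-common, linux-kbuild-<version> Contact the OS vendor, or look it up in the image of the corresponding OS
======= =============================================================================== ==========================================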
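
The Python version rule in the precautions above (environment variable before configuration item, default 3.7.5, only 3.7.0 to 3.7.11 and 3.8.0 to 3.8.11 accepted) can be sketched as follows. This is an illustration only, not the tool's own code: the function name resolve_python_version is hypothetical, and accepting a "Python-" prefix is an assumption of the sketch.

```python
import re

DEFAULT_VERSION = "3.7.5"
# Selectable ranges as stated above: 3.7.0 to 3.7.11 and 3.8.0 to 3.8.11.
SUPPORTED = {(3, 7): range(0, 12), (3, 8): range(0, 12)}


def resolve_python_version(env, config_value=None):
    """Pick the version: environment variable first, then config item, then default."""
    raw = env.get("ASCEND_PYTHON_VERSION") or config_value or DEFAULT_VERSION
    # Assumption of this sketch: both "3.7.5" and "Python-3.7.5" spellings are accepted.
    match = re.fullmatch(r"(?:[Pp]ython-)?(\d+)\.(\d+)\.(\d+)", raw.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {raw!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if patch not in SUPPORTED.get((major, minor), ()):
        raise ValueError(f"unsupported Python version: {raw}")
    return f"{major}.{minor}.{patch}"
```

Calling resolve_python_version(os.environ, value_from_config_ini) before a download would reject a value such as 3.9.0 early instead of failing later during installation.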

Tool installation

pip install

pip3 install ascend-deployer==<Version>
  • Version requirement: python >= 3.6
  • It is recommended to install as root, using the python3 and pip3 tools of your system. If pip3 is not available, install it yourself.
  • Non-root users should not install the tool in this way.
  • Refer to Operation instruction: pip install

git install

git clone https://gitee.com/ascend/ascend-deployer.git

For security reasons, the user should set the environment umask to 077 before git clone, and only clone and use tools in the user’s home directory, which is only for the user’s own use.

download zip

Click the “clone / download” button in the upper right corner, then click “download zip” to download the package, and unzip it for use. To guard against malicious tampering during delivery or storage, verify the integrity of the downloaded package with sha256sum. For the sha256sum value of the latest official version, refer to the readme of the master branch. This tool can be used by root and non-root users. To avoid excessive permissions after unzipping, set the umask to 077 before unzipping the zip package, and unzip and use the tool only in your own HOME directory, for your own use only. For both the git and the zip installation methods, pay attention to the permission risks of the tool directory.

Confirm that the owner and permissions of the directories and files meet the security requirements of your organization. In addition, no user other than the owner and administrators should have write permission on the parent directory of the installation directory. The owners and permissions can be listed with:

find {Installation directory} -ls
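
Where the sha256sum command is not available (for example on the Windows download machine), the same integrity check can be sketched with the Python standard library. The function names are hypothetical, and the published checksum must still be taken from the readme of the master branch.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so that large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

A package whose digest does not match the published value should be discarded and downloaded again.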

Operation Instructions

Download Instructions

The download function can be used on Windows or Linux. Before running it, confirm that the offline installation directory belongs to the current user and that the permissions and groups of the directory meet the security requirements of your organization.

Download Notice

  • On Windows, edit the downloader/config.ini configuration file to select the OS components to download. For details, see Configuration Description.
  • A large amount of open source software needs to be installed. The open source software downloaded using the offline installation tool comes from the OS source. You need to fix the vulnerabilities of the open source software as required. You are advised to use the official source to update the software regularly. For details, see Source Configuration.
  • The downloaded software is automatically stored in the resources directory.
  • Docker user groups are created and the Docker service is started during the installation. After the installation, it is recommended to uninstall the third-party components such as gcc and g++ and cpp and jdk that may have security risks in the system.

Download

  • Windows
    1. Python 3 is required in Windows. Python 3.7 or later is recommended. Download link: python3.7.5, Complete the installation as prompted. During the installation, select Add Python to environment variables on the Advanced Options page. Otherwise, you need to manually add environment variables.
    2. Start download. Set the os_list or software configuration item in “downloader/config.ini” and run start_download.bat, or run start_download_ui.bat (recommended, because it lets you select the OS-related components or PKG to download in the displayed UI).
  • Linux
    1. Run the ./start_download.sh --os-list=<OS1>,<OS2> --download=<PK1>,<PK2>==<Version> command to start the download; see Linux Download Parameter Description. In this document, **.sh scripts are invoked as ./**.sh; they can also be invoked as bash **.sh, depending on your environment. It is recommended to set the umask to 077 before downloading.
    2. The download checks whether Python 3 is present in the environment. If python3 does not exist, there are two cases: if the current user is root, the tool automatically installs python3 through APT, YUM, or a similar tool; if the current user is not root, the tool prompts the user to install python3.

Installation Instructions

install options

  • The install options are set in the inventory_file file. The default options are as follows:
[ascend]
localhost ansible_connection='local'

[ascend:vars]
user=HwHiAiUser
group=HwHiAiUser
install_path=/usr/local/Ascend
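============ ==============================================================================
Parameter    Remark
============ ==============================================================================
user         User name; passed to the --install-username option.
group        User group; passed to the --install-usergroup option.
install_path Installation path of the CANN package; passed to the --install-path option.
============ ==============================================================================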
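
As an illustration of the mapping described above, the sketch below reads the [ascend:vars] section and renders the corresponding installer options. It is not the tool's own parser: the function names are hypothetical, and joining option and value with "=" is an assumption of the sketch.

```python
OPTION_FOR = {
    "user": "--install-username",
    "group": "--install-usergroup",
    "install_path": "--install-path",
}


def read_ascend_vars(text):
    """Collect key=value pairs from the [ascend:vars] section of inventory_file."""
    values, in_vars = {}, False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("["):
            in_vars = line == "[ascend:vars]"
        elif in_vars and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def to_installer_options(values):
    """Render the variables as options; the '=' separator is an assumption."""
    return [f"{OPTION_FOR[key]}={value}" for key, value in values.items() if key in OPTION_FOR]
```

Host lines in the [ascend] section are skipped on purpose, because they carry per-host variables rather than the three parameters listed above.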

Notice

  • The install_path parameter specifies the installation path of the CANN packages. It takes effect for root only when no CANN package is installed in the environment, that is, when the /etc/Ascend/ascend_cann_install.info file does not exist; otherwise the packages are installed to the path recorded in that file. It does not take effect for non-root users, whose packages can only be installed to the default ~/Ascend path. The install_path parameter does not set the installation path of the driver package or the edge components (AtlasEdge and HA): the driver package can only be installed to the default path /usr/local/Ascend, and the edge components can only be installed to the default path /usr/local.
  • The install_path parameter also specifies the installation path of the Toolbox package. It takes effect for root only when no Toolbox package is installed in the environment, that is, when the /etc/Ascend/ascend_cann_install.info and /etc/Ascend/ascend_toolbox_install.info files do not exist; otherwise the package is installed to the path recorded in those files. It does not take effect for non-root users, whose packages can only be installed to the default ~/Ascend path.
  • When the offline tool is delivered as a zip package, make sure that it is unpacked into a fresh directory whose permissions are 700 and which contains no soft links.
  • After installation, the configuration needs to be modified: it is recommended to disable login for the root user.
  • The driver software packages use the HwHiAiUser user and group as the default running user. Create the HwHiAiUser user first, and take responsibility for its password, the password expiration date, and security in subsequent use. The commands to create the user and group are as follows:
#add HwHiAiUser group
groupadd HwHiAiUser

#add HwHiAiUser user add it to HwHiAiUser group
#set /home/HwHiAiUser as HwHiAiUser's HOME directory and create
#set /bin/bash HwHiAiUser's default shell
useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
  • When installing edge components (AtlasEdge and HA) of version 2.0.2, you may need to restrict login for the HwHiAiUser user. When installing the driver package, set the HwHiAiUser user to a login-enabled state. Set this based on the actual scenario.
usermod -s /sbin/nologin HwHiAiUser   # When installing edge components (AtlasEdge and HA) in versions 2.0.2
usermod -s /bin/bash HwHiAiUser   # When installing the driver package
  • When installing AtlasEdge components in versions 2.0.3 and later, the component creates a MindXEdge user by default.
  • When installing the edge components in version 2.0.4, you need to install haveged in advance. For example, Ubuntu system uses the command apt install haveged. After installation, you need to execute systemctl enable haveged and systemctl start haveged to start the haveged service.
  • If you need to specify the running user and user group, modify the inventory_file file. The file content is as follows:
[ascend:vars]
user=HwHiAiUser
group=HwHiAiUser
  • List of software supported by non-root users
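================ ==========================================================================
Software name    Description
================ ==========================================================================
Python, gcc      python3.7.5 and gcc7.3.0 are installed in the $HOME/.local/ directory
Python framework tensorflow, pytorch, mindspore
CANN             toolbox, nnae, nnrt, tfplugin, toolkit, and kernels are installed in the $HOME directory by default; specifying a path is not supported
MindStudio       Installed in the $HOME/ directory
================ ==========================================================================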

Note:

  1. Non-root users need a root user to install the system components and the driver before they can install the components above.
  2. After gcc7.3.0 is installed, a symbolic link must be created before it can be used. For example, for gcc7.3.0 installed by root, run the command ln -sf /usr/local/gcc7.3.0/bin/gcc /usr/bin/gcc.
  3. To install kernels, install nnae or toolkit first. When installing kernels, specify the --kernels_type parameter.
  4. Non-root users need to join the driver installation group to install and use nnrt and toolkit normally. The default driver installation group is HwHiAiUser. The command to modify the user group is as follows:

usermod -a -G HwHiAiUser non-root-user

Obtaining Software Packages

  1. Prepare the software packages to be installed as required (The driver, firmware, and CANN software packages can be installed). Save the software packages to be installed in the resources directory. The following is an example.
    • Driver and firmware: Link
    • CANN software package: Link
  2. The package only supports the ZIP format. Only one version of the package should exist in the resources directory at installation time, otherwise there may be version mismatch. If there are no packages in the resources directory, the tool skips the installation.
  3. Batch installation of the IEF Agent is supported on Atlas 500 and Atlas 500Pro. Refer to the UserManual-IEF documentation to prepare the IEF product certificate, registration tool, and installation tool, and place them in the resources directory.
    • IEF relevant certificates and tools: Link
    • The Atlas 500 comes pre-loaded with the registration tool and installation tool, so you only need to prepare the product certificate and place it in the resources directory. The Atlas 500Pro requires all three: the certificate and both tools.
    • Atlas 500 supports only the tailored EulerOS 2.8 aarch64 operating system. It therefore cannot run the offline deployment tool locally and supports only remote installation; non-root installation is not supported either. Atlas 500Pro supports both local and remote installation.
    • The IEF Agent depends on the AtlasEdge middleware of the edge node working properly. Atlas 500 comes with the AtlasEdge middleware; on Atlas 500Pro, install the AtlasEdge middleware first.
    • The IEF Agent depends on the IEF server and on the network between the edge device and the IEF working properly. Whether the edge node is successfully managed must be checked at the IEF Web front end. Refer to the UserManual-IEF documentation for other restrictions.
  4. To install docker images, log in to ascendhub, pull the image, and transfer the image file to the resources/docker_images directory before installation. Create this directory yourself. The file name of a docker image has the form ubuntu_18.04_{x86_64|aarch64}.tar, where the braces contain the system architecture; only these two architectures are supported. The docker image installation installs the system packages first, so download the corresponding system packages before installing a docker image. Users are responsible for ensuring the security of the docker images to be installed.
ascend-deployer
|- ...
|- install.sh
|- inventory_file
|- ...
|- playbooks
|- README.md
|- resources
   |- A300-3010-npu_xxx.zip
   |- A300-3010-npu-driver_xxx.run
   |- A300-3010-npu-firmware_xxx.run
   |- Ascend-cann-nnrt-xxx.zip
   |- Ascend-cann-nnrt-xxx.run
   |- ...
   |- Ascend-cann-toolkit-xxx.run
   |- ...
   |- BCLinux_7.6_aarch64
   |- BCLinux_7.6_x86_64
   |- cert_ief_xxx.tar.gz
   |- edge-installer_xxx_arm64.tar.gz
   |- edge-register_xxx_arm64.tar.gz
   |- docker_images
   |- ...
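
Because only one version of each package should exist in the resources directory, a quick pre-flight check can help. The sketch below groups ZIP file names by package name and reports names that occur in more than one version. The file names in the usage note and the rule for splitting name from version are assumptions of this sketch, not the tool's own logic.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: the package name ends where a "-<digits>.<digits>" or
# "_<digits>.<digits>" version fragment begins.
VERSION_START = re.compile(r"[-_]\d+\.\d+")


def package_name(filename):
    """Return the part of the file name before the first version fragment."""
    match = VERSION_START.search(filename)
    return filename[: match.start()] if match else filename


def find_conflicts(file_names):
    """Map each package name to its ZIP files when more than one version is present."""
    groups = defaultdict(list)
    for name in file_names:
        if name.endswith(".zip"):
            groups[package_name(name)].append(name)
    return {pkg: sorted(files) for pkg, files in groups.items() if len(files) > 1}
```

For example, find_conflicts(os.listdir("resources")) would flag a directory that holds both a hypothetical Ascend-cann-nnrt_5.0.2_linux-aarch64.zip and Ascend-cann-nnrt_5.0.3_linux-aarch64.zip.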

Single-Device Installation

  1. Configure a stand-alone inventory_file file.

    Edit the inventory_file file. The default is as follows:

    [ascend]
    localhost ansible_connection='local'
    
  2. Run the installation script and select an installation mode (software-specific installation or scenario-specific installation) as required. Note: if other users need to be able to use the Python installed by the root user, set the umask to 022 in advance. Before setting it, confirm that this umask meets the security requirements of your organization.

    • 2.1 Software-specific installation

    Run ./install.sh --install=<package_name_1>,<package_name_2>. The following is an example.

    ./install.sh --help     # Viewing Help Information.
    ./install.sh --install=sys_pkg,python,npu     # Installing system dependencies and python3.7.5 and driver and firmware.
    

    Notes:

    - Installation sequence: sys_pkg > python > npu (driver and firmware) > CANN software packages (such as the Toolkit and nnrt) > AI frameworks (pytorch, tensorflow, mindspore). During installation, the CANN package version in the resources directory must match the NPU.
    - After the driver or firmware is installed, you may need to run the `reboot` command to restart the device for the driver and firmware to take effect.
    - Some components require runtime dependencies. For example, PyTorch requires the Toolkit or nnae; TensorFlow, npubridge, and npudevice require TFPlugin together with the Toolkit or nnae; and mindspore requires the driver and the Toolkit.
    - Python 3.7.5 must be installed before any Python library, such as tensorflow or mindspore, is installed.
    - During installation, the system time of the running environment must be set to the correct UTC time with the date -s command.
    
    • 2.2 Scenario-specific installation(Recommended for non-professional users)

    Run ./install.sh --install-scene=<scene_name>. The following is an example.

    ./install.sh --install-scene=auto     # Automatic installation of all software packages that can be found

    The offline installation tool provides several basic installation scenarios. For details, see Installation Scenarios.

  3. After the installation, test the result.

    Run ./install.sh --test=<target>. The following is an example:

    ./install.sh --test=driver # Test whether the driver is normal.
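
The installation sequence from the notes in step 2.1 can be expressed as a small ordering helper, for example to build the value of --install from an arbitrary selection. This is a sketch only: the grouping of toolkit, nnrt, nnae, and tfplugin into one CANN stage and the use of these names as --install values are assumptions based on the names used in this document.

```python
# Stages in the documented order: sys_pkg > python > npu > CANN packages > AI frameworks.
INSTALL_STAGES = [
    {"sys_pkg"},
    {"python"},
    {"npu"},
    {"toolkit", "nnrt", "nnae", "tfplugin"},
    {"pytorch", "tensorflow", "mindspore"},
]


def stage_of(package):
    """Return the index of the stage that contains the package."""
    for index, stage in enumerate(INSTALL_STAGES):
        if package in stage:
            return index
    raise ValueError(f"unknown package: {package}")


def in_install_order(packages):
    """Sort packages by stage; sorted() is stable, so ties keep the caller's order."""
    return sorted(packages, key=stage_of)
```

With this helper, ",".join(in_install_order(["tensorflow", "npu", "toolkit", "sys_pkg", "python"])) yields sys_pkg,python,npu,toolkit,tensorflow.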

Batch Installation

  1. Set up SSH connections based on key authentication. Before installation, confirm that paramiko is not installed in the system (ansible uses paramiko in some cases, and improper paramiko configuration may cause security problems).

    Configure the IP addresses of the other devices on which the packages are to be installed. Edit the inventory_file file in the following format:

    [ascend]
    ip_address_1 ansible_ssh_user='root'      # root user
    ip_address_2 ansible_ssh_user='root'
    ip_address_3 ansible_ssh_user='username'  # non-root user
    

    The following is a reference procedure for configuring key authentication. Pay attention to the risks involved in using and storing SSH keys and key passwords, especially when the keys are not encrypted. Configure them according to the security policies of your organization, including but not limited to software version, password complexity requirements, and security configuration (protocol, encryption suite, key length, and so on, especially the configuration under /etc/ssh and ~/.ssh).

    ssh-keygen -t rsa -b 3072   # Log in to the management node and generate the SSH Key. For security reasons, it is recommended that the user Enter the key password at the "Enter passphrase" step, and ensure that the password complexity is reasonable. It is recommended to set the umask to 0077 before executing this command and to restore the original umask after executing it.
    ssh-copy-id -i ~/.ssh/id_rsa.pub <user>@<ip>   # Copy the public key of the management node to the machines of all nodes, and replace <user>@<ip> with the account and ip of the corresponding node to be copied to.
    ssh <user>@<ip>   # Verify that it is possible to log on to the remote node, and replace <user>@<ip> with the account and IP of the corresponding node to be logged in. After verifying that the login is OK, run the 'exit' command to exit the SSH connection.
    

    Note: Please be aware of the risks involved in the use and storage of SSH keys.

  2. Set up the SSH agent to manage the SSH key to avoid entering the key password during the bulk installation of the tool. The following are the guidelines for setting up an SSH agent:

    ssh-agent bash   # Start the ssh-agent bash process
    ssh-add ~/.ssh/id_rsa         # Add a private key to the ssh-agent
    
  3. Run the ./install.sh --check command to test the connectivity of the devices on which the packages are to be installed. Ensure that all devices can be connected. If a device cannot be connected, check whether its network connection is normal and whether sshd is enabled.

  4. The following operation is the same as the above Single-Device Installation steps 2 and 3.

  5. When the bulk installation of the tool is completed, exit the SSH agent process in time to avoid security risks.

    exit   # Exit the ssh-agent bash process
    
  6. The default concurrency number is 5, and the maximum concurrency number is 255. If the number of environments to be deployed is greater than 5, you can modify the forks value in the ansible.cfg file to the total number of nodes to be deployed.
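
The concurrency rule in step 6 amounts to clamping the node count between the default and the maximum. A minimal sketch, with a hypothetical function name:

```python
DEFAULT_FORKS = 5   # default concurrency stated above
MAX_FORKS = 255     # maximum concurrency stated above


def recommended_forks(node_count):
    """Return a forks value for ansible.cfg given the number of nodes to deploy."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return min(max(node_count, DEFAULT_FORKS), MAX_FORKS)
```

For 40 nodes this returns 40; for 3 nodes the default of 5 is kept; above 255 nodes the value is capped at 255.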

Operation instruction: pip install

When the tool is installed with pip, two entrances will be provided for easy operation.

  • ascend-download
  • ascend-deployer

Both entrances are available to both root and non-root users

Download

ascend-download --os-list=<OS1>,<OS2> --download=<PK1>,<PK2>==<Version>

The command can be executed on both Windows 10 and Linux.

  • All resources are downloaded to “ascend-deployer/resources/”.
  • On Windows, the ascend-deployer directory is generated in the directory where the command is executed. When the download is complete, copy the whole directory to the Linux server to be deployed.
  • On Linux, the ascend-deployer directory is generated under the HOME directory. You can replace the user’s HOME directory by setting the environment variable ASCEND_DEPLOYER_HOME. Non-root users must ensure that the directory exists and can be read and written.

Installation

ascend-deployer --install=<pkg1,pkg2>

The ascend-deployer command is essentially a wrapper around install.sh, and it is used in exactly the same way as running install.sh directly in the ascend-deployer directory. The ascend-deployer command automatically looks for the file ascend-deployer/install.sh in the user’s HOME directory; you can replace the user’s HOME directory by setting the environment variable ASCEND_DEPLOYER_HOME. Non-root users must ensure that the directory exists and can be read and written.

Environment Variable Configuration

The offline deployment tool can install Python 3.7.5. To ensure that the system's built-in Python (Python 2.x or Python 3.x) is not affected, configure the following environment variables before using Python 3.7.5:

export PATH=/usr/local/python3.7.5/bin:$PATH                         # root
export LD_LIBRARY_PATH=/usr/local/python3.7.5/lib:$LD_LIBRARY_PATH   # root

export PATH=~/.local/python3.7.5/bin:$PATH                         # non-root
export LD_LIBRARY_PATH=~/.local/python3.7.5/lib:$LD_LIBRARY_PATH   # non-root

This tool automatically writes the Python 3.7.5 environment variables to the /usr/local/ascendrc file (~/.local/ascendrc for non-root users). You can set the Python 3.7.5 environment variables with the following commands:

source /usr/local/ascendrc    # root
source ~/.local/ascendrc      # non-root

Similarly, other software packages or tools installed by the offline deployment tool can be used normally only after you configure environment variables or make other settings according to the corresponding official documentation.

Follow-up

  • Inference scenario

    If you need to develop applications, please refer to the relevant official materials, such as CANN Application Software Development Guide (C and C++) or CANN Application Software Development Guide (Python).

  • Training scenario

    For network model migration and training, please refer to the relevant official materials, such as TensorFlow Network Model Porting and Training Guide or PyTorch Network Model Porting and Training Guide.

  • Delete this tool

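    This tool is used only for deployment. After the installation is complete, delete it to free disk space.

=============================== ==========================================================================
Item to delete                  Description
=============================== ==========================================================================
ascend-deployer                 Directory of the tool on the controller
pip3 uninstall ascend-deployer  Tool installed with pip on the controller; uninstall it with this command
~/ansible                       Customized information collection configuration files on the controller and remote machines
~/resources and ~/resources.tar Resource directory on the controller and remote machines
~/build                         Source package decompression directory on the controller and remote machines
=============================== ==========================================================================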

Reference Information

Install Parameter Description

Select corresponding parameters to install the software. The command likes ./install.sh [options]. The following table describes the parameters. You can run the ./install.sh --help command to view the options of the following parameters.

Parameter              Description
--help, -h             Queries help information.
--check                Checks the environment to ensure that the control machine has Python 3.7.5, Ansible, and other components installed, and checks connectivity with the devices to be installed.
--clean                Cleans the resources directory under the user’s home directory on the devices to be installed.
--nocopy               Disables resource copying during batch installation.
--force_upgrade_npu    Forces an NPU upgrade when not all devices have exceptions.
--tensorflow_version   Specifies the TensorFlow version. Must be 1.15.0 or 2.6.5; the default is 1.15.0.
--kernels_type         Specifies the kernels package type. Must be nnae or toolkit; the default is nnae.
--verbose              Prints verbose output.
--output-file=         Sets the output format of the command execution. The available values can be viewed with the command “ansible-doc -t callback -l”.
--stdout_callback=     Performs debugging.
--install=             Specifies the software to be installed. If --install=npu is specified, the driver and firmware are installed.
--install-scene=       Specifies the installation scenario. For details, see Installation Scenarios.
--patch=               Installs the patch for the specified package.
--patch-rollback=      Rolls back the patch for the specified package.
--test=                Checks whether the specified component works properly.
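
Two of the options above accept only a fixed set of values. The sketch below checks a value against those sets before a long installation run is started. The function names are hypothetical helpers, not part of install.sh.

```shell
# Return success only for the values the table allows.
valid_tensorflow_version() {
    case "$1" in
        1.15.0|2.6.5) return 0 ;;
        *)            return 1 ;;
    esac
}

valid_kernels_type() {
    case "$1" in
        nnae|toolkit) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example: validate the chosen values before launching the installer.
if valid_tensorflow_version "2.6.5" && valid_kernels_type "nnae"; then
    echo "option values are valid"
fi
```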

Linux Download Parameter Description

Parameter                           Description
--os-list=<OS1>,<OS2>               Specifies the operating systems whose software is downloaded.
--download=<PK1>,<PK2>==<Version>   Downloads specific components, such as MindSpore, MindStudio, and CANN.

By default, this tool downloads Python component packages. If every system specified by --os-list uses the aarch64 architecture, only the Python component packages required by aarch64 systems are downloaded. If every specified system uses the x86_64 architecture, only the packages required by x86_64 systems are downloaded. When --os-list is empty or the specified systems include both aarch64 and x86_64, the Python component packages for both architectures are downloaded. The same logic applies when downloading the CANN package for aarch64 or x86_64.
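
The architecture rule above can be expressed as a short function. This is only an illustration of the described behavior, not the tool's own code; it assumes OS names end in _aarch64 or _x86_64 (as in CentOS_7.6_aarch64), and the function name archs_for_os_list is hypothetical.

```shell
# Print the architectures whose Python packages would be downloaded
# for a given comma-separated OS list.
archs_for_os_list() {
    local os_list="$1" has_arm=0 has_x86=0 os
    local IFS=','
    for os in $os_list; do
        case "$os" in
            *_aarch64) has_arm=1 ;;
            *_x86_64)  has_x86=1 ;;
        esac
    done
    if [ "$has_arm" -eq "$has_x86" ]; then
        echo "aarch64 x86_64"     # empty list, or both architectures present
    elif [ "$has_arm" -eq 1 ]; then
        echo "aarch64"
    else
        echo "x86_64"
    fi
}
```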

Optional components   Version 1   Version 2   Version 3   Version 4   Version 5   Version 6
MindStudio            2.0.0       3.0.2       3.0.3       3.0.4       5.0.RC1     5.0.RC2
MindSpore             1.1.1       1.3.0       1.5.0       1.6.2       1.7.0       1.8.0
CANN                  20.3.0      5.0.2.1     5.0.3.1     5.0.4       5.1.RC1.1   5.1.RC2

During installation, the resources directory should contain only one version of MindSpore or MindStudio, and that version must match the CANN package version as shown in the table above. The download command is ./start_download.sh --download=<PK1>,<PK2>==<Version>; when <Version> is omitted, the latest version of <PK> is downloaded. When using --download=MindSpore, specify the corresponding OS with --os-list; see the official MindSpore website for details. For MindStudio installation, refer to Install MindStudio. For CANN installation, refer to Install CANN.
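
To keep the downloaded versions consistent, the version table can be turned into a lookup. The sketch below assumes that each column of the table pairs matching versions; the function name mindspore_for_cann is hypothetical, and the printed command only illustrates the --download syntax described above.

```shell
# Print the MindSpore version listed in the same table column as a CANN version.
mindspore_for_cann() {
    case "$1" in
        20.3.0)    echo "1.1.1" ;;
        5.0.2.1)   echo "1.3.0" ;;
        5.0.3.1)   echo "1.5.0" ;;
        5.0.4)     echo "1.6.2" ;;
        5.1.RC1.1) echo "1.7.0" ;;
        5.1.RC2)   echo "1.8.0" ;;
        *)         return 1 ;;    # CANN version not in the table
    esac
}

# Build (but do not run) a download command for a matching pair.
cann="5.0.3.1"
if ms="$(mindspore_for_cann "$cann")"; then
    echo "./start_download.sh --download=CANN==${cann},MindSpore==${ms}"
fi
```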

Installation Scenarios

The offline installation tool provides several basic installation scenarios. If the system's GCC version is lower than 7.3.0, install GCC 7.3.0 before installing the framework so that all scenarios work properly after installation. After installing GCC 7.3.0, create a soft link so that /usr/bin/gcc points to the installed GCC 7.3.0 executable. For example, if GCC 7.3.0 was installed by root, run ln -sf /usr/local/gcc7.3.0/bin/gcc /usr/bin/gcc.

Installation scenario   Installed components                                  Description
auto                    all                                                   All software packages that can be found are installed
vmhost                  sys_pkg, npu, toolbox                                 VM host scenario
edge                    sys_pkg, atlasedge, ha                                Installs MindX middleware and HA
offline_dev             sys_pkg, python, npu, toolkit                         Offline development scenario
offline_run             sys_pkg, python, npu, nnrt                            Offline run scenario
mindspore               sys_pkg, python, npu, toolkit, mindspore              MindSpore scenario
tensorflow_dev          sys_pkg, python, npu, toolkit, tfplugin, tensorflow   TensorFlow development scenario
tensorflow_run          sys_pkg, python, npu, nnae, tfplugin, tensorflow      TensorFlow run scenario
pytorch_dev             sys_pkg, python, npu, toolkit, pytorch                PyTorch development scenario
pytorch_run             sys_pkg, python, npu, nnae, pytorch                   PyTorch run scenario
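
The GCC requirement mentioned at the start of this section can be checked before choosing a scenario. The sketch below compares version strings with GNU sort -V; the helper name version_lt is hypothetical, and the check only reports the situation without changing anything.

```shell
# Return success if version $1 is strictly lower than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Report whether the gcc on PATH is older than 7.3.0.
# -dumpfullversion exists on newer GCC; older releases print the full
# version with -dumpversion, so fall back to that.
if command -v gcc >/dev/null 2>&1; then
    current="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_lt "$current" "7.3.0"; then
        echo "gcc $current is lower than 7.3.0"
    fi
fi
```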

The configuration files for the preceding installation scenarios are stored in the scene directory. For example, the following shows the configuration file scene/scene_auto.yml of the auto scene:

- hosts: '{{ hosts_name }}'

- name: install system dependencies
  import_playbook: ../install/install_sys_pkg.yml

- name: install python3.7.5
  import_playbook: ../install/install_python.yml

- name: install driver and firmware
  import_playbook: ../install/install_npu.yml

- name: install toolkit
  import_playbook: ../install/install_toolkit.yml

- name: install nnrt
  import_playbook: ../install/install_nnrt.yml

- name: install nnae
  import_playbook: ../install/install_nnae.yml

- name: install tfplugin
  import_playbook: ../install/install_tfplugin.yml

- name: install toolbox
  import_playbook: ../install/install_toolbox.yml

- name: install pytorch
  import_playbook: ../install/install_pytorch.yml

- name: install tensorflow
  import_playbook: ../install/install_tensorflow.yml

- name: install mindspore
  import_playbook: ../install/install_mindspore.yml

To customize an installation scenario, refer to the preceding configuration file.

Install and Roll Back CANN Patch Packages

The ascend-deployer tool supports installing and rolling back CANN cold patches.

  1. The ascend-deployer tool cannot download CANN patch packages online. Obtain the required CANN patch packages yourself and place them in the ascend-deployer/resources/patch directory (create the patch directory if it does not exist). Before installation, delete the CANN package that corresponds to the patch package from the ascend-deployer/resources directory.

  2. The commands for installing and rolling back a CANN cold patch are as follows:

    • Install a CANN cold patch (using the nnae and tfplugin packages as examples): ./install.sh --patch=nnae,tfplugin
    • Roll back a CANN cold patch (using the nnae and tfplugin packages as examples): ./install.sh --patch-rollback=nnae,tfplugin

  3. The constraints on CANN cold patches are as follows:

    • A patch can only upgrade its corresponding baseline version or a related patch version.
    • For patches based on the same baseline version, the patch installed later must have a higher version than the patch installed earlier.
    • A patch can be rolled back only once. During rollback, place the patch package that was used for installation in the ascend-deployer/resources/patch directory (create the patch directory if it does not exist). Before rollback, delete the CANN package that corresponds to the patch package from the ascend-deployer/resources directory.
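
Placing the patch packages can be scripted as follows. This is a minimal sketch: the function name stage_patches is hypothetical, and the corresponding CANN package in the resources directory must still be deleted by hand because its file name depends on the release.

```shell
# Copy patch packages into <deployer_dir>/resources/patch,
# creating the patch directory if it does not exist.
stage_patches() {
    local deployer_dir="$1"
    shift
    mkdir -p "$deployer_dir/resources/patch" || return 1
    cp -- "$@" "$deployer_dir/resources/patch/"
}
```

After staging, run ./install.sh --patch=... or ./install.sh --patch-rollback=... from the ascend-deployer directory as described above.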

Configuration Description

Proxy Configuration

To use a proxy, configure it in environment variables, and pay attention to the security of the proxy. This tool validates HTTPS certificates by default. If a certificate error occurs during download, the proxy server may be replacing certificates as a security mechanism; in that case, install the proxy server certificate first.

  1. Configure the proxy in environment variables as follows:

    # Configure environment variables.
    export http_proxy="http://user:password@proxyserverip:port"
    export https_proxy="http://user:password@proxyserverip:port"
    

    Here, “user” is the user name on the internal network, “password” is the user’s password (special characters must be escaped), “proxyserverip” is the IP address of the proxy server, and “port” is the port. Proxies are configured in Windows environment variables on the same principle as in Linux. For details, see the official instructions.

  2. Configure the proxy in the downloader/config.ini file as follows:

    [proxy]
    verify=true         # Whether to verify the HTTPS certificate. Disabling verification carries security risks.
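
For step 1, special characters in the password are normally escaped by percent-encoding them, which is the usual convention for URLs. The sketch below shows one way to do this for ASCII passwords. The function name urlencode and the sample password are illustrative only, and “user”, “proxyserverip”, and “port” remain the placeholders from step 1.

```shell
# Percent-encode every character except letters, digits, and . _ ~ -
# Intended for ASCII input.
urlencode() {
    local s="$1" out="" rest c
    while [ -n "$s" ]; do
        rest="${s#?}"          # everything after the first character
        c="${s%"$rest"}"       # the first character
        case "$c" in
            [a-zA-Z0-9._~-]) out="$out$c" ;;
            *)               out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s="$rest"
    done
    printf '%s\n' "$out"
}

password='p@ss:w#rd'           # example value only
proxy="http://user:$(urlencode "$password")@proxyserverip:port"
echo "$proxy"
```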
    

Windows Download Configuration

On Windows, the download parameters in the downloader/config.ini file determine which OS components are downloaded. Editing the configuration file directly is not recommended; instead, run start_download_ui.bat and select the required components in the UI.

[download]
os_list=CentOS_7.6_aarch64, CentOS_7.6_x86_64, CentOS_8.2_aarch64, CentOS_8.2_x86_64, Ubuntu_18.04_aarch64, Ubuntu_18.04_x86_64 ...          # OS information of the environment to be deployed.
[software]
pkg_list=CANN_5.0.3.1,MindStudio_3.0.3  # CANN or MindStudio to be deployed.

Source Configuration

The offline installation tool provides the source configuration file. Replace it as required.

  1. Python source configuration. Configure the Python source in the downloader/config.ini file. The Huawei source is used by default.
[pypi]
index_url=https://repo.huaweicloud.com/repository/pypi/simple
  2. OS source configuration. The OS source configuration file is downloader/config/{os}_{version}_{arch}/source.xxx. Using CentOS 7.6 AArch64 as an example, the content of the source configuration file downloader/config/CentOS_7.6_aarch64/source.repo is shown below. It indicates that both the Base and EPEL sources are enabled, and system components are queried and downloaded from them. The Huawei source is used by default. You can modify it according to business and installation requirements so that the source meets your organization's security and vulnerability-fix requirements. If you modify it, select a safe and reliable source and test that download and installation behave normally; otherwise, components may be downloaded incompletely or installed abnormally. Deleting a source may also result in an incomplete download of components.
[base]
baseurl=https://mirrors.huaweicloud.com/centos-altarch/7/os/aarch64
[epel]
baseurl=https://mirrors.huaweicloud.com/epel/7/aarch64
  3. When downloading components for CentOS-like systems, the tool needs to parse the XML files in the system source. You are advised to install the defusedxml component for python3 to improve protection against potential XML vulnerability attacks.

Public Web Site URL

https://cmake.org
https://github.com
https://gcc.gnu.org
http://mirrors.bclinux.org
https://archive.kylinos.cn
https://support.huawei.com
https://mirrors.tencent.com
https://mirrors.bfsu.edu.cn
https://repo.huaweicloud.com
https://uniportal.huawei.com
https://mirrors.huaweicloud.com
https://cache-redirector.jetbrains.com
https://obs-9be7.obs.myhuaweicloud.com
https://obs-9be7.obs.cn-east-2.myhuaweicloud.com
https://ms-release.obs.cn-north-4.myhuaweicloud.com

Sha256sum verification

sha256sum                                                          Version of the ascend-deployer
22f7e10677658e7c3d223b32f73786c765e85cf6f66440bf620c3e4275f11e7f   ascend-deployer-2.0.4.B093.zip
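
A downloaded package can be checked against the published digest with the standard sha256sum utility. The sketch below wraps that comparison in a function; the name verify_sha256 is hypothetical.

```shell
# Return success if the SHA-256 digest of a file equals the expected value.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
}

# Usage:
#   verify_sha256 ascend-deployer-2.0.4.B093.zip \
#       22f7e10677658e7c3d223b32f73786c765e85cf6f66440bf620c3e4275f11e7f \
#       && echo "checksum OK"
```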

FAQ

  1. Q: The first time you execute ./install.sh --check or any other installation command, the system dependencies and Python 3.7.5 are installed automatically. If the installation is interrupted unexpectedly, the next time you execute the command the RPM and DPKG tools may be locked, or Python 3.7.5 functionality may be missing.
  • A: Release the RPM/DPKG tool lock, delete the Python 3.7.5 installation directory, and install again using the tool. (For the Python 3.7.5 installation directory, refer to Environment Variable Configuration.)
  2. Q: Non-root users are prompted for the sudo password when installing a Toolkit version earlier than 5.0.1.
  • A: For security reasons, this tool does not require non-root users to have sudo privileges, so it does not support installation of Toolkit versions earlier than 5.0.1 by non-root users.
  3. Q: What is the mechanism for crl file update and signature verification? Can the crl file be updated independently?
  • A: There are two methods for crl file update and signature verification. The tool at toolbox/latest/Ascend-DMI/bin/ascend-cert is preferred; if it does not exist in the environment, openssl is used instead. To be compatible with both old and new software package signature formats, the tool uses two sets of certificates. The tool compares the validity time of the crl file in the installation package with that of the local crl file, and uses the newer crl file to check whether the certificate is revoked. For the root user, the local crl file is /etc/hwsipcrl/ascendsip.crl (or ascendsip_g2.crl); for non-root users, it is ~/.local/hwsipcrl/ascendsip.crl (or ascendsip_g2.crl). If the local crl file does not exist or takes effect earlier than the crl file in the installation package, it is replaced by the crl file in the installation package. The tools/update_crl.sh script supports updating the crl file independently: run bash update_crl.sh <crl_file>, where <crl_file> is the path of the crl file uploaded by the user.
  4. Q: Why does “certificate verify failed” appear when downloading some components?
  • A: The tool verifies the HTTPS certificate by default. The error may be caused by a problem with the proxy server certificate; contact the system administrator. Verification can be configured in the downloader/config.ini file. For details, see Proxy Configuration.
  5. Q: When a Euler system is a worker node, the message “Failed to connect to the host via ssh: Shared connection to XX closed” appears during the installation of TensorFlow 2.6.5.
  • A: An SSH connection session timeout is set on the host. This error occurs if the deployment task takes longer than that timeout. Change the value of the “ClientAliveInterval” keyword in the “/etc/ssh/sshd_config” file to “1800” (a timeout of 30 minutes), and then execute systemctl restart sshd to restart the sshd service.
  6. Q: Why does “ImportError: libblas.so.3: cannot open shared object file: No such file or directory” appear when importing torch after installing torch-1.8.1?
  • A: The openblas dependency is not installed on the system, so this library is missing. Execute yum install openblas to install the system dependency, and then create soft links as follows (adjust the library version to the one actually installed):
    • Execute find / -name "libopenblas*so" to find the libopenblas-r0.3.9.so file (the version displayed depends on the actual installation).
    • Execute ln -s /usr/lib64/libopenblas-r0.3.9.so /usr/lib64/libblas.so.3 and ln -s /usr/lib64/libopenblas-r0.3.9.so /usr/lib64/liblapack.so.3 to create the soft links.
  7. Q: Why does “ImportError: libquadmath.so.0: cannot open shared object file: No such file or directory” appear when importing torch after installing torch-1.8.1?
  • A: A system dependency is missing. Execute yum install libquadmath to install it.
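
For the SSH session timeout answer above, the sshd_config change can be scripted. The sketch below assumes GNU sed and the conventional spelling ClientAliveInterval; the function name set_client_alive_interval is hypothetical. It replaces an existing entry (including a commented-out one) or appends a new one.

```shell
# Set ClientAliveInterval in an sshd_config-style file.
set_client_alive_interval() {
    local conf="$1" seconds="$2"
    if grep -qE '^[[:space:]]*#?[[:space:]]*ClientAliveInterval([[:space:]]|$)' "$conf"; then
        # Rewrite the existing (possibly commented-out) line in place.
        sed -i -E "s/^[[:space:]]*#?[[:space:]]*ClientAliveInterval([[:space:]].*)?\$/ClientAliveInterval $seconds/" "$conf"
    else
        # No entry yet: append one.
        echo "ClientAliveInterval $seconds" >> "$conf"
    fi
}

# Usage (as root):
#   set_client_alive_interval /etc/ssh/sshd_config 1800 && systemctl restart sshd
```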