Qualcomm® AI Engine Direct User Guide (26)

    • 8.2 Advanced
      • 8.2.1 QNN HTP Shared Buffer Tutorial
      • 8.2.2 Executing with a DLC

        8.2 Advanced

        8.2.1 QNN HTP Shared Buffer Tutorial

        Introduction

        This tutorial describes how to use shared data buffers for shared access across processing domains with the QNN HTP backend. Using shared buffers eliminates data copies between client code on the host CPU and the HTP accelerator.

        The HTP backend supports two types of shared memory:

        • Qnn_MemDescriptor_t type QNN_MEM_TYPE_ION (no QnnMemHtp_Descriptor_t type): each tensor is mapped to its own shared buffer, with a one-to-one relationship between file descriptors and memory handles.
        • Qnn_MemDescriptor_t type QNN_MEM_TYPE_CUSTOM with QnnMemHtp_Descriptor_t type QNN_HTP_MEM_SHARED_BUFFER: multiple tensors are mapped to a single shared buffer, with a one-to-many relationship between a file descriptor and memory handles.

        Note

        This tutorial focuses only on the use of shared buffers. The SDK sample code has some prerequisites that are not discussed in detail here; refer to the corresponding sections of the QNN documentation, or to the SampleApp.

        SampleApp documentation: Sample Application Tutorial

        Sample app code: ${QNN_SDK_ROOT}/examples/QNN/SampleApp

        Loading the Required Shared Library

        Hardware devices with Qualcomm chipsets include a shared library that provides the functionality for shared buffer operations.

        Loading the shared library

        The libcdsprpc.so shared library is available on most mainstream devices with Qualcomm chipsets (SD888 and later).

        It can be loaded dynamically as follows:

        void* libCdspHandle = dlopen("libcdsprpc.so", RTLD_NOW | RTLD_LOCAL);

        if (nullptr == libCdspHandle) {
          // handle errors
        }
        

        Resolving Symbols

        Once the shared library has been loaded successfully, we can proceed to resolve all the required symbols.

        The following code snippet shows a template for resolving the symbols from the shared library:

        /**
        * Definition: void* rpcmem_alloc(int heapid, uint32 flags, int size);
        * Allocate a buffer via ION and register it with the FastRPC framework.
        * @param[in] heapid  Heap ID to use for memory allocation.
        * @param[in] flags   ION flags to use for memory allocation.
        * @param[in] size    Buffer size to allocate.
        * @return            Pointer to the buffer on success; NULL on failure.
        */
        typedef void *(*RpcMemAllocFn_t)(int, uint32_t, int);

        /**
        * Definition: void rpcmem_free(void* po);
        * Free a buffer and ignore invalid buffers.
        */
        typedef void (*RpcMemFreeFn_t)(void *);

        /**
        * Definition: int rpcmem_to_fd(void* po);
        * Return an associated file descriptor.
        * @param[in] po  Data pointer for an RPCMEM-allocated buffer.
        * @return        Buffer file descriptor.
        */
        typedef int (*RpcMemToFdFn_t)(void *);

        RpcMemAllocFn_t rpcmem_alloc = (RpcMemAllocFn_t)dlsym(libCdspHandle, "rpcmem_alloc");
        RpcMemFreeFn_t rpcmem_free = (RpcMemFreeFn_t)dlsym(libCdspHandle, "rpcmem_free");
        RpcMemToFdFn_t rpcmem_to_fd = (RpcMemToFdFn_t)dlsym(libCdspHandle, "rpcmem_to_fd");
        if (nullptr == rpcmem_alloc || nullptr == rpcmem_free || nullptr == rpcmem_to_fd) {
            dlclose(libCdspHandle);
            // handle errors
        }
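
        To make the error paths around these raw pointers easier to manage, the allocation can be wrapped so that rpcmem_free is always called. The sketch below is one possible convenience wrapper built on the typedefs above; the RpcMemDeleter and RpcMemPtr names are illustrative and not part of the SDK:

        #include <cstdint>
        #include <memory>

        // Illustrative RAII wrapper around rpcmem_alloc/rpcmem_free.
        // Assumes the RpcMemAllocFn_t/RpcMemFreeFn_t typedefs and the resolved
        // function pointers from the snippet above.
        struct RpcMemDeleter {
          RpcMemFreeFn_t freeFn;
          void operator()(void* p) const {
            if (p && freeFn) freeFn(p);
          }
        };

        using RpcMemPtr = std::unique_ptr<void, RpcMemDeleter>;

        RpcMemPtr allocateSharedBuffer(RpcMemAllocFn_t allocFn, RpcMemFreeFn_t freeFn,
                                       int heapId, uint32_t flags, int size) {
          return RpcMemPtr(allocFn(heapId, flags, size), RpcMemDeleter{freeFn});
        }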
        

        Using QNN_MEM_TYPE_ION with the QNN API

        The following describes the ION shared buffer representation, in which each tensor has its own shared buffer with its own unique memory pointer, file descriptor, and memory handle.

        An example is shown below:

        HTP shared buffer example

         1// QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h
         2QnnInterface_t qnnInterface;
         3// Init qnn interface ......
         4// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code
         5
         6// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
         7Qnn_Tensor_t inputTensor;
         8// Set up common setting for inputTensor ......
         9/* There are 2 specific settings for shared buffer:
        10*  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE; (line 40)
        11*  2. union member memHandle should be used instead of clientBuf, and it
        12*     should be set to nullptr. (line 41)
        13*/
        14
        15
        16size_t bufSize;
        17// Calculate the bufSize based on tensor dimensions and data type ......
        18
        19#define RPCMEM_HEAP_ID_SYSTEM 25
        20#define RPCMEM_DEFAULT_FLAGS 1
        21
        22// Allocate the shared buffer
        23uint8_t* memPointer = (uint8_t*)rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, bufSize);
        24if (nullptr == memPointer) {
        25    // handle errors
        26}
        27
        28int memFd = rpcmem_to_fd(memPointer);
        29if (-1 == memFd) {
        30    // handle errors
        31}
        32
        33// Fill the info of Qnn_MemDescriptor_t and register the buffer with QNN
        34// Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
        35Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
        36memDescriptor.memShape = {inputTensor.rank, inputTensor.dimensions, nullptr};
        37memDescriptor.dataType = inputTensor.dataType;
        38memDescriptor.memType = QNN_MEM_TYPE_ION;
        39memDescriptor.ionInfo.fd = memFd;
        40inputTensor.memType = QNN_TENSORMEMTYPE_MEMHANDLE;
        41inputTensor.memHandle = nullptr;
        42Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
        43// To obtain QNN context handle:
        44// For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
        45// For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary
        46Qnn_ErrorHandle_t registRet = qnnInterface->memRegister(context, &memDescriptor, 1u, &(inputTensor.memHandle));
        47if (QNN_SUCCESS != registRet) {
        48    rpcmem_free(memPointer);
        49    // handle errors
        50}
        51
        52/**
        53* At this point, the allocation and registration of the shared buffer are complete.
        54* On the QNN side, the buffer has been bound via memFd.
        55* On the user side, this buffer can be manipulated through memPointer.
        56*/
        57
        58/**
        59* Optionally, the user can also allocate and register a shared buffer for the output, as in the code above (lines 7-46).
        60* If so, the output buffer should also be deregistered and freed, as in the code below (lines 69-74).
        61*/
        62
        63// Load the input data to memPointer ......
        64
        65// Execute QNN graph with input tensor and output tensor ......
        66
        67// Get output data ......
        68
        69// Deregister and free all buffers once they are no longer being used
        70Qnn_ErrorHandle_t deregisterRet = qnnInterface->memDeRegister(&(inputTensor.memHandle), 1);
        71if (QNN_SUCCESS != deregisterRet) {
        72    // handle errors
        73}
        74rpcmem_free(memPointer);
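
        The example above leaves the bufSize calculation as a placeholder. A minimal sketch of one way to derive it from the tensor rank, dimensions, and data type is shown below; the element-size mapping covers only a few common types and is illustrative, so adapt it to the data types your model actually uses:

        #include <cstddef>
        #include <cstdint>
        #include "QnnTypes.h"

        // Illustrative helper (not part of the SDK): number of bytes per element
        // for a few common QNN data types.
        static size_t elementSize(Qnn_DataType_t dataType) {
          switch (dataType) {
            case QNN_DATATYPE_FLOAT_32:        return 4;
            case QNN_DATATYPE_FLOAT_16:
            case QNN_DATATYPE_UFIXED_POINT_16: return 2;
            case QNN_DATATYPE_UFIXED_POINT_8:
            default:                           return 1;
          }
        }

        // Byte size of a tensor = product of its dimensions * element size.
        static size_t tensorBufferSize(uint32_t rank, const uint32_t* dimensions,
                                       Qnn_DataType_t dataType) {
          size_t numElements = 1;
          for (uint32_t i = 0; i < rank; i++) {
            numElements *= dimensions[i];
          }
          return numElements * elementSize(dataType);
        }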
        

        Using QNN_HTP_MEM_SHARED_BUFFER with the QNN API

        The following describes the multi-tensor shared buffer representation, in which a group of tensors is mapped to a single shared buffer. This single shared buffer has one memory pointer and one file descriptor, but each tensor has its own offset into the memory pointer and its own memory handle.

        An example is shown below:

        HTP multi-tensor shared buffer example

         1// QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h
          2QnnInterface_t qnnInterface;
          3// Init qnn interface ......
          4// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code
          5
          6// Total number of input tensors
          7size_t numTensors;
          8
          9// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
         10Qnn_Tensor_t inputTensors[numTensors];
         11// Set up common setting for inputTensor ......
         12/* There are 2 specific settings for shared buffer:
         13*  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE; (line 50)
         14*  2. union member memHandle should be used instead of clientBuf, and it
         15*     should be set to nullptr. (line 51)
         16*/
         17
         18// Calculate the shared buffer size
         19uint64_t totalBufferSize = 0;
         20for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
         21   // Calculate the tensorSize based on tensor dimensions and data type
         22   totalBufferSize += tensorSize;
         23}
         24
         25#define RPCMEM_HEAP_ID_SYSTEM 25
         26#define RPCMEM_DEFAULT_FLAGS 1
         27
         28// Allocate the shared buffer
         29uint8_t* memPointer = (uint8_t*)rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, totalBufferSize);
         30if (nullptr == memPointer) {
         31    // handle errors
         32}
         33
         34// Get a file descriptor for the buffer
         35int memFd = rpcmem_to_fd(memPointer);
         36if (-1 == memFd) {
         37    // handle errors
         38}
         39
         40// Register the memory handles using memory descriptors
         41// This is the offset of the tensor location in the shared buffer
         42uint64_t offset = 0;
         43for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
         44   // Fill the info of Qnn_MemDescriptor_t and register the descriptor to QNN
         45   // Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
         46   Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
         47   memDescriptor.memShape = {inputTensors[tensorIdx].rank, inputTensors[tensorIdx].dimensions, nullptr};
         48   memDescriptor.dataType = inputTensors[tensorIdx].dataType;
         49   memDescriptor.memType = QNN_MEM_TYPE_CUSTOM;
         50   inputTensors[tensorIdx].memType = QNN_TENSORMEMTYPE_MEMHANDLE;
         51   inputTensors[tensorIdx].memHandle = nullptr;
         52
         53   // Fill the info of QnnMemHtp_Descriptor_t and set as custom info
         54   // QnnMemHtp_Descriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/HTP/QnnHtpMem.h
         55   QnnMemHtp_Descriptor_t htpMemDescriptor;
         56   htpMemDescriptor.type = QNN_HTP_MEM_SHARED_BUFFER;
         57   htpMemDescriptor.size = totalBufferSize; //Note: it's total buffer size
         58
         59   QnnHtpMem_SharedBufferConfig_t htpSharedBuffConfig = {memFd, offset};
         60   htpMemDescriptor.sharedBufferConfig = htpSharedBuffConfig;
         61
         62   memDescriptor.customInfo = &htpMemDescriptor;
         63
         64   Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
         65   // To obtain QNN context handle:
         66   // For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
         67   // For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary
         68
         69   Qnn_ErrorHandle_t registRet = qnnInterface->memRegister(context, &memDescriptor, 1u, &(inputTensors[tensorIdx].memHandle));
         70   if (QNN_SUCCESS != registRet) {
         71      // Deregister already created memory handles
         72      rpcmem_free(memPointer);
         73      // handle errors
         74   }
         75
         76   // move offset by the tensor size
         77   offset = offset + tensorSize;
         78}
         79
         80/**
         81* At this point, the allocation and registration of the shared buffer are complete.
         82* On the QNN side, the buffer has been bound via memFd.
         83* On the user side, this buffer can be manipulated through memPointer and offset.
         84*/
         85
         86/**
         87* Optionally, the user can also allocate and register a shared buffer for the outputs, as in the code above (lines 7-78).
         88* If so, the output buffer should also be deregistered and freed, as in the code below (lines 98-104).
         89*/
         90
         91// Load the input data to memPointer with respective offsets ......
         92
         93// Execute QNN graph with input tensors and output tensors ......
         94
         95// Get output data from the memPointer and offset combination ......
         96
         97// Deregister all memory handles for the buffer once they are no longer being used
         98for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
         99   Qnn_ErrorHandle_t deregisterRet = qnnInterface->memDeRegister(&(inputTensors[tensorIdx].memHandle), 1);
        100   if (QNN_SUCCESS != deregisterRet) {
        101      // handle errors
        102   }
        103}
        104rpcmem_free(memPointer);
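
        The step "Load the input data to memPointer with respective offsets" (line 91) can be expanded into a small loop. The sketch below assumes the per-tensor byte sizes computed earlier are kept in a tensorSizes array and that the raw input bytes for each tensor are available through inputData[tensorIdx]; both names are illustrative and not part of the SDK:

        #include <cstring>
        #include <vector>

        // Illustrative only: copy each tensor's input bytes into the shared buffer
        // at the same offsets that were registered with QNN above.
        std::vector<size_t> tensorSizes(numTensors);        // per-tensor sizes, filled while computing totalBufferSize
        std::vector<const uint8_t*> inputData(numTensors);  // raw input bytes per tensor (user-provided)

        uint64_t writeOffset = 0;
        for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
          std::memcpy(memPointer + writeOffset, inputData[tensorIdx], tensorSizes[tensorIdx]);
          writeOffset += tensorSizes[tensorIdx];
        }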
        

        8.2.2 Executing with a DLC

        Tutorial Setup

        This tutorial assumes that the general setup instructions for QNN and SNPE have been followed. In particular, converting a model to a DLC with the tools requires PYTHONPATH and SNPE_ROOT to be set appropriately.

        In addition, this tutorial requires the Inception V3 TensorFlow model file and sample images. These are handled by the provided setup script, setup_inceptionv3.py, located at:

        ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py
        

        Its usage is as follows:

        usage: setup_inceptionv3.py [-h] -a ASSETS_DIR [-d] [-c] [-q]
        Prepares the inception_v3 assets for tutorial examples.
        required arguments:
          -a ASSETS_DIR, --assets_dir ASSETS_DIR
                                directory containing the inception_v3 assets
        optional arguments:
          -d, --download        Download inception_v3 assets to inception_v3 example
                                directory
          -c, --convert_model   Convert and compile model once acquired.
          -q, --quantize_model  Quantize the model during conversion. Only available
                                if --c or --convert_model option is chosen
        

        Before using the script, set the TENSORFLOW_HOME environment variable to point to the installation location of the TensorFlow package. The script uses TensorFlow utilities such as optimize_for_inference.py, which are located in the TensorFlow installation directory.

        1. Find the location of the TensorFlow package:
        $ python3 -m pip show tensorflow
        
        2. Set the TENSORFLOW_HOME environment variable to the installation location of the TensorFlow package (the Location field in the output from step #1):
        $ export TENSORFLOW_HOME=/tensorflow_core
        
        3. Install the Inception V3 TensorFlow model and sample images using the setup_inceptionv3.py script:
        $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d
        

        The model file should now be present at:

        ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb
        

        The raw images should now be present at:

        ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped
        

        Model Conversion

        Once the model assets have been obtained, the model can be converted to a DLC using the conversion tools in the Qualcomm® Neural Processing SDK.

        Note

        The HTP and DSP backends require quantized models. See Model Quantization for generating a quantized DLC.

        Convert the Inception V3 model using the snpe-tensorflow-to-dlc tool.

        $ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-tensorflow-to-dlc \
          --input_network ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb \
          --input_dim input 1,299,299,3 \
          --out_node InceptionV3/Predictions/Reshape_1 \
          --output_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc
        

        This generates the DLC file ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc.

        The DLC contains the serialized model, its network topology, and the associated model data.

        Model Quantization

        The DLC can be quantized using the snpe-dlc-quantize tool. Example usage is shown below:

        $ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quantize \
          --input_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
          --input_list ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/raw_list.txt \
          --output_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc
        

        This produces the following artifact:

        • ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc

          Note

          When quantizing a model, the input list must contain absolute paths to the input data.

          Execution requires the generated DLC and the provided utility library libQnnModelDlc.so. This library extends the QNN Model API to compose QNN graphs from a provided DLC path and return their handles.

          ModelError_t QnnModel_composeGraphsFromDlc(Qnn_BackendHandle_t backendHandle,
                                                  QNN_INTERFACE_VER_TYPE interface,
                                                  Qnn_ContextHandle_t contextHandle,
                                                  const GraphConfigInfo_t **graphsConfigInfo,
                                                  const char *dlcPath,
                                                  const uint32_t numGraphsConfigInfo,
                                                  GraphInfoPtr_t **graphsInfo,
                                                  uint32_t *numGraphsInfo,
                                                  bool debug,
                                                  QnnLog_Callback_t logCallback,
                                                  QnnLog_Level_t maxLogLevel)
          

          This is the same as the QnnGraph_ComposeGraphs API with the addition of the dlcPath input parameter. The returned QNN graph handles can then be finalized and executed.
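
          For reference, the sketch below shows one way client code might call this entry point directly. It is a minimal sketch, not the SDK's prescribed flow: the dlopen/dlsym pattern, the exported symbol name, and the variable names are assumptions, error handling is omitted, and backendHandle, qnnInterface, and contextHandle are assumed to have been created as in the SampleApp.

          #include <dlfcn.h>

          // Assumed function-pointer type matching the declaration above.
          typedef ModelError_t (*ComposeGraphsFromDlcFn_t)(
              Qnn_BackendHandle_t, QNN_INTERFACE_VER_TYPE, Qnn_ContextHandle_t,
              const GraphConfigInfo_t**, const char*, uint32_t,
              GraphInfoPtr_t**, uint32_t*, bool, QnnLog_Callback_t, QnnLog_Level_t);

          void* modelLib = dlopen("libQnnModelDlc.so", RTLD_NOW | RTLD_LOCAL);
          ComposeGraphsFromDlcFn_t composeGraphsFromDlc =
              (ComposeGraphsFromDlcFn_t)dlsym(modelLib, "QnnModel_composeGraphsFromDlc");

          GraphInfoPtr_t* graphsInfo = nullptr;
          uint32_t numGraphsInfo = 0;
          ModelError_t err = composeGraphsFromDlc(backendHandle, qnnInterface, contextHandle,
                                                  nullptr,             // graphsConfigInfo
                                                  "Inception_v3.dlc",  // dlcPath
                                                  0u,                  // numGraphsConfigInfo
                                                  &graphsInfo, &numGraphsInfo,
                                                  false,               // debug
                                                  nullptr,             // logCallback
                                                  QNN_LOG_LEVEL_ERROR);
          // The returned graph handles in graphsInfo can then be finalized and
          // executed through the QnnGraph API (graphFinalize / graphExecute).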

          The following sections demonstrate executing the DLC.

          CPU Backend Execution

          Executing on a Linux Host

          1. Execute the model with qnn-net-run, using the libQnnModelDlc.so utility library as the --model argument and Inception_v3.dlc as the --dlc_path argument.
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
                        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnCpu.so \
                        --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                        --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
                        --input_list data/cropped/raw_list.txt
          

          The results will be located in ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

          View the results.

          $ python ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                          -o output/ \
                                          -l data/imagenet_slim_labels.txt
          

          Executing on Android

          Running the CPU backend on an Android target is similar to running on a Linux x86 target.

          Create a directory for the example on the Android device.

          $ adb shell "mkdir /data/local/tmp/inception_v3"
          

          Push the required libraries and the DLC to the device.

          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnCpu.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3
          

          Push the input data and input list to the device.

          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
          

          Push the qnn-net-run tool to the device.

          $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
          

          Set up the device environment.

          $ adb shell
          $ cd /data/local/tmp/inception_v3
          $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
          

          Run qnn-net-run with the following arguments.

          $ ./qnn-net-run --backend libQnnCpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt
          

          The output of the run will be located in the default ./output directory.

          Exit the device and view the results.

          $ exit
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ adb pull /data/local/tmp/inception_v3/output output_android
          $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                          -o output_android/ \
                                          -l data/imagenet_slim_labels.txt
          

          GPU Backend Execution

          Note

          Running the GPU backend on Windows devices is not supported.

          Executing on Android

          Running the GPU backend on an Android target is similar to running the CPU backend on an Android target.

          Create a directory for the example on the Android device.

          $ adb shell "mkdir /data/local/tmp/inception_v3"
          

          Push the required libraries and the DLC to the device.

          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3
          

          Push the input data and input list to the device.

          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
          

          Push the qnn-net-run tool to the device.

          $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
          

          Set up the device environment.

          $ adb shell
          $ cd /data/local/tmp/inception_v3
          $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
          

          Run qnn-net-run with the following arguments.

          $ ./qnn-net-run --backend libQnnGpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt
          

          The output of the run will be located in the default ./output directory.

          Exit the device and view the results.

          $ exit
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ adb pull /data/local/tmp/inception_v3/output output_android
          $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                         -o output_android/ \
                                         -l data/imagenet_slim_labels.txt
          

          HTP Backend Execution

          Executing on a Linux Host

          Note

          The HTP backend can be run on a Linux host using the HTP simulation backend.

          Execute the model with qnn-net-run, using the libQnnModelDlc.so utility library as the --model argument and Inception_v3_quantized.dlc as the --dlc_path argument.

          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
                        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                        --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                        --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
                        --input_list data/cropped/raw_list.txt
          

          Note

          The HTP simulation backend requires a quantized model. For more information on quantization, see Model Quantization.

          The results will be located in ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

          View the results.

          $ python ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                          -o output/ \
                                          -l data/imagenet_slim_labels.txt
          

          Executing on Android

          Running the HTP backend on an Android target is similar to running the CPU and GPU backends, except that the HTP backend requires a quantized model and a user-generated serialized context. For more information on quantization, see Model Quantization.

          1. Generate a serialized context from the DLC by running qnn-context-binary-generator with libQnnModelDlc.so as the --model argument and the quantized DLC as the --dlc_path argument.
          $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
                        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                        --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                        --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
                        --binary_file Inception_v3_quantized.serialized
          

          The context will be created at ./output/Inception_v3_quantized.serialized.bin.

          2. Create a directory for the example on the Android device.
          $ adb shell "mkdir /data/local/tmp/inception_v3"
          
          3. Push the required libraries and the serialized context to the device.
          $ adb push ${QNN_SDK_ROOT}/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output/Inception_v3_quantized.serialized.bin /data/local/tmp/inception_v3
          

          Note

          This section demonstrates HTP execution on Android with a graph prepared offline. To execute a graph prepared on-device (online), also push the on-device preparation library and the quantized DLC:

          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc /data/local/tmp/inception_v3
          

          Push the input data and input list to the device.

          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
          

          Push the qnn-net-run tool to the device.

          $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
          

          Set up the device environment.

          $ adb shell
          $ cd /data/local/tmp/inception_v3
          $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
          $ export ADSP_LIBRARY_PATH="/data/local/tmp/inception_v3"
          

          Run qnn-net-run with the following arguments.

          $ ./qnn-net-run --backend libQnnHtp.so --input_list target_raw_list.txt --retrieve_context Inception_v3_quantized.serialized.bin
          

          The output of the run will be located in the default ./output directory.

          Exit the device and view the results.

          $ exit
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ adb pull /data/local/tmp/inception_v3/output output_android
          $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                         -o output_android/ \
                                         -l data/imagenet_slim_labels.txt