Qualcomm® AI Engine Direct User Guide (26)

    • 8.2 Advanced
      • 8.2.1 QNN HTP Shared Buffer Tutorial
      • 8.2.2 Executing with a DLC

        8.2 Advanced

        8.2.1 QNN HTP Shared Buffer Tutorial

        Introduction

        This tutorial describes how to use shared data buffers for shared access across processing domains with the QNN HTP backend. Using shared buffers eliminates data copies between client code on the host CPU and the HTP accelerator.

        The HTP backend supports two types of shared memory:

        • Qnn_MemDescriptor_t type QNN_MEM_TYPE_ION (no QnnMemHtp_Descriptor_t type): each tensor is mapped to its own shared buffer, with a one-to-one relationship between file descriptors and memory handles.
        • Qnn_MemDescriptor_t type QNN_MEM_TYPE_CUSTOM with QnnMemHtp_Descriptor_t type QNN_HTP_MEM_SHARED_BUFFER: multiple tensors are mapped to a single shared buffer, with a one-to-many relationship between a file descriptor and memory handles.

        Note

        This tutorial focuses only on the use of shared buffers. The SDK sample code has some prerequisites that are not discussed in detail here; refer to the corresponding sections of the QNN documentation, or to the SampleApp.

        SampleApp documentation: Sample Application Tutorial

        Sample app code: ${QNN_SDK_ROOT}/examples/QNN/SampleApp

        Loading the Required Shared Library

        Hardware devices with Qualcomm chipsets include a shared library that provides the functionality for shared buffer operations.

        Loading the shared library

        The libcdsprpc.so shared library is available on most mainstream devices with Qualcomm chipsets (SD888 and later).

        It can be loaded dynamically as follows:

        void* libCdspHandle = dlopen("libcdsprpc.so", RTLD_NOW | RTLD_LOCAL);

        if (nullptr == libCdspHandle) {
          // handle errors
        }
        

        Resolving Symbols

        Once the shared library has been loaded successfully, we can proceed to resolve all the required symbols.

        The following code snippet shows a template for resolving the symbols from the shared library:

        /**
        * Definition: void* rpcmem_alloc(int heapid, uint32 flags, int size);
        * Allocate a buffer via ION and register it with the FastRPC framework.
        * @param[in] heapid  Heap ID to use for memory allocation.
        * @param[in] flags   ION flags to use for memory allocation.
        * @param[in] size    Buffer size to allocate.
        * @return            Pointer to the buffer on success; NULL on failure.
        */
        typedef void *(*RpcMemAllocFn_t)(int, uint32_t, int);

        /**
        * Definition: void rpcmem_free(void* po);
        * Free a buffer and ignore invalid buffers.
        */
        typedef void (*RpcMemFreeFn_t)(void *);

        /**
        * Definition: int rpcmem_to_fd(void* po);
        * Return an associated file descriptor.
        * @param[in] po  Data pointer for an RPCMEM-allocated buffer.
        * @return        Buffer file descriptor.
        */
        typedef int (*RpcMemToFdFn_t)(void *);

        RpcMemAllocFn_t rpcmem_alloc = (RpcMemAllocFn_t)dlsym(libCdspHandle, "rpcmem_alloc");
        RpcMemFreeFn_t rpcmem_free = (RpcMemFreeFn_t)dlsym(libCdspHandle, "rpcmem_free");
        RpcMemToFdFn_t rpcmem_to_fd = (RpcMemToFdFn_t)dlsym(libCdspHandle, "rpcmem_to_fd");
        if (nullptr == rpcmem_alloc || nullptr == rpcmem_free || nullptr == rpcmem_to_fd) {
            dlclose(libCdspHandle);
            // handle errors
        }
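
        To make the error paths around these raw pointers easier to manage, the allocation can be wrapped so that rpcmem_free is always called. The sketch below is one possible convenience wrapper built on the typedefs above; the RpcMemDeleter and RpcMemPtr names are illustrative and not part of the SDK:

        #include <cstdint>
        #include <memory>

        // Illustrative RAII wrapper around rpcmem_alloc/rpcmem_free.
        // Assumes the RpcMemAllocFn_t/RpcMemFreeFn_t typedefs and the resolved
        // function pointers from the snippet above.
        struct RpcMemDeleter {
          RpcMemFreeFn_t freeFn;
          void operator()(void* p) const {
            if (p && freeFn) freeFn(p);
          }
        };

        using RpcMemPtr = std::unique_ptr<void, RpcMemDeleter>;

        RpcMemPtr allocateSharedBuffer(RpcMemAllocFn_t allocFn, RpcMemFreeFn_t freeFn,
                                       int heapId, uint32_t flags, int size) {
          return RpcMemPtr(allocFn(heapId, flags, size), RpcMemDeleter{freeFn});
        }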
        

        Using QNN_MEM_TYPE_ION with the QNN API

        The following describes the ION shared buffer representation, in which each tensor has its own shared buffer with its own unique memory pointer, file descriptor, and memory handle.

        An example is shown below:

        HTP shared buffer example

         1// QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h
         2QnnInterface_t qnnInterface;
         3// Init qnn interface ......
         4// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code
         5
         6// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
         7Qnn_Tensor_t inputTensor;
         8// Set up common setting for inputTensor ......
         9/* There are 2 specific settings for shared buffer:
        10*  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE; (line 40)
        11*  2. union member memHandle should be used instead of clientBuf, and it
        12*     should be set to nullptr. (line 41)
        13*/
        14
        15
        16size_t bufSize;
        17// Calculate the bufSize based on tensor dimensions and data type ......
        18
        19#define RPCMEM_HEAP_ID_SYSTEM 25
        20#define RPCMEM_DEFAULT_FLAGS 1
        21
        22// Allocate the shared buffer
        23uint8_t* memPointer = (uint8_t*)rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, bufSize);
        24if (nullptr == memPointer) {
        25    // handle errors
        26}
        27
        28int memFd = rpcmem_to_fd(memPointer);
        29if (-1 == memFd) {
        30    // handle errors
        31}
        32
        33// Fill the info of Qnn_MemDescriptor_t and register the buffer with QNN
        34// Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
        35Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
        36memDescriptor.memShape = {inputTensor.rank, inputTensor.dimensions, nullptr};
        37memDescriptor.dataType = inputTensor.dataType;
        38memDescriptor.memType = QNN_MEM_TYPE_ION;
        39memDescriptor.ionInfo.fd = memFd;
        40inputTensor.memType = QNN_TENSORMEMTYPE_MEMHANDLE;
        41inputTensor.memHandle = nullptr;
        42Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
        43// To obtain QNN context handle:
        44// For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
        45// For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary
        46Qnn_ErrorHandle_t registRet = qnnInterface->memRegister(context, &memDescriptor, 1u, &(inputTensor.memHandle));
        47if (QNN_SUCCESS != registRet) {
        48    rpcmem_free(memPointer);
        49    // handle errors
        50}
        51
        52/**
        53* At this point, the allocation and registration of the shared buffer are complete.
        54* On the QNN side, the buffer has been bound via memFd.
        55* On the user side, this buffer can be manipulated through memPointer.
        56*/
        57
        58/**
        59* Optionally, the user can also allocate and register a shared buffer for the output, as in the code above (lines 7-46).
        60* If so, the output buffer should also be deregistered and freed, as in the code below (lines 69-74).
        61*/
        62
        63// Load the input data to memPointer ......
        64
        65// Execute QNN graph with input tensor and output tensor ......
        66
        67// Get output data ......
        68
        69// Deregister and free all buffers once they are no longer being used
        70Qnn_ErrorHandle_t deregisterRet = qnnInterface->memDeRegister(&(inputTensor.memHandle), 1);
        71if (QNN_SUCCESS != deregisterRet) {
        72    // handle errors
        73}
        74rpcmem_free(memPointer);
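
        The example above leaves the bufSize calculation as a placeholder. A minimal sketch of one way to derive it from the tensor rank, dimensions, and data type is shown below; the element-size mapping covers only a few common types and is illustrative, so adapt it to the data types your model actually uses:

        #include <cstddef>
        #include <cstdint>
        #include "QnnTypes.h"

        // Illustrative helper (not part of the SDK): number of bytes per element
        // for a few common QNN data types.
        static size_t elementSize(Qnn_DataType_t dataType) {
          switch (dataType) {
            case QNN_DATATYPE_FLOAT_32:        return 4;
            case QNN_DATATYPE_FLOAT_16:
            case QNN_DATATYPE_UFIXED_POINT_16: return 2;
            case QNN_DATATYPE_UFIXED_POINT_8:
            default:                           return 1;
          }
        }

        // Byte size of a tensor = product of its dimensions * element size.
        static size_t tensorBufferSize(uint32_t rank, const uint32_t* dimensions,
                                       Qnn_DataType_t dataType) {
          size_t numElements = 1;
          for (uint32_t i = 0; i < rank; i++) {
            numElements *= dimensions[i];
          }
          return numElements * elementSize(dataType);
        }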
        

        Using QNN_HTP_MEM_SHARED_BUFFER with the QNN API

        The following describes the multi-tensor shared buffer representation, in which a group of tensors is mapped to a single shared buffer. This single shared buffer has one memory pointer and one file descriptor, but each tensor has its own offset into the memory pointer and its own memory handle.

        An example is shown below:

        HTP multi-tensor shared buffer example

         1// QnnInterface_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnInterface.h
          2QnnInterface_t qnnInterface;
          3// Init qnn interface ......
          4// See ${QNN_SDK_ROOT}/examples/QNN/SampleApp code
          5
          6// Total number of input tensors
          7size_t numTensors;
          8
          9// Qnn_Tensor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnTypes.h
         10Qnn_Tensor_t inputTensors[numTensors];
         11// Set up common setting for inputTensor ......
         12/* There are 2 specific settings for shared buffer:
         13*  1. memType should be QNN_TENSORMEMTYPE_MEMHANDLE; (line 50)
         14*  2. union member memHandle should be used instead of clientBuf, and it
         15*     should be set to nullptr. (line 51)
         16*/
         17
         18// Calculate the shared buffer size
         19uint64_t totalBufferSize = 0;
         20for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
         21   // Calculate the tensorSize based on tensor dimensions and data type
         22   totalBufferSize += tensorSize;
         23}
         24
         25#define RPCMEM_HEAP_ID_SYSTEM 25
         26#define RPCMEM_DEFAULT_FLAGS 1
         27
         28// Allocate the shared buffer
         29uint8_t* memPointer = (uint8_t*)rpcmem_alloc(RPCMEM_HEAP_ID_SYSTEM, RPCMEM_DEFAULT_FLAGS, totalBufferSize);
         30if (nullptr == memPointer) {
         31    // handle errors
         32}
         33
         34// Get a file descriptor for the buffer
         35int memFd = rpcmem_to_fd(memPointer);
         36if (-1 == memFd) {
         37    // handle errors
         38}
         39
         40// Register the memory handles using memory descriptors
         41// This is the offset of the tensor location in the shared buffer
         42uint64_t offset = 0;
         43for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
         44   // Fill the info of Qnn_MemDescriptor_t and register the descriptor to QNN
         45   // Qnn_MemDescriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/QnnMem.h
         46   Qnn_MemDescriptor_t memDescriptor = QNN_MEM_DESCRIPTOR_INIT;
         47   memDescriptor.memShape = {inputTensors[tensorIdx].rank, inputTensors[tensorIdx].dimensions, nullptr};
         48   memDescriptor.dataType = inputTensors[tensorIdx].dataType;
         49   memDescriptor.memType = QNN_MEM_TYPE_CUSTOM;
         50   inputTensors[tensorIdx].memType = QNN_TENSORMEMTYPE_MEMHANDLE;
         51   inputTensors[tensorIdx].memHandle = nullptr;
         52
         53   // Fill the info of QnnMemHtp_Descriptor_t and set as custom info
         54   // QnnMemHtp_Descriptor_t is defined in ${QNN_SDK_ROOT}/include/QNN/HTP/QnnHtpMem.h
         55   QnnMemHtp_Descriptor_t htpMemDescriptor;
         56   htpMemDescriptor.type = QNN_HTP_MEM_SHARED_BUFFER;
         57   htpMemDescriptor.size = totalBufferSize; //Note: it's total buffer size
         58
         59   QnnHtpMem_SharedBufferConfig_t htpSharedBuffConfig = {memFd, offset};
         60   htpMemDescriptor.sharedBufferConfig = htpSharedBuffConfig;
         61
         62   memDescriptor.customInfo = &htpMemDescriptor;
         63
         64   Qnn_ContextHandle_t context; // Must obtain a QNN context handle before memRegister()
         65   // To obtain QNN context handle:
         66   // For online prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#create-context
         67   // For offline prepare, refer to ${QNN_SDK_ROOT}/docs/general/sample_app.html#load-context-from-a-cached-binary
         68
         69   Qnn_ErrorHandle_t registRet = qnnInterface->memRegister(context, &memDescriptor, 1u, &(inputTensors[tensorIdx].memHandle));
         70   if (QNN_SUCCESS != registRet) {
         71      // Deregister already created memory handles
         72      rpcmem_free(memPointer);
         73      // handle errors
         74   }
         75
         76   // move offset by the tensor size
         77   offset = offset + tensorSize;
         78}
         79
         80/**
         81* At this point, the allocation and registration of the shared buffer are complete.
         82* On the QNN side, the buffer has been bound via memFd.
         83* On the user side, this buffer can be manipulated through memPointer and offset.
         84*/
         85
         86/**
         87* Optionally, the user can also allocate and register a shared buffer for the outputs, as in the code above (lines 7-78).
         88* If so, the output buffer should also be deregistered and freed, as in the code below (lines 98-104).
         89*/
         90
         91// Load the input data to memPointer with respective offsets ......
         92
         93// Execute QNN graph with input tensors and output tensors ......
         94
         95// Get output data from the memPointer and offset combination ......
         96
         97// Deregister all memory handles for the buffer once they are no longer being used
         98for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
         99   Qnn_ErrorHandle_t deregisterRet = qnnInterface->memDeRegister(&(inputTensors[tensorIdx].memHandle), 1);
        100   if (QNN_SUCCESS != deregisterRet) {
        101      // handle errors
        102   }
        103}
        104rpcmem_free(memPointer);
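
        The step "Load the input data to memPointer with respective offsets" (line 91) can be expanded into a small loop. The sketch below assumes the per-tensor byte sizes computed earlier are kept in a tensorSizes array and that the raw input bytes for each tensor are available through inputData[tensorIdx]; both names are illustrative and not part of the SDK:

        #include <cstring>
        #include <vector>

        // Illustrative only: copy each tensor's input bytes into the shared buffer
        // at the same offsets that were registered with QNN above.
        std::vector<size_t> tensorSizes(numTensors);        // per-tensor sizes, filled while computing totalBufferSize
        std::vector<const uint8_t*> inputData(numTensors);  // raw input bytes per tensor (user-provided)

        uint64_t writeOffset = 0;
        for (size_t tensorIdx = 0; tensorIdx < numTensors; tensorIdx++) {
          std::memcpy(memPointer + writeOffset, inputData[tensorIdx], tensorSizes[tensorIdx]);
          writeOffset += tensorSizes[tensorIdx];
        }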
        

        8.2.2 Executing with a DLC

        Tutorial Setup

        This tutorial assumes that the general setup instructions for QNN and SNPE have been followed. In particular, converting a model to a DLC with the tools requires PYTHONPATH and SNPE_ROOT to be set appropriately.

        In addition, this tutorial requires the Inception V3 TensorFlow model file and sample images. These are handled by the provided setup script, setup_inceptionv3.py, located at:

        ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py
        

        Its usage is as follows:

        usage: setup_inceptionv3.py [-h] -a ASSETS_DIR [-d] [-c] [-q]
        Prepares the inception_v3 assets for tutorial examples.
        required arguments:
          -a ASSETS_DIR, --assets_dir ASSETS_DIR
                                directory containing the inception_v3 assets
        optional arguments:
          -d, --download        Download inception_v3 assets to inception_v3 example
                                directory
          -c, --convert_model   Convert and compile model once acquired.
          -q, --quantize_model  Quantize the model during conversion. Only available
                                if --c or --convert_model option is chosen
        

        Before using the script, set the TENSORFLOW_HOME environment variable to point to the installation location of the TensorFlow package. The script uses TensorFlow utilities such as optimize_for_inference.py, which are located in the TensorFlow installation directory.

        1. Find the location of the TensorFlow package:
        $ python3 -m pip show tensorflow
        
        2. Set the TENSORFLOW_HOME environment variable to the installation location of the TensorFlow package (the Location field in the output from step #1):
        $ export TENSORFLOW_HOME=/tensorflow_core
        
        3. Install the Inception V3 TensorFlow model and sample images using the setup_inceptionv3.py script:
        $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/setup_inceptionv3.py -a ~/tmpdir -d
        

        The model file should now be present at:

        ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb
        

        The raw images should now be present at:

        ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped
        

        Model Conversion

        Once the model assets have been obtained, the model can be converted to a DLC using the conversion tools in the Qualcomm® Neural Processing SDK.

        Note

        The HTP and DSP backends require quantized models. See Model Quantization for generating a quantized DLC.

        Convert the Inception V3 model using the snpe-tensorflow-to-dlc tool.

        $ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-tensorflow-to-dlc \
          --input_network ${QNN_SDK_ROOT}/examples/Models/InceptionV3/tensorflow/inception_v3_2016_08_28_frozen.pb \
          --input_dim input 1,299,299,3 \
          --out_node InceptionV3/Predictions/Reshape_1 \
          --output_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc
        

        This generates the DLC file ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc.

        The DLC contains the serialized model, its network topology, and the associated model data.

        Model Quantization

        The DLC can be quantized using the snpe-dlc-quantize tool. Example usage is shown below:

        $ ${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quantize \
          --input_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
          --input_list ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped/raw_list.txt \
          --output_dlc ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc
        

        This produces the following artifact:

        • ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc

          Note

          When quantizing a model, the input list must contain absolute paths to the input data.

          Execution requires the generated DLC and the provided utility library libQnnModelDlc.so. This library extends the QNN Model API to compose QNN graphs from a provided DLC path and return their handles.

          ModelError_t QnnModel_composeGraphsFromDlc(Qnn_BackendHandle_t backendHandle,
                                                  QNN_INTERFACE_VER_TYPE interface,
                                                  Qnn_ContextHandle_t contextHandle,
                                                  const GraphConfigInfo_t **graphsConfigInfo,
                                                  const char *dlcPath,
                                                  const uint32_t numGraphsConfigInfo,
                                                  GraphInfoPtr_t **graphsInfo,
                                                  uint32_t *numGraphsInfo,
                                                  bool debug,
                                                  QnnLog_Callback_t logCallback,
                                                  QnnLog_Level_t maxLogLevel)
          

          This is the same as the QnnGraph_ComposeGraphs API with the addition of the dlcPath input parameter. The returned QNN graph handles can then be finalized and executed.
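
          For reference, the sketch below shows one way client code might call this entry point directly. It is a minimal sketch, not the SDK's prescribed flow: the dlopen/dlsym pattern, the exported symbol name, and the variable names are assumptions, error handling is omitted, and backendHandle, qnnInterface, and contextHandle are assumed to have been created as in the SampleApp.

          #include <dlfcn.h>

          // Assumed function-pointer type matching the declaration above.
          typedef ModelError_t (*ComposeGraphsFromDlcFn_t)(
              Qnn_BackendHandle_t, QNN_INTERFACE_VER_TYPE, Qnn_ContextHandle_t,
              const GraphConfigInfo_t**, const char*, uint32_t,
              GraphInfoPtr_t**, uint32_t*, bool, QnnLog_Callback_t, QnnLog_Level_t);

          void* modelLib = dlopen("libQnnModelDlc.so", RTLD_NOW | RTLD_LOCAL);
          ComposeGraphsFromDlcFn_t composeGraphsFromDlc =
              (ComposeGraphsFromDlcFn_t)dlsym(modelLib, "QnnModel_composeGraphsFromDlc");

          GraphInfoPtr_t* graphsInfo = nullptr;
          uint32_t numGraphsInfo = 0;
          ModelError_t err = composeGraphsFromDlc(backendHandle, qnnInterface, contextHandle,
                                                  nullptr,             // graphsConfigInfo
                                                  "Inception_v3.dlc",  // dlcPath
                                                  0u,                  // numGraphsConfigInfo
                                                  &graphsInfo, &numGraphsInfo,
                                                  false,               // debug
                                                  nullptr,             // logCallback
                                                  QNN_LOG_LEVEL_ERROR);
          // The returned graph handles in graphsInfo can then be finalized and
          // executed through the QnnGraph API (graphFinalize / graphExecute).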

          The following sections demonstrate executing the DLC.

          CPU Backend Execution

          Executing on a Linux Host

          1. Execute the model with qnn-net-run, using the libQnnModelDlc.so utility library as the --model argument and Inception_v3.dlc as the --dlc_path argument.
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
                        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnCpu.so \
                        --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                        --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc \
                        --input_list data/cropped/raw_list.txt
          

          The results will be located in ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

          View the results.

          $ python ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                          -o output/ \
                                          -l data/imagenet_slim_labels.txt
          

          Executing on Android

          Running the CPU backend on an Android target is similar to running on a Linux x86 target.

          Create a directory for the example on the Android device.

          $ adb shell "mkdir /data/local/tmp/inception_v3"
          

          Push the required libraries and the DLC to the device.

          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnCpu.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3
          

          Push the input data and input list to the device.

          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
          

          Push the qnn-net-run tool to the device.

          $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
          

          Set up the device environment.

          $ adb shell
          $ cd /data/local/tmp/inception_v3
          $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
          

          Run qnn-net-run with the following arguments.

          $ ./qnn-net-run --backend libQnnCpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt
          

          The output of the run will be located in the default ./output directory.

          Exit the device and view the results.

          $ exit
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ adb pull /data/local/tmp/inception_v3/output output_android
          $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                          -o output_android/ \
                                          -l data/imagenet_slim_labels.txt
          

          GPU Backend Execution

          Note

          Running the GPU backend on Windows devices is not supported.

          Executing on Android

          Running the GPU backend on an Android target is similar to running the CPU backend on an Android target.

          Create a directory for the example on the Android device.

          $ adb shell "mkdir /data/local/tmp/inception_v3"
          

          Push the required libraries and the DLC to the device.

          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnGpu.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3.dlc /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnModelDlc.so /data/local/tmp/inception_v3
          

          Push the input data and input list to the device.

          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
          

          Push the qnn-net-run tool to the device.

          $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
          

          Set up the device environment.

          $ adb shell
          $ cd /data/local/tmp/inception_v3
          $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
          

          Run qnn-net-run with the following arguments.

          $ ./qnn-net-run --backend libQnnGpu.so --model libQnnModelDlc.so --dlc_path Inception_v3.dlc --input_list target_raw_list.txt
          

          The output of the run will be located in the default ./output directory.

          Exit the device and view the results.

          $ exit
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ adb pull /data/local/tmp/inception_v3/output output_android
          $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                         -o output_android/ \
                                         -l data/imagenet_slim_labels.txt
          

          HTP Backend Execution

          Executing on a Linux Host

          Note

          The HTP backend can be run on a Linux host using the HTP simulation backend.

          Execute the model with qnn-net-run, using the libQnnModelDlc.so utility library as the --model argument and Inception_v3_quantized.dlc as the --dlc_path argument.

          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-net-run \
                        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                        --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                        --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
                        --input_list data/cropped/raw_list.txt
          

          Note

          The HTP simulation backend requires a quantized model. For more information on quantization, see Model Quantization.

          The results will be located in ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output.

          View the results.

          $ python ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                          -o output/ \
                                          -l data/imagenet_slim_labels.txt
          

          Executing on Android

          Running the HTP backend on an Android target is similar to running the CPU and GPU backends, except that the HTP backend requires a quantized model and a user-generated serialized context. For more information on quantization, see Model Quantization.

          1. Generate a serialized context from the DLC by running qnn-context-binary-generator with libQnnModelDlc.so as the --model argument and the quantized DLC as the --dlc_path argument.
          $ ${QNN_SDK_ROOT}/bin/x86_64-linux-clang/qnn-context-binary-generator \
                        --backend ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnHtp.so \
                        --model ${QNN_SDK_ROOT}/lib/x86_64-linux-clang/libQnnModelDlc.so \
                        --dlc_path ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc \
                        --binary_file Inception_v3_quantized.serialized
          

          The context will be created at ./output/Inception_v3_quantized.serialized.bin.

          2. Create a directory for the example on the Android device.
          $ adb shell "mkdir /data/local/tmp/inception_v3"
          
          3. Push the required libraries and the serialized context to the device.
          $ adb push ${QNN_SDK_ROOT}/lib/hexagon-v68/unsigned/libQnnHtpV68Skel.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV68Stub.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/output/Inception_v3_quantized.serialized.bin /data/local/tmp/inception_v3
          

          Note

          This section demonstrates HTP execution on Android with a graph prepared offline. To execute a graph prepared on-device (online), also push the on-device preparation library and the quantized DLC:

          $ adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpPrepare.so /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/model/Inception_v3_quantized.dlc /data/local/tmp/inception_v3
          

          Push the input data and input list to the device.

          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/cropped /data/local/tmp/inception_v3
          $ adb push ${QNN_SDK_ROOT}/examples/Models/InceptionV3/data/target_raw_list.txt /data/local/tmp/inception_v3
          

          Push the qnn-net-run tool to the device.

          $ adb push ${QNN_SDK_ROOT}/bin/aarch64-android/qnn-net-run /data/local/tmp/inception_v3
          

          Set up the device environment.

          $ adb shell
          $ cd /data/local/tmp/inception_v3
          $ export LD_LIBRARY_PATH=/data/local/tmp/inception_v3
          $ export ADSP_LIBRARY_PATH="/data/local/tmp/inception_v3"
          

          Run qnn-net-run with the following arguments.

          $ ./qnn-net-run --backend libQnnHtp.so --input_list target_raw_list.txt --retrieve_context Inception_v3_quantized.serialized.bin
          

          The output of the run will be located in the default ./output directory.

          Exit the device and view the results.

          $ exit
          $ cd ${QNN_SDK_ROOT}/examples/Models/InceptionV3
          $ adb pull /data/local/tmp/inception_v3/output output_android
          $ python3 ${QNN_SDK_ROOT}/examples/Models/InceptionV3/scripts/show_inceptionv3_classifications.py -i data/cropped/raw_list.txt \
                                         -o output_android/ \
                                         -l data/imagenet_slim_labels.txt