performance-test

perf

Installing perf

https://xiaoyanzhuo.github.io/2019/01/18/Perf-Tool.html

$ sudo apt-get update
$ sudo apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)
#### Installing on Orin

Download the JP-x.x.x Driver Package Sources: https://developer.nvidia.com/embedded/l4t/r35_release_v1.0/sources/public_sources.tbz2

Then run:

Copy the archive to the Orin and extract it
cd kernel/kernel-4.9/tools/perf (the kernel-x.y directory name depends on the L4T release)
make
./perf --version

General usage

Basic tutorial: https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/common-perf-commands_getting-started-with-perf

Use perf to sample the metrics of interest and generate a report, which can then be viewed at https://profiler.firefox.com/

Steps:

1. sudo perf record -e cpu-clock -g -p PID sleep 10
2. perf script -i perf.data > perf.unfold
Then open the .unfold file on profiler.firefox.com or speedscope.app.
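The two steps above can be wrapped in a small helper. This is only a sketch: `profile_pid` and the `PERF_DRY_RUN` switch are hypothetical names, and running `perf script` under `sudo` is an assumption (the `perf.data` file is written by root in step 1, so a normal user may not be able to read it).

```shell
#!/bin/sh
# Sketch: sample one process for N seconds, then export a text trace
# for profiler.firefox.com or speedscope.app.
# profile_pid and PERF_DRY_RUN are hypothetical names, not part of perf.
profile_pid() {
    target_pid=$1
    secs=${2:-10}                 # recording length in seconds
    outfile=${3:-perf.unfold}     # text trace to upload

    # With PERF_DRY_RUN=1 the commands are printed instead of executed.
    run=""
    [ "${PERF_DRY_RUN:-0}" = "1" ] && run="echo"

    $run sudo perf record -e cpu-clock -g -p "$target_pid" -o perf.data sleep "$secs" || return 1

    if [ -n "$run" ]; then
        echo "sudo perf script -i perf.data > $outfile"
    else
        # Only stdout goes to the file, so perf warnings stay on the terminal.
        sudo perf script -i perf.data > "$outfile"
    fi
}
```

For example, `PERF_DRY_RUN=1 profile_pid 1234 5` prints the two commands without running them, which is a cheap way to check the arguments before profiling a real process.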

perf errors
Permission error mapping pages.
Consider increasing /proc/sys/kernel/perf_event_mlock_kb,
or try again with a smaller value of -m/--mmap_pages.
(current value: 4294967295,0)

Add the -m 2 option. It must come before sleep 10, so the command becomes:

sudo perf record -e cpu-clock -g -p PID -m 2 sleep 10

Checking the TLB hit rate

Reference: https://blog.csdn.net/hbuxiaofei/article/details/128402495

System-wide TLB hit rate, printed every 1000 ms:
perf stat -e iTLB-load,iTLB-load-misses -a -I 1000

TLB hit rate of a single process (the result is printed after Ctrl+C):
perf stat -e iTLB-load,iTLB-load-misses -p pid
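perf stat reports raw counts, so the hit rate still has to be computed by hand. A minimal sketch, assuming the two numbers are copied from the perf stat output and that iTLB-load counts all lookups including misses (the exact semantics depend on the CPU); `tlb_hit_pct` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: hit rate (%) = (loads - misses) / loads * 100
# tlb_hit_pct LOADS MISSES   (hypothetical helper, not part of perf)
tlb_hit_pct() {
    awk -v loads="$1" -v misses="$2" 'BEGIN {
        # Avoid dividing by zero when no loads were counted.
        if (loads <= 0) { print "n/a"; exit }
        printf "%.2f\n", (loads - misses) * 100 / loads
    }'
}

# Example: 1000 loads with 25 misses
tlb_hit_pct 1000 25
```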

Generating flame graphs with FlameGraph

The FlameGraph project is on GitHub: https://github.com/brendangregg/FlameGraph

git clone https://github.com/brendangregg/FlameGraph.git
./FlameGraph/stackcollapse-perf.pl perf.unfold > perf.folded
./FlameGraph/flamegraph.pl perf.folded > perf.svg
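The folded file is plain text, one line per unique stack in the form `frame1;frame2;...;leaf COUNT`, so it can also be inspected without rendering the SVG. The sketch below sums the sample counts per leaf frame to show where the CPU time was actually spent; `top_leaf_frames` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the leaf frames with the most samples in a folded-stack file.
# top_leaf_frames FILE [ROWS]   (hypothetical helper, not part of FlameGraph)
top_leaf_frames() {
    awk '{
        count = $NF                     # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)      # strip the trailing count
        n = split(stack, frames, ";")   # frames[n] is the leaf frame
        total[frames[n]] += count
    } END {
        for (f in total) printf "%d %s\n", total[f], f
    }' "$1" | sort -k1,1nr -k2 | head -n "${2:-10}"
}
```

Usage: `top_leaf_frames perf.folded 20` prints the 20 hottest leaf functions, most sampled first.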

NVIDIA Nsight Systems (nsys)

Basic usage

On Orin:

/opt/nvidia/nsight-systems/2022.3.3/target-linux-tegra-armv8/nsys profile -y 60 -d 20 --gpuctxsw=true -o out_file mainboard -d ./dag/test.dag --trace nvvideo --cudabacktrace=all --cuda-memory-usage=true

Collection starts 60 s after the program is launched (-y 60) and lasts 20 s (-d 20); the report is then written to out_file.

Installation

Download the .deb package from the official site: https://developer.nvidia.com/nsight-systems/get-started

sudo dpkg -i NsightSystems-linux-cli-public-2024.5.1.113-3461954.deb

# Selecting previously unselected package nsight-systems-cli-2024.5.1.
# (Reading database ... 142840 files and directories currently installed.)
# Preparing to unpack NsightSystems-linux-cli-public-2024.5.1.113-3461954.deb ...
# Unpacking nsight-systems-cli-2024.5.1 (2024.5.1.113-245134619542v0) ...
# Setting up nsight-systems-cli-2024.5.1 (2024.5.1.113-245134619542v0) ...
# update-alternatives: using /opt/nvidia/nsight-systems-cli/2024.5.1/target-linux-x64/nsys to provide /usr/local/bin/nsys (nsys) in auto mode

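If you are worried about polluting the host environment, you can install the package inside a Docker container and copy the /opt/nvidia/nsight-systems-cli/2024.5.1/ directory from the container to the bare-metal host.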

Troubleshooting notes

  1. No GPU-related information is collected
    Generating '/tmp/nsys-report-7fd4.qdstrm'
    [1/8] [========================100%] prof_query.nsys-rep
    [2/8] [========================100%] prof_query.sqlite
    [3/8] Executing 'nvtxsum' stats report
    SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain NV Tools Extension (NVTX) data.
    [4/8] Executing 'osrtsum' stats report

    Operating System Runtime API Statistics:

    Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
    -------- --------------- --------- --------- --------- -------- -------- ----------- ---------------------
    94.6 6,149,426 117 52,559.2 52,751.0 51,433 54,260 721.0 usleep
    3.1 204,093 1 204,093.0 204,093.0 204,093 204,093 0.0 pthread_rwlock_wrlock
    2.1 137,640 9 15,293.3 7,054.0 2,733 68,248 20,752.3 ioctl
    0.1 7,773 1 7,773.0 7,773.0 7,773 7,773 0.0 munmap

    [5/8] Executing 'cudaapisum' stats report
    SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain CUDA trace data.
    [6/8] Executing 'gpukernsum' stats report
    SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain CUDA kernel data.
    [7/8] Executing 'gpumemtimesum' stats report
    SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain GPU memory data.
    [8/8] Executing 'gpumemsizesum' stats report
    SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain GPU memory data.
    Generated:
    /home/stereo/guoqing.feng/output/log/prof_query.nsys-rep
    /home/stereo/guoqing.feng/output/log/prof_query.sqlite
    The nsys version may not match the target; try the latest version from the official site.
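When the stats output is long, the skipped reports are easy to miss. The sketch below scans a saved copy of the log and prints the name of every stats report that was skipped. It assumes the log has the same `Executing '<name>' stats report` / `SKIPPED:` layout as the output above; `skipped_reports` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the names of nsys stats reports that were skipped.
# Reads the log from the given file(s), or from stdin when none is given.
# skipped_reports is a hypothetical helper, not part of nsys.
skipped_reports() {
    awk '
        /Executing .* stats report/ {
            # The report name sits between the single quotes (\047).
            split($0, parts, "\047")
            name = parts[2]
        }
        /SKIPPED:/ && name != "" {
            print name
            name = ""
        }
    ' "$@"
}
```

Run against the log above, `skipped_reports nsys.log` would list nvtxsum, cudaapisum, gpukernsum, gpumemtimesum and gpumemsizesum, which confirms that no CUDA or GPU data made it into the report.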

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.