performance-test
perf
Installing perf
https://xiaoyanzhuo.github.io/2019/01/18/Perf-Tool.html
$ sudo apt-get update
$ sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
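Download the JP-x.x.x Driver Package Sources: https://developer.nvidia.com/embedded/l4t/r35_release_v1.0/sources/public_sources.tbz2
Run:
# Copy the archive to the Orin and unpack it, then:
# (the kernel-x.y directory name depends on the L4T release)
cd kernel/kernel-4.9/tools/perf
make
./perf --version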
Common usage
Basic tutorial: https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/common-perf-commands_getting-started-with-perf
Use perf to monitor the metrics of interest and generate a report; the report can then be viewed at https://profiler.firefox.com/
Steps:
1. sudo perf record -e cpu-clock -g -p PID sleep 10
2. perf script -i perf.data &> perf.unfold
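The two steps above can be wrapped in a small helper so the PID and sampling time become parameters. This is only a sketch: the function name `perf_capture` and the `DRY_RUN` variable are made up for illustration and are not part of perf.

```shell
# Hypothetical helper (not part of perf): wraps the record and script steps.
# Usage: perf_capture PID [SECONDS]
# With DRY_RUN=1 the commands are printed instead of executed.
perf_capture() {
    local pid="$1" secs="${2:-10}"
    if [ -z "$pid" ]; then
        echo "usage: perf_capture PID [SECONDS]" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo perf record -e cpu-clock -g -p $pid sleep $secs"
        echo "perf script -i perf.data > perf.unfold 2>&1"
        return 0
    fi
    # Step 1: sample the target process, with call graphs, for $secs seconds.
    sudo perf record -e cpu-clock -g -p "$pid" sleep "$secs" || return 1
    # Step 2: dump the samples as text (stdout and stderr, like "&>" in bash).
    perf script -i perf.data > perf.unfold 2>&1
}
```

Running `DRY_RUN=1 perf_capture 1234 5` only prints the two commands, which is a cheap way to check them before sampling a real process.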
perf errors
If perf record reports an error, add the -m 2 option. It must come before sleep 10, giving:
sudo perf record -e cpu-clock -g -p PID -m 2 sleep 10
Checking the TLB hit rate
https://blog.csdn.net/hbuxiaofei/article/details/128402495
Check the system-wide TLB hit rate
perf stat -e iTLB-load,iTLB-load-misses -a -I 1000
Check the TLB hit rate of a single process; the results are printed after Ctrl+C
perf stat -e iTLB-load,iTLB-load-misses -p PID
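perf stat prints the two raw counters rather than a hit rate, so the rate has to be derived by hand. A minimal sketch, assuming `iTLB-load` counts all lookups and `iTLB-load-misses` counts the subset that missed (the exact meaning of these events varies by CPU); the function name `tlb_hit_rate` is hypothetical.

```shell
# Hypothetical helper: derive the hit rate (in percent) from the two counters.
# Assumes misses are a subset of loads; this depends on the CPU's event definitions.
# Usage: tlb_hit_rate LOADS MISSES
tlb_hit_rate() {
    awk -v loads="$1" -v misses="$2" 'BEGIN {
        # Avoid dividing by zero when nothing was counted.
        if (loads <= 0) { print "n/a"; exit }
        printf "%.2f\n", (1 - misses / loads) * 100
    }'
}
```

For example, with made-up counts of 1000 loads and 25 misses, `tlb_hit_rate 1000 25` prints `97.50`.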
Generating flame graphs with Flame Graph
The Flame Graph project is hosted on GitHub: https://github.com/brendangregg/FlameGraph
git clone https://github.com/brendangregg/FlameGraph.git
./FlameGraph/stackcollapse-perf.pl perf.unfold &> perf.folded
./FlameGraph/flamegraph.pl perf.folded > perf.svg
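The fold and render steps above can also be chained in one pipeline. The wrapper below is a sketch under assumptions: the function name `make_flamegraph` and the `FLAMEGRAPH_DIR` variable are illustrative, and the FlameGraph repository is assumed to be cloned into `./FlameGraph` as in the commands above.

```shell
# Hypothetical wrapper around the FlameGraph scripts.
# Usage: make_flamegraph [INPUT] [OUTPUT_SVG]
# FLAMEGRAPH_DIR (assumed variable) points at the cloned repository.
make_flamegraph() {
    local input="${1:-perf.unfold}" output="${2:-perf.svg}"
    local fg="${FLAMEGRAPH_DIR:-./FlameGraph}"
    if [ ! -r "$input" ]; then
        echo "input not found: $input" >&2
        return 1
    fi
    if [ ! -x "$fg/stackcollapse-perf.pl" ] || [ ! -x "$fg/flamegraph.pl" ]; then
        echo "FlameGraph scripts not found in $fg" >&2
        return 1
    fi
    # Fold the stacks and render the SVG in one pipeline, so no
    # intermediate perf.folded file is written.
    "$fg/stackcollapse-perf.pl" "$input" | "$fg/flamegraph.pl" > "$output"
}
```

Calling `make_flamegraph` with no arguments reads `perf.unfold` and writes `perf.svg`, matching the file names used above.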
nvidia-system
Basic usage
On the Orin
/opt/nvidia/nsight-systems/2022.3.3/target-linux-tegra-armv8/nsys profile -y 60 -d 20 --gpuctxsw=true -o out_file mainboard -d ./dag/test.dag --trace nvvideo --cudabacktrace=all --cuda-memory-usage=true
Collection starts 60 s after the program launches and runs for 20 s; the recording is then dumped to the report out_file.
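The `-y 60 -d 20` pair in the command above is what sets the 60 s delay and 20 s capture window. As a sketch, the timing part of the command line can be built from parameters; the function name `nsys_cmd` and the `NSYS` variable are hypothetical, and the function only prints the command instead of running it.

```shell
# Hypothetical helper: prints an nsys command line for a delayed,
# fixed-length capture. NSYS (assumed variable) may hold the full path
# to the nsys binary; it defaults to "nsys".
# Usage: nsys_cmd DELAY_S DURATION_S OUTPUT APP [APP_ARGS...]
nsys_cmd() {
    if [ "$#" -lt 4 ]; then
        echo "usage: nsys_cmd DELAY_S DURATION_S OUTPUT APP [APP_ARGS...]" >&2
        return 1
    fi
    local delay="$1" duration="$2" output="$3"
    shift 3
    # -y: seconds to wait after launch before collecting
    # -d: seconds to collect
    # -o: report file name
    echo "${NSYS:-nsys} profile -y $delay -d $duration -o $output $*"
}
```

For instance, `nsys_cmd 60 20 out_file mainboard -d ./dag/test.dag` prints the timing-related core of the command shown above; other options such as `--gpuctxsw=true` would still need to be added by hand.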
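Installation
Download the .deb package from the official site: https://developer.nvidia.com/nsight-systems/get-started
If you are worried about polluting the machine's environment, you can even copy the /opt/nvidia/nsight-systems-cli/2024.5.1/ directory from a Docker container to the bare-metal host.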
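Troubleshooting
- No GPU-related information is collected. The nsys version may be wrong; try the latest version from the official site. Example output from an affected run: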
Generating '/tmp/nsys-report-7fd4.qdstrm'
[1/8] [========================100%] prof_query.nsys-rep
[2/8] [========================100%] prof_query.sqlite
[3/8] Executing 'nvtxsum' stats report
SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain NV Tools Extension (NVTX) data.
[4/8] Executing 'osrtsum' stats report
Operating System Runtime API Statistics:
Time (%)  Total Time (ns)  Num Calls   Avg (ns)   Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  ---------  ---------  --------  --------  -----------  ---------------------
    94.6        6,149,426        117   52,559.2   52,751.0    51,433    54,260        721.0  usleep
     3.1          204,093          1  204,093.0  204,093.0   204,093   204,093          0.0  pthread_rwlock_wrlock
     2.1          137,640          9   15,293.3    7,054.0     2,733    68,248     20,752.3  ioctl
     0.1            7,773          1    7,773.0    7,773.0     7,773     7,773          0.0  munmap
[5/8] Executing 'cudaapisum' stats report
SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain CUDA trace data.
[6/8] Executing 'gpukernsum' stats report
SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain CUDA kernel data.
[7/8] Executing 'gpumemtimesum' stats report
SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain GPU memory data.
[8/8] Executing 'gpumemsizesum' stats report
SKIPPED: /home/stereo/guoqing.feng/output/log/prof_query.sqlite does not contain GPU memory data.
Generated:
/home/stereo/guoqing.feng/output/log/prof_query.nsys-rep
/home/stereo/guoqing.feng/output/log/prof_query.sqlite
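In the log above, every CUDA and GPU report is marked SKIPPED, which is the visible symptom of the missing GPU data. A small filter can list the skipped reports from a saved log; this is a sketch that assumes the log layout shown above, and the function name `nsys_skipped_reports` is hypothetical.

```shell
# Hypothetical helper: reads nsys stats output on stdin and prints the name
# of every report that was skipped. Assumes the layout shown in the log:
# an "Executing '<name>' stats report" line followed by a "SKIPPED:" line.
nsys_skipped_reports() {
    # Split on single quotes so the report name is field 2.
    awk -F"'" '
        /Executing/ { name = $2 }
        /^SKIPPED:/ && name != "" { print name }
    '
}
```

Fed the log above (for example `nsys_skipped_reports < nsys.log`, where `nsys.log` is an assumed file name), it would list nvtxsum, cudaapisum, gpukernsum, gpumemtimesum and gpumemsizesum, one per line.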
Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!