Warning
This page is located in the archive. Go to the latest version of the course pages.

Profiling in C/C++

Your own machine with Linux OS

This is the most comfortable option right now. You only need to install a few required tools from the repository (perf, hotspot, libopencv, libboost). Mac users can complete this task as well: instead of perf and hotspot, you can use Instruments or CLion for profiling.

Remote access to our server with all required tools installed

Usually, all required tools can be accessed from the computers in the lab. This year, to support distance learning, we installed the required software on our server. There are two ways to work remotely with GUI applications:

  • Xpra-based access, which works over slower networks and from all operating systems. Additionally, it lets you reconnect to the running application when your computer disconnects for whatever reason.
  • X11 forwarding via SSH, which requires a fast, low-latency network connection and is well supported only on Unix-based systems.

Using Xpra

Xpra is an open-source multi-platform persistent remote display server and client for forwarding applications and desktop screens. It gives you remote access to individual applications or full desktops. It is available for Windows / Mac OS X / Linux.

First, run xpra_launcher from the command line:

xpra_launcher

Fill in the following configuration:
Mode: SSH
Server: <username>@ritchie.ciirc.cvut.cz:22
Server Password: <CVUT password>, or leave it empty if you copied your public SSH key to the server first

Then connect to the server. You can save and load your configuration.

After a successful login, click the settings icon in the bottom right corner and then click Move. Move the window so that it is visible and press Default configuration. This should create the applications panel for you.

Computers in laboratory accessible over Xpra

If our server is overloaded, we will provide remote access to the lab computers. However, this is a bit more complicated.

Computers in laboratory (probably not this year)

All tools required for all tasks are installed in the Debian system available in the laboratory. To boot the proper system, select “DCE PXE Menu → DCE Linux (first item)” in the boot menu. Log in to the system with your CTU username and KOS password. Your home directories can also be accessed remotely over ssh: “ssh username@postel.felk.cvut.cz”.

Task assignment

Imagine that you are a developer in a company which develops autonomous driving assistance systems. You are given an algorithm which doesn't run as fast as your manager wants. The algorithm finds ellipses in a given picture and will be used to detect the wheels of a neighbouring car while parking. Your task is to speed up this algorithm so that it runs smoothly.

You will probably need to take the following steps to achieve the desired speed-up:

  1. Download the program from the git repository: git clone https://gitlab.fel.cvut.cz/matejjoe/ellipses.git
  2. Compile the program (simply run make in the ellipse directory)
  3. Run the program (./find_ellipse -h for help; example images are included in the repository; press q to quit in GUI mode)
  4. Do the profiling (hints below)
  5. Make changes which improve the speed (code & compiler optimizations)
  6. Upload the patch file to the upload system and pass the specified limit. The patch will be applied using the git apply command; therefore, the best way to generate it is the git diff command:
    git fetch origin && git diff origin/master > ellipse.diff.txt
    (the txt file type is required by the upload system).

You are not allowed to modify the number of iterations or any other parameters of the RANSAC algorithm!

Program requirements: if you want to compile the program on your own machine, you will need the OpenCV library (libopencv-dev package) and the Boost library (libboost-all-dev package). If you don't want to install new libraries on your machine, you can connect to our server via ssh (ssh user@postel.felk.cvut.cz) and work remotely.

Basic profiling techniques

How can we evaluate the efficiency of our implementation? Run time gives a simple overview of the program. However, to find the hot spots of a program, other kinds of information are much more useful, such as the number of executed instructions, cache misses, or memory references attributed to individual lines of code.

Measuring execution time

The easiest form of program analysis is measuring time, which can be done with the C time library. More precise values can be obtained with high_resolution_clock from the chrono library (C++11) or with the Linux function clock_gettime (man clock_gettime).

http://www.cplusplus.com/reference/ctime/

http://www.cplusplus.com/reference/chrono/

http://man7.org/linux/man-pages/man2/clock_gettime.2.html
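The two higher-precision approaches mentioned above can be sketched as follows. This is only a minimal illustration: sum_of_squares is a hypothetical workload standing in for the code you want to measure, and the sketch uses std::chrono::steady_clock (an assumption made here because it is monotonic) where high_resolution_clock would be used in the same way.

```cpp
#include <time.h>

#include <chrono>
#include <cstdint>

// Hypothetical workload standing in for the code being measured.
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t s = 0;
    for (std::uint64_t i = 0; i < n; ++i) s += i * i;
    return s;
}

// Wall-clock time of one call in milliseconds, measured with std::chrono.
// steady_clock is monotonic, so system clock adjustments do not affect it.
double time_chrono_ms(std::uint64_t n, std::uint64_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = sum_of_squares(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// The same measurement with the POSIX function clock_gettime.
double time_clock_gettime_ms(std::uint64_t n, std::uint64_t& result) {
    timespec start{}, stop{};
    clock_gettime(CLOCK_MONOTONIC, &start);
    result = sum_of_squares(n);
    clock_gettime(CLOCK_MONOTONIC, &stop);
    return (stop.tv_sec - start.tv_sec) * 1e3 +
           (stop.tv_nsec - start.tv_nsec) / 1e6;
}
```

Call either function from main and print the returned value. Keeping the result of the workload (here through the result parameter) makes it harder for the compiler to optimize the measured code away.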

GProf

GProf is the GNU profiler, which is based on statistical sampling (every 1 ms or 10 ms). It collects the time spent in each function and constructs a call graph. The program has to be compiled with the profiling option (-pg), and all libraries which you want to profile have to be linked statically. Running the program then generates the profiling information. Note that the resulting data are not exact. Shared libraries can be profiled with sprof (man sprof).

https://sourceware.org/binutils/docs/gprof/

http://man7.org/linux/man-pages/man1/sprof.1.html
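As a rough sketch of the GProf workflow described above, the following toy workload has one deliberately expensive function and one cheap function. All function names are hypothetical; the build and run commands in the comments assume the code is saved as demo.cpp together with a main that calls run_workload.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed workflow (file and binary names are hypothetical):
//   g++ -pg -g -O0 demo.cpp -o demo   # instrument the program for gprof
//   ./demo                            # writes gmon.out on normal exit
//   gprof ./demo gmon.out             # prints flat profile and call graph

// Deliberately expensive: 200 square roots per element.
double slow_transform(const std::vector<double>& v) {
    double acc = 0.0;
    for (double x : v)
        for (int k = 0; k < 200; ++k) acc += std::sqrt(x + k);
    return acc;
}

// Cheap by comparison: one addition per element.
double fast_sum(const std::vector<double>& v) {
    double acc = 0.0;
    for (double x : v) acc += x;
    return acc;
}

// Entry point of the workload; call it from main with a large n.
double run_workload(std::size_t n) {
    std::vector<double> data(n, 1.0);
    return slow_transform(data) + fast_sum(data);
}
```

With a large n, the flat profile should attribute most of the time to slow_transform. Compiling with -O0 keeps both functions from being inlined, so they show up as separate entries.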

Simulation using Cachegrind (Linux, Mac OS X)

Cachegrind is part of the Valgrind simulation tool. It runs the binary on an emulated processor and records all executed instructions, memory accesses, and their relationship to source lines and functions in the program. The program can be linked against shared libraries and does not need to be recompiled to be simulated. However, you probably want to compile it with debugging info (-g option) so that the results can be matched to source code lines correctly. In any case, the simulation is usually about 50 times slower than running on real hardware. Profiling data generated by Cachegrind can be visualized simply by opening the output file in KCachegrind (gprof data may first need to be converted to a format KCachegrind understands).

http://valgrind.org/docs/manual/cg-manual.html

https://kcachegrind.github.io/

If you are also interested in caller/callee relationships and the exact event counts spent in function calls, you can use Callgrind, which extends Cachegrind with this functionality.
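To see what Cachegrind reports, it helps to have code with clearly different memory access patterns. The sketch below is a hypothetical example, not part of the assignment: it sums the same matrix twice, once walking memory sequentially and once with a large stride. The commands in the comments assume the code is built into a binary called demo with a main that calls both functions on a large matrix.

```cpp
#include <cstddef>
#include <vector>

// Assumed workflow (binary name is hypothetical):
//   g++ -g -O1 demo.cpp -o demo
//   valgrind --tool=cachegrind --cache-sim=yes ./demo
//   cg_annotate cachegrind.out.<pid>
//   valgrind --tool=callgrind ./demo      # adds call relationships

// Sums a rows x cols matrix stored row by row, reading memory sequentially.
long long sum_row_major(const std::vector<int>& m, std::size_t rows,
                        std::size_t cols) {
    long long s = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c) s += m[r * cols + c];
    return s;
}

// Same result, but consecutive reads are cols elements apart,
// which tends to cause many more data cache misses on large matrices.
long long sum_col_major(const std::vector<int>& m, std::size_t rows,
                        std::size_t cols) {
    long long s = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r) s += m[r * cols + c];
    return s;
}
```

Both functions execute a similar number of instructions, so the difference between them should show up mainly in the data cache miss columns of the annotated output.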

Profiling using perf

Most modern processors contain performance counters, which can count various hardware events (clock cycles, executed instructions, cache reads/hits/misses, etc.). Linux perf can analyze a program using these counters.

https://perf.wiki.kernel.org/index.php/Main_Page

Moreover, you can use any of the hardware events listed in the reference manual for your processor. For Intel processors, see the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B (Chapters 18 and 19), available from https://software.intel.com/en-us/articles/intel-sdm.

If you get rubbish in perf report, try specifying the event in perf record (example: perf record -e cycles ./program). Recording the call graph is also useful (perf record --call-graph dwarf -e cycles ./program).

A few examples of perf usage: http://www.brendangregg.com/perf.html

By default, using performance counters without sudo rights is not allowed. You can enable access to the performance counters for non-root users by running this command:

sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

Hotspot

Hotspot is a GUI for performance analysis built around Linux perf. You can download the AppImage from https://github.com/KDAB/hotspot/releases (don't forget to make the file executable with chmod +x file) or build it yourself (https://github.com/KDAB/hotspot).

Handling performance counters directly from C/C++ program

If you are interested in performance counters, you can use them directly from your C/C++ program (without any external profiling tool). See the perf_event_open manual page, or use a helper library built on the kernel API (libpfm, PAPI toolkit, etc.).

http://man7.org/linux/man-pages/man2/perf_event_open.2.html

http://perfmon2.sourceforge.net

http://icl.cs.utk.edu/papi/index.html
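A minimal sketch of reading one counter through perf_event_open, loosely following the pattern shown in the manual page, might look like this. The workload function and the name count_instructions are hypothetical, and the sketch assumes a Linux system. It returns -1 when the counter cannot be opened, for example in a virtual machine without a PMU or with a restrictive perf_event_paranoid setting.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// glibc provides no wrapper for this system call, so we call it via syscall().
static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

// Hypothetical workload standing in for the code being measured.
std::uint64_t workload(std::uint64_t n) {
    std::uint64_t s = 0;
    for (std::uint64_t i = 0; i < n; ++i) s += i * i;
    return s;
}

// Counts user-space instructions executed while workload(n) runs.
// Returns -1 if the counter cannot be opened or read.
long long count_instructions(std::uint64_t n) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        // start stopped, enable explicitly below
    attr.exclude_kernel = 1;  // count user space only
    attr.exclude_hv = 1;

    // pid = 0 and cpu = -1: measure the calling thread on any CPU.
    const int fd = static_cast<int>(perf_event_open(&attr, 0, -1, -1, 0));
    if (fd == -1) return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    volatile std::uint64_t sink = workload(n);  // keep the call alive
    (void)sink;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = 0;
    const bool ok =
        read(fd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count));
    close(fd);
    return ok ? count : -1;
}
```

Other events can be selected by changing attr.type and attr.config. Libraries such as libpfm or PAPI mainly save you from filling in this structure by hand.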

Windows alternatives

We have no experience with Windows tools. However, there are a few free tools available, for example those listed on this page:

https://wiki.qt.io/Profiling_and_Memory_Checking_Tools

MS Visual Studio has a profiler:

https://msdn.microsoft.com/en-us/library/mt210448.aspx

Other Windows profilers:

https://sourceforge.net/projects/lukestackwalker/

http://www.codersnotes.com/sleepy/

There is also a Windows alternative to KCachegrind, QCacheGrind:

https://sourceforge.net/projects/qcachegrindwin/

If you do not want to install any of these tools, you can work remotely on our server via ssh. PuTTY and Xming are your friends.

How to optimize execution time?

There are many ways to optimize your programs.

Several tips for optimization: https://people.cs.clemson.edu/~dhouse/courses/405/papers/optimize.pdf

Sample CMakeLists.txt for compilation in various IDEs

# CMAKE_CXX_STANDARD needs CMake 3.1 or newer; 3.5 also works with recent CMake releases
cmake_minimum_required(VERSION 3.5)

project(find_ellipse)

set(CMAKE_CXX_STANDARD 11)

# Debug symbols, no optimization, all warnings
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O0 -Wall")

find_package(OpenCV REQUIRED)
find_package(Boost 1.60 COMPONENTS filesystem REQUIRED)
include_directories(${Boost_INCLUDE_DIRS})

# Collect all source files from the current directory
aux_source_directory(. SRC_LIST)

add_executable(${PROJECT_NAME} ${SRC_LIST})

target_link_libraries(${PROJECT_NAME} ${OpenCV_LIBS} ${Boost_LIBRARIES})

g++ -g -O0 -Wall -std=c++11 -I/usr/include/opencv *.cpp -o find_ellipse -lboost_filesystem -lboost_system -lopencv_shape -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_datasets -lopencv_dpm -lopencv_face -lopencv_freetype -lopencv_fuzzy -lopencv_hdf -lopencv_line_descriptor -lopencv_optflow -lopencv_video -lopencv_plot -lopencv_reg -lopencv_saliency -lopencv_stereo -lopencv_structured_light -lopencv_phase_unwrapping -lopencv_rgbd -lopencv_viz -lopencv_surface_matching -lopencv_text -lopencv_ximgproc -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_xobjdetect -lopencv_objdetect -lopencv_ml -lopencv_xphoto -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_photo -lopencv_imgproc -lopencv_core

courses/b4m36esw/labs/lab01.txt · Last modified: 2021/02/17 11:17 by matejjoe