Search
You were hired as part of the team developing a new image processing software (similar to Photoshop or Gimp). To get edge over existing programs and to attract professional users, your program is supposed to be super-fast. Your task is to develop one of the core operations - image convolution - as consequence your implementation must be as fast as possible. In simple terms the convolution replaces pixels in original image by weighted average of pixels in neighbourhood (averaging is done separately for each colour component).
To simplify and speed-up other operations, your software always keeps up-to-date brightness histogram for the image. Thus while you are applying convolution or immediately afterwards you have to recalculate brightness histogram.
The archive with homework code should be submitted in BRUTE.
Your homework is to implement following convolution mask for sharpening the image. You are required to minimise time to process the image by utilising system caches in the most efficient way. The convolution mask for sharpening is following:
Pixels on the edges (first/last row and first/last column) are just copied from the original image. In other words if the convolution can not be computed due to mask being outside the image, the pixel is just copied.
The second part is to create a brightness histogram of output image and store values for the histogram into a file. For conversion from RGB to brightness use following formula:
Y = round(0.2126*R + 0.7152*G + 0.0722*B)
The bins for the histogram are defined by following table:
For simplicity, input image is in binary encoded portable pixmap format (PPM). Image in this file format always start with string P6 on the first line. On the next line there are two string values - width and height of the image in pixels. These values are separated by a white space (assume only space). On the next line there is maximal intensity recorded in the image (always 255). On the next line starts the pixel data - triplet of unsigned 8 bit integer for each pixel, each integer stands for red/green/blue component for this pixel in this order.
Two example images in PPM format follows:
vit_small.zip vit_normal.zip
The image file to process will be passed on command line (see parameters of main() function).
main()
You will output two files from your program - a sharpened image in file named output.ppm and a histogram data in file output.txt. In histogram file, there will be values for individual bins (0-50, 51-102, …) separated by single space.
The program you will submit will be evaluated based on
For evaluation purposes we will use following assumption on cache size: L1 data and instruction caches are 8-way set associative, 64 bytes per block of 32 KB each. L2 cache is 1MB in size, 16-way set associative with 64 bytes per block. In ideal case you program should be able to adapt to any cache parameters, but for homework it is OK to have these values fixed.
Points
Points for efficiency will be awarded based on “Cost” of your program. Cost is calculated as follows:
Cost = AMAT_i*I_refs + AMAT_d*D_refs,
where
Give cost, your points will be determined based on following formula:
Efficiency_pts = 13*(1 - (Cost_Max - Cost)/(Cost_Max - Cost_Min)) where Cost_Max=5.0 s Cost_Min=1.0 s
Efficiency_pts will be limited to range 0 to 13 and rounded to nearest upper multiply of 0.25. Together with one point for the correct answer the maximal sum is 14 points.
The program has to be written as portable C or C + + code without use of external libraries. Input/output picture format is selected such way that reading and writing can be implemented in a few lines of C code. Code can be optimized and use Intel x86 architecture extensions to achieve better memory access pattern. Upload system compiles and run tests in a similar GNU/Linux operating system environment as is used during course seminaries.
To submit your homework, you have to use upload system: http://cw.felk.cvut.cz/upload/
You have to submit your source file (named main.c or main.cpp if you use C++ instead of plain C) and the source file has to be compressed using ZIP. All these requirements are due to limitations of upload system and you really have to put all your sources into one file.
gcc -mssse3 -g -O1 -Wall -Werror -std=c99 -o main main.c -lm
g++ -mssse3 -g -O1 -Wall -Werror -o main main.cpp -lm
All homework will be tested for cheating (all cheating students will treated equally, even original authors).
output.txt
Cost of your solution is fairly well reflected in the time needed to finish the convolution (the faster program means lower cost). To measure the time needed to finish your implementation, you can use C function called gettimeofday (see man gettimeofday). See the example below:
gettimeofday
man gettimeofday
#include <unistd.h> #include <sys/time.h> ... struct timeval start, stop; gettimeofday(&start, NULL); ... gettimeofday(&stop, NULL); double accum = ( stop.tv_sec - start.tv_sec )*1000.0 + ( stop.tv_nsec - start.tv_nsec )/ 1000000.0; printf( "Time: %.6lf ms\n", accum );
Do not forget to link library rt during compilation by adding -lrt to the compiler command line (i.e. g++ main.cpp -g -lrt).
rt
-lrt
g++ main.cpp -g -lrt
You can use cachegrind tool to evaluate how your program utilise the CPU caches. This tool outputs number of cache misses and cache hits during execution of your program. By default the cachegrind uses cache configuration of your computer. To set other parameters you have to specify them via command line arguments (–I1, –D1, –LL).
cachegrind
valgrind --tool=cachegrind --I1=32768,8,64 --D1=32768,8,64 --LL=1048576,16,64 ./a.out
You can also get more detailed analysis of your program's behaviour by using cg_annotate.
cg_annotate
cg_annotate <filename> ~/hw2/main.c
where <filename> is path to the file with detailed output generated from cachegrind, please use absolute path (starting with / or ~) to your source code.
To verify output of your program, you can use this Testing Picture with source and transformed image and histogram.