This homework covers multi-threaded processing of text files. Implement a program that loads all files from a directory and outputs a single letter frequency table (LFT) for all the files together. The output should list every letter that has a nonzero number of occurrences.
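A letter frequency table can be sketched as an array of counters indexed by character code. The following is a minimal illustration, not a reference solution: the names `FrequencyTable`, `count_letters`, and `format_nonzero` are hypothetical, C++ is assumed as the implementation language, and the sketch assumes that only ASCII letters are counted (the assignment text does not say how other characters should be treated).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical type: one counter per possible byte value.
using FrequencyTable = std::array<std::uint64_t, 256>;

// True for ASCII letters only (assumption: all other characters are ignored).
inline bool is_ascii_letter(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Add the letters read from `in` to `table`.
inline void count_letters(std::istream& in, FrequencyTable& table) {
    char c;
    while (in.get(c)) {
        const auto byte = static_cast<unsigned char>(c);
        if (is_ascii_letter(byte)) ++table[byte];
    }
}

// Format every nonzero entry as "<letter> <count>", one entry per line.
inline std::string format_nonzero(const FrequencyTable& table) {
    std::ostringstream out;
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i] != 0) out << static_cast<char>(i) << ' ' << table[i] << '\n';
    }
    return out.str();
}
```

To build one table for a whole directory, the same table object can be passed to `count_letters` once per opened file, and `format_nonzero` called once at the end.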
Your program has 3 arguments:
You can generate 1001 text files (0000.txt through 1000.txt), each with 3 lines of random text, using the following bash command:
for i in $(seq -w 0 1000);do base64 /dev/urandom | awk '{print(0==NR%10)?"":$1}' | sed 's/[^[:alpha:]]/ /g' | head -3 > $i.txt;done
Measure the processing time with the bash command “time” and check whether parallelism speeds up your program. Note the difference between 0 and 1 worker threads: both options process the files sequentially, but the latter processes them on a separate thread.
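One possible way to structure the worker threads is sketched below, assuming C++ with `std::thread`. Each worker gets a private table and every N-th file, and the private tables are summed after all workers have been joined, so no locking is needed. The names `count_files` and `count_all` are hypothetical, only ASCII letters are counted (an assumption), and files that cannot be opened are silently skipped.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

using FrequencyTable = std::array<std::uint64_t, 256>;

// Count ASCII letters of paths[first], paths[first + step], ... into `table`.
// Files that cannot be opened are skipped in this sketch.
inline void count_files(const std::vector<std::string>& paths, std::size_t first,
                        std::size_t step, FrequencyTable& table) {
    for (std::size_t i = first; i < paths.size(); i += step) {
        std::ifstream in(paths[i], std::ios::binary);
        char c;
        while (in.get(c)) {
            const bool letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (letter) ++table[static_cast<unsigned char>(c)];
        }
    }
}

// workers == 0: count on the calling thread.
// workers >= 1: spawn that many threads, each with a private table, then merge.
inline FrequencyTable count_all(const std::vector<std::string>& paths, unsigned workers) {
    FrequencyTable total{};
    if (workers == 0) {
        count_files(paths, 0, 1, total);
        return total;
    }
    std::vector<FrequencyTable> partial(workers, FrequencyTable{});
    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        // Worker w handles files w, w + workers, w + 2 * workers, ...
        threads.emplace_back([&paths, &partial, w, workers] {
            count_files(paths, w, workers, partial[w]);
        });
    }
    for (auto& t : threads) t.join();
    // Merge the private tables only after every worker has finished.
    for (const auto& p : partial) {
        for (std::size_t i = 0; i < total.size(); ++i) total[i] += p[i];
    }
    return total;
}
```

With this layout, `count_all(paths, 0)` and `count_all(paths, 1)` do the same sequential work, and the second call additionally pays for creating and joining one thread. For small inputs that overhead can outweigh any gain, which is worth keeping in mind when reading the timings. Depending on the toolchain, linking may require the `-pthread` flag.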
Points:
Hints:
Examples of what the program output could look like:
>> time ./main ./files 1000 0
thread num is 0
Running sequential
PRINTING BUFFER!
3000 42665
A 3534
B 3483
C 3570
D 3596
E 3618
F 3672
G 3552
H 3612
I 3506
J 3481
K 3632
L 3454
M 3591
N 3611
O 3645
P 3572
Q 3577
R 3520
S 3635
T 3644
U 3559
V 3416
W 3544
X 3580
Y 3666
Z 3580
a 3507
b 3575
c 3588
d 3515
e 3593
f 3451
g 3520
h 3597
i 3526
j 3498
k 3482
l 3622
m 3527
n 3611
o 3488
p 3621
q 3624
r 3561
s 3561
t 3530
u 3646
v 3432
w 3572
x 3593
y 3563
z 3682
./main ./files 1000 0 0,00s user 0,01s system 95% cpu 0,008 total
>> time ./main ./files 1000 4
thread num is 4
Running parallel
Spawning thread 0
Spawning thread 1
Spawning thread 2
Spawning thread 3
PRINTING BUFFER!
3000 42665
A 3534
B 3483
C 3570
D 3596
E 3618
F 3672
G 3552
H 3612
I 3506
J 3481
K 3632
L 3454
M 3591
N 3611
O 3645
P 3572
Q 3577
R 3520
S 3635
T 3644
U 3559
V 3416
W 3544
X 3580
Y 3666
Z 3580
a 3507
b 3575
c 3588
d 3515
e 3593
f 3451
g 3520
h 3597
i 3526
j 3498
k 3482
l 3622
m 3527
n 3611
o 3488
p 3621
q 3624
r 3561
s 3561
t 3530
u 3646
v 3432
w 3572
x 3593
y 3563
z 3682
./main ./files 1000 4 0,01s user 0,00s system 260% cpu 0,004 total