HW 04 - parallel processing

This homework covers multi-threaded processing of text files. Implement a program that loads the files from a directory and outputs a single letter frequency table (LFT) for all of the files together. The output should list every letter that occurs at least once, together with its number of occurrences.
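A minimal sketch of building and printing the LFT in C follows. It assumes the table is an array of 256 counters indexed by byte value. The names lft_count_stream, lft_count_file, lft_print_letters and LFT_SIZE are hypothetical and are not required by the assignment. Every byte is counted, and isalpha() filters the letters only at print time, which keeps the counting loop trivial. In the default "C" locale, isalpha() accepts exactly A-Z and a-z. Printing in index order puts A-Z before a-z (ASCII order), matching the sample output.

```c
#include <ctype.h>
#include <stdio.h>

/* One counter per possible byte value (hypothetical layout). */
#define LFT_SIZE 256

/* Add the byte counts of an open stream to lft. Returns 0, or -1 on a read error. */
int lft_count_stream(FILE *fp, unsigned long lft[LFT_SIZE])
{
    int c;
    /* getc() returns the byte as an unsigned char converted to int, or EOF,
       so c is a valid index 0..255 inside the loop. */
    while ((c = getc(fp)) != EOF)
        lft[c]++;
    return ferror(fp) ? -1 : 0;
}

/* Open one file, add its counts to lft and close it. Returns 0 or -1. */
int lft_count_file(const char *path, unsigned long lft[LFT_SIZE])
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    int rc = lft_count_stream(fp, lft);
    fclose(fp);
    return rc;
}

/* Print every letter with a nonzero count; ASCII order puts A-Z before a-z. */
void lft_print_letters(FILE *out, const unsigned long lft[LFT_SIZE])
{
    for (int c = 0; c < LFT_SIZE; c++)
        if (lft[c] != 0 && isalpha(c))
            fprintf(out, "%c %lu\n", c, lft[c]);
}
```

With these helpers, a sequential run would call lft_count_file() once per path with one shared, zero-initialized table, then call lft_print_letters(stdout, lft).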

Your program takes 3 arguments, in this order:

directory where the files are located
number of files to process
number of worker threads
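The sketch below shows one way to validate the numeric arguments and collect the first N files of the directory. It assumes POSIX opendir()/readdir() and a directory that holds only the generated text files. parse_count and list_files are hypothetical helper names. readdir() returns entries in no particular order, so "the first N files" are not necessarily in alphabetical order. If a deterministic order matters, POSIX scandir() with alphasort() provides one.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a non-negative decimal argument such as "1000"; returns -1 if invalid. */
long parse_count(const char *s)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 0)
        return -1;
    return v;
}

/* Store up to max_files malloc'd paths "dir/name" in paths.
   Returns how many were stored, or -1 if dir cannot be opened. */
long list_files(const char *dir, char **paths, long max_files)
{
    DIR *d = opendir(dir);
    if (d == NULL) {
        perror(dir);
        return -1;
    }
    long n = 0;
    struct dirent *e;
    while (n < max_files && (e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')   /* skips ".", ".." and hidden files */
            continue;
        size_t len = strlen(dir) + 1 + strlen(e->d_name) + 1;
        char *p = malloc(len);
        if (p == NULL)
            break;                 /* out of memory: return what we have */
        snprintf(p, len, "%s/%s", dir, e->d_name);
        paths[n++] = p;
    }
    closedir(d);
    return n;
}
```

In main(), one would reject a -1 from parse_count(argv[2]) or parse_count(argv[3]), allocate N char * slots, and pass them to list_files(argv[1], paths, N).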

You can generate 1000 text files, each with 3 lines of random text, using the following bash command:

 for i in $(seq -w 1 1000);do base64 /dev/urandom | awk '{print(0==NR%10)?"":$1}' | sed 's/[^[:alpha:]]/ /g' | head -3 > $i.txt;done

Measure the processing time with the bash command "time" and check whether parallelism speeds up your program. Note the difference between 0 and 1 worker threads: both options process the files sequentially, but with 1 worker thread the files are processed on a separate thread.
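Here is a sketch of the 0 vs 1 worker thread distinction, assuming a 256-entry byte table like the LFT above. With 0 threads, the counting loop runs in the main thread. With 1 thread, the same loop runs on a thread started with pthread_create() and waited for with pthread_join(). struct job, process_all and run_sequential are hypothetical names. No mutex is needed here because the main thread only waits, so only one thread touches the table at a time. Since no work runs concurrently, any time difference between the two should come from thread creation and scheduling, not from parallelism. Compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical job description: which files to read and where to count. */
struct job {
    char **paths;
    long nfiles;
    unsigned long *lft;          /* 256 counters, indexed by byte value */
};

/* Sequentially add the byte counts of every file in the job to job->lft. */
void process_all(struct job *job)
{
    for (long i = 0; i < job->nfiles; i++) {
        FILE *fp = fopen(job->paths[i], "r");
        if (fp == NULL) {
            perror(job->paths[i]);
            continue;
        }
        int c;
        while ((c = getc(fp)) != EOF)
            job->lft[c]++;
        fclose(fp);
    }
}

/* Thread entry point: the same sequential loop, just on another thread. */
static void *worker(void *arg)
{
    process_all(arg);
    return NULL;
}

/* use_worker_thread == 0: work in the calling thread ("0 worker threads").
   use_worker_thread != 0: work on one new thread and wait for it ("1 worker thread").
   Returns 0 on success, -1 if the thread could not be created or joined. */
int run_sequential(struct job *job, int use_worker_thread)
{
    if (!use_worker_thread) {
        process_all(job);
        return 0;
    }
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, job) != 0)
        return -1;
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```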

Points:

2 pts for a correct sequential implementation
3 pts for a correct parallel implementation
2 pts for a parallel implementation that is faster than the sequential one
1 pt for a Makefile
2 pts for a short analysis (in an analysis.txt file) of the optimal number of threads on your device

Hints:

Constructing the LFT was covered in the midterm test.
Resources that are accessed from multiple threads should be guarded by a mutex.
Use threads, not processes (#include <pthread.h>).
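The sketch below shows one way to follow the mutex hint, assuming a fixed 256-entry byte table. The workers share a counter holding the index of the next unclaimed file and take files one at a time, so the load stays balanced even if files differ in size. Each worker counts into a private table without locking and merges it into the shared table once, at the end. Locking the mutex for every character would also be correct, but it would largely serialize the threads. struct shared, worker and count_parallel are hypothetical names. This is one design among several; statically splitting the file list into nthreads slices is another. Compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LFT_SIZE 256

/* State shared by all workers. `next` and `lft` are only touched with `lock` held. */
struct shared {
    pthread_mutex_t lock;
    char **paths;                  /* read-only while the workers run */
    long nfiles;
    long next;                     /* index of the next file nobody has claimed */
    unsigned long lft[LFT_SIZE];   /* combined table for all files */
};

static void *worker(void *arg)
{
    struct shared *sh = arg;
    unsigned long local[LFT_SIZE] = {0};   /* private table: no locking needed */

    for (;;) {
        /* Claim one file index; the mutex makes check-and-increment atomic. */
        pthread_mutex_lock(&sh->lock);
        long i = sh->next < sh->nfiles ? sh->next++ : -1;
        pthread_mutex_unlock(&sh->lock);
        if (i < 0)
            break;                          /* no files left */

        FILE *fp = fopen(sh->paths[i], "r");
        if (fp == NULL) {
            perror(sh->paths[i]);
            continue;
        }
        int c;
        while ((c = getc(fp)) != EOF)
            local[c]++;
        fclose(fp);
    }

    /* Merge the private counts into the shared table: one lock per thread. */
    pthread_mutex_lock(&sh->lock);
    for (int c = 0; c < LFT_SIZE; c++)
        sh->lft[c] += local[c];
    pthread_mutex_unlock(&sh->lock);
    return NULL;
}

/* Count paths[0..nfiles-1] with nthreads workers and copy the result into lft.
   Returns 0 on success, -1 if nthreads < 1 or no worker could be started. */
int count_parallel(char **paths, long nfiles, int nthreads, unsigned long lft[LFT_SIZE])
{
    if (nthreads < 1)
        return -1;
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL)
        return -1;

    struct shared sh = { .paths = paths, .nfiles = nfiles, .next = 0 };  /* lft zeroed */
    pthread_mutex_init(&sh.lock, NULL);

    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &sh) == 0)
        started++;
    /* Threads that did start still drain the whole file list. */
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    pthread_mutex_destroy(&sh.lock);
    free(tids);
    if (started == 0)
        return -1;
    memcpy(lft, sh.lft, sizeof sh.lft);
    return 0;
}
```

In this layout, count_parallel() would back the nthreads >= 1 branch of main(). The 0-thread case would stay a plain loop in the main thread.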

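Examples of what the program output could look like (in this sample output the first two entries, 3000 and 42665, appear to be the counts of the newline and space characters; 3000 matches 3 lines in each of the 1000 files):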

>> time ./main ./files 1000 0                                                   
thread num is 0 
Running sequential
PRINTING BUFFER!
 3000 
  42665 
A 3534 
B 3483 
C 3570 
D 3596 
E 3618 
F 3672 
G 3552 
H 3612 
I 3506 
J 3481 
K 3632 
L 3454 
M 3591 
N 3611 
O 3645 
P 3572 
Q 3577 
R 3520 
S 3635 
T 3644 
U 3559 
V 3416 
W 3544 
X 3580 
Y 3666 
Z 3580 
a 3507 
b 3575 
c 3588 
d 3515 
e 3593 
f 3451 
g 3520 
h 3597 
i 3526 
j 3498 
k 3482 
l 3622 
m 3527 
n 3611 
o 3488 
p 3621 
q 3624 
r 3561 
s 3561 
t 3530 
u 3646 
v 3432 
w 3572 
x 3593 
y 3563 
z 3682 
./main ./files 1000 0  0,00s user 0,01s system 95% cpu 0,008 total

>> time ./main ./files 1000 4                                               
thread num is 4 
Running parallel
Spawning thread 0 
Spawning thread 1 
Spawning thread 2 
Spawning thread 3 
PRINTING BUFFER!
 3000 
  42665 
A 3534 
B 3483 
C 3570 
D 3596 
E 3618 
F 3672 
G 3552 
H 3612 
I 3506 
J 3481 
K 3632 
L 3454 
M 3591 
N 3611 
O 3645 
P 3572 
Q 3577 
R 3520 
S 3635 
T 3644 
U 3559 
V 3416 
W 3544 
X 3580 
Y 3666 
Z 3580 
a 3507 
b 3575 
c 3588 
d 3515 
e 3593 
f 3451 
g 3520 
h 3597 
i 3526 
j 3498 
k 3482 
l 3622 
m 3527 
n 3611 
o 3488 
p 3621 
q 3624 
r 3561 
s 3561 
t 3530 
u 3646 
v 3432 
w 3572 
x 3593 
y 3563 
z 3682 
./main ./files 1000 4  0,01s user 0,00s system 260% cpu 0,004 total