Introductory Labs

General information for Python development.

To fulfill this assignment, you need to submit these files (all packed in one .zip file) into the upload system:

basics.ipynb - a notebook for data initialisation, calling the implemented functions and plotting their results (for your convenience, will not be checked).
basics.py - file with implemented methods:
- matrix_manip - a method implementing the matrix manipulation tasks specified in the section Matrix manipulation
- compute_letter_mean and compute_lr_histogram - methods specified in the section Simple data task
initial1_mean.png, initial2_mean.png and initials_histograms.png - images specified in the section Simple data task

Use the assignment template. When preparing the zip file for the upload system, do not include any directories; the files have to be in the root of the zip file.

Beware of using for loops! :)

PYTHON introduction

We will be using the Python programming language with the NumPy library during the whole semester. Make sure you are comfortable with both so that you don't spend more time dealing with Python/NumPy issues than solving the assignment tasks.

In case you are not sure about your Python/NumPy skills, have a look here: http://cs231n.github.io/python-numpy-tutorial/, ask your uncle (duckduckgo, google) or your teacher.

Start by reading General information for Python development and cloning the assignment template repository.

Matrix manipulation with NumPy

In the first part of today’s assignment, you will start with some simple matrix manipulation tasks.

TRY TO AVOID USING LOOPS IN YOUR PROGRAM!

Although numpy has a matrix class, we will not be using that. Instead, we will use the array class for representing matrices, vectors, images, lists, etc. We will import numpy using

import numpy as np

Your goal is to complete a function output = matrix_manip(A, B), where A and B are input matrices (represented by np.array). The matrix_manip function should return a python dict containing the results of the operations described below.

To have some data to work with, let's use the following matrices A and B:

A = np.array([[16,  2,  3, 13],
              [ 5, 11, 10,  8],
              [ 9,  7,  6, 12],
              [ 4, 14, 15,  1]])
 
B = np.array([[3, 4,  9, 4, 3, 6, 6, 2, 3, 4],
              [9, 2, 10, 1, 4, 3, 7, 1, 3, 5]])

Your function should work on general input matrices, not only for the A and B shown here or for matrices with the same dimensions.

Find the transpose of the matrix A and return it in output['A_transpose']. Example result:

>> output['A_transpose']
array([[16,  5,  9,  4],
       [ 2, 11,  7, 14],
       [ 3, 10,  6, 15],
       [13,  8, 12,  1]])

Select the third column of the matrix A and return it in output['A_3rd_col'].
```
>> output['A_3rd_col']
array([[ 3],
       [10],
       [ 6],
       [15]])
```
Hint: Remember that Python and NumPy use 0-based indexing. Make sure your output dimensions are correct: the expected result above is a 4×1 column, not a flat vector of length 4.
Select the last two rows of the last three columns of the matrix A and return the result in output['A_slice'].
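A minimal sketch of the dimension issue mentioned in the hint, on a small made-up matrix M (not part of the assignment data): indexing with an integer drops the axis, while slicing keeps it.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

# Integer index: the column axis is dropped, the result is 1-D.
flat = M[:, 1]
print(flat.shape)        # (2,)

# Slice of length one: the column axis is kept, the result is 2-D.
col = M[:, 1:2]
print(col.shape)         # (2, 1)

# A 1-D result can be turned back into a column explicitly.
col_again = flat[:, np.newaxis]
print(col_again.shape)   # (2, 1)
```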
```
>> output['A_slice']
array([[ 7,  6, 12],
       [14, 15,  1]])
```
Find all elements of A greater than 3 and increment them by 1. Afterwards, append a new column of ones to the matrix (on the right). Save the result to output['A_gr_inc'].
```
>> output['A_gr_inc']
array([[17,  2,  3, 14,  1],
       [ 6, 12, 11,  9,  1],
       [10,  8,  7, 13,  1],
       [ 5, 15, 16,  1,  1]])
```
Hint: Try the > operator on the whole matrix. The output dtype should be the same as the input dtype. Some numpy functions do not make copies of their inputs, but return 'views' of the input arrays instead. Make sure you don't corrupt the other results when computing output['A_gr_inc'].
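The view/copy pitfall can be illustrated roughly as follows. This is only a sketch on a made-up matrix M with a made-up threshold; the variable names are illustrative and do not come from the template.

```python
import numpy as np

M = np.array([[1, 5],
              [7, 2]])

# Basic slicing returns a view: writing to it would also change M.
view = M[:, :1]
print(np.shares_memory(view, M))   # True

# Work on an explicit copy so that M and its views stay untouched.
work = M.copy()
mask = work > 4          # boolean array with the same shape as work
work[mask] += 1          # in-place update of the selected entries only

# Append a column of ones with the same dtype as the input.
ones = np.ones((work.shape[0], 1), dtype=work.dtype)
result = np.hstack([work, ones])
print(result)
```

Creating the ones with an explicit dtype matters here: np.ones produces floats by default, and stacking them with an integer matrix would silently turn the whole result into floats.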
Create matrix C such that $C_{i,j} = \sum_{k=1}^n A\_gr\_inc_{i,k} \cdot (A\_gr\_inc^T)_{k,j}$ and store it in output['C'].
```
>> output['C']
array([[499, 286, 390, 178],
       [286, 383, 351, 396],
       [390, 351, 383, 296],
       [178, 396, 296, 508]])
```
Hint: No loops are needed, try it on paper with a 2×2 matrix.
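The same check can be done in code instead of on paper. The sketch below uses a made-up 2×2 matrix M and writes the sum from the formula as explicit loops only to compare it with a single matrix product; the loops are for verification, not something to submit.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

# Explicit double sum, written out only to check the formula.
n_rows, n_cols = M.shape
C_loop = np.zeros((n_rows, n_rows), dtype=M.dtype)
for i in range(n_rows):
    for j in range(n_rows):
        C_loop[i, j] = sum(M[i, k] * M.T[k, j] for k in range(n_cols))

# The same result as a single matrix product.
C = M @ M.T
print(np.array_equal(C, C_loop))   # True
```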
Compute $\sum_{c=1}^n c \cdot \sum_{r=1}^m A\_gr\_inc_{r,c}$, store in output['A_weighted_col_sum']:
```
>> output['A_weighted_col_sum']
391
```
Hint: Use broadcasting of the element-wise multiplication, np.arange, np.expand_dims and np.sum. Finally convert the output to Python float (as indicated in the docstring) by calling float( … ).
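A rough sketch of how the functions named in the hint fit together, on a made-up 2×3 matrix M. It is one possible arrangement under these assumptions, not necessarily the one the template expects.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

# Column weights 1, 2, ..., n shaped as a row vector (1 x n).
weights = np.expand_dims(np.arange(1, M.shape[1] + 1), axis=0)

# Broadcasting multiplies every row of M by the weights.
weighted = weights * M

# Sum over everything and convert the NumPy scalar to a Python float.
total = float(np.sum(weighted))
print(total)   # 46.0
```

Here the column sums are 5, 7 and 9, so the weighted sum is 1·5 + 2·7 + 3·9 = 46.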

Subtract a vector $(4,6)^T$ from all columns of matrix B. Save the result to matrix output['D'].

>> output['D']
array([[-1,  0,  5,  0, -1,  2,  2, -2, -1,  0],
       [ 3, -4,  4, -5, -2, -3,  1, -5, -3, -1]])
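Subtracting a vector from every column relies on broadcasting. A small sketch with a made-up matrix M and vector v shows the shape that makes it work:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [10, 20, 30]])

# A column vector of shape (2, 1) broadcasts across all columns of M.
v = np.array([[1],
              [10]])
shifted = M - v
print(shifted)
# [[ 0  1  2]
#  [ 0 10 20]]

# A flat vector of shape (2,) would be aligned with the last axis instead,
# so it needs an explicit column axis first.
v_flat = np.array([1, 10])
shifted_again = M - v_flat[:, np.newaxis]
```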

Select all column vectors of the matrix D whose Euclidean length is greater than the average length of the column vectors in D. Store the result in output['D_select'].
```
>> output['D_select']
array([[ 0,  5,  0, -2],
       [-4,  4, -5, -5]])
```
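Selecting columns by their length can be sketched with a boolean mask. The matrix M below is made up, and np.linalg.norm is just one way to get the lengths (a square root of summed squares would work as well).

```python
import numpy as np

M = np.array([[3, 0, 1],
              [4, 1, 0]])

# Euclidean length of every column (norm taken along the row axis).
lengths = np.linalg.norm(M, axis=0)      # lengths 5, 1, 1

# Boolean mask of columns longer than the average length.
mask = lengths > lengths.mean()

# Boolean indexing along the column axis keeps the selected columns.
selected = M[:, mask]
print(selected)
```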

Simple data task in Python

In this part of the assignment, you will work with simple input data containing images of letters. We will use similar data structures later on during the labs. Do the following:

The following variables are stored in the data_33rpz_basics.npz data file:
- images (3D array of 2000 10×10 grayscale images)
- alphabet (letters contained in the images, not full alphabet is included)
- labels (for each image, the index of its letter in the alphabet array).

Load and access them as follows:

loaded_data = np.load("data_33rpz_basics.npz")
loaded_data['images']
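If you want to experiment with np.load without the course file, the following sketch writes a tiny stand-in archive to a temporary directory and reads it back. The file name toy_basics.npz, the array shapes and the contents are assumptions made for illustration; only the key names mirror the list above.

```python
import os
import tempfile
import numpy as np

# Toy stand-in for the course data file; only the key names mirror the real one.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy_basics.npz")
np.savez(path,
         images=np.zeros((10, 10, 4), dtype=np.uint8),
         alphabet=np.array(['A', 'C']),
         labels=np.array([0, 1, 1, 0]))

with np.load(path) as loaded_data:
    print(loaded_data.files)          # names of the stored arrays
    images = loaded_data['images']
    alphabet = loaded_data['alphabet']
    labels = loaded_data['labels']

print(images.shape, alphabet, labels)
```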

Have a look at the images using the montage function supplied in the template:
```
import matplotlib.pyplot as plt
plt.imshow(montage(images), cmap='gray')
plt.show()
```
Hint: Try to use
```
%matplotlib notebook
```
after importing matplotlib.
For a given letter, compute its mean image. This means taking all images in the dataset that display that letter and computing their pixel-wise mean. Round the mean image to integers and return it in the uint8 dtype. Compute the mean images for the initials of your name and save them as initial1_mean.png and initial2_mean.png (use any other letter if one of your initials is not present in the dataset).
- For the purpose of mean image calculation, complete the function compute_letter_mean:
```
letter_mean = compute_letter_mean(letter_char, alphabet, images, labels)
```
  where letter_char is a character (e.g. 'A', 'B', 'C') representing the letter whose mean we want to compute, alphabet, images and labels are loaded from the provided data, and letter_mean is the resulting mean image.
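The selection and averaging steps might look roughly like this on toy data. The helper name mean_image_sketch is hypothetical, and the sketch assumes that the images are stacked along the last axis (height × width × count) and that alphabet is an array of single characters; check both against the real data before reusing the idea.

```python
import numpy as np

# Toy data: five 2x2 "images" stacked along the last axis (assumed layout).
toy_images = np.stack([np.full((2, 2), v, dtype=np.uint8)
                       for v in (10, 20, 11, 40, 11)], axis=-1)
toy_alphabet = np.array(['A', 'C'])
toy_labels = np.array([0, 1, 0, 1, 0])

def mean_image_sketch(letter_char, alphabet, images, labels):
    """Hypothetical helper: pixel-wise mean of all images showing letter_char."""
    letter_idx = np.flatnonzero(alphabet == letter_char)[0]
    selected = images[:, :, labels == letter_idx]     # H x W x count
    # mean() accumulates in float64 for integer input; round before casting back.
    return np.round(selected.mean(axis=2)).astype(np.uint8)

# Every pixel is 11 here: the mean of 10, 11 and 11 is about 10.67.
print(mean_image_sketch('A', toy_alphabet, toy_images, toy_labels))
```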
Compute an image feature x - a single number characterizing an image. It is defined as:
```
x = sum of pixel values in the left half of image - sum of pixel values in the right half of image
```
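A small worked instance of this feature on a single made-up 2×4 image. The uint8 dtype is an assumption about how grayscale images are typically stored; the point of the sketch is the conversion to a signed type before subtracting.

```python
import numpy as np

# One toy 2x4 image; uint8 as in typical grayscale data (assumption).
img = np.array([[10, 20, 200, 250],
                [30, 40, 100, 150]], dtype=np.uint8)

# Convert to a signed type first: the difference below is negative here,
# and unsigned arithmetic would wrap around instead.
img_signed = img.astype(np.int64)
half = img_signed.shape[1] // 2

left_sum = img_signed[:, :half].sum()     # 10 + 20 + 30 + 40 = 100
right_sum = img_signed[:, half:].sum()    # 200 + 250 + 100 + 150 = 700
x = left_sum - right_sum
print(x)   # -600
```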
Then make a histogram of feature values of all images of a letter. Complete a function for the feature histogram computation:
```
lr_histogram = compute_lr_histogram(letter_char, alphabet, images, labels, num_bins)
```
where letter_char is a character representing the letter whose feature histogram we want to compute, alphabet, images and labels are loaded from the provided data, num_bins is the number of histogram bins and lr_histogram is the resulting histogram (num_bins long vector containing counts of items in the corresponding bins).
- For reference the following histogram was computed for letter A with 10 bins:
```
>> compute_lr_histogram('A', alphabet, images, labels, 10)
array([ 1,  1,  3,  6, 12, 27, 24, 20,  5,  1])
```
- Hint: use np.histogram function to compute the histogram.
- Plot feature histograms of your initials into one figure to compare them and save the figure as initials_histograms.png.
  - WARNING: make sure you use correct x-axis on the plot. (is it 1-10, or something in orders of 1000s?)
  - Try to make the figure useful - label the axes, give it a title, … Would your grandma (or you two weeks from now) understand what is shown in the figure?
  - Do the histogram plots make sense? Could you recognize the letter only by looking at its lr histogram?
  - hint: use matplotlib bar function.
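Regarding the x-axis warning above: np.histogram returns the bin edges together with the counts, and the edges are what gives the plot its real x-axis. A sketch with made-up feature values (not taken from the dataset):

```python
import numpy as np

# Hypothetical feature values for the images of one letter.
x = np.array([-600, -250, -100, 0, 50, 120, 400, 900])

# counts has num_bins entries, bin_edges has num_bins + 1 entries.
counts, bin_edges = np.histogram(x, bins=3)
print(counts)       # counts per bin: 2, 4, 2
print(bin_edges)    # edges at -600, -100, 400, 900

# Bin centers and widths in feature units (not 1..num_bins).
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_widths = np.diff(bin_edges)
print(bin_centers)  # centers at -350, 150, 650
```

These arrays can then be passed to the matplotlib bar function, for example as plt.bar(bin_centers, counts, width=bin_widths), so that the bars sit at the actual feature values.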
