Homework 2 hints

Problem description and public data

See also More clarifications at the bottom of the page.

'''
Homework 2
==========
 
The approach to the solution described in this document
is meant to be the very basic and the most straightforward.
It may turn out that there are solutions which concept
is even more simple and easier to grasp.
I would be grateful for any description of such solution.
 
Also, and very probably, there might be different solution ideas
based on other programming concepts. The reader is free to choose
their own path to the solution.
 
Anyway, it is desirable to break down any particular code
into separate meaningful functions and procedures. For example
each of points 1.-4. below may be realized by one function.
 
Data structures
===============
The solution which is outlined below may be implemented
using only lists (1D arrays).
It is the original intention of the problem.
The reader is free to use other types of data structures
if it looks necessary or convenient to him/her.
However, it is always recommended to strive
for minimum number and size of the exploited data structures.
Very often it makes the whole program simpler and easier to debug,
not to speak about its readability.
 
Solution overview
=================
1. Precompute elevation differences.
2. Remove non-camp checkpoints.
3. For each camp X define end camp of a section which starts in X.
4. For each camp X try to start a hike in X, use the results of 3
   to move forward quickly.
 
Solution in more detail
=======================
 
1. Precomputing elevation differences should be easy, in each checkpoint register
   the elevation difference to the next checkpoint, in the same manner as there
   are registered distances and times to the next checkpoint.
 
2. Removing non-camp checkpoints is also simple.
   It is enough to scan the whole trail once, total all values
   (times, distances and elevation differences separately, of course)
   between each pair of two consecutive camp sites and store those values
   in one of the campsites. After that, each camp site will register
   the distance, the time and the elevation difference to the next camp
   site.
   Now, other non-camp checkpoints may be removed from the data entirely.
 
3. For each camp X define the end camp of a section which starts in X.
   The basic idea is straightforward enough:
   Start in camp X, move forward camp by camp and update the total distance
   (and time and elevation difference) from X to the current camp.
   Just before the total exceeds the given limit(s)
   register the current camp as the end of section which starts in X.
 
   Be careful, sometimes the current camp may be simultaneously too close to X
   in one parameter (eg. in distance) and too far away from X in another
   parameter (eg. in time). In such case, no section can start at X.
 
4. For each camp X try to start a hike in X.
   The approach here is very similar to the approach in 3.
   From camp X jump directly to the camp which is the end of the section which
   starts in X. That camp is the start of the next section, jump directly to
   the end of that section and so on. Stop jumping when the total distance
   or elevation difference from X to the end of current section would
   exceed the given hike limit(s).
   If the curent hike totals are better than those of some previous one, update
   the global totals which, at the end, will contain the distance, the
   elevation difference and the number of sections in the best hike.
 
Efficiency
==========
 
If your code feels a bit slow, you should check which part of the code is the most troublesome.
If you hesitate how to measure time you may inspire yourself here 
  https://stackoverflow.com/questions/7370801/measure-time-elapsed-in-python .
 
If the part 3 is the slowest part of the code 
you may consider applying the so-called sliding window technique.
 
The following example explains the idea.
-- Read the comments.
-- Copy the code to your python editor (or an online editor) and run it a few times. 
-- Modify the array and the limit 20 to see the results.
 
-- ( Additional reading  if the example does not satisfy you 
     https://www.geeksforgeeks.org/window-sliding-technique/)
'''    
# .....................................................................................
#
#          SLIDING WINDOW METHOD
#     illustrated on a simple sum problem
#   =======================================
#
#
# In the given array of positive values
# find the longest contiguous subarray
# which sum is less than 20 (or less than given maxSum).
# For example
# array:            [ 10 10 10 5 6 5 1 10 10]
# longest subarray:           [5 6 5 1]
#
#
# The solution method is the sliding window method.
# The window is just a contiguous subarray.
# At the beginning it contains only the first element.
#
# The idea is the following:
# When the sum of the window is too small
#      expand the right end of the window one position to the right
#      so that the sum inside the window is increased.
# When the sum of the window is too big
#      shrink the window by moving its left end (=begin)
#      one position to the right
#      so that the sum inside the window is decreased.
# By repeated application of these two conditions
# the window moves through the array until it finally arrives to the
# end of the array. In the process, it examines all feasible subarrays
# and it skips most subarrays which sums are too big or too small.
#
# The advantage of the method is its speed. In each step, either
# the begin or the end index of the subarray is moved to the right,
# so the total running time of the method is proportional
# to twice the length of the array.
# The running time of a brute force approach which would examine
# each possible subarray is proportional to the square or even
# to the cube of the length of the array.
 
 
def longestSubArr( arr, maxSum ):
 
    # echo input values
    print( arr )
    print("max allowed sum of subarray:", maxSum)
 
    # the window starts as very small, it contains only
    # the first element of the array
 
    begin, end = 0, 0, # begin and end of the sliding window
    currSum = arr[0]   # initial sum in window
 
    # best values were not found yet:
 
    bestBegin, bestEnd, bestSum = -1, -1, -1
 
    # run sliding window
 
    while end < len(arr):
 
        # echo current status
 
        print("sum from", begin, "to", end, "is", currSum )
 
        # register the best result when it is encountered
 
        if currSum < maxSum  and   end - begin > bestEnd - bestBegin:
            bestBegin, bestEnd, bestSum = begin, end, currSum
 
        # try to expand the window to the right
 
        if currSum < maxSum:
            end += 1
            if end < len(arr): currSum += arr[end]
 
        # else shrink the window from the left
 
        else:
            currSum -= arr[begin]
            begin += 1
 
    return bestBegin, bestEnd, bestSum
 
 
# ------------ MAIN PROGRAM ------------------------------
#
# do experiment with the data and run the program repeatedly
 
arr = [2, 3, 7, 2, 6, 3, 4, 8, 1, 2, 6, 1, 4, 2, 7, 6]
 
bestBegin, bestEnd, bestSum = longestSubArr( arr, 20 )
 
print("Longest subarray is from", bestBegin, "to", bestEnd, ", with sum", bestSum )
 
# Check your understanding of the code by explaining why
# the program does not display the sum of subarray from 13 to 15

More clarifications

Reactions to students' inquiries:

>>  In the hints for the second homework I found it hard to understand the 3rd step
>>  (For each camp X define end camp of a section which starts in X).

We do not know, in which camp the best hike will start.
So, at the beginning, we expect that it can start it any camp.
Therefore, we have to precompute for each camp the possibility that
a section of the hike starts in that particular camp.
So this is what the 3rd step does.

Therefore, thanks to completed step 3, before any hiking starts, 
we will be able to tell the hikers:

"If you start in a camp Z, your first section will end in camp Y.
Your next section will begin in camp Y and end in camp X. and so on, till the end of your hike.
So, when you tell us where you are going to start your hike, we will be able to describe
easily your whole itinerary, because we already have precomputed then end of each section,
wherever that section starts."

Now, how the step 3 does work:
Easily and mechanically. For each camp Z it starts at Z and simulates the
hikers' movement forward to the next camp, to the following next camp etc. until the
total of the the distance or total of the time or total of the elevation difference 
exceeds the prescribed limit.
Then the very last previous camp is the end of the section which starts in Z.

The code below the description is intended to help people understand the idea
of a fast solution of step 3. Maybe, running and experimenting with the code
can help you too?

>> So the program will get slower because on
>> every “slide of the window” you should check the inside sections.

Not exactly. On every slide of the window we do not check its interior.
This is the main trick of the method. Each time we shift either the left
or the right end of the window. We keep a sum of values which are inside the window.
When we shift the left end of the window we subtract from the sum
the values which are just at the left end of the window.
So it is as if these walues have just "dropped out of the window"
because the left end of the window was just shifted past them to the right.
 When we shift the right end of the window we add to the sum
the values which are just at the right end of the window.
So it is as if these walues have just "entered the window"
because the right end of the window was just shifted past them to the right.

>> ... I don't see the real use of the sliding window method for step 3. ...

The sliding window method is completely useless in situation
when we are looking for the end of the section of *only one* start camp.
In such situation it would be best to use the direct intuitive approach,
start in the given camp, move forward to the next camps one by one and stop
when some of the totals exceeds the given section limits.
Sliding window would only complicate the task.

However, when we have to find the section ends for more camps (in fact, for each camp)
there are two possibilities.
The first one is to apply the method from the previous paragraph
separately on each start camp. That would work and it is the method I describe in
step 3 of the online hint.
The second possibility is to apply the sliding window.
Imagine the first section is from camp 0 to camp 5.  It does not matter how we found it.
Now, when we look for the end of the section which starts in 1 (in 1, not in 5)
the sliding window kicks in. It says:
"I know that the section starting in 0 ends in 5. Therefore I also know
that the section starting in 1 cannot end in any of the camps earlier then 5, that is
it cannot end in 1, 2, 3, 4."
Why? Because the of the given limits. Camp 5 is the farthest camp from 0 within the given limits.
So if we start in 1, which comes *after* 0, then surely 5 will still be in the given limits.
So we do not have to investigate camps 1,2,3,4 as possible ends of the section which starts in 1.
We can jump directly to camp 6 and check if it is within limits. If it is, move to camp no 7, 8, etc
until we run out of limit. If camp 6 is not within limits from 1, then register
camp 5 as the end of the section starting in camp 1.
(In the same manner we will then find the end of the section which starts in camp 2, etc.)