Pīkau 18: Comparing algorithms

Introduction

Why this matters

Computers are fast, but if you get them to solve a problem the wrong way, they can still take way too long, and this also means they’re using more power (or battery life) than they need to. Even worse, you might think that if we give a computer the right data then it can find the answer to any question we ask about that data. But some problems take way too long for a computer to find the solution - potentially hundreds, or even millions, of years. Programmers need to know when they’re dealing with this kind of problem, and choose algorithms that will produce a result in a reasonable amount of time. If no suitable algorithm can be found, they need to make do with a ‘good enough’ solution. For example, this happens when finding the ideal route to visit a set of destinations.

Links to existing knowledge

You might already know some of this.

If students are picking up certificates, is it a good idea to sort the certificates into alphabetical order first, or is it pretty much as fast just to look for each one in a randomly ordered pile?

Is it quicker to go through all the laundry taking out all the socks, then go through taking out all the shirts and so on, or is it better to go through putting each item into a pile as it is picked up? Is there a way to work out how efficient these two algorithms are without actually trying them out?

What if a postie had all the letters out of order for a street, and went up and down the street delivering them in the random order they were piled up, going to house number 45, then number 3, then 56, then 14, and so on? This would achieve the correct result, but how would you do it more efficiently?

There are probably many chores you do that you have found the optimal method (algorithm) for in your life, avoiding “busy” work that doesn’t improve the final result. Here we will look at how that translates into the programming world.

Surprising algorithms

Watch this video to see Tim and Joanne use two different searching algorithms (sequential search and binary search) to solve the same problem, and then analyze the results with an interesting conclusion.

As we saw, different algorithms can be used on the same problem with very different results.

If you’d like to see another example contrasting these two algorithms, watch the video below.

Exploring sorting algorithm performance

Here Tim and Joanne look at a sorting problem and two algorithms that can be used to solve it. Once again, they show the results in a graph that highlights the differences in efficiency between the two algorithms. The approach in this video builds on the method of using balance scales to explore sorting algorithms that was used in Pīkau 10 - Getting programs right: the end-user and fast algorithms.

This video showed how two algorithms, solving the exact same problem (sorting items), can give very different results. Using a graph to analyze the results emphasises this difference.

Although we’ve focussed on sorting and searching, there are thousands of other purposes that people have designed algorithms for.

These include:

matching data for similarity (such as DNA tests)
cryptography (to encode and decode data)
compression (to reduce the space that data takes up)
scheduling (such as working out which tasks on a large job need to be done first)
playing games (such as chess)
recognising objects in images (such as faces).

Are we there yet?

When you were trying out the sorting comparison programs, if you type in a large number you might be wondering if it will ever finish. You could try typing in 1000000 (one million) for the number of keys when you run the program!

After a few minutes you might be wondering if you should wait for it to finish, or if it will take way too long. If you stop the program, perhaps you will have stopped it just before it finished! In fact, this program is likely to take the best part of 24 hours to sort a million keys. And if you had 10 million keys, it would take 100 times as long - about 3 months.

Being able to make these estimates can save a lot of confusion when dealing with slow programs, and can guide programmers on whether or not they should even write the program based on a particular algorithm.

It gets worse

In this video Tim explores some real life problems that we are yet to find fast enough algorithms (solutions) for, even though a lot of effort has gone into doing so and the payoff would be large (there’s even a million dollar prize involved!).

As you saw, there is a surprising number of problems computer scientists have not been able to find suitable algorithms for, and there are many problems that are thought to be “intractable” (take way too long to solve), even with the ever increasing capabilities of technology. This is where a programmer needs to recognise when their program will take an impossibly long amount of time, and consider making do with ‘good enough’ solutions - they might not be the optimal answer, but at least we get a timely result.

You can try out the Travelling Salesperson Problem (TSP) online solver that was demonstrated in the video at CS Field Guide - City Trip.

This problem needs to be solved by anyone doing many drop-offs, such as courier deliveries or vending machine reloading (or in the original form, a salesperson visiting a number of cities).

click on “Add city"
choose about 7 cities
and then click on “Start” to get it to try all possible routes around those cities.

For a small number of cities it should be quite quick, but notice how the time needed grows exponentially as you add more cities. For 25 cities, it will eventually find the best route, but you might not want to wait around for the result.

There are many problems that we only have found exponential time algorithms for - so far! Some of the key ones are referred to as “NP-complete” - the word “complete” means that a solution for any one of them could be adapted for any of the others, so they all stand or fall together.

There’s a list of hundreds of these problems here: List of NP-complete problems

What is "intractable"?

“Intractable” is a technical term for problems that can be solved by a computer program, but take so long that it’s not going to be worth waiting for. Usually the term is applied when the best known algorithms take an exponential amount of time, such as taking 2n steps to solve a problem with n items to process. (2n is the number 2 multiplied by itself; for example, 25 is 2x2x2x2x2, which is just 32 steps, and would happen in the blink of an eye; but if there are 60 items to process, 260 is over a million million million steps, which is likely to take a lifetime, even on a very fast computer.)

Currently all the NP-complete problems mentioned above are intractable, but maybe one of your students will end up discovering a tractable (fast) algorithm for them (in which case you should ask for a share of the million dollar prize). The existence of so many intractable problems makes us realise that computers don’t have the answer to everything, and also that there are still many mysteries in the world of digital technologies.

Here’s an important intractable problem: can you find two numbers that multiply together to give 21 (other than 1 and 21)? How about two numbers that multiply to give 32,231*? 52,937,153**? Finding prime factors of large numbers like this is currently intractable, and a tractable solution to this could be used to crack a lot of computer security systems, so it’s not all bad that there are some problems that we don’t have a good solution for!

*167 x 193 in case you're wondering

** 6883 x 7691

How efficiently can you pack a bin?

Here’s another example of an intractable problem. Have a go at packing bins yourself with this ever changing interactive - you should minimise the number of bins needed. Beware though, it can be addictive!

This interactive can be found here: CS Field Guide - Bin packing

For small versions of the problem like the one above, it’s not hard to solve it on a computer, but as the number of items to pack grows, the best known algorithms take an exponentially increasing amount of time to solve the problem. If you’re working for a courier company with thousands of packages and hundreds of containers, it’s not going to be feasible to find the perfect way to pack the “bins” (in the case the bins might be containers, or even trucks or planes), so we have to settle for solutions that are 'good enough'.

What do students need to know?

Progress Outcome 4 says that students “... evaluate the efficiency of algorithms, recognising that computers need to search and sort large amounts of data.” The kinds of experiments shown in the above videos showed two ways that students can evaluate the efficiency of algorithms (reasoning about searching with the cups, and running a program that tries out two different sorting algorithms).

Relevant progress outcomes continue into NCEA. Progress Outcome 6 states that in “... authentic contexts and taking account of end-users, students determine and compare the “cost” (computational complexity) of two iterative algorithms for the same problem size.” The term “computational complexity” usually just means the time taken by an algorithm, and the “cost” can be measured in seconds (or hours, or centuries!), or in terms of the number of steps needed, such as the number of comparisons made by a sorting algorithm. This was done in the videos above where the two algorithms compared were sequential search and binary search; it was also done by comparing selection sort with quicksort. If students want to explore other sorting algorithms, other slower sorting algorithms that could be used are insertion sort and bubblesort; and they could be contrasted with the most common fast algorithms, which are quicksort and mergesort. By the time we consider things like what the worst case and average cases are for these, there are a lot of different ways they can be compared. As an aside, bubblesort is regarded as a particularly poor algorithm that isn’t very useful, but because it has such a memorable name, it’s often the first one that students remember!

These issues are covered in even more detail in Progress Outcome 7, which says that students “... discuss the purpose of a selection of data structures and evaluate their use in terms of trade-offs between performance and storage requirements and their suitability for different algorithms.” This is where they are considering how the way data is stored (such as using a sorted list) can affect the kinds of algorithms that are possible.

For more information

There are a number of relevant CS Unplugged activities on searching and sorting.

The Computer Science Field Guide has a whole chapter on algorithms covering these ideas in more detail (aimed at NCEA level students).

If you’re interested in the limits of computing, the author David Harel has a book on this aimed at informed lay-people, called “Computers Ltd: What They Really Can't Do”.

Quiz - Know your algorithms

Activity Q1 - Know your algorithms quiz

Instructions: If you did a sequential search of 2000 items, what is the worst case (maximum) number of keys you’d need to check?

Feedback: 2000 is correct.

Activity Q2 - Knowing your algorithms quiz

Instructions: If you are searching a million items that are in sorted order, which of the following statements is true?

Feedback: The correct answer is On average, binary search will be many times faster than sequential search.

Activity Q3 - Know your algorithms quiz

Instructions: Which of the following statements is true?

Feedback: The true statement is "There are problems that we don’t know of any fast algorithms to solve".

Link to programme design

Technological practice

Anticipating which parts of a program will run slowly, and being aware of the best possible algorithms to use in common situations, is a fundamental skill that software developers need to have. As students work with larger amounts of data they will likely encounter situations where a bad decision about an algorithm makes the software impossibly slow to use. This can be a good learning experience as long as they are encouraged to explore what is causing the issue, and how better algorithms could be used to address it. To connect this practice to the NZC in your programme, refer to Outcome Development and Evaluation.

Technological knowledge

Many of the most popular apps and digital systems rely on searching. Although search engines are an obvious application, the “Edit” menu of a lot of software will have a “Find” command that relies on a searching algorithm, and every time a customer logs in or swipes a card on a digital system, a search is made to look up their details to allow access, and then to personalise their experience. Beyond everyday applications, whole companies have been founded on particular algorithms, such as finding optimal routes on a map, finding potential friends on a social network, recognising faces in scanned images, or detecting patterns in shopper activity to make buying recommendations. Students won’t have encountered all these algorithms, but they need to know that there are thousands of algorithms that have been invented, and be able to take advantage of these to “build on the shoulders of giants” rather than “reinvent the wheel”. To incorporate this knowledge refer to Technological Systems.

Nature of technology

It may come as a surprise to students that there are many things we don’t know about algorithms. There are hundreds of problems for which all known algorithms are intractable, yet no-one has proved that we won’t find good solutions to these problems. This is a conundrum that has puzzled researchers for decades, and it is one of several reasons that the exponential increase in computing speed doesn’t mean that one day we might expect to be able to make these algorithms fast enough to be useful. Exploring these limitations of what computers can do helps students to think about the future of computing in an informed way. Perhaps quantum computing will break through these problems, or perhaps someone will find a fast algorithm for conventional computers. There’s so much that we don’t know about what computers can do. Refer to Characteristics of Technological Outcomes for more information.

Algorithms have a direct impact on the environment. An important reason that fast algorithms are desirable is that they use less electrical energy because the computation finishes faster. For large companies that operate data warehouses with millions of computers, a small speed-up in an algorithm could save huge amounts of energy, which in turn reduces their impact on the environment. It would be an interesting exercise to estimate the impact on the environment (energy consumption) for one search on a search engine or to make a posting on a social network. One positive impact of algorithms is helping vehicles to follow shorter routes, saving fuel and reducing pollution. This then raises the interesting question of whether the energy used to calculate a shorter route is less than the energy saved by the vehicle that follows that shorter route. Investigating the ways that algorithms can both help and harm the environment can follow many avenues!

Estimating the speed of algorithms can provide rich material for statistics investigations. When talking about the “average” number of comparisons that sequential search makes, a more accurate estimate can be made by assuming that the key is equally likely to be at any place in the list, and working out the average over all these possibilities. A related option is to do a statistical experiment, measure how long it takes in a large number of cases, and plotting or averaging the times taken. The average case for an algorithm is a pragmatic measure of how useful it will be and provides great material for statistical analysis.

Wrapping up and where to next?

As we have seen there are many problems that computers can solve, and often there are a number of different algorithms to choose from to solve them. Yet for some problems we are yet to find a useful algorithm to solve them, and we need to know when to make the call to stop an unfinished run of an algorithm when it reaches a ‘good enough’ solution. Perhaps we could design algorithms that design new algorithms for us? Actually, this has been done, but only with very limited application.

There’s still so much to learn about how to get the most out of these devices that we call computers. But in the meantime, there are many fascinating algorithms to explore that help us in many different aspects of everyday life.

Facilitation notes

If you are working through this pīkau as a group feel free to download and use these facilitation notes:

Facilitation notes - Comparing algorithms