Saturday, December 25, 2021

Are MIT Computer Scientists any good?

I've been watching a lot of lecture videos on ocw.mit.edu while working out lately and decided to watch the course called introduction to computer science. At one point the lecturer is writing a function to search a dictionary and return the value for a key, or zero if the key does not exist. He wrote code to fetch the value from the dictionary and, if the key did not exist, trap the exception and return zero. My gut feeling is that this is an appalling solution and I would make one of my programmers rewrite it if they tried to use it.

Apart from the fact that best practice says exceptions should only be raised for unforeseen problems, the cost of creating the exception is much larger than the cost of checking for the key. But one of my colleagues argued that it might be more efficient if the key was very likely to be in the dictionary. How likely, I wondered?

So I wrote a test in C# (MIT uses Python 2.x!) to see how much slower it is to raise an exception rather than test for the existence of a key. Start a C# console project in Visual Studio and call it DictTestVsException. Here's the code. 

DictTestTwice checks for the existence of a key 2000 times (it fails half the time) and fetches the value when the key exists. This means it executes 3000 key searches. Note that ContainsKey is faster than Key.Contains.

DictTestOnce executes the FirstOrDefault function 2000 times to either return the value or zero. If a key search takes the same time as FirstOrDefault, this should be the fastest algorithm. It's highly contrived and only works because the first entry in the list contains our default value.

DictException assumes the existence of a key and traps the exception when it does not exist. It does 2000 key lookups and raises 1000 exceptions. 

Each of these functions returns the number of ticks it took. The main program prints the results on the console.

using System;
using System.Collections.Generic;
using System.Linq;
 
namespace DictTestVsException
{
    class Program
    {
        static long DictTestTwice()
        {
            Dictionary<intintd = new Dictionary<intint>();
            for (int i = 0; i < 1000; i++) d.Add(i, i * 2);
            DateTime sw = DateTime.Now;
            int v;
 
            for (int i = 0; i < 2000; i++)
                if (d.ContainsKey(i))
                    v = d[i];
                else
                    v = 0;
 
            return (DateTime.Now - sw).Ticks;
        }
 
        static long DictTestOnce()
        {
            Dictionary<intintd = new Dictionary<intint>();
            for (int i = 0; i < 1000; i++) d.Add(i, i * 2);
            DateTime sw = DateTime.Now;
            int v;
 
            for (int i = 0; i < 2000; i++)
                v = d.FirstOrDefault(k => k.Key == i).Value;
 
            return (DateTime.Now - sw).Ticks;
        }
 
        static long DictException()
        {
            Dictionary<intintd = new Dictionary<intint>();
            for (int i = 0; i < 1000; i++) d.Add(i, i * 2);
            DateTime sw = DateTime.Now;
            int v;
 
            for (int i = 0; i < 2000; i++)
            {
                try
                {
                    v = d[i];
                }
                catch (Exception ex)
                {
                    v = 0;
                }
            }
 
            return (DateTime.Now - sw).Ticks;
        }
 
        static void Main(string[] args)
        {
            long dt2 = DictTestTwice();
            long dt1 = DictTestOnce();
            long de = DictException();
 
            Console.WriteLine("DictTestTwice = " + dt2.ToString() + " ticks");
            Console.WriteLine("DictTestOnce = " + dt1.ToString() + " ticks");
            Console.WriteLine("DictException = " + de.ToString() + " ticks");
            Console.WriteLine("Exception is " + (de / dt2).ToString() + " times slower than test twice");
            Console.ReadKey();
        }
    }
}

Your results may vary, but on my computer the cost of creating the exception is between 600 and 800 times greater than the cost of testing the dictionary. This is a pretty strong argument for testing the dictionary rather than trapping the exception. 



Interestingly, DictTestOnce is slower than DictTestTwice. This implies that a key lookup is using a binary search and FirstOrDefault is using a scan. The search, in this scenario, is about five times faster. This makes sense when you realize that FirstOrDefault is a method of IEnumerable which is not guaranteed to be sorted.



No comments:

Post a Comment