Home > Uncategorized > Out of sight, out of mind

Out of sight, out of mind

To get my head back into a coding mindset after a weekend playing games and lounging at my grandparents’ house, I decided to crack a Project Euler problem I’ve skipped over a few times.

The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, …

Let us list the factors of the first seven triangle numbers:

1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred divisors?

Okay, so finding the actual right number is pretty easy: jump around the number line until you find the right one. What we need before we can do that is some way to find all the divisors for an arbitrary number. And before we can do that, we need a way to generate arbitrary lists of prime numbers.

Sieve of Eratosthenes, go!

Looking around my hard drive, I found an old version in Python (which, I reckon, is an actual Sieve and not an ersatz imitation like in the past):

def sieve(max):
    primes = range(2, max + 1)  #Dimension an array of all integers < max
    length = max - 1
    index = 0
    #LOOP 1
    while index < length:   #Iterate through the whole array
        currentVal = primes[index]
        counter = index + currentVal
        while counter < length:     #Jump through the list and set every
            primes[counter] = 0     #of currentVal to 0
            counter += currentVal
        index += 1
        #Find the next number that hasn't been zeroed
        while ((index < length) and (primes[index] == 0)):
            index += 1

   index = 0
   #LOOP 2
   while index < len(primes):
       if primes[index] == 0:
           del(primes[index])
       else:
           index += 1

    return primes

Okay, go through all the numbers between 2 and the max for the Sieve and zero out all the composites, then delete the zeroes. Simple, right? That’s what I thought. Just to give it a run around the block, I tried the normal sequence: sieve to 100, then 1,000, then 10,000, then 100,000, and it wasn’t until 1,000,000 that I noticed a problem: it was taking way too long. I knew it was possible to write a Sieve to quickly get up to at least 2 million, so choking at 1 million seemed silly.

Adding a few print statements in to Loop 1, I found that it was zipping through there, and it was Loop 2 that was taking the vast majority of execution time. How can that be? It’s such a simple loop: almost no control logic, just testing for whether an item is zero or not.

Then I realized that my logic wasn’t very complicated, but the del() operation, to delete an item from an array, is a very expensive operation, given that deleting items from arrays requires quite a lot of work under the covers.

Also, using the pi function, we know that there are approximately 78,498 primes less than 1 million, meaning that when Loop 2 begins, there are about 1,000,000 – 78,498 = 921,502 zeroes in the array. That’s nine hundred and twenty-one thousand del() operations. No wonder it was taking so long.

So, instead of deleting the bad numbers (the zeroes), let’s just extract all the good numbers (the non-zero, prime numbers), by replacing Loop 2 with the following:

 toReturn = []
    for item in primes:
        if item != 0:
            toReturn.append(item)

And then return toReturn instead of primes.

The moral of the story here is that it is always important to know what’s going on under the covers of any programming language, even one like Python that makes it too easy to ignore simple things like the difference in execution time between two seemingly equivalent solutions.

Of course, this is not a new problem by any means:

Java is an awesome language because you get to ignore hard stuff like memory allocation. Write once, run anywhere. Sweet, where do I sign up?

The privilege of not having to manage memory comes at a cost: you aren’t allowed to question how the JVM works. Move along, coder. Keep making those objects. Don’t ask how much memory things take up. In fact, to keep you from getting curious, we’re not even going to have a sizeof function.

And this, of course, is why the computer science curriculum at NC State does (and should) start with Java and then move to C, and then eventually in to the dreaded assembly. Because at some point in a student’s professional career, he will probably need to figure out where all of his bytes are going, and it would be nice to have been taught how to figure it out.

Advertisements
Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: