andymatuschak.org: Square Signals

Slowly winning an epic war against a fiendish army of sine waves.

Los Lonely Boys ringtonesBucky Covington ringtones

Writing Code and Winning the NetFlix Prize

I whine a lot about my time at Caltech, but I’m actually taking a really neat course now which might as well be called “Win the NetFlix Prize—the Class.” If you’re unfamiliar with the prize, it’s $1,000,000 for improving upon NetFlix’s movie recommendation algorithm by 10%.

Not that we’re going to win or anything, but the performance and optimization implications of trying to solve this problem have been absolutely fascinating. A few details on the datasets that need to be analyzed:

  • 17,700 movies
  • 480,000 users
  • 100,000,000 ratings
  • 2 GB of data (700 MB stored optimally)
  • Density of the movie-user ratings matrix: 1.18%!
  • If computing the similarity between two movies takes 0.01 seconds, it takes 18 days to finish computing all pairs!

So that’s a little intimidating. There are a couple golden rules on performance that I’ve always been careful to follow:

  1. Code first; optimize later.

    Don’t write your code thinking about every little performance hit. Make it work. If you focus on making it fast in the first iteration, you’ll never finish the first iteration, and your code will be really ugly.

  2. Don’t optimize without knowing what’s slow.

    A lot of the time I think I know what’s slow, but half the time, I’m wrong. Shark knows what’s slow. Instruments knows what’s slow. Make sure you ask before diving in.

Golden Rules Aren’t Always Right

The really great thing about working on the NetFlix Prize is that basically all my software engineering assumptions and practices have had to go totally out the window.

The first observation, given these numbers, is that Ruby is not going to work even the tiniest bit. Which is really sad since it’d make life a lot easier. I’m using Python instead, with Numpy for computation. Most of the heavy lifting, then, is in C.

The fascinating thing about working on these algorithms is that I’m writing code that would normally work just fine: most of it’s O(n) or O(log n). But now I’m finding for the first time that the constant coefficient is actually mattering because it’s being executed a hundred million times. Every line of code has to be carefully scrutinized for speed considerations, and every loop is the potential for absolute disaster.

Last night I found myself writing a very simple regularization algorithm. The naïve implementation in Ruby was about eight lines. But the implementation I have now is still too slow by an order of magnitude and took more than four hours to write.

Moral of the story? If you’re a consumer software developer, you should probably be following the standard software engineering “golden rules” for optimization. But maybe all of us should work on a project at least once that gives us some appreciation for what Google has to go through to deliver us search results in a couple hundred milliseconds.

Is that a laptop in your pocket, or…?

There’s been a lot of whining about the MacBook Air. Some of it may be justified—I don’t know, I still pre-ordered one—but there’s one point of contention in particular that confuses me.

It’s the same size as the MacBook! Only thinner! And lighter! This isn’t a real subnotebook!

Okay. Imagine if it were 9 inches across instead of 12. What would this do for you?

Would it be easier to transport? It’s still not going to fit in your pocket, unless you’re a scary kind of guy who wears ginormous trenchcoats. You’re still going to need to put it in a bag to move it around. It’s basically the size of a notebook or normal paper, so it’s hardly going to be the limiting factor in bag size.

Would it be easier to use? Yes, if a smaller keyboard makes it easier to use. I don’t know about you guys, but I have full-sized fingers, and they like a full-sized keyboard. And the screen resolution on 12″ PowerBooks makes me cry.

Would it be more powerful somehow? Come on. The motherboard is already the size of two playing cards.

Honestly, I don’t understand the little-tiny-laptop thing. Devices are either pocket-sized or bag-sized. Anything in between is really just bag-sized pretending to be… I’m not even sure what. Levitation-sized? That would be pretty awesome, but unlikely until the MacBook Lighter-than-Air.

  • Bob Warwick has an excellent follow-up to my post last night.

    He also makes a very important point:

    Unless you’re a member of the small demographic that thinks “There’s gotta be something better than Safari for news feeds”, you’ll never discover NetNewsWire.

    You should care about this not just because it means a better experience for users. You should care because it means more customers for you.

    (0)

Anything Ubuntu can do, we can do better. Except this stuff.

When I argue with my Linux-loving friends about OS X vs. Ubuntu, they’ve only really got a couple points for which I have no retorts. But they’re big ones. Let’s think about how to fix them.

Why can’t I watch this video?

On Ubuntu, if you open an .avi for which you don’t have the codec, it will fetch it and install it.

On Mac OS X, if you open an .avi for which you don’t have the codec, QuickTime will give you an entirely useless error message. It won’t even tell you what codec you’re missing, so you can’t Google it.

continued…

If you're looking for something specific then give the search form below a try:

RSS Wordpress Grady (theme) Return to the Top ↑