Thursday, May 10, 2007

Priors

So for the past two quarters I have been getting a crash course in various machine learning algorithms. I find them unsettling for a couple of reasons... though perhaps I don't understand them well enough yet to appreciate their full glory. But here are my initial reactions.

1) I don't like the word "learning". It makes it sound like it's doing something that it's not. My understanding is that most machine-learning algs. just use some iterative method to find the function f that minimizes the error between f(input) and output over a given set of (input, output) pairs. This is not learning!! Once you have "trained" the algorithm, a.k.a. created a mathematical function, it is done. From then on, for a given input your output will always be f(input).

If I took 10 data points from some experiment and fitted a curve to them, I'd call that 'making a regression line' or something like that. I wouldn't call it 'learning my data'. But, that's just me!
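To make the curve-fitting comparison concrete, here is a minimal sketch of fitting a line to 10 points by ordinary least squares. The data values and the names fit_line and f are made up for illustration; nothing here comes from a real experiment.

```python
# Hypothetical data: 10 (input, output) pairs, roughly y = 2x + 1 with a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.1, 2.9, 5.1, 6.9, 9.1, 10.9, 13.1, 14.9, 17.1, 18.9]


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error over the pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": compute the two numbers that define the function.
slope, intercept = fit_line(xs, ys)


def f(x):
    """The 'trained' model: a fixed function of its input."""
    return slope * x + intercept


# The same input gives the same output every time; nothing changes after fitting.
print(f(3.0), f(3.0))
```

Once slope and intercept are computed, f never changes, which is the sense in which the "trained" algorithm is just a fixed mathematical function.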

2) I don't like priors that much. (If you're trying to find P(A|B), the prior is P(A).)

So, I can see why they are useful, but they don't "feeeeel" good to me. It's like if somebody told you that on the SAT, 80% of the time the correct answer is (d). It is definitely useful information because if you don't know an answer, you'd be stupid not to pick (d). HowEVER, an ideal algorithm (in this case the 'algorithm' being the student's brain) should pick the answer based solely on information from the question, not some knowledge about the distribution of the answers. Clearly when you know nothing about the question, it's best to pick (d). But what if you're 60% sure that it's (a), with the remaining 40% spread evenly over (b)-(e)? From a probabilistic perspective, (d) is still the correct choice. But, after reading the question you only gave it a 10% chance of being right, versus 60% for (a). Doesn't that feel bad??
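The SAT example can be worked through with Bayes' rule. This is only a sketch under two assumptions that the post does not state: the 20% of prior mass not on (d) is split evenly over the other four answers, and the student's 60%/10% reading of the question is treated as evidence that is independent of the prior (so it plays the role of the likelihood). The names prior, evidence, and posterior are made up for illustration.

```python
# Assumption: 80% prior on (d), remaining 20% split evenly over the other four.
prior = {"a": 0.05, "b": 0.05, "c": 0.05, "d": 0.80, "e": 0.05}

# Assumption: the student's read of the question alone, ignoring the prior.
evidence = {"a": 0.60, "b": 0.10, "c": 0.10, "d": 0.10, "e": 0.10}


def posterior(prior, evidence):
    """Multiply prior by evidence for each choice, then normalize to sum to 1."""
    unnormalized = {c: prior[c] * evidence[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


post = posterior(prior, evidence)
best = max(post, key=post.get)

for choice in sorted(post):
    print(f"({choice}) {post[choice]:.2f}")
print("pick:", best)
```

Under these assumptions (d) comes out at about 0.64 and (a) at about 0.24, so the prior still outweighs the 60% hunch. With a weaker prior or a stronger hunch the ranking would flip.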
