Sometimes, figuring out something deterministically is hard. Sometimes, it's just plain impossible.

Let's say we have a model and we want to make sure that model is correct. The best way to do that is to mathematically prove that it is correct. This works well for simple model, but when you have insanely completed models it become a huge pain to try to prove it right. What can you do to validate it? We can use something called a Monte Carlo simulation. This is also known as statistical sampling.

Here's how wikipedia describe the Monte Carlo method:

- Define a domain of possible inputs.
- Generate inputs randomly from the domain, and perform a deterministic computation on them.
- Aggregate the results of the individual computations into the final result.

I'll use the simplest of example to show how this can be done. I want to write a test to know if a number is even or odd (told you it was going to be simple). I'll write some code to test it:

def is_even(number):

if number % 2 == 0:

return True

else:

return False

This is Python code by the way, but it should be simple enough even for non programmer. It's a function called is_even that takes an integer and returns true if the number modulo 2 is 0 and false otherwise.

Let's run a monte carlo simulator on that function to make sure it's right. We'll generate n random integer between minus one billion and a billion and we expect that around 50% of them will be even. We'll set n at a reasonable 100 000 runs. The higher the number of runs, the more accurate the result will be, or rather the higher the confidence in the results will be. It should look like this:

import random

n = 100000

number_even = 0

for i in range(n):

random_number = random.randint(-1000000000, 1000000000)

if is_even(random_number):

number_even = number_even + 1

print '%f%% of numbers are even' % (float(number_even) / float(n))

When I run this I get this message back:

0.501340% of numbers are even

And without any kind of mathematical proof we have validated the model. Though it's important to realize I haven't actually proved anything, I've simply shown that with a high probability (I'm too lazy here but we could actually find the confidence interval relatively easily) the model acts a certain way.

This example was, of course, way too trivial. But this kind of reasoning is very useful when a system has too many different components which makes it difficult to accurately predict, such as a traffic flow system (think about elevator planning for example).

## 2 comments:

To be statistically stuck up about it, you'd probably have to do a binomial test (or variant thereof) to prove that the result indicates real equal probability of outcome.

I realize afterward that this example wasn't very good to be honest, as I could've returned the wrong result all the time and the result would've been the same.

It's just a simple example though, the first one I could come up with that can be entirely understand easily in one post.

Post a Comment