
Probability and statistics riddles

What I did

- I've continued to work my way through riddles and probability puzzles from a quant interview prep book and a data science interview prep book.

What I learned

  • Recall the Poisson distribution, which gives the probability of a given number of events occurring in a fixed interval, based on the expected rate of occurrence. For example:

    If, on average, 2 customers come every 3 minutes, what’s the probability that 5 or fewer customers come in the next 9 minutes?

    \[P(X \leq 5 \mid \lambda = 6) = \sum_{x=0}^{5} \frac{\lambda^x e^{-\lambda}}{x!}\]
    • Note, \(\lambda = 6\) because if you get 2 customers in 3 minutes, you’d expect to get 6 customers in 9 minutes.
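
    A quick numerical check of that sum, as a minimal Python sketch (standard library only; the numbers come from the problem above):

```python
import math

# lambda = 6: 2 customers per 3 minutes implies 6 expected customers in 9 minutes
lam = 6

# P(X <= 5 | lambda = 6): sum the Poisson pmf for x = 0..5
p = sum(lam**x * math.exp(-lam) / math.factorial(x) for x in range(6))
print(p)  # ~0.446
```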

    If there’s an 80% probability of seeing a comet in one hour, what’s the probability of seeing one in 15 minutes?

    • If there’s an 80% probability of seeing one, then there’s a 20% chance of not seeing one. Using the Poisson model, the probability of not seeing a comet in one hour is:
    \[e^{-\lambda} = .20\]
    • Thus the probability of not seeing one in 15 minutes (1/4 of the hour) is:
    \[e^{-\lambda/4} = .20^{\frac{1}{4}} \approx .67\]
    • This means that there is a 1 - .67 ≈ 33% chance of seeing a comet in a 15-minute period.

    Try going the other way: if there’s a 20% chance of seeing a comet in 15 minutes, what’s the probability of seeing one in an hour?

    \[\begin{aligned} e^{-\lambda} &= .80 \\ e^{-4\lambda} &= .80^4 \\ \rightarrow \text{Probability of seeing comet in one hour} &= 1 - .80^4 = 59\% \end{aligned}\]
    • Note, \(\lambda\) is now the 15-minute rate, and an hour is four 15-minute intervals.
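
    Both directions of the comet problem can be checked with a minimal Python sketch (standard library only; variable names are mine):

```python
import math

# Forward: P(comet in 1 hour) = .80, so P(none in an hour) = e^{-lambda} = .20
lam_hour = -math.log(0.20)               # implied hourly rate
p_none_15min = math.exp(-lam_hour / 4)   # equivalently .20 ** 0.25
print(1 - p_none_15min)                  # ~0.33: chance of a comet in 15 minutes

# Backward: P(comet in 15 min) = .20, so P(none in 15 min) = .80
p_none_hour = 0.80**4                    # four 15-minute intervals in an hour
print(1 - p_none_hour)                   # ~0.59: chance of a comet in an hour
```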
  • Here’s another probability one:

    Your hash function assigns each object to a number between 1 and 10, each with equal probability. With 10 objects, what is the probability of a hash collision?

    • It's easier to calculate the complement and subtract it from 1. The probability of *no* collision is the number of distinct arrangements of the 10 numbers (i.e. 10!) divided by the total number of possibilities with replacement. You can think of it this way: for all 10 hashes to be distinct, spot one has 10 options, spot two has 9, spot three has 8, and so on, while with replacement every spot has all 10 options. Thus:
    \[P(\text{collision}) = 1 - \frac{10!}{10^{10}}\]
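
    A one-line check of that probability (standard library only):

```python
import math

# P(at least one collision) = 1 - P(all 10 hashes are distinct)
print(1 - math.factorial(10) / 10**10)  # ~0.9996
```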

    What is the expected number of hash collisions?

    • The expected value is the sum of each outcome times its probability. In this case, again, it's easier to reframe the problem: every collision leaves some bucket empty, so the expected number of collisions equals the expected number of empty buckets. Consider "bucket 1." The probability that B1 is empty is the probability that each of the 10 objects is *not* hashed to it, i.e. (1 - 1/10) raised to the tenth power, because there are ten chances for an object to miss B1, each with probability (1 - 1/10).
    \[P(B1 \text{ empty}) = \left(1 - \frac{1}{10}\right)^{10} \approx .35\]
    But this is only the probability that B1 is empty. There are 9 other buckets, each with the same probability of being empty. So, what's the expected number of empty buckets, and hence of collisions, overall? It's just ten times that, by linearity of expectation, which holds even though the buckets are not independent.
    \[\mathbb{E}[\text{\# of collisions}] = 10\left(1 - \frac{1}{10}\right)^{10} \approx 3.49\]

    What is the expected number of hash values that go unused?

    • Well, we know that every collision leaves a hash value unused: with 10 objects and 10 possible values, the number of unused values equals the number of collisions. Thus, if the expected number of collisions is 3.49, the expected number of unused hash values is also 3.49. The simulation sketch below checks both numbers.
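
    A small Monte Carlo sketch (pure Python; the simulation setup is my own, not from the book) that verifies the expected number of collisions and of unused hash values together:

```python
import random
from collections import Counter

def trial(n=10):
    """Hash n objects into n equally likely values; count collisions and unused values."""
    hashes = [random.randrange(n) for _ in range(n)]
    counts = Counter(hashes)
    collisions = sum(c - 1 for c in counts.values())  # extra objects per value
    unused = n - len(counts)                          # values never hit
    return collisions, unused

trials = 100_000
results = [trial() for _ in range(trials)]
print(sum(c for c, _ in results) / trials)  # ~3.49 expected collisions
print(sum(u for _, u in results) / trials)  # ~3.49 expected unused values
```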
  • Decision trees and other nonlinear models tend to have high variance; pruning is one technique used to reduce it. Decision trees are also a non-parametric model: they don't fix the number of parameters in advance, so the tree's size depends on the amount of data. Typically, a large dataset will grow a larger tree than a sparse one, as the sketch below illustrates.
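
  Both claims are easy to see empirically. A minimal sketch, assuming scikit-learn (my choice of library, not from the books): cost-complexity pruning via the ccp_alpha parameter shrinks the tree, and an unpruned tree grows with the size of the dataset:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

for n in (100, 10_000):
    X, y = make_classification(n_samples=n, n_features=10, random_state=0)
    unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
    pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)
    # Unpruned node count grows with n; pruning keeps the tree small
    print(n, unpruned.tree_.node_count, pruned.tree_.node_count)
```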

What I will do next

  • Upon some reflection, this section has become increasingly vague. I've also started vacillating slightly on which jobs to apply for and whether I want to focus my expertise on natural language processing or switch my focus to time series analysis and predicting stochastic processes.
  • I’ve pushed off the Bayes course too much. I only have one week left to finish. I will finish the course next week.