Nowadays, the word "probability" appears everywhere in our lives, and it is used more and more often. This is because ours is an increasingly uncertain world: everything changes and little can be pinned down. Our world can be said to be made up of variables, including many deterministic ones. For example, the news says: "At 20:43 Beijing time on November 3, 2016, Long March 5 was successfully launched in Wenchang, Hainan," where the time and place are deterministic variables.
However, there are also many quantities in our lives that are difficult to determine, such as tomorrow's level of haze or the value of a company's stock. These are uncertain, random variables.
Random variables are not expressed as fixed values but are described by the probability with which each value occurs. Because random variables are everywhere, you hear the word "probability" everywhere. You turn on the TV to check the weather forecast, and the weatherman tells you that the "probability of precipitation" at 8 o'clock this morning is 90%. You want to spend 50 dollars on lottery tickets, but your friend tells you that only a fool would throw that money away, because the probability of winning the jackpot is only one in 100 million...
The word "probability" is so common in life that people probably know what it means without thinking about it, for example, in the last example, the 0.03% probability of malignancy means that "only 3 out of 10,000 such sarcomas will be malignant "? Thus, in the classical sense, probability can then be crudely defined as the frequency of an event, i.e., the ratio of the number of occurrences to the total number of occurrences. More precisely, it is the limit to which this ratio converges as the total number of occurrences tends to infinity.
Although the definition of "probability" is not difficult to understand and everyone seems to use it, you may not know that the results of probability calculations often defy our intuition, and probability theory contains many plausible-sounding yet baffling paradoxes. You cannot trust your intuition completely! Just as a driver has "blind spots" in his vision that need a few mirrors to overcome, our thinking process also has blind spots that need to be clarified by calculation and reflection. Probability theory is a field where strange conclusions that contradict intuition occur frequently, and even mathematicians can go wrong if they are not careful. We will now begin with an example of a paradox in classical probability called the "base rate fallacy."
Let's start with an example from everyday life.
Wang Hong went to the hospital for a laboratory test to check whether he was suffering from a certain disease. The result came back positive, and he was so shocked that he rushed to look it up online. The information on the Internet said that the test is never perfectly accurate, and that it has "a 1% false positive rate and a 1% false negative rate."
This statement means that among people who have the disease, 1% of test results are false negatives and 99% are true positives, while among people who do not have the disease, 1% of results are false positives and 99% are true negatives. Based on this interpretation, Wang Hong estimated the likelihood (i.e., probability) that he had the disease to be 99%. He reasoned: since the test has only a 1% false positive rate and 99% of positives are true, the probability that I have been infected should be 99%.
However, the doctor told him that his probability of actually being infected was only about 0.09 (9%). What is going on here? Where did Wang Hong's thinking go astray?
The doctor said: "99%? 99% is the accuracy of the test, not the probability that you will get the disease. You're forgetting one thing: the normal percentage of people infected with this disease is small, only one in 1,000 people get it."
It turns out that this doctor, in addition to practicing medicine, also loves to study mathematics and often uses probability methods in medicine.
His calculation runs roughly as follows. Because the false positive rate of the test is 1%, about 10 of every 1,000 disease-free people will be reported as "false positives." Meanwhile, given the proportion of the disease in the population (1/1000 = 0.1%), those 1,000 people include only about 1 true positive. So roughly 1 out of every 11 people who test positive actually has the disease, and the probability that Wang Hong is infected is therefore about 1/11, or 0.09 (9%).
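The doctor's counting argument can be written out directly. Here is a small Python sketch (our illustration, with assumed variable names) that counts true and false positives in a hypothetical group of one million people:

```python
POPULATION = 1_000_000
PREVALENCE = 1 / 1000   # 1 in 1,000 people has the disease
FALSE_POS  = 0.01       # 1% of healthy people test positive anyway
TRUE_POS   = 0.99       # 99% of sick people test positive (1% false negative)

sick    = POPULATION * PREVALENCE    # 1,000 people
healthy = POPULATION - sick          # 999,000 people

true_positives  = sick * TRUE_POS        # ~990 sick people who test positive
false_positives = healthy * FALSE_POS    # ~9,990 healthy false alarms

p_sick_given_positive = true_positives / (true_positives + false_positives)
print(f"P(disease | positive) = {p_sick_given_positive:.4f}")  # ~0.0902
```

The exact answer, about 0.090, matches the doctor's rough estimate of 1/11: the false alarms from the huge healthy majority swamp the handful of true positives.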
Wang Hong thought it over and still felt confused, but the incident prompted him to revisit the probability theory he had learned before. After repeatedly reading and pondering the doctor's calculation, he realized that he had committed the error called the "base rate fallacy": he had forgotten to use the base rate of the disease in the population (1/1000).
When it comes to the base rate fallacy, we would do well to start with the well-known Bayes' theorem in probability theory.
Thomas Bayes (1701-1761) was an English statistician and a clergyman. Bayes' theorem, his greatest contribution to probability theory and statistics, is a fundamental framework for the machine learning commonly used in artificial intelligence today. Its ideas are profound, far beyond what the average person appreciates, and perhaps Bayes himself was under-recognized during his lifetime: this important result was not published while he lived, but was published by a friend in 1763, two years after his death.
Roughly speaking, Bayes' theorem concerns the interplay of two random events A and B. Summarized in one sentence, the theorem says: using the new information brought by B, how should we modify P(A), the "prior probability" of A assessed before B is known, to obtain P(A|B), the "conditional probability" of A once B has occurred, also called the posterior probability? Written as a formula:

P(A|B) = P(B|A) × P(A) / P(B)
Here the designations "prior" and "posterior" are a matter of convention and are relative. For example, one can equally well reverse the roles of A and B, i.e., ask how to obtain the conditional probability P(B|A) of B from its prior probability P(B).
Don't be put off by the formula; with examples we can come to understand it step by step.
For example, in the earlier story of Wang Hong's visit to the doctor, let the random event A be "Wang Hong has the disease" and the random event B be "Wang Hong's test result is positive." The prior probability P(A) is the probability that Wang Hong has the disease in the absence of any test result (i.e., the base rate of the disease among the public, 0.1%), while the conditional (or posterior) probability P(A|B) is the probability that Wang Hong has the disease given a positive test result (9%). How do we correct the base probability into the posterior probability? We will explain that below.
Bayes' theorem, a product of the 18th century, served well for 200 years, but in the 1970s it was challenged by the "base rate fallacy" of Daniel Kahneman and Amos Tversky. The former is an Israeli-American psychologist and winner of the 2002 Nobel Prize in Economics. The base rate fallacy is not a rejection of Bayes' theorem, but rather an exploration of a puzzling question: why do people's intuitions so often contradict the Bayesian formula? As the earlier example shows, people using intuition often ignore the base rate.
In his book "Thinking, Fast and Slow", Kahneman gives a cab example to prompt people to think about what influences their "decisions". We will not go into the implications of the base rate fallacy for "decision theory" here, but simply use this example to deepen our understanding of Bayes' formula.
Suppose a city has cabs of two colors, blue and green (with market shares of 15% and 85%). One night a cab is involved in a hit-and-run accident, but fortunately there is an eyewitness, who identifies the cab as blue. How credible is his eyewitness account? The police test the witness's ability to distinguish "blue from green" under the same conditions: he identifies the color correctly in 80% of cases and incorrectly in 20%. Perhaps some readers would immediately conclude that the probability that the offending car was blue is 80%. If you answer this way, you are making the same mistake as Wang Hong in the example above: ignoring the prior probability and failing to consider the base proportions of blue and green cars in the city.
So what exactly is the (conditional) probability that the car in question was blue?
The Bayesian formula gives the correct answer. First we must take into account the base ratio of blue to green cabs (15:85). That is, in the absence of a witness, the probability that the car was blue is only 15%; this is the prior probability P(A) = 15% for the event "A = the offending car was blue." Now that there is an eyewitness, the probability of event A changes. The witness saw the car as "blue." However, his perception must also be discounted, since he is only 80% accurate; his report is itself a random event (denoted B). Our problem is to find the probability that the car was "really blue" given that the witness "saw a blue car," i.e., the conditional probability P(A|B). This should be greater than the prior probability of 15%, because the witness saw a "blue car." How do we correct the prior probability? We need to calculate P(B|A) and P(B).
Since A = "the offending car was blue" and B = "the witness saw blue," P(B|A) is the probability of "seeing blue" given that "the car was blue," i.e., P(B|A) = 80%. The remaining quantity, P(B), is a little harder to calculate: it is the total probability that the witness reports seeing a blue car, equal to the sum of the probabilities of two cases: either the car was blue and was identified correctly, or the car was green and was mistakenly seen as blue. So:

P(B) = 80% × 15% + 20% × 85% = 0.12 + 0.17 = 0.29

From the Bayesian formula:

P(A|B) = P(B|A) × P(A) / P(B) = 0.8 × 0.15 / 0.29 ≈ 0.41
So the probability that the car was blue, given the witness, is about 41%, and correspondingly the probability that it was green is about 59%. The corrected conditional probability of 41% for "blue" is much higher than the prior probability of 15%, but still less than the 59% probability of "green."
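For readers who prefer to see the arithmetic executed, here is a short Python sketch of the cab calculation (our illustration; the names are ours, not from the text):

```python
P_BLUE   = 0.15   # prior: market share of blue cabs
P_GREEN  = 0.85   # prior: market share of green cabs
ACCURACY = 0.80   # witness identifies the color correctly 80% of the time

# P(B): total probability that the witness reports "blue"
p_saw_blue = ACCURACY * P_BLUE + (1 - ACCURACY) * P_GREEN   # 0.12 + 0.17 = 0.29

# Bayes: P(blue | saw blue) = P(saw blue | blue) * P(blue) / P(saw blue)
p_blue_given_saw_blue = ACCURACY * P_BLUE / p_saw_blue

print(f"P(B) = {p_saw_blue:.2f}")                                    # 0.29
print(f"P(blue | witness saw blue) = {p_blue_given_saw_blue:.2f}")   # ~0.41
```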
Returning to the example of Wang Hong testing for a certain disease, it is not difficult to arrive at the correct answer:
A: Wang Hong has the disease
B: Wang Hong's test result is positive
P(A): the base probability of having the disease in the general population (0.1%)
P(B|A): the probability of a positive result given the disease, i.e., the true positive rate (99%)
P(A|B): the probability that Wang Hong has the disease, given a positive test result
P(B): the total probability of a positive result = true positives among those who have the disease + false positives among those who do not

P(B) = 99% × 0.1% + 1% × 99.9% ≈ 0.011
P(A|B) = P(B|A) × P(A) / P(B) = 0.99 × 0.001 / 0.011 ≈ 0.09

That is, about 9%, consistent with the doctor's estimate of 1/11.
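As a check, the same numbers can be pushed through Bayes' formula in code. This is a minimal Python sketch (the function and variable names are our own, not from the text):

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' formula: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

P_A   = 0.001                        # prior: base rate of the disease
P_B_A = 0.99                         # likelihood: true positive rate
P_B   = 0.99 * 0.001 + 0.01 * 0.999  # total probability of a positive result

print(f"P(A|B) = {posterior(P_A, P_B_A, P_B):.4f}")   # ~0.0902, about 9%
```

The same function, fed the cab numbers instead (prior 0.15, likelihood 0.80, evidence 0.29), returns the 41% from the previous example, which is the point of the formula: one correction rule covers both stories.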