How to calculate the level of statistical significance, and what the level of statistical significance means

Sample distribution parameters determined from a series of measurements are random variables; therefore, their deviations from the general (population) parameters are also random. The assessment of these deviations is probabilistic in nature: in statistical analysis one can only indicate the probability of a particular error.

Let a* be an unbiased estimate of the general parameter a obtained from experience. We assign a sufficiently large probability β (such that an event with probability β can be considered practically certain) and find a value ε_β = f(β) for which

P(|a* − a| < ε_β) = β. (4.1)

The range of practically possible values of the error made when a is replaced by a* will then be ±ε_β; errors that are larger in absolute value will appear only with a small probability.

The value

p = 1 − β (4.2)

is called the significance level. Alternatively, expression (4.1) can be interpreted as the probability that the true value of the parameter a lies within

a* − ε_β < a < a* + ε_β. (4.3)

The probability β is called the confidence level (confidence probability) and characterizes the reliability of the obtained estimate. The interval I_β = a* ± ε_β is called the confidence interval, and its boundaries a′ = a* − ε_β and a″ = a* + ε_β are called the confidence limits. The confidence interval at a given confidence level determines the accuracy of the estimate. The width of the confidence interval depends on the confidence level with which the parameter a is guaranteed to lie inside it: the larger β, the larger the interval I_β (and the value ε_β). An increase in the number of experiments manifests itself either as a narrowing of the confidence interval at a constant confidence probability, or as an increase in the confidence probability while the confidence interval is kept the same.

In practice, one usually fixes the value of the confidence probability (0.9, 0.95, or 0.99) and then determines the confidence interval I_β of the result. Constructing the confidence interval thus amounts to finding the absolute deviation ε_β that satisfies (4.1).

Thus, if the distribution law of the estimate a* were known, the problem of determining the confidence interval would be solved easily. Consider the construction of a confidence interval for the mathematical expectation of a normally distributed random variable X with a known general standard deviation σ from a sample of size n. The best estimate for the expectation m is the sample mean x̄, whose standard deviation is

σ_x̄ = σ/√n.

Using the Laplace function Φ, we get

P(|x̄ − m| < ε_β) = 2Φ(ε_β√n/σ) = β. (4.5)

Given the confidence probability β, we determine from the table of the Laplace function (Appendix 1) the value t_β for which 2Φ(t_β) = β. Then the confidence interval for the mathematical expectation takes the form

x̄ − t_β σ/√n < m < x̄ + t_β σ/√n. (4.7)

From (4.7) it can be seen that the width of the confidence interval decreases in inverse proportion to the square root of the number of experiments.
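
As a quick illustration of formula (4.7), here is a minimal Python sketch; the sample values and the "known" σ are invented for the example, and scipy's normal quantile plays the role of the Laplace-function table:

```python
# A minimal sketch of formula (4.7): a confidence interval for the mean of a
# normal variable with a known population standard deviation sigma.
# The sample values and sigma below are made up purely for illustration.
import numpy as np
from scipy import stats

sample = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3])
sigma = 0.3          # assumed known population standard deviation
beta = 0.95          # confidence probability

x_bar = sample.mean()
n = len(sample)
# t_beta is the value for which 2*Phi(t_beta) = beta,
# i.e. the (1 + beta)/2 quantile of the standard normal distribution
t_beta = stats.norm.ppf((1 + beta) / 2)
eps = t_beta * sigma / np.sqrt(n)

print(f"mean = {x_bar:.3f}, interval = ({x_bar - eps:.3f}, {x_bar + eps:.3f})")
```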

Knowing the general variance makes it possible to estimate the mathematical expectation even from a single observation. If for a normally distributed random variable X the experiment yields the value X_1, then the confidence interval for the mathematical expectation at the chosen β has the form

X_1 − U_{1−p/2} σ < m < X_1 + U_{1−p/2} σ,

where U_{1−p/2} is the quantile of the standard normal distribution (Appendix 2).

The distribution law of the estimate a* depends on the distribution law of the quantity X and, in particular, on the parameter a itself. To get around this difficulty, two methods are used in mathematical statistics:

1) approximate: for n ≥ 50, the unknown parameters in the expression for ε_β are replaced by their estimates (for example, σ is replaced by the sample standard deviation s);

2) passing from the random variable a* to another random variable Q*, whose distribution law does not depend on the estimated parameter a but only on the sample size n and on the type of the distribution law of X. Quantities of this kind have been studied in most detail for normally distributed random variables. Symmetric quantiles are usually used as the confidence limits Q′ and Q″:

P(Q_{p/2} < Q* < Q_{1−p/2}) = β, (4.9)

or, taking (4.2) into account,

P(Q_{p/2} < Q* < Q_{1−p/2}) = 1 − p. (4.10)
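
For the common case of an unknown σ, the classical pivotal quantity of this kind is Student's t statistic, whose distribution depends only on the sample size. The sketch below, with made-up data, uses its symmetric quantiles in the spirit of (4.9)-(4.10):

```python
# A sketch of the second approach: pass from the estimate to a quantity whose
# distribution does not depend on the unknown parameter. For the mean of a
# normal sample with unknown sigma this quantity is Student's t statistic.
import numpy as np
from scipy import stats

sample = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3])  # made-up data
beta = 0.95
p = 1 - beta

n = len(sample)
x_bar, s = sample.mean(), sample.std(ddof=1)
q = stats.t.ppf(1 - p / 2, df=n - 1)      # symmetric quantile Q_{1-p/2}
eps = q * s / np.sqrt(n)
print(f"CI for the mean: ({x_bar - eps:.3f}, {x_bar + eps:.3f})")
```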

4.2. Testing statistical hypotheses, significance tests, errors of the first and second kind

Statistical hypotheses are assumptions about the distributions of the general population of one or another random variable. Hypothesis testing means comparing certain statistical indicators (test, or significance, criteria) computed from the sample with the values they would take if the given hypothesis were true. When testing hypotheses, some hypothesis H_0 is usually compared with an alternative hypothesis H_1.

To decide whether to accept or reject a hypothesis, a significance level p is chosen. The most commonly used significance levels are 0.10, 0.05, and 0.01. For this probability, using the assumed distribution of the estimate Q* (the significance criterion), quantile confidence limits are found, as a rule the symmetric ones Q_{p/2} and Q_{1−p/2}. The numbers Q_{p/2} and Q_{1−p/2} are called the critical values of the hypothesis; the values Q* < Q_{p/2} and Q* > Q_{1−p/2} form the critical region of the hypothesis (the region of rejection of the hypothesis) (Fig. 12).

Fig. 12. Critical region of the hypothesis. Fig. 13. Testing statistical hypotheses.

If the value Q_0 found from the sample falls between Q_{p/2} and Q_{1−p/2}, the hypothesis regards such a value as random, and there are therefore no grounds for rejecting it. If Q_0 falls into the critical region, then under this hypothesis its occurrence is practically impossible; since it nevertheless occurred, the hypothesis itself is rejected.

Two types of errors can be made when testing hypotheses. An error of the first kind consists in rejecting a hypothesis that is actually true; the probability of such an error does not exceed the accepted significance level. An error of the second kind consists in accepting a hypothesis that is in fact false; the probability of this error is the lower, the higher the significance level, since a higher significance level increases the number of rejected hypotheses. If the probability of an error of the second kind is α, then the value (1 − α) is called the power of the test.

Fig. 13 shows two probability density curves of the random variable Q, corresponding to the two hypotheses H_0 and H_1. If the value obtained from experience is Q > Q_p, the hypothesis H_0 is rejected and the hypothesis H_1 is accepted, and vice versa if Q < Q_p.

The area under the probability density curve corresponding to the validity of hypothesis H_0 to the right of Q_p is equal to the significance level p, i.e. the probability of an error of the first kind. The area under the probability density curve corresponding to the validity of hypothesis H_1 to the left of Q_p is equal to the probability of an error of the second kind α, and to the right of Q_p, to the power of the test (1 − α). Thus, the larger p, the larger (1 − α). When testing a hypothesis, one tries to choose, from all possible criteria, the one that at a given significance level has the smallest probability of an error of the second kind.

Usually, p = 0.05 is used as the optimal significance level when testing hypotheses: if the hypothesis being tested is accepted at this significance level, it should certainly be recognized as consistent with the experimental data; on the other hand, the use of this significance level does not by itself provide grounds for rejecting the hypothesis.

For example, suppose two values a₁* and a₂* of some sample parameter are found, which can be considered as estimates of the general parameters a₁ and a₂. It is hypothesized that the difference between a₁* and a₂* is random and that the general parameters are equal to each other, i.e. a₁ = a₂. This hypothesis is called the null hypothesis. To test it, one must find out whether the discrepancy between a₁* and a₂* is significant under the null hypothesis. To do this, one usually investigates the random variable D = a₁* − a₂* and checks whether its difference from zero is significant. Sometimes it is more convenient to consider the ratio a₁*/a₂*, comparing it with unity.

Rejecting the null hypothesis, one accepts the alternative, which splits into two cases: a₁ > a₂ and a₁ < a₂. If one of these inequalities is known in advance to be impossible, the alternative hypothesis is called one-sided, and one-sided significance tests are used to check it (as opposed to the usual two-sided ones). In this case, only one of the halves of the critical region needs to be considered (Fig. 12).

For example, at p = 0.05 a two-sided test corresponds to the critical values Q_{0.025} and Q_{0.975}, i.e. the values Q* < Q_{0.025} and Q* > Q_{0.975} are considered significant (non-random). With a one-sided test, one of these inequalities is known to be impossible (for example, Q* < Q_{0.025}), and only Q* > Q_{0.975} will be significant. The probability of the latter inequality is 0.025, and hence the significance level will be 0.025. Thus, if the same critical numbers are used for a one-sided significance test as for a two-sided one, they correspond to half the significance level.

Usually, the same significance level is taken for a one-sided test as for a two-sided test, since under these conditions both tests give the same error of the first kind. To achieve this, the one-sided test must be derived from a two-sided test corresponding to twice the accepted significance level. To keep a significance level of p = 0.05 for the one-sided test, one must take p = 0.10 for the two-sided one, which gives the critical values Q_{0.05} and Q_{0.95}. Of these, one remains for the one-sided test, for example Q_{0.95}; the significance level for the one-sided test is then 0.05. The same significance level for the two-sided test corresponds to the critical value Q_{0.975}. But Q_{0.95} < Q_{0.975}, which means that with the one-sided test a larger number of hypotheses will be rejected and, consequently, the error of the second kind will be smaller.
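
A short sketch of the relation between two-sided and one-sided critical values discussed above, taking the standard normal distribution as the distribution of Q* (an assumption made only for this illustration):

```python
# Critical values for one- and two-sided tests at the same significance level p,
# using the standard normal distribution as the distribution of Q*.
from scipy import stats

p = 0.05
# two-sided test at level p: critical values Q_{p/2} and Q_{1-p/2}
q_low, q_high = stats.norm.ppf(p / 2), stats.norm.ppf(1 - p / 2)
# one-sided test at the same level p: single critical value Q_{1-p}
q_one = stats.norm.ppf(1 - p)

print(f"two-sided: reject if Q < {q_low:.3f} or Q > {q_high:.3f}")
print(f"one-sided: reject if Q > {q_one:.3f}")   # note that q_one < q_high
```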

The level of significance in statistics is an important indicator that reflects the degree of confidence in the accuracy and reliability of the obtained (predicted) data. The concept is widely used in various fields, from sociological research to statistical testing of scientific hypotheses.

Definition

The level of statistical significance (or a statistically significant result) shows how probable it is that the studied indicators arose by chance. The overall statistical significance of a phenomenon is expressed by the p-value (p-level). In any experiment or observation, there is a possibility that the data obtained arose due to sampling errors. This is especially true for sociology.

That is, a value is statistically significant if the probability of its purely random occurrence is extremely small or tends to an extreme. The extreme in this context is the degree of deviation of the statistic from the null hypothesis (the hypothesis that is tested for consistency with the obtained sample data). In scientific practice, the significance level is chosen before data collection and, as a rule, is set at 0.05 (5%). For systems where exact values are critical, it may be 0.01 (1%) or less.

Background

The concept of a significance level was introduced by the British statistician and geneticist Ronald Fisher in 1925 when he was developing a technique for testing statistical hypotheses. When analyzing any process, there is a certain probability of particular phenomena. Difficulties arise when working with small (or non-obvious) probabilities that fall under the concept of "measurement error".

When working with statistics that were not specific enough to be tested, scientists faced the problem of the null hypothesis, which "prevents" operating with small values. Fisher proposed for such systems to set the probability of events at 5% (0.05) as a convenient cutoff that allows one to reject the null hypothesis in calculations.

Introduction of a fixed coefficient

In 1933 the statisticians Jerzy Neyman and Egon Pearson recommended in their papers setting a certain significance level in advance (before data collection). Examples of the use of these rules are clearly visible during elections. Suppose there are two candidates, one of whom is very popular and the other little known. It is obvious that the first candidate will win the election, and the chances of the second tend to zero. They tend to zero, but are not equal to it: there is always the possibility of force majeure, sensational information, or unexpected decisions that can change the predicted election results.

Neyman and Pearson agreed that the significance level of 0.05 proposed by Fisher (denoted by the symbol α) is the most convenient. However, Fisher himself in 1956 opposed fixing this value. He believed that the level of α should be set in accordance with specific circumstances. For example, in particle physics it is 0.01.

p-value

The term p-value was first used by Brownlee in 1960. The p-level (p-value) is an indicator that is inversely related to the reliability of the result: a higher p-value corresponds to a lower level of confidence in the relationship between variables found in the sample.

This value reflects the probability of errors associated with the interpretation of the results. Assume p-value = 0.05 (1/20). This indicates a five percent chance that the relationship between variables found in the sample is just a random feature of the sample. That is, if this dependence is absent in the general population, then with repeated similar experiments, on average in every twentieth study one could expect the same or a greater dependence between the variables. The p-level is often considered as a "margin" for the error rate.

Note that the p-value does not necessarily reflect the real relationship between the variables; it only shows a certain average value within the assumptions made. In particular, the final analysis of the data will also depend on the chosen value of this coefficient: at p-level = 0.05 there will be some results, and at a coefficient of 0.01, others.

Testing statistical hypotheses

The level of statistical significance is especially important when testing hypotheses. For example, when calculating a two-tailed test, the rejection region is divided equally between both tails of the sampling distribution (relative to the zero point), and the credibility of the obtained data is assessed.

Suppose that when monitoring a certain process (phenomenon), it turned out that new statistical information indicates small changes relative to previous values. At the same time, the discrepancies in the results are small and not obvious, but important for the study. The specialist faces a dilemma: are the changes really occurring, or are they sampling errors (measurement inaccuracy)?

In this case, the null hypothesis is either retained or rejected (everything is attributed to error, or the change in the system is recognized as an accomplished fact). The problem is solved on the basis of the ratio of the overall statistical significance (p-value) to the significance level (α). If p-value < α, the null hypothesis is rejected. The smaller the p-value, the more significant the test statistic.
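
A minimal decision-rule sketch of this comparison, assuming a one-sample t-test from scipy as the example criterion and invented measurements:

```python
# Reject the null hypothesis when the p-value is below alpha.
# The data and the reference mean are made up purely for illustration.
import numpy as np
from scipy import stats

alpha = 0.05
data = np.array([5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.2])  # made-up measurements
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: no grounds to reject the null hypothesis")
```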

Used values

The level of significance depends on the analyzed material. In practice, the following fixed values ​​are used:

  • α = 0.1 (or 10%);
  • α = 0.05 (or 5%);
  • α = 0.01 (or 1%);
  • α = 0.001 (or 0.1%).

The more accurate the calculations are required, the smaller the coefficient α is used. Naturally, statistical forecasts in physics, chemistry, pharmaceuticals, and genetics require greater accuracy than in political science and sociology.

Significance thresholds in specific areas

In high-precision fields such as particle physics and manufacturing, statistical significance is often expressed as a number of standard deviations (denoted by the coefficient sigma, σ) of a normal probability distribution (Gaussian distribution). σ is a statistical indicator that characterizes the spread of values of a quantity around its mathematical expectation and is used in describing the probability of events.

Depending on the field of knowledge, the required coefficient σ varies greatly. For example, when confirming the existence of the Higgs boson, a threshold of five sigma (σ = 5) was required, which corresponds to a p-value of about 1 in 3.5 million.
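
A small sketch showing how a number of sigmas translates into a (one-sided) tail probability of the normal distribution; the 5σ line reproduces the roughly 1-in-3.5-million figure quoted above:

```python
# Convert a "number of sigmas" into a one-sided tail probability of the
# standard normal distribution.
from scipy import stats

for n_sigma in (1, 2, 3, 5):
    p = stats.norm.sf(n_sigma)          # one-sided tail probability P(Z > n_sigma)
    print(f"{n_sigma} sigma -> p = {p:.3g}  (about 1 in {1 / p:,.0f})")
```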

Efficiency

It must be taken into account that the coefficients α and p-value are not exact characteristics. Whatever the level of significance of the phenomenon under study, it is not an unconditional basis for accepting the hypothesis. For example, the smaller the value of α, the greater the confidence that the established hypothesis is significant; however, there remains a risk of error, which reduces the statistical power of the study.

Researchers who focus exclusively on statistically significant results may draw erroneous conclusions. At the same time, their work is difficult to double-check, since it relies on assumptions (which, in fact, is what the values of α and p-value are). Therefore, along with the calculation of statistical significance, it is always recommended to determine another indicator, the effect size. Effect size is a quantitative measure of the strength of an effect.

A value is called statistically significant if the probability of a purely random occurrence of it, or of even more extreme values, is small. Here, extreme refers to the degree of deviation from the null hypothesis. A difference is said to be "statistically significant" if data have been obtained that would be unlikely to occur if the difference did not exist; this expression does not mean that the difference must be large, important, or significant in the everyday sense of the word.

The significance level of a test is a traditional notion of hypothesis testing in frequentist statistics. It is defined as the probability of deciding to reject the null hypothesis when the null hypothesis is in fact true (this decision is known as an error of the first kind, or a false positive). The decision process often relies on the p-value: if the p-value is less than the significance level, the null hypothesis is rejected. The smaller the p-value, the more significant the test statistic is said to be, and the stronger the grounds for rejecting the null hypothesis.

The level of significance is usually denoted by the Greek letter α (alpha). Popular significance levels are 5%, 1%, and 0.1%. If the test produces a p-value less than the α-level, the null hypothesis is rejected. Such results are informally referred to as "statistically significant". For example, if someone says that "the chances that what happened is a coincidence are one in a thousand", they mean a 0.1% significance level.

Different values of the α-level have their advantages and disadvantages. Smaller α-levels give more confidence that the accepted alternative hypothesis is significant, but carry a greater risk of failing to reject a false null hypothesis (an error of the second kind, or "false negative"), and thus less statistical power. The choice of α-level inevitably requires a trade-off between significance and power, and hence between the probabilities of errors of the first and second kind. In Russian-language scientific papers, the incorrect term "significance" is often used instead of "statistical significance".


See what the "Level of Significance" is in other dictionaries:

    The number is so small that it can be considered almost certain that an event with probability α will not occur in a single experiment. Usually U. z. is fixed arbitrarily, namely: 0.05, 0.01, and with special accuracy 0.005, etc. In geol. work… … Geological Encyclopedia

    significance level- statistical criterion (it is also called “alpha level” and denoted by a Greek letter) is an upper bound on the probability of a type I error (the probability of rejecting a null hypothesis when it is actually true). Typical values ​​are... Dictionary of Sociological Statistics

    English level, significance; German Signifikanzniveau. The degree of risk is that the researcher may draw the wrong conclusion about the fallacy of the extras, hypotheses based on sample data. Antinazi. Encyclopedia of Sociology, 2009 ... Encyclopedia of Sociology

    significance level- - [L.G. Sumenko. English Russian Dictionary of Information Technologies. M .: GP TsNIIS, 2003.] Topics information technology in general EN level of significance ... Technical Translator's Handbook

    significance level- 3.31 significance level α: A given value representing the upper bound on the probability of rejecting a statistical hypothesis when that hypothesis is true. Source: GOST R ISO 12491 2011: Building materials and products. ... ... Dictionary-reference book of terms of normative and technical documentation

    SIGNIFICANCE LEVEL- the concept of mathematical statistics, reflecting the degree of probability of an erroneous conclusion regarding a statistical hypothesis about the distribution of a feature, verified on the basis of sample data. In psychological research for a sufficient level ... ... Modern educational process: basic concepts and terms

    significance level- reikšmingumo lygis statusas T sritis automatika atitikmenys: engl. significance level vok. Signifikanzniveau, n rus. significance level, m pranc. niveau de signifiance, m … Automatikos terminų žodynas

    significance level- reikšmingumo lygis statusas T sritis fizika atitikmenys: engl. level of significance; significance level vok. Sicherheitsschwelle, f rus. significance level, fpranc. niveau de significance, m … Fizikos terminų žodynas

    Statistical test, see Significance level... Great Soviet Encyclopedia

    SIGNIFICANCE LEVEL- See significance, level... Dictionary in psychology

Books

  • "Top secret" . Lubyanka - to Stalin on the situation in the country (1922-1934). Volume 4. Part 1,. Multi-volume fundamental publication of papers - information reviews and summaries of the OGPU - is unique in its scientific significance, value, content and scope. In this historic…
  • Educational program as a tool for the quality management system of vocational education, Tkacheva Galina Viktorovna, Logachev Maxim Sergeevich, Samarin Yury Nikolaevich. The monograph analyzes the existing practices of forming the content of professional educational programs. The place, structure, content and level of significance are determined ...

The p-value is a quantity used in testing statistical hypotheses. In essence, it is the probability of error when rejecting the null hypothesis (an error of the first kind). Hypothesis testing using the p-value is an alternative to the classical testing procedure based on the critical value of the distribution.

Usually, the p-value is equal to the probability that a random variable with the given distribution (the distribution of the test statistic under the null hypothesis) takes a value not less than the actual value of the test statistic (Wikipedia).

In other words, the p-value is the smallest significance level (i.e., probability of rejecting a true hypothesis) at which the computed test statistic leads to rejection of the null hypothesis. Typically, the p-value is compared with the generally accepted standard significance levels of 0.05 or 0.01.

For example, if the value of the test statistic calculated from the sample corresponds to p = 0.005, this means there is a 0.5% probability of obtaining such, or a more extreme, value of the statistic if the null hypothesis is true. Thus, the smaller the p-value, the better: it strengthens the grounds for rejecting the null hypothesis and increases the expected significance of the result.
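
A sketch contrasting the two equivalent procedures, for an illustrative two-sided z-test (the value of the statistic is invented):

```python
# Two equivalent decision procedures for a z-test statistic:
# compare the statistic with a critical value, or compare the p-value with alpha.
from scipy import stats

z = 2.3        # an example value of the test statistic
alpha = 0.05

# classical procedure: compare the statistic with the critical value
z_crit = stats.norm.ppf(1 - alpha / 2)
reject_by_critical = abs(z) > z_crit

# p-value procedure: compare the two-sided p-value with alpha
p_value = 2 * stats.norm.sf(abs(z))
reject_by_p = p_value < alpha

print(f"z_crit = {z_crit:.3f}, p = {p_value:.4f}")
print(reject_by_critical, reject_by_p)   # the two decisions always coincide
```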

An interesting explanation of this is on Habré.

Statistical analysis is starting to look like a black box: the input is data, the output is a table of main results and a p-value.

What does p-value say?

Suppose we decided to find out whether there is a relationship between a fondness for violent computer games and aggressiveness in real life. For this purpose, two groups of schoolchildren of 100 people each were formed at random (group 1 consisting of shooter fans, group 2 of those who do not play computer games). The number of fights with peers serves as the indicator of aggressiveness. In our imaginary study, it turned out that the group of gamer schoolchildren did indeed conflict with their peers noticeably more often. But how do we find out how statistically significant the resulting differences are? Maybe we obtained the observed difference entirely by chance? To answer these questions, the p-value is used: it is the probability of obtaining such or more pronounced differences, provided that there are actually no differences in the general population. In other words, it is the probability of obtaining such or even stronger differences between our groups, provided that computer games in fact do not affect aggressiveness in any way. It does not sound that difficult. However, this particular statistic is often misinterpreted.
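
A rough sketch of how such a comparison might be computed, with invented fight counts for the two groups (the t-test here is only one possible choice of criterion):

```python
# Compare two independent groups; the data are generated solely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gamers = rng.poisson(lam=3.0, size=100)       # number of fights, group 1
non_gamers = rng.poisson(lam=2.3, size=100)   # number of fights, group 2

t_stat, p_value = stats.ttest_ind(gamers, non_gamers)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# p < 0.05 would be read as: such or larger differences are unlikely
# if the groups do not differ in the general population
```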

p-value examples

So, we compared the two groups of schoolchildren with each other in terms of the level of aggressiveness using a standard t-test (or a non-parametric chi-square test, which is more appropriate in this situation) and found that the coveted p-level of significance is less than 0.05 (for example, 0.04). But what does the resulting p-value actually tell us? If the p-value is the probability of getting such or more pronounced differences, provided that there are actually no differences in the general population, then which, in your opinion, is the correct statement:

1. Computer games are the cause of aggressive behavior with a 96% probability.
2. The probability that aggressiveness and computer games are not related is 0.04.
3. If we got a p-level of significance greater than 0.05, this would mean that aggressiveness and computer games are not related in any way.
4. The probability of getting such differences by chance is 0.04.
5. All statements are wrong.

If you chose the fifth option, then you are absolutely right! But, as numerous studies show, even people with significant experience in data analysis often misinterpret p-values.

Let's take each answer in order:

The first statement is an example of the correlation fallacy: the fact that two variables are significantly related tells us nothing about cause and effect. Perhaps it is more aggressive people who prefer to spend time playing computer games, and it is not computer games that make people more aggressive.

The second statement is more interesting. The thing is that we initially take it as given that there really are no differences. And, keeping this in mind as a fact, we calculate the p-value. Therefore, the correct interpretation is: "Assuming that aggressiveness and computer games are not related in any way, the probability of obtaining such or even more pronounced differences was 0.04."

As for the third statement: what if we had obtained insignificant differences? Would this mean that there is no relationship between the studied variables? No, it would only mean that differences may exist, but our results did not allow us to detect them.

The fourth statement relates directly to the definition of the p-value itself. 0.04 is the probability of obtaining these or even more extreme differences. Estimating the probability of obtaining exactly the same differences as in our experiment is, in principle, impossible!

These are the pitfalls that can be hidden in the interpretation of such an indicator as p-value. Therefore, it is very important to understand the mechanisms underlying the methods of analysis and calculation of the main statistical indicators.

How to find p-value?

1. Determine the expected results of your experiment

Usually, when scientists conduct an experiment, they already have an idea of what results to consider "normal" or "typical". This may be based on the results of past experiments, on reliable data sets, on data from the scientific literature, or on other sources. For your experiment, define the expected results and express them as numbers.

Example: suppose earlier studies have shown that, nationwide, red cars are more likely to receive speeding tickets than blue cars, with average figures showing a 2:1 ratio of red to blue. We want to determine whether the police in our city show the same bias with respect to car color. To do this, we will analyze speeding tickets. If we take a random set of 150 speeding tickets issued to either red or blue cars, we would expect 100 tickets to have been issued to red cars and 50 to blue ones if the police in our city are as biased with respect to car color as is observed throughout the country.

2. Determine the observable results of your experiment

Now that you have determined the expected results, you need to run the experiment and obtain the actual (or "observed") values, again expressed as numbers. If we create experimental conditions and the observed results differ from the expected ones, there are two possibilities: either this happened by chance, or it was caused precisely by our experimental manipulation. The purpose of finding the p-value is precisely to determine whether the observed results deviate from the expected ones strongly enough to reject the "null hypothesis", the hypothesis that there is no relationship between the experimental variables and the observed results.

Example: in our city we randomly selected 150 speeding tickets issued to either red or blue cars. We found that 90 tickets were issued to red cars and 60 to blue ones. This differs from the expected results of 100 and 50, respectively. Did our manipulation (in this case, changing the data source from national to urban) produce this change in the results, or are our city police biased in exactly the same way as the national average, and we are seeing mere random variation? The p-value will help us determine this.

3. Determine the number of degrees of freedom of your experiment

The number of degrees of freedom is a measure of the variability in your experiment, determined by the number of categories you are examining. The equation for the number of degrees of freedom is: degrees of freedom = n − 1, where n is the number of categories or variables analyzed in your experiment.

Example: In our experiment, there are two categories of results: one category for red cars, and one for blue cars. Therefore, in our experiment, we have 2-1 = 1 degree of freedom. If we were comparing red, blue and green cars, we would have 2 degrees of freedom, and so on.

4. Compare expected and observed results using the chi-square test

Chi-square (written "x2") is a numerical value that measures the difference between the expected and observed values ​​of an experiment. The equation for the chi-square is x2 = Σ((o-e)2/e) where "o" is the observed value and "e" is the expected value. Sum the results of the given equation for all possible outcomes (see below).

Note that this equation includes the summation operator Σ (sigma). In other words, you need to calculate ((|o-e|-.05)2/e) for each possible outcome, and add the numbers together to get the chi-square value. In our example, we have two possible outcomes - either the car that received the penalty is red or blue. So we have to count ((o-e)2/e) twice - once for the red cars, and once for the blue cars.

Example: let us substitute our expected and observed values into the equation χ² = Σ((o − e)²/e). Remember that, because of the summation operator, we need to compute (o − e)²/e twice, once for the red cars and once for the blue cars:
χ² = ((90 − 100)²/100) + ((60 − 50)²/50)
χ² = ((−10)²/100) + ((10)²/50)
χ² = (100/100) + (100/50) = 1 + 2 = 3.
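
The same calculation can be checked with scipy's goodness-of-fit chi-square test (a sketch, not part of the original step-by-step procedure):

```python
# Check of the hand calculation above: chi-square goodness-of-fit test on the
# observed (90, 60) and expected (100, 50) ticket counts.
from scipy import stats

observed = [90, 60]
expected = [100, 50]
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.1f}, p = {p_value:.3f}")   # chi-square = 3.0
```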

5. Choose a Significance Level

Now that we know the number of degrees of freedom in our experiment and the value of the chi-square statistic, we need to do one more thing before we can find our p-value: we need to determine the significance level. In plain language, the significance level indicates how confident we want to be in our results. A low value of the significance level corresponds to a low probability that the experimental results were obtained by chance, and vice versa. Significance levels are written as decimal fractions (such as 0.01), corresponding to the probability of obtaining the experimental results by chance (in this case, 1%).

By convention, scientists typically set the significance level of their experiments at 0.05, or 5%. This means that experimental results meeting such a criterion of significance could have been obtained purely by chance with a probability of only 5%. In other words, there is a 95% chance that the results were caused by the scientist's manipulation of the experimental variables and not by chance. For most experiments, 95% confidence that there is a relationship between two variables is enough to consider the variables "really" related to each other.

Example: for our example with red and blue cars, let us follow scientific convention and set the significance level at 0.05.

6. Use a chi-square distribution table to find your p-value

Scientists and statisticians use large tables to determine the p-value of their experiments. Such tables usually have a vertical axis on the left, corresponding to the number of degrees of freedom, and a horizontal axis on top, corresponding to the p-value. First find your number of degrees of freedom, then scan that row from left to right until you find the first value greater than your chi-square statistic, and look at the corresponding p-value at the top of that column. Your p-value lies between this number and the next one (the one to the left of yours).

Chi-square distribution tables can be found in many sources.

Example: our chi-square value was 3. Since we know that there is only 1 degree of freedom in our experiment, we select the very first row. We go from left to right along this row until we encounter a value greater than 3, our chi-square statistic. The first such value is 3.84. Looking up its column, we see that the corresponding p-value is 0.05. This means that our p-value lies between 0.05 and 0.1 (the next largest p-value in the table).
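
Instead of a printed table, the p-value can also be read off the chi-square distribution directly; a sketch:

```python
# p-value for a chi-square statistic from the chi-square survival function.
from scipy import stats

chi2_value = 3.0
df = 1
p_value = stats.chi2.sf(chi2_value, df)
print(f"p = {p_value:.3f}")   # about 0.083, i.e. between 0.05 and 0.1
```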

7. Decide whether to reject or keep your null hypothesis

Now that you have determined an approximate p-value for your experiment, you need to decide whether to reject the null hypothesis (recall, this is the hypothesis that the experimental variables you manipulated did not affect the observed results). If your p-value is less than your significance level, congratulations: you have shown that a relationship between the manipulated variables and the observed results is very likely. If your p-value is greater than your significance level, you cannot be confident whether the observed results were due to pure chance or to the manipulation of the variables.

Example: our p-value is between 0.05 and 0.1, i.e. it is not below 0.05, so unfortunately we cannot reject the null hypothesis. This means we have not reached the minimum 95% confidence needed to say that the police in our city issue tickets to red and blue cars with probabilities noticeably different from the national average.

In other words, there is a 5-10% chance that the results we observe are not a consequence of the change in setting (analyzing the city rather than the whole country), but simply chance. Since we required an error probability below 5%, we cannot claim to be confident that the police in our city are less biased towards red cars; there remains a small (but not negligible) chance that this is not the case.

In the tables of statistical results in term papers, diploma theses and master's theses in psychology, there is always a "p" column.

For example, in accordance with the research objectives, differences in the level of meaningfulness of life between boys and girls of adolescent age were calculated.

Indicator | Mean: boys (20 people) | Mean: girls (5 people) | Mann-Whitney U test | Level of statistical significance (p)
Goals | 28.9 | 35.2 | 17.5 | 0.027*
Process | 30.1 | 32.0 | 38.5 | 0.435
Result | 25.2 | 29.0 | 29.5 | 0.164
Locus of control - "I" | 20.3 | 23.6 | — | 0.067
Locus of control - "Life" | 30.4 | 33.8 | 27.5 | 0.126
Meaningfulness of life | 98.9 | 111.2 | — | 0.103

* - differences are statistically significant (p ≤ 0.05)

The right-hand column shows the value of "p", and it is by this value that one can determine whether the differences in meaningfulness of life between boys and girls are significant or not. The rule is simple:

  • If the level of statistical significance "p" is less than or equal to 0.05, then we conclude that the differences are significant. In the above table, the differences between boys and girls are significant in relation to the indicator "Goals" - meaningfulness of life in the future. In girls, this indicator is statistically significantly higher than in boys.
  • If the level of statistical significance "p" is greater than 0.05, then it is concluded that the differences are not significant. In the above table, the differences between boys and girls are not significant for all other indicators, except for the first one.

Where does the level of statistical significance "p" come from

The level of statistical significance is calculated by the statistical program together with the statistical criterion itself. In such programs you can also set a critical threshold for the level of statistical significance, and the corresponding indicators will be highlighted by the program.

For example, in the STATISTICA program, when calculating correlations, you can set the p limit, for example, 0.05, and all statistically significant relationships will be highlighted in red.

If the calculation of the statistical criterion is carried out manually, then the significance level "p" is determined by comparing the value of the obtained criterion with the critical value.

What does the level of statistical significance "p" show

All statistical calculations are approximate, and the level of this approximation is determined by "p". The significance level is written as a decimal, for example 0.023 or 0.965. If we multiply this number by 100, we get p as a percentage: 2.3% and 96.5%. These percentages reflect the probability that our assumption of a relationship, for example between aggressiveness and anxiety, is wrong.

That is, a correlation coefficient of 0.58 between aggressiveness and anxiety obtained at a statistical significance level of 0.05 means a 5% probability of error. What exactly does this mean?

The correlation we found means that the following pattern is observed in our sample: the higher the aggressiveness, the higher the anxiety. That is, if we take two teenagers and one of them has higher anxiety than the other, then, knowing about the positive correlation, we can say that this teenager will also have higher aggressiveness. But since everything in statistics is approximate, in stating this we admit that we may be mistaken, and the probability of error is 5%. That is, having made 20 such comparisons in this group of adolescents, we may err once in predicting the level of aggressiveness from anxiety.
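
A sketch of this situation with invented "anxiety" and "aggressiveness" scores, where the correlation coefficient is returned together with its p-value:

```python
# Pearson correlation with its p-value; the scores are generated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
anxiety = rng.normal(50, 10, size=30)
aggressiveness = 0.6 * anxiety + rng.normal(0, 8, size=30)   # related by construction

r, p_value = stats.pearsonr(anxiety, aggressiveness)
print(f"r = {r:.2f}, p = {p_value:.3f}")
# p <= 0.05 would mean the correlation is statistically significant at the 5% level
```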

Which level of statistical significance is better: 0.01 or 0.05

The level of statistical significance reflects the probability of error. Therefore, the result at p=0.01 is more accurate than at p=0.05.

In psychological research, two acceptable levels of statistical significance of the results are accepted:

p=0.01 - high reliability of the result of a comparative analysis or analysis of relationships;

p=0.05 - sufficient accuracy.

I hope this article will help you write a psychology paper on your own. If you need help, please contact (all types of work in psychology; statistical calculations).