P-value Calculator

The P-value Calculator computes p-values from z-scores using the standard normal distribution. Enter your z-score, pick the test direction (one- or two-tailed), and set your significance level (α — typically 0.05). The calculator returns the p-value plus a verdict telling you whether to reject the null hypothesis at your chosen α.


How to use

  1. Enter your z-score from your test statistic.
  2. Pick the test type: two-tailed (most common, tests for difference in either direction), one-tailed right (tests for greater than), or one-tailed left (tests for less than).
  3. Set α (significance level): 5% is the standard choice; 1% for stricter, 10% for looser.
  4. Read the p-value in the green block.
  5. The verdict below tells you whether the result is statistically significant at your α.
  6. Reference cards show whether the result is significant at α = 0.001, 0.01, 0.05, and 0.10 simultaneously.
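The steps above can be sketched in code. This is a minimal illustration of the same computation, not the calculator's actual implementation; it uses only the standard normal tail identity P(Z > z) = erfc(z/√2)/2 from Python's standard library:

```python
import math

def p_value(z: float, tail: str = "two") -> float:
    """P-value for a z-score under the standard normal distribution.

    tail: "two" (two-tailed), "right" (one-tailed, greater than),
          or "left" (one-tailed, less than).
    """
    # Upper-tail probability of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    upper = 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "two":
        # Two-tailed: probability of a result at least this extreme in EITHER direction
        return math.erfc(abs(z) / math.sqrt(2))
    if tail == "right":
        return upper
    if tail == "left":
        return 1.0 - upper
    raise ValueError("tail must be 'two', 'right', or 'left'")

def verdict(p: float, alpha: float = 0.05) -> str:
    """The significance verdict at the chosen alpha."""
    return "reject the null" if p < alpha else "fail to reject the null"
```

For example, `p_value(1.96, "two")` returns approximately 0.05, and `verdict` compares it against your chosen α.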


What a P-value Tells You

A p-value is the probability of observing a test statistic at least as extreme as the one you got, assuming the null hypothesis is true. The smaller the p-value, the less likely your data is to have come from a world where the null is true — and so the stronger the evidence to reject the null.

The Microapp P-value Calculator handles the most common case: given a z-score (from a z-test), what's the p-value? Pick the tail direction, set your α, get a verdict.

Worked example. A drug trial finds a z-score of 1.96 (two-tailed test).
• P-value: p ≈ 0.0500 (the standard "exactly significant" threshold)
• At α = 0.05: p < α → reject the null. The effect is statistically significant.
• At α = 0.01: p > α → fail to reject. The effect is NOT significant at the stricter threshold.
Interpretation: there's about a 5% chance you'd see a result this extreme under the null hypothesis.
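The worked example can be checked in a few lines. A sketch using the standard-library erfc identity for the two-tailed normal p-value:

```python
import math

z = 1.96  # z-score from the trial
p_two = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
print(f"p = {p_two:.4f}")                 # p = 0.0500
print("alpha = 0.05:", "reject" if p_two < 0.05 else "fail to reject")
print("alpha = 0.01:", "reject" if p_two < 0.01 else "fail to reject")
```

The exact value is just under 0.05 (about 0.049996), which is why the verdict flips between α = 0.05 and α = 0.01.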

Common Z-score Thresholds

Z-score (two-tailed) | P-value    | α level
1.645                | 0.10       | 10% (loose)
1.96                 | 0.05       | 5% (standard)
2.576                | 0.01       | 1% (strict)
3.291                | 0.001      | 0.1% (very strict)
3.891                | 0.0001     | 0.01%
5                    | ~6 × 10⁻⁷  | "Five sigma" (physics standard)
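These thresholds can be reproduced from the same normal-tail formula. A quick sketch (standard library only):

```python
import math

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Reproduce the threshold table above
for z in (1.645, 1.96, 2.576, 3.291, 3.891, 5.0):
    print(f"z = {z:5.3f}  ->  p = {two_tailed_p(z):.2e}")
```

Running it confirms, for instance, that z = 2.576 gives p ≈ 0.01 and z = 5 gives p on the order of 6 × 10⁻⁷.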

One-tailed vs Two-tailed: When to Use Which

Two-tailed (default) tests for "any difference" — your test rejects the null if the data is significantly higher OR significantly lower than the null hypothesis predicted. Use this when you don't have a directional hypothesis. Most published research uses two-tailed by convention.

One-tailed tests for a specific direction — only "significantly higher than" (right-tail) or "significantly lower than" (left-tail). Use only when you have a strong, pre-registered directional hypothesis. One-tailed tests are statistically more powerful (easier to find significance) but the cost is you can't claim significance in the opposite direction even if the data dramatically points there.

The rule of thumb: if there's any chance you'd be interested in a result in the opposite direction, use two-tailed. Two-tailed = honest about uncertainty.
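The power difference between the two test types is easy to see numerically. In this sketch, z = 1.7 is a hypothetical test statistic chosen to land between the one-tailed and two-tailed 5% cutoffs:

```python
import math

z = 1.7  # hypothetical test statistic (illustration only)

# One-tailed (right): P(Z > z)
p_right = 0.5 * math.erfc(z / math.sqrt(2))
# Two-tailed: exactly twice the one-tailed value for the same |z|
p_two = math.erfc(abs(z) / math.sqrt(2))

print(f"one-tailed p = {p_right:.4f}")  # below 0.05: significant
print(f"two-tailed p = {p_two:.4f}")    # above 0.05: not significant
```

The same data is "significant" one-tailed (p ≈ 0.045) but not two-tailed (p ≈ 0.089), which is precisely why the directional choice must be made before seeing the data.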

What "Statistically Significant" Means (and Doesn't)

A statistically significant p-value means: assuming the null is true, this data is unlikely. It does NOT mean:

  • The effect is large. A tiny effect can be highly significant if the sample size is huge. Always report effect size alongside p-values.
  • The result will replicate. At α = 0.05, 5% of tests of true nulls come out "significant" by chance; that false-positive rate is built into the method. Replication is needed for confidence.
  • The null is false. Failing to reject the null doesn't prove the null; it means your data didn't have enough evidence. Absence of evidence isn't evidence of absence.
  • The result is practically meaningful. A drug that lowers blood pressure by 0.5 mmHg might be statistically significant in a 100,000-person trial — but clinically irrelevant.

Z-test vs T-test

This calculator uses the standard normal distribution (z-distribution), which assumes you know the population standard deviation. In practice:

  • Use a z-test when: the sample size is large (≥ 30) AND the population variance is known, or can plausibly be assumed.
  • Use a t-test when: the sample size is small (< 30), OR you're estimating the variance from the sample. The t-distribution has fatter tails (more probability mass at extreme values), so the same statistic gives a larger p-value under t.

For practical statistics work, the z-test and t-test agree closely above n = 30. Below that, you should use a t-distribution calculator with the appropriate degrees of freedom (df = n − 1 for one-sample tests).
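The fatter t tails can be demonstrated without any statistics library. This rough sketch numerically integrates the Student's t density (it is an illustration, not a production routine; df = 10 is an arbitrary choice):

```python
import math

def t_upper_tail(t: float, df: int, steps: int = 50_000, upper: float = 60.0) -> float:
    """P(T > t) for Student's t with df degrees of freedom,
    by trapezoidal integration of the t density over [t, upper].
    A rough sketch for illustration only."""
    # Normalizing constant of the t density
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    total = 0.5 * (pdf(t) + pdf(upper))
    for i in range(1, steps):
        total += pdf(t + i * h)
    return total * h

# Upper-tail probability at 1.96 under each distribution
z_upper = 0.5 * math.erfc(1.96 / math.sqrt(2))  # normal: about 0.025
t_upper = t_upper_tail(1.96, df=10)             # t with df = 10: noticeably larger
print(f"normal tail: {z_upper:.4f},  t(df=10) tail: {t_upper:.4f}")
```

The same statistic of 1.96 that is exactly significant under the normal distribution is not significant under t with 10 degrees of freedom, which is the practical cost of estimating the variance from a small sample.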

Common Misuses of P-values

P-hacking. Running 20 tests, picking the one that's significant, and not reporting the others. By chance alone, 1 in 20 tests with α = 0.05 will be "significant" under the null. P-hacking turns false positives into apparent discoveries.
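The "1 in 20 by chance" claim is easy to verify by simulation. This sketch draws z-scores from a world where the null is true and counts how often they cross the 0.05 line (the seed is fixed only to make the run reproducible):

```python
import math
import random

random.seed(0)  # fixed for reproducibility of the illustration

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# 10,000 "experiments" in which the null is TRUE: z ~ N(0, 1)
trials = 10_000
false_positives = sum(two_tailed_p(random.gauss(0, 1)) < 0.05 for _ in range(trials))
rate = false_positives / trials
print(f"false-positive rate under the null: {rate:.3f}")  # close to 0.05
```

The rate hovers around 5%, so a researcher who quietly runs 20 tests should expect about one spurious "discovery" even when nothing is real.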

Stopping rule fishing. Running an experiment, checking p-value, continuing to collect data if not significant yet. This inflates the false-positive rate dramatically. Pre-register your sample size and stop when you reach it.

Conflating "non-significant" with "no effect." A non-significant result means "we can't tell" — not "no effect exists." Wide confidence intervals hide real effects.

Ignoring multiple comparisons. Testing 100 things at α = 0.05 means ~5 false positives by chance. Use Bonferroni (divide α by number of tests) or false discovery rate methods when running many tests.
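The Bonferroni correction itself is one line. In this sketch the list of p-values is hypothetical, standing in for the results of 100 tests:

```python
m = 100                   # number of tests run
alpha = 0.05
alpha_bonf = alpha / m    # Bonferroni-corrected per-test threshold: 0.0005

# Hypothetical p-values from a few of the 100 tests (illustration only)
p_values = [0.0001, 0.004, 0.03, 0.2]
significant = [p for p in p_values if p < alpha_bonf]
print(significant)  # only the very small p-value survives the correction
```

Note how p = 0.004 and p = 0.03, both "significant" at a naive α = 0.05, fail the corrected threshold; that is the correction doing its job.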

Related Tools

For computing geometric means (used in some statistical contexts), see the Geometric Mean Calculator. For arithmetic means and other basic statistics, the Average Calculator is the right tool. For raw percentage math (often used alongside p-values), see the Percentage Calculator.