P-value Calculator

The P-value Calculator computes p-values from z-scores using the standard normal distribution. Enter your z-score, pick the test direction (one- or two-tailed), and set your significance level (α — typically 0.05). The calculator returns the p-value plus a verdict telling you whether to reject the null hypothesis at your chosen α.


How to use

  1. Enter your z-score from your test statistic.
  2. Pick the test type: two-tailed (most common, tests for difference in either direction), one-tailed right (tests for greater than), or one-tailed left (tests for less than).
  3. Set α (significance level): 5% is the standard choice; 1% for stricter, 10% for looser.
  4. Read the p-value in the green block.
  5. The verdict below tells you whether the result is statistically significant at your α.
  6. Reference cards show whether the result is significant at α = 0.001, 0.01, 0.05, and 0.10 simultaneously.
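The steps above can be sketched in code. This is a minimal illustration of the same computation, not the calculator's actual implementation; it uses only the standard normal tail identity P(Z > z) = erfc(z/√2)/2 from Python's standard library:

```python
import math

def p_value(z: float, tail: str = "two") -> float:
    """P-value for a z-score under the standard normal distribution.

    tail: "two" (two-tailed), "right" (one-tailed, greater than),
          or "left" (one-tailed, less than).
    """
    # Upper-tail probability of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    upper = 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "two":
        # Two-tailed: probability of a result at least this extreme in EITHER direction
        return math.erfc(abs(z) / math.sqrt(2))
    if tail == "right":
        return upper
    if tail == "left":
        return 1.0 - upper
    raise ValueError("tail must be 'two', 'right', or 'left'")

def verdict(p: float, alpha: float = 0.05) -> str:
    """The significance verdict at the chosen alpha."""
    return "reject the null" if p < alpha else "fail to reject the null"
```

For example, `p_value(1.96, "two")` returns approximately 0.05, and `verdict` compares it against your chosen α.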


What a P-value Tells You

A p-value is the probability of observing a test statistic at least as extreme as the one you got, assuming the null hypothesis is true. The smaller the p-value, the less likely your data is to have come from a world where the null is true — and so the stronger the evidence to reject the null.

The Microapp P-value Calculator handles the most common case: given a z-score (from a z-test), what's the p-value? Pick the tail direction, set your α, get a verdict.

Worked example. A drug trial finds a z-score of 1.96 (two-tailed test).
• P-value: p ≈ 0.0500 (the standard "exactly significant" threshold)
• At α = 0.05: p < α → reject the null. The effect is statistically significant.
• At α = 0.01: p > α → fail to reject. The effect is NOT significant at the stricter threshold.
Interpretation: there's about a 5% chance you'd see a result this extreme under the null hypothesis.
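The worked example can be checked in a few lines. A sketch using the standard-library erfc identity for the two-tailed normal p-value:

```python
import math

z = 1.96  # z-score from the trial
p_two = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
print(f"p = {p_two:.4f}")                 # p = 0.0500
print("alpha = 0.05:", "reject" if p_two < 0.05 else "fail to reject")
print("alpha = 0.01:", "reject" if p_two < 0.01 else "fail to reject")
```

The exact value is just under 0.05 (about 0.049996), which is why the verdict flips between α = 0.05 and α = 0.01.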

Common Z-score Thresholds

Z-score (two-tailed) | P-value    | α level
1.645                | 0.10       | 10% (loose)
1.96                 | 0.05       | 5% (standard)
2.576                | 0.01       | 1% (strict)
3.291                | 0.001      | 0.1% (very strict)
3.891                | 0.0001     | 0.01%
5                    | ~6 × 10⁻⁷  | "Five sigma" (physics standard)
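These thresholds can be reproduced from the same normal-tail formula. A quick sketch (standard library only):

```python
import math

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Reproduce the threshold table above
for z in (1.645, 1.96, 2.576, 3.291, 3.891, 5.0):
    print(f"z = {z:5.3f}  ->  p = {two_tailed_p(z):.2e}")
```

Running it confirms, for instance, that z = 2.576 gives p ≈ 0.01 and z = 5 gives p on the order of 6 × 10⁻⁷.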

One-tailed vs Two-tailed: When to Use Which

Two-tailed (default) tests for "any difference" — your test rejects the null if the data is significantly higher OR significantly lower than the null hypothesis predicted. Use this when you don't have a directional hypothesis. Most published research uses two-tailed by convention.

One-tailed tests for a specific direction — only "significantly higher than" (right-tail) or "significantly lower than" (left-tail). Use only when you have a strong, pre-registered directional hypothesis. One-tailed tests are statistically more powerful (easier to find significance) but the cost is you can't claim significance in the opposite direction even if the data dramatically points there.

The rule of thumb: if there's any chance you'd be interested in a result in the opposite direction, use two-tailed. Two-tailed = honest about uncertainty.
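The power difference between the two test types is easy to see numerically. In this sketch, z = 1.7 is a hypothetical test statistic chosen to land between the one-tailed and two-tailed 5% cutoffs:

```python
import math

z = 1.7  # hypothetical test statistic (illustration only)

# One-tailed (right): P(Z > z)
p_right = 0.5 * math.erfc(z / math.sqrt(2))
# Two-tailed: exactly twice the one-tailed value for the same |z|
p_two = math.erfc(abs(z) / math.sqrt(2))

print(f"one-tailed p = {p_right:.4f}")  # below 0.05: significant
print(f"two-tailed p = {p_two:.4f}")    # above 0.05: not significant
```

The same data is "significant" one-tailed (p ≈ 0.045) but not two-tailed (p ≈ 0.089), which is precisely why the directional choice must be made before seeing the data.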

What "Statistically Significant" Means (and Doesn't)

A statistically significant p-value means: assuming the null is true, this data is unlikely. It does NOT mean:

  • The effect is large. A tiny effect can be highly significant if the sample size is huge. Always report effect size alongside p-values.
  • The result will replicate. At α = 0.05, 5% of tests of true nulls come out "significant" by chance; that false-positive rate is built into the method. Replication is needed for confidence.
  • The null is false. Failing to reject the null doesn't prove the null; it means your data didn't have enough evidence. Absence of evidence isn't evidence of absence.
  • The result is practically meaningful. A drug that lowers blood pressure by 0.5 mmHg might be statistically significant in a 100,000-person trial — but clinically irrelevant.

Z-test vs T-test

This calculator uses the standard normal distribution (z-distribution), which assumes you know the population standard deviation. In practice:

  • Use a z-test when: the sample size is large (≥ 30) AND the population variance is known, or can plausibly be assumed.
  • Use a t-test when: the sample size is small (< 30), OR you're estimating the variance from the sample. The t-distribution has fatter tails (more probability mass at extreme values), so the same statistic gives a larger p-value under t.

For practical statistics work, the z-test and t-test agree closely above n = 30. Below that, you should use a t-distribution calculator with the appropriate degrees of freedom (df = n − 1 for one-sample tests).
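The fatter t tails can be demonstrated without any statistics library. This rough sketch numerically integrates the Student's t density (it is an illustration, not a production routine; df = 10 is an arbitrary choice):

```python
import math

def t_upper_tail(t: float, df: int, steps: int = 50_000, upper: float = 60.0) -> float:
    """P(T > t) for Student's t with df degrees of freedom,
    by trapezoidal integration of the t density over [t, upper].
    A rough sketch for illustration only."""
    # Normalizing constant of the t density
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    total = 0.5 * (pdf(t) + pdf(upper))
    for i in range(1, steps):
        total += pdf(t + i * h)
    return total * h

# Upper-tail probability at 1.96 under each distribution
z_upper = 0.5 * math.erfc(1.96 / math.sqrt(2))  # normal: about 0.025
t_upper = t_upper_tail(1.96, df=10)             # t with df = 10: noticeably larger
print(f"normal tail: {z_upper:.4f},  t(df=10) tail: {t_upper:.4f}")
```

The same statistic of 1.96 that is exactly significant under the normal distribution is not significant under t with 10 degrees of freedom, which is the practical cost of estimating the variance from a small sample.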

Common Misuses of P-values

P-hacking. Running 20 tests, picking the one that's significant, and not reporting the others. By chance alone, 1 in 20 tests with α = 0.05 will be "significant" under the null. P-hacking turns false positives into apparent discoveries.
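The "1 in 20 by chance" claim is easy to verify by simulation. This sketch draws z-scores from a world where the null is true and counts how often they cross the 0.05 line (the seed is fixed only to make the run reproducible):

```python
import math
import random

random.seed(0)  # fixed for reproducibility of the illustration

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# 10,000 "experiments" in which the null is TRUE: z ~ N(0, 1)
trials = 10_000
false_positives = sum(two_tailed_p(random.gauss(0, 1)) < 0.05 for _ in range(trials))
rate = false_positives / trials
print(f"false-positive rate under the null: {rate:.3f}")  # close to 0.05
```

The rate hovers around 5%, so a researcher who quietly runs 20 tests should expect about one spurious "discovery" even when nothing is real.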

Stopping rule fishing. Running an experiment, checking p-value, continuing to collect data if not significant yet. This inflates the false-positive rate dramatically. Pre-register your sample size and stop when you reach it.

Conflating "non-significant" with "no effect." A non-significant result means "we can't tell" — not "no effect exists." Wide confidence intervals hide real effects.

Ignoring multiple comparisons. Testing 100 things at α = 0.05 means ~5 false positives by chance. Use Bonferroni (divide α by number of tests) or false discovery rate methods when running many tests.
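The Bonferroni correction itself is one line. In this sketch the list of p-values is hypothetical, standing in for the results of 100 tests:

```python
m = 100                   # number of tests run
alpha = 0.05
alpha_bonf = alpha / m    # Bonferroni-corrected per-test threshold: 0.0005

# Hypothetical p-values from a few of the 100 tests (illustration only)
p_values = [0.0001, 0.004, 0.03, 0.2]
significant = [p for p in p_values if p < alpha_bonf]
print(significant)  # only the very small p-value survives the correction
```

Note how p = 0.004 and p = 0.03, both "significant" at a naive α = 0.05, fail the corrected threshold; that is the correction doing its job.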

Related Tools

For computing geometric means (used in some statistical contexts), see the Geometric Mean Calculator. For arithmetic means and other basic statistics, the Average Calculator is the right tool. For raw percentage math (often used alongside p-values), see the Percentage Calculator.