P-value Calculator

The P-value Calculator computes the p-value from common test statistics: z (test of a population mean), t (Student's t-test), and chi² (chi-square). It is used in scientific research, quality control, A/B testing, and data science. Convention: p < 0.05 is generally considered statistically significant.

Compute p-value from a z-score using the standard normal distribution. Pick the tail (one-sided or two-sided) and the significance level (α). The verdict below the result tells you whether to reject the null hypothesis.


How to Use

  1. Choose the test type (z, t, or chi²).

  2. Enter the calculated test statistic.

  3. For t and chi², enter the degrees of freedom.

  4. Choose one-tailed or two-tailed.

  5. Read the resulting p-value.
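Under the hood, the steps above amount to a survival-function (upper-tail) lookup on the chosen distribution. A minimal sketch, assuming `scipy` is available — the `p_value` function and its signature are illustrative, not the tool's actual API:

```python
from scipy import stats

def p_value(statistic, test="z", df=None, two_tailed=True):
    """Return the p-value for a z, t, or chi-square test statistic."""
    if test == "z":
        p = stats.norm.sf(abs(statistic))      # right-tail area of standard normal
        return 2 * p if two_tailed else p
    if test == "t":
        p = stats.t.sf(abs(statistic), df)     # needs degrees of freedom
        return 2 * p if two_tailed else p
    if test == "chi2":
        return stats.chi2.sf(statistic, df)    # chi-square tests are one-tailed
    raise ValueError(f"unknown test: {test}")

print(round(p_value(1.96), 4))                 # two-tailed z-test, ~0.05
```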


What a P-value Tells You

A p-value is the probability of observing a test statistic at least as extreme as the one you got, assuming the null hypothesis is true. The smaller the p-value, the less likely your data is to have come from a world where the null is true — and so the stronger the evidence to reject the null.

The Microapp P-value Calculator handles the most common case: given a z-score (from a z-test), what's the p-value? Pick the tail direction, set your α, get a verdict.

Worked example. A drug trial finds a z-score of 1.96 (two-tailed test).
• P-value: p ≈ 0.0500 (the standard "exactly significant" threshold)
• At α = 0.05: p < α → reject the null. The effect is statistically significant.
• At α = 0.01: p > α → fail to reject. The effect is NOT significant at the stricter threshold.
Interpretation: there's about a 5% chance you'd see a result this extreme under the null hypothesis.
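The worked example can be checked with nothing but the standard library: for a standard normal, the two-tailed p-value of a z-score is `erfc(|z| / sqrt(2))`. A sketch (the calculator itself may round differently):

```python
import math

def two_tailed_p(z):
    # P(|Z| >= |z|) for a standard normal, via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

p = two_tailed_p(1.96)
print(f"p = {p:.4f}")                           # ~0.0500
for alpha in (0.05, 0.01):
    verdict = "reject" if p < alpha else "fail to reject"
    print(f"alpha = {alpha}: {verdict} the null")
```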

Common Z-score Thresholds

| Z-score (two-tailed) | P-value | α level |
| --- | --- | --- |
| 1.645 | 0.10 | 10% (loose) |
| 1.96 | 0.05 | 5% (standard) |
| 2.576 | 0.01 | 1% (strict) |
| 3.291 | 0.001 | 0.1% (very strict) |
| 3.891 | 0.0001 | 0.01% |
| 5 | ~6 × 10⁻⁷ | "Five sigma" (physics standard) |
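The table can be regenerated in the other direction with the standard library's `statistics.NormalDist`: the two-tailed critical z for a given α is the (1 − α/2) quantile. A sketch, not the calculator's internals:

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    # z such that P(|Z| >= z) = alpha (two-tailed) or P(Z >= z) = alpha (one-tailed)
    q = 1 - alpha / 2 if two_tailed else 1 - alpha
    return NormalDist().inv_cdf(q)

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: |z| >= {critical_z(alpha):.3f}")
```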

One-tailed vs Two-tailed: When to Use Which

Two-tailed (default) tests for "any difference" — your test rejects the null if the data is significantly higher OR significantly lower than the null hypothesis predicted. Use this when you don't have a directional hypothesis. Most published research uses two-tailed by convention.

One-tailed tests for a specific direction — only "significantly higher than" (right-tail) or "significantly lower than" (left-tail). Use only when you have a strong, pre-registered directional hypothesis. One-tailed tests are statistically more powerful (easier to find significance) but the cost is you can't claim significance in the opposite direction even if the data dramatically points there.

The rule of thumb: if there's any chance you'd be interested in a result in the opposite direction, use two-tailed. Two-tailed = honest about uncertainty.
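The power difference is easy to see numerically: for the same z, the one-tailed p is exactly half the two-tailed p. A quick check with `statistics.NormalDist` (standard-normal assumption):

```python
from statistics import NormalDist

def p_values(z):
    sf = 1 - NormalDist().cdf(abs(z))   # right-tail area
    return sf, 2 * sf                   # (one-tailed, two-tailed)

one, two = p_values(1.645)
print(f"z = 1.645: one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
# the one-tailed test clears alpha = 0.05 here; the two-tailed test does not
```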

What "Statistically Significant" Means (and Doesn't)

A statistically significant p-value means: assuming the null is true, this data is unlikely. It does NOT mean:

  • The effect is large. A tiny effect can be highly significant if the sample size is huge. Always report effect size alongside p-values.
  • The result will replicate. At α = 0.05, a test of a true null comes out "significant" 5% of the time by construction. Replication is needed for confidence.
  • The null is false. Failing to reject the null doesn't prove the null; it means your data didn't have enough evidence. Absence of evidence isn't evidence of absence.
  • The result is practically meaningful. A drug that lowers blood pressure by 0.5 mmHg might be statistically significant in a 100,000-person trial — but clinically irrelevant.
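The blood-pressure example in the last bullet can be checked directly. For a one-sample z-test, z = effect / (σ / √n), so a fixed 0.5 mmHg effect becomes "significant" purely by growing n (the σ = 20 mmHg below is my illustrative assumption, not trial data):

```python
import math

def z_score(effect, sigma, n):
    # one-sample z-test: z = effect / (sigma / sqrt(n))
    return effect / (sigma / math.sqrt(n))

def two_tailed_p(z):
    return math.erfc(abs(z) / math.sqrt(2))

# A 0.5 mmHg drop with sd = 20 mmHg: clinically trivial, but watch n grow
for n in (100, 1_000, 100_000):
    z = z_score(0.5, 20, n)
    print(f"n = {n:>7,}: z = {z:5.2f}, p = {two_tailed_p(z):.1e}")
```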

Z-test vs T-test

This calculator uses the standard normal distribution (z-distribution), which assumes you know the population standard deviation. In practice:

  • Use a z-test when: sample size is large (≥ 30) AND you plausibly know the population variance.
  • Use a t-test when: sample size is small (< 30), OR you're estimating the variance from the sample. The t-distribution has fatter tails (more extreme values), so the same z-score gives a larger p-value with t.

For practical statistics work, the z-test and t-test agree closely above n = 30. Below that, you should use a t-distribution calculator with the appropriate degrees of freedom (df = n − 1 for one-sample tests).
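If `scipy` is available, the convergence is quick to verify: the same statistic value yields a larger p under t, shrinking toward the z-test p as df grows (a sketch; the df values are illustrative):

```python
from scipy import stats

z = 1.96
p_z = 2 * stats.norm.sf(z)                   # two-tailed z-test p, ~0.05
for df in (5, 10, 30, 100):
    p_t = 2 * stats.t.sf(z, df)              # fatter tails -> larger p
    print(f"df = {df:>3}: t-test p = {p_t:.4f} (z-test p = {p_z:.4f})")
```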

Common Misuses of P-values

P-hacking. Running 20 tests, picking the one that's significant, and not reporting the others. By chance alone, 1 in 20 tests with α = 0.05 will be "significant" under the null. P-hacking turns false positives into apparent discoveries.
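The "1 in 20" arithmetic generalizes: with m independent tests of true nulls, the chance of at least one false positive is 1 − (1 − α)^m. A quick check (not a simulation of any particular study):

```python
def familywise_error(alpha, m):
    # P(at least one false positive) across m independent tests of true nulls
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20, 100):
    print(f"m = {m:>3} tests: P(>=1 false positive) = {familywise_error(0.05, m):.3f}")
```

At m = 20 this is already about 0.64 — much more likely than not that something "significant" turns up by chance alone.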

Stopping rule fishing. Running an experiment, checking p-value, continuing to collect data if not significant yet. This inflates the false-positive rate dramatically. Pre-register your sample size and stop when you reach it.

Conflating "non-significant" with "no effect." A non-significant result means "we can't tell" — not "no effect exists." Wide confidence intervals hide real effects.

Ignoring multiple comparisons. Testing 100 things at α = 0.05 means ~5 false positives by chance. Use Bonferroni (divide α by number of tests) or false discovery rate methods when running many tests.
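A minimal sketch of the Bonferroni correction mentioned above (per-test threshold only; FDR methods such as Benjamini–Hochberg are more involved; the p-values below are made up for illustration):

```python
def bonferroni_alpha(alpha, m):
    # per-test threshold that caps the family-wise error rate at alpha
    return alpha / m

def bonferroni_reject(p_values, alpha=0.05):
    # reject only the tests whose p-value clears the corrected threshold
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]

ps = [0.0001, 0.004, 0.03, 0.20]
print(bonferroni_reject(ps))    # threshold is 0.05 / 4 = 0.0125
# -> [True, True, False, False]
```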

Related Tools

For computing geometric means (used in some statistical contexts), see the Geometric Mean Calculator. For arithmetic means and other basic statistics, the Average Calculator is the right tool. For raw percentage math (often used alongside p-values), see the Percentage Calculator.