Two-Way ANOVA calculation using summary data (mean, SD, sample size)

I want to perform a two-way ANOVA using the groups' means, standard deviations, and sizes. There is this website, but it only handles one-way ANOVA. There are also R packages and other programs that can do one-way ANOVA from summary data. However, I do not know of any packages, macros, programs, or code that can perform a two-way ANOVA from summary data. If you know of one, please point me to it. If not, please guide me in writing an R package for it.

asked Dec 3, 2014 at 17:26

Comment (Dec 3, 2014 at 21:28): Is it impossible for you to get the actual data?

Comment (Dec 19, 2018 at 12:47): R package for ANOVA from summaries: stackoverflow.com/questions/26170002/…

2 Answers

Well, if it's a one-off you can always do the calculation "by hand" (in R or any other suitable calculation tool) -- it's not hard to find the formulas for a two-way ANOVA and rewrite those in terms of summary statistics.
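As a rough illustration of the "by hand" route, here is a minimal R sketch for the balanced case only (the same $n$ in every cell), where the usual sums of squares can be written directly in terms of the cell means and standard deviations. The function name anova2_from_summary and its arguments are hypothetical, and the example numbers are made up; unbalanced designs need a different decomposition and are not covered by this sketch.

```r
# Hypothetical helper (not from any package): two-way ANOVA table from
# cell summaries, ASSUMING a balanced design (same n in every cell).
# means, sds: a x b matrices of cell means and cell sample SDs.
anova2_from_summary <- function(means, sds, n) {
  stopifnot(all(dim(means) == dim(sds)), n > 1)
  a <- nrow(means)
  b <- ncol(means)
  grand <- mean(means)            # grand mean (valid because n is equal)
  row_m <- rowMeans(means)        # marginal means of factor A
  col_m <- colMeans(means)        # marginal means of factor B

  ss_a  <- n * b * sum((row_m - grand)^2)
  ss_b  <- n * a * sum((col_m - grand)^2)
  ss_ab <- n * sum((means - outer(row_m, col_m, "+") + grand)^2)
  ss_e  <- (n - 1) * sum(sds^2)   # pooled within-cell sum of squares

  df <- c(a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1))
  ss <- c(ss_a, ss_b, ss_ab, ss_e)
  ms <- ss / df
  f  <- c(ms[1:3] / ms[4], NA)
  p  <- c(pf(f[1:3], df[1:3], df[4], lower.tail = FALSE), NA)

  data.frame(Df = df, SumSq = ss, MeanSq = ms, F = f, p = p,
             row.names = c("A", "B", "A:B", "Error"))
}

# Usage with made-up summaries for a 2 x 2 design, n = 5 per cell
cell_means <- matrix(c(10, 12, 14, 20), nrow = 2)
cell_sds   <- matrix(2, nrow = 2, ncol = 2)
anova2_from_summary(cell_means, cell_sds, n = 5)
```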

However, I'm going to suggest simple simulation. Since the answers can in principle be obtained from suitable summary statistics, simply construct samples of the appropriate sizes that exactly reproduce the summary statistics (this is relatively straightforward and is addressed in a couple of questions on this site). You do this individually for each cell of your two-way table. You can then call any function that can do the calculation.

To make the answer generically useful I'll describe the approach in general terms first.

A basic algorithm for a given cell with known mean $m$ and standard deviation $s$ and cell-sample-size $n$ is:

  1. generate a normal sample of size $n$
  2. standardize it to z-scores, $z_i$, $i=1,2,\ldots,n$
  3. compute $y_i=m+s\,z_i$

Repeat for every cell, and you're done. This works as long as $n>1$ in every cell.
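The three steps for a single cell can be sketched in R as follows. The function name make_cell is hypothetical, and the sketch assumes the reported standard deviation is the usual sample SD (denominator $n-1$), which is what scale and sd use.

```r
# Hypothetical helper: build a sample of size n whose sample mean is
# exactly m and whose sample SD (n - 1 denominator) is exactly s.
make_cell <- function(n, m, s) {
  stopifnot(n > 1)                 # the SD is undefined for n = 1
  x <- rnorm(n)                    # step 1: any normal sample
  z <- as.vector(scale(x))         # step 2: z-scores (mean 0, SD 1)
  m + s * z                        # step 3: shift and rescale
}

# Usage: the summaries are reproduced up to floating-point error
y <- make_cell(n = 8, m = 10, s = 2)
c(mean = mean(y), sd = sd(y))
```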

In R, step 1 would use rnorm, step 2 would use scale, and step 3 is a straightforward calculation. These operate inside a double loop that fills out the full data vector and the row/column group vectors, though there are ways to avoid loops if you have gigantic numbers of cells.
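A minimal sketch of that double loop, ending in a call to aov, might look like the following. The summary matrices means, sds, and ns are made-up illustrative values, and the factor labels are arbitrary. Note that with unequal cell sizes aov reports sequential sums of squares, so the order of the terms in the formula matters.

```r
set.seed(1)  # only affects the simulated values, not their summaries

# Made-up cell summaries for a 2 x 2 design (rows = factor A, cols = factor B)
means <- matrix(c(10, 12, 14, 20), nrow = 2)
sds   <- matrix(c(2, 2.5, 1.5, 3), nrow = 2)
ns    <- matrix(c(5, 6, 4, 7), nrow = 2)

y <- numeric(0)
A <- character(0)
B <- character(0)
for (i in seq_len(nrow(means))) {
  for (j in seq_len(ncol(means))) {
    n_ij <- ns[i, j]
    z <- as.vector(scale(rnorm(n_ij)))       # steps 1 and 2
    y <- c(y, means[i, j] + sds[i, j] * z)   # step 3
    A <- c(A, rep(paste0("a", i), n_ij))     # row group labels
    B <- c(B, rep(paste0("b", j), n_ij))     # column group labels
  }
}

dat <- data.frame(y = y, A = factor(A), B = factor(B))
fit <- aov(y ~ A * B, data = dat)
summary(fit)
```

Checking tapply(dat$y, list(dat$A, dat$B), mean) and the corresponding sd call against the input matrices is a quick way to confirm that the constructed data reproduce the summaries.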