Validation: Distinguishing Small Samples

Suppose that we have two small samples of performance measurements, perhaps as follows:

Measurement #    Sample 1    Sample 2

Each sample consists of five measurements, in transactions per second, taken using two slightly different software implementations on identical hardware systems. What can we conclude from these samples?

Did the difference in software implementation result in a significant change in performance?


Jan. 4th, 2011 06:23 pm (UTC)
Assume normal distribution?
What leads you to believe these samples come from a normal distribution? If you can't assume that, then I believe a suitable tool is the Wilcoxon rank sum test (implemented by the wilcox.test function in R). That test returns a p-value of 0.22. Without further information I don't think you can make a strong case that the means are different.
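For readers who want to see what the rank sum test actually computes, here is a minimal sketch in Python. It assumes there are no tied values, and it uses made-up measurements: sample_1 and sample_2 below are hypothetical placeholders, not the numbers from the post, so the p-value it prints will not match the 0.22 quoted above. The function name rank_sum_exact is likewise just an illustrative choice. With only five measurements per sample, the exact p-value can be found by enumerating all 252 ways of assigning five of the ten ranks to the first sample.

```python
from itertools import combinations


def rank_sum_exact(sample_a, sample_b):
    """Exact two-sided Wilcoxon rank sum test (sketch; assumes no ties).

    Returns (rank sum of sample_a, two-sided p-value).
    """
    pooled = sorted(sample_a + sample_b)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")

    # Rank 1 goes to the smallest pooled value, rank n to the largest.
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_a = len(sample_a)
    n = len(pooled)
    observed = sum(rank[v] for v in sample_a)

    # Under the null hypothesis, every choice of n_a ranks for sample_a is
    # equally likely and the expected rank sum is n_a * (n + 1) / 2.
    # Distances are doubled so the arithmetic stays in integers.
    center_x2 = n_a * (n + 1)
    observed_dist = abs(2 * observed - center_x2)

    extreme = 0
    total = 0
    for ranks in combinations(range(1, n + 1), n_a):
        total += 1
        if abs(2 * sum(ranks) - center_x2) >= observed_dist:
            extreme += 1
    return observed, extreme / total


# Hypothetical measurements in transactions per second (NOT the post's data).
sample_1 = [101.2, 99.8, 103.5, 100.4, 102.1]
sample_2 = [104.0, 102.9, 105.3, 100.9, 106.2]

w, p = rank_sum_exact(sample_1, sample_2)
print(f"rank sum of sample 1 = {w}, two-sided p = {p:.4f}")
```

Because the test looks only at ranks, it makes no assumption of normality. The price is coarseness: with five measurements per sample, the smallest two-sided p-value this test can ever produce is 2/252, about 0.008, and that happens only when the two samples do not overlap at all.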
