Paired Difference Calculations for Lab Analysis Comparisons

Labs that provide the backbone for various industries often find themselves trying to compare test data from labs not only in another part of the country, but now, because of globalization, even from another continent.

Comparing test results from various sites will be useless if benchmark samples used to compare laboratory results cannot be duplicated from lab to lab. Doing so in a cost-effective manner while still maintaining precise, accurate, and repeatable data can be quite a challenge.

The logistics and costs of having lab coordinators travel across the country to compare test results are becoming prohibitive, yet coordinators still need to ensure that the results submitted by any lab are indeed acceptable.

The author spent five years compiling the data to show that mineral contaminants in the parts-per-billion and -trillion ranges were enough to have an enormous influence on an industrial process that used treated river water as its industrial water supply. Although the water was treated by filtering and reverse osmosis, mineralization in the water used in the lab also had a drastic effect on some critical product tests—effects that had gone unexplained for decades. Now imagine the differences found in raw water around the globe. Trying to make sense of test anomalies can be quite frustrating for any lab coordinator.

Testing discrepancies in baseline standardized samples sent to various labs for comparison can have a number of causes, including but not limited to the following:

Variances in lab chemicals from different suppliers of the same reagent. Lab chemical suppliers will use different stabilizers and color additives to meet their own ISO requirements. The “1% other” sometimes noted in the composition label can be enough to cause analytical comparison problems between labs.
Different analytical instruments with varying specs, such as atomic absorption spectrometers and gas chromatography spectrometers from different manufacturers. It is often very hard to compare such terminology as “gain, frequency discrepancies, and percent variability,” since instrument manufacturers usually use in-house terminology and criteria that bear no resemblance to their competitors’. Buyer beware is the name of the game.
Variations in lab water quality that is used to mix or dissolve reagents or to prepare product samples for testing. Some labs may use distillers, some may use reverse osmosis, and still others may rely on filtering systems, each of which may vary the basic dissolved ion content of the lab water. Differences in the parts-per-billion range can greatly affect many tests, as the author has found in his own work with graphite furnace AAS.
Auto pipettes and dispensers have inherent amounts of variability between models of even the same make. To ensure accurate, precise, and repeatable volumes, they need to be routinely calibrated—daily, at the very least.
Lab chemists and technicians have varying levels of skill and mastery of certain techniques. Slight differences in manual operations can greatly increase test variability, especially when very small pipette tips are used.

With all these potential internal nuances, a lab coordinator must have some basic statistical analysis tools to quickly find and identify latent corrupters of product test data. Although linear regression is one such powerful statistical tool, it usually requires that a large number of tests be performed and compared so that the mathematical results are conclusive and accurate.

But again, time is money, and money may be in tight supply because of corporate auditors and local cost-reduction programs. Other statistical tools are just as accurate and rely on a much smaller number of repeated test results to show that the numbers compared are indeed either not acceptable or are, in fact, precise, accurate, and repeatable.

One such tool is the paired difference calculation, which can work with as few as three samples, each tested in triplicate, and evaluates the results at the 95 percent confidence level. The following example compares the results of four hypothetical labs, one of which is designated as the reference against which the other three are compared. If the absolute mean difference between a lab and the reference falls within the 95 percent confidence interval, so that the significant-difference check comes back NO, then that lab can be assumed to be providing test data that is acceptable for a given test type when compared with a benchmark that is presumed correct. Three samples run in triplicate are much more cost-effective than the 20, 30, or even more tests required for linear regression.

Figure 1: Test results from four labs on three samples divided into equal volumes. Each was then subdivided and run in triplicate at the labs.
RAW DATA
Sample Description	Central Lab Viscosity	Satellite Lab 1 Viscosity	Satellite Lab 2 Viscosity	Satellite Lab 3 Viscosity
Liquid A (1)	33.2	35.7	34.2	33.7
Liquid A (2)	32.3	35.7	34.5	33.8
Liquid A (3)	32.9	35.7	34.5	33.9
Liquid B (1)	35.2	38.4	35.9	36.3
Liquid B (2)	35.2	37.9	36.9	36.7
Liquid B (3)	35.1	38.2	36.1	36.8
Liquid C (1)	38.4	41.5	40.8	41.1
Liquid C (2)	38.5	42.6	41.6	41.2
Liquid C (3)	39.4	42.1	41.7	40.3

In this example, we will assume we are checking the viscosity test for industrial liquid lubricant products made at four different sites around the country, all belonging to the same corporation. As the lab coordinator at one of these labs (Satellite Lab 3, the reference in the paired difference calculations), we will compare our data with that generated by the other three, one of which is the corporate central lab. Our mission is to find out whether the other labs are, in fact, up to the task of producing accurate, precise, and repeatable test results.

Figure 2: Summarized Data.
SUMMARY DATA
Sample #	Central Lab					Satellite Lab 1					Satellite Lab 2					Satellite Lab 3
	Test #1	Test #2	Test #3	Range	Avg	Test #1	Test #2	Test #3	Range	Avg	Test #1	Test #2	Test #3	Range	Avg	Test #1	Test #2	Test #3	Range	Avg
Liquid A	33.2	32.3	32.9	0.9	32.8	35.7	35.7	35.7	0.0	35.7	34.2	34.5	34.5	0.3	34.4	33.7	33.8	33.9	0.2	33.8
Liquid B	35.2	35.2	35.1	0.1	35.2	38.4	37.9	38.2	0.5	38.2	35.9	36.9	36.1	1.0	36.3	36.3	36.7	36.8	0.5	36.6
Liquid C	38.4	38.5	39.4	1.0	38.8	41.5	42.6	42.1	1.1	42.1	40.8	41.6	41.7	0.9	41.4	41.1	41.2	40.3	0.9	40.9
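
The Range and Avg columns can be recomputed directly from the raw results in Figure 1, which is a useful cross-check before any further statistics are run. The Python sketch below is one way to do it; the dictionary layout and the `summarize` helper are illustrative assumptions, not part of the original spreadsheet.

```python
from statistics import mean

# Raw viscosity results from Figure 1, keyed by lab and then by sample.
raw = {
    "Central Lab": {
        "Liquid A": [33.2, 32.3, 32.9],
        "Liquid B": [35.2, 35.2, 35.1],
        "Liquid C": [38.4, 38.5, 39.4],
    },
    "Satellite Lab 1": {
        "Liquid A": [35.7, 35.7, 35.7],
        "Liquid B": [38.4, 37.9, 38.2],
        "Liquid C": [41.5, 42.6, 42.1],
    },
    "Satellite Lab 2": {
        "Liquid A": [34.2, 34.5, 34.5],
        "Liquid B": [35.9, 36.9, 36.1],
        "Liquid C": [40.8, 41.6, 41.7],
    },
    "Satellite Lab 3": {
        "Liquid A": [33.7, 33.8, 33.9],
        "Liquid B": [36.3, 36.7, 36.8],
        "Liquid C": [41.1, 41.2, 40.3],
    },
}


def summarize(results):
    """Return (range, average) for one set of replicates, rounded to 0.1."""
    return round(max(results) - min(results), 1), round(mean(results), 1)


for lab, samples in raw.items():
    for sample, results in samples.items():
        rng, avg = summarize(results)
        print(f"{lab:16} {sample}: range={rng:.1f}  avg={avg:.1f}")
```

A large range within one lab points to a repeatability problem inside that lab, while a shift in the averages between labs is what the paired difference calculation is meant to test.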

The reason is simple. If the control test results show no significant difference, then our salespeople can market the production liquid products to customers and guarantee that the specs are exactly the same and produce the same results. This way, whichever producer is closer to the customer can fill the order; otherwise, shipping costs could be increased prohibitively if the products must be shipped from a plant that is farther away from the intended user.

Figure 3: Paired Difference Calculations. YES means there is a difference in test results on the same sample. The cause needs to be identified and eliminated.

Each lab is provided with three standard samples taken from the same production lot or from some other acceptable reagent source that is used as a benchmark. The labs then split these samples into three parts, and the selected tests are run. This will give three sets of data, each done in triplicate. The most important part is where the data is determined to be statistically different within the 95 percent confidence interval. It is in a YES-or-NO format to make it simple to see right away.

If the absolute value of the mean difference is greater than the 95 percent confidence interval, then there is a statistical difference in the data, and YES comes up, showing it is not acceptable data. The reasons for the unacceptability must be determined. The points in the aforementioned list of lab variables will have to be studied in detail to identify the culprit.
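
Written out, the decision rule can be sketched as follows. The article does not state the formula behind the 95 percent confidence interval row, so the form below, the half-width of a two-sided Student's t interval on the paired differences, is an assumption. It was chosen because it reproduces the tabulated values: for n = 9 and a standard deviation of 0.77, with t ≈ 2.306 for 8 degrees of freedom, it gives about 0.59.

```latex
d_i = x_{\mathrm{ref},i} - x_{\mathrm{other},i}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^{2}}

\mathrm{CI}_{95} = t_{0.975,\,n-1}\,\frac{s_d}{\sqrt{n}}, \qquad
\text{significant difference (YES)} \iff \left|\bar{d}\right| > \mathrm{CI}_{95}
```

Here each d_i is the difference between the reference lab's result and the other lab's result on the same replicate, so n = 9 for three samples run in triplicate.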

If NO comes up, then there is no statistically significant difference between the test results, and it can be assumed that the labs are providing accurate, precise, and repeatable data that can be compared with each other, regardless of point of origin, in a manner acceptable to both the salespeople and the customer. This statistical method is far more cost-effective than having all four labs run 20 or more tests for regression analysis, with 30 usually being the accepted number.

Figure 4: Paired Difference Calculations. YES means there is a difference in test results on the same sample. The cause needs to be identified and eliminated.
PAIRED DIFFERENCES CALCULATION
Satellite Lab 3 Viscosity (Reference)	Satellite Lab 3 vs Central Lab	Satellite Lab 3 vs Satellite Lab 1	Satellite Lab 3 vs Satellite Lab 2	Satellite Lab 3 vs Satellite Lab 3
33.7	0.5	-2	-0.5	#N/A
33.8	1.5	-1.9	-0.7	#N/A
33.9	1	-1.8	-0.6	#N/A
36.3	1.1	-2.1	0.4	#N/A
36.7	1.5	-1.2	-0.2	#N/A
36.8	1.7	-1.4	0.7	#N/A
41.1	2.7	-0.4	0.3	#N/A
41.2	2.7	-1.4	-0.4	#N/A
40.3	0.9	-1.8	-1.4	#N/A
n =	9	9	9	0
Mean Difference =	1.51	-1.56	-0.27	#N/A
Std Dev =	0.77	0.53	0.65	#N/A
95% Confidence Interval	0.59	0.41	0.50	#N/A
Significant Difference?	YES	YES	NO	#N/A
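
The rows of Figure 4 can be reproduced from the raw data with a few lines of Python. This is a minimal sketch, not the author's spreadsheet: the function name `paired_difference` is hypothetical, and the confidence interval is assumed to be the half-width of a two-sided Student's t interval with a hard-coded critical value of 2.306 (8 degrees of freedom). The source does not state its formula, but this assumption matches the tabulated figures.

```python
from math import sqrt
from statistics import mean, stdev

# Raw viscosity results from Figure 1, in row order (A1..A3, B1..B3, C1..C3).
central = [33.2, 32.3, 32.9, 35.2, 35.2, 35.1, 38.4, 38.5, 39.4]
sat1 = [35.7, 35.7, 35.7, 38.4, 37.9, 38.2, 41.5, 42.6, 42.1]
sat2 = [34.2, 34.5, 34.5, 35.9, 36.9, 36.1, 40.8, 41.6, 41.7]
sat3 = [33.7, 33.8, 33.9, 36.3, 36.7, 36.8, 41.1, 41.2, 40.3]

# Assumption: two-sided 95% Student's t critical value for n - 1 = 8
# degrees of freedom. Use a different value if n changes.
T_CRIT_DF8 = 2.306


def paired_difference(reference, other, t_crit=T_CRIT_DF8):
    """Compare two labs' results on the same replicates, pair by pair."""
    diffs = [r - o for r, o in zip(reference, other)]
    n = len(diffs)
    d_bar = mean(diffs)                   # mean difference
    s_d = stdev(diffs)                    # sample standard deviation (n - 1)
    ci95 = t_crit * s_d / sqrt(n)         # half-width of the 95% interval
    return {
        "n": n,
        "mean": d_bar,
        "std_dev": s_d,
        "ci95": ci95,
        "significant": abs(d_bar) > ci95,  # YES when True
    }


for name, other in [("Central Lab", central),
                    ("Satellite Lab 1", sat1),
                    ("Satellite Lab 2", sat2)]:
    r = paired_difference(sat3, other)
    flag = "YES" if r["significant"] else "NO"
    print(f"Satellite Lab 3 vs {name}: mean={r['mean']:.2f} "
          f"sd={r['std_dev']:.2f} ci95={r['ci95']:.2f} significant={flag}")
```

Because the differences are taken pair by pair, the order of the replicates must be the same in every list. The hard-coded critical value only applies to nine pairs; a different number of replicates would need the matching value from a t table.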

Paired difference calculations are a quick, cost-conscious way to flag satellite labs whose results need closer attention. At the same time, they offer the sales force the tools needed to help sell and move products in a very competitive and shrinking world environment, where international trade agreements are making it easier for one’s competition to move in on clients, both present and potential.