Comparing a Continuous Outcome Between Two Groups Parametric SAS

Problem Statement

The sample dataset has placement test scores (out of 100 points) for four subject areas: English, Reading, Math, and Writing. Each student took all four placement tests. Suppose we are particularly interested in the English and Math sections, and want to determine whether English or Math had higher test scores on average. We could use a paired t test to test if there was a significant difference in the average of the two tests.

Before the Test

State the Null and Alternative Hypotheses

The hypotheses for this example can be expressed as:

H₀: µ_English - µ_Math = 0 ("the difference between the average English and Math scores is equal to 0")
H₁: µ_English - µ_Math ≠ 0 ("the difference between the average English and Math scores is not 0")

Before we perform our hypothesis test, we should decide on a significance level (denoted α). The significance level is the threshold we will use to decide whether a test result is significant. For this example, let's use α = 0.05, or 5%.

Data Set-Up

In the sample dataset, each student's responses are recorded on one row. Their English and Math scores are represented in the variables English and Math. This format is already appropriate for the paired samples t-test, so no further restructuring is needed.
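
Although no restructuring is needed, it can help to look at the difference scores that the paired test works with. The following is a minimal sketch, assuming the data are available as a SAS dataset named sample with numeric variables English and Math; the dataset name diffs and the variable name diff are hypothetical and chosen only for illustration.

```sas
/* Hypothetical helper dataset: one difference score per student. */
data diffs;
    set sample;
    /* Keep only students who have both scores (the complete pairs). */
    if not missing(English) and not missing(Math);
    /* Same subtraction order as PAIRED English*Math. */
    diff = English - Math;
run;

/* Print the first few rows to confirm the one-row-per-student layout. */
proc print data=diffs(obs=5);
    var English Math diff;
run;
```

A positive diff indicates a higher English score and a negative diff a higher Math score.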

Running the Test

SAS Program

    PROC TTEST DATA=sample ALPHA=.05;
        PAIRED English*Math;
    RUN;

Output

Tables

After executing the SAS program above, SAS produces the following set of tables:

The heading "Difference: English - Math" tells us the order of the subtraction used for these numbers. This is important, since it determines how we interpret positive and negative numbers. Because Math is subtracted from English, positive numbers correspond to higher English scores, and negative numbers correspond to higher Math scores.

In the first table, we have descriptive statistics for the difference scores:

N: The effective sample size (students who had both an English score and a Math score).
Mean and Std Dev: The average of the difference scores and their standard deviation. On average, students had a 17.3-point difference between their English and Math scores (with a standard deviation of 9.5).
Std Err: The standard error of the mean difference, s/sqrt(n), where s is the standard deviation of the difference scores and n is the number of pairs.
Minimum: The smallest difference score observed in the sample. Here, the score -10.64 represents a student who scored 10.64 points higher on their Math test than their English test.
Maximum: The largest difference score observed in the sample. Here, the score 41.69 represents a student who scored 41.69 points higher on their English test than their Math test.
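
The descriptive statistics in this table can be cross-checked by computing the difference scores directly. This is a sketch under the same assumptions as the main program (a dataset named sample with variables English and Math); diffs and diff are hypothetical names introduced here.

```sas
/* Hypothetical helper dataset holding English minus Math for each student. */
data diffs;
    set sample;
    diff = English - Math;   /* missing if either score is missing */
run;

/* PROC MEANS skips missing values, so N counts only complete pairs. */
proc means data=diffs n mean std stderr min max maxdec=2;
    var diff;
run;
```

If the data match, the values should agree with the first table of the PROC TTEST output, up to rounding.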

In the second table, we have the 95% confidence intervals for the mean difference and the standard deviation of the differences.

In the third table, we have the actual paired t test results. The p-value is very small (p < .0001), so we reject the null hypothesis that the average English and Math scores were the same, and conclude that the English scores had a significantly different average than the Math scores.

Graphs

Graph 1: Distribution of Difference Scores

The first graph depicts the distribution of the difference scores, using both a histogram (top panel) and a boxplot (bottom panel).

If this histogram were centered about 0, it would correspond to no difference between the two test scores; however, the highlighted region in the boxplot shows that the center of the distribution is between 17 and 18.

In the histogram:
- The blue line shows the shape of a normal distribution with the mean and standard deviation from this sample.
- The red line shows the kernel density estimate - a type of approximation for the shape of a distribution. If the scores were perfectly normally distributed, the kernel density estimate would "line up" with the normal approximation.

In the boxplot:
- The box's center line shows the median, while the diamond shows the mean.
- There are two outliers on the low end; these represent individuals who scored 7-10 points higher on the Math test than the English test.

Graph 2: Profile Plot

The second graph is a paired profile plot. Profile plots depict the "trajectory" of individuals. Each line represents one subject or case. On the left side is the person's English score; on the right side is their Math score. (Note that the axes on both sides have the same range; this is an important feature of profile plots.) By looking at the slope of these lines, we can get a feel for whether the scores are approximately equal (horizontal lines), or if one score was higher than the other (sloping lines).

Although some of the lines are roughly horizontal, most of the lines tend to have a downward slope. Since English is on the left and Math is on the right, this corresponds to most students scoring higher on the English placement test than the Math placement test. The red line showing the average trend confirms this.

Graph 3: Agreement Plot

The third graph shows the "agreement" of the two scores. The plot itself is a variation on a simple scatterplot. The diagonal reference line represents identical English and Math scores. Datapoints that fall on this line (or near this line) represent students who scored the same on their English and Math tests. Datapoints above the line represent students who scored higher on the Math test than the English test. Points below the line represent students who scored higher on English than Math. In this graph, we see many more points below the line than above the line, which means that most students scored higher on English than on Math.
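
If only some of the graphs are wanted, PROC TTEST accepts a PLOTS= option. The sketch below requests just the profile and agreement plots; it assumes ODS Graphics is available in your SAS installation (the ODS GRAPHICS ON statement is harmless if graphics are already enabled).

```sas
/* Make sure ODS Graphics is turned on before requesting plots. */
ods graphics on;

/* ONLY suppresses the default plots; PROFILES and AGREEMENT apply to paired designs. */
proc ttest data=sample alpha=.05 plots(only)=(profiles agreement);
    paired English*Math;
run;
```

Dropping the ONLY keyword would add these two plots to the default set instead of replacing it.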

Graph 4: Quantile-Quantile (Q-Q) Plot of Normality for Differences

The fourth graph is a Q-Q plot, or quantile-quantile plot, of the difference scores. Q-Q plots are used to inspect whether an observed variable (represented as points) matches what we would expect that variable to look like if it were truly normally distributed (represented as a solid line). To read a Q-Q plot, we look to see if the dots (the observed values) match up with the expected values for a normal distribution (the diagonal line). If the points fall along the line, then the values are consistent with what we would expect them to be if the data were truly normally distributed. Here, we see that the majority of the difference scores fall on the diagonal line, so we can say that the data appear to be approximately normally distributed.
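
The visual check can be supplemented with formal normality tests on the difference scores. The following is one possible sketch, not part of the original program, using PROC UNIVARIATE; diffs and diff are hypothetical names. With a sample of this size, formal tests can flag even minor departures from normality, so the results are best read alongside the Q-Q plot rather than in place of it.

```sas
/* Hypothetical helper dataset holding English minus Math for each student. */
data diffs;
    set sample;
    diff = English - Math;
run;

/* NORMAL requests tests for normality (for example, Shapiro-Wilk).          */
/* QQPLOT draws a Q-Q plot with a reference line based on the sample mean    */
/* and standard deviation.                                                   */
proc univariate data=diffs normal;
    var diff;
    qqplot diff / normal(mu=est sigma=est);
run;
```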

Decision and Conclusions

Since p < .0001 is less than our chosen significance level α = 0.05, we can reject the null hypothesis, and conclude that the English and Math scores were significantly different from each other.

Based on the results, we can state the following:

There was a significant difference in the average English and Math scores (t₃₉₇ = 36.31, p < .05).
On average, students scored 17.3 points higher on the English test than the Math test (95% confidence interval [16.36, 18.23]).
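
To see how these numbers relate to one another, the test statistic and confidence interval can be approximately reconstructed from the summary statistics. This is an illustrative sketch only: it uses the rounded values reported above (mean difference 17.3, standard deviation 9.5) and assumes n = 398, which is inferred from the 397 degrees of freedom rather than stated in the output description. All variable names are hypothetical.

```sas
data _null_;
    /* Rounded summary values from the text; n is inferred from df = 397. */
    mean_diff = 17.3;
    sd_diff   = 9.5;
    n         = 398;
    alpha     = 0.05;

    df     = n - 1;
    se     = sd_diff / sqrt(n);              /* standard error of the mean difference */
    t      = mean_diff / se;                 /* paired t statistic under H0: mean difference = 0 */
    p      = 2 * (1 - probt(abs(t), df));    /* two-sided p-value */
    t_crit = tinv(1 - alpha/2, df);          /* critical value for the confidence interval */
    lower  = mean_diff - t_crit * se;
    upper  = mean_diff + t_crit * se;

    /* Write the results to the SAS log. */
    put df= t= 8.2 p= pvalue6.4 lower= 8.2 upper= 8.2;
run;
```

Because the inputs are rounded, the results should land close to, but not necessarily exactly on, the reported t statistic and confidence limits.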


Source: https://libguides.library.kent.edu/SAS/PairedSamplestTest