Experiments are designed to demonstrate a causal relationship between the independent variable (IV) and the dependent variable (DV). To do this, it is essential that anything else that might affect the DV remains constant (the same) between the two conditions. Otherwise, it will be impossible to determine what has brought about any resulting changes in the DV: is it the IV, or is it the uncontrolled (confounding) variable? When we are unsure whether changes in the DV are caused by the IV, we have a problem with internal validity.
There are many factors that can affect internal validity, but before we look at them, let’s consider a simple example:
Imagine I want to test whether the type of music Pps listen to affects their ability to remember a word list. All the Pps in condition 1 study the words whilst listening to light classical music, whilst all the Pps in condition 2 study the same word list whilst listening to some hard rap. I want to see whether the type of music (classical or rap, the IV) affects the number of words that the Pps are able to remember in the final test (the DV). Both groups study the list of 20 words for 3 minutes whilst listening to either classical or rap. They then all complete a 1-minute distractor task (crossing out the letter E in a piece of text), and then they are asked to write down as many words as they can remember in 3 minutes.
Situational variables are variables which might affect the DV if they are not held constant between the two or more levels of the IV. Most situational variables can be controlled for using a standardised procedure, which ensures that only the IV changes between the two conditions or groups. Anything relating to the situation, other than the music itself, which is different for members of the first group compared with the second group is classed as a situational variable. It becomes a second IV and makes it impossible to determine the effects of our chosen IV (type of music) on the DV (number of words recalled). Can you think of any factors relating to the situation (other than the type of music) which might mean the members of one group are able to recall more or fewer words in the test than the other group?
Extraneous and Confounding Variables
Extraneous variables are “extra”, or in addition to, the variable that we are interested in for our study, that is, our chosen IV. There are lots of things that could affect the DV for each person in a study, and a good researcher will think carefully about these in advance. Some may vary randomly for each person and have little impact on the overall outcome, particularly if we ensure we have enough participants in our study. However, some extraneous variables are important to consider as they could impact the internal validity of the study. So, in our music example, if every member of one group has the window open in the room (making it a bit colder and noisier) and the other group has the window closed, this becomes a second IV. At this point we would call this situational variable a confounding variable, as it has systematically affected every person in one group but nobody in the other group. Confounding variables need to be eliminated, otherwise our findings will lack internal validity.
Pp variables relate to anything about the individuals in the study, other than the IV, that might systematically affect the DV. They can therefore be considered a type of confounding variable that needs to be controlled for in some way. If we are using an independent measures design and all the people in group 1 (listening to classical music) have a higher IQ than the people in group 2 (listening to rap), we again have a second IV and a problem with our internal validity. If group 2 remember fewer words on average than group 1, we don’t know whether this was caused by the music style interfering with their ability to encode the words or whether it was because they had a lower IQ than the other group. There are a variety of ways in which we can deal with this problem. One is to switch to a repeated measures design, where each Pp performs in both conditions; however, this presents its own problems. If we want to stick with independent measures, we can randomly allocate our Pps to the two conditions (e.g. putting all the Pps’ “names in a hat”, with the first ten names drawn from the hat allocated to condition A and the second ten names to condition B). This should make the two groups more comparable, with each made up of people with varying levels of IQ. We can also use a matched pairs design, which is discussed elsewhere on this website. Researchers need to think very carefully about Pp variables which might impact the DV and decide how these will be managed in order to preserve internal validity.
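The “names in a hat” method described above can be sketched in a few lines of Python. This is only an illustration: the function name `randomly_allocate`, the Pp labels and the `seed` parameter are made up for the example, and it assumes 20 Pps split equally between conditions A and B.

```python
import random


def randomly_allocate(participants, seed=None):
    """Shuffle the Pps (the "hat") and split them into two equal groups."""
    rng = random.Random(seed)      # seed is optional; it only makes the draw repeatable
    hat = list(participants)       # copy so the original list is left untouched
    rng.shuffle(hat)               # mix up the names in the hat
    half = len(hat) // 2
    # First half of the names drawn go to condition A, the rest to condition B
    return {"A": hat[:half], "B": hat[half:]}


# Hypothetical participant labels, Pp01 to Pp20
participants = [f"Pp{i:02d}" for i in range(1, 21)]
groups = randomly_allocate(participants, seed=1)

print("Condition A:", groups["A"])
print("Condition B:", groups["B"])
```

Because the draw is random, nothing about a Pp (such as their IQ) can decide which group they end up in, so any differences between the groups should be down to chance rather than a systematic bias.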
In a repeated measures design, it is possible that Pps will perform differently the second time they take part in the study, not due to the action of the independent variable (whatever that may be), but simply because they have done the two trials in a specific order. The second time we do something we are usually a little more practised and may therefore perform better (practice effects). We may also be a little more tired the second time around, especially if the tasks are time-consuming and repetitive, and may therefore perform a little worse (fatigue or boredom effects). This means it can be tricky to know whether differences in the DV on trial 2 are due to the IV or due to order effects, the collective term for fatigue, boredom and practice effects. If we cannot tell, we have poor internal validity. There are two ways to deal with this problem: counterbalancing or randomisation.
Counterbalancing is also known as the AB-BA design. In a repeated measures design, half the Pps do trial A followed by trial B and the other half do trial B followed by trial A. This way any order effects are counterbalanced by the other half of the group doing the trials the other way around, thus enhancing the internal validity! NEAT!
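As a rough sketch of the AB-BA idea, the Python below gives half the Pps the order A then B and the other half B then A. The function name `counterbalance` and the Pp labels are invented for the example, and shuffling the Pps before splitting them is an added assumption, so that who gets which order is not decided by the researcher.

```python
import random


def counterbalance(participants, seed=None):
    """Assign half the Pps the order A-B and the other half the order B-A."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)          # assumption: decide at random who goes in which half
    half = len(shuffled) // 2
    orders = {}
    for pp in shuffled[:half]:
        orders[pp] = ("A", "B")    # trial A first, then trial B
    for pp in shuffled[half:]:
        orders[pp] = ("B", "A")    # trial B first, then trial A
    return orders


# Hypothetical participant labels, Pp01 to Pp20
participants = [f"Pp{i:02d}" for i in range(1, 21)]
orders = counterbalance(participants, seed=1)

for pp in sorted(orders):
    print(pp, "->", " then ".join(orders[pp]))
```

With an even number of Pps, each trial is done first by exactly half the group and second by the other half, so practice and fatigue effects fall equally on A and B.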
With randomisation, we ensure that our participants have an equal chance of doing either trial A or trial B first, by flipping a coin for example. Randomising the trials in this way ensures that not all of our Pps do trial B second, which means that performance the second time around does not always affect the same trial. Those pesky order effects are spread across both trials rather than piling up on one of them, which enhances the internal validity.
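The coin flip can be sketched in Python as follows. Again this is only an illustration: the function name `randomise_order` and the Pp labels are made up, and `rng.choice` stands in for the coin.

```python
import random


def randomise_order(participants, seed=None):
    """Flip a coin for each Pp to decide which trial they do first."""
    rng = random.Random(seed)
    orders = {}
    for pp in participants:
        first = rng.choice(["A", "B"])           # the coin flip: equal chance of A or B
        second = "B" if first == "A" else "A"    # whichever trial is left goes second
        orders[pp] = (first, second)
    return orders


# Hypothetical participant labels, Pp01 to Pp20
participants = [f"Pp{i:02d}" for i in range(1, 21)]
orders = randomise_order(participants, seed=1)

a_first = sum(1 for order in orders.values() if order[0] == "A")
print(f"{a_first} Pps do trial A first, {len(orders) - a_first} do trial B first")
```

Unlike counterbalancing, a coin flip does not guarantee an exact half-and-half split, so with a small number of Pps one order may turn up noticeably more often than the other.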
Experimenter effects occur when the experimenter is aware of the hypothesis of the study and this consciously or unconsciously affects their perception, and potentially their recording, of the dependent variable. For example, a researcher may have certain expectations of their participants and may treat them accordingly, thus eliciting different responses from them. This can affect the internal validity of the findings, but it can be avoided by using “blind” researchers, who do not know the hypothesis of the study or which group the Pps belong to. In a single-blind study the Pps are unaware of the group they are in or the hypothesis, and in a double-blind study the Pps and the researchers are both blind to these all-important details.
Behaving in a socially desirable manner means behaving in a way which leads to acceptance by others and avoids rejection. In terms of research, participants adhere to cultural conventions which they believe will help to give a good impression of themselves. This tends to happen when Pps know that they are being watched, or in interviews, when people give a more socially acceptable account of how they feel or of things that have happened to them. This can hamper the validity of research findings. Some personality types and people from certain cultures are more prone to behaving in a socially desirable manner than others.
Demand characteristics refer to the problem that arises when participants pick up on cues about the aim of the experiment, which may be unintentionally provided by the researchers, leading the participants to behave in a way which they think will be pleasing to the experimenter. Again, some Pps will be more likely to behave in this manner than others; some will be better at picking up on the cues in the first place and some will be more likely than others to let this knowledge affect them. Along with social desirability and evaluation apprehension, demand characteristics are part of a wider issue termed “participant reactivity”, and once again all of these problems can affect the internal validity of the findings.