How can we designate the different methods of doing research in education in a manner that is useful for helping both students in education programs and professional teachers understand the range of possibilities available? As a doctoral student, I understood that the most basic distinction used was to describe research as “qualitative”, “quantitative”, or “mixed methods”; in other words, research that involved the analysis of words, numbers, or some blend of the two. However, these terms convey nothing with respect to what we actually want to do as researchers (describe a learning setting, try out a teaching technique, etc.) or how we want to do it (by observing and interviewing members of a class, randomly selecting learners and assigning them to separate groups, etc.).
Therefore, understanding the options available for conducting educational research requires that we begin with terms suitable for making helpful distinctions which can enable researchers to understand which method to choose when we wish to embark on a study. The first question to ask someone planning to conduct a research study is “What do you want to do?” This may elicit responses such as “I want to try out cooperative learning in my EAP class to encourage my students to try to speak more, and I want to know how effective it is,” or “I want to understand what it’s like to teach in an English immersion program.” Since the information being sought is very different in each of these two instances, the research approach chosen for selecting participants, collecting and analyzing data, and ensuring the reliability and validity of the findings will be very different. The teacher trying to get his or her students to speak more in class will find an action-research approach to be the best option, while the educational researcher interested in immersion education would find a naturalistic approach, by which a researcher seeks to describe a particular educational setting–or some aspect of it–as is, to be the preferred route to use.
Research approaches in education can be divided into three branches: general, administrative, and pedagogical.
General approaches are those selected by professional researchers for the purpose of generating knowledge to inform theory and/or practice. This branch of approaches includes: the aforementioned naturalistic approach, such as ethnographies and case studies, which are descriptive and subjective in nature; the experimental approach, through which studies are conducted to test one or more hypotheses by administering a treatment (a teaching technique or use of a material) and measuring the effects through the use of statistical analysis; and the multivariate approach, which involves the quantitative analysis of effects and relationships among factors affecting language learning. Administrative approaches are evaluative in nature and are undertaken to generate knowledge enabling policymakers at the program, institution, or district level to make decisions relating to staffing, funding, or program reform, for example. Finally, pedagogical approaches are conducted to improve classroom practices and may aim to inform theory as well. The action-research approach is conducted by one or more classroom teachers to find a solution to a teaching or learning issue in a particular course or school subject. Action research is a short-term venture, intended to find an immediate solution, if possible, for the problem at hand. Design(-based) research, on the other hand, is a long-term collaborative project involving both researchers and practitioners and is aimed not only at resolving a teaching issue, but in the process of doing so, generating knowledge that can result in innovative practices and updated theory. ¹
Once a researcher has decided on an approach, the next step is to select the most suitable design. Naturalistic approaches, being descriptive in nature, allow for either a thorough, overall portrayal (ethnography) of an educational milieu, such as an immersion program, or a more in-depth examination (case study) of one aspect of it, such as a class or teacher, to give two common examples. An experimental approach can involve either the random selection of participants (pure experiment) from a particular target population (e.g., international students enrolled in EAP programs in a particular region) so as to generalize the findings to the target group, or the use of one or more existing classes (one-group or quasi-experiment), which does not allow such generalizations but can have important local implications. A multivariate approach may involve a single group or multiple groups, and make use of a survey or factorial design, for example, to measure statistical relationships or effects among variables relating to language learning. A wide range of designs and statistical analysis options is possible, depending on the nature and number of independent and dependent variables of interest to the researcher. Finally, the pedagogical approach, being pragmatic in nature, is flexible with respect to the design chosen, depending on the number and size of the classes involved. Ethical concerns relating to matters such as the involvement of teachers researching their own classrooms dictate that the specific options available for a given study are limited to those which pose the least threat to the learners.
Contemplating research? What do you want to do?
¹Shah, J. K., Ensminger, D. C., & Thier, K. (2015). The Time for Design-Based Research Is “Right” and “Right Now.” Mid-Western Educational Researcher, 27(2), 152–171.
Identifying elements of credibility to evaluate assessment is relatively straightforward. Assessment, either formal or informal, is also a part of daily practice in the professional life of a teacher, and so an experienced teacher can approach this task confident of being able to make an accurate judgment–one that is valid, fair, and reliable–about his/her learners’ progress in their learning.
Research is a very different matter. The criteria for judging the credibility of a teacher’s research study are considerably more complex; what is more, research is not a daily activity of teachers, and for many in the profession, it may be altogether foreign, making any attempt to conduct a study that can be judged as credible a formidable task. Nevertheless, doing research in their classrooms can be of great value to teachers, providing opportunities for them to increase their knowledge of learning in their particular teaching specialty, to develop creativity, or to find solutions to issues affecting learning in their classes, and, through each of these means, to strengthen their professional development. Additionally, by sharing their findings through articles, webinars, and/or sessions at professional conferences, teachers can serve the professional-development needs of their colleagues.
In a study that examined the issue of credibility from the perspective of researchers engaged in the scholarship of teaching and learning (SOTL), Billot et al. (2017) found that SOTL researchers considered the quality of their research to be the most important aspect of their credibility as professionals. These researchers also considered getting their work published and conducting workshops and presentations at conferences to be of high importance. Kanuka (2011), in a review of SOTL research, argues that basing research on a strong theoretical framework (i.e., on theories relevant to the topic on which a researcher is focusing) and using proper methodology are critical aspects of credibility for teachers to apply when embarking on a study.
It is therefore important for teachers to understand how to conduct research in a manner that can be judged credible not only by local stakeholders but also by other professionals in our field. The purpose of this article is to present quality criteria for teachers to keep in mind as they plan, conduct, and analyze their classroom research so that their results will be well received in the professional community.
Criterion #1: Selecting the right research approach
The first question for any researcher to ask themselves is “What do I want to do?” This question leads to selecting the proper research approach for their planned study. Unfortunately, educational program developers and textbook writers have traditionally focused on research designs rather than approaches. Research designs, however, deal with the “how” of doing research, and this needs to be preceded by the “what”. If a researcher does not clearly understand “what” he or she wants to do, then deciding “how” is not at all helpful.
The nature of much learning–and language learning in particular–is such that studying it in a controlled environment such as a laboratory is not practicable. While experimental studies are often carried out in educational research, they are not the only option and very often do not offer the best means of helping an individual researcher or team achieve their stated purpose for conducting a study: “I/We want to . . .”.
Thus, here are common purposes for choosing to embark on a research study in education:
Study a program or a group of learners as is.
Study statistical relationships among factors affecting learning.
Study the effects of a technique or educational material on learning.
Explore a possible solution to an issue affecting learning in my classroom.
Each of the above aims is best achieved by a particular research approach:
Studying a program or a group of learners as is calls for a naturalistic research approach.
Studying statistical relationships among factors affecting learning requires a multivariate approach.
Studying the effects of an instructional technique or educational material on learning requires an experimental approach.
Exploring a possible solution to an issue affecting learning in your classroom calls for an action research approach.
The beauty of a naturalistic approach is that it offers a researcher an opportunity to examine a learner or group of learners as is in the context in which the learning takes place. It is largely descriptive, making use of repeated observations, as well as such data collection techniques as interviews in order to understand how learning is taking place from multiple perspectives, that is, the views of those who are part of that context, such as students and administrators. It involves a long, inductive process, requiring collection and analysis of a large amount of verbal data in order to arrive at a clear understanding of how learning takes place, be it in the formal setting of a classroom or less formal setting of a coffee shop, dormitory, or student club.
A multivariate approach is useful for providing insights about relationships among two or more factors related to learning. A large-scale study of factors affecting academic achievement of ESL learners in one or more school districts is one example of a study using such an approach. While such a study would be descriptive, like a naturalistic approach, the description of relationships would be primarily or wholly statistical rather than verbal and would not likely make use of multiple perspectives in order to understand what is going on.
An experimental approach is in order when one or more teachers wish to measure the effectiveness of a particular teaching method/technique, assessment tool, or teaching material. The experiment may involve one or multiple classes and will be deductive in nature, seeking to test one or more statistical hypotheses, to be supported or rejected depending on the results of the data analysis/es conducted.
Finally, an action research approach is used for dealing with a significant issue affecting learning in a particular classroom. Action research may be participatory or practical (Creswell, 2018). While participatory action research often has a social justice aim, practical action research, the type described here, is pedagogical in its purpose, seeking to find a solution to resolve a learning issue.
Criterion #2: Selecting the right research design
Once an individual researcher or team has determined what type of approach to use for a study, the next step is to decide how to carry out the study. This is where the matter of design comes in, and the design is dependent on the approach. Among designs within the naturalistic approach, ethnographies and case studies are the best known. To apply an artistic metaphor, an ethnographer seeks to paint as complete a portrayal of a particular learning context as possible. Thus, an ethnographer focusing, let’s say, on the experience of acquiring both languages in a bilingual program will spend an extensive amount of time observing classes, interviewing various stakeholders, administering questionnaires, and collecting and analyzing documents in order to gain as complete an understanding as possible of the bilingual learning experience. A case study researcher, on the other hand, like an artist sketching a detail, will focus on one particularly noticeable aspect of the learning context, such as a single class or learner, in order to gain a more detailed understanding of it.
Designs applying a multivariate approach focus on two or more learning-related factors. In addition to the school-district studies described earlier, a multivariate study may involve collection of data related to factors affecting freshman undergraduate GPA, for example, to understand the relative strength of the relationship between each independent variable (e.g., SAT score) and the dependent variable (GPA). Alternatively, learners’ performance on tests of subskills related to reading comprehension can be compared with their scores on a direct test of reading, such as the iBT TOEFL reading section, to measure the extent to which factors such as vocabulary size and reading speed are related to overall reading ability. Large-scale survey studies are another design type applying a multivariate approach. Such surveys focus on topics such as confidence, willingness to communicate, or language aptitude to understand relationships among learning-related variables.
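To make the idea concrete, here is a minimal sketch of one common multivariate computation, a Pearson correlation, written in Python with the scipy library (my choice for illustration; the numbers are invented, not data from any actual study):

```python
# A minimal sketch of a correlation analysis for a multivariate study.
# All scores below are invented for illustration only.
from scipy.stats import pearsonr

vocabulary_size = [3200, 4100, 2800, 5000, 3600, 4500, 3900, 3100]  # estimated words known
reading_score = [14, 21, 12, 26, 17, 24, 20, 15]  # score on a 30-item reading test

r, p_value = pearsonr(vocabulary_size, reading_score)
print(f"Pearson r = {r:.2f}, p = {p_value:.4f}")
# A strong positive r would suggest that vocabulary size is closely
# related to overall reading ability in this (hypothetical) sample.
```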
Unlike studies conducted using a naturalistic or multivariate approach, experimental studies are highly controlled. As in the hard sciences, these studies involve some form of treatment and require a design that can account for all the variables (factors) that can affect the outcome of the study. Thus there are a number of experimental designs to select from, depending on how many variables one is dealing with. Selection of the right design is important not only for including the relevant variables but also for selecting the correct data-analytic procedure. For teacher researchers, the most important types of experimental designs to understand are pure experiments, in which all subjects (participants) are randomly selected from a particular population and then assigned to a treatment group (experimental or control); quasi-experimental studies, in which random selection of subjects is not possible (generally the case in school-based studies); and one-group experiments, in which a single group (generally, students enrolled in the same section of a course) serves as the experimental group. In educational studies, measuring the effects of a treatment very often involves a post-test, a pre- and post-test, or a post-treatment (or pre- and post-treatment) questionnaire. Experimental studies focusing on skills generally involve tests or tasks, while those focusing on motivation or self-efficacy (confidence) make use of questionnaires rather than, or in addition to, tests or tasks.
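At the level of analysis, the one-group, pre/post design just described is often handled with a paired-samples t-test. Here is a minimal sketch in Python (assuming the scipy library; the scores are invented, and the paired t-test is one common option for this design rather than a prescribed procedure):

```python
# A minimal sketch of analyzing a one-group experiment: a paired-samples
# t-test comparing pre-test and post-test scores. Scores are invented.
from scipy.stats import ttest_rel

pre = [52, 47, 60, 55, 43, 58, 50, 49, 61, 45]   # pre-test scores (out of 100)
post = [58, 50, 66, 59, 47, 64, 57, 52, 65, 51]  # post-test scores after the treatment

t_stat, p_value = ttest_rel(post, pre)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below .05 would lead to rejecting the null hypothesis of
# no pre-to-post change for this (hypothetical) class.
```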
Studies applying a practical action research approach are most commonly experimental, since an instructor is seeking to apply some type of pedagogical strategy, be it a type of task or material, in order to find a solution to an issue affecting learning in a specific class or sections within a course. Where student numbers are very low, six or fewer, a case study design is more appropriate, since statistical analyses featuring very small groups of learners are less stable: an extremely high or low score can significantly impact the mean score if the number of scores in a given dataset is very low.
Criterion #3: Displaying knowledge of relevant theory, research, and the research gap
Much of a researcher’s authority to conduct a study on any given topic stems from their knowledge of theory and previous studies relating to that topic, based on which the researcher or team identifies a research gap which provides justification for their study. For instance, for a study on the usefulness of allowing learners to provide error feedback on writing assignments in a common first language (e.g., Cantonese for an EAP Writing class in Hong Kong), a researcher would need to provide evidence of strong background knowledge of theory relating to the effectiveness of error correction as well as the development of writing skills in a second language, in addition to awareness of previous research studies relating to error correction in writing, particularly that provided in a learner’s first language, whether it be written or oral and from peers and/or an instructor. The discussion of relevant theory and research, and the research gap which provides a niche for the study in question, would be provided in the literature review or conceptual framework of an article or conference session reporting on a completed study. The stronger the evidence for the researcher’s knowledge of theory and research related to their research topic, the more authority they have to present their own research, and the more credibility they stand to have in the eyes of their colleagues and other professionals in the field.
Criterion #4: Stating appropriate research questions/statements and hypotheses
Based on the literature review and identification of the research gap, the researcher is ready to state the general purpose (often in naturalistic research) governing their study or the specific research question(s) which their study seeks to answer. The open-ended nature of an ethnography, wherein the researcher seeks to discover and then describe the characteristics of the educational context being studied, generally does not lend itself to asking specific questions at the outset. Experimental studies, on the other hand, as well as practical action research studies, are much more focused in purpose, and the articulation of specific research questions allows both the researcher(s) and consumers of studies to have a specific idea of which answer(s) are being sought through a particular study. Moreover, in the case of experimental studies, research questions provide a basis for stating the statistical hypothesis/es being tested through the research. This includes action-research studies employing an experimental design. Experiments are carried out to test one or more specific statistical hypotheses, and while articles on many experimental studies do not include specifically stated hypotheses, stating the hypothesis/es being tested supports the credibility of the study by providing evidence to the readers/listeners concerned that the researcher or team recognizes the hypothesis/es in play.
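As an illustration only (the groups and variable are invented, not taken from any particular study), a research question such as “Does L1 peer feedback improve second-draft writing scores?” could be paired with statistical hypotheses stated along these lines, where each \mu is a group’s mean second-draft score:

```latex
% Illustrative null and alternative hypotheses for a two-group experiment
H_0 : \mu_{\text{peer feedback}} = \mu_{\text{control}}
\qquad
H_1 : \mu_{\text{peer feedback}} \neq \mu_{\text{control}}
```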
Criterion #5: Designing or adapting appropriate and well-constructed data-collection tools
Empirical research involves collecting data and drawing conclusions based on analysis of the collected data. It is essential, therefore, that the tools researchers use to collect data be well designed, allowing researchers to draw valid and reliable conclusions. Two of the most common data-collection tools used by teacher researchers are tests and questionnaires. I will therefore focus here on credibility issues related to valid construction and reliable scoring/analysis. These issues also apply to tasks researchers administer during the course of a study.
According to Brown (2018), “a valid test measures exactly what it proposes to measure” (p. 32). This is what is referred to as construct validity, that is, actually measuring the skill in question. Thus, a test or task designed to test students’ ability in business writing, for example, might test that specific skill by requiring them to compose a letter of complaint regarding a delayed shipment of goods, or a request to reschedule a planned meeting. Such a test or task would also be required to display content validity if it is meant to be based on what has been covered in class, such as in a course textbook. Validity also depends on accurate scoring: answer keys need to be checked for accurate scoring of a multiple-choice grammar or reading test, for example, and any rubric used to evaluate a writing or speaking test/task needs to be rater friendly to ensure accurate scoring and needs to provide an accurate measurement of learners’ ability.
As for reliability, Brown mentions the need for rubrics that allow for consistent scoring and tasks/items that are unambiguous for test-takers (p. 29); the latter applies to instructions as well as to the layout of the task/items–a familiar task/test is far easier for students to understand, with respect to how to answer, than an unfamiliar one. As well, inter-rater and intra-rater reliability are important issues for grading writing and speaking tests: a reliable rubric will facilitate a high rate of agreement between two raters grading the same test, as well as consistently accurate scoring for all students by a single rater.
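One way to check inter-rater reliability is an agreement statistic such as Cohen’s kappa. Here is a minimal sketch in Python (assuming the scikit-learn library; the band scores are invented):

```python
# A minimal sketch of estimating inter-rater reliability with Cohen's kappa.
# The lists hold invented band scores (1-5) that two raters assigned to the
# same ten writing samples.
from sklearn.metrics import cohen_kappa_score

rater_a = [4, 3, 5, 2, 4, 3, 3, 5, 2, 4]
rater_b = [4, 3, 4, 2, 4, 3, 2, 5, 2, 4]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")
# Values near 1 indicate strong agreement; values near 0 suggest the rubric
# is not supporting consistent scoring and may need revision.
```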
One data-collection technique used by TESOL instructors is to use tasks and rubrics from standardized tests. Thus an iBT TOEFL or IELTS Academic writing task and rubric might be used in an EAP writing class, or a TOEIC or IELTS speaking task and rubric for a Business English course focusing on speaking. While such tests are internationally recognized for both their validity and reliability, the specific task chosen needs to fulfill the requirement for content validity for the course in question as well, and the rubric needs to be both suitable for the proficiency level of the specific students and user friendly for the teacher researcher doing the scoring. Standardized-test rubrics are very task specific: the IELTS Academic test has two writing tasks with accompanying rubrics, and the iBT TOEFL has both an integrated and an independent writing task, each with its own rubric. To ensure the credibility of adopted data-collection tools, teacher researchers need to understand the specific skills being tested by a particular task on a standardized test to ensure they select the most suitable task for their research purposes, and they need to be able to use the corresponding scoring rubric easily and consistently to provide reliable and valid grading.
As with tests and tasks measuring language skills, questionnaires used for data-collection purposes need to display both construct and content validity and, from a credibility perspective, must consistently measure the target construct (e.g., motivation or self-confidence). Thus a questionnaire on motivation must be shown to consistently measure only a single construct, a criterion that can be assessed using a statistic such as Cronbach’s alpha. To strengthen questionnaire credibility, teacher researchers often adopt previously designed questionnaires recognized as valid and reliable measures, but they need to ensure that students can complete them accurately and that they (the researchers) can interpret the results accurately as well.
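For readers curious about what such a reliability check involves, here is a minimal sketch of Cronbach’s alpha computed directly from its formula in Python (assuming the numpy library; the Likert responses are invented):

```python
# A minimal sketch of estimating internal consistency with Cronbach's alpha.
# Rows are respondents, columns are Likert items (1-5); values are invented.
import numpy as np

responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
])

k = responses.shape[1]                              # number of items
item_variances = responses.var(axis=0, ddof=1)      # variance of each item
total_variance = responses.sum(axis=1).var(ddof=1)  # variance of total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
# Values above roughly .70 are conventionally read as evidence that the
# items consistently measure a single underlying construct.
```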
One technique to ensure comprehension of a questionnaire created in a learner’s second or third language is to have it translated, in which case checking the translated version for accuracy is required. Questionnaire credibility can also be enhanced by adding 1-2 open-ended items (to a Likert-style questionnaire, for example), which allows learners to respond in their own words with respect to the topic of the questionnaire (e.g., motivation). This does require, though, accurate interpretation of learners’ perspectives, which can be challenging if their L1 is not familiar to the researcher. A concern with respect to Likert-style questionnaires using a five-point or seven-point scale–in which the middle choice is “Neutral” or “Not sure”–is that overuse of the middle response by students wishing to complete the questionnaire quickly detracts from the credibility of the tool, since a response of “Neutral” or “Not sure” provides no additional information about a student’s perspective, weakening the validity of the instrument.
Criterion #6: Carrying out suitable and sound research procedures
Research procedures vary considerably by approach; understanding the correct procedures for the approach being applied is critical for researchers.
For naturalistic studies, thorough data collection is an absolute necessity for making any claim for credibility with respect to conclusions reached. A sufficiently broad and in-depth process of data collection provides a thick description of the educational context that creates this credibility. Triangulation is the research strategy employed to achieve a thick description. Creswell describes triangulation as “the process of corroborating evidence from different individuals . . ., types of data . . ., or methods of data collection”. Achieving a thick description requires a long-term commitment to collect a variety of data by various means and from different informants. Creswell (2018) identifies a five-step procedure for carrying out data collection in such studies:
identify the specific informants (individuals or groups who are regular participants in the educational context being researched, e.g., teachers and/or administrators in a school, or students in a particular class) from whom you will collect data during your study
obtain permission to carry out research by means of interviews with informants, class observations, etc.
consider what types of information you need to gather
prepare data-collection materials and recording devices
collect different types of data from your selected informants
Different informants require different means of collecting information based on their preferred communication style and willingness to share their opinions. As this writer discovered doing his dissertation research on a bilingual school using a naturalistic approach, some teachers can be very approachable for interviews and share opinions readily. Others are less communicative with respect to in-person interaction but are willing to complete questionnaires. Likewise with school administrators. Larger groups such as students and parents may be most accessible by means of questionnaires, designed bilingually if need be; this is how I sought the opinions of both groups during my dissertation research. It is best to be flexible, patient, and courteous, particularly as an outsider at a research site, and to be well aware of, and careful to follow, ethical guidelines. Developing and maintaining positive relationships with informants is particularly important given the recursive nature of naturalistic studies; data collection and analysis in the early stages are general but become gradually more focused as patterns emerge during data analysis that enable a researcher to make, and investigate, tentative conclusions. These, in turn, lead to more structured interviews and questionnaires seeking more specific information as the researcher’s understanding of the site or learner(s)–as in a case study–gains greater clarity, until conclusions are confirmed and the portrait is completed.
Experiments involve a very different research process from that used in naturalistic studies. As he does with naturalistic studies, Creswell (2018) identifies a five-step process. The steps are generally the same for the two approaches; however, the manner in which participants are selected and data collection takes place in experimental studies is markedly different from that for ethnographies and case studies:
participants are selected using probability sampling: the researcher(s) select, at random, a sample large enough to be considered representative of a larger target population, such as international students living in a particular metropolitan area.
While permission to carry out a naturalistic study at a school, for example, will be sought from administrators, permission to carry out an experimental study–unless the researcher is conducting the research as a faculty member of a particular institution–may simply involve gaining participants’ consent to the use of their data for analytic purposes once they indicate their agreement to take part in the study.
As with naturalistic studies, consideration of the type(s) of data to gather and preparation of data-collection materials and recording devices take place before the actual research begins. However, an experimental study involves grouping of participants in line with the specific design selected for the study. A pure experiment will involve random assignment of each participant to either the control group or the experimental group–or to one of two or three experimental groups, depending on the design of the study–as in the sketch below.
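Here is a minimal sketch of that random-assignment step in Python (the participant labels are invented placeholders):

```python
# A minimal sketch of random assignment for a pure experiment: shuffle the
# participant list, then split it into control and experimental groups.
import random

participants = [f"participant_{i:02d}" for i in range(1, 31)]  # 30 recruits
random.seed(42)  # a fixed seed lets the assignment be documented and audited
random.shuffle(participants)

control_group = participants[:15]
experimental_group = participants[15:]
print("Control:", control_group[:3], "...")
print("Experimental:", experimental_group[:3], "...")
# With two or three treatment conditions, the shuffled list would simply be
# sliced into that many groups instead of two.
```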
While a researcher or team conducting a naturalistic study depends on a thorough and lengthy data-collection process to establish credibility, experimental researchers depend on a very carefully conducted treatment process to achieve that same purpose. Participants in the control group for a study will engage in some type of activity, but the activity chosen will be selected and conducted carefully so as not to involve participants in that group in any language practice that could affect the results of the study. On the other hand, the participants in each experimental group will participate in one or more activities designed to test the hypothesis articulated for that study. For example, for an experiment testing a hypothesis that L1 peer feedback on students’ paragraph drafts has no impact on the performance of Mandarin native speakers in an EAP writing class, participants in the control group would write the first draft of a paragraph on the topic of “My life hero”, then watch a video on famous heroes in history before writing their second draft. Participants in Experimental Group #1 would receive feedback on their first draft from their instructor in English, using a pre-designed error feedback key, and would watch the same video as those in the control group before writing their second draft. Finally, participants in Experimental Group #2 would provide feedback on one another’s first drafts in Mandarin, after receiving directions from their instructor on how to apply the same error feedback key used for Experimental Group #1.
For multivariate studies, credibility depends both on the selection of participants for a given study and on the reliability and validity of any data-collection instruments employed during the study. With respect to surveys, for instance, the researcher(s) must first identify their target group of participants in line with the research question(s) articulated for the study. For a survey on the impact of COVID-related campus access policies on the teaching and learning of ESL/EFL during 2020 and the first half of 2021, this writer chose as the target participant group current and former students in the graduate TESOL program in which he is an instructor. This was a group whose contact information I could easily and ethically access and which largely consisted of individuals who knew me either as a former instructor of some of their courses or as a colleague of those who had taught them. After gaining permission to conduct the study from my institution’s research ethics board, I set up the survey using an online server (Survey Monkey) and distributed the survey electronically, preceded by a description of the study which included an invitation to participate in an optional interview afterwards, for which those who took part received a promised Amazon gift certificate. The description also included a notice of consent indicating that participants’ decision to complete the online survey would be taken as their consent to participate in the study and to allow me, as the researcher, to use their responses for research purposes. Finally, the survey was designed so that each participant could complete it anonymously; it contained no information that could be used to specifically identify any participant, although those participants who agreed to take part in the optional interview were, with their foreknowledge, identifiable to me, though no report mentioned any of their names. Each interview was recorded with the participant’s knowledge. Overall, the response rate for the survey was about thirty percent, a ratio considered to be about average in our profession.
Action research studies employ a research procedure distinct from those used in other approaches. Because teachers conducting action research are working with their own students as participants, special ethical procedures must be followed to protect participants from the “power over” relationship that exists, since it is the teacher who determines final grades. Teachers planning to conduct action research can choose from two procedural options:
The first option, which is the less common of the two, involves carrying out an action research study with a single student or a group of students within a class one is teaching. This is most likely due to diagnosis by the teacher of a learning issue that she/he would like to address in the short term but for which no class time is available. For example, an elementary teacher may identify some students whose word-reading skills are particularly weak in comparison to other students in the class and wish to do some additional phonics work with those students. In such a case, the teacher concerned will seek permission to carry out the activity, in the form of an action research study, after school or at some other time that can be arranged. Permission would be needed from the school or program administration to conduct the study, and consent needed from the learners’ parents or guardians for participation. Program/institutional ethical guidelines must be strictly adhered to, including proper procedures for working with participants who are minors. In addition, it is important to protect the confidentiality of data collected and the anonymity of participants.
The second option, in which formal research takes place after a course has been completed, is more common. In this case, during the teaching phase of the study, the teacher researcher conducts a “curricular innovation”–that is, makes an adjustment to normal pedagogical procedures in order to implement a technique, activity, or material that he/she believes could help to resolve a learning issue impacting students. For example, a teacher who finds that students take a long time to read passages because they constantly consult their dictionaries may implement timed readings with no dictionary use permitted, in order to wean students off their over-dependence on dictionaries and build their confidence as readers. In such a case, while data collection, involving task worksheets, a pre/post-test, and/or a questionnaire, would take place during class, formal analysis of the data collected would not take place until after the course is completed and final grades submitted. This delay removes pressure on learners (or their parents/guardians) relating to the power relationship mentioned earlier, since the researcher is also the course teacher. Once grades are submitted, the teacher researcher would invite students, or the adults responsible for them, to participate in the research phase by seeking their consent to use their (or their child’s) data for formal research analysis. As with the first option, data confidentiality and participants’ anonymity are prime concerns.
Credibility in action research procedures depends, first, on the researcher(s) showing evidence of having done sufficient background research to gain a clear understanding of the issue being studied. This includes an awareness of the challenges faced by the learners in the particular educational context as well as knowledge of theory and research related to the general learning issue involved (for example, willingness to communicate). Second, it depends on providing one or more formally stated research questions and, where relevant, statistical hypotheses. Third, and most importantly, it depends on the selection of the optimal research design for the context and research focus concerned. For example, in cases where the teacher wishes to apply a curricular innovation, a one-group experimental design (or a quasi-experimental design, if two classes are involved) is the best choice, unless the size of the overall class is very small. An experimental design allows for a pre/post-test procedure to measure the effectiveness of the curricular innovation the teacher wishes to try out. A questionnaire can also be used to assess effectiveness by gleaning the students’ opinions of the innovation. Task worksheets provide additional information, as can a teacher diary, in which the teacher records his/her own observations and opinions during the teaching phase of the study. In a case where a teacher is working with a small class (six students or fewer) or a subgroup within a class, a case study design would likely be the best choice. A small group makes an experimental design a risky choice due to the instability of data-analytic procedures with such a small amount of data. Moreover, a small group of participants allows for an in-depth analysis of treatment effectiveness by making use of student journals, open-ended questionnaires (in which participants provide written responses to questions), and, where a sufficient level of trust exists, face-to-face interviews or a focus-group discussion. The latter two options, posing a significant threat to both the confidentiality and anonymity of data, require that participants be made aware beforehand of the risks involved.
Credibility in action research procedures is also affected by the manner in which the intervention–treatment–is carried out. The length of the study, the clarity of procedures for the learners, and steps taken to ensure accurate scoring of tests, analysis of questionnaires, and proper storage and protection of collected data all impact credibility.
Criterion #7: Conducting appropriate data analyses
Procedures for analyzing collected data vary by research approach. There are two reasons for this. First, naturalistic and multivariate studies both use an inductive method to draw conclusions, while experimental studies use a deductive method to test one or more hypotheses. Action research may be deductive or inductive, depending on whether an experimental or case-study design is chosen. Second, numeric and other data require different methods of analysis to enable the researcher(s) concerned to interpret the results. This is particularly important in a study where the researcher collects both numeric and other data in order to strengthen the credibility of the findings. A naturalistic researcher, for example, may use numeric data such as test scores, questionnaire responses, and biodata such as age or years of study to gain a more in-depth understanding of the context being studied. Conversely, a researcher conducting an experimental study may make use of interviews to have more information available for rejecting or supporting a null hypothesis concerning the effectiveness of a particular technique.
The steps for analyzing both numeric and other data are as follows (based on Creswell, 2018):
For numeric data, the main steps are as follows (a minimal sketch of the first step appears after the list):
Score the data, that is, calculate scores (test or task) or tabulate results (questionnaire or survey)
Select a data analysis application
Select a data analytic procedure
Input the data
Run the analysis
Interpret the result
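As promised above, here is a minimal sketch of the first step, scoring the data, in Python (all answers and responses are invented):

```python
# A minimal sketch of "scoring the data": marking answers against a key,
# then tabulating responses to one Likert item. All values are invented.
from collections import Counter

answer_key = ["b", "a", "d", "c", "a"]
student_answers = ["b", "a", "c", "c", "a"]
score = sum(given == correct for given, correct in zip(student_answers, answer_key))
print(f"Test score: {score}/{len(answer_key)}")

# Tally one Likert item (1 = strongly disagree ... 5 = strongly agree)
item_responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]
print(Counter(item_responses))  # how many students chose each option
```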
Despite the apparent fear with which many students in education view statistics, much statistical analysis in educational research is not complex, because the conditions present in large-scale studies–a large population of potential subjects from which to randomly select a sufficient number of participants to allow for claims to external validity–don’t exist in our classrooms. Complex statistical analyses generally require large numbers of participants to yield meaningful results; a class of 10-20 learners, or even 25-30 learners, does not meet the conditions to warrant such analyses. Indeed, much can be learned from what are known as descriptive statistics–such as the number of answers for each choice in a Likert-style questionnaire or the mean score for a test. These are statistics which teachers already learned at some point during their schooling and which are very handy for much of the research teachers can most practically carry out: distributing anonymous questionnaires to their students on a class topic or regarding the impact a class has had on their motivation or self-confidence, or administering a task or test which can be used–within ethical parameters–for research purposes. Other descriptive statistics common in education include the percentile, range, median, and standard deviation, all of which are useful for understanding test results.
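All of these descriptive statistics can be computed with nothing more than Python’s standard library, as this minimal sketch shows (the test scores are invented):

```python
# A minimal sketch of the descriptive statistics mentioned above,
# applied to an invented set of test scores.
import statistics

scores = [62, 71, 55, 80, 68, 74, 59, 85, 66, 70]
print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("range:", max(scores) - min(scores))
print("standard deviation:", round(statistics.stdev(scores), 2))
print("quartile cut points:", statistics.quantiles(scores, n=4))
```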
The other branch of statistics used in educational research is inferential statistics. These are applied, as the term suggests, to allow researchers to draw conclusions based on the results of statistical analyses. The ANOVA family of analyses, for example, enables researchers to understand the effects of independent variables (e.g., first language, native country, level of education) on dependent variables (grammatical accuracy or reading fluency, for example). Correlation analyses provide information about statistical relationships among variables, such as between grammatical knowledge and overall score on the IELTS test. Reliability analyses such as Cronbach’s alpha, KR-20, and KR-21 provide an understanding of the consistency with which a test or questionnaire measures a single ability (e.g., vocabulary knowledge) or personal characteristic (motivation, confidence, or willingness to communicate). In experimental research, inferential analyses provide information supporting or rejecting a statistical hypothesis being tested in a particular study.
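As one concrete example of an inferential analysis, here is a minimal sketch of a one-way ANOVA in Python (assuming the scipy library; the groups and scores are invented):

```python
# A minimal sketch of a one-way ANOVA testing whether mean reading-fluency
# scores differ across three (invented) first-language groups.
from scipy.stats import f_oneway

l1_spanish = [68, 72, 75, 70, 66, 74]
l1_mandarin = [71, 69, 73, 76, 70, 72]
l1_arabic = [64, 67, 70, 65, 69, 66]

f_stat, p_value = f_oneway(l1_spanish, l1_mandarin, l1_arabic)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A p-value below .05 would suggest that first language has an effect on
# mean reading fluency in this (hypothetical) sample.
```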
For verbal data, the analytic process is markedly different. The most important difference between numeric and verbal analyses is that the former is a linear process, generally occurring once during a survey study, for example, while the latter is generally an iterative process that becomes gradually more focused during the course of a longitudinal study such as an ethnography or case study. Creswell (2018) identifies the following steps (a small sketch of the coding step follows the list):
transcribe and organize data
explore the transcribed data
code the data
describe it
identify and connect themes
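As a small illustration of the coding step, here is a minimal sketch in Python that tallies invented codes attached to invented interview segments, a first step toward identifying themes:

```python
# A minimal sketch of the coding step for verbal data: tag segments with
# codes, then count how often each code occurs. Everything here is invented.
from collections import Counter

coded_segments = [
    ("I worry about making mistakes in front of classmates", "anxiety"),
    ("Group work helps me try new words", "peer support"),
    ("The teacher's feedback keeps me going", "teacher support"),
    ("I freeze when I have to speak alone", "anxiety"),
    ("My partner explains things in our language", "peer support"),
]

code_counts = Counter(code for _, code in coded_segments)
print(code_counts)
# Frequently recurring or co-occurring codes would then be grouped into
# candidate themes and checked against further rounds of data collection.
```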
Analysis during naturalistic studies reflects the aim of a researcher or team to provide a thick description of the research context as mentioned earlier, a process that requires considerable time, and numerous iterations, to be sufficiently thorough. With multiple informants involved, as well as multiple types of data, the process of transcribing, exploring, and describing data is one that needs to be repeated often before themes are developed that allow the researcher(s) to reach a credible conclusion. Where numeric data is also collected over the course of a naturalistic study, the findings from statistical analyses need to be accounted for as themes are identified. The more thorough the triangulation process, the “thicker” the description that results, and the more credible the conclusions made.
Criterion #8: Making appropriate interpretations and conclusions
The elements of research credibility discussed up to now serve as building blocks for making appropriate interpretations and conclusions for any research study. Selecting the right research approach at the outset is critical to choosing and applying the most appropriate design. Identifying the research questions you seek to answer and stating appropriate statistical hypotheses when needed will serve as helpful guidelines for selecting participants, creating or adopting data-collection instruments, applying the correct data-collection procedures, and conducting appropriate data analyses. Credibility is also dependent on a display of sufficient knowledge of theory and previous research relevant to your research topic; such knowledge lends authority to the conclusion(s) reached by the researcher(s), as does knowledge and application of the ethical constraints governing one’s study. Additionally, credibility is dependent on logical interpretations made based on the results of data analyses; answers to research questions must make sense based on the findings, and support or rejection of any statistical hypotheses governing a study must likewise be logical based on the results of the statistical analysis/es carried out.
Finally, credibility is strengthened by a clear statement of the limitations pertaining to a particular study–and weakened by its absence; no research study is perfect. A naturalistic researcher is dependent on the quality of information provided by participants, and ethical constraints can prevent access to data that could add valuable insights. A good experiment has a specific focus and features careful selection of participants; however, such requirements tend to limit the external validity of conclusions reached. Participation in survey research is generally well under 50% of those initially recruited, and it is strongly suspected that those who do participate in surveys are rather different from those who do not. Lastly, action research studies, designed to find solutions to learning challenges affecting a particular classroom, do not lend themselves to extending conclusions beyond the group of students being researched.
In conclusion . . .
Conducting high-quality educational research does not require one to be a statistical genius. Knowledge of the criteria for credible research in education is far more important than mathematical expertise. Taking a course in research methods can be a great asset to any teacher’s professional development, both as a consumer and a producer of research. As a teacher begins research and becomes adept at it over time, the following will apply (Moulden, 2021):
Knowledge feeds practice.
Knowledge-based practice builds excellence.
Excellence breeds fruitfulness.
References
Billot, J., Rowland, S., Carnell, B., Amundsen, C., & Evans, T. (2017). How Experienced SoTL Researchers Develop the Credibility of Their Work. Teaching & Learning Inquiry, 5(1).
Brown, H. D. (2018). Language assessment: Principles and classroom practices (3rd ed.). Pearson Education.
Creswell, J. W. (2018). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (6th ed.). Pearson Education.
Kanuka, H. (2011). Keeping the scholarship in the scholarship of teaching and learning. International Journal for the Scholarship of Teaching and Learning, 5(1), 1-12.
It has long been believed that research ought to be left to the researchers, to those experienced in the work of carrying out experiments, taking field notes as they observe what is happening around them, or poring over text to glean some meaning from it. Teaching, on the other hand, takes place in an environment from which researchers have come but to which they either do not return or return only occasionally to give a lecture or presentation.
Such a perspective ignores both the value of teachers’ perspectives as the eyes and ears of those of us in the education profession and the value that research holds for teachers in their classrooms. Regarding the first point, teachers work on the front lines of our profession: the classroom. They understand the ins and outs, and the ups and downs, of classroom learning and overall classroom dynamics better than anyone, aside possibly from their students. They thus have many experiences, and much wisdom, to share with those of us on the outside of their classroom walls. If I were hired to teach in a Korean high school, I would contact any English teachers currently or recently working in such schools to find out what to expect in terms of the students’ abilities and concerning likely expectations on the part of the administration. I certainly wouldn’t ask my neighbour working as an instructor in a tech college in Vancouver. Conversely, were I scheduled to have a job interview at the tech college, I would be keenly interested in knowing about my neighbour’s experience teaching there, and would be interested in talking to other teachers working there as well.
As to the second point–the benefits that teachers gain from carrying out their own research–like a Swiss army knife, research can be a very handy tool for any teacher who understands how to use it well. First, research in the classroom can be used to expand a teacher’s knowledge about learners and learning in their particular teaching specialty, for example by means of observation notes, questionnaires given to students, tests, or any of a variety of individual or group tasks, large or small. Second, as teachers experiment with a variety of tasks and test types, they can develop their creativity in the classroom. The need for this is particularly great for online/distance learning: getting and keeping learners engaged during a class on Zoom is widely reported to be quite a challenge, particularly with young learners and others lacking the motivation or confidence to learn online.
Third, through action research, teachers can explore potential solutions to particular learning challenges in their classrooms. For example, cooperative learning can be used as a possible solution to help learners struggling with selecting the correct pronouns, prepositions, or transition words (“therefore”, “however”, “additionally”, etc.) in their writing. Stronger writers can be grouped with those who are weaker to act as mentors, a particularly helpful strategy when they share a common native language. Finally, by expanding their knowledge of learning in their discipline, increasing their creativity, and finding solutions to learning issues in their classrooms, teachers advance in their professional development, and they can help promote the professional development of their peers as they share their findings in workshops, presentations, on their own professional blogs, or in professional journals.
All of these are sound reasons for teachers to engage in research in their classrooms. As the instructor of the research-methods course in a TESOL program, however, I regularly encounter two obstacles to teachers becoming skillful researchers: the lack of knowledge needed to carry out sound research, and the lack of confidence to engage in it. The latter problem is based not only on a lack of knowledge of educational research fundamentals but also on a lack of understanding of how educational research differs from that carried out in the hard sciences and medical fields, for example. Research on learners and learning is very different from other types because, for instance, it cannot be carried out in a lab using test tubes and special machines, or by analyzing rock formations or fallen trees outdoors. Educational research makes use of particular approaches, design options, and procedures for data collection and analysis, and for meeting ethical constraints. This is specialized knowledge requiring a course focusing specifically on research methods. Thankfully, programs such as ours and others in education do offer such a course, either as an elective or, ideally, as a core requirement. The potential benefits to teachers, their students, and our peers within our teaching specialties are such that it would be a shame for teachers to refrain from conducting research in their classrooms, physical or virtual.
A good teacher knows how to plan. Effective planning before a course begins involves selecting instructional materials, developing a syllabus, and planning units and lessons. It also involves preparing assessment, such as placement or diagnostic testing, and any major assessment tools such as a term paper, portfolio, or midterm and final exam.
Planning does not stop once a course begins, however. Teachers plan lessons throughout the duration of a course, and they are often required to adjust or make significant changes to a plan depending on how well students perform in class.
Planning is also only one stage of the teaching process. Once a course begins, a teacher needs to implement the plan worked on in the preceding weeks, and, as they implement it, evaluate its effectiveness by monitoring students’ performance and attainment of lesson and unit objectives, and decide how to respond, either by continuing with the planned curriculum or by making adjustments. Whichever choice is made involves the teacher in continuous planning throughout the duration of a course: the process of planning, implementing, evaluating, and responding becomes a recursive cycle, as seen in the image below, that moves at irregular speeds depending on the particular course involved and the frequency and length of class meetings. A typical day in a Grade 3 class will likely see the cycle moving at a much faster pace than a once-weekly lecture-style history course in university.
Let us look in detail at the different stages of the process to understand the model more clearly and see how it can be best put to use to help teachers thrive in their classrooms. As the table below suggests, success in teaching is very dependent on what happens before a course begins and in its early stages. Quality teaching depends on the ability of a teacher to both prepare and adjust as needed in response to students’ attainment, or lack thereof, of the intended course outcomes. Also important are the in-class skills a teacher employs to communicate effectively, manage time, and organize teaching activities to maximize learning and monitor how effectively students appear to be “getting” lesson objectives. Evaluation is both informal and formal; a skilled teacher has both eyes and ears open in class to assess what is happening and selects and designs assessment activities and tools to more formally evaluate students’ progress in achieving objectives between class sessions. Additionally, a skilled teacher can respond to students’ struggles during a course by making adjustments to instruction, in-class activities, and/or assessment strategies to foster more successful learning.
Teachers are professional learners as well as learning planners; their skills do not develop overnight. Rather, as teachers gain and apply knowledge of the best ways to help students attain their goals, this experience helps them develop into excellent practitioners whose efforts to help students succeed yield more and more fruit as they apply the many lessons they have themselves learned.
Much discussion has taken place over the past few decades concerning what is the most important criterion for evaluating the quality of a test: Is it validity? Is it reliability? Is it authenticity? Or is it something different from these, or perhaps some combination of them?
Validity, or more specifically construct validity, has received perhaps the most attention and support as the supreme test quality. If we define it as “measuring what a test is designed to measure”, then it is clearly vital that a test succeed in measuring the skill (e.g., academic writing ability) or knowledge area (e.g., understanding the pragmatic rules relating to international business negotiations) that it was designed to measure. The Scholastic Aptitude Test (S.A.T.) has often been panned for its supposed inability to accurately assess students’ readiness for post-secondary study. The paper-based TOEFL was so poorly regarded for its lack of validity as a test of academic English proficiency that its creator, the Educational Testing Service (ETS), overhauled the content and design of the test so thoroughly that the current iBT TOEFL bears little if any resemblance to its primary “ancestor”, featuring direct measurement of writing ability in the form of actual writing tasks and semi-direct measurement of speaking (each test-taker speaks into a microphone and has their voice recorded). The iBT TOEFL, far more than the original, merits the designation of a “valid” test.
Reliability, however, has also been touted as an essential ingredient; the focus on psychometrics reflected in tests such as the paper-based TOEFL and the original Michigan English test featured multiple-choice items measuring knowledge of discrete points of grammar and specific vocabulary terms. Tests were long in order to provide coverage of a wide range of knowledge “points”, and the multiple-choice design of items enabled quick, accurate scoring and, therefore, “trustworthy” results. While stakeholders such as teachers and admissions officers demanding validity bemoaned its lack in psychometric tests, reliability advocates could point out the significant challenge of providing accurate scoring of direct speaking and writing tasks for hundreds or thousands of test-takers on a single test.
The reliability-versus-validity debate ignored the fact that there are other important criteria for evaluating the quality of a test or other means of assessment. The growing popularity of communicative language teaching in the 1980s increased the desire to find ways to directly and accurately test the skills of speaking and writing, absent in tests such as the paper TOEFL. The introduction of tests such as the IELTS, which focus on measuring both receptive and productive communicative skills, highlighted the fact that other criteria affect test quality. Authenticity concerns the degree to which a test task reflects real use of language, whether in a college classroom, an office, or a coffee shop, and whether the task is understanding a portion of a lecture, following written directions for getting to a hotel, or writing a letter of complaint to that hotel in response to unsatisfactory service. Practicality addresses the question of how to carry out assessment, including grading, in a manner that does not exceed the resources, including time and energy, available to do so.
When Bachman and Palmer (1996)1 identified what they considered the qualities of test usefulness, they included the four already mentioned but added interactiveness and impact to the list. Interactiveness refers to the extent to which a test or test task engages learners’ topical knowledge, language ability, and affective factors such as interest and feelings (e.g., sensitivity to particular topics). Impact concerns the effects a test has not only on learners but also at the classroom, school, and even societal level. The gatekeeping effects of admissions tests such as the TOEFL touch not only learners but also teachers and language schools, as practitioners and institutions seek effective ways to prepare learners for these tests. College entrance tests in countries such as Japan have had a similar effect: the decades-long proliferation of cram schools such as juku in Japan and hagwon in South Korea bears testimony to the fact that test results can significantly affect a student’s long-term prospects.
Yet even these qualities do not address all the concerns learners and other stakeholders have with regard to tests and other forms of assessment. The issue of fairness is also clearly a matter of great concern. Reports of test-taker accent affecting scores on speaking tests have drawn the ire of both learners and others invested in their educational success. An international research study on the effect of accent familiarity on ratings given to samples of responses on an IELTS speaking task indicated that it does indeed affect a rater’s perception of the quality of learner speech2. Group discussions with non-native teachers of English (i.e., those for whom English is not their native tongue) reveal a generally strong distrust of human raters of their speaking ability and a resulting preference for automated scoring of speaking, a feature of tests such as the Pearson Test of English Academic. The grading of writing has likewise been called into question; in response, evaluation of test-takers’ responses to the two writing tasks on the iBT TOEFL combines automated (“e-rater”) and human scoring to maximize the reliability, validity, and thus the fairness of the scores assigned.
There are clearly several factors to consider when evaluating the quality of an assessment tool or approach, and none by itself can cover all the concerns raised about testing. In light of this, let us consider a general quality that takes all of the above into account. When people identify what matters most in evaluating a teacher, a church leader, or a politician or political party, the word we so often hear is credibility, that is, their trustworthiness or, as the morphology of the term implies, “believability.” This believability gives meaning, and thus value, to what they say. In the realm of education, the credibility of a teaching approach or a test in the eyes of students, parents, and other stakeholders builds trust and adds value to what is revealed. With the preceding discussion in mind, I propose the following model of test credibility:
This is, of course, a subjective model. It is, however, based on two decades of experience as a language teacher, as well as on what I have learned during the past decade as an instructor in a graduate-level TESOL program. It is comprehensive enough to encompass the criteria discussed above, and in greater detail than I have presented here. Finally, it gives central focus to a concern that matters not only to those of us in the field but also to students, parents, and other stakeholders.
1Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford University Press.
2Carey, M. D., Mannell, R. H., & Dunn, P. K. (2011). Does a rater’s familiarity with a candidate’s pronunciation affect the rating in oral proficiency interviews? Language Testing, 28(2), 201–219. https://doi.org/10.1177/0265532210393704
Task-based language teaching is an approach to instruction in which students develop their proficiency in a target language by using the language to complete various tasks. These tasks may be pedagogical in nature, designed to improve students’ knowledge or skill with respect to microskills such as vocabulary or the correct use of commas and semicolons. At other times, teachers introduce “real-world” tasks in which students engage in activities resembling those they would perform in natural social settings: ordering a meal in a restaurant, applying for a bank card, or reading the directions for taking a particular cold medicine, for example.
No task activity is complete without some type of feedback from the teacher, whether it be in the form of comments given while walking around the classroom during a group task or more formal feedback such as written advice provided on a paragraph-writing assignment.
Different types of tasks can serve different purposes from an assessment perspective. LINC (Language Instruction for Newcomers to Canada) courses, in which teachers introduce language through general topics related to living in a new country (e.g., using public transit, finding a job), often make use of three types of tasks over the course of a module (a general topic covered over about one month): skill-building (SB) activities, skill-using (SU) tasks, and assessment tasks (AT).
Skill-building activities, which can also be labeled “indirect”, focus on the skills that serve as building blocks for proficiency in the “macroskills” of language: listening, speaking, reading, and writing. They are thus featured early in any given module. With respect to speaking, for example, a skill-building task for a lower-proficiency class might be marking the correct stress on multisyllabic words related to weather–”cloudy”, “thunderstorm”, “tornado”–while for an intermediate class, an SB task might take the form of a discourse-completion exercise in which learners choose the correct responses to questions asked during a job interview. While such tasks do not involve authentic use of language on the part of learners, they are pedagogically highly useful, allowing the teacher to conduct assessment for learning and to provide feedback that helps learners build up the skills and knowledge areas needed for proficient use of the macroskills.
Skill-using tasks, presented later in a module, serve as a bridge between skill-building and assessment tasks. Unlike skill-building tasks, skill-using tasks are “direct”, providing learners an opportunity to engage in actual speaking, listening, reading, or writing. Unlike assessment tasks, however, they are pedagogical in nature, offering the teacher an opportunity to support learners as they “practice” a macroskill. For instance, learners doing a module on finding a job might perform a mock job interview, perhaps working with the script beforehand (depending on their proficiency level) and getting feedback on their performance from the teacher and, perhaps, one another. These tasks thus feature both assessment of learning, as learners demonstrate their application of underlying skills in carrying out the tasks, and assessment for learning, as teacher feedback provides information learners can apply toward successfully completing the end-of-module assessment tasks.
Assessment tasks, in which learners take part in real-world speaking, listening, reading, and/or writing tasks, provide an opportunity for both learners and their teachers to gauge how well the learners have learned the skills practiced during the module. They are, as much as is possible in a classroom setting, intended to be authentic. While serving as assessment of learning from the teacher’s perspective, they can serve, like skill-building and skill-using tasks, as assessments for learning by those learners able to apply what they have learned during the module in their daily-life encounters outside the classroom.
The null hypothesis is one of the most important tools in a researcher’s toolbox. This applies not only to education but to other disciplines as well, particularly with regard to experimental studies. The online library of the University of Southern California describes the null hypothesis as follows with respect to research in education:
“the proposition, to be tested statistically, that the experimental intervention has “no effect,” meaning that the treatment and control groups will not differ as a result of the intervention.”1
This definition helpfully illustrates the fact that any null hypothesis is statistical in nature, and thus very different from the general public’s understanding of a hypothesis, that is, “an idea that attempts to explain something but has not yet been tested or proved to be correct”2. In fact, since “null” can be understood to mean zero, the null hypothesis posits the absence of whatever effect a researcher is seeking to demonstrate. For example, a teacher may believe that allowing students whose proficiency in the language of instruction is low to use their native language for planning activities would increase their chances of completing the activity successfully. In planning an experiment to test this belief, the teacher would pose a research question, such as
“How does the use of learners’ L1 (native language) impact their performance in skits in an English conversation class for Taiwanese college students?”
Although the teacher is confident the L1 will have an impact, given that all the students are native speakers of Mandarin, he poses the null hypothesis, since he does not yet have clear evidence to support his belief:
“The use of the learners’ L1 in planning skits has no significant impact on their performance of the skits in an English conversation class for Taiwanese college students.”
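In standard statistical notation (my own shorthand, not part of the original formulation), letting μ denote the mean number of idea units per skit under each planning condition (the performance measure described below), the null and alternative hypotheses can be written as:

$$H_0:\ \mu_{\text{Mandarin}} = \mu_{\text{English}} \qquad \text{versus} \qquad H_1:\ \mu_{\text{Mandarin}} \neq \mu_{\text{English}}$$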
To provide a means of comparison for assessing performance, the teacher will first have students plan skits in groups in which no Mandarin is permitted during the planning stage. This can be done for two different skits in order to ensure the students understand the task. The second set of skits is video-recorded for both pedagogical and research purposes. For the following two skit activities, Mandarin is permitted during the planning stage, and the second set of Mandarin-planned skits is recorded. Students are given a questionnaire, in Mandarin or bilingually, seeking their opinion of the skit tasks. For pedagogical purposes, the teacher can, in a subsequent class, show both sets of recordings to the students to show them how much they have improved. At this point, no formal data analysis of the recordings is carried out. Once the course is completed, however, and grades (if applicable) are submitted, the teacher is free to conduct formal analysis of the recordings. At this point, consent is required from the learners for their recordings to be used for formal analysis, and only recordings for which permission is granted may be used. The transcripts are analyzed to compare the number of idea units in the English-planned and Mandarin-planned skits. Based on the results of the data analysis, the null hypothesis is either supported or rejected: only if there is a statistically significant difference in the number of idea units between the English-planned and Mandarin-planned skits can the null hypothesis be rejected, and even then only for this particular study.
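To make the analysis step concrete, the following is a minimal sketch in Python of how the comparison might be carried out, assuming hypothetical idea-unit counts and a paired t-test (the data, variable names, and choice of test are illustrative assumptions, not details taken from the study):

```python
# A minimal, hypothetical sketch: comparing idea-unit counts from the
# English-planned and Mandarin-planned skits with a paired t-test.
from scipy import stats

# Idea units counted per group, in the same group order for both conditions
# (invented numbers for illustration only).
english_planned = [12, 9, 15, 11, 8, 13, 10, 14]
mandarin_planned = [16, 12, 18, 15, 11, 17, 13, 16]

# A paired test is used because each group performed under both conditions.
t_stat, p_value = stats.ttest_rel(mandarin_planned, english_planned)

alpha = 0.05  # conventional significance threshold
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis.")
else:
    print(f"p = {p_value:.3f} >= {alpha}: the null hypothesis stands.")
```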
A teacher’s educated hunches about what can work in the classroom are important and informative. By formulating a research question and testing a null hypothesis based on that question, teachers can test their hunches with an objectivity that fosters their professional development and gives weight to their perspectives in the professional community.
1Glossary of Research Terms, University of Southern California Libraries, accessed July 31, 2021
Standards are used for a variety of purposes in education. They serve as guidelines for curriculum development across subjects in different states, provinces, and even countries. Standards are also used to determine criteria for graduating from high school, gaining admission to post-secondary institutions, and even maintaining one’s academic standing once enrolled. Standards serve as criteria for obtaining a license to drive, practice law, or practice medicine. They provide ethical guidelines for conducting research on humans and animals. They are also used for test design in different school subjects.
In the field of Teaching English to Speakers of Other Languages, there are different sets of standards (also called benchmarks) that are very useful for English-language teaching professionals. The most international set of standards available is the Common European Framework of Reference for Languages (CEFR)1. It was, in fact, developed not only for the teaching of English but for languages across Europe. The CEFR scale comprises six levels, grouped into three bands from Basic to Proficient User, as follows:
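Basic User: A1 (Breakthrough), A2 (Waystage)
Independent User: B1 (Threshold), B2 (Vantage)
Proficient User: C1 (Effective Operational Proficiency), C2 (Mastery)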
Because of the broad nature of these levels, the framework is used by ELT materials publishers such as Oxford and Cambridge as a means of calibrating and designating the proficiency levels of the textbooks in their online catalogues, which is particularly useful for multilevel series. Additionally, both the Cambridge English2 and the Pearson International Certificate exams3 use the framework as a basis for designating the proficiency levels of their tests. The CEFR proficiency levels are also divided by macroskill (L, S, R, W) and can be used for self-assessment.
Next, the ACTFL (American Council for the Teaching of Foreign Languages)4 standards have quite a different purpose and structure from those of the CEFR. Like the CEFR standards, they can be applied to the learning of a number of different languages, including English. Being much more pedagogical and communicative than the CEFR, they are useful for testing and for developing materials for language use as well as language proficiency. The ACTFL standards include not only proficiency guidelines but also performance descriptors5 and can-do statements for three modes of communication: interpersonal, interpretive, and presentational. ACTFL presents its proficiency guidelines as a pyramid of five levels, the lower three of which are each divided into three sublevels, alongside sample performance-descriptor charts:
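Distinguished
Superior
Advanced (High, Mid, Low)
Intermediate (High, Mid, Low)
Novice (High, Mid, Low)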
Because the ACTFL standards include such a wealth of information, they are suitable for teachers and program developers in workplace and academic-English settings. The performance descriptors can be employed for developing materials, planning instruction, and for developing and implementing assessment strategies. Instructors in business-English courses, as well as those teaching specialized courses for healthcare workers and international students seeking admission to English-medium universities, for example, would likely find the ACTFL-standards resources very beneficial.
The third set of standards I would like to introduce differs from both of those discussed already in that it is designed for use with adult learners in an ESL setting, that is, a region where English is the dominant language of daily communication. The standards I am referring to are the Canadian Language Benchmarks6, which consist of twelve levels of proficiency, divided into three stages, covering all four macroskills:
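Stage I (Basic): CLB 1–4
Stage II (Intermediate): CLB 5–8
Stage III (Advanced): CLB 9–12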
At each of these levels, there is a profile of ability, a description of specific skills (can-do descriptors), and suggested tasks for learners at that proficiency level. This amount of detail makes the CLB, like the ACTFL proficiency and performance descriptors, useful for general course planning, developing materials, and planning instruction and assessment. However, given their target-learner focus, the CLB are suited only to adult learners living in areas where English is heard, seen, and used on a daily basis.
Finally, I would like to turn my attention to a set of standards used for teaching young learners. The WIDA7 standards are a very detailed set, organized by grade level from K to 12, for use with English-language learners in the United States and in international schools overseas. The standards cover five domains and have broad application at each grade level8; the charts in WIDA’s documentation illustrate both the scope of the standards and their attention to detail at each grade level.
Standards provide TESOL instructors and administrators with practical tools for course planning, with helpful ideas for course objectives, planning instruction, and designing assessment tools. Whatever the age range of your learners, and in whatever type of program you are teaching, familiarizing yourself with the set of standards most applicable to your teaching context can equip you with valuable ideas for helping your students meet their English-learning needs.
For decades, tests used for university admission, such as the SAT, have been the subject of many complaints, and perhaps the most critical drawback discussed is their lack of predictive validity. To be sure, there is far more than mathematics and verbal skills to achieving a high GPA even during a student’s freshman year. Emotional maturity, English ability (for international students), the absence or presence of the need to work part-time to support one’s education, access to campus facilities, and quality of instruction are all factors that can affect academic achievement.
Hannon1 examined the impact of individual learners’ social and cognitive characteristics on college student GPA and found that SAT scores were a significant predictor of freshman GPA only when entered as the first predictor in the statistical model; otherwise, academic self-efficacy, epistemic belief in learning, and high knowledge integration acted as more significant predictors. Kobrin and Patterson2, meanwhile, found that the correlation between SAT scores and first-year GPA varied widely among institutions in relation to institutional factors such as the size of an institution’s financial aid package and the proportion of white freshmen admitted.
These personal and institutional factors, however, generally cannot be easily measured on an admissions test, and even an excellent high-school transcript or a high level of English proficiency offers no guarantee of success. From a pedagogical perspective, what an SAT or TOEFL score will not tell you is whether a student will have sufficient academic skills and knowledge within their specific discipline to competently perform the work assigned in their courses and complete their degree or diploma. Biber et al.3 found that scores on the iBT TOEFL writing section had only weak to moderate correlations with students’ scores on academic writing tasks across various disciplines; correlations were particularly low with scores for overall organization of ideas, at both the graduate and undergraduate levels. Cho and Bridgeman’s4 study of graduate and undergraduate students across ten universities found only weak correlations between iBT TOEFL scores and GPA.
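To illustrate what the correlations reported in such studies measure, here is a minimal Python sketch computing the Pearson correlation between admissions-test scores and first-year GPA for a handful of hypothetical students (all numbers below are invented for illustration):

```python
# A minimal, hypothetical sketch: estimating how well an admissions-test
# score predicts first-year GPA via Pearson correlation.
from scipy import stats

# Scores and GPAs for ten hypothetical admitted students.
test_scores = [92, 105, 88, 110, 79, 101, 95, 84, 99, 107]
first_year_gpa = [3.1, 3.4, 2.8, 3.2, 3.0, 3.6, 2.9, 3.3, 3.1, 3.5]

r, p = stats.pearsonr(test_scores, first_year_gpa)
# An r near 0 suggests weak predictive validity; squaring r gives the
# proportion of GPA variance the test score accounts for.
print(f"r = {r:.2f}, r^2 = {r*r:.2f}, p = {p:.3f}")
```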
At the graduate level, the GRE General Test is described as “a measure of general cognitive skills”5 and so also ought not to be regarded as a reliable predictor of discipline-specific competence. The reason is that graduate-level coursework and research are carried out in a highly specialized program, requiring application of skills and/or knowledge at a level of expertise considerably beyond that required for an undergraduate degree.
Young6, for example, discovered that the mean GRE Verbal scores of students who were accepted into, but did not complete, a doctoral program in Educational Leadership were nearly identical to the mean scores of those who successfully completed the requirements. That is, scores on the GRE Verbal section had no predictive validity regarding students’ ability, or lack thereof, to attain the degree. Moneta-Koehler et al., researching a biomedical degree program at Vanderbilt University, found that performance on the GRE had no predictive validity with respect either to passing a Ph.D. qualifying exam or to successfully completing the Ph.D.7 Rubio et al.8 reported mixed results from earlier studies and found in their own study that GRE section scores had very weak correlations with graduate GPA for students at both the master’s and doctoral levels across various disciplines at a midwestern university.
General cognitive skills do not reliably indicate readiness to perform with a sufficiently high level of academic competence to attain an advanced degree. Even the GMAT, an exam used for admission to business schools, has been found to have rather variable predictive validity, with the quantitative section being a poor predictor in the UK9 and the verbal and quantitative sections varying in their predictive validity worldwide10.
If the information provided by standardized test scores and transcripts is not a sufficient predictor of academic success, how can program administrators make well-informed decisions regarding the academic readiness of students seeking admission to their programs? In small programs, at least, one possibility would be to conduct individual interviews with applicants, in which program administrators use specific questions to evaluate a student’s readiness for study. An additional or alternative strategy, particularly in larger programs, would be to administer tasks representative of those typically assigned by instructors in the program and evaluate applicants’ performance using specific rubrics. Finally, with respect to a discipline-specific “test”, a cloze test based on a journal article or book chapter covering a wide range of topics in the discipline could be administered to evaluate applicants’ ability to handle content specific to that discipline. A vocabulary test, in which applicants match field-specific terms with their definitions, is also a possibility. The advantage of a cloze test would be the opportunity it provides test-takers to demonstrate not only their topical knowledge but also their skill at applying that knowledge to read the literature of the discipline.
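As a rough sketch of how such a cloze test could be assembled, the short Python function below blanks every nth word of a passage; the function name, sample passage, and blanking interval are all hypothetical choices:

```python
# A minimal sketch: building a fixed-ratio cloze test by blanking every
# nth word of a discipline-specific passage (all details hypothetical).
def make_cloze(passage: str, n: int = 7) -> tuple[str, list[str]]:
    """Blank every nth word; return the cloze text and the answer key."""
    words = passage.split()
    answers = []
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = "__________"
    return " ".join(words), answers

passage = ("Construct validity concerns the degree to which a test measures "
           "the ability it was designed to measure, while reliability "
           "concerns the consistency of the scores it produces.")
cloze_text, answer_key = make_cloze(passage, n=7)
print(cloze_text)
print(answer_key)
```

A fixed-ratio deletion is only one option; deleting selected content words instead would target applicants’ topical knowledge more directly.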
Whatever the strategies selected, it is clear that administrators need information besides standardized test scores to make reliable decisions with respect to admitting students to their programs.
1Hannon, B. (2014). Predicting College Success: The Relative Contributions of Five Social/Personality Factors, Five Cognitive/Learning Factors and SAT Scores. Journal of Education and Training Studies, 2(4), 46–58.
2Kobrin, J., & Patterson, B. (2011). Contextual Factors Associated With the Validity of SAT Scores and High School GPA for Predicting First-Year College Grades. Educational Assessment, 16(4), 207–226. https://doi.org/10.1080/10627197.2011.635956
3Biber, D., Reppen, R., & Staples, S. (2017). Exploring the Relationship Between TOEFL iBT Scores and Disciplinary Writing Performance. TESOL Quarterly, 51(4), 948–960. https://doi.org/10.1002/tesq.359
5Liu, O. L., Klieger, D. M., Bochenek, J. L., Holtzman, S. L., & Xu, J. (2016). An Investigation of the Use and Predictive Validity of Scores from the GRE® revised General Test in a Singaporean University (GRE Board Research Report GRE-16-01; ETS Research Report RR-16-05). ETS Research Report Series.
6Young, I. P. (2008). Predictive Validity of the GRE and GPAs for a Doctoral Program Focusing on Educational Leadership. Journal of Research on Leadership Education, 3(1).
8Rubio, D. M., Rubin, R. S., & Brennan, D. G. (2003). How Well Does the GRE Work for Your University? An Empirical Institutional Case Study of the Graduate Record Examination Across Multiple Disciplines. College & University, 78(4), 11–17.
9Dobson, P., Krapljan-Barr, P., & Vielba, C. (1999). An Evaluation of the Validity and Fairness of the Graduate Management Admissions Test (GMAT) Used for MBA Selection in a UK Business School. International Journal of Selection & Assessment, 7(4), 196. https://doi.org/10.1111/1468-2389.00119
Once a researcher has decided on an approach, the next step is to select the most suitable design. Naturalistic approaches, being descriptive in nature, allow for either a thorough overall portrayal (ethnography) of an educational milieu, such as an immersion program, or a more in-depth examination (case study) of one aspect of it, such as a class or a teacher, to give two common examples. An experimental approach can involve either the random selection of participants (true experiment) from a particular target population (e.g., international students enrolled in EAP programs in a particular region), so as to generalize the findings to that population, or the use of one or more existing classes (one-group or quasi-experiment), which does not allow such generalizations but can have important local implications. A multivariate approach may involve a single group or multiple groups and make use of a survey or factorial design, for example, to measure statistical relationships or effects among variables relating to language learning. A wide range of designs and statistical analysis options is possible, depending on the nature and number of independent and dependent variables of interest to the researcher. Finally, the pedagogical approach, being pragmatic in nature, is flexible with respect to the design chosen, depending on the number and size of the classes involved. Ethical concerns relating to matters such as teachers researching their own classrooms dictate that the specific options available for a given study be limited to those posing the least threat to the learners.
Contemplating research? What do you want to do?
¹Shah, J. K., Ensminger, D. C., & Thier, K. (2015). The Time for Design-Based Research Is “Right” and “Right Now.” Mid-Western Educational Researcher, 27(2), 152–171.