Friday, March 21, 2014


Rubric:  A scoring scale used to assess student performance along a task-specific set of criteria
Authentic assessments typically are criterion-referenced measures.  That is, a student's aptitude on a task is determined by matching the student's performance against a set of criteria to determine the degree to which the student's performance meets the criteria for the task.  To measure student performance against a pre-determined set of criteria, a rubric, or scoring scale, is typically created which contains the essential criteria for the task and appropriate levels of performance for each criterion.  For example, the following rubric (scoring scale) covers the research portion of a project:
Research Rubric
Criteria | Weight | 1 | 2 | 3
Number of Sources | x1 | 1-4 | 5-9 | 10-12
Historical Accuracy | x3 | Lots of historical inaccuracies | Few inaccuracies | No apparent inaccuracies
Organization | | Cannot tell from which source information came | Can tell with difficulty where information came from | Can easily tell which sources info was drawn from
Bibliography | | Bibliography contains very little information | Bibliography contains most relevant information | All relevant information is included
As in the above example, a rubric is composed of two components: criteria and levels of performance. Each rubric has at least two criteria and at least two levels of performance. The criteria, characteristics of good performance on a task, are listed in the left-hand column in the rubric above (number of sources, historical accuracy, organization and bibliography). Actually, as is common in rubrics, the author has used shorthand for each criterion to make it fit easily into the table. The full criteria are statements of performance such as "include a sufficient number of sources" and "project contains few historical inaccuracies."
For each criterion, the evaluator applying the rubric can determine to what degree the student has met the criterion, i.e., the level of performance. In the above rubric, there are three levels of performance for each criterion. For example, the project can contain lots of historical inaccuracies, few inaccuracies or no inaccuracies.
Finally, the rubric above contains a mechanism for assigning a score to each project. (Assessments and their accompanying rubrics can be used for purposes other than evaluation and, thus, do not have to have points or grades attached to them.) In the second column from the left, a weight is assigned to each criterion. Students can receive 1, 2 or 3 points for "number of sources." But historical accuracy, more important in this teacher's mind, is weighted three times (x3) as heavily. So, students can receive 3, 6 or 9 points (i.e., 1, 2 or 3 times 3) for the level of accuracy in their projects.
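The scoring arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of the rubric itself: the x1 and x3 weights for "number of sources" and "historical accuracy" come from the text, while the x1 weights for "organization" and "bibliography" and the names WEIGHTS and weighted_score are assumptions made for the example.

```python
# Weights per criterion. The first two come from the text above; the last
# two are ASSUMED to be x1 purely for illustration.
WEIGHTS = {
    "number of sources": 1,
    "historical accuracy": 3,
    "organization": 1,   # assumed weight
    "bibliography": 1,   # assumed weight
}


def weighted_score(levels, weights=WEIGHTS):
    """Return the total points for one project.

    `levels` maps each criterion to the level of performance (1, 2 or 3)
    the evaluator assigned; each level is multiplied by the criterion's weight.
    """
    for criterion, level in levels.items():
        if criterion not in weights:
            raise KeyError(f"unknown criterion: {criterion}")
        if level not in (1, 2, 3):
            raise ValueError(f"level for {criterion} must be 1, 2 or 3")
    return sum(weights[criterion] * level for criterion, level in levels.items())


# A hypothetical project: good on sources and organization, excellent on
# accuracy, poor on the bibliography.
project = {
    "number of sources": 2,
    "historical accuracy": 3,
    "organization": 2,
    "bibliography": 1,
}
print(weighted_score(project))  # 2*1 + 3*3 + 2*1 + 1*1 = 14
```

Under these assumed weights, moving one level on historical accuracy changes the total by three points, while the same move on any other criterion changes it by one.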
The above rubric includes another common, but not necessary, component of rubrics: descriptors. Descriptors spell out what is expected of students at each level of performance for each criterion. In the above example, "lots of historical inaccuracies," "can tell with difficulty where information came from" and "all relevant information is included" are descriptors. A descriptor tells students more precisely what performance looks like at each level and how their work may be distinguished from the work of others for each criterion. Similarly, the descriptors help the teacher more precisely and consistently distinguish between student work.
Many rubrics do not contain descriptors, just the criteria and labels for the different levels of performance. For example, imagine we strip the rubric above of its descriptors and put in labels for each level instead. Here is how it would look:
Criteria | Poor (1) | Good (2) | Excellent (3)
Number of Sources | | |
Historical Accuracy | | |
Organization | | |
Bibliography | | |
It is not easy to write good descriptors for each level and each criterion. So, when you first construct and use a rubric you might not include descriptors. That is okay. You might just include the criteria and some type of labels for the levels of performance as in the table above. Once you have used the rubric and identified student work that fits into each level it will become easier to articulate what you mean by "good" or "excellent." Thus, you might add or expand upon descriptors the next time you use the rubric.
Why Include Levels of Performance?
Clearer expectations
As mentioned in Step 3, it is very useful for the students and the teacher if the criteria are identified and communicated prior to completion of the task. Students know what is expected of them and teachers know what to look for in student performance. Similarly, students better understand what good (or bad) performance on a task looks like if levels of performance are identified, particularly if descriptors for each level are included.

More consistent and objective assessment
In addition to better communicating teacher expectations, levels of performance permit the teacher to more consistently and objectively distinguish between good and bad performance, or between superior, mediocre and poor performance, when evaluating student work.

Better feedback
Furthermore, identifying specific levels of student performance allows the teacher to provide more detailed feedback to students. The teacher and the students can more clearly recognize areas that need improvement.

Analytic Versus Holistic Rubrics
For a particular task you assign students, do you want to be able to assess how well the students perform on each criterion, or do you want to get a more global picture of the students' performance on the entire task? The answer to that question is likely to determine the type of rubric you choose to create or use: Analytic or holistic.

Analytic rubric
Most rubrics, like the Research rubric above, are analytic rubrics. An analytic rubric articulates levels of performance for each criterion so the teacher can assess student performance on each criterion. Using the Research rubric, a teacher could assess whether a student has done a poor, good or excellent job of "organization" and distinguish that from how well the student did on "historical accuracy."

Holistic rubric
In contrast, a holistic rubric does not list separate levels of performance for each criterion. Instead, a holistic rubric assigns a level of performance by assessing performance across multiple criteria as a whole. For example, the analytic research rubric above can be turned into a holistic rubric:
3 - Excellent Researcher
  • included 10-12 sources
  • no apparent historical inaccuracies
  • can easily tell which sources information was drawn from
  • all relevant information is included
2 - Good Researcher
  • included 5-9 sources
  • few historical inaccuracies
  • can tell with difficulty where information came from
  • bibliography contains most relevant information
1 - Poor Researcher
  • included 1-4 sources
  • lots of historical inaccuracies
  • cannot tell from which source information came
  • bibliography contains very little information
In the analytic version of this rubric, 1, 2 or 3 points are awarded for the number of sources the student included. In contrast, number of sources is considered along with historical accuracy and the other criteria in the use of a holistic rubric to arrive at a more global (or holistic) impression of the student work. Another example of a holistic rubric is the "Holistic Critical Thinking Scoring Rubric" (in PDF) developed by Facione & Facione.
When to choose an analytic rubric
Analytic rubrics are more common because teachers typically want to assess each criterion separately, particularly for assignments that involve a larger number of criteria. It becomes more and more difficult to assign a level of performance in a holistic rubric as the number of criteria increases. For example, what level would you assign a student on the holistic research rubric above if the student included 12 sources, had lots of inaccuracies, did not make it clear from which source information came, and provided a bibliography that contained most relevant information? As student performance increasingly varies across criteria it becomes more difficult to assign an appropriate holistic category to the performance. Additionally, an analytic rubric better handles weighting of criteria. How would you treat "historical accuracy" as a more important criterion in the holistic rubric? It is not easy. But the analytic rubric handles it well by using a simple multiplier for each criterion.
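To make the mixed-performance example concrete, here is a minimal Python sketch. It assumes a deliberately strict reading of the holistic rubric, in which a level fits only when every criterion sits at that same level; the helper names holistic_level and analytic_score are hypothetical, and the x1 weights for organization and bibliography are assumed rather than taken from the rubric.

```python
# Weights per criterion; the x1 weights for the last two are ASSUMED.
WEIGHTS = {
    "number of sources": 1,
    "historical accuracy": 3,
    "organization": 1,   # assumed weight
    "bibliography": 1,   # assumed weight
}


def holistic_level(levels):
    """Return the shared level if all criteria agree, otherwise None.

    This is a strict simplification: a real evaluator would use judgment
    to pick the closest holistic category.
    """
    distinct = set(levels.values())
    return distinct.pop() if len(distinct) == 1 else None


def analytic_score(levels, weights):
    """Score each criterion separately and apply its multiplier."""
    return sum(weights[criterion] * level for criterion, level in levels.items())


# The student from the example: 12 sources (level 3), lots of inaccuracies
# (level 1), unclear sourcing (level 1), mostly complete bibliography (level 2).
student = {
    "number of sources": 3,
    "historical accuracy": 1,
    "organization": 1,
    "bibliography": 2,
}

print(holistic_level(student))           # None: no single category fits
print(analytic_score(student, WEIGHTS))  # 3*1 + 1*3 + 1*1 + 2*1 = 9
```

The analytic version still produces a score and a per-criterion profile for this student, whereas the strict holistic check finds no single category that describes the work.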

When to choose a holistic rubric
So, when might you use a holistic rubric? Holistic rubrics tend to be used when a quick or gross judgment needs to be made. If the assessment is a minor one, such as a brief homework assignment, it may be sufficient to apply a holistic judgment (e.g., check, check-plus, or no-check) to quickly review student work. But holistic rubrics can also be employed for more substantial assignments. On some tasks it is not easy to evaluate performance on one criterion independently of performance on a different criterion. For example, many writing rubrics (see example) are holistic because it is not always easy to disentangle clarity from organization or content from presentation. So, some educators believe a holistic or global assessment of student performance better captures student ability on certain tasks. (Alternatively, if two criteria are nearly inseparable, the combination of the two can be treated as a single criterion in an analytic rubric.)
How Many Levels of Performance Should I Include in my Rubric?
There is no specific number of levels a rubric should or should not possess. It will vary depending on the task and your needs. A rubric can have as few as two levels of performance (e.g., a checklist) or as many as ... well, as many as you decide is appropriate. (Some do not consider a checklist a rubric because it only has two levels -- a criterion was met or it wasn't. But because a checklist does contain criteria and at least two levels of performance, I include it under the category of rubrics.) Also, it is not true that there must be an even number or an odd number of levels. Again, that will depend on the situation.
To further consider how many levels of performance should be included in a rubric, I will separately address analytic and holistic rubrics.
Analytic rubrics
Generally, it is better to start with a smaller number of levels of performance for a criterion and then expand if necessary. Making distinctions in student performance across two or three broad categories is difficult enough. As the number of levels increases, and those judgments become finer and finer, the likelihood of error increases.
Thus, start small. For example, in an oral presentation rubric, amount of eye contact might be an important criterion. Performance on that criterion could be judged along three levels of performance: never, sometimes, always.
makes eye contact with audience | never | sometimes | always
Although these three levels may not capture all the variation in student performance on the criterion, they may provide sufficient discrimination for your purposes. Or, at the least, they are a place to start. Upon applying the three levels of performance, you might discover that you can effectively group your students' performance in these three categories. Furthermore, you might discover that the labels of never, sometimes and always sufficiently communicate to your students the degree to which they can improve on making eye contact.
On the other hand, after applying the rubric you might discover that you cannot effectively discriminate among student performance with just three levels of performance. Perhaps, in your view, many students fall in between never and sometimes, or between sometimes and always, and neither label accurately captures their performance. So, at this point, you may decide to expand the number of levels of performance to include never, rarely, sometimes, usually and always.
makes eye contact | never | rarely | sometimes | usually | always
There is no "right" answer as to how many levels of performance there should be for a criterion in an analytic rubric; that will depend on the nature of the task assigned, the criteria being evaluated, the students involved and your purposes and preferences. For example, another teacher might decide to leave off the "always" level in the above rubric because "usually" is as much as normally can be expected or even wanted in some instances. Thus, the "makes eye contact" portion of the rubric for that teacher might be
makes eye contact | never | rarely | sometimes | usually

So, I recommend that you begin with a small number of levels of performance for each criterion, apply the rubric one or more times, and then re-examine the number of levels that best serve your needs. I believe starting small and expanding if necessary is preferable to starting with a larger number of levels and shrinking the number because rubrics with fewer levels of performance are normally:
· easier and quicker to administer
· easier to explain to students (and others)
· easier to expand than larger rubrics are to shrink
The fact that rubrics can be modified and can reasonably vary from teacher to teacher again illustrates that rubrics are flexible tools to be shaped to your purposes.
Holistic rubrics
Much of the advice offered above for analytic rubrics applies to holistic rubrics as well. Start with a small number of categories, particularly since holistic rubrics often are used for quick judgments on smaller tasks such as homework assignments. For example, you might limit your broad judgments to

· satisfactory
· unsatisfactory
· not attempted

or

· check-plus
· check
· no check

or even just

· satisfactory (check)
· unsatisfactory (no check)
Of course, to aid students in understanding what you mean by "satisfactory" or "unsatisfactory" you would want to include descriptors explaining what satisfactory performance on the task looks like.
Even with more elaborate holistic rubrics for more complex tasks I recommend that you begin with a small number of levels of performance. Once you have applied the rubric you can better judge if you need to expand the levels to more effectively capture and communicate variation in student performance.

Mueller's* Glossary of Authentic Assessment Terms

* I have tried to present definitions below that are consistent with the common use of these terms.  However, because some terms do not have commonly agreed upon definitions and because, in a few cases, I think certain definitions make more sense, I am calling this Mueller's Glossary.   Use at your own risk.
Analytic Rubric: An analytic rubric articulates levels of performance for each criterion so the teacher can assess student performance on each criterion. (For examples and a fuller discussion, go to Rubrics.)
Authentic Assessment:  A form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills.  Student performance on a task is typically scored on a rubric to determine how successfully the student has met specific standards.
Some educators choose to distinguish between authentic assessment and performance assessment.  For these educators, performance assessment meets the above definition except that the tasks do not reflect real-world (authentic) challenges.  If we are going to ask students to construct knowledge on assessments, then virtually all such tasks should be authentic in nature or they lose some relevance to the students.  Thus, for me, this distinction between performance and authentic assessments becomes insignificant and unnecessary.   Consequently, I use authentic assessment and performance assessment synonymously.   (For a fuller discussion of the different terms used to describe this form of assessment and its distinction from "traditional" or forced-choice assessment, go to What is Authentic Assessment?)
Authentic Task:  An assignment given to students designed to assess their ability to apply standards-driven knowledge and skills to real-world challenges. A task is considered authentic when 1) students are asked to construct their own responses rather than to select from ones presented; and 2) the task replicates challenges faced in the real world.  Good performance on the task should demonstrate, or partly demonstrate, successful completion of one or more standards.  The term task is often used synonymously with the term assessment in the field of authentic assessment.  (For a fuller description of authentic tasks and for examples, go to Authentic Tasks.)
Content Standards: Statements that describe what students should know or be able to do within the content of a specific discipline or at the intersection of two or more disciplines (e.g., students will describe effects of physical activity on the body). Contrast with Process Standards and Value Standards.
Criteria:   Characteristics of good performance on a particular task.  For example, criteria for a persuasive essay might include well organized, clearly stated, and sufficient support for arguments.  (The singular of criteria is criterion.   For a fuller description of criteria and for examples, go to Identifying the Criteria for the Task.)
Descriptors: Statements of expected performance at each level of performance for a particular criterion in a rubric - typically found in analytic rubrics. See example and further discussion of descriptors.
Distractors: The incorrect alternatives or choices in a selected response item. (For more see terminology for multiple-choice items.)
Goal:  In the field of student assessment, a goal is a very broad statement of what students should know or be able to do.  Unlike a standard or an objective, a goal is often not written in language that is amenable to assessment.  Rather, the purpose for crafting a set of goals typically is to give a brief and broad picture of what a school, district, state, etc. expects its students will know and be able to do upon graduation.  (For a fuller description of the distinction between these types of statements and for examples of each, go to Standards.)
Holistic Rubric: In contrast to an analytic rubric, a holistic rubric does not list separate levels of performance for each criterion. Instead, a holistic rubric assigns a level of performance by assessing performance across multiple criteria as a whole. (For examples and a fuller discussion, go to Rubrics.)
Objective:  Much like a goal or standard, an objective is a statement of what students should know and be able to do.  Typically, an objective is the most narrow of these statements, usually describing what a student should know or be able to do at the end of a specific lesson plan.  Like a standard, an objective is amenable to assessment, that is, it is observable and measurable.  (For a fuller description of the distinction between these types of goal statements and for examples of each, go to Standards.)
Outcome:  See Standard. Preceding the current standards-based movement was a drive for outcome-based education. The term standard has replaced the term outcome with much the same meaning.
Performance Assessment:  See Authentic Assessment above.  I use these terms synonymously.
Portfolio: A collection of a student's work specifically selected to tell a particular story about the student. See Portfolios for more details. 
Process Standards: Statements that describe skills students should develop to enhance the process of learning. Process standards are not specific to a particular discipline, but are generic skills that are applicable to any discipline (e.g., students will find and evaluate relevant information). Contrast with Content Standards and Value Standards.
Reliability: The degree to which a measure yields consistent results.
Rubric:  A scoring scale used to evaluate student work.  A rubric is composed of at least two criteria by which student work is to be judged on a particular task and at least two levels of performance for each criterion.  (For a fuller description of rubrics, their different variations, and to see examples, go to Rubrics. Also, see Analytic Rubrics; Holistic Rubrics.)
Standard:  Much like a goal or objective, a standard is a statement of what students should know or be able to do.  I distinguish between a standard and these other goal statements by indicating that a standard is broader than an objective, but more narrow than a goal.   Like an objective and unlike a goal, a standard is amenable to assessment, that is, it is observable and measurable.  (For a fuller description of the distinction between these types of goal statements and for examples of each, click Standards. Also, see Content Standards; Process Standards; Value Standards.)
(Actually, I prefer the way we previously used the term standard: "A description of what a student is expected to attain in order to meet a specified educational intent (such as a learning outcome or objective).  The description may be qualitative and/or quantitative and may vary in level of specificity, depending on its purpose" (Assessment Handbook, Illinois State Board of Education, 1995). In other words, an outcome would describe what students should know and be able to do, and the standard would describe the particular level of accomplishment on that outcome that you expected most students to meet. That was your standard. We no longer commonly use that definition of standard in assessment.)
Stem: A question or statement followed by a number of choices or alternatives that answer or complete the question or statement. (Stems are most commonly found in multiple-choice questions. See terminology for multiple-choice items.)
Validity: "The degree to which a certain inference from a test is appropriate and meaningful" (AERA, APA, & NCME, 1985). For example, if I measure the circumference of your head to determine your level of intelligence, my measurement might be accurate. However, it would be inappropriate for me to draw a conclusion about your level of intelligence. Such an inference would be invalid.
Value Standards: Statements that describe attitudes teachers would like students to develop towards learning (e.g., students will value diversity of opinions or perspectives). Contrast with Content Standards and Process Standards.
