Rubrics
Rubric: A scoring scale used to assess student performance along a task-specific set of criteria.
Authentic assessments typically are criterion-referenced measures.
That is, a student's aptitude on a task is determined by matching the student's
performance against a set of criteria to determine the degree to which the
student's performance meets the criteria for the task. To measure student
performance against a pre-determined set of criteria, a rubric, or scoring
scale, is typically created which contains the essential criteria for the task
and appropriate levels of performance for each criterion. For example,
the following rubric (scoring scale) covers the research portion of a project:
Research Rubric

| Criteria | Weight | 1 | 2 | 3 |
| Number of Sources | x1 | 1-4 | 5-9 | 10-12 |
| Historical Accuracy | x3 | Lots of historical inaccuracies | Few inaccuracies | No apparent inaccuracies |
| Organization | x1 | Cannot tell from which source information came | Can tell with difficulty where information came from | Can easily tell which sources info was drawn from |
| Bibliography | x1 | Bibliography contains very little information | Bibliography contains most relevant information | All relevant information is included |
As in the above example, a rubric is composed of two components: criteria and levels of performance. Each rubric has at least two criteria and at least two levels of
performance. The criteria, characteristics of good performance on a task,
are listed in the left-hand column in the rubric above (number of sources,
historical accuracy, organization, and bibliography). Actually, as is common in
rubrics, the author has used shorthand for each criterion to make it fit easily
into the table. The full criteria are statements of performance such as
"include a sufficient number of sources" and "project contains
few historical inaccuracies."
For each criterion, the evaluator applying the rubric can determine to
what degree the student has met the criterion, i.e., the level of performance.
In the above rubric, there are three levels of performance for each criterion.
For example, the project can contain lots of historical inaccuracies, few
inaccuracies or no inaccuracies.
Finally, the rubric above contains a mechanism for assigning a score to
each project. (Assessments and their accompanying rubrics can be used for
purposes other than evaluation and, thus, do not have to have points or grades
attached to them.) In the second column from the left, a weight is assigned to each
criterion. Students can receive 1, 2 or 3 points for "number of
sources." But historical accuracy, more important in this teacher's mind,
is weighted three times (x3) as heavily. So, students can receive 3, 6 or 9
points (i.e., 1, 2 or 3 times 3) for the level of accuracy in their projects.
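This weighting arithmetic can be sketched in a few lines of code. The criterion names and weights below come from the Research rubric above; the function and variable names are my own, offered only as an illustration of how a weighted analytic score is totaled.

```python
# Weighted analytic-rubric scoring: each criterion is rated at a level
# (1-3), and that level is multiplied by the criterion's weight.

weights = {
    "Number of Sources": 1,
    "Historical Accuracy": 3,  # weighted x3, so worth 3, 6, or 9 points
    "Organization": 1,
    "Bibliography": 1,
}

def weighted_score(levels):
    """Total score: sum of (level x weight) across all criteria."""
    return sum(levels[criterion] * w for criterion, w in weights.items())

# A student rated level 2 on everything except Historical Accuracy (level 3):
ratings = {
    "Number of Sources": 2,
    "Historical Accuracy": 3,
    "Organization": 2,
    "Bibliography": 2,
}
print(weighted_score(ratings))                  # 2 + 9 + 2 + 2 = 15
print(weighted_score({c: 3 for c in weights}))  # maximum possible: 18
```

Note how the x3 weight makes a one-level change in historical accuracy move the total by three points, while a one-level change in any other criterion moves it by one.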
The above rubric
includes another common, but not necessary, component of rubrics -- descriptors. Descriptors spell out what is
expected of students at each level of performance for each criterion. In the
above example, "lots of historical inaccuracies," "can tell with
difficulty where information came from" and "all relevant information
is included" are descriptors. A descriptor tells students more precisely
what performance looks like at each level and how their work may be
distinguished from the work of others for each criterion. Similarly, the
descriptors help the teacher more precisely and consistently distinguish
between student work.
Many rubrics do not contain descriptors, just the criteria and labels
for the different levels of performance. For example, imagine we strip the
rubric above of its descriptors and put in labels for each level instead. Here
is how it would look:
| Criteria | Weight | Poor (1) | Good (2) | Excellent (3) |
| Number of Sources | x1 | | | |
| Historical Accuracy | x3 | | | |
| Organization | x1 | | | |
| Bibliography | x1 | | | |
It is not easy to write good descriptors for each level and each
criterion. So, when you first construct and use a rubric you might not include
descriptors. That is okay. You might just include the criteria and some type of
labels for the levels of performance as in the table above. Once you have used
the rubric and identified student work that fits into each level it will become
easier to articulate what you mean by "good" or
"excellent." Thus, you might add or expand upon descriptors the next
time you use the rubric.
Why Include Levels of Performance?
As mentioned in Step
3, it is very useful for the students and the teacher if the criteria are
identified and communicated prior to completion of the task. Students know what
is expected of them and teachers know what to look for in student performance.
Similarly, students better understand what good (or bad) performance on a task
looks like if levels of performance are identified, particularly if descriptors
for each level are included.
More consistent and objective assessment
In addition to better
communicating teacher expectations, levels of performance permit the teacher to
more consistently and objectively distinguish between good and bad performance,
or between superior, mediocre and poor performance, when evaluating student
work.
Better feedback
Furthermore,
identifying specific levels of student performance allows the teacher to
provide more detailed feedback to students. The teacher and the students can
more clearly recognize areas that need improvement.
Analytic Versus Holistic Rubrics
For a particular task
you assign students, do you want to be able to assess how well the students
perform on each criterion, or do you want to get a more global picture of the
students' performance on the entire task? The answer to that question is likely
to determine the type of rubric you choose to create or use: Analytic or
holistic.
Most rubrics, like
the Research rubric above, are analytic rubrics. An analytic rubric articulates levels of performance for each criterion so the teacher can assess student performance on each
criterion. Using the Research rubric, a teacher could assess whether a student
has done a poor, good or excellent job of "organization" and
distinguish that from how well the student did on "historical accuracy."
Holistic rubric
In contrast, a
holistic rubric does not list separate levels of performance for each criterion. Instead, a holistic rubric assigns a level of performance by assessing performance across multiple
criteria as a whole. For example, the analytic research rubric above can be
turned into a holistic rubric:
| 3 - Excellent Researcher | 2 - Good Researcher | 1 - Poor Researcher |
In the analytic version of this rubric, 1, 2 or 3 points are awarded for
the number of sources the student included. In contrast, number of sources is
considered along with historical accuracy and the other criteria in the use of
a holistic rubric to arrive at a more global (or holistic) impression of the
student work. Another example of a holistic rubric is the "Holistic Critical
Thinking Scoring Rubric" (in PDF) developed by Facione & Facione.
Analytic rubrics are
more common because teachers typically want to assess each criterion
separately, particularly for assignments that involve a larger number of
criteria. It becomes more and more difficult to assign a level of performance
in a holistic rubric as the number of criteria increases. For example, what
level would you assign a student on the holistic research rubric above if the
student included 12 sources, had lots of inaccuracies, did not make it clear
from which source information came, and provided a bibliography containing most
relevant information? As student performance increasingly varies across
criteria it becomes more difficult to assign an appropriate holistic category
to the performance. Additionally, an analytic rubric better handles weighting
of criteria. How would you treat "historical accuracy" as more
important a criterion in the holistic rubric? It is not easy. But the analytic
rubric handles it well by using a simple multiplier for each criterion.
When to choose a holistic rubric
So, when might you
use a holistic rubric? Holistic rubrics tend to be used when a quick or gross
judgment needs to be made. If the assessment is a minor one, such as a brief
homework assignment, it may be sufficient to apply a holistic judgment (e.g.,
check, check-plus, or no-check) to quickly review student work. But holistic
rubrics can also be employed for more substantial assignments. On some tasks it
is not easy to evaluate performance on one criterion independently of performance
on a different criterion. For example, many writing rubrics (see example) are holistic because it is not always easy to disentangle clarity from
organization or content from presentation. So, some educators believe a
holistic or global assessment of student performance better captures student
ability on certain tasks. (Alternatively, if two criteria are nearly
inseparable, the combination of the two can be treated as a single criterion in
an analytic rubric.)
There is no specific number of levels a rubric should or should not
possess. It will vary depending on the task and your needs. A rubric can have
as few as two levels of performance (e.g., a checklist) or as many as ... well,
as many as you decide is appropriate. (Some do not consider a checklist a
rubric because it only has two levels -- a criterion was met or it wasn't. But
because a checklist does contain criteria and at least two levels of
performance, I include it under the category of rubrics.) Also, it is not true that there must be an even number or an odd number of levels.
Again, that will depend on the situation.
To further consider how many levels of performance should be included in
a rubric, I will separately address analytic and holistic rubrics.
Analytic rubrics
Generally, it is
better to start with a smaller number of levels of performance for a criterion
and then expand if necessary. Making distinctions in student performance across
two or three broad categories is difficult enough. As the number of levels
increases, and those judgments become finer and finer, the likelihood of error
increases.
Thus, start small. For example, in an oral presentation rubric, amount
of eye contact might be an important criterion. Performance on that criterion
could be judged along three levels of performance: never, sometimes, always.
| makes eye contact with audience | never | sometimes | always |
Although these three levels may not capture all the variation in student
performance on the criterion, it may be sufficient discrimination for your
purposes. Or, at the least, it is a place to start. Upon applying the three
levels of performance, you might discover that you can effectively group your
students' performance in these three categories. Furthermore, you might
discover that the labels of never, sometimes and always sufficiently
communicate to your students the degree to which they can improve on making
eye contact.
On the other hand, after applying the rubric you might discover that you
cannot effectively discriminate among student performance with just three
levels of performance. Perhaps, in your view, many students fall in between
never and sometimes, or between sometimes and always, and neither label
accurately captures their performance. So, at this point, you may decide to
expand the number of levels of performance to include never, rarely, sometimes, usually and always.
| makes eye contact | never | rarely | sometimes | usually | always |
There is no "right" answer as to how many levels of performance
there should be for a criterion in an analytic rubric; that will depend on the
nature of the task assigned, the criteria being evaluated, the students
involved and your purposes and preferences. For example, another teacher might
decide to leave off the "always" level in the above rubric because
"usually" is as much as normally can be expected or even wanted in
some instances. Thus, the "makes eye contact" portion of the rubric
for that teacher might be
| makes eye contact | never | rarely | sometimes | usually |
So, I recommend that
you begin with a small number of levels of performance for each criterion,
apply the rubric one or more times, and then re-examine the number of levels
that best serve your needs. I believe starting small and expanding if necessary
is preferable to starting with a larger number of levels and shrinking the
number because rubrics with fewer levels of performance are normally
· easier and quicker to administer
· easier to explain to students (and others)
· easier to expand than larger rubrics are to shrink
The fact that rubrics can be modified and can reasonably vary from
teacher to teacher again illustrates that rubrics are flexible tools to be
shaped to your purposes.
Holistic rubrics
Much of the advice
offered above for analytic rubrics applies to holistic rubrics as well. Start
with a small number of categories, particularly since holistic rubrics often
are used for quick judgments on smaller tasks such as homework assignments. For
example, you might limit your broad judgments to
· satisfactory
· unsatisfactory
· not attempted

or

· check-plus
· check
· no check

or even just

· satisfactory (check)
· unsatisfactory (no check)
Of course, to aid students in understanding what you mean by
"satisfactory" or "unsatisfactory" you would want to
include descriptors explaining what satisfactory performance on the task looks like.
Even with more elaborate holistic rubrics for more complex tasks I
recommend that you begin with a small number of levels of performance. Once you
have applied the rubric you can better judge if you need to expand the levels
to more effectively capture and communicate variation in student performance.
Mueller's* Glossary of Authentic Assessment Terms
* I have tried to present definitions below that are consistent with the
common use of these terms. However, because some terms do not have
commonly agreed upon definitions and because, in a few cases, I think certain
definitions make more sense, I am calling this Mueller's Glossary. Use
at your own risk.
Analytic Rubric: An analytic rubric articulates levels of performance for each criterion
so the teacher can assess student performance on each criterion. (For examples
and a fuller discussion, go to Rubrics.)
Authentic Assessment: A form of assessment in which students are asked to perform
real-world tasks that demonstrate meaningful application of essential knowledge
and skills. Student performance on a task is typically scored on a rubric
to determine how successfully the student has met specific standards.
Some educators choose to distinguish between authentic assessment and performance assessment. For these educators, performance
assessment meets the above definition except that the tasks do not reflect
real-world (authentic) challenges. If we are going to ask students to
construct knowledge on assessments, then virtually all such tasks should be
authentic in nature or they lose some relevance to the students. Thus,
for me, this distinction between performance and authentic assessments becomes
insignificant and unnecessary. Consequently, I use authentic assessment and performance assessment synonymously.
(For a fuller discussion of the different terms used to describe this form of
assessment and its distinction from "traditional" or forced-choice
assessment, go to What is Authentic Assessment?)
Authentic Task: An assignment given to students designed to assess their ability
to apply standards-driven knowledge and skills to real-world challenges. A task is considered
authentic when 1) students are asked to construct their own responses rather
than to select from ones presented; and 2) the task replicates challenges faced
in the real world. Good performance on the task should demonstrate, or
partly demonstrate, successful completion of one or more standards. The
term task is often used synonymously with the term assessment in the field of authentic
assessment. (For a fuller description of authentic tasks and for
examples, go to Authentic Tasks.)
Content Standards: Statements that describe what students should know
or be able to do within the content of a specific discipline or at the
intersection of two or more disciplines (e.g., students will describe effects of physical activity
on the body). Contrast with Process Standards and Value Standards.
Criteria: Characteristics of good performance on a particular task.
For example, criteria for a persuasive essay might include well organized, clearly stated, and sufficient support for arguments. (The singular of criteria is criterion. For a fuller
description of criteria and for examples, go to Identifying the Criteria for
the Task.)
Descriptors: Statements of expected performance at each level of performance for a
particular criterion in a rubric - typically found in analytic rubrics. See example and further discussion of descriptors.
Distractors: The incorrect alternatives or choices in a selected response item. (For
more see terminology for multiple-choice
items.)
Goal: In the field
of student assessment, a goal is a very broad statement of what students should
know or be able to do. Unlike a standard or an objective, a goal is often
not written in language that is amenable to assessment. Rather, the
purpose for crafting a set of goals typically is to give a brief and broad
picture of what a school, district, state, etc. expects its students will know
and be able to do upon graduation. (For a fuller description of the distinction
between these types of statements and for examples of each, go to Standards.)
Holistic Rubric: In contrast to an analytic rubric, a holistic rubric does not list
separate levels of performance for each criterion. Instead, a holistic rubric
assigns a level of performance by assessing performance across multiple
criteria as a whole. (For examples and a fuller discussion, go to Rubrics.)
Objective: Much like a
goal or standard, an objective is a statement of what students should know and
be able to do. Typically, an objective is the most narrow of these
statements, usually describing what a student should know or be able to do at
the end of a specific lesson plan. Like a standard, an objective is
amenable to assessment, that is, it is observable and measurable. (For a
fuller description of the distinction between these types of goal statements
and for examples of each, go to Standards.)
Outcome: See Standard. Preceding the current standards-based
movement was a drive for outcome-based education. The term standard has
replaced the term outcome with much the same meaning.
Portfolio: A collection of a student's work specifically selected to tell a
particular story about the student. See Portfolios for more details.
Process Standards: Statements that describe skills students should develop to enhance the
process of learning. Process standards are not specific to a particular
discipline, but are generic skills that are applicable to any discipline (e.g., students will find and evaluate relevant
information). Contrast with Content Standards and Value Standards.
Rubric: A scoring
scale used to evaluate student work. A rubric is composed of at least two
criteria by which student work is to be judged on a particular task and at
least two levels of performance for each criterion. (For a fuller
description of rubrics, their different variations, and to see examples, go to Rubrics. Also, see Analytic Rubrics; Holistic Rubrics.)
Standard: Much like a
goal or objective, a standard is a statement of what students should know or be
able to do. I distinguish between a standard and these other goal
statements by indicating that a standard is broader than an objective, but more
narrow than a goal. Like an objective and unlike a goal, a standard is
amenable to assessment, that is, it is observable and measurable. (For a
fuller description of the distinction between these types of goal statements
and for examples of each, go to Standards. Also, see Content Standards; Process Standards; Value Standards.)
(Actually, I prefer the way we previously used the term standard:
"A description of what a student is expected to attain in order to meet a
specified educational intent (such as a learning outcome or objective).
The description may be qualitative and/or quantitative and may vary in level of
specificity, depending on its purpose" (Assessment Handbook, Illinois
State Board of Education, 1995). In other words, an outcome would describe what
students should know and be able to do, and the standard described the
particular level of accomplishment on that outcome that you expected most
students to meet. That was your standard. We no longer commonly use that
definition of standard in assessment.)
Stem: A question or statement followed by a number of choices or alternatives
that answer or complete the question or statement. (Stems are most commonly
found in multiple-choice questions. See terminology for multiple-choice
items.)
Validity: "The degree to which a certain inference from a test is appropriate
and meaningful" (AERA, APA, & NCME, 1985). For example, if I measure
the circumference of your head to determine your level of intelligence, my
measurement might be accurate. However, it would be inappropriate for me to
draw a conclusion about your level of intelligence. Such an inference would be
invalid.
Value Standards: Statements that describe attitudes teachers would like students to
develop towards learning (e.g., students will value diversity of opinions or perspectives). Contrast with Content Standards and Process Standards.
http://jfmueller.faculty.noctrl.edu/toolbox/index.htm