Expert Panel to Scrutinize Test Scoring: Schools: Department of Education says previous reports misstated the extent of problems in new statewide program. But the accuracy of the sampling process will be studied further.

TIMES STAFF WRITER

The state Department of Education has appointed a panel of statistics experts to examine the procedures used to score tests taken last year by 1 million students in the first year of the California Learning Assessment System.

The move follows widespread criticism of the accuracy of test results released last month and strong reaction to a Times story on April 10 that analyzed the test scoring and reported numerous problems.

The experts will report their findings publicly to the State Board of Education in July, Acting Supt. of Public Instruction William D. Dawson said.

At issue is the method testing officials used to score the exams. To save money, only a portion of the reading, writing and math tests taken at more than 7,000 schools statewide was scored. An elaborate sampling process was used to select which tests to score.

The Times story examined the sampling process and reported that the test results “for hundreds of schools may be wildly inaccurate” and that the state violated its guidelines in more than 11,000 cases--most of them minor--by scoring fewer tests than promised. The story quoted some educational testing experts who criticized the state’s methodology.

Dawson said the Times computer analysis misstated the extent of the problems because of a misunderstanding between the newspaper and state testing officials.

The “scoring design” provided to The Times by the state April 5 explained the sampling method and, in a sequence of columns titled “number scored per school,” appeared to report the minimum number of test booklets that should be graded at each school to provide statistical validity for the results.

On Friday, however, Dawson said the scoring design did not specify how many tests should actually be scored at each school. Instead, he said, the document was a draft that contained mislabeled columns and should have shown only the number of students whose tests should be pulled for possible scoring.

Testing officials assumed that some of those selected tests would not be scored for various reasons--such as being lost or defaced--thus shrinking the sample, he said.

Under that definition, Dawson said, it would not be possible to break any guidelines because the Department of Education did not set a minimum number of tests to be scored at each school.

Gerry Shelton, a state consultant who helped design the testing procedure, said officials assumed that at some schools, as many as 20% of the tests chosen for possible scoring might not be graded.

According to a new Times computer analysis, there were 878 instances where the sample scored at a particular school was at least 20% smaller than the number of tests selected for possible scoring. That is about 4% of the nearly 22,000 samples drawn from the state’s public schools.
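
The arithmetic behind that comparison is simple enough to express in a few lines of code. The sketch below is illustrative only--the data values and record layout are hypothetical, not drawn from the state's files:

```python
# Count samples where the number of tests actually scored fell at least
# 20% below the number selected for possible scoring -- the shortfall
# Shelton described. Each pair is (tests selected, tests scored).
samples = [
    (120, 118),
    (100, 75),   # a 25% shortfall -- this one gets flagged
    (60, 60),
]

flagged = [
    (selected, scored)
    for selected, scored in samples
    if selected > 0 and scored <= 0.8 * selected
]

print(f"{len(flagged)} of {len(samples)} samples fell at least 20% short")
print(f"about {100 * len(flagged) / len(samples):.0f}% of all samples")
```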

The outside experts who will examine the scoring methods are Lee Cronbach, professor emeritus at Stanford University; Norm Bradburn, professor of business and public policy at the University of Chicago and senior vice president for research at the National Opinion Research Center; and Dan Horvitz, a former president of Research Triangle Institute now affiliated with the National Institute of Statistical Sciences.

The April 10 Times article also cited details of instances where extremely small sample sizes had been used to give overall test scores for a school. There were two schools where a single student’s test was used to score the entire school.

The Times computer analysis identified 148 cases where schools were given their test results based on samples of fewer than 25% of the exams taken. Dawson said Friday that officials are still checking suspicious results and that the number of cases with such faulty sample sizes may reach 250 or more.

Additional tests will be graded at those schools, and their overall scores will be revised, Dawson said.

In general, the state’s method for dealing with small samples was to publish a measure of statistical uncertainty, known as the “standard error,” for each score at each school. A large standard error signals that the sample at a particular school may have been too small.
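
The article does not say how the state computed those standard errors. For illustration, here is the textbook formula for the standard error of a school’s average score when only a sample of its tests is graded, with the finite-population correction that applies when sampling without replacement; the score values and scale are hypothetical:

```python
import math

def standard_error_of_mean(scores, population_size):
    """Textbook standard error of a sample mean, with a finite-population
    correction for sampling without replacement from one school's tests.
    This is not necessarily the state's exact method."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    fpc = (population_size - n) / (population_size - 1)  # shrinks as n nears N
    return math.sqrt(variance / n * fpc)

# A five-test sample from a school where 40 students took the exam,
# scored on a hypothetical 1-6 scale:
sample = [3, 4, 2, 5, 3]
print(round(standard_error_of_mean(sample, population_size=40), 2))  # ~0.48
```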

The Times analysis found 331 instances where the standard error was greater than 15 percentage points and 1,511 instances where it was between 10 and 15 points. Dawson said the state had set no threshold for acceptable error.

Dawson said those kinds of errors should not occur when tests being taken now in schools statewide are graded. The state intends to grade all exams in the fourth and eighth grades, and a higher percentage of those taken in the 10th grade. Also, new computer instructions will check for low sample sizes and prevent them from being used.
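
The article does not describe those computer instructions. As a sketch of what such a safeguard might look like--using the 25% figure from The Times’ analysis as a stand-in threshold, since Dawson said the state set none--one plausible form is:

```python
def usable_sample(tests_taken, tests_scored, minimum_fraction=0.25):
    """Reject a school's result when too few of its exams were scored.
    The 25% floor mirrors the threshold used in The Times' analysis;
    the state's actual rule, if any, is not described here."""
    if tests_taken <= 0:
        return False
    return tests_scored / tests_taken >= minimum_fraction

# A school where 1 of 30 exams was scored would be screened out:
print(usable_sample(tests_taken=30, tests_scored=1))   # False
print(usable_sample(tests_taken=30, tests_scored=12))  # True
```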

The state chose not to grade all of the tests taken last year in order to save money. The California Learning Assessment System, which uses performance-based tests and measures students against tough statewide standards, must receive funding each year from the state Legislature and is expected to cost $55 million a year when fully in place.

Last week, some leaders in the Legislature urged the state to slow expansion of CLAS into more grades because of the questions about accuracy of the test results.

“Ignorance is forgivable, stupidity is not,” said state Sen. Leroy Greene (D-Carmichael) of the Senate education panel. “The state Department of Education exhibited a high coefficient of stupidity in the way they handled the test.”
