July 30, 2015
It is getting very hard to keep up with the American Institutes for Research's (AIR) educational malfeasance in the testing realm. Parent advocate Deb Herbage and Dr. Karen Effrem of FSCCC have reviewed the 2,245-page contract between AIR and the Florida Department of Education (FLDOE). Here are just a few of the revelations, which come as Alpine spends another $600,000 of Florida taxpayer funds on a validity study that should already have been done and that Commissioner Stewart promised had been done. The public has been told this test will cost $220 million over six years. It is very important to verify that what has been paid for has actually been done.
The following information was taken directly from executed Contract #14-652 for the development of the state assessment (FSA) and the Algebra I, Algebra II, and Geometry End of Course (EOC) exams, executed on 6/3/14. The contract was signed by Pam Stewart (FLDOE) and Vickie Brooks (AIR). [The full 2,245-page contract is available on the Florida CFO website; all page numbers cited here refer to that document.]
1) Potentially Missing Linking, Validity, and/or Field Studies: Have Florida Taxpayers Paid for Work Not Completed?
The FLDOE required in its ITN (Invitation to Negotiate) that:
7.6.0. Scaling, Equating, Scoring and Special Psychometric Studies (Pg. 77)
Excerpt from the ITN: Other than the annual regular psychometric operations, such as sampling, test construction, and SES for the assessment system, the contractor will conduct a set of special psychometric studies for these assessments, described in Section 18.104.22.168. (Pg. 77)
The contractor must show evidence that the Department's preferences are psychometrically defensible and operationally feasible. The respondent may include in the reply a different proposal for scaling, equating, and scoring of these assessments to obtain assessment results that are valid, reliable, and accurate. (Pg. 165)
AIR said in numerous places in its proposal that became the executed contract that its tests are psychometrically valid and reliable. Here is one example:
AIR's Assessment Program offers psychometric and statistical services that stand alone in terms of quality and innovation. The integration of psychometrics with statistics and sampling sets AIR apart from the competition. Although testing firms often bring expertise in psychometrics, the quality of those services depends dramatically on the samples on which the data are based. Typical samples used in state testing programs can undercut the best psychometrics, leading to volatile test results from year to year and inaccurate classification of examinees. AIR combines expertise in sampling and psychometrics; all of our samples are optimized, and our statistics accurately reflect the complexities of the sample designs.
7.7.9 Reimbursable Funding Categories
Six funding categories are designated for specified program functions and may be used only for those functions. Use of these funds requires authorization by the Department contract or program manager or program area leads. (p. 392-393)
If these studies regarding validity and field testing had been completed by the end of 2014 as shown above, why did Commissioner Stewart or the FLDOE not provide them to legislators as promised after her March 4, 2015 testimony proclaiming that the FSA was field tested in Utah and that it was "absolutely psychometrically valid and reliable"?
One document the FLDOE sent to senators to verify the commissioner's testimony had nothing to do with the FSA or the Utah test. It was the technical report on the old FCAT and EOC tests. Even that report admitted that, after all the years those tests had been in use, there was still doubt about whether they are valid for use in high-stakes decisions:
Less strong is the empirical evidence for extrapolation and implication. This is due in part to the absence of criterion studies. Because an ideal criterion for the FCAT 2.0 or EOC assessments probably cannot be found, empirical evidence for the extrapolation argument may need to come from several studies showing convergent validity evidence. Further studies are also needed to verify some implication arguments. This is especially true for the inference that the state's accountability program is making a positive impact on student proficiency and school accountability without causing unintended negative consequences. (Emphasis added).
Aside from the fact that this report concerns a different test, based on different standards, built by a different developer, and from a different state, the above quote completely negates the validity of the FCAT, which had been in place for years, much less that of the FSA, which is still being put together on the fly.
If, after the many years that we have had the FCAT 2.0, they STILL don't know whether the test adequately measures and has a positive effect on student proficiency, whether its use in accountability is valid, and whether this whole system is having unintended negative consequences, how in the world can they say anything about the validity of the FSA? There is nothing here to answer the questions about the Utah field test. In fact, the word Utah does not appear in the report quoted above.
In addition, efforts to verify that the Utah test had been validated have been unsuccessful. As extensively described by the Utah blog What is Common Core? Education Without Representation, prominent Utah child psychologist Dr. Gary Thompson offered a $100,000 reward for evidence of validation, and the reward was never claimed. Here is a letter from a Utah school board member to Senator David Simmons (R-Altamonte Springs) verifying that the Utah State Office of Education never supplied the requested validation documents, documents that should be easily available if the Utah test was validated, as Stewart claimed:
2) FLDOE Did Not Ask Testing Company (AIR) to Produce Validity Studies on Using Test Scores for the Extremely Consequential High Stakes Decisions of High School Graduation and Third Grade Promotion in the ITN.
In its Invitation to Negotiate (ITN), FLDOE mentions the requirement for the contractor to set cut scores for the high-stakes decisions of retention and graduation:
The contractor is responsible for facilitating the Department's process to establish achievement levels and associated cut scores for the Florida Standards ELA/L, Mathematics, Algebra 1, Algebra 2, and Geometry Assessments; achievement levels, passing scores, retention cut for grade 3 ELA/L, graduation standards for ELA/L, and achievement levels and passing scores for the Algebra 1, Algebra 2, and Geometry Assessments in consultation with Florida educators and citizens. (Emphasis added p. 170)
Because FLDOE did not even ask any company to make sure that its tests were valid for these extremely important decisions, AIR had to offer to provide these additional studies:
We also provide as a separate cost option other validity studies that may be useful to support the goals of the state. These studies are not requested in the ITN, and for this reason, we propose them as optional studies AIR is prepared to implement for Florida if selected.
These studies are centered on the high-stakes nature of the test scores and their uses in student-level promotion/graduation and the teacher accountability system using test scores.
The additional validity studies we propose focus on evidence needed to support the following uses of the test scores in the Florida accountability system:
Evidence to support the use of the grade 3 level 2 cut score for determining student level promotion decisions
Evidence to support the use of the level 3 cuts as a graduation requirement for the end-of-course exams (Emphasis added p. 909).
Unfortunately, there is no evidence that FLDOE accepted the offer to do these studies: the list of deliverables makes no mention of third grade or graduation.
How many lives has Florida already ruined with this unsubstantiated policy, given that the FCAT technical report already admits more evidence is needed that these policies are having the desired effect? (See the full quote from the FCAT technical report in section 1.) How many more will be affected by this decision not to verify the use of the FSA for these policies? In what legal and financial jeopardy will this decision not to validate the cut scores for the high-stakes decisions place the state and the taxpayers of Florida?
Dr. Karen Effrem, Executive Director of Stop Florida Common Core Coalition
Common Core Discussion Group