Research Article

Received 14 August 2009; Revised 20 November 2009; Accepted 20 November 2009. Published online 28 January 2010 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jrsm.3

A method for evaluating research syntheses: The quality, conclusions, and consensus of 12 syntheses of the effects of after-school programs

Jeffrey C. Valentine (a), Harris Cooper (b), Erika A. Patall (b), Diana Tyson (b), and Jorgianne Civey Robinson (b)

Like all forms of empirical inquiry, research syntheses can be carried out in ways that lead to more or less valid inferences about the phenomenon under study. This synthesis of syntheses (a) examined the methods employed in the syntheses of the effects of after-school programs (ASPs) and determined how closely they conformed to what is defined as best practice for research synthesis, (b) compared the inferences drawn from the ASP research literature by each synthesis with the inferences that plausibly could be made from the data they covered, and (c) determined the points of consistency across the syntheses with regard to both potentially valid and potentially invalid conclusions. It was found that the 12 syntheses used highly divergent methods, varying in problem definitions, search strategies, inclusion criteria for individual studies, and techniques for drawing conclusions about the cumulative evidence. Copyright © 2010 John Wiley & Sons, Ltd.

Keywords: research synthesis; systematic review; meta-analysis; quality; trustworthiness

Introduction

Like all forms of empirical inquiry, research syntheses can be carried out in ways that lead to more or less valid inferences about the phenomenon under study. Until the mid-1970s, research synthesis was a subjective process with little protection against bias favoring the perspective of the person conducting it. However, in the past three decades, the methods for the retrieval, integration, and interpretation of research literature have undergone enormous change (see [1, 2] for histories of these developments). Today, research synthesis has its own methodological techniques and decision rules, all meant to help synthesists produce an unbiased estimate of what the cumulative evidence says. The focus of this paper is on applying advice on what constitutes 'best practice' in systematic reviewing (e.g. [3, 4]) to a research area with a number of existing research syntheses. Specifically, we examined syntheses of research on after-school programs (ASPs) that are typified by the 21st Century Community Learning Centers (21st Century CLC) model (i.e. programs that largely operate during the hours immediately after school lets out, provide opportunities for academic enrichment, and offer a broad array of additional services, programs, and activities). It is our hope that this exercise will provide a viable approach that might be used by others involved in evaluating the value of research syntheses. In the sections that follow, we briefly trace the historical development of advice on conducting and evaluating research syntheses, then distill some critical elements of that advice and apply it to 12 research syntheses on the effects of ASPs.

(a) College of Education, University of Louisville, 309 CEHD, Louisville, KY, U.S.A.
(b) Duke University, Durham, NC, U.S.A.
Correspondence to: Jeffrey C. Valentine, College of Education, University of Louisville, 309 CEHD, Louisville, KY, U.S.A. E-mail: jeff.valentine@louisville.edu
Contract/grant sponsor: Spencer Foundation; contract/grant number: 200600070
Development of standards for research synthesis

One of the earliest sets of prescriptions for conducting research syntheses was provided by Cooper [5]. He argued that, similar to primary research, a research synthesis involves several stages of implementation. These stages demarcated the principal tasks of a research synthesis so that the effort produces an unbiased rendering of the cumulative state of evidence on a research problem. To justify this approach, Cooper [6] wrote:

...the integration of separate research projects involves scientific inferences as central to the validity of knowledge as the inferences made in primary research ... Most important, the methodological choices at each review stage may engender threats to the validity of the review's conclusions. (pp. 291–292)

For each stage, Cooper codified the research question asked, its primary function in the synthesis, and the procedural differences that might cause variation in conclusions. He then applied the notion of threats to inferential validity to research synthesis ([7]; also see [8]), identifying 10 threats to validity that might undermine the trustworthiness of the findings of a research synthesis. He focused primarily on validity threats that arise from the procedures used to cumulate studies, for example, biases in the literature search or in the criteria used for including studies in the synthesis. The threats-to-validity approach was subsequently applied to research synthesis by Matt and Cook [9], who identified 21 threats, and by Shadish et al. [8], who expanded this list to 29 threats. In each case, these researchers described threats related not only to potential biases caused by the process of research synthesis itself, but also to deficiencies in the primary research that made up the evidence base of the synthesis. For example, they included as a threat to validity the lack of representation of important participant populations in the primary studies. This threat, while certainly a real threat to the validity of inferences arising from a synthesis, is a consequence of the sampling methods used in the studies that make up the evidence base, rather than a choice of methodology made by the research synthesists.

Missing from the social science literature have been scales or checklists meant to assist consumers of research syntheses in assessing how much credibility they should place in these efforts. Research synthesis textbooks are often exhaustive in their prescriptions for how to carry out a trustworthy synthesis. However, they are written primarily for synthesis producers, providing far more detail than the typical consumer wants or needs in order to decide how much credence to place in the conclusions of a synthesis they are reading to help them make decisions about a practice or public policy.

Interestingly, researchers in medicine and public health have focused more attention on the development of consumer-oriented evaluation devices than have social scientists. Still, the resulting scales and checklists are not numerous, and efforts to construct them have been only partly successful.
For example, after a thorough search of the literature, a report prepared by the Agency for Healthcare Research and Quality [10] identified one research synthesis quality scale and 10 sources it labeled as 'checklists,' although two of these sources are essentially textbooks on conducting research syntheses for medical researchers. More recently, the AMSTAR checklist was developed [11], but this instrument also has not undergone a thorough assessment of its construct validity.

Generally, the scales and checklists in medical and health research distill the critical decision points that need to be considered when consumers evaluate the trustworthiness of a research synthesis. Examining their content can be instructive for social scientists because many of the issues that are important in synthesis methodology are independent of the specific focus of the problem. Still, some of the issues that are of importance to medical researchers may dictate a relative emphasis on aspects of methodology that might not be isomorphic with the issues of importance to social scientists. For example, the AMSTAR instrument does not have items that examine whether the keywords employed in an electronic search fully capture the different terms study authors have used to describe similar activities. ASPs are sometimes spelled out as 'after-school programs,' and a synthesist who does not know this will likely miss some portion of the relevant literature. The omission of such items likely reflects the fact that issues of construct validity often receive (and perhaps need) less attention in medical research. The attempts by medical and health scientists to develop scales and checklists can also be instructive because the strengths and weaknesses of these instruments can highlight for social scientists proper and improper ways to construct useful evaluation devices.

Cooper's evaluative checklist. Cooper's [3, 4] checklist was meant to systematically organize the most important issues addressed in both the threats-to-validity approach used by social scientists and the checklist approach used in medical and health research. For this synthesis, we focused on the quality of the reports in five general areas: (a) defining the problem (e.g. 'Are the variables of interest given clear conceptual definitions?'), (b) collecting the research evidence (e.g. 'Were complementary searching strategies used to find relevant studies?'), (c) evaluating the correspondence between the methods and implementation of individual studies and the desired inferences of the synthesis (e.g. 'Were studies categorized so that important distinctions could be made among them regarding their research design and implementation?'), (d) summarizing and integrating the evidence from individual studies (e.g. 'Was an appropriate method used to combine and compare results across studies?'), and (e) interpreting the cumulative evidence (e.g. 'Were analyses carried out that tested whether results were sensitive to statistical assumptions and, if so, were these analyses used to help interpret the evidence?'). The questions are written from the point of view of a synthesis consumer, and each is phrased so that an affirmative response means confidence can be placed in that aspect of the synthesis's methodology.
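To make the shape of such a consumer-oriented checklist concrete, the sketch below encodes the five areas and their example questions as a simple data structure. This is our illustration only: the area labels and questions are quoted from the text above, but the one-question-per-area coverage and the 'count the affirmative answers' summary are assumptions for illustration, not Cooper's actual instrument or scoring rule.

```python
# Illustrative only: a consumer-oriented evaluation checklist rendered as a
# data structure. Area labels and example questions come from the text above;
# the structure and the scoring function are our assumptions.

CHECKLIST = {
    "Defining the problem":
        ["Are the variables of interest given clear conceptual definitions?"],
    "Collecting the research evidence":
        ["Were complementary searching strategies used to find relevant studies?"],
    "Evaluating the correspondence between methods and desired inferences":
        ["Were studies categorized so that important distinctions could be made "
         "among them regarding their research design and implementation?"],
    "Summarizing and integrating the evidence":
        ["Was an appropriate method used to combine and compare results across studies?"],
    "Interpreting the cumulative evidence":
        ["Were analyses carried out that tested whether results were sensitive "
         "to statistical assumptions?"],
}

def affirmative_counts(answers):
    """Count 'yes' answers per area; each affirmative response supports
    confidence in that aspect of the synthesis's methodology."""
    return {area: sum(answers.get(q, False) for q in questions)
            for area, questions in CHECKLIST.items()}
```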
While these questions are not exhaustive, most of the threats to validity identified in the earlier works find expression in them, as do the dimensions used in medical and health checklists that seem most essential to work in the social sciences (there is, for example, quite a bit of conceptual overlap between Cooper's 20 questions and the questions on the Shea et al. [11] instrument).

We used Cooper's [3, 4] checklist as a guide to evaluating syntheses to help us determine whether there are lessons to be learned from the existing syntheses about the effective development and implementation of ASPs. More specifically, we (a) used the guidance provided by Cooper to identify the critical dimensions needed to appraise the trustworthiness of research syntheses, (b) examined the methods employed in the ASP syntheses and determined how closely they conformed to what is defined as best practice for research synthesis, (c) compared the inferences drawn from the ASP research literature by each synthesis with the inferences that plausibly can be made from the data they cover, and (d) determined the points of consistency across the syntheses with regard to both potentially valid and potentially invalid conclusions.

Methods

Literature search procedures

Our experience with the ASP literature suggested that the syntheses of interest to us were relatively unlikely to be published in academic outlets. Therefore, to locate syntheses of the ASP literature, we first retrieved several syntheses mentioned in meetings convened by the C. S. Mott Foundation, a non-profit foundation that funds research on the effects of ASPs, and by the National Partnership for Quality Afterschool Programs. We supplemented the documents obtained at these meetings by: (a) examining the reference sections of the syntheses for citations to other possibly relevant syntheses, (b) contacting other researchers working in the field of youth development generally, and (c) conducting an electronic forward citation search to identify documents that cited the 21st Century CLC enabling legislation.

Coding of the syntheses

We created a coding guide to serve as a data extraction form, using as a frame the synthesis quality criteria outlined in Cooper [3, 4]. All items were coded by at least two trained researchers. Agreement rates were checked for each item. When agreement was not 100%, another individual served as the arbiter. In addition, we contacted authors of the syntheses as a final check on our coding.
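As a rough sketch of the double-coding workflow just described (the function names and example code values are ours, chosen for illustration, not the authors' coding guide), per-item agreement checking with arbitration of non-unanimous items might look like this:

```python
# Sketch of the coding workflow described above: each item is coded by at
# least two researchers; any item without 100% agreement goes to an arbiter.
# All names and example codes here are illustrative.

def reconcile(item, codes, arbiter):
    """Return the agreed code, or the arbiter's decision on disagreement."""
    if len(set(codes)) == 1:        # all coders assigned the same code
        return codes[0]
    return arbiter(item, codes)     # a third researcher resolves the item

def agreement_rate(coded_items):
    """Proportion of items on which the coders agreed unanimously."""
    unanimous = sum(1 for codes in coded_items.values() if len(set(codes)) == 1)
    return unanimous / len(coded_items)

# Hypothetical codings for two items by two coders:
items = {
    "calendar: after school": ["included", "included"],
    "location: private residence": ["no inference", "excluded"],
}
print(agreement_rate(items))  # 0.5 -> the second item would go to arbitration
final = {k: reconcile(k, v, lambda k, v: "no inference") for k, v in items.items()}
```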
All but one of the research syntheses used in this investigation made direct reference to the 21st Century CLC enabling legislation, although none made explicit reference to the definition of programs contained therein. Most cited the legislation for its impact on the funding of such programs. More relevant, the reports either (a) said that the term '21st Century Community Learning Centers' or a variation thereof was used in their search of reference databases, (b) used evaluations of 21st Century CLC-funded programs as examples of the type of program included in the synthesis, and/or (c) explicitly stated that such programs were included in their evidence base.

In order to investigate the nature of the operational definitions used in the research syntheses, our coding protocol was based on the 21st Century CLC definition, which recognized nine distinctions in programs. Table I presents the nine operational characteristics of programs and the 29 possible variations in these features. Some of the nine characteristics are explicitly mentioned in the 21st Century CLC definition and others are implied. Within characteristics, we filled out the list of features so that they would exhaustively represent possible program variations. Note that the program variations are not mutually exclusive. Rather, a particular synthesis team might have defined their research domain so that the programs their synthesis included (that is, deemed relevant for the synthesis) could contain more than one type of variation within a characteristic.

One problem we encountered in applying the coding protocol was that the syntheses varied in how explicitly they specified their operational definitions of the programs. Not surprisingly, some synthesists were explicit about whether programs with certain characteristics were or were not included, while other characteristics were left unmentioned. Despite this ambiguity, it could occasionally be inferred that programs with certain characteristics were or were not included, often from the way programs were discussed or from the features of programs examined in the research that was covered. Therefore, we applied the coding protocol for operational definitions twice. First, for the 29 features of a program, we coded each synthesis based on whether it explicitly specified that the feature was or was not deemed relevant for inclusion in the synthesis. In other words, each coder answered the question 'Did the report explicitly specify whether programs with or without this feature were included or excluded from the synthesis, yes or no?' Second, we made judgments about whether each variation was or was not included, based on (a) explicit information in the document or (b) what appeared to be implicit operational criteria for inclusion or exclusion. For this judgment, coders could use a third category indicating that no inference was possible. In other words, coders answered the question 'Were programs with this feature included in or excluded from the synthesis, or could you not make this determination?'
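The two coding passes lend themselves to a small typed representation. The following sketch is our illustration of the scheme as described above; the class and value names are invented for clarity and are not the authors' coding guide:

```python
# Sketch of the two-pass coding scheme described above. Pass 1 records
# whether a report explicitly specified a feature's status (two categories);
# pass 2 records the inclusion judgment itself (three categories).
from dataclasses import dataclass
from enum import Enum

class Explicit(Enum):
    YES = "explicitly specified"
    NO = "not explicitly specified"

class Status(Enum):
    INCLUDED = "included"
    EXCLUDED = "excluded"
    NO_INFERENCE = "no inference possible"

@dataclass
class FeatureCoding:
    feature: str       # one of the 29 feature variations in Table I
    pass1: Explicit    # was the feature's status explicitly specified?
    pass2: Status      # included, excluded, or undeterminable

# Hypothetical example: a report that never mentions weekend sessions, but
# whose covered evaluations imply that weekend delivery was acceptable.
coding = FeatureCoding("Calendar: on weekends", Explicit.NO, Status.INCLUDED)
```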
Results

Our initial beliefs about the publication status of the syntheses on ASPs seem to have been correct. Of the 11 syntheses we initially uncovered, 10 were unpublished. One unpublished document was subsequently published, and another document that is now in the publication pipeline also appeared (through our informal network of scholars). Therefore, this synthesis includes 12 syntheses of the effects of ASPs.

Three reports we retrieved and read were excluded from our examination of research syntheses [24--26]. Specifically, the research syntheses in the Catalano et al. [24] and Roth et al. [26] reports dealt foremost with Positive Youth Development (PYD) programs. Although PYD programs were sometimes implemented outside the school day, they took many other forms as well, including, for example, serving as curriculum components for in-school courses on health. Thus, we excluded these two syntheses because they emphasized stand-alone programs meant to teach specific skills that could be, and often were, implemented in settings other than ASPs. Similarly, Eccles and Templeton [25] addressed the effectiveness of activities and community programs more broadly defined than those implied by the 21st Century CLC definition.

Table I. Features of ASPs included in and excluded from the definitions used in the research syntheses.

[Only the structure of this table could be recovered from this copy; the per-review cell entries are omitted.] Columns: the 12 reviews [12--23] (Beckett et al., Bodilly and Beckett, Durlak, Fashola, Hollister, Kane, Lauer et al., Little and Harris, McComb and Scott-Little, Miller, Scott-Little et al., and Zief et al.). Rows: the nine program characteristics and their 29 feature variations:

- Calendar: before school; after school; during school hours; on weekends; during school holidays; during summer
- Duration: indefinite time period; discrete time period
- Location: in schools; in community centers; in private residences
- Goal specificity: discrete skills; general impact
- Goal content: academic achievement; prevention; hobbies/interests; social/emotional development; safe environment
- Staffing: groups; individuals
- School status of participants: pre-school; grades K--5; grades 6--8; grades 9--12; high school drop-outs
- Participant restrictions: none; emphasis on participants with defined need; only participants with defined need
- Family involvement: encourages family involvement

Notes: The table records, for each review, whether each program feature was included in or excluded from the synthesis, and whether that status was explicitly specified or had to be inferred. A filled circle (•) marks a feature the review explicitly specified as included; an open circle (◦) marks a feature inferred to be included; two further symbols (lost in this copy) marked explicit and inferred exclusion, respectively.

We turn now to a presentation of the specifics of the 12 syntheses. For each general category of questions for evaluating research syntheses presented in Cooper [3, 4], we compare the syntheses in an effort both to resolve the apparent differences in conclusions arrived at by the different synthesis teams and to get a sense of where consensus about the effects of ASPs might lie.

Coding agreement

As we coded the syntheses, we kept track of our agreement rates. When judges disagreed, the disagreements were resolved either in conference or through arbitration by an author not involved in the coding. Not surprisingly, there was little disagreement regarding whether the synthesis reports explicitly stated that a program feature was or was not part of the definition (82% of the codings were agreements).
Judgments about the second question, concerning which program features were or were not included in the syntheses, produced less frequent unanimous agreement, although unanimous agreement was still reached most of the time (75% of the codings were agreements). Because there were two categories for judgments about whether a feature was specified and three for judgments about what the included features were (that is, the feature was included, the feature was excluded, or no inference could be drawn), it is possible that much of this difference is a function of different base rates of chance agreement.
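This base-rate point can be made concrete with a back-of-the-envelope calculation. Under the simplifying assumption that coders choosing at random pick uniformly among the categories, two coders agree by chance half the time with two categories but only a third of the time with three; a chance-corrected index such as Cohen's kappa then makes the two rates more comparable. The numbers below use the agreement rates reported above, but the uniform-chance assumption is ours, not the authors' analysis:

```python
# Back-of-the-envelope illustration: raw agreement rates are not directly
# comparable across questions with different numbers of categories. Assuming
# (for illustration) uniform random coding, chance agreement is 1/k for k
# categories; Cohen's kappa rescales observed agreement relative to chance.

def chance_agreement(k):
    """Expected agreement of two coders choosing uniformly among k categories."""
    return 1.0 / k

def kappa(observed, chance):
    """Cohen's kappa: agreement beyond chance, as a share of the possible range."""
    return (observed - chance) / (1.0 - chance)

# Two-category question (explicit specification), 82% observed agreement:
print(round(kappa(0.82, chance_agreement(2)), 3))  # 0.64
# Three-category question (inclusion status), 75% observed agreement:
print(round(kappa(0.75, chance_agreement(3)), 3))  # 0.625
```

On this rough accounting the two chance-corrected rates are nearly identical, which is consistent with the authors' suggestion that much of the raw difference reflects differing base rates of chance agreement.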
Most interesting, then, is that all disagreements about whether or not a feature of programs was included in the operational definition used in a synthesis occurred when the coders had to make inferences based on information implicit in the report. At one level, this finding seems obvious: when synthesists explicitly stated that ASPs with or without a particular feature were or were not included in the synthesis, judges agreed on this determination more often than when they had to infer it from other information. Still, the point is not trivial because it underscores the importance of being explicit about which operational definitions were included in a synthesis. Without explicit statements of included operations, consumers will be more likely to disagree about what operations the synthesists intended to include, and hence about the operations and constructs to which the synthesists' conclusions apply.

The importance of this point becomes even clearer when we look at how the operational definitions used to include and exclude studies from the syntheses differed from one another. These operational definitions are presented in Table I. In about a quarter of the cases, the judges could make no inference about the feature. These cases were not evenly distributed across the 12 syntheses: the syntheses ranged from a high of 17 'unaddressed' feature specifications to a low of only 3. Also, the inability to infer whether or not features were included was not evenly distributed across features. All syntheses included programs that occurred during the hours after school, but in no case could we make a determination about whether programs operating in private residences were included in or excluded from the syntheses, and in only two cases could we infer that the included programs excluded high school dropouts.

It could be argued that expecting synthesists to provide such complete explication of definitions sets an unfair standard. After all, we decided what our definition of an ASP would be, and we deliberately chose a broad one that was ultimately derived from legislation. However, our definition does provide a large canvas for painting the definitional picture, and it is clear that the syntheses differed in how explicit they were about definitions, even with regard to program characteristics about which there can be little argument concerning relevance. For example, Lauer et al. [20] provided the only report that explicitly stated whether or not each possible calendar feature was included in their synthesis of ASPs. Still, this difficulty would be minimized if we could assume that 'if it was not mentioned, it was not included.' However, there were five program features for which at least one synthesist explicitly included programs with the feature and another synthesist explicitly excluded programs with the feature. Thus, this operating rule would have led to some errors. And inclusion criteria can be finely nuanced: Zief et al. [22] mentioned that the delivery of programs at times other than after the school day 'could not be the primary means through which the program attempts to influence outcomes' (p. 4), but did include evaluations of programs that offered Saturday and summer sessions along with a primary after-school component. There were also three instances in which the judges came to consensus about the inferred inclusion or exclusion of a feature, but the judgment differed across syntheses. The existence of these types of disagreements would make it hard to justify a blanket rule governing the inference that should be made in the absence of explicit discussion in the synthesis.

Defining the problem in the syntheses

Conceptual and operational definitions in the syntheses. Table I describes the features of studies that were used as inclusion and exclusion criteria in the 12 ASP syntheses, based either on information explicitly stated in the syntheses or on inferences made by the coders. We would not use the information contained in Table I to suggest that most of the syntheses used 'correct' or 'incorrect' definitions of ASPs. Applying these characterizations involves assessing the fit between the conceptual and operational definitions used in the synthesis, and consumers might differ with our judgment about when a fit is good or bad. For example, eight of the synthesists used the term 'after school' as the conceptual label for the types of programs that interested them. However, the synthesis by Hollister [18] labeled the programs of interest as 'after-school' but also included one program delivered exclusively during summer. It could be argued that Hollister should have either used the broader 'out-of-school-time' conceptual label to describe the purview of his synthesis or excluded that evaluation from it.

We would not argue that all syntheses on the same topic should employ exactly the same conceptual and operational definitions. In fact, there is value in having a somewhat heterogeneous sample of possible definitions. What we would argue is that the ...