By Cognadev on October 9, 2018
Written by Paul Barrett and Maretha Prinsloo
The answer to the question depends upon what we mean by assessments “working”. There are two key perspectives here: the abstract and the pragmatic. First, we can ask the more abstract question:
Q1. Do they provide an accurate assessment of what they claim to assess?
This one is all about measurement validity. Many test providers will present you with a body of information quoting psychometric indices and statistics, all designed to convince you that their product is valid. The fact is, most of this information is not very trustworthy given the assumptions and misunderstandings that form the basis of claims about ‘measurement’ employed by many psychologists and test providers alike. My recent article on these issues (Barrett, 2018) explains why the information must be considered untrustworthy, while elaborating upon the potential legal implications of those claims in certain circumstances.
So, answering this measurement-focused question is very awkward, not least because few if any users of psychological assessments use them in a way that the routine validity workups justify. Who uses test scores as the sole piece of information for a selection decision? The validity coefficients provided by all test providers assume that an assessment score is the sole criterion for a decision. It never is, except under rare circumstances such as autonomous target-person profiling. So, human judgement plays a part in any selection decision, and that varies in quality and amount by person, by organisation, and by context.
How do we even know if an assessment is making an ‘accurate measurement’ of what is claimed to be measured? The usual response is to reference ‘concurrent validity’ or ‘construct validity’. Concurrent/convergent validity is now somewhat problematic given three recent articles (Pace & Brannick, 2010; Thielmann & Hilbig, 2018; Rossiter, 2018) showing that assessments of same-named attributes across different tests do not necessarily provide even ‘good enough’ agreement with one another. As to construct validity, the recent book chapter from Denny Borsboom and colleagues (2009) demonstrates that the definition used by psychologists is fatally flawed.
My advice with psychological attribute interpretation is to be content with a more common-sense/generic understanding of the meaning of a scale imparted by a test provider. As Mike Maraun (1998) has indicated, most psychological attributes are understood and described by us all as ‘common-or-garden’ concepts, which, unlike natural science concepts, are not defined technically or precisely. An example of a technical definition is that of the ohm (the unit of electrical resistance). Basically, my advice is don’t sweat the measurement validity question because what really matters in applied organisational work is how we answer Q2.
Q2. If I use these assessments, will the people I hire perform as expected?
Now this is of much more interest for a workplace user of any assessment.
So, you rightly ask a provider of assessment materials: if I use your assessment, will it work for me? You carefully explain what ‘work’ means for you and listen critically to what the test provider says in response. Perhaps in the back of your mind is the very recent article from Fisher et al. (2018) showing that validity coefficients computed using groups of people (i.e. the kinds of correlations test providers offer as evidence) are not actually predictive of any individual’s likely behaviour.
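A rough numerical sketch may help here. It is not the analysis in Fisher et al. (2018), which compares between-person and within-person variation; it only uses the textbook linear-regression result that, with standardised scores and a validity coefficient r, individual outcomes scatter around the predicted value with a standard deviation of sqrt(1 - r^2). The function names and the value r = 0.30 are illustrative assumptions, not figures from any test provider.

```python
import math


def variance_explained(r: float) -> float:
    """Share of criterion variance accounted for by the test score."""
    return r * r


def individual_prediction_spread(r: float) -> float:
    """Residual SD (in criterion SD units) around the predicted outcome
    when a standardised criterion is regressed on a standardised score."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)


r = 0.30  # hypothetical validity coefficient, chosen only for illustration
print(f"variance explained: {variance_explained(r):.2f}")        # 0.09
print(f"residual SD:        {individual_prediction_spread(r):.3f}")  # 0.954
```

Under these assumptions, a coefficient of 0.30 leaves the uncertainty about any one person's outcome at roughly 95% of what it was before testing, which is one reason a group-level correlation is a weak guide to an individual.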
Quite clearly, psychological assessment is not equivalent to psychological measurement; it’s not even close. We must accept that like insurance actuaries, we can associate test scores with relevant outcomes, with probabilities assigned to those outcomes. But unlike bank credit-scoring or insurance actuaries, we do not make decisions solely on the results of an algorithm. We use our judgement, especially in high-stakes assessment.
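The actuarial idea of attaching outcome probabilities to score ranges can be sketched as a simple expectancy table. Everything here is hypothetical: the function name, the score bands, and the ten made-up records are assumptions for illustration only, and a real table would need far more data.

```python
from collections import defaultdict


def expectancy_table(records, band_edges):
    """Group (score, succeeded) pairs into score bands and return
    {band_index: (n, observed_success_rate)}.

    band_edges are ascending cut points; band 0 is below the first edge,
    band 1 is from the first edge up to the second, and so on.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [n, successes]
    for score, succeeded in records:
        band = sum(score >= edge for edge in band_edges)
        counts[band][0] += 1
        counts[band][1] += int(succeeded)
    return {band: (n, wins / n) for band, (n, wins) in sorted(counts.items())}


# Hypothetical hires: (assessment score, performed as expected?)
records = [
    (12, False), (15, False), (18, True),
    (22, False), (25, True), (27, True),
    (31, True), (34, True), (36, False), (38, True),
]

table = expectancy_table(records, band_edges=[20, 30])
for band, (n, rate) in table.items():
    print(f"band {band}: n={n}, observed success rate={rate:.2f}")
```

The output is a set of observed proportions per band, not a verdict on any candidate; the decision still rests on judgement applied alongside those proportions.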
And whereas we might be told that meta-analytic evidence can now provide definitive answers for us, the opposite seems to be the case, as noted in a recent Science article entitled “The metawars: Meta-analyses were supposed to end scientific debates. Often, they only cause more controversy.” The relevant awkward example for HR is the latest meta-analytic evidence showing that unstructured interviews are as valid as structured ones, and second only in validity coefficient magnitude to ability tests (see Table 1 in Schmidt, Oh, and Shaffer, 2016).
The same goes for the computational estimates of financial utility provided by many test providers. The economist Michael Sturman showed as far back as 2000 that the utility formulae used by psychologists and test providers don’t work in the real world.
So, you are going to have to answer this “will it work for me” question yourself, in your context, in your organisation. You need to work through it carefully with your test provider and/or an independent assessment psychologist-expert.
Accept that evaluating “will it work” is messy and complex, and not the ‘done deal’ that so many test-provider sales-people portray. But, with the right due-diligence evaluative approach, you can make a substantive difference for your organisation using an appropriate-to-your-context assessment and a set of realistic expectations.
So, do Cognadev’s Assessments Work?
Yes, given a more realistic understanding about what constitutes psychological assessment, and how assessment information will be interpreted and incorporated into a typical decision process within a selection protocol.
We are a high-stakes assessment organisation. We deal with clients who are hiring for roles where the cost of ‘getting it wrong’ is substantive. And these clients want much more information about how a person functions, their values, and their motivations, beyond the limited information-content of sum-score responses to self-report questionnaire items.
We also have a clear view of what constitutes measurement, and what does not. We do not pretend that we are making ‘measurement’ as a physicist or engineer might. Our evidence bases and computational models of relevant workplace outcomes are constructed and interpreted concordant with that understanding.
Cognadev’s assessments have introduced a new paradigm to Psychometrics. In comparison to many personality and ability tests where the constructs are empirically derived, our assessments are based on well-founded, coherent and self-contained theoretical models. In the case of the cognitive and values assessments, well-researched holonic models of thinking processes and levels of consciousness underlie the assessment methodologies (Prinsloo & Barrett, 2013; Prinsloo, 2012). Motivation is also measured in terms of well-accepted and validated models.
In addition to their theoretical foundation, the Cognadev assessment methodologies are anchored in a wide range of empirical research findings available in the literature; the author’s intuitive insights; as well as in-depth action research aimed at constructing evidence bases of the validity of the assessments from the ground up.
The psychological constructs measured represent aspects of complex, self-organising and dynamic systems functioning. “To predict these kinds of outputs requires innovative classifier (ordered classes or categories) construction that neither depends upon strong quantitative measurement models for their validity, nor simplistic nonsense such as true scores, latent variables, or the like.” (Barrett, 2011).
Cognadev’s assessments are based upon systems-modelling techniques where trajectory mapping for an individual is the goal. The design of these techniques involved moving from creating broad generalisations about an individual into the more precise modelling of their cognitive features. This is achieved by recording thousands of very specific observations which are interpreted algorithmically by rule-based expert systems.
The metric properties of the Cognadev assessments are largely researched through a case-building approach to validation. Reliability is defined and investigated using the straightforward engineering concept of repeatability, often of ordered-class sequences. The validity and reliability of the assessments are continuously investigated using modern statistical techniques.
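As an illustration of what repeatability of ordered classes can mean in practice, here is a minimal sketch. It is not Cognadev's procedure, which the text above does not specify; the function name, the `tolerance` parameter, and the two sets of class assignments are hypothetical. It simply reports how often a person lands in the same class, or within a given number of classes, on a second administration.

```python
def repeatability(first, second, tolerance=0):
    """Proportion of people whose ordered class on the second occasion
    is within `tolerance` classes of their class on the first occasion."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return hits / len(first)


# Hypothetical ordered classes (1 = lowest, 5 = highest) for eight people
first_occasion = [1, 2, 2, 3, 4, 4, 5, 3]
second_occasion = [1, 2, 3, 3, 4, 5, 5, 1]

exact = repeatability(first_occasion, second_occasion)
adjacent = repeatability(first_occasion, second_occasion, tolerance=1)
print(f"same class:           {exact:.3f}")     # 0.625
print(f"within one class:     {adjacent:.3f}")  # 0.875
```

An index of this kind needs no assumption about true scores or latent variables; it only asks whether the classification repeats.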
In the application of assessment solutions, care is taken to ensure that the interpretation of the results is achieved by professionally trained practitioners. We also collaborate with our long-term corporate clients by conducting ongoing action research to follow up and refine the reports and the interpretations of the results. Our holistic approach to assessment includes in-depth discussion of the results with the assessment candidates themselves, as these individuals can provide us with new perspectives on their results. In addition, the assessment results of individuals and teams are contextualised according to specific job requirements as systematically identified using our Contextualised Competency Mapping (CCM) job analysis tool.
Cognadev’s approach to test construction thus makes for a more honest, ethical, and realistic approach to psychological assessment.
Barrett, P.T. (2018). The EFPA test-review model: When good intentions meet a methodological thought disorder. Behavioral Sciences (https://www.mdpi.com/2076-328X/8/1/5), 8, 1, 5, 1-22.
Borsboom, D., Cramer, A.O.J., Kievit, R.A., Scholten, A.Z., & Franic, S. (2009). The end of construct validity. In Lissitz, R.W. (Ed.). The Concept of Validity: Revisions, New Directions, and Applications (Chapter 7, pp. 135-170). Charlotte, NC: Information Age Publishing. ISBN: 978-1-60752-227-0.
Fisher, A.J., Medaglia, J.D., & Jeronimus, B.F. (2018). Lack of group-to-individual generalizability is a threat to human subjects research. PNAS (https://doi.org/10.1073/pnas.1711978115), In Press, 1-10.
Maraun, M.D. (1998). Measurement as a normative practice: Implications of Wittgenstein’s philosophy for measurement in Psychology. Theory & Psychology, 8, 4, 435-461.
Pace, V.L., & Brannick, M.T. (2010). How similar are personality scales of the “same” construct? A meta-analytic investigation. Personality and Individual Differences, 49, 7, 669-676.
Prinsloo, M. & Barrett, P. (2013). Cognition: Theory, measurement, Implications. Integral Leadership Review. Integral Publishers.
Prinsloo, M. (2012). Consciousness models in action: Comparisons. Integral Leadership Review. Integral Publishers.
Rossiter, J.R. (2018). The New Psychometrics: Comment on Appelbaum et al. (2018). American Psychologist (http://psycnet.apa.org/record/2018-48461-001), 73, 7, 930-931.
Schmidt, F.L., Oh, I-S., & Shaffer, J.A. (2016). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years of research findings. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2853669, 1-74.
Sturman, M.C. (2000). Implications of Utility Analysis adjustments for estimates of human resource intervention value. Journal of Management, 26, 2, 281-299.
Thielmann, I., & Hilbig, B.E. (2018). Nomological consistency: A comprehensive test of the equivalence of different trait indicators for the same constructs. Journal of Personality (https://onlinelibrary.wiley.com/doi/pdf/10.1111/jopy.12428), In Press, 1-16.