Using self-report personality questionnaires for high-stakes employee assessment: The end of an era?
By Paul Barrett on January 15, 2020
I know, ‘end of an era’ looks ridiculous at first sight, given the huge uptake of these questionnaires in the assessment market. From a recent overview of the area:
“Personality testing is a big business. It has an estimated market value of £4 billion. There are now hundreds of vendors distributing and selling an array of tests (1,319 at the last count), deployed across a range of applications” (Munro/Envisia Learning, 2019).
But the author of this overview begins his 49-page evidence-based investigation with:
“The proposal is that self-report personality measures have now had the best part of a century to demonstrate their practical value in employee selection. But their initial potential – despite some promising signs – has not been translated into the kind of performance that has had a significant organisational impact.”
“We can only anticipate another century of counter-productive debate and confusing claims in which self-report personality test data from applicants account for less than 5% of work performance.”
This conclusion is given even more credence by the results presented in a 2016 working paper (now submitted for publication) from Frank Schmidt and colleagues, entitled “The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years of research findings”. This paper updates the famous 1998 article from Schmidt and Hunter (which summarised the results from 85 years of publications). In Table 1, p. 65 of the 2016 paper, we see that personality assessment, including emotional intelligence and standard situational judgement tests, offers only marginal predictive accuracy for job performance compared with ability assessment and structured and unstructured interviews.
Table 1: Operational validities of personality attributes as predictors of Job Performance, extracted from Table 1, Schmidt, Oh, & Shaffer (2016)
Similar conclusions appear in the 2014 Annual Review of the area by Neal Schmitt, who states:
“The relationships between personality measures and performance vary with which of the Big Five constructs one considers and appear to be generalizable only in the case of conscientiousness. Observed correlations between performance and individual measures of personality are almost always less than .20, and corrected correlations rarely exceed .25.” (p. 46)
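To make concrete what correlations of that size mean in practice, here is a minimal sketch (the thresholds are taken from the Schmitt quote above) converting a validity coefficient into the share of job-performance variance it accounts for:

```python
# Convert a validity correlation into the proportion of job-performance
# variance it accounts for (the square of the correlation).
def variance_explained(r: float) -> float:
    return r ** 2

# Thresholds from the Schmitt (2014) quote above:
print(f"{variance_explained(0.20):.1%}")  # observed r < .20  -> under 4.0%
print(f"{variance_explained(0.25):.1%}")  # corrected r ~ .25 -> about 6.2%
```

This is the sense in which Munro’s “less than 5% of work performance” figure quoted earlier should be read.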
These results are compounded by the near-terminal problems associated with correcting correlations (disattenuation) for range restriction and unreliability in order to compute ‘operational’ validities (LeBreton, Scherer, & James, 2014; Roth, Le, Oh, Van Iddekinge, & Robbins, 2017; Johnson, Deary, & Bouchard, 2018).
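The correction at issue can be illustrated with Spearman’s classical disattenuation formula: the observed correlation is divided by the square root of the product of the predictor and criterion reliabilities (range-restriction corrections are applied on top of this). The reliability values below are illustrative assumptions, not figures from the cited papers:

```python
import math

def disattenuate(r_obs: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: observed r divided by
    the square root of (predictor reliability x criterion reliability)."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Illustrative assumed values: observed r = .15, predictor reliability
# .80, and an assumed supervisor-rating criterion reliability of .52.
# The corrected 'operational' validity is roughly half as large again
# as the observed correlation.
print(round(disattenuate(0.15, 0.80, 0.52), 2))  # -> 0.23
```

The critiques cited above concern precisely how defensible the reliability and range-restriction estimates fed into such corrections actually are.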
Furthermore, after almost 80 years of self-report assessment of personality, accompanied by developments in psychometric test theory, including the ideal-point and forced-choice ipsative-to-normative Thurstonian item response theory (IRT) models upon which the SHL OPQ “Reimagined” is based, none of it has had any enduring, substantive impact except on the marketing and sales of tests. Indeed, recent articles show that the whizz-bang ideal-point computer-adaptive administration methodology and SHL’s Thurstonian IRT scoring of the OPQ are no more predictively accurate than a simple item sum-score (Oswald, Shaw, & Farmer, 2015; Walton, Cherkasova, & Roberts, 2019; Fisher, Robie, Christiansen, Speer, & Schneider, 2019).
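For clarity, the ‘simple item sum-score’ those comparisons use as a baseline is just the following (a minimal sketch; the item keying and responses are made-up illustrations, not items from any real questionnaire):

```python
# Baseline sum-scoring of Likert items: flip reverse-keyed items,
# then add everything up. No IRT model is involved.
def sum_score(responses, reverse_keyed, scale_max=5):
    return sum(
        (scale_max + 1 - r) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    )

# Hypothetical 4-item scale (1-5 responses), items 1 and 3 reverse-keyed:
print(sum_score([4, 2, 5, 1], reverse_keyed={1, 3}))  # -> 18
```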
This relative failure to show any real improvement in assessment is also confounded by the realisation that candidates for many high-stakes, leadership, and high-autonomy positions invariably possess ATIC (the Ability To Identify Criteria for a job-role), which is related to a candidate’s cognitive abilities and, as a result, increases criterion-related validities for job performance in such candidates. It is not faking, as many I/O psychologists would suggest or try to eliminate, but something indicative of perceptive reasoning and information extrapolation in candidates; attributes clearly of value to employers (Kleinmann, Ingold, Lievens, Jansen, Melchers, & König, 2011; Ingold, Kleinmann, König, Melchers, & van Iddekinge, 2015; Geiger, Olderbak, Sauter, & Wilhelm, 2018).
And then there are the changing features of the modern corporate workplace and the job-roles within it. As noted in an earlier Cognadev blog, what is critical now for choosing a psychological assessment is an answer to our Question #3: what level of autonomy does the job-role possess? By autonomy, I mean the degree to which the employee will have the freedom to choose how they wish to achieve necessary work-goals, make decisions, and ‘influence’ (in the widest possible meaning of that term) other employees and important organizational outcomes. Self-report information is not sufficiently trustworthy in this context because we need to know how a person actually functions cognitively when undertaking a complex novel task that requires information manipulation, integration, and extrapolation; not how they think they function. Before an employer allows an employee considerable autonomy, they need to be assured it will be used responsibly.
Given the huge body of evidence presented in Andrew Munro’s article and the recent publications noted above, it might be concluded that those continuing to promote the famous commercial personality assessments for high-stakes assessment are selling products whose success appears to have relied more upon marketing expertise than upon long-term substantive evidence of ‘effect’. The use of self-report questionnaire assessment for modern high-stakes job-roles is, as a result, gradually fading away year on year, because there is simply no enduring, substantive evidence that anyone can point to as clear, unambiguous evidence of ‘effect’.
Indeed, from 1996 until 2012 (the last available year of this specific survey data), an analysis of annual US Harris Poll data on public perceptions of leadership in a variety of large corporates and organizations revealed that as the dollar spend on leadership training/development increased, public perceptions of and confidence in leaders decreased (Barrett, 2012). It is the same with the usual employee engagement assessments and interventions. For example, Forbes magazine in 2018 reported the results of a Gallup international survey on employee engagement, in an article entitled “Our approach to employee engagement is not working”:
“A staggering 87% of employees worldwide are not engaged”.
Next Generation Assessment
With the increasing focus on the psychology of high-stakes candidates, more sensitive and performance-based information is required about a candidate’s psychology and cognitive functioning than can be acquired via self-reports or general mental ability (GMA) assessments. Indeed, the simple strategy (utilised by some professional services organizations) of hiring the highest-scoring candidates on ability tests now looks unwise, as Robert Sternberg pointed out in 2017 (“The IQ of smart fools”) and in 2018 (“Speculations on the role of successful intelligence in solving contemporary world problems”).
This “Next Generation” of psychological assessment requires a qualified psychologist to interpret assessment results in conjunction with other external information sources and job-role/organizational experts because now we are addressing a person’s cognitive capabilities, their preferred cognitive styles/biases of working with/manipulating information, and their personal values and motivations which influence their cognition. That interpretative expertise is not something that can be acquired in a 2-day test-publisher training course.
Whether selecting entrants for leadership development or C-Suite leaders, the goal now is to assess ‘in depth’ and not to rely upon the cumulative summed responses to self-report questionnaire items. Indeed, some consulting psychologists simply use the items themselves as probes, asking candidates why they chose a particular response; that questioning can sometimes reveal far more about how a candidate is thinking than a simple cumulative sum-score of the items. But the more powerful, robust information comes from formally observing cognitive functioning rather than asking for self-reports of functioning, and from examining candidate preferences for particular values and motivations without asking the rather transparent, simple single-statement questions that are the norm in typical values/interests/motivation questionnaires.
The overarching psychological model now used as a focus for interpretation is drawn from dynamical integral psychology, not discrete-attribute psychometrics.
An example of such an assessment approach is described in detail in a previous Cognadev blog. A more in-depth explanation of the theoretical foundation and the assessment approach can be found in Prinsloo & Barrett (2013). The methodological approach, and thus the assessment techniques, namely the Cognitive Process Profile (CPP) and the Learning Orientation Index (LOI), are primarily based on the self-contained, holonically organised Information Processing Model (IPM) of Prinsloo. The assessments involve automated simulation exercises by which thinking processes are operationalised, externalised and tracked across thousands of measurement points. A person’s performance is then analysed by algorithmically based expert systems. Extensive reports are automatically generated which indicate the test candidate’s stylistic preferences, information processing competencies (IPCs), cognitive complexity, learning potential/cognitive modifiability, developmental guidelines, levels of metacognitive awareness, and an ideal work environment. Aspects such as conceptual skills, logical capability, strategic orientation, judgement and decision making are specifically addressed as well.
The above approach, formulated in the early 1990s, therefore does not rely on the typical psychometric assumption that ‘true scores’ can be located for specific ‘traits’. In fact, the previous century’s thinking enshrined in true-score psychometrics was finally laid to rest in 2002 by Borsboom and Mellenbergh.
Perhaps the obvious ‘performance-based’ model for assessment is the use of Assessment Centres. But these are expensive to set up and run in such a way that their validity is robust and maintained (as noted in Jackson, Michaelides, Dewberry, & Kim, 2016; Dewberry & Jackson, 2016). In many ways, though, the CPP and LOI can also be regarded as automated assessment centres.
Cognadev’s unique performance-based, holonic assessment of the integral triad of cognitive complexity + preference + learning agility, and the additional unique approach to values and motivation assessment, is now of increasing relevance. Two very recent blog articles explain why in some detail:
Cognitive complexity and cognitive styles: implications for strategic work
and, part 4 of a 4-part series on intellectual capital management:
Intellectual Capital Management: Assessment Products and Constructs
They also make clear why interpretation of the acquired assessment information requires a psychologist: integrating that information requires more knowledge of, and insight into, human psychology than does interpreting the typical generic narrative statements which appear in most computer-generated self-report questionnaire reports.
This may appear to negate the objectivity inherent in the generation of attribute magnitudes, orders, and classes, but it is merely a recognition that the integrated assessment of any individual’s cognitive functioning, motives, and values requires much more thought and insight than the computation of a few numbers, as though we were measuring a physical-science base or derived-unit quantity.
As Jan Smedslund (2009) began the first of a series of three articles:
“This article contains a comparison between what goes on in psychological research and in psychological practice. These two kinds of activities both start from a threefold common but unstated base, namely what we all know about being human because we are humans, what we know about each other because we participate in shared meaning systems (language and culture), and what we know about individuals in their individual life situations. From this common starting point, the two activities have developed differently. In order to help people in real life, practitioners have pursued a search for effectiveness, whereas researchers, in order to produce knowledge, have pursued a search for invariance (exact or probabilistic regularities). Traditionally, practitioners are supposed to learn from the results of research. In doing research one assembles evidence for and against hypotheses linking measured variables. The results very often take the form of small average differences and low correlations. These same variables are also taken to be involved in practical psychological work and, therefore, practitioners should be able to profit from reading the research reports. However, it is difficult to apply the scientific results, given the highly complex stream of persons, circumstances, and events that make up the practical experience. Therefore, it is frequently hard to see when and how practitioners can learn anything useful from the researchers.
Here, I argue that it is mostly the other way around and that researchers must listen to and learn from what goes on in practice. Practitioners are forced by their commitment to people in real life to take into account all our advance knowledge of psychological phenomena, whereas researchers, as I will argue, frequently have ignored, excluded, or circumvented much of this knowledge in order to produce invariant empirically based findings.” p. 778-779.
That’s why Cognadev assessments are so different and so important for high-stakes assessment. We form judgments and knowledge-claims about an individual based upon formally observed and computational-rule-based ‘scored’ novel task performance (resulting in attribute magnitudes, orders, and classes) as well as considering the additional complexity provided by the assessment of motivations and values acquired by algorithmically examining the judgments and preferences made by an individual. Finally, all this information is contextualised for specific work purposes by means of the Contextualised Competency Mapping (CCM) tool, with the final results integrated by a trained psychologist in the context of the job-role expectations and specific requirements of a client.
Our evidence-bases for the judgments we formulate are robust and open to third-party evaluation in our test manuals and online Technical Report Series; but we do not assume psychological attributes are physical-science quantities like length and mass, as though a magnitude on an attribute like “Curiosity” could be interpreted with the same precision as a quantitative measure of length.
There are no shortcuts to getting it right in psychological assessment once we refuse to make the untested assumptions made by so many test publishers and academics concerning the quantitative structure of psychological attributes (Michell, 1997; 2008; 2012; McGrane & Maul, 2020). These assumptions have now been openly challenged in courts of law in New Zealand and the US (Barrett, 2018; Beaujean, Benson, McGill, & Dombrowski, 2018).
As Michael Maraun put it in 1998:
“There is no public, normative status at all to assertions like ‘Tomorrow we are going to measure little Tommy’s dominance’. What does this mean? In contrast to the teaching of the use of concepts such as weight and height, the teaching of the use of concepts such as dominance and intelligence does not involve the teaching of rules for measuring. There is no common language standard of correctness for a claim like ‘I measured Sue’s leadership this morning’. In other words, there is no public, standardly taught notion of what it is to be correct in making such an assertion; instead, it sounds merely curious.” p. 455
That is why Cognadev approaches high-stakes assessment as psychologists and scientists, using tools, methods, analytics, and insights allied to assessing ‘effectiveness’ realistically, rather than pretending any of this can be made to look and sound like mechanical engineering or a branch of applied mathematics (Schönemann, 1994). As I have shown above, that latter approach has not worked, and we now understand why; which is the reason we do not continue doing the ‘same-old’ as though it will somehow achieve the elusive success which has failed to emerge over previous decades of doing things the same way.
Barrett, P. (2012). The public perception of institutional leadership as a function of $$ spend on executive training and development. https://www.pbarrett.net/stratpapers/annual_spend_and_leadership_perceptions.pdf
Barrett, P.T. (2018). The EFPA test-review model: When good intentions meet a methodological thought disorder. Behavioral Sciences (https://www.mdpi.com/2076-328X/8/1/5), 8, 1, 5, 1-22.
Beaujean, A.A., Benson, N.E., McGill, R.J., & Dombrowski, S.C. (2018). A misuse of IQ scores: Using the dual discrepancy/consistency model for identifying specific learning disabilities. Journal of Intelligence (http://www.mdpi.com/2079-3200/6/3/36 ), 6, 36, 1-25.
Borsboom, D., & Mellenbergh, G.J. (2002). True scores, latent variables, and constructs: a comment on Schmidt and Hunter. Intelligence, 30, 505-514.
Dewberry, C., & Jackson, D.J.R. (2016). The perceived nature and incidence of dysfunctional assessment center features and processes. International Journal of Selection and Assessment, 24, 2, 189-196.
Fisher, P.A., Robie, C., Christiansen, N.D., Speer, A.B., & Schneider, L. (2019). Criterion-related validity of forced-choice personality measures: A cautionary note regarding Thurstonian IRT versus Classical Test Theory scoring. Personnel Assessment and Decisions (https://scholarworks.bgsu.edu/pad/vol5/iss1/3/), 5, 1, 1-14.
Geiger, M., Olderbak, S., Sauter, R., & Wilhelm, O. (2018). The “g” in Faking: Doublethink the validity of personality self-report measures for applicant selection. Frontiers in Psychology: Personality and Social Psychology (https://doi.org/10.3389/fpsyg.2018.02153), 9, 2153, 1-15.
Ingold, P.V., Kleinmann, M., König, C.J., Melchers, K.G., & van Iddekinge, C.H. (2015). Why do situational interviews predict job performance? The role of interviewees’ ability to identify criteria. Journal of Business and Psychology, 30, 2, 387-398.
Jackson, D.J.R., Michaelides, G., Dewberry, C., & Kim, Y.-J. (2016). Everything that you have ever been told about assessment center ratings is confounded. Journal of Applied Psychology, 101, 7, 976-994.
Johnson, W., Deary, I.J., & Bouchard, Jr., T.J. (2018). Have standard formulas correcting correlations for range restriction been adequately tested? Minor sampling distribution quirks distort them. Educational and Psychological Measurement, 78, 6, 1021-1055.
Kleinmann, M., Ingold, P.V., Lievens, F., Jansen, A., Melchers, K.G., & König, C.J. (2011). A different look at why selection procedures work: The role of candidates’ ability to identify criteria. Organizational Psychology Review, 1, 2, 128-146.
LeBreton, J.M., Scherer, K.T., & James, L.R. (2014). Corrections for criterion reliability in validity generalization: A false prophet in a land of suspended judgment. Industrial and Organizational Psychology: Perspectives on Science and Practice, 7, 4, 478-500.
Maraun, M.D. (1998). Measurement as a normative practice: Implications of Wittgenstein’s philosophy for measurement in Psychology. Theory & Psychology, 8, 4, 435-461.
McGrane, J. A., & Maul, A. (2020). The human sciences, models and metrological mythology. Measurement (https://doi.org/10.1016/j.measurement.2019.107346), 152,107346, 1-9.
Michell, J. (1997). Quantitative science and the definition of measurement in Psychology. British Journal of Psychology, 88, 3, 355-383.
Michell, J. (2008). Is Psychometrics Pathological Science? Measurement: Interdisciplinary Research & Perspective, 6, 1, 7-24.
Michell, J. (2012). Alfred Binet and the concept of heterogeneous orders. Frontiers in Psychology: Quantitative Psychology and Measurement (http://www.frontiersin.org/quantitative_psychology_and_measurement/10.3389/fpsyg.2012.00261/abstract), 3, 261, 1-8.
Munro, A. (2019). Personality testing in employee selection: Challenges, controversies and future directions. Envisia Learning: https://amazure.envisialearning.com/wp-content/uploads/2019/12/Personality-Testing-Employee-Selection.pdf
Prinsloo, M. & Barrett, P. (2013). Cognition: Theory, measurement, implications. Integral Leadership Review. http://integralleadershipreview.com/9270-cognition-theory-measurement-implications/
Roth, P.L., Le, H., Oh, I.-S., Van Iddekinge, C.H., & Robbins, S.B. (2017). Who r u? On the (in)accuracy of incumbent-based estimates of range restriction in criterion-related and differential validity research. Journal of Applied Psychology, 102, 5, 802-808.
Schmidt, F.L., & Hunter, J.E. (1998). The Validity and Utility of Selection Methods in Personnel Psychology: practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 2, 262-274.
Schmidt, F.L., Oh, I.-S., & Shaffer, J.A. (2016). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years of research findings. Working paper, available from: https://www.researchgate.net/publication/309203898_The_Validity_and_Utility_of_Selection_Methods_in_Personnel_Psychology_Practical_and_Theoretical_Implications_of_100_Years_of_Research_Findings
Schmitt, N. (2014). Personality and cognitive ability as predictors of effective performance at work. Annual Review of Organizational Psychology and Organizational Behavior, 1, 45-65.
Schönemann, P. (1994). Measurement: The reasonable ineffectiveness of mathematics in the social sciences (Chapter 10, pp 149-160). In I. Borg and P. Mohler (Eds.). Trends and Perspectives in Empirical Social Research.
Sternberg, R.J. (2017). The IQ of smart fools. APS Observer (online: https://www.psychologicalscience.org/observer/the-iq-of-smart-fools), December, 1-4.
Sternberg, R.J. (2018). Speculations on the role of successful intelligence in solving contemporary world problems. Journal of Intelligence (http://www.mdpi.com/2079-3200/6/1/4 ), 6, 4, 1-10.
Walton, K.E., Cherkasova, L., & Roberts, R.D. (2019). On the validity of forced choice scores derived from the Thurstonian item response theory model. Assessment (https://doi.org/10.1177/1073191119843585), in press, 1-13.