1
Habib F, Ali Z, Azam A, Kamran K, Pasha FM. Navigating pathways to automated personality prediction: a comparative study of small and medium language models. Front Big Data 2024; 7:1387325. [PMID: 39345825 PMCID: PMC11427259 DOI: 10.3389/fdata.2024.1387325] [Received: 03/05/2024] [Accepted: 08/28/2024] [Indexed: 10/01/2024]
Abstract
Introduction: Recent advancements in Natural Language Processing (NLP) and widely available social media data have made it possible to predict human personality in various computational applications. In this context, pre-trained Large Language Models (LLMs) have gained recognition for their exceptional performance on NLP benchmarks. However, these models require substantial computational resources, increasing their carbon and water footprint. Consequently, a shift toward smaller, more computationally efficient models has been observed. Methods: This study compares a small model, ALBERT (11.8M parameters), with a larger model, RoBERTa (125M parameters), in predicting the Big Five personality traits. It uses the PANDORA dataset of Reddit comments, processed on a Tesla P100-PCIE-16GB GPU. Both models were customized to support multi-output regression, with two added linear layers for fine-grained regression analysis. Results: Results are evaluated using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), taking into account the computational resources consumed during training. Although ALBERT consumed less system memory and emitted less heat, it required more computation time than RoBERTa. The two models achieved comparable MSE, RMSE, and training loss reduction. Discussion: These findings suggest that the quality of the training data influences model performance more than model size does. Theoretical and practical implications are also discussed.
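The abstract evaluates both models by MSE and RMSE over five trait outputs. As a minimal sketch (not the authors' code), assuming targets and predictions are arranged as one row of five Big Five scores per sample, the per-trait and pooled metrics can be computed with the Python standard library alone. The trait order in the `TRAITS` tuple and the helper names `per_trait_errors` and `pooled_errors` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical output order; the study's actual column order is not given.
TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

Rows = Sequence[Sequence[float]]


def _check(y_true: Rows, y_pred: Rows) -> None:
    """Validate that both inputs are non-empty and have one score per trait."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    for t, p in zip(y_true, y_pred):
        if len(t) != len(TRAITS) or len(p) != len(TRAITS):
            raise ValueError("each row must contain one score per trait")


def per_trait_errors(y_true: Rows, y_pred: Rows) -> Dict[str, Tuple[float, float]]:
    """Return {trait: (MSE, RMSE)}, averaging squared errors over samples."""
    _check(y_true, y_pred)
    n = len(y_true)
    errors = {}
    for j, trait in enumerate(TRAITS):
        mse = sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)) / n
        errors[trait] = (mse, math.sqrt(mse))
    return errors


def pooled_errors(y_true: Rows, y_pred: Rows) -> Tuple[float, float]:
    """Return (MSE, RMSE) averaged over every sample and every trait output."""
    _check(y_true, y_pred)
    squared = [(t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred) for j in range(len(TRAITS))]
    mse = sum(squared) / len(squared)
    return mse, math.sqrt(mse)


if __name__ == "__main__":
    y_true = [[0.5, 0.4, 0.3, 0.6, 0.2], [0.7, 0.5, 0.6, 0.4, 0.3]]
    y_pred = [[0.4, 0.4, 0.5, 0.6, 0.2], [0.7, 0.3, 0.6, 0.5, 0.1]]
    print(per_trait_errors(y_true, y_pred)["openness"])
    print(pooled_errors(y_true, y_pred))
```

Reporting per-trait errors alongside the pooled figure shows whether a comparable overall MSE hides differences on individual traits.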
Affiliation(s)
- Fatima Habib
  - FAST School of Management, National University of Computer and Emerging Sciences, Lahore, Pakistan
- Zeeshan Ali
  - Oxford Brookes Business School, Oxford Brookes University, Oxford, United Kingdom
- Akbar Azam
  - FAST School of Management, National University of Computer and Emerging Sciences, Lahore, Pakistan
- Komal Kamran
  - FAST School of Management, National University of Computer and Emerging Sciences, Lahore, Pakistan
- Fahad Mansoor Pasha
  - Faculty of Business Administration, Lahore School of Economics, Lahore, Pakistan
2
Giannini F, Marelli M, Stella F, Monzani D, Pancani L. Surfing the OCEAN: The machine learning psycholexical approach 2.0 to detect personality traits in texts. J Pers 2024. [PMID: 38217359 DOI: 10.1111/jopy.12915] [Received: 04/03/2023] [Revised: 10/11/2023] [Accepted: 12/13/2023] [Indexed: 01/15/2024]
Abstract
OBJECTIVE: We aimed to develop a machine learning model to infer OCEAN traits from text. BACKGROUND: The psycholexical approach allows information about personality traits to be retrieved from human language. However, it has rarely been applied because of methodological and practical issues that current computational advances could overcome. METHOD: Classical taxonomies and a large Yelp corpus were leveraged to learn an embedding for each personality trait. These embeddings were used to train a feedforward neural network to predict trait values. The model's generalization performance was evaluated in two external validation studies in which experts (N = 11) and laypeople (N = 100) completed a discrimination task identifying the best markers of each trait and polarity. RESULTS: Intrinsic validation of the model yielded excellent results, with R² values greater than 0.78. The validation studies showed a high proportion of matches between participants' choices and model predictions, confirming the model's efficacy in identifying new terms related to the OCEAN traits. The best performance was observed for agreeableness and extraversion, especially for their positive polarities. The model was less effective in identifying the negative polarity of openness and conscientiousness. CONCLUSIONS: This innovative methodology can be considered a "psycholexical approach 2.0," contributing to research on personality and its practical applications in many fields.
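The intrinsic validation is reported as R² values. The sketch below implements the standard coefficient of determination, R² = 1 - SS_res / SS_tot, assuming it is computed per trait from true and predicted trait values. The abstract does not state how R² was computed (it could, for example, be a squared correlation instead), and `r_squared` is a hypothetical helper name.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    SS_res sums squared prediction errors; SS_tot sums squared deviations of
    the true values from their mean. R^2 can be negative for poor predictors.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired observations")
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("true values have zero variance; R^2 is undefined")
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical true vs. predicted scores for one trait.
    print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```

Under this definition, an R² above 0.78 means the predictions leave less than 22% of the variance in the target values unexplained.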
Affiliation(s)
- Federico Giannini
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Marco Marelli
  - Department of Psychology, University of Milan-Bicocca, Milan, Italy
- Fabio Stella
  - Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy
- Dario Monzani
  - Department of Psychology, Educational Science and Human Movement, University of Palermo, Palermo, Italy
- Luca Pancani
  - Department of Psychology, University of Milan-Bicocca, Milan, Italy
3
Ramezani M, Feizi-Derakhshi MR, Balafar MA. Text-based automatic personality prediction using KGrAt-Net: a knowledge graph attention network classifier. Sci Rep 2022; 12:21453. [PMID: 36509800 PMCID: PMC9743120 DOI: 10.1038/s41598-022-25955-z] [Received: 04/11/2022] [Accepted: 11/25/2022] [Indexed: 12/14/2022]
Abstract
A large share of human communication now takes place on Internet-based communication infrastructures such as social networks, email, forums, and organizational communication platforms. Automatically predicting or assessing individuals' personalities from the text they write or exchange could therefore help improve their relationships. To this end, this paper proposes KGrAt-Net, a Knowledge Graph Attention Network text classifier which, for the first time, applies a knowledge graph attention network to Automatic Personality Prediction (APP) based on the Big Five personality traits. After preprocessing, KGrAt-Net first builds a knowledge graph equivalent to the input text to obtain a knowledge-rich representation of the concepts it contains. A knowledge graph collects interlinked descriptions of concepts, entities, and relationships in a machine-readable form; in practice, it provides a machine-readable understanding of concepts and the semantic relationships among them. An attention mechanism then focuses on the most relevant parts of the graph to predict personality traits from the input text. We used 2,467 essays from the Essays Dataset. The results demonstrated that KGrAt-Net considerably improved personality prediction accuracy (up to 70.26% on average). When KGrAt-Net additionally uses knowledge graph embedding to enrich the classification, its average APP accuracy rises further, to 72.41%.
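The idea of attending to the most relevant parts of a graph can be illustrated generically. The sketch below is plain scaled dot-product attention pooling over a set of node embeddings; it is not KGrAt-Net's architecture and not a full graph attention network layer, which attends over each node's neighbors using learned parameters. The fixed `query` vector stands in for what would normally be learned, and `softmax` and `attention_pool` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    nodes: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Score each node embedding h by (query . h) / sqrt(d), normalize the scores
    with softmax, and return (weights, weighted sum of the node embeddings)."""
    if not nodes:
        raise ValueError("need at least one node embedding")
    d = len(query)
    if any(len(h) != d for h in nodes):
        raise ValueError("node embeddings must match the query dimension")
    scores = [sum(q * x for q, x in zip(query, h)) / math.sqrt(d) for h in nodes]
    weights = softmax(scores)
    pooled = [sum(w * h[k] for w, h in zip(weights, nodes)) for k in range(d)]
    return weights, pooled


if __name__ == "__main__":
    # Three hypothetical concept-node embeddings and a hypothetical query.
    weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 0.0])
    print(weights, pooled)
```

Nodes whose embeddings align with the query receive larger weights, so the pooled vector passed to a classifier is dominated by the most relevant parts of the graph.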
Affiliation(s)
- Majid Ramezani
  - Computerized Intelligence Systems Laboratory, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Mohammad-Reza Feizi-Derakhshi
  - Computerized Intelligence Systems Laboratory, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Mohammad-Ali Balafar
  - Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
4
A Failed Cross-Validation Study on the Relationship between LIWC Linguistic Indicators and Personality: Exemplifying the Lack of Generalizability of Exploratory Studies. Psych 2022. [DOI: 10.3390/psych4040059] [Indexed: 11/06/2022]
Abstract
(1) Background: Previous meta-analytic research found small to moderate relationships between the Big Five personality traits and various computational linguistic indicators. However, previous studies used multiple linguistic indicators to predict personality within an exploratory framework. The aim of this study was to conduct a cross-validation study of the relationships between language indicators and personality traits to test the generalizability of previous results; (2) Methods: A total of 643 Spanish undergraduate students wrote a 500-word self-description (analyzed with LIWC) and completed a standardized Big Five questionnaire. Two analytical approaches based on multiple linear regression were used: the first analyzed the complete data, and the second conducted several cross-validation studies; (3) Results: The first analytical approach yielded medium effect sizes. In contrast, the cross-validation studies showed that the relationships between language and personality did not generalize; (4) Conclusions: Moderate effect sizes can be obtained when the relationships between language and personality are analyzed in single samples, but the model estimates could not be generalized to other samples. Thus, previous exploratory results in this line of research appear to be incompatible with a nomothetic approach.
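The gap between in-sample and cross-validated effect sizes can be made concrete. This simplified sketch assumes a single predictor (for example, one hypothetical LIWC category score) fitted by ordinary least squares, whereas the study used multiple linear regression over several indicators. Folds are assigned by index modulo k without shuffling, and all function names are illustrative.

```python
from typing import List, Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b * x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance in this training set")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y."""
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


def in_sample_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Fit and evaluate on the same data (the 'complete data' analysis)."""
    a, b = fit_ols(x, y)
    return r_squared(y, [a + b * xi for xi in x])


def kfold_r2(x: Sequence[float], y: Sequence[float], k: int = 5) -> float:
    """Out-of-fold R^2: each point is predicted by a model fit without its fold.

    Folds are assigned by index modulo k (no shuffling) to keep the sketch deterministic.
    """
    n = len(x)
    if len(y) != n or not 2 <= k <= n:
        raise ValueError("need len(x) == len(y) and 2 <= k <= n")
    preds: List[float] = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):  # indices i with i % k == fold
            preds[i] = a + b * x[i]
    return r_squared(y, preds)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
    print(in_sample_r2(x, y), kfold_r2(x, y, k=3))
```

With leave-one-out (k equal to the sample size), the out-of-fold R² of an OLS model can never exceed its in-sample R², because each held-out residual is at least as large as the corresponding in-sample residual. A large gap between the two is the signature of estimates that do not generalize.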
5
Spitzley LA, Wang X, Chen X, Burgoon JK, Dunbar NE, Ge S. Linguistic measures of personality in group discussions. Front Psychol 2022; 13:887616. [PMID: 36186305 PMCID: PMC9523152 DOI: 10.3389/fpsyg.2022.887616] [Received: 03/01/2022] [Accepted: 08/15/2022] [Indexed: 11/26/2022]
Abstract
This investigation examined the relationships between multiple dimensions of personality and multiple features of language style. Unlike previous investigations, the current study, after controlling for moderators such as culture and socio-demographics, explored the dimensions of naturalistic spoken language that most closely align with communication. In groups of five to eight players, participants (N = 340) from eight international locales completed hour-long competitive games consisting of a series of ostensible missions. Composite measures of quantity, lexical diversity, sentiment, immediacy, and negations were computed with an automated tool called SPLICE and with Linguistic Inquiry and Word Count (LIWC). We also investigated style dynamics over the course of an interaction. We found predictors of extraversion, agreeableness, and neuroticism, but overall fewer significant associations than prior studies, suggesting greater heterogeneity in language style in contexts involving interactivity, conversation rather than solitary message production, oral rather than written discourse, and groups rather than dyads. Extraverts maintained greater linguistic style consistency over the course of an interaction. The discussion addresses the potential for Type I error when studying the relationship between language and personality.
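Quantity and lexical diversity are commonly operationalized as a word count and a type-token ratio. A minimal sketch, assuming simple lowercase English tokenization; the actual definitions and composites used by SPLICE and LIWC may differ, and `quantity_and_diversity` is a hypothetical name.

```python
import re
from typing import Tuple

# Lowercase letters plus apostrophes, so "don't" counts as one token (English-only assumption).
_TOKEN = re.compile(r"[a-z']+")


def quantity_and_diversity(text: str) -> Tuple[int, float]:
    """Return (word count, type-token ratio) for one utterance or transcript.

    Type-token ratio = distinct tokens / total tokens; it is 0.0 for empty input.
    """
    tokens = _TOKEN.findall(text.lower())
    if not tokens:
        return 0, 0.0
    return len(tokens), len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    print(quantity_and_diversity("We go, we win"))
```

A raw type-token ratio tends to fall as texts get longer, so comparisons across speakers who talk for very different amounts usually call for a length-corrected diversity measure.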
Affiliation(s)
- Lee A. Spitzley (corresponding author)
  - Department of Information Security and Digital Forensics, University at Albany, SUNY, Albany, NY, United States
- Xinran Wang
  - Department of Management Information Systems, University of Arizona, Tucson, AZ, United States
  - Center for the Management of Information Systems, University of Arizona, Tucson, AZ, United States
- Xunyu Chen
  - Department of Management Information Systems, University of Arizona, Tucson, AZ, United States
  - Center for the Management of Information Systems, University of Arizona, Tucson, AZ, United States
- Judee K. Burgoon
  - Center for the Management of Information Systems, University of Arizona, Tucson, AZ, United States
- Norah E. Dunbar
  - Department of Communication, University of California, Santa Barbara, Santa Barbara, CA, United States
- Saiying Ge
  - Department of Management Information Systems, University of Arizona, Tucson, AZ, United States
  - Center for the Management of Information Systems, University of Arizona, Tucson, AZ, United States
6
Knowledge Graph-Enabled Text-Based Automatic Personality Prediction. Comput Intell Neurosci 2022; 2022:3732351. [PMID: 35769270 PMCID: PMC9236841 DOI: 10.1155/2022/3732351] [Received: 03/03/2022] [Revised: 04/24/2022] [Accepted: 05/26/2022] [Indexed: 11/24/2022]
Abstract
How people think, feel, and behave is primarily a reflection of their personality characteristics. Being aware of the personality characteristics of the people we deal with, or are deciding whether to deal with, can help us improve the relationship, whatever its type. With the rise of Internet-based communication infrastructures (social networks, forums, etc.), a considerable share of human communication now takes place online. The most prominent tool in such communication is language, in written and spoken form, which encodes these essential personality characteristics of individuals. Text-based Automatic Personality Prediction (APP) is the automated prediction of individuals' personalities from the text they generate or exchange. This paper presents a novel knowledge graph-enabled approach to text-based APP based on the Big Five personality traits. Given an input text, a knowledge graph (a set of interlinked descriptions of concepts) was built by matching the text's concepts with entries in the DBpedia knowledge base. To obtain a more powerful representation, the graph was then enriched with information from the DBpedia ontology, the NRC Emotion Intensity Lexicon, and the MRC psycholinguistic database. The resulting knowledge graph, now a knowledge-rich stand-in for the input text, was embedded to yield an embedding matrix. Finally, to predict personality traits, the embedding matrix was fed independently to four proposed deep learning models based on a convolutional neural network (CNN), a simple recurrent neural network (RNN), long short-term memory (LSTM), and bidirectional long short-term memory (BiLSTM). The results indicated considerable improvements in prediction accuracy for all of the proposed classifiers.
7
Machine Learning Approach for Personality Recognition in Spanish Texts. Appl Sci 2022. [DOI: 10.3390/app12062985] [Indexed: 01/22/2023]
Abstract
Personality is a unique trait that distinguishes an individual. It comprises a set of characteristic ways in which people think, feel, and behave, which affect their interactions and relationships. Knowledge of personality is useful in diverse areas such as marketing, training, education, and human resource management. There are various approaches to personality recognition and various psychological models of personality. Previous work indicates that linguistic analysis is a promising way to recognize personality. This work presents a personality recognition approach based on the dominance, influence, steadiness, and compliance (DISC) model and statistical methods for language analysis. To build the model, a survey was conducted with 120 participants, who completed a personality test and wrote paragraphs by hand. The resulting dataset was used to train several machine learning algorithms. The AdaBoost classifier achieved the best results, followed by Random Forest; in both cases, features were preselected using Pearson's correlation. The AdaBoost classifier obtained the following average scores: accuracy = 0.782, precision = 0.795, recall = 0.782, F-measure = 0.786, and receiver operating characteristic (ROC) area = 0.939.
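The Pearson-correlation feature preselection can be sketched as a filter that keeps the features whose absolute correlation with the target meets a threshold. Because DISC labels are categorical, this sketch assumes a one-vs-rest 0/1 target for a single DISC style; the threshold, the feature names, and the helpers `pearson_r` and `select_features` are hypothetical, and the abstract does not specify the study's exact procedure.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(
    features: Dict[str, Sequence[float]], target: Sequence[float], threshold: float
) -> List[str]:
    """Keep features whose |r| with the target is at least `threshold`, strongest first."""
    scored = [(name, abs(pearson_r(values, target))) for name, values in features.items()]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


if __name__ == "__main__":
    # Hypothetical linguistic features for four writers; target = 1 if one DISC style applies.
    feats = {"f1": [1.0, 2.0, 3.0, 4.0], "f2": [4.0, 3.0, 2.0, 1.0], "f3": [1.0, 3.0, 2.0, 4.0]}
    print(select_features(feats, [0.0, 0.0, 1.0, 1.0], threshold=0.5))
```

Using the absolute value keeps strongly negative correlates as well as positive ones; the retained columns would then be passed to a classifier such as AdaBoost or Random Forest.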