Can ChatGPT evaluate research quality?

eISSN: 2543-683X
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining

Article Category: Research Papers

Published Online: May 27, 2024

Page range: 1 - 21

Received: Feb 06, 2024

Accepted: Apr 22, 2024

DOI: https://doi.org/10.2478/jdis-2024-0013

Keywords: ChatGPT, Large Language Models, LLM, Research Excellence Framework, REF 2021, Research quality, Research assessment

© 2024 Mike Thelwall, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.