Template-Type: ReDIF-Paper 1.0
Author-Name: Lachlan O'Neill
Author-Email: lachlan.oneill@monash.edu
Author-Workplace-Name: SoDa Laboratories, Monash Business School
Author-Name: Nandini Anantharama
Author-Email: Nandini.Anantharama@monash.edu
Author-Workplace-Name: SoDa Laboratories, Monash Business School
Author-Name: Wray Buntine
Author-Email: wray.buntine@monash.edu
Author-Workplace-Name: Faculty of Information Technology, Monash University
Author-Name: Simon D Angus
Author-Email: simon.angus@monash.edu
Author-Workplace-Name: Dept. of Economics and SoDa Laboratories, Monash Business School
Title: Quantitative Discourse Analysis at Scale - AI, NLP and the Transformer Revolution
Abstract: Empirical social science requires structured data. Traditionally, these data have arisen from statistical agencies, surveys, or other controlled settings. But what of language, political speech, and discourse more generally? Can text be data? Until very recently, the journey from text to data has relied on human coding, severely limiting study scope. Here, we introduce natural language processing (NLP), a field of artificial intelligence (AI), and its application to discourse analysis at scale. We introduce AI/NLP’s key terminology, concepts, and techniques, and demonstrate its application to the social sciences. In so doing, we emphasise a major shift in AI/NLP technological capability now underway, due largely to the development of transformer models. Our aim is to provide quantitative social scientists with both a guide to state-of-the-art AI/NLP in general, and something of a road-map for the transformer revolution now sweeping through the landscape.
Creation-Date: 2021-12
File-URL: http://soda-wps.s3-website-ap-southeast-2.amazonaws.com/RePEc/ajr/sodwps/2021-12.pdf
File-Format: Application/pdf
Number: 2021-12
Classification-JEL: C45,C52,C55
Keywords: text as data, artificial intelligence, machine learning, natural language processing, transformer models
Handle: RePEc:ajr:sodwps:2021-12