The Russian Journal of Cultural Studies and Communication

Getting a Handle on Hansard with Python and NLTK, or How to Tame the Linguistic Picture of British Politics with NLP

https://doi.org/10.24833/RJCSC-2024-3-3-81-95

Abstract

This article proposes an optimised starter set of basic Python and NLTK (Natural Language Toolkit) methods that are essential for analysing massive textual corpora in research on linguistic images of the world. The need to specify these applied techniques in detail stems from the nature and scale of the challenges that unstructured big data analysis poses for contemporary cognitive linguistics and lexicology. Their viability and practical value are demonstrated in a series of illustrative examples that apply them to continuous parallel diachronic corpora of Hansard, which capture the discourse of both chambers of the British Parliament from 2006 to 2023 and jointly amount to over a third of a billion tokens. The article suggests that the methods it outlines and classifies form an indispensable minimum of IT competences capable of substantially raising both the overall quality and the competitive edge of research. The proposed toolkit includes an essential set of instruments for processing target vocabulary and for assessing and visualising word and phrase frequency and collocation. The author assumes that, in order to keep abreast of prevailing trends, contemporary Russian researchers of linguistic images of the world are highly likely at some point to adopt the quantitative analysis methods made possible by combining Python and NLTK. Among its many benefits, this combination would arguably help them design and customise research protocols and adapt them with ease and flexibility.
Lastly and most importantly, the author suggests that Python and NLTK skills may serve as a convenient gateway to eventually bringing one’s linguistic research up to cutting-edge global standards of technological sophistication and marketability.
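The frequency and collocation assessment mentioned above can be illustrated with a minimal sketch in Python and NLTK. It is not the author's code: the sample text is an invented placeholder rather than an extract from Hansard, and the tiny stopword set and the use of pointwise mutual information (PMI) as the association measure are illustrative assumptions. To avoid extra data downloads, it uses NLTK's `RegexpTokenizer` instead of `word_tokenize` and a hand-written stopword set instead of NLTK's `stopwords` corpus.

```python
# Minimal sketch (not the author's code): word frequencies and bigram
# collocations with NLTK. Assumes `nltk` is installed; the sample text is
# an invented placeholder, not an extract from Hansard.
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.probability import FreqDist
from nltk.tokenize import RegexpTokenizer

SAMPLE_TEXT = (
    "The Prime Minister answered questions on the cost of living. "
    "The cost of living remains the central issue for the House, "
    "and the Prime Minister promised action on the cost of living."
)

# Hypothetical, deliberately tiny stopword set; NLTK's own list would
# require downloading the `stopwords` corpus first.
STOPWORDS = {"the", "of", "on", "for", "and"}


def tokenize(text):
    """Lower-case the text and keep runs of ASCII letters as tokens."""
    return RegexpTokenizer(r"[a-z]+").tokenize(text.lower())


def top_words(tokens, n=5):
    """Return the n most frequent non-stopword tokens with their counts."""
    fdist = FreqDist(t for t in tokens if t not in STOPWORDS)
    return fdist.most_common(n)


def top_bigrams(tokens, n=3, min_freq=2):
    """Rank adjacent word pairs seen at least min_freq times by PMI."""
    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(min_freq)  # discard pairs rarer than min_freq
    return finder.nbest(BigramAssocMeasures().pmi, n)


if __name__ == "__main__":
    tokens = tokenize(SAMPLE_TEXT)
    print(top_words(tokens, 4))
    print(top_bigrams(tokens))
```

On the sample, "prime minister" scores highest by PMI, followed by "cost of" and "of living". On a real corpus the frequency threshold would normally be raised, since PMI tends to overrate rare pairs; `FreqDist.plot()` can chart the counts if matplotlib is installed.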

English translation of the Russian text: Gagarin S. N. 2024. Getting a handle on Hansard with Python and NLTK, or how to tame the linguistic picture of British politics with NLP. Linguistics & Polyglot Studies. 10(2). P. 125–140. DOI: https://doi.org/10.24833/2410-2423-2024-2-39-125-140

About the Author

S. N. Gagarin
MGIMO University
Russian Federation

Sergey N. Gagarin – Candidate of Philology, a Senior Lecturer at English Language Department No. 1

Moscow



For citations:


Gagarin S.N. Getting a Handle on Hansard with Python and NLTK, or How to Tame the Linguistic Picture of British Politics with NLP. The Russian Journal of Cultural Studies and Communication. 2024;3(3):81-95. https://doi.org/10.24833/RJCSC-2024-3-3-81-95


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2949-6330 (Online)