An Introduction to Quantitative Text Analysis for Linguistics

Reproducible Research Using R

Author

Jerid C. Francom

Affiliation

Wake Forest University

Updated

August 17, 2024

About

Book

The goal of this textbook is to provide readers with foundational knowledge and practical skills in quantitative text analysis using the R programming language. It is geared towards advanced undergraduates, graduate students, and researchers looking to expand their methodological toolbox. It assumes no prior knowledge of programming or quantitative methods and prioritizes practical application and intuitive understanding over technical details.

By the end of this textbook, readers will be able to identify, interpret, and evaluate data analysis procedures and results to support research questions within language science. Additionally, readers will gain experience in designing and implementing research projects that involve processing and analyzing textual data using modern programming strategies. This textbook aims to instill a strong sense of reproducible research practices, which are critical for promoting transparency, verification, and sharing of research findings.

Author

Dr. Jerid Francom is Associate Professor of Spanish and Linguistics at Wake Forest University. His research focuses on the use of language corpora from a variety of sources (news, social media, and other internet sources) to better understand the linguistic and cultural similarities and differences between language varieties for both scholarly and pedagogical projects. He has published on topics including the development, annotation, and evaluation of linguistic corpora, and has analyzed corpora using corpus, psycholinguistic, and computational methodologies. He also has experience working with and teaching statistical programming with R.

License

This work by Jerid C. Francom is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Credits

Font Awesome icons are licensed under the SIL Open Font License (OFL) 1.1.

Acknowledgments

The journey of creating this textbook has been both challenging and rewarding, and it would not have been possible without the inspiration, support, and invaluable feedback from many individuals. First and foremost, I extend my deepest gratitude to my students at Wake Forest University. Your enthusiasm and curiosity have been a constant source of inspiration, pushing me to address my blind spots and meet your needs more effectively.

I am particularly grateful for the generous feedback from the following individuals, whose insights and suggestions have significantly shaped the development of this book: Laura Aull, Andrea Bowling, Caroline Brady, Declan Golsen, Logan Jacobs, Abby Komiske, Asya Little, Elaine Lu, Jack Nelson, and Sicheng Wang. Your contributions have been instrumental in refining the content and making it more accessible and engaging for future readers.

I owe special thanks to my spouse and colleague, Dr. Claudia Valdez, for her unwavering support, encouragement, and patience throughout this project. Your feedback and guidance have been invaluable, and I am grateful for your willingness to engage in countless discussions about the content, structure, and pedagogical approach of this book. Most importantly, thank you for your love and understanding, which have sustained me through the ups and downs of this journey.

Finally, I would like to express my appreciation to the R community, especially the developers and contributors of the {tidyverse} and {tidymodels} packages. Your dedication to creating user-friendly and powerful tools for data analysis has revolutionized the field of quantitative text analysis and made it accessible to a broader audience.