1 Introduction: Data science in education – you’re invited to the party!

Abstract

This chapter welcomes readers and gives an overview of the book. It describes the challenges of learning data science in education, including the complexity of the data scientist role. It also introduces walkthroughs as a way to learn data science in the education field.

Dear Data Scientists, Educators, and Data Scientists who are Educators:

If you’re a data scientist in education or an educator in data science, your role probably isn’t straightforward. This book is our contribution to a growing movement to merge the paths of data analysis and education. We wrote this book to provide clarity and support on your first steps in this direction. It’s an invitation to join others who are learning.

Whether you’re a data scientist working in education or an educator learning data science, we invite you to read this book and put these techniques to work in the real world. Your work in the education community will help decide how education and data science come together.

1.1 Learning data science in education

In the coming chapters, you’ll explore examples of data science in education. But first, let’s talk about why data science in education is not easily definable.

We like the definition of McFarland et al. – it’s expansive and conceives of educational data science as “an umbrella for a range of new and often nontraditional quantitative methods (such as machine learning, network analysis, and natural language processing) applied to educational problems often using novel data” (McFarland et al., 2021). And yet the definition is still so broad that it can’t fully capture any one person’s experience.

Being a data scientist in education is challenging because there is no universal vision of what that role means. Education is a complex field. If education were a building, it would be multi-storied with countless rooms. There are privately and publicly funded schools spanning more than 18 possible grade levels. Students learn alone or with others in a classroom. Their journeys involve starting, leaving, finishing, and returning to school.

This metaphorical building we call education also has rooms most never see – finance staff plan the most efficient use of limited funds. The transportation department plans bus routes across vast spaces. University administrators search for ways to measure career readiness. Education consultants study student performance and sentiments.

There are many ways one could do data science in education, but consensus on ways one should do it is still developing. Practitioners in the education field are still working out how it all fits together.

Then there’s the challenge of how education systems can best use data science tools. In many education settings, school staff have never worked with someone whose skills blend education content knowledge, programming, and statistics, the combination captured by the “Data Science Venn Diagram” (Conway, 2010).

1.2 Making the path clearer

As data science in education grows, the way we think about it must also grow. Doing so will help us advance data science in education as a discipline. It will uncover the unique opportunities that come with analyzing data in our domain.

We begin this book by offering a primer for data science in education, including a discussion of foundational skills in the R programming language. The primer continues with suggestions for how to use this text (Chapter 2), our definition of the data science process and what it looks like in different roles (Chapter 3), and a discussion of similarities and differences between data science in education and data science in other fields (Chapter 4).

Next, you’ll take what you’ve learned and apply it in walkthroughs. The walkthroughs take an example-driven approach to learning, with demonstrations that are recognizable and actionable.

The chapters in this book fall into four different themes:

  • Building a foundation to use R and RStudio
  • Analyzing student perception data
  • Analyzing student performance data
  • Using publicly available data

We end the book by discussing how to integrate data science skills into your education job (Chapter 15), offering an overview of teaching data science (Chapter 16), and providing supplemental resources (Chapter 17 and Chapter 18).

We hope after reading this book, you feel less alone in your data science journey. We also hope your experience with this book is both challenging and fun. Finally, we hope you take what you learned and share it with others.

1.3 Conventions used in the book

The following typographical conventions are used in this book:

  • Package names are surrounded by curly braces: {caret}
  • Variable names are in constant width: var1
  • Function names are in constant width and then parentheses: clean_names()