4 Special considerations

Abstract

This chapter explores how data science in education resembles data science in other fields and how it presents its own opportunities and challenges. It describes considerations that apply to data science in general, like learning to code and adopting a reproducible approach to data analysis. Skills like cleaning, visualizing, and modeling data carry over from other fields, while others, like working with data standards and guidelines specific to education, must be learned in the education context. The chapter also describes opportunities particular to data science in education, like working towards equitable outcomes for students. Using examples from the field, it provides practical context and inspiration for the reader’s learning experience.

Data science in education presents many opportunities, like those discussed in Chapter 3, but also many challenges. These are varied, and while some are common to all domains in which data science is carried out, others are very particular to the field of education. For example, data science in education includes not only accessing, processing, and modeling data, but also social and cultural factors, like the training and support that educational data scientists have available to them.

Because data science in education is relatively new, it’s understandable that school staff may be wary of how data is collected and analyzed. It’s common for them to question how data is used, particularly if it is used to describe and evaluate staff and student performance. One of the biggest challenges that can arise is when individuals feel concerned that they are being evaluated by unclear or unfair metrics. Usually, “data-driven” efforts mean different things to administrators and educators. To an administrator, a data-driven effort might be an endeavor to better understand the strengths and weaknesses of pre-existing systems, with an eye to eventually proposing new, more efficient systems. To an educator, a data-driven effort might feel like an approach that masks the individuality of students by reducing them to numbers. Neither perspective is exactly correct. While maximizing efficiency and preserving students’ individual needs should certainly be goals of educators and educational administrators, data science is a versatile tool that can be leveraged to help answer a variety of meaningful questions. This chapter will present some thoughts to consider when adopting data science in educational contexts.

4.1 Things to consider when doing data science in any domain

4.1.1 Learning to code

Data scientists everywhere combine content knowledge, programming, and statistics to solve problems. However, many are not experts in all three areas when they begin their data science journeys. You are not alone if your programming skills are lacking. This book is for R learners without a computer science or coding background. The great thing about entering a field as flexible as data science is joining a crowd of self-taught individuals.

4.1.2 Cleaning data

Education data can be difficult to analyze because it is often not ready to use as collected. It may be in a format that is difficult to open without specialized software, or it may need to be “cleaned” before it is usable.

In data science, “cleaning” or “processing” data refers to reorganizing or restructuring the dataset to make it easier to analyze. It’s not unusual to begin with data that has duplicate columns or columns that are out of order.

Reorganizing a dataset so that its structure is logical is what we mean by data cleaning. Because different types of analyses call for differently structured data, the data scientist in education must use a variety of cleaning approaches. In the walkthrough sections of this book, you’ll see different cleaning approaches for multilevel models, models for longitudinal data, and text analysis. In later chapters of this book, you will learn more specifics about building models.
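As a minimal sketch of the kind of cleaning described above, the base R code below removes a duplicated column and puts the remaining columns in a logical order. The dataset, its column names, and its values are invented for illustration; real exports will need their own checks.

```r
# Hypothetical gradebook export: one duplicated column, columns out of order
raw_scores <- data.frame(
  score_2      = c(78, 91, 64),
  student_id   = c("s01", "s02", "s03"),
  score_1      = c(82, 88, 70),
  student_id.1 = c("s01", "s02", "s03"),  # repeats student_id
  stringsAsFactors = FALSE
)

# Flag columns whose contents repeat an earlier column, then drop them
is_duplicate <- duplicated(as.list(raw_scores))
deduplicated <- raw_scores[, !is_duplicate, drop = FALSE]

# Put the identifier first, followed by the scores in order
clean_scores <- deduplicated[, c("student_id", "score_1", "score_2")]

print(clean_scores)
```

Comparing column contents rather than column names catches duplicates that were renamed on export, but it will also flag two different measures that happen to hold identical values, so it is worth inspecting what gets dropped.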

4.1.3 Addressing ambiguity: a reproducible approach

Educators may feel wary of data science processes, especially when they’re ambiguous. To make them less ambiguous, build analytic routines that are clear and transparent. Specifically, be explicit about what data is collected, how it is collected, how it is analyzed, and how it is used for decision making.

One approach to this kind of openness is called “reproducible research”. Reproducibility (Wikipedia, 2020) is the idea that a completed analysis should come with all the necessary materials, methodology, and code needed for another data scientist to achieve the same results.

Consider the practical benefits, especially during transition periods. If a data scientist leaves their position, they can pass on the files, code, and documentation necessary for their colleagues to continue the work. This empowers new staff to continue running reports, updating a dashboard’s underlying data, or explaining methods.

If you want to start implementing a more reproducible approach, consider starting with clear documentation and uniform file folder structures.
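One way to make a folder structure uniform is to create it with code, so that every project starts from the same layout. The sketch below is only an illustration: the project name, the folder names, and the README wording are assumptions, not a standard, and the project is created in a temporary directory so that running the code leaves your own files untouched.

```r
# Hypothetical project layout; adapt the names to your organization's conventions
project_root <- file.path(tempdir(), "attendance-report")
folders <- c("data-raw", "data-clean", "scripts", "output", "docs")

for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# A short README documents what belongs in each folder
readme_lines <- c(
  "data-raw:   files exactly as received; never edited by hand",
  "data-clean: datasets produced by the cleaning scripts",
  "scripts:    R code, numbered in the order it should be run",
  "output:     tables and figures created by the scripts",
  "docs:       data dictionaries and notes on methods"
)
writeLines(readme_lines, file.path(project_root, "README.txt"))

# List what was created
list.files(project_root)
```

Keeping the raw files separate from the cleaned ones means a colleague can rerun the scripts from the original data and check that they arrive at the same results.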

4.2 Things to consider when doing data science in education

4.2.1 Addressing organizational resistance: a self-driven analytic approach

In some education settings, there is no precedent for a data science approach. For example, it’s not common for a teacher to do a regression analysis on data. However, you don’t have to wait for a district-wide or state-wide data science initiative to start using techniques you will learn in this book.

An organization can encourage its staff to do data analyses to test hypotheses about student performance. For example, a teacher may believe their students would learn more if they received a short review period at the start of a lesson. This hypothesis can be evaluated using data.

There are at least two benefits to this approach. First, staff will begin seeing data analysis as an ongoing inquiry into learning outcomes instead of something that happens only after test results are released. Second, staff begin to demystify data analysis as a process. When school staff collect and analyze their own data, they know exactly how it is collected and exactly how it is analyzed.

Building and establishing a data governance system that advocates for an open and transparent analytic process is difficult and long-term work, but the reward is less apprehension about the value of data analysis.

Here are more practical steps a school district can take towards a more open approach to analysis:

Make plain language write-ups of data analyses available so interested staff can learn more about how data was collected and analyzed
To the extent that privacy laws and policies allow, make datasets available to staff within the organization
Establish an expectation that analysts present their work in a way that is accessible to a wide audience
Hold regular forums to discuss how the organization collects and uses data

4.2.2 Lack of processes and guidelines

It’s possible that educators are concerned about the ambiguity of data science processes because they’re not yet familiar with existing best practices. While there is a body of past research on students’ work with data (see Lee & Wilkerson, 2018, for a review), there is limited information from case- or design-based research on how educators themselves use data in their work.

This challenge is reflected in the range of job titles held by those who work with data: some are data analysts, some are research associates, and the list continues. However, as educational data science grows, some school districts are now hiring for data scientist positions. Even so, there are many discipline-specific (e.g., science teaching) or department-specific (e.g., institutional research) conferences, but fewer for those who work with data in education.

Education is a field that is rich with data: survey, assessment, written, policy, and evaluation. Nevertheless, there’s room for more agreement on processes and procedures to share data among educators. Academic and research settings can sometimes lead to silos of information. Researchers at one university might run a survey, while researchers at another have to wait years before seeing the results. Or worse, the results aren’t shared at all.

The good news is that many education organizations are both curious and passionate about supporting student success. As a pioneer for data science in your organization, you can create processes that make data and analysis more available to others.

4.2.3 Advancing equity

Data science can be used to inform decisions that reduce inequality in the education system. However, it can also exacerbate inequality. Algorithms that are not transparent or are implemented poorly can perpetuate biases and policies that make change difficult.

But once an organization has clearly defined its equity-related goals, it can use its data science routines to advance those goals through learning and decision making.

For example, if an organization hopes to decrease the opportunity gap between students affected by poverty and students not affected by poverty, then it is important that it (1) defines what “affected by poverty” means, (2) creates a project design that helps it evaluate progress, and (3) determines whether its data collection allows it to disaggregate on useful demographics (see Walkthrough 3).
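To make step (3) concrete, here is a small sketch of disaggregating an outcome by a demographic variable in base R. Everything in it is an assumption made for illustration: the scores are invented, and eligibility for free or reduced-price lunch is used as a stand-in definition of “affected by poverty” that your organization may or may not adopt.

```r
# Hypothetical student records; values are invented for illustration
students <- data.frame(
  student_id    = c("s01", "s02", "s03", "s04", "s05", "s06"),
  frl_eligible  = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  reading_score = c(61, 70, 82, 77, 64, 90),
  stringsAsFactors = FALSE
)

# Disaggregate: mean reading score for each group
group_means <- aggregate(reading_score ~ frl_eligible,
                         data = students, FUN = mean)

# Gap between the two groups' mean scores
gap <- group_means$reading_score[!group_means$frl_eligible] -
  group_means$reading_score[group_means$frl_eligible]

print(group_means)
print(gap)
```

If the data collection never recorded the demographic variable, this calculation is impossible, which is why the definition and the data collection need to be settled before the analysis begins.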

R and RStudio, both freely available and open, also serve to increase equity in data science. Unlike proprietary tools, they are accessible to anybody with a computer and an internet connection. The code behind the packages is available online, making the packages’ processes and design choices transparent.

Thoughtful and deliberate data science empowers practitioners to advance learning goals for their students. It makes repetitive analytic tasks more efficient. Through code, it makes analytic decisions and thinking transparent. And it serves as a scalable tool for continually monitoring progress towards equity.

4.2.4 The complex nature of education data

Education data is difficult to collect and to analyze. It often includes data at multiple levels, like classrooms, schools, districts, states, and countries. It also often requires linking with other datasets before analyzing for research. For example, a researcher learning about teacher preparation will need to link data about students at the school level with another dataset about teacher training.
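The linking step described above can be sketched with base R's `merge()` function. The two datasets, their column names, and their values are hypothetical; the point is the shared key (`school_id`) and the check for schools that have no match.

```r
# Hypothetical school-level student outcomes
school_scores <- data.frame(
  school_id       = c("A", "B", "C"),
  mean_math_score = c(71, 65, 80),
  stringsAsFactors = FALSE
)

# Hypothetical teacher training data from a separate source
teacher_training <- data.frame(
  school_id              = c("A", "B", "D"),
  pct_teachers_certified = c(0.90, 0.75, 0.85),
  stringsAsFactors = FALSE
)

# Keep every school in school_scores, adding training data where it exists
linked <- merge(school_scores, teacher_training,
                by = "school_id", all.x = TRUE)

# Schools with outcome data but no matching training record
unmatched <- linked$school_id[is.na(linked$pct_teachers_certified)]

print(linked)
print(unmatched)
```

Checking for unmatched records after every merge is a useful habit, because mismatched identifiers are a common reason that linked education datasets quietly lose schools or students.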

Varied data types are another factor that makes education data complex. Education data includes more than numerical data types. It also includes categorical data, like student and teacher demographics. Another example is survey data, which often includes open-ended responses that are stored as string variables.

It’s true that varied data types and structures are challenging. But as with the diversity of training for educational data scientists, this complexity also presents opportunities for educators to creatively approach their tasks. There are techniques to handle each type of data listed above, which we’ll explore in this book.

4.2.5 Ethical and legal concerns

There are many ethical and legal concerns to consider when working with education data. At the K–12 level, most datasets require safeguards to protect student identities.

These include limitations on where and how data scientists work with confidential data. They also include limitations on how research results are shared.

A good first step towards responsible data use is awareness of the unique legal and ethical factors involved in your organization’s work. A good second step is establishing analytic routines for working with de-identified or anonymous data. Working with de-identified data is a common way to comply with your organization’s privacy procedures while still showing the value of data science techniques.
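As one illustration of such a routine, the sketch below replaces direct identifiers with randomly assigned study codes before analysis. The roster, the column names, and the code format are invented, and this step alone should not be assumed to satisfy any particular law or policy: other columns can still identify a student, so check your organization's requirements.

```r
# Hypothetical roster containing direct identifiers
roster <- data.frame(
  student_name    = c("Student One", "Student Two", "Student Three"),
  district_id     = c("100234", "100877", "100912"),
  attendance_rate = c(0.96, 0.88, 0.91),
  stringsAsFactors = FALSE
)

# Assign each student a random study code such as "P002"
study_code <- sprintf("P%03d", sample(nrow(roster)))

# The crosswalk links codes back to students; store it separately and securely
crosswalk <- data.frame(
  district_id = roster$district_id,
  study_code  = study_code,
  stringsAsFactors = FALSE
)

# The analysis file keeps the code and the measures, not the identifiers
deidentified <- data.frame(
  study_code      = study_code,
  attendance_rate = roster$attendance_rate,
  stringsAsFactors = FALSE
)

print(deidentified)
```

Because the crosswalk can re-identify every student, it needs the same protection as the original roster, and analysts who only work with the de-identified file should not have access to it.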

4.3 Conclusion

While there are many challenges to working with education data, there are also many opportunities. In particular, data science becomes a useful tool for deepening your content expertise. As an educational professional, remember that you are uniquely positioned to apply these tools to the most meaningful research questions your organization faces.

In summary, educators seeking to evolve their data analyses into something practical and meaningful for student progress will need to address challenges. Some of these challenges exist for all data analysis and others are unique to data analysis in education. Regardless, developing an awareness of the challenges and solutions will pay off.