Big Data and Learning Analytics in Blended Learning Environments: Benefits and Concerns

The purpose of this article is to examine big data and learning analytics in blended learning environments. It will examine the nature of these concepts, provide basic definitions, and identify the benefits and concerns that apply to their development and implementation. This article draws on concepts associated with data-driven decision making, which evolved in the 1980s and 1990s


I. INTRODUCTION
In May 2014, I was at North-West University in South Africa to lecture and conduct workshops on blended learning in higher education. The topics of my workshops related to conducting research in instructional technology, design of blended learning environments, MOOCs, and technology planning. For the technology planning session, administrators at North-West University shared with me a document that outlined the university's plan for integrating more technology, and specifically blended learning, into its academic programs. Among the strategies to be considered was the effective use of learning analytics to profile students and track their learning achievements in order to: identify at-risk students in a timely manner; monitor student persistence on a regular basis; and develop an evidence base for program planning and learner support strategies. During the session, I was specifically asked to give my opinion about whether North-West University should invest in learning analytics technology at this time. On May 28, 2014, one week after I returned to my home institution at the Graduate Center of the City University of New York (CUNY), I received an email from the University Director of Academic Technology, asking if I would comment on a white paper entitled Blackboard Analytics and CUNY. This white paper outlined the potential for implementation of learning analytics software into the University's course/learning management system. It was sent to members of a committee examining the feasibility of this software for the university. The email specifically asked for one of the following responses: "I think this is worth pursuing"; "I don't think this is worth pursuing"; or "I am not sure"; and to provide comments in support of the choice.
North-West University and CUNY are 8,000 miles apart, on different continents, with very different missions, organizational structures, and academic programs. Yet, with regard to the acquisition and development of learning analytics software, they were in much the same situation.
Instructional technology is at the center of many discussions on college and university campuses across the globe. The Internet has permeated every aspect of our societies by its ubiquity, and has changed higher education as well. Online and blended learning, specifically, are being utilized with increasing regularity and are changing the way instruction is provided. In the United States, more than seven million students, approximately one-third of the higher education population, were enrolled in fully online college courses in 2013. [1] Millions more are enrolled in blended courses, although precise data on the extent of blended learning in American higher education are not available because of problems with definition and accurate data reporting at the individual college level. Precision aside, the changes brought on by online access to instruction are affecting the way our colleges and universities are being administered. Infusions of technology infrastructure, large-scale databases, and demands for timely data to support decision making have seeped into all levels of college leadership and operations. Data-driven decision making is evolving into a vastly more sophisticated concept known as big data, which relies on software approaches generally referred to as learning analytics. Big data and learning analytics for instructional applications are still evolving and will take a few years to mature, although their presence is already being felt and cannot be ignored. While big data and learning analytics are not panaceas for all of the issues and decisions being faced by higher education administrators, the hope is that they can become part of solutions and be gracefully integrated into administrative and instructional functions. The purpose of this article is to examine the evolving world of big data and learning analytics in blended learning environments.
Specifically, it will look at the nature of these concepts, provide basic definitions, and identify the benefits and concerns related to their development, implementation, and growth in higher education environments. Administrative decision making processes have been evolving for decades and as more data were made available from integrated information systems, decisions became more rational, using data to support alternative courses of action. A new phenomenon, generally termed online learning, emerged in the 1990s and the early 2000s, that changed the way many faculty teach and students learn. As mentioned earlier, millions of students are learning online and entire colleges have been "built" to offer the entirety of their academic programs online. In addition, for most institutions, online technology is being integrated with face-to-face instruction in what is commonly being referred to as blended learning. The utilization of data-driven decision making in online learning environments has opened up new approaches and avenues for collecting and processing data on students and course activities whereby instructional transactions can be immediately recorded and added to an institutional database. Academic administration and evaluation, which in the past occurred away from the classroom, can now be integrated more closely into instructional activities.

II. BLENDED LEARNING
Blended learning environments present unique challenges to implementing learning analytics mainly because they have so many different facets and are difficult to define. They combine face-to-face instruction and online technology in myriad ways.
Blended learning is not one thing but comes in many different flavors, styles, and applications. It means different things to different people. The word "blended" implies a mixture rather than simply an attaching of components. When a picture is pasted above a paragraph of text, a presentation is created that may be more informative to the viewer or reader, but the picture and text remain intact and can be individually discerned. On the other hand, when two cans of different colored paints are mixed, the new paint will look different from either of the original colors. In fact, if the new paint is mixed well, neither of the original colors will continue to exist. Similar situations exist in blended learning. The mix can be a simple separation of part of a course into an online component. For instance, in a course that meets for three weekly contact hours, two hours might take place in a traditional classroom while the equivalent of one weekly hour is conducted online. The two modalities for this course are carefully separated, and although they may overlap, they can still be differentiated. In other forms of blended courses and programs, the modalities are not so easily distinguishable. Consider an online program that offers three online courses in a semester that all students are required to take. The courses meet for three consecutive five-week sessions. However, students do a collaborative fifteen-week project that overlaps the courses. The students are expected to maintain regular communication with one another through email and group discussion boards. They are also required to meet face-to-face once a month on Saturdays, where course materials from the online courses are further presented and discussed and some sessions are devoted to group project work. These activities begin to blur the modalities in a new mixture or blend where the individual parts are not as discernible as they once were.
Add to this the increasing popularity of integrating videoconferencing, podcasting, YouTube videos, wikis, blogs, and other media into class work and the definition of blended learning becomes very fluid.
In the broadest sense, blended learning (see Figure 1) can be conceptualized as a wide variety of technology/media integrated with conventional, face-to-face classroom activities. This conceptualization serves as a guideline and should not be viewed as an absolute, limiting declaration. Also, it can apply to entire academic programs as well as individual courses.

III. DATA-DRIVEN DECISION MAKING, BIG DATA, AND LEARNING ANALYTICS
The focus of this article is technology-based approaches that support decision making in blended learning environments. The simplest definition of the popular term "data-driven decision making" is the use of data analysis to inform courses of action involving policy and procedures. Inherent in this definition is the development of reliable and timely information resources to collect, sort, and analyze the data used in the decision-making process. It is important to note that data analysis is used to inform and is not meant to replace entirely the experience, expertise, intuition, judgment, and acumen of competent educators. While decision making may be simply defined as choosing between or among two or more alternatives, in a modern educational organization, decision making is an integral component of complex management processes such as academic planning, policy making, and budgeting. These processes evolve over time, require participation by stakeholders, and most importantly, seek to include information which will help all those involved in the decision process.
Fundamental to data-driven decision making is a rational model directed by values and based on data. It is well recognized, however, that a strictly rational model has limitations. An individual commonly associated with this concept, and whose work is highly recommended for further reference, is Herbert Simon [2,3,4,5,6]. Simon was awarded the Nobel Prize in economics in 1978 for his research on decision making in organizations. His theory on the limits of rationality, later renamed "bounded rationality," has as its main principle that organizations operate along a continuum of rational and social behaviors mainly because the knowledge necessary to function strictly according to a rational model is beyond what is available. Although first developed in the 1940s, this theory has withstood the test of time and is widely recognized as a fundamental assumption in understanding organizational processes such as decision making and planning [7,8,9]. More recently, modern computerized information systems are facilitating and instilling a greater degree of rationality in decision making in all organizations including colleges and universities. They support organizations and help them to adjust, adapt, and learn in order to perform their administrative functions. [10] While these systems are not replacing the decision maker, they surely are helping to refine the decision-making process. Figure 2 illustrates the basic data-driven decision-making process. It assumes that decision making in education environments is fundamentally part of a social process. It also assumes that an information system is available to support the decision process, that internal and external factors not available through the information system are considered, and that a course or courses of action are determined. The information system in Figure 2 is a computerized database system capable of storing, manipulating, and providing reports from a wide variety of data.
The decision process concludes with decision makers reflecting on and evaluating their decisions.
Terms related to data-driven decision making include data warehousing, data mining, and data disaggregation. Data warehousing essentially refers to a database information system that is capable of storing, integrating and maintaining large amounts of data over time. It might also involve multiple database systems. Data mining is a frequently used term in research and statistics which refers to searching or "digging into" a data file for information to understand better a particular phenomenon. Data disaggregation refers to the use of software tools to break data files down into various characteristics. An example might be using a software program to select student performance data by gender, by major, by ethnicity, or by other definable characteristics.
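As a rough illustration of data disaggregation as described above, the following Python sketch breaks a small set of student performance records down by a chosen characteristic. The record fields, values, and the `disaggregate` helper are hypothetical and chosen only for illustration; they do not represent any particular institution's data or software.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student performance records; fields and values are illustrative only.
records = [
    {"student": "s1", "major": "Biology", "gender": "F", "grade": 3.5},
    {"student": "s2", "major": "Biology", "gender": "M", "grade": 2.5},
    {"student": "s3", "major": "History", "gender": "F", "grade": 3.0},
    {"student": "s4", "major": "History", "gender": "M", "grade": 4.0},
]


def disaggregate(rows, characteristic, measure="grade"):
    """Return the mean of `measure` for each value of `characteristic`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[characteristic]].append(row[measure])
    return {value: mean(values) for value, values in groups.items()}


# The same records viewed through two different characteristics.
by_major = disaggregate(records, "major")
by_gender = disaggregate(records, "gender")
```

The same function can be pointed at any definable characteristic held in the records, which is the essence of disaggregation: one data file, many views.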
In recent years, two other terms, big data and analytics, have become important. Big data is a generic term that assumes that the information or database system(s) used as the main storage facility is capable of storing large quantities of data longitudinally and down to very specific transactions. For example, college student record keeping systems have maintained outcomes information on students such as grades in each course. This information could be used by institutional researchers to study patterns of student performance over time, usually from one semester to another or one year to another. In a big data scenario, data would be collected for each student transaction in a course, especially if the course was delivered electronically online. Every student entry on a course assessment, discussion board entry, blog entry, or wiki activity could be recorded, generating thousands of transactions per student per course. Furthermore, this data would be collected in real or near real time as it is transacted and then analyzed to suggest courses of action. Analytics software is evolving to assist in this analysis.
The generic definition of analytics is similar to data-driven decision making. Essentially it is the science of examining data to draw conclusions and, when used in decision making, to present paths or courses of action. In recent years, the definition of analytics has gone further, however, to incorporate elements of operations research such as decision trees and strategy maps to establish predictive models and to determine probabilities for certain courses of action. It uses data mining software to establish decision processes that convert data into actionable insight, uncover patterns, alert and respond to issues and concerns, and plan for the future. This might seem to be an overly complicated definition, but the term "analytics" has been used in many different ways in recent years and has become part of the buzzword jargon that sometimes seeps into new technology applications and products. Goldstein and Katz (2005), in a study of academic analytics, admitted that they struggled with coming up with a name and definition that was appropriate for their work. They stated that they adopted the term "academic analytics" for their study but that it was an "imperfect label." [11] Alias (2011) defined four different types of analytics that could apply to instruction: web analytics, learning analytics, academic analytics, and action analytics. [12] The trade journal Infoworld referred to analytics as: "One of the buzzwords around business intelligence software…[that]…has been through the linguistic grinder, with vendors and customers using it to describe very different functions. The term can cause confusion for enterprises, especially as they consider products from vendors who use analytics to mean different things…" [13]
Critical to the definition of analytics is the use of data to determine courses of action, especially where there is a high volume of transactions. Common examples of analytics applications are examinations of Website traffic, purchases, or navigation patterns by ecommerce companies such as amazon.com or Netflix to determine which customers are more or less likely to buy particular products (e.g., books, movies). Using these patterns, companies send personalized notifications to customers as new products become available. In higher education, analytics are beginning to be used for a number of applications that address student performance, outcomes, and persistence.
Big data concepts and analytics can be applied to a variety of higher education administrative and instructional applications including recruitment and admissions processing, financial planning, donor tracking and student performance monitoring. This article will focus on teaching and learning, and hence will specifically examine learning analytics.
To take advantage of big data and learning analytics, it is almost a requirement that transaction processing be electronic rather than manual. Traditional face-to-face instruction can support traditional data-driven decision-making processes; however, to move into the more extensive and time-sensitive learning analytics applications, it is important that instructional transactions are collected as they occur. This would be possible within a course management/learning management system (CMS/LMS). Most CMS/LMSs provide constant monitoring of student activity, whether responses, postings on a discussion board, accesses of reading material, completions of quizzes, or some other assessment. Using the full capabilities of a basic CMS/LMS, a robust fifteen-week online course would generate thousands of transactions per student.
Real-time recording and analysis of these transactions could then be used to feed a learning analytics application. Not waiting for the end of a marking period or semester to record performance measures is critical to this type of application. Monitoring student transactions on a real-time basis allows for real-time decisions. Instructors may take actions or intervene in time to alert or assist students. A CMS/LMS or something similar therefore becomes critical for collecting and feeding this data into a "big" database for processing by a learning analytics software application. These instructional transactions should also be integrated with other resources such as student, course, and faculty data from the college information systems. Analytics software can then be used to analyze these transactions to establish patterns that are used to develop guidelines and rules for subsequent courses of action (see Figure 3). An important caveat is that the data accuracy should never be compromised in favor of timeliness of the data. Both accuracy and timeliness are required and need to be present in the learning analytics application.
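To make the integration step described above more concrete, the following Python sketch attaches per-student transaction counts from a CMS/LMS log to records from a student information system. It is a minimal sketch under stated assumptions: the log layout, the record fields, and the `integrate` helper are all hypothetical, and a production system would draw on institutional databases rather than in-memory lists.

```python
from collections import Counter

# Hypothetical CMS/LMS transaction log: (student_id, activity_type, timestamp).
lms_events = [
    ("s1", "quiz", "2014-09-02T10:00"),
    ("s1", "discussion_post", "2014-09-03T18:30"),
    ("s2", "reading_access", "2014-09-03T09:15"),
    ("s1", "reading_access", "2014-09-04T08:00"),
]

# Hypothetical student information system records keyed by student id.
sis_records = {
    "s1": {"major": "Biology", "year": 2},
    "s2": {"major": "History", "year": 1},
    "s3": {"major": "History", "year": 3},
}


def integrate(events, sis):
    """Attach a per-student transaction count to each student record."""
    counts = Counter(student_id for student_id, _, _ in events)
    return {
        student_id: {**info, "transactions": counts.get(student_id, 0)}
        for student_id, info in sis.items()
    }


profile = integrate(lms_events, sis_records)
```

Students who appear in the information system but not in the log (here, the hypothetical student `s3`) receive a count of zero rather than being dropped, since the absence of transactions is itself a signal an instructor may want to see.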
In a white paper published by IBM entitled Analytics for Achievement, eight categories of possible instructional applications utilizing analytics were described. The eight categories are as follows:
1. Monitoring individual student performance
2. Disaggregating student performance by selected characteristics such as major, year of study, ethnicity, etc.
3. Identifying outliers for early intervention
4. Predicting potential so that all students achieve optimally
5. Preventing attrition from a course or program
6. Identifying and developing effective instructional techniques
7. Analyzing standard assessment techniques and instruments (e.g., departmental and licensing exams)
8. Testing and evaluation of curricula. [14]
Of the above, monitoring individual student performance and participation in a course is among the most popular types of learning analytics applications. Anyone who has ever taught (face-to-face or online) has monitored student participation to determine engagement with the course material. Taking attendance is a time-honored classroom activity and most instructors will become concerned about students who have too many absences. Grades on quizzes and papers are also frequently monitored. A conscientious instructor will review his/her records and meet with those students who are not meeting the standards for the course. Many colleges have instituted mid-term reviews that provide students with indicators of their progress in a course. In online courses, CMS/LMSs routinely provide course monitoring statistics and rudimentary early warning systems that allow instructors to follow up with students who are not responding on blogs or discussion boards, not accessing reading materials, or not promptly taking quizzes. These course statistics are maintained in real time and instructors can review them as often as they wish. Students who are not as engaged as they should be can be sent emails expressing concerns about their performance.
None of these interventions requires learning analytics; however, these interactions can be enhanced significantly by expanding the amount and nature of the data collected. For example, a single student response on a discussion board can be analyzed through pattern recognition to determine the depth and quality of student engagement with the course material. The patterns used in this type of analysis are uncovered by examining thousands or tens of thousands of other student responses and evaluating sentences and phrases.
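The rudimentary early warning checks described above (students who have stopped accessing the course or are not keeping up with quizzes) might be sketched in Python as follows. The thresholds, field layout, and the `needs_follow_up` helper are assumptions made for illustration; they are not taken from any specific CMS/LMS product, and a real system would set its rules from institutional data.

```python
from datetime import date

# Hypothetical thresholds, chosen only for illustration.
MAX_INACTIVE_DAYS = 7
MIN_QUIZ_AVERAGE = 70.0


def needs_follow_up(last_access, quiz_scores, today):
    """Return the reasons, if any, an instructor might contact the student."""
    reasons = []
    if (today - last_access).days > MAX_INACTIVE_DAYS:
        reasons.append("inactive")
    if quiz_scores and sum(quiz_scores) / len(quiz_scores) < MIN_QUIZ_AVERAGE:
        reasons.append("low quiz average")
    if not quiz_scores:
        reasons.append("no quizzes taken")
    return reasons


today = date(2014, 10, 15)
# Hypothetical students: (date of last course access, quiz scores so far).
students = {
    "s1": (date(2014, 10, 14), [85, 90]),
    "s2": (date(2014, 10, 1), [60, 65]),
    "s3": (date(2014, 10, 13), []),
}
flags = {
    student_id: needs_follow_up(last_access, scores, today)
    for student_id, (last_access, scores) in students.items()
}
```

Fixed rules of this kind are what the text calls rudimentary; a learning analytics application would instead derive its rules and thresholds from patterns in much larger volumes of transaction data.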
Examples of well-designed learning analytics-based student monitoring systems are Rio Salado Community College's Progress and Course Engagement (PACE) system, Northern Arizona University's Grade Performance System (GPS), and Purdue University's Course Signals System.
For purposes of this article, the Course Signals System, in particular, is a good example of learning analytics software because it is one of the first to be used in blended learning environments. It combines demographic information with online student interactions and produces a red, yellow, or green light to show students how well they are doing in their courses, and also provides that information to their professors, who can intervene if necessary. Developed originally at Purdue University, Course Signals was licensed to SunGard Higher Education (now Ellucian) in 2010 to make it available to other colleges and universities. Large CMS/LMS providers such as Desire2Learn and Blackboard have modeled their own retention early warning systems after Purdue's work. [15] Course Signals has won a number of awards including the Campus Technology Innovators Award, Digital Education Achievement Award, and the Lee Noel and Randi Levitz Retention Excellence Awards. It has been used in online, face-to-face, and blended learning environments, and has been particularly popular in large-section, blended, and flipped classroom courses. [16] While there have been several studies supporting the use of learning analytics software such as Course Signals for improving student retention, more research needs to be done. [17] Michael Caulfield, director of blended and networked learning at Washington State University at Vancouver, cautioned that the early research on the effectiveness of learning analytics on retention needs further verification and review. [18] The fact is that learning analytics as a tool for retention is still in its nascent stage. The Society for Learning Analytics Research (SoLAR) is an inter-disciplinary network of leading international researchers who are exploring the role and impact of analytics on teaching, learning, training and development.
This society was established in 2011 and has held four conferences to discuss issues related to learning analytics research. To provide a vehicle for documenting the research, SoLAR established The Journal of Learning Analytics, a peer-reviewed, open-access journal, for disseminating research in this field. It provides a research forum within what George Siemens, the president of SoLAR, calls "the messiness of science". [19] The first edition was published in June 2014. The articles in this first edition address issues such as scaling up learning analytics initiatives, the relationship between LMS/VLE usage and learning performance, the role of psychometric data in predicting academic achievement, and the capacity to detect boredom through user log data. All of these, while important, are just beginning to scratch the surface of the effectiveness of learning analytics with respect to student performance and retention. Furthermore, there is practically no research that does cost-benefit comparisons of the large-scale implementation of learning analytics in blended learning or face-to-face environments. In sum, there is a long road ahead for researchers in this field and much study to be done.

IV. BENEFITS AND CONCERNS
The New Horizon Report is published each year by The New Media Consortium and EDUCAUSE. It predicts six emerging technologies that are likely "to enter mainstream use" over the next five years. In the 2014 Report, the six technologies in rank order were identified as follows:
1. Growth of Social Media
2. Integration of Online, Blended and Collaborative Learning
3. Rise of Data-Driven Learning and Assessment
4. Shift from Students as Consumers to Students as Creators
5. Agile Approaches to Change
6. Evolution of Online Learning [20]
The ranking of these six technologies indicates that the first two will likely enter the mainstream in one to two years; the second two within three years; and the last two within five years or more. The Rise of Data-Driven Learning and Assessment (referring to learning analytics) was ranked third, which indicates that this technology has potential and that widespread adoption is projected to be about three years away. This ranking also indicates that learning analytics need further exploration and refinement before their adoption.

A. Benefits
Learning analytics can have significant benefits in monitoring student performance and progress. First, and at its most basic level, learning analytics software can mine down to the frequency with which individual students access a CMS/LMS, how much time they are spending in a course, and the number and nature of instructional interactions. These interactions can be categorized into assessments (tests, assignments, or exercises), content (articles, videos, or simulations viewed) and collaborative activities (blogs, discussion groups, or wikis).
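The three categories of interaction named above can be illustrated with a short Python sketch that tallies how often, and for how long, each student engages in each category. This is a sketch only: the activity labels, the log layout, and the `summarize` helper are hypothetical and do not reflect the data model of any particular CMS/LMS.

```python
from collections import defaultdict

# Mapping from activity type to the three categories named in the text.
# The activity-type labels themselves are hypothetical.
CATEGORY_OF = {
    "test": "assessment", "assignment": "assessment", "exercise": "assessment",
    "article": "content", "video": "content", "simulation": "content",
    "blog": "collaboration", "discussion": "collaboration", "wiki": "collaboration",
}


def summarize(interactions):
    """Count interactions and total minutes per student and category."""
    summary = defaultdict(lambda: defaultdict(lambda: {"count": 0, "minutes": 0}))
    for student_id, activity, minutes in interactions:
        cell = summary[student_id][CATEGORY_OF[activity]]
        cell["count"] += 1
        cell["minutes"] += minutes
    return summary


# Hypothetical log entries: (student_id, activity_type, minutes spent).
log = [
    ("s1", "video", 12), ("s1", "test", 30), ("s1", "article", 8),
    ("s2", "wiki", 20),
]
usage = summarize(log)
```

A summary of this shape gives an advisor or instructor a quick view of where a student's time is going, for example heavy content viewing with little collaborative activity.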
Second, by providing detailed data on instructional interactions, learning analytics can significantly improve academic advisement related directly to teaching and learning. Learning analytics can improve the ability to identify at-risk students and intervene at the first indication of trouble. Furthermore, by linking instructional activities with other student information system data (college readiness, gender, age, major), learning analytics software is able to review performance across the organizational hierarchy: from the student, to courses, to departments, to the entire college. It can provide insights into individual students as well as the learning patterns of various cohorts of students.
Third, learning analytics software is able to provide longitudinal analysis that can lead to predictive behavior studies and patterns. By linking CMS/LMS databases with an institution's information system, data can be collected over time. Student and course data can be aggregated and disaggregated to analyze patterns at multiple levels of the institution. This would allow for predictive modeling that, in turn, can be used to establish student outcomes alert systems and intervention strategies.
In sum, learning analytics can become an important element in identifying students who are at risk and alerting advisors and faculty to take appropriate actions. Furthermore, it can do so longitudinally across the institution and can uncover patterns to improve student retention that, in turn, can assist in academic planning.

B. Concerns
First, in order for big data and learning analytics applications to function well, data need to be accurate and timely. Learning analytics software works best for courses that are delivered completely electronically such as online courses. Traditional face-to-face courses that require significant data conversion time are problematic. Blended learning courses (part face-to-face and part online) likewise present data collection problems. Because blended learning courses vary so much in the nature of their delivery, learning analytics software can have significant data gaps.
Instructional transactions that take place in the face-to-face environment will be lost unless the faculty member or teaching assistant is willing to manually enter them into the student information system.
The second, and perhaps the most serious concern, is that since learning analytics require massive amounts of data collected on students and integrated with other databases, colleges need to be mindful of privacy, data profiling, and the rights of students in terms of recording their individual behaviors. While college classes have always involved evaluating student performance and academic behavior, learning analytics take the recording of behavior to a whole new level and scope. As well-intentioned as learning analytics might be in terms of helping students succeed, this "big data" approach may also be seen as "big brother is watching" and, as such, an invasion of privacy that some students would find objectionable. Precautions need to be taken to ensure that the extensive data collection of student instructional transactions is not abused in ways that potentially hurt individuals. Vicky Gunn, Director of the Learning and Teaching Centre at the University of Glasgow, advises: "… it is clear that the growth of learning analytics needs a few up-front protocols of protection as soon as possible. We should especially be considering…Ethical consent structures to enable students to know what is being gathered, when and how it will be used as well as opportunities for students to opt in/out." [21]
Third, there are not yet enough individuals trained to use big data and analytics appropriately. Experienced database administrators and designers capable of warehousing and integrating data across multiple files and formats are a necessity. In addition to the expertise needed to develop databases, instructional designers working with faculty will need to understand and derive insights into the student behaviors that are pertinent to the application at hand. There is also a need for institutional researchers, or others knowledgeable about statistics, decision trees, and strategy mapping, to develop algorithms that construct predictive models. College administrators may have to invest in consultants or undertake extensive professional development of their own staffs in order to develop appropriate applications. This will take time and additional resources and may or may not be worth the return on investment. Furthermore, because of the dearth of expertise, there may be a tendency to use instructional templates that are integrated into CMS/LMSs. These, although convenient, may be overly simplistic and should be considered with caution.
Fourth, a good deal of college and university student data may end up in larger governmental databases either at the state or national level. Bennett (2011) cautions that the United States is heading to an all-inclusive national K-20 database. [22] Federal education policies as promulgated by No Child Left Behind and Race to the Top funding have pushed many states to adopt comprehensive statewide student databases that could easily be the basis for establishing a national system. Furthermore, there is a certain amount of influence being exerted on the part of the U.S. Department of Education in favor of development of common database structures. Such a system might be beneficial but may also leave individuals vulnerable to privacy, data security, and theft issues. In 2013, the people of the United States were awakened to the spying activities of the National Security Agency (N.S.A.) and the intelligence arms of other governments around the world. The problem became so bad that large Internet service companies such as Google, Facebook, and Yahoo invested hundreds of millions of dollars to seal up security systems that Edward J. Snowden revealed the N.S.A. had been exploiting. After years of cooperating with the U.S. federal government, the goal of many of these companies is to thwart Washington as well as Beijing and Moscow. The users of "big data" and analytics need to be careful that these mega-database systems do not become the playground of exploitative individuals and organizations. [23]
Lastly, it might be beneficial to revisit the work of Herbert Simon and his theory on the limits of rational decision making that was mentioned earlier in this article. Herbert Simon was a life-long supporter of the use of computer technology to support decision making, including the application of artificial intelligence.
At Carnegie-Mellon University where he taught for decades he was active in integrating artificial intelligence software in the learning sciences to improve instruction. Instructional data-driven decision making and learning analytics parallel Simon's work in this area. In his honor, Carnegie-Mellon University established the Simon Initiative in 2013 to accelerate the use of learning science and technology to improve student learning. This initiative harnesses CMU's decades of learning data and research to improve educational outcomes for students. However, as database systems become bigger and as software such as learning analytics becomes more complex, a case can be made that the limits of rational decision making are being exceeded because of the plethora of information and data available. Simon was highly focused on the efficient use of data and is famously quoted as saying that too much information can consume its recipients and that "… a wealth of information creates a poverty of attention…" [24] Simon's quote may be a most appropriate concern in the era of big data and learning analytics. Nathan Silver, an American statistician, echoed Simon in his 2012 bestseller, The Signal and the Noise…., and cautioned that in predictive models, there is a tendency to collect a lot of meaningless data (i.e., noise) creating the danger of poor predictions. [25] V. CONCLUSION This article started with a reference to two scenarios, one at North-West University in South Africa and one at the City University of New York. These institutions are very different, yet are facing similar decisions with regard to investing and acquiring learning analytic software. They are also similar in that, while they have some fully online academic programs, both are presently and for the foreseeable future investing heavily in blended learning. 
My recommendation to both institutions was that, before committing to learning analytics, they do a careful analysis of the costs related to acquiring this software. These would include not only direct costs such as software licenses and maintenance contracts, but also indirect costs to hire personnel and/or consultants to design and implement learning analytics applications. The analysis would also include the feasibility and cost of data collection in blended learning environments, where faculty or other personnel would be needed to provide accurate and timely data.
Colleges and universities around the world need to meet a number of challenges related to providing greater access to higher education. However, expanding access does not necessarily lead to expanding resources. To the contrary, higher education policy makers, while calling for more access, are limiting resources, and instructional technology such as online and blended learning is being seen as an important vehicle for expanding access while containing costs. In its truest sense, expanded access does not just mean gaining acceptance into college programs; it also means successful completion of degrees. Student attrition in many colleges and universities is at unacceptable levels and needs to be addressed as well. Data-driven decision making and learning analytics software have the potential to assist colleges in identifying and evaluating strategies that can improve retention. At the present time, however, this software is best suited for fully online environments, not face-to-face or blended learning environments. Nevertheless, as data-driven decision making enters the big data and learning analytics era, these new approaches, while not silver bullets, may be part of the solution. Higher education administrators would do well to consider the benefits, concerns, and costs enumerated above when evaluating whether big data and learning analytics can be used in their institutions and determining the exact role they can play.