Big data / small data

What’s this all about? 

We live in a world characterised by an unprecedented amount of data. It is being collected, measured, reported, and analysed everywhere we look, and in places where we don’t even realise it’s happening. This growth began with the development of databases capable of storing large volumes of data, and it is accelerating as ever-larger data storage systems are linked to an increasing range of tools and mechanisms for capturing data. Businesses, schools / kura and governments create new data on a daily basis. In addition, every time we send an email, post on social media or search online we are contributing to this vast store of data.

Data that is gathered and stored in this way is referred to as ‘big data’. Having access to big data enables us to do searches, compare information, and see correlations and trends in ways never before experienced. We’ve become used to seeing the modelling of complex weather patterns on the evening news, predictions made about election outcomes, or movements on the stock exchange, all made possible because of big data.

Big data is a massive volume of data that moves too fast, and is too large and complex, to analyse and process without very sophisticated technology. Every day, big data systems gather billions, even trillions, of items of information from millions of people, drawing on sources such as web analytics, social media, customer service information and mobile phone applications and services.

This data falls into two broad types:

  1. Structured data has a pattern that makes it easily searchable, such as dates, numbers, and facts. It is stored in an established format (e.g. the way a phone book is organised, or the student data in a school SMS) and so is quite straightforward to analyse.
  2. Unstructured data includes audio, video and social media information, and can be human or computer-generated. An example is an email, where the body of text in the message field follows no set pattern and so traditional analytics cannot readily analyse it. But as technology develops, so too does the ability to scrutinise these seemingly ‘random’ pieces of data and report back in ways that show patterns and trends, and support predictions.
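
The practical difference between the two can be sketched in a few lines of Python. This is only an illustration: the student records, the email text and the function names are all made up for the example, and real analysis of free text uses far more sophisticated tools than the crude keyword match shown here.

```python
import re

# Structured data: every record has the same named fields
# (hypothetical rows, loosely modelled on what a school SMS might hold).
students = [
    {"name": "Aroha", "year_level": 7, "enrolled": "2023-02-01"},
    {"name": "Ben", "year_level": 8, "enrolled": "2022-02-01"},
    {"name": "Mere", "year_level": 7, "enrolled": "2023-02-01"},
]


def students_in_year(records, year_level):
    """Return the names of students in a year level using a simple field lookup."""
    return [r["name"] for r in records if r["year_level"] == year_level]


# Unstructured data: free text with no fixed fields (a made-up email body).
email_body = "Hi, Aroha was away on Monday and is finding fractions hard. Can we meet?"


def mentions(text, keywords):
    """Return the keywords that appear as whole words in a piece of free text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [k for k in keywords if k in words]


year_7 = students_in_year(students, 7)
found = mentions(email_body, ["fractions", "away", "reading"])
print(year_7)
print(found)
```

The structured query works because we know in advance that every record has a `year_level` field. The email has no such fields, so we are reduced to guessing which words might matter before we can find anything.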

While the use of big data provides a number of benefits, there are also concerns to be acknowledged, including:

  • Privacy - the ability of an organisation or individual to determine what data in a computer system can be shared with third parties.
  • Security - the process of protecting data from unauthorised access and data corruption throughout its lifecycle.
  • Quality - determined by factors such as accuracy, completeness, reliability, relevance and how up to date the information is.
  • Alignment - the way data is arranged and accessed in computer memory, allowing for future access and transfer between systems etc.

In educational settings we’re seeing an increased use of large data sets to help us make decisions about student learning. Patterns and trends in the learning behaviours of individuals and cohorts are used to predict future achievement, and to identify specific gaps that can be addressed to accelerate the learning process. The larger the data sets, and the more sophisticated the technology we use to analyse and report on them, the greater the confidence we seem to have in the insights they offer.

While big data is extremely useful in terms of the insights it provides, there are some problems with relying heavily on it and the associated analytics. Big data is great at providing us with correlations, but not so good at identifying causation, or ‘the why’. For example, analysing school achievement information may show that the students / ākonga in a particular school or kura have consistently high scores, compared to other schools, in a specific area of the curriculum, let’s say maths.

Another search of the data for the students at the same school may show that a far higher percentage of students at the school wear brown shoes than is the case for students at other schools. So, while it is accurate to observe a correlation between the two sets of data - shoe colour and maths scores - it is certainly not accurate to suggest that wearing brown shoes is a cause of better maths scores. While this illustration poses an obviously silly relationship, there are many examples where the argument for causation is made in the same erroneous way.
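
To see how easily this can happen, here is a minimal sketch that calculates a correlation coefficient (Pearson's r) for the brown shoes illustration. The figures for the five schools are invented purely for the example, and the `pearson` function is written out by hand so that nothing is hidden.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data for five schools: the share of students wearing brown shoes,
# and each school's mean maths score.
brown_shoe_share = [0.10, 0.20, 0.30, 0.60, 0.70]
mean_maths_score = [52, 55, 58, 70, 74]

r = pearson(brown_shoe_share, mean_maths_score)
print(f"correlation between shoe colour and maths score: {r:.2f}")
```

With these made-up numbers the coefficient comes out very close to 1, which is about as strong as a correlation gets. Nothing in the calculation knows or cares why the two sets of numbers move together; that question has to be answered by people who know the schools.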

Consider the way that school decile ratings have become a de facto measure of intelligence, based on the correlation in the data showing that students in low decile schools achieve at a lower level than those in higher decile schools. The correlation may be true, but it cannot be argued that intelligence or ability will be limited simply because a child attends a low decile school (causation).

This is where the argument for what is referred to as ‘small data’ comes in. In education settings, small data refers to the everyday, nuanced observations and decisions made by teachers/kaiako, based on the intimate knowledge they have of learners, gathered through years of experience, knowledge of other family members, ‘feel’ for class dynamics, and awareness of factors affecting a student’s health or attention at that moment. These moment-by-moment judgments, made daily by teachers, provide insights that are every bit as valuable as what the big data reveals.

Finnish educator and researcher Pasi Sahlberg explains how small data can give us tiny clues that create great impact by uncovering important relationships in teaching and learning. A perfect example of small data in centres, schools and kura is spending time with learners, understanding how they are feeling about their learning, the supports they need, and whether something is affecting their ability to learn a specific task. This includes having a deep understanding of their cultural contexts and backgrounds, so that we act in culturally responsive ways when making decisions about how best to facilitate their learning.

The important thing about this trend area is to be aware of the dynamic relationship between big data and small data described here, and to understand the folly of blindly relying on what the big data suggests without taking into account the small data, like teacher judgments, that we constantly engage with as professionals.

The early years sector provides a prime example. For many years early years professionals have used a simple frame of noticing, recognising, and responding when working with learners. Noticing means being intensely aware of, and tuned into, what the learner is doing at that moment, including taking account of the environment and what others are doing. Recognising means seeing how this behaviour or accomplishment fits with what is known in terms of learning theory and of the child’s own background and stage of development. Finally, taking all of this into account, the teacher responds in a way that suggests the next steps to be taken. It may be as simple as a word of encouragement to repeat the action so that through repetition a particular skill or outcome becomes embedded, or it may involve using questions to prompt the student to consider the next step they may try. This process happens many times over on a daily basis in our early years settings. It doesn’t involve interrogating databases and looking at large volumes of data; it relies on insightful and informed decisions being made by a skilled professional. This is small data in action.

What’s driving this change?

Global competitiveness

The World Economic Forum defines global competitiveness as "the ability of a country to achieve sustained high rates of growth in gross domestic product (GDP) per capita." As countries around the world seek to remain competitive in this way, it is important that they have a highly skilled workforce being prepared through their education system, and they will often rely on international benchmark comparisons as a measure of their success in this.

Emphasis on evidence-based practice

Evidence-based practice (EBP) has become a buzzword across all areas of government in recent times, for example in health, social policy, and education. With so much at stake, and with the cost of education constantly rising, gaining maximum return on the investment made is a priority for many governments and education systems. This has driven many initiatives for gathering and using data to help inform this evidence base.

Personalisation of learning

Personalised learning aims to provide a more tailored education for every learner. This places great demands on teachers in a traditional setting because, even with very small teacher-student ratios, monitoring individual learning pathways becomes overwhelming. Modern, data-driven, online systems promise to provide up-to-the-minute feedback and next-step suggestions for learners pursuing a personalised learning pathway.

Internet of things

There are now more ‘things’ connected to the internet than people. Almost everything imaginable is now capable of gathering data and feeding it into a massive data store where it can be combined, organised and analysed, with patterns and correlations produced within seconds. We have the potential now to track student learning habits using data from their laptops, their pens, their smartphones and watches or from cameras tracking their eye movements as they read a page, for example. All of this happens in a continuous stream, providing real-time data that can be used to predict learning outcomes and give feedback for improvement.

Data storage and processing technology

Until recently it was difficult to imagine just how all of the data captured could be stored and analysed in a timely fashion, but with the increased capacity and capability of cloud-based data storage and processing systems, the ability to work with seemingly endless amounts of data is now commonplace. Once the ability to do this was limited by the processing speed and storage capacity of a local desktop, but now an individual’s computer may act as part of a globally connected ‘supercomputer’, capable of processing highly complex analysis tasks in microseconds.

What examples of this can I see?

Increased use of learner analytics

In 2017 six nationally funded case studies were carried out in New Zealand tertiary education, looking at how to build an evidence base for teaching and learning design using learning analytics data. One of the case studies, at Massey University, explored how data and analytics could be used to encourage students who appeared not to be engaged in their study. The project used data from a learning management system, which recorded what students were viewing and doing and time-stamped it to show when they were doing it. Learning analytics in this case supported teachers to design more effective teaching strategies by making them aware that specific students were disengaged from the material.

AI and personalised learning

Developments in AI systems allow educators to bring personalised learning to the classroom and empower students to learn the best way they can. An example of a system-wide application can be seen in the AltSchool platform, co-developed by educators and engineers, and used throughout a network of lab and partner US schools serving pre-K through 12th-grade students.

Data integration projects at a national level

The New Zealand Ministry of Education’s Student Information Sharing Initiative (SISI) project aims to enable data within Student Management Systems (SMSs) to travel with children and young people as they move through the education system. The ability of different technologies to exchange and share data in this way is intended to allow for more immediate and responsive decision-making on the part of educators and others who are supporting each child in his or her learning.

Internet of things

Data is being captured from an increasing range of ‘things’ that we encounter in our daily lives, from the roads we use and the cities we live in through to the gadgets we include in our homes. These are constantly monitoring our activity and using that data to help us make decisions and, in some cases, to make those decisions for us. The AltSchool example illustrates the application of this thinking in an education context.

How might we respond? 

It’s easy to feel overwhelmed by the rate of development of the technology driving much of this area. In addition, there are questions about data security and data sovereignty to be addressed, not to mention the impact of all of this on teachers and others charged with the professional duty of working with learners.

Some questions to help guide your professional discussions on this issue are:

In relation to big data:
  • What use are you currently making of ‘big data’ to inform the design of learning in your context? (e.g. standardised test scores, LMS and SMS reports etc.)
  • To what extent do you reference some of the meta-analyses around teaching and learning to inform your practice? (e.g. Best Evidence Synthesis, Visible Learning) How is this happening?
  • How might you make better use of data-gathering systems to support your efforts to personalise learning for all students? 
In relation to ‘small data’:
  • Do you give enough time to communicate with students and engage in conversation to discover the things about them and their learning that matter?
  • Do you and your colleagues take time to consider and address the ‘small data’ concerns arising from your teaching as inquiry, for example?
  • How can we use small data more effectively in teachers’ work, for example in overall teacher judgments?