Three Things I Learnt About Data Collection

Recently, a colleague sent me a data table and asked me to check it before he submitted it for clearance.

I took a look at the table, and a number of questions went off in my head. The requirement was to provide responses to two questions that would be included in an international survey. I found the proposed responses odd. Why so? Because the questions asked about the provision of basic services in specific schools, services which I knew at first glance were definitely present in these schools.

Looking at the two question stems, my immediate thought was that the ‘right’ responses should count every school covered in the survey, which means both responses should be 100 percent. However, my colleague didn’t put that down. His answer was a figure lower than 100 percent, not for one question but for both, and it was the same percentage each time. I was puzzled, and so I asked.

It turned out that both my colleague and I were right. I expected 100 percent, and it was so. He was also right, because he had followed the right methodology to gather the responses. Yet the resulting figures were not what I expected. Why did my colleague reflect a figure lower than 100 percent in the data table?

The reason turned out to be simple. The set of schools that counted towards the two questions was different from the total number of schools I had in mind. I had assumed that the questions required us to use all the schools as the base for both questions. The questions, however, had specified the kinds of schools to be counted.
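To make the point concrete, here is a minimal sketch of how the same records can give two different percentages depending on which schools form the base. Everything in it is made up for illustration: the school names, the `kind` labels, the `has_service` flag and the two scope definitions are assumptions, not the categories or figures from the actual survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class School:
    name: str
    kind: str          # hypothetical category label
    has_service: bool  # hypothetical flag: is the basic service provided?


def coverage_percent(schools, in_scope):
    """Percentage of in-scope schools that provide the service."""
    base = [s for s in schools if in_scope(s)]
    if not base:
        raise ValueError("no schools match the scope definition")
    return 100.0 * sum(s.has_service for s in base) / len(base)


# Invented records, purely for illustration.
schools = [
    School("A", "primary", True),
    School("B", "primary", True),
    School("C", "secondary", True),
    School("D", "special", False),
]

# The base I had in my head versus the base the question actually specifies.
assumed = coverage_percent(schools, lambda s: s.kind in {"primary", "secondary"})
specified = coverage_percent(schools, lambda s: s.kind in {"secondary", "special"})

print(f"assumed base: {assumed:.0f}%, specified base: {specified:.0f}%")
```

Neither number is wrong. They answer different questions, which is why the scope definition has to be written down before anyone starts counting.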

What have I learnt about data collection and analysis?

1. Words matter in the data collection process. Do not assume. In this particular data collection exercise, I had assumed that the word ‘schools’ referred to all the schools I know of. I forgot that the questions were asking about particular sets of schools. I forgot that not all schools are built the same way. Some are built for different purposes, which means I need to exclude them from the count where the question calls for it.

Going one step upstream, it is important to define the problem statement, or the issue at hand, clearly and carefully. This gives both the data collector and the supplier of the data the same understanding. Where useful, provide a glossary to guide the supplier. The glossary also helps the data collector hold to the same definitions throughout the exercise, and it helps with subsequent interpretation and analysis.

2. You need checks and balances in the data collection process. All of us have probably heard the saying ‘garbage in, garbage out’ often. And it is true. If we interpret the questions wrongly, look at the data differently, or miss out on the positioning of the data, the eventual dataset is worth about as much as general waste. Every person and every aspect of the data collection process is important: the person providing the initial dataset, the person cross-checking the data, the person maintaining the dataset over time, and even the data custodian who makes sure access to and interpretation of the data remain tight.

If we want to see desired outcomes realised in our work, we need to embrace a data-driven culture in the organisation. To build that culture, we need to ask how we can shift everyone's mindset towards using data as evidence to drive improvements in processes and outcomes. Achieving this means getting everyone aligned, from the top echelon to the followers in the organisation. When everyone is serious about checking the data used to support decision-making, the quality of work naturally goes up.

3. Data is just … data, not facts or information. You will never get information or facts from data unless you go through a process of interpreting it. This means you need to understand the context in which the data was collected so that you know how to make meaning out of it. Because I did not fully understand the background of the schools required for the data table, I had assumed that all the schools I was looking at matched what the survey questions required.

To turn data into facts or information, it is wise to consider the conditions under which the data was collected. We cannot claim that data collected at one specific point in time is representative of the entire population. For example, people enter and exit a shopping mall throughout the day. We cannot claim that the x number of people in the mall at a certain time represents the footfall for the month. However, when daily counts are carefully collected and aggregated over the month, we can claim that an average of x people visit the mall in a single day, which is the average daily footfall for the month.
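The mall example can be sketched in a few lines. The figures below are invented for illustration: I assume a 28-day month with 1,000 visitors on each weekday and 1,700 on each weekend day, plus a single head count of 300 taken at one moment. The variable names are likewise hypothetical.

```python
from statistics import mean

# Invented daily visitor totals for a 28-day month:
# five weekdays at 1000 and two weekend days at 1700, repeated for four weeks.
daily_totals = ([1000] * 5 + [1700] * 2) * 4

# A single head count taken at one moment on one day (also invented).
snapshot_count = 300

average_daily_footfall = mean(daily_totals)
quietest_day, busiest_day = min(daily_totals), max(daily_totals)

print(f"snapshot at one moment: {snapshot_count} people")
print(f"average daily footfall: {average_daily_footfall} people")
print(f"daily range: {quietest_day} to {busiest_day} people")
```

The snapshot and the average measure different things. Only the aggregated daily counts support a claim about the month, and even then the daily range is a reminder that an average hides the difference between weekdays and weekends.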


There is a lot more I could say about data collection, but I think the key aspects are briefly touched on here. We need to pay more attention to how the evidence meant to support claims and actions is collected and interpreted. Being more aware of these practices will help all of us do better in our various areas of work and be better advocates for different policies. It will also help us avoid falling victim to false or exaggerated claims.

Do you have any principles you abide by when carrying out data collection or analysis? How do you detect erroneous data collection steps, and what do you do to introduce checks and balances in the process? Do feel free to share!
