In recent years there has been rapid growth in digital tools for building data-gathering instruments (surveys, for instance). Although this technological democratization is positive, it is essential to recognize the (almost forgotten) science behind the scenes that makes accurate sampling methodologies possible. Access to these tools has increased, and almost anyone can put together a survey with a few clicks, yet we are quick to forget that a single statistic is rarely representative of reality. After hearing misinformed opinions about the unfortunate news regarding Puerto Rico’s 2020 Census, and after seeing other surveys with multiple methodological mistakes being erroneously interpreted, I deemed it a favorable occasion to write about the subject.
Censuses and Surveys
Data-gathering processes can be traced as far back as the Babylonian civilization, which offers the first hints of what we today call a census. A census is the process of systematically gathering information about every member of a population. By definition, its objective is the full enumeration of a population, not a sample of it. Historically, censuses have been used as accurate tools for public administrations to determine things such as food needs, labor force expectations, population growth, and the number of families living in households (to name a few). The methods used to acquire, record, and calculate this information in a systematic and scientific manner are highly complex and very expensive.
Surveys, on the other hand, have the objective of sampling a population. They make it possible to examine large groups (the population a census would enumerate) in a cost-effective way, using a subset of those groups (a sample). One can think of this process as the equivalent of visiting a laboratory where a blood sample is taken with the expectation that it represents the whole organism. And this is where things get interesting. Although at first glance all surveys may seem the same, there is a whole universe of flavors suited to different circumstances. To keep it simple, surveys can use two types of sampling: probability sampling, which enables a statistical representation of a population (like the laboratory example), and non-probability sampling, whose design is more exploratory. The first can be used to draw accurate conclusions about a population, while the second (which is what most online surveys and polls use) cannot. Don’t get me wrong: this second type of sampling is very useful in certain research settings; nonetheless, it is not suited to describing the characteristics or behaviors of an entire population. But… why use surveys to generate information if the production of digital data is at an all-time high?
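Before answering that question, the difference between the two types of sampling can be made concrete with a small simulation. What follows is only a sketch under invented assumptions: the population is synthetic, and the 30% share of young people, the support rates of 70% and 40%, and the fivefold higher chance that a young person answers an online poll are all hypothetical numbers chosen for illustration. The function name `simulate` is likewise made up. The self-selected poll is drawn with replacement to keep the code short.

```python
import random


def simulate(seed=42, population_size=100_000, sample_size=1_000):
    """Compare a simple random sample with a self-selected online poll.

    All rates below are hypothetical and exist only for illustration.
    """
    rng = random.Random(seed)

    # Synthetic population: each person is (is_young, supports_measure).
    population = []
    for _ in range(population_size):
        is_young = rng.random() < 0.30
        supports = rng.random() < (0.70 if is_young else 0.40)
        population.append((is_young, supports))

    # What a full enumeration (a census) would report.
    true_share = sum(s for _, s in population) / population_size

    # Probability sample: every person has the same chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(s for _, s in random_sample) / sample_size

    # Non-probability sample: young people are assumed to be five times
    # more likely to answer an online poll (drawn with replacement).
    weights = [5.0 if is_young else 1.0 for is_young, _ in population]
    online_sample = rng.choices(population, weights=weights, k=sample_size)
    online_share = sum(s for _, s in online_sample) / sample_size

    return true_share, random_share, online_share


if __name__ == "__main__":
    true_share, random_share, online_share = simulate()
    print(f"Full enumeration:     {true_share:.1%}")
    print(f"Simple random sample: {random_share:.1%}")
    print(f"Self-selected poll:   {online_share:.1%}")
```

Under these assumptions the simple random sample should land close to the full-enumeration figure, while the self-selected poll overstates support because the people who answer differ from those who do not. Collecting more self-selected responses would not remove that gap.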
Not All Data is Created Equal
One may question the need for surveys and all this complicated jargon when worldwide data creation and consumption are expected to reach over 180 zettabytes by 2025 (for context, 1 zettabyte equals one billion terabytes). Most of this data is known as “organic data” because it is born as a byproduct of individuals interacting with systems (online searches, mouse clicks, the number of messages or calls received, how often we open social media, hours of YouTube videos watched, etc.). In contrast, survey data (or “design data”) is pre-designed and engineered for a specific purpose. It involves writing unbiased questions that are later assembled into standardized questionnaires. The two are different in nature but not mutually exclusive. A deeper understanding of any reality will come from merging them and from creating methodologies that enhance both organic and design data, so that they work together to yield the most accurate results.
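For a sense of scale, the zettabyte figure quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where a terabyte is 10^12 bytes and a zettabyte is 10^21 bytes; the function name is hypothetical.

```python
# Decimal (SI) definitions, assumed here for illustration.
BYTES_PER_TERABYTE = 10**12
BYTES_PER_ZETTABYTE = 10**21


def zettabytes_to_terabytes(zettabytes):
    """Convert a whole number of zettabytes to terabytes."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_TERABYTE


print(f"{zettabytes_to_terabytes(1):,} TB")    # 1,000,000,000 TB
print(f"{zettabytes_to_terabytes(180):,} TB")  # 180,000,000,000 TB
```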
The Bottom Line
It’s hard to believe that surveys have no place in multi-method research. I’m convinced that combining data sources to produce new information not contained in any single source is the future. As I mentioned in a previous column, creativity and statistical robustness can coexist. Yet one can have all the science, processes, and technological infrastructure and still fall short, because a data-driven future rests not on protocols or hardware but on the traditions, values, and ethics that feed our CULTURE: the way we do things and why we do them. If one believes that a data-driven future is the way to really assess needs, the first step is to change the values, perceptions, and beliefs concerning the priority, respect, and importance this sensitive information should have. Such a slight shift in perspective can make a big difference when it comes to harnessing responsibly sourced data.