Beyond the lab: Using big data to discover principles of cognition

Lupyan, G., & Goldstone, R. L. (2019). Introduction to special issue. Beyond the lab: Using big data to discover principles of cognition. Behavior Research Methods, 51, 1473-1476.

Like many other scientific disciplines, psychological science has felt the impact of the big-data revolution. This impact arises from the meeting of three forces: data availability, data heterogeneity, and data analyzability. In terms of data availability, consider that for decades, researchers relied on the Brown Corpus of about one million words (Kučera & Francis, 1969). Modern resources, in contrast, are larger by six orders of magnitude (e.g., Google’s 1T corpus) and are available in a growing number of languages. About 240 billion photos have been uploaded to Facebook, and Instagram receives over 100 million new photos each day. The large-scale digitization of these data has made it possible in principle to analyze and aggregate these resources on a previously unimagined scale. Heterogeneity refers to the availability of different types of data. For example, recent progress in automatic image recognition is owed not just to improvements in algorithms and hardware, but arguably more to the ability to merge large collections of images with linguistic labels (produced by crowdsourced human taggers) that serve as training data to the algorithms. Making use of heterogeneous data sources often depends on their standardization. For example, the ability to combine demographic and grammatical data about thousands of languages led to the finding that languages spoken by more people have simpler morphologies (Lupyan & Dale, 2010). The ability to combine these data types would have been substantially more difficult without the existence of standardized language and country codes that could be used to merge the different data sources. Finally, analyzability must be ensured, for without appropriate tools to process and analyze different types of data, the “data” are merely bytes.
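The point about standardized codes can be illustrated with a minimal sketch. It is not the procedure used in any of the cited studies; the keys ("x01", "x02", ...) are placeholders standing in for standardized language codes, and the field names and values are invented for illustration only. The sketch shows how two independently compiled sources can be joined once they share a common identifier.

```python
# Two hypothetical data sources keyed by the same standardized language code.
# Keys, field names, and values are invented placeholders, not real data.
demographic = {
    "x01": {"speakers": 1_000_000},
    "x02": {"speakers": 5_000},
    "x03": {"speakers": 250_000},
}
grammatical = {
    "x01": {"case_markers": 2},
    "x02": {"case_markers": 9},
    "x04": {"case_markers": 4},
}


def merge_on_code(left, right):
    """Inner-join two dicts of records on the codes they have in common."""
    shared = sorted(left.keys() & right.keys())
    return {code: {**left[code], **right[code]} for code in shared}


merged = merge_on_code(demographic, grammatical)
for code, row in merged.items():
    print(code, row)
```

Only languages present in both sources survive the join, which is why a shared coding scheme matters: without it, each record would have to be matched by hand or by error-prone name comparison.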

