Big Data, Big Planet

November 15, 2016 - 11:13am

We’re experiencing a big data explosion both in cultural awareness and in penetration into many aspects of everyday life. How did we get here? What role do weather and climate play in our data moment?

[Image: GOES-R satellite and ground station]

Big data is characterized by its “Vs”: Volume, Variety and Velocity. Data volume is “big” if it is too large to reside on or be processed with a personal computer.

Did you know that weather data was the original big data use case, even in the pre-computer era? Weather data is most useful when we have a lot of it, and we need it in real time. Furthermore, we use a large variety of physical measurements to characterize the state of the atmosphere and ocean.

Whether that data was shared by lantern, signal flag, telegraph, telex, internet, or satellite, the volume and velocity of weather data have always stress-tested the communication systems of their time. In addition, climate and weather models have pushed the state of the art in supercomputers and digital storage systems in the modern era.

Weather data is cooperative. In order to be useful, we need a long climate record at a single location and/or data from a widespread network of upstream neighbors. It requires many people around the world to carefully collect, calibrate, and share data so that we can all use it.

Commercial data, e.g. how likely someone is to purchase an item, is treated as a commodity to be bought and sold. Web browser history from large groups of people can be collected easily and cheaply and then aggregated and sold profitably. Because human behavior changes quickly, much commercial data is quickly collected, processed and then discarded.

[Image: Researchers collecting data on a buoy. Source: NOAA]

Weather data, in contrast, is collected at great cost, given away for free, and must be carefully preserved for future generations. In order to make useful weather predictions, we need to feed numerical weather prediction models with carefully quality-controlled and calibrated data from expensive satellites and ground networks around the world. Observational weather data is freely shared between nations, combined into global weather predictions and climate reconstructions, and then given away to those who need it.

Furthermore, scientists need to perform extra steps to ensure that measurements have sufficient accuracy and precision to become long-term climate records. For example, every weather balloon instrument package flown from the Global Climate Observing System Reference Upper-Air Network (GRUAN) undergoes independent, very precise calibration immediately prior to launch. (See the GCOS Reference Upper-Air Network site.) Then science-trained data curators perform even more work to ensure that data is described in an accurate and consistent manner. Finally, data engineers build systems to preserve data so that it can outlive its storage media and to make it discoverable, retrievable, and usable for future generations.

Participants in the global weather enterprise come from all over, and NCAR staff are involved in every aspect of it. Scientists in NCAR’s Earth Observing Laboratory (EOL) and Research Applications Laboratory (RAL) contribute to data collection and quality assurance. Data specialists at the Research Data Archive (RDA) perform data and metadata quality assurance and data engineering for long-term data preservation and retrieval. Visualization experts at the Computational Information Systems Laboratory (CISL) and Unidata also build reusable data tools to help researchers and forecasters.

[Image: 1938 Los Angeles Flood]

What would happen without all the people in this data chain? We need look no farther than past weather catastrophes such as the hurricane that hit Galveston without warning in 1900, killing 11,000; or the atmospheric river that unleashed flash floods in 1938, killing 115 people and flooding over one third of Los Angeles.

Contrast that with the recent Hurricane Matthew, which hit a much wider area. Early warning helped many people evacuate from the storm’s path. Forty-four people died in the US, mainly from flooding well after the storm had passed.

In conclusion, weather and climate encompass the original big data use cases. While scientists and others in this area must grapple with some data challenges that are unique among big data problems, they also share the all-too-common problem of being unable to solve systemic problems with technology alone. Over 900 Haitians died in Hurricane Matthew even though they were aware of the danger. They simply lacked the means to evacuate. It’s not enough to know what is happening; we need to build systems that enable us to act on our knowledge.


About the author

Grace Peng

Grace Peng is a data specialist at NCAR’s Computational Information Systems Laboratory (CISL). Her peripatetic career has spanned analytical and physical chemistry; spectroscopy; molecular physics; computer simulations and modeling; satellite meteorology; analysis, visualization and management of data; education; theater; and costuming. She holds degrees in Mathematics (BA), Chemistry (BS) and Chemical Physics (PhD).

Preserving data for future generations and teaching next-generation scientists how to work effectively and accurately with data are her passions. She’d like to earn the moniker “Data Whisperer.”