In the post Cloud Weather Reporting with R and Hadoop the meaning of synoptic cloud data was not explored. This post looks at cloud patterns and their impact on world weather.
The cloud weather analytics covered here mainly comprise two kinds of data: aggregate frequency data (e.g. cloud location, coverage and precipitation types) and time series data (e.g. when precipitation happens and for how long). This post focusses on frequency data analytics; the next post will cover time-dependent analytics.
Frequency Based Analytics
Frequency-based properties are evaluated through MapReduce by extracting the relevant weather data, using the following R script as the Mapper:
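#!/usr/bin/env Rscript
# Mapper script: reads reports from stdin and emits one key per qualifying line.
# The key is latitude band, cloud type fields and precipitation level.
# Latitude band is coded as 0 for tropical, 1 for mid-latitudes, 2 for polar regions.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  # Only lines with "1" in column 9 are used
  if (substr(line, 9, 9) == "1")
    cat(sprintf("%d%s%d\n",
      # Latitude band from columns 10-14
      as.integer((abs(as.integer(substr(line, 10, 14)) / 2250) + 1) / 2),
      # Cloud types from columns 33-38
      substr(line, 33, 38),
      # Precipitation level from columns 26-27
      as.integer(as.integer(substr(line, 26, 27)) / 10)))
}
close(con)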
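The latitude banding used in the Mapper can be checked on its own. The sketch below wraps the same arithmetic in a hypothetical helper, `lat_band`, which is not part of the original script. It assumes the latitude field holds hundredths of a degree, which is what the divisor 2250 (22.5 degrees) suggests.

```r
# Hypothetical helper reproducing the Mapper's latitude banding.
# Assumption: `lat` is latitude in hundredths of a degree (-9000 to 9000).
lat_band <- function(lat) {
  as.integer((abs(as.integer(lat)) / 2250 + 1) / 2)
}

# Sample latitudes either side of the band boundaries, in both hemispheres.
bands <- lat_band(c(0, 2249, 2250, -4500, 6749, 6750, 9000))
print(bands)
```

Under that assumption the band changes from 0 to 1 at 22.5 degrees and from 1 to 2 at 67.5 degrees, in either hemisphere.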
The Reducer script from a previous post gives counts for each combination of cloud type and precipitation level, from which most frequency-based weather analytics can be derived. For example, stratocumulus frequency is the aggregate count of reports where the low cloud type code is 4, 5 or 8, divided by the total number of reports in the dataset.
From a data mining perspective, error-prone work such as interpreting WMO codes and running further data analytics is ideally done in a separate R script; as this runs outside Hadoop, it can be tested and amended quickly.
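As a rough sketch of that calculation, suppose the Reducer output has already been parsed into a data frame. The column names `low_type` and `count`, the helper `type_frequency` and the counts themselves are all made up for illustration; they are not taken from the dataset or from the original scripts.

```r
# Hypothetical parsed Reducer output: one row per key, holding the low cloud
# type code and the number of reports counted for that key.
# The counts are illustrative only, not results from the dataset.
counts <- data.frame(
  low_type = c(4L, 5L, 8L, 2L, 0L),
  count    = c(30L, 50L, 20L, 60L, 40L)
)

# Fraction of all reports whose low cloud type code is one of `codes`.
type_frequency <- function(counts, codes) {
  sum(counts$count[counts$low_type %in% codes]) / sum(counts$count)
}

# Stratocumulus: low cloud type codes 4, 5 or 8.
stratocumulus_freq <- type_frequency(counts, c(4, 5, 8))
print(stratocumulus_freq)
```

Keeping the code list as an argument means the same helper can be reused for other cloud types once their codes have been looked up.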
Results
This data mining reveals a few properties of world weather, in addition to a few local weather patterns.
Location of Clouds
Nimbostratus is “common in middle latitudes”[1] – in the dataset 85% of nimbostratus occurrences are mid-latitude. Cumulonimbus is “common in tropics and temperate regions, rare at poles”[1], somewhat reflected in the dataset where 23% of occurrences of cumulonimbus are in tropical regions and 4% are in polar regions.
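Percentages of this kind are shares of one cloud type's reports across the three latitude bands. A minimal sketch, using invented counts rather than the dataset's figures and the Mapper's band codes 0, 1 and 2 as names:

```r
# Illustrative counts only (not from the dataset): reports of a single cloud
# type in each latitude band, keyed by the Mapper's band code.
band_counts <- c("0" = 30, "1" = 60, "2" = 10)

# Share of that cloud type's reports falling in each band.
band_share <- prop.table(band_counts)

# Relabel the band codes for readability and show percentages.
names(band_share) <- c("tropical", "mid-latitude", "polar")
print(round(100 * band_share, 1))
```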
Likelihood of Precipitation
Every occurrence of nimbostratus in the dataset comes with precipitation, because the WMO codes define precipitation-bearing altostratus or altocumulus to be nimbostratus. Indeed, nimbostratus is derived from ‘nimbus’, Latin for rain.
Conversely, high cloud types (cirrus, cirrostratus and cirrocumulus) generate no precipitation[1] so any perceived correlation with precipitation levels is meaningless. For remaining cloud types, precipitation occurrence levels range from 3% for cumulus to 38% for stratus. Indeed cumulus is associated with “generally no precipitation”[1].
Types of Precipitation
Cumulonimbus typically results in “Heavy downpours, hail”[1] – in the dataset 91% of precipitation associated with cumulonimbus is in the thunderstorm range. Indeed in the tropics, cumulonimbus often forms from hot weather through convection during monsoon season.
Conversely, nimbostratus is rarely seen with thunderstorms, more with “moderate to heavy rain or snow”[1] – 54% of nimbostratus occurrences happen with rain. Similarly, stratocumulus is usually accompanied by “occasional light rain, snow”[1] – 47% of stratocumulus occurrences happen with rain.
Cloud Embedding
Cumulonimbus can be embedded with nimbostratus, and in such instances may result in heavy rain, but this is rare[2]. Indeed, further data mining reveals that only 5.9% of the occurrences of nimbostratus show this embedding.
Cloud Coverage Levels
The most significant cloud coverage comes from cirrus, stratocumulus and altocumulus. In the dataset, cirrus cloud cover is 22%, stratocumulus cloud cover is 24% and altocumulus cloud cover is 25%. This compares quite well with satellite data of 20 to 25% for cirrus[3], 25% for stratocumulus[2] and 20 to 25% for altocumulus[4].
Conclusion
The data mining results compare quite well with established cloud facts and world weather satellite data, particularly on location and coverage of clouds.
The next post presents more complex data mining, i.e. time-dependent analytics. For instance, which clouds cause the most prolonged rainfall? And how can these analytics be done efficiently, given the amount of re-sorting of data involved?