GHCN is a database of weather observations from across the globe. This database is the basis for the common global temperature reconstructions such as HadCRUT and GISS. There are, however, some deficiencies in GHCN that make the data hard to use:
The urbanisation effect has a large impact on certain stations, but it is not possible to tell which stations are near human constructions.
Despite these deficiencies, in this analysis I have used the raw GHCN data as it is and compared it with HadCRUT and GISS.
I have created a simple model, which can easily be reproduced, to build global and regional datasets from the raw GHCN daily data:
The use of year-to-year differences eases the problem of how to handle stations that have measurements only for certain time periods. Otherwise, a station located in a warm place in a grid box would give the grid box average a warm bias while that station's data is available, and a station in a cold place would give it a cold bias.
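The effect of using year-to-year differences can be illustrated with a minimal sketch. It is not the exact model used here; the function name `first_difference_series`, the array layout and the sample numbers are assumptions made for illustration only. Each station's change from one year to the next is computed, the changes are averaged over the stations that report in both years, and the averaged changes are summed up into one series for the grid box.

```python
import numpy as np

def first_difference_series(station_temps):
    """Combine station series by averaging year-to-year differences.

    station_temps: array of shape (n_stations, n_years), NaN where missing.
    Returns a series of shape (n_years,) relative to the first year (0.0).
    Years where no station has a valid difference become NaN, and so do
    all later years (a simplification of this sketch).
    """
    temps = np.asarray(station_temps, dtype=float)
    diffs = np.diff(temps, axis=1)            # NaN if either year is missing
    counts = np.sum(~np.isnan(diffs), axis=0)
    sums = np.nansum(diffs, axis=0)
    mean_diff = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
    return np.concatenate([[0.0], np.cumsum(mean_diff)])

# Hypothetical grid box: a cold station with a full record and a warm
# station that only reports in the third and fourth year.
nan = np.nan
temps = np.array([
    [5.0, 5.1, 5.2, 5.3, 5.4],     # cold station
    [nan, nan, 15.2, 15.3, nan],   # warm station, partial record
])

plain_mean = np.nanmean(temps, axis=0)         # approx. [5.0, 5.1, 10.2, 10.3, 5.4]
first_diff = first_difference_series(temps)    # approx. [0.0, 0.1, 0.2, 0.3, 0.4]
```

The plain average jumps by about five degrees when the warm station appears and drops again when it disappears, while the difference-based series keeps the steady 0.1 degree per year that both stations actually show.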
To create temperature series for different regions/countries, I have used the grid boxes whose center is located in that region/country. This means that, for instance, the US data might include stations in Canada and Mexico that are close to the U.S. border. The GHCN data includes a country code, but the HadCRUT and GISS gridded data don't. Therefore I have chosen this method.
Defined as grid boxes within 29W, 36N and 80E, 69N
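The rule of picking grid boxes by their center can be sketched as follows, using the bounding box just given. This is only an illustration: the 5x5 degree grid and the helper names `box_centers` and `region_mask` are assumptions, and a real country outline would need a polygon test instead of a rectangle.

```python
import numpy as np

def box_centers(size_deg=5.0):
    """Center latitudes and longitudes of a regular global grid."""
    lats = np.arange(-90 + size_deg / 2, 90, size_deg)
    lons = np.arange(-180 + size_deg / 2, 180, size_deg)
    return lats, lons

def region_mask(lats, lons, west, south, east, north):
    """True for grid boxes whose center lies inside the bounding box.

    West longitudes and south latitudes are negative. The box must not
    cross the date line (a simplification of this sketch).
    """
    lat_in = (lats >= south) & (lats <= north)
    lon_in = (lons >= west) & (lons <= east)
    return lat_in[:, None] & lon_in[None, :]

lats, lons = box_centers(5.0)
# 29W, 36N to 80E, 69N
mask = region_mask(lats, lons, west=-29, south=36, east=80, north=69)
n_boxes = int(mask.sum())   # number of 5x5 degree boxes selected
```

A box whose center is just inside the limits is included as a whole, so stations slightly outside the region can still contribute, as noted above for the U.S. border.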
There seems to be very good agreement between the raw GHCN data and the adjusted GISS and HadCRUT data for areas and periods with plenty of measurements. In the 19th century and at the beginning of the 20th century, especially in the southern hemisphere, the curves diverge substantially. However, a few station moves can have a huge impact on the average temperature when only a few stations are available. Further analysis is needed to find the root cause of these differences.
From the 1920s onwards there is little evidence of any major adjustments of the data. For the northern hemisphere, where most of the stations are located, the adjustments for urbanization effects seem to be between 0.1 and 0.2 degrees C for later years. Since there is no metadata for the GHCN data, it is not possible to validate whether this adjustment is enough.
Iceland was included in the analysis due to previous strange adjustments in GISS. This seems to have been corrected now, since both GISS and HadCRUT correlate well with the raw data.
Another problem with the GHCN data is that the number of stations peaked in the 1960s. See the plot below of the area covered (using the model described above):
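One way such an area figure could be computed is sketched here. It is an assumption about the calculation, not a description of the actual one: the function name `covered_fraction` and the boolean grid layout are hypothetical. The point is that grid boxes shrink towards the poles, so each box is weighted by its true surface area, which is proportional to the difference in the sine of its bounding latitudes.

```python
import numpy as np

def covered_fraction(has_data):
    """Fraction of the globe's area covered by grid boxes that have data.

    has_data: boolean array of shape (n_lat, n_lon) on a regular grid,
    rows ordered from south to north.
    """
    has_data = np.asarray(has_data, dtype=bool)
    n_lat, n_lon = has_data.shape
    edges = np.deg2rad(np.linspace(-90.0, 90.0, n_lat + 1))
    # Area of one box in each latitude band, up to a constant factor.
    band_weight = np.sin(edges[1:]) - np.sin(edges[:-1])
    weights = np.repeat(band_weight[:, None], n_lon, axis=1)
    return float((weights * has_data).sum() / weights.sum())

# Hypothetical 5x5 degree grid with data in the northern hemisphere only.
grid = np.zeros((36, 72), dtype=bool)
grid[18:, :] = True
fraction = covered_fraction(grid)   # half of the globe
```

Counting boxes without this weighting would overstate the coverage contributed by high-latitude stations.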
To further illustrate this problem I have plotted trend maps for these three data series for the years 1970 to 2018. Only grid boxes with data in all of those years are included. Please note that GISS uses extrapolation, which HadCRUT and my GHCN series do not.
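The trend calculation behind such a map can be sketched as below. This is a minimal illustration under stated assumptions: the function name `trend_per_decade`, the array layout (years first) and the use of an ordinary least-squares line are my choices for the example and are not taken from any of the three datasets.

```python
import numpy as np

def trend_per_decade(years, grid):
    """Linear trend per grid box in degrees per decade.

    years: 1-D sequence of years, e.g. 1970..2018.
    grid:  array of shape (n_years, n_lat, n_lon), NaN where missing.
    Boxes with a missing value in any year are returned as NaN.
    """
    years = np.asarray(years, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n_years, n_lat, n_lon = grid.shape
    flat = grid.reshape(n_years, -1)
    complete = ~np.isnan(flat).any(axis=0)   # boxes with data in all years
    trends = np.full(flat.shape[1], np.nan)
    if complete.any():
        # np.polyfit fits all columns at once when y is 2-D;
        # row 0 of the result holds the slopes (degrees per year).
        slope = np.polyfit(years - years.mean(), flat[:, complete], 1)[0]
        trends[complete] = slope * 10.0
    return trends.reshape(n_lat, n_lon)

# Hypothetical 1x2 grid: one complete box warming by 0.02 degrees per
# year, and one box with a single missing year.
years = np.arange(1970, 2019)
data = np.empty((years.size, 1, 2))
data[:, 0, 0] = 0.02 * (years - 1970)
data[:, 0, 1] = 0.02 * (years - 1970)
data[10, 0, 1] = np.nan
trend_map = trend_per_decade(years, data)   # about 0.2 for the first box, NaN for the second
```

Requiring data in every year is a strict filter: a single missing year removes a box from the map, which is one reason the covered area shrinks so much.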
The GISS and HadCRUT data series match the raw GHCN data well for periods and locations where there is data from many stations. There is little evidence of downward adjustments of older data, and the UHI adjustment for recent years is around 0.1 to 0.2 degrees C. Unfortunately, it is not possible to validate whether this UHI adjustment is sufficient, since the raw data lacks metadata about the stations' surroundings and equipment. Since it is not possible to validate the results of GISS or HadCRUT, the scientific value of these temperature series is questionable.
During the 21st century many stations have dropped out, which means that only a small area of the globe is covered. There is therefore a risk that local climate variations affect the global average to a large degree, so that the resulting global average might not represent the globe.
It is surprising that the database for historical temperature measurements isn't in better shape, considering the huge amount of money spent on climate research.
GHCN version 3.26
GISTEMP v4 (land air temperature, data smoothing 250 km)
CRUTEM4 (land air temperature, 5x5 degree grid)