Climate Science

GHCN analysis - a comparison with HadCRUT and GISS


The Global Historical Climatology Network (GHCN) is a database of weather observations from across the globe. It is the basis for common global temperature reconstructions such as HadCRUT and GISS. GHCN does, however, have some deficiencies that make the data hard to use:

  • There is no station metadata describing siting, surroundings and equipment
  • There is no information about station moves
  • There is no information about equipment changes
  • The coordinates are often given with only 2 decimals, which limits the precision of the location to roughly +/- 500 meters.
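As a rough check of the +/- 500 meter figure, the worst-case error from rounding a coordinate to a given number of decimals can be estimated on a spherical Earth. This is only a sketch: the function name and the mean Earth radius are my own assumptions, not something taken from GHCN.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def rounding_error_m(decimals, latitude_deg=0.0):
    """Worst-case position error caused by rounding coordinates.

    Returns (north_south, east_west) in meters. Rounding to `decimals`
    decimals moves a coordinate by at most half a step in each direction.
    """
    half_step_deg = 0.5 * 10 ** (-decimals)
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = half_step_deg * meters_per_degree
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


north_south, east_west = rounding_error_m(2, latitude_deg=60)
print(f"north-south: +/- {north_south:.0f} m, east-west: +/- {east_west:.0f} m")
```

With 2 decimals the north-south uncertainty comes out at a bit more than 500 meters, and the east-west uncertainty is smaller at higher latitudes.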

The urbanisation effect has a large impact on certain stations, but it is not possible to determine which stations are close to human constructions. Despite these deficiencies, in this analysis I have used the raw GHCN data as is and compared it with HadCRUT and GISS.


I have created a simple, easily reproducible model to build global and regional datasets from the raw GHCN daily data:

  1. Calculate daily averages of the raw daily data for each station: Tavg = (Tmax+Tmin)/2
  2. Calculate, for each station, monthly averages for months with at most 4 missing days. Other months are discarded.
  3. Calculate, for each station, the yearly average for all years with 12 months of data. Other years are discarded.
  4. Calculate, for each station, the year to year difference for all years that have a value for the previous year. Other years are discarded.
  5. Create a global grid where each gridbox is 5x5 degrees. This is the same grid as used by HadCRUT.
  6. Calculate, for each gridbox, the mean of the year to year difference using all stations in that grid box.
  7. Calculate global/regional averages of the year to year difference using the gridboxes in that region/country and weighting each gridbox by its area.
  8. Create a temperature series by accumulating the year to year differences and normalize it so that the average of the years 1961 to 1990 is zero.
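The eight steps can be sketched in plain Python as follows. This is a minimal illustration and not the code used for this analysis: the input layout (a dictionary keyed by date per station), all function names, and the use of the cosine of the central latitude as the area weight are assumptions of mine. For equal-sized latitude/longitude boxes that cosine is exactly proportional to the box area. Years without any data are treated as zero change by the accumulation.

```python
import math
from calendar import monthrange
from collections import defaultdict


def station_yearly_differences(daily, max_missing_days=4):
    """Steps 1-4 for one station.

    `daily` maps (year, month, day) -> (tmax, tmin). This layout is an
    assumption for the sketch, not the GHCN file format.
    """
    # Steps 1-2: daily Tavg, then monthly means of sufficiently complete months.
    by_month = defaultdict(list)
    for (year, month, _day), (tmax, tmin) in daily.items():
        by_month[(year, month)].append((tmax + tmin) / 2)
    by_year = defaultdict(list)
    for (year, month), values in by_month.items():
        if monthrange(year, month)[1] - len(values) <= max_missing_days:
            by_year[year].append(sum(values) / len(values))
    # Step 3: yearly means only for years with all 12 months.
    yearly = {y: sum(m) / 12 for y, m in by_year.items() if len(m) == 12}
    # Step 4: difference to the previous year, where that year exists.
    return {y: t - yearly[y - 1] for y, t in yearly.items() if y - 1 in yearly}


def grid_box(lat, lon, size=5):
    """Step 5: (row, column) of the size x size degree box containing a point."""
    row = min(int((lat + 90) // size), 180 // size - 1)
    col = int((lon + 180) // size) % (360 // size)
    return row, col


def box_weight(row, size=5):
    """Cosine of the box's central latitude, proportional to the box area."""
    return math.cos(math.radians(-90 + (row + 0.5) * size))


def regional_series(stations, base_period=(1961, 1990)):
    """Steps 6-8. `stations` is a list of (lat, lon, yearly_differences)."""
    # Step 6: collect the differences per year and grid box.
    per_year = defaultdict(lambda: defaultdict(list))
    for lat, lon, diffs in stations:
        for year, diff in diffs.items():
            per_year[year][grid_box(lat, lon)].append(diff)
    # Step 7 and first half of step 8: weighted mean per year, accumulated.
    series, total = {}, 0.0
    for year in sorted(per_year):
        boxes = per_year[year]
        weight_sum = sum(box_weight(row) for row, _col in boxes)
        total += sum(box_weight(row) * sum(v) / len(v)
                     for (row, _col), v in boxes.items()) / weight_sum
        series[year] = total
    # Second half of step 8: shift so the base-period mean is zero.
    base = [series[y] for y in range(base_period[0], base_period[1] + 1)
            if y in series]
    offset = sum(base) / len(base) if base else 0.0
    return {year: value - offset for year, value in series.items()}
```

Restricting `stations` to those falling in the grid boxes of one region gives the regional series; passing all stations gives the global one.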

Using year to year differences simplifies the handling of stations that only have measurements for certain time periods. Otherwise, a station located in a warm place in a grid box would give the gridbox average a warm bias while its data is available, and a station in a cold place would give it a cold bias.
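A small made-up example shows the effect. Two stations share a grid box and neither has any trend, but the warm one only reports for the last three years. The station values and function names are invented for illustration.

```python
def box_mean_absolute(stations):
    """Average the absolute temperatures of whichever stations report each year."""
    years = sorted({year for station in stations for year in station})
    result = {}
    for year in years:
        values = [station[year] for station in stations if year in station]
        result[year] = sum(values) / len(values)
    return result


def box_mean_first_difference(stations):
    """Average the year to year differences, then accumulate them."""
    diffs = [{y: s[y] - s[y - 1] for y in s if y - 1 in s} for s in stations]
    years = sorted({year for d in diffs for year in d})
    series, total = {}, 0.0
    for year in years:
        values = [d[year] for d in diffs if year in d]
        total += sum(values) / len(values)
        series[year] = total
    return series


cold = {year: 5.0 for year in range(2000, 2006)}   # reports every year
warm = {year: 15.0 for year in range(2003, 2006)}  # appears in 2003

print(box_mean_absolute([cold, warm]))          # jumps when the warm station appears
print(box_mean_first_difference([cold, warm]))  # stays flat
```

Averaging absolute temperatures produces a spurious 5 degree jump in 2003, while the accumulated differences correctly show no change.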

Temperature series for different regions/countries (1850-2018)

To create temperature series for different regions/countries, I have used the grid boxes whose centers are located in that region/country. This means that, for instance, the US data might include stations in Canada and Mexico that are close to the U.S. border. The GHCN data includes a country code, but the HadCRUT and GISS gridded data do not, which is why I have chosen this method.
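Selecting grid boxes by their center can be sketched as below. A real country outline would need a point-in-polygon test; the rectangular bounds here are a simplification of mine, and the function names are hypothetical. The example uses the bounds 29W, 36N and 80E, 69N quoted further down in this post.

```python
def box_centers(size=5):
    """Yield the (lat, lon) center of every size x size degree grid box."""
    for row in range(180 // size):
        for col in range(360 // size):
            yield (-90 + size * row + size / 2, -180 + size * col + size / 2)


def boxes_in_bounds(west, south, east, north, size=5):
    """Centers of the grid boxes whose center lies inside a lat/lon rectangle.

    Western longitudes and southern latitudes are negative. Rectangles that
    cross the 180 degree meridian are not handled in this sketch.
    """
    return [(lat, lon) for lat, lon in box_centers(size)
            if south <= lat <= north and west <= lon <= east]


selected = boxes_in_bounds(west=-29, south=36, east=80, north=69)
print(len(selected), "grid boxes selected")
```

Because only the center is tested, a box that straddles the boundary is either included or excluded as a whole, which is how stations just outside a border can end up in a region's series.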


Northern Hemisphere

Southern Hemisphere

Contiguous United States


Defined as the gridboxes within 29W, 36N and 80E, 69N




There seems to be very good agreement between the raw GHCN data and the adjusted GISS and HadCRUT data for areas and periods with plenty of measurements. In the 19th century and at the beginning of the 20th century, especially in the southern hemisphere, the curves diverge substantially. However, a few station moves can have a huge impact on the average temperature when only a few stations are available. Further analysis is needed to find the root cause of these differences.

From the 1920s onwards there is little evidence of any major adjustments of the data. For the northern hemisphere, where most of the stations are located, the adjustments for urbanization effects seem to be between 0.1 and 0.2 C for later years. Since there is no metadata for the GHCN data, it is not possible to validate whether this adjustment is enough.

Iceland was included in the analysis because of earlier strange adjustments in GISS. This seems to have been corrected now, since both GISS and HadCRUT correlate well with the raw data.

The coverage problem

Another problem with the GHCN data is that the number of stations peaked in the 1960s. The plot below shows the area covered (using the model described above):
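One way to express the area covered is as the fraction of the Earth's surface lying in grid boxes that have data in a given year. The sketch below assumes a spherical Earth and 5x5 degree boxes identified by (row, column), with row 0 at the South Pole. Whether the plot uses exactly this measure is an assumption on my part, and the function names are hypothetical.

```python
import math


def box_area_fraction(row, size=5):
    """Fraction of the sphere's surface covered by one box in latitude row `row`."""
    south = math.radians(-90 + row * size)
    north = math.radians(-90 + (row + 1) * size)
    # The area of a latitude band is proportional to sin(north) - sin(south);
    # one box is size/360 of that band.
    return (math.sin(north) - math.sin(south)) / 2 * (size / 360)


def covered_fraction(boxes_with_data, size=5):
    """Share of the globe covered by the given set of (row, column) boxes."""
    return sum(box_area_fraction(row, size) for row, _col in set(boxes_with_data))


northern_hemisphere = [(row, col) for row in range(18, 36) for col in range(72)]
print(f"{covered_fraction(northern_hemisphere):.2f}")
```

Computing this per year from the boxes that contain at least one station value gives a coverage curve over time.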

To further illustrate this problem, I have plotted trend maps for these three data series for the years 1970 to 2018. Only grid boxes with data in all of those years are included. Please note that GISS uses extrapolation, which HadCRUT and my GHCN series do not.
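The trend for one grid box can be computed with an ordinary least-squares fit, skipping boxes that lack data for any year in the period. This is a sketch under my own assumptions: the function name is hypothetical and expressing the trend in degrees per decade is my choice of unit.

```python
def decadal_trend(series, first_year=1970, last_year=2018):
    """Least-squares trend in degrees per decade, or None if any year is missing.

    `series` maps year -> temperature anomaly for one grid box.
    """
    years = list(range(first_year, last_year + 1))
    if any(year not in series for year in years):
        return None  # the box is left out of the trend map
    mean_year = sum(years) / len(years)
    mean_value = sum(series[year] for year in years) / len(years)
    covariance = sum((year - mean_year) * (series[year] - mean_value)
                     for year in years)
    variance = sum((year - mean_year) ** 2 for year in years)
    return 10 * covariance / variance


# Made-up series that warms by 0.02 degrees per year.
example = {year: 0.02 * (year - 1970) for year in range(1970, 2019)}
print(decadal_trend(example))
```

Applying this to every grid box and plotting the non-None results gives a trend map that contains only the boxes with complete data.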


The GISS and HadCRUT data series match the raw GHCN data well for periods and locations where there is data from many stations. There is little evidence of downward adjustments of older data, and the UHI adjustment for recent years is around 0.1 to 0.2 degrees C. Unfortunately, it is not possible to validate whether this UHI adjustment is sufficient, since the raw data lacks metadata about the stations' surroundings and equipment. Since it is not possible to validate the results of GISS or HadCRUT, the scientific value of these temperature series is questionable.

During the 21st century many stations have dropped out, which means that only a small area of the globe is covered. There is therefore a risk that local climate variations affect the global average to a large degree, so that the resulting global average does not represent the globe.

It is surprising that the database of historical temperature measurements is not in better shape, considering the huge amount of money spent on climate research.

Data sources used

GHCN version 3.26

GISTEMP v4 (land air temperature, data smoothing 250km)

CRUTEM4 (land air temperature, 5x5 degree grid)