$Header$ -*-text-*-

netCDF Operators NCO version 5.2.6 re-crash the gates

http://nco.sf.net (Homepage, Mailing lists, Help)
http://github.com/nco/nco (Source Code, Issues, Releases)

Version 5.2.6 contains a one-line bugfix to ncremap in 5.2.5.
Unfortunately that horse had already escaped the barn.
There is nothing else new in 5.2.6, so the release notes below
repeat those for 5.2.5 so people can get a quick summary of new
features since the previous widely distributed version, 5.2.4.

What's new?
Version 5.2.6 fully implements the draft CF lossy metadata convention
for all NCO internal quantization algorithms. It also improves ncclimo
diagnostics, adds a new invocation synonym (--qnt) for quantization,
and outputs a maximum_relative_error statistic where possible.
ncclimo/ncremap add support for MPAS-Atmosphere
datasets, and ncclimo supports excluding variable lists in timeseries
mode. Skip this release if these changes are not important to you. 

Work on NCO 5.2.7 has commenced and will add support for Zarr S3 
stores, and will enhance the vertical regridder.

Enjoy,
Charlie

NEW FEATURES (full details always in ChangeLog):

A. All numeric operators now support --qnt_alg=alg_nm to request an
NCO quantization algorithm by name. Previously one had to use the --baa
option with an obscure undocumented integer code for each algorithm.
Now quantization algorithms can be requested by their English names.
alg_nm can be BitGroom, Granular BitRound, BitShave, BitSet,
DigitRound, BitGroomRound, HalfShave, BruteForce, BitRound or common
synonyms for these names, e.g., set, shave, "bit round", btg, etc.
The names are case-insensitive:
ncks -7 -L 1               --qnt default=3 in.nc out.nc # Granular BitRound (NSD)
ncks -7 -L 1 --qnt_alg=btg --qnt default=3 in.nc out.nc # BitGroom (NSD)
ncks -7 -L 1 --qnt_alg=shv --qnt default=3 in.nc out.nc # BitShave (NSD)
ncks -7 -L 1 --qnt_alg=set --qnt default=3 in.nc out.nc # BitSet (NSD)
ncks -7 -L 1 --qnt_alg=dgr --qnt default=3 in.nc out.nc # DigitRound (NSD)
ncks -7 -L 1 --qnt_alg=gbr --qnt default=3 in.nc out.nc # Granular BitRound (NSD)
ncks -7 -L 1 --qnt_alg=bgr --qnt default=3 in.nc out.nc # BitGroomRound (NSD)
ncks -7 -L 1 --qnt_alg=sh2 --qnt default=9 in.nc out.nc # HalfShave (NSB)
ncks -7 -L 1 --qnt_alg=brt --qnt default=3 in.nc out.nc # BruteForce (NSD)
ncks -7 -L 1 --qnt_alg=btr --qnt default=9 in.nc out.nc # BitRound (NSB)
This menagerie arose from research efforts. We recommend that others
choose between BitRound, DigitRound, and Granular BitRound for
real-world workflows. The others are mainly of research or historical
interest.
http://nco.sf.net/nco.html#qnt_alg
http://nco.sf.net/nco.html#qnt

B. CF-compliant metadata for quantization now includes maximum
relative error (MRE) for the BitRound algorithm, for which MRE is
0.5*2^-NSB = 2^-(NSB+1). The MRE appears as the attribute
"lossy_compression_maximum_relative_error" in each field's
metadata:
ncks -7 -v ps,ts --qnt_alg=btr --qnt default=9 --qnt ps=13 --cmp='shf|zst' in.nc out.nc
ncks -m -C -v ps,ts,compression_info out.nc
netcdf out {
...
    float ps(time,lat,lon) ;
      ps:standard_name = "surface_air_pressure" ;
      ps:units = "Pa" ;
      ps:lossy_compression = "compression_info" ;
      ps:lossy_compression_nsb = 13 ;
      ps:lossy_compression_maximum_relative_error = 6.103516e-05f ;

    float ts(time) ;
      ts:standard_name = "surface_temperature" ;
      ts:units = "K" ;
      ts:lossy_compression = "compression_info" ;
      ts:lossy_compression_nsb = 9 ;
      ts:lossy_compression_maximum_relative_error = 0.0009765625f ;
} // group /
http://nco.sf.net/nco.html#qnt_alg
http://nco.sf.net/nco.html#qnt
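The MRE values in the metadata above can be checked by hand from the
formula MRE = 2^-(NSB+1). The following is a minimal sketch, assuming
only a POSIX shell and awk. The helper name mre_from_nsb is
hypothetical and is not part of NCO:

```shell
# Print the BitRound maximum relative error for a given NSB,
# using MRE = 0.5*2^-NSB = 2^-(NSB+1) from the text above.
# (mre_from_nsb is an illustrative helper, not an NCO command.)
mre_from_nsb() {
    awk -v nsb="$1" 'BEGIN { printf "%.6e\n", 1 / (2 ^ (nsb + 1)) }'
}

mre_from_nsb 13   # ps: prints 6.103516e-05
mre_from_nsb 9    # ts: prints 9.765625e-04
```

These agree with the ps and ts attribute values shown above
(0.0009765625 is 9.765625e-04 in exponential notation).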

C. ncremap and ncclimo now handle MPAS-A (Atmosphere) datasets.
These operators have always supported MPAS Ocean, Sea-ice, and
Land-Ice (all used by E3SM) datasets. This completes MPAS support.
Use -P mpasa to indicate that datasets follow MPAS-A conventions.
This allows ncremap to automatically permute the spatial dimensions
into the correct order for regridding, and to distinguish MPAS-A
datasets from other MPAS datasets in terms of missing value treatment:
ncremap -P mpasa  --map=map.nc mpa.nc foo.nc # MPAS-A
ncremap -P mali   --map=map.nc mpli.nc foo.nc # MPAS-LI
ncremap -P mpaso  --map=map.nc mpo.nc foo.nc # MPAS-O
ncremap -P mpassi --map=map.nc mpsi.nc foo.nc # MPAS-SI

NB: Omitting the -P mpasX option on MPAS datasets works if the user
explicitly permutes the horizontal dimensions, e.g.: 
ncremap --pdq=Time,nVertLevels,nIsoLevelsT,nIsoLevelsZ,nCells \
	--map=map.nc mpa.nc foo.nc # MPAS-A
Analogous workarounds apply to the other MPAS components when
omitting the -P option, and -P mpas also works for generic MPAS.
However, using -P mpasX results in the best output.
http://nco.sf.net/nco.html#MPAS
http://nco.sf.net/nco.html#ncremap
http://nco.sf.net/nco.html#pdq_opt
Thanks to Angela Borallo of CGG for prompting this feature.

D. ncclimo now supports excluding the specified variable list
(with -x or --xcl_var or --exclude) in timeseries mode.
Previously this option only worked in climo mode.
However, in timeseries mode this option requires invoking ncclimo
with Bash version 4.0 or higher. NB: This works well across modern
Linux machines, though MacOS still ships Bash 3.2.57 (from 2007!).
MacOS users must put an updated Bash on their PATH before /bin/bash
to access this feature (all other features continue to work fine
with older versions of Bash).
ncclimo --split --exclude -v FSNT,AODVIS,TREFHT \
	-c v2.LR.historical_0101 -s 2013 -e 2014 \
	-i ${DATA}/ne30/raw -o ${DATA}/ne30/clm
http://nco.sf.net/nco.html#xcl_var
Thanks to Koichi Sakaguchi of PNNL for prompting this feature.
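To see whether the Bash requirement above is met before using
--exclude in timeseries mode, one can query the first bash on the
PATH. This is a minimal sketch, not part of NCO. The helper name
bash_major is hypothetical, and the sketch assumes that some bash is
installed:

```shell
# Report the major version of the first bash found on PATH.
# (bash_major is an illustrative helper, not an NCO command.)
bash_major() {
    bash -c 'echo "${BASH_VERSINFO[0]}"'
}

# Timeseries-mode --exclude needs Bash 4.0 or higher (see above).
if [ "$(bash_major)" -ge 4 ]; then
    echo "bash on PATH is new enough for ncclimo --split --exclude"
else
    echo "bash on PATH is too old; put a newer bash earlier on PATH" >&2
fi
```

On a stock MacOS installation this would be expected to take the
second branch until a newer Bash precedes /bin/bash on the PATH.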

E. ncremap now automatically tests whether the vertical grid file has a
level(level) coordinate à la ERA5. If so, it treats this as a pure
pressure coordinate. This is analogous to the treatment of the
plev(plev) coordinate for NCEP files.
ncremap --vrt_out=vrt_prs_era5_L37.nc in.nc out.nc
http://nco.sf.net/nco.html#vrt

F. ncclimo updated its MPAS dataset filename construction option.
Previously it constructed MPAS monthly dataset names like this:
${mdl_nm}.hist.am.timeSeriesStatsMonthly.${YYYY}-${MM}-01.nc
where mdl_nm is the canonical MPAS component name, e.g., mpaso.
This yielded names consistent with MPAS v1 output like
"mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc", and
"mpascice.hist.am.timeSeriesStatsMonthly.0001-02-01.nc".
Now ncclimo prepends the ${caseid}, if present, to the filename.
This yields names consistent with E3SM v2 and v3 output like
"v2.LR.historical_0101.mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc", and
"v2.LR.historical_0101.mpassi.hist.am.timeSeriesStatsMonthly.0001-02-01.nc". 
To read MPAS filenames with different patterns, simply pipe the
filenames to ncclimo:
ls *mpas*hist* | ncclimo ...
http://nco.sf.net/nco.html#ncclimo
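The naming rule described above can be illustrated with a short shell
function. This is only a sketch of the pattern as stated in these
notes, not the code ncclimo itself uses. The function name mpas_fl_nm
and its argument order are hypothetical:

```shell
# Build an MPAS monthly filename from the pattern described above.
# Arguments: caseid (may be empty), mdl_nm, YYYY, MM (already zero-padded).
# (mpas_fl_nm is an illustrative helper, not an NCO command.)
mpas_fl_nm() {
    caseid=$1; mdl_nm=$2; yyyy=$3; mm=$4
    fl_nm="${mdl_nm}.hist.am.timeSeriesStatsMonthly.${yyyy}-${mm}-01.nc"
    # Prepend the caseid only when one is supplied
    if [ -n "${caseid}" ]; then
        fl_nm="${caseid}.${fl_nm}"
    fi
    printf '%s\n' "${fl_nm}"
}

mpas_fl_nm '' mpaso 0001 02
mpas_fl_nm v2.LR.historical_0101 mpassi 0001 02
```

The first call prints the caseid-free form and the second prints the
caseid-prefixed form, matching the example filenames quoted above.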

BUG FIXES:
   
A. ncclimo: The -Y/--drc_rgr_xtn option was broken. This has been 
fixed. There is no workaround. The solution is to upgrade.
http://nco.sf.net/nco.html#drc_rgr_xtn

Full release statement at http://nco.sf.net/ANNOUNCE
    
KNOWN PROBLEMS DUE TO NCO:

This section of ANNOUNCE reports and reminds users of the
existence and severity of known, not yet fixed, problems. 
These problems occur with NCO 5.2.6 built/tested under
MacOS 14.5 with netCDF 4.9.3-dev on HDF5 1.14.3 and with
Linux FC38 with netCDF 4.9.2 on HDF5 1.14.1.

A. NOT YET FIXED (NCO problem)
   Correctly read arrays of NC_STRING with embedded delimiters in ncatted arguments

   Demonstration:
   ncatted -D 5 -O -a new_string_att,att_var,c,sng,"list","of","str,ings" ~/nco/data/in_4.nc ~/foo.nc
   ncks -m -C -v att_var ~/foo.nc

   20130724: Verified problem still exists
   TODO nco1102
   Cause: NCO parsing of ncatted arguments is not sophisticated
   enough to handle arrays of NC_STRINGS with embedded delimiters.

B. NOT YET FIXED (NCO problem?)
   ncra/ncrcat (not ncks) hyperslabbing can fail on variables with multiple record dimensions

   Demonstration:
   ncrcat -O -d time,0 ~/nco/data/mrd.nc ~/foo.nc

   20140826: Verified problem still exists
   20140619: Problem reported by rmla
   Cause: Unsure. Maybe ncra.c loop structure not amenable to MRD?
   Workaround: Convert to fixed dimensions then hyperslab

KNOWN PROBLEMS DUE TO BASE LIBRARIES/PROTOCOLS:

A. NOT YET FIXED (netCDF4 or HDF5 problem?)
   Specifying strided hyperslab on large netCDF4 datasets leads
   to slowdown or failure with recent netCDF versions.

   Demonstration with NCO <= 4.4.5:
   time ncks -O -d time,0,,12 ~/ET_2000-01_2001-12.nc ~/foo.nc
   Demonstration with NCL:
   time ncl < ~/nco/data/ncl.ncl   
   20140718: Problem reported by Parker Norton
   20140826: Verified problem still exists
   20140930: Finish NCO workaround for problem
   20190201: Possibly this problem was fixed in netCDF 4.6.2 by https://github.com/Unidata/netcdf-c/pull/1001
   Cause: Slow algorithm in nc_get_vars()?
   Workaround #1: Use NCO 4.4.6 or later (avoids nc_get_vars())
   Workaround #2: Convert file to netCDF3 first, then use stride
   Workaround #3: Compile NCO with netCDF >= 4.6.2

B. NOT YET FIXED (netCDF4 library bug)
   Simultaneously renaming multiple dimensions in netCDF4 file can corrupt output

   Demonstration:
   ncrename -O -d lev,z -d lat,y -d lon,x ~/nco/data/in_grp.nc ~/foo.nc # Completes but produces unreadable file foo.nc
   ncks -v one ~/foo.nc

   20150922: Confirmed problem reported by Isabelle Dast, reported to Unidata
   20150924: Unidata confirmed problem
   20160212: Verified problem still exists in netCDF library
   20160512: Ditto
   20161028: Verified problem still exists with netCDF 4.4.1
   20170323: Verified problem still exists with netCDF 4.4.2-development
   20170323: https://github.com/Unidata/netcdf-c/issues/381
   20171102: Verified problem still exists with netCDF 4.5.1-development
   20171107: https://github.com/Unidata/netcdf-c/issues/597
   20190202: Progress has recently been made in netCDF 4.6.3-development
   More details: http://nco.sf.net/nco.html#ncrename_crd

C. NOT YET FIXED (would require DAP protocol change?)
   Unable to retrieve contents of variables including period '.' in name
   Periods are legal characters in netCDF variable names.
   Metadata are returned successfully, data are not.
   DAP non-transparency: Works locally, fails through DAP server.

   Demonstration:
   ncks -O -C -D 3 -v var_nm.dot -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc # Fails to find variable

   20130724: Verified problem still exists. 
   Stopped testing because inclusion of var_nm.dot broke all test scripts.
   NB: Hard to fix since DAP interprets '.' as structure delimiter in HTTP query string.

   Bug tracking: https://www.unidata.ucar.edu/jira/browse/NCF-47

D. NOT YET FIXED (would require DAP protocol change)
   Correctly read scalar characters over DAP.
   DAP non-transparency: Works locally, fails through DAP server.
   Problem, IMHO, is with DAP definition/protocol

   Demonstration:
   ncks -O -D 1 -H -C -m --md5_dgs -v md5_a -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc

   20120801: Verified problem still exists
   Bug report not filed
   Cause: DAP translates scalar characters into 64-element (this
   dimension is user-configurable, but still...), NUL-terminated
   strings so MD5 agreement fails 

"Sticky" reminders:

A. Reminder that NCO works on most HDF4 and HDF5 datasets, e.g., 
   HDF4: AMSR MERRA MODIS ...
   HDF5: GLAS ICESat Mabel SBUV ...
   HDF-EOS5: AURA HIRDLS OMI ...

B. Pre-built executables for many OS's at:
   http://nco.sf.net#bnr

