Catalog Validation

The GFDL Catalog Builder toolset includes a comprehensive validator. This validator can perform two different types of validation: Vocabulary Validation, which checks against controlled vocabularies (CVs) as defined in the catalog schema, and Catalog Generation Validation (also known as “proper generation”), which ensures catalogs are generated correctly.

Vocabulary Validation

This test validates catalogs against CMIP6 or GFDL controlled vocabularies (CVs) provided by specific JSON schemas for each vocabulary type. Each vocabulary type to be validated has a raw GitHub URL pointing to its corresponding CV set in the vocabulary section of the catalog schema. CMIP6 CVs are found in the WCRP-CMIP/CMIP6_CVs GitHub repository. GFDL CVs are found in the NOAA-GFDL/CMIP6_CVs GitHub repository. The --vocab flag must be used to enable this test.
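
As a rough illustration of what checking a catalog against a CV involves, the sketch below compares each row's value in a column with a set of allowed values. It is not the validator's actual implementation: the function name find_vocab_errors, the column names, and the in-memory vocabulary dictionary are all assumptions made for this example, and a real run would load the CV from the URL given in the catalog schema.

```python
def find_vocab_errors(rows, vocabulary):
    """Return (row_index, column, value) for each value missing from its CV.

    rows: list of dicts, one per catalog CSV row.
    vocabulary: dict mapping a column name to its set of allowed values
        (a hypothetical stand-in for a CV fetched from the schema's URL).
    """
    errors = []
    for index, row in enumerate(rows):
        for column, allowed in vocabulary.items():
            value = row.get(column, "")
            if value not in allowed:
                errors.append((index, column, value))
    return errors


# Illustrative data only; real CVs come from the repositories named above.
vocabulary = {"frequency": {"mon", "day", "6hr"}}
rows = [
    {"frequency": "mon", "variable_id": "tas"},
    {"frequency": "monthly", "variable_id": "pr"},
]
errors = find_vocab_errors(rows, vocabulary)
print(errors)  # [(1, 'frequency', 'monthly')]
```

Here the second row is reported because "monthly" is not in the allowed set for the frequency column.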

Proper Generation Validation

This test ensures catalogs generated by the Catalog Builder tool are minimally valid. A catalog is minimally valid when the generated catalog JSON file reflects the template it was generated with, the catalog CSV has at least one row of values (not including headers), and each required column exists without any empty values. If a test case is broken or expected to fail, the --test_failure or -tf flag can be used; it prints errors instead of raising an exception. This test must be enabled with the -pg or --proper_generation flag.
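
The CSV half of these checks can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the tool's code: the function name check_minimally_valid, its test_failure parameter, and the sample column names are hypothetical, and the comparison of the catalog JSON against its template is left out.

```python
import csv
import io


def check_minimally_valid(csv_text, required_columns, test_failure=False):
    """Collect problems with a catalog CSV; raise unless test_failure is set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    headers = reader.fieldnames or []
    errors = []
    # The catalog needs at least one row of values besides the header.
    if not rows:
        errors.append("catalog has no data rows")
    # Every required column must exist and hold a value in every row.
    for column in required_columns:
        if column not in headers:
            errors.append(f"missing required column: {column}")
        elif any(not (row[column] or "").strip() for row in rows):
            errors.append(f"empty value in required column: {column}")
    # Mimic the described -tf behaviour: print errors rather than raise.
    if errors and not test_failure:
        raise ValueError("; ".join(errors))
    for message in errors:
        print(message)
    return errors


good = "path,variable_id\n/archive/a.nc,tas\n"
bad = "path,variable_id\n/archive/a.nc,\n"
check_minimally_valid(good, ["path", "variable_id"])
check_minimally_valid(bad, ["path", "variable_id"], test_failure=True)
```

With test_failure left at its default, the second call would raise a ValueError instead of printing the problem.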

Validating a catalog during catalog generation

To validate a catalog during generation, use the --strict flag. This activates only vocabulary validation, because proper generation is meant to be checked after generation is complete.

Using the standalone validator tool

The comprehensive validator tool can be found in catalogbuilder/tests/compval.py.

It can be used in a few ways:

compval.py <json_path> --vocab (Validates the catalog against the CVs defined in the vocabulary section of the catalog schema.)

compval <json_path> --proper_generation (Checks that the catalog is minimally valid. Uses the default catalog template/schema if no template path is given. This default template is located at catalogbuilder/cats/gfdl_template.json.)

compval <json_path> <json_template_path> --proper_generation (Checks that the catalog is minimally valid. Uses the given template path to check that the catalog reflects the template.)

Vocab and proper generation tests can be run at the same time.

Flags:

--vocab

Validates catalog vocabulary

-pg, --proper_generation

Validates that catalog has been ‘properly generated’ (No empty columns, reflects template)

-tf, --test_failure

Errors are only printed. Program will not exit.
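
To make the flag combinations concrete, the sketch below models the documented command-line interface with Python's argparse. It is only an illustration of how the positional arguments and flags fit together; compval.py itself may be built with a different library, and build_parser and the help strings are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags documented above."""
    parser = argparse.ArgumentParser(prog="compval")
    parser.add_argument("json_path", help="path to the catalog JSON file")
    # Optional second positional: the template to compare against.
    parser.add_argument("json_template_path", nargs="?", default=None,
                        help="template path (a default is used if omitted)")
    parser.add_argument("--vocab", action="store_true",
                        help="validate catalog vocabulary")
    parser.add_argument("-pg", "--proper_generation", action="store_true",
                        help="check that the catalog is minimally valid")
    parser.add_argument("-tf", "--test_failure", action="store_true",
                        help="print errors instead of raising")
    return parser


# Both tests requested in one invocation, with no template path given.
args = build_parser().parse_args(["catalog.json", "--vocab", "-pg"])
print(args.vocab, args.proper_generation, args.test_failure)  # True True False
```

Because each flag is an independent boolean switch, requesting both tests in one run needs no special handling.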