The term metadata encompasses all the information necessary to interpret, understand and use a given dataset. Discovery metadata are, more specifically, the information (keywords) that can be used to identify and locate the data that meet the user's requirements (via a Web browser, a Web-based catalogue, etc.). Detailed metadata include the additional information necessary for a user to work with the data without referring back to the data provider (although one element of the detailed metadata may be the data provider's contact details).
Metadata pertaining to observational data, for example, include details about how (with which instrument or technique), when, where and with what accuracy (or error bars) the data were collected, by whom (including affiliation and contact address or telephone number) and within which research project. In the case of processed data, the nature of the initial raw data and the derivation process must be stated. The nature and units of the recorded variables are essential, as is the grid or reference system. Metadata pertaining to model output should include the name of the model, the conditions of the calculation, the type of constraint applied, the length of the integration, the nature of the output, the geographical domain over which the output is defined (when applicable), etc. Specific conditions applying to the model or the experiment may also be mentioned. Metadata also include information on the format in which the data are stored, the order of the variables, etc., so that potential users can read them. Metadata pertaining to software models include the key points of the theory on which the model is based, the techniques and programming language used, references, etc.
Metadata relating to a specific data set can be provided as a separate document or as part of the data set itself. For digital data sets, this means that the metadata can sit in separate files (for example text files) or be integrated into the data file(s), as a header or at specified locations in the file. Some data formats provide room and rules for metadata (see Section 3).
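As a rough illustration of the two placements described above, the sketch below reads the same metadata once from a header embedded at the top of a text data file and once from a separate text file. The `#` header prefix, the `key: value` layout, the sample values and the function names are all assumptions made for this example only; they are not a BADC convention.

```python
from typing import Dict, List, Tuple


def parse_metadata_lines(lines: List[str]) -> Dict[str, str]:
    """Turn 'key: value' lines into a dictionary (hypothetical layout)."""
    metadata = {}
    for line in lines:
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata


def split_embedded_header(text: str, prefix: str = "#") -> Tuple[Dict[str, str], List[str]]:
    """Separate a leading block of prefixed header lines from the data lines."""
    lines = text.splitlines()
    n_header = 0
    for line in lines:
        if not line.startswith(prefix):
            break
        n_header += 1
    # Drop the prefix before parsing the header content.
    header = [line[len(prefix):] for line in lines[:n_header]]
    return parse_metadata_lines(header), lines[n_header:]


# Case 1: metadata integrated into the data file as a header.
data_file = "# instrument: radiosonde\n# units: K\n271.3\n270.8\n"
embedded_meta, data_rows = split_embedded_header(data_file)

# Case 2: the same metadata held in a separate text file.
sidecar_file = "instrument: radiosonde\nunits: K\n"
sidecar_meta = parse_metadata_lines(sidecar_file.splitlines())
```

Both routes yield the same dictionary here; the practical difference is that an embedded header travels with the data, whereas a separate file has to be kept alongside it.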
As far as possible, metadata for data held at the BADC follow the guidelines laid out below. Data providers are encouraged to comply with the BADC implementation of the Climate and Forecast (CF) Metadata Convention (see also Section 3).
N.B. Metadata relating to software are commonly included as comments, either in the top section of the source file or at various places in the code.
Because individuals may judge the relevance of a piece of information very differently, metadata standards have been, and are still being, developed with the aim of making metadata presentation uniform. A further advantage of metadata standards is that they ensure that the information contained in the metadata (and hence the ability to use the data) is transmitted, in a predefined generic way, to remote and future users, provided that those users know the adopted conventions. This in turn requires that manuals describing the set of conventions relevant to a particular metadata standard exist, are maintained and are passed on: a kind of meta-metadata.
Since a crucial part of the metadata pertains to the data format, different metadata standards have been developed in conjunction with the various data formats. (For the formats supported by the BADC, please refer to the BADC Formats Welcome Page.) Existing data format standards, and metadata standards alike, are based both on the specific needs of particular scientific communities and on habits already in use within those communities. All of them are updated regularly and are liable to evolve further. In the geosciences and in disciplines where 2-dimensional Earth surface reference systems play an important role (like archæology), the most popular data formats seem to belong to the GIS (Geographic Information Systems) family. In the atmospheric research community, however, the third spatial (vertical) dimension plays a crucial role, and so does time. Sections 3.1 and 3.2 below give a brief outline of two formats widely used in the atmospheric sciences: the NASA Ames Format for Data Exchange, which applies to data coded in ASCII, and NetCDF (network Common Data Form), which applies to data coded in binary and is hence better suited to voluminous data sets such as 3- or 4-dimensional fields, satellite data, etc. Both data formats include some metadata rules.
Standard rules can be mandatory, conditional or optional. They apply to three aspects of the metadata: content, layout and vocabulary.
The NASA Ames Format for Data Exchange was developed by S. Gaines and S. Hipskind at the NASA Ames Research Center for the benefit of instrument scientists operating atmospheric probes on board balloons and aircraft, and its straightforwardness and portability serve this purpose well. It can in principle handle 3- and 4-dimensional data sets, although the data layout within a file reflects its original aim (the storage of time series) and is not optimised for representing fields on a 3-D or 4-D gridded domain. NASA Ames formatted data are coded in ASCII, which has the notable advantage of being directly readable by (English-speaking) humans but the drawback of producing bulky files, which again is not optimal for 3-D or 4-D variables. Each NASA Ames file is divided into a header, which contains the metadata, and a body, which contains the data. The required metadata include both discovery and detailed metadata.
The NASA Ames rules include some statements about the metadata content. Any additional information (for example, elements listed in Section 2.1 that do not fit into the provided rules) can still be inserted in dedicated comment lines at the end of the header. The metadata layout is strictly defined in the NASA Ames format, except for the comment lines, which are only loosely constrained. A complete description of the NASA Ames data and metadata format (including content and layout rules) is available from the BADC NASA Ames Format Page.
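The header/body division can be sketched as follows. The sketch relies on the NASA Ames rule that the first line of a file gives the number of header lines (NLHEAD, counting the first line itself) followed by the file format index (FFI). The function name and the sample content are invented for illustration, and the sample is deliberately truncated: it is not a complete, valid NASA Ames header.

```python
from typing import List, Tuple


def split_nasa_ames(text: str) -> Tuple[int, List[str], List[str]]:
    """Return (FFI, header lines, body lines) for NASA Ames formatted text.

    Only the first line is interpreted; the rest of the header is
    returned as raw, unparsed lines.
    """
    lines = text.splitlines()
    first = lines[0].split()
    nlhead, ffi = int(first[0]), int(first[1])
    if nlhead > len(lines):
        raise ValueError("file is shorter than the declared header length")
    return ffi, lines[:nlhead], lines[nlhead:]


# Hypothetical, heavily truncated example: real headers carry many more lines.
sample = "\n".join([
    "4 1001",
    "Placeholder header line",
    "Placeholder header line",
    "Free-text comment line",
    "0.0 271.3",
    "1.0 270.8",
])
ffi, header, body = split_nasa_ames(sample)
```

Because the header length is declared up front, a reader can skip straight to the data without understanding the rest of the header, which is part of what makes the format easy to handle.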
The NASA Ames format prescribes no mandatory or suggested vocabulary. As mentioned earlier, data providers using NASA Ames are strongly encouraged to follow the BADC guidelines on CF conventions (see also Section 3.2 below).
NetCDF (network Common Data Form) is a binary data format supported by Unidata. It allows the user to store metadata in the data files themselves, as attributes attached to the file as a whole or to individual variables.
The NetCDF Climate and Forecast (CF) Metadata Convention defines a standard dealing mainly with vocabulary rules. Although this standard was developed with the NetCDF format in mind, it can be applied to any set of geophysical data, and could probably be extended to cover a much broader range of disciplines as well.
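To give a sense of what a vocabulary rule looks like in practice, the sketch below checks that each variable's metadata carries a `units` attribute and at least one of the `standard_name` or `long_name` attributes, which are attribute names used by CF to describe variables. The variable dictionaries, the sample values and the function name are hypothetical, and this is a simplified check, far narrower than the actual CF rules or the BADC implementation of them.

```python
from typing import Dict, List


def missing_cf_attributes(variables: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """List, per variable, the descriptive attributes this simplified check expects."""
    problems = {}
    for name, attrs in variables.items():
        missing = []
        if "units" not in attrs:
            missing.append("units")
        if "standard_name" not in attrs and "long_name" not in attrs:
            missing.append("standard_name or long_name")
        if missing:
            problems[name] = missing
    return problems


# Hypothetical variable metadata, as it might be collected from any format.
variables = {
    "temp": {"standard_name": "air_temperature", "units": "K"},
    "p": {"long_name": "pressure at sensor"},
}
report = missing_cf_attributes(variables)
```

Since the check only looks at attribute names and values, it applies equally to metadata taken from a NetCDF file or from a text header, in line with the point that the convention is not tied to one data format.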
With the aim of providing a consistent way of describing atmospheric data sets, the BADC has developed its own implementation of CF metadata rules. If you are about to submit metadata to the BADC, whether you use NetCDF or not, please refer to the BADC implementation of the CF Convention.