From 21653046ca98fed8d6670eaa0257dfb109bd51c3 Mon Sep 17 00:00:00 2001 From: Allen Byrne Date: Tue, 3 Dec 2024 08:29:31 -0600 Subject: [PATCH] Split doxygen pages from spec and TN into files --- doxygen/dox/AppDbgging.dox | 160 ++++ doxygen/dox/GettingStarted.dox | 10 +- doxygen/dox/SWMRTechNote.dox | 146 +++ doxygen/dox/Specifications.dox | 121 +-- doxygen/dox/TableSpec.dox | 100 ++ doxygen/dox/TechnicalNotes.dox | 1596 +------------------------------- doxygen/dox/Unicode.dox | 144 +++ doxygen/dox/VDSTechNote.dox | 115 +++ doxygen/dox/VFLTechNote.dox | 1025 ++++++++++++++++++++ src/H5Fmodule.h | 8 +- src/H5Ppublic.h | 6 +- 11 files changed, 1715 insertions(+), 1716 deletions(-) create mode 100644 doxygen/dox/AppDbgging.dox create mode 100644 doxygen/dox/SWMRTechNote.dox create mode 100644 doxygen/dox/TableSpec.dox create mode 100644 doxygen/dox/Unicode.dox create mode 100644 doxygen/dox/VDSTechNote.dox create mode 100644 doxygen/dox/VFLTechNote.dox diff --git a/doxygen/dox/AppDbgging.dox b/doxygen/dox/AppDbgging.dox new file mode 100644 index 00000000000..a4126965f55 --- /dev/null +++ b/doxygen/dox/AppDbgging.dox @@ -0,0 +1,160 @@ + +/** \page APPDBG Debugging HDF5 Applications + +\section sec_adddbg_intro Introduction +The HDF5 library contains a number of debugging features to make programmers' lives +easier including the ability to print detailed error messages, check invariant +conditions, display timings and other statistics. + +\subsection subsec_adddbg_intro_err Error Messages +Error messages are normally displayed automatically on the standard error stream and +include a stack trace of the library including file names, line numbers, and function +names. The application has complete control over how error messages are displayed and +can disable the display on a permanent or temporary basis. Refer to the documentation + for the H5E error handling package. 
+ +\subsection subsec_adddbg_intro_invar Invariant Conditions +Unless NDEBUG is defined during compilation, the library includes code to verify that +invariant conditions have the expected values. When a problem is detected, the library +displays the file and line number within the library and the invariant condition that +failed. A core dump may be generated for post-mortem debugging. The code to perform these +checks can be included on a per-package basis. + +\subsection subsec_adddbg_intro_stats Timings and Statistics +The library can be configured to accumulate certain statistics about things like cache +performance, datatype conversion, data space conversion, and data filters. The code is +included on a per-package basis and enabled at runtime by an environment variable. + +\subsection subsec_adddbg_intro_trace API Tracing +All API calls made by an application can be displayed, including formal argument names, +actual argument values, and the function return value. This code is also conditionally included +at compile time and enabled at runtime. + +The statistics and tracing can be displayed on any output stream (including streams opened by +the shell), and output from different packages can even go to different streams. + +\section sec_adddbg_msg Error Messages +By default, any API function that fails prints an error stack to the standard error stream. +\code +HDF5-DIAG: Error detected in thread 0. Back trace follows. + #000: H5F.c line 1245 in H5Fopen(): unable to open file + major(04): File interface + minor(10): Unable to open file + #001: H5F.c line 846 in H5F_open(): file does not exist + major(04): File interface + minor(10): Unable to open file +\endcode +The error handling package (H5E) is described elsewhere. + +\section sec_adddbg_invars Invariant Conditions +To include checks for invariant conditions, the library should be configured +with --disable-production, the default for versions before 1.2.
The library +designers have made every attempt to handle error conditions gracefully, but +an invariant condition assertion may fail in certain cases. The output from +a failure usually looks something like this: +\code +Assertion failed: H5.c:123: i
+| Name | Default | Description |
+|------|---------|-------------|
+| a | No | Attributes |
+| ac | Yes | Meta data cache |
+| b | Yes | B-Trees |
+| d | Yes | Datasets |
+| e | Yes | Error handling |
+| f | Yes | Files |
+| g | Yes | Groups |
+| hg | Yes | Global heap |
+| hl | No | Local heaps |
+| i | Yes | Interface abstraction |
+| mf | No | File memory management |
+| mm | Yes | Library memory management |
+| o | No | Object headers and messages |
+| p | Yes | Property lists |
+| s | Yes | Data spaces |
+| t | Yes | Datatypes |
+| v | Yes | Vectors |
+| z | Yes | Raw data filters |
+
+In addition to including the code at compile time, the application must enable each package at +runtime. This is done by listing the package names in the HDF5_DEBUG environment variable. That +variable may also contain file descriptor numbers (the default is '2'), which control the output +for all following packages up to the next file descriptor number. The word 'all' refers to all packages. Any +word may be preceded by a minus sign to turn debugging off for that package. + +\subsection subsec_adddbg_stats_sample Sample debug specifications
all +This causes debugging output from all packages to be sent to the standard error stream. +
all -t -s +Debugging output for all packages except datatypes and data spaces will appear on the standard error stream. +
-all ac 255 t,s +This disables all debugging, even if the default was to debug something; then output +from the meta data cache is sent to the standard error stream, and output from datatypes +and data spaces is sent to file descriptor 255, which should be redirected by the shell. +
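A minimal sketch of how a value such as `-all ac 255 t,s` can be split into its components. It assumes three kinds of token, inferred from the sample specifications above: a run of lowercase letters is a package name or `all`, a leading minus sign turns that word off, and a run of digits selects the output file descriptor; every other character is treated as a separator. The `debug_token_t` type and `debug_tokenize()` function are hypothetical illustrations, not the library's own parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token type for this sketch; not an HDF5 type. */
typedef struct {
    int  is_fd;    /* 1 if the token is a file descriptor number */
    int  fd;       /* descriptor value, meaningful only when is_fd == 1 */
    int  disable;  /* 1 if the word was preceded by a minus sign */
    char word[16]; /* package name or "all", truncated to 15 characters */
} debug_token_t;

/* Split an HDF5_DEBUG-style string into at most max tokens.
 * Returns the number of tokens stored in out. */
static size_t
debug_tokenize(const char *s, debug_token_t *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0' && n < max) {
        debug_token_t *tok = &out[n];

        memset(tok, 0, sizeof(*tok)); /* also NUL-terminates tok->word */
        if (isdigit((unsigned char)*s)) {
            /* A run of digits selects the output file descriptor. */
            tok->is_fd = 1;
            while (isdigit((unsigned char)*s))
                tok->fd = tok->fd * 10 + (*s++ - '0');
            n++;
        }
        else if (islower((unsigned char)*s) ||
                 (*s == '-' && islower((unsigned char)s[1]))) {
            /* A word, optionally preceded by '-' to turn it off. */
            size_t len = 0;

            if (*s == '-') {
                tok->disable = 1;
                s++;
            }
            while (islower((unsigned char)*s)) {
                if (len < sizeof(tok->word) - 1)
                    tok->word[len++] = *s;
                s++;
            }
            n++;
        }
        else {
            s++; /* anything else separates components */
        }
    }
    return n;
}
```

For `-all ac 255 t,s` the sketch yields five tokens: `all` (disabled), `ac`, the descriptor 255, `t`, and `s`. Applying them in order gives the behavior described in the table: everything off, then the listed packages on, each routed to the most recent descriptor.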
+The components of the HDF5_DEBUG value may be separated by any non-lowercase letter. + +*/ diff --git a/doxygen/dox/GettingStarted.dox b/doxygen/dox/GettingStarted.dox index 274598c9537..a37b197afea 100644 --- a/doxygen/dox/GettingStarted.dox +++ b/doxygen/dox/GettingStarted.dox @@ -50,7 +50,7 @@ The high-level HDF5 library includes several sets of convenience and standard-us -\ref IntroParHDF5 +@ref IntroParHDF5 A brief introduction to Parallel HDF5. If you are new to HDF5 please see the @ref LearnBasics topic first. @@ -58,7 +58,7 @@ A brief introduction to Parallel HDF5. If you are new to HDF5 please see the @re -\ref ViewTools +@ref ViewTools \li @ref LearnHDFView @@ -71,8 +71,8 @@ A brief introduction to Parallel HDF5. If you are new to HDF5 please see the @re New Features since HDF5-1.10 -\li \ref VDS -\li \ref SWMR +\li @ref VDSTN +\li @ref SWMRTN @@ -80,7 +80,7 @@ New Features since HDF5-1.10 Example Programs -\ref HDF5Examples +@ref HDF5Examples diff --git a/doxygen/dox/SWMRTechNote.dox b/doxygen/dox/SWMRTechNote.dox new file mode 100644 index 00000000000..7041ac046bc --- /dev/null +++ b/doxygen/dox/SWMRTechNote.dox @@ -0,0 +1,146 @@ + +/** \page SWMRTN Introduction to Single-Writer/Multiple-Reader (SWMR) + +\section sec_swmr_intro Introduction to SWMR +The Single-Writer / Multiple-Reader (SWMR) feature enables multiple processes to read an HDF5 file +while it is being written to (by a single process) without using locks or requiring communication between processes. +tutr-swmr1.png + +All communication between processes must be performed via the HDF5 file. The HDF5 file under SWMR access must +reside on a system that complies with POSIX write() semantics. + +The basic engineering challenge for this to work was to ensure that the readers of an HDF5 file always +see a coherent (though possibly not up to date) HDF5 file. 
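A toy model can make the coherence requirement concrete. The sketch below assumes a hypothetical cache in which each entry records the index of the entry that points at it (`toy_entry_t`, `can_flush()`, `flush_all()`); it is an illustration, not HDF5's metadata cache code. An entry is written only after all of its children are clean, so a reader looking only at the file never follows a pointer to an address that has not been written yet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata cache entry for this sketch. */
typedef struct {
    int  parent; /* index of the entry that points at this one, or -1 */
    bool dirty;  /* true if the in-memory copy differs from the file */
} toy_entry_t;

/* An entry may be written only when none of its children are dirty. */
static bool
can_flush(const toy_entry_t *e, size_t n, size_t idx)
{
    for (size_t i = 0; i < n; i++)
        if (e[i].parent == (int)idx && e[i].dirty)
            return false;
    return true;
}

/* Write every dirty entry, children before parents, recording the order
 * in which entries were written. Returns the number of entries written. */
static size_t
flush_all(toy_entry_t *e, size_t n, size_t *order)
{
    size_t written  = 0;
    bool   progress = true;

    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (e[i].dirty && can_flush(e, n, i)) {
                e[i].dirty       = false; /* "write" the entry to the file */
                order[written++] = i;
                progress         = true;
            }
        }
    }
    return written;
}
```

For a chain root, child, grandchild with all three dirty, the grandchild is written first and the root last, so the file never holds a root that points at an unwritten child.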
+ +The issue is that, while data is being written, some of the file's state exists only in the writer's metadata cache and not yet in the physical file on disk: +tutr-swmr2.png + +However, the readers can only see the state contained in the physical file: +tutr-swmr3.png + +The SWMR solution implements dependencies on when the metadata can be flushed to the file. This ensures that metadata cache +flush operations occur in the proper order, so that there will never be internal file pointers in the physical file +that point to invalid (unflushed) file addresses. + +A beneficial side effect of using SWMR access is better fault tolerance. It is more difficult to corrupt a file when using SWMR. + +\subsection subsec_swmr_doc Documentation +\subsubsection subsubsec_swmr_doc_guide User Guide +SWMR User Guide + +\subsubsection subsubsec_swmr_doc_apis HDF5 Library APIs + + +\subsubsection subsubsec_swmr_doc_tools Tools +\li h5watch — Outputs new records appended to a dataset as the dataset grows +\li h5format_convert — Converts the layout format version and chunked indexing types of datasets created with +HDF5-1.10 so that applications built with HDF5-1.8 can access them +\li h5clear — Clears superblock status_flags field, removes metadata cache image, prints EOA and EOF, or sets EOA of a file + +\subsubsection subsubsec_swmr_doc_design Design Documents + +\subsection subsec_swmr_model Programming Model +Please be aware that the SWMR feature requires that an HDF5 file be created with the latest file format. See +#H5Pset_libver_bounds for more information. + +To use SWMR, follow the general programming model for creating and accessing HDF5 files and objects, along with the steps described below. + +\subsubsection subsubsec_swmr_model_writer SWMR Writer +The SWMR writer either opens an existing file and objects or creates them as follows. + +Open an existing file: +Call #H5Fopen using the #H5F_ACC_SWMR_WRITE flag. +Begin writing datasets. +Periodically flush data.
+ +Create a new file: +Call #H5Fcreate using the latest file format. +Create groups, datasets and attributes, and then close the attributes. +Call #H5Fstart_swmr_write to start SWMR access to the file. +Periodically flush data. + +

Example Code:

+Create the file using a file access property list that requests the latest file format: +\code + fapl = H5Pcreate (H5P_FILE_ACCESS); + status = H5Pset_libver_bounds (fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST); + fid = H5Fcreate (filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl); + // Create objects (groups, datasets, ...). + // Close any attributes and named datatype objects. + // Groups and datasets may remain open before starting SWMR access to them. + + // Start SWMR access to the file: + status = H5Fstart_swmr_write (fid); + + // Reopen any datasets that were closed, then start writing, periodically flushing data: + status = H5Dwrite (dset_id, ...); + status = H5Dflush (dset_id); +\endcode + +\subsubsection subsubsec_swmr_model_reader SWMR Reader +The SWMR reader must continually poll for new data: + +Call #H5Fopen using the #H5F_ACC_SWMR_READ flag. +Poll, checking the size of the dataset to see if there is new data available for reading. +Read new data, if any. +

Example Code:

+\code + // Open the file using the SWMR read flag: + fid = H5Fopen (filename, H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT); + // Open the dataset and then repeatedly poll it by reading new data, refreshing, and getting the new dimensions: + dset_id = H5Dopen (...); + space_id = H5Dget_space (...); + while (...) { + status = H5Dread (dset_id, ...); + status = H5Drefresh (dset_id); + status = H5Sclose (space_id); // release the old dataspace before getting the new one + space_id = H5Dget_space (...); + } +\endcode + +\subsection subsec_swmr_scope Limitations and Scope +An HDF5 file under SWMR access must reside on a system that complies with POSIX write() +semantics. It is also limited in scope as follows. + +The writer process is only allowed to modify raw data of existing datasets by: +Appending data along any unlimited dimension. +Modifying existing data. +The following operations are not allowed (the corresponding HDF5 function calls will fail): +\li The writer cannot add new objects to the file. +\li The writer cannot delete objects in the file. +\li The writer cannot modify or append data with variable-length, string, or region reference datatypes. +\li File space recycling is not allowed. As a result, a file modified by a SWMR writer may be larger than the same file modified by a non-SWMR writer.
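Because the writer may only append along unlimited dimensions or overwrite existing elements, a polling reader can work out which records are new by comparing the dataset extent before and after #H5Drefresh. A minimal sketch, assuming appends happen along dimension 0 only and using `unsigned long long` in place of `hsize_t` so that it builds without HDF5; `appended_window()` is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Compute the hyperslab (start, count) covering records appended along
 * dimension 0 between two polls. prev and cur are the dataset extents
 * before and after the refresh; rank is the number of dimensions.
 * Returns the number of new records (0 if the dataset did not grow). */
static unsigned long long
appended_window(const unsigned long long *prev, const unsigned long long *cur,
                size_t rank, unsigned long long *start, unsigned long long *count)
{
    unsigned long long added = (cur[0] > prev[0]) ? cur[0] - prev[0] : 0;

    /* New records begin where the previous extent ended. */
    start[0] = prev[0];
    count[0] = added;

    /* Every other dimension is selected in full. */
    for (size_t i = 1; i < rank; i++) {
        start[i] = 0;
        count[i] = cur[i];
    }
    return added;
}
```

With the real library, `start` and `count` would be `hsize_t` arrays passed to #H5Sselect_hyperslab on the refreshed file dataspace before calling #H5Dread. For a dataset that grows from 10 to 13 planes of 256 x 256, the window starts at plane 10 and covers 3 planes.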

+ +\subsection subsec_swmr_tools Tools for Working with SWMR +Two new tools, h5watch and h5clear, are available for use with SWMR. The other HDF5 utilities have also been modified to recognize SWMR files. +\li The h5watch tool allows a user to monitor the growth of a dataset. +\li The h5clear tool clears the status flags in the superblock of an HDF5 file. +\li The rest of the HDF5 tools will exit gracefully but not work with SWMR otherwise. + +\subsection subsec_swmr_example Programming Example +A good example of using SWMR is included with the HDF5 tests in the source code. You can run it while reading +the file it creates. If you then interrupt the application and reader and look at the resulting file, you will +see that the file is still valid. Follow these steps: +\li Download the HDF5 source code to a local directory on a file system that complies with POSIX write() semantics. +Build the software. No special configuration options are needed to use SWMR. +\li Open two terminal windows. In one window, change to the bin directory of the built binaries. +In the other window, change to the test directory of the HDF5-1.10 source code that was just built. +\li In the window in the test directory, compile and run use_append_chunk.c. The example writes a +three-dimensional dataset by planes (with chunks of size 1 x 256 x 256). +\li In the other window (in the bin directory), run h5watch on the file created by +use_append_chunk.c (use_append_chunk.h5). Run it while use_append_chunk is executing, and you +will see valid data displayed by h5watch. +\li Interrupt use_append_chunk while it is running, and stop h5watch. +\li Use h5clear to clear the status flags in the superblock of the HDF5 file (use_append_chunk.h5). +\li View the file with h5dump. You will see that it is a valid file even though the application did not +close properly. It will contain data up to the point at which it was interrupted.
+ +*/ diff --git a/doxygen/dox/Specifications.dox b/doxygen/dox/Specifications.dox index de9c23d80aa..5e5b782eb00 100644 --- a/doxygen/dox/Specifications.dox +++ b/doxygen/dox/Specifications.dox @@ -9,39 +9,38 @@ \section File Format -\li \ref FMT1 -\li \ref FMT11 -\li \ref FMT2 -\li \ref FMT3 +\li \ref FMT1SPEC +\li \ref FMT11SPEC +\li \ref FMT2SPEC +\li \ref FMT3SPEC \section Other \li \ref IMG -\li \ref TBL -\li - HDF5 Dimension Scale Specification +\li \ref TBLSPEC +\li \ref sec_dim_scales_spec */ -/** \page FMT3 HDF5 File Format Specification Version 3.0 +/** \page FMT3SPEC HDF5 File Format Specification Version 3.0 \htmlinclude H5.format.html */ -/** \page FMT2 HDF5 File Format Specification Version 2.0 +/** \page FMT2SPEC HDF5 File Format Specification Version 2.0 \htmlinclude H5.format.2.0.html */ -/** \page FMT11 HDF5 File Format Specification Version 1.1 +/** \page FMT11SPEC HDF5 File Format Specification Version 1.1 \htmlinclude H5.format.1.1.html */ -/** \page FMT1 HDF5 File Format Specification Version 1.0 +/** \page FMT1SPEC HDF5 File Format Specification Version 1.0 \htmlinclude H5.format.1.0.html @@ -51,104 +50,4 @@ \htmlinclude ImageSpec.html -*/ - -/** \page TBL HDF5 Table Specification Version 1.0 -The HDF5 specification defines the standard objects and storage for the standard HDF5 -objects. (For information about the HDF5 library, model and specification, see the HDF -documentation.) This document is an additional specification do define a standard profile -for how to store tables in HDF5. Table data in HDF5 is stored as HDF5 datasets with standard -attributes to define the properties of the tables. - -\section sec_tab_spec_intro Introduction -A generic table is a sequence of records, each record has a name and a type. Table data -is stored as an HDF5 one dimensional compound dataset. A table is defined as a collection -of records whose values are stored in fixed-length fields. 
All records have the same structure -and all values in each field have the same data type. - -The dataset for a table is distinguished from other datasets by giving it an attribute -"CLASS=TABLE". Optional attributes allow the storage of a title for the Table and for -each column, and a fill value for each column. - -\section sec_tab_spec_attr Table Attributes -The attributes for the Table are strings. They are written with the #H5LTset_attribute_string -Lite API function. "Required" attributes must always be used. "Optional" attributes must be -used when required. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Table 1. Attributes of an Image Dataset
Attribute NameRequired
Optional
TypeString SizeValueDescription
CLASSRequiredString5"TABLE"This attribute is type #H5T_C_S1, with size 5. For all Tables, the value of this attribute is -TABLE. This attribute identifies this data set as intended to be interpreted as Table that -conforms to the specifications on this page.
VERSIONRequiredString3"0.2"This attribute is of type #H5T_C_S1, with size corresponding to the length of the version string. -This attribute identifies the version number of this specification to which it conforms. The current -version number is "0.2".
TITLEOptionalString  The TITLE is an optional String that is to be used as the informative title of the whole table. -The TITLE is set with the parameter table_title of the function #H5TBmake_table.
FIELD_(n)_NAMERequiredString  The FIELD_(n)_NAME is an optional String that is to be used as the informative title of column n -of the table. For each of the fields the word FIELD_ is concatenated with the zero based field (n) -index together with the name of the field.
FIELD_(n)_FILLOptionalString  The FIELD_(n)_FILL is an optional String that is the fill value for column n of the table. -For each of the fields the word FIELD_ is concatenated with the zero based field (n) index -together with the fill value, if present. This value is written only when a fill value is defined -for the table.
- -The following section of code shows the calls necessary to the creation of a table. -\code -// Create a new HDF5 file using default properties. -file_id = H5Fcreate("my_table.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); - -// Call the make table function -H5TBmake_table("Table Title", file_id, "Table1", NFIELDS, NRECORDS, dst_size, field_names, dst_offset, field_type, chunk_size, fill_data, compress, p_data); - -// Close the file. -status = H5Fclose(file_id); -\endcode - -For more information see the @ref H5TB reference manual page and the @ref H5TB_UG, which includes examples. - - */ diff --git a/doxygen/dox/TableSpec.dox b/doxygen/dox/TableSpec.dox new file mode 100644 index 00000000000..79866043a92 --- /dev/null +++ b/doxygen/dox/TableSpec.dox @@ -0,0 +1,100 @@ + +/** \page TBLSPEC HDF5 Table Specification Version 1.0 +The HDF5 specification defines the standard objects and storage for the standard HDF5 +objects. (For information about the HDF5 library, model and specification, see the HDF +documentation.) This document is an additional specification to define a standard profile +for how to store tables in HDF5. Table data in HDF5 is stored as HDF5 datasets with standard +attributes to define the properties of the tables. + +\section sec_tab_spec_intro Introduction +A generic table is a sequence of records; each record has a name and a type. Table data +is stored as an HDF5 one-dimensional compound dataset. A table is defined as a collection +of records whose values are stored in fixed-length fields. All records have the same structure +and all values in each field have the same data type. + +The dataset for a table is distinguished from other datasets by giving it an attribute +"CLASS=TABLE". Optional attributes allow the storage of a title for the Table and for +each column, and a fill value for each column. + +\section sec_tab_spec_attr Table Attributes +The attributes for the Table are strings.
They are written with the #H5LTset_attribute_string +Lite API function. "Required" attributes must always be used. "Optional" attributes are +used only when needed.
+
+Table 1. Attributes of a Table Dataset
+| Attribute Name | Required/Optional | Type | String Size | Value | Description |
+|----------------|-------------------|------|-------------|-------|-------------|
+| CLASS | Required | String | 5 | "TABLE" | This attribute is of type #H5T_C_S1, with size 5. For all Tables, the value of this attribute is TABLE. This attribute identifies this dataset as intended to be interpreted as a Table that conforms to the specifications on this page. |
+| VERSION | Required | String | 3 | "0.2" | This attribute is of type #H5T_C_S1, with size corresponding to the length of the version string. This attribute identifies the version number of this specification to which it conforms. The current version number is "0.2". |
+| TITLE | Optional | String | | | The TITLE is an optional String that is used as the informative title of the whole table. The TITLE is set with the parameter table_title of the function #H5TBmake_table. |
+| FIELD_(n)_NAME | Required | String | | | The FIELD_(n)_NAME is a String that is used as the informative title of column n of the table. For each field, the attribute name is formed from the word FIELD_, the zero-based field index n, and the suffix _NAME; its value is the name of the field. |
+| FIELD_(n)_FILL | Optional | String | | | The FIELD_(n)_FILL is an optional String that is the fill value for column n of the table. For each field, the attribute name is formed from the word FIELD_, the zero-based field index n, and the suffix _FILL; its value is the fill value, if present. This value is written only when a fill value is defined for the table. |
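The per-field attribute names follow a fixed pattern. A minimal sketch of building them, assuming only the two per-field suffixes listed in Table 1 (`NAME` and `FILL`); `field_attr_name()` is a hypothetical helper, not part of the H5TB API.

```c
#include <stdio.h>
#include <string.h>

/* Build the name of a per-field table attribute, e.g. FIELD_0_NAME or
 * FIELD_3_FILL. Returns 0 on success, or -1 for an unknown suffix or
 * when buf is too small to hold the whole name. */
static int
field_attr_name(char *buf, size_t size, unsigned idx, const char *suffix)
{
    int len;

    if (strcmp(suffix, "NAME") != 0 && strcmp(suffix, "FILL") != 0)
        return -1;
    len = snprintf(buf, size, "FIELD_%u_%s", idx, suffix);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

A name produced this way could then be passed as the attribute-name argument of #H5LTset_attribute_string, with the field's name as the attribute value.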
+ +The following section of code shows the calls needed to create a table. +\code +// Create a new HDF5 file using default properties. +file_id = H5Fcreate("my_table.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); + +// Call the make table function +H5TBmake_table("Table Title", file_id, "Table1", NFIELDS, NRECORDS, dst_size, field_names, dst_offset, field_type, chunk_size, fill_data, compress, p_data); + +// Close the file. +status = H5Fclose(file_id); +\endcode + +For more information, see the @ref H5TB reference manual page and the @ref H5TB_UG, which includes examples. + + +*/ diff --git a/doxygen/dox/TechnicalNotes.dox b/doxygen/dox/TechnicalNotes.dox index 8bea54bfb25..0590ad7ba33 100644 --- a/doxygen/dox/TechnicalNotes.dox +++ b/doxygen/dox/TechnicalNotes.dox @@ -8,11 +8,11 @@ \li \ref IOFLOW \li \ref TNMDC \li \ref thread-safe-lib -\li \ref SWMR -\li \ref VDS +\li \ref SWMRTN +\li \ref VDSTN \li \ref RELVERSION \li \ref UNICODE -\li \ref VFL +\li \ref VFLTN \li HDF5 Library Architecture Overview \li \ref VOL_Connector @@ -30,1599 +30,9 @@ */ -/** \page VFL HDF5 Virtual File Layer - -\section sec_vfl_intro Introduction -The HDF5 file format describes how HDF5 data structures and dataset raw data are mapped -to a linear format address space and the HDF5 library implements that bidirectional mapping -in terms of an API. However, the HDF5 format specifications do not indicate how the format -address space is mapped onto storage and HDF (version 5 and earlier) simply mapped the format -address space directly onto a single file by convention. - -Since early versions of HDF5 it became apparent that users want the ability to map the -format address space onto different types of storage (a single file, multiple files, local -memory, global memory, network distributed global memory, a network protocol, etc.) with -various types of maps.
For instance, some users want to be able to handle very large format -address spaces on operating systems that support only 2GB files by partitioning the format -address space into equal-sized parts each served by a separate file. Other users want the -same multi-file storage capability but want to partition the address space according to -purpose (raw data in one file, object headers in another, global heap in a third, etc.) -in order to improve I/O speeds. - -In fact, the number of storage variations is probably larger than the number of methods -that the HDF5 team is capable of implementing and supporting. Therefore, a Virtual File -Layer API is being implemented which will allow application teams or departments to design -and implement their own mapping between the HDF5 format address space and storage, with each -mapping being a separate file driver (possibly written in terms of other file drivers). The -HDF5 team will provide a small set of useful file drivers which will also serve as examples -for those who which to write their own: - - - - - - - - - - - - - - - - -
#H5FD_SEC2This is the default driver which uses Posix file-system functions -like read and write to perform I/O to a single file. All I/O requests are unbuffered -although the driver does optimize file seeking operations to some extent. -
#H5FD_STDIOThis driver uses functions from 'stdio.h' to perform buffered I/O to a single file. -
#H5FD_COREThis driver performs I/O directly to memory and can be -used to create small temporary files that never exist on permanent storage. This -type of storage is generally very fast since the I/O consists only of memory-to-memory copy operations. -
#H5FD_MPIOThis is the driver of choice for accessing files in parallel -using MPI and MPI-IO. It is only predefined if the library is compiled with parallel I/O support. -
#H5FD_FAMILYLarge format address spaces are partitioned into more -manageable pieces and sent to separate storage locations using an underlying driver -of the user's choice. \ref H5TOOL_RT_UG can be used to change the sizes of the family -members when stored as files or to convert a family of files to a single file or vice versa. -
- -\section sec_vfl_use Using a File Driver -Most application writers will use a driver defined by the HDF5 library or contributed by another -programming team. This chapter describes how existing drivers are used. - -\subsection subsec_vfl_use_hdr Driver Header Files -Each file driver is defined in its own public header file which should be included by any -application which plans to use that driver. The predefined drivers are in header files whose -names begin with 'H5FD' followed by the driver name and '.h'. The 'hdf5.h' header file includes -all the predefined driver header files. - -Once the appropriate header file is included a symbol of the form 'H5FD_' followed by the -upper-case driver name will be the driver identification number.(The driver name is by convention -and might not apply to drivers which are not distributed with HDF5.) However, the value may -change if the library is closed (e.g., by calling #H5close) and the symbol is referenced again. - -\subsection subsec_vfl_use_create Creating and Opening Files -In order to create or open a file one must define the method by which the storage is -accessed(The access method also indicates how to translate the storage name to a storage server -such as a file, network protocol, or memory.) and does so by creating a file access property -list(The term "file access property list" is a misnomer since storage isn't required to be a file.) -which is passed to the #H5Fcreate or #H5Fopen function. 
A default file access property list is created -by calling #H5Pcreate and then the file driver information is inserted by calling a driver initialization -function such as #H5Pset_fapl_family: -\code -hid_t fapl = H5Pcreate(H5P_FILE_ACCESS); -size_t member_size = 100*1024*1024; /*100MB*/ -H5Pset_fapl_family(fapl, member_size, H5P_DEFAULT); -hid_t file = H5Fcreate("foo%05d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl); -H5Pclose(fapl); -\endcode - -Each file driver will have its own initialization function whose name is H5Pset_fapl_ followed by -the driver name and which takes a file access property list as the first argument followed by additional -driver-dependent arguments. - -An alternative to using the driver initialization function is to set the driver directly using the -#H5Pset_driver function.(This function is overloaded to operate on data transfer property lists also, as described below.) -Its second argument is the file driver identifier, which may have a different numeric value from run to run -depending on the order in which the file drivers are registered with the library. The third argument encapsulates -the additional arguments of the driver initialization function. This method only works if the file driver -writer has made the driver-specific property list structure a public datatype, which is often not the case. 
-\code -hid_t fapl = H5Pcreate(H5P_FILE_ACCESS); -static H5FD_family_fapl_t fa = {100*1024*1024, H5P_DEFAULT}; -H5Pset_driver(fapl, H5FD_FAMILY, &fa); -hid_t file = H5Fcreate("foo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl); -H5Pclose(fapl); -\endcode - -It is also possible to query the file driver information from a file access property list by -calling #H5Pget_driver to determine the driver and then calling a driver-defined query function -to obtain the driver information: -\code -hid_t driver = H5Pget_driver(fapl); -if (H5FD_SEC2==driver) { - /*nothing further to get*/ -} else if (H5FD_FAMILY==driver) { - hid_t member_fapl; - haddr_t member_size; - H5Pget_fapl_family(fapl, &member_size, &member_fapl); -} else if (....) { - .... -} -\endcode - -\subsection subsec_vfl_use_per Performing I/O -The #H5Dread and #H5Dwrite functions transfer data between application memory and the file. They both take -an optional data transfer property list which has some general driver-independent properties and optional -driver-defined properties. An application will typically perform I/O in one of three styles via the -#H5Dread or #H5Dwrite function: - -Like file access properties in the previous section, data transfer properties can be set using a driver -initialization function or a general purpose function. 
For example, to set the MPI-IO driver to use -independent access for I/O operations one would say: -\code -hid_t dxpl = H5Pcreate(H5P_DATA_XFER); -H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT); -H5Dread(dataset, type, mspace, fspace, buffer, dxpl); -H5Pclose(dxpl); -\endcode - -The alternative is to initialize a driver defined C struct and pass it to the #H5Pset_driver function: -\code -hid_t dxpl = H5Pcreate(H5P_DATA_XFER); -static H5FD_mpio_dxpl_t dx = {H5FD_MPIO_INDEPENDENT}; -H5Pset_driver(dxpl, H5FD_MPIO, &dx); -H5Dread(dataset, type, mspace, fspace, buffer, dxpl); -\endcode - -The transfer property list can be queried in a manner similar to the file access property list: the driver -provides a function (or functions) to return various information about the transfer property list: -\code -hid_t driver = H5Pget_driver(dxpl); -if (H5FD_MPIO==driver) { - H5FD_mpio_xfer_t xfer_mode; - H5Pget_dxpl_mpio(dxpl, &xfer_mode); -} else { - .... -} -\endcode - -\subsection subsec_vfl_use_inter File Driver Interchangeability -The HDF5 specifications describe two things: the mapping of data onto a linear format address -space and the C API which performs the mapping. However, the mapping of the format address space -onto storage intentionally falls outside the scope of the HDF5 specs. This is a direct result of the -fact that it is not generally possible to store information about how to access storage inside the -storage itself. For instance, given only the file name '/arborea/1225/work/f%03d' the HDF5 library -is unable to tell whether the name refers to a file on the local file system, a family of files on -the local file system, a file on host 'arborea' port 1225, a family of files on a remote system, etc. - -Two ways which library could figure out where the storage is located are: storage access information -can be provided by the user, or the library can try all known file access methods. This implementation -uses the former method. 
- -In general, if a file was created with one driver then it isn't possible to open it with another driver. -There are of course exceptions: a file created with MPIO could probably be opened with the sec2 driver, -any file created by the sec2 driver could be opened as a family of files with one member, etc. In fact, -sometimes a file must not only be opened with the same driver but also with the same driver properties. -The predefined drivers are written in such a way that specifying the correct driver is sufficient for -opening a file. - -\section sec_vfl_imp Implementation of a Driver -A driver is simply a collection of functions and data structures which are registered with the HDF5 -library at runtime. The functions fall into these categories: -\li Functions which operate on modes -\li Functions which operate on files -\li Functions which operate on the address space -\li Functions which operate on data -\li Functions for driver initialization -\li Optimization functions - -\subsection subsec_vfl_imp_mode Mode Functions -Some drivers need information about file access and data transfers which are very specific to the driver. -The information is usually implemented as a pair of pointers to C structs which are allocated and -initialized as part of an HDF5 property list and passed down to various driver functions. There are two -classes of settings: file access modes that describe how to access the file through the driver, and -data transfer modes which are settings that control I/O operations. Each file opened by a particular -driver may have a different access mode; each dataset I/O request for a particular file may have a -different data transfer mode. - -Since each driver has its own particular requirements for various settings, each driver is responsible -for defining the mode structures that it needs. Higher layers of the library treat the structures as -opaque but must be able to copy and free them. 
Thus, the driver provides either the size of the -structure or a pair of function pointers for each of the mode types. - -Example: The family driver needs to know how the format address space is partitioned and the file -access property list to use for the family members. -\code -// Driver-specific file access properties -typedef struct H5FD_family_fapl_t { - hsize_t memb_size; // size of each family member - hid_t memb_fapl_id; // file access property list for each family member -} H5FD_family_fapl_t; - -// Driver-specific data transfer properties -typedef struct H5FD_family_dxpl_t { - hid_t memb_dxpl_id; // data transfer property list for each member -} H5FD_family_dxpl_t; -\endcode -In order to copy or free one of these structures the member file access or data transfer properties must -also be copied or freed. This is done by providing a copy function and a free function for each structure: - -Example: The file access property list copy and free functions for the family driver: -\code -static void * -H5FD_family_fapl_copy(const void *_old_fa) -{ - const H5FD_family_fapl_t *old_fa = (const H5FD_family_fapl_t*)_old_fa; - H5FD_family_fapl_t *new_fa = malloc(sizeof(H5FD_family_fapl_t)); - if (!new_fa) return NULL; - - memcpy(new_fa, old_fa, sizeof(H5FD_family_fapl_t)); - new_fa->memb_fapl_id = H5Pcopy(old_fa->memb_fapl_id); - return new_fa; -} - -static herr_t -H5FD_family_fapl_free(void *_fa) -{ - H5FD_family_fapl_t *fa = (H5FD_family_fapl_t*)_fa; - H5Pclose(fa->memb_fapl_id); - free(fa); - return 0; -} -\endcode - -Generally, when a file is created or opened the file access properties for the driver are copied into the -file handle which is returned, and they may be modified from their original value (for instance, the file -family driver modifies the member size property when opening an existing family). In order to support the -#H5Fget_access_plist function the driver must provide a fapl_get callback which creates a copy of the -driver-specific properties based on a particular file.
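The member size is what lets a family driver partition the format address space across member files. A minimal standalone sketch of that partitioning follows, assuming every member holds exactly memb_size bytes of the format address space; family_span_t and family_map are hypothetical names, and uint64_t stands in for haddr_t/hsize_t.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t addr_t; /* stand-in for haddr_t / hsize_t */

/* Hypothetical result of mapping a request onto one family member. */
typedef struct {
    unsigned member; /* index of the member file holding `addr` */
    addr_t   offset; /* byte offset of `addr` within that member */
    addr_t   nbytes; /* bytes of the request that fit in this member */
} family_span_t;

/* Map the request [addr, addr+size) onto the first member it touches.
 * Assumes memb_size > 0 and equal-sized members. */
static family_span_t family_map(addr_t addr, addr_t size, addr_t memb_size)
{
    family_span_t s;
    s.member = (unsigned)(addr / memb_size);
    s.offset = addr % memb_size;
    addr_t room = memb_size - s.offset; /* space left in this member */
    s.nbytes = size < room ? size : room;
    return s;
}
```

A read or write callback built this way would loop, transferring nbytes to or from the selected member and advancing addr by nbytes until the whole request has been handled, so a request that crosses a member boundary is split into pieces.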
Example: The file family driver copies the member size and the member file access property list into the return value: -\code -static void * -H5FD_family_fapl_get(H5FD_t *_file) -{ - H5FD_family_t *file = (H5FD_family_t*)_file; - H5FD_family_fapl_t *fa = calloc(1, sizeof(H5FD_family_fapl_t)); - if (!fa) return NULL; - fa->memb_size = file->memb_size; - fa->memb_fapl_id = H5Pcopy(file->memb_fapl_id); - return fa; -} -\endcode - -\subsection subsec_vfl_imp_file File Functions -The higher layers of the library expect files to have a name and allow the file to be accessed in various modes. -The driver must be able to create a new file, replace an existing file, or open an existing file. Opening or -creating a file should return a handle, a pointer to a specialization of the H5FD_t struct, which allows read-only -or read-write access and which will be passed to the other driver functions as they are called. (Read-only access is -only appropriate when opening an existing file.) -\code -typedef struct { - // Public fields - H5FD_class_t *cls; //class data defined below - - // Private fields -- driver-defined - -} H5FD_t; -\endcode - -Example: The family driver requires handles to the underlying storage, the size of the members for this -particular file (which might be different from the member size specified in the file access property list -if an existing file family is being opened), the name used to open the file in case additional members -must be created, and the flags to use for creating those additional members. The eoa member caches the -size of the format address space so the family members don't have to be queried in order to find it. -\code -// The description of a file belonging to this driver.
-typedef struct H5FD_family_t { - H5FD_t pub; // public stuff, must be first - hid_t memb_fapl_id; // file access property list for members - hsize_t memb_size; // maximum size of each member file - int nmembs; // number of family members - int amembs; // number of member slots allocated - H5FD_t **memb; // dynamic array of member pointers - haddr_t eoa; // end of allocated addresses - char *name; // name generator printf format - unsigned flags; // flags for opening additional members -} H5FD_family_t; -\endcode - -Example: The sec2 driver needs to keep track of the underlying Unix file descriptor and also the -end of format address space and current Unix file size. It also keeps track of the current file -position and last operation (read, write, or unknown) in order to optimize calls to lseek. The -device and inode fields are defined on Unix in order to uniquely identify the file and will be -discussed below. -\code -typedef struct H5FD_sec2_t { - H5FD_t pub; // public stuff, must be first - int fd; // the unix file - haddr_t eoa; // end of allocated region - haddr_t eof; // end of file; current file size - haddr_t pos; // current file I/O position - int op; // last operation - dev_t device; // file device number - ino_t inode; // file i-node number -} H5FD_sec2_t; -\endcode - -\subsection subsec_vfl_imp_open Open Files -All drivers must define a function for opening/creating a file. This function should have a prototype which is: - - - - - -
static H5FD_t * open (const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr)The file name name and file access property list fapl are the same as were specified in the #H5Fcreate -or #H5Fopen call. The flags are also the same as in those calls, except that the flag #H5F_ACC_CREAT is -also present if the call was to #H5Fcreate; the flags are documented in the 'H5Fpublic.h' file. The maxaddr -argument is the maximum format address that the driver should be prepared to handle (the minimum address is always zero).
- -Example: The sec2 driver opens a Unix file with the requested name and saves information which -uniquely identifies the file (the Unix device number and inode). -\code -static H5FD_t * -H5FD_sec2_open(const char *name, unsigned flags, hid_t fapl_id/*unused*/, - haddr_t maxaddr) -{ - unsigned o_flags; - int fd; - struct stat sb; - H5FD_sec2_t *file=NULL; - - // Check arguments - if (!name || !*name) return NULL; - if (0==maxaddr || HADDR_UNDEF==maxaddr) return NULL; - if (ADDR_OVERFLOW(maxaddr)) return NULL; - - // Build the open flags - o_flags = (H5F_ACC_RDWR & flags) ? O_RDWR : O_RDONLY; - if (H5F_ACC_TRUNC & flags) o_flags |= O_TRUNC; - if (H5F_ACC_CREAT & flags) o_flags |= O_CREAT; - if (H5F_ACC_EXCL & flags) o_flags |= O_EXCL; - - // Open the file - if ((fd=open(name, o_flags, 0666))<0) return NULL; - if (fstat(fd, &sb)<0) { - close(fd); - return NULL; - } - - // Create the new file struct - file = calloc(1, sizeof(H5FD_sec2_t)); - file->fd = fd; - file->eof = sb.st_size; - file->pos = HADDR_UNDEF; - file->op = OP_UNKNOWN; - file->device = sb.st_dev; - file->inode = sb.st_ino; - - return (H5FD_t*)file; -} -\endcode - -\subsection subsec_vfl_imp_close Closing Files -Closing a file simply means that all cached data should be flushed to the next lower layer, the -file should be closed at the next lower layer, and all file-related data structures should be -freed. All information needed by the close function is already present in the file handle. - - - - - -
static herr_t close (H5FD_t *file)The file argument is the handle which was returned by the open function, and the close should -free only memory associated with the driver-specific part of the handle (the public parts will -have already been released by HDF5's virtual file layer).
- -Example: The sec2 driver just closes the underlying Unix file, making sure that the actual -file size is the same as that known to the library by writing a zero to the last file position -if it hasn't been written by some previous operation (which happens in the same code which flushes -the file contents and is shown below). -\code -static herr_t -H5FD_sec2_close(H5FD_t *_file) -{ - H5FD_sec2_t *file = (H5FD_sec2_t*)_file; - - if (H5FD_sec2_flush(_file)<0) return -1; - if (close(file->fd)<0) return -1; - free(file); - return 0; -} -\endcode - -\subsection subsec_vfl_imp_key File Keys -Occasionally an application will attempt to open a single file more than one time in order -to obtain multiple handles to the file. HDF5 allows the files to share information (for instance, -writing data to one handle will cause the data to be immediately visible on the other handle), -but in order to accomplish this HDF5 must be able to tell when two names refer to the same file. -It does this by associating a driver-defined key with each file opened by a driver and comparing -the key for an open request with the keys for all other files currently open by the same driver. - - - - -
int cmp (const H5FD_t *f1, const H5FD_t *f2)The driver may provide a function which compares two files f1 and f2 belonging to the same -driver and returns a negative, positive, or zero value, like the strcmp function. (The ordering -is arbitrary as long as it is consistent within a particular file driver.) If this function is -not provided then HDF5 assumes that all calls to the open callback return unique files regardless -of the arguments, and it is up to the application to avoid opening the same file twice if that assumption is incorrect.
- -Each time a file is opened the library calls the cmp function to compare that file with all other files -currently open by the same driver and if one of them matches (at most one can match) then the file -which was just opened is closed and the previously opened file is used instead. - -Opening a file twice with incompatible flags will result in failure. For instance, opening a file with -the truncate flag is a two step process which first opens the file without truncation so keys can be -compared, and if no matching file is found already open then the file is closed and immediately reopened -with the truncation flag set (if a matching file is already open then the truncating open will fail). - -Example: The sec2 driver uses the Unix device and i-node as the key. They were initialized when -the file was opened. -\code -static int -H5FD_sec2_cmp(const H5FD_t *_f1, const H5FD_t *_f2) -{ - const H5FD_sec2_t *f1 = (const H5FD_sec2_t*)_f1; - const H5FD_sec2_t *f2 = (const H5FD_sec2_t*)_f2; - - if (f1->device < f2->device) return -1; - if (f1->device > f2->device) return 1; - - if (f1->inode < f2->inode) return -1; - if (f1->inode > f2->inode) return 1; - - return 0; -} -\endcode - -\subsection subsec_vfl_imp_save Saving Modes Across Opens -Some drivers may also need to store certain information in the file superblock in order -to be able to reliably open the file at a later date. This is done by three functions: -one to determine how much space will be necessary to store the information in the superblock, -one to encode the information, -and one to decode the information. These functions are optional, but if any one is defined -then the other two must also be defined. - - - - - - - - - - - - - - - - - -
FunctionDescription
static hsize_t sb_size (H5FD_t *file)The sb_size function returns the number of bytes necessary to encode -information needed later if the file is reopened.
static herr_t sb_encode (H5FD_t *file, char *name, unsigned char *buf)The sb_encode function encodes information from the file into buffer buf -allocated by the caller. It also writes an 8-character name (plus a null terminator) into -the name argument, which should uniquely identify the driver.
static herr_t sb_decode (H5FD_t *file, const char *name, const unsigned char *buf)The sb_decode function checks the name, decodes data from the buffer buf, and -updates the file argument with the new information.
-The somewhat tricky part is that the file must be readable before the -superblock information is decoded. File access modes fall outside the scope of the HDF5 -file format, but they are placed inside the superblock for convenience. (File access modes -do not describe data, but rather describe how the HDF5 format address space is mapped to -the underlying file(s). Thus, in general the mapping must be known before the file -superblock can be read. However, the user usually knows enough about the mapping for -the superblock to be readable, and once the superblock is read the library can fill -in the missing parts of the mapping.) - -\section sec_vfl_address Address Space Functions -HDF5 does not assume that a file is a linear address space of bytes. Instead, the library -will call functions to allocate and free portions of the HDF5 format address space, which -in turn map onto functions in the file driver to allocate and free portions of file address -space. The library tells the file driver how much format address space it wants to allocate -and the driver decides what format address to use and how that format address is mapped -onto the file address space. Usually the format address is chosen so that the file address -can be calculated in constant time for data I/O operations (which are always specified by format addresses). - -\subsection subsec_vfl_address_blk Userblock and Superblock -The HDF5 format allows an optional userblock to appear before the actual HDF5 data in such -a way that if the userblock is removed from the file and everything remaining is -shifted downward in the file address space, then the file is still a valid HDF5 file. -The userblock size can be zero or any power of two greater than or equal to 512, and -the file superblock begins immediately after the userblock.
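As a quick check of the size rule, here is a minimal sketch in C. It follows the constraint documented for H5Pset_userblock (zero, or a power of two no smaller than 512 bytes); userblock_size_ok is a hypothetical helper, not part of the HDF5 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of the userblock size rule: zero (no userblock),
 * or a power of two that is at least 512 bytes. */
static bool userblock_size_ok(uint64_t size)
{
    if (size == 0)
        return true;
    /* A power of two has exactly one bit set, so size & (size - 1) == 0. */
    bool power_of_two = (size & (size - 1)) == 0;
    return power_of_two && size >= 512;
}
```

For example, 1536 is a multiple of two and at least 512 but is rejected because it is not a power of two. For any accepted size, the superblock sits at a file offset equal to the userblock size.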
- -HDF5 allocates space for the userblock and superblock by calling an allocation function -defined below, which must return a region at format address zero on the first call. - -\subsection subsec_vfl_address_alloc Allocation of Format Regions -The library makes many types of allocation requests: - - - - - - - - - - - - - - - - - - - -
#H5FD_MEM_SUPERAn allocation request for the userblock and/or superblock.
#H5FD_MEM_BTREEAn allocation request for a node of a B-tree. -
#H5FD_MEM_DRAWAn allocation request for the raw data of a dataset. -
#H5FD_MEM_GHEAPAn allocation request for a global heap collection. Global -heaps are used to store certain types of references such as dataset region references. -The set of all global heap collections can become quite large. -
#H5FD_MEM_LHEAPAn allocation request for a local heap. Local heaps are used -to store the names which are members of a group. The combined size of all local heaps is -a function of the number of object names in the file. -
#H5FD_MEM_OHDRAn allocation request for (part of) an object header. Object -headers are relatively small and include meta information about objects (like the data -space and type of a dataset) and attributes. -
- -When a chunk of memory is freed the library adds it to a free list and allocation requests -are satisfied from the free list before requesting memory from the file driver. Each type of -allocation request enumerated above has its own free list, but the file driver can specify that -certain object types can share a free list. It does so by providing an array which maps a -request type to a free list. If any value of the map is #H5FD_MEM_DEFAULT (zero) then the object's -own free list is used. The special value #H5FD_MEM_NOLIST indicates that the library should not -attempt to maintain a free list for that particular object type, instead calling the file driver -each time an object of that type is freed. - -Mappings predefined in the 'H5FDpublic.h' file are: - - - - - - - - - -
#H5FD_FLMAP_SINGLEAll memory usage types are mapped to a single free list. -
#H5FD_FLMAP_DICHOTOMYMemory usage is segregated into meta data and raw data -for the purposes of memory management. -
#H5FD_FLMAP_DEFAULTEach memory usage type has its own free list. -
- -Example: To make a map that manages object headers on one free list and everything else on -another free list one might initialize the map with the following code (the use of #H5FD_MEM_SUPER is arbitrary): -\code -H5FD_mem_t mt, map[H5FD_MEM_NTYPES]; - -for (mt = 0; mt < H5FD_MEM_NTYPES; mt++) { - map[mt] = (H5FD_MEM_OHDR== mt) ? mt : H5FD_MEM_SUPER; -} -\endcode - -If an allocation request cannot be satisfied from the free list then one of two things happens. -If the driver defines an allocation callback then it is used to allocate space; otherwise new -memory is allocated from the end of the format address space by incrementing the end-of-address marker. - - - - -
static haddr_t alloc (H5FD_t *file, H5FD_mem_t type, hsize_t size)The file argument is the file from which space is to be allocated; type is the type of -memory being requested (from the list above) without being mapped according to the free list -map; and size is the number of bytes being requested. The library is allowed to allocate large -chunks of storage and manage them in a layer above the file driver (although the current library -doesn't do that). The allocation function should return a format address for the first byte -allocated. The allocated region extends from that address for size bytes. If the request cannot -be honored then the undefined address value is returned (#HADDR_UNDEF). The first call to this -function for a file which has never had memory allocated must return a format address of zero -or #HADDR_UNDEF since this is how the library allocates space for the userblock and/or superblock.
- -\subsection subsec_vfl_address_free Freeing Format Regions -When the library is finished using a certain region of the format address space it will return the -space to the free list according to the type of memory being freed and the free list map described above. -If the free list has been disabled for a particular memory usage type (according to the free list map) -and the driver defines a free callback then it will be invoked. The free callback is also invoked for -all entries on the free list when the file is closed. - - - - - - -
static herr_t free (H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size)The file argument is the file for which space is being freed; type is the type of object being -freed (from the list above) without being mapped according to the free list map; addr is the first -format address to free; and size is the size in bytes of the region being freed. The region being -freed may refer to just part of the region originally allocated and/or may cross allocation boundaries -provided all regions being freed have the same usage type. However, the library will never attempt -to free regions which have already been freed or which have never been allocated.
-A driver may choose to not define the free function, in which case format addresses will be leaked. -This isn't normally a huge problem since the library contains a simple free list of its own and freeing -parts of the format address space is not a common occurrence. - -\subsection subsec_vfl_address_query Querying the Address Range -Each file driver must have some mechanism for setting and querying the end of address, or -EOA, marker. The EOA marker is the first format address after the last format address ever allocated. -If the last part of the allocated address range is freed then the driver may optionally decrease the eoa marker. - - - - - -
static haddr_t get_eoa (H5FD_t *file)This function returns the current value of the EOA marker for the specified file.
- -Example: The sec2 driver just returns the current eoa marker value which is cached in the file structure: -\code -static haddr_t -H5FD_sec2_get_eoa(H5FD_t *_file) -{ - H5FD_sec2_t *file = (H5FD_sec2_t*)_file; - return file->eoa; -} -\endcode - -The eoa marker is initially zero when a file is opened and the library may set it to some other value -shortly after the file is opened (after the superblock is read and the saved eoa marker is determined) -or when allocating additional memory in the absence of an alloc callback (described above). - -Example: The sec2 driver simply caches the eoa marker in the file structure and does not extend the -underlying Unix file. When the file is flushed or closed then the Unix file size is extended to match -the eoa marker. -\code -static herr_t -H5FD_sec2_set_eoa(H5FD_t *_file, haddr_t addr) -{ - H5FD_sec2_t *file = (H5FD_sec2_t*)_file; - file->eoa = addr; - return 0; -} -\endcode - -\section sec_vfl_data Data Functions -These functions operate on data, transferring a region of the format address space between memory and files. - -\subsection subsec_vfl_data_cont Contiguous I/O Functions -A driver must specify two functions to transfer data from the library to the file and vice versa. - - - - - - - - - -
static herr_t read (H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buf)The read function reads data from file file beginning at address addr and continuing -for size bytes into the buffer buf supplied by the caller.
static herr_t write (H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buf)The write function transfers data -in the opposite direction.
-\li Both functions take a data transfer property list dxpl which -indicates the fine points of how the data is to be transferred and which comes directly -from the #H5Dread or #H5Dwrite function. -\li Both functions receive the type of data being transferred, -which may allow a driver to tune its behavior for different kinds of data. -\li Both functions should return -a negative value if they fail to transfer the requested data, or non-negative if they -succeed. The library will never attempt to read from unallocated regions of the format address space. - -Example: The sec2 driver just makes system calls. It tries not to call lseek if the current operation -is the same as the previous operation and the file position is correct. It also fills the output buffer -with zeros when reading between the current EOF and EOA markers and restarts system calls which were interrupted. -\code -static herr_t -H5FD_sec2_read(H5FD_t *_file, H5FD_mem_t type/*unused*/, hid_t dxpl_id/*unused*/, - haddr_t addr, hsize_t size, void *buf/*out*/) -{ - H5FD_sec2_t *file = (H5FD_sec2_t*)_file; - ssize_t nbytes; - - assert(file && file->pub.cls); - assert(buf); - - /* Check for overflow conditions */ - if (REGION_OVERFLOW(addr, size)) return -1; - if (addr+size>file->eoa) return -1; - - /* Seek to the correct location */ - if ((addr!=file->pos || OP_READ!=file->op) && - file_seek(file->fd, (file_offset_t)addr, SEEK_SET)<0) { - file->pos = HADDR_UNDEF; - file->op = OP_UNKNOWN; - return -1; - } - - /* - * Read data, being careful of interrupted system calls, partial results, - * and the end of the file.
- */ - while (size>0) { - do nbytes = read(file->fd, buf, size); - while (-1==nbytes && EINTR==errno); - if (-1==nbytes) { - /* error */ - file->pos = HADDR_UNDEF; - file->op = OP_UNKNOWN; - return -1; - } - if (0==nbytes) { - /* end of file but not end of format address space */ - memset(buf, 0, size); - size = 0; - } - assert(nbytes>=0); - assert((hsize_t)nbytes<=size); - size -= (hsize_t)nbytes; - addr += (haddr_t)nbytes; - buf = (char*)buf + nbytes; - } - - /* Update current position */ - file->pos = addr; - file->op = OP_READ; - return 0; -} -\endcode -Example: The sec2 write callback is similar, except that it updates the file EOF marker when extending the file. - -\subsection subsec_vfl_data_flush Flushing Cached Data -Some drivers may cache data in memory in order to make larger I/O requests to the -underlying file and thus improve bandwidth. Such drivers should register a cache flushing -function so that the library can ensure that data has been flushed out of the driver in -response to the application calling #H5Fflush. - - - - -
static herr_t flush (H5FD_t *file)Flush all data for file file to storage.
- -Example: The sec2 driver doesn't cache any data but it also doesn't extend the Unix file as -aggressively as it should. Therefore, when finalizing a file it should write a zero to the last -byte of the allocated region so that when reopening the file later the EOF marker will be at -least as large as the EOA marker saved in the superblock (otherwise HDF5 will refuse to open -the file, claiming that the data appears to be truncated). -\code -static herr_t -H5FD_sec2_flush(H5FD_t *_file) -{ - H5FD_sec2_t *file = (H5FD_sec2_t*)_file; - - if (file->eoa>file->eof) { - if (-1==file_seek(file->fd, file->eoa-1, SEEK_SET)) return -1; - if (write(file->fd, "", 1)!=1) return -1; - file->eof = file->eoa; - file->pos = file->eoa; - file->op = OP_WRITE; - } - - return 0; -} -\endcode - -\section sec_vfl_opt Optimization Functions -The library is capable of performing several generic optimizations on I/O, but these types of -optimizations may not be appropriate for a given VFL driver. - -Each driver may provide a query function to allow the library to query whether to enable these -optimizations. If a driver lacks a query function, the library will disable all types of -optimizations which can be queried. - - - - - - -
static herr_t query (const H5FD_t *file, unsigned long *flags)This function is called by the library to query which optimizations to enable for I/O to this driver.
- -These are the flags which are currently defined: - - - - - - - - - - - - - -
H5FD_FEAT_AGGREGATE_METADATA (0x00000001)Defining the H5FD_FEAT_AGGREGATE_METADATA for a VFL driver means that the library will attempt to allocate -a larger block for metadata and then sub-allocate each metadata request from that larger block.
H5FD_FEAT_ACCUMULATE_METADATA (0x00000002)Defining the H5FD_FEAT_ACCUMULATE_METADATA for a VFL driver means that the library will attempt to cache -metadata as it is written to the file and build up a larger block of metadata to eventually pass to the -VFL 'write' routine.
H5FD_FEAT_DATA_SIEVE (0x00000004)Defining the H5FD_FEAT_DATA_SIEVE for a VFL driver means that the library will attempt to cache raw data - as it is read from/written to a file in a "data sieve" buffer.
- -See Rajeev Thakur's papers: -http://www.mcs.anl.gov/~thakur/papers/romio-coll.ps.gz -http://www.mcs.anl.gov/~thakur/papers/mpio-high-perf.ps.gz - -\section sec_vfl_reg Registration of a Driver -Before a driver can be used the HDF5 library needs to be told of its existence. This is done by -registering the driver, which results in a driver identification number. Instead of passing many -arguments to the registration function, the driver information is entered into a structure and the -address of the structure is passed to the registration function where it is copied. This allows -the HDF5 API to be extended while providing backward compatibility at the source level. - - - - - - -
hid_t H5FDregister (H5FD_class_t *cls)The driver described by struct cls is registered with the library and an ID number for the driver is returned.
- -The H5FD_class_t type is a struct with the following fields: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
const char *nameA pointer to a constant, null-terminated driver name to be used for debugging purposes.
size_t fapl_sizeThe size in bytes of the file access mode structure or zero if the driver supplies a copy function -or doesn't define the structure.
void *(*fapl_copy)(const void *fapl)An optional function which copies a driver-defined file access mode structure. This field takes -precedence over fapl_size when both are defined.
void (*fapl_free)(void *fapl)An optional function to free the driver-defined file access mode structure. If null, then the -library calls the C free function to free the structure.
size_t dxpl_sizeThe size in bytes of the data transfer mode structure or zero if the driver supplies a copy -function or doesn't define the structure.
void *(*dxpl_copy)(const void *dxpl)An optional function which copies a driver-defined data transfer mode structure. This field -takes precedence over dxpl_size when both are defined.
void (*dxpl_free)(void *dxpl)An optional function to free the driver-defined data transfer mode structure. If null, then -the library calls the C free function to free the structure.
H5FD_t *(*open)(const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr)The function which opens or creates a new file.
herr_t (*close)(H5FD_t *file)The function which ends access to a file.
int (*cmp)(const H5FD_t *f1, const H5FD_t *f2)An optional function to determine whether two open files have the same key. If this function -is not present then the library assumes that two files will never be the same.
int (*query)(const H5FD_t *f, unsigned long *flags)An optional function to determine which library optimizations a driver can support.
haddr_t (*alloc)(H5FD_t *file, H5FD_mem_t type, hsize_t size)An optional function to allocate space in the file.
herr_t (*free)(H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size)An optional function to free space in the file.
haddr_t (*get_eoa)(H5FD_t *file)A function to query how much of the format address space has been allocated.
herr_t (*set_eoa)(H5FD_t *file, haddr_t)A function to set the end of address space.
haddr_t (*get_eof)(H5FD_t *file)A function to return the current end-of-file marker value.
herr_t (*read)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buffer)A function to read data from a file.
herr_t (*write)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buffer)A function to write data to a file.
herr_t (*flush)(H5FD_t *file)A function which flushes cached data to the file.
H5FD_mem_t fl_map[H5FD_MEM_NTYPES]An array which maps a file allocation request type to a free list.
- -Example: The sec2 driver would be registered as: -\code -static const H5FD_class_t H5FD_sec2_g = { - "sec2", /*name */ - MAXADDR, /*maxaddr */ - NULL, /*sb_size */ - NULL, /*sb_encode */ - NULL, /*sb_decode */ - 0, /*fapl_size */ - NULL, /*fapl_get */ - NULL, /*fapl_copy */ - NULL, /*fapl_free */ - 0, /*dxpl_size */ - NULL, /*dxpl_copy */ - NULL, /*dxpl_free */ - H5FD_sec2_open, /*open */ - H5FD_sec2_close, /*close */ - H5FD_sec2_cmp, /*cmp */ - H5FD_sec2_query, /*query */ - NULL, /*alloc */ - NULL, /*free */ - H5FD_sec2_get_eoa, /*get_eoa */ - H5FD_sec2_set_eoa, /*set_eoa */ - H5FD_sec2_get_eof, /*get_eof */ - H5FD_sec2_read, /*read */ - H5FD_sec2_write, /*write */ - H5FD_sec2_flush, /*flush */ - H5FD_FLMAP_SINGLE, /*fl_map */ -}; - -hid_t -H5FD_sec2_init(void) -{ - if (!H5FD_SEC2_g) { - H5FD_SEC2_g = H5FDregister(&H5FD_sec2_g); - } - return H5FD_SEC2_g; -} -\endcode - -A driver can be removed from the library by unregistering it - - - - - -
herr_t H5FDunregister (hid_t driver)Where driver is the ID number returned when the driver was registered.
-Unregistering a driver makes it unusable for creating new file access or data transfer property -lists but doesn't affect any property lists or files that already use that driver. - -\subsection subsec_vfl_reg_prog Programming Note for C++ Developers Using C Functions -If a C routine that takes a function pointer as an argument is called from within C++ code, -the C routine should be allowed to return normally. - -Examples of this kind of routine include callbacks such as #H5Pset_elink_cb -and #H5Pset_type_conv_cb and functions such as #H5Tconvert and #H5Ewalk2. - -Exiting the routine in its normal fashion allows the HDF5 C Library to clean up -its work properly. In other words, if the C++ application jumps out of the routine -back to the C++ “catch” statement, the library is not given the opportunity to close -any temporary data structures that were set up when the routine was called. The C++ -application should save some state as the routine is started so that any problem that -occurs might be diagnosed. - -\section sec_vfl_query Querying Driver Information - - - - -
void * H5Pget_driver_data (hid_t fapl)
void * H5Pget_driver_data (hid_t dxpl)
This function is intended to be used by driver functions, not applications. It returns a pointer -directly into the file access property list fapl which is a copy of the driver's file access mode -originally provided to the H5Pset_driver function. If its argument is a data transfer property list -dxpl then it returns a pointer to the driver-specific data transfer information instead. -
- -\section sec_vfl_misc Miscellaneous -The various private H5F_low_* functions will be replaced by public H5FD* functions so they -can be called from drivers. - -All private functions H5F_addr_* which operate on addresses will be renamed as public functions -by removing the first underscore so they can be called by drivers. - -The haddr_t address data type will be passed by value throughout the library. The original -intent was that this type would eventually be a union of file address types for the various -drivers and could become quite large, but that was back when drivers were part of HDF5. It will -become an alias for an unsigned integer type (32 or 64 bits depending on how the library was configured). - -The various H5F*.c driver files will be renamed H5FD*.c and each will have a corresponding header -file. All driver functions except the initializer and API will be declared static. - -This documentation didn't cover optimization functions which would be useful to drivers like MPI-IO. -Some drivers may be able to perform data pipeline operations more efficiently than HDF5 and need to -be given a chance to override those parts of the pipeline. The pipeline would be designed to call -various H5FD optimization functions at various points which return one of three values: the operation -is not implemented by the driver, the operation is implemented but failed in a non-recoverable manner, -or the operation is implemented and succeeded. - -Various parts of HDF5 check only the top-level file driver and do something special if it is -the MPI-IO driver. However, we might want to be able to put the MPI-IO driver under other drivers -such as the raw part of a split driver or under a debug driver whose sole purpose is to accumulate -statistics as it passes all requests through to the MPI-IO driver.
Therefore we will probably need -a function which takes a format address and/or object type and returns the driver which would have -been used at the lowest level to process the request. - -*/ - /** \page FMTDISC HDF5 File Format Discussion \htmlinclude FileFormat.html */ -/** \page APPDBG Debugging HDF5 Applications - -\section sec_adddbg_intro Introduction -The HDF5 library contains a number of debugging features to make programmers' lives -easier, including the ability to print detailed error messages, check invariant -conditions, and display timings and other statistics. - -\subsection subsec_adddbg_intro_err Error Messages -Error messages are normally displayed automatically on the standard error stream and -include a stack trace of the library including file names, line numbers, and function -names. The application has complete control over how error messages are displayed and -can disable the display on a permanent or temporary basis. Refer to the documentation - for the H5E error handling package. - -\subsection subsec_adddbg_intro_invar Invariant Conditions -Unless NDEBUG is defined during compiling, the library will include code to verify that -invariant conditions have the expected values. When a problem is detected the library will -display the file and line number within the library and the invariant condition that -failed. A core dump may be generated for post-mortem debugging. The code to perform these -checks can be included on a per-package basis. - -\subsection subsec_adddbg_intro_stats Timings and Statistics -The library can be configured to accumulate certain statistics about things like cache -performance, datatype conversion, data space conversion, and data filters. The code is -included on a per-package basis and enabled at runtime by an environment variable. - -\subsection subsec_adddbg_intro_trace API Tracing -All API calls made by an application can be displayed, including formal argument names, -actual values, and the function return value.
This code is also conditionally included -at compile time and enabled at runtime. - -The statistics and tracing can be displayed on any output stream (including streams opened by -the shell) with output from different packages even going to different streams. - -\section sec_adddbg_msg Error Messages -By default any API function that fails will print an error stack to the standard error stream. -\code -HDF5-DIAG: Error detected in thread 0. Back trace follows. - #000: H5F.c line 1245 in H5Fopen(): unable to open file - major(04): File interface - minor(10): Unable to open file - #001: H5F.c line 846 in H5F_open(): file does not exist - major(04): File interface - minor(10): Unable to open file -\endcode -The error handling package (H5E) is described elsewhere. - -\section sec_adddbg_invars Invariant Conditions -To include checks for invariant conditions the library should be configured -with --disable-production, the default for versions before 1.2. The library -designers have made every attempt to handle error conditions gracefully but -an invariant condition assertion may fail in certain cases. The output from -a failure usually looks something like this: -\code -Assertion failed: H5.c:123: i - -Name - -Default - -Description - - - -aNoAttributes - - -acYesMeta data cache - - -bYesB-Trees - - -dYesDatasets - - -eYesError handling - - -fYesFiles - - -gYesGroups - - -hgYesGlobal heap - - -hlNoLocal heaps - - -iYesInterface abstraction - - -mfNoFile memory management - - -mmYesLibrary memory management - - -oNoObject headers and messages - - -pYesProperty lists - - -sYesData spaces - - -tYesDatatypes - - -vYesVectors - - -zYesRaw data filters - - - -In addition to including the code at compile time the application must enable each package at -runtime. This is done by listing the package names in the HDF5_DEBUG environment variable. 
That -variable may also contain file descriptor numbers (the default is '2') which control the output -for all following packages up to the next file number. The word 'all' refers to all packages. Any -word my be preceded by a minus sign to turn debugging off for the package. - -\subsection subsec_adddbg_stats_sample Sample debug specifications - - - - - - - - - - - - - -
all -This causes debugging output from all packages to be sent to the standard error stream. -
all -t -s -Debugging output for all packages except datatypes and data spaces will appear on the standard error stream. -
-all ac 255 t,s -This disables all debugging, even if the default was to debug something; then output -from the metadata cache is sent to the standard error stream, and output from datatypes -and data spaces is sent to file descriptor 255, which should be redirected by the shell. -
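To make the HDF5_DEBUG rules concrete (package words, file descriptor numbers that apply to the words after them, the word `all`, and a leading minus sign), the following C sketch applies such a string from left to right. It is a hypothetical illustration, not the library's own parser: `parse_debug_spec`, `pkg_debug_t`, and `pkg_index` are invented names, the package list is copied from the table above, and the sketch assumes every package starts disabled. Words are runs of lowercase letters; any other character that is not part of a number separates them.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Package names from the debugging table (hypothetical copy for this sketch). */
static const char *const pkg_names[] = {
    "a", "ac", "b", "d", "e", "f", "g", "hg", "hl",
    "i", "mf", "mm", "o", "p", "s", "t", "v", "z"
};
#define NPKGS (sizeof pkg_names / sizeof pkg_names[0])

/* Hypothetical per-package result: enabled flag and output descriptor. */
typedef struct {
    bool enabled;
    int  fd; /* -1 when disabled */
} pkg_debug_t;

/* Return the index of the package named by w[0..len), or -1 if unknown. */
static int pkg_index(const char *w, size_t len)
{
    for (size_t i = 0; i < NPKGS; i++)
        if (strlen(pkg_names[i]) == len && strncmp(pkg_names[i], w, len) == 0)
            return (int)i;
    return -1;
}

static void set_pkg(pkg_debug_t *p, bool on, int fd)
{
    p->enabled = on;
    p->fd = on ? fd : -1;
}

/* Apply an HDF5_DEBUG-style specification, left to right. */
void parse_debug_spec(const char *spec, pkg_debug_t out[NPKGS])
{
    int fd = 2; /* assumed default output: standard error */

    for (size_t i = 0; i < NPKGS; i++)
        set_pkg(&out[i], false, fd);

    const char *p = spec;
    while (*p != '\0') {
        if (isdigit((unsigned char)*p)) {
            /* A number selects the descriptor for the words that follow it. */
            char *end;
            fd = (int)strtol(p, &end, 10);
            p = end;
            continue;
        }
        bool negate = false;
        if (*p == '-' && islower((unsigned char)p[1])) {
            negate = true; /* "-word" turns the package off */
            p++;
        }
        if (!islower((unsigned char)*p)) {
            p++; /* any other character is a separator */
            continue;
        }
        const char *word = p;
        while (islower((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - word);

        if (len == 3 && strncmp(word, "all", 3) == 0) {
            for (size_t i = 0; i < NPKGS; i++)
                set_pkg(&out[i], !negate, fd);
        } else {
            int idx = pkg_index(word, len);
            if (idx >= 0) /* unknown words are ignored in this sketch */
                set_pkg(&out[idx], !negate, fd);
        }
    }
}
```

With the third sample, `parse_debug_spec("-all ac 255 t,s", d)` leaves `ac` enabled on descriptor 2 and `t` and `s` enabled on descriptor 255, with every other package off, which matches the description of that specification.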
-The components of the HDF5_DEBUG value may be separated by any non-lowercase letter. - -*/ - -/** \page SWMR Introduction to Single-Writer/Multiple-Reader (SWMR) - -\section sec_swmr_intro Introduction to SWMR -The Single-Writer / Multiple-Reader (SWMR) feature enables multiple processes to read an HDF5 file -while it is being written to (by a single process) without using locks or requiring communication between processes. -tutr-swmr1.png - -All communication between processes must be performed via the HDF5 file. The HDF5 file under SWMR access must -reside on a system that complies with POSIX write() semantics. - -The basic engineering challenge for this to work was to ensure that the readers of an HDF5 file always -see a coherent (though possibly not up to date) HDF5 file. - -The issue is that when writing data there is information in the metadata cache in addition to the physical file on disk: -tutr-swmr2.png - -However, the readers can only see the state contained in the physical file: -tutr-swmr3.png - -The SWMR solution implements dependencies on when the metadata can be flushed to the file. This ensures that metadata cache -flush operations occur in the proper order, so that there will never be internal file pointers in the physical file -that point to invalid (unflushed) file addresses. - -A beneficial side effect of using SWMR access is better fault tolerance. It is more difficult to corrupt a file when using SWMR. - -\subsection subsec_swmr_doc Documentation -\subsubsection subsubsec_swmr_doc_guide User Guide -SWMR User Guide - -\subsubsection subsubsec_swmr_doc_apis HDF5 Library APIs -
    -
  • #H5Fstart_swmr_write — Enables SWMR writing mode for a file
  • -
  • #H5DOappend — Appends data to a dataset along a specified dimension
  • -
  • #H5Pset_object_flush_cb — Sets a callback function to invoke when an object flush occurs in the file
  • -
  • #H5Pget_object_flush_cb — Retrieves the object flush property values from the file access property list
  • -
  • #H5Odisable_mdc_flushes — Prevents metadata entries for an HDF5 object from being flushed from the metadata cache to storage
  • -
  • #H5Oenable_mdc_flushes — Enables flushing of dirty metadata entries from a file's metadata cache
  • -
  • #H5Oare_mdc_flushes_disabled — Determines if an HDF5 object has had flushes of metadata entries disabled
  • -
- -\subsubsection subsubsec_swmr_doc_tools Tools -\li h5watch — Outputs new records appended to a dataset as the dataset grows -\li h5format_convert — Converts the layout format version and chunked indexing types of datasets created with -HDF5-1.10 so that applications built with HDF5-1.8 can access them -\li h5clear — Clears superblock status_flags field, removes metadata cache image, prints EOA and EOF, or sets EOA of a file - -\subsubsection subsubsec_swmr_doc_design Design Documents - -\subsection subsec_swmr_model Programming Model -Please be aware that the SWMR feature requires that an HDF5 file be created with the latest file format. See -#H5Pset_libver_bounds for more information. - -To use SWMR, follow the general programming model for creating and accessing HDF5 files and objects along with the steps described below. - -\subsubsection subsubsec_swmr_model_writer SWMR Writer -The SWMR writer either opens an existing file and objects or creates them as follows. - -Open an existing file: -Call #H5Fopen using the #H5F_ACC_SWMR_WRITE flag. -Begin writing datasets. -Periodically flush data. - -Create a new file: -Call #H5Fcreate using the latest file format. -Create groups, datasets and attributes, and then close the attributes. -Call #H5Fstart_swmr_write to start SWMR access to the file. -Periodically flush data. - 

Example Code:

-Create the file using the latest file format property: -\code - fapl = H5Pcreate (H5P_FILE_ACCESS); - status = H5Pset_libver_bounds (fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST); - fid = H5Fcreate (filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl); - // Create objects (groups, datasets, ...). - // Close any attributes and named datatype objects. - // Groups and datasets may remain open before starting SWMR access to them. - - // Start SWMR access to the file: - status = H5Fstart_swmr_write (fid); - - // Reopen the datasets and then start writing, periodically flushing data: - status = H5Dwrite (dset_id, ...); - status = H5Dflush (dset_id); -\endcode - -\subsubsection subsubsec_swmr_model_reader SWMR Reader -The SWMR reader must continually poll for new data: - -Call #H5Fopen using the #H5F_ACC_SWMR_READ flag. -Poll, checking the size of the dataset to see if there is new data available for reading. -Read new data, if any. - 

Example Code:

-\code - // Open the file using the SWMR read flag: - fid = H5Fopen (filename, H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT); - // Open the dataset and then repeatedly poll the dataset, by getting the dimensions, reading new data, and refreshing: - dset_id = H5Dopen (...); - space_id = H5Dget_space (...); - while (...) { - status = H5Dread (dset_id, ...); - status = H5Drefresh (dset_id); - space_id = H5Dget_space (...); - } -\endcode - -\subsection subsec_swmr_scope Limitations and Scope -An HDF5 file under SWMR access must reside on a system that complies with POSIX write() -semantics. It is also limited in scope as follows. - -The writer process is only allowed to modify raw data of existing datasets by: -Appending data along any unlimited dimension. -Modifying existing data. -The following operations are not allowed (attempting them will fail): -\li The writer cannot add new objects to the file. -\li The writer cannot delete objects in the file. -\li The writer cannot modify or append data with variable-length, string, or region reference datatypes. -\li File space recycling is not allowed. As a result, the size of a file modified by a SWMR writer may be larger than a file modified by a non-SWMR writer.

- -\subsection subsec_swmr_tools Tools for Working with SWMR -Two new tools, h5watch and h5clear, are available for use with SWMR. The other HDF5 utilities have also been modified to recognize SWMR. -\li The h5watch tool allows a user to monitor the growth of a dataset. -\li The h5clear tool clears the status flags in the superblock of an HDF5 file. -\li The rest of the HDF5 tools will exit gracefully but not work with SWMR otherwise. - -\subsection subsec_swmr_example Programming Example -A good example of using SWMR is included with the HDF5 tests in the source code. You can run it while reading -the file it creates. If you then interrupt the application and reader and look at the resulting file, you will -see that the file is still valid. Follow these steps: -\li Download the HDF5 source code to a local directory on a filesystem (that complies with POSIX write() semantics). -Build the software. No special configuration options are needed to use SWMR. -\li Open two terminal windows. In one window, go into the bin directory of the built binaries. -In the other window, go into the test directory of the HDF5-1.10 source code that was just built. -\li In the window in the test directory, compile and run use_append_chunk.c. The example writes a -three-dimensional dataset by planes (with chunks of size 1 x 256 x 256). -\li In the other window (in the bin directory), run h5watch on the file created by -use_append_chunk.c (use_append_chunk.h5). It should be run while use_append_chunk is executing and you -will see valid data displayed with h5watch. -\li Interrupt use_append_chunk while it is running, and stop h5watch. -\li Use h5clear to clear the status flags in the superblock of the HDF5 file (use_append_chunk.h5). -\li View the file with h5dump. You will see that it is a valid file even though the application did not -close properly. It will contain data up to the point that it was interrupted.
- -*/ - -/** \page UNICODE Using UTF-8 Encoding in HDF5 Applications - -\section sec_unicode_intro Introduction -Text and character data are often discussed as though text means ASCII text. We even go so far as -to call a file containing only ASCII text a plain text file. This works reasonably well for English -(though better for American English than British English), but what if that plain text file is in -French, German, Chinese, or any of several hundred other languages? This document introduces the -use of UTF-8 encoding (see note 1), enabling the use of a much more extensive and flexible character -set that can faithfully represent any of those languages. - -This document assumes a working familiarity with UTF-8 and Unicode. Any reader who is unfamiliar -with UTF-8 encoding should read the [Wikipedia UTF-8 article](https://en.wikipedia.org/wiki/UTF-8) -before proceeding; it provides an excellent primer. - -For our context, the most important UTF-8 concepts are: -\li Multi-byte and variable-size character encodings -\li Limitations of the ASCII character set -\li Risks associated with the use of the term plain text -\li Representation of multiple language alphabets or characters in a single document - -More specific technical details will only become important if they affect the specifics of -your application design or implementation. - -\section sec_unicode_support How and Where Is UTF-8 Supported in HDF5? -HDF5 uses characters in object names (which are actually link names, but that's a story for a -different article), dataset raw data, attribute names, and attribute raw data. Though the -mechanisms differ, you can use either ASCII or UTF-8 character sets in all of these situations. - -\subsection subsec_unicode_support_names Object and Attribute Names -By default, HDF5 creates object and attribute names with ASCII character encoding. An object or -attribute creation property list setting is required to create object names with UTF-8 characters. 
-This uses the function #H5Pset_char_encoding, which sets the character encoding used for object and attribute names. - -For example, the following call sequence could be used to create a dataset with its name encoded with the UTF-8 character set: - -\code - lcpl_id = H5Pcreate(H5P_LINK_CREATE) ; - error = H5Pset_char_encoding(lcpl_id, H5T_CSET_UTF8) ; - dataset_id = H5Dcreate2(group_id, "datos_ñ", datatype_id, dataspace_id, - lcpl_id, H5P_DEFAULT, H5P_DEFAULT) ; -\endcode - -If the character encoding of an attribute name is unknown, the combination of an -#H5Aget_create_plist call and an #H5Pget_char_encoding call will reveal that information. -If the character encoding of an object name is unknown, the information can be accessed -through the cset field of the link's H5L_info_t structure, which can be obtained using #H5Lvisit or #H5Lget_info_by_idx calls. - -\subsection subsec_unicode_support_char Character Datatypes in Datasets and Attributes -Like object names, HDF5 character data in datasets and attributes is encoded as ASCII by -default. Setting up attribute or dataset character data to be UTF-8-encoded is accomplished -while defining the attribute or dataset datatype. This makes use of the function #H5Tset_cset, -which sets the character encoding to be used in building a character datatype. - -For example, the following commands could be used to create a UTF-8 encoded string datatype of 8 bytes -(8 ASCII characters, or fewer characters if any are multi-byte) for use in either an attribute or dataset: - -\code - datatype_id = H5Tcopy(H5T_C_S1); - error = H5Tset_cset(datatype_id, H5T_CSET_UTF8); - error = H5Tset_size(datatype_id, 8); -\endcode - -If a character or string datatype's character encoding is unknown, the combination of an -#H5Aget_type or #H5Dget_type call and an #H5Tget_cset call can be used to determine that.
- -\section sec_unicode_warn Caveats, Pitfalls, and Things to Watch For -Programmers who are accustomed to using ASCII text without accommodating other text -encodings will have to be aware of certain common issues as they begin using UTF-8 encodings. - -\subsection subsec_unicode_warn_port Cross-platform Portability -Since the HDF5 Library handles datatypes directly, UTF-8 encoded text in dataset and -attribute datatypes in a well-designed HDF5 application and file should work transparently -across platforms. The same should be true of handling names of groups, datasets, committed -datatypes, and attributes within a file. - -Be aware, however, of system or application limitations once data or other information -has been extracted from an HDF5 file. The application or system must be designed to -accommodate UTF-8 encodings if the information is then used elsewhere in the application or system environment. - -Data from a UTF-8 encoded HDF5 datatype, in either a dataset or an attribute, -that has been established within an HDF5 application should "just work" within the HDF5 portions of the application. - -\subsection subsec_unicode_warn_names Filenames -Since file access is a system issue, filenames do not fall within the scope -of HDF5's UTF-8 capabilities; filenames are encoded at the system level. - -Linux and Mac OS systems normally handle UTF-8 encoded filenames correctly -while Windows systems generally do not. - -\section sec_unicode_text The *Plain Text* Illusion -Beware the use of the term *plain text*. *Plain text* is at best ambiguous, but often -misleading. Many will assume that *plain text* means ASCII, but plain text German or -French, for example, cannot be represented in ASCII. Plain text is only unambiguous -in the context of English (and even then can be problematic!). 
- -\subsection subsec_unicode_warn_store Storage Size -Programmers and data users accustomed to working strictly with ASCII data generally make -the reasonable assumption that 1 character, be it in an object name or in data, requires -1 byte of storage. This equation does not work when using UTF-8 or any other Unicode encoding. -With Unicode encoding, number of characters is not synonymous with number of bytes. One must -get used to thinking in terms of number of characters when talking about content, reserving -number of bytes for discussions of storage size. - -When working with Unicode text, one can no longer assume a 1:1 correspondence between the -number of characters and the data storage requirement. - -\subsection subsec_unicode_warn_sys System Dependencies -Linux, Unix, and similar systems generally handle UTF-8 encodings in correct and -predictable ways. There is an apparent consensus in the Linux community that "UTF-8 is just the right way to go." - -Mac OS systems generally handle UTF-8 encodings correctly. - -Windows systems use a different Unicode encoding, UTF-16 (historically UCS-2), at -the system level. Within an HDF5 file and application on a Windows system, UTF-8 encoding should -work correctly and as expected. Problems may arise, however, when that UTF-8 encoding is exposed -directly to the Windows system. For example: -\li File open and close calls on files with UTF-8 encoded names are likely to fail as the HDF5 -open and close operations interact directly with the Windows file system interface. -\li Any time an HDF5 command-line utility (\ref H5TOOL_LS_UG or \ref H5TOOL_DP_UG, for example) emits text output, the -Windows system must interpret the character encodings. If that output is UTF-8 encoded, Windows -will correctly interpret only those characters in the ASCII subset of UTF-8.
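The Storage Size point, that characters and bytes diverge under UTF-8, can be shown with a few lines of standard C. The sketch below is illustrative only (`utf8_char_count` is a hypothetical helper, not an HDF5 function) and assumes its input is already valid UTF-8. It counts characters by skipping continuation bytes, which always have the bit pattern 10xxxxxx, while `strlen` keeps reporting bytes.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters (code points) in a NUL-terminated string.
 * Assumes valid UTF-8: every byte except a continuation byte
 * (0x80..0xBF, bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

For the dataset name "datos_ñ", written here with escapes as `"datos_\xC3\xB1"`, `strlen` reports 8 bytes while `utf8_char_count` reports 7 characters, because ñ (U+00F1) is stored as the two bytes 0xC3 0xB1. A fixed-size string datatype sized by character count would therefore be one byte too small.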
- -\section sec_unicode_common Common Characters in UTF-8 and ASCII -One interesting feature of UTF-8 and ASCII is that the ASCII character set is a strict subset of -the UTF-8 character set. Where they overlap, the encodings are identical. This means that a -character string consisting entirely of members of the ASCII character set can be encoded in either -ASCII or UTF-8, the two encodings will be indistinguishable, and the encodings will require exactly the same storage space. - - -\section sec_unicode_also See Also - -- For object and attribute names: - * #H5Pset_char_encoding - * #H5Pget_char_encoding -- For dataset and attribute datatypes: - * #H5Tset_cset - * #H5Tget_cset -- [UTF-8 article on Wikipedia](https://en.wikipedia.org/wiki/UTF-8) - 

NOTES

-1. UTF-8 is the only Unicode standard encoding supported in HDF5. - -*/ - -/** \page VDS Introduction to the Virtual Dataset - VDS - -\section sec_vds_intro Introduction to VDS -The HDF5 Virtual Dataset (VDS) feature enables users to access data in a collection of HDF5 files as a -single HDF5 dataset and to use the HDF5 APIs to work with that dataset. - -For example, your data may be collected into four files: -tutrvds-multimgs.png - -You can map the datasets in the four files into a single VDS that can be accessed just like any other dataset: -tutrvds-snglimg.png - -The mapping between a VDS and the HDF5 source datasets is persistent and transparent to an application. If a source -file is missing, the fill value will be displayed. - -See the Virtual (VDS) Documentation for complete details regarding the VDS feature. - -The VDS feature was implemented using hyperslab selection (#H5Sselect_hyperslab). See the tutorial on -Reading From or Writing to a Subset of a Dataset for more information on selecting hyperslabs. - -\subsection subsec_vds_intro_model Programming Model -To create a Virtual Dataset, you simply follow the HDF5 programming model and add a few additional API calls -to map the source datasets to the VDS. - -Following are the steps for creating a Virtual Dataset: -\li Create the source datasets that will comprise the VDS -\li Create the VDS: define a datatype and dataspace (can be unlimited) -\li Define the dataset creation property list (including fill value) -\li (Repeat for each source dataset) Map elements from the source dataset to elements of the VDS -\li Select elements in the source dataset (source selection) -\li Select elements in the virtual dataset (destination selection) -\li Map destination selections to source selections (see Functions for Working with a VDS) -\li Call H5Dcreate using the properties defined above -\li Access the VDS as a regular HDF5 dataset -\li Close the VDS when finished - 

Functions for Working with a VDS

-The #H5Pset_virtual API sets the mapping between virtual and source datasets. This is a dataset creation property. -Using this API will change the layout of the dataset to #H5D_VIRTUAL. As with specifying any dataset creation property -list, an instance of the property list is created, modified, passed into the dataset creation call and then closed: -\code - dcpl = H5Pcreate (H5P_DATASET_CREATE); - src_space = H5Screate_simple ... - status = H5Sselect_hyperslab (space, ... - status = H5Pset_virtual (dcpl, space, SRC_FILE[i], SRC_DATASET[i], src_space); - dset = H5Dcreate2 (file, DATASET, H5T_NATIVE_INT, space, H5P_DEFAULT, dcpl, H5P_DEFAULT); - status = H5Pclose (dcpl); -\endcode - -There are several other APIs introduced with Virtual Datasets, including query functions. For details, -see the complete list of HDF5 library APIs that support Virtual Datasets. - 

Limitations

-This feature was introduced in HDF5-1.10. - -The number of source datasets is unlimited. However, there is a limit on the size of each source dataset. - -\subsection subsec_vds_intro_examples Programming Examples -Example 1 -This example creates three HDF5 files, each with a one-dimensional dataset of 6 elements. The datasets in these files -are the source datasets that are then used to create a 4 x 6 Virtual Dataset with a fill value of -1. The first three -rows of the VDS are mapped to the data from the three source datasets as shown below: -tutrvds-ex.png - -In this example, the three source datasets are mapped to the VDS with this code: -\code -src_space = H5Screate_simple (RANK1, dims, NULL); -for (i = 0; i < 3; i++) { - start[0] = (hsize_t)i; - // Select i-th row in the virtual dataset; selection in the source datasets is the same. - status = H5Sselect_hyperslab (space, H5S_SELECT_SET, start, NULL, count, block); - status = H5Pset_virtual (dcpl, space, SRC_FILE[i], SRC_DATASET[i], src_space); -} -\endcode - -After the VDS is created and closed, it is reopened. The property list is then queried to determine the -layout of the dataset and its mappings, and the data in the VDS is read and printed. - -This example is in the HDF5 source code and can be obtained from here: -

C Example

-For details on compiling an HDF5 application: [ Compiling HDF5 Applications ] - -
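What a reader of the Example 1 VDS sees can be modeled in plain C, without HDF5, to make the mapping and the fill value concrete. In this hypothetical sketch (`build_vds_view` and the `present` flags are invented for illustration; HDF5 resolves the mapping internally), each source dataset is an array of 6 ints, source i supplies row i of the 4 x 6 view, and the unmapped fourth row, or any row whose source is marked missing, takes the fill value -1, mirroring the behavior described for missing source files.

```c
#include <stdbool.h>
#include <stddef.h>

#define VDS_ROWS   4
#define VDS_COLS   6
#define NSRC       3
#define FILL_VALUE (-1)

/* Build the 4 x 6 view: row r comes from source r when that source
 * exists; every other element takes the fill value. */
void build_vds_view(int src[NSRC][VDS_COLS], const bool present[NSRC],
                    int vds[VDS_ROWS][VDS_COLS])
{
    for (size_t r = 0; r < VDS_ROWS; r++) {
        /* Short-circuit keeps present[] in bounds for the unmapped row. */
        bool mapped = (r < NSRC) && present[r];
        for (size_t c = 0; c < VDS_COLS; c++)
            vds[r][c] = mapped ? src[r][c] : FILL_VALUE;
    }
}
```

With all three sources present, rows 0 through 2 carry the source values and row 3 is entirely -1; marking one source as missing turns its row into fill values as well.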

Example 2

-This example shows how to use a C-style printf statement for specifying multiple source datasets as one virtual -dataset. Only one mapping is required. In other words only one #H5Pset_virtual call is needed to map multiple datasets. -It creates a 2-dimensional unlimited VDS. Then it re-opens the file, makes queries, and reads the virtual dataset. - -The source datasets are specified as A-0, A-1, A-2, and A-3. These are mapped to the virtual dataset with one call: -\code -status = H5Pset_virtual (dcpl, vspace, SRCFILE, "A-%b", src_space); -\endcode - -The %b indicates that the block count of the selection in the dimension should be used. - -
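The naming rule behind "A-%b" can be sketched as plain string expansion. The function below is hypothetical and only models the rule, assuming that %b is replaced by a decimal block index and that %% stands for a literal percent sign (see the #H5Pset_virtual reference for the authoritative rules); HDF5 performs this substitution itself when it resolves the mapping.

```c
#include <stdio.h>
#include <string.h>

/* Expand a VDS-style source name pattern for one block index.
 * "%b" becomes the decimal block index, "%%" becomes '%', and every
 * other character is copied unchanged. Returns 0 on success, or -1
 * if the result (plus its terminating NUL) does not fit in out. */
int expand_vds_name(const char *pattern, unsigned long block,
                    char *out, size_t outsz)
{
    size_t n = 0; /* bytes written so far, excluding the NUL */
    for (const char *p = pattern; *p != '\0'; p++) {
        char num[24];
        const char *piece;
        size_t len = 1;
        if (p[0] == '%' && p[1] == 'b') {
            len = (size_t)snprintf(num, sizeof num, "%lu", block);
            piece = num;
            p++; /* skip the 'b' */
        } else if (p[0] == '%' && p[1] == '%') {
            piece = "%";
            p++; /* collapse "%%" to a single '%' */
        } else {
            piece = p; /* ordinary character */
        }
        if (n + len >= outsz) /* keep room for the NUL */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    if (outsz == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Expanding "A-%b" for blocks 0 through 3 yields A-0, A-1, A-2, and A-3, the four source dataset names used in Example 2.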

C Example

-For details on compiling an HDF5 application: [ Compiling HDF5 Applications ] - -Using h5dump with a VDS -The h5dump utility can be used to view a VDS. The h5dump output for a VDS looks exactly like that for any other dataset. -If h5dump cannot find a source dataset, the fill value will be displayed. - -You can determine that a dataset is a VDS by looking at its properties with -\code - h5dump -p -\endcode - It will display each source dataset mapping, beginning with Mapping 0. Below is an excerpt of the output of -\code - h5dump -p -\endcode -on the vds.h5 file created in Example 1. You can see that the entire source file a.h5 is mapped to the first row of the VDS dataset. - -tutrvds-map.png - -*/ - diff --git a/doxygen/dox/Unicode.dox b/doxygen/dox/Unicode.dox new file mode 100644 index 00000000000..fbb83f635b8 --- /dev/null +++ b/doxygen/dox/Unicode.dox @@ -0,0 +1,144 @@ + +/** \page UNICODE Using UTF-8 Encoding in HDF5 Applications + +\section sec_unicode_intro Introduction +Text and character data are often discussed as though text means ASCII text. We even go so far as +to call a file containing only ASCII text a plain text file. This works reasonably well for English +(though better for American English than British English), but what if that plain text file is in +French, German, Chinese, or any of several hundred other languages? This document introduces the +use of UTF-8 encoding (see note 1), enabling the use of a much more extensive and flexible character +set that can faithfully represent any of those languages. + +This document assumes a working familiarity with UTF-8 and Unicode. Any reader who is unfamiliar +with UTF-8 encoding should read the [Wikipedia UTF-8 article](https://en.wikipedia.org/wiki/UTF-8) +before proceeding; it provides an excellent primer.
+ +For our context, the most important UTF-8 concepts are: +\li Multi-byte and variable-size character encodings +\li Limitations of the ASCII character set +\li Risks associated with the use of the term plain text +\li Representation of multiple language alphabets or characters in a single document + +More specific technical details will only become important if they affect the specifics of +your application design or implementation. + +\section sec_unicode_support How and Where Is UTF-8 Supported in HDF5? +HDF5 uses characters in object names (which are actually link names, but that's a story for a +different article), dataset raw data, attribute names, and attribute raw data. Though the +mechanisms differ, you can use either ASCII or UTF-8 character sets in all of these situations. + +\subsection subsec_unicode_support_names Object and Attribute Names +By default, HDF5 creates object and attribute names with ASCII character encoding. A link or +attribute creation property list setting is required to create object or attribute names with UTF-8 characters. +This uses the function #H5Pset_char_encoding, which sets the character encoding used for object and attribute names. + +For example, the following call sequence could be used to create a dataset with its name encoded with the UTF-8 character set: + +\code + lcpl_id = H5Pcreate(H5P_LINK_CREATE) ; + error = H5Pset_char_encoding(lcpl_id, H5T_CSET_UTF8) ; + dataset_id = H5Dcreate2(group_id, "datos_ñ", datatype_id, dataspace_id, + lcpl_id, H5P_DEFAULT, H5P_DEFAULT) ; +\endcode + +If the character encoding of an attribute name is unknown, the combination of an +#H5Aget_create_plist call and an #H5Pget_char_encoding call will reveal that information. +If the character encoding of an object name is unknown, the information can be accessed +through the cset field of the link's H5L_info_t structure, which can be obtained using #H5Lvisit or #H5Lget_info_by_idx calls.
+ +\subsection subsec_unicode_support_char Character Datatypes in Datasets and Attributes +Like object names, HDF5 character data in datasets and attributes is encoded as ASCII by +default. Setting up attribute or dataset character data to be UTF-8-encoded is accomplished +while defining the attribute or dataset datatype. This makes use of the function #H5Tset_cset, +which sets the character encoding to be used in building a character datatype. + +For example, the following commands could be used to create a UTF-8 encoded string datatype of 8 bytes +(8 ASCII characters, or fewer characters if any are multi-byte) for use in either an attribute or dataset: + +\code + datatype_id = H5Tcopy(H5T_C_S1); + error = H5Tset_cset(datatype_id, H5T_CSET_UTF8); + error = H5Tset_size(datatype_id, 8); +\endcode + +If a character or string datatype's character encoding is unknown, the combination of an +#H5Aget_type or #H5Dget_type call and an #H5Tget_cset call can be used to determine that. + +\section sec_unicode_warn Caveats, Pitfalls, and Things to Watch For +Programmers who are accustomed to using ASCII text without accommodating other text +encodings will have to be aware of certain common issues as they begin using UTF-8 encodings. + +\subsection subsec_unicode_warn_port Cross-platform Portability +Since the HDF5 Library handles datatypes directly, UTF-8 encoded text in dataset and +attribute datatypes in a well-designed HDF5 application and file should work transparently +across platforms. The same should be true of handling names of groups, datasets, committed +datatypes, and attributes within a file. + +Be aware, however, of system or application limitations once data or other information +has been extracted from an HDF5 file. The application or system must be designed to +accommodate UTF-8 encodings if the information is then used elsewhere in the application or system environment.
+ +Data from a UTF-8 encoded HDF5 datatype, in either a dataset or an attribute, +that has been established within an HDF5 application should "just work" within the HDF5 portions of the application. + +\subsection subsec_unicode_warn_names Filenames +Since file access is a system issue, filenames do not fall within the scope +of HDF5's UTF-8 capabilities; filenames are encoded at the system level. + +Linux and Mac OS systems normally handle UTF-8 encoded filenames correctly, +while Windows systems generally do not. + +\section sec_unicode_text The *Plain Text* Illusion +Beware the use of the term *plain text*. *Plain text* is ambiguous at best and often +misleading. Many will assume that *plain text* means ASCII, but plain text German or +French, for example, cannot be represented in ASCII. Plain text is only unambiguous +in the context of English (and even then can be problematic!). + +\subsection subsec_unicode_warn_store Storage Size +Programmers and data users accustomed to working strictly with ASCII data generally make +the reasonable assumption that 1 character, be it in an object name or in data, requires +1 byte of storage. This assumption does not hold when using UTF-8 or any other Unicode encoding. +With Unicode encodings, the number of characters is not synonymous with the number of bytes. One must +get used to thinking in terms of number of characters when talking about content, reserving +number of bytes for discussions of storage size. + +When working with Unicode text, one can no longer assume a 1:1 correspondence between the +number of characters and the data storage requirement. + +\subsection subsec_unicode_warn_sys System Dependencies +Linux, Unix, and similar systems generally handle UTF-8 encodings in correct and +predictable ways. There is an apparent consensus in the Linux community that "UTF-8 is just the right way to go." + +Mac OS systems generally handle UTF-8 encodings correctly.
+ +Windows systems use a different Unicode encoding, UCS-2 (the predecessor of UTF-16), at +the system level. Within an HDF5 file and application on a Windows system, UTF-8 encoding should +work correctly and as expected. Problems may arise, however, when that UTF-8 encoding is exposed +directly to the Windows system. For example: +\li File open and close calls on files with UTF-8 encoded names are likely to fail, as the HDF5 +open and close operations interact directly with the Windows file system interface. +\li Any time an HDF5 command-line utility (\ref H5TOOL_LS_UG or \ref H5TOOL_DP_UG, for example) emits text output, the +Windows system must interpret the character encodings. If that output is UTF-8 encoded, Windows +will correctly interpret only those characters in the ASCII subset of UTF-8. + +\section sec_unicode_common Common Characters in UTF-8 and ASCII +One interesting feature of UTF-8 and ASCII is that the ASCII character set is a subset of +the Unicode character set that UTF-8 encodes, and where they overlap, the encodings are identical. This means that a +character string consisting entirely of members of the ASCII character set can be encoded in either +ASCII or UTF-8; the two encodings will be indistinguishable and will require exactly the same storage space. + + +\section sec_unicode_also See Also + +- For object and attribute names: + * #H5Pset_char_encoding + * #H5Pget_char_encoding +- For dataset and attribute datatypes: + * #H5Tset_cset + * #H5Tget_cset +- [UTF-8 article on Wikipedia](https://en.wikipedia.org/wiki/UTF-8) + +

NOTES

+1. UTF-8 is the only Unicode standard encoding supported in HDF5. + +*/ diff --git a/doxygen/dox/VDSTechNote.dox b/doxygen/dox/VDSTechNote.dox new file mode 100644 index 00000000000..9bb6786a132 --- /dev/null +++ b/doxygen/dox/VDSTechNote.dox @@ -0,0 +1,115 @@ + +/** \page VDSTN Introduction to the Virtual Dataset - VDS + +\section sec_vds_intro Introduction to VDS +The HDF5 Virtual Dataset (VDS) feature enables users to access data in a collection of HDF5 files as a +single HDF5 dataset and to use the HDF5 APIs to work with that dataset. + +For example, your data may be collected into four files: +tutrvds-multimgs.png + +You can map the datasets in the four files into a single VDS that can be accessed just like any other dataset: +tutrvds-snglimg.png + +The mapping between a VDS and the HDF5 source datasets is persistent and transparent to an application. If a source +file is missing, the fill value will be displayed. + +See the Virtual (VDS) Documentation for complete details regarding the VDS feature. + +The VDS feature was implemented using hyperslab selection (#H5Sselect_hyperslab). See the tutorial on +Reading From or Writing to a Subset of a Dataset for more information on selecting hyperslabs. + +\subsection subsec_vds_intro_model Programming Model +To create a Virtual Dataset you simply follow the HDF5 programming model and add a few additional API calls +to map the source datasets to the VDS.
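The effect of such a mapping can be pictured with a small plain-C model of the scenario in Example 1 below: three 6-element source datasets mapped to the first three rows of a 4 x 6 VDS whose fill value is -1. This is only a model of the values that a read of the whole VDS returns. It is written without HDF5, it is not how the library implements virtual datasets, and all names in it are hypothetical.

```c
#include <stddef.h>

#define SRC_COUNT 3    /* three source datasets, as in Example 1 */
#define VDS_ROWS  4    /* the VDS is 4 x 6 */
#define VDS_COLS  6    /* each source dataset is 1-D with 6 elements */
#define FILL      (-1) /* fill value for VDS elements with no mapping */

/* Model (not HDF5 code) of reading the whole VDS: row r of the virtual
 * dataset is mapped to source dataset r, and every unmapped element
 * (here, the whole last row) reads back as the fill value. */
void model_vds_read(int src[SRC_COUNT][VDS_COLS], int out[VDS_ROWS][VDS_COLS])
{
    for (size_t r = 0; r < VDS_ROWS; ++r) {
        for (size_t c = 0; c < VDS_COLS; ++c) {
            out[r][c] = (r < SRC_COUNT) ? src[r][c] : FILL;
        }
    }
}
```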
+ +Following are the steps for creating a Virtual Dataset: +\li Create the source datasets that will comprise the VDS +\li Create the VDS: define a datatype and dataspace (can be unlimited) +\li Define the dataset creation property list (including fill value) +\li (Repeat for each source dataset) Map elements from the source dataset to elements of the VDS: +\li Select elements in the source dataset (source selection) +\li Select elements in the virtual dataset (destination selection) +\li Map destination selections to source selections (see Functions for Working with a VDS) +\li Call #H5Dcreate2 using the properties defined above +\li Access the VDS as a regular HDF5 dataset +\li Close the VDS when finished + +

Functions for Working with a VDS

+The #H5Pset_virtual API sets the mapping between virtual and source datasets. This is a dataset creation property. +Using this API will change the layout of the dataset to #H5D_VIRTUAL. As with specifying any dataset creation property +list, an instance of the property list is created, modified, passed into the dataset creation call and then closed: +\code + dcpl = H5Pcreate (H5P_DATASET_CREATE); + src_space = H5Screate_simple ... + status = H5Sselect_hyperslab (space, ... + status = H5Pset_virtual (dcpl, space, SRC_FILE[i], SRC_DATASET[i], src_space); + dset = H5Dcreate2 (file, DATASET, H5T_NATIVE_INT, space, H5P_DEFAULT, dcpl, H5P_DEFAULT); + status = H5Pclose (dcpl); +\endcode + +There are several other APIs introduced with Virtual Datasets, including query functions. For details, +see the complete list of HDF5 library APIs that support Virtual Datasets. + +

Limitations

+This feature was introduced in HDF5-1.10. + +The number of source datasets is unlimited. However, there is a limit on the size of each source dataset. + +\subsection subsec_vds_intro_examples Programming Examples +Example 1 +This example creates three HDF5 files, each with a one-dimensional dataset of 6 elements. The datasets in these files +are the source datasets that are then used to create a 4 x 6 Virtual Dataset with a fill value of -1. The first three +rows of the VDS are mapped to the data from the three source datasets as shown below: +tutrvds-ex.png + +In this example, the three source datasets are mapped to the VDS with this code: +\code +src_space = H5Screate_simple (RANK1, dims, NULL); +for (i = 0; i < 3; i++) { + start[0] = (hsize_t)i; + // Select i-th row in the virtual dataset; selection in the source datasets is the same. + status = H5Sselect_hyperslab (space, H5S_SELECT_SET, start, NULL, count, block); + status = H5Pset_virtual (dcpl, space, SRC_FILE[i], SRC_DATASET[i], src_space); +} +\endcode + +After the VDS is created and closed, it is reopened. The property list is then queried to determine the +layout of the dataset and its mappings, and the data in the VDS is read and printed. + +This example is in the HDF5 source code and can be obtained from here: +

C Example

+For details on compiling an HDF5 application: [ Compiling HDF5 Applications ] + +

Example 2

+This example shows how to use a C printf-style format string to specify multiple source datasets as one virtual +dataset. Only one mapping is required; in other words, only one #H5Pset_virtual call is needed to map multiple datasets. +It creates a 2-dimensional unlimited VDS. Then it re-opens the file, makes queries, and reads the virtual dataset. + +The source datasets are specified as A-0, A-1, A-2, and A-3. These are mapped to the virtual dataset with one call: +\code +status = H5Pset_virtual (dcpl, vspace, SRCFILE, "A-%b", src_space); +\endcode + +The %b is replaced by the block number of the selection in the unlimited dimension, which produces the names A-0, A-1, and so on. + +

C Example

+For details on compiling an HDF5 application: [ Compiling HDF5 Applications ] + +Using h5dump with a VDS +The h5dump utility can be used to view a VDS. The h5dump output for a VDS looks exactly like that for any other dataset. +If h5dump cannot find a source dataset, then the fill value will be displayed. + +You can determine that a dataset is a VDS by looking at its properties with +\code + h5dump -p +\endcode +It will display each source dataset mapping, beginning with Mapping 0. Below is an excerpt of the output of +\code + h5dump -p +\endcode +on the vds.h5 file created in Example 1. You can see that the entire source file a.h5 is mapped to the first row of the VDS dataset. + +tutrvds-map.png + +*/ diff --git a/doxygen/dox/VFLTechNote.dox b/doxygen/dox/VFLTechNote.dox new file mode 100644 index 00000000000..130259baf9c --- /dev/null +++ b/doxygen/dox/VFLTechNote.dox @@ -0,0 +1,1025 @@ + +/** \page VFLTN HDF5 Virtual File Layer + +\section sec_vfl_intro Introduction +The HDF5 file format describes how HDF5 data structures and dataset raw data are mapped +to a linear format address space, and the HDF5 library implements that bidirectional mapping +in terms of an API. However, the HDF5 format specifications do not indicate how the format +address space is mapped onto storage, and HDF5 (version 1.2 and earlier) simply mapped the format +address space directly onto a single file by convention. + +Early in the history of HDF5 it became apparent that users wanted the ability to map the +format address space onto different types of storage (a single file, multiple files, local +memory, global memory, network distributed global memory, a network protocol, etc.) with +various types of maps. For instance, some users want to be able to handle very large format +address spaces on operating systems that support only 2GB files by partitioning the format +address space into equal-sized parts each served by a separate file.
Other users want the +same multi-file storage capability but want to partition the address space according to +purpose (raw data in one file, object headers in another, global heap in a third, etc.) +in order to improve I/O speeds. + +In fact, the number of storage variations is probably larger than the number of methods +that the HDF5 team is capable of implementing and supporting. Therefore, a Virtual File +Layer API is being implemented which will allow application teams or departments to design +and implement their own mapping between the HDF5 format address space and storage, with each +mapping being a separate file driver (possibly written in terms of other file drivers). The +HDF5 team will provide a small set of useful file drivers which will also serve as examples +for those who wish to write their own: + + + + + + + + + + + + + + + + +
#H5FD_SEC2This is the default driver which uses Posix file-system functions +like read and write to perform I/O to a single file. All I/O requests are unbuffered +although the driver does optimize file seeking operations to some extent. +
#H5FD_STDIOThis driver uses functions from 'stdio.h' to perform buffered I/O to a single file. +
#H5FD_COREThis driver performs I/O directly to memory and can be +used to create small temporary files that never exist on permanent storage. This +type of storage is generally very fast since the I/O consists only of memory-to-memory copy operations. +
#H5FD_MPIOThis is the driver of choice for accessing files in parallel +using MPI and MPI-IO. It is only predefined if the library is compiled with parallel I/O support. +
#H5FD_FAMILYLarge format address spaces are partitioned into more +manageable pieces and sent to separate storage locations using an underlying driver +of the user's choice. \ref H5TOOL_RT_UG can be used to change the sizes of the family +members when stored as files or to convert a family of files to a single file or vice versa. +
+ + \section sec_vfl_use Using a File Driver +Most application writers will use a driver defined by the HDF5 library or contributed by another +programming team. This chapter describes how existing drivers are used. + +\subsection subsec_vfl_use_hdr Driver Header Files +Each file driver is defined in its own public header file which should be included by any +application which plans to use that driver. The predefined drivers are in header files whose +names begin with 'H5FD' followed by the driver name and '.h'. The 'hdf5.h' header file includes +all the predefined driver header files. + +Once the appropriate header file is included, a symbol of the form 'H5FD_' followed by the +upper-case driver name will be the driver identification number (the driver name is by convention +and might not apply to drivers which are not distributed with HDF5). However, the value may +change if the library is closed (e.g., by calling #H5close) and the symbol is referenced again. + +\subsection subsec_vfl_use_create Creating and Opening Files +In order to create or open a file, one must define the method by which the storage is +accessed (the access method also indicates how to translate the storage name to a storage server +such as a file, network protocol, or memory). This is done by creating a file access property +list (the term "file access property list" is a misnomer since storage isn't required to be a file), +which is passed to the #H5Fcreate or #H5Fopen function.
A default file access property list is created +by calling #H5Pcreate and then the file driver information is inserted by calling a driver initialization +function such as #H5Pset_fapl_family: +\code +hid_t fapl = H5Pcreate(H5P_FILE_ACCESS); +hsize_t member_size = 100*1024*1024; /*100MB*/ +H5Pset_fapl_family(fapl, member_size, H5P_DEFAULT); +hid_t file = H5Fcreate("foo%05d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl); +H5Pclose(fapl); +\endcode + +Each file driver will have its own initialization function whose name is H5Pset_fapl_ followed by +the driver name and which takes a file access property list as the first argument followed by additional +driver-dependent arguments. + +An alternative to using the driver initialization function is to set the driver directly using the +#H5Pset_driver function (this function is overloaded to operate on data transfer property lists also, as described below). +Its second argument is the file driver identifier, which may have a different numeric value from run to run +depending on the order in which the file drivers are registered with the library. The third argument encapsulates +the additional arguments of the driver initialization function. This method only works if the file driver +writer has made the driver-specific property list structure a public datatype, which is often not the case.
+\code +hid_t fapl = H5Pcreate(H5P_FILE_ACCESS); +static H5FD_family_fapl_t fa = {100*1024*1024, H5P_DEFAULT}; +H5Pset_driver(fapl, H5FD_FAMILY, &fa); +hid_t file = H5Fcreate("foo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl); +H5Pclose(fapl); +\endcode + +It is also possible to query the file driver information from a file access property list by +calling #H5Pget_driver to determine the driver and then calling a driver-defined query function +to obtain the driver information: +\code +hid_t driver = H5Pget_driver(fapl); +if (H5FD_SEC2==driver) { + /*nothing further to get*/ +} else if (H5FD_FAMILY==driver) { + hid_t member_fapl; + hsize_t member_size; + H5Pget_fapl_family(fapl, &member_size, &member_fapl); +} else if (....) { + .... +} +\endcode + +\subsection subsec_vfl_use_per Performing I/O +The #H5Dread and #H5Dwrite functions transfer data between application memory and the file. They both take +an optional data transfer property list which has some general driver-independent properties and optional +driver-defined properties. An application will typically perform I/O in one of three styles via the +#H5Dread or #H5Dwrite function: + +Like file access properties in the previous section, data transfer properties can be set using a driver +initialization function or a general purpose function.
For example, to set the MPI-IO driver to use +independent access for I/O operations one would say: +\code +hid_t dxpl = H5Pcreate(H5P_DATA_XFER); +H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT); +H5Dread(dataset, type, mspace, fspace, dxpl, buffer); +H5Pclose(dxpl); +\endcode + +The alternative is to initialize a driver-defined C struct and pass it to the #H5Pset_driver function: +\code +hid_t dxpl = H5Pcreate(H5P_DATA_XFER); +static H5FD_mpio_dxpl_t dx = {H5FD_MPIO_INDEPENDENT}; +H5Pset_driver(dxpl, H5FD_MPIO, &dx); +H5Dread(dataset, type, mspace, fspace, dxpl, buffer); +\endcode + +The transfer property list can be queried in a manner similar to the file access property list: the driver +provides a function (or functions) to return various information about the transfer property list: +\code +hid_t driver = H5Pget_driver(dxpl); +if (H5FD_MPIO==driver) { + H5FD_mpio_xfer_t xfer_mode; + H5Pget_dxpl_mpio(dxpl, &xfer_mode); +} else { + .... +} +\endcode + +\subsection subsec_vfl_use_inter File Driver Interchangeability +The HDF5 specifications describe two things: the mapping of data onto a linear format address +space and the C API which performs the mapping. However, the mapping of the format address space +onto storage intentionally falls outside the scope of the HDF5 specs. This is a direct result of the +fact that it is not generally possible to store information about how to access storage inside the +storage itself. For instance, given only the file name '/arborea/1225/work/f%03d' the HDF5 library +is unable to tell whether the name refers to a file on the local file system, a family of files on +the local file system, a file on host 'arborea' port 1225, a family of files on a remote system, etc. + +There are two ways the library could determine where the storage is located: the user can provide storage access +information, or the library can try all known file access methods. This implementation +uses the former method.
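To show what user-supplied access information makes possible, consider the family case from the #H5Pset_fapl_family example earlier: the user provides a printf-style name pattern and a member size. The following is a hypothetical model, not the H5FD_FAMILY source. It computes, from that information alone, which member holds a given format address, the offset within that member, and the member's file name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a family-style partition (not the H5FD_FAMILY code):
 * the format address space is cut into members of memb_size bytes each. */
typedef struct {
    uint64_t member;   /* index of the member file holding the byte */
    uint64_t offset;   /* byte offset inside that member file */
} family_loc_t;

/* Map a format address to (member, offset); memb_size must be > 0. */
family_loc_t family_map(uint64_t addr, uint64_t memb_size)
{
    family_loc_t loc;
    assert(memb_size > 0);
    loc.member = addr / memb_size;
    loc.offset = addr % memb_size;
    return loc;
}

/* Build the name of member `member` from a printf-style pattern such as
 * "foo%05d.h5". Returns the snprintf result (negative on error). */
int family_member_name(char *buf, size_t len, const char *pattern, int member)
{
    return snprintf(buf, len, pattern, member);
}
```

With a 100 MiB member size, for example, a format address 250 MiB into the file falls in member 2, at offset 50 MiB, in the file "foo00002.h5". A real driver must also split reads and writes that cross a member boundary, which this sketch ignores.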
+ +In general, if a file was created with one driver then it isn't possible to open it with another driver. +There are of course exceptions: a file created with MPIO could probably be opened with the sec2 driver, +any file created by the sec2 driver could be opened as a family of files with one member, etc. In fact, +sometimes a file must not only be opened with the same driver but also with the same driver properties. +The predefined drivers are written in such a way that specifying the correct driver is sufficient for +opening a file. + +\section sec_vfl_imp Implementation of a Driver +A driver is simply a collection of functions and data structures which are registered with the HDF5 +library at runtime. The functions fall into these categories: +\li Functions which operate on modes +\li Functions which operate on files +\li Functions which operate on the address space +\li Functions which operate on data +\li Functions for driver initialization +\li Optimization functions + +\subsection subsec_vfl_imp_mode Mode Functions +Some drivers need information about file access and data transfers which are very specific to the driver. +The information is usually implemented as a pair of pointers to C structs which are allocated and +initialized as part of an HDF5 property list and passed down to various driver functions. There are two +classes of settings: file access modes that describe how to access the file through the driver, and +data transfer modes which are settings that control I/O operations. Each file opened by a particular +driver may have a different access mode; each dataset I/O request for a particular file may have a +different data transfer mode. + +Since each driver has its own particular requirements for various settings, each driver is responsible +for defining the mode structures that it needs. Higher layers of the library treat the structures as +opaque but must be able to copy and free them. 
Thus, the driver provides either the size of the +structure or a pair of function pointers for each of the mode types. + +Example: The family driver needs to know how the format address space is partitioned and the file +access property list to use for the family members. +\code +// Driver-specific file access properties +typedef struct H5FD_family_fapl_t { + hsize_t memb_size; // size of each family member + hid_t memb_fapl_id; // file access property list for each family member +} H5FD_family_fapl_t; + +// Driver-specific data transfer properties +typedef struct H5FD_family_dxpl_t { + hid_t memb_dxpl_id; // data transfer property list for each member +} H5FD_family_dxpl_t; +\endcode +In order to copy or free one of these structures, the member file access or data transfer properties must +also be copied or freed. This is done by providing a copy function and a free function for each structure: + +Example: The file access property list copy and free functions for the family driver: +\code +static void * +H5FD_family_fapl_copy(const void *_old_fa) +{ + const H5FD_family_fapl_t *old_fa = (const H5FD_family_fapl_t*)_old_fa; + H5FD_family_fapl_t *new_fa = malloc(sizeof(H5FD_family_fapl_t)); + if (!new_fa) return NULL; + + memcpy(new_fa, old_fa, sizeof(H5FD_family_fapl_t)); + new_fa->memb_fapl_id = H5Pcopy(old_fa->memb_fapl_id); + return new_fa; +} + +static herr_t +H5FD_family_fapl_free(void *_fa) +{ + H5FD_family_fapl_t *fa = (H5FD_family_fapl_t*)_fa; + H5Pclose(fa->memb_fapl_id); + free(fa); + return 0; +} +\endcode + +Generally, when a file is created or opened, the file access properties for the driver are copied into the +file pointer which is returned, and they may be modified from their original value (for instance, the file +family driver modifies the member size property when opening an existing family). In order to support the +#H5Fget_access_plist function, the driver must provide a fapl_get callback which creates a copy of the +driver-specific properties based on a particular file.
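The size-or-callbacks arrangement can be sketched in plain C. The descriptor and functions below are hypothetical and do not reproduce the actual H5FD_class_t layout. They show how a layer that treats a mode struct as opaque could still duplicate and release it. It uses the driver's copy and free callbacks when they are provided, which is necessary when the struct owns resources such as the member property list of the family driver. Otherwise, it falls back to a byte-wise copy of the advertised size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor (not the real H5FD_class_t): a driver either gives
 * the size of its mode struct, or a copy/free pair for structs that own
 * resources that a plain byte copy would not duplicate correctly. */
typedef struct {
    size_t size;                       /* used when copy is NULL */
    void *(*copy)(const void *mode);   /* optional deep-copy callback */
    void (*free_fn)(void *mode);       /* optional release callback */
} mode_class_t;

/* Duplicate an opaque mode struct: prefer the driver's copy callback,
 * otherwise fall back to a shallow memcpy of `size` bytes. */
void *mode_dup(const mode_class_t *cls, const void *mode)
{
    assert(cls != NULL && mode != NULL);
    if (cls->copy != NULL)
        return cls->copy(mode);
    void *dup = malloc(cls->size);
    if (dup != NULL)
        memcpy(dup, mode, cls->size);
    return dup;
}

/* Release a struct obtained from mode_dup, using the matching strategy. */
void mode_release(const mode_class_t *cls, void *mode)
{
    if (cls->free_fn != NULL)
        cls->free_fn(mode);
    else
        free(mode);
}
```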
+ + Example: The file family driver copies the member size and the member file access property list into the return value: +\code +static void * +H5FD_family_fapl_get(H5FD_t *_file) +{ + H5FD_family_t *file = (H5FD_family_t*)_file; + H5FD_family_fapl_t *fa = calloc(1, sizeof(H5FD_family_fapl_t)); + if (!fa) return NULL; + fa->memb_size = file->memb_size; + fa->memb_fapl_id = H5Pcopy(file->memb_fapl_id); + return fa; +} +\endcode + +\subsection subsec_vfl_imp_file File Functions +The higher layers of the library expect files to have a name and allow the file to be accessed in various modes. +The driver must be able to create a new file, replace an existing file, or open an existing file. Opening or +creating a file should return a handle, a pointer to a specialization of the H5FD_t struct, which allows read-only +or read-write access (read-only access is only appropriate when opening an existing file) and which will be +passed to the other driver functions as they are called. +\code +typedef struct { + // Public fields + H5FD_class_t *cls; //class data defined below + + // Private fields -- driver-defined + +} H5FD_t; +\endcode + +Example: The family driver requires handles to the underlying storage, the size of the members for this +particular file (which might be different than the member size specified in the file access property list +if an existing file family is being opened), the name used to open the file in case additional members +must be created, and the flags to use for creating those additional members. The eoa member caches the +size of the format address space so the family members don't have to be queried in order to find it. +\code +// The description of a file belonging to this driver.
+typedef struct H5FD_family_t { + H5FD_t pub; // public stuff, must be first + hid_t memb_fapl_id; // file access property list for members + hsize_t memb_size; // maximum size of each member file + int nmembs; // number of family members + int amembs; // number of member slots allocated + H5FD_t **memb; // dynamic array of member pointers + haddr_t eoa; // end of allocated addresses + char *name; // name generator printf format + unsigned flags; // flags for opening additional members +} H5FD_family_t; +\endcode + +Example: The sec2 driver needs to keep track of the underlying Unix file descriptor and also the +end of format address space and current Unix file size. It also keeps track of the current file +position and last operation (read, write, or unknown) in order to optimize calls to lseek. The +device and inode fields are defined on Unix in order to uniquely identify the file and will be +discussed below. +\code +typedef struct H5FD_sec2_t { + H5FD_t pub; // public stuff, must be first + int fd; // the unix file + haddr_t eoa; // end of allocated region + haddr_t eof; // end of file; current file size + haddr_t pos; // current file I/O position + int op; // last operation + dev_t device; // file device number + ino_t inode; // file i-node number +} H5FD_sec2_t; +\endcode + +\subsection subsec_vfl_imp_open Open Files +All drivers must define a function for opening/creating a file. This function should have a prototype which is: + + + + + +
static H5FD_t * open (const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr)The file name name and file access property list fapl are the same as were specified in the #H5Fcreate +or #H5Fopen call. The flags are the same as in those calls also except the flag #H5F_ACC_CREAT is also +present if the call was to H5Fcreate and they are documented in the 'H5Fpublic.h' file. The maxaddr +argument is the maximum format address that the driver should be prepared to handle (the minimum address is always zero).
+ +Example: The sec2 driver opens a Unix file with the requested name and saves information which +uniquely identifies the file (the Unix device number and inode). +\code +static H5FD_t * +H5FD_sec2_open(const char *name, unsigned flags, hid_t fapl_id/*unused*/, + haddr_t maxaddr) +{ + unsigned o_flags; + int fd; + struct stat sb; + H5FD_sec2_t *file=NULL; + + // Check arguments + if (!name || !*name) return NULL; + if (0==maxaddr || HADDR_UNDEF==maxaddr) return NULL; + if (ADDR_OVERFLOW(maxaddr)) return NULL; + + // Build the open flags + o_flags = (H5F_ACC_RDWR & flags) ? O_RDWR : O_RDONLY; + if (H5F_ACC_TRUNC & flags) o_flags |= O_TRUNC; + if (H5F_ACC_CREAT & flags) o_flags |= O_CREAT; + if (H5F_ACC_EXCL & flags) o_flags |= O_EXCL; + + // Open the file + if ((fd=open(name, o_flags, 0666))<0) return NULL; + if (fstat(fd, &sb)<0) { + close(fd); + return NULL; + } + + // Create the new file struct + file = calloc(1, sizeof(H5FD_sec2_t)); + file->fd = fd; + file->eof = sb.st_size; + file->pos = HADDR_UNDEF; + file->op = OP_UNKNOWN; + file->device = sb.st_dev; + file->inode = sb.st_ino; + + return (H5FD_t*)file; +} +\endcode + +\subsection subsec_vfl_imp_close Closing Files +Closing a file simply means that all cached data should be flushed to the next lower layer, the +file should be closed at the next lower layer, and all file-related data structures should be +freed. All information needed by the close function is already present in the file handle. + + + + + +
static herr_t close (H5FD_t *file)The file argument is the handle which was returned by the open function, and the close should +free only memory associated with the driver-specific part of the handle (the public parts will +have already been released by HDF5's virtual file layer).
+ + Example: The sec2 driver just closes the underlying Unix file, making sure that the actual +file size is the same as that known to the library by writing a zero to the last file position +if it hasn't been written by some previous operation (which happens in the same code which flushes +the file contents and is shown below). +\code +static herr_t +H5FD_sec2_close(H5FD_t *_file) +{ + H5FD_sec2_t *file = (H5FD_sec2_t*)_file; + + if (H5FD_sec2_flush(_file)<0) return -1; + if (close(file->fd)<0) return -1; + free(file); + return 0; +} +\endcode + +\subsection subsec_vfl_imp_key File Keys +Occasionally an application will attempt to open a single file more than one time in order +to obtain multiple handles to the file. HDF5 allows the files to share information (for instance, +writing data to one handle will cause the data to be immediately visible on the other handle), +but in order to accomplish this HDF5 must be able to tell when two names refer to the same file. +It does this by associating a driver-defined key with each file opened by a driver and comparing +the key for an open request with the keys for all other files currently open by the same driver. + + + + + +
static int cmp (const H5FD_t *f1, const H5FD_t *f2)The driver may provide a function which compares two files f1 and f2 belonging to the same +driver and returns a negative, positive, or zero value à la the strcmp function (the ordering +is arbitrary as long as it's consistent within a particular file driver). If this function is +not provided, then HDF5 assumes that all calls to the open callback return unique files regardless +of the arguments, and it is up to the application to avoid doing this if that assumption is incorrect.
+ +Each time a file is opened the library calls the cmp function to compare that file with all other files +currently open by the same driver and if one of them matches (at most one can match) then the file +which was just opened is closed and the previously opened file is used instead. + +Opening a file twice with incompatible flags will result in failure. For instance, opening a file with +the truncate flag is a two step process which first opens the file without truncation so keys can be +compared, and if no matching file is found already open then the file is closed and immediately reopened +with the truncation flag set (if a matching file is already open then the truncating open will fail). + +Example: The sec2 driver uses the Unix device and i-node as the key. They were initialized when +the file was opened. +\code +static int +H5FD_sec2_cmp(const H5FD_t *_f1, const H5FD_t *_f2) +{ + const H5FD_sec2_t *f1 = (const H5FD_sec2_t*)_f1; + const H5FD_sec2_t *f2 = (const H5FD_sec2_t*)_f2; + + if (f1->device < f2->device) return -1; + if (f1->device > f2->device) return 1; + + if (f1->inode < f2->inode) return -1; + if (f1->inode > f2->inode) return 1; + + return 0; +} +\endcode + +\subsection subsec_vfl_imp_save Saving Modes Across Opens +Some drivers may also need to store certain information in the file superblock in order +to be able to reliably open the file at a later date. This is done by three functions: +one to determine how much space will be necessary to store the information in the superblock, +one to encode the information, +and one to decode the information. These functions are optional, but if any one is defined +then the other two must also be defined. + + + + + + + + + + + + + + + + + +
FunctionDescription
static hsize_t sb_size (H5FD_t *file)The sb_size function returns the number of bytes necessary to encode +information needed later if the file is reopened.
static herr_t sb_encode (H5FD_t *file, char *name, unsigned char *buf)The sb_encode function encodes information from the file into buffer buf +allocated by the caller. It also writes an 8-character name (plus a null terminator) into +the name argument, which should uniquely identify the driver.
static herr_t sb_decode (H5FD_t *file, const char *name, const unsigned char *buf)The sb_decode function looks at the name, decodes data from the buffer buf, and +updates the file argument with the new information.
+
+The tricky part is that the file must be readable before the
+superblock information is decoded. File access modes fall outside the scope of the HDF5
+file format, but they are placed inside the superblock for convenience. (File access modes
+do not describe data, but rather describe how the HDF5 format address space is mapped to
+the underlying file(s). Thus, in general the mapping must be known before the file
+superblock can be read. However, the user usually knows enough about the mapping for
+the superblock to be readable, and once the superblock is read the library can fill
+in the missing parts of the mapping.)
+
+\section sec_vfl_address Address Space Functions
+HDF5 does not assume that a file is a linear address space of bytes. Instead, the library
+will call functions to allocate and free portions of the HDF5 format address space, which
+in turn map onto functions in the file driver to allocate and free portions of file address
+space. The library tells the file driver how much format address space it wants to allocate
+and the driver decides what format address to use and how that format address is mapped
+onto the file address space. Usually the format address is chosen so that the file address
+can be calculated in constant time for data I/O operations (which are always specified by format addresses).
+
+\subsection subsec_vfl_address_blk Userblock and Superblock
+The HDF5 format allows an optional userblock to appear before the actual HDF5 data in such
+a way that if the userblock is removed from the file and everything remaining is
+shifted downward in the file address space, then the file is still a valid HDF5 file.
+The userblock size can be zero or any power of two greater than or equal to 512, and
+the file superblock begins immediately after the userblock.
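The userblock size rule reduces to a bit test, because a power of two has exactly one bit set. The following minimal sketch assumes the constraint documented for #H5Pset_userblock: zero, or a power of two no smaller than 512 bytes. The helper name `valid_userblock_size` is hypothetical, and `hsize_t` is a stand-in typedef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hsize_t; /* stand-in for the HDF5 typedef */

/* Hypothetical helper: true if size is a legal userblock size. */
static bool valid_userblock_size(hsize_t size)
{
    if (size == 0)
        return true; /* no userblock at all */
    /* A power of two has one bit set, so size & (size - 1) is zero. */
    return size >= 512 && (size & (size - 1)) == 0;
}
```

Under this rule 1536 is rejected even though it is a multiple of 512. When the size is valid, the superblock starts at the format address equal to the userblock size.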
+
+HDF5 allocates space for the userblock and superblock by calling the allocation function
+described below, which must return a region starting at format address zero on the first call.
+
+\subsection subsec_vfl_address_alloc Allocation of Format Regions
+The library makes many types of allocation requests:
+<table>
+<tr><td>#H5FD_MEM_SUPER</td><td>An allocation request for the userblock and/or superblock.</td></tr>
+<tr><td>#H5FD_MEM_BTREE</td><td>An allocation request for a node of a B-tree.</td></tr>
+<tr><td>#H5FD_MEM_DRAW</td><td>An allocation request for the raw data of a dataset.</td></tr>
+<tr><td>#H5FD_MEM_GHEAP</td><td>An allocation request for a global heap collection. Global
+heaps are used to store certain types of references such as dataset region references.
+The set of all global heap collections can become quite large.</td></tr>
+<tr><td>#H5FD_MEM_LHEAP</td><td>An allocation request for a local heap. Local heaps are used
+to store the names of the members of a group. The combined size of all local heaps is
+a function of the number of object names in the file.</td></tr>
+<tr><td>#H5FD_MEM_OHDR</td><td>An allocation request for (part of) an object header. Object
+headers are relatively small and include metadata about objects (like the dataspace
+and datatype of a dataset) and attributes.</td></tr>
+</table>
+
+When a chunk of memory is freed the library adds it to a free list, and allocation requests
+are satisfied from the free list before requesting memory from the file driver. Each type of
+allocation request enumerated above has its own free list, but the file driver can specify that
+certain object types share a free list. It does so by providing an array which maps a
+request type to a free list. If any value of the map is H5FD_MEM_DEFAULT (zero) then the object's
+own free list is used. The special value H5FD_MEM_NOLIST indicates that the library should not
+attempt to maintain a free list for that particular object type, instead calling the file driver
+each time an object of that type is freed.
+
+Mappings predefined in the 'H5FDpublic.h' file are:
+<table>
+<tr><td>#H5FD_FLMAP_SINGLE</td><td>All memory usage types are mapped to a single free list.</td></tr>
+<tr><td>#H5FD_FLMAP_DICHOTOMY</td><td>Memory usage is segregated into metadata and raw data
+for the purposes of memory management.</td></tr>
+<tr><td>#H5FD_FLMAP_DEFAULT</td><td>Each memory usage type has its own free list.</td></tr>
+</table>
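The lookup rule for the free list map fits in a few lines. This is a hypothetical model, not library code. `mem_t` is a stand-in enum whose members mirror the H5FD_mem_t names used above, with MEM_DEFAULT as zero and MEM_NOLIST as the "no free list" value. `make_dichotomy_map` builds a metadata/raw data split in the spirit of #H5FD_FLMAP_DICHOTOMY. The library's own macro may assign individual types differently.

```c
#include <assert.h>

/* Hypothetical enum modeled on H5FD_mem_t; not the library definition. */
typedef enum {
    MEM_NOLIST  = -1, /* keep no free list for this type */
    MEM_DEFAULT = 0,  /* map entry meaning "use the type's own list" */
    MEM_SUPER, MEM_BTREE, MEM_DRAW, MEM_GHEAP, MEM_LHEAP, MEM_OHDR,
    MEM_NTYPES
} mem_t;

/* Return the free list serving requests of type t, or MEM_NOLIST if
 * freed space of that type is handed straight to the driver. */
static mem_t free_list_for(const mem_t map[MEM_NTYPES], mem_t t)
{
    mem_t m = map[t];
    if (m == MEM_DEFAULT)
        return t; /* zero entry: the type keeps its own list */
    return m;     /* another type's list, or MEM_NOLIST */
}

/* Raw data shares the MEM_DRAW list; all other types share MEM_SUPER. */
static void make_dichotomy_map(mem_t map[MEM_NTYPES])
{
    for (int t = 0; t < MEM_NTYPES; t++)
        map[t] = (t == MEM_DRAW) ? MEM_DRAW : MEM_SUPER;
}
```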
+
+Example: To make a map that manages object headers on one free list and everything else on
+another free list one might initialize the map with the following code (the use of #H5FD_MEM_SUPER is arbitrary):
+\code
+H5FD_mem_t mt, map[H5FD_MEM_NTYPES];
+
+for (mt = 0; mt < H5FD_MEM_NTYPES; mt++) {
+    /* object headers keep their own list; every other type shares the H5FD_MEM_SUPER list */
+    map[mt] = (H5FD_MEM_OHDR == mt) ? mt : H5FD_MEM_SUPER;
+}
+\endcode
+
+If an allocation request cannot be satisfied from the free list then one of two things happens.
+If the driver defines an allocation callback then it is used to allocate space; otherwise new
+memory is allocated from the end of the format address space by incrementing the end-of-address marker.
+<table>
+<tr>
+<td>static haddr_t alloc (H5FD_t *file, H5FD_mem_t type, hsize_t size)</td>
+<td>The file argument is the file from which space is to be allocated, type is the type of
+memory being requested (from the list above) without being mapped according to the free list
+map, and size is the number of bytes being requested. The library is allowed to allocate large
+chunks of storage and manage them in a layer above the file driver (although the current library
+doesn't do that). The allocation function should return a format address for the first byte
+allocated. The allocated region extends from that address for size bytes. If the request cannot
+be honored then the undefined address value is returned (#HADDR_UNDEF). The first call to this
+function for a file which has never had memory allocated must return a format address of zero
+or #HADDR_UNDEF since this is how the library allocates space for the userblock and/or superblock.</td>
+</tr>
+</table>
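A small simulation can model the allocation policy described in this section. The library first tries the free list, then calls the driver's alloc callback, and otherwise grows the format address space by advancing the EOA marker. Everything in this sketch is hypothetical: the `sim_` names, the fixed-size first-fit free list, and the stand-in typedefs and HADDR_UNDEF value. It also models only one free list, as if every memory type were mapped to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t haddr_t; /* stand-ins for the HDF5 typedefs */
typedef uint64_t hsize_t;
#define HADDR_UNDEF ((haddr_t)-1) /* stand-in for the library macro */
#define MAX_FREE 16

/* One free list: a small array of free regions (hypothetical layout). */
typedef struct {
    haddr_t addr[MAX_FREE];
    hsize_t size[MAX_FREE];
    int     n;
} free_list_t;

/* Hypothetical file state: EOA marker plus an optional driver callback. */
typedef struct sim_file sim_file_t;
struct sim_file {
    haddr_t eoa;
    haddr_t (*alloc)(sim_file_t *f, hsize_t size); /* NULL if undefined */
};

/* First fit: carve the request off the front of the first big block. */
static haddr_t take_from_list(free_list_t *fl, hsize_t size)
{
    for (int i = 0; i < fl->n; i++) {
        if (fl->size[i] >= size) {
            haddr_t a = fl->addr[i];
            fl->addr[i] += size;
            fl->size[i] -= size;
            if (fl->size[i] == 0) { /* drop the exhausted entry */
                fl->n--;
                fl->addr[i] = fl->addr[fl->n];
                fl->size[i] = fl->size[fl->n];
            }
            return a;
        }
    }
    return HADDR_UNDEF;
}

/* Free list first, then the driver callback, else bump the EOA marker. */
static haddr_t sim_alloc(sim_file_t *f, free_list_t *fl, hsize_t size)
{
    haddr_t a = take_from_list(fl, size);
    if (a != HADDR_UNDEF)
        return a;
    if (f->alloc)
        return f->alloc(f, size);
    a = f->eoa; /* first call on a new file yields address 0 */
    f->eoa += size;
    return a;
}

/* Return a region to the free list; if the list is full, leak it. */
static void sim_free(free_list_t *fl, haddr_t addr, hsize_t size)
{
    if (fl->n < MAX_FREE) {
        fl->addr[fl->n] = addr;
        fl->size[fl->n] = size;
        fl->n++;
    }
}
```

The first call on a fresh file returns address zero. This matches the requirement that the first allocation covers the userblock and/or superblock.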
+ +\subsection subsec_vfl_address_free Freeing Format Regions +When the library is finished using a certain region of the format address space it will return the +space to the free list according to the type of memory being freed and the free list map described above. +If the free list has been disabled for a particular memory usage type (according to the free list map) +and the driver defines a free callback then it will be invoked. The free callback is also invoked for +all entries on the free list when the file is closed. + + + + + + +
+<table>
+<tr>
+<td>static herr_t free (H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size)</td>
+<td>The file argument is the file for which space is being freed; type is the type of object being
+freed (from the list above) without being mapped according to the free list map; addr is the first
+format address to free; and size is the size in bytes of the region being freed. The region being
+freed may refer to just part of the region originally allocated and/or may cross allocation boundaries,
+provided all regions being freed have the same usage type. However, the library will never attempt
+to free regions which have already been freed or which have never been allocated.</td>
+</tr>
+</table>
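A driver that keeps no free space of its own can still reclaim the tail of the address space. This minimal sketch assumes a hypothetical driver state that records only the EOA marker. The free callback lowers the marker when the freed region ends exactly at EOA; otherwise it simply leaks the region. The types are stand-ins, and the `type` argument is accepted but ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t haddr_t; /* stand-ins for the HDF5 typedefs */
typedef uint64_t hsize_t;
typedef int      herr_t;

/* Hypothetical driver state. */
typedef struct {
    haddr_t eoa; /* first format address after the last allocated byte */
} tail_file_t;

/* Free [addr, addr + size). Only a region at the very end of the
 * allocated range is reclaimed; anything else is leaked. */
static herr_t tail_free(tail_file_t *file, int type, haddr_t addr, hsize_t size)
{
    (void)type; /* this driver treats all memory types alike */
    if (addr + size > file->eoa)
        return -1; /* region lies beyond allocated space */
    if (addr + size == file->eoa)
        file->eoa = addr;
    return 0;
}
```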
+
+A driver may choose not to define the free function, in which case format addresses will be leaked.
+This isn't normally a huge problem since the library contains a simple free list of its own and freeing
+parts of the format address space is not a common occurrence.
+
+\subsection subsec_vfl_address_query Querying the Address Range
+Each file driver must have some mechanism for setting and querying the end-of-address, or
+EOA, marker. The EOA marker is the first format address after the last format address ever allocated.
+If the last part of the allocated address range is freed then the driver may optionally decrease the EOA marker.
+<table>
+<tr><td>static haddr_t get_eoa (H5FD_t *file)</td><td>This function returns the current value of the EOA marker for the specified file.</td></tr>
+</table>
+
+Example: The sec2 driver just returns the current EOA marker value, which is cached in the file structure:
+\code
+static haddr_t
+H5FD_sec2_get_eoa(H5FD_t *_file)
+{
+    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
+    return file->eoa;
+}
+\endcode
+
+The EOA marker is initially zero when a file is opened and the library may set it to some other value
+shortly after the file is opened (after the superblock is read and the saved EOA marker is determined)
+or when allocating additional memory in the absence of an alloc callback (described above).
+
+Example: The sec2 driver simply caches the EOA marker in the file structure and does not extend the
+underlying Unix file. When the file is flushed or closed then the Unix file size is extended to match
+the EOA marker.
+\code
+static herr_t
+H5FD_sec2_set_eoa(H5FD_t *_file, haddr_t addr)
+{
+    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
+    file->eoa = addr;
+    return 0;
+}
+\endcode
+
+\section sec_vfl_data Data Functions
+These functions operate on data, transferring a region of the format address space between memory and files.
+
+\subsection subsec_vfl_data_cont Contiguous I/O Functions
+A driver must specify two functions to transfer data from the library to the file and vice versa.
+<table>
+<tr>
+<td>static herr_t read (H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buf)</td>
+<td>The read function reads data from file file beginning at address addr and continuing
+for size bytes into the buffer buf supplied by the caller.</td>
+</tr>
+<tr>
+<td>static herr_t write (H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buf)</td>
+<td>The write function transfers data in the opposite direction, from the buffer buf to
+file file.</td>
+</tr>
+</table>
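The following minimal sketch shows a write callback in the spirit of the sec2 driver. It is written for a hypothetical driver (`wr_file_t`, `wr_write`) whose format addresses map one-to-one onto file offsets. The sketch assumes a POSIX system and uses stand-in typedefs instead of the HDF5 headers. It also drops the `type` and `dxpl` arguments, because this driver ignores them. Like the sec2 read example in this section, it restarts interrupted system calls. In addition, it records a new EOF whenever a write extends the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

typedef uint64_t haddr_t; /* stand-ins for the HDF5 typedefs */
typedef uint64_t hsize_t;
typedef int      herr_t;

/* Hypothetical driver state, reduced to what a write callback needs. */
typedef struct {
    int     fd;  /* underlying POSIX file descriptor */
    haddr_t eoa; /* end of allocated format address space */
    haddr_t eof; /* current size of the underlying file */
} wr_file_t;

/* Write size bytes from buf at format address addr, retrying
 * interrupted calls and tracking the EOF marker. */
static herr_t wr_write(wr_file_t *file, haddr_t addr, hsize_t size, const void *buf)
{
    const char *p = buf;

    if (addr + size > file->eoa)
        return -1; /* writes must stay inside allocated space */
    if (lseek(file->fd, (off_t)addr, SEEK_SET) < 0)
        return -1;

    while (size > 0) {
        ssize_t n;
        do
            n = write(file->fd, p, (size_t)size);
        while (n == -1 && errno == EINTR);
        if (n <= 0)
            return -1; /* real error, or no progress */
        size -= (hsize_t)n;
        addr += (haddr_t)n;
        p += n;
    }

    if (addr > file->eof)
        file->eof = addr; /* the write extended the file */
    return 0;
}
```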
+\li Both functions take a data transfer property list dxpl which
+indicates the fine points of how the data is to be transferred and which comes directly
+from the #H5Dread or #H5Dwrite function.
+\li Both functions receive the type of data being transferred,
+which may allow a driver to tune its behavior for different kinds of data.
+\li Both functions should return
+a negative value if they fail to transfer the requested data, or non-negative if they
+succeed. The library will never attempt to read from unallocated regions of the format address space.
+
+Example: The sec2 driver just makes system calls. It tries not to call lseek if the current operation
+is the same as the previous operation and the file position is correct. It also fills the output buffer
+with zeros when reading between the current EOF and EOA markers and restarts system calls which were interrupted.
+\code
+static herr_t
+H5FD_sec2_read(H5FD_t *_file, H5FD_mem_t type/*unused*/, hid_t dxpl_id/*unused*/,
+               haddr_t addr, hsize_t size, void *buf/*out*/)
+{
+    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
+    ssize_t nbytes;
+
+    assert(file && file->pub.cls);
+    assert(buf);
+
+    /* Check for overflow conditions */
+    if (REGION_OVERFLOW(addr, size)) return -1;
+    if (addr+size>file->eoa) return -1;
+
+    /* Seek to the correct location */
+    if ((addr!=file->pos || OP_READ!=file->op) &&
+        file_seek(file->fd, (file_offset_t)addr, SEEK_SET)<0) {
+        file->pos = HADDR_UNDEF;
+        file->op = OP_UNKNOWN;
+        return -1;
+    }
+
+    /*
+     * Read data, being careful of interrupted system calls, partial results,
+     * and the end of the file.
+     */
+    while (size>0) {
+        do nbytes = read(file->fd, buf, size);
+        while (-1==nbytes && EINTR==errno);
+        if (-1==nbytes) {
+            /* error */
+            file->pos = HADDR_UNDEF;
+            file->op = OP_UNKNOWN;
+            return -1;
+        }
+        if (0==nbytes) {
+            /* end of file but not end of format address space */
+            memset(buf, 0, size);
+            size = 0;
+        }
+        assert(nbytes>=0);
+        assert((hsize_t)nbytes<=size);
+        size -= (hsize_t)nbytes;
+        addr += (haddr_t)nbytes;
+        buf = (char*)buf + nbytes;
+    }
+
+    /* Update current position */
+    file->pos = addr;
+    file->op = OP_READ;
+    return 0;
+}
+\endcode
+Example: The sec2 write callback is similar except it updates the file EOF marker when extending the file.
+
+\subsection subsec_vfl_data_flush Flushing Cached Data
+Some drivers may cache data in memory in order to make larger I/O requests to the
+underlying file and thus improve bandwidth. Such drivers should register a cache flushing
+function so that the library can ensure that data has been flushed out of the driver in
+response to the application calling #H5Fflush.
+<table>
+<tr><td>static herr_t flush (H5FD_t *file)</td><td>Flush all data for file file to storage.</td></tr>
+</table>
+
+Example: The sec2 driver doesn't cache any data but it also doesn't extend the Unix file as
+aggressively as it should. Therefore, when finalizing a file it should write a zero to the last
+byte of the allocated region so that when reopening the file later the EOF marker will be at
+least as large as the EOA marker saved in the superblock (otherwise HDF5 will refuse to open
+the file, claiming that the data appears to be truncated).
+\code
+static herr_t
+H5FD_sec2_flush(H5FD_t *_file)
+{
+    H5FD_sec2_t *file = (H5FD_sec2_t*)_file;
+
+    if (file->eoa>file->eof) {
+        if (-1==file_seek(file->fd, file->eoa-1, SEEK_SET)) return -1;
+        if (write(file->fd, "", 1)!=1) return -1;
+        file->eof = file->eoa;
+        file->pos = file->eoa;
+        file->op = OP_WRITE;
+    }
+
+    return 0;
+}
+\endcode
+
+\section sec_vfl_opt Optimization Functions
+The library is capable of performing several generic optimizations on I/O, but these types of
+optimizations may not be appropriate for a given VFL driver.
+
+Each driver may provide a query function to allow the library to query whether to enable these
+optimizations. If a driver lacks a query function, the library will disable all types of
+optimizations which can be queried.
+<table>
+<tr><td>static herr_t query (const H5FD_t *file, unsigned long *flags)</td><td>This function is called by the library to query which optimizations to enable for I/O to this driver.</td></tr>
+</table>
+ +These are the flags which are currently defined: + + + + + + + + + + + + + +
+<table>
+<tr>
+<td>H5FD_FEAT_AGGREGATE_METADATA (0x00000001)</td>
+<td>Setting the H5FD_FEAT_AGGREGATE_METADATA flag for a VFL driver means that the library will attempt to allocate
+a larger block for metadata and then sub-allocate each metadata request from that larger block.</td>
+</tr>
+<tr>
+<td>H5FD_FEAT_ACCUMULATE_METADATA (0x00000002)</td>
+<td>Setting the H5FD_FEAT_ACCUMULATE_METADATA flag for a VFL driver means that the library will attempt to cache
+metadata as it is written to the file and build up a larger block of metadata to eventually pass to the
+VFL 'write' routine.</td>
+</tr>
+<tr>
+<td>H5FD_FEAT_DATA_SIEVE (0x00000004)</td>
+<td>Setting the H5FD_FEAT_DATA_SIEVE flag for a VFL driver means that the library will attempt to cache raw data
+as it is read from or written to a file in a "data sieve" buffer.</td>
+</tr>
+</table>
+ +See Rajeev Thakur's papers: +http://www.mcs.anl.gov/~thakur/papers/romio-coll.ps.gz +http://www.mcs.anl.gov/~thakur/papers/mpio-high-perf.ps.gz + +\section sec_vfl_reg Registration of a Driver +Before a driver can be used the HDF5 library needs to be told of its existence. This is done by +registering the driver, which results in a driver identification number. Instead of passing many +arguments to the registration function, the driver information is entered into a structure and the +address of the structure is passed to the registration function where it is copied. This allows +the HDF5 API to be extended while providing backward compatibility at the source level. + + + + + + +
+<table>
+<tr><td>hid_t H5FDregister (H5FD_class_t *cls)</td><td>The driver described by struct cls is registered with the library and an ID number for the driver is returned.</td></tr>
+</table>
+ +The H5FD_class_t type is a struct with the following fields: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+<table>
+<tr><td>const char *name</td><td>A pointer to a constant, null-terminated driver name to be used for debugging purposes.</td></tr>
+<tr><td>size_t fapl_size</td><td>The size in bytes of the file access mode structure or zero if the driver supplies a copy function
+or doesn't define the structure.</td></tr>
+<tr><td>void *(*fapl_copy)(const void *fapl)</td><td>An optional function which copies a driver-defined file access mode structure. This field takes
+precedence over fapl_size when both are defined.</td></tr>
+<tr><td>void (*fapl_free)(void *fapl)</td><td>An optional function to free the driver-defined file access mode structure. If null, then the
+library calls the C free function to free the structure.</td></tr>
+<tr><td>size_t dxpl_size</td><td>The size in bytes of the data transfer mode structure or zero if the driver supplies a copy
+function or doesn't define the structure.</td></tr>
+<tr><td>void *(*dxpl_copy)(const void *dxpl)</td><td>An optional function which copies a driver-defined data transfer mode structure. This field
+takes precedence over dxpl_size when both are defined.</td></tr>
+<tr><td>void (*dxpl_free)(void *dxpl)</td><td>An optional function to free the driver-defined data transfer mode structure. If null, then
+the library calls the C free function to free the structure.</td></tr>
+<tr><td>H5FD_t *(*open)(const char *name, unsigned flags, hid_t fapl, haddr_t maxaddr)</td><td>The function which opens or creates a new file.</td></tr>
+<tr><td>herr_t (*close)(H5FD_t *file)</td><td>The function which ends access to a file.</td></tr>
+<tr><td>int (*cmp)(const H5FD_t *f1, const H5FD_t *f2)</td><td>An optional function to determine whether two open files have the same key. If this function
+is not present then the library assumes that two files will never be the same.</td></tr>
+<tr><td>int (*query)(const H5FD_t *f, unsigned long *flags)</td><td>An optional function to determine which library optimizations a driver can support.</td></tr>
+<tr><td>haddr_t (*alloc)(H5FD_t *file, H5FD_mem_t type, hsize_t size)</td><td>An optional function to allocate space in the file.</td></tr>
+<tr><td>herr_t (*free)(H5FD_t *file, H5FD_mem_t type, haddr_t addr, hsize_t size)</td><td>An optional function to free space in the file.</td></tr>
+<tr><td>haddr_t (*get_eoa)(H5FD_t *file)</td><td>A function to query how much of the format address space has been allocated.</td></tr>
+<tr><td>herr_t (*set_eoa)(H5FD_t *file, haddr_t)</td><td>A function to set the end-of-address (EOA) marker.</td></tr>
+<tr><td>haddr_t (*get_eof)(H5FD_t *file)</td><td>A function to return the current end-of-file marker value.</td></tr>
+<tr><td>herr_t (*read)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, void *buffer)</td><td>A function to read data from a file.</td></tr>
+<tr><td>herr_t (*write)(H5FD_t *file, H5FD_mem_t type, hid_t dxpl, haddr_t addr, hsize_t size, const void *buffer)</td><td>A function to write data to a file.</td></tr>
+<tr><td>herr_t (*flush)(H5FD_t *file)</td><td>A function which flushes cached data to the file.</td></tr>
+<tr><td>H5FD_mem_t fl_map[H5FD_MEM_NTYPES]</td><td>An array which maps a file allocation request type to a free list.</td></tr>
+</table>
+
+Example: The sec2 driver would be registered as:
+\code
+static hid_t H5FD_SEC2_g = 0; /* driver ID; zero until the driver is registered */
+
+static const H5FD_class_t H5FD_sec2_g = {
+    "sec2",                 /*name          */
+    MAXADDR,                /*maxaddr       */
+    NULL,                   /*sb_size       */
+    NULL,                   /*sb_encode     */
+    NULL,                   /*sb_decode     */
+    0,                      /*fapl_size     */
+    NULL,                   /*fapl_get      */
+    NULL,                   /*fapl_copy     */
+    NULL,                   /*fapl_free     */
+    0,                      /*dxpl_size     */
+    NULL,                   /*dxpl_copy     */
+    NULL,                   /*dxpl_free     */
+    H5FD_sec2_open,         /*open          */
+    H5FD_sec2_close,        /*close         */
+    H5FD_sec2_cmp,          /*cmp           */
+    H5FD_sec2_query,        /*query         */
+    NULL,                   /*alloc         */
+    NULL,                   /*free          */
+    H5FD_sec2_get_eoa,      /*get_eoa       */
+    H5FD_sec2_set_eoa,      /*set_eoa       */
+    H5FD_sec2_get_eof,      /*get_eof       */
+    H5FD_sec2_read,         /*read          */
+    H5FD_sec2_write,        /*write         */
+    H5FD_sec2_flush,        /*flush         */
+    H5FD_FLMAP_SINGLE,      /*fl_map        */
+};
+
+hid_t
+H5FD_sec2_init(void)
+{
+    if (!H5FD_SEC2_g) {
+        H5FD_SEC2_g = H5FDregister(&H5FD_sec2_g);
+    }
+    return H5FD_SEC2_g;
+}
+\endcode
+
+A driver can be removed from the library by unregistering it:
+<table>
+<tr><td>herr_t H5FDunregister (hid_t driver)</td><td>Where driver is the ID number returned when the driver was registered.</td></tr>
+</table>
+Unregistering a driver makes it unusable for creating new file access or data transfer property
+lists but doesn't affect any property lists or files that already use that driver.
+
+\subsection subsec_vfl_reg_prog Programming Note for C++ Developers Using C Functions
+If a C routine that takes a function pointer as an argument is called from within C++ code,
+the C routine should be allowed to return normally.
+
+Examples of this kind of routine include callbacks such as #H5Pset_elink_cb
+and #H5Pset_type_conv_cb and functions such as #H5Tconvert and #H5Ewalk2.
+
+Exiting the routine in its normal fashion allows the HDF5 C Library to clean up
+its work properly. In other words, if the C++ application jumps out of the routine
+back to the C++ “catch” statement, the library is not given the opportunity to close
+any temporary data structures that were set up when the routine was called. The C++
+application should save some state as the routine is started so that any problem that
+occurs can be diagnosed.
+
+\section sec_vfl_query Querying Driver Information
+<table>
+<tr><td>void * H5Pget_driver_data (hid_t fapl)</td></tr>
+<tr><td>void * H5Pget_driver_data (hid_t dxpl)</td></tr>
+<tr><td>This function is intended to be used by driver functions, not applications. It returns a pointer
+directly into the file access property list fapl which is a copy of the driver's file access mode
+originally provided to the H5Pset_driver function. If its argument is a data transfer property list
+dxpl then it returns a pointer to the driver-specific data transfer information instead.</td></tr>
+</table>
+
+\section sec_vfl_misc Miscellaneous
+The various private H5F_low_* functions will be replaced by public H5FD* functions so they
+can be called from drivers.
+
+All private functions H5F_addr_* which operate on addresses will be renamed as public functions
+by removing the first underscore so they can be called by drivers.
+
+The haddr_t address data type will be passed by value throughout the library. The original
+intent was that this type would eventually be a union of file address types for the various
+drivers and may become quite large, but that was back when drivers were part of HDF5. It will
+become an alias for an unsigned integer type (32 or 64 bits depending on how the library was configured).
+
+The various H5F*.c driver files will be renamed H5FD*.c and each will have a corresponding header
+file. All driver functions except the initializer and API will be declared static.
+
+This documentation does not cover optimization functions which would be useful to drivers like MPI-IO.
+Some drivers may be able to perform data pipeline operations more efficiently than HDF5 and need to
+be given a chance to override those parts of the pipeline. The pipeline would be designed to call
+various H5FD optimization functions at various points which return one of three values: the operation
+is not implemented by the driver, the operation is implemented but failed in a non-recoverable manner,
+or the operation is implemented and succeeded.
+
+Various parts of HDF5 check only the top-level file driver and do something special if it is
+the MPI-IO driver. However, we might want to be able to put the MPI-IO driver under other drivers
+such as the raw part of a split driver or under a debug driver whose sole purpose is to accumulate
+statistics as it passes all requests through to the MPI-IO driver.
Therefore we will probably need +a function which takes a format address and or object type and returns the driver which would have +been used at the lowest level to process the request. + +*/ diff --git a/src/H5Fmodule.h b/src/H5Fmodule.h index e83214bb40e..2551e13aaeb 100644 --- a/src/H5Fmodule.h +++ b/src/H5Fmodule.h @@ -658,7 +658,7 @@ * * HDF5 employs an extremely flexible mechanism called the virtual file layer, or VFL, for file * I/O. A full understanding of the VFL is only necessary if you plan to write your own drivers - * see \ref VFL in the HDF5 Technical Notes. + * see \ref VFLTN in the HDF5 \ref TN. * * For our * purposes here, it is sufficient to know that the low-level drivers used for file I/O reside in the @@ -691,7 +691,7 @@ * * If an application requires a special-purpose low-level driver, the VFL provides a public API for * creating one. For more information on how to create a driver, - * see \ref VFL in the HDF5 Technical Notes. + * see \ref VFLTN in the HDF5 \ref TN. * * \subsubsection subsubsec_file_alternate_drivers_id Identifying the Previously‐used File Driver * When creating a new HDF5 file, no history exists, so the file driver must be specified if it is to be @@ -1285,7 +1285,7 @@ * that the Memory virtual file driver, #H5FD_CORE, is used. The Memory file driver is also known * as the Core file driver. * - * Links to the \ref VFL and List of Functions documents can be found in the HDF5 \ref TN. + * Links to the \ref VFLTN and List of Functions documents can be found in the HDF5 \ref TN. * * \subsection subsec_file_image_api File Image C API Call Syntax * The C API function calls described in this chapter fall into two categories: low-level routines that are @@ -2653,7 +2653,7 @@ * of functions that deal with advanced file management tasks and use cases: * 1. The control of the HDF5 \ref MDC * 2. The use of (MPI-) \ref PH5F HDF5 - * 3. The \ref SWMR pattern + * 3. 
The \ref SWMRTN pattern * * \defgroup MDC Metadata Cache * \ingroup H5F diff --git a/src/H5Ppublic.h b/src/H5Ppublic.h index be6ea9fe3d2..45b6ab905cc 100644 --- a/src/H5Ppublic.h +++ b/src/H5Ppublic.h @@ -3629,7 +3629,7 @@ H5_DLL herr_t H5Pget_evict_on_close(hid_t fapl_id, hbool_t *evict_on_close); * application can retrieve a file handle for low-level access to * a particular member of a family of files. The file handle is * retrieved with a separate call to H5Fget_vfd_handle() (or, - * in special circumstances, to H5FDget_vfd_handle(), see \ref VFL). + * in special circumstances, to H5FDget_vfd_handle(), see \ref VFLTN). * * \since 1.6.0 * @@ -4576,7 +4576,7 @@ H5_DLL herr_t H5Pset_evict_on_close(hid_t fapl_id, hbool_t evict_on_close); * retrieve a file handle for low-level access to a particular member * of a family of files. The file handle is retrieved with a separate * call to H5Fget_vfd_handle() (or, in special circumstances, to - * H5FDget_vfd_handle(); see \ref VFL). + * H5FDget_vfd_handle(); see \ref VFLTN). * * The value of \p offset is an offset in bytes from the beginning of * the HDF5 file, identifying a user-determined location within the @@ -5316,7 +5316,7 @@ H5_DLL herr_t H5Pset_metadata_read_attempts(hid_t plist_id, unsigned attempts); * low-level access to the particular member of a set of \TText{MULTI} * files in which that type of data is stored. The file handle is * retrieved with a separate call to H5Fget_vfd_handle() (or, in special - * circumstances, to H5FDget_vfd_handle(); see \ref VFL. + * circumstances, to H5FDget_vfd_handle(); see \ref VFLTN. * * The type of data specified in \p type may be one of the following: *