HDF5 Property Storage
A module for storing quantum mechanical properties in HDF5 format.
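Index
- create_prop_group() – Create a group for holding user-specified properties.
- create_prop_dset() – Construct a new dataset for holding a user-defined molecular property.
- update_prop_dset() – Update dset at position index with data.
- validate_prop_group() – Validate the passed hdf5 group, ensuring it is compatible with create_prop_group() and create_prop_dset().
- index_to_pandas() – Construct a MultiIndex from the passed index dataset.
- prop_to_dataframe() – Convert the passed property Dataset into a DataFrame.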
API
- dataCAT.create_prop_group(file, scale)
Create a group for holding user-specified properties.
>>> import h5py
>>> import numpy as np
>>> from dataCAT import create_prop_group

>>> hdf5_file = str(...)
>>> with h5py.File(hdf5_file, 'r+') as f:
...     scale = f.create_dataset('index', data=np.arange(10))
...     scale.make_scale('index')
...
...     group = create_prop_group(f, scale=scale)
...     print('group', '=', group)
group = <HDF5 group "/properties" (0 members)>
- Parameters
file (h5py.File or h5py.Group) – The File or Group where the new "properties" group should be created.
scale (h5py.Dataset) – The dimensional scale which will be attached to all property datasets created by dataCAT.create_prop_dset().
- Returns
The newly created group.
- Return type
h5py.Group
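create_prop_group() relies on HDF5 dimension scales. The sketch below illustrates that mechanism with plain h5py and is not dataCAT's implementation: the helper make_properties_group() is hypothetical, and remembering the scale by storing its path in a group attribute named "index" is an assumption made for the example. It uses an in-memory file, so nothing is written to disk.

```python
import h5py
import numpy as np


def make_properties_group(file: h5py.Group, scale: h5py.Dataset) -> h5py.Group:
    """Hypothetical stand-in for create_prop_group().

    Creates a "properties" group and records the path of the dimensional
    scale that later property datasets should attach to their first axis.
    """
    group = file.create_group("properties")
    # Assumption: the scale is remembered by its absolute path
    group.attrs["index"] = scale.name
    return group


# driver="core" with backing_store=False keeps the whole file in memory
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    scale = f.create_dataset("index", data=np.arange(10))
    scale.make_scale("index")
    group = make_properties_group(f, scale)

    # A property dataset looks the scale up and attaches it to its first axis
    dset = group.create_dataset("E_solv", shape=(len(scale),), dtype="f4")
    dset.dims[0].attach_scale(f[group.attrs["index"]])

    group_name = group.name
    attached_scale = dset.dims[0][0].name
```

Attaching the scale means every property dataset shares one row axis, so rows of different properties can be matched up without copying the index.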
- dataCAT.create_prop_dset(group, name, dtype=None, prop_names=None, **kwargs)
Construct a new dataset for holding a user-defined molecular property.
Examples
In the example below a new dataset is created for storing solvation energies in water, methanol and ethanol.
>>> import h5py
>>> from dataCAT import create_prop_dset

>>> hdf5_file = str(...)
>>> with h5py.File(hdf5_file, 'r+') as f:
...     group = f['properties']
...     prop_names = ['water', 'methanol', 'ethanol']
...
...     dset = create_prop_dset(group, 'E_solv', prop_names=prop_names)
...     dset_names = group['E_solv_names']
...
...     print('group', '=', group)
...     print('group["E_solv"]', '=', dset)
...     print('group["E_solv_names"]', '=', dset_names)
group = <HDF5 group "/properties" (2 members)>
group["E_solv"] = <HDF5 dataset "E_solv": shape (10, 3), type "<f4">
group["E_solv_names"] = <HDF5 dataset "E_solv_names": shape (3,), type "|S8">
- Parameters
group (h5py.Group) – The "properties" group where the new dataset will be created.
name (str) – The name of the new dataset.
prop_names (Sequence[str], optional) – The names of each column in the to-be created dataset. Used for defining the length of the second axis and will be used as a dimensional scale for that axis; the names are stored in a separate dataset named after name with a "_names" suffix (e.g. "E_solv_names"). If None, create a 1D dataset (with no columns) instead.
dtype (dtype-like, optional) – The data type of the to-be created dataset.
**kwargs (Any) – Further keyword arguments for the h5py create_dataset() method.
- Returns
The newly created dataset.
- Return type
h5py.Dataset
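The example above reports a (10, 3) dataset of type "<f4" next to a (3,) "E_solv_names" dataset of type "|S8". A minimal h5py sketch that reproduces that layout follows. It is an assumption-laden illustration, not dataCAT's code: make_prop_dset() is hypothetical, float32 is its default only because the example reports "<f4", and the names are stored as fixed-width bytes only because the example reports "|S8".

```python
import h5py
import numpy as np


def make_prop_dset(group, scale, name, prop_names, dtype="float32"):
    """Hypothetical sketch of the layout behind create_prop_dset().

    Creates ``name`` with shape (len(scale), len(prop_names)) plus a
    ``name + "_names"`` dataset that labels, and scales, its second axis.
    """
    dset = group.create_dataset(name, shape=(len(scale), len(prop_names)), dtype=dtype)

    # Fixed-width bytes: the longest name sets the width ("methanol" -> |S8)
    labels = np.array(prop_names, dtype=np.bytes_)
    names = group.create_dataset(f"{name}_names", data=labels)
    names.make_scale(f"{name}_names")

    dset.dims[0].attach_scale(scale)  # rows follow the shared index scale
    dset.dims[1].attach_scale(names)  # columns follow the property names
    return dset


with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    scale = f.create_dataset("index", data=np.arange(10))
    scale.make_scale("index")
    group = f.create_group("properties")

    dset = make_prop_dset(group, scale, "E_solv", ["water", "methanol", "ethanol"])
    shape, dtype = dset.shape, dset.dtype
    names_dtype = group["E_solv_names"].dtype
    # The column labels can be read back through the attached scale
    column_labels = [label.decode() for label in dset.dims[1][0][()]]
```

This is why the group in the example holds two members after a single call: the property dataset and its column-label scale.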
- dataCAT.update_prop_dset(dset, data, index=None)
Update dset at position index with data.
- Parameters
dset (h5py.Dataset) – The to-be updated h5py dataset.
data (numpy.ndarray) – An array containing the to-be added data.
index (slice or numpy.ndarray, optional) – The indices of all to-be updated elements in dset. index should be of the same length as data.
- Return type
None
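The update itself amounts to an indexed assignment, dset[index] = data. The hypothetical update_rows() below sketches that under two assumptions: index=None means "update every row", and integer indices contain no duplicates. It also sorts the indices, because h5py only accepts integer-array selections given in increasing order; index and data are reordered together so each row still lands in the right place. The sketch is exercised on a NumPy array, which supports the same assignment as an h5py.Dataset.

```python
import numpy as np


def update_rows(dset, data, index=None):
    """Hypothetical sketch of update_prop_dset()-style semantics.

    Assumptions: ``index=None`` updates every row, and integer indices
    contain no duplicates. Accepts NumPy arrays and h5py datasets alike.
    """
    data = np.asarray(data)
    if index is None:
        index = slice(None)
    elif not isinstance(index, slice):
        index = np.asarray(index)
        if len(index) != len(data):
            raise ValueError(f"index has {len(index)} entries, data has {len(data)}")
        # h5py only accepts integer-array selections in increasing order,
        # so reorder index and data together before writing
        order = np.argsort(index)
        index, data = index[order], data[order]
    dset[index] = data


values = np.zeros((5, 3), dtype=np.float32)
update_rows(values, [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]], index=[3, 1])
```

Here row 3 receives the ones and row 1 the twos, even though the indices were passed out of order.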
- dataCAT.validate_prop_group(group)
Validate the passed hdf5 group, ensuring it is compatible with create_prop_group() and create_prop_dset().
This function is called automatically when an exception is raised by update_prop_dset().
- Parameters
group (h5py.Group) – The to-be validated hdf5 Group.
- Raises
AssertionError – Raised if the validation process fails.
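validate_prop_group() reports failures through AssertionError. A minimal sketch of that style of check follows. check_prop_group() is hypothetical, and the rules it enforces (every property has as many rows as the index, and every 2D property has a matching "<name>_names" dataset) are assumptions drawn from the create_prop_dset() description. They are not a list of what dataCAT actually verifies.

```python
import h5py
import numpy as np


def check_prop_group(group: h5py.Group, n_rows: int) -> None:
    """Hypothetical validator in the spirit of validate_prop_group().

    Assumed layout: every property dataset has ``n_rows`` rows, and every
    2D dataset ``name`` has a ``name + "_names"`` companion whose length
    equals its number of columns. Raises AssertionError on failure.
    """
    for name, dset in group.items():
        if name.endswith("_names"):
            continue  # label datasets are checked alongside their parent
        assert dset.shape[0] == n_rows, f"{name!r}: expected {n_rows} rows, got {dset.shape[0]}"
        if len(dset.shape) == 2:
            names = group.get(f"{name}_names")
            assert names is not None, f"{name!r}: missing {name}_names dataset"
            assert len(names) == dset.shape[1], f"{name!r}: column count mismatch"


message = ""
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("properties")
    group.create_dataset("E_solv", shape=(10, 3), dtype="f4")
    labels = np.array(["water", "methanol", "ethanol"], dtype=np.bytes_)
    group.create_dataset("E_solv_names", data=labels)
    check_prop_group(group, n_rows=10)  # a consistent layout passes silently

    del group["E_solv_names"]  # break the layout on purpose
    try:
        check_prop_group(group, n_rows=10)
    except AssertionError as ex:
        message = str(ex)
```

One caveat of assert-based validation: Python drops assert statements when run with the -O flag, so such checks then pass silently.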
- dataCAT.index_to_pandas(dset, fields=None)
Construct a MultiIndex from the passed index dataset.
Examples
>>> from dataCAT import index_to_pandas
>>> import h5py

>>> filename = str(...)

# Convert the entire dataset
>>> with h5py.File(filename, "r") as f:
...     dset: h5py.Dataset = f["ligand"]["index"]
...     index_to_pandas(dset)
MultiIndex([('O=C=O', 'O1'),
            ('O=C=O', 'O3'),
            ( 'CCCO', 'O4')],
           names=['ligand', 'ligand anchor'])

# Convert a subset of fields
>>> with h5py.File(filename, "r") as f:
...     dset = f["ligand"]["index"]
...     index_to_pandas(dset, fields=["ligand"])
MultiIndex([('O=C=O',),
            ('O=C=O',),
            ( 'CCCO',)],
           names=['ligand'])
- Parameters
dset (h5py.Dataset) – The relevant index dataset.
fields (Sequence[str], optional) – The names of the index fields that are to be included in the returned MultiIndex. If None, include all fields.
- Returns
A multi-index constructed from the passed dataset.
- Return type
pandas.MultiIndex
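Conceptually, each field of the structured index dataset becomes one level of the MultiIndex, and fields selects which levels are kept. The sketch below does this for an in-memory NumPy structured array using the values from the example above. structured_to_multiindex() is hypothetical, and storing the ligand SMILES strings and anchor labels as fixed-width bytes (decoded to str here) is an assumption.

```python
import numpy as np
import pandas as pd


def structured_to_multiindex(array, fields=None):
    """Hypothetical sketch: one MultiIndex level per structured-array field.

    ``fields=None`` keeps every field, mirroring index_to_pandas().
    """
    if fields is None:
        fields = array.dtype.names
    levels = []
    for field in fields:
        column = array[field]
        # Assumption: text is stored as fixed-width bytes, so decode to str
        if column.dtype.kind == "S":
            column = np.char.decode(column, "utf-8")
        levels.append(column)
    return pd.MultiIndex.from_arrays(levels, names=list(fields))


index = np.array(
    [(b"O=C=O", b"O1"), (b"O=C=O", b"O3"), (b"CCCO", b"O4")],
    dtype=[("ligand", "S5"), ("ligand anchor", "S2")],
)
full = structured_to_multiindex(index)
subset = structured_to_multiindex(index, fields=["ligand"])
```

pandas.MultiIndex.from_arrays() always returns a MultiIndex, even for a single level, which matches the one-element tuples in the fields=["ligand"] example.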
- dataCAT.prop_to_dataframe(dset, dtype=None)
Convert the passed property Dataset into a DataFrame.
Examples
>>> import h5py
>>> from dataCAT import prop_to_dataframe

>>> hdf5_file = str(...)
>>> with h5py.File(hdf5_file, 'r') as f:
...     dset = f['ligand/properties/E_solv']
...     df = prop_to_dataframe(dset)
...     print(df)
E_solv_names             water  methanol   ethanol
ligand ligand anchor
O=C=O  O1            -0.918837 -0.151129 -0.177396
       O3            -0.221182 -0.261591 -0.712906
CCCO   O4            -0.314799 -0.784353 -0.190898
- Parameters
dset (h5py.Dataset) – The property-containing Dataset of interest.
dtype (dtype-like, optional) – The data type of the to-be returned DataFrame. Use None to default to the data type of dset.
- Returns
A DataFrame constructed from the passed dset.
- Return type
pandas.DataFrame
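The DataFrame in the example above combines three pieces: the property values, a row MultiIndex, and the column names stored alongside the dataset. The hypothetical values_to_frame() below assembles those pieces with pandas, reusing the numbers from that example. It takes them as plain arguments rather than reading them from an h5py.Dataset, so it sketches the output layout only, not how prop_to_dataframe() locates its inputs.

```python
import numpy as np
import pandas as pd


def values_to_frame(values, index, column_names, name, dtype=None):
    """Hypothetical sketch of the DataFrame layout prop_to_dataframe() returns.

    ``dtype=None`` keeps the dtype of ``values``; the column index is named
    ``name + "_names"`` to match the example output.
    """
    columns = pd.Index(column_names, name=f"{name}_names")
    return pd.DataFrame(values, index=index, columns=columns, dtype=dtype)


index = pd.MultiIndex.from_tuples(
    [("O=C=O", "O1"), ("O=C=O", "O3"), ("CCCO", "O4")],
    names=["ligand", "ligand anchor"],
)
values = np.array([
    [-0.918837, -0.151129, -0.177396],
    [-0.221182, -0.261591, -0.712906],
    [-0.314799, -0.784353, -0.190898],
])
prop_names = ["water", "methanol", "ethanol"]

df = values_to_frame(values, index, prop_names, "E_solv")
df32 = values_to_frame(values, index, prop_names, "E_solv", dtype="float32")
print(df)
```

Passing dtype="float32" casts every column, while dtype=None leaves the float64 input untouched, mirroring the documented dtype parameter.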