HDF5 Access Logging

A module for creating, updating, resetting and exporting an hdf5-based log of database modifications.

Index

create_hdf5_log(file[, n_entries, …]) Create a hdf5 group for logging database modifications.
update_hdf5_log(group, index[, message, …]) Add a new entry to the hdf5 logger in group.
reset_hdf5_log(group[, version_values]) Clear and reset the passed logger Group.
log_to_dataframe(group) Export the log embedded within group to a Pandas DataFrame.

API

dataCAT.create_hdf5_log(file, n_entries=100, clear_when_full=False, version_names=array([b'CAT', b'Nano-CAT', b'Data-CAT'], dtype='|S8'), version_values=array([(0, 9, 11), (0, 6, 4), (0, 6, 0)], dtype=[('major', 'i1'), ('minor', 'i1'), ('micro', 'i1')]), **kwargs)

Create a hdf5 group for logging database modifications.

The logger Group consists of five main datasets:

  • "date": Denotes dates and times for when the database is modified.
  • "version": Denotes user-specified package versions for when the database is modified.
  • "version_names": Holds the names of the packages whose versions are stored in "version". See the version_names parameter.
  • "message": Holds user-specified modification messages.
  • "index": Denotes indices of which elements in the database were modified.
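
The element types of the "version" and "date" datasets can be sketched with NumPy structured dtypes. The version dtype below is copied from the function signature. The date dtype is only an assumption: its field names and widths are chosen so that the item size matches the 11 bytes ("|V11") reported in the example output, and they may differ from what dataCAT actually stores.

```python
import numpy as np

# Version dtype as shown in the signature: three signed 1-byte integers.
version_dtype = np.dtype([('major', 'i1'), ('minor', 'i1'), ('micro', 'i1')])

# Hypothetical date dtype (an assumption, not taken from dataCAT):
# 2 + 5 * 1 + 4 bytes, matching the 11-byte "|V11" element size.
date_dtype = np.dtype([
    ('year', 'i2'), ('month', 'i1'), ('day', 'i1'), ('hour', 'i1'),
    ('minute', 'i1'), ('second', 'i1'), ('microsecond', 'i4'),
])

# Default arguments from the signature; both sequences have the same length.
version_names = np.array([b'CAT', b'Nano-CAT', b'Data-CAT'], dtype='S8')
version_values = np.array([(0, 9, 11), (0, 6, 4), (0, 6, 0)], dtype=version_dtype)

# Pair every package name with its (major, minor, micro) version tuple.
versions = {
    name.decode(): value.item()
    for name, value in zip(version_names, version_values)
}

print(version_dtype.itemsize)  # 3
print(date_dtype.itemsize)     # 11
print(versions['CAT'])         # (0, 9, 11)
```

Each row of "version" then holds one such 3-byte record per name in "version_names", which is consistent with the shape (100, 3) shown in the example.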

Examples

>>> import h5py
>>> from dataCAT import create_hdf5_log

>>> hdf5_file = str(...)  
>>> with h5py.File(hdf5_file, 'a') as f:
...     group = create_hdf5_log(f)
...
...     print('group', '=', group)
...     for name, dset in group.items():
...         print(f'group[{name!r}]', '=', dset)
group = <HDF5 group "/logger" (5 members)>
group['date'] = <HDF5 dataset "date": shape (100,), type "|V11">
group['version'] = <HDF5 dataset "version": shape (100, 3), type "|V3">
group['version_names'] = <HDF5 dataset "version_names": shape (3,), type "|S8">
group['message'] = <HDF5 dataset "message": shape (100,), type "|O">
group['index'] = <HDF5 dataset "index": shape (100,), type "|O">
Parameters:
  • file (h5py.File or h5py.Group) – The File or Group where the logger should be created.
  • n_entries (int) – The initial number of entries in each to-be created dataset. In addition, every time the datasets run out of available slots, their length is increased by this number (assuming clear_when_full = False).
  • clear_when_full (bool) – If True, delete the logger and create a new one whenever it is full. Increase the size of each dataset by n_entries otherwise.
  • version_names (Sequence[str or bytes]) – A sequence consisting of strings and/or bytes representing the names of the to-be stored package versions. Should be of the same length as version_values.
  • version_values (Sequence[Tuple[int, int, int]]) – A sequence with 3-tuples, each tuple representing a package version associated with its respective counterpart in version_names.
  • **kwargs (Any) – Further keyword arguments for the h5py create_dataset() function.
Returns:

The newly created "logger" group.

Return type:

h5py.Group
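
The interplay between n_entries and clear_when_full can be illustrated with a small pure-Python model. This is a sketch of the documented policy only, not dataCAT's implementation: make_room is a hypothetical helper, and it assumes that a logger recreated by clear_when_full=True starts empty with n_entries slots.

```python
def make_room(capacity, n_used, n_entries=100, clear_when_full=False):
    """Return the (capacity, n_used) pair after ensuring one free slot exists.

    Hypothetical helper that models the documented policy.
    """
    if n_used < capacity:
        # Free slots remain, so nothing changes.
        return capacity, n_used
    if clear_when_full:
        # Assumption: the replacement logger is empty and has n_entries slots.
        return n_entries, 0
    # Otherwise every dataset grows by n_entries slots.
    return capacity + n_entries, n_used


# Simulate 250 log entries with the default settings.
capacity, n_used = 100, 0
for _ in range(250):
    capacity, n_used = make_room(capacity, n_used)
    n_used += 1

print(capacity, n_used)  # 300 250
```

With clear_when_full=True the same model never exceeds n_entries slots, at the cost of discarding older entries each time the logger fills up.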

dataCAT.update_hdf5_log(group, index, message=None, version_values=array([(0, 9, 11), (0, 6, 4), (0, 6, 0)], dtype=[('major', 'i1'), ('minor', 'i1'), ('micro', 'i1')]))

Add a new entry to the hdf5 logger in group.

Examples

>>> from datetime import datetime

>>> import h5py
>>> from dataCAT import update_hdf5_log

>>> hdf5_file = str(...)  

>>> with h5py.File(hdf5_file, 'r+') as f:
...     group = f['ligand/logger']
...
...     n = group.attrs['n']
...     date_before = group['date'][n]
...     index_before = group['index'][n]
...
...     update_hdf5_log(group, index=[0, 1, 2, 3], message='append')
...     date_after = group['date'][n]
...     index_after = group['index'][n]

>>> print(index_before, index_after, sep='\n')
[]
[0 1 2 3]

>>> print(date_before, date_after, sep='\n')  
(0, 0, 0, 0, 0, 0, 0)
(2020, 6, 24, 16, 33, 7, 959888)
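Parameters:
  • group (h5py.Group) – The logger Group.
  • index – The indices of the database elements that were modified.
  • message (str, optional) – A user-specified modification message.
  • version_values (Sequence[Tuple[int, int, int]]) – A sequence with 3-tuples, each tuple representing a package version.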
Return type:

None
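
The "date" entries printed in the example are 7-tuples that read as (year, month, day, hour, minute, second, microsecond). Under that assumption, which is inferred from the example output rather than stated by dataCAT, they can be converted to and from datetime objects as sketched below. Both helper functions are hypothetical and not part of dataCAT.

```python
from datetime import datetime


def datetime_to_fields(dt):
    """Split a datetime into the 7 integer fields seen in the example output."""
    return (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond)


def fields_to_datetime(fields):
    """Rebuild a datetime from 7 integer fields.

    An all-zero tuple, as printed for the not-yet-written slot in the
    example, is not a valid date and is mapped to None here.
    """
    if not any(fields):
        return None
    return datetime(*fields)


stamp = fields_to_datetime((2020, 6, 24, 16, 33, 7, 959888))
print(stamp)                                             # 2020-06-24 16:33:07.959888
print(datetime_to_fields(stamp))                         # (2020, 6, 24, 16, 33, 7, 959888)
print(fields_to_datetime((0, 0, 0, 0, 0, 0, 0)))         # None
```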

dataCAT.reset_hdf5_log(group, version_values=array([(0, 9, 11), (0, 6, 4), (0, 6, 0)], dtype=[('major', 'i1'), ('minor', 'i1'), ('micro', 'i1')]))

Clear and reset the passed logger Group.

Examples

>>> import h5py
>>> from dataCAT import reset_hdf5_log

>>> hdf5_file = str(...)  

>>> with h5py.File(hdf5_file, 'r+') as f:
...     group = f['ligand/logger']
...     print('before:')
...     print(group.attrs['n'])
...
...     group = reset_hdf5_log(group)
...     print('\nafter:')
...     print(group.attrs['n'])
before:
2

after:
0
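Parameters:
  • group (h5py.Group) – The logger Group.
  • version_values (Sequence[Tuple[int, int, int]]) – A sequence with 3-tuples, each tuple representing a package version.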
Returns:

The newly (re-)created "logger" group.

Return type:

h5py.Group

dataCAT.log_to_dataframe(group)

Export the log embedded within group to a Pandas DataFrame.

Examples

>>> import h5py
>>> from dataCAT import log_to_dataframe

>>> hdf5_file = str(...)  

>>> with h5py.File(hdf5_file, 'r') as f:
...     group = f['ligand/logger']
...     df = log_to_dataframe(group)
...     print(df)  
                             CAT              ... Data-CAT message               index
                           major minor micro  ...    micro
date                                          ...
2020-06-24 15:28:09.861074     0     9     6  ...        1  update                 [0]
2020-06-24 15:56:18.971201     0     9     6  ...        1  append  [1, 2, 3, 4, 5, 6]

[2 rows x 11 columns]
Parameters:
  • group (h5py.Group) – The logger Group.
Returns:

A DataFrame containing the content of file["logger"].

Return type:

pandas.DataFrame
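
The "[2 rows x 11 columns]" summary in the example is consistent with three version fields per package plus the "message" and "index" columns. The sketch below reconstructs such two-level column labels in plain Python. The exact labels that log_to_dataframe produces are an assumption inferred from the printed output, including the empty second level for "message" and "index".

```python
# Default package names from the create_hdf5_log signature.
version_names = ['CAT', 'Nano-CAT', 'Data-CAT']
# Field names of the version dtype.
version_fields = ['major', 'minor', 'micro']

# One (package, field) column per package and version field.
columns = [(name, field) for name in version_names for field in version_fields]

# Assumed labels for the two remaining columns (empty second level).
columns += [('message', ''), ('index', '')]

print(len(columns))  # 11
print(columns[0])    # ('CAT', 'major')
print(columns[8])    # ('Data-CAT', 'micro')
```

A list of tuples like this could be passed to pandas.MultiIndex.from_tuples to build a comparable column index by hand.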