The PDBContainer Class
A module for constructing array-representations of .pdb files.
Index
PDBContainer | An (immutable) class for holding array-like representations of a set of .pdb files.
PDBContainer.atoms | Get a read-only padded recarray for keeping track of all atom-related information.
PDBContainer.bonds | Get a read-only padded recarray for keeping track of all bond-related information.
PDBContainer.atom_count | Get a read-only ndarray for keeping track of the number of atoms in each molecule in atoms.
PDBContainer.bond_count | Get a read-only ndarray for keeping track of the number of bonds in each molecule in bonds.
PDBContainer.scale | Get a recarray representing an index.
PDBContainer.__init__() | Initialize an instance.
PDBContainer.__getitem__() | Implement self[index].
PDBContainer.__len__() | Implement len(self).
PDBContainer.keys() | Yield the (public) attribute names in this class.
PDBContainer.values() | Yield the (public) attributes in this instance.
PDBContainer.items() | Yield the (public) attribute name/value pairs in this instance.
PDBContainer.concatenate() | Concatenate \(n\) PDBContainers into a single new instance.
PDBContainer.from_molecules() | Convert an iterable or sequence of molecules into a new PDBContainer instance.
PDBContainer.to_molecules() | Create a molecule or list of molecules from this instance.
PDBContainer.to_rdkit() | Create an rdkit molecule or list of rdkit molecules from this instance.
PDBContainer.create_hdf5_group() | Create an h5py Group for storing dataCAT.PDBContainer instances.
PDBContainer.validate_hdf5() | Validate the passed hdf5 group, ensuring it is compatible with PDBContainer instances.
PDBContainer.from_hdf5() | Construct a new PDBContainer from the passed hdf5 group.
PDBContainer.to_hdf5() | Update all datasets in group positioned at index with their counterparts from this instance.
PDBContainer.intersection() | Construct a new PDBContainer by the intersection of self and value.
PDBContainer.difference() | Construct a new PDBContainer by the difference of self and value.
PDBContainer.symmetric_difference() | Construct a new PDBContainer by the symmetric difference of self and value.
PDBContainer.union() | Construct a new PDBContainer by the union of self and value.
API
- class dataCAT.PDBContainer(atoms, bonds, atom_count, bond_count, scale=None, validate=True, copy=True, index_dtype=None)[source]
An (immutable) class for holding array-like representations of a set of .pdb files.
The PDBContainer class serves as an (intermediate) container for storing .pdb files in the hdf5 format, thus facilitating the storage and interconversion between PLAMS molecules and the h5py interface.
The methods implemented in this class can roughly be divided into three categories:
- Molecule-interconversion: to_molecules(), from_molecules() & to_rdkit().
- hdf5-interconversion: create_hdf5_group(), validate_hdf5(), to_hdf5() & from_hdf5().
- Miscellaneous: keys(), values(), items(), __getitem__() & __len__().
Examples
>>> import h5py
>>> from scm.plams import readpdb
>>> from dataCAT import PDBContainer

>>> mol_list = [readpdb(...), ...]
>>> pdb = PDBContainer.from_molecules(mol_list)
>>> print(pdb)
PDBContainer(
    atoms      = numpy.recarray(..., shape=(23, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(23, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(23,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(23,), dtype=int32),
    scale      = numpy.recarray(..., shape=(23,), dtype=...)
)

>>> hdf5_file = str(...)
>>> with h5py.File(hdf5_file, 'a') as f:
...     group = pdb.create_hdf5_group(f, name='ligand')
...     pdb.to_hdf5(group, None)
...
...     print('group', '=', group)
...     for name, dset in group.items():
...         print(f'group[{name!r}]', '=', dset)
group = <HDF5 group "/ligand" (5 members)>
group['atoms'] = <HDF5 dataset "atoms": shape (23, 76), type "|V46">
group['bonds'] = <HDF5 dataset "bonds": shape (23, 75), type "|V9">
group['atom_count'] = <HDF5 dataset "atom_count": shape (23,), type "<i4">
group['bond_count'] = <HDF5 dataset "bond_count": shape (23,), type "<i4">
group['index'] = <HDF5 dataset "index": shape (23,), type "<i4">
- property atoms
Get a read-only padded recarray for keeping track of all atom-related information.
See dataCAT.dtype.ATOMS_DTYPE for a comprehensive overview of all field names and dtypes.
- Type
numpy.recarray, shape \((n, m)\)
- property bonds
Get a read-only padded recarray for keeping track of all bond-related information.
Note that all atomic indices are 1-based.
See dataCAT.dtype.BONDS_DTYPE for a comprehensive overview of all field names and dtypes.
- Type
numpy.recarray, shape \((n, k)\)
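Because the atomic indices are 1-based, they have to be shifted by one before they can index 0-based Python or NumPy sequences. The following is a minimal sketch of that conversion, using plain tuples in place of the actual recarray. The names `bonds_one_based`, `atom_symbols` and `to_zero_based` are hypothetical and not part of dataCAT, and the two-field record is a simplification of the real field layout.

```python
from typing import List, Tuple

# Hypothetical stand-in for the bond records of one molecule: pairs of
# 1-based atom indices, as found in a .pdb file. Not the real dataCAT dtype.
bonds_one_based: List[Tuple[int, int]] = [(1, 2), (2, 3), (2, 4)]
atom_symbols = ["C", "O", "H", "H"]


def to_zero_based(bonds: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Shift 1-based atom indices to 0-based indices."""
    return [(i - 1, j - 1) for i, j in bonds]


# Look up the symbols of the two atoms participating in each bond.
bonded_symbols = [
    (atom_symbols[i], atom_symbols[j]) for i, j in to_zero_based(bonds_one_based)
]
print(bonded_symbols)
```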
- property atom_count
Get a read-only ndarray for keeping track of the number of atoms in each molecule in atoms.
- Type
numpy.ndarray[int32], shape \((n,)\)
- property bond_count
Get a read-only ndarray for keeping track of the number of bonds in each molecule in bonds.
- Type
numpy.ndarray[int32], shape \((n,)\)
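The padded arrays and the count arrays are meant to be read together: atoms and bonds have a fixed second dimension, so rows belonging to smaller molecules contain padding, while atom_count and bond_count record how many entries per row are real. The plain-Python sketch below illustrates the idea under the assumption that padding is appended at the end of each row. The helpers `pad_rows` and `unpad_rows`, the `min_width` argument and the `PAD` fill value are all hypothetical and do not mirror dataCAT's implementation.

```python
from typing import List, Sequence, Tuple

PAD = ""  # hypothetical fill value used for the padded slots


def pad_rows(
    rows: Sequence[Sequence[str]], min_width: int = 0
) -> Tuple[List[List[str]], List[int]]:
    """Pad ragged rows to a common width and record each row's true length."""
    counts = [len(row) for row in rows]
    width = max([min_width, *counts])
    padded = [list(row) + [PAD] * (width - len(row)) for row in rows]
    return padded, counts


def unpad_rows(padded: Sequence[Sequence[str]], counts: Sequence[int]) -> List[List[str]]:
    """Recover the original ragged rows by trimming each row to its count."""
    return [list(row[:n]) for row, n in zip(padded, counts)]


molecules = [["O", "H", "H"], ["C", "O"], ["H"]]
padded, counts = pad_rows(molecules)
print(padded)
print(counts)
print(unpad_rows(padded, counts) == molecules)
```

Storing the counts separately means the fill value never has to be distinguishable from real data.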
- property scale
Get a recarray representing an index.
Used as dimensional scale in the h5py Group.
- Type
numpy.recarray, shape \((n,)\)
- __init__(atoms, bonds, atom_count, bond_count, scale=None, validate=True, copy=True, index_dtype=None)[source]
Initialize an instance.
- Parameters
atoms (numpy.recarray, shape \((n, m)\)) – A padded recarray for keeping track of all atom-related information. See PDBContainer.atoms.
bonds (numpy.recarray, shape \((n, k)\)) – A padded recarray for keeping track of all bond-related information. See PDBContainer.bonds.
atom_count (numpy.ndarray[int32], shape \((n,)\)) – An ndarray for keeping track of the number of atoms in each molecule in atoms. See PDBContainer.atom_count.
bond_count (numpy.ndarray[int32], shape \((n,)\)) – An ndarray for keeping track of the number of bonds in each molecule in bonds. See PDBContainer.bond_count.
scale (numpy.recarray, shape \((n,)\), optional) – A recarray representing an index. If None, use a simple numerical index (i.e. numpy.arange()). See PDBContainer.scale.
- Keyword Arguments
validate (bool) – If True, perform more thorough validation of the input arrays. Note that this also allows the parameters to be passed as array-like objects in addition to the aforementioned ndarray or recarray instances.
copy (bool) – If True, set the passed arrays as copies. Only relevant if validate = True.
- Return type
None
API: Miscellaneous Methods
- PDBContainer.__getitem__(index)[source]
Implement self[index].
Constructs a new PDBContainer instance by slicing all arrays with index. Follows the standard NumPy indexing rules: if an integer or slice is passed then a shallow copy is returned; otherwise a deep copy will be created.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> print(pdb)
PDBContainer(
    atoms      = numpy.recarray(..., shape=(23, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(23, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(23,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(23,), dtype=int32),
    scale      = numpy.recarray(..., shape=(23,), dtype=...)
)

>>> pdb[0]
PDBContainer(
    atoms      = numpy.recarray(..., shape=(1, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(1, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(1,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(1,), dtype=int32),
    scale      = numpy.recarray(..., shape=(1,), dtype=...)
)

>>> pdb[:10]
PDBContainer(
    atoms      = numpy.recarray(..., shape=(10, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(10, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(10,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(10,), dtype=int32),
    scale      = numpy.recarray(..., shape=(10,), dtype=...)
)

>>> pdb[[0, 5, 7, 9, 10]]
PDBContainer(
    atoms      = numpy.recarray(..., shape=(5, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(5, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(5,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(5,), dtype=int32),
    scale      = numpy.recarray(..., shape=(5,), dtype=...)
)
- Parameters
index (int, Sequence[int] or slice) – An object for slicing arrays along axis=0.
- Returns
A shallow or deep copy of a slice of this instance.
- Return type
PDBContainer
- PDBContainer.__len__()[source]
Implement len(self).
- Returns
The length of the arrays embedded within this instance (which are all of the same length).
- Return type
int
- classmethod PDBContainer.keys()[source]
Yield the (public) attribute names in this class.
Examples
>>> from dataCAT import PDBContainer

>>> for name in PDBContainer.keys():
...     print(name)
atoms
bonds
atom_count
bond_count
scale
- Yields
str – The names of all attributes in this class.
- PDBContainer.values()[source]
Yield the (public) attributes in this instance.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> for value in pdb.values():
...     print(object.__repr__(value))
<numpy.recarray object at ...>
<numpy.recarray object at ...>
<numpy.ndarray object at ...>
<numpy.ndarray object at ...>
<numpy.recarray object at ...>
- Yields
numpy.ndarray / numpy.recarray – The values of all attributes in this instance.
- PDBContainer.items()[source]
Yield the (public) attribute name/value pairs in this instance.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> for name, value in pdb.items():
...     print(name, '=', object.__repr__(value))
atoms = <numpy.recarray object at ...>
bonds = <numpy.recarray object at ...>
atom_count = <numpy.ndarray object at ...>
bond_count = <numpy.ndarray object at ...>
scale = <numpy.recarray object at ...>
- Yields
str and numpy.ndarray / numpy.recarray – The names and values of all attributes in this instance.
- PDBContainer.concatenate(*args)[source]
Concatenate \(n\) PDBContainers into a single new instance.
Examples
>>> from dataCAT import PDBContainer

>>> pdb1 = PDBContainer(...)
>>> pdb2 = PDBContainer(...)
>>> pdb3 = PDBContainer(...)
>>> print(len(pdb1), len(pdb2), len(pdb3))
23 23 23

>>> pdb_new = pdb1.concatenate(pdb2, pdb3)
>>> print(pdb_new)
PDBContainer(
    atoms      = numpy.recarray(..., shape=(69, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(69, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(69,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(69,), dtype=int32),
    scale      = numpy.recarray(..., shape=(69,), dtype=...)
)
- Parameters
*args (PDBContainer) – One or more PDBContainers.
- Returns
A new PDBContainer constructed by concatenating self and args.
- Return type
PDBContainer
API: Object Interconversion
- classmethod PDBContainer.from_molecules(mol_list, min_atom=0, min_bond=0, scale=None)[source]
Convert an iterable or sequence of molecules into a new PDBContainer instance.
Examples
>>> from typing import List
>>> from dataCAT import PDBContainer
>>> from scm.plams import readpdb, Molecule

>>> mol_list: List[Molecule] = [readpdb(...), ...]
>>> PDBContainer.from_molecules(mol_list)
PDBContainer(
    atoms      = numpy.recarray(..., shape=(23, 76), dtype=...),
    bonds      = numpy.recarray(..., shape=(23, 75), dtype=...),
    atom_count = numpy.ndarray(..., shape=(23,), dtype=int32),
    bond_count = numpy.ndarray(..., shape=(23,), dtype=int32),
    scale      = numpy.recarray(..., shape=(23,), dtype=...)
)
- Parameters
mol_list (Iterable[Molecule]) – An iterable consisting of PLAMS molecules.
min_atom (int) – The minimum number of atoms which PDBContainer.atoms should accommodate.
min_bond (int) – The minimum number of bonds which PDBContainer.bonds should accommodate.
scale (array-like, optional) – An array-like object representing a user-specified index. Defaults to a simple range index if None (see numpy.arange()).
- Returns
A pdb container.
- Return type
PDBContainer
- PDBContainer.to_molecules(index=None, mol=None)[source]
Create a molecule or list of molecules from this instance.
Examples
An example where one or more new molecules are created.
>>> from dataCAT import PDBContainer
>>> from scm.plams import Molecule

>>> pdb = PDBContainer(...)

# Create a single new molecule from `pdb`
>>> pdb.to_molecules(index=0)
<scm.plams.mol.molecule.Molecule object at ...>

# Create two new molecules from `pdb`
>>> pdb.to_molecules(index=[0, 1])
[<scm.plams.mol.molecule.Molecule object at ...>,
 <scm.plams.mol.molecule.Molecule object at ...>]
An example where one or more existing molecules are updated in-place.
# Update `mol` with the info from `pdb`
>>> mol = Molecule(...)  # doctest: +SKIP
>>> mol_new = pdb.to_molecules(index=2, mol=mol)
>>> mol is mol_new
True

# Update all molecules in `mol_list` with info from `pdb`
>>> mol_list = [Molecule(...), Molecule(...), Molecule(...)]  # doctest: +SKIP
>>> mol_list_new = pdb.to_molecules(index=range(3), mol=mol_list)
>>> for m, m_new in zip(mol_list, mol_list_new):
...     print(m is m_new)
True
True
True
- Parameters
index (int, Sequence[int] or slice, optional) – An object for slicing the arrays embedded within this instance. Follows the standard NumPy indexing rules (e.g. self.atoms[index]). If a scalar is provided (e.g. an integer) then a single molecule will be returned. If a sequence, range, slice, etc. is provided then a list of molecules will be returned.
mol (Molecule or Iterable[Molecule], optional) – A molecule or list of molecules. If one or more molecules are provided here then they will be updated in-place.
- Returns
A molecule or list of molecules, depending on whether index is a scalar or a sequence / slice. Note that if mol is not None, then the to-be returned molecules won't be copies.
- Return type
Molecule or List[Molecule]
- PDBContainer.to_rdkit(index=None, sanitize=True)[source]
Create an rdkit molecule or list of rdkit molecules from this instance.
Examples
An example where one or more new molecules are created.
>>> from dataCAT import PDBContainer
>>> from rdkit.Chem import Mol

>>> pdb = PDBContainer(...)

# Create a single new molecule from `pdb`
>>> pdb.to_rdkit(index=0)
<rdkit.Chem.rdchem.Mol object at ...>

# Create two new molecules from `pdb`
>>> pdb.to_rdkit(index=[0, 1])
[<rdkit.Chem.rdchem.Mol object at ...>,
 <rdkit.Chem.rdchem.Mol object at ...>]
- Parameters
index (int, Sequence[int] or slice, optional) – An object for slicing the arrays embedded within this instance. Follows the standard NumPy indexing rules (e.g. self.atoms[index]). If a scalar is provided (e.g. an integer) then a single molecule will be returned. If a sequence, range, slice, etc. is provided then a list of molecules will be returned.
sanitize (bool) – Whether to sanitize the molecule before returning it.
- Returns
A molecule or list of molecules, depending on whether index is a scalar or a sequence / slice.
- Return type
Mol or List[Mol]
- classmethod PDBContainer.create_hdf5_group(file, name, *, scale=None, scale_dtype=None, **kwargs)[source]
Create an h5py Group for storing dataCAT.PDBContainer instances.
Notes
The scale and scale_dtype parameters are mutually exclusive.
- Parameters
file (h5py.File or h5py.Group) – The h5py File or Group where the new Group will be created.
name (str) – The name of the to-be created Group.
- Keyword Arguments
scale (h5py.Dataset, keyword-only) – A pre-existing dataset serving as dimensional scale. See scale_dtype to create a new one instead.
scale_dtype (dtype-like, keyword-only) – The datatype of the to-be created dimensional scale. See scale to use a pre-existing dataset for this purpose.
**kwargs (Any) – Further keyword arguments for the creation of each dataset. Arguments already specified by default are: name, shape, maxshape and dtype.
- Returns
The newly created Group.
- Return type
h5py.Group
- classmethod PDBContainer.validate_hdf5(group)[source]
Validate the passed hdf5 group, ensuring it is compatible with PDBContainer instances.
An AssertionError will be raised if group does not validate.
This method is called automatically when an exception is raised by to_hdf5() or from_hdf5().
- Parameters
group (h5py.Group) – The to-be validated hdf5 Group.
- Raises
AssertionError – Raised if the validation process fails.
- classmethod PDBContainer.from_hdf5(group, index=None)[source]
Construct a new PDBContainer from the passed hdf5 group.
- Parameters
group (h5py.Group) – The to-be read h5py group.
index (int, Sequence[int] or slice, optional) – An object for slicing all datasets in group.
- Returns
A new PDBContainer constructed from group.
- Return type
PDBContainer
- PDBContainer.to_hdf5(group, index, update_scale=True)[source]
Update all datasets in group positioned at index with their counterparts from this instance.
Follows the standard indexing rules as employed by h5py.
Important
If index is passed as a sequence of integers then, contrary to NumPy, they will have to be sorted.
- Parameters
group (h5py.Group) – The to-be updated h5py group.
index (int, Sequence[int] or slice) – An object for slicing all datasets in group. Note that, contrary to NumPy, if a sequence of integers is provided then they will have to be sorted.
update_scale (bool) – If True, also export PDBContainer.scale to the dimensional scale in the passed group.
API: Set Operations
- PDBContainer.intersection(value)[source]
Construct a new PDBContainer by the intersection of self and value.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> print(pdb.scale)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22]

>>> pdb_new = pdb.intersection(range(4))
>>> print(pdb_new.scale)
[0 1 2 3]
- Parameters
value (PDBContainer or array-like) – Another PDBContainer or an array-like object representing PDBContainer.scale. Note that both value and self.scale should consist of unique elements.
- Returns
A new instance by intersecting self.scale and value.
- Return type
PDBContainer
See also
set.intersection
Return the intersection of two sets as a new set.
- PDBContainer.difference(value)[source]
Construct a new PDBContainer by the difference of self and value.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> print(pdb.scale)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22]

>>> pdb_new = pdb.difference(range(10, 30))
>>> print(pdb_new.scale)
[0 1 2 3 4 5 6 7 8 9]
- Parameters
value (PDBContainer or array-like) – Another PDBContainer or an array-like object representing PDBContainer.scale. Note that both value and self.scale should consist of unique elements.
- Returns
A new instance as the difference of self.scale and value.
- Return type
PDBContainer
See also
set.difference
Return the difference of two or more sets as a new set.
- PDBContainer.symmetric_difference(value)[source]
Construct a new PDBContainer by the symmetric difference of self and value.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> pdb2 = PDBContainer(..., scale=range(10, 30))
>>> print(pdb.scale)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22]

>>> pdb_new = pdb.symmetric_difference(pdb2)
>>> print(pdb_new.scale)
[ 0 1 2 3 4 5 6 7 8 9 23 24 25 26 27 28 29]
- Parameters
value (PDBContainer) – Another PDBContainer. Note that both value.scale and self.scale should consist of unique elements.
- Returns
A new instance as the symmetric difference of self.scale and value.
- Return type
PDBContainer
See also
set.symmetric_difference
Return the symmetric difference of two sets as a new set.
- PDBContainer.union(value)[source]
Construct a new PDBContainer by the union of self and value.
Examples
>>> from dataCAT import PDBContainer

>>> pdb = PDBContainer(...)
>>> pdb2 = PDBContainer(..., scale=range(10, 30))
>>> print(pdb.scale)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22]

>>> pdb_new = pdb.union(pdb2)
>>> print(pdb_new.scale)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]
- Parameters
value (PDBContainer) – Another PDBContainer. Note that both value.scale and self.scale should consist of unique elements.
- Returns
A new instance as the union of self.scale and value.
- Return type
PDBContainer
See also
set.union
Return the union of sets as a new set.
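The four set operations mirror the corresponding methods of Python's built-in set, applied to the values held in scale. The scale values printed in the examples of this section can be reproduced with plain sets, as sketched below. This only illustrates which index values are retained. It says nothing about how the container gathers the matching rows from its arrays, and the variable names are illustrative only.

```python
# Scale values taken from the examples: a default range index of length 23
# and a second container created with scale=range(10, 30).
self_scale = set(range(23))
other_scale = set(range(10, 30))

intersection = sorted(self_scale & set(range(4)))
difference = sorted(self_scale - other_scale)
symmetric_difference = sorted(self_scale ^ other_scale)
union = sorted(self_scale | other_scale)

print(intersection)
print(difference)
print(symmetric_difference)
print(union)
```

The requirement that scale consists of unique elements is what makes this set analogy well defined: each index value identifies exactly one row.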