nanoCAT.recipes.fast_sigma
A recipe for calculating specific COSMO-RS properties using the fast-sigma approximation.
Index
run_fast_sigma() – Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.
get_compkf() – Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.
read_csv() – Read the passed .csv file as produced by run_fast_sigma().
sanitize_smiles_df() – Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.
API
- nanoCAT.recipes.run_fast_sigma(input_smiles, solvents, *, output_dir='crs', ams_dir=None, chunk_size=100, processes=None, return_df=False, log_options=mappingproxy({'file': 5, 'stdout': 3, 'time': True, 'date': False}))
Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.
The output is exported to the cosmo-rs.csv file and includes the following properties:
LogP
Activity Coefficient
Solvation Energy
Formula
Molar Mass
Nring
boilingpoint
criticalpressure
criticaltemp
criticalvol
density
dielectricconstant
entropygas
flashpoint
gidealgas
hcombust
hformstd
hfusion
hidealgas
hsublimation
meltingpoint
molarvol
parachor
solubilityparam
tpt
vdwarea
vdwvol
vaporpressure
Jobs are performed in parallel, with chunks of a given size being distributed to a user-specified number of processes and subsequently cached. After all COSMO-RS calculations have been performed, the temporary .csv files are concatenated into cosmo-rs.csv.
Examples
>>> import os
>>> import pandas as pd
>>> from nanoCAT.recipes import run_fast_sigma
>>> output_dir: str = ...
>>> smiles_list = ["CO[H]", "CCO[H]", "CCCO[H]"]
>>> solvent_dict = {
...     "water": "$AMSRESOURCES/ADFCRS/Water.coskf",
...     "octanol": "$AMSRESOURCES/ADFCRS/1-Octanol.coskf",
... }
>>> run_fast_sigma(smiles_list, solvent_dict, output_dir=output_dir)
>>> csv_file = os.path.join(output_dir, "cosmo-rs.csv")
>>> pd.read_csv(csv_file, header=[0, 1], index_col=0)
property Activity Coefficient             ... Solvation Energy
solvent              octanol      water   ...          octanol     water
smiles                                    ...
CO[H]               1.045891   4.954782   ...        -2.977354 -3.274420
CCO[H]              0.980956  12.735228   ...        -4.184214 -3.883986
CCCO[H]             0.905952  47.502557   ...        -4.907177 -3.779867

[3 rows x 8 columns]
- Parameters:
input_smiles (Iterable[str]) – The input SMILES strings.
solvents (Mapping[str, path-like]) – A mapping with solvent names as keys and paths to their respective .coskf files as values.
- Keyword Arguments:
output_dir (path-like object) – The directory wherein the .csv files will be stored. A new directory will be created if it does not yet exist.
ams_dir (path-like, optional) – The directory wherein all COSMO-RS computations will be performed. If None, use a temporary directory inside output_dir.
chunk_size (int) – The (maximum) number of entries to be stored in a single .csv file.
processes (int, optional) – The number of worker processes to use. If None, use the number returned by os.cpu_count().
return_df (bool) – If True, return a dataframe with the content of cosmo-rs.csv.
log_options (Mapping[str, Any]) – Alternative settings for plams.config.log. See the PLAMS documentation for more details.
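The chunk-and-concatenate workflow described for run_fast_sigma() can be sketched in plain pandas. This is not nanoCAT's implementation: `chunked`, `run_chunked` and the `compute` callable are hypothetical names, `compute` stands in for the actual COSMO-RS calculation, and the per-chunk file names (`chunk_0.csv`, ...) are assumptions. Parallelism is delegated to the `mapper` argument, which defaults to the serial builtin `map`; `multiprocessing.Pool(processes).map` could be passed instead, provided `compute` is picklable.

```python
import os
from collections.abc import Callable, Iterable, Iterator

import pandas as pd


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    buffer: list[str] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


def run_chunked(
    smiles: Iterable[str],
    compute: Callable[[list[str]], pd.DataFrame],
    output_dir: str,
    chunk_size: int = 100,
    mapper: Callable = map,
) -> pd.DataFrame:
    """Hypothetical sketch: compute per chunk, cache each chunk as .csv, then concatenate."""
    os.makedirs(output_dir, exist_ok=True)
    chunks = list(chunked(smiles, chunk_size))

    # `mapper` may be the builtin (serial) map or e.g. multiprocessing.Pool(n).map
    tmp_files = []
    for i, df in enumerate(mapper(compute, chunks)):
        path = os.path.join(output_dir, f"chunk_{i}.csv")
        df.to_csv(path)  # cache the finished chunk on disk
        tmp_files.append(path)

    # Concatenate the cached chunks into a single output file
    result = pd.concat(pd.read_csv(f, index_col=0) for f in tmp_files)
    result.to_csv(os.path.join(output_dir, "cosmo-rs.csv"))
    return result
```

In this sketch the parent process writes each chunk to disk as soon as its result arrives, so completed chunks remain on disk even if a later one fails.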
- nanoCAT.recipes.get_compkf(smiles, directory=None, name=None)
Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.
See the COSMO-RS docs for more details.
- Parameters:
smiles (str) – The SMILES string of the molecule of interest.
directory (str, optional) – The directory wherein the resulting .compkf file should be stored. If None, use the current working directory.
name (str, optional) – The name of the to-be-created .compkf file (excluding extensions). If None, use smiles.
- Returns:
The absolute path to the created .compkf file. None will be returned if an error is raised by AMS.
- Return type:
str, optional
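The documented defaults of `directory` and `name` determine where the .compkf file ends up. The hypothetical helper below reproduces only that path logic and does not call AMS; it assumes the file is written as `<directory>/<name>.compkf`. Because get_compkf() returns None when AMS raises an error, real callers should check for None before using the returned path.

```python
import os
from typing import Optional


def expected_compkf_path(
    smiles: str, directory: Optional[str] = None, name: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented defaults of get_compkf."""
    # Fall back to the current working directory, as documented
    directory = os.getcwd() if directory is None else directory
    # Fall back to the SMILES string itself as the file name
    name = smiles if name is None else name
    # `name` excludes the extension, so append it here
    return os.path.abspath(os.path.join(directory, f"{name}.compkf"))
```

Passing an explicit `name` is advisable for SMILES containing "/" or "\" (used for double-bond stereochemistry), since those characters would otherwise be interpreted as path separators if the SMILES is used verbatim.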
- nanoCAT.recipes.read_csv(file, *, columns=None, **kwargs)
Read the passed .csv file as produced by run_fast_sigma().
Examples
>>> from nanoCAT.recipes import read_csv
>>> file: str = ...
>>> columns1 = ["molarvol", "gidealgas", "Activity Coefficient"]
>>> read_csv(file, columns=columns1)
property  molarvol  gidealgas  Activity Coefficient
solvent        NaN        NaN               octanol     water
smiles
CCCO[H]   0.905952  47.502557           -153.788589  0.078152
CCO[H]    0.980956  12.735228           -161.094955  0.061220
CO[H]     1.045891   4.954782                   NaN       NaN
>>> columns2 = [("Solvation Energy", "water")]
>>> read_csv(file, columns=columns2)
property Solvation Energy
solvent             water
smiles
CCCO[H]         -3.779867
CCO[H]          -3.883986
CO[H]           -3.274420
- Parameters:
file (path-like object) – The name of the to-be opened .csv file.
columns (key or sequence of keys, optional) – The to-be read columns. Note that any passed value must be a valid dataframe (multiindex) key.
**kwargs (Any) – Further keyword arguments for pd.read_csv.
See also
pd.read_csv
Read a comma-separated values (csv) file into DataFrame.
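The `columns` argument takes keys of the two-level (property, solvent) column index. The pandas-only sketch below shows what such keys select, assuming cosmo-rs.csv uses the layout shown in the run_fast_sigma() example; the three columns and their values are copied from that example, and an in-memory round trip stands in for a real file. It does not call nanoCAT's read_csv().

```python
import io

import pandas as pd

# Same two-level column layout as cosmo-rs.csv; values copied from the
# run_fast_sigma() example
columns = pd.MultiIndex.from_tuples(
    [
        ("Activity Coefficient", "octanol"),
        ("Activity Coefficient", "water"),
        ("Solvation Energy", "water"),
    ],
    names=["property", "solvent"],
)
index = pd.Index(["CO[H]", "CCO[H]"], name="smiles")
df = pd.DataFrame(
    [[1.045891, 4.954782, -3.274420], [0.980956, 12.735228, -3.883986]],
    index=index,
    columns=columns,
)

# Round-trip through an in-memory .csv file with a two-row header
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer, header=[0, 1], index_col=0)

# A top-level key selects every solvent for one property ...
activity = loaded["Activity Coefficient"]
# ... while a (property, solvent) tuple selects a single column
solv_water = loaded[[("Solvation Energy", "water")]]
```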
- nanoCAT.recipes.sanitize_smiles_df(df, column_levels=2, column_padding=None)
Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.
Examples
>>> import pandas as pd
>>> from nanoCAT.recipes import sanitize_smiles_df
>>> df: pd.DataFrame = ...
>>> print(df)
         a
smiles
CCCO[H]  1
CCO[H]   2
CO[H]    3
>>> sanitize_smiles_df(df)
         a
       NaN
smiles
CCCO     1
CCO      2
CO       3
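The three sanitization steps can be approximated with plain pandas. In this sketch `sanitize_sketch` is a hypothetical name and the `canonicalize` callable is a placeholder, because nanoCAT's actual canonicalization routine is not shown here. RDKit's `Chem.MolToSmiles(Chem.MolFromSmiles(smiles))` is one common choice, but that is an assumption. Unlike the documented function, the sketch always returns a copy.

```python
from collections.abc import Callable, Hashable

import pandas as pd


def sanitize_sketch(
    df: pd.DataFrame,
    canonicalize: Callable[[str], str],
    column_levels: int = 2,
    column_padding: Hashable = None,
) -> pd.DataFrame:
    """Hypothetical sketch: canonicalize the index, pad columns, drop duplicates."""
    out = df.copy()
    # Replace every SMILES in the index by its canonical form (placeholder callable)
    out.index = pd.Index([canonicalize(s) for s in out.index], name=out.index.name)
    # Pad flat column labels with `column_padding` up to `column_levels` levels
    if not isinstance(out.columns, pd.MultiIndex):
        tuples = [(c,) + (column_padding,) * (column_levels - 1) for c in out.columns]
        out.columns = pd.MultiIndex.from_tuples(tuples)
    # Keep only the first row for each canonical SMILES
    return out[~out.index.duplicated(keep="first")]
```

For instance, the toy canonicalizer `lambda s: s.replace("[H]", "")` maps `CCO[H]` and `CCO` onto the same key, so only the first of those two rows is kept; with `column_padding=None` the second column level shows up as NaN, as in the example output.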
- Parameters:
df (pd.DataFrame) – The dataframe in question. The dataframe's index should consist of SMILES strings.
column_levels (int) – The number of MultiIndex column levels that should be in the to-be-returned dataframe.
column_padding (Hashable) – The object used as padding for the MultiIndex levels (where appropriate).
- Returns:
The newly sanitized dataframe. Returns either the initially passed dataframe or a copy thereof.
- Return type: