nanoCAT.recipes.fast_sigma

A recipe for calculating specific COSMO-RS properties using the fast-sigma approximation.

Index

`run_fast_sigma`(input_smiles, solvents, *[, ...])	Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.
`get_compkf`(smiles[, directory, name])	Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.
`read_csv`(file, *[, columns])	Read the passed .csv file as produced by `run_fast_sigma()`.
`sanitize_smiles_df`(df[, column_levels, ...])	Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.

API

nanoCAT.recipes.run_fast_sigma(input_smiles, solvents, *, output_dir='crs', ams_dir=None, chunk_size=100, processes=None, return_df=False, log_options=mappingproxy({'file': 5, 'stdout': 3, 'time': True, 'date': False}))

Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.

The output is exported to the cosmo-rs.csv file inside output_dir.

Includes the following properties:

LogP
Activity Coefficient
Solvation Energy
Formula
Molar Mass
Nring
boilingpoint
criticalpressure
criticaltemp
criticalvol
density
dielectricconstant
entropygas
flashpoint
gidealgas
hcombust
hformstd
hfusion
hidealgas
hsublimation
meltingpoint
molarvol
parachor
solubilityparam
tpt
vdwarea
vdwvol
vaporpressure

Jobs are performed in parallel: the input is split into chunks of a given size, the chunks are distributed over a user-specified number of processes, and the result of each chunk is cached in a temporary .csv file. After all COSMO-RS calculations have been performed, the temporary .csv files are concatenated into cosmo-rs.csv.
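The chunk-and-concatenate workflow described above can be sketched with the standard library alone. This is a minimal illustration, not nanoCAT's implementation: `iter_chunks`, `fake_property`, `process_chunk` and `run_chunked` are hypothetical names, a thread pool stands in for the worker processes, and the length of the SMILES string stands in for an actual COSMO-RS result.

```python
import csv
import itertools
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def iter_chunks(iterable, chunk_size):
    """Yield successive lists of at most ``chunk_size`` items."""
    iterator = iter(iterable)
    while chunk := list(itertools.islice(iterator, chunk_size)):
        yield chunk


def fake_property(smiles):
    """Hypothetical stand-in for a COSMO-RS calculation."""
    return len(smiles)


def process_chunk(index, chunk, output_dir):
    """Compute one chunk and cache its results in a temporary .csv file."""
    path = Path(output_dir) / f"{index}.temp.csv"
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for smiles in chunk:
            writer.writerow([smiles, fake_property(smiles)])
    return path


def run_chunked(smiles_list, output_dir, chunk_size=2, workers=2):
    """Process all chunks in parallel and merge the cached files."""
    output_dir = Path(output_dir)
    chunks = list(iter_chunks(smiles_list, chunk_size))
    # A thread pool is used here only to keep the sketch self-contained.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        temp_files = list(
            pool.map(process_chunk, range(len(chunks)), chunks,
                     itertools.repeat(output_dir))
        )

    # Concatenate the temporary files (in input order) and clean them up.
    final = output_dir / "cosmo-rs.csv"
    with open(final, "w", newline="") as f_out:
        f_out.write("smiles,property\n")
        for path in temp_files:
            f_out.write(path.read_text())
            path.unlink()
    return final


with tempfile.TemporaryDirectory() as tmp:
    result_file = run_chunked(["CO", "CCO", "CCCO"], tmp, chunk_size=2)
    print(result_file.read_text())
```

Caching each chunk separately means that a failure late in a long run does not discard the chunks that were already written.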

Examples

>>> import os
>>> import pandas as pd
>>> from nanoCAT.recipes import run_fast_sigma

>>> output_dir: str = ...
>>> smiles_list = ["CO[H]", "CCO[H]", "CCCO[H]"]
>>> solvent_dict = {
...     "water": "$AMSRESOURCES/ADFCRS/Water.coskf",
...     "octanol": "$AMSRESOURCES/ADFCRS/1-Octanol.coskf",
... }

>>> run_fast_sigma(smiles_list, solvent_dict, output_dir=output_dir)

>>> csv_file = os.path.join(output_dir, "cosmo-rs.csv")
>>> pd.read_csv(csv_file, header=[0, 1], index_col=0)
property Activity Coefficient             ... Solvation Energy
solvent               octanol      water  ...          octanol     water
smiles                                    ...
CO[H]                1.045891   4.954782  ...        -2.977354 -3.274420
CCO[H]               0.980956  12.735228  ...        -4.184214 -3.883986
CCCO[H]              0.905952  47.502557  ...        -4.907177 -3.779867

[3 rows x 8 columns]

Parameters

input_smiles (Iterable[str]) – The input SMILES strings.
solvents (Mapping[str, path-like]) – A mapping with solvent-names as keys and paths to their respective .coskf files as values.
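The solvent paths used in the example on this page contain the `$AMSRESOURCES` environment variable. Assuming such variables should be resolved before a path is inspected by the caller (an assumption; the source does not state where the expansion happens), a mapping of this shape can be expanded up front with `os.path.expandvars`. The fallback location assigned below is purely illustrative.

```python
import os

# Illustrative fallback only; a real AMS installation sets this variable itself.
os.environ.setdefault("AMSRESOURCES", "/opt/ams/atomicdata")

solvents = {
    "water": "$AMSRESOURCES/ADFCRS/Water.coskf",
    "octanol": "$AMSRESOURCES/ADFCRS/1-Octanol.coskf",
}

# Replace every $VAR reference by the value of the environment variable.
expanded = {name: os.path.expandvars(path) for name, path in solvents.items()}

for name, path in expanded.items():
    print(f"{name}: {path}")
```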

Keyword Arguments

output_dir (path-like object) – The directory wherein the .csv files will be stored. A new directory will be created if it does not yet exist.
ams_dir (path-like, optional) – The directory wherein all COSMO-RS computations will be performed. If None, use a temporary directory inside output_dir.
chunk_size (int) – The (maximum) number of entries to be stored in a single .csv file.
processes (int, optional) – The number of worker processes to use. If None, use the number returned by os.cpu_count().
return_df (bool) – If True, return a dataframe with the content of cosmo-rs.csv.
log_options (Mapping[str, Any]) – Alternative settings for plams.config.log. See the PLAMS documentation for more details.
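The signature shows the default for log_options as a mappingproxy, i.e. a read-only view of a dictionary. The sketch below illustrates what that implies for callers: the default cannot be modified in place, so alternative settings have to be built as a copy. `LOG_DEFAULT` and `quiet_log` are hypothetical names used for illustration; the values are copied from the signature.

```python
from types import MappingProxyType

# Read-only mapping with the values shown in the signature.
LOG_DEFAULT = MappingProxyType(
    {"file": 5, "stdout": 3, "time": True, "date": False}
)

# In-place modification of a mappingproxy raises a TypeError.
try:
    LOG_DEFAULT["stdout"] = 0
except TypeError as ex:
    print(f"read-only: {ex}")

# Build a modified copy instead; only the "stdout" entry is changed.
quiet_log = {**LOG_DEFAULT, "stdout": 0}
print(quiet_log)
```

A mapping such as `quiet_log` could then be passed as `log_options=quiet_log`.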

nanoCAT.recipes.get_compkf(smiles, directory=None, name=None)

Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.

See the COSMO-RS docs for more details.

Parameters

smiles (str) – The SMILES string of the molecule of interest.
directory (str, optional) – The directory wherein the resulting .compkf file should be stored. If None, use the current working directory.
name (str, optional) – The name of the to-be created .compkf file (excluding extensions). If None, use smiles.

Returns

The absolute path to the created .compkf file. None will be returned if an error is raised by AMS.

Return type

str, optional
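The documented defaults for directory and name can be summarised in a small path helper. `expected_compkf_path` is a hypothetical function written for this illustration; it only mirrors the parameter descriptions above and neither calls AMS nor creates a file.

```python
import os


def expected_compkf_path(smiles, directory=None, name=None):
    """Return the absolute path a .compkf file would get under the documented defaults."""
    if directory is None:
        directory = os.getcwd()  # documented default: current working directory
    if name is None:
        name = smiles  # documented default: the SMILES string itself
    return os.path.abspath(os.path.join(directory, f"{name}.compkf"))


print(expected_compkf_path("CCO"))
print(expected_compkf_path("CCO", directory="compkf_files", name="ethanol"))
```

Because SMILES strings may contain characters that are awkward in file names, passing an explicit name is often the more readable choice.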

nanoCAT.recipes.read_csv(file, *, columns=None, **kwargs)

Read the passed .csv file as produced by run_fast_sigma().

Examples

>>> from nanoCAT.recipes import read_csv

>>> file: str = ...

>>> columns1 = ["molarvol", "gidealgas", "Activity Coefficient"]
>>> read_csv(file, columns=columns1)
property  molarvol   gidealgas Activity Coefficient
solvent        NaN         NaN              octanol      water
smiles
CCCO[H]   0.078152 -153.788589             0.905952  47.502557
CCO[H]    0.061220 -161.094955             0.980956  12.735228
CO[H]          NaN         NaN             1.045891   4.954782

>>> columns2 = [("Solvation Energy", "water")]
>>> read_csv(file, columns=columns2)
property Solvation Energy
solvent             water
smiles
CCCO[H]         -3.779867
CCO[H]          -3.883986
CO[H]           -3.274420

Parameters

file (path-like object) – The name of the to-be opened .csv file.
columns (key or sequence of keys, optional) – The to-be read columns. Note that any passed value must be a valid dataframe (MultiIndex) key.
**kwargs (Any) – Further keyword arguments for pd.read_csv.
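What counts as a valid MultiIndex key can be illustrated with plain pandas. The sketch below does not call read_csv; it builds a small dataframe with the same two column levels (property, solvent) from the values shown in the run_fast_sigma example and selects columns from it. The variable names are illustrative only.

```python
import pandas as pd

# Two-level columns, mirroring the layout of cosmo-rs.csv.
columns = pd.MultiIndex.from_tuples(
    [
        ("Activity Coefficient", "octanol"),
        ("Activity Coefficient", "water"),
        ("Solvation Energy", "octanol"),
        ("Solvation Energy", "water"),
    ],
    names=["property", "solvent"],
)
df = pd.DataFrame(
    [
        [1.045891, 4.954782, -2.977354, -3.274420],
        [0.980956, 12.735228, -4.184214, -3.883986],
        [0.905952, 47.502557, -4.907177, -3.779867],
    ],
    index=pd.Index(["CO[H]", "CCO[H]", "CCCO[H]"], name="smiles"),
    columns=columns,
)

# A first-level key selects all solvents of that property.
activity = df["Activity Coefficient"]

# A list of (property, solvent) tuples selects individual columns
# and keeps both column levels.
water_energy = df[[("Solvation Energy", "water")]]

print(activity)
print(water_energy)
```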