nanoCAT.recipes.fast_sigma

A recipe for calculating specific COSMO-RS properties using the fast-sigma approximation.

Index

run_fast_sigma(input_smiles, solvents, *[, ...])

Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.

get_compkf(smiles[, directory, name])

Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.

read_csv(file, *[, columns])

Read the passed .csv file as produced by run_fast_sigma().

sanitize_smiles_df(df[, column_levels, ...])

Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.

API

nanoCAT.recipes.run_fast_sigma(input_smiles, solvents, *, output_dir='crs', ams_dir=None, chunk_size=100, processes=None, return_df=False, log_options=mappingproxy({'file': 5, 'stdout': 3, 'time': True, 'date': False}))

Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.

The output is exported to the cosmo-rs.csv file.

The output includes the following properties:

  • LogP

  • Activity Coefficient

  • Solvation Energy

  • Formula

  • Molar Mass

  • Nring

  • boilingpoint

  • criticalpressure

  • criticaltemp

  • criticalvol

  • density

  • dielectricconstant

  • entropygas

  • flashpoint

  • gidealgas

  • hcombust

  • hformstd

  • hfusion

  • hidealgas

  • hsublimation

  • meltingpoint

  • molarvol

  • parachor

  • solubilityparam

  • tpt

  • vdwarea

  • vdwvol

  • vaporpressure

Jobs are performed in parallel: the input is split into chunks of a given size, the chunks are distributed over a user-specified number of processes, and the results of each chunk are cached in a temporary .csv file. After all COSMO-RS calculations have been performed, the temporary .csv files are concatenated into cosmo-rs.csv.
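The chunk, cache and concatenate pattern described above can be sketched with the standard library alone. This is not nanoCAT's implementation: chunked, fake_property, process_chunk and concatenate are hypothetical names, fake_property stands in for the actual COSMO-RS calculation, and the chunks are processed serially here, where the real recipe hands them to worker processes.

```python
import csv
import itertools
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most ``chunk_size`` items."""
    iterator = iter(items)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def fake_property(smiles: str) -> int:
    """Hypothetical stand-in for an expensive COSMO-RS calculation."""
    return len(smiles)


def process_chunk(index: int, chunk: List[str], directory: Path) -> Path:
    """Compute one chunk and cache its results in a temporary .csv file."""
    path = directory / f"chunk_{index}.csv"
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for smiles in chunk:
            writer.writerow([smiles, fake_property(smiles)])
    return path


def concatenate(paths: Iterable[Path], output: Path) -> None:
    """Merge the per-chunk files into one .csv file with a single header."""
    with open(output, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(["smiles", "property"])
        for path in paths:
            with open(path, newline="") as f_in:
                writer.writerows(csv.reader(f_in))


smiles_list = ["CO", "CCO", "CCCO", "CCCCO", "CCCCCO"]
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Serial for clarity; a process pool could map over the chunks instead.
    paths = [
        process_chunk(i, chunk, tmp_dir)
        for i, chunk in enumerate(chunked(smiles_list, chunk_size=2))
    ]
    output = tmp_dir / "cosmo-rs.csv"
    concatenate(paths, output)
    with open(output, newline="") as f:
        rows = list(csv.reader(f))
```

Caching each chunk separately means that a failure late in a long run costs at most one chunk of work, which is the usual motivation for this layout.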

Examples

>>> import os
>>> import pandas as pd
>>> from nanoCAT.recipes import run_fast_sigma

>>> output_dir: str = ...
>>> smiles_list = ["CO[H]", "CCO[H]", "CCCO[H]"]
>>> solvent_dict = {
...     "water": "$AMSRESOURCES/ADFCRS/Water.coskf",
...     "octanol": "$AMSRESOURCES/ADFCRS/1-Octanol.coskf",
... }

>>> run_fast_sigma(smiles_list, solvent_dict, output_dir=output_dir)

>>> csv_file = os.path.join(output_dir, "cosmo-rs.csv")
>>> pd.read_csv(csv_file, header=[0, 1], index_col=0)
property Activity Coefficient             ... Solvation Energy
solvent               octanol      water  ...          octanol     water
smiles                                    ...
CO[H]                1.045891   4.954782  ...        -2.977354 -3.274420
CCO[H]               0.980956  12.735228  ...        -4.184214 -3.883986
CCCO[H]              0.905952  47.502557  ...        -4.907177 -3.779867

[3 rows x 8 columns]

Parameters:
  • input_smiles (Iterable[str]) – The input SMILES strings.

  • solvents (Mapping[str, path-like]) – A mapping with solvent-names as keys and paths to their respective .coskf files as values.

Keyword Arguments:
  • output_dir (path-like object) – The directory wherein the .csv files will be stored. A new directory will be created if it does not yet exist.

  • ams_dir (path-like, optional) – The directory wherein all COSMO-RS computations will be performed. If None, use a temporary directory inside output_dir.

  • chunk_size (int) – The maximum number of entries stored in a single temporary .csv file.

  • processes (int, optional) – The number of worker processes to use. If None, use the number returned by os.cpu_count().

  • return_df (bool) – If True, return a dataframe with the content of cosmo-rs.csv.

  • log_options (Mapping[str, Any]) – Alternative settings for plams.config.log. See the PLAMS documentation for more details.
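Since a run can take a long time, it may be worth checking the solvents mapping before starting one. The sketch below is not part of nanoCAT: resolve_solvents and missing_solvents are hypothetical helpers, the fallback value for AMSRESOURCES is a placeholder, and nothing is assumed here about whether run_fast_sigma() expands environment variables itself.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping


def resolve_solvents(solvents: Mapping[str, str]) -> Dict[str, Path]:
    """Expand environment variables and ``~`` in every solvent path."""
    resolved = {}
    for name, raw_path in solvents.items():
        expanded = os.path.expanduser(os.path.expandvars(raw_path))
        resolved[name] = Path(expanded)
    return resolved


def missing_solvents(solvents: Mapping[str, Path]) -> List[str]:
    """Return the names of all solvents whose .coskf file does not exist."""
    return [name for name, path in solvents.items() if not path.is_file()]


# Placeholder, only used if AMSRESOURCES is unset; a real AMS setup defines it.
os.environ.setdefault("AMSRESOURCES", "/path/to/ams/atomicdata")

solvent_dict = {
    "water": "$AMSRESOURCES/ADFCRS/Water.coskf",
    "octanol": "$AMSRESOURCES/ADFCRS/1-Octanol.coskf",
}
resolved = resolve_solvents(solvent_dict)
missing = missing_solvents(resolved)
if missing:
    print("Missing .coskf files for:", ", ".join(missing))
```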

nanoCAT.recipes.get_compkf(smiles, directory=None, name=None)

Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.

See the COSMO-RS docs for more details.

Parameters:
  • smiles (str) – The SMILES string of the molecule of interest.

  • directory (str, optional) – The directory wherein the resulting .compkf file should be stored. If None, use the current working directory.

  • name (str, optional) – The name of the to-be created .compkf file (excluding extensions). If None, use smiles.

Returns:

The absolute path to the created .compkf file. None will be returned if an error is raised by AMS.

Return type:

str, optional
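Because None is returned instead of an exception being raised, callers that process many SMILES strings need to check every result. A minimal sketch of such a loop follows; collect_compkf is a hypothetical helper, and fake_get_compkf is a stand-in that only mimics the documented return contract so that the example runs without AMS.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def collect_compkf(
    smiles_list: Iterable[str],
    func: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split SMILES into those with a .compkf file and those that failed."""
    created: Dict[str, str] = {}
    failed: List[str] = []
    for smiles in smiles_list:
        path = func(smiles)
        if path is None:
            # The documented signal that the underlying calculation failed.
            failed.append(smiles)
        else:
            created[smiles] = path
    return created, failed


def fake_get_compkf(smiles: str) -> Optional[str]:
    """Hypothetical stand-in: "fails" on empty input, else returns a path."""
    if not smiles:
        return None
    return f"/tmp/{smiles}.compkf"


created, failed = collect_compkf(["CO", "", "CCO"], fake_get_compkf)
```

In real use, get_compkf itself would be passed as func in place of the stand-in.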

nanoCAT.recipes.read_csv(file, *, columns=None, **kwargs)

Read the passed .csv file as produced by run_fast_sigma().

Examples

>>> from nanoCAT.recipes import read_csv

>>> file: str = ...

>>> columns1 = ["molarvol", "gidealgas", "Activity Coefficient"]
>>> read_csv(file, columns=columns1)
property  molarvol   gidealgas Activity Coefficient
solvent        NaN         NaN              octanol      water
smiles
CCCO[H]   0.078152 -153.788589             0.905952  47.502557
CCO[H]    0.061220 -161.094955             0.980956  12.735228
CO[H]          NaN         NaN             1.045891   4.954782

>>> columns2 = [("Solvation Energy", "water")]
>>> read_csv(file, columns=columns2)
property Solvation Energy
solvent             water
smiles
CCCO[H]         -3.779867
CCO[H]          -3.883986
CO[H]           -3.274420

Parameters:
  • file (path-like object) – The name of the to-be opened .csv file.

  • columns (key or sequence of keys, optional) – The to-be read columns. Note that any passed value must be a valid dataframe (multiindex) key.

  • **kwargs (Any) – Further keyword arguments for pd.read_csv.

See also

pd.read_csv

Read a comma-separated values (csv) file into DataFrame.
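What counts as a valid MultiIndex key can be tried out on an in-memory dataframe with the same two-level column layout as cosmo-rs.csv. The sketch below uses plain pandas only and does not call read_csv(); the numbers are copied from the run_fast_sigma() example output.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("Activity Coefficient", "octanol"),
        ("Activity Coefficient", "water"),
        ("Solvation Energy", "octanol"),
        ("Solvation Energy", "water"),
    ],
    names=["property", "solvent"],
)
index = pd.Index(["CO[H]", "CCO[H]", "CCCO[H]"], name="smiles")
data = [
    [1.045891, 4.954782, -2.977354, -3.274420],
    [0.980956, 12.735228, -4.184214, -3.883986],
    [0.905952, 47.502557, -4.907177, -3.779867],
]
df = pd.DataFrame(data, index=index, columns=columns)

# A first-level key selects every solvent of that property.
activity = df["Activity Coefficient"]

# A (property, solvent) tuple selects exactly one column as a Series.
solv_water = df[("Solvation Energy", "water")]

# A list of tuples keeps the two-level column index on the result.
subset = df[[("Solvation Energy", "water")]]
```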

nanoCAT.recipes.sanitize_smiles_df(df, column_levels=2, column_padding=None)

Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.

Examples

>>> import pandas as pd
>>> from nanoCAT.recipes import sanitize_smiles_df

>>> df: pd.DataFrame = ...
>>> print(df)
         a
smiles
CCCO[H]  1
CCO[H]   2
CO[H]    3

>>> sanitize_smiles_df(df)
         a
       NaN
smiles
CCCO     1
CCO      2
CO       3

Parameters:
  • df (pd.DataFrame) – The dataframe in question. The dataframe's index should consist of SMILES strings.

  • column_levels (int) – The number of multiindex column levels that should be in the to-be returned dataframe.

  • column_padding (Hashable) – The object used as padding for the multiindex levels (where appropriate).

Returns:

The newly sanitized dataframe. Returns either the initially passed dataframe or a copy thereof.

Return type:

pd.DataFrame
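The three steps named in the summary (canonicalize the index, remove duplicates, pad the columns into a MultiIndex) can be illustrated in plain pandas. This is a rough sketch and not the library's implementation: sanitize_sketch is a hypothetical name, and toy_canonicalize merely strips explicit [H] atoms, where a real canonicalizer would come from a cheminformatics toolkit.

```python
from typing import Callable, Hashable, Optional

import pandas as pd


def toy_canonicalize(smiles: str) -> str:
    """Toy stand-in for a real canonicalizer: only drops explicit ``[H]``."""
    return smiles.replace("[H]", "")


def sanitize_sketch(
    df: pd.DataFrame,
    canonicalize: Callable[[str], str] = toy_canonicalize,
    column_levels: int = 2,
    column_padding: Optional[Hashable] = None,
) -> pd.DataFrame:
    """Return a sanitized copy of ``df`` (hypothetical helper)."""
    out = df.copy()

    # 1. Canonicalize the SMILES strings in the index.
    new_index = [canonicalize(smiles) for smiles in out.index]
    out.index = pd.Index(new_index, name=out.index.name)

    # 2. Drop rows whose canonical SMILES already occurred earlier.
    out = out[~out.index.duplicated(keep="first")]

    # 3. Pad every column key until it has ``column_levels`` levels.
    missing = column_levels - out.columns.nlevels
    if missing > 0:
        padded = [
            (key if isinstance(key, tuple) else (key,))
            + (column_padding,) * missing
            for key in out.columns
        ]
        out.columns = pd.MultiIndex.from_tuples(padded)
    return out


df = pd.DataFrame(
    {"a": [1, 2, 3, 4]},
    index=pd.Index(["CCCO[H]", "CCO[H]", "CO[H]", "CO"], name="smiles"),
)
clean = sanitize_sketch(df)
```

Here "CO[H]" and "CO" collapse to the same key after canonicalization, so only the first of the two rows survives.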