nanoCAT.recipes.fast_sigma

A recipe for calculating specific COSMO-RS properties using the fast-sigma approximation.

Index

run_fast_sigma(input_smiles, solvents, *[, ...])

Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.

get_compkf(smiles[, directory, name])

Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.

read_csv(file, *[, columns])

Read the passed .csv file as produced by run_fast_sigma().

sanitize_smiles_df(df[, column_levels, ...])

Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.

API

nanoCAT.recipes.run_fast_sigma(input_smiles, solvents, *, output_dir='crs', ams_dir=None, chunk_size=100, processes=None, return_df=False, log_options=mappingproxy({'file': 5, 'stdout': 3, 'time': True, 'date': False}))[source]

Perform (fast-sigma) COSMO-RS property calculations on the passed SMILES and solvents.

The output is exported to the cosmo-rs.csv file inside output_dir.

Includes the following properties:

  • LogP

  • Activity Coefficient

  • Solvation Energy

  • Formula

  • Molar Mass

  • Nring

  • boilingpoint

  • criticalpressure

  • criticaltemp

  • criticalvol

  • density

  • dielectricconstant

  • entropygas

  • flashpoint

  • gidealgas

  • hcombust

  • hformstd

  • hfusion

  • hidealgas

  • hsublimation

  • meltingpoint

  • molarvol

  • parachor

  • solubilityparam

  • tpt

  • vdwarea

  • vdwvol

  • vaporpressure

Jobs are performed in parallel: the input is split into chunks of at most chunk_size entries, which are distributed over the number of worker processes set by processes, and the results of each chunk are cached in a temporary .csv file. After all COSMO-RS calculations have been performed, the temporary .csv files are concatenated into cosmo-rs.csv.
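The chunk, cache and concatenate pattern described above can be sketched with the standard library alone. This is a hypothetical illustration, not the nanoCAT implementation: compute_row() is a stand-in that merely records the string length instead of running COSMO-RS, the single-row header is simplified, and the map_func parameter is an assumption used to show where a process pool could plug in.

```python
from __future__ import annotations

import csv
import os
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")

# Hypothetical column names; the real cosmo-rs.csv has a two-row header.
HEADER = ["smiles", "length"]


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Yield successive lists holding at most ``chunk_size`` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def compute_row(smiles: str) -> list[str]:
    """Hypothetical stand-in for a COSMO-RS job: records the string length only."""
    return [smiles, str(len(smiles))]


def process_chunk(job: tuple[int, list[str], str]) -> str:
    """Compute one chunk, cache it as its own .csv file and return that path."""
    index, smiles_chunk, output_dir = job
    path = os.path.join(output_dir, f"chunk_{index}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(compute_row(smi) for smi in smiles_chunk)
    return path


def run_chunked(
    smiles: Iterable[str],
    output_dir: str,
    chunk_size: int = 100,
    map_func: Callable[..., Iterable[str]] = map,
) -> str:
    """Process ``smiles`` chunk-wise, then concatenate the cached chunks."""
    os.makedirs(output_dir, exist_ok=True)
    jobs = [(i, chunk, output_dir) for i, chunk in enumerate(chunked(smiles, chunk_size))]
    # map_func may be swapped for a process pool's map, e.g. multiprocessing.Pool(n).map
    chunk_files = list(map_func(process_chunk, jobs))

    out_path = os.path.join(output_dir, "cosmo-rs.csv")
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(HEADER)
        for path in chunk_files:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                next(reader)  # skip the per-chunk header row
                writer.writerows(reader)
    return out_path
```

Because process_chunk() is a module-level function, it stays picklable and can be handed to multiprocessing.Pool.map. Unlike the temporary files mentioned above, this sketch keeps the per-chunk files next to cosmo-rs.csv.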

Examples

>>> import os
>>> import pandas as pd
>>> from nanoCAT.recipes import run_fast_sigma

>>> output_dir: str = ...
>>> smiles_list = ["CO[H]", "CCO[H]", "CCCO[H]"]
>>> solvent_dict = {
...     "water": "$AMSRESOURCES/ADFCRS/Water.coskf",
...     "octanol": "$AMSRESOURCES/ADFCRS/1-Octanol.coskf",
... }

>>> run_fast_sigma(smiles_list, solvent_dict, output_dir=output_dir)

>>> csv_file = os.path.join(output_dir, "cosmo-rs.csv")
>>> pd.read_csv(csv_file, header=[0, 1], index_col=0)
property Activity Coefficient             ... Solvation Energy
solvent               octanol      water  ...          octanol     water
smiles                                    ...
CO[H]                1.045891   4.954782  ...        -2.977354 -3.274420
CCO[H]               0.980956  12.735228  ...        -4.184214 -3.883986
CCCO[H]              0.905952  47.502557  ...        -4.907177 -3.779867

[3 rows x 8 columns]

Parameters
  • input_smiles (Iterable[str]) – The input SMILES strings.

  • solvents (Mapping[str, path-like]) – A mapping with solvent-names as keys and paths to their respective .coskf files as values.

Keyword Arguments
  • output_dir (path-like object) – The directory wherein the .csv files will be stored. A new directory will be created if it does not yet exist.

  • ams_dir (path-like, optional) – The directory wherein all COSMO-RS computations will be performed. If None, use a temporary directory inside output_dir.

  • chunk_size (int) – The maximum number of entries stored in a single temporary .csv file.

  • processes (int, optional) – The number of worker processes to use. If None, use the number returned by os.cpu_count().

  • return_df (bool) – If True, return a dataframe with the content of cosmo-rs.csv.

  • log_options (Mapping[str, Any]) – Alternative settings for plams.config.log. See the PLAMS documentation for more details.
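Since log_options is documented as alternative settings for plams.config.log, a cautious way to change a single field is to pass a complete mapping built from the default shown in the signature. In PLAMS, file and stdout set the verbosity levels of the logfile and the terminal output, while time and date toggle timestamps. The sketch below is hypothetical: make_log_options() is not part of nanoCAT, and rejecting unknown keys is a design choice of this sketch to catch typos.

```python
from __future__ import annotations

from collections.abc import Mapping
from types import MappingProxyType
from typing import Any

# The default shown in the run_fast_sigma signature; a read-only
# MappingProxyType cannot be mutated by accident between calls.
DEFAULT_LOG_OPTIONS: Mapping[str, Any] = MappingProxyType(
    {"file": 5, "stdout": 3, "time": True, "date": False}
)


def make_log_options(**overrides: Any) -> dict[str, Any]:
    """Return a complete copy of the default log options with some keys replaced."""
    unknown = set(overrides) - set(DEFAULT_LOG_OPTIONS)
    if unknown:  # sketch-specific guard against misspelled keys
        raise KeyError(f"unknown log option(s): {sorted(unknown)}")
    return {**DEFAULT_LOG_OPTIONS, **overrides}
```

For example, run_fast_sigma(..., log_options=make_log_options(stdout=0)) would request less terminal output while keeping the other defaults, since lower PLAMS verbosity levels print fewer messages.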

nanoCAT.recipes.get_compkf(smiles, directory=None, name=None)[source]

Estimate the sigma profile of a SMILES string using the COSMO-RS fast-sigma method.

See the COSMO-RS docs for more details.

Parameters
  • smiles (str) – The SMILES string of the molecule of interest.

  • directory (str, optional) – The directory wherein the resulting .compkf file should be stored. If None, use the current working directory.

  • name (str, optional) – The name of the .compkf file to be created (excluding the file extension). If None, use smiles.

Returns

The absolute path to the created .compkf file. None will be returned if an error is raised by AMS.

Return type

str, optional
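Because get_compkf() signals an AMS failure by returning None rather than raising, code that calls it in a loop has to check the return value. The helper below is a hypothetical sketch: collect_compkf() is not part of nanoCAT, and get_compkf is passed in as an argument so the logic can be exercised without AMS. It also assigns index-based file names, because SMILES may contain characters such as "/" and "\" (stereo bonds) that are awkward in file names; whether get_compkf() sanitizes such names itself is not stated here.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


def collect_compkf(
    smiles_list: Iterable[str],
    get_compkf: Callable[..., str | None],
    directory: str | None = None,
) -> tuple[dict[str, str], list[str]]:
    """Map each SMILES to its .compkf path and collect the SMILES that failed."""
    paths: dict[str, str] = {}
    failed: list[str] = []
    for i, smiles in enumerate(smiles_list):
        # Index-based names avoid using the raw SMILES string as a file name
        path = get_compkf(smiles, directory=directory, name=f"mol_{i}")
        if path is None:  # AMS raised an error for this molecule
            failed.append(smiles)
        else:
            paths[smiles] = path
    return paths, failed
```

With AMS available, this would be called as collect_compkf(smiles_list, get_compkf) after from nanoCAT.recipes import get_compkf.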

nanoCAT.recipes.read_csv(file, *, columns=None, **kwargs)[source]

Read the passed .csv file as produced by run_fast_sigma().

Examples

>>> from nanoCAT.recipes import read_csv

>>> file: str = ...

>>> columns1 = ["molarvol", "gidealgas", "Activity Coefficient"]
>>> read_csv(file, columns=columns1)
property  molarvol  gidealgas Activity Coefficient
solvent        NaN        NaN              octanol     water
smiles
CCCO[H]   0.905952  47.502557          -153.788589  0.078152
CCO[H]    0.980956  12.735228          -161.094955  0.061220
CO[H]     1.045891   4.954782                  NaN       NaN

>>> columns2 = [("Solvation Energy", "water")]
>>> read_csv(file, columns=columns2)
property Solvation Energy
solvent             water
smiles
CCCO[H]         -3.779867
CCO[H]          -3.883986
CO[H]           -3.274420

Parameters
  • file (path-like object) – The name of the to-be opened .csv file.

  • columns (key or sequence of keys, optional) – The columns to read. Any passed value must be a valid key for the (MultiIndex) columns of the dataframe.

  • **kwargs (Any) – Further keyword arguments for pd.read_csv.
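The requirement that columns be a valid MultiIndex key can be illustrated with plain pandas. The sketch below round-trips a toy frame through an in-memory .csv file with the same two-row (property, solvent) header and the pd.read_csv(..., header=[0, 1], index_col=0) call used in the run_fast_sigma example; the values are copied from that example. It does not call nanoCAT's read_csv() and does not reproduce its handling of single-level properties.

```python
import io

import pandas as pd

# Toy frame in the (property, solvent) column layout of cosmo-rs.csv
columns = pd.MultiIndex.from_product(
    [["Solvation Energy"], ["octanol", "water"]], names=["property", "solvent"]
)
df = pd.DataFrame(
    [[-2.977354, -3.274420], [-4.184214, -3.883986], [-4.907177, -3.779867]],
    index=pd.Index(["CO[H]", "CCO[H]", "CCCO[H]"], name="smiles"),
    columns=columns,
)

# Round-trip through an in-memory .csv file with a two-row header
buffer = io.StringIO(df.to_csv())
df_read = pd.read_csv(buffer, header=[0, 1], index_col=0)

# A top-level key selects every solvent of one property (a DataFrame) ...
energy = df_read["Solvation Energy"]
# ... while a (property, solvent) tuple selects a single column (a Series)
water = df_read[("Solvation Energy", "water")]
# xs() slices one solvent across all properties
water_all = df_read.xs("water", axis=1, level="solvent")
```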

See also

pd.read_csv

Read a comma-separated values (csv) file into DataFrame.

nanoCAT.recipes.sanitize_smiles_df(df, column_levels=2, column_padding=None)[source]

Sanitize the passed dataframe, canonicalizing the SMILES in its index, converting the columns into a MultiIndex and removing duplicate entries.

Examples

>>> import pandas as pd
>>> from nanoCAT.recipes import sanitize_smiles_df

>>> df: pd.DataFrame = ...
>>> print(df)
         a
smiles
CCCO[H]  1
CCO[H]   2
CO[H]    3

>>> sanitize_smiles_df(df)
         a
       NaN
smiles
CCCO     1
CCO      2
CO       3

Parameters
  • df (pd.DataFrame) – The dataframe in question. Its index should consist of SMILES strings.

  • column_levels (int) – The number of multiindex column levels that should be in the to-be returned dataframe.

  • column_padding (Hashable, optional) – The object used as padding for the MultiIndex levels (where appropriate).

Returns

The newly sanitized dataframe. Returns either the initially passed dataframe or a copy thereof.

Return type

pd.DataFrame
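The three sanitization steps can be approximated in pandas as follows. This is a hypothetical sketch, not the nanoCAT implementation: sanitize_sketch() takes the canonicalization function as an argument (in practice a cheminformatics toolkit such as RDKit would supply it, which is an assumption here), pads every column key up to column_levels entries with column_padding, and keeps the first row of each group of duplicate canonical SMILES. Unlike the documented function, it always returns a copy.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable

import pandas as pd


def sanitize_sketch(
    df: pd.DataFrame,
    canonicalize: Callable[[str], str],
    column_levels: int = 2,
    column_padding: Hashable = None,
) -> pd.DataFrame:
    """Canonicalize the index, pad the columns into a MultiIndex, drop duplicates."""
    ret = df.copy()
    ret.index = pd.Index([canonicalize(s) for s in ret.index], name=ret.index.name)

    # Pad each column key with column_padding until it has column_levels entries
    keys = []
    for key in ret.columns:
        key = key if isinstance(key, tuple) else (key,)
        keys.append(key + (column_padding,) * (column_levels - len(key)))
    ret.columns = pd.MultiIndex.from_tuples(keys)

    # Keep only the first row for each canonical SMILES
    return ret.loc[~ret.index.duplicated(keep="first")]
```

With the default column_padding=None, the padded level shows up as NaN, matching the second header row in the example above.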