Add MNSol data by jaclark5 · Pull Request #109 · OpenFreeEnergy/openfe-benchmarks

jaclark5 · 2026-02-25T22:41:25Z

Close #61

Blocked by:

Add solvent/solute name to exp*data.json #110

Co-authored-by: Jennifer A Clark <jennifer.clark@omsf.io>

Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>

hannahbaumann · 2026-03-02T08:54:27Z

+                "solvent_inchikey": offmol_solute.to_inchikey(fixed_hydrogens=True),
+                "solvent_inchi": offmol_solute.to_inchi(fixed_hydrogens=True),
+            }
+            sys_data[key] = {


I don't think I understand yet what this system_data.json file is needed for. Is it because we can't store the experimental data, so this at least keeps the rest of the information here?

@hannahbaumann Yeah, I wondered whether we "really needed it", since someone could just generate the experimental*.json file and use that. I guess there are two reasons to keep it:

It records the systems we currently expect to run, i.e., which ligands and systems we intend to include in the dataset.

It lets me run tests and CI for the plan_asfe script, though I could just use FreeSolv for that instead.

It's a bit of a shame to include this file, since it's almost 1 GB and I had to introduce Git LFS because of it, but it felt like the correct, "provenance thorough" thing to do. I'm open to excluding it, though.

Actually, I'm removing this. I don't think we are confident that including it would be valid given the license so better safe than sorry.

hannahbaumann

Thanks @jaclark5, LGTM, I just left some comments.
Would it be better to combine the two charge scripts (FreeSolv and MNSol) into a single script?

hannahbaumann · 2026-03-02T09:00:20Z

+
+## Charging Solutes / Solvents
+
+Charges were generated using the [charge_mnsol.py](../../../data_generation/charge_freesolv.py) script using the [conda-lock_linux-64.yml](../../../data_generation/conda-lock_linux-64.yml) environment. 


Suggested change

Charges were generated using the [charge_mnsol.py](../../../data_generation/charge_freesolv.py) script using the [conda-lock_linux-64.yml](../../../data_generation/conda-lock_linux-64.yml) environment.

Charges were generated using the [charge_mnsol.py](../../../data_generation/charge_mnsol.py) script using the [conda-lock_linux-64.yml](../../../data_generation/conda-lock_linux-64.yml) environment.

hannahbaumann · 2026-03-02T09:01:40Z

+## Notes
+
+- Experimental uncertainties are set to 0.2 kcal/mol for all neutral entries, following the recommendation in the MNSol documentation.
+- Solvent and solute SMILES are stored as canonical explicit-hydrogen SMILES generated by the OpenFF Toolkit.


Maybe add a note that the experimental data is not deposited here for licensing reasons but can be generated on the fly using script X?

@hannahbaumann, would you mind if I resolve the comments I think are handled, or would you prefer to resolve them yourself as the reviewer?

It helps me to see them when I review again, but if resolving them works better for your tracking, that's also fine!

jaclark5 · 2026-03-02T21:44:20Z

These changes happened when I ran pre-commit. Only formatting changes are present.

hannahbaumann · 2026-03-05T11:43:54Z

+The reference data was generated using the [generate_mnsol_data.py](../../../data_generation/generate_mnsol_data.py) script. Entries were excluded if:
+
+- The solute or solvent name was not present in `mnsol-name-to-smiles.json`
+  - Molecules with ambiguous isomeric structure were excluded including: 'bromotoluene', 'chlorotoluene', 'dichloroethane', 'fluoroctane', 'trimethylbenzene'


Is the different indent intended?

Nope, thanks for catching that!
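The first exclusion rule in the quoted README text (drop any entry whose solute or solvent name is not in `mnsol-name-to-smiles.json`) amounts to a membership filter. The sketch below is a hypothetical illustration, not the code in generate_mnsol_data.py: the entry layout (dicts with `solute_name`/`solvent_name` keys), the `name_to_smiles` dict standing in for the JSON file's contents, and the helper name `filter_entries` are all assumptions. It also assumes that ambiguous names such as 'dichloroethane' are simply absent from the mapping, which is how the nested bullet reads.

```python
from typing import Iterable


def filter_entries(
    entries: Iterable[dict], name_to_smiles: dict[str, str]
) -> tuple[list[dict], list[str]]:
    """Keep entries whose solute and solvent names both map to a SMILES.

    Hypothetical sketch: ambiguous isomer names (e.g. 'dichloroethane')
    are assumed to be missing from name_to_smiles, so the same
    membership check excludes them.
    """
    kept: list[dict] = []
    skipped: list[str] = []
    for entry in entries:
        # Collect any names that have no SMILES in the mapping.
        missing = [
            name
            for name in (entry["solute_name"], entry["solvent_name"])
            if name not in name_to_smiles
        ]
        if missing:
            skipped.append(
                f"{entry['solute_name']} in {entry['solvent_name']}: missing {missing}"
            )
            continue
        kept.append(entry)
    return kept, skipped


# Illustrative usage with made-up entries.
name_to_smiles = {"benzene": "c1ccccc1", "water": "O", "octanol": "CCCCCCCCO"}
entries = [
    {"solute_name": "benzene", "solvent_name": "water"},
    {"solute_name": "dichloroethane", "solvent_name": "octanol"},
]
kept, skipped = filter_entries(entries, name_to_smiles)
```

Returning the skipped reasons alongside the kept entries keeps the exclusion log reviewable, which fits the provenance concerns raised earlier in the thread.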

hannahbaumann · 2026-03-05T11:54:10Z

+                print(f"Skipping {key}: Charge is not zero")
+                continue
+            if solute_name == "water":
+                print(f"Skipping {key}: Water")


I think I'm just blanking on something we probably discussed, but why are we skipping water here again?

This was an error. I meant to skip writing water to the ligands.sdf, but this skips every system that has water as the solute, which is a much larger effect. Thanks for catching it!

You're checking for `solute_name != "water"` and `solvent_name != "water"` in a few other places. Is that still desired, or would those checks also need to be removed?

It is desired. Before, I was accidentally skipping systems that had water as the solute, which is not what we want.

Now I'm excluding water from the ligands.sdf, which is desired. Water is a special case that Pontibus handles separately so it can apply known water models such as OPC3. Water won't have partial charges assigned and would cause our tests to fail for that reason, so we skip it and exclude it from the ligands_*.sdf files.
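The distinction settled in this exchange (skip non-neutral systems, keep systems where water is the solute, but leave water itself out of the ligands_*.sdf files) can be sketched as follows. This is a hypothetical illustration rather than the PR's actual script: the record layout, the `net_charge` field, and the function name `split_systems_and_ligands` are assumptions made for the example.

```python
def split_systems_and_ligands(
    entries: dict[str, dict],
) -> tuple[dict[str, dict], list[str]]:
    """Return (systems to keep, molecule names to write to a ligands SDF).

    Hypothetical sketch: each entry is assumed to carry 'solute_name',
    'solvent_name' and 'net_charge' fields.
    """
    systems: dict[str, dict] = {}
    ligand_names: list[str] = []
    for key, entry in entries.items():
        if entry["net_charge"] != 0:
            print(f"Skipping {key}: Charge is not zero")
            continue
        # Keep the system even when water is the solute; only the
        # ligand list below treats water specially.
        systems[key] = entry
        for name in (entry["solute_name"], entry["solvent_name"]):
            # Water is parameterized separately (e.g. with a water model
            # such as OPC3), so it gets no partial charges and is not
            # written to the ligands SDF.
            if name != "water" and name not in ligand_names:
                ligand_names.append(name)
    return systems, ligand_names


# Illustrative usage with made-up entries.
entries = {
    "water_in_octanol": {"solute_name": "water", "solvent_name": "octanol", "net_charge": 0},
    "benzene_in_water": {"solute_name": "benzene", "solvent_name": "water", "net_charge": 0},
    "acetate_in_water": {"solute_name": "acetate", "solvent_name": "water", "net_charge": -1},
}
systems, ligands = split_systems_and_ligands(entries)
```

Keeping the water check inside the ligand loop, rather than as an early `continue` on the whole entry, is what avoids the bug caught in review: the water-as-solute system survives while water never reaches the SDF.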

hannahbaumann

Thanks @jaclark5, I think this looks good apart from the small comments. Once the planning script is ready, we should definitely run at least a subset of these systems to validate this setup.

Add MNSol systems (96aa3d2)

jaclark5 self-assigned this Feb 25, 2026

jaclark5 and others added 16 commits February 26, 2026 14:28

- Update mnsol data (82bb4c1)
- Add mnsol.py (b1a8164)
- Add charged ligand sdfs (95dda40)
- Fix charge_freesolve docstring (ea6fa0a)
- Update openfe_benchmarks/data/benchmark_system_indexing.yml (5e805dc)
  Co-authored-by: Jennifer A Clark <jennifer.clark@omsf.io>
- fix plan script, update test (9f1fead)
- update system data loading (eb9a579)
- Update openfe_benchmarks/scripts/_example_plan_rbfe.py (31b3b10)
  Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>
- Update openfe_benchmarks/scripts/_example_plan_rbfe.py (017b682)
  Co-authored-by: Hannah Baumann <43765638+hannahbaumann@users.noreply.github.com>
- Update ligands*.sdf to remove water (1bf06af)
- Update MNSol scripts and BenchmarkData to support systems_data.json (6c7d5bf)
- Update freesolv to include solute name / solvent (774d774)
- Fix test_benchmark_data (efce52c)
- Update mnsol dataset generation to not be openff specific (cffa8a8)
- Add solvent/solute name to exp*data.json (d15b57e)
- Merge branch 'update_freesolv' into mnsol (6102779)

jaclark5 changed the base branch from main to update_freesolv February 27, 2026 18:49

jaclark5 mentioned this pull request Feb 27, 2026: plan_asfe script #99 (Merged)

jaclark5 requested review from hannahbaumann and jthorton February 27, 2026 18:52

jaclark5 added 2 commits February 27, 2026 14:12

- Update preparation details for mnsol (6a1018a)
- Update partial charges (9d844ab)

jaclark5 marked this pull request as ready for review February 27, 2026 19:29