CACHE Screening Databases

Welcome to CACHE Screening Databases

Last updated: Sept 20, 2022 by John Irwin

What is this?

CACHE stands for "Critical Assessment of Computational Hit-finding Experiments". The plan for CACHE is outlined in an article in Nature Reviews Chemistry: https://www.nature.com/articles/s41570-022-00363-z

What should you screen?

There are four libraries you could screen: two ready-built in 3D and two in 2D. Most of these molecules are available at parallel-synthesis prices with six-week delivery. The synthesis success rate is about 85% overall. We provide options; it is up to you what you choose to screen.

What if I want a custom number of molecules to screen, say 1M, 10M or 100M? How do I proceed?

You have several options to tailor subsets that fit the target and match your capabilities.

You are also welcome to screen the Enamine library direct from Enamine (below). ZINC is provided in the hope that it will be useful, but you must use it at your own risk.

Your mission in CACHE is to predict compounds that bind to your target and can be acquired at low cost, e.g. via parallel synthesis at Enamine.

Available Database Categories

Category: In-stock
Approx. number: 3M-4M
Approx. size (mol2, sdf, pdbqt): 3D. 1 GB per million molecules in sdf and pdbqt formats; slightly more, 1.6 GB per million, in mol2; 40 GB per million in db2.
Source database: ZINC20
How to proceed: Select 3D, purchasability level "agent", meaning compounds that are on the shelf somewhere and available direct from suppliers or procurement agents (e.g. Molport, eMolecules). About 90% expected sourcing success rate.

Category: On-demand ZINC20
Approx. number: 300M-500M
Approx. size (mol2, sdf, pdbqt): 3D. 1 TB per billion molecules in sdf and pdbqt formats; 1.6 TB per billion molecules in mol2; 40 TB per billion in db2.
Source database: ZINC20
How to proceed: Select 3D, purchasability level "wait-ok", which includes in-stock as well as make-on-demand compounds (6 weeks, 80% expected sourcing success rate). Make any other changes you like. Click download. Select your preferred format and download method.

Category: Enamine REAL
Approx. number: 2 B
Approx. size: 2D, 29 GB (SMILES plus data)
Source database: Enamine REAL Database
How to proceed: Supplied as SMILES. Build your own 3D molecules.

Category: ZINC20 SMILES
Approx. number: 1 B
Approx. size: 2D, 20 GB
Source database: ZINC20 or the 2D link below
How to proceed: Download from the 2D tranche browser and build your own 3D molecules.
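
To judge whether a custom 3D subset fits on your disks, the per-million sizes quoted above can be turned into a quick estimate. The following is a minimal sketch, not an official tool: the function name estimate_storage_gb and the dictionary GB_PER_MILLION are hypothetical, and the figures are the approximate ones listed above (1 TB per billion is the same rate as 1 GB per million).

```python
# Approximate sizes quoted on this page, in GB per million molecules.
# These are rough figures; actual downloads will vary.
GB_PER_MILLION = {
    "sdf": 1.0,
    "pdbqt": 1.0,
    "mol2": 1.6,
    "db2": 40.0,
}


def estimate_storage_gb(n_molecules: int, fmt: str) -> float:
    """Return the approximate size in GB of n_molecules stored in format fmt."""
    return n_molecules / 1_000_000 * GB_PER_MILLION[fmt]


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 100_000_000):
        sizes = ", ".join(
            f"{fmt}: {estimate_storage_gb(n, fmt):,.1f} GB" for fmt in GB_PER_MILLION
        )
        print(f"{n:>11,} molecules -> {sizes}")
```

For example, under these figures 100M molecules in db2 format would need roughly 4 TB, while the same set in sdf would need about 100 GB.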

 

Updates

Because of the war in Eastern Europe, we are making available additional libraries that may be easier to access because the stocks are not held in a war zone. We include the original SMILES, plus the InChIKey and the ZINC-22 tranche that we have calculated; the tranche encodes the heavy atom count (HAC) and logP bin.

Name: Chemspace SC 346K
SMILES: chemspaceMar22.smi.gz
SMILES, InChIKey, HAC, logP: chemspaceMar22.smi_hlogp_inchi.txt.gz
Original URL: chemspace databases

Name: Chemspace BB 713K
SMILES: chemspaceMar22bb.smi.gz
SMILES, InChIKey, HAC, logP: chemspaceMar22bb.smi_hlogp_inchi.txt.gz
Original URL: chemspace databases

Name: Mcule Ultimate Express 1 step 578,693
SMILES: mcule-1step.smi.gz
SMILES, InChIKey, HAC, logP: mcule-1step.smi_hlogp_inchi.txt.gz
Original URL: Mcule Express 1 step

Name: Mcule Ultimate Express 2 step 57.4M
SMILES: mcule-2step.smi.gz
SMILES, InChIKey, HAC, logP: mcule-2step.smi_hlogp_inchi.txt.gz
Original URL: Mcule Express 2 step

Name: Mcule Purchasable Full 7.5M
SMILES: mcule.smi.gz
SMILES, InChIKey, HAC, logP: mcule.smi_hlogp_inchi.txt.gz
Original URL: Mcule purchasable full SDF; Mcule purchasable full SMI

Name: Mcule Purchasable In stock 5.0M
SMILES: mcule-instock.smi.gz
SMILES, InChIKey, HAC, logP: mcule-instock.smi_hlogp_inchi.txt.gz
Original URL: Mcule purchasable in stock SDF; Mcule purchasable in stock SMI
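
The annotated files carry the heavy atom count and logP alongside each SMILES, so a property-based subset can be cut without recomputing anything. The sketch below is only an illustration: the exact column layout of the *_hlogp_inchi.txt.gz files is not documented on this page, so the column positions are passed in as parameters, and it assumes whitespace-separated fields with a numeric logP value. The helper names read_rows and filter_rows are hypothetical. Check a few lines of the real file before relying on it.

```python
import gzip
from typing import Iterable, Iterator, List


def read_rows(path: str) -> Iterator[List[str]]:
    """Yield whitespace-split fields from each non-empty line of a gzipped text file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                yield fields


def filter_rows(
    rows: Iterable[List[str]],
    hac_col: int,
    logp_col: int,
    max_hac: int,
    max_logp: float,
) -> Iterator[List[str]]:
    """Keep rows whose heavy atom count and logP are at or below the given limits.

    hac_col and logp_col are zero-based column indices. They are assumptions
    that the caller must verify against the actual file.
    """
    for fields in rows:
        try:
            hac = int(fields[hac_col])
            logp = float(fields[logp_col])
        except (IndexError, ValueError):
            continue  # skip header lines and malformed rows
        if hac <= max_hac and logp <= max_logp:
            yield fields


if __name__ == "__main__":
    # Placeholder rows in an ASSUMED order: SMILES, HAC, logP, key.
    demo = [
        ["CCO", "3", "-0.1", "KEY-A"],
        ["c1ccccc1CCCCCCCCCCCC", "18", "7.2", "KEY-B"],
    ]
    for row in filter_rows(demo, hac_col=1, logp_col=2, max_hac=10, max_logp=5.0):
        print(row)
```

With a real file, the rows would come from something like read_rows("mcule-instock.smi_hlogp_inchi.txt.gz"), with the column indices adjusted to match what the file actually contains.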

If people want a particular subset, and if there is some agreement on what subset should be prepared, we can prepare it. Otherwise, please create your own subset according to your opinions and capabilities as described above. Or just download everything and dock as much as time allows, then stop.
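
One simple way to cut a fixed-size subset (say 1M molecules) from a SMILES file too large to hold in memory is reservoir sampling, which reads the file once and keeps a uniform random sample. This is a minimal sketch of one possible approach, not a recommended CACHE protocol; the function names reservoir_sample and sample_smiles_file are hypothetical, and it assumes one molecule per line.

```python
import gzip
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k items drawn uniformly at random from lines in a single pass."""
    rng = random.Random(seed)
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = line
    return sample


def sample_smiles_file(path: str, k: int, seed: Optional[int] = None) -> List[str]:
    """Sample k non-empty lines from a gzipped SMILES file (one molecule per line assumed)."""
    with gzip.open(path, "rt") as handle:
        stripped = (ln.rstrip("\n") for ln in handle if ln.strip())
        return reservoir_sample(stripped, k, seed)


if __name__ == "__main__":
    demo = (f"molecule_{i}" for i in range(1000))
    print(reservoir_sample(demo, 5, seed=0))
```

For instance, sample_smiles_file("mcule-instock.smi.gz", 1_000_000, seed=42) would draw a reproducible 1M-molecule subset from the in-stock Mcule file listed above, holding only the sample itself in memory.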
 
 
Name         Last modified      Size

mcule/       2022-03-31 11:01   -
chemspace/   2022-05-27 12:58   -
2D/          2023-05-02 08:21   -
zinc22/      2024-02-02 14:40   -
3D/          2024-03-14 09:49   -

This space is reserved for future downloadable files. For now, please use the tranche browsers referenced above.