Machine learning is endlessly powerful for discovering correlations where humans might otherwise struggle. Applying these correlations to unseen data, however, is a challenge. astartes seeks to make addressing interpolation and extrapolation as easy as training a model in the first place.
astartes is a drop-in replacement for sklearn's train_test_split
provides training, validation, and testing sets to avoid data leakage
rigorous model validation is dramatically simplified
implements numerous sampling algorithms to enforce interpolation and extrapolation
Sphere Exclusion, K-Means, Kennard-Stone, SPXy, and others
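To make the idea of an algorithmic sampler concrete, here is a minimal pure-Python sketch of Kennard-Stone-style selection. It is an illustration written for this page, not the astartes implementation: the function names `kennard_stone_order` and `kennard_stone_split` and the integer `n_train` argument are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math


def kennard_stone_order(points):
    """Return the indices of `points` in Kennard-Stone selection order."""
    n = len(points)
    if n < 2:
        return list(range(n))
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small demo).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the selection with the two mutually farthest points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}
    # Distance from each unselected point to its nearest selected point.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while remaining:
        # Take the point farthest from everything selected so far;
        # iterating in sorted order breaks ties by lowest index.
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])
    return selected


def kennard_stone_split(points, n_train):
    """Split indices into (train, test): the first n_train selected go to train."""
    order = kennard_stone_order(points)
    return order[:n_train], order[n_train:]


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
    train_idx, test_idx = kennard_stone_split(X, n_train=3)
    print("train:", train_idx, "test:", test_idx)
```

Because the most spread-out points are claimed for training first, the held-out points in this toy example fall inside the region the training set already covers, so the split tests interpolation.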
Projects & Publications
astartes - Better Data Splits for Machine Learning
Algorithmic Samplers to Train Validate Test Partition Molecules and Arbitrary Arrays (astartes) makes it easy to do rigorous interpolative and extrapolative sampling for ML model development with training, testing, and hold-out sets. It is open source and available on GitHub.
py2sambvca - Python interface to SambVca catalytic pocket calculator
A simple thin client that interfaces Python scripts with the SambVca catalytic pocket Fortran calculator. It is available on the Python Package Index and featured on the SambVca webserver. py2sambvca is available for download via GitHub and is citable via figshare.
AIMSim - Accessible and Extensible GUI for similarity visualization of chemical datasets
Artificial Intelligence Molecular Similarity (AIMSim) is an accessible cheminformatics platform that unifies similarity-based tasks on collections of molecules (molecular datasets). AIMSim is on ChemRxiv and CPC.
CROW - High Throughput Experimentation Data Management Software
Crow is an open-source research tool written in Python for use in the retrieval, diagnosis, and presentation of multivariate High Throughput Experimentation data. Crow is available for download via GitHub and is citable via figshare.
MATLAB Start to Finish - Lecture Series and Practice Problems for MATLAB
Included in this repository is an (approximately) 10-week curriculum intended to cover all the essentials of MATLAB, ranging from "what is the command window?" up to evaluating partial differential equations symbolically, with a focus on the skills needed for an undergraduate chemical engineering student. View the materials on GitHub.