Our specialists combine discriminating features from rich spectral data with demographic and phenotypic information to create robust classification algorithms.

Biomarker discovery

We identify and extract clinically relevant features from chemical spectral data to classify patients by disease state.

Our data science team takes the output data from sample analysis and compresses it with techniques such as discrete wavelet transforms or principal component analysis to aid feature discovery.
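As a rough illustration of how a wavelet transform can compress a spectrum, the sketch below applies one level of a Haar discrete wavelet transform and discards small detail coefficients. The choice of the Haar wavelet, the threshold value and the toy spectrum are assumptions made for this example, not a description of the production pipeline.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar transform."""
    scale = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / scale)
        signal.append((a - d) / scale)
    return signal


def compress(signal, threshold):
    """Transform, then zero detail coefficients smaller than threshold."""
    approx, detail = haar_dwt(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in detail]
    return approx, kept


# Hypothetical toy spectrum: a near-flat baseline with one sharp peak.
spectrum = [1.0, 1.0, 1.0, 1.2, 5.0, 9.0, 1.1, 1.0]
approx, detail = compress(spectrum, threshold=0.5)
reconstructed = haar_idwt(approx, detail)
```

In this toy case only the detail coefficient that covers the peak survives the threshold, so the peak is reconstructed closely while small baseline fluctuations are averaged out. The retained coefficients can then serve as candidate features.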

Figure: ReCIVA samples for GC-MS and FAIMS analysis

These identified features are combined with patient clinical and outcome data from our proprietary database to develop biomarker classification methods. The form of the classifier depends on the dataset, but random forests, sparse logistic regression and support vector machines have all been successfully deployed in biomarker discovery.

We generate standard statistical outputs, including principal component analysis plots, box plots and receiver operating characteristic (ROC) curves. These show the range of concentrations for given patient sub-populations and demonstrate the predictive power of each potential biomarker and the overall performance of the classification algorithms.
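To make the ROC output concrete, the following minimal sketch builds ROC curve points from a single marker and integrates the area under the curve with the trapezoidal rule. The function names, labels and scores are hypothetical and chosen only for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    labels: 1 for disease, 0 for control.
    scores: higher values indicate disease is more likely.
    The curve starts at (0, 0) and ends at (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        is_last = i == len(ranked) - 1
        if is_last or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical marker values for four patients and four controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
curve = roc_points(labels, scores)
area = auc(curve)
```

An area of 0.5 corresponds to a marker with no discriminating power, and 1.0 to perfect separation of the two groups.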

VOC data analysis pipeline
Using a training data set, we train machine learning classifiers with 10-fold stratified cross-validation. The classifiers are then applied to the test data set, and we systematically study potential sources of bias to identify technical variation and to verify that each classifier is robust to confounding factors. The features and classifiers are further refined through an iterative process. To select the best-performing classifier, we evaluate performance in terms of sensitivity, specificity and area under the ROC curve. For ongoing studies, we also assess whether the number of patients is sufficient to reach statistical significance or whether the study would benefit from recruiting additional patients. Finally, we use a proprietary database containing thousands of anonymized patient VOC profiles to verify classification algorithms in silico.
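The stratified splitting step can be sketched as follows. This is a simplified illustration that only produces the train/test index splits, so that each fold keeps roughly the same disease-to-control ratio as the full cohort. The function names and the cohort sizes are assumptions, and fitting a classifier on each split is left out.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds with balanced class proportions.

    Returns a list of n_folds lists of sample indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so overall fold sizes stay balanced.
        for position, index in enumerate(indices):
            folds[(offset + position) % n_folds].append(index)
        offset += len(indices)
    return folds


def train_test_splits(labels, n_folds=10, seed=0):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_folds, seed)
    for held_out in range(n_folds):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        yield train, test


# Hypothetical cohort: 30 disease samples (1) and 70 controls (0).
labels = [1] * 30 + [0] * 70
splits = list(train_test_splits(labels))
```

With this cohort, every held-out fold contains 10 samples, 3 of them disease cases, which mirrors the 30% disease rate of the whole data set.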


Test creation

Our classification algorithms use multiple biomarkers to accurately identify patients by disease presence, therapy response or outcome.

Using patient history, comorbidity and outcome information along with biomarker concentration data, we create tests that make actionable patient classifications and can be deployed in the clinic. By generating patient-specific probabilities for disease states, you gain detailed insight into test performance as well as the means to optimize the positive and negative predictive values of the test to match your application.
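The trade-off between positive and negative predictive value can be illustrated with a short sketch. It turns patient-specific probabilities into sensitivity and specificity at a chosen decision threshold, then applies Bayes' theorem for an assumed disease prevalence. The probabilities, the thresholds and the 5% prevalence are hypothetical values picked for the example.

```python
def sensitivity_specificity(labels, probabilities, threshold):
    """Sensitivity and specificity when p >= threshold is called positive."""
    pairs = list(zip(labels, probabilities))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from Bayes' theorem."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    # Undefined (nan) if the test never returns that result.
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npv = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return ppv, npv


# Hypothetical classifier outputs for four patients and six controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.95, 0.80, 0.55, 0.30, 0.60, 0.40, 0.25, 0.20, 0.10, 0.05]
prevalence = 0.05  # assumed prevalence in the intended-use population

results = {}
for threshold in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(labels, probabilities, threshold)
    results[threshold] = predictive_values(sens, spec, prevalence)
    ppv, npv = results[threshold]
    print(f"threshold={threshold:.2f} PPV={ppv:.3f} NPV={npv:.3f}")
```

In this toy data, raising the threshold increases the positive predictive value at the cost of negative predictive value, so a rule-in test and a rule-out test would choose different operating points.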

Figure: disease state to ROC curve

Get in touch with the Owlstone team for further information and pricing.