Machine Learning for Early-Stage Parkinson’s Detection in North America: A Comparative Evaluation of Logistic Regression and Random Forest Models Using Vocal, REM Sleep and Movement Data

Focus

Machine Learning, Neurology, Biomedical Data Science

Motivation

Early Detection, Accessibility, Clinical Interpretability

About the project

This research investigates how machine learning can support the early, non-invasive detection of Parkinson’s disease (PD), a neurodegenerative condition often diagnosed only after substantial neuronal loss. The study compares two interpretable models, Logistic Regression (LR) and Random Forest (RF), applied separately to three publicly available datasets: vocal tremor recordings (UCI Parkinson’s dataset), REM sleep parameters (PhysioNet Sleep-EDF), and smartphone-based movement data (mPower). Each model is evaluated with stratified cross-validation, which tests its diagnostic ability within each modality.
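The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic feature table in place of the real datasets. The sample size, feature count, class balance, fold count, and hyperparameters below are illustrative assumptions, not the study's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one modality's feature table (NOT the real data).
# The size and class imbalance are hypothetical.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)

# Stratified folds keep the PD / control ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Scaling lives inside the LR pipeline so it is re-fit on each training fold only.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}


def compare_models(X, y, models, cv):
    """Return {model name: (mean ROC-AUC, std ROC-AUC)} across the CV folds."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = (scores.mean(), scores.std())
    return results


results = compare_models(X, y, models, cv)
for name, (mean_auc, std_auc) in results.items():
    print(f"{name}: ROC-AUC = {mean_auc:.3f} +/- {std_auc:.3f}")
```

In a setup like this, the same function would be run once per modality, each with its own feature table, so that scores are never pooled across participant cohorts.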

The findings indicate that Random Forest consistently outperforms Logistic Regression, achieving ROC-AUC scores of 0.96 on vocal data, 0.92 on sleep data, and 0.96 on movement data. These results suggest that early PD detection is feasible using simple, interpretable models on individual data modalities. However, because the datasets originate from distinct participant cohorts, any multimodal conclusions are treated as synthetic cross-dataset simulations rather than true integrated analyses. The study emphasizes that real-world multimodal validation requires synchronized, participant-level data collection.

Beyond model performance, the paper underscores the ethical and practical importance of interpretability in medical AI, particularly when working with small datasets and sensitive health decisions. The author argues that while deep learning can achieve superior accuracy, it often sacrifices transparency, which limits clinical adoption. By contrast, LR and RF provide clearer decision boundaries and feature-level insights that clinicians can trust. The study concludes that interpretable models, applied to accessible and non-invasive biomarkers such as voice, movement, and sleep, can play a pivotal role in developing scalable early-screening tools for Parkinson’s disease, especially in regions with limited access to advanced diagnostic imaging such as DaTscan.
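The feature-level insight mentioned above can be sketched as follows, assuming scikit-learn and synthetic data. The feature names are hypothetical placeholders, and reading LR coefficients and RF impurity-based importances is one common way to inspect such models, not necessarily the method used in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the feature names are hypothetical placeholders.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Standardizing first makes LR coefficient magnitudes comparable across features.
X_scaled = StandardScaler().fit_transform(X)
lr = LogisticRegression(max_iter=1000).fit(X_scaled, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# LR: a signed weight per feature (the sign gives the direction of the effect).
lr_coefs = dict(zip(feature_names, lr.coef_[0]))
# RF: impurity-based importances, non-negative and normalized to sum to 1.
rf_importances = dict(zip(feature_names, rf.feature_importances_))

# List features from most to least important according to the forest.
for name in sorted(feature_names, key=lambda n: -rf_importances[n]):
    print(f"{name}: LR coef = {lr_coefs[name]:+.3f}, "
          f"RF importance = {rf_importances[name]:.3f}")
```

Impurity-based importances can be biased toward features with many distinct values, so a clinician-facing report might cross-check them against another measure such as permutation importance.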
