Bryan J. Carr

Data Science and Machine Learning Portfolio

About Me

I have 8 years of experience as a Naval Combat Systems Engineering Officer in the Royal Canadian Navy, mainly in technical project management and team leadership. Two years ago I caught the data science bug, and I have been diving into Python, machine learning and artificial intelligence ever since. I love building neural networks and machine learning models!

Naval engineers have to be great storytellers to translate between the technical details and high-level impacts. We're obsessed with capability and results for operations. Those skills serve me well in data science and ML.

Outside of work, I am an avid strategy gamer and motorcyclist, a home cook, and enjoyer of wine and craft beer.

I'm a proud Canadian, originally from Sudbury, Ontario and now living in Victoria, B.C.

Education

McGill University - B.Sc., Physics - 2010

University of San Diego - M.S., Applied Artificial Intelligence - 2023 (expected; part-time)


University of British Columbia - Certificate in Key Capabilities in Data Science - 2021

University of Victoria - Masters Certificate in Project Management - 2020

Harvard University - Certificate in Negotiation Mastery - 2020

Skills, Tools and Techniques

A brief overview of some things I know in the data world.

Intermediate programming in Python

Tools: Notebooks (Jupyter, Google Colab, etc). IDEs (VS Code, PyCharm)

Standard libraries: Pandas, NumPy

ML: Scikit-Learn, XGBoost, TensorFlow, Keras

Visualization and Plotting: Matplotlib, Seaborn, Altair

NLP: NLTK, TensorFlow, spaCy

Deploying: Streamlit, Pickle


Basic programming in R, MATLAB, C, Java, SQL

Projects for School

These projects were completed as part of my studies at the University of San Diego and University of British Columbia.

I'm happy to share code with recruiters, but generally will not be posting these publicly due to academic integrity considerations. Exceptions will be noted where the project goals are more general and personalized.

Analysis of Most Profitable Disney Directors (November 2020, UBC) - Combined, cleaned and reorganized datasets relating to Disney movies, with the objective of determining which director had generated the most revenue for the studio. Analysis found this to be Wolfgang Reitherman, the studio's workhorse of the 60s and 70s. Note: code not saved.

Techniques: Data selection, cleaning, joining in Pandas. Functions in Python. Data visualization in Altair. Analysis and interpretation.

Machine Learning Analysis of Canadian Cheeses (March 2021, UBC) - Applied machine learning techniques to a dataset of Canadian cheeses. My initial goal was to use the language data from tasting notes to see if keywords could be traced to the province of origin; however, it proved to be a bit too much for my novice self. I ended up building a model to classify province of origin based on the other data (type of cheese, consistency, fat content, etc). It did not work well, giving accuracy in the 60-75% range -- this problem is not well suited for ML. Note: code not saved.

Techniques: Data cleaning & manipulation. Exploratory data analysis, including data visualization. Bag-of-words tokenizer. Simple null-value imputation. ML Pipeline. K-fold cross-validation. Multi-class classification using Decision Tree Classifier, Random Forest Classifier.

Genetic Algorithm (Summer 2021, USD) - A short project to implement a genetic algorithm and run experiments on converting strings of zeroes to ones.

Techniques: Programming. Functions. Writing to output files. Visualization.
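
For readers unfamiliar with the idea, here is a minimal sketch of a genetic algorithm that evolves strings of zeroes toward all ones. It is an illustration only, not the project code: the function names, tournament selection, single-point crossover, elitism, and every parameter value are assumptions chosen for the example.

```python
import random


def fitness(bits):
    """Count the ones; an all-ones string has fitness len(bits)."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def tournament(population, rng, k=3):
    """Return the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def evolve(length=20, pop_size=30, rate=0.02, max_generations=500, seed=0):
    """Evolve all-zero strings toward all ones.

    Returns (best_individual, generations_used). All defaults are
    illustrative assumptions, not values from the original project.
    """
    rng = random.Random(seed)
    population = [[0] * length for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == length:
            return best, generation
        # Elitism: carry the best individual over unchanged.
        next_population = [best]
        while len(next_population) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            child = crossover(parent_a, parent_b, rng)
            next_population.append(mutate(child, rate, rng))
        population = next_population
    return max(population, key=fitness), max_generations


if __name__ == "__main__":
    best, generations = evolve()
    print(f"best fitness {fitness(best)} after {generations} generations")
```

Varying `rate`, `pop_size`, or the string length and recording how many generations are needed is one simple way to run experiments of the kind described above.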

Spotify Song Classification (February 2022, USD) - Built a multi-class classification model using XGBoost to predict a song's genre, based on the Spotify song dataset from their previous Kaggle competition. Included a web app using Streamlit to train a model with hyperparameters input by the user.

Techniques: Data cleaning, selection, pre-processing. Hyperparameter tuning. Web-app deployment. Confusion Matrix and Data Visualization.

Home Credit Risk Default Prediction using Machine Learning (Summer 2022, USD) - In my Machine Learning class at USD, we worked with the Home Credit default risk dataset from Kaggle, applying different stages of the ML process and different models throughout, including:

Data selection, joining, cleaning; Exploratory data analysis; Data visualization; Selection of key variables. Data pre-processing.

Building simple Decision Tree and Random Forest models. Confusion Matrix.

Unsupervised approaches - PCA; Hierarchical clustering and dendrogram visualizations; K-Means clustering.

Selection of evaluation criteria. Evaluating feature importance. Feature Selection. Feature engineering. Imputing null values with regression.

Gradient-boosting classifiers with XGBoost.

Naive Bayesian classifiers.

Hyperparameter tuning. K-fold cross-validation.

Image Classification on MNIST Handwritten Digits (July 2022) - Built a simple convolutional neural network to classify the images of handwritten digits in the MNIST data set. Achieved accuracy of 99.21% on the test data with two convolutional layers (852k parameters).

Techniques: Data loading, visualization, pre-processing. Convolutional neural networks. Confusion Matrix & Evaluations.

RNN for Stock Price Prediction (July 2022) - Built a recurrent neural network using LSTMs to predict the price of META stock based on recent returns. Included data reshaping function and early stopping callback. Created 140 days of future predictions, with an accuracy score of 96.004% compared to the real data -- in retrospect, this is overfitting and not indicative of predicting the trends.

Techniques: Data loading, reshaping, pre-processing. LSTM recurrent neural networks.
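
The reshaping step for a recurrent model can be sketched as a sliding window over the series. This is an assumption about the general approach rather than the project's actual function: `make_windows`, the window length, and the sample values are all hypothetical, and it is written in plain Python so it runs without dependencies. In practice the nested lists would be converted to an array with shape (samples, timesteps, features) before being passed to an LSTM layer.

```python
def make_windows(series, window):
    """Split a 1-D sequence into (inputs, targets) for next-step prediction.

    inputs[i] holds `window` consecutive values, each wrapped in a
    one-element list so the nesting is (samples, timesteps, features=1);
    targets[i] is the value that immediately follows that window.
    """
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append([[value] for value in series[start:start + window]])
        targets.append(series[start + window])
    return inputs, targets


# Made-up daily returns, for illustration only.
returns = [0.01, -0.02, 0.03, 0.00, 0.015]
X, y = make_windows(returns, window=3)
print(len(X), len(X[0]), len(X[0][0]))  # 2 3 1
print(y)  # [0.0, 0.015]
```

Each target is the value immediately after its window, so the model only ever sees past values when predicting the next one.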

Sentiment Analysis of IMDB Reviews (August 2022) - Applied NLP techniques in TensorFlow to build neural networks for sentiment analysis on the classic IMDB movie data set. Built both RNN and LSTM models. Achieved results in the 80% range for all of Accuracy, Recall, Precision and F1 on initial build-outs.

Techniques: Data loading, pre-processing. Tokenizing. Embedding. RNN, LSTM. Early stopping callback. Confusion Matrices. Transfer Learning for word embeddings.

Autoencoder to rebuild Handwritten Digits (August 2022) - Built a simple autoencoder system for the handwritten digit MNIST dataset (2 convolutional layers in the encoder, 2 transpose convolutions in the decoder). Observed MSE of 0.002 and MAE of 0.013 on the test set, across reconstruction of all 28x28 pixel images. A sample of test images was reconstructed with extremely high fidelity.

Techniques: Data loading and pre-processing. CNN-based autoencoder.

Movie Recommender Systems (August 2022) - Created a recommender system for movies based on reviews and preferences in the ML-100K Movielens data set. Created User and Movie embeddings using deep learning and applied three different model approaches (two based on regression to predict a user's rating for a movie, one based on classification). Best approach was the simplest regression system, with an RMSE of 0.929 across the test set.

Techniques: Data loading, pre-processing, splitting. Multiple-input layer neural networks.