Thomas Durkin

Data Scientist and Software Engineer based in Syracuse, NY

About

Hello! I'm Thomas, an experienced Data Scientist and Software Engineer with a Master's in Data Science from the University of Rochester and a Bachelor's in Computer Science from RPI. I decided to pursue a career in data science because of my passion for programming and mathematics. Data science combines both disciplines to solve complex problems that can positively impact the way we live and think. I thrive on taking on challenges because they stimulate personal growth, which is why the world of data fascinates me. I am especially interested in AI in healthcare because of its potential to transform the medical industry and drive innovation that everyone will benefit from.


Education

University of Rochester

Aug 2021 - Dec 2022

M.S. in Data Science

GPA: 3.79 / 4.0

Rensselaer Polytechnic Institute

Aug 2017 - May 2021

B.S. in Computer Science

GPA: 3.54 / 4.0

Skills

  • Programming Languages
    • Python, Bash, C, C++, C#, Java, R, SQL, Visual Basic
  • Web Development
    • Blazor, CSS, HTML, .NET Framework, PostgreSQL
  • Python Packages
    • PyTorch, PySpark, Pandas, NumPy, Matplotlib, SciKit-Learn
  • Other
    • Linux Ubuntu, Git, Confluence, Jira, Jupyter Notebook, Azure Databricks

Certifications



Experience

Full Stack Software Engineer

Jan 2020 - Oct 2021

Harris School Solutions
Albany, NY

  • WinCapCom
    • Modernized the BOCES module from the legacy WinCap application, which manages contracts, shared programs, and services used by school districts, to provide an improved user experience
    • Implemented the entire Actual Use Bill Schedule financial management module using Blazor, C#, and Harris' Cheyenne Framework, allowing clients to issue, process, and post bills
    • Participated in comprehensive code reviews and maintained open communication with QA to accelerate the rollout of new features to beta users
    • Utilized Jira to manage bi-weekly sprint tasks and prioritize backlog items in collaboration with the Product Owner
  • WinCapWeb
    • Used .NET Framework, SQL, and Visual Basic to develop a web application for schools to manage tasks ranging from payroll to activity attendance
    • Fixed bugs and incorporated new features based on client feedback, improving site reliability and functionality

Projects

Below is a collection of projects I have worked on throughout my academic career. These projects encompass the most important topics and skills I have learned.

  Graduate Capstone Project
 Report

Project sponsored by NASA and Coral Vita to gain insight into coral reef restoration. Data was collected from NASA satellites, Landsat-8 and MODIS, using Google Earth Engine and combined with coral databases. The spatially and temporally aligned data was used to train gradient boosted decision tree models of coral bleaching levels. Models were created for the Great Barrier Reef and the Northern Caribbean, with an average accuracy of 91% for detecting coral, and they correctly identified 80% of coral with moderate/severe bleaching.

 Classification of Cancer Discussion Posts using Deep Learning
 Report

Used deep learning frameworks in a comparative study to classify cancer discussion board posts. Models analyzed included a CNN, an RNN, a Bi-LSTM, and a Transformer encoder. Discussion posts were scraped from the American Cancer Society's discussion forum using Beautiful Soup, resulting in 13 classes. Through an analysis of confusion matrices, F1-score, and recall, it was determined that stacking the Bi-LSTM and Transformer encoder produced the best results, at 70.7% accuracy.

 Discovering Trending Research Topics
 Report

Analyzed grant abstract data from Dimensions.ai to build topic models for the University of Rochester and Research 1 universities. Models were created in five-year intervals from 2000 to 2020, as well as one model each for 2000 and 2020. The algorithms used to produce the topic models were BERT and Latent Dirichlet Allocation (LDA). The results indicated that research is trending toward cell biology, brain and clinical sciences, and artificial intelligence.

 Custom LSTM Network

Created a long short-term memory (LSTM) network from scratch using PyTorch. Users can set hyperparameters for the number of embedding dimensions, hidden nodes, output nodes, and layers, the amount of dropout, and whether the network is bidirectional. The network was trained to detect whether an SMS message was spam and reached an accuracy of 88% on the validation set.

 Custom Feed-Forward Neural Network

A three-layer multilayer perceptron was created using PyTorch to classify images as a dog or a cat. Three activation functions were implemented: Sigmoid, Tanh, and ReLU. Backpropagation was implemented to minimize the final loss with Stochastic Gradient Descent.
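
For readers unfamiliar with these pieces, here is a minimal sketch of the three activation functions, the derivatives that backpropagation relies on, and a single Stochastic Gradient Descent update. It is plain standard-library Python on scalars, not the project's PyTorch code; the function names and the default learning rate are assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x: float) -> float:
    """Hyperbolic tangent activation."""
    return math.tanh(x)


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU, taking the value 0 at x == 0 by convention."""
    return 1.0 if x > 0 else 0.0


def sgd_step(weight: float, grad: float, lr: float = 0.01) -> float:
    """One SGD update: move the weight against its gradient (lr is illustrative)."""
    return weight - lr * grad
```

During backpropagation, each layer multiplies the incoming gradient by the matching `*_grad` value before `sgd_step` is applied to that layer's weights.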

 Classification of Covid-19 Tweets

Kaggle competition to classify tweets by country of origin. The dataset covered six countries. An ensemble model consisting of a CNN and Multinomial Naive Bayes provided the best results, with an accuracy of 51.3%. Placed 10th out of 57 teams.

 Anagrams Game

A game built using Java in which two players compete to make as many words as possible with the six given letters. The letters are randomized so each game is different. Players connect to each other using a Peer-to-Peer connection.
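
The core check in a game like this is whether a submitted word can be built from the given letters. The sketch below illustrates one way to do that in Python rather than the game's Java; the function name `can_form` is hypothetical, and it assumes each given letter may be used at most once per word. It does not check that the word appears in a dictionary.

```python
from collections import Counter


def can_form(word: str, letters: str) -> bool:
    """Return True if `word` can be spelled using each given letter at most once."""
    if not word:
        return False  # an empty submission is not a word
    available = Counter(letters.lower())
    needed = Counter(word.lower())
    # Every letter in the word must be available at least as often as it is needed.
    return all(available[ch] >= count for ch, count in needed.items())
```

For example, with the letters "planet", `can_form("panel", "planet")` is true, while `can_form("apple", "planet")` is false because only one "p" is available.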

Contact

Phone

Mobile: (+412) 889 8244

Address

4501 South Eagle Village Road
Manlius, NY
13104 US

Email

mthomasdurkin@gmail.com