Hi, I’m Terrence Scott — a data science enthusiast with a background in software engineering, analytics, and machine learning. I’m passionate about turning raw data into meaningful insights, solving real-world problems, and building solutions that make a difference.
Right now, I’m actively leveling up my skills through the TripleTen Data Science Bootcamp while pursuing my Master of Science in Data Science at the University of Phoenix. These programs are helping me deepen my expertise in data analysis, statistical modeling, and machine learning while I continue to grow as a collaborative, impact-driven professional.
When I’m not coding or diving into data, you’ll find me nurturing my plant collection, embracing continuous learning, or connecting with others who share a passion for tech and innovation.
Email: terrencejay23@gmail.com
Java, SQL, C++, C#, Swift, HTML5, CSS, JavaScript
MySQL, MongoDB, Splunk, Cassandra, SQLite, Azure, AWS, Google Cloud, Databricks, SAP
Statistical Modeling, Data Integration, A/B Testing, Hadoop, Spark
GitHub, Azure DevOps, Jira
Creative Problem-Solving, Self-Starter Mentality, Effective Communication, Collaborative Teamwork, Adaptability, Leadership & Mentorship
Here are some of the projects I have worked on, showcasing my skills in data science, software engineering, and problem-solving:
📽️ Movie & TV Show Data Analysis with Pandas
Python | Pandas | Data Cleaning | Exploratory Data Analysis
In this project, I explored a dataset containing information on movies and TV shows. Using the pandas library, I read, cleaned, and analyzed the data to uncover insights such as content distribution by type, release year trends, and popular genres. This project strengthened my skills in working with DataFrames, handling missing data, indexing, filtering, and performing key data manipulation tasks. The final analysis provides a clear, structured summary of content trends in the entertainment industry.
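The analysis followed the standard pandas pattern of load, clean, then aggregate. Below is a minimal sketch of that pattern; the file and column names (movies_and_shows.csv, type, genre, release_year, title) are illustrative placeholders rather than the project's exact schema.

```python
import pandas as pd

# Placeholder file name; the real dataset and schema differ.
df = pd.read_csv('movies_and_shows.csv')

# Normalize column names and drop exact duplicate rows.
df.columns = df.columns.str.strip().str.lower()
df = df.drop_duplicates()

# Handle missing values and coerce release years to numeric.
df['genre'] = df['genre'].fillna('unknown')
df['release_year'] = pd.to_numeric(df['release_year'], errors='coerce')

# Content distribution by type and a release-year trend.
print(df['type'].value_counts())
print(df.groupby('release_year')['title'].count().tail(10))
```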
🛒 Grocery Shopping Behavior Analysis – Instacart Dataset
Python | Pandas | Data Analysis | Data Visualization | Consumer Insights
In this project, I analyzed the Instacart dataset to uncover patterns in customer grocery shopping behavior. Using pandas for data manipulation and exploratory analysis, I identified key trends in shopping times, days of the week, reorder habits, and product preferences. Insights included peak shopping hours, high-demand days (Sunday & Monday), frequent reorders of fresh produce (bananas, spinach, avocados), and common cart behaviors. This project demonstrates my ability to derive actionable business insights from large datasets to support inventory planning, targeted marketing, and operational efficiency.
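The peak-hour and reorder findings come straight out of a few groupby aggregations. Here is a hedged sketch; it assumes the public Instacart schema (order_hour_of_day, order_dow, reordered, product_id), which may differ slightly from the exact files used.

```python
import pandas as pd

# Assumed file names based on the public Instacart release.
orders = pd.read_csv('instacart_orders.csv')
lines = pd.read_csv('order_products.csv')

# Orders per hour of day: peaks mark the busiest shopping hours.
orders_by_hour = orders['order_hour_of_day'].value_counts().sort_index()

# Orders per day of week (0 is commonly read as Sunday in this dataset).
orders_by_dow = orders['order_dow'].value_counts().sort_index()

# Reorder rate per product: the share of order lines that are repeat buys.
reorder_rate = (
    lines.groupby('product_id')['reordered']
    .mean()
    .sort_values(ascending=False)
)
```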
📈 Revenue Attribution Analysis – Megaline Prepaid Plans
Python | Pandas | Statistical Testing | Data Analysis | Business Strategy
In this project, I conducted a comprehensive revenue attribution analysis for Megaline’s two prepaid mobile plans — Surf and Ultimate. Using pandas for data manipulation and hypothesis testing, I uncovered that the Surf plan consistently generated higher revenue, primarily through overage charges, despite having a lower base price than the Ultimate plan.
Key insights included seasonal revenue spikes, high user engagement on the Surf plan, and no significant revenue variation across geographic regions. Hypothesis testing validated that Surf users generated significantly more revenue than Ultimate users (p < 0.001), while churned users showed no meaningful difference in revenue compared to active users.
Based on these insights, I recommended reallocating marketing efforts toward high-engagement Surf customers and implementing targeted upsell strategies. This analysis highlights my ability to translate complex data into actionable business strategies that drive growth and customer value.
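The hypothesis test at the core of this analysis is a two-sample comparison of mean revenue per user. A minimal sketch using SciPy is shown below; the file and column names are assumptions for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical per-user revenue table with a 'plan' label.
revenue = pd.read_csv('megaline_revenue.csv')

surf = revenue.loc[revenue['plan'] == 'surf', 'monthly_revenue']
ultimate = revenue.loc[revenue['plan'] == 'ultimate', 'monthly_revenue']

# Welch's t-test (unequal variances) on the difference in mean revenue.
t_stat, p_value = stats.ttest_ind(surf, ultimate, equal_var=False)
print(f't = {t_stat:.2f}, p = {p_value:.4f}')

# Reject H0 (equal mean revenue) at alpha = 0.05 when p_value < 0.05.
```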
🎮 Global Video Game Market Analysis – ICE
Python | Pandas | Statistical Analysis | Data Visualization | Market Research
In this project with ICE, I analyzed global video game performance data to uncover patterns across platforms, genres, and regions. Using pandas for data manipulation and hypothesis testing, I examined regional sales trends, genre popularity, and the influence of ESRB ratings on market success.
Key findings revealed that Action games, while generating high global sales, had low median performance, indicating a high-risk, high-reward genre. Shooter and Role-Playing games demonstrated stronger consistency across markets, with regional dominance varying: PS4/Xbox One in the West, PC in Europe, and PSP in Japan. Hypothesis tests confirmed no significant difference in user ratings between Xbox One and PC titles, but found a significant rating gap between Action and Sports games (p < 0.001).
These insights inform targeted publishing strategies, regional marketing focus, and content investment decisions. This project highlights my ability to transform market data into actionable intelligence for business growth in the interactive entertainment industry.
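Both rating hypotheses reduce to the same two-sample test applied to different slices of the data. A sketch of that setup follows; the column names (platform, genre, user_score) are assumptions.

```python
import pandas as pd
from scipy import stats

# Hypothetical games table; platform and genre labels are assumptions.
games = pd.read_csv('games.csv')

# H0: mean user ratings for Xbox One and PC titles are equal.
xbox = games.loc[games['platform'] == 'XOne', 'user_score'].dropna()
pc = games.loc[games['platform'] == 'PC', 'user_score'].dropna()
print(stats.ttest_ind(xbox, pc, equal_var=False))

# H0: mean user ratings for Action and Sports titles are equal.
action = games.loc[games['genre'] == 'Action', 'user_score'].dropna()
sports = games.loc[games['genre'] == 'Sports', 'user_score'].dropna()
print(stats.ttest_ind(action, sports, equal_var=False))
```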
📱 Mobile Plan Recommendation System – Megaline
Python | Pandas | Scikit-learn | Gradient Boosting | Random Forest | Logistic Regression | Data Preprocessing | Model Evaluation | Matplotlib | Seaborn
In this project, I developed a classification model to recommend the most suitable mobile plan (Smart or Ultra) for Megaline customers based on their service usage. Using scikit-learn, I analyzed features such as the number of calls, total call duration, text messages, and internet usage to train and evaluate multiple models.
Key findings revealed that the Gradient Boosting model achieved the highest accuracy of 0.829, outperforming other approaches like Random Forest and Logistic Regression. Internet usage emerged as the most influential feature, followed by calls, messages, and minutes.
These insights enable Megaline to provide data-driven plan recommendations, supporting their customer migration strategy and enhancing user satisfaction. This project highlights my ability to build and evaluate machine learning models to solve real-world business challenges.
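The training loop behind this result is a standard scikit-learn split/fit/score workflow. The sketch below assumes usage-style feature names (calls, minutes, messages, mb_used) and a binary is_ultra target; treat them as placeholders.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('users_behavior.csv')  # placeholder file name
X = df[['calls', 'minutes', 'messages', 'mb_used']]
y = df['is_ultra']  # 1 = Ultra plan, 0 = Smart plan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

print('accuracy:', accuracy_score(y_test, model.predict(X_test)))
# Feature importances show which usage metric drives the recommendation.
print(dict(zip(X.columns, model.feature_importances_)))
```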
📊 Customer Churn Prediction – Beta Bank
Python | Pandas | Scikit-learn | LightGBM | XGBoost | Gradient Boosting | Data Preprocessing | Class Imbalance Techniques | Model Evaluation | Matplotlib | Seaborn
Developed a binary classification model to predict customer churn for Beta Bank using features like credit score, account balance, and demographics. LightGBM achieved the best performance with a test accuracy of 85%, an F1-score of 0.59 for churn, and a ROC AUC of 0.85.
Class imbalance techniques and threshold tuning improved recall for churn cases, but further enhancements like resampling strategies and feature engineering are recommended. This project supports Beta Bank’s retention strategy by identifying at-risk customers for timely interventions.
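For the imbalance handling, one simple lever is class weighting combined with a tuned decision threshold. A sketch of that idea with LightGBM is below; the feature set, the target name (exited), and the 0.4 threshold are illustrative assumptions.

```python
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('churn.csv')  # placeholder file name
X = df.drop(columns=['exited'])
y = df['exited']  # 1 = customer churned

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

# class_weight='balanced' upweights the minority (churn) class in training.
model = LGBMClassifier(class_weight='balanced', random_state=42)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
# Lowering the threshold below 0.5 trades precision for churn recall.
preds = (proba >= 0.4).astype(int)
print('F1:', f1_score(y_test, preds), 'ROC AUC:', roc_auc_score(y_test, proba))
```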
🛢️ Oil Well Profit Optimization – OilyGiant
Python | Pandas | Scikit-learn | Linear Regression | Bootstrapping | Feature Analysis | RMSE Evaluation | Profit Simulation | Data Visualization
In this project, I built a predictive modeling pipeline to help OilyGiant determine the most profitable region for oil well development. Using historical geological data from three regions, I analyzed feature distributions, trained linear regression models, and evaluated model performance using RMSE and predicted reserves.
To simulate real-world uncertainty, I applied bootstrapping across 1,000 iterations per region. I selected the top 200 wells based on model predictions and calculated profit based on actual reserves, development cost, and fixed oil prices. Each region's risk of financial loss was also assessed using confidence intervals.
Key findings showed that Region 0 offered the highest estimated profit ($33.21M) and strong model reliability, while Region 2 demonstrated consistent performance and clean data. Region 1 had the lowest risk but showed signs of overfitting. This project highlights my ability to combine predictive modeling with financial simulation to support strategic, data-driven investment decisions in the energy sector.
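The bootstrap step is the heart of the risk assessment: resample candidate wells, keep the top 200 by predicted reserves, and score profit on the actual reserves. A sketch follows; the sample size, budget, and per-unit revenue are illustrative parameters rather than the project's exact business figures.

```python
import numpy as np
import pandas as pd

def bootstrap_profit(targets, predictions, n_iter=1000, sample_size=500,
                     n_best=200, budget=100e6, revenue_per_unit=4500,
                     seed=42):
    """Bootstrap the profit distribution for one region.

    `targets` holds actual reserves and `predictions` the model estimates,
    both indexed by well. Monetary parameters are placeholders.
    """
    rng = np.random.RandomState(seed)
    profits = []
    for _ in range(n_iter):
        # Resample candidate wells, then keep the top n_best by prediction.
        sample = predictions.sample(sample_size, replace=True, random_state=rng)
        best_idx = sample.sort_values(ascending=False).head(n_best).index
        # Profit is scored on the actual reserves of the selected wells.
        profits.append(targets.loc[best_idx].sum() * revenue_per_unit - budget)
    profits = pd.Series(profits)
    mean = profits.mean()
    ci = (profits.quantile(0.025), profits.quantile(0.975))
    risk_of_loss = (profits < 0).mean()
    return mean, ci, risk_of_loss
```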
🏆 Gold Recovery Prediction – Mining Operations Optimization
Python | Pandas | Scikit-learn | CatBoost | Random Forest | XGBoost | sMAPE | Time Series Validation | Feature Engineering | Data Visualization
This project involved building machine learning models to predict gold recovery rates at two key stages of a mineral processing pipeline: flotation (rougher) and final purification. The goal was to optimize process control and improve operational efficiency using historical production data.
I engineered features based on time-indexed process variables, applied rigorous data cleaning techniques, and validated model accuracy using the symmetric Mean Absolute Percentage Error (sMAPE). I trained and compared multiple models—including Random Forest, CatBoost, and XGBoost—using custom scoring functions and K-Fold cross-validation.
Random Forest emerged as the top performer (Final sMAPE: 4.08), with CatBoost as a close second (Final sMAPE: 4.32). I also conducted a thorough sanity check of test predictions and analyzed feature importance to understand model decision-making. This project demonstrates my ability to apply advanced regression techniques in a high-stakes industrial context where predictive accuracy directly impacts profitability and process reliability.
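Since sMAPE isn't built into scikit-learn, the custom scoring function is worth showing. A minimal sketch is below; the 25/75 stage weighting in the combined metric is shown for illustration.

```python
import numpy as np
from sklearn.metrics import make_scorer

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    mask = denom != 0  # skip rows where both values are zero
    return np.mean(np.abs(y_true[mask] - y_pred[mask]) / denom[mask]) * 100

# Lower is better, so scikit-learn should minimize the score.
smape_scorer = make_scorer(smape, greater_is_better=False)

def final_smape(smape_rougher, smape_final):
    # Combined metric: rougher stage weighted 25%, final stage 75%.
    return 0.25 * smape_rougher + 0.75 * smape_final
```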
🏦 Insurance Benefits and Claims Prediction – Sure Tomorrow
Python | Pandas | Scikit-learn | kNN | Logistic Regression | Linear Regression | Data Obfuscation | Privacy-Preserving ML | Feature Scaling | RMSE Evaluation | Data Visualization
This project delivered a full machine learning workflow for the Sure Tomorrow insurance company, addressing customer targeting, benefit eligibility prediction, benefit quantity forecasting, and data privacy. I implemented k-Nearest Neighbors to identify customers with similar profiles, revealing the importance of feature scaling to prevent biased neighbor selection.
For benefit eligibility, I trained a logistic regression model that significantly outperformed a dummy baseline, providing reliable predictions for customer benefits. Using linear regression, I forecasted the number of benefits customers may receive, uncovering a highly skewed distribution where most customers received none.
To ensure data confidentiality, I developed a matrix-based obfuscation method that preserved model accuracy. Analytical proof and testing confirmed identical RMSE values (0.363719) between original and obfuscated data, with prediction differences near zero. This project demonstrates my ability to combine predictive modeling, statistical validation, and privacy-preserving techniques to deliver actionable business insights without compromising sensitive customer information.
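The invariance result has a neat demonstration: multiplying the feature matrix by any invertible matrix P leaves linear regression predictions, and therefore RMSE, unchanged. A self-contained sketch on synthetic data, standing in for the confidential customer features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

# Toy data standing in for the real (confidential) feature matrix.
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.1]) + rng.normal(scale=0.3, size=200)

# Obfuscate by right-multiplying the features by a random invertible matrix.
P = rng.normal(size=(4, 4))
assert np.linalg.matrix_rank(P) == 4  # invertibility check
X_obf = X @ P

for name, features in [('original', X), ('obfuscated', X_obf)]:
    model = LinearRegression().fit(features, y)
    rmse = mean_squared_error(y, model.predict(features)) ** 0.5
    print(f'{name} RMSE: {rmse:.6f}')  # identical up to floating-point noise
```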
Explore my AI-focused projects and the books that have shaped my understanding of artificial intelligence, machine learning, and data science:
Generalization: Build, train, and fine-tune deep neural network architectures for NLP with Python, Hugging Face, and OpenAI's GPT-3, ChatGPT, and GPT-4.
Why first: This book establishes a foundational understanding of transformers, Hugging Face models, and the inner workings of GPT-based architectures. It's critical for grasping the models that power most modern AI systems.
Generalization: A practical roadmap from deep learning fundamentals to advanced applications and Generative AI.
Why second: Computer vision complements NLP and reinforces my knowledge of deep learning, PyTorch, and practical model training. While not LLM-focused, it sharpens my understanding of architectures and model evaluation.
Generalization: The Developer’s Guide to Pretrained LLMs, Vector Databases, Retrieval Augmented Generation, and Agentic Systems.
Why third: Builds on my transformer/NLP knowledge by introducing LLMs, RAG, vector databases, and agentic systems. This gives a broad, modern perspective on GenAI development.
Generalization: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG.
Why fourth: With the fundamentals of how LLMs work in place, this book focuses on fine-tuning, prompting, and RAG pipelines, all essential for customizing and deploying models reliably.
Generalization: Build production-ready LLM applications and advanced agents using Python, LangChain, and LangGraph.
Why fifth: LangChain is a dominant tool for building real-world LLM applications. It's best studied after becoming comfortable with RAG and LLM fundamentals, so the focus can stay on chains, agents, and LangGraph workflows.
Generalization: A practical guide to autonomous and modern AI agents.
Why sixth: This builds on LangChain and expands into knowledge graphs and autonomous agents — critical for more complex, memory-driven applications like personal assistants and decision-making tools.
Generalization: Create intelligent, autonomous AI agents that can reason, plan, and adapt.
Why seventh: This goes deep into reasoning, planning, and adaptive behavior. Best studied after gaining experience with RAG, agents, and production-level tools. It's more advanced and conceptual.
Generalization: The Complete and Up-to-Date Guide to Build, Develop and Scale Production Ready AI Systems.
Why last: This is the most comprehensive and production-focused book. It brings together everything: MLOps, scaling, monitoring, deployment, and infrastructure. Ideal as a capstone or reference after building a few prototypes.
terrencejay23@gmail.com
Oakland, CA
www.linkedin.com/in/terrencejay23