A list of projects I have been working on or built

Dr. Semmelweis and the Discovery of Handwashing

Reanalyzed the data behind one of the most important discoveries of modern medicine: handwashing. In 1847, the Hungarian physician Ignaz Semmelweis made a breakthrough discovery: contaminated hands were a major cause of childbed fever, and by enforcing handwashing at his hospital he saved hundreds of lives. In this Python project, I reanalyzed the medical data Semmelweis collected. The project was completed as part of the DataCamp Data Science with Python Career Track.


Data Engineering Project using Sales Data

Data engineering in Hadoop using Cloudera. I performed the principal tasks involved in managing, loading, extracting, and transforming data. The project repository holds the scripts I wrote over the course of the project.


Stock price prediction with Apache Spark and Cassandra

This is a data pipeline for predicting stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from the Alpha Vantage API, engineers features, trains models, and performs data analysis and predictions.

Apache Spark, Apache Cassandra, Machine Learning
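
As a rough illustration of the feature-engineering step in the stock pipeline described above, here is a minimal sketch in plain Python rather than Spark. The two features (daily return and simple moving average), the function names, and the sample prices are all assumptions made for this example; they are not taken from the project repository.

```python
from typing import List, Optional


def daily_returns(closes: List[float]) -> List[Optional[float]]:
    """Fractional change from the previous close; None for the first day."""
    if not closes:
        return []
    returns: List[Optional[float]] = [None]
    for prev, curr in zip(closes, closes[1:]):
        returns.append((curr - prev) / prev)
    return returns


def moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages: List[Optional[float]] = []
    for i in range(len(closes)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(closes[i + 1 - window : i + 1]) / window)
    return averages


# Hypothetical closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0]
features = list(zip(closes, daily_returns(closes), moving_average(closes, window=2)))
```

Each tuple in `features` pairs a closing price with its derived values, which is the kind of row a model-training step could consume.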

The GitHub History of the Scala language

Find out who has had the most influence on its development and who the experts are. Explore the evolution of the Scala language through its vibrant GitHub history. This is a comprehensive collection of historical data, commits, issues, and pull requests related to the development of Scala, a modern, multi-paradigm programming language.

Scala, GitHub, Data Science

Python Flask AI translation service

This is a web app built with the Python Flask framework that integrates an Azure AI cognitive service for translation.

Azure, Machine Learning, Linear Regression

The Forex Data Pipeline with Apache Airflow

The Forex Data Pipeline is a comprehensive solution designed to collect, process, and prepare currency exchange rate data for downstream machine-learning pipelines. This repository showcases the creation of a data pipeline that fetches currency rates from an external API, performs data transformation using PySpark, and loads the processed data into a Hive table within the Hadoop Distributed File System (HDFS). The primary goal is to provide clean and structured currency rate data for seamless integration into subsequent machine-learning workflows.

Apache Airflow, PySpark, Hadoop
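
To give a feel for the transformation step in the Forex pipeline described above, here is a minimal sketch that flattens one API response into one row per currency, the shape a table load would expect. It uses plain Python instead of PySpark so it runs anywhere, and the response layout (`base`, `date`, `rates`), the function name, and the sample values are assumptions for this example, not the actual API or repository code.

```python
import json
from typing import Dict, List


def flatten_rates(payload: str) -> List[Dict[str, object]]:
    """Turn one JSON rate response into one row per target currency."""
    doc = json.loads(payload)
    return [
        {
            "base": doc["base"],
            "currency": code,
            "rate": float(rate),
            "date": doc["date"],
        }
        # Sort by currency code so the output order is deterministic.
        for code, rate in sorted(doc["rates"].items())
    ]


# Hypothetical response; the values are placeholders, not real quotes.
sample = '{"base": "EUR", "date": "2024-01-01", "rates": {"USD": 1.5, "GBP": 0.5}}'
rows = flatten_rates(sample)
```

In a Spark job, the same reshaping would typically be expressed over a DataFrame rather than a Python list.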

A visual history of Nobel Prize winners

The Nobel Prize is perhaps the world's most well-known scientific award. Apart from the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time, the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?

Python, Data Science, Data Visualization

Data Analysis Project: Stock Price Analysis and Forecasting

This repository contains the code and analysis for my stock price analysis and forecasting project, completed during my internal attachment at Jomo Kenyatta University of Agriculture and Technology. The project analyzes historical stock price data, visualizes trends, and develops a forecasting model using Python and data science techniques.

Python, Data Science, Data Visualization
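
The description above does not say which forecasting model the project uses. As one simple possibility, the sketch below shows a one-step-ahead forecast with simple exponential smoothing; the technique, the function name, and the `alpha` parameter are my assumptions for illustration, not the project's actual method.

```python
from typing import List


def exponential_smoothing_forecast(prices: List[float], alpha: float) -> float:
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 weights recent prices heavily; alpha close to 0
    produces a smoother, slower-moving forecast.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = prices[0]
    for price in prices[1:]:
        # Blend the newest observation with the running level.
        level = alpha * price + (1 - alpha) * level
    return level
```

With `alpha=1.0` the forecast is simply the last observed price, which makes a useful baseline to compare any richer model against.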