Projects
New/Updated Projects
Objective
How well can you predict gender from roughly 10k photos of men and 10k photos of women without resorting to deep learning?
The objective of this project was to learn image recognition by classifying faces by gender (male/female).
Main Tools Used
scikit-learn (sklearn)
OpenCV (Python)
PyWavelets (pywt)
Business Case
Many tech companies and government agencies use facial recognition today in a variety of ways. Some of the use cases for facial recognition are:
Suspect Detection
Given an image of a suspect who has committed a crime, facial recognition can help identify them.
People Recognition in Photos
Companies like Facebook use this type of technology to make their product more user-friendly, identifying people in a photo so you can tag them more easily.
There are many more use cases for facial recognition. Right now this project only identifies gender; check out my artist recognition project for a demo model that recognizes individual people.
What Goes Into Gender Classification?
If you are new to image classification, specifically facial recognition, there is a library called OpenCV, which I used to detect faces and crop them for training. After the images were cropped, the next step was to turn them into features a computer can learn from, using a wavelet transform. If you are interested in what wavelets are and how they work in Python, please follow the link.
The two images (the cropped face and its wavelet-transformed version) are stacked on top of each other into a single input that the model trains on. Once all the images have been gathered and converted, the next step is to train a model.
Modeling
The first model was an SVM, run to get a decent understanding of where our predictions stand. This baseline performs very well and will most likely be used as our final model, but we still need to check how other models perform in order to produce the best model we can.
After the baseline, the next step was to compare other models. To do this, we used a pipeline and a grid search across three different types of models: SVM, Random Forest, and Logistic Regression.
SVM was the best model both in the baseline run and in the grid search, so for deployment I went with the SVM model using the hyperparameters found by GridSearchCV.
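A minimal baseline sketch with scikit-learn. Since the face data isn't included here, `make_classification` generates stand-in feature vectors; the scaler, RBF kernel, and `C=10` are assumed settings rather than the project's exact ones.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: in the project, X would be the stacked face vectors and y the gender labels.
X, y = make_classification(n_samples=600, n_features=50, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features first: SVMs are sensitive to feature magnitudes.
baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10))
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)
print(f"Baseline SVM accuracy: {baseline_acc:.3f}")
```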
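One way to run a pipeline plus grid search over the three model types, sketched on synthetic stand-in data. The parameter grids below are hypothetical, since the grids used in the project aren't listed; the `model__` prefix routes each parameter to the pipeline step named `model`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=50, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical search spaces for each candidate model.
candidates = {
    "svm": (SVC(), {"model__C": [1, 10], "model__kernel": ["rbf", "linear"]}),
    "random_forest": (RandomForestClassifier(random_state=0), {"model__n_estimators": [50, 100]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"model__C": [0.1, 1, 10]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([("scaler", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, cv=5)  # 5-fold cross-validation per setting
    search.fit(X_train, y_train)
    results[name] = (search.best_score_, search.best_params_, search.best_estimator_)
    print(f"{name}: cv accuracy={search.best_score_:.3f} params={search.best_params_}")

# Pick the model with the best cross-validated score, then check it on held-out data.
best_name = max(results, key=lambda n: results[n][0])
best_model = results[best_name][2]
print("best:", best_name, "test accuracy:", round(best_model.score(X_test, y_test), 3))
```

Selecting on cross-validated score and only then touching the test set keeps the final test accuracy an honest estimate.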
Results
After running the model on the test data, we get a confusion matrix showing how well it classified each gender from the facial pictures.
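A sketch of producing the confusion matrix with scikit-learn, again on stand-in data. The string labels `"female"`/`"male"` are an assumed encoding; rows of the matrix are true classes and columns are predicted classes, in the order given by `labels`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y_int = make_classification(n_samples=600, n_features=50, n_informative=10, random_state=0)
labels = np.array(["female", "male"])  # hypothetical label names for classes 0 and 1
y = labels[y_int]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows = true gender, columns = predicted gender.
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)
print(classification_report(y_test, y_pred))
```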
With the model evaluated, we can then check what it predicts for a new image.
We can see what the model predicted, along with the probability it assigns to each class.
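A sketch of getting per-class probabilities for a single new input. `SVC` only exposes `predict_proba` when built with `probability=True`; here one held-out stand-in vector plays the role of a new photo's feature vector.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y_int = make_classification(n_samples=600, n_features=50, n_informative=10, random_state=0)
y = np.array(["female", "male"])[y_int]  # hypothetical label names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# probability=True fits an extra calibration step so predict_proba is available.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, probability=True, random_state=0))
model.fit(X_train, y_train)

new_x = X_test[0]  # stand-in for the feature vector of a new photo
probs = model.predict_proba(new_x.reshape(1, -1))[0]
for label, p in zip(model.classes_, probs):
    print(f"{label}: {p:.2%}")
print("predicted:", model.predict(new_x.reshape(1, -1))[0])
```

Because SVC probabilities come from a separate calibration fit, `predict` can occasionally disagree with the class that has the highest `predict_proba` value, particularly near the decision boundary.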
Older Projects
Pet Store Analysis
GitHub
Objective
The objective of this project is to create a price predictor model for houses sold in 2014 and 2015 in King County, WA. The first part of this notebook contains data exploration and data cleaning; the second part contains feature exploration and feature creation.
A second notebook contains the modeling process and its evaluation. Once the model is created, it is imported back into this notebook at the end to test its results and see whether we have an accurate housing price predictor for King County.
Data Exploration
The first step is to look at how different features relate to house prices and graph them to see whether there could be any relationship. I separated the features into groups and graphed each group.
Feature Engineering
After looking through the existing features, we wanted to see if we could create some of our own. The features we created for this model are:
renovation_age
sale_year
sale_quarter
bath_to_bed
family_house
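The exact definitions of these features aren't given, so the pandas sketch below uses plausible, hypothetical ones: `renovation_age` as years since the last renovation (falling back to the build year, assuming `yr_renovated == 0` means never renovated), `bath_to_bed` as bathrooms per bedroom, and `family_house` as at least 3 bedrooms and 2 bathrooms. The column names follow the usual King County sales data; the two rows are made up purely to show the transformation.

```python
import numpy as np
import pandas as pd

def add_features(df):
    """Hypothetical feature definitions; the project's own formulas may differ."""
    out = df.copy()
    sale_date = pd.to_datetime(out["date"])
    out["sale_year"] = sale_date.dt.year
    out["sale_quarter"] = sale_date.dt.quarter
    # Use the renovation year when present, otherwise the year the house was built.
    last_work_year = out["yr_renovated"].where(out["yr_renovated"] > 0, out["yr_built"])
    out["renovation_age"] = out["sale_year"] - last_work_year
    # Avoid division by zero for houses listed with 0 bedrooms.
    out["bath_to_bed"] = out["bathrooms"] / out["bedrooms"].replace(0, np.nan)
    out["family_house"] = ((out["bedrooms"] >= 3) & (out["bathrooms"] >= 2)).astype(int)
    return out

# Made-up example rows, not real sales.
houses = pd.DataFrame({
    "date": ["2014-05-02", "2015-01-20"],
    "bedrooms": [3, 2],
    "bathrooms": [2.0, 1.0],
    "yr_built": [1990, 1955],
    "yr_renovated": [0, 2005],
})
features = add_features(houses)
print(features[["sale_year", "sale_quarter", "renovation_age", "bath_to_bed", "family_house"]])
```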
Creating Polynomials
We created a new data frame with polynomial features added to it.
Creating dummy variables
Along with the new features, we also wanted some dummy variables, so the zip codes were transformed into dummy variables.
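A minimal sketch of turning zip codes into dummy variables with `pd.get_dummies`. Using `drop_first=True` is an assumed choice here: dropping one zip code avoids perfect multicollinearity with a linear model's intercept (the "dummy variable trap"). The rows are made up.

```python
import pandas as pd

# Made-up rows for illustration.
houses = pd.DataFrame({"zipcode": [98103, 98115, 98103], "price": [500000, 650000, 520000]})

# One 0/1 column per zip code; the first (lowest) zip code becomes the baseline.
dummies = pd.get_dummies(houses["zipcode"], prefix="zip", drop_first=True, dtype=int)
houses = pd.concat([houses.drop(columns="zipcode"), dummies], axis=1)
print(houses)
```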
Creating and testing models
When creating the models and testing features, we used an F-test (a filter method) and a wrapper method to get insight into which features to use. Both gave decent scores, and we decided to go with the F-test results. The RMSE for these models was about $131K.
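A sketch comparing the two selection approaches on synthetic regression data: `SelectKBest(f_regression)` scores each feature independently with an F-test, while `RFE` is a wrapper method that repeatedly refits a model and drops the weakest feature. Keeping `k=10` features and using plain `LinearRegression` are assumptions; the RMSE printed here says nothing about the housing results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing features and prices.
X, y = make_regression(n_samples=500, n_features=30, n_informative=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def rmse_with(selector):
    """Fit the selector on training data, then score a linear model on the kept columns."""
    selector.fit(X_train, y_train)
    keep = selector.get_support()  # boolean mask of selected features
    model = LinearRegression().fit(X_train[:, keep], y_train)
    pred = model.predict(X_test[:, keep])
    return np.sqrt(mean_squared_error(y_test, pred)), keep

f_rmse, f_keep = rmse_with(SelectKBest(f_regression, k=10))                 # filter method
rfe_rmse, rfe_keep = rmse_with(RFE(LinearRegression(), n_features_to_select=10))  # wrapper method
print(f"F-test RMSE: {f_rmse:.1f}, RFE RMSE: {rfe_rmse:.1f}")
```

RMSE is in the same units as the target, which is why a housing model's RMSE reads directly as a dollar error.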
Final Steps
The last step was to bring the model over, run it on the testing set, and see what prices it predicts. This step went smoothly, and we got price predictions for all 4,322 houses in the testing CSV.
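How the model was moved between notebooks isn't stated; a common approach, sketched here as an assumption, is to save the fitted estimator with `joblib` in the modeling notebook and load it in the other one. The file name `house_price_model.joblib` is hypothetical, and the small synthetic regression stands in for the real model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Modeling notebook: fit and save (synthetic data stands in for the housing features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + 5.0
model = LinearRegression().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "house_price_model.joblib")  # hypothetical file name
joblib.dump(model, path)

# Exploration notebook: load the saved model and predict on the test set.
loaded = joblib.load(path)
X_new = rng.normal(size=(5, 3))
preds = loaded.predict(X_new)
print(preds)
```

The test features must go through the same cleaning, feature engineering, polynomial and dummy steps as the training data, or the loaded model will receive columns it wasn't trained on.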
Upcoming Projects
PySpark Project
I am currently working on a Spark project and will upload it to GitHub soon. It will demonstrate my skills and knowledge of Spark, and I am looking forward to sharing it.
New FastAPI Project
I am looking to expand my website with a FastAPI-based backend. This will be done after I finish my Spark project.
JS Machine Learning Project
I have a good background in JavaScript, so I would like to demonstrate its ability to build ML models and compare them to Python models. This project will incorporate TensorFlow and D3.js.