Repo 2016

R, Python and Mathematica Codes in Machine Learning, Deep Learning, Artificial Intelligence, NLP and Geolocation

R, Python and Mathematica Codes in Data Science

Welcome to my GitHub repo.

I am a Data Scientist and I code in R, Python and Wolfram Mathematica. Here you will find some Machine Learning, Deep Learning, Natural Language Processing and Artificial Intelligence models I developed.

Outputs of the models can be seen in my portfolio: https://drive.google.com/file/d/0B0RLknmL54khdjRQWVBKeTVxSHM/view?usp=sharing


Mathematica Codes

MNIST_HOT.5.FULL: is a solution for the MNIST dataset in Mathematica, with 96.51% accuracy, based on pixel differences.
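The entry does not say how pixel differences are turned into a prediction. A minimal Python sketch, assuming a nearest-reference rule that uses the sum of absolute pixel differences as the distance (the function names and the toy 3x3 images are hypothetical, not taken from the Mathematica code):

```python
# Hypothetical sketch: label an image by the reference image it differs from least.
# Assumption: "difference of pixels" is read as an L1 (sum of absolute differences) distance.

def pixel_difference(a, b):
    """Sum of absolute differences between two equally sized, flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b))

def classify(image, references):
    """Return the label of the reference image with the smallest pixel difference.

    references: list of (label, flattened_image) pairs.
    """
    label, _ = min(references, key=lambda ref: pixel_difference(image, ref[1]))
    return label

# Toy 3x3 "digits" flattened row by row (illustrative, not real MNIST data).
references = [
    ("1", [0, 1, 0,
           0, 1, 0,
           0, 1, 0]),
    ("7", [1, 1, 1,
           0, 0, 1,
           0, 0, 1]),
]

noisy_one = [0, 1, 0,
             1, 1, 0,
             0, 1, 0]
print(classify(noisy_one, references))  # prints 1
```

On real MNIST the references would be training images (28x28 = 784 pixels each), and comparing against every one of them makes prediction cost grow with the training set size.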

Mathematica - Artificial Intelligence Simulating Interactions in Social Networks: is a model that simulates human interactions in a social network using cellular automata and agent-based modeling. Each agent has 3 possible choices for interaction and a memory. The code spans 14 pages, with a large loop written in a single line of code.

Mathematica - Facial Recognition in Movement: This code applies facial recognition to a downloaded YouTube video. The output is also a video showing the face recognition results (a YouTube link to the output is included in the code page).

Mathematica - Monte Carlo Simulation: is an animated model of a Markov Chain Monte Carlo Simulation for autonomous driving. A video of the dynamic output was also generated, and a link to the YouTube video is included in the code page.
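As a rough illustration of the sampling step behind MCMC, here is a minimal random-walk Metropolis sampler in Python. It targets a standard normal distribution chosen purely for demonstration; the autonomous-driving model and its state space are not reproduced, and the names `metropolis` and `log_density` are hypothetical.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target known up to a constant."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        # Accept with probability min(1, p(proposal) / p(x)), computed in log space.
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

# Illustrative target: standard normal, log p(x) = -x^2 / 2 (up to a constant).
samples = metropolis(lambda x: -0.5 * x * x, start=3.0, n_samples=20000)
burned = samples[2000:]  # discard burn-in while the chain moves away from start=3.0
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(round(mean, 2), round(var, 2))
```

Because the proposal is symmetric, the Hastings correction cancels and only the ratio of target densities is needed, which is why the target never has to be normalized.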

Mathematica - Social Network Surveillance: is a model that tracks individuals in a social network, along with their connections and future interactions.


Python Codes

Keras version used in the models: keras==1.1.0 | LSTM 0.2

Python - Autoencoder MNIST: is an autoencoder model developed with Keras to classify images from the MNIST dataset, with ModelCheckpoint as a callback to save weights.

Python - Autoencoder for Text Classification: is an autoencoder model for text classification made with Keras, also with ModelCheckpoint.

Python - Deep Learning with Lasagne: is a deep neural network developed with Lasagne, in which the weight values of each layer, including the biases, can be inspected.

Python - Face Recognition: is a model using OpenCV to detect faces.

Python - Image Extraction from Twitter: is a model that extracts pictures and their links from Twitter webpages and plots them with matplotlib.

Python - Keras Convolutional Neural Network: is a CNN developed to classify the MNIST dataset with an accuracy greater than 99%.

Python - Keras Deep Regressor: is a deep Neural Network made with Keras to predict a continuous output, with a learning rate scheduler driven by the derivative of the error, random initial weights and a loss history.
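The learning-rate rule is only described as following the derivative of the error. One possible reading, sketched below in plain Python as an assumption, is a rule similar to the "bold driver" heuristic: shrink the rate when the loss rises between epochs and grow it slightly when it falls. The toy loss `(w - 3)^2`, the factors 1.05 and 0.5, and the function names are hypothetical; in Keras such a rule would normally be wired in through a custom callback.

```python
def adjust_lr(lr, prev_loss, curr_loss, grow=1.05, shrink=0.5):
    """Hypothetical rule: use the sign of the change in loss (a discrete derivative).

    Loss went up -> shrink the learning rate; otherwise grow it slightly.
    """
    return lr * shrink if curr_loss > prev_loss else lr * grow

def fit_quadratic(w=0.0, lr=0.9, epochs=50):
    """Gradient descent on the toy loss (w - 3)^2 using the adaptive rate above."""
    loss = lambda v: (v - 3.0) ** 2
    grad = lambda v: 2.0 * (v - 3.0)
    history = [loss(w)]  # loss history, one entry per epoch
    for _ in range(epochs):
        w -= lr * grad(w)
        history.append(loss(w))
        lr = adjust_lr(lr, history[-2], history[-1])
    return w, history

w, history = fit_quadratic()
print(round(w, 4), len(history))
```

Halving on a rise lets the rate recover quickly from an overly large step, while the slow 5% growth keeps training from stalling when the loss keeps falling.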

Python - Keras LSTM Network: is a Recurrent Neural Network (LSTM) to predict and generate text.

Python - Keras Multi Layer Perceptron: is an MLP Neural Network made with Keras for prediction and classification, with a loss history and a learning rate scheduled according to the derivative of the error.

Python - Machine Learning: is a Principal Components Analysis followed by a Linear Regression.
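A minimal pure-Python sketch of a PCA-then-regression pipeline, restricted (as a simplifying assumption) to two input features so that the leading eigenvector of the 2x2 covariance matrix has a closed form. The data and function names are hypothetical; with more features one would typically use `numpy.linalg.eigh` or scikit-learn's `PCA` instead.

```python
import math

def mean(v):
    return sum(v) / len(v)

def first_component(x1, x2):
    """Leading principal direction of two features (closed form for a 2x2 covariance)."""
    m1, m2 = mean(x1), mean(x2)
    n = len(x1)
    a = sum((u - m1) ** 2 for u in x1) / (n - 1)                      # var(x1)
    c = sum((v - m2) ** 2 for v in x2) / (n - 1)                      # var(x2)
    b = sum((u - m1) * (v - m2) for u, v in zip(x1, x2)) / (n - 1)    # cov(x1, x2)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)        # largest eigenvalue
    # (b, lam - a) is an eigenvector when b != 0; otherwise pick the larger-variance axis.
    vec = (b, lam - a) if b != 0 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(*vec)
    return (vec[0] / norm, vec[1] / norm), (m1, m2)

def pca_then_regress(x1, x2, y):
    """Project onto the first principal component, then fit y = b0 + b1 * score."""
    (d1, d2), (m1, m2) = first_component(x1, x2)
    scores = [(u - m1) * d1 + (v - m2) * d2 for u, v in zip(x1, x2)]
    ms, my = mean(scores), mean(y)
    b1 = sum((s - ms) * (t - my) for s, t in zip(scores, y)) / sum((s - ms) ** 2 for s in scores)
    b0 = my - b1 * ms
    return [b0 + b1 * s for s in scores]

# Illustrative data: x2 is perfectly correlated with x1, so one component suffices.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0 * u for u in x1]
y = [3.0 * u + 1.0 for u in x1]
preds = pca_then_regress(x1, x2, y)
print([round(p, 6) for p in preds])
```

Regressing on the component score instead of the raw features removes the collinearity between `x1` and `x2`, which is the usual motivation for putting PCA in front of a linear model.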

Python - NLP Doc2Vec: is a Natural Language Processing model where I posed a question to a Wikipedia webpage and 4 possible answers were selected by semantic similarity from the tokenized and vectorized webpage, using KNN and cosine distance.
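A minimal sketch of the KNN retrieval step in Python. As a stated simplification, it swaps the learned Doc2Vec embeddings for plain bag-of-words term counts, so only the cosine-similarity ranking matches the description; the sentences and function names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (a stand-in for learned Doc2Vec vectors)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest_sentences(question, sentences, k=4):
    """Return the k sentences most similar to the question (k-nearest neighbours)."""
    q = vectorize(question)
    ranked = sorted(sentences, key=lambda s: cosine(q, vectorize(s)), reverse=True)
    return ranked[:k]

# Illustrative "webpage" split into sentences.
page = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "France is a country in Western Europe.",
    "Python is a programming language.",
    "The Seine river flows through Paris.",
    "Bananas are rich in potassium.",
]
answers = nearest_sentences("What is the capital of France?", page, k=4)
print(answers[0])
```

Cosine distance is simply `1 - cosine(u, v)`, so ranking by highest similarity and by lowest distance gives the same neighbours.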

Python - NLP Semantic Analysis: is a Natural Language Processing model that classifies a given sentence according to semantic similarity to other sentences, using cosine distance.

Python - NLP Word2Vec: is a model developed from scratch to measure cosine similarity among words.
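Word2Vec proper learns dense vectors with a shallow neural network. The sketch below swaps in a simpler count-based alternative (window co-occurrence vectors) only to show how cosine similarity between word vectors can be computed from scratch; the corpus, window size and names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fell on monday",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3), round(cosine(vecs["cat"], vecs["stocks"]), 3))
```

Words that appear in similar contexts ("cat" and "dog" both occur near "the", "drinks" and "chases") end up with a high cosine similarity, which is the same distributional idea Word2Vec exploits with learned embeddings.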

Python - Reinforcement Learning: is a model based on simple rules and Game Theory, where agents' behavior changes according to the payoff they achieve. It can be adapted to tit-for-tat, always cooperate, always defect and other strategies. Rewards are defined in the payoff matrix.
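A minimal Python sketch of an iterated game of this kind, assuming the textbook Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0), which may differ from the values in the original payoff matrix. The strategy functions and the `imitate_if_better` adaptation rule are hypothetical illustrations of agents changing behavior according to payoff.

```python
# Assumed textbook payoffs (not necessarily the repo's matrix): (row player, column player).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_cooperate(opponent_history):
    return "C"

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated game; each strategy sees only the opponent's past moves."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

def imitate_if_better(own, other, own_score, other_score):
    """Hypothetical adaptation rule: adopt the other agent's strategy if it earned more."""
    return other if other_score > own_score else own

print(play(tit_for_tat, always_defect))     # (9, 14)
print(play(tit_for_tat, always_cooperate))  # (30, 30)
```

Repeating `play` across a population and applying an imitation rule after each round is one simple way to let payoffs drive which strategies spread.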

Python - Social Networks: is a model that draws the configuration and connections of social networks.

Python - Support Vector Machines: is a Machine Learning model that classifies the Iris dataset with SVM and plots it.

Python - Theano Deep Learning: is a Neural Network with two hidden layers using Theano.


R Codes

R - Churn of Customers: is a model that uses a logistic regression combined with a threshold to predict which customers present the greatest risk of being lost.

R - Data Cleaning + Multinomial Regression: is a model that performs data cleaning and a multinomial regression, using the nnet package, to classify customers according to their level of loyalty.

R - Face Recognition: is a code to detect faces and objects in R.

R - Geolocation Brazil: is a file for geospatial localization on a map of Brazil.

R - Geolocation USA: is also a file for geospatial localization, on a map of the USA.

R - Geolocation World: is a file for geospatial localization on a world map, with zoom and customizable icons.

R - Gradient Descent Logistic: is a model that uses gradient descent to define a threshold for the sigmoid function in a Logistic Regression. Boosting was also implemented and the ROC curves were compared.
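The entry does not detail the procedure, so the Python sketch below shows only the common building block: fitting a one-feature logistic regression by batch gradient descent on the log-loss and then applying a decision threshold to the sigmoid output. The data, learning rate, epoch count and names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss for p = sigmoid(w * x + b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: mean((p - y) * x) for w and mean(p - y) for b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict(xs, w, b, threshold=0.5):
    """Label 1 when the sigmoid output reaches the threshold."""
    return [1 if sigmoid(w * x + b) >= threshold else 0 for x in xs]

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict(xs, w, b))
```

Moving `threshold` away from 0.5 trades false positives for false negatives, which is what an ROC curve traces out over all thresholds.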

R - H2O Deep Learning: is a Neural Network model developed to predict recommendations and word-of-mouth advertising.

R - Imbalanced Classes: is a model for employee churn, where the features have no correlation with the target variable and the classes are imbalanced in the proportion 1/20. A logistic regression is implemented from scratch, a hill-climbing gradient method is used to find the best threshold for the logistic function, and boosting is then compared by AUC in a ROC plot.
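A minimal Python sketch of threshold selection by hill climbing. The metric (balanced accuracy, chosen here because it is insensitive to class imbalance), the step-halving rule and the toy scores are assumptions, not details from the original R code.

```python
def balanced_accuracy(scores, labels, threshold):
    """Mean of the true-positive rate and the true-negative rate at a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)

def hill_climb_threshold(scores, labels, start=0.5, step=0.1, tol=1e-3):
    """Move to the better neighbouring threshold; halve the step when neither improves."""
    t = start
    best = balanced_accuracy(scores, labels, t)
    while step >= tol:
        candidates = [min(1.0, max(0.0, t + d)) for d in (-step, step)]
        score, cand = max((balanced_accuracy(scores, labels, c), c) for c in candidates)
        if score > best:
            best, t = score, cand
        else:
            step /= 2
    return t, best

# Illustrative imbalanced scores: 10 negatives, 2 positives.
negatives = [0.10, 0.15, 0.20, 0.22, 0.25, 0.28, 0.30, 0.12, 0.18, 0.26]
positives = [0.35, 0.40]
scores = negatives + positives
labels = [0] * len(negatives) + [1] * len(positives)
t, best = hill_climb_threshold(scores, labels)
print(round(t, 3), best)
```

With plain accuracy, predicting the majority class everywhere already scores well on imbalanced data, so the default 0.5 threshold would look acceptable; balanced accuracy exposes that and pushes the threshold down toward the positive scores.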

Logistic Regression + Gradient Descent + Boosting: is a model where the features have no correlation with the target variable. A Logistic Regression with Gradient Descent was applied, followed by Boosting.

R - MNIST: is a solution for the MNIST dataset, developed from scratch.

R - Markov Chains: is a simple visualization of Markov Chains and their associated probabilities.
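The probabilities shown in a Markov chain visualization can be computed by repeatedly multiplying a state distribution by the transition matrix. A minimal Python sketch with a hypothetical two-state chain (the states and transition values are illustrative only):

```python
def step(dist, P):
    """One Markov chain step: new_j = sum_i dist_i * P[i][j] (each row of P sums to 1)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(dist, P, steps):
    """State distribution after a given number of steps."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Hypothetical two-state chain: state 0 = "sunny", state 1 = "rainy".
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(distribution_after([1.0, 0.0], P, 1))    # [0.9, 0.1]
print(distribution_after([1.0, 0.0], P, 200))  # close to the stationary [5/6, 1/6]
```

The long-run distribution solves pi = pi P; here 0.1 * pi_0 = 0.5 * pi_1, so pi = (5/6, 1/6) regardless of the starting state.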

R - NeuralNet: is a Neural Network model developed to predict and classify word-of-mouth advertising.

R - Ridge Regression: is a model with Ridge Regularization made from scratch to prevent overfitting.
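A minimal from-scratch sketch of ridge regression for a single feature, assuming an unpenalized intercept. Setting the derivative of the penalized squared error to zero gives the closed-form slope used below; the data and names are hypothetical.

```python
def ridge_fit(xs, ys, lam):
    """Single-feature ridge regression with an unpenalized intercept.

    Minimizes sum((y - b0 - b1 * x)^2) + lam * b1^2. On centered data the
    solution is b1 = Sxy / (Sxx + lam) and b0 = mean(y) - b1 * mean(x).
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / (sxx + lam)
    b0 = my - b1 * mx
    return b0, b1

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
print(ridge_fit(xs, ys, lam=0.0))  # (0.0, 2.0): ordinary least squares
print(ridge_fit(xs, ys, lam=5.0))  # (2.5, 1.0): slope shrunk toward zero
```

Larger `lam` values shrink the slope further, trading a little bias for lower variance, which is how the penalty guards against overfitting.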

R - Deep Learning: is a Neural Network model with 2 hidden layers for prediction of a continuous variable.