data science vm

A Big Data analytics VM for data science. This project automates the creation of a data scientist's toolbox on a virtual machine (VM), so that within a few minutes you can work in a fully configured data science lab instead of performing the complex installation and configuration that a working development environment requires. The VM comes with R, Git, Python, Cloudera, Hadoop, YARN, MRv2, Mahout, MongoDB, Spark, Neo4j, and more pre-installed. It is built automatically as a single CentOS VM for VMware Fusion, using Vagrant with Chef and shell scripts.

## Install the following Vagrant plugins
vagrant plugin install vagrant-omnibus
vagrant plugin install vagrant-env
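
The two installs above can be made repeatable with a small check that installs only what is missing. This is a minimal sketch, not part of the project: the function names `missing_plugins` and `install_missing_plugins` are hypothetical, and the matching assumes `vagrant plugin list` prints one plugin per line as `name (version...)`.

```shell
#!/bin/sh
# Plugins named in this README.
REQUIRED_PLUGINS="vagrant-omnibus vagrant-env"

# Reads `vagrant plugin list` output on stdin and prints each required
# plugin that does not appear in it, one per line.
# Assumption: installed plugins are listed as "name (version...)".
missing_plugins() {
    installed=$(cat)
    for plugin in $REQUIRED_PLUGINS; do
        printf '%s\n' "$installed" | grep -q "^$plugin " || printf '%s\n' "$plugin"
    done
}

# Installs whatever is missing. Run this by hand on the host machine.
install_missing_plugins() {
    for plugin in $(vagrant plugin list | missing_plugins); do
        vagrant plugin install "$plugin"
    done
}
```

After sourcing the file, `install_missing_plugins` should leave both plugins in place, at which point `vagrant up` from the project directory starts the build.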

Users (username/password)

root/vagrant
joe/joe
chuck/chuck
cloudera/cloudera

Hive Embedded DB

PostgreSQL
Host: e63:7432
DB name: hive
Username: hive
Password: 8xlpmpA6NE
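
To inspect this database from a shell, the settings above can be assembled into a connection URI for `psql`. The sketch below is an illustration only: it assumes the `psql` client is installed and that the host name `e63` resolves from where you run it. The `HIVE_DB_*` variables and both function names are hypothetical.

```shell
#!/bin/sh
# Connection settings copied from the list above; override via the environment.
HIVE_DB_HOST="${HIVE_DB_HOST:-e63}"
HIVE_DB_PORT="${HIVE_DB_PORT:-7432}"
HIVE_DB_NAME="${HIVE_DB_NAME:-hive}"
HIVE_DB_USER="${HIVE_DB_USER:-hive}"

# Prints a PostgreSQL connection URI without the password.
hive_db_uri() {
    printf 'postgresql://%s@%s:%s/%s\n' \
        "$HIVE_DB_USER" "$HIVE_DB_HOST" "$HIVE_DB_PORT" "$HIVE_DB_NAME"
}

# Lists the tables in the database. psql takes the password from the
# PGPASSWORD environment variable, or prompts for it if that is unset.
list_hive_tables() {
    psql "$(hive_db_uri)" -c '\dt'
}
```

For example, `PGPASSWORD=8xlpmpA6NE list_hive_tables` runs the listing without a password prompt.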

Hue

http://e63:8888/
hdfs/hdfs

Test the Cluster

sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
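
In the command above, `pi 10 100` runs the bundled Pi estimator with 10 map tasks and 100 samples per map. To turn that run into a pass/fail check, the result line can be pulled out of the job output. This is a rough sketch: it assumes the example prints a line beginning with `Estimated value of Pi is`, and the function names `pi_estimate` and `run_pi_test` are hypothetical.

```shell
#!/bin/sh
# Reads the job output on stdin and prints the number from the
# "Estimated value of Pi is ..." line. Returns non-zero if no such
# line is found (grep . fails on empty input).
pi_estimate() {
    sed -n 's/^Estimated value of Pi is \([0-9.]*\).*$/\1/p' | grep .
}

# Runs the example job from this README and prints only the estimate.
# stderr is merged in so the result is found wherever it is written.
run_pi_test() {
    sudo -u hdfs hadoop jar \
        /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
        pi 10 100 2>&1 | pi_estimate
}
```

Calling `run_pi_test` on the VM prints the estimate and returns success only when the job got far enough to report a result.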