Category Archives: iPythonNB

Interactive PCA Scatterplots

This is an interactive HTML plot of the popular iris dataset that is compatible with Jupyter Notebook. When the paintbrush is selected, you can select a subset of the data and have it highlighted across all of the plots. When the cross-arrow is selected, you can mouse over a data point and see information about the original data. This functionality is very useful for exploratory data analysis.

Here is the code:
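As a rough sketch of the PCA step behind a plot like this (not the original listing), the projection onto the first two principal components can be computed with NumPy alone. The function name `pca_project` and the small hard-coded sample are assumptions for illustration; the sample only follows the iris column layout (sepal length, sepal width, petal length, petal width) and should be swapped for the full dataset. The interactive brushing and tooltips would come from a separate plotting library and are not shown.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Center each feature so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    # Fraction of the total variance captured by each kept component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


# Illustrative sample in the iris column layout (assumption: stand-in for the full data).
sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.1, 3.0, 5.9, 2.1],
])

scores, explained = pca_project(sample, n_components=2)
print(scores.shape)
print(explained)
```

Each row of `scores` is one flower in the two-dimensional PCA space, which is what a scatterplot would draw; keeping the row order lets a tooltip look up the original measurements by index.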

(Big) Data Science Happenings

Big Data! Cloud Computing!

So, I’ve been learning quite a bit at Galvanize this past week about Spark and AWS. Today it culminated in deploying Spark on multiple clusters on AWS to process large files. Spark has a growing number of machine learning models available, so you can do machine learning in the cloud!

Earlier this week I deployed a small AWS instance and installed Anaconda on it. When running IPython Notebook from AWS, I protected it with a password. It’s really freaking cool that you can remotely access IPython Notebook! The only problem I had was that matplotlib didn’t display plots. I solved this by installing the ubuntu-desktop package, which pulled in the Qt backend that matplotlib needed to make plots.
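A different route that is sometimes used on a server with no display, and not what was done above, is to pick a non-interactive matplotlib backend so that no desktop or Qt install is needed. The sketch below assumes matplotlib and NumPy are installed and renders a figure to PNG bytes in memory with the `Agg` backend; the variable names are illustrative. Inside a notebook, the `%matplotlib inline` magic is the more usual way to get plots to show up in the page.

```python
import io

import matplotlib

# Select the non-interactive Agg backend before pyplot is imported,
# so nothing tries to open a window on the headless machine.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")

# Render to an in-memory PNG instead of showing a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)

png_bytes = buf.getvalue()
print(len(png_bytes), "bytes of PNG data")
```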

Capstone Project!

I’ve really got to start buckling down on this capstone project. Thanks to Galvanize instructors Isaac and Clayton for bouncing ideas around with me today!

iPython Notebook test

I will hopefully be sharing my iPython Notebook work here on the blog, so I wanted to try it out. This post is publicly meaningless besides showing others how they might format notebooks in a WordPress blog. I used this technique, while also editing the comment color and the margin padding in the style.css file. And just FYI, the code below finds the peaks of a sine waveform.

In [3]:

Out[3]:
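A minimal sketch of what a peak-finding cell like this might look like, not the original code: it marks a sample as a peak when it is strictly greater than both of its neighbors. The helper name `find_peaks_simple`, the sampling range, and the number of points are assumptions for illustration.

```python
import numpy as np


def find_peaks_simple(y):
    """Return indices of strict local maxima in a 1-D array (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    interior = y[1:-1]
    # A peak is an interior sample larger than both its left and right neighbor.
    is_peak = (interior > y[:-2]) & (interior > y[2:])
    # Shift by one because the comparison skipped the first sample.
    return np.flatnonzero(is_peak) + 1


# Two full periods of a sine wave, so there should be two peaks.
x = np.linspace(0, 4 * np.pi, 1000)
y = np.sin(x)

peaks = find_peaks_simple(y)
print(x[peaks])  # locations close to pi/2 and 5*pi/2
```

This simple neighbor comparison is fine for a clean sine wave; for noisy data, a function such as `scipy.signal.find_peaks` with a prominence or distance setting is usually a better fit.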