Lee Cohn

Multiple-Testing Problem in Financial Research
A Practical Solution to the Multiple-Testing Crisis in Financial Research

In this project, our research team explores one approach to the problem of multiple backtesting in finance. The main idea is to cluster backtests from the quantitative research process using the correlation matrix of backtest returns, in order to estimate how many "unique" ideas have been tried. We then compute each cluster's returns using the minimum variance allocation, so that highly volatile trials do not contribute too strongly. Finally, one can use the deflated Sharpe ratio of Marcos Lopez de Prado to assess whether a backtest result is likely to be a false positive caused by selection bias under multiple testing.
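
As a rough illustration of the last step, the deflated Sharpe ratio can be sketched with the Python standard library alone. This is a minimal sketch based on the published formula, not the project's actual code: the function names, the example inputs, and the choice to use the number of clusters as the number of independent trials are all assumptions. Sharpe ratios here are per-period (not annualized), and `kurt` is raw kurtosis (3 for a normal distribution).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected maximum Sharpe ratio across n_trials
    independent trials whose true Sharpe ratio is zero."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z1 = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    z2 = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * (
        (1.0 - EULER_GAMMA) * z1 + EULER_GAMMA * z2
    )


def deflated_sharpe_ratio(sharpe, n_obs, skew, kurt, n_trials, sharpe_variance):
    """Probability that the observed Sharpe ratio exceeds the benchmark
    implied by having selected the best of n_trials trials.

    sharpe          -- observed per-period Sharpe ratio of the chosen backtest
    n_obs           -- number of return observations in that backtest
    skew, kurt      -- skewness and raw kurtosis of its returns
    n_trials        -- number of independent trials (assumed: cluster count)
    sharpe_variance -- variance of Sharpe ratios across the trials
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1.0 - skew * sharpe + (kurt - 1.0) / 4.0 * sharpe ** 2)
    return _STD_NORMAL.cdf((sharpe - benchmark) * math.sqrt(n_obs - 1) / denom)


# Hypothetical inputs: the same backtest judged against 10 vs. 100 clusters.
few = deflated_sharpe_ratio(0.1, 1250, 0.0, 3.0, 10, 0.001)
many = deflated_sharpe_ratio(0.1, 1250, 0.0, 3.0, 100, 0.001)
print(f"DSR with 10 trials: {few:.3f}, with 100 trials: {many:.3f}")
```

The more independent ideas that were tried, the higher the benchmark Sharpe ratio becomes and the lower the deflated value, which is why estimating the number of "unique" trials through clustering matters.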
Multispectral Imaging Band Study
Artificial Multispectralization of Color Satellite Imagery via GANs
Panchromatic to Multispectral: Object Detection Performance as a Function of Imaging Bands
Artificial Colorization of Grayscale Satellite Imagery via GANs: Part 1

In this project, I explore the effects of different satellite imagery types on object detection. I also experiment with GANs to translate between different imagery types.
SpaceNet Challenge
Object Segmentation on SpaceNet via Multi-task Network Cascades (MNC)

In this project, I describe a potential submission to the 2nd SpaceNet challenge using MNC. Our results are competitive with other top submissions to the competition.
Deep Learning - Pug-vs-Chihuahua
Web App (No Longer Available Online)
SlideShare Images of Web App
GitHub Repository - Pug-vs-Chihuahua
IPython Notebook Model Output
IPython Notebook Evaluation Output

In this project, I use 2000 images (from ImageNet) of pugs and chihuahuas to train a model that classifies a new image as a pug or a chihuahua with 91% accuracy. The classification model was inspired by Michelangelo D'Agostino's Strata 2016 talk. It was built with deep learning using Keras and the VGG16 pretrained model developed for the ILSVRC-2014 competition. The frontend was created with Flask.
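
For readers unfamiliar with this kind of setup, a common way to reuse VGG16 for a two-class problem is to freeze its convolutional layers and train a small classifier on top. The sketch below shows that pattern with the `tensorflow.keras` API; the head layout (256 units, 0.5 dropout), the optimizer, and the function name `build_classifier` are illustrative assumptions and may differ from what the project actually used.

```python
from tensorflow.keras import Input, layers, models
from tensorflow.keras.applications import VGG16


def build_classifier(weights="imagenet", input_shape=(224, 224, 3)):
    """Binary pug-vs-chihuahua classifier on a frozen VGG16 base.

    The head sizes and optimizer are assumptions chosen for illustration.
    """
    # Convolutional base without the original 1000-class ImageNet head.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pretrained filters fixed

    model = models.Sequential([
        Input(shape=input_shape),
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # single probability output
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for a quick shape check.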

Note: It costs $3 a day to keep the web app running online, so I had to stop the EC2 instance.
Analysis of Twitter Data During Apple's 2016 WWDC Keynote Address
Getting the WWDC Data
WWDC Analysis 1
WWDC Analysis 2
Map of Geocoded WWDC Tweets

In this project, I analyze Twitter data collected during Apple's 2016 WWDC. I collect the data using Tweepy and store it in a MongoDB database. I then run several time series analyses on different aspects of the Twitter data and end by plotting a D3 map of the 127 tweets that shared their location.
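
One simple time series view of such data is tweet volume per minute. The sketch below assumes each stored tweet document keeps a `created_at` string in the classic Twitter REST API format; the helper name `tweets_per_minute` and the sample documents are hypothetical placeholders, not data from the project.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout used by the classic Twitter REST API (assumed here).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_minute(tweets):
    """Return a sorted list of (minute, count) pairs for tweet documents."""
    counts = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        # Truncate to the start of the minute so tweets bucket together.
        counts[created.replace(second=0, microsecond=0)] += 1
    return sorted(counts.items())


# Placeholder documents standing in for records read from the database.
sample_tweets = [
    {"created_at": "Mon Jun 13 17:00:05 +0000 2016", "text": "placeholder 1"},
    {"created_at": "Mon Jun 13 17:00:48 +0000 2016", "text": "placeholder 2"},
    {"created_at": "Mon Jun 13 17:02:10 +0000 2016", "text": "placeholder 3"},
]

for minute, count in tweets_per_minute(sample_tweets):
    print(minute.isoformat(), count)
```

The same bucketing works for coarser intervals (for example, five-minute windows) by adjusting how the timestamp is truncated.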

Warning: Some of the WWDC tweets recorded (and sometimes displayed) in this analysis use foul language.