CS 229 Machine Learning
Project Suggestions, Autumn 2016

Given below are some project ideas and suggestions shared with us by research groups, professors, PhD students, and postdocs from various departments. Feel free to reach out to them directly (contact information provided below) if you see an idea that interests you!

Learning to communicate in a multiagent setting

We are interested in building a (neural) multiagent system where the agents solve a task together by exchanging information. One example is to have each agent read part of an article and ask them to answer questions collaboratively.

Contact: He He (

Deep learning based motor control unit

Human body movement is governed by a motor control unit which sends signals to activate muscles. The task of this project is to model this unit using machine learning and deep learning techniques. The dataset of observations comes from a biomechanical model (OpenSim). Teams with prior knowledge of C/C++ or Python are preferable (to interface with OpenSim). Knowledge of mechanics/biomechanics is a bonus.

Contact: Lukasz Kidzinski (

Indoor localization using WiFi signal

We are interested in indoor localization from Wi-Fi signal data. We are currently trying to estimate the distance and angle of arrival (AoA) between a Wi-Fi transceiver and a receiver. We would like to compare the performance of machine learning and deep learning algorithms against conventional algorithms.

Contact: Hirokazu Narui (

Multiagent congestion control on the Internet

Congestion control is the study of *when* to send data on the Internet, versus when to wait and let somebody else take a turn. These algorithms have been adjusted by hand for decades to optimize throughput, delay, and fairness among multiple independent computers. Recent work has begun to teach computers to synthesize these algorithms from first principles. How well can you do (and can you beat the current state of the art, both human- and computer-generated) with real machine learning?

Contact: Keith Winstein (
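
For readers new to the topic, the following is a minimal sketch of additive-increase/multiplicative-decrease (AIMD), the classic hand-designed rule behind TCP-style congestion control. It is only an illustration of the kind of algorithm a learned controller would be compared against, not the method of the project contact. The function names, the parameter values, and the toy single-flow link model (a loss whenever the window exceeds a fixed capacity) are all assumptions made here for illustration.

```python
def aimd_step(cwnd, loss_detected, increase=1.0, decrease=0.5, min_cwnd=1.0):
    """Return the next congestion window (in packets) after one round trip.

    Hypothetical helper: grow the window by a constant when the round trip
    was loss-free, and scale it down by a constant factor after a loss.
    """
    if loss_detected:
        return max(min_cwnd, cwnd * decrease)
    return cwnd + increase


def simulate(capacity, rounds, cwnd=1.0):
    """Toy single-flow link (an assumption): loss whenever cwnd > capacity."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        cwnd = aimd_step(cwnd, loss_detected=cwnd > capacity)
    return history


if __name__ == "__main__":
    # Prints the sawtooth-shaped sequence of window sizes.
    print(simulate(capacity=10, rounds=30))
```

The constants `increase` and `decrease` are exactly the sort of hand-tuned parameters that a learning approach could try to choose automatically.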

Scalable approaches to reading comprehension using machine reading

The task of this project is to perform automated reading comprehension in a way that scales to large text documents.

Contact: He He (

Autonomous Driving Through an Unknown Map

Professor Marco Pavone in the Department of Aeronautics and Astronautics and I are interested in comparisons and extensions of the method proposed in this paper: ( The method uses machine learning with a Bayesian tilt to predict the fastest way to navigate an unknown map (e.g., how to approach a blind corner) while maintaining safety. The code used is proprietary to the lab that proposed it, so if the project team chooses to extend the approach suggested in this paper, it will first have to replicate that work.

Contact: Lucas Janson (

A Machine-Compiled Database of Genetic Disease

A large fraction of known gene/disease associations is not easily accessible in machine-readable form. We are working on a machine reading system for automatically extracting this information and presenting it in a useful way to scientists, clinicians, and people interested in analyzing the genome (e.g., genome interpretation companies like 23andMe). The project team will work with us to improve our text classification algorithms and help build a system that will enable users to use our database for personal genome interpretation.

Contact: Volodymyr Kuleshov (

Deep Learning Methods for Statistical Genetics

Statistical genetics uses large amounts of genetic information to make inferences about the population-level structure of human genomes, particularly about their ancestry, relatedness, and predisposition to disease. This project aims to design new deep learning algorithms for some of these problems, particularly genome phasing, imputation, and the estimation of the pathogenicity of mutations. Teams with prior knowledge of deep learning frameworks such as Theano and/or TensorFlow are preferable.

Contact: Volodymyr Kuleshov (

Adversarial attacks on machine learning

With machine learning and deep learning becoming mainstream across various critical applications from self-driving cars to cyber authentication, their robustness to adversarial attacks is of paramount importance. In this project, we will design and analyze the effectiveness of various types of adversarial attacks against machine learning algorithms, in particular in the cybersecurity context. We will also study how these algorithms can be made robust to such adversarial attacks.
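
As one concrete example of an adversarial attack, the sketch below applies the fast gradient sign method (FGSM) to a logistic regression classifier in plain Python. The project description does not name a specific attack or model, so the choice of FGSM, the weights, the input, and the perturbation size `eps` are all assumptions made here for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that x belongs to class 1 under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, x, y):
    """Cross-entropy loss of the model on one example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Shift each feature by eps in the direction that increases the loss.

    For logistic regression the gradient of the loss with respect to the
    input is (p - y) * w, so only its sign is needed.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0      # hypothetical trained weights
    x, y = [1.0, 0.5], 1         # hypothetical correctly classified input
    x_adv = fgsm(w, b, x, y, eps=0.5)
    print(log_loss(w, b, x, y), log_loss(w, b, x_adv, y))
```

Because the model is linear in its input, the perturbed point is guaranteed to have a loss at least as large as the original; for deep networks the same step is only a first-order approximation.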

Contact: Bahman Bahmani (

Learning Good Auctions from Data

Parameter tuning is an important part of modern large-scale auctions (e.g., setting reserve prices in sponsored search auctions). Recent work ( lays the statistical learning foundations of this task. The project would be to develop and test algorithms for the corresponding empirical risk minimization problems (which are non-convex and hence challenging).
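
To make the empirical risk minimization problem concrete, here is a minimal sketch for a deliberately simplified setting: a single bidder facing a posted reserve price, with revenue equal to the reserve whenever the sampled bid meets it. This setting, the function names, and the sample data are assumptions chosen for illustration and are much simpler than the sponsored search auctions mentioned above.

```python
def empirical_revenue(reserve, bids):
    """Average revenue from posting `reserve` against each sampled bid.

    A bid at or above the reserve pays the reserve; any other bid pays nothing.
    """
    return sum(reserve for b in bids if b >= reserve) / len(bids)


def best_reserve(bids):
    """Empirical revenue maximization by exhaustive search.

    Between two consecutive sampled bids the revenue grows linearly in the
    reserve, so it suffices to check the sampled bid values themselves.
    """
    candidates = sorted(set(bids))
    return max(candidates, key=lambda r: empirical_revenue(r, bids))


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 10.0]   # hypothetical bid data
    r = best_reserve(sample)
    print(r, empirical_revenue(r, sample))
```

The objective is a sawtooth in the reserve price, which is one simple way to see why such problems are non-convex. Exhaustive search works here only because there is a single parameter; with many parameters the search space grows quickly, which is where the algorithmic challenge lies.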

Contact: Tim Roughgarden (

Algorithms for Learning Good Heuristics

Parameter tuning is an important part of getting algorithms to work (as in gradient descent). Recent work ( lays the statistical learning foundations of this task. The project would be to develop and test algorithms for the corresponding empirical risk minimization problems (which are non-convex and hence challenging).

Contact: Tim Roughgarden (

Diabetic Retinopathy

A Kaggle dataset consisting of retina images of 17,500 patients (for a total of about 35,000 images) has recently been released (here). The labels indicate how severe the damage from diabetes is. The aim of this project is to correctly classify the degree of retinopathy.

Contact: Mike Chrzanowski (

Malignant tumor classification

The DDSM dataset consists of mammograms from 1,112 patients. An accompanying CSV file provides metadata for each image, including the label (benign or malignant), the severity, and the shape of the tumor. The dataset is hosted on Dropbox and can be found here. The aim of this project is to classify tumors as benign or malignant.

Contact: Darvin Yi (