CS 229 Machine Learning
Below are some project ideas and suggestions shared with us by research groups, professors, PhD students, and postdocs from various departments. Feel free to reach out to them directly (contact information is provided below) if you see an idea that interests you!
We are interested in building a (neural) multiagent system where the agents solve a task together by exchanging information. One example is to have each agent read part of an article and then have the agents answer questions collaboratively.
Contact: He He (email@example.com)
Human body movement is governed by a motor control unit which sends signals to activate muscles. The task of this project is to model this unit using machine learning and deep learning techniques. The dataset of observations comes from a biomechanical model (OpenSim). Teams with prior knowledge of C/C++ or Python are preferred (to interface with OpenSim, http://simtk-confluence.stanford.edu:8080/display/OpenSim/Welcome+to+OpenSim). Knowledge of mechanics/biomechanics is a bonus.
Contact: Lukasz Kidzinski (firstname.lastname@example.org)
We are interested in indoor localization from Wi-Fi signal data. We are currently trying to estimate the distance and angle of arrival (AoA) between a Wi-Fi transmitter and receiver. We would like to compare the performance of machine learning and deep learning algorithms against conventional algorithms.
Contact: Hirokazu Narui (email@example.com)
Congestion control is the study of *when* to send data on the Internet, versus when to wait and let somebody else take a turn. These algorithms have been adjusted by hand for decades to optimize throughput, delay, and fairness among multiple independent computers. Recent work has begun to teach computers to synthesize these algorithms from first principles. How well can you do (and can you beat the current state of the art, both human- and computer-generated) with real machine learning?
Contact: Keith Winstein (firstname.lastname@example.org)
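As a reference point for the hand-tuned algorithms mentioned above, here is a minimal sketch of additive-increase/multiplicative-decrease (AIMD), the classic rule used in TCP congestion avoidance. The single shared link, the loss model (every flow sees a loss whenever the total window exceeds capacity), and all function and parameter names are illustrative assumptions, not part of the project description.

```python
def simulate_aimd(initial_windows, capacity=100.0, rounds=200,
                  increase=1.0, decrease=0.5):
    """Simulate flows sharing one link under AIMD (hypothetical toy model).

    Each round, if the total window exceeds the link capacity, every flow
    multiplies its window by `decrease`; otherwise every flow adds `increase`.
    Returns the list of window vectors, one per round.
    """
    windows = list(initial_windows)
    history = []
    for _ in range(rounds):
        if sum(windows) > capacity:
            # Congestion signal: all flows back off multiplicatively.
            windows = [w * decrease for w in windows]
        else:
            # No congestion: all flows probe for bandwidth additively.
            windows = [w + increase for w in windows]
        history.append(list(windows))
    return history


# Two flows that start far apart drift toward an equal share of the link.
history = simulate_aimd(initial_windows=[80.0, 5.0])
final_windows = history[-1]
print(final_windows)
```

In this toy model the gap between the two flows is unchanged by additive increase and halved by every multiplicative decrease, which is why the windows converge toward a fair share. A learned controller would replace the fixed `increase` and `decrease` rule with a policy chosen from data.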
Scalable approaches to reading comprehension using machine reading
The task of this project is to perform automated reading comprehension in a way that scales to large text documents.
Contact: He He (email@example.com)
Professor Marco Pavone (Department of Aeronautics and Astronautics) and I are interested in comparisons and extensions of the method proposed in this paper: (http://groups.csail.mit.edu/rrg/papers/richter_isrr15.pdf). The method uses machine learning with a Bayesian tilt to predict the fastest way to navigate an unknown map (e.g., how to approach a blind corner) while maintaining safety. The code is proprietary to the lab that proposed the method, so a project team that chooses to extend the approach will first have to replicate the paper's work.
Contact: Lucas Janson (firstname.lastname@example.org)
Statistical genetics uses large amounts of genetic information to make inferences about the population-level structure of human genomes, particularly about their ancestry, relatedness, and predisposition to disease. This project aims to design new deep learning algorithms for some of these problems, particularly genome phasing, imputation, and the estimation of the pathogenicity of mutations. Teams with prior knowledge of deep learning frameworks such as Theano and/or TensorFlow are preferred.
Contact: Volodymyr Kuleshov (email@example.com)
Adversarial attacks on machine learning
With machine learning and deep learning becoming mainstream across various critical applications from self-driving cars to cyber authentication, their robustness to adversarial attacks is of paramount importance. In this project, we will design and analyze the effectiveness of various types of adversarial attacks against machine learning algorithms, in particular in the cybersecurity context. We will also study how these algorithms can be made robust to such adversarial attacks.
Contact: Bahman Bahmani (firstname.lastname@example.org)
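To make the notion of an adversarial attack concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a logistic regression classifier. The weights, the input, and the perturbation budget `epsilon` are made-up values for illustration; the project itself may study entirely different models and attacks.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move x by epsilon per coordinate in the direction that raises the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict_proba(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical model and input, chosen only for illustration.
weights = [2.0, -1.0, 0.5]
bias = 0.0
x = [0.3, -0.2, 0.4]

x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=0.4)
print(predict_proba(weights, bias, x))      # clean input, classified as 1
print(predict_proba(weights, bias, x_adv))  # perturbed input, pushed below 0.5
```

Each coordinate moves by at most `epsilon`, yet the predicted class flips. A defense would need to keep the prediction stable under all such bounded perturbations.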
Parameter tuning is an important part of modern large-scale auctions (e.g., setting reserve prices in sponsored search auctions). Recent work (https://arxiv.org/abs/1506.03684) lays the statistical learning foundations of this task. The project would be to develop and test algorithms for the corresponding empirical risk minimization problems (which are non-convex and hence challenging).
Contact: Tim Roughgarden (email@example.com)
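To illustrate what empirical risk minimization over a reserve price can look like, here is a minimal sketch in a deliberately simplified setting: a single bidder facing a posted price, with revenue estimated from a sample of past bids. This setting, the sample values, and the function names are assumptions for illustration and are not the auction classes studied in the cited paper.

```python
def empirical_revenue(reserve, bids):
    """Average revenue of posting `reserve` against each sampled bid."""
    return reserve * sum(1 for b in bids if b >= reserve) / len(bids)


def best_reserve(bids):
    """Return (reserve, revenue) maximizing empirical revenue.

    Between two consecutive sample values the revenue grows linearly in the
    reserve, so an optimal reserve can always be found among the sampled
    bids themselves. Sorting lets us count the bids that clear each one.
    """
    ordered = sorted(bids, reverse=True)
    best_r, best_rev = 0.0, 0.0
    for rank, bid in enumerate(ordered, start=1):
        # With reserve equal to this bid, at least `rank` bids clear it.
        revenue = bid * rank / len(ordered)
        if revenue > best_rev:
            best_r, best_rev = bid, revenue
    return best_r, best_rev


# Hypothetical sample of past bids.
bids = [1.0, 2.0, 3.0, 10.0]
reserve, revenue = best_reserve(bids)
print(reserve, revenue)
```

Even in this one-parameter case the objective is not concave in the reserve: revenue jumps down each time the reserve passes a sampled bid. With several bidders and several parameters, exhaustive search of this kind is no longer practical, which is where the algorithmic challenge lies.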
Parameter tuning is an important part of getting algorithms to work (as in gradient descent). Recent work (http://theory.stanford.edu/~tim/papers/features.pdf) lays the statistical learning foundations of this task. The project would be to develop and test algorithms for the corresponding empirical risk minimization problems (which are non-convex and hence challenging).
Contact: Tim Roughgarden (firstname.lastname@example.org)
A Kaggle dataset consisting of retina images of 17,500 patients (for a total of about 35,000 images) has recently been released (here). Each image has a label indicating how severe the damage from diabetes is. The aim of this project is to correctly classify the degree of retinopathy.
Contact: Mike Chrzanowski (email@example.com)
The DDSM dataset consists of mammograms from 1,112 patients. An accompanying CSV file provides metadata for each image, including the label (benign or malignant), the severity, and the shape of the tumor. The dataset is hosted on Dropbox and can be found here. The aim of this project is to classify tumors as benign or malignant.
Contact: Darvin Yi (firstname.lastname@example.org)