Projects & Publications
Ray is a distributed execution engine used to accelerate deep learning and reinforcement learning applications, and is currently being developed at UC Berkeley's RISELab. I created the first version of the Ray WebUI, which supports interactive, scalable performance debugging through a task execution graph and analysis of system state.
High-Accuracy SGD Using Low-Precision Arithmetic and Variance Reduction (for Linear Models)
(To appear in SysML 2018) Stochastic gradient descent (SGD) is a popular algorithm for solving optimization problems, many of which arise from training linear models. The performance of SGD depends on its ability to process large-scale data sets efficiently. With this in mind, we consider HALP, a new SGD-style algorithm optimized for linear models that uses low-precision arithmetic and variance reduction techniques, developed as part of a larger collaboration with researchers at Stanford*. We implemented an 8-bit version of this algorithm and ran preliminary experiments measuring its convergence per iteration and in wall-clock time.
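As a rough illustration of how low-precision arithmetic and variance reduction can be combined, the Python sketch below runs SVRG-style updates on a least-squares linear model while keeping the inner-loop offset from the anchor point on a signed 8-bit grid with stochastic rounding. This is not the HALP implementation: the least-squares objective, the step size, the rule tying the grid spacing to the full-gradient norm (loosely modeled on the "bit centering" idea), and every function name here are assumptions made for illustration. Values are stored as Python floats restricted to the 8-bit grid, so the sketch simulates low precision rather than computing with actual 8-bit integers.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quantize(v, scale, bits, rng):
    """Stochastically round each entry of v onto a signed `bits`-bit grid with spacing `scale`."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for x in v:
        y = x / scale
        low = math.floor(y)
        # Round up with probability equal to the fractional part, so rounding is unbiased.
        q = low + (1 if rng.random() < y - low else 0)
        out.append(max(-qmax, min(qmax, q)) * scale)
    return out


def full_gradient(X, b, w):
    """Gradient of (1/2n) * sum_i (x_i . w - b_i)^2 (least squares, an assumed objective)."""
    n, d = len(X), len(w)
    g = [0.0] * d
    for xi, bi in zip(X, b):
        r = dot(xi, w) - bi
        for j in range(d):
            g[j] += xi[j] * r / n
    return g


def lowprec_svrg(X, b, mu, lr=0.1, epochs=30, bits=8, seed=0):
    """Hypothetical low-precision SVRG sketch; mu is a guess at the strong-convexity constant."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    anchor = [0.0] * d  # full-precision anchor point
    for _ in range(epochs):
        g_full = full_gradient(X, b, anchor)
        gnorm = math.sqrt(dot(g_full, g_full))
        if gnorm == 0.0:
            break
        # Assumed "bit centering" rule: the grid spans roughly the distance to the optimum,
        # so it shrinks as the anchor improves.
        scale = gnorm / (mu * (2 ** (bits - 1) - 1))
        z = [0.0] * d  # offset from the anchor, kept on the low-precision grid
        for _ in range(2 * n):
            i = rng.randrange(n)
            xi = X[i]
            # Variance-reduced estimate: grad_i(anchor + z) - grad_i(anchor) + full gradient,
            # which for least squares is x_i * (x_i . z) + g_full.
            c = dot(xi, z)
            step = [zj - lr * (xij * c + gj) for zj, xij, gj in zip(z, xi, g_full)]
            z = quantize(step, scale, bits, rng)
        anchor = [a + zj for a, zj in zip(anchor, z)]
    return anchor
```

Because the grid spacing shrinks along with the full gradient at each new anchor, the quantization error shrinks too, which is what lets the low-precision inner loop still reach high accuracy in this sketch. A hypothetical call is `w = lowprec_svrg(X, b, mu=0.2)`, where `X` is a list of feature rows and `b` the targets.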
Benefits of Resource Disaggregation in Datacenters
The slowdown of Moore’s law has surfaced several fundamental limitations of today’s server-centric datacenter architectures. As a result, a new computing paradigm is emerging: a disaggregated architecture, in which each resource type is built as a standalone “blade” and a network fabric interconnects the resource blades. While this is beneficial from a computer architecture perspective, conventional wisdom suggests that decoupling resources will degrade performance for legacy systems and applications. Widespread adoption of such architectures is therefore currently gated on the systems community building systems and applications optimized for this emerging paradigm. We present preliminary evidence that this barrier may be unfounded. Our main insight is that disaggregated architectures offer finer-grained resource multiplexing and better “packing” of jobs than server-centric architectures. We show that, for many legacy systems and applications, the performance degradation due to resource disaggregation can be completely offset by the gains from higher resource utilization in disaggregated architectures.
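To make the packing argument concrete, the following sketch compares how much a server-centric cluster and an idealized disaggregated cluster must provision for the same set of jobs. The (CPU, memory) job demands, the first-fit placement policy, and the function names are all hypothetical. The disaggregated count treats each resource pool as perfectly divisible and ignores network overhead, so it captures only the utilization side of the trade-off, not the performance degradation described above.

```python
import math
from typing import List, Tuple

Job = Tuple[int, int]  # (CPU cores, memory units); hypothetical units


def servers_first_fit(jobs: List[Job], cpu_cap: int, mem_cap: int) -> int:
    """Server-centric: each job must fit entirely on one server (first-fit placement)."""
    free: List[Tuple[int, int]] = []  # remaining (cpu, mem) on each opened server
    for cpu, mem in jobs:
        for i, (fc, fm) in enumerate(free):
            if cpu <= fc and mem <= fm:
                free[i] = (fc - cpu, fm - mem)
                break
        else:
            # No existing server has room for both resources: open a new one.
            free.append((cpu_cap - cpu, mem_cap - mem))
    return len(free)


def blades_disaggregated(jobs: List[Job], cpu_cap: int, mem_cap: int) -> Tuple[int, int]:
    """Idealized disaggregation: CPU and memory drawn from independent, divisible blade pools."""
    cpu_blades = math.ceil(sum(c for c, _ in jobs) / cpu_cap)
    mem_blades = math.ceil(sum(m for _, m in jobs) / mem_cap)
    return cpu_blades, mem_blades


if __name__ == "__main__":
    cpu_heavy = [(3, 1)] * 4
    print(servers_first_fit(cpu_heavy, 4, 4))     # 4 servers: 16 cores + 16 memory units
    print(blades_disaggregated(cpu_heavy, 4, 4))  # (3, 1): 12 cores + 4 memory units
    mixed = [(3, 1), (1, 3)]
    print(servers_first_fit(mixed, 4, 4))         # 1 server
    print(blades_disaggregated(mixed, 4, 4))      # (1, 1)
```

In this toy model, the two designs provision the same amount when job demands are complementary, and the disaggregated pool pulls ahead when demands are skewed toward one resource, since a server-centric cluster must then strand the unused resource on every server.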