Good afternoon. All right. Welcome, everyone,

to the kickoff for the 2019 Allen

School Colloquium. I can now officially say this is

the second annual Allen School Colloquium, and I hope you

are all as excited as I am. I do need to do one

thing really quickly. Hold on, one second. So I’m not teaching

this quarter, but my mom always

went to see my class. And so I figured I’d show

her all of you, which will be the best-looking

photo I’ve sent her so far since I’ve been at UW. So for those of you who

are new to the department, the Allen School

Colloquium is an event we have each year, where

you give the talks. It’s a great way to see what

different groups are up to and hear about your

fellow students’ research. So today we have a wonderful

lineup of Sami, Max, Aditya, and Robbie. And Sami’s going to

kick it off, telling us about trace reconstruction. So I’ll hand it off to her. SAMI: Thank you, Zach. I have to wait for this

thing to connect again. There we go. OK. So I am Sami, and we call this

the Hodgepodge Colloquium, which is partially true. But really, this is the

Softball Team Colloquium. So this is all people

who are on the softball team and either love being

on it, like Max and Robbie, or hate being on it, like me,

and would like more of you to play so that I don’t

have to show up to as many of the games. Or Aditya, who is new and

excited to join softball. So you should join

the Slack channel. Also, theory’s boring, so

just join the Slack channel right now; you don’t even have to

listen to trace reconstruction. It’s all good. So what is trace reconstruction? So trace reconstruction

is this problem originally from the computational

biology community. I’ll talk a little bit more

about that in a second. But as theory does, we have

stolen the problem from other fields and completely changed

it into a problem that we think is interesting,

and now no one else really likes it anymore. So the way that the theory

community looks at it is the following– there is a

binary string X of length n, and then we assume

that you’re given what we call some traces

of this unknown string. So your goal is to try to

figure out what the string is. So these traces that

you get are basically sampled substrings,

where you can imagine that each bit is deleted

with some fixed probability. Just imagine, probability

1/2, the bit is deleted, so you get to see these

little sampled substrings. So, for instance,

the last substring is what you would see if those

gray bits had been deleted. And again, the goal is

to learn the labels of X with high probability using

the minimum number of traces. You can assume that you know

the length of the string and that you know the

deletion probability. So as I said, we
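For anyone who wants to play with the problem, the deletion channel just described is easy to simulate. This is a toy Python sketch, not anyone's actual reconstruction code:

```python
import random

def trace(x, q=0.5, rng=random):
    """One trace of binary string x: each bit is deleted
    independently with probability q, and you see what survives."""
    return [b for b in x if rng.random() >= q]

# Every trace is a random subsequence of the unknown string.
x = [1, 0, 1, 1, 0, 0, 1, 0]
sample = trace(x, q=0.5, rng=random.Random(0))
```

The reconstruction question is then: how many independent samples like this do you need before x is determined with high probability?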

stole this problem from the computational

biology community. In particular, this is actually

a problem that comes up for the MISL group. And you can ask them

exactly how it arises, because I don’t

know, and I didn’t go to my college bio classes. But a lot of them did. The important thing to know is

that the full problem actually involves not only deletions,

as I described it, but also insertions and substitutions. But in the theory community,

we have studied substitutions extremely well, and it’s really

the insertions and deletions that we don’t know how

to handle very well. So hence, that’s why

I stated the problem with only deletions. So what do we know

about this problem? Remember that n is the

length of the unknown string. We know that exponential

in n to the 1/3 traces suffice to find the

original string. This is using something

called a mean-based algorithm. We are basically just looking

at the expected single-bit statistics of the traces. As for lower bounds, we know
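To give a flavor of what "single-bit statistics" means, here is a toy sketch — not the actual mean-based algorithm, which extracts the string from these means analytically, but the quantity it works with:

```python
import random

def trace(x, q, rng):
    """One trace: each bit of x is deleted independently with probability q."""
    return [b for b in x if rng.random() >= q]

def single_bit_means(x, q, num_traces, seed=0):
    """Empirical mean of the bit observed at each trace position j,
    treating missing positions (short traces) as zero. Mean-based
    algorithms work only from estimates of these expectations."""
    rng = random.Random(seed)
    sums = [0.0] * len(x)
    for _ in range(num_traces):
        for j, b in enumerate(trace(x, q, rng)):
            sums[j] += b
    return [s / num_traces for s in sums]

# For the all-ones string, the mean at position j is just the
# probability that a trace survives with length greater than j.
means = single_bit_means([1, 1, 1, 1], q=0.5, num_traces=5000)
```

The point of the real algorithm is that these per-position expectations are linear functions of the unknown bits, so enough traces pin the string down.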

that n to the 3/2 traces is required. And that’s for arbitrary

worst case strings. If you assume that you

sample the string randomly, then we know that exponential

in log n to the 1/3 traces suffice. So the important thing

to really get out of that is that we know that

the trace complexity is somewhere between n to the

3/2 and exponential in n to the 1/3. But the big question

that all of us who care about this problem

have been thinking about is, can we reconstruct

any binary string with high probability using

subexponential traces? So 2 to some poly log n traces. And the sad thing is

that really nobody knows. People have been stuck

on this for a few years. Which, in the theory community,

being stuck for a few years is really not

a long time, but it seems like the

kind of thing where we think people are going to

be stuck on this for a very long time unfortunately. So what have people

been doing who are interested in this

problem, when I just told you that people are

really stuck on it. So instead, people have been

really exploring variants around the space of trace

reconstruction with the hope that it gives intuition

as to how you could maybe make progress on that

worst case problem that I talked about

or just exploring other interesting

areas in the space. So, for instance, I

described the problem as saying that each

bit has to be deleted with the same probability. Maybe you relax that

or change it somehow. You can also imagine that

instead of randomly sampling your string or it just being

like any binary string, you could imagine that you

create some alphabet which you sample from instead. You can characterize

trace complexity in terms of the number of

1s in your original string. There’s also other earlier work

on this more biology-related question about reconstructing

labeled trees when you are only given access to

leaves of the tree. And so, our work kind of

fits into this realm of, how do we explore the space

of trace reconstruction in a way which may be meaningful

to the theory community? So we’ve been looking

at this problem that we call tree

trace reconstruction. So my goal is to sell you on

the problem, not really my work. So I’ll briefly talk about

some of our work here, but I hope I’ve

just convinced you that trace reconstruction

is kind of interesting. So tree trace reconstruction

is the following– you assume that you have

an unknown labeled tree X on n nodes. You can assume that you

know the structure of X– say, you know that it’s a binary

tree or something like that. You know the number

of nodes that it’s on. You know the

deletion probability, and you know the deletion model. Meaning, you know when

one node is deleted, how does it affect

the rest of the tree. The goal is to learn

the labels of X with high probability from

some samples that you receive. So to define how

you get a sample, we need to define

some deletion models. In the deletion models

that we consider, you’re going to assume that each

node is deleted independently with some fixed probability q,

just like in the string case. You can assume

that the siblings– siblings of a certain node– all

have a left to right ordering, and that the root

node’s not deleted. Those are just technicalities. So we introduced this

problem because we really wanted to bring more

combinatorial ideas back into reconstruction problems. Whereas they had been

very much based in complex analysis– so just

expanding upon the techniques that we can use. And practically, there’s been

some recent work in the DNA nanotechnology

world which has led us to believe that, hey, maybe

actually practically traces of trees are interesting

to look at as well. So a deletion model that

we define, called the TED deletion model– the Tree

Edit Distance model– is the following– when

you delete a node– so when you delete that

gray node up there– the children of it

move up in the tree. So, for instance, when

that gray node is deleted, its children have moved up to

take its place in the tree. And this is inspired by

computational biology settings, again. And then I’ll just

state our results for complete k-ary trees. And the big takeaway

here is the stuff in red. The point is that when you

have a k-ary tree for k being the degree, you

can actually reconstruct using polynomially many traces. And then you can do

the same thing again using a different technique

when the degree is actually much larger. So the point is that with

this addition of structure, we are actually able to break

that exponential barrier that I had mentioned for the

string trace reconstruction case. So open problems– like I said,

string trace reconstruction is very open. People really don’t know how to

improve particularly that upper bound for the problem. And then there’s lots

of other variants that we are interested in. One that I like is

instead of reconstructing the original string

exactly, can you recover a string which is

close in Hamming distance? And maybe you can do

that with fewer traces. And this is another

example of a problem in the space which is well

motivated technically but also practically. So thank you. And again, join

the softball team. [APPLAUSE] ZACH: OK. And I think we’re going to do

a couple of questions really quickly in between. SAMI: What’s up? AUDIENCE: So [INAUDIBLE]

model, would you actually– the tree was just a path. Don’t you just get back– ZACH: Exactly. Yeah. So it’s a generalization of

the string trace reconstruction problem, where a

string is just a path. AUDIENCE: better than what– SAMI: Oh, sorry. So that’s only for

complete k-ary trees– I meant to state that. Yeah. Yeah. Mm-hmm. AUDIENCE: Well,

pardon the naivety for asking the question, how bad

is this exponential complexity in practice? Like, in the area

of DNA sequencing? Does it actually have

exponentially [INAUDIBLE]? SAMI: No. I believe you actually

have a constant number. A MISL person could

correct me on that. But yeah, you actually want a

constant– very, very small. So exponential is bad. OK. Thanks. [APPLAUSE] MAX: All right. I’m Max. And so, unfortunately,

I can’t answer any of those MISL

questions, because I’m the stupid MISL member. So I’m going to be talking

about fluidics plus computation. And because I’ve got 30

slides to do in 10 minutes, we’re just going to skip

most of the computation part. So I’m going to start

by just telling you what fluidics is,

because a year and a half ago I didn’t really know

what fluidics was either. So I’m really talking

about microfluidics, which is the practice of

manipulating small volumes of fluid. So here’s a bunch of

different technologies that you might want

to use to do that. You basically have a little

series of tubes on the left, and that’s nice. It’s cheap. It’s single purpose. So that has some uses. And then, in the middle, you’ve

got a pipetting robot, which is basically a

robotic grad student. It picks up pipettes and

squirts them into little tubes, just like a grad student would. And then, on the right,

you have a newer technology called digital microfluidic

device, which I’ll explain a little bit more about in a second. This is a

video of one we’ve built here at UW

called PurpleDrop. And I like to think

about this progression of fluidics as kind

of the same thing that we saw in electronics. On the left, you’ve got

simple single-purpose devices, which we still use,

because they’re cheap when you want to mass

produce something at volume that has a fixed function. But, of course, people

want general purpose programmable things. And then, in electronics,

we eventually got that, but they were expensive and big. And then, of course, now

it’s cheap and ubiquitous. And so I think that’s the

way we’re moving in fluidics. And so that’s why I’m excited

about this digital microfluidic device. So how does it work? So that was the top view. Here’s a side view. So we’re slicing

through it here. You have a grid of electrodes. So you’re seeing

a slice through the plane here. And you’ve got a

droplet squished between two hydrophobic layers. And then that white

electrode is on. And if you turn the

other one on, it moves. It’s as simple as that. And basically, that’s

the key primitive. And if you put that together,

you can move things around, you can mix and split

them in parallel, and you can move around

to other peripherals you might have on the board. So in this video here, the

little legs on the bottom– the little squares that

the droplets aren’t really moving onto– those would be heaters. And you could attach

any peripherals you might want onto the board. But the question I

want to talk about is, how do you

program this thing? Because there’s a lot going on. You might want to

make sure– you don’t want your droplets to

bump into each other, and you have to

manage where they are. So how might you

program this thing? So you certainly

don’t want to talk about where the droplets are. You don’t want to say, turn

on electrode 1 at time step 1. That would be really bad. So what people currently do

is they make these graphs. So they basically

make a data flow graph that describes your protocol. This is a simple one. You input two things, mix

them, then output them. And the edges connect

the data flow– the droplet flow, in this case. And currently, the way

things work is you take that and you run it through

some big synthesis tool that uses place and

route techniques from VLSI. Because you’re basically

doing place and route. You need to figure out, where

would you mix this thing? How do you get the

droplets there? Same thing with wires

and modules on a chip. So then you get out a sequence

of electrode actuations. And then you run

that on the hardware, and it zips that droplet around. And then someone goes

and collects the data. And that’s all well and good if

you want to run one experiment. But a scientist probably

wanted to run that experiment, so they designed that graph. And they’re probably going

to collect that data. And science isn’t a

one and done thing, so they’re probably

going to want to run another experiment, right? And so this is where this kind

of static workflow falls down. And this is where we want to

put computation in the loop. So maybe the scientist is going

to run a machine learning model to figure out what

experiments to do next. Maybe it’s just a

simple if statement. Maybe they’re sweeping

over a parameter list. So instead of trying to come

up with some new programming tool that can solve all of

this with some fancy, schmancy programming language,

we’re just going to use the ones

we’ve already got. So instead of trying to

create a fluidic programming language that tries to overthink

what is a fluidic function and what does it mean for

an if statement to execute on the fluidic device, we’re

going to punt on all that and instead create a fluidic

operating system that allows you to write a

regular program in whatever your favorite

programming language is. You can use all your

machine learning models, your if statements,

and all those wacky things. And you can run devices

on a microfluidic chip. So let’s see what

that looks like. So at the top here, this

is kind of our stack. These are all the

cool things you might want to do with fluidics. These are all the

awesome domains that I know nothing about that

scientists are really good at. And at the bottom, you

have this cool hardware that people might want to use

to automate their experiments. So our key idea here is put in

a general purpose programming language, instead of some

special baked fluidic programming stack. And, of course, to

do this, we need to fill this abstraction

gap with a fluidic operating system. And what this is going

to do for us is basically automate resource management. So just like when

you write to a file, the Linux kernel

is like, OK, well, you don’t really

know how many disks you have. You don’t

know where they are. You don’t know where the

blocks are on that disk. That’s OK, I’ll find it for

you, and I’ll just give you back a file descriptor. So a simple number

that represents, OK, I actually know,

under the hood, where those bits

are sitting on disk. And maybe that disk

gets corrupted– it’ll move it somewhere else for you. So it’s the same kind of thing–

dynamic resource management for fluidics. So we call that stack

Puddle, and the hardware we’ve called PurpleDrop. And one nice thing about

this kind of abstraction is that you can write programs

on an operating system and it can run on

diverse sets of hardware. Maybe it runs something

on a GPU or maybe you have different kinds

of disks or something. By having a dynamic

abstraction, you can imagine kind of

plugging and playing different types of hardware. So even if people want to

use those pipetting robots, you could imagine using it

with a system stack like this. So we have back ends for

the microfluidic chip, but we’re working on

the ones for the robots. So let’s blow that

stack up and break it down a little bit more. So the high level programs that

people write in Puddle today, you just write in Python. We have front ends in

Python, Rust, and JavaScript. But I’ll show you some

snippets of the Python one, because that’s

what scientists use. So you can write

high level programs. You can even write

domain-specific libraries. Because you can share code,

because it’s just Python code. And it all hits

Puddle through an API. So there’s no compilation step. You’re talking to,

basically, the runtime system through a well-defined API. And then there’s

Puddle, the actual thing in the middle that’s

doing all the analysis and figuring out where

the droplets should go. We’re not really going to

talk about that at all. Basically, I’m going to show

you how you would use it and some cool stuff

we can do with it. And then there’s the hardware. Again, that, I’m going to not

really have time to talk about. So what kind of programs

can you write with Puddle? So there are programs

that look familiar to you, because they’re programs. So I promise this is only

code that I’ll really walk through and make you read. So this code has a

user defined function. And so all the things I’m

going to highlight in green are actual innovations–

when we published this paper, it was the first time anyone

had done them in microfluidic land. So they don’t sound

big to you all, because they are things that

are very run-of-the-mill in programming, but these

are things that if you look at microfluidics in the static

way that people have been doing, it couldn’t

really be done. So you can define

your own functions, and you can share

code to people, and that’s really nifty. You can also loop over

unbounded data structures. So here we’re defining

a thermocycle function. Which, basically, thermocycling

means you repeatedly heat something up. So you’re taking the thing

you’re going to heat up– the droplet– and a list of the

temperatures and times that you’re going

to heat it up to. So what do you do? Well, you loop over that list. So that’s something that
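In rough Python, the pattern looks like the sketch below. The `heat` function is a stand-in stub for Puddle's actual heater API call — the real function name and signature may differ — so the sketch is self-contained and runnable:

```python
# Hypothetical sketch of the thermocycle pattern from the talk.

def heat(droplet, temp, seconds):
    """Stand-in for the Puddle runtime call that routes the droplet
    to a heater and holds it at `temp` for `seconds`. Here we only
    record what was requested."""
    droplet.setdefault("log", []).append((temp, seconds))
    return droplet

def thermocycle(droplet, schedule):
    """Repeatedly heat a droplet: `schedule` is an unbounded list of
    (temperature, seconds) pairs, looped over like any Python list."""
    for temp, seconds in schedule:
        droplet = heat(droplet, temp, seconds)
    return droplet

# PCR then composes as a particular thermocycle schedule
# (illustrative temperatures and times, not a lab protocol).
def pcr(droplet, cycles=30):
    steps = [(95, 20), (55, 20), (72, 30)] * cycles
    return thermocycle(droplet, steps)
```

Because this is plain Python, `pcr` reuses `thermocycle` exactly the way the talk describes library composition.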

you couldn’t do before. And then the underlined call

to heat here is an API call. So this is actually

just patching into the Puddle

runtime that routes the droplet to the heater

and executes the heat. Another fancy thing you can

do is run an if statement. So you can take a

sensor reading– here, you’re detecting the

volume of the droplet– and get the data

back, and do something with it, all in

the same program. So you’re computing

over the data that you got back from the real

world all in the same program. So here, we’re taking a

volume reading of the droplet. This is done with a camera. And we compared

to some threshold. And if it’s too low– so maybe the

droplet has evaporated a little bit– we pump in some more water

with the input command, and that’s an API call. And we just add it

right to the droplet. And because this is just
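The sensor-feedback pattern just described might look like this sketch. `detect_volume` and `input_fluid` are hypothetical stand-ins for Puddle's camera and `input` API calls, stubbed so the example runs on its own:

```python
# Sketch of the feedback loop: read a droplet's volume, and if it has
# evaporated below a threshold, replenish it in the same program.

def detect_volume(droplet):
    """Stand-in for the camera-based volume reading."""
    return droplet["volume"]

def input_fluid(droplet, amount):
    """Stand-in for Puddle's input command: add fluid to the droplet."""
    droplet["volume"] += amount
    return droplet

def replenish(droplet, threshold=1.0):
    """An ordinary Python if statement over real sensor data, running
    in the same program that drives the device."""
    if detect_volume(droplet) < threshold:
        droplet = input_fluid(droplet, threshold - detect_volume(droplet))
    return droplet
```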

regular, vanilla Python code, it’s compositional. So if I gave you the

thermocycle code in a library, you could write PCR or a

Polymerase Chain Reaction, which is something you

might do to duplicate DNA. And it’s basically a simple

instance of thermocycle. So you wouldn’t have to write

thermocycle on your own, you could use the

method that I gave you. So that’s really nice. You can write all this

fancy code in Python, and it will run on the device. So like I said, all these

go through API calls, which hit the Puddle runtime. Which I am not going

to talk about at all, because I don’t

have enough time. At the end of the day, Puddle

crunches on the commands that you gave it,

and it eventually calls out to PurpleDrop. Which, here’s a nice

big image of PurpleDrop. That eventually runs your code. Cool. So again, just to tell

you some cool things you can do with this,

there’s three things you could do with Puddle. And they’re all

sorts of decisions that you couldn’t make before. So you can make decisions at

the execution level, which is where you take sensor

data and you feed it back into the planning loop. So Puddle’s using this to

make your program run better. And one good example of

this is droplet detection. So PurpleDrop has

a camera on top, and it’s constantly

looking at your droplets. It’s using a very, very

simple computer vision, and it’s detecting how

big your droplets are and where they might be. And what it’s doing,

is it’s comparing that with internally where it’s

expecting your droplets to be. And so, if, as is common

on this technology, a droplet fails to

move as planned, we can catch that and

reroute around it. And we can do that

dynamically, because we’re not committed to a particular

plan ahead of time. And that’s the annotated

computer vision algorithm working in real time. So it works in real time,

and this graph basically says it works. And if you don’t have

this error correction, it’s hard to do

much on the device. Because errors are common,

and you need to correct them. [MUSIC PLAYING] AUDIENCE: Your time’s up. MAX: My time’s up! [LAUGHTER] OK, that program

that we showed you, that was the first time anyone

could run PCR on this device. So that’s pretty cool. And we wrote a Key-DNA store. It’s like a key value store,

where you give it a key and you get back DNA. And that’s it. You can watch the

animations finish. [APPLAUSE] ZACH: Are there any

questions for Max? MAX: Sorry, I need this. AUDIENCE: Really cool talk. What is the debugging

experience like? So if I make a mistake, as I

do when I write Python code, what’s my experience in

figuring out what went wrong and how to fix it? MAX: So debugging

your protocol probably isn’t going to be too much

different from debugging your Python code. One nice thing is that

you’re just running Python, too. If you want to do

print line debugging or even invoke the PDB and run

your protocol interactively, you could totally do that. Debugging the hardware

is a nightmare. It’s like magic when it works,

which means just like magic when it doesn’t. So it’s very hard to

figure out what’s going on. [LAUGHTER] Yeah. AUDIENCE: So you set up a

very nice analogy with conventional

operating systems. I just want to write; I don’t want to know how

many disks I have. But the thing is, maybe once

a decade I go to write a file, and it says, sorry, there’s

no room on the disk. I imagine you have constant

resource exhaustion issues, because it’s this big. Right? MAX: Yeah. So you can totally run

into resource exhaustion. It’s really kind of related

to how many live droplets. So the scheduler,

because you don’t get to say when

things run, has a lot of freedom in sequencing

things that you thought might run in parallel. But if you don’t have enough

resources to run it– yeah, you totally could come up

with a case that does that. We are picturing a case

where, instead of one of these devices, you

have a gajillion of them all linked together. In which case, this

starts to make more sense. There was– I think

one, maybe over there. No. OK. [APPLAUSE] ADITYA: Yeah, I’m a new

grad student, so most of you haven’t met me. Can I have a quick show

of hands, how many of you do that PixieDust

named machine learning? Yeah, cool. So the PixieDust named

machine learning? AUDIENCE: [INAUDIBLE]. ADITYA: Yeah, so the fancy term. So essentially, I’m

trying to combine two of these hard topics,

like machine learning and the internet of things. So my research, which I did

at Microsoft Research India, was to enable these complex

machine learning algorithms in 2 kilobytes of RAM for

the internet of things. And we’re talking about

one particular algorithm which enables deep learning

in 2 kilobytes of RAM for time series data. So I call this The Edge

of Machine Learning. But people thought it

was for Microsoft Edge because I was in

Microsoft, but it is not. [LAUGHTER] So let me be very clear about

the devices I’m talking about. Like these ARM Cortex-M0 class

devices which have 2 kilobytes of RAM and maybe 48

megahertz clock– maybe the weakest of

the devices out there. And you can’t even run a

proper for loop for long enough without draining the battery. So that’s essentially

the scale we’re tackling. And the flash is

not great either. We had 32 kilobytes. So that essentially means a

working memory of anything should be less than 2

kilobytes and our model can’t be more than 32 kilobytes. So what is the impact? Most of these things

are futuristic. But one of them, actually,

the smart agriculture for disconnected

farms– something called FarmBeats, which happens

across the lake in Microsoft Research, Redmond, that

has been pretty successful. But the rest of the things

are still futuristic. So let’s hope they

come to fruition. So just to be clear,

the models will be trained on the cloud,

infinite resources. We don’t care about how

many GPUs, how many icebergs we melt. Finally,

we want them to be deployed on the tiny

IoT devices which have very, very small resources. So, a shameless

plug. So we developed a library in

Microsoft Research India which has four cool algorithms,

two in ICML in 2017, two in NeurIPS in 2018. We have two more coming up,

one in [INAUDIBLE] 2019, and in NeurIPS 2019. So essentially it

has six algorithms. Bonsai is our classical

tree-based algorithm which [INAUDIBLE] 2 kilobytes

for classical machine learning tasks. Essentially, you can

replace any classifier which you have been using

in your code bases and replace it with

Bonsai or ProtoNN. ProtoNN is a

kNN-based classifier. Whereas, coming to the much

cooler thing, deep learning– so EMI-RNN

is a signal-localizing tool for your time series data. And the thing which

I’ll be talking about today is FastGRNN. It expands to

Fast, Accurate, Stable and Tiny Gated Recurrent Neural Network. So it’s a recurrent

neural net, which we expect to be 45 times

faster than [INAUDIBLE], while maintaining

the same accuracy. So [INAUDIBLE] time

series like IoT essentially gets the

streaming time series data every single day,

so we just wanted to see how we can

leverage time series data and get some sort

of insights out of it using machine learning. So this has been a

joint collaboration with [INAUDIBLE] who is at

MSRI and a couple of people from Berkeley. So just a refresher

for people who don’t know what recurrent

neural networks are, they have been out there

from 1989 probably. And they essentially

are an intelligent way of recursive matrix

multiplications which keep on accumulating

some memory into the future so that you can leverage the

entire [INAUDIBLE] memory for some classification task. So all I want to show here

is the most important term is the U matrix multiplying with

the hidden state [INAUDIBLE] over and over. So if there is a long

time series which keeps on accumulating

memory, this U essentially becomes something like

U to the power of T, where T is the total

number of time steps. And this comes into

the gradient step during the optimization,

which leads to something called exploding

and vanishing gradient. So you either have your

neural network exploding in gradients– they don’t work– or you don’t have your gradient

moving in any direction, so they are just saturated. So you don’t learn anything

at the end of the day, because of this problem. And this has been a known
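The mechanism is easy to see numerically. A toy demo, standing in for the product of Jacobians in the gradient: scale a matrix's spectral radius slightly above or below 1 and take a high power.

```python
import numpy as np

# The gradient through T time steps of a vanilla RNN contains
# (roughly) a factor of U^T. Above spectral radius 1 it explodes;
# below, it vanishes.
rng = np.random.default_rng(0)
M = rng.standard_normal((16, 16))
M /= np.max(np.abs(np.linalg.eigvals(M)))   # spectral radius exactly 1

def power_norm(scale, T=100):
    """Norm of (scale * M)^T, a proxy for the U^T gradient factor."""
    return np.linalg.norm(np.linalg.matrix_power(scale * M, T))

vanishing = power_norm(0.9)   # decays toward zero
exploding = power_norm(1.1)   # blows up
```

The two regimes differ by many orders of magnitude at T = 100, which is why training saturates or diverges.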

problem from like 1990s. And people have been

trying to solve it. So one of the techniques

which people came up with is called unitary RNNs. Because the eigenvalues of

the U to the power of T term explode, what they did

is they just constrained U to be a unitary matrix. So the magnitudes of the highest

eigenvalue and the lowest eigenvalue are always set to 1. So you always project the

matrix back into unitary space. So there is no chance

your optimization is going to go bad. But the problem

is because you’re constraining the

[INAUDIBLE] of the model, your accuracy takes a big hit, and

you require much bulkier models to work on this. So essentially, this stops

us from deploying them onto the edge. Then comes the standard

state-of-the-art, LSTMs and GRUs, which have been there from 1994. They’ve got the

state-of-the-art everywhere, but they’re too bulky

because they’re too complex. And most of these processes

are single-threaded, so we can’t even leverage

parallel processing. So everything is

pretty expensive if you want to go for these. So we come up with

something called FastRNN. So FastRNN is the

predecessor of FastGRNN. FastRNN is one of

the primitive RNNs which you can see out there. It’s essentially a weighted

residual connection that implies you are adding

the current memory multiplied with the weight to the previous

memory with another weight. And both of these weights

are coupled, essentially, and you train them with

standardized [INAUDIBLE],, and you hope everything works. And surprisingly,

everything works. [LAUGHTER] But the only problem
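For the curious, here is a sketch of that weighted residual update as I understand it from the FastRNN paper; the dimensions and the values of the coupled scalar weights are illustrative, not trained values:

```python
import numpy as np

def fastrnn_step(x_t, h_prev, W, U, b, alpha, beta):
    """One FastRNN step: a weighted residual connection.
    h_t = alpha * tanh(W x_t + U h_prev + b) + beta * h_prev,
    where alpha and beta are trained scalars, typically with alpha
    small and beta close to 1, which keeps the recursion stable."""
    return alpha * np.tanh(W @ x_t + U @ h_prev + b) + beta * h_prev

# Run a toy sequence through the cell.
rng = np.random.default_rng(1)
d, h = 8, 16                       # input and hidden dimensions
W = rng.standard_normal((h, d)) * 0.1
U = rng.standard_normal((h, h)) * 0.1
b = np.zeros(h)
state = np.zeros(h)
for _ in range(50):
    state = fastrnn_step(rng.standard_normal(d), state, W, U, b,
                         alpha=0.05, beta=0.95)
```

With alpha + beta near 1 the hidden state stays bounded no matter how long the sequence runs, which is the stability point the talk makes.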

is we are still 1% to 2% short of

LSTMs and GRUs, so we thought, what can we do? Because our inference

[INAUDIBLE] is as good as RNNs. So there’s no overhead

over RNN, so we thought, what can we do better? And there are some

theoretical [INAUDIBLE]. If someone wants to

talk, we can talk later. But essentially, we,

for the first time, proved that there can be a

non-time-dependent bound for generalization and

convergence in RNNs. And this is an [INAUDIBLE]

comparison, so it’s not fair, but still it’s a first

of its kind result. Then we thought, why can’t

we match the LSTMs and GRUs? And we came up with a slightly

intuitive and intelligent way of leveraging gating, which

has been there in LSTMs and GRUs for a long time. So we’ve used most of the

information which RNN is using, and we essentially do some sort

of affine transformations– very, very simple intelligent

transformations– which essentially are

multiplying it with a scalar and adding it with

a scalar– plus a subtraction

from one, assuming the entire information

is bounded by one. And we don’t know

why, but this works as well as LSTMs and GRUs,

and it’s four times faster. So we thought, why
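As I understand the FastGRNN update from the paper, the gate reuses the RNN's pre-activation, and the candidate's weight is exactly that cheap scalar map with a subtraction from one. A sketch under that reading:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def fastgrnn_step(x_t, h_prev, W, U, b_z, b_h, zeta, nu):
    """One FastGRNN step. The gate z and the candidate reuse the same
    pre-activation W x_t + U h_prev (the reuse trick), and instead of
    a second gate matrix, the candidate is weighted by the affine map
    zeta * (1 - z) + nu, with zeta and nu trained scalars."""
    pre = W @ x_t + U @ h_prev
    z = sigmoid(pre + b_z)          # gate, elementwise in (0, 1)
    h_tilde = np.tanh(pre + b_h)    # candidate memory
    return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev
```

Sharing W and U between the gate and the candidate is what keeps the cell near RNN cost while getting LSTM/GRU-like gating.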

not push it further? So we took the

matrices and made them low-rank, sparse,

byte quantized. So at the end of the day, we
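The low-rank and byte-quantization steps (sparsification omitted) can be illustrated like this; this is my own sketch of the general compression technique, not the EdgeML code:

```python
import numpy as np

def low_rank(W, r):
    """Truncated SVD: W is approximated as A @ B with A (m x r), B (r x n),
    cutting storage from m*n to (m + n)*r numbers."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]

def byte_quantize(M):
    """Map floats to signed bytes with a single scale; returns (int8, scale)."""
    scale = np.max(np.abs(M)) / 127.0
    return np.round(M / scale).astype(np.int8), scale

rng = np.random.default_rng(2)
W = rng.standard_normal((64, 32))        # a toy weight matrix
A, B = low_rank(W, r=8)
Aq, sa = byte_quantize(A)
Bq, sb = byte_quantize(B)
# Reconstruct at inference time from int8 factors plus two scales.
W_approx = (Aq.astype(np.float32) * sa) @ (Bq.astype(np.float32) * sb)
```

Here 64*32 float32 weights (8 KB) shrink to (64+32)*8 bytes plus two scales, which is the kind of arithmetic that makes a 32-kilobyte flash budget workable.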

are 45 times faster than LSTM while maintaining the accuracy. So whenever you see

your LSTM or GRU being

hard to deploy, just replace

that with FastGRNN. And hopefully you’ll get

within 1% of the accuracy. Otherwise, just hit me up. [LAUGHTER] So I was surprised

initially, because when we were designing

this algorithm we thought this would be

only for the IoT task. But it generalized pretty well. So the first three data sets

are actually speech recognition data sets, where we

have short utterances of words, like hello,

Marvin, 1, 2, 3, and so on. And Wakeword-2 is

a data set which is actually a

Microsoft proprietary Hey, Cortana data set. So we wanted to do Hey,

Cortana detection on device because of privacy reasons. I think just after we released

the paper, Apple [INAUDIBLE] on-device. So before that,

nothing was actually happening on-device. Everything was being

streamed to your cloud. So every single word you

utter, even before Hey, Siri, is actually in the cloud. So we don’t want that to happen,

and you can use this to prevent that. We generalized it to NLP tasks

like sentiment analysis, language modeling. And then we also ran experiments

on activity recognition essentially. So the left three

bars are [INAUDIBLE]. So we are really good,

otherwise the paper wouldn’t have been accepted. But the model set

is something which I’m very happy about,

because this is log scale, and they’re super smart. This is a comparison

with unitary RNNs. Against unitary RNNs, we even

get to about 200 times better model sizes while reading

it [INAUDIBLE] by up to 2%. This is the crown jewel

of the entire paper. We actually did Hey, Cortana

detection on 1 kilobyte of RAM, and we deployed it. So the only caveat

here is we assume that we get the featurized

version of your [INAUDIBLE] in, because we can’t afford

to do an FFT on the device. So if you get the

feature as Hey, Cortana, we can always make the decisions

using our 1 kilobyte model. We ran a bunch of experiments

on real-world IoT devices, and we see it’s up to

two orders of magnitude faster than most of the

other algorithms like LSTMs and unitary RNNs. If you want to deploy an

RNN-based model for IoT,

RNN is the way to go. It’s just a new upgrade. So we have about 700 stars. So please, if I get 60

more stars, it improves my GitHub profile. So we don’t support TensorFlow

2.0, but we support Python, C++, and TensorFlow. We even have code optimized

RNN [INAUDIBLE] for Python. So give it a shot. Thank you. [APPLAUSE] ZACH: Do you have any questions? AUDIENCE: So that accuracy– the 1% accuracy–

right now, I understand that’s sort of just a

general measurement. But are there– is there future

work planned to actually peg down why that happens? How far it generalized? Is that something

you’re planning on doing in the future? ADITYA: Not exactly. But we tried to deploy this

in huge products in Microsoft, and this 1% is something

which I observed there. In the IoT regime, I still

match the performance of LSTM and GRU, but it has

some [INAUDIBLE] issues when it goes to large-scale data sets. AUDIENCE: OK. So what you’re saying is there’s not

sort of a general equivalence between these two things. ADITYA: Yeah. Because we– AUDIENCE: You don’t really

have a consensus of when the [INAUDIBLE] ADITYA: Yeah. It’s just experimental. So as long as LSTM works, I can

guarantee you up to 1% error. But yeah, that’s that. I can’t always guarantee

matching LSTM performance. AUDIENCE: OK. Fair enough. It was cool. The experiment was [INAUDIBLE]

with very good results. ADITYA: Sure. Thanks. AUDIENCE: What size

[INAUDIBLE] was the library? Because you said you did Hey,

Cortana in 1k [INAUDIBLE] you did it– like, the general library

size, how big is it? ADITYA: So the code

is pretty small. The inference codes are

written in native [INAUDIBLE]. So inference codes are just a

couple, 20 to 30 lines of code. And once you write the

serialization you can do that. And we also have

a compiler which was published in PLDI which

can take in all of our models and do the inference for you. Yeah. AUDIENCE: You said

that the unitary RNN is inherently stable, but

then it’s really bulky. ADITYA: Yeah. AUDIENCE: So what’s the

trade-off for FastRNN? Is there a similar [INAUDIBLE]? ADITYA: No. So FastRNN is only limited

by its expressivity. Because even though the universal

approximation theorem says all the RNNs are all-powerful,

you are using a convex optimization routine

to solve a non-convex problem. So we don’t know when we

hit the sweet spot. But empirically, when we

actually do the experiments, like unitary RNNs

are [INAUDIBLE] 5% lower than FastRNN. But FastRNN is still [INAUDIBLE]

1% to 2% less than [INAUDIBLE]. So this is all empirical. But theoretically, FastRNN’s

training is always stable. So you will never have exploding

and vanishing gradient problem. ZACH: These are great questions. I think we’re going to thank

Aditya again and have Robbie– [APPLAUSE] And remember, these

are your colleagues. So if you have more

questions, find them in the research [INAUDIBLE]. AUDIENCE: Could I have a

quick note, just for you if you’re using a

mic, make sure you’re holding it close

enough to your mouth that it actually picks up– ZACH: Everybody hear that? (SHOUTING) If you

want to be heard– Just a second, guys. ROBBIE: All right. So this is joint work that

I did with three undergrads here at UW– Nick, [INAUDIBLE] and Sierra. The title for the draft that

we have of this paper right now is A Strategy-Proof Alternative

to Round-Robin Subtournaments. But for this talk,

I’ve added a subtitle, which is, Are People

Good at Their Jobs? And that’s actually

the question I’m going to be asking you multiple

times throughout this talk. So I’m going to

start by asking you about some Olympic

athletes, and asking you, are they good at their jobs? So I’m going to

bring up a video. This is from the 2012

Olympic Badminton tournament. And you can tell it’s 2012,

because this video is in 360p. So I will be giving

play-by-play, because you can’t actually

entirely see what’s going on. So we’re going to

watch the team in gold. And specifically, we’re going

to watch this person in the back here. She’s going to take a big swing

and send the birdie directly into the net. Sending things

directly into the net is going to be a bit of a theme. Here’s a serve. It’s going to go

directly into the net. Another serve

directly into the net– and a little later, a third

one directly into the net. At this point, this gentleman

is going to come here. He’s one of the organizers

of the tournament. He’s going to give the teams

a little bit of a pep talk and suggest maybe

they try hitting the birdie into the net

a little bit less often. The gold team really

takes this to heart. And instead of hitting

the birdie into the net, they just let it

drop onto the ground. [LAUGHTER] And lest you think this is

just the yellow team involved, a little bit later, the

purple team is going to serve, and they are going to

miss by about that much. [LAUGHTER] So these are Olympic athletes

playing their Olympic sport. And I’m going to ask

you the question, are these Olympians

good at their jobs? You do have to vote. Who votes yes, these Olympians

are good at their jobs? Who votes no? It’s about 50-50. So watching just

that 20-second video, you would probably say that

they’re not playing very well. In fact, you might

even say it looks like they’re trying to lose. And it doesn’t just look

like they’re trying to lose, they are trying to lose. And in fact, that’s

why I’m going to argue that they’re

good at their jobs. So what is the job

of an Olympian? Well, it’s to win

the gold medal. And the way this

tournament was set up, those teams actually

improve their chances of winning the gold medal

by losing the match. So how could that happen? I’m going to break

down very briefly how this tournament works. So there are 16 teams. They’re going to be divided

into groups of four, and you will play each of the

other teams in your group. So that’s three matches. After those three matches, the

top two teams in every group are going to move on, the bottom

two are going to be eliminated. So the second phase of the

tournament, for those top two teams, depending on

which group you’re in and whether you’ve

finished first or second, you’re going to be put into

this bracket over here. And in this bracket– this

is just a single elimination tournament– these top two

teams play, the next two play, the next two play,

and the next two play. Losers are eliminated,

winners move on. These two winners

play each other, those two play each other,

and then the last game, the two remaining teams

play, and the winner gets the gold medal. So the two teams we are watching
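As a quick sanity check on the format just described, the group-stage arithmetic works out like this (illustrative Python, not anything from the talk):

```python
from itertools import combinations

# 16 teams split into 4 groups of 4; within a group, everyone
# plays everyone else once (round robin).
teams = [f"T{i}" for i in range(16)]
groups = [teams[i:i + 4] for i in range(0, 16, 4)]

for group in groups:
    matches = list(combinations(group, 2))
    assert len(matches) == 6          # 4 choose 2 = 6 matches per group
    for t in group:
        assert sum(t in m for m in matches) == 3  # each team plays 3

# Top two per group advance: an 8-team single-elimination bracket,
# which needs 8 - 1 = 7 matches to crown a gold medalist.
print(len(groups) * 6, "group matches;", 8 - 1, "knockout matches")
```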

are teams W and X in group A. They know, whether

they win or lose, they’re going to move

on to the next round. It’s just a question of are

they going to be A1 or A2. And because of some other

results in the tournament, it turned out that D2, the

second place team in group D, was one of the best

teams– the team that they were

most worried about. And so W and X looked

at this bracket, they looked at where D2 is

going to end up, and they said, well, I don’t want to play

D2 for as long as possible. And given a choice of

A2, way down here, or A1, a little bit closer to D2,

I want to be further away. And so they said, I

would rather be A2. And to do that, I want

to lose the match. So losing their

match, not winning it, gave them the best chance

of winning the gold medal. Or at least, it was supposed to. Remember that guy

giving them a pep talk? That was a little bit

less of a pep talk and more of a threat talk. They were disqualified

immediately after the match for not using their best efforts

and for conducting themselves in a manner which was clearly

abusive and detrimental to the sport. So you could tell from

the quality of that video that it was from

quite a while ago. It was from 2012. And the Olympics happen

every four years, right? So there was another

Olympics in 2016. And after the international

news coverage of these badminton players looking silly

trying to lose a match, the badminton organizers

had to do something. So they had to do something

to change the tournament to make this not happen again. So what did they do? They kept the same

basic structure, but they sprinkled in a little

randomness between the two phases. So instead of knowing

exactly where you’re going to end up in

the bracket, they changed it a little

bit so there’s a little bit of randomness

in where you end up. It’s not uniformly

random where you end up, but it’s not completely

determined anymore. And what they said

when they announced this change is that they

had eliminated any players’ thoughts about actively trying

to lose a match or matches, irrespective of

other match results, and that they had ensured

such a regrettable spectacle would never be witnessed

in badminton again. [LAUGHTER] And the 2016

tournament happened. There were no incidents. So did the Badminton

World Federation, when they redesigned this

tournament, do a good job? Who thinks yes,

they did a good job? Who thinks no? Again, split. My vote is going to be no. It’s not too hard to

construct another example where your teams are still

going to want to lose. So it’s not too hard. Here’s what you do– you start with the scenario

that happened in 2012, where the teams wanted to lose, and

then you’re actually done. Without changing anything

about the scenario, the teams are still

incentivized to lose. Now, you do have to

think about it a little bit, and you do have to do a

probability calculation, but they’re still

incentivized to lose. So some of you are saying,

wait a minute, Robbie, we’re seven minutes

into your talk, and all you’ve talked

about is sports. Is this just a

talk about sports? And no, it’s not just

a talk about sports, there is some

computer science here. Let me get to it. Specifically, this is

going to be a problem in algorithmic game theory. So what is algorithmic

game theory, or what’s the problem here

in algorithmic game theory? The tournament organizers

did not properly align their incentives. The designers of the

tournament wanted a tournament that was exciting for the fans,

where every team was trying their best to win every

match, and every match was very exciting. The players wanted to

win the gold medal. So what we have here is a

misalignment of incentives. So mechanism design

is a subfield of algorithmic game theory,

where you design algorithms assuming that the inputs

are coming from people who have incentives. And what we want to do is

design a tournament here– an algorithm– where the incentives

of the competitors and the incentives of the

designers are aligned. Essentially, where wanting to

win the gold medal inherently makes you, from the

tournament design, want to win all of your matches. So that’s what

we’re going to try to do in the last couple

of minutes of this talk. So we want a tournament

that aligns incentives. So it’s not too hard to come

up with such a tournament. What you do is you just have a

single elimination tournament for the whole thing. So take that second

phase that they had and just make that phase

the whole tournament. And you are never

going to want to lose a match in this tournament,

because if you lose, you have no chance of

winning the gold medal, because you’ve been eliminated. That does work, but it’s

not a very exciting answer, and it’s not a

very useful answer. In fact, this was the way the

badminton tournament worked until 2008. And in 2012, what the

badminton organizers decided is that they didn’t

want that anymore. And why didn’t they want that? What they want, I believe,

is to ensure every team plays more than one match. This is the Olympics. They have teams flying in

from all over the world. These teams are

going to come, and if they play in a single elimination

tournament, half of them will play one match,

they will lose, and then they will

have to go home. And that’s very disappointing

for all the teams that flew across

the entire world to play in this tournament. It’s disappointing for

the fans who get to watch them once, be

disappointed that they lose, and then have to go home. So these are the

two goals we want– to align incentives

and ensure every team is going to play

more than one game. So your first thought

might be, well, let’s just keep the group

play and maybe add a little bit more randomness

to that second phase. Unfortunately,

that doesn’t work. This is a result

by Pauly from 2014: under some pretty

reasonable assumptions, you can’t have this group

play as the first phase and do anything in the second

phase that will properly align the incentives. So if we’re going to

design this tournament, we can’t be messing

with the second phase. We have to actually

change that first phase, that group play phase. So this is what

we’re going to do. So my conjecture when we

started this project about nine months ago is that what

you should do is you should replace the group play

with a double elimination tournament. So what’s a double

elimination tournament? It starts out a lot like a

single elimination tournament. On the top half, it looks

almost exactly the same. But when you lose, you’re

not eliminated right away. When you lose, you go

down here to what’s called the losers

bracket, and you’d stay in the losers bracket

as long as you keep winning. So when you get your second

loss, you are eliminated. Until you have two losses,

you continue playing. And you just continue like that

until there is exactly one team remaining with less

than two losses. And then they can

advance out of this group play, out of this first phase. And my conjecture

nine months ago is that this works– that this

properly aligns incentives. Now, one more time,

I’m going to ask you, is someone good at their job? And of course, the job of a

theoretical computer scientist is to make conjectures and then

prove those conjectures to be correct. So do we think I’m

good at my job? Who says yes? Who says no? You are far too optimistic. [LAUGHTER] That conjecture is false. Luckily, my co-authors are

very good at their jobs. And so, together, we

were able to come up with a different conjecture

that is actually true. So you can’t use a double

elimination tournament, but you can make just

the tiniest modification, and then it will properly

align incentives. So what we do is as soon as we

have two teams left, instead of those two teams playing

until someone has two losses, they play just a single game. And whoever wins that

game is who moves on. So I’m going to
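The modified four-team double elimination just described can be sketched as a tiny simulation (my own illustrative code, with a deterministic stand-in for match outcomes, not the paper’s):

```python
def beats(x, y):
    # Deterministic stand-in for a real match: the alphabetically
    # earlier team always wins. Swap in any decision rule you like.
    return min(x, y)

def modified_double_elim(teams):
    a, b, c, d = teams
    w1, w2 = beats(a, b), beats(c, d)  # winners-bracket round 1
    l1 = b if w1 == a else a           # first-round losers drop down
    l2 = d if w2 == c else c
    wl = beats(l1, l2)                 # losers bracket: loser has 2 losses, out
    wf = beats(w1, w2)                 # winners-bracket final
    lf = w2 if wf == w1 else w1        # its loser drops to the losers bracket
    lb = beats(wl, lf)                 # losers-bracket final
    # The modification: the last two teams play one single game,
    # rather than playing until someone reaches two losses.
    return beats(wf, lb)

print(modified_double_elim(["A", "B", "C", "D"]))  # -> A
```

Note that this is four rounds of play in total, and every team gets at least two matches before it can be eliminated, which matches the two design goals.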

actually prove to you right now that this

works– that this properly aligns the incentives. So in any game

where you’re going to be eliminated

if you lose, you are not going to want to lose. So all of these

matches in green here, we’ve handled just

with that statement. Then we just have to worry

about the winners bracket. In this semifinal game, if you

lose, what you’ll have to do is beat the other two teams who

are remaining in the tournament one right after the other. On the other hand,

if you win, you just have to beat one

of those two teams. So if you lose, you

will have to play a superset of the opponents that

you would play if you had won. So you’re definitely

going to choose to play fewer opponents to need

to beat fewer teams to win. And in the first round,

we have, essentially, the same statement. If you lose, you’re going

to have to beat all three of the other teams

in the tournament one right after the other

in elimination games. If you win, you

only have to beat two of the three teams

remaining in the tournament in elimination games in

order to win the tournament. And that is the entire

proof of the theorem. This theorem, one of the reasons

that I’m so happy with it, is it’s simple enough to

really fit onto a slide. And actually, if you take

maybe a little bit longer than I took here, I think

you could explain it to Olympic athletes, even

if they don’t have much math background. [LAUGHTER] And really do convince them not

to lose anymore in tournaments. So that is our theorem. So what are the

takeaways from my talk? The first takeaway is, watch the

badminton tournament in 2020. It could get really interesting. Secondly, tell the

Badminton World Federation about the modified double

elimination tournament instead of group play and help

us ensure that such a regrettable spectacle is never

witnessed in badminton again. Thank you. [APPLAUSE] Dan? AUDIENCE: So you didn’t

give us any intuition on why the traditional double

elimination doesn’t work. ROBBIE: Yeah so what does

traditional double elimination not work? So the problem is, essentially,

it’s only in that first round. If you lose in the

first– if there’s a team that’s better than you,

that you don’t have a very good chance against,

you might be able to intentionally

lose in the first round, face them in the

loser’s bracket, only need to get an upset once. As opposed to, if you

win, you might play them in the final round, where

you have to beat them twice. And even though it’s

two out of three, you might prefer, if

you’re worse than them, to just have to beat them once

down here, rather than two times out of three naturally. Other questions? [INAUDIBLE] AUDIENCE: Actually,

no– never mind. ROBBIE: OK. Yeah? AUDIENCE: If you’re telling

us to watch badminton in 2020, wouldn’t this be evidence

that the committee has done a great job

designing the tournament? [LAUGHTER] ROBBIE: I suppose,

under the “there’s no such thing as

bad press” standard, then yeah, this is

quite possibly what they should keep around. Yeah. AUDIENCE: It seems like,

from a meta perspective, the committee has aligned

their incentives just fine by the method of, if you

try to pull this stunt, we will just kick you out. ROBBIE: Yeah. This is a good point. So there’s this

external statement that they can make of,

if you try to lose, we’re going to kick you out. And therefore, you

can’t try to lose. This was a particularly

bad example of teams trying to lose,

where it was very obvious. You do have to

be worried, though, about teams trying to lose and

being subtle enough about it that you don’t catch them. Paul. AUDIENCE: So there are other

types of sports– curling and softball– College World Series,

and things like that, where, in

College World Series, you have multiple games

to decide whether you want to win or lose. So a single pairing isn’t

enough for things like that. Are there modifications

that would allow you to generalize in

some of these other scenarios? ROBBIE: Yeah. So are there things we

could do to generalize? So this is actually one

of the strange things about the statement, it

holds for four teams. If you go up to eight

teams, we actually don’t know how to make it so

that teams don’t want to lose. So this double

elimination, the proof is actually very fragile to

having only four teams around. So as long as you keep

everything in groups of four, you might be able to modify it. I actually don’t know

ways to get bigger than 4, except by saying, handle

the size 4 and recurse. Brett. AUDIENCE: One quick

one, one moderate one. The quick one is

just to clarify, you’re proposing the

modified double elimination to replace the group play and

then you proceed from there to the standard knockout. ROBBIE: Yes. Yeah. This replaces the group play,

then you can do either one. You can either go to

the standard knockout or you could do another version

of this– whichever works. AUDIENCE: And the

longer question is, is there a 30-second or

so summary of the Pauly result about why group play inherently

must [INAUDIBLE] aligned incentives. ROBBIE: Yeah, is there

a 30-second version of the Pauly result? Unfortunately, no. His proof is a proof by

exhaustion that was– [LAUGHTER] –had to be checked

via computer, because he literally just said,

here are about 5,000 cases. I wrote up in C++ a program that

checks them all, and trust me, none of them work. [LAUGHTER] Getting a short

version of that proof is actually an interesting

open question. Lauren. AUDIENCE: So the one

benefit to group play is you only have to

play three games. Which, I’m thinking,

in terms of– I’ve been to so many

ultimate Frisbee tournaments, that time-wise you have

a weekend to do this. This actually, the

double elimination, winds up adding time. ROBBIE: Yeah. So it looks like it adds time. It actually doesn’t. So we’re only having one

team advance out of the four, instead of two. So we’ve actually gotten

rid of one of the rounds in the knockout round. And there are actually only

four rounds necessary here. Every team is going to

play, at most, four matches. AUDIENCE: True. But you have to play

A, B, C, D, and then L1 and L2 actually happens after

A, B, C, D happened, right? So doesn’t it– it’s

not really balanced. ROBBIE: So A, B, C,

D play all at once. The losers and the winners

play– so 3 and 4 happen. 5 happens by itself,

and then 6 happens. And then, remember,

there is no 7, because we modified

it to get rid of 7. So it’s actually

only four rounds, and it’s taking the

place of four rounds in the original tournament. AUDIENCE: Oh, four rounds. ROBBIE: Yeah. Should we– ZACH: One more. One more? OK, Nathan. AUDIENCE: So what if you

know that you can’t get gold, does this still work? If you’re trying

to get [INAUDIBLE]. ROBBIE: Oh. That’s very interesting. I don’t know. But we have not put the

paper on arXiv yet, so we can figure it out

and type it up quickly. You can get an acknowledgment

for asking that question. ZACH: Sorry, [INAUDIBLE]. We’ll take more

questions offline. So thanks, everybody,

for showing up. Please, make sure to

come back October 24, for the second installment of

2019’s Allen School Colloquium. And until then, I hope everybody

has a wonderful beginning of the quarter. Let’s thank the

speakers one last time. [APPLAUSE]
