Deep Conversations: Lisha Li, Principal at Amplify Partners
Mathematician Lisha Li expounds on how she thrives as a Venture Capitalist at Amplify Partners, identifying, investing in, and nurturing the right startups in Machine Learning and Distributed Systems.
JM: Embodied Intelligence is one of your latest investments. I have been following Pieter Abbeel’s work on one-shot imitation learning. Considering your expertise and involvement, why not your own startup? Why become a VC?
LL: It’s not out of the question for me. I certainly see a wide range of ideas and opportunities, and learn the dos and don’ts of raising money. Regarding my choice to go into venture capital, not all VC roles are created equal. The role at Amplify aligned very well with my interests and gives me a lot of agency. The team trusts my judgment to look for companies and entrepreneurs. They also allowed me to jump in at the mid-level. They are also very serious about mentoring me to be a full stack investor: learning how to be a good board member, how to be helpful to entrepreneurs, and how to help build a company.
That said, if there’s an idea I feel so passionate about that I believe I should found a company around, I will consider it. So far, I’m having a good time doing the investing side of things.
JM: As an investor, a lot of time goes into the due diligence to decide. How much do the quotidian tasks interest you compared with the research side? Do you find your day-to-day tasks as an investor boring, or are you excited by them?
LL: Fair question. I think they are complementary. To satisfy the research curiosity, I get to chat with researchers who are definitely not going to start a company. Those conversations give you a good sense of interesting work that’s going on, and who’s doing what. This is the most fun part of the job: getting to be intellectually curious about many things!
From the investing sense, not everyone starting a company has a cutting-edge idea to implement and that’s also fine. What matters is:
have they found a compelling problem and do they have a viable solution?
I like that aspect of investing because to me, it’s like: Hey, that’s Ag-Tech! Or, Compute substrates! Or manufacturing! What are the problems that are not technical and what’s needed to build a complete solution? If I just wanted to do research, I might have stayed in academia.
JM: It’s said that in the VC community there’s a strong herd mentality. How do you balance applying your mathematical mind versus just following the groupthink or falling prey to group bias?
LL: I try to follow my own instincts on what is interesting to pursue; that’s what makes this job so much fun for me. Sometimes that will align with what’s popular, sometimes not. There’s a lot to learn, and venture capital gives me the feeling of being a kid in a candy shop. There are very few jobs where it is my job to learn things from very smart people and chat with them about the exciting things they are working on. That’s what gets me excited about getting up in the morning!
JM: In terms of how AI is shaping up: Do we have sufficient platforms that are mature enough and reliable enough to apply in the real world?
LL: There are plenty of opportunities. There is definitely room to build the infrastructure to introduce learning in a lot more areas which are not traditionally considered by AI.
Image processing and NLP are very classical AI problems. And we have certainly made astounding strides in the last few years in these areas with advancements in Deep Learning. Beyond that, the ability to introduce learning to more work processes that don’t have to fit the aforementioned narrower AI tasks is where I think the most exciting opportunities lie.
Some examples include being able to ingest data, in manufacturing for instance, where for most of the hardware stack there’s no easy way to process internal data. To introduce learning in those processes is an opportunity. At a systems level, the questions are what resources we should allocate on the computing side and how one extracts the data and learns from it, given that the entire stack is not mature yet with respect to applying Deep Learning.
JM: The rate of papers coming out of arXiv has been increasing. Are you happy with the quality of papers coming out? Is it just minor, incremental progress just to get a publication out? Is there a problem brewing?
LL: When I came to the field, I realized how much it differs from traditional publication in statistics and math, where you would put out a paper, wait months for reviews, and live with a slow publishing cycle. The fast turnaround time for conference cycles can be a good thing. It’s exciting to be in this field where a lot of people are working and there is a relatively low barrier to entry. I don’t want to make any sweeping kind of statements about the quality of work. However, there are people analyzing whether there is a reproducibility issue. For instance, Joelle Pineau spoke last year at WiML about how 30% of RL experiments failed reproducibility depending on random seeds.
What is nice about the academic community is that there is an awareness about this and people want to fix it. It’s not perfect; it’s a new field after all, but good folks are working on it.
JM: It almost feels like top names like Google (Industry), Investors and Academia should create an open source framework, an open source test platform if you will, for reproducibility of results, algorithms, models from the papers published.
LL: Yes, I agree it could be something.
JM: Differentiable Programming – what’s your own summary of it? By the way, thank you for putting out a recording and the slides of your talk at the Age of AI Conference in San Francisco earlier this year.
LL: My pleasure! I think there are formal definitions of differentiable programming based on functional programming. What I wanted to highlight in my talk by using that term (and it was timely, given the Facebook post from Yann LeCun) is that this is a better phrase to capture what we value about Deep Learning, since the term Deep Learning itself is becoming meaningless. There’s the aspect of modularity, the architectures you develop, and the abstraction of where the learning is happening via back-propagation, which goes beyond just focusing on “Hey, I am going to iterate on our architecture design; now you get to learn this and go beyond.” Not sure if it’s easy to summarize in just one sentence but … having modular components to introduce learning in more generalizable situations.
JM: It seems an opportunity to apply both discrete math and continuous math, using the chain rule to solve complex problems with vector calculus.
LL: Let me clarify, as I was struggling to decide if I should include evolutionary methods in addition to gradient-based methods there, because they don’t technically take a gradient … and sometimes they are too brute force. They have definitely appeared in the whole Deep Learning canon, so to what extent should one generalize the definition to include them in a sensible way? But if you are only using backpropagation as the means to do optimization, then the differentiable aspect (of differentiable programming) is supposed to capture that.
JM: That could lead to a limitation too, couldn’t it? What if I’m using some other method?
LL: Like if you are using some evolutionary method? Yes, that’s what I meant.
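The distinction drawn in this exchange, between optimization that follows a gradient and evolutionary methods that never take one, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than something described in the interview: the toy objective `loss`, the helper names, and the step sizes are all assumptions, and the evolutionary routine is a very simple mutate-and-keep-if-better search, not any particular published algorithm.

```python
import random


def loss(x):
    # Toy objective (an assumption for illustration): minimum at x = 3.
    return (x - 3.0) ** 2


def loss_grad(x):
    # Derivative of the toy objective, worked out by hand with the chain rule.
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=200):
    # Gradient-based optimization: each update follows the negative gradient.
    x = x0
    for _ in range(steps):
        x -= lr * loss_grad(x)
    return x


def evolutionary_search(x0, sigma=0.5, steps=2000, seed=0):
    # Gradient-free optimization: propose a random mutation and keep it
    # only if it lowers the loss. No derivative is ever computed.
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, sigma)
        if loss(candidate) < loss(x):
            x = candidate
    return x


x_gd = gradient_descent(0.0)
x_es = evolutionary_search(0.0)
print("gradient descent:", x_gd)
print("evolutionary search:", x_es)
```

Both routines move toward the same minimum, but only the first relies on the objective being differentiable, which is the property the word "differentiable" in differentiable programming is meant to capture.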
JM: Adversarial attacks – it’s so easy to throw a model off-track. Are you worried?
LL: As these things get applied to more high stakes areas, then you expose yourself to greater risks without understanding those vulnerabilities. Right now, applications may be more innocuous but once we introduce this to Defense or Drones, then it will clearly be a bigger problem.
What is scarier is that adversarial examples are so generalizable across different models without real access to the architecture or, even more so, to the problem you are training on. Given these vulnerabilities, I would like to see more research on how to guard against these in a more robust way.
JM: Is there a mathematical way of detecting the perturbations or, if one can model the amount of perturbation, when and where can they cause an attack? Here’s the landscape of possibilities, and this is the math that could predict potential vulnerability spots with a score for each.
LL: The landscape from my understanding is that the vulnerabilities are pervasive. The subset of the landscape for which the model works is a much smaller subset in volume than the total landscape. This is fairly intrinsic to the model. I have not spent a ton of time digging into the research behind proving this, but my understanding is that it is extremely generalizable.
JM: Hinton has said that Deep Learning is dead. There’s a buzz around Capsule Networks. Anything you want to share about this?
LL: My views on Capsule Networks are not so mature yet. The part I sympathize with is that he’s striving for fundamental progress rather than incremental progress. When you describe a tiny improvement in accuracy or some such as if you are revamping the architecture in the papers, it sounds super sexy, but there’s not a lot of foundational thinking going on. That’s a good call to arms. Michael Jordan also recently said in a keynote something along the lines of – we, in the research community, have a ton to do, we cannot even call it AI, we have systems that take in data and spew out data and merely mimic some ‘intelligence’ – so, intelligence is merely augmented. So, how do we provide a more sustainable foundation?
JM: Do you think there is any aspect or branch of Math that has not been discovered? Or, if it could be discovered by Machine Learning? For example, in AlphaGo and AlphaGo Zero, the algorithm was able to come up with new strategies far beyond human capability.
LL: Oh yeah! This touches upon what I’m excited about – using Differentiable Programming, including Deep Learning and Reinforcement Learning, as a collaborative tool in the creative domains, which include Math. I think Christian Szegedy is doing some interesting work on this: not just theorem proving with Deep Learning, but formalizing the entire mathematical intuition around problem-solving. I would love to see how it progresses. The common thread here is that you have this method, differentiable programming, that is very comprehensive in exploring the optimization space, much more comprehensive than a human can be.
JM: Anyone else who is doing similar work?
LL: There’s Dawn Song at UC Berkeley who is working on Program Synthesis. And, Christian has collaborated with a bunch of people at Google Brain.
JM: Data-driven investing? Using Machine Learning for your day to day investing decisions? Any chance of this happening?
LL: It sounds like a cool story to tell but at the seed stage, the signal is so sparse and noisy that it is hard to train any model on it. Metrics are hard to come by – it just boils down to how good is the team, since they don’t have a product or a market share. It is not out of the question [to train a model], but I would question the returns. It is definitely useful for growth stage startups – many more quantitative signals are there to use. I think my value as a seed investor is that I’m more of a matchmaker to make effective teams happen – how do you introduce these talented people who have built the models (which is 90 percent of the work) to problems, investments and the environment for their ideas.
JM: What are your thoughts about Algorithm bias?
LL: I’m glad people are working on it and there’s media attention. There’s an asymmetry in the understanding of how pervasive machine learning algorithms are, and we need to address this now: increase the understanding of how it affects people and increase the awareness around this.
JM: What are your thoughts on Quantum Computing?
LL: I am interested. Quantum speedups in simulating chemistry may be a very promising application if the hardware ecosystem matures. I also want to understand the engineering problems around this – quantifying the risk landscape and what are the ways we could run Machine Learning algorithms on them eventually.
JM: Why is Graphon one of your favorite mathematical objects? Any parallels to that in Machine Learning?
LL: It was hard to pick. Well, it’s a non-parametric model for certain things. What I like about the Graphon is that it emphasizes universality, which is a concept that I really enjoyed in Math. This object came about via very disparate definitions. And all these definitions seemed natural ways to define this object, whether through a combinatorial approach, some kind of sampling approach, or a model-theoretic, logic-based one. But in the end, you get the same object: irrespective of the path you choose to arrive at it, it remains the same stable object, an inescapable truth.
I am not a believer in truth in a platonic sense. Via my Math and Philosophy studies, I believe truth is defined in a more structural way.
JM: Any favorite book or the most exciting book you have read in the last year or so?
LL: My taste is eclectic. I’ve been re-reading some science fiction. I’m a huge fan of Greg Egan – hard sci-fi stuff. I love that he doesn’t shy away from using lots of computational models of the mind, brain uploading, quantum ontology and what the identity of self means – split senses of the self.
I also read Ted Chiang (whose story “Story of Your Life” is the basis for the movie ‘Arrival’). He’s super poetic (shades of Borges) but hard sci-fi as well.
I have not been reading a ton of fiction. I love poetry. I like books on Topology and Autobiographies. A great one is David Quammen’s Spillover, which has a fascinating chapter called ‘The Chimp and the River’ about the origin of AIDS.
JM: In the last year, is there any one scientific paper you loved?
LL: I feel I would not do justice to the fundamentals that people are building. AlphaGo Zero – I was not expecting anything to happen so fast. And some examples from my talk – using Machine Learning in Music or Design. It is the exciting beginning of work in those areas.
JM: Any startup that you want to talk about?
LL: Tons. I can’t really talk about specific ones as we are evaluating or investing in them. I’m looking for applications in non-traditional industries, but also in healthcare or diagnostics, where the question is not just “is it appropriate to use Machine Learning here” but what is good science here – how do you measure the value of the accretion of a company and the value it generates. If you are creating a diagnostics company, what are the risks associated with stuff that has nothing to do with Machine Learning? I am very excited about such questions. I am also digging deep into the blockchain space. It’s rare for a mathematical proof to have economic value. Many projects there also lie at the intersection of infrastructure, security and Machine Learning, which are the focuses of the firm.
JM: Thank you for your time, Lisha! I found this discussion very interesting!
LL: Very interesting interview. Thank you!