Chandra Khatri, Lead Research Scientist at Uber AI, discusses the future of conversational AI and multimodal research, as well as how companies can utilize these technologies to accelerate their digital evolution.

Avenue Code: Tell us about your personal career path. When did you first know you wanted to pursue a career in AI, and how did you get to where you are today? 

Chandra Khatri: I was introduced to Machine Learning during my undergraduate research work. I observed the potential impact ML can have in making the world a better place, and since then I have never looked back. While pursuing research in AI during my graduate studies at Georgia Tech, I observed that Conversational AI would shape the future of AI for the next 5-10 years. Unlike other emerging technologies at that time, such as self-driving cars and AGI, which were advancing but had not yet become marketable products, conversational AI saw significant improvements in a short period of time and made an immediate impact. So, I chose to pursue this path, and I am grateful for the opportunities I’ve had to work on projects like the Alexa Prize, a competition led by Amazon to build an open-domain conversational AI system, one of the hardest challenges in AI.

I am now transitioning to multimodal AI, since most advancements so far have been centered around individual domains such as Computer Vision, Natural Language Processing, and Speech Recognition. The next wave of research and usable product advancements will happen across domains, i.e. building technologies that can address any kind of input, be it voice, text, or vision, which is closer to how humans perceive the environment.

AC: How is Uber utilizing conversational AI and multimodal research to enhance its offerings?

CK: While other large companies have built their own conversational AI systems, Uber has been leveraging some of the existing state-of-the-art tools while also building on and advancing research for the Uber ecosystem. For example, one of our products allows drivers to accept or reject ride requests and respond to riders’ questions through voice, without touching the phone. This addresses both the safety concerns and the legal constraints around smartphone use while driving.

AC: What are the biggest challenges and opportunities for conversational AI at Uber?

CK: One of our biggest challenges has been that we used an external speech recognition system. Because this system is not designed specifically for Uber, it is not tuned to our domain and context, and the error rate can be high. This is exacerbated by background noise from traffic, passengers, etc., which makes speech recognition more challenging. To address this, we built a model on top of the external solution; the model uses context to improve performance and significantly reduce errors. We published this work at ICASSP 2020.
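The general idea of layering context on top of an external recognizer can be sketched in a few lines. This is only an illustrative toy (not Uber's published model): it re-ranks the n-best transcripts an external ASR system might return, boosting hypotheses that contain in-domain vocabulary. The scores, transcripts, and `domain_terms` below are all made up for the example.

```python
def rescore(nbest, domain_terms, boost=2.0):
    """Re-rank ASR hypotheses: add a bonus for each in-domain word.

    nbest: list of (transcript, score) pairs from an external recognizer.
    domain_terms: words likely in this context (e.g., ride-related vocabulary).
    """
    def contextual_score(item):
        text, score = item
        # Bonus for every word that matches the domain vocabulary.
        bonus = sum(boost for word in text.lower().split() if word in domain_terms)
        return score + bonus

    return max(nbest, key=contextual_score)[0]

# The external recognizer slightly prefers the wrong transcript,
# but domain context corrects the ranking.
nbest = [("except the right or", 0.48), ("accept the rider", 0.45)]
domain_terms = {"accept", "rider", "pickup", "dropoff"}
print(rescore(nbest, domain_terms))  # accept the rider
```

A production system would use a learned language model rather than a flat word bonus, but the principle is the same: domain context shifts probability mass toward transcripts that make sense for the application.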

AC: Tell us about multimodal research and its capabilities within the wider marketplace.

CK: Humans communicate and react to the environment through multiple modalities (vision, touch, hearing, etc.). Our neurons are simple logical units, yet they quickly adapt how they react and communicate based on sensory input. The idea behind multimodal research is to build AI systems that can consume sensory input in multiple ways, mimicking human capabilities.

One of the biggest challenges for AI is building in common-sense understanding. As a part of Uber AI, we proposed an AI system that learns common sense on its own, over time, utilizing multimodal capabilities to create rules for the basic assumptions humans make to navigate their environment. From a product point of view, we have some problems at Uber that can be solved through multimodal research.

AC: Is it still difficult to collect enough data to create such systems?

CK: There’s a branch of ML that focuses on active learning, which helps us minimize and optimize data collection so that each sample contributes unique information. This is a better long-term solution than collecting massive datasets. Recently proposed self-supervised learning and generative techniques also seem promising for creating training samples and thereby minimizing data annotation.
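A common form of the active learning idea mentioned above is pool-based uncertainty sampling: instead of labeling everything, you repeatedly label only the examples the current model is least sure about. The sketch below is a generic illustration of that strategy (the probabilities are mock values, not any real system's output):

```python
def least_confident(probabilities, k):
    """Pick indices of the k unlabeled examples whose top predicted
    class probability is lowest, i.e. where the model is most uncertain."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: max(probabilities[i]))
    return ranked[:k]

# Mock model predictions (class probabilities) over an unlabeled pool.
pool_probs = [
    [0.95, 0.05],  # confident: not worth labeling
    [0.51, 0.49],  # uncertain: label this one
    [0.80, 0.20],
    [0.55, 0.45],  # uncertain: label this one
]
to_label = least_confident(pool_probs, k=2)
print(sorted(to_label))  # [1, 3]
```

Labeling only these high-uncertainty examples, retraining, and repeating tends to reach a given accuracy with far fewer annotations than labeling the pool at random.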

AC: Can AI help companies adapt and thrive in a post-COVID economy?

CK: COVID will accelerate the adoption of conversational AI and robotics because it has created an urgent need for digitization. COVID hit everyone, and even enterprise companies with large customer service departments are finding that they don’t have enough workforce to address requests, especially on medical lines, so they’re turning to conversational AI to automate systems and expedite services. Most users are already accustomed to using conversational AI; now it’s up to industries to adopt it. Similar trends can be observed in robotics applications.

AC: Can small companies and startups compete with big players in pushing conversational AI?

CK: Uber, Amazon, Google, and Facebook are building tools with unique infrastructures that support their own services. They are advancing the technology, but generally speaking, most of their solutions are centered around their own problems. Because of this, about 80-90% of the market is still untouched and open to other players like startups, which are beginning to use conversational AI to support healthcare, e-commerce, etc.

AC: What are you personally most passionate about exploring as a Lead Research Scientist? 

CK: In the long-term, I want to impact the lives of people in a way that is not limited to only one domain. Companies that generate a lot of revenue have an opportunity to help others by building intelligence that has a positive impact on humans, flora and fauna, and the planet as a whole. For example, AI can help us solve a multitude of problems, from predicting and controlling forest fires to predicting and controlling pandemics. I want to be part of this.

AC: What are the biggest ethical considerations and concerns for developing conversational AI and multimodal research?

CK: In terms of ethical use of data, I believe we’ll see a change for the better within the next few years. What happened is that we built the technology very quickly without considering the ethical ramifications of its applications, similar to what happened during the Industrial Revolution. Now that the technology is mature enough, we’re addressing these ethical issues. I have personally been working on this within Uber.

AC: What do you do to stay abreast of innovations in tools and technologies?

CK: I have less time to read now, but I still follow prominent researchers like Yoshua Bengio, as well as professors at universities like Berkeley, Stanford, Oxford, and Georgia Tech who know where AI is headed. I review papers and help organize top conferences, which keeps me updated on the advancements happening in the field. I also read whitepapers and abstracts, communicate with and hire top researchers, and check YouTube, Reddit, Medium, and other channels that distill information quickly.

AC: Thank you for your time and insights, Chandra. We look forward to watching your work as you continue to drive multimodal research.



Anna Vander Wall

Anna Vander Wall is a freelance senior editor and writer in the tech industry and beyond. She particularly enjoys collaborating with Avenue Code’s talented Snippets contributors and whitepaper authors.
