Wang, Weiran (汪蔚然)
Assistant Professor
Computer Science Department, University of Iowa
Email: weiran-wang@uiowa.edu
201L MacLean Hall, 2 W Washington St, Iowa City, IA 52240
Office Phone: 319-467-1886
previous homepage
I am broadly interested in machine learning algorithms that are well motivated and truly work in practice. So far, my research topics include multi-modal/multi-view representation learning, speech and audio processing, optimization for machine learning, and applications.
Aug 2024--Current. Assistant Professor. University of Iowa.
Jan 2021--June 2024. Senior and Staff Research Scientist. Google.
Nov 2019--Dec 2020. Senior Research Scientist. Salesforce Research.
Oct 2017--Oct 2019. Senior Research Scientist. Amazon Alexa.
Jan 2014--Sep 2017. Postdoctoral Researcher. Toyota Technological Institute at Chicago. Advisors: Karen Livescu and Nathan Srebro.
Aug 2008--Dec 2013. PhD, EECS Department at UC Merced. Advisor: Miguel A. Carreira-Perpinan.
Sep 2005--June 2008. Master's in Computer Science. Chengdu Institute of Computer Applications, Chinese Academy of Sciences. Chengdu, China.
Sep 2001--June 2005. Bachelor's in Computer Science. Huazhong University of Science and Technology. Wuhan, China.
Fall 2024. CS4420: Artificial Intelligence.
Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar.
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm. Interspeech, 2024.
We propose a GPU/TPU implementation of search-based ASR biasing, which was traditionally done with weighted finite-state transducers (WFSTs), based on the equivalent Knuth-Morris-Pratt (KMP) string matching algorithm.
Our implementation reduces to a few carefully parallelized loops in deep learning frameworks, and the gain from search-based biasing is additive to that of model-based biasing.
[arXiv version]
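For background, here is a minimal Python sketch of the classic KMP failure function and matching loop that the search-based biasing above builds on; this is the standard textbook algorithm, not the paper's batched GPU/TPU implementation, and the phrase/text strings are hypothetical.

```python
def kmp_failure(pattern):
    """Failure (prefix) function: fail[i] is the length of the longest proper
    prefix of pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_match_progress(text, pattern):
    """Track how much of `pattern` is matched after consuming each token of
    `text`; a full match is where a biasing bonus would apply in beam search."""
    fail = kmp_failure(pattern)
    k, progress = 0, []
    for t in text:
        while k > 0 and t != pattern[k]:
            k = fail[k - 1]
        if t == pattern[k]:
            k += 1
        progress.append(k)
        if k == len(pattern):      # full match of the biasing phrase
            k = fail[k - 1]        # allow overlapping re-matches
    return progress

# Hypothetical example: bias toward the token sequence of a contact name.
print(kmp_match_progress(list("call anna now"), list("anna")))
```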
Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar.
Massive End-to-end Speech Recognition Models with Time Reduction. NAACL, 2024.
For massive ASR models (including CTC and RNN-T), it is possible to reduce the encoder output frame rate significantly with funnel pooling, without sacrificing recognition accuracy.
[link]
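A rough sketch of funnel-style time reduction on encoder outputs, shown here as simple strided mean pooling over frames in NumPy; the reduction factor and its placement inside the encoder stack in the paper may differ, and this only illustrates the frame-rate reduction itself.

```python
import numpy as np

def funnel_pool_time(frames, stride=2):
    """Reduce the frame rate of an encoder output by averaging every
    `stride` consecutive frames.  frames: (T, D) -> (ceil(T/stride), D)."""
    T, D = frames.shape
    pad = (-T) % stride                      # right-pad so T divides evenly
    if pad:
        frames = np.concatenate([frames, np.zeros((pad, D))], axis=0)
    return frames.reshape(-1, stride, D).mean(axis=1)

# Hypothetical example: 100 encoder frames reduced by a factor of 2.
enc_out = np.random.randn(100, 512)
reduced = funnel_pool_time(enc_out, stride=2)
print(enc_out.shape, '->', reduced.shape)    # (100, 512) -> (50, 512)
```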
Weiran Wang, Ke Hu, and Tara Sainath. Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding. ICASSP, 2022.
We improve the ASR accuracy of a small streaming RNN-T model with non-autoregressive CTC decoding, taking into account both the acoustic features and the label dependency of the initial hypothesis.
[arXiv version]
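For background, a minimal sketch of non-autoregressive greedy CTC decoding (argmax per frame, merge repeats, drop blanks); the deliberation model in the paper additionally conditions the CTC decoder on the first-pass RNN-T hypothesis, which this toy snippet does not attempt to show.

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Non-autoregressive decode: argmax per frame, merge repeated labels,
    remove blanks.  logits: (T, V) array of per-frame scores."""
    best = logits.argmax(axis=-1)
    out, prev = [], None
    for label in best:
        if label != prev and label != blank:
            out.append(int(label))
        prev = label
    return out

# Hypothetical example with 6 frames and a 4-symbol vocabulary (0 = blank).
logits = np.random.randn(6, 4)
print(ctc_greedy_decode(logits))
```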
Qi Lyu, Xiao Fu, Weiran Wang, and Songtao Lu. Latent Correlation-Based Multiview Learning and Self-Supervision: A Unifying Perspective. ICLR, 2022.
We give an example of provable separation of shared vs. private components for multi-view self-supervised learning, based on CCA-type matching, reconstruction, and a group-independence-promoting regularization.
[arXiv version]
Junwen Bai, Weiran Wang, and Carla Gomes. Contrastively Disentangled Sequential Variational Autoencoder. NeurIPS, 2021.
We use contrastive mutual information estimation to separate the static and dynamic components of sequence data.
[arXiv version] [implementation]
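A small NumPy sketch of the InfoNCE-style contrastive lower bound on mutual information that this kind of disentanglement relies on; the actual C-DSVAE losses and pairings are more involved, and the embeddings below are hypothetical.

```python
import numpy as np

def info_nce(z, c, temperature=0.1):
    """Contrastive MI estimate between paired embeddings z[i] and c[i]:
    each row of z should score higher with its own c[i] than with the
    other rows (negatives).  z, c: (N, D) arrays."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    logits = z @ c.T / temperature             # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))        # positives sit on the diagonal

# Hypothetical paired embeddings (e.g., two views of the same sequence).
z, c = np.random.randn(32, 16), np.random.randn(32, 16)
print(info_nce(z, c))
```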
Junwen Bai, Weiran Wang, Yingbo Zhou, and Caiming Xiong. Representation Learning for Sequence Data with Deep Autoencoding Predictive Components. International Conference on Learning Representations (ICLR), 2021.
We use predictive information as regularization for representation learning with sequence encoders.
[arXiv version] [implementation]
Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, and Richard Socher. An investigation of phone-based subword units for end-to-end speech recognition. Interspeech, 2020.
We are the first to investigate phone-based subword units for end-to-end speech recognition.
On Switchboard, our phone-based BPE system achieves 6.8%/14.4% word error rate (WER) on the Switchboard/CallHome portions of the test set, while joint decoding achieves 6.3%/13.3% WER.
On Fisher + Switchboard, joint decoding achieves 4.9%/9.5% WER, setting new milestones for telephony speech recognition.
[arXiv version] [implementation]
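A toy sketch of the BPE merge loop underlying phone-based subword units, run here on hypothetical phone sequences rather than the lexicon-derived phone sequences used in the paper.

```python
from collections import Counter

def learn_bpe(sequences, num_merges):
    """Learn BPE merges over phone sequences: each merge fuses the most
    frequent adjacent pair of units into a new, longer subword unit."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        for s in seqs:
            i = 0
            while i < len(s) - 1:
                if s[i] == a and s[i + 1] == b:
                    s[i:i + 2] = [a + "_" + b]   # e.g. "L" + "OW" -> "L_OW"
                else:
                    i += 1
    return merges, seqs

# Hypothetical phone sequences for two words ending in the same sounds.
corpus = [["HH", "AH", "L", "OW"], ["HH", "AA", "L", "OW"]]
print(learn_bpe(corpus, num_merges=2))
```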
Yang Chen, Weiran Wang, and Chao Wang. Semi-supervised ASR by End-to-end Self-training. Interspeech, 2020.
We are one of the first to perform on-the-fly self-training for end-to-end ASR, using the model's pseudo-labels on clean data as targets for the same model on perturbed data.
[arXiv version]
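A minimal sketch of one on-the-fly self-training step, under the assumptions that a toy linear softmax classifier stands in for the ASR model and Gaussian noise stands in for data perturbation: the pseudo-label is produced on the clean input and used as the target for the perturbed input.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def self_training_step(W, x_clean, noise_scale=0.1, lr=0.1):
    """One self-training update for a toy linear softmax model W (D x C):
    pseudo-label from the clean input, cross-entropy loss on a perturbed copy."""
    pseudo = softmax(x_clean @ W).argmax(axis=-1)        # pseudo-labels (no gradient)
    x_pert = x_clean + noise_scale * np.random.randn(*x_clean.shape)
    probs = softmax(x_pert @ W)
    onehot = np.eye(W.shape[1])[pseudo]
    grad = x_pert.T @ (probs - onehot) / len(x_clean)    # cross-entropy gradient
    return W - lr * grad

# Hypothetical unlabeled batch: 8 examples, 5 features, 3 classes.
W = np.random.randn(5, 3)
x = np.random.randn(8, 5)
W = self_training_step(W, x)
```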
Weiran Wang, Qingming Tang, and Karen Livescu. Unsupervised Pre-training of Bidirectional Speech Encoders via Masked Reconstruction. ICASSP, 2020.
We are one of the first to perform (BERT-style) masked reconstruction for pre-training speech encoders.
[arXiv version]
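A rough NumPy sketch of the masked-reconstruction objective: zero out random time spans of the input features and train the encoder to reconstruct the masked frames. A linear map stands in for the bidirectional encoder, and the span lengths here are hypothetical.

```python
import numpy as np

def masked_reconstruction_loss(W, frames, num_spans=2, span_len=5, rng=np.random):
    """Mask random time spans of `frames` (T x D), reconstruct with a stand-in
    linear 'encoder' W (D x D), and score only the masked positions."""
    T, D = frames.shape
    mask = np.zeros(T, dtype=bool)
    for _ in range(num_spans):
        start = rng.randint(0, max(T - span_len, 1))
        mask[start:start + span_len] = True
    corrupted = frames.copy()
    corrupted[mask] = 0.0                       # masked frames set to zero
    recon = corrupted @ W                       # stand-in for the deep encoder
    return np.mean((recon[mask] - frames[mask]) ** 2)

# Hypothetical 80-dim log-mel features over 100 frames.
frames = np.random.randn(100, 80)
W = np.eye(80) + 0.01 * np.random.randn(80, 80)
print(masked_reconstruction_loss(W, frames))
```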
Chao Gao, Dan Garber, Nathan Srebro, Jialei Wang, Weiran Wang (authors in alphabetical order). Stochastic Canonical Correlation Analysis. Journal of Machine Learning Research (JMLR).
We provide a tight sample complexity analysis of streaming CCA that matches the statistical limit for Gaussian inputs.
[arXiv version]
Weiran Wang and Nathan Srebro. Stochastic Nonconvex Optimization with Large Minibatches. Algorithmic Learning Theory (ALT), 2019.
We apply the proximal point algorithm to a nonconvex population objective, which reduces to solving a sequence of convex subproblems,
and show the benefit of larger minibatch sizes when the loss is not too nonconvex.
[arXiv version]
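A small sketch of the proximal point outer loop described above: each outer step approximately minimizes the original loss plus a quadratic proximity term around the current iterate. It is shown on a toy one-dimensional nonconvex function with a plain gradient-descent inner solver; the step sizes and iteration counts are hypothetical.

```python
import numpy as np

def prox_point(grad_f, w0, lam=1.0, outer_steps=20, inner_steps=50, lr=0.02):
    """Minimize f via a sequence of regularized subproblems:
    w_{t+1} ~= argmin_w f(w) + (lam/2) * ||w - w_t||^2."""
    w = w0
    for _ in range(outer_steps):
        center, v = w, w
        for _ in range(inner_steps):             # inner solver: gradient descent
            v = v - lr * (grad_f(v) + lam * (v - center))
        w = v
    return w

# Toy nonconvex loss f(w) = w^4 - 3 w^2 + w, with gradient 4 w^3 - 6 w + 1.
grad_f = lambda w: 4 * w**3 - 6 * w + 1
print(prox_point(grad_f, w0=np.array(2.0)))
```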
Weiran Wang, Xinchen Yan, Honglak Lee, and Karen Livescu. Deep Variational Canonical Correlation Analysis.
We extend the probabilistic interpretation of CCA to deep generative models, and demonstrate the separation of shared components
from private components for multi-view data, including audio + articulation and image + text.
[arXiv version]
Jialei Wang*, Weiran Wang*, and Nathan Srebro. Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox.
Conference On Learning Theory (COLT), 2017.
We apply the proximal point algorithm to stochastic optimization, which leads to a sequence of regularized ERM problems solved in a distributed setup;
this approach offers trade-offs between batch size and communication rounds.
[arXiv version]
Weiran Wang*, Jialei Wang*, Dan Garber, and Nathan Srebro. Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis. Advances in Neural Information Processing Systems (NIPS), 2016.
We provide one of the first stochastic optimization algorithms for CCA with a global convergence guarantee, based on the eigenvalue structure of CCA, where each update makes use of a single data point.
[arXiv version]
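For reference, a NumPy sketch of the batch CCA solution that these stochastic algorithms target, computed by whitening each view and taking an SVD of the whitened cross-covariance; the regularization constant below is hypothetical.

```python
import numpy as np

def cca(X, Y, k, reg=1e-4):
    """Top-k canonical directions for views X (n x dx) and Y (n x dy):
    whiten each view, then SVD the whitened cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]   # projections and canonical correlations

# Hypothetical two-view data.
X, Y = np.random.randn(500, 10), np.random.randn(500, 8)
A, B, corrs = cca(X, Y, k=3)
```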
Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On Deep Multi-View Representation Learning. International Conference on Machine Learning (ICML), 2015.
While the deep CCA objective couples all training samples together, we demonstrate that stochastic optimization works well with large minibatch sizes.
Deep CCA is a powerful model for extracting the shared component from multi-view data, and we propose auto-encoding regularization for it.
[XRMB dataset] [TensorFlow implementation]
Haozhe Shan. Intern at Google, 2023.
Junwen Bai. Intern at Salesforce Research, 2020.
Yang Chen. Intern at Amazon Alexa, 2019.