Papers
| Title | Author | Year | Description |
|---|---|---|---|
| Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | Amirhosein Ghasemabadi, Di Niu | 2026 | Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample… |
| AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques | Aman Raj, Ankit Shetgaonkar, Lakshit Arora et al. | 2025 | Natural disasters, including earthquakes, wildfires and cyclones, bear a huge risk on human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly… |
| Skillful joint probabilistic weather forecasting from marginals | Ferran Alet, Ilan Price, Andrew El-Kadi et al. | 2025 | Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently outperforming… |
| Attention Is All You Need | Ashish Vaswani, Noam Shazeer, Niki Parmar et al. | 2023 | The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder… |
| GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | Yanping Huang, Youlong Cheng, Ankur Bapna et al. | 2019 | Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond… |
| AdaGAN: Boosting Generative Models | Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet et al. | 2017 | Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train… |
| A simple neural network module for relational reasoning | Adam Santoro, David Raposo, David G. T. Barrett et al. | 2017 | Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a… |
| Neural Message Passing for Quantum Chemistry | Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley et al. | 2017 | Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant… |
| Pointer Networks | Oriol Vinyals, Meire Fortunato, Navdeep Jaitly | 2017 | We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems… |
| Identity Mappings in Deep Residual Networks | Kaiming He, Xiangyu Zhang, Shaoqing Ren et al. | 2016 | Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind… |
| Neural Machine Translation by Jointly Learning to Align and Translate | Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio | 2016 | Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural… |
| Multi-Scale Context Aggregation by Dilated Convolutions | Fisher Yu, Vladlen Koltun | 2016 | State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification… |
| Order Matters: Sequence to sequence for sets | Oriol Vinyals, Samy Bengio, Manjunath Kudlur | 2016 | Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can… |
| Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren et al. | 2015 | Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate… |
| Recurrent Neural Network Regularization | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals | 2015 | We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs… |
| ImageNet Classification with Deep Convolutional Neural Networks | Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton | 2012 | |
| Learning Domain-Driven Design | Vlad Khononov | | |
| Nested Learning: The Illusion of Deep Learning Architecture | Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al. | | Over the last decades, developing more powerful neural architectures and simultaneously designing optimization algorithms to effectively train them have been the core of research efforts to enhance the… |
Books
| Title | Author | Year | Description |
|---|---|---|---|
| AI engineering: building applications with foundation models | Chip Huyen | 2025 | |
| The status game | Will Storr | 2021 | |
| Kolmogorov Complexity and Algorithmic Randomness | A. Shen, V. Uspensky, N. Vereshchagin | 2017 | |
| Computer systems: a programmer's perspective | Randal E. Bryant, David Richard O'Hallaron | 2011 | "Computer Systems: A Programmer's Perspective, Second Edition, introduces the important and enduring concepts that underlie computer systems by showing how these ideas affect the correctness, performance…" |
Others
| Title | Author | Year | Description |
|---|---|---|---|
| Paper page - Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | | 2026 | Join the discussion on this paper page |
| Attention, You Had One Job!! | Vipul Sehgal | | DeepSeek shrank it, Moonshot hacked it, and now the whole architecture playbook is up for grabs |
| Aman's AI Journal • Primers • Hyperparameter Tuning | |||
| The Unreasonable Effectiveness of Recurrent Neural Networks | Andrej Karpathy | 2015 | |