Papers
| Title | Author | Year | Description |
|---|---|---|---|
| Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | Amirhosein Ghasemabadi, Di Niu | 2026 | Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample… |
| AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques | Aman Raj, Ankit Shetgaonkar, Lakshit Arora et al. | 2025 | Natural disasters, including earthquakes, wildfires and cyclones, bear a huge risk on human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly… |
| Skillful joint probabilistic weather forecasting from marginals | Ferran Alet, Ilan Price, Andrew El-Kadi et al. | 2025 | Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently outperforming… |
| Attention Is All You Need | Ashish Vaswani, Noam Shazeer, Niki Parmar et al. | 2023 | The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder… |
| GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | Yanping Huang, Youlong Cheng, Ankur Bapna et al. | 2019 | Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond… |
| AdaGAN: Boosting Generative Models | Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet et al. | 2017 | Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train… |
| A simple neural network module for relational reasoning | Adam Santoro, David Raposo, David G. T. Barrett et al. | 2017 | Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a… |
| Neural Message Passing for Quantum Chemistry | Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley et al. | 2017 | Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant… |
| Pointer Networks | Oriol Vinyals, Meire Fortunato, Navdeep Jaitly | 2017 | We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems… |
| Identity Mappings in Deep Residual Networks | Kaiming He, Xiangyu Zhang, Shaoqing Ren et al. | 2016 | Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind… |
| Neural Machine Translation by Jointly Learning to Align and Translate | Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio | 2016 | Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural… |
| Multi-Scale Context Aggregation by Dilated Convolutions | Fisher Yu, Vladlen Koltun | 2016 | State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification… |
| Order Matters: Sequence to sequence for sets | Oriol Vinyals, Samy Bengio, Manjunath Kudlur | 2016 | Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can… |
| Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren et al. | 2015 | Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate… |
| Recurrent Neural Network Regularization | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals | 2015 | We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs… |
| ImageNet Classification with Deep Convolutional Neural Networks | Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton | 2012 | |
| Learning Domain-Driven Design | Vlad Khononov | | |
| Nested Learning: The Illusion of Deep Learning Architecture | Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al. | | Over the last decades, developing more powerful neural architectures and simultaneously designing optimization algorithms to effectively train them have been the core of research efforts to enhance the… |
Books
| Title | Author | Year | Description |
|---|---|---|---|
| AI engineering: building applications with foundation models | Chip Huyen | 2025 | |
| The status game | Will Storr | 2021 | |
| Kolmogorov Complexity and Algorithmic Randomness | A. Shen, V. Uspensky, N. Vereshchagin | 2017 | |
| Computer systems: a programmer's perspective | Randal E. Bryant, David Richard O'Hallaron | 2011 | "Computer Systems: A Programmer's Perspective, Second Edition, introduces the important and enduring concepts that underlie computer systems by showing how these ideas affect the correctness, performance…" |
Others
| Title | Author | Year | Description |
|---|---|---|---|
| Paper page - Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | | 2026 | Join the discussion on this paper page |
| Attention, You Had One Job!! | Vipul Sehgal | | DeepSeek shrank it, Moonshot hacked it, and now the whole architecture playbook is up for grabs |
| Aman's AI Journal • Primers • Hyperparameter Tuning | |||
| The Unreasonable Effectiveness of Recurrent Neural Networks | Andrej Karpathy | 2015 | |