📄 Papers

| Title | Author(s) | Year | Description |
|---|---|---|---|
| Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | Amirhosein Ghasemabadi, Di Niu | 2026 | Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample… |
| AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques | Aman Raj, Ankit Shetgaonkar, Lakshit Arora et al. | 2025 | Natural disasters, including earthquakes, wildfires and cyclones, bear a huge risk on human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly… |
| Skillful joint probabilistic weather forecasting from marginals | Ferran Alet, Ilan Price, Andrew El-Kadi et al. | 2025 | Machine learning (ML)-based weather models have rapidly risen to prominence due to their greater accuracy and speed than traditional forecasts based on numerical weather prediction (NWP), recently out… |
| Attention Is All You Need | Ashish Vaswani, Noam Shazeer, Niki Parmar et al. | 2023 | The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder… |
| GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | Yanping Huang, Youlong Cheng, Ankur Bapna et al. | 2019 | Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond… |
| AdaGAN: Boosting Generative Models | Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet et al. | 2017 | Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train… |
| A simple neural network module for relational reasoning | Adam Santoro, David Raposo, David G. T. Barrett et al. | 2017 | Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a… |
| Neural Message Passing for Quantum Chemistry | Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley et al. | 2017 | Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant… |
| Pointer Networks | Oriol Vinyals, Meire Fortunato, Navdeep Jaitly | 2017 | We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems… |
| Identity Mappings in Deep Residual Networks | Kaiming He, Xiangyu Zhang, Shaoqing Ren et al. | 2016 | Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind… |
| Neural Machine Translation by Jointly Learning to Align and Translate | Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio | 2016 | Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural… |
| Multi-Scale Context Aggregation by Dilated Convolutions | Fisher Yu, Vladlen Koltun | 2016 | State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction and image classification… |
| Order Matters: Sequence to sequence for sets | Oriol Vinyals, Samy Bengio, Manjunath Kudlur | 2016 | Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can… |
| Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren et al. | 2015 | Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate… |
| Recurrent Neural Network Regularization | Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals | 2015 | We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does… |
| ImageNet Classification with Deep Convolutional Neural Networks | Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton | 2012 | |
| Learning Domain-Driven Design | Vlad Khononov | | |
| Nested Learning: The Illusion of Deep Learning Architecture | Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al. | | Over the last decades, developing more powerful neural architectures and simultaneously designing optimization algorithms to effectively train them have been the core of research efforts to enhance the… |
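Several entries above (most directly "Attention Is All You Need") center on the attention mechanism. As a small illustration of that paper's core operation, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√dₖ)V; the function name and toy shapes are illustrative, not from any of the listed papers:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # numerically stable row-wise softmax
    return weights @ V                               # weighted sum of value vectors

# toy example: 3 queries attending over 4 key/value pairs of dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query
```

Each output row is a convex combination of the value rows, with mixing weights set by query-key similarity; the 1/√dₖ scaling keeps the pre-softmax scores from saturating as dimension grows.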

📚 Books

| Title | Author(s) | Year | Description |
|---|---|---|---|
| AI Engineering: Building Applications with Foundation Models | Chip Huyen | 2025 | |
| The Status Game | Will Storr | 2021 | |
| Kolmogorov Complexity and Algorithmic Randomness | A. Shen, V. Uspensky, N. Vereshchagin | 2017 | |
| Computer Systems: A Programmer's Perspective | Randal E. Bryant, David Richard O'Hallaron | 2011 | "Computer Systems: A Programmer's Perspective, Second Edition, introduces the important and enduring concepts that underlie computer systems by showing how these ideas affect the correctness, performance…" |

🔗 Others

| Title | Author(s) | Year | Description |
|---|---|---|---|
| Paper page – Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits | | 2026 | Join the discussion on this paper page |
| Attention, You Had One Job!! | Vipul Sehgal | | DeepSeek shrank it, Moonshot hacked it, and now the whole architecture playbook is up for grabs |
| Aman's AI Journal • Primers • Hyperparameter Tuning | | | |
| The Unreasonable Effectiveness of Recurrent Neural Networks | | | |