I will never stop learning and questioning myself. Never settle.
I am always willing to collaborate and participate in small projects!
I'm tenacious and independent, and I teach myself using tutorials found online.
I love sharing the music I listen to: every month I upload a playlist of what I've been listening to on my YouTube channel.
I'm looking to contribute to open-source projects and am interested in AI and mobile/web applications.
This roadmap charts the principal milestones in the development of deep learning, from the earliest formal models of artificial neurons to the open theoretical and architectural problems whose resolution is widely regarded as a prerequisite for Artificial General Intelligence (AGI). Items are classified into three categories: established (✅), denoting results that are well understood and have entered the canonical literature; active research (🔬), denoting programmes whose foundations exist but whose theoretical or empirical scope remains incomplete; and open problems (⬜), denoting questions for which no satisfactory framework has yet been proposed.
- ✅ 1943 – McCulloch–Pitts neuron. The first mathematical formalisation of an artificial neuron, establishing the computational primitive on which all subsequent neural architectures are built.
- ✅ 1957 – Rosenblatt's perceptron. The first supervised learning algorithm with a convergence guarantee for linearly separable data (a minimal sketch of the update rule follows this list).
- ✅ 1969 – Minsky and Papert's critique. A formal demonstration that the single-layer perceptron cannot represent non-linearly separable functions such as XOR, precipitating the first AI winter.
- ✅ 1971–1982 – Vapnik–Chervonenkis theory. The introduction of the VC dimension, providing the first rigorous framework for statistical learning theory and uniform convergence bounds.
- ✅ 1986 – Backpropagation. Rumelhart, Hinton, and Williams demonstrate that internal representations can be learnt by gradient descent through composed differentiable layers.
- ✅ 1989 – LeNet. LeCun's convolutional network for handwritten digit recognition, establishing weight sharing and translation equivariance as core principles.
- ✅ 1992 – TD-Gammon. Tesauro's temporal-difference learning agent attains world-class play in backgammon through self-play, providing the first major proof of concept for reinforcement learning combined with neural function approximation.
- ✅ 1995–1999 – Support vector machines and PAC-Bayes theory. Cortes and Vapnik's support vector machines (1995) and McAllester's PAC-Bayesian bounds (1999) mark the rise of kernel methods and of the statistical learning theory that would dominate the field for the following decade.
- ✅ 1997 – Long short-term memory. Hochreiter and Schmidhuber's solution to the vanishing gradient problem in recurrent networks.
- ✅ 2000s – The neural-network winter. Deep architectures are largely abandoned in favour of kernel methods and shallow probabilistic models.
- ✅ 2006 – Deep belief networks. Hinton's unsupervised layer-wise pre-training reignites interest in deep architectures.
- ✅ 2012 – AlexNet. Krizhevsky, Sutskever, and Hinton's decisive victory on ImageNet, marking the empirical breakthrough of GPU-accelerated deep learning.
- ✅ 2012–present – Scattering transforms and the harmonic analysis of deep networks. Mallat's framework for understanding convolutional architectures as cascaded wavelet transforms, providing the first rigorous mathematical theory of stability and invariance in deep networks and laying conceptual foundations for subsequent work on denoising and generative modelling.
- ✅ 2014 – Generative adversarial networks. Goodfellow et al. introduce a new paradigm for generative modelling based on adversarial training.
- ✅ 2015 – Residual networks and batch normalisation. Architectural innovations that render networks of one hundred or more layers tractable to train.
- ✅ 2015 – Deep unsupervised learning via non-equilibrium thermodynamics. Sohl-Dickstein et al. introduce the diffusion-based generative framework, formalising generation as the inversion of a noising process and connecting deep generative modelling to statistical physics.
- ✅ 2016 – AlphaGo. DeepMind's combination of deep neural networks, Monte Carlo tree search, and self-play defeats Lee Sedol at Go, demonstrating that superhuman performance in domains of high combinatorial complexity is attainable.
- ✅ 2017 – AlphaZero. Generalisation of AlphaGo to chess and shogi using pure self-play without human data, establishing tabula rasa reinforcement learning as a viable paradigm.
- ✅ 2017 – Transformer architecture. Vaswani et al.'s "Attention Is All You Need", which would subsequently underpin the entire generation of large language models (its core operation, scaled dot-product attention, is sketched after this list).
- ✅ 2018–2019 – BERT and GPT-2. The pre-train-then-fine-tune paradigm becomes the dominant methodology in natural language processing.
- ✅ 2019 – MuZero. Schrittwieser et al. extend AlphaZero to environments without a known transition model, learning the dynamics jointly with the policy and value functions and establishing the first general-purpose planning agent operating on a learnt world model.
- ✅ 2020 – Denoising diffusion probabilistic models. Ho et al. demonstrate that score-based generative models trained by denoising can match or surpass adversarial methods (the closed-form forward process is sketched after this list), with subsequent theoretical grounding in the work of Mallat and collaborators on denoisers as learned transfer functions and on the harmonic-analytic structure of the score-matching objective.
- ✅ 2020 – Scaling laws. Kaplan et al. and subsequently the Chinchilla work establish predictable power-law relationships between compute, data, parameters, and loss (a common parametrisation is given after this list).
- ✅ 2020 – Intrinsic dimensionality of fine-tuning. Aghajanyan et al. demonstrate that BERT can be fine-tuned within a subspace of approximately two hundred dimensions, foreshadowing low-rank adaptation methods.
- ✅ 2021 – LoRA. Hu et al. turn the low-intrinsic-dimensionality hypothesis into a practical fine-tuning method by restricting weight updates to low-rank matrices (sketched after this list).
- ✅ 2022 – Latent diffusion models. Rombach et al. shift diffusion to compressed latent spaces, enabling the present generation of large-scale image and video synthesis.
- ✅ 2022–2023 – ChatGPT, GPT-4, and reinforcement learning from human feedback. Large-scale emergence and the formalisation of preference-based alignment.
- 🔬 2017–present – Loss landscape geometry and flat minima. Investigations into mode connectivity (Garipov et al., Entezari et al.) and the relationship between curvature and generalisation.
- 🔬 2017–present – Non-vacuous PAC-Bayes bounds. Dziugaite and Roy's framework for numerically computable generalisation guarantees on deep networks, the first such bounds to yield non-trivial values.
- 🔬 2018–present – World models for reinforcement learning. Ha and Schmidhuber's world models, followed by the Dreamer line (Hafner et al.), establish learned latent dynamics as a basis for sample-efficient planning agents.
- 🔬 2019–present – Double descent, neural tangent kernel, and implicit regularisation. Counter-intuitive generalisation phenomena that remain only partially understood.
- 🔬 2020–present – Geometric deep learning. Bronstein et al.'s unification of convolutional, graph, and attention-based architectures under a common framework of group equivariance, often described as the Erlangen Programme of deep learning.
- 🔬 2020–present – Mathematical theory of denoising and generative diffusion. Mallat and collaborators extend the harmonic-analytic framework of scattering transforms to the analysis of denoisers as learned transfer functions, providing a principled account of the regularity properties exploited by diffusion models and establishing connections to renormalisation group methods.
- 🔬 2022–present – Mechanistic interpretability and sparse autoencoders. The systematic decomposition of learnt representations into monosemantic features, addressing the superposition hypothesis (Anthropic, Olah and collaborators).
- 🔬 2022–present – Joint Embedding Predictive Architectures (JEPA). LeCun's proposal for non-generative, non-autoregressive predictive models operating in abstract latent spaces, subsequently instantiated in I-JEPA, V-JEPA, and V-JEPA 2 for image and video understanding.
- 🔬 2023–present – Foundation models for embodied agents. Vision-language-action models such as RT-2, OpenVLA, and π₀ (Pi-Zero) extend the foundation model paradigm to robotic control, raising the question of whether grounded interaction with the physical world is a prerequisite for general intelligence.
- 🔬 2024–present – Test-time compute and chain-of-thought reasoning. Models in the o1 and o3 lineage demonstrate that substantial reasoning capability can emerge from increased inference-time computation alone, without architectural modification, suggesting that reasoning may be a property of the inference procedure rather than the network itself.
- 🔬 2024–present – Implicit world models in large language and video models. Empirical evidence (Othello-GPT, emergent spatial and temporal representations, Sora-class video models) that world models may emerge from sufficiently large-scale next-token or next-frame prediction, in tension with the explicitly architected approach advocated by JEPA proponents.
- ⬜ Why stochastic gradient descent finds generalising minima. The central unresolved question of deep learning theory; no fully satisfactory account exists.
- ⬜ A formal theory of the inductive bias stored in pre-trained weights. The characterisation of what θ₀ encodes remains essentially descriptive rather than predictive.
- ⬜ An effective VC dimension for fine-tuning. A tight bound relating ‖Δθ‖, domain divergence, and target-task generalisation error has yet to be established.
- ⬜ Computable and tight measures of domain divergence. The Ben-David ℋ-divergence bounds, whilst theoretically elegant, remain vacuous in practice.
- ⬜ Functional equivalences between architectures. A rigorous formalism for transforming, for instance, a convolutional network into a graph neural network whilst preserving learnt information.
- ⬜ Canonical unfolded representations of trained networks. A geometric form in which each direction is monosemantic and the effective dimension of learning is directly measurable.
- ⬜ Architected versus emergent world models. Whether genuine world models must be imposed architecturally, as in the JEPA programme, or whether they emerge from sufficiently large-scale predictive training, remains one of the most consequential open questions in the field.
- ⬜ Multi-agent architectures inspired by Global Workspace Theory. Specialised modules communicating through a shared workspace, following the cognitive frameworks of Baars and Dehaene.
- ⬜ Continual learning without catastrophic forgetting. The capacity to acquire new tasks without degrading performance on previously learnt ones.
- ⬜ Causal reasoning and long-horizon planning. Capabilities extending beyond statistical pattern-matching towards genuine inference and goal-directed behaviour, plausibly requiring the integration of model-based reinforcement learning, world models, and test-time deliberation.
- ⬜ The role of embodiment. Whether grounded sensorimotor interaction with the physical world is a necessary condition for general intelligence, or whether sufficiently rich multimodal data can substitute for it.
- ⬜ Verifiable alignment and formal safety guarantees. Provable bounds on the behaviour of highly capable systems.
- ⬜ Artificial General Intelligence. The conjectured culmination of the preceding programmes, contingent upon the resolution of the open problems above.
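
A few minimal, self-contained sketches for the milestones flagged above follow. All are illustrative: the toy data and hyperparameters are chosen for clarity, not taken from the original papers. First, Rosenblatt's perceptron rule (1957), in plain numpy:

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Rosenblatt's rule: for each misclassified point, move the
    separating hyperplane towards (or away from) it. Converges in
    finitely many updates when the data are linearly separable.
    X: (n, d) inputs; y: (n,) labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
                errors += 1
        if errors == 0:  # every point correctly classified: stop
            break
    return w, b

# Toy linearly separable problem: the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # [-1. -1. -1.  1.]
```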
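
Next, scaled dot-product attention, the operation at the heart of the 2017 Transformer entry. This is the bare formula softmax(QKᵀ/√d_k)V; multi-head projections, masking, and batching are omitted:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017):
    softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted mixture of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 query positions, d_k = 8
K = rng.standard_normal((6, 8))  # 6 key/value positions
V = rng.standard_normal((6, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```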
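
For the diffusion entries (2015, 2020), the forward noising process admits a closed form that makes training simple: sample a timestep, noise the data in one shot, and regress the noise. The schedule below uses the linear betas reported by Ho et al. (2020); the denoising network itself is left abstract:

```python
import numpy as np

# Linear noise schedule with the defaults reported by Ho et al. (2020).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def q_sample(x0, t, rng):
    """Closed form of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)  # stand-in for a data sample
xt, eps = q_sample(x0, t=500, rng=rng)
# A denoiser eps_theta(x_t, t) is then trained to minimise
# ||eps - eps_theta(x_t, t)||^2 averaged over t, x0, and eps;
# sampling runs the learnt reversal from pure noise back to data.
```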
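
The scaling-laws entry refers to fitted power laws; a commonly cited parametrisation is the Chinchilla loss model (Hoffmann et al., 2022), reproduced here as the functional form only, without its fitted constants:

```latex
% Parametric loss model fitted in the Chinchilla study:
%   N = parameter count, D = training tokens;
%   E, A, B, \alpha, \beta are empirically fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Minimising L under a fixed compute budget (roughly C \approx 6ND)
% yields compute-optimal N and D that grow in near-equal proportion.
```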
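
Finally, the LoRA update referenced above: the frozen pre-trained weight is perturbed by a trainable low-rank product, so only a small fraction of parameters is updated. The scaling convention and initialisation (Gaussian A, zero B, so the update starts at zero) follow the paper; the shapes are illustrative:

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha=16):
    """LoRA (Hu et al., 2021): effective weight W0 + (alpha / r) * B @ A,
    where W0 is frozen and only A (r, d_in) and B (d_out, r) are trained."""
    r = A.shape[0]
    return x @ (W0 + (alpha / r) * (B @ A)).T

d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_out, d_in))    # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small Gaussian init
B = np.zeros((d_out, r))                   # trainable, zero init
x = rng.standard_normal((2, d_in))
print(lora_forward(x, W0, A, B).shape)     # (2, 64); equals x @ W0.T at init
```

Training r * (d_in + d_out) parameters per layer instead of d_in * d_out is what makes this practical at scale.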
I work on macOS with the Warp terminal.
I am also ready to work remotely.
Beyond development, I have networking skills and experience with video editing, virtual machines, and more.



