<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Rajeev’s Substack]]></title><description><![CDATA[My personal Substack]]></description><link>https://rajeevranjansinha.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!H3re!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89d4d189-d431-4288-87ec-ecaf7b91848e_144x144.png</url><title>Rajeev’s Substack</title><link>https://rajeevranjansinha.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 23 May 2026 17:15:08 GMT</lastBuildDate><atom:link href="https://rajeevranjansinha.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Rajeev ranjan Sinha]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[rajeevranjansinha@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[rajeevranjansinha@substack.com]]></itunes:email><itunes:name><![CDATA[Rajeev ranjan Sinha]]></itunes:name></itunes:owner><itunes:author><![CDATA[Rajeev ranjan Sinha]]></itunes:author><googleplay:owner><![CDATA[rajeevranjansinha@substack.com]]></googleplay:owner><googleplay:email><![CDATA[rajeevranjansinha@substack.com]]></googleplay:email><googleplay:author><![CDATA[Rajeev ranjan Sinha]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Cloud Native Data Infrastructure: Why Kubernetes Must Go Beyond Stateless Applications]]></title><description><![CDATA[For the last decade, enterprises have been racing toward cloud native transformation.]]></description><link>https://rajeevranjansinha.substack.com/p/cloud-native-data-infrastructure</link><guid 
isPermaLink="false">https://rajeevranjansinha.substack.com/p/cloud-native-data-infrastructure</guid><dc:creator><![CDATA[Rajeev ranjan Sinha]]></dc:creator><pubDate>Sat, 16 May 2026 11:26:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YMkq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Containers replaced virtual machines.<br>Microservices replaced monoliths.<br>Infrastructure became declarative.<br>Kubernetes emerged as the control plane for modern applications.</p><p>Yet despite all this progress, one critical layer often remained stuck in the past:</p>
<p><strong>Data infrastructure.</strong></p><p>Many organizations proudly describe themselves as &#8220;cloud native&#8221; while still running databases, analytics engines, and streaming systems outside the very platform orchestrating the rest of their applications.</p><p>This separation creates operational complexity, fragmented tooling, inconsistent automation, and slower innovation cycles.</p><p>The next phase of modernization is not simply about running applications on Kubernetes.</p><p>It is about bringing data into the same operational model.</p><p>Welcome to the era of <strong>cloud native data infrastructure</strong>.</p><div><hr></div><h1>The Incomplete Cloud Native Journey</h1><p>Early Kubernetes adoption focused heavily on stateless workloads.</p><p>This made sense.</p><p>Stateless services are easier to orchestrate because they don&#8217;t maintain long-term data. Containers can be created, scaled, or destroyed with minimal coordination. 
APIs, frontend services, and business logic layers naturally fit Kubernetes&#8217; original strengths.</p><p>Stateful workloads are different.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YMkq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YMkq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 424w, https://substackcdn.com/image/fetch/$s_!YMkq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 848w, https://substackcdn.com/image/fetch/$s_!YMkq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 1272w, https://substackcdn.com/image/fetch/$s_!YMkq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YMkq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png" width="600" height="523" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:523,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Stateless vs. stateful services&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Stateless vs. stateful services" title="Stateless vs. stateful services" srcset="https://substackcdn.com/image/fetch/$s_!YMkq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 424w, https://substackcdn.com/image/fetch/$s_!YMkq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 848w, https://substackcdn.com/image/fetch/$s_!YMkq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 1272w, https://substackcdn.com/image/fetch/$s_!YMkq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf6efc15-e856-4bff-b18e-f07328f9611b_600x523.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Stateless vs Stateful service</figcaption></figure></div><p>Databases, analytics systems, and streaming platforms must maintain consistency, durability, replication, and availability across failures. Storage persistence becomes critical. Capacity planning becomes far more complex. 
Performance unpredictability can create cascading failures.</p><p>For years, this led to a widely accepted belief:</p><blockquote><p>&#8220;You should never run databases on Kubernetes.&#8221;</p></blockquote><p>That statement once had practical merit.</p><p>But Kubernetes has evolved dramatically.</p><p>Today, the platform includes mature primitives for stateful infrastructure:</p><ul><li><p>StatefulSets</p></li><li><p>Persistent Volumes</p></li><li><p>StorageClasses</p></li><li><p>CSI drivers</p></li><li><p>Operators</p></li><li><p>Advanced networking policies</p></li><li><p>Declarative scaling models</p></li></ul><p>The ecosystem surrounding Kubernetes has also matured alongside it.</p><p>Modern data platforms increasingly embrace Kubernetes as their operational foundation:</p><ul><li><p>PostgreSQL</p></li><li><p>MySQL</p></li><li><p>MongoDB</p></li><li><p>Apache Cassandra</p></li><li><p>Apache Kafka</p></li><li><p>Apache Flink</p></li><li><p>Apache Spark</p></li></ul><p>The industry is moving toward a world where data systems are no longer exceptions to cloud native architecture.</p><p>They become first-class citizens within it.</p><div><hr></div><h1>What &#8220;Cloud Native Data&#8221; Actually Means</h1><p>Running a database inside a container is not enough.</p><p>Cloud native data is not a packaging exercise.</p><p>It is an operational philosophy.</p><p>A truly cloud native data platform must embody the same principles expected from modern applications:</p><ul><li><p>Scalability</p></li><li><p>Elasticity</p></li><li><p>Self-healing</p></li><li><p>Observability</p></li><li><p>Declarative management</p></li><li><p>Automation</p></li><li><p>Portability</p></li></ul><p>The goal is convergence.</p><p>Instead of separate operational silos for:</p><ul><li><p>Compute</p></li><li><p>Networking</p></li><li><p>Storage</p></li><li><p>Security</p></li><li><p>Databases</p></li><li><p>Streaming systems</p></li><li><p>Analytics platforms</p></li></ul><p>everything becomes 
part of a unified control plane.</p><p>This is where Kubernetes changes from being &#8220;just a container orchestrator&#8221; into something much larger:</p><p>A distributed operating model for infrastructure itself.</p><div><hr></div><h1>From Virtual Servers to Virtual Datacenters</h1><p>Traditional infrastructure thinking revolved around servers.</p><p>Cloud native thinking revolves around systems.</p><p>This distinction matters.</p><p>In earlier eras, scaling meant provisioning more hardware manually. Entire teams spent months planning infrastructure growth, configuring environments, and coordinating deployments.</p><p>Now infrastructure can be defined declaratively:</p><ul><li><p>Compute resources</p></li><li><p>Networking</p></li><li><p>Storage policies</p></li><li><p>Security rules</p></li><li><p>Service discovery</p></li><li><p>Observability pipelines</p></li></ul><p>all expressed as code.</p><p>Instead of deploying individual servers, we are increasingly deploying <strong>virtual datacenters</strong>.</p><p>Kubernetes continuously reconciles desired state against actual state:</p><ul><li><p>Failed containers restart automatically</p></li><li><p>Traffic reroutes dynamically</p></li><li><p>Storage attaches declaratively</p></li><li><p>Services scale horizontally</p></li><li><p>Infrastructure heals itself</p></li></ul><p>For stateless workloads, this model became mainstream years ago.</p><p>For data infrastructure, the transition is only beginning.</p><div><hr></div><h1>Why Stateful Infrastructure Was Hard</h1><p>The hesitation around stateful workloads on Kubernetes did not come from nowhere.</p><p>Databases and analytics systems have historically required:</p><ul><li><p>Stable storage</p></li><li><p>Predictable networking</p></li><li><p>High I/O throughput</p></li><li><p>Strict consistency guarantees</p></li><li><p>Controlled failover behavior</p></li><li><p>Resource isolation</p></li></ul><p>Traditional infrastructure teams optimized for predictability by 
dedicating hardware to databases.</p><p>The fear was understandable:<br>What happens if noisy neighbors impact performance?<br>What if storage becomes ephemeral?<br>What if orchestration introduces instability?</p><p>Early Kubernetes versions lacked mature answers to many of these concerns.</p><p>But today&#8217;s environment is very different.</p><p>Storage orchestration has improved dramatically.</p><p>Operators now automate many database lifecycle tasks:</p><ul><li><p>Failover</p></li><li><p>Backup management</p></li><li><p>Replication</p></li><li><p>Scaling</p></li><li><p>Recovery workflows</p></li></ul><p>Infrastructure teams can increasingly define sophisticated storage and placement policies declaratively.</p><p>The result is something powerful:<br>The operational benefits of Kubernetes can finally extend to data systems.</p><div><hr></div><h1>The Four Pillars of Cloud Native Data</h1><h2>1. Scalability</h2><p>Cloud native systems must scale without downtime.</p><p>Legacy systems often required maintenance windows for upgrades or expansion.</p><p>Modern infrastructure should allow:</p><ul><li><p>Dynamic horizontal scaling</p></li><li><p>Intelligent data redistribution</p></li><li><p>Elastic resource growth</p></li><li><p>Near-continuous availability</p></li></ul><p>Applications no longer tolerate downtime as a normal operational expectation.</p><p>Neither should data platforms.</p><p><em><strong>Scalability: </strong>If a service can produce a unit of work for a unit of resources, adding more resources should increase the amount of work a service can perform. Scalability describes the service&#8217;s ability to apply additional resources to produce additional work. Ideally, services should scale infinitely given an infinite amount of compute, network, and storage resources. For data, this means scale without the need for downtime. Legacy systems required a maintenance period while adding new resources, during which all services had to be shut down. 
Given the needs of cloud native applications, downtime is no longer acceptable.</em></p><div><hr></div><h2>2. Elasticity</h2><p>Scalability is adding resources.</p><p>Elasticity is removing them when demand drops.</p><p>This distinction matters enormously in modern environments where infrastructure cost directly impacts business efficiency.</p><p>Cloud native data systems should intelligently:</p><ul><li><p>Reclaim unused storage</p></li><li><p>Tier cold data automatically</p></li><li><p>Scale compute dynamically</p></li><li><p>Optimize resource utilization continuously</p></li></ul><p>Infrastructure should expand and contract based on real demand.</p><p><em><strong>Elasticity: </strong></em>Whereas <em>scale</em> is adding resources to meet demand, elasticity is the ability to free those resources when they are no longer needed. The difference between scalability and elasticity is highlighted in the figure below. Elasticity can also be called <em>on-demand infrastructure</em>. In a constrained environment such as a private datacenter, this is critical for sharing limited resources. For cloud infrastructure that charges for every resource used, this is a way to avoid paying for running services you don&#8217;t need. 
When it comes to managing data, this means that we need capabilities to reclaim storage space and optimize our usage&#8212;for example, moving older data to less expensive storage tiers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8KpA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8KpA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 424w, https://substackcdn.com/image/fetch/$s_!8KpA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 848w, https://substackcdn.com/image/fetch/$s_!8KpA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 1272w, https://substackcdn.com/image/fetch/$s_!8KpA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8KpA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png" width="600" height="282" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:282,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Comparing scalability and elasticity&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Comparing scalability and elasticity" title="Comparing scalability and elasticity" srcset="https://substackcdn.com/image/fetch/$s_!8KpA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 424w, https://substackcdn.com/image/fetch/$s_!8KpA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 848w, https://substackcdn.com/image/fetch/$s_!8KpA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 1272w, https://substackcdn.com/image/fetch/$s_!8KpA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21904db1-0be3-45c7-b05f-4422701fbbfb_600x282.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>3. Self-Healing</h2><p>Failures are inevitable.</p><p>Disks fail. Nodes disappear. Networks partition. Services crash.</p><p>Modern systems must assume failure as a normal operating condition.</p><p>Self-healing infrastructure responds automatically:</p><ul><li><p>Rebuilding replicas</p></li><li><p>Rerouting traffic</p></li><li><p>Recovering workloads</p></li><li><p>Maintaining service availability</p></li></ul><p>For data systems, this extends beyond infrastructure:</p><ul><li><p>Detecting data corruption</p></li><li><p>Monitoring data quality</p></li><li><p>Recovering consistency automatically</p></li></ul><p><strong>Self-healing: </strong><em>Bad things happen. When they do, how will your infrastructure respond? Self-healing infrastructure will reroute traffic, reallocate resources, and maintain service levels. 
With larger and more complex distributed applications being deployed, this is an increasingly important attribute of a cloud native application. This is what keeps you from getting that 3 A.M. wake-up call. For data, this means we need capabilities to detect issues with the data itself, such as missing data and poor data quality.</em></p><div><hr></div><h2>4. Observability</h2><p>Distributed systems create distributed failure modes.</p><p>Without visibility, debugging becomes guesswork.</p><p>Cloud native observability combines:</p><ul><li><p>Logs</p></li><li><p>Metrics</p></li><li><p>Traces</p></li></ul><p>to provide system-wide insight into behavior.</p><p>For data infrastructure, observability becomes even more critical, covering signals such as:</p><ul><li><p>Query latency</p></li><li><p>Replication lag</p></li><li><p>Throughput bottlenecks</p></li><li><p>Storage saturation</p></li><li><p>Pipeline delays</p></li><li><p>Distributed tracing</p></li></ul><p>The larger the system becomes, the more important observability becomes.</p><p><em><strong>Observability</strong>: If something fails and you aren&#8217;t monitoring it, did it happen? Unfortunately, the answer is yes, and an unmonitored failure can be an even worse scenario. Distributed applications are highly dynamic, and visibility into every service is critical for maintaining service levels. Interdependencies can create complex failure scenarios, which is why observability is a key part of building cloud native applications. In data systems, the data volumes that are now commonplace call for efficient ways of monitoring the flow and state of infrastructure. 
In most cases, early warnings for issues can help operators avoid costly downtime.</em></p><p>With these definitions in place, here is a definition of cloud native data that ties the properties together:</p><p><em><strong>Cloud native data approaches</strong> empower organizations that have adopted the cloud native application methodology to incorporate data holistically rather than employ the legacy of people, process, and technology, so that data can scale up and down elastically, and promote observability and self-healing. This is exemplified by containerized data, declarative data, data APIs, data meshes, and cloud native data infrastructure (that is, databases, streaming, and analytics technologies that are themselves architected as cloud native applications).</em></p><p>For data infrastructure to keep parity with the rest of our application, we need to incorporate each piece. This includes automation of scale, elasticity, and self-healing. APIs are needed to decouple services and increase developer velocity, and to let you observe the entire stack of your application when making critical decisions. Taken as a whole, <strong>your application and data infrastructure should appear as one unit.</strong></p><div><hr></div><h1>More Infrastructure, More Problems</h1><p>Whether your infrastructure is in a cloud, on premises, or both (commonly referred to as <em>hybrid</em>), you could spend a lot of time doing manual configuration. Typing things into an editor and doing incredibly detailed configuration work requires deep knowledge of each technology. Over the past 20 years, significant advances have occurred in the DevOps community, both in how we write code and in how we deploy our infrastructure. This is a critical step in the evolution of modern infrastructure. DevOps has kept us ahead of the scale required for applications, but just barely. Arguably, the same amount of knowledge is needed to fully script a single database server deployment. 
It&#8217;s just that now we can do it a million times over (if needed) with templates and scripts. What has been lacking is connectedness between the components and a holistic view of the entire application stack. Let&#8217;s tackle this problem together. (Foreshadowing: this is a problem that needs to be solved.)</p><p>As with any good engineering problem, let&#8217;s break it into manageable parts. The first is resource management. Regardless of the many ways we have developed to work at scale, fundamentally, we are trying to manage three things as efficiently as possible: compute, network, and storage, as shown in the figure below. These are the critical resources that every application needs and the fuel that&#8217;s burned during growth. Not surprisingly, these are also the resources that account for the monetary cost of a running application. We get rewarded when we use the resources wisely and literally pay a high price if we don&#8217;t. Anywhere you run your application, these are the most primitive units. On premises, everything is bought and owned. 
When using the cloud, we&#8217;re renting.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NoWG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NoWG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 424w, https://substackcdn.com/image/fetch/$s_!NoWG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 848w, https://substackcdn.com/image/fetch/$s_!NoWG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 1272w, https://substackcdn.com/image/fetch/$s_!NoWG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NoWG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png" width="600" height="229" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18749413-285b-42ca-9471-2c615c47e962_600x229.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:229,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Fundamental 
resources of cloud applications: compute, network, and storage&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Fundamental resources of cloud applications: compute, network, and storage" title="Fundamental resources of cloud applications: compute, network, and storage" srcset="https://substackcdn.com/image/fetch/$s_!NoWG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 424w, https://substackcdn.com/image/fetch/$s_!NoWG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 848w, https://substackcdn.com/image/fetch/$s_!NoWG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 1272w, https://substackcdn.com/image/fetch/$s_!NoWG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18749413-285b-42ca-9471-2c615c47e962_600x229.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h6 style="text-align: center;">Fundamental resources of cloud applications: compute, network, and storage</h6><p style="text-align: center;"></p><p>The second part of the problem is having an entire stack act as a single entity. DevOps has provided many tools to manage individual components, but the connective tissue between them provides the potential for incredible efficiency&#8212;similarly to how applications are packaged for the desktop but working at datacenter scales. 
That potential has launched an entire community around cloud native applications. These applications are similar to what we&#8217;ve always deployed. The difference is that modern cloud applications aren&#8217;t a single process with business logic. They are a complex coordination of many containerized processes that need to communicate securely and reliably. Storage has to match the current needs of the application, but remain aware of how it contributes to the stability of the application. When we think of deploying stateless applications without data managed in the same control plane, it sounds incomplete because it is. Breaking your application components into different control planes creates more complexity and thus goes against the ideals of cloud native.</p><div><hr></div><h1>Kubernetes as the Universal Control Plane</h1><p>The real power of Kubernetes is not containers.</p><p>It is consistency.</p><p>Kubernetes standardizes infrastructure operations across environments:</p><ul><li><p>On-premises</p></li><li><p>Public cloud</p></li><li><p>Hybrid cloud</p></li><li><p>Edge deployments</p></li></ul><p>This consistency creates enormous operational leverage.</p><p>Teams can deploy applications using the same patterns everywhere:</p><ul><li><p>APIs</p></li><li><p>Declarative configuration</p></li><li><p>Automated reconciliation</p></li><li><p>Infrastructure abstraction</p></li></ul><p>Data infrastructure benefits immensely from this model.</p><p>Imagine databases designed natively for Kubernetes:</p><ul><li><p>Storage tiers managed declaratively</p></li><li><p>Automatic scaling policies</p></li><li><p>Intelligent workload placement</p></li><li><p>Self-healing replication</p></li><li><p>Built-in observability</p></li><li><p>Elastic compute utilization</p></li></ul><p>That future is becoming increasingly realistic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!OSg-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OSg-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 424w, https://substackcdn.com/image/fetch/$s_!OSg-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 848w, https://substackcdn.com/image/fetch/$s_!OSg-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 1272w, https://substackcdn.com/image/fetch/$s_!OSg-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OSg-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png" width="600" height="285" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/983e379c-3526-47a7-9fc0-08672753a36f_600x285.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:285,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Moving from virtual servers to virtual 
datacenters&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Moving from virtual servers to virtual datacenters" title="Moving from virtual servers to virtual datacenters" srcset="https://substackcdn.com/image/fetch/$s_!OSg-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 424w, https://substackcdn.com/image/fetch/$s_!OSg-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 848w, https://substackcdn.com/image/fetch/$s_!OSg-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 1272w, https://substackcdn.com/image/fetch/$s_!OSg-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983e379c-3526-47a7-9fc0-08672753a36f_600x285.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><div><hr></div><h1>The Rise of the SRE Mindset</h1><p>Cloud native infrastructure also changes engineering culture.</p><p>Traditional infrastructure roles focused heavily on managing individual systems.</p><p>Modern platform engineering requires understanding how entire ecosystems behave together.</p><p>This is where Site Reliability Engineering (SRE) becomes essential.</p><p>The focus shifts from:<br>&#8220;<em><strong>What are we deploying?</strong></em>&#8221;</p><p>to:<br>&#8220;<strong>How does the entire system behave under real-world conditions?</strong>&#8221;</p><p>That includes:</p><ul><li><p>CI/CD pipelines</p></li><li><p>Distributed systems design</p></li><li><p>Automation</p></li><li><p>Reliability engineering</p></li><li><p>Capacity planning</p></li><li><p>Observability</p></li><li><p>Incident response</p></li></ul><p>Infrastructure engineers, DBAs, and platform teams are increasingly converging into reliability-focused engineering disciplines.</p><p>The future belongs to engineers who can think holistically.</p><div><hr></div><h1>Distributed Systems Change Everything</h1><p>Kubernetes forces engineers to confront the realities of distributed computing.</p><p>The classic fallacies still 
trip teams up, because in practice:</p><ul><li><p>The network is not always reliable</p></li><li><p>Latency is never zero</p></li><li><p>Bandwidth is finite</p></li><li><p>Topology changes over time</p></li><li><p>Security is never automatic</p></li></ul><p>Distributed systems introduce complexity that monolithic infrastructure rarely exposed.</p><p>But they also unlock scalability and flexibility that a single machine cannot offer.</p><p>For most modern applications, the tradeoff is worth it.</p><p>Modern applications demand:</p><ul><li><p>Global scalability</p></li><li><p>High availability</p></li><li><p>Elastic infrastructure</p></li><li><p>Rapid deployment cycles</p></li><li><p>Resilient architectures</p></li></ul><p>These requirements are very hard to meet consistently without embracing distributed systems thinking.</p><div><hr></div><h1>The Future of Cloud Native Data</h1><p>We are still early in this transition.</p><p>But the direction is becoming increasingly clear.</p><p>The future will likely include:</p><ul><li><p>Databases built specifically for Kubernetes</p></li><li><p>Intelligent autoscaling based on workload patterns</p></li><li><p>Native observability baked into infrastructure</p></li><li><p>Declarative data lifecycle management</p></li><li><p>Seamless hybrid-cloud portability</p></li><li><p>Automated storage optimization</p></li><li><p>Self-healing distributed architectures</p></li></ul><p>Most importantly, data infrastructure will stop being treated as a separate operational domain.</p><p>It will become fully integrated into the cloud native ecosystem.</p><div><hr></div><h1>Final Thoughts</h1><p>Cloud native transformation was never only about containers.</p><p>It was about rethinking how infrastructure itself should operate.</p><p>The organizations that succeed over the next decade will not merely modernize applications.</p><p>They will modernize the entire stack:</p><ul><li><p>Compute</p></li><li><p>Networking</p></li><li><p>Storage</p></li><li><p>Security</p></li><li><p>Observability</p></li><li><p>Data 
infrastructure</p></li></ul><p>Kubernetes is becoming the foundation for that convergence.</p><p>The question is no longer whether data belongs in the cloud native model.</p><p>The question is how quickly organizations can adapt their systems, processes, and engineering culture to fully embrace it.</p><p>Because the future of infrastructure is not partially cloud native.</p><p>It is entirely cloud native.</p><p><strong>source</strong>: <a href="https://learning.oreilly.com/library/view/managing-cloud-native/9781098111380/">Cloud Native Data Infrastructure</a></p>]]></content:encoded></item><item><title><![CDATA[Understanding Autoencoders: The Foundation of Generative AI]]></title><description><![CDATA[A beginner-friendly guide to the neural networks powering deepfakes, image restoration, and modern AI art]]></description><link>https://rajeevranjansinha.substack.com/p/understanding-autoencoders-the-foundation</link><guid isPermaLink="false">https://rajeevranjansinha.substack.com/p/understanding-autoencoders-the-foundation</guid><dc:creator><![CDATA[Rajeev ranjan Sinha]]></dc:creator><pubDate>Sat, 24 Jan 2026 13:05:46 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!H3re!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89d4d189-d431-4288-87ec-ecaf7b91848e_144x144.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;ve ever wondered how AI creates deepfakes, restores old photographs, or generates art, <strong>autoencoders</strong> are a good place to start.</p><p>These neural networks are some of the quieter workhorses of generative AI, showing up in everything from medical imaging analysis to AI art generators. Today, we&#8217;re going to demystify how they work.</p><h2>What Are Autoencoders?</h2><p>Think of an autoencoder as a neural network that learns to compress and decompress data. It&#8217;s like teaching a computer to take short notes about an image, then reconstruct that image from just those notes.</p><p>The beauty? The computer teaches itself what&#8217;s important to remember.</p><p>Autoencoders consist of two main parts:</p><ul><li><p><strong>Encoder</strong>: Compresses the input data into a compact representation (a point in the &#8220;latent space&#8221;)</p></li><li><p><strong>Decoder</strong>: Reconstructs the original data from this compressed form</p></li></ul><p>Variants of this simple architecture are used for:</p><ul><li><p>Generating realistic fake images</p></li><li><p>Translating poses between people</p></li><li><p>Animating static photographs</p></li><li><p>Restoring damaged images and videos</p></li><li><p>Analyzing medical scans</p></li><li><p>Creating AI-generated art</p></li></ul><h2>A Hands-On Example: MNIST Digit Generation</h2><p>Let&#8217;s walk through a practical example using the classic MNIST dataset of handwritten digits. This 70,000-image collection of the digits 0-9 has become the &#8220;Hello World&#8221; of computer vision.</p><h3>Setting Up the Data</h3><p>First, we load and prepare our data. 
Each image is 28&#215;28 pixels, where each pixel has a value from 0 to 255 representing its grayscale intensity.</p><pre><code>import numpy as np
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to 0-1 range
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# Flatten images from 28x28 to vectors of 784
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
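x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))</code></pre><p>Why flatten? We&#8217;re converting each 28&#215;28 grid into a single row of 784 values, the flat vector shape that a fully connected layer expects as input.</p><h3>The Magic of Self-Supervised Learning</h3><p>Here&#8217;s where autoencoders get interesting. Unlike typical supervised learning, which needs labeled data (like &#8220;this is a 7&#8221; or &#8220;this is a 3&#8221;), autoencoders use <strong>self-supervised learning</strong>.</p><p>The trick? We train the network to recreate its own input. The image is both the input and the target output, so the labels <code>y_train</code> and <code>y_test</code> are never used. This means the network learns to:</p><ol><li><p>Extract the essential features of each digit</p></li><li><p>Compress them into a smaller representation</p></li><li><p>Reconstruct the original image from this compressed form</p></li></ol><h3>Training the Model</h3><p>Training is straightforward once a model exists. As a minimal sketch, assume a small dense autoencoder with a 32-value bottleneck (an illustrative choice, not the only option):</p><pre><code>from tensorflow.keras import layers, models
# Assumed architecture for illustration: 784 inputs, 32 latent values, 784 outputs
autoencoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(32, activation='relu'),      # encoder
    layers.Dense(784, activation='sigmoid'),  # decoder, outputs in the 0-1 range
])
autoencoder.compile(optimizer='adam', loss='mse')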

autoencoder.fit(x_train, x_train,  # Notice: input = output
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))</code></pre><p>We use Mean Squared Error (MSE) as our loss function because we&#8217;re comparing the original and reconstructed images pixel by pixel. The smaller the average squared difference, the closer the reconstruction is to the original.</p><h3>Seeing the Results</h3><p>After training, we can generate reconstructions:</p><pre><code>generated = autoencoder.predict(x_test)</code></pre><p>Each row of <code>generated</code> is a 784-value vector, so reshape it to 28&#215;28 to view it as an image. The reconstructions are typically a little blurrier than the originals, because the network keeps only the patterns that make each number recognizable.</p><h2>Why This Matters</h2><p>Autoencoders might seem simple, but their ideas run through many cutting-edge AI technologies:</p><p><strong>Variational Autoencoders (VAEs)</strong> extend this concept to generate entirely new images, not just reconstruct existing ones.</p><p><strong>Generative Adversarial Networks (GANs)</strong> train differently, pitting a generator against a discriminator, but the generator plays a decoder-like role: it maps a latent vector to photorealistic faces, artwork, and more.</p><p><strong>Transformers like GPT and BERT</strong> reuse the encoder and decoder idea to understand and generate human language: the original Transformer pairs an encoder with a decoder, BERT keeps only the encoder, and GPT keeps only the decoder.</p><p><strong>Sequence-to-sequence models</strong> use the same encode-then-decode pattern to power machine translation, video prediction, and more.</p><h2>The Bigger Picture</h2><p>What makes autoencoders fascinating isn&#8217;t just what they do, it&#8217;s how they do it. By learning to compress and reconstruct data, they discover meaningful patterns without anyone labeling them.</p><p>That compressed middle layer (the latent space) becomes a rich representation of the input&#8217;s essential features. This is why autoencoders are so versatile: once you&#8217;ve learned a good compression, you can use it for classification, generation, denoising, and countless other tasks.</p><div><hr></div><h2>Want to Learn More?</h2><p>Autoencoders are just the beginning of generative AI. 
If you found this interesting, future posts will explore:</p><ul><li><p>Variational autoencoders and how they generate new content</p></li><li><p>GANs and the adversarial training process</p></li><li><p>Real-world applications in art, medicine, and beyond</p></li></ul><p><em>Have you experimented with autoencoders or generative AI? I&#8217;d love to hear about your experiences in the comments below.</em></p><div><hr></div><p><em>This article is part of a series exploring the foundations of modern AI. Subscribe to get the next post delivered to your inbox.</em></p>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Rajeev&#8217;s Substack.]]></description><link>https://rajeevranjansinha.substack.com/p/coming-soon</link><guid isPermaLink="false">https://rajeevranjansinha.substack.com/p/coming-soon</guid><dc:creator><![CDATA[Rajeev ranjan Sinha]]></dc:creator><pubDate>Thu, 22 May 2025 14:19:10 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!H3re!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89d4d189-d431-4288-87ec-ecaf7b91848e_144x144.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is Rajeev&#8217;s Substack.</p>]]></content:encoded></item></channel></rss>