🚀 Dev Day with Sam #3: CNN: Teaching Machines How to See 👀

I’m Samuel, a Cloud Engineer turned AI/ML Engineer, currently pursuing a Master’s in Artificial Intelligence. I share my journey building intelligent systems, exploring developer experience, and simplifying complex tech concepts.

There was a time I thought computers just knew what they were looking at.

You show it a cat… it says “cat.” Simple, right? Until I realized a computer doesn’t actually see anything. It sees numbers. Just pixels. And that’s where Convolutional Neural Networks (CNNs) come in.

🧠 So… What Really Is a CNN?
A Convolutional Neural Network (CNN) is a deep learning model designed to understand images by learning patterns (edges, shapes, textures) layer by layer.

Think of it like this: You don’t recognize a face all at once.

You notice edges → shapes → features → identity. CNNs do the same thing… but mathematically.

🔍 Step 1: Convolution: Where Learning Begins
At the heart of CNNs is the convolution operation. A small filter (kernel) slides across the image and extracts features.
Mathematically, in the sliding-window form most deep learning libraries compute:

Output(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)

Where:
I = Input image
K = Kernel (filter)
Output = Feature map
👉 This is how a CNN detects things like edges and textures
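
To make the sliding filter concrete, here is a minimal sketch in plain Python (no libraries, so it runs anywhere). The function name `conv2d`, the toy 4x4 image, and the hand-picked vertical-edge kernel are all my own assumptions for illustration. It assumes no padding and a stride of 1, and in a real CNN the kernel values are learned rather than written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the feature map of multiply-and-sum results."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply the kernel with the image patch under it, then sum.
            for m in range(kh):
                for n in range(kw):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Toy image (assumption): dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel (assumption): responds where brightness rises left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the edge lights up in the middle column
```

The feature map is only non-zero where the dark and bright halves meet, which is the "edge detection" idea in its smallest form.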

Step 2: Activation: Introducing Non-Linearity
After convolution, we apply an activation function:

f(x) = max(0, x)

This is ReLU (Rectified Linear Unit)
👉 It sets negative values to zero and passes positive values through unchanged
👉 Without this, stacked layers would collapse into a single linear map (and be useless for complex patterns)
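
A small sketch of ReLU on a feature map, again in plain Python. The `relu` helper and the sample numbers are assumptions I made up for illustration. The last few lines also show, with simple scalars, why two linear steps without an activation are no more expressive than one.

```python
def relu(feature_map):
    """Apply max(0, x) to every value in a 2D feature map."""
    return [[max(0, v) for v in row] for row in feature_map]

# Sample feature map (assumption) with a mix of negative and positive values.
activated = relu([[-3, 2], [0, -1]])
print(activated)  # [[0, 2], [0, 0]]

# Why non-linearity matters: two linear steps collapse into one.
x = 5
two_linear_steps = 3 * (2 * x)   # "layer 2" applied to "layer 1"
one_linear_step = (3 * 2) * x    # a single equivalent layer
print(two_linear_steps == one_linear_step)  # True
```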

🧊 Step 3: Pooling: Reducing Complexity
Pooling reduces the size of the feature map while keeping key information.
Example: Max Pooling, which keeps only the largest value in each small window (commonly 2×2)

👉 Keeps strongest signals
👉 Reduces computation
👉 Helps reduce overfitting
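
Here is a rough sketch of max pooling in plain Python. The name `max_pool2d`, the 2x2 window with stride 2, and the sample 4x4 feature map are assumptions chosen for illustration, not values from a real network.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + m][j + n]
                for m in range(size)
                for n in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Sample 4x4 feature map (assumption).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4x4 map shrinks to 2x2, so the next layer has a quarter of the values to process, while the strongest response in each region survives.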

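🧠 Step 4: Fully Connected Layer: Making Decisions
After extracting features, the CNN flattens the feature maps into a single vector and feeds it into a dense layer:
z = Wx + b

Where:
x = Flattened features
W = Weights
b = Bias
z = Class scores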

👉 This is where the model decides:
“Is this a cat… or not?”
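
A minimal sketch of the flatten and dense steps in plain Python. The helper names `flatten` and `dense`, and the weights and biases below, are made-up assumptions. A trained network would learn `W` and `b` from data.

```python
def flatten(feature_map):
    """Turn a 2D feature map into a flat list of values."""
    return [v for row in feature_map for v in row]

def dense(x, W, b):
    """Compute z = Wx + b, one score per row of W."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + b_k
        for w_row, b_k in zip(W, b)
    ]

x = flatten([[4, 5], [3, 7]])      # [4, 5, 3, 7]

# Hypothetical, untrained weights for two classes ("not cat", "cat").
W = [
    [0.5, 0.0, -0.5, 0.0],
    [0.0, 0.25, 0.0, 0.25],
]
b = [0.0, 1.0]

z = dense(x, W, b)
print(z)  # [0.5, 4.0]
```

The second score is larger, so with these toy weights the model leans toward the second class.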

🎯 Step 5: Output: Probability
Final output uses Softmax:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j)

👉 Converts raw scores into probabilities that sum to 1
👉 Helps the model choose the most likely class
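
A short sketch of softmax using only Python's `math` module. The input scores are made-up numbers. Subtracting the largest score before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(z)  # subtract the max so exp() never overflows
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class scores (assumption).
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs])  # roughly [0.659, 0.242, 0.099]
print(predicted_class)               # 0
```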

🔄 Putting It All Together
A CNN pipeline looks like this:
Image → Convolution → ReLU → Pooling → Fully Connected → Output
Or more abstractly:
Input → Feature Extraction → Dimensionality Reduction → Classification
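
To see the whole pipeline run end to end, here is a compact forward pass in plain Python. Everything in it is an illustrative assumption: the function names, the toy 5x5 image, the hand-picked kernel, and the untrained weights. A real CNN has many learned kernels and is built with a framework, but the order of operations is the same.

```python
import math

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def relu(fm):
    return [[max(0, v) for v in row] for row in fm]

def max_pool2d(fm, size=2):
    return [[max(fm[i + m][j + n] for m in range(size) for n in range(size))
             for j in range(0, len(fm[0]) - size + 1, size)]
            for i in range(0, len(fm) - size + 1, size)]

def dense(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, W, b):
    """Image -> Convolution -> ReLU -> Pooling -> Fully Connected -> Output."""
    features = max_pool2d(relu(conv2d(image, kernel)))
    x = [v for row in features for v in row]  # flatten
    return softmax(dense(x, W, b))

# Toy 5x5 image (assumption): dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]      # hand-picked vertical-edge detector
W = [[1.0, 0.0, 1.0, 0.0],       # hypothetical class 0: "edge on the left"
     [0.0, 1.0, 0.0, 1.0]]       # hypothetical class 1: "edge on the right"
b = [0.0, 0.0]

probs = forward(image, kernel, W, b)
print([round(p, 3) for p in probs])  # roughly [0.982, 0.018]
```

The 5x5 image becomes a 4x4 feature map after convolution, a 2x2 map after pooling, and a 4-value vector after flattening, which is why `W` has four columns.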

🧠 What Makes CNN Powerful?
CNNs don’t just memorize images…
They learn patterns hierarchically:
Early layers → edges
Middle layers → shapes
Deep layers → objects

This is why CNNs power applications like:
Face recognition
Medical imaging
Self-driving cars
Security systems

🛠️ Realization Moment
While studying CNNs, I had a moment: “This isn’t just image processing… this is perception.”
We’re teaching machines how to interpret the world.

🤝 Dev Day with Sam
This is what this journey is about.

Not just using models… But understanding what’s happening underneath.

We don’t just use AI
👉 We build it.