Sunday, May 24, 2026

Part 8B: The Robot Learns Names — Building a Face Recognition Attendance System

Quick recap before we dive in.

In Part 8A, we built Face Detection: the robot learned to find faces in a live video stream and draw a green box around them. It could answer “Is there a face here, and where is it?” — nothing more.

Now we go one level deeper. Face Detection was the prerequisite. This is the actual goal: Face Recognition — teaching the robot to answer “Whose face is this?”

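Before we move on, here’s what the detector from Part 8A, haarcascade_frontalface_default.xml, actually contains under the hood:

  • A cascade of stages. Each stage holds many small decision trees, and each tree tests one Haar-like feature (a light/dark rectangle pattern). Across all the stages that adds up to thousands of tests, and a region has to pass every stage to count as a face.
  • A base detection window of 24×24 pixels. detectMultiScale slides that window across the frame and repeats the search at a series of larger scales (each step set by scaleFactor), so faces bigger than 24×24 are found too.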
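To see what searching “at several scales” means in numbers, here is a small pure-Python sketch that lists the effective window sizes for a given scaleFactor. It assumes the 24-pixel window simply grows by scaleFactor each step until it no longer fits the shorter side of the frame. OpenCV actually shrinks the image rather than growing the window, which is equivalent for this purpose, and its exact stopping rules (minSize, maxSize) may differ. The helper name window_sizes is hypothetical.

```python
def window_sizes(base=24, scale_factor=1.3, frame_side=480):
    """Effective detector window sizes (pixels) for an image-pyramid search.

    Hypothetical helper: assumes the base window grows by scale_factor per
    step until it is larger than the shorter frame side.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    while base * factor <= frame_side:
        sizes.append(round(base * factor))
        factor *= scale_factor
    return sizes


# Dataset.py calls detectMultiScale(gray, 1.3, 5) on a 640x480 frame
print(window_sizes(24, 1.3, 480))  # 12 sizes, from 24 up to 430 pixels
```

A smaller scaleFactor (Recognition.py uses 1.2) means more, finer steps: slower, but less likely to skip a face that falls between two sizes.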

That file does the finding. What we’re building today does the recognizing.


The Plan

Three Python files. Three jobs. One attendance system.

Dataset.py   →   Training.py   →   Recognition.py
(collect)         (learn)            (identify)

Install what you need first:

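pip install opencv-contrib-python numpy

Install opencv-contrib-python instead of opencv-python, not alongside it. Both packages provide the same cv2 module and conflict when installed together, and only the contrib build includes the cv2.face module that Steps 2 and 3 rely on. If you already have opencv-python, run pip uninstall opencv-python first.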
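A quick way to confirm the right build is installed is to check for cv2.face before going any further. This is a small sketch; the helper name has_lbph_support is made up for this example.

```python
def has_lbph_support():
    """Return True if cv2 is importable and exposes cv2.face.LBPHFaceRecognizer_create."""
    try:
        import cv2
    except ImportError:
        return False  # OpenCV is missing or failed to load
    face_module = getattr(cv2, "face", None)  # only present in the contrib build
    return face_module is not None and hasattr(face_module, "LBPHFaceRecognizer_create")


if __name__ == "__main__":
    if has_lbph_support():
        print("OK: cv2.face is available (contrib build).")
    else:
        print("Missing cv2.face: install opencv-contrib-python.")
```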

Step 1 — Collect Face Data (Dataset.py)

Before the robot can recognize anyone, it needs to study them. We’re going to capture 30 photos of each person, crop out just the face, and store those grayscale crops in a folder called dataset. Small changes in head angle and expression during capture give the model more variety to learn from.

Create a new Python file called Dataset.py and paste this in:

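import cv2
import os

# Haar cascade that ships with OpenCV (used only to find the face region)
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

NUM_PHOTOS = 30  # Face crops to collect per person
os.makedirs("dataset", exist_ok=True)

while True:
    face_id = input("\n Enter user ID (e.g. 1, 2, 3...) and press Enter: ").strip()
    # Training.py parses the ID back out of the file name as an integer
    if not face_id.isdigit() or int(face_id) < 1:
        print(" [WARN] The ID must be a whole number of 1 or more. Try again.")
        continue
    face_id = int(face_id)
    print("\n [INFO] Starting camera. Look at the camera; small head movements are fine...")

    cam = cv2.VideoCapture(0)
    cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    count = 0
    while count < NUM_PHOTOS:
        ret, img = cam.read()
        if not ret:  # Camera unavailable or frame dropped
            print(" [ERROR] Could not read from the camera.")
            break
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

        # Keep only one person in front of the camera: every face found
        # in the frame is saved under the same ID.
        for x, y, w, h in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
            count += 1
            # Save just the grayscale face region as dataset/User.<ID>.<shot>.jpg
            cv2.imwrite(
                f"dataset/User.{face_id}.{count}.jpg",
                gray[y : y + h, x : x + w],
            )
            if count >= NUM_PHOTOS:
                break

        cv2.putText(img, f"ID: {face_id} | Photo: {count}/{NUM_PHOTOS}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("image", img)

        # Wait ~100 ms between frames; ESC (key code 27) stops this person early
        if (cv2.waitKey(100) & 0xFF) == 27:
            break

    cam.release()
    cv2.destroyAllWindows()

    print(f"\n [INFO] Done. {count} photo(s) collected for ID: {face_id}")
    next_person = input("\n Continue with another person? (Y/N): ").strip().upper()
    if next_person != "Y":
        print("\n [INFO] Exiting.")
        break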

Hit Run. When the terminal asks “Enter user ID”, type 1 and press Enter. The camera opens, and a counter in the top-left corner of the live feed shows how many photos have been taken. Look at the camera and let it work; small head movements add useful variety. Press ESC if you need to stop early.

After 30 photos, the program asks: “Continue with another person? (Y/N)” — type Y to register the next person (enter ID 2), or N to stop.

What you’ll find in the dataset folder afterward:

dataset/
├── User.1.1.jpg
├── User.1.2.jpg
├── User.1.3.jpg
...
├── User.2.1.jpg
├── User.2.2.jpg
...

The naming format is User.[ID].[photo number].jpg. Person 1’s photos are User.1.1 through User.1.30. Person 2’s are User.2.1 through User.2.30. The numbers after the ID are just the shot counter — not a sub-ID.
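Because the ID and shot number live in the file name, they can be read back with a simple split, which is exactly what the training step relies on. The sketch below (helper names are hypothetical) parses that format and counts photos per ID. Running it on os.listdir("dataset") is a handy sanity check that each person really has 30 images before you train.

```python
from collections import Counter


def parse_dataset_name(filename):
    """Split 'User.<ID>.<shot>.jpg' into (ID, shot) as integers.

    Hypothetical helper mirroring the naming scheme used by Dataset.py.
    """
    parts = filename.split(".")
    if len(parts) != 4 or parts[0] != "User" or parts[3].lower() != "jpg":
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(parts[1]), int(parts[2])


def photos_per_id(filenames):
    """Count how many shots each ID has, ignoring files that don't match."""
    counts = Counter()
    for name in filenames:
        try:
            user_id, _shot = parse_dataset_name(name)
        except ValueError:
            continue  # e.g. .DS_Store or Thumbs.db
        counts[user_id] += 1
    return dict(counts)


files = ["User.1.1.jpg", "User.1.2.jpg", "User.2.1.jpg", ".DS_Store"]
print(photos_per_id(files))  # {1: 2, 2: 1}
```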

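Each image is a grayscale crop of the detected face. The code doesn’t resize it, so its size matches whatever box the detector drew: a face that fills more of the frame gives a bigger crop. That’s fine for LBPH, which builds its histograms over a fixed grid of cells whatever the image size. Thirty of them per person. This is the raw material the training step needs.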


Step 2 — Train the Model (Training.py)

Install one more library:

pip install Pillow

Now for the interesting part. We’re going to feed all those face photos into LBPH (Local Binary Patterns Histograms), one of the face recognizers in the cv2.face module that comes with opencv-contrib-python. LBPH describes each face by the texture pattern around every pixel, summarizes those patterns as histograms, and saves everything it learned into a single file: trainer.yml.

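Think of trainer.yml as the robot’s memory. Once it exists, the robot no longer needs to look at the photos: the file stores one pattern histogram per training photo, each tagged with that person’s ID. Because it keeps one entry per photo, the file grows as you add people or photos.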
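To give a feel for what LBPH “studies”, here is a simplified NumPy sketch of the idea: each pixel gets an 8-bit code from comparing its eight neighbours with it, the image is split into a grid of cells, and the per-cell histograms of those codes are concatenated into one feature vector. This is an illustration, not OpenCV’s implementation. OpenCV’s LBPHFaceRecognizer samples a circular neighbourhood (by default radius 1, 8 neighbours, 8×8 grid), while this sketch uses the plain 3×3 square neighbourhood. The function names are hypothetical.

```python
import numpy as np


def lbp_codes(gray):
    """Basic 3x3 Local Binary Pattern codes for a 2-D grayscale array.

    Each interior pixel becomes an 8-bit number: bit k is 1 when the k-th
    neighbour is >= the centre pixel. Border pixels are dropped.
    """
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbph_descriptor(gray, grid=(8, 8)):
    """Concatenate normalised 256-bin histograms of LBP codes over a grid of cells."""
    codes = lbp_codes(gray)
    hists = []
    for row in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(row, grid[1], axis=1):
            hist, _ = np.histogram(cell, bins=256, range=(0, 256))
            hists.append(hist / max(cell.size, 1))  # normalise so cell size doesn't matter
    return np.concatenate(hists)


face = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)
print(lbph_descriptor(face).shape)  # (16384,) -> 8 x 8 cells x 256 bins
```

In this picture, “training” amounts to storing one such vector per photo along with its ID, which is why the model file grows with the dataset.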

Create Training.py:

import os

import cv2
import numpy as np
from PIL import Image

path = "dataset"
# LBPH recognizer from the cv2.face module (requires opencv-contrib-python)
recognizer = cv2.face.LBPHFaceRecognizer_create()
os.makedirs("trainer", exist_ok=True)


def getImagesAndLabels(path):
    """Load every dataset/User.<ID>.<shot>.jpg crop and return (faces, ids)."""
    faceSamples = []
    ids = []

    for filename in os.listdir(path):
        if not filename.lower().endswith(".jpg"):
            continue  # Skip stray files such as .DS_Store or Thumbs.db

        imagePath = os.path.join(path, filename)
        PIL_img = Image.open(imagePath).convert("L")  # Convert to grayscale
        img_numpy = np.array(PIL_img, "uint8")

        # "User.2.17.jpg" -> ["User", "2", "17", "jpg"] -> label 2
        face_id = int(filename.split(".")[1])

        # Dataset.py already saved tight face crops, so use each image as-is.
        # Re-running the Haar detector on a tight crop can miss the face and
        # silently drop training samples.
        faceSamples.append(img_numpy)
        ids.append(face_id)

    return faceSamples, ids


print("\n [INFO] Training face data. Please wait...")
faces, ids = getImagesAndLabels(path)
if not faces:
    raise SystemExit(" [ERROR] No face images found in 'dataset/'. Run Dataset.py first.")

# LBPH expects 32-bit integer labels
recognizer.train(faces, np.array(ids, dtype=np.int32))

# Save the trained model as a .yml file
recognizer.write("trainer/trainer.yml")
print(
    f"\n [INFO] {len(np.unique(ids))} face(s) trained. Saved to trainer/trainer.yml"
)

Run it. You’ll see “Training face data. Please wait…”; give it a moment, and then “X face(s) trained. Saved to trainer/trainer.yml”, where X is the number of distinct IDs (people), not the number of photos.

That’s it. Training complete. The robot now has memories.


Step 3 — Recognize and Take Attendance (Recognition.py)

Last file. This is where it all comes together.

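The robot opens the camera, scans for faces, and compares each detected face against trainer.yml. LBPH’s predict() returns the closest ID plus a distance, where lower means a better match. The code turns that into a rough score (100 minus the distance), and if the score is at least 70 (a distance of about 30 or less) it marks that person present and won’t count them again. The score is a heuristic, not a true percentage, so treat 70 as a starting point to tune for your camera and lighting. Once everyone is checked in, the system keeps running until you press ESC.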
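Under the hood, LBPH prediction is a nearest-neighbour search: the query face’s histogram is compared with every stored histogram, and the ID of the closest one comes back together with its distance. The sketch below imitates that with a symmetric chi-square distance on small made-up histograms, then applies the same “100 minus distance, accept at 70” rule as Recognition.py. OpenCV’s exact distance variant and scaling may differ, and chi_square, predict_label, and is_present are hypothetical names.

```python
import numpy as np


def chi_square(a, b):
    """Symmetric chi-square distance between two histograms (0 = identical)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    mask = denom > 0  # skip bins that are empty in both histograms
    return float(np.sum((a[mask] - b[mask]) ** 2 / denom[mask]))


def predict_label(query, train_hists, train_labels):
    """Return (label, distance) of the closest stored histogram."""
    distances = [chi_square(query, h) for h in train_hists]
    best = int(np.argmin(distances))
    return train_labels[best], distances[best]


def is_present(distance, threshold=70):
    """Same rule as Recognition.py: score = 100 - distance, accept if >= threshold."""
    return round(100 - distance) >= threshold


train_hists = [[10, 0, 5], [0, 12, 3]]  # toy "stored" histograms
train_labels = [1, 2]
label, dist = predict_label([9, 1, 5], train_hists, train_labels)
print(label, round(dist, 3), is_present(dist))  # 1 1.053 True
```

The toy distances here are tiny, so the score is high. Real distances depend on image size, grid settings, and lighting, which is why the threshold is worth tuning rather than trusting blindly.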

Create Recognition.py:

import cv2

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("trainer/trainer.yml")

cascadePath = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
faceCascade = cv2.CascadeClassifier(cascadePath)

font = cv2.FONT_HERSHEY_SIMPLEX
# List index = user ID from Dataset.py; index 0 is a placeholder since IDs start at 1
names = ["None", "User.1", "User.2", "User.3"]  # Match your registered IDs

cam = cv2.VideoCapture(0)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Ignore detections smaller than 10% of the frame's width/height
minW = 0.1 * cam.get(cv2.CAP_PROP_FRAME_WIDTH)
minH = 0.1 * cam.get(cv2.CAP_PROP_FRAME_HEIGHT)

CONFIDENCE_THRESHOLD = 70  # Minimum score (100 - LBPH distance) to accept a match
attended = set()            # IDs already marked present
message = ""                # On-screen notification text
message_timer = 0           # How many frames to keep showing the notification


def name_for(user_id):
    """Look up a display name, falling back to 'Unknown' for unlisted IDs."""
    return names[user_id] if 0 <= user_id < len(names) else "Unknown"


while True:
    ret, img = cam.read()
    if not ret:  # Camera unavailable or frame dropped
        print("\n [ERROR] Could not read from the camera.")
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.2,
        minNeighbors=5,
        minSize=(int(minW), int(minH)),
    )

    for x, y, w, h in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # predict() returns (label, distance); a LOWER distance means a closer match
        user_id, distance = recognizer.predict(gray[y : y + h, x : x + w])
        # Rough score, not a real probability; clamped so it never goes negative
        score = max(0, round(100 - distance))

        if score >= CONFIDENCE_THRESHOLD:
            name = name_for(user_id)

            if user_id not in attended:
                # First time this person is recognized: mark attendance
                attended.add(user_id)
                message = f"{name} checked in! Next person..."
                message_timer = 60  # Roughly 2 seconds at 30 frames per second
                print(f"\n [INFO] {name} marked present. Score: {score}%")

            label = f"{name} ({score}%) - Present"
        else:
            label = f"Unknown ({score}%)"

        cv2.putText(img, label, (x + 5, y - 5), font, 0.8, (255, 255, 255), 2)

    # Show attendance notification at the top of the screen
    if message_timer > 0:
        cv2.putText(img, message, (10, 40), font, 0.8, (0, 255, 0), 2)
        message_timer -= 1

    # Show attendance summary at the bottom
    attended_names = [name_for(i) for i in sorted(attended)]
    summary = "Present: " + (", ".join(attended_names) if attended_names else "None yet")
    cv2.putText(img, summary, (10, img.shape[0] - 15), font, 0.6, (0, 200, 255), 1)

    cv2.imshow("Camera - Attendance", img)

    if (cv2.waitKey(10) & 0xFF) == 27:  # ESC to quit
        break

cam.release()
cv2.destroyAllWindows()

attended_names = [name_for(i) for i in sorted(attended)]
print("\n [INFO] Attendance session ended.")
print(f" [INFO] Summary: {len(attended)} person(s) present: {', '.join(attended_names) or 'none'}")

Run it and walk in front of the camera. If your face is in the training data and your score clears 70, the screen shows “User.1 checked in! Next person...” for roughly 2 seconds (the notification lasts 60 frames, so the exact time depends on your camera’s frame rate). The bottom of the screen keeps a running list of everyone marked present, and when you press ESC the terminal prints the final summary.
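Counting frames only approximates “2 seconds”: a camera running at 15 fps would keep the message up for about 4. If you want a fixed on-screen time, one option is to track a deadline with a monotonic clock instead. This is a hypothetical sketch (the class TimedMessage is not part of the code above); the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TimedMessage:
    """Show a message for a fixed number of seconds, independent of frame rate."""

    def __init__(self, duration=2.0, clock=time.monotonic):
        self.duration = duration
        self.clock = clock       # injectable for testing
        self.text = ""
        self.expires_at = 0.0

    def show(self, text):
        """Start displaying text for `duration` seconds from now."""
        self.text = text
        self.expires_at = self.clock() + self.duration

    def current(self):
        """Return the text while it is still active, else an empty string."""
        return self.text if self.clock() < self.expires_at else ""


fake_now = [0.0]
notice = TimedMessage(duration=2.0, clock=lambda: fake_now[0])
notice.show("User.1 checked in! Next person...")
print(repr(notice.current()))  # 'User.1 checked in! Next person...'
fake_now[0] = 2.5
print(repr(notice.current()))  # ''
```

In the main loop you would call notice.show(...) where message_timer is set, and draw notice.current() whenever it is non-empty.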

Each person is only counted once. No double-dipping.
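One caveat: as written, a single frame that clears CONFIDENCE_THRESHOLD is enough to mark someone present, so one lucky misidentification still counts. A common mitigation is to require several consecutive matching frames first. Here is a minimal, hypothetical sketch of that idea; the class name AttendanceTracker and the required_frames value are assumptions, not part of Recognition.py.

```python
class AttendanceTracker:
    """Mark an ID present only after it is matched in N consecutive frames."""

    def __init__(self, required_frames=5):
        self.required_frames = required_frames
        self.streaks = {}      # ID -> consecutive frames matched so far
        self.present = set()   # IDs already marked present (counted once)

    def update(self, matched_ids):
        """Feed the IDs recognised in one frame; return IDs newly marked present."""
        matched_ids = set(matched_ids)
        # Any ID not matched in this frame loses its streak
        self.streaks = {i: n for i, n in self.streaks.items() if i in matched_ids}
        newly_present = []
        for i in matched_ids:
            if i in self.present:
                continue
            self.streaks[i] = self.streaks.get(i, 0) + 1
            if self.streaks[i] >= self.required_frames:
                self.present.add(i)
                newly_present.append(i)
        return newly_present


tracker = AttendanceTracker(required_frames=3)
for frame_ids in [[1], [1, 2], [1], [2], [2], [2]]:
    for user_id in tracker.update(frame_ids):
        print(f"ID {user_id} marked present")  # ID 1, then later ID 2
```

One way to wire it in: collect the IDs that pass CONFIDENCE_THRESHOLD inside the per-face loop, call tracker.update() once per frame after that loop, and use tracker.present in place of the attended set.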


What We Just Built

Three files. About 200 lines of Python in total. And the result is a working face recognition attendance system, built on the same basic idea as the systems used in classrooms, offices, and access control.

The pipeline is:

Dataset.py collects raw face images → Training.py turns them into an LBPH model file (trainer.yml) → Recognition.py loads that model and runs live identification.

The robot no longer just sees that someone is there. It knows who is there.

That’s Part 8B done. Next up: Part 9 — Line Following with a Webcam.