Bart Dorsey

S3 Integration

All S3 interaction lives in photos.py — creating the client, validating uploads, uploading files, and generating pre-signed URLs.


Creating the S3 Client

Install boto3, the AWS SDK for Python. If you have already run pip install -r requirements.txt, you can skip this step:

pip install boto3

The client is created once at module load time using credentials from config.py:

backend/photos.py#L1-L32 on GitHub

# photos.py
from __future__ import annotations

import uuid
from typing import TYPE_CHECKING

import boto3
from botocore.exceptions import ClientError

from config import (
    AWS_ACCESS_KEY,
    AWS_SECRET_KEY,
    S3_ENDPOINT_URL,
    BUCKET_NAME,
)

if TYPE_CHECKING:
    from fastapi import UploadFile

s3 = boto3.client(
    "s3",
    endpoint_url=S3_ENDPOINT_URL,
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name="us-east-1",
)

The endpoint_url parameter is what makes this work with MinIO — point it at http://localhost:9000 instead of AWS, and boto3 talks to MinIO using the same S3 protocol.
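This guide does not show config.py, so the following is only a sketch of one way it might supply the four imported names: read them from environment variables and fall back to values suited to a local MinIO instance. The environment variable names, the load_settings helper, the "photos" bucket name, and the minioadmin credentials (MinIO's out-of-the-box defaults) are assumptions for illustration, not the project's actual configuration.

```python
# config.py (hypothetical sketch; the real file is not shown in this guide)
from __future__ import annotations

import os
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read S3 settings from a mapping, defaulting to a local MinIO setup."""
    return {
        # Assumed variable names and defaults; adjust to your deployment.
        "AWS_ACCESS_KEY": env.get("AWS_ACCESS_KEY", "minioadmin"),
        "AWS_SECRET_KEY": env.get("AWS_SECRET_KEY", "minioadmin"),
        "S3_ENDPOINT_URL": env.get("S3_ENDPOINT_URL", "http://localhost:9000"),
        "BUCKET_NAME": env.get("BUCKET_NAME", "photos"),
    }


# Expose module-level names so `from config import ...` works as in photos.py.
_settings = load_settings()
AWS_ACCESS_KEY = _settings["AWS_ACCESS_KEY"]
AWS_SECRET_KEY = _settings["AWS_SECRET_KEY"]
S3_ENDPOINT_URL = _settings["S3_ENDPOINT_URL"]
BUCKET_NAME = _settings["BUCKET_NAME"]
```

With a layout like this, pointing the application at a different S3-compatible endpoint would mean changing environment variables rather than code.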


Validating Uploads

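Before anything is uploaded to S3, the file is validated for type and size. This is a critical security step: never trust files from clients without checking them first. Keep in mind that content_type is the MIME type declared by the client, so the type check filters out honest mistakes but does not prove that the bytes are really an image.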

backend/photos.py#L35-L77 on GitHub

MAX_IMAGE_SIZE_BYTES = 5 * 1024 * 1024  # 5 MB
ALLOWED_IMAGE_TYPES = [
    "image/jpeg",
    "image/png",
    "image/gif",
    "image/webp",
]


def validate_image(image: UploadFile) -> bool:
    """Validate image type and size before uploading."""
    # Check MIME type
    if image.content_type not in ALLOWED_IMAGE_TYPES:
        print(f"Rejected: unsupported type {image.content_type}")
        return False

    # Measure file size without reading it into memory
    current_position = image.file.tell()
    image.file.seek(0, 2)          # Seek to end
    size = image.file.tell()
    image.file.seek(current_position)  # Return to original position

    if size > MAX_IMAGE_SIZE_BYTES:
        max_mb = MAX_IMAGE_SIZE_BYTES / (1024 * 1024)
        print(f"Rejected: {size} bytes exceeds {max_mb} MB limit")
        return False

    return True

Two checks are performed:

  1. Content type — rejects anything that isn’t an accepted image format
  2. File size — prevents denial-of-service via huge uploads (5 MB limit)

The size check uses file.seek() to find the end of the file without reading the content into memory, which keeps memory usage low even for large rejected files.
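The seek-and-tell technique can be tried in isolation. The sketch below pulls it into a standalone helper; stream_size is a hypothetical name, and io.BytesIO stands in for the uploaded file object, so this only illustrates the idea and is not code from photos.py.

```python
import io
import os
from typing import BinaryIO


def stream_size(fileobj: BinaryIO) -> int:
    """Return the size in bytes of a seekable file, preserving its position."""
    original = fileobj.tell()
    fileobj.seek(0, os.SEEK_END)  # os.SEEK_END == 2, as in seek(0, 2)
    size = fileobj.tell()         # offset of the end equals the total size
    fileobj.seek(original)        # leave the stream where we found it
    return size


buffer = io.BytesIO(b"x" * 1024)
buffer.read(100)                  # simulate a partially consumed stream
size = stream_size(buffer)
position_after = buffer.tell()
print(size, position_after)       # 1024 100
```

Restoring the original position matters because the same file object is later handed to the upload call, which reads from wherever the cursor currently sits.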


Uploading to S3

upload_photo validates the image, generates a unique object key, and streams the file to the S3 bucket. It returns the key on success and None on failure.

backend/photos.py#L95-L118 on GitHub

def upload_photo(image: UploadFile) -> str | None:
    """Upload an image to S3 and return the object key, or None on failure."""
    if not validate_image(image):
        return None

    # Prefix with UUID to prevent filename collisions between users
    photo_name = f"{uuid.uuid4()}_{image.filename}"

    try:
        s3.upload_fileobj(
            image.file,
            BUCKET_NAME,
            photo_name,
            ExtraArgs={"ContentType": image.content_type},
        )
    except ClientError as e:
        print(e)
        return None

    return photo_name

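A few things worth noting:

  1. Validation runs first: if validate_image() rejects the file, nothing is sent to S3.
  2. The UUID prefix keeps object keys unique, so two users uploading files with the same name do not overwrite each other.
  3. ContentType is passed through ExtraArgs so the stored object carries the image's MIME type.
  4. A ClientError is printed and converted into a None return value, so callers must check for None.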
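The key-naming step can also be isolated into a small helper. The sketch below is an assumption layered on top of the original code, not part of photos.py: make_object_key is a hypothetical name, and stripping directory components from the client-supplied filename is an optional hardening step that upload_photo as shown does not perform.

```python
from __future__ import annotations

import uuid
from pathlib import PurePosixPath


def make_object_key(filename: str | None) -> str:
    """Build a collision-resistant object key from a client-supplied filename."""
    # Treat backslashes as separators too, then keep only the last path component.
    normalized = (filename or "").replace("\\", "/")
    base_name = PurePosixPath(normalized).name or "upload"
    # A random UUID prefix makes two uploads of the same filename distinct.
    return f"{uuid.uuid4()}_{base_name}"


key_a = make_object_key("holiday.jpg")
key_b = make_object_key("holiday.jpg")
key_c = make_object_key("../../etc/holiday.jpg")
print(key_a != key_b)  # True: same filename, different keys
```

Because the database stores this key rather than a URL, whatever scheme is chosen here only needs to be unique; it does not need to be guessable or human-friendly.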


Pre-Signed URLs

Files in a private S3 bucket aren’t publicly accessible by URL. Pre-signed URLs are temporary, expiring links that grant access to a specific object:

backend/photos.py#L82-L92 on GitHub

def get_url(photo_name: str) -> str | None:
    """Generate a temporary pre-signed URL for accessing a photo."""
    try:
        return s3.generate_presigned_url(
            "get_object",
            ExpiresIn=604800,  # 7 days in seconds
            Params={"Bucket": BUCKET_NAME, "Key": photo_name},
        )
    except ClientError as e:
        print(e)
        return None

Pre-signed URLs are generated fresh each time a photo is requested — the database stores the S3 object key, not the URL, since URLs expire. After 7 days (604,800 seconds) the URL stops working and the client must request a new one.

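This approach is preferable to making your bucket publicly readable because the bucket itself stays private, each URL grants access to a single object, and that access expires on its own.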

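Note that this is a simplified approach. generate_presigned_url() computes the signature locally rather than calling S3, so generating a URL per request is cheap; you could still cache URLs in memory on the server for somewhat less than their seven-day lifetime to avoid re-signing on every request.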
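The caching idea might look roughly like the following. This is a minimal sketch under stated assumptions, not part of photos.py: PresignedUrlCache, its sign and clock parameters, and the fake signer used in the demonstration are all hypothetical names. In the application, sign would be get_url, and the TTL would be set a little below the 604,800-second expiry so a cached URL is never handed out just before it stops working.

```python
from __future__ import annotations

import time
from typing import Callable


class PresignedUrlCache:
    """Keep pre-signed URLs in memory and re-sign once the cached copy is stale."""

    def __init__(
        self,
        sign: Callable[[str], str | None],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sign = sign
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps photo name to (time after which the entry is stale, URL).
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, photo_name: str) -> str | None:
        now = self._clock()
        cached = self._entries.get(photo_name)
        if cached is not None and cached[0] > now:
            return cached[1]
        url = self._sign(photo_name)
        if url is not None:  # do not cache failures
            self._entries[photo_name] = (now + self._ttl, url)
        return url


# Demonstration with a fake signer and a controllable clock (both hypothetical).
calls: list[str] = []


def fake_sign(photo_name: str) -> str:
    calls.append(photo_name)
    return f"https://example.invalid/{photo_name}?sig={len(calls)}"


fake_now = [0.0]
cache = PresignedUrlCache(fake_sign, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("a.jpg")
second = cache.get("a.jpg")  # served from the cache, no new signature
fake_now[0] = 61.0
third = cache.get("a.jpg")   # entry is stale, so the URL is signed again
```

An in-memory dictionary like this is per-process and grows without bound, so a real deployment would likely want eviction or a shared cache; the sketch leaves that out to keep the idea visible.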