[AWS] Advanced Amazon S3

Nov 18, 2025

Contents

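S3 Lifecycle Rules
S3 Requester Pays
S3 Event Notifications
S3 Performance Optimization
S3 Batch Operations
S3 Storage Lens
Real-World Integration Scenarios
Best Practices Checklist
Cost Calculation Example
Troubleshooting Guide
Advanced Integration Patterns
Integrating S3 with Other AWS Services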

S3 Lifecycle Rules

Lifecycle rules automatically move objects to another storage class or delete them, which reduces storage cost


Upload
  ↓
Standard (day 0)
  ↓ 30 days
Standard-IA
  ↓ 60 days
Glacier Instant Retrieval
  ↓ 90 days
Glacier Flexible Retrieval
  ↓ 365 days
Glacier Deep Archive
  ↓ 2,555 days (7 years)
Delete

Transition Rules

Basic transition rule


{
  "Rules": [
    {
      "ID": "ArchiveOldData",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "documents/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER_IR"
        },
        {
          "Days": 180,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
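The same rule can be generated from code, which makes it easier to check the schedule before applying it. The following is a minimal sketch: `build_lifecycle_rule` is a hypothetical helper, not part of any AWS library, and the rule key is spelled `ID` because that is the spelling boto3 and the AWS CLI accept for lifecycle rules.

```python
def build_lifecycle_rule(rule_id, prefix, transitions, expire_days):
    """Build one lifecycle rule dict.

    transitions: list of (days, storage_class) tuples, coldest class last.
    """
    days = [d for d, _ in transitions]
    # Each transition must happen at a strictly later object age than the previous one
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("transition days must be strictly increasing")
    # Expiring before the last transition would make that transition pointless
    if days and expire_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": d, "StorageClass": c} for d, c in transitions],
        "Expiration": {"Days": expire_days},
    }


rule = build_lifecycle_rule(
    "ArchiveOldData",
    "documents/",
    [(30, "STANDARD_IA"), (90, "GLACIER_IR"), (180, "GLACIER"), (365, "DEEP_ARCHIVE")],
    2555,
)
config = {"Rules": [rule]}

# To apply it (needs AWS credentials and an existing bucket; the bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=config)
```

Validating the day ordering locally catches a mistyped schedule before the API call is made.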

Object size filter


{
  "Rules": [
    {
      "ID": "MoveSmallFiles",
      "Status": "Enabled",
      "Filter": {
        "And": {
          "Prefix": "images/",
          "ObjectSizeGreaterThan": 128000,
          "ObjectSizeLessThan": 10485760
        }
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        }
      ]
    }
  ]
}

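Notes:

Standard-IA and One Zone-IA bill a minimum of 128 KB per object, and Lifecycle does not transition smaller objects to them by default

Objects stored for less than 30 days are still billed for 30 days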
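The two billing floors above can be expressed as a small calculation. This is a sketch only: `ia_billable` is a hypothetical helper name, and the floors are the 128 KB and 30 day minimums stated in the notes.

```python
MIN_BILLABLE_BYTES = 128 * 1024  # 128 KB minimum billable object size
MIN_BILLABLE_DAYS = 30           # 30-day minimum storage duration


def ia_billable(size_bytes, days_stored):
    """Return the (bytes, days) that Standard-IA billing is based on."""
    return (
        max(size_bytes, MIN_BILLABLE_BYTES),
        max(days_stored, MIN_BILLABLE_DAYS),
    )


# A 10 KB object deleted after 5 days is billed as 128 KB for 30 days
billed_bytes, billed_days = ia_billable(10 * 1024, 5)
```

This is why small or short-lived objects are usually better left in Standard.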

Tag-based filter


{
  "Rules": [
    {
      "ID": "ArchiveByTag",
      "Status": "Enabled",
      "Filter": {
        "Tag": {
          "Key": "archive",
          "Value": "true"
        }
      },
      "Transitions": [
        {
          "Days": 0,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}

Lifecycle for versioned objects


{
  "Rules": [
    {
      "ID": "ManageVersions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "NoncurrentDays": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 365
      }
    }
  ]
}

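How it works:


file.txt (version 1) ← current version
  ↓ upload a new version
file.txt (version 2) ← current version
file.txt (version 1) ← noncurrent version (moves to Standard-IA 30 days after becoming noncurrent)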

Cleaning up incomplete multipart uploads


{
  "Rules": [
    {
      "ID": "CleanupIncompleteUploads",
      "Status": "Enabled",
      "Filter": {},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}

This rule aborts multipart uploads that failed or were abandoned, so the parts already uploaded stop accruing storage charges
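Before enabling such a rule, it can help to see which uploads it would remove. The sketch below filters entries shaped like the `Uploads` list that boto3's `list_multipart_uploads` returns (each with `Key`, `UploadId`, and a timezone-aware `Initiated` datetime). `stale_uploads` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def stale_uploads(uploads, days=7, now=None):
    """Return (key, upload_id) pairs for uploads started more than `days` ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [(u["Key"], u["UploadId"]) for u in uploads if u["Initiated"] < cutoff]


# With real data (needs AWS credentials; the bucket name is a placeholder):
# import boto3
# s3 = boto3.client("s3")
# uploads = s3.list_multipart_uploads(Bucket="my-bucket").get("Uploads", [])
# for key, upload_id in stale_uploads(uploads):
#     s3.abort_multipart_upload(Bucket="my-bucket", Key=key, UploadId=upload_id)
```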

Practical Examples

Log management


{
  "Rules": [
    {
      "ID": "LogRetention",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 60,
          "StorageClass": "GLACIER_IR"
        },
        {
          "Days": 90,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}

Backup policy


{
  "Rules": [
    {
      "ID": "BackupRetention",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "backups/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER_IR"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}

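S3 Requester Pays

Concept

The requester, rather than the bucket owner, pays the data transfer and request costs


Regular bucket:
The owner pays all costs

Requester Pays:
Owner: storage cost only
Requester: data transfer + request costs

Enabling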


aws s3api put-bucket-request-payment \
  --bucket my-public-dataset \
  --request-payment-configuration Payer=Requester

Usage

The requester must explicitly acknowledge the charge on every request, and requests must be authenticated


# CLI
aws s3 cp s3://my-public-dataset/data.csv . \
  --request-payer requester

# SDK (Python)
import boto3

s3 = boto3.client('s3')
s3.get_object(
    Bucket='my-public-dataset',
    Key='data.csv',
    RequestPayer='requester'
)

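Use cases

Publishing public datasets:


A research institution shares a large dataset
→ The owner pays only for storage
→ Users pay for their downloads

Sharing data between partners:


Company A provides the data
Companies B, C, and D each download what they need
→ Each company pays for its own usage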

S3 Event Notifications

Concept

S3 sends events that occur in a bucket to other AWS services


S3 bucket
  ↓ event occurs
[SNS Topic] or [SQS Queue] or [Lambda Function]

Supported event types

Object creation:

s3:ObjectCreated:* (all creation events)

s3:ObjectCreated:Put

s3:ObjectCreated:Post

s3:ObjectCreated:Copy

s3:ObjectCreated:CompleteMultipartUpload

Object removal:

s3:ObjectRemoved:*

s3:ObjectRemoved:Delete

s3:ObjectRemoved:DeleteMarkerCreated

Object restore:

s3:ObjectRestore:Post

s3:ObjectRestore:Completed

Replication:

s3:Replication:OperationFailedReplication

s3:Replication:OperationMissedThreshold

Processing events with Lambda

Configuration


{
  "LambdaFunctionConfigurations": [
    {
      "Id": "ProcessNewImages",
      "LambdaFunctionArn": "arn:aws:lambda:region:account:function:ProcessImage",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "uploads/"
            },
            {
              "Name": "suffix",
              "Value": ".jpg"
            }
          ]
        }
      }
    }
  ]
}
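To reason about which keys a configuration like this would trigger on, the prefix and suffix rules can be evaluated locally. The sketch below mirrors the `FilterRules` shape from the configuration; `matches_filter` is a hypothetical helper and not an AWS API.

```python
def matches_filter(key, filter_rules):
    """Return True if the key satisfies every prefix/suffix rule."""
    for rule in filter_rules:
        name, value = rule["Name"].lower(), rule["Value"]
        if name == "prefix" and not key.startswith(value):
            return False
        if name == "suffix" and not key.endswith(value):
            return False
    return True


RULES = [
    {"Name": "prefix", "Value": "uploads/"},
    {"Name": "suffix", "Value": ".jpg"},
]

for candidate in ["uploads/cat.jpg", "uploads/cat.png", "thumbnails/cat.jpg"]:
    print(candidate, matches_filter(candidate, RULES))
```

Checking that the function's own output keys do not match the filter is a quick way to rule out recursive invocations.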

Lambda function example (thumbnail generation)


import boto3
from io import BytesIO
from urllib.parse import unquote_plus

from PIL import Image

s3 = boto3.client('s3')

def lambda_handler(event, context):
    created = []

    # One event can carry several records, so handle each of them
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys in S3 events are URL-encoded (spaces arrive as '+')
        key = unquote_plus(record['s3']['object']['key'])

        # Download the original image
        response = s3.get_object(Bucket=bucket, Key=key)
        image = Image.open(BytesIO(response['Body'].read()))

        # Create a thumbnail that fits within 200x200, keeping the aspect ratio
        image.thumbnail((200, 200))

        # Save to an in-memory buffer; JPEG output needs RGB mode
        buffer = BytesIO()
        image.convert('RGB').save(buffer, 'JPEG')
        buffer.seek(0)

        # Upload under thumbnails/ so the uploads/ filter does not fire again
        thumbnail_key = key.replace('uploads/', 'thumbnails/', 1)
        s3.put_object(
            Bucket=bucket,
            Key=thumbnail_key,
            Body=buffer,
            ContentType='image/jpeg'
        )
        created.append(thumbnail_key)

    return {
        'statusCode': 200,
        'body': f'Thumbnails created: {created}'
    }

Sending notifications with SNS


{
  "TopicConfigurations": [
    {
      "Id": "NotifyOnUpload",
      "TopicArn": "arn:aws:sns:region:account:my-topic",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "important-docs/"
            }
          ]
        }
      }
    }
  ]
}

Through SNS, notifications can be delivered to email, SMS, Lambda, and other subscribers

Queueing with SQS


{
  "QueueConfigurations": [
    {
      "Id": "QueueForProcessing",
      "QueueArn": "arn:aws:sqs:region:account:my-queue",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}

Worker processes poll the SQS queue and process the messages

EventBridge integration (recommended)

EventBridge offers more features and flexibility.


# Enable EventBridge
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration '{
    "EventBridgeConfiguration": {}
  }'

Create an EventBridge rule (event pattern field names are lowercase):


{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["my-bucket"]
    },
    "object": {
      "key": [{
        "prefix": "uploads/"
      }]
    }
  }
}

Advantages:

More event filtering options

Delivery to multiple targets at once

Archive and replay

Complex conditions expressed as JSON rules
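To illustrate how such a JSON pattern selects events, here is a simplified local matcher. It is a sketch that supports only exact values and the `prefix` operator, not the full EventBridge pattern language. `matches` is a hypothetical helper, and the sample event keeps only the fields the pattern looks at.

```python
def _match_one(condition, actual):
    # {"prefix": "..."} is a prefix test; anything else is an exact comparison
    if isinstance(condition, dict) and "prefix" in condition:
        return isinstance(actual, str) and actual.startswith(condition["prefix"])
    return condition == actual


def matches(pattern, event):
    """Return True if every field in the pattern matches the event."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Nested object: recurse into the matching part of the event
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif not any(_match_one(cond, actual) for cond in expected):
            # A list means "any of these conditions"
            return False
    return True


PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["my-bucket"]},
        "object": {"key": [{"prefix": "uploads/"}]},
    },
}

sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "my-bucket"}, "object": {"key": "uploads/a.jpg"}},
}
print(matches(PATTERN, sample_event))
```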

Real-world use cases

1. Automated data processing pipeline


S3 Upload (raw-data/)
    ↓ Event
Lambda: validate data
    ↓
S3 (validated-data/)
    ↓ Event
Lambda: transform data
    ↓
S3 (processed-data/)
    ↓ Event
Glue Job: update the data catalog

2. Backup notifications


S3 Upload (backups/)
    ↓ Event
SNS Topic
    ↓
Email: "Backup completed"

3. Malware scanning


S3 Upload (uploads/)
    ↓ Event
Lambda: virus scan
    ↓
If safe:
    S3 (safe-files/)
    SNS: "File approved"
If dangerous:
    Delete from S3
    SNS: "Malicious file blocked"

S3 Performance Optimization

1. Prefix parallelization

S3 supports high request rates per prefix:

3,500 PUT/COPY/POST/DELETE requests per second

5,500 GET/HEAD requests per second


❌ Single prefix (limited):
/data/file001.jpg
/data/file002.jpg
/data/file003.jpg
→ 5,500 GET/s

✅ Multiple prefixes (scales out):
/data/partition-0/file001.jpg
/data/partition-1/file002.jpg
/data/partition-2/file003.jpg
→ 16,500 GET/s (3 prefixes)

Date-based partitioning


/logs/2024/11/19/00/app.log
/logs/2024/11/19/01/app.log
/logs/2024/11/19/02/app.log

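Hash-based partitioning


import hashlib

def get_partition(file_id):
    # Use a hash of the file ID to pick the partition
    hash_value = hashlib.md5(file_id.encode()).hexdigest()
    partition = hash_value[:2]  # first 2 hex characters (00-ff, 256 values)
    return f"data/{partition}/{file_id}"

# Example
file_id = "user12345_photo.jpg"
key = get_partition(file_id)
# → data/<2 hex characters>/user12345_photo.jpg

Spreading keys across 256 prefixes raises the aggregate request ceiling:

GET: 1,408,000 requests per second (256 × 5,500)

PUT: 896,000 requests per second (256 × 3,500)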
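The figures above are simply the per-prefix rates multiplied by the number of prefixes. A small sketch of that arithmetic follows; the function names are hypothetical.

```python
GET_PER_PREFIX = 5500  # GET/HEAD requests per second per prefix
PUT_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE requests per second per prefix


def aggregate_limits(prefix_count):
    """Aggregate request ceiling when load is spread evenly over the prefixes."""
    return {
        "get": prefix_count * GET_PER_PREFIX,
        "put": prefix_count * PUT_PER_PREFIX,
    }


def prefixes_needed(target_get_rps):
    """Smallest number of prefixes whose combined GET ceiling covers the target."""
    return -(-target_get_rps // GET_PER_PREFIX)  # ceiling division


print(aggregate_limits(256))
print(prefixes_needed(16500))
```

These are ceilings under an even key distribution; a skewed hash or a hot key will reach the per-prefix limit sooner.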

2. Multipart Upload

Split a large file into parts and upload them in parallel


import boto3
from boto3.s3.transfer import TransferConfig

# Settings
MB = 1024 ** 2
config = TransferConfig(
    multipart_threshold=100 * MB,   # use multipart for files of 100 MB or more
    multipart_chunksize=100 * MB,   # 100 MB parts
    max_concurrency=10,             # up to 10 parts uploaded in parallel
    use_threads=True
)

# Upload
s3_client = boto3.client('s3')
s3_client.upload_file(
    'large-file.bin',
    'my-bucket',
    'large-file.bin',
    Config=config
)

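Recommendations:

100 MB or larger: multipart recommended

Larger than 5 GB: multipart required (a single PUT cannot exceed 5 GB)

Part size: 5 MB to 5 GB (the last part may be smaller), with at most 10,000 parts per upload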
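The part-size limits interact with the 10,000-part cap on a multipart upload, so very large files need larger parts. The sketch below picks a part size that respects both; `choose_part_size` is a hypothetical helper, and the 100 MB preferred size is an assumption carried over from the configuration above.

```python
MIB = 1024 ** 2
MIN_PART = 5 * MIB          # smallest allowed part (except the last one)
MAX_PART = 5 * 1024 * MIB   # largest allowed part
MAX_PARTS = 10_000          # maximum number of parts in one multipart upload


def choose_part_size(file_size, preferred=100 * MIB):
    """Return a part size in bytes that keeps the upload within 10,000 parts."""
    # Smallest part size that fits the file into MAX_PARTS parts (ceiling division)
    needed = -(-file_size // MAX_PARTS)
    part = max(preferred, MIN_PART, needed)
    if part > MAX_PART:
        raise ValueError("file is too large for a single multipart upload")
    return part


one_gib = 1024 ** 3
print(choose_part_size(one_gib) // MIB, "MiB parts for a 1 GiB file")
```

The result can be passed as `multipart_chunksize` in `TransferConfig`.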

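3. Transfer Acceleration

Transfer Acceleration speeds up long-distance transfers by routing them through CloudFront edge locations


Regular upload (illustrative: 50 Mbps):
Client (Seoul) ─────────────→ S3 (Virginia)
   over the public internet

Transfer Acceleration (illustrative: 200 Mbps):
Client (Seoul) → Edge (Seoul) ═══AWS backbone═══→ S3 (Virginia)


# Enable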
aws s3api put-bucket-accelerate-configuration \
  --bucket my-bucket \
  --accelerate-configuration Status=Enabled

# Upload (using the accelerate endpoint; the CLI adds the bucket name to the host)
aws s3 cp large-file.zip \
  s3://my-bucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com

Cost:

Additional charge: $0.04 to $0.08/GB

Speedup of roughly 30-500%, depending on distance, object size, and network conditions

Test tool:


https://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html

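4. S3 Select

S3 Select runs a SQL expression against the contents of an object and returns only the matching data. AWS closed S3 Select to new customers in July 2024; existing customers can continue to use it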


import boto3

s3 = boto3.client('s3')

# Select only specific columns and rows from a CSV
# (S3 Select does not support GROUP BY, so totals are computed client-side)
response = s3.select_object_content(
    Bucket='my-bucket',
    Key='sales-data.csv',
    ExpressionType='SQL',
    Expression="""
        SELECT s.product_name, s.revenue
        FROM S3Object s
        WHERE s.region = 'Asia'
    """,
    InputSerialization={
        'CSV': {
            'FileHeaderInfo': 'USE',
            'FieldDelimiter': ','
        }
    },
    OutputSerialization={
        'CSV': {}
    }
)

# Process the results
for event in response['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
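If per-product totals are needed, one option is to aggregate the streamed records on the client. The sketch below assumes the query returns two CSV columns, product name and revenue, and that the payload chunks were collected with something like `[e['Records']['Payload'] for e in response['Payload'] if 'Records' in e]`. `total_revenue_by_product` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def total_revenue_by_product(payload_chunks):
    """Sum revenue per product from S3 Select CSV output chunks (bytes)."""
    # A record can be split across two chunks, so join the bytes before parsing
    text = b"".join(payload_chunks).decode("utf-8")
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        product, revenue = row
        totals[product] += float(revenue)
    return dict(totals)


# Example with hand-made chunks; note "gadget" is split across the chunk boundary
chunks = [b"widget,10.5\ngad", b"get,4\nwidget,2\n"]
print(total_revenue_by_product(chunks))
```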

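Performance gains:

Less data transferred, because filtering runs inside S3

AWS cites up to 400% faster queries and up to 80% lower query cost

Billing is based on the data scanned and returned

Supported formats:

CSV

JSON

Parquet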

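S3 Batch Operations

Run an operation across a large number of objects in a single job

Supported operations

Copy objects

Set/replace tags

Set ACLs

Change storage class

Invoke a Lambda function

Set Object Lock

Restore (Glacier)

Creating a job

1. Create a manifest file (CSV, one bucket,key pair per line, no header row)


my-bucket,data/file1.txt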
my-bucket,data/file2.txt
my-bucket,data/file3.txt
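A manifest like this can be generated from a list of keys. The following sketch writes bucket,key rows without a header and URL-encodes each key, which the CSV manifest format expects for keys containing spaces or special characters. `build_manifest` is a hypothetical helper, not an AWS API.

```python
import csv
import io
from urllib.parse import quote


def build_manifest(bucket, keys):
    """Return CSV manifest text: one 'bucket,key' row per object, no header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for key in keys:
        # Keep '/' readable but percent-encode spaces and other special characters
        writer.writerow([bucket, quote(key, safe="/")])
    return out.getvalue()


manifest = build_manifest("my-bucket", ["data/file1.txt", "data/my file.txt"])
print(manifest)
```

The resulting text is uploaded to S3, and its object ARN and ETag go into the job's manifest location.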

Or use an S3 Inventory report:


# Configure Inventory
aws s3api put-bucket-inventory-configuration \
  --bucket my-bucket \
  --id daily-inventory \
  --inventory-configuration file://inventory-config.json

2. Create the batch job


{
  "Operation": {
    "S3PutObjectTagging": {
      "TagSet": [
        {
          "Key": "processed",
          "Value": "true"
        },
        {
          "Key": "batch-date",
          "Value": "2024-11-19"
        }
      ]
    }
  },
  "Report": {
    "Enabled": true,
    "Bucket": "arn:aws:s3:::my-batch-reports",
    "Format": "Report_CSV_20180820",
    "ReportScope": "AllTasks"
  },
  "Manifest": {
    "Spec": {
      "Format": "S3BatchOperations_CSV_20180820",
      "Fields": ["Bucket", "Key"]
    },
    "Location": {
      "ObjectArn": "arn:aws:s3:::my-bucket/manifest.csv",
      "ETag": "..."
    }
  },
  "Priority": 10,
  "RoleArn": "arn:aws:iam::account-id:role/BatchOperationsRole"
}

Practical examples

1. Bulk storage class change


aws s3control create-job \
  --account-id 123456789012 \
  --operation '{
    "S3PutObjectCopy": {
      "TargetResource": "arn:aws:s3:::my-bucket",
      "StorageClass": "GLACIER"
    }
  }' \
  --manifest file://manifest.json \
  --report file://report-config.json \
  --priority 10 \
  --role-arn arn:aws:iam::123456789012:role/BatchRole

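2. Custom processing with Lambda


import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Invoked by Batch Operations with one task per invocation
    task = event['tasks'][0]
    bucket = task['s3BucketArn'].split(':::')[1]
    key = unquote_plus(task['s3Key'])  # keys arrive URL-encoded

    # Update the object metadata by copying the object onto itself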
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={'Bucket': bucket, 'Key': key},
        Metadata={
            'processed-date': '2024-11-19',
            'processed-by': 'batch-operations'
        },
        MetadataDirective='REPLACE'
    )

    return {
        'invocationSchemaVersion': '1.0',
        'treatMissingKeysAs': 'PermanentFailure',
        'invocationId': event['invocationId'],
        'results': [{
            'taskId': task['taskId'],
            'resultCode': 'Succeeded',
            'resultString': 'Metadata updated'
        }]
    }

3. Bulk restore (temporary restored copies of Glacier objects)


{
  "Operation": {
    "S3InitiateRestoreObject": {
      "ExpirationInDays": 7,
      "GlacierJobTier": "BULK"
    }
  }
}

Monitoring jobs


# Check job status
aws s3control describe-job \
  --account-id 123456789012 \
  --job-id 12345678-1234-1234-1234-123456789012

# List jobs
aws s3control list-jobs \
  --account-id 123456789012

# Cancel a job
aws s3control update-job-status \
  --account-id 123456789012 \
  --job-id 12345678-1234-1234-1234-123456789012 \
  --requested-job-status Cancelled

Analyzing the report

A completion report is generated when the batch job finishes


Bucket,Key,VersionId,TaskStatus,ErrorCode,ErrorMessage
my-bucket,data/file1.txt,,succeeded,,
my-bucket,data/file2.txt,,failed,NoSuchKey,Object not found
my-bucket,data/file3.txt,,succeeded,,
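A report in this shape can be scanned for failures with the standard `csv` module. This sketch assumes the report text includes the header row shown in the sample above (a real report may need the column names supplied explicitly). `failed_tasks` is a hypothetical helper.

```python
import csv
import io

REPORT = """Bucket,Key,VersionId,TaskStatus,ErrorCode,ErrorMessage
my-bucket,data/file1.txt,,succeeded,,
my-bucket,data/file2.txt,,failed,NoSuchKey,Object not found
my-bucket,data/file3.txt,,succeeded,,
"""


def failed_tasks(report_text):
    """Return (key, error_code) for every task that did not succeed."""
    reader = csv.DictReader(io.StringIO(report_text))
    return [
        (row["Key"], row["ErrorCode"])
        for row in reader
        if row["TaskStatus"] != "succeeded"
    ]


for key, code in failed_tasks(REPORT):
    print(f"{key}: {code}")
```

The failed keys can be written to a new manifest to retry only those objects.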

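S3 Storage Lens

A tool for visualizing and analyzing S3 usage

Key features

Storage usage analysis: by bucket, Region, and account; distribution by storage class

Cost optimization recommendations: identifying rarely used data; suggesting lifecycle policies

Data protection metrics: buckets without encryption; versioning not enabled; replication not configured

Access pattern analysis: read/write request counts; data transfer volume; error rates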

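Dashboard example


Total storage: 10.5 TB
Monthly cost: $240

By storage class:
- Standard: 4.2 TB (40%)
- Standard-IA: 3.1 TB (30%)
- Glacier: 3.2 TB (30%)

Recommendations:
⚠️ 60% of the objects in bucket 'old-data' have not been accessed for 90+ days
   → Moving them to Glacier would save $50/month

⚠️ Bucket 'public-assets' is not encrypted
   → Enabling SSE-S3 is recommended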

Enabling


aws s3control put-storage-lens-configuration \
  --account-id 123456789012 \
  --config-id default-lens \
  --storage-lens-configuration file://lens-config.json

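Advanced metrics (paid)

Adds metrics beyond the free tier:

More granular, detailed analysis

CloudWatch integration

15 months of data retention (14 days on the free tier)

Price: $0.20 per million objects per month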

Real-World Integration Scenarios

Scenario 1: Fully automated data pipeline


1. Data upload
   └→ S3 (raw-data/)
       └→ Event → Lambda (data validation)
           └→ Valid data
               └→ S3 (validated-data/)
                   └→ Event → Glue Crawler
                       └→ Data Catalog updated
                           └→ Queryable with Athena

2. After 30 days
   └→ Lifecycle → Standard-IA

3. After 90 days
   └→ Lifecycle → Glacier

4. After 7 years
   └→ Lifecycle → Delete

Scenario 2: Image processing service


User upload
  └→ S3 (uploads/)
      └→ Event → Lambda (image processing)
          ├→ Create thumbnail → S3 (thumbnails/)
          ├→ Add watermark → S3 (watermarked/)
          ├→ Extract metadata → DynamoDB
          └→ Invalidate CloudFront cache

Scenario 3: Large-scale migration


1. S3 Batch Operations
   └→ Move 100 million objects to Glacier
       └→ Update metadata with Lambda
           └→ Monitor progress with CloudWatch

2. S3 Storage Lens
   └→ Compare cost before and after the migration
       └→ Monthly cost reduced from $10,000 to $1,500

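Best Practices Checklist

Security

✅ Block public access on every bucket

✅ Apply least privilege through bucket policies

✅ Encrypt with SSE-S3 or SSE-KMS

✅ Enable MFA Delete (critical buckets)

✅ Log API calls with CloudTrail

Performance

✅ Multipart Upload (100 MB or larger)

✅ Transfer Acceleration (long-distance transfers)

✅ Prefix distribution (high throughput)

✅ Cache static content with CloudFront

Cost

✅ Configure Lifecycle policies

✅ Intelligent-Tiering (unclear access patterns)

✅ Monitor with S3 Storage Lens

✅ Clean up incomplete multipart uploads

✅ Automatically delete noncurrent versions (when versioning is enabled)

✅ Requester Pays (public datasets)

Reliability

✅ Enable versioning (important data)

✅ Cross-Region Replication (disaster recovery)

✅ Track objects with S3 Inventory

✅ Object Lock (compliance)

Operations

✅ Build automation pipelines with S3 events

✅ Run bulk work with Batch Operations

✅ Set CloudWatch alarms

✅ Track and manage cost with tags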

Cost Calculation Example

Scenario: storing 1 TB of data (taken as 1,000 GB)

Strategy 1: Everything in Standard (inefficient)


1,000GB × $0.023/GB = $23.00/month

Annual: $276.00

Strategy 2: Lifecycle applied (efficient)


Standard (days 0-30): 300GB × $0.023 = $6.90
Standard-IA (days 31-90): 400GB × $0.0125 = $5.00
Glacier IR (days 91-180): 200GB × $0.004 = $0.80
Glacier Flexible (days 181-365): 100GB × $0.0036 = $0.36

Monthly average: $13.06
Annual: $156.72

Savings: $119.28/year (43% reduction)

Strategy 3: Intelligent-Tiering (automated)


Frequent access (40%): 400GB × $0.023 = $9.20
Infrequent access (30%): 300GB × $0.0125 = $3.75
Archive (30%): 300GB × $0.004 = $1.20
Monitoring: 1,000,000 objects × $0.0025/1,000 = $2.50

Monthly: $16.65
Annual: $199.80

Savings: $76.20/year (28% reduction)
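The comparison can be reproduced with a few lines of arithmetic, which also makes it easy to try other splits. This is a sketch under stated assumptions: 1 TB is taken as 1,000 GB, the per-GB prices are the ones used in this example rather than a current price list, and `monthly_cost` is a hypothetical helper.

```python
# Price per GB-month used in this example (USD)
PRICES = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
}


def monthly_cost(allocation_gb):
    """Monthly storage cost for a {storage_class: GB} allocation."""
    return sum(gb * PRICES[cls] for cls, gb in allocation_gb.items())


all_standard = monthly_cost({"STANDARD": 1000})
with_lifecycle = monthly_cost(
    {"STANDARD": 300, "STANDARD_IA": 400, "GLACIER_IR": 200, "GLACIER": 100}
)
annual_savings = (all_standard - with_lifecycle) * 12

print(f"All Standard:   ${all_standard:.2f}/month")
print(f"With Lifecycle: ${with_lifecycle:.2f}/month")
print(f"Annual savings: ${annual_savings:.2f}")
```

Request, retrieval, and transition charges are left out, so real bills for infrequent-access and archive classes will be somewhat higher.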

Troubleshooting Guide

Problem 1: Slow uploads/downloads

Causes:

Single-threaded transfers

Network bottleneck

Many small files

Solutions:


# Tune multipart upload settings
aws configure set default.s3.multipart_threshold 8MB
aws configure set default.s3.multipart_chunksize 8MB
aws configure set default.s3.max_concurrent_requests 10

# Use Transfer Acceleration
aws s3 cp large-file.zip s3://my-bucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com

Problem 2: 403 Forbidden errors

Causes:

Insufficient IAM permissions

Restrictive bucket policy

Block Public Access settings

Solutions:


# Check the IAM policy
aws iam get-user-policy --user-name myuser --policy-name S3Access

# Check the bucket policy
aws s3api get-bucket-policy --bucket my-bucket

# Check the Block Public Access settings
aws s3api get-public-access-block --bucket my-bucket

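Problem 3: Lifecycle rules are not taking effect

Causes:

Object is below the minimum size (IA: 128 KB)

Object has not reached the minimum storage duration (IA: 30 days)

Filter conditions do not match

Solutions:


# Check the lifecycle rules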
aws s3api get-bucket-lifecycle-configuration --bucket my-bucket

# Check the object's storage class
aws s3api head-object --bucket my-bucket --key myfile.txt

# Change the storage class manually (for testing)
aws s3api copy-object \
  --bucket my-bucket \
  --key myfile.txt \
  --copy-source my-bucket/myfile.txt \
  --storage-class GLACIER

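Problem 4: Event notifications are not delivered

Causes:

Missing Lambda invoke permission

Event filter does not match

Filters added to prevent recursive events (same bucket) exclude the object

Solutions:


# Check the notification configuration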
aws s3api get-bucket-notification-configuration --bucket my-bucket

# Add the Lambda permission
aws lambda add-permission \
  --function-name myfunction \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-bucket

Problem 5: Unexpectedly high cost

Causes:

Unnecessary use of Standard

Incomplete multipart uploads

Many accumulated noncurrent versions

Solutions:


# Analyze with Storage Lens
aws s3control get-storage-lens-configuration \
  --account-id 123456789012 \
  --config-id default-lens

# Check for incomplete multipart uploads
aws s3api list-multipart-uploads --bucket my-bucket

# Check noncurrent versions
aws s3api list-object-versions --bucket my-bucket \
  --query 'Versions[?IsLatest==`false`].[Key,VersionId,LastModified]'

# Clean up automatically with Lifecycle
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://cleanup-policy.json

Advanced Integration Patterns

Pattern 1: Serverless data lake


Data Sources (logs, IoT, apps)
    ↓
[Kinesis Firehose]
    ↓ batched writes
[S3 - Raw Data]
    ↓ Event
[Lambda - data cleansing]
    ↓
[S3 - Processed Data]
    ↓ Crawler
[Glue Data Catalog]
    ↓
[Athena / Redshift Spectrum]
    ↓
[QuickSight 대시보드]

Pattern 2: Content delivery network


[S3 Origin]
    ↓
[CloudFront Distribution]
    ├─ Edge Location (Seoul)
    ├─ Edge Location (Tokyo)
    ├─ Edge Location (Singapore)
    └─ Edge Location (Sydney)
    ↓
[Users]

Lambda@Edge:
- Header modification
- Authentication checks
- Image resizing
- A/B testing

Pattern 3: Backup and disaster recovery


Primary Region (ap-northeast-2)
├─ [S3 Bucket - Production]
│   ├─ Versioning: Enabled
│   ├─ MFA Delete: Enabled
│   └─ Lifecycle: Standard → IA → Glacier
└─ CRR ↓

Secondary Region (us-east-1)
└─ [S3 Bucket - DR]
    ├─ Versioning: Enabled
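    └─ Lifecycle: Glacier (immediate)

Disaster Recovery Time:
- RTO (Recovery Time Objective): < 1 hour as a target (Glacier Flexible Retrieval restores can take hours, so choose the DR storage class accordingly)
- RPO (Recovery Point Objective): bounded by replication lag, typically minutes (S3 Replication Time Control targets 15 minutes)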

Pattern 4: Big data analytics pipeline


[S3 - Landing Zone]
    ↓ Event
[Lambda - schema validation]
    ↓
[S3 - Validated Data]
    ↓ Event
[Glue ETL Job]
    ├─ Data transformation
    ├─ Deduplication
    └─ Partitioning
    ↓
[S3 - Analytics Ready]
├─ Parquet format
├─ Year/month/day partitions
└─ Compression (Snappy)
    ↓
[Athena / EMR / Redshift]

Integrating S3 with Other AWS Services

S3 + Lambda


# S3 object processing template
import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    processed, errors = [], []

    # Handle every record; returning inside the loop would skip the rest
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys in S3 events are URL-encoded
        key = unquote_plus(record['s3']['object']['key'])
        size = record['s3']['object']['size']

        print(f"Processing: {bucket}/{key} ({size} bytes)")

        try:
            # Fetch the object
            obj = s3.get_object(Bucket=bucket, Key=key)
            content = obj['Body'].read()

            # Processing logic
            result = process_data(content)

            # Store the result
            output_key = key.replace('input/', 'output/', 1)
            s3.put_object(
                Bucket=bucket,
                Key=output_key,
                Body=result
            )
            processed.append(key)

        except Exception as e:
            print(f"Error processing {key}: {e}")
            errors.append(key)

    return {
        'statusCode': 500 if errors else 200,
        'body': json.dumps({'processed': processed, 'errors': errors})
    }

def process_data(data):
    # Placeholder processing logic: uppercase the raw bytes
    return data.upper()

S3 + Step Functions


{
  "Comment": "S3 Data Processing Pipeline",
  "StartAt": "ValidateFile",
  "States": {
    "ValidateFile": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account:function:ValidateFile",
      "Next": "IsValid"
    },
    "IsValid": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.isValid",
          "BooleanEquals": true,
          "Next": "ProcessData"
        }
      ],
      "Default": "NotifyError"
    },
    "ProcessData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account:function:ProcessData",
      "Next": "SaveResults"
    },
    "SaveResults": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account:function:SaveResults",
      "Next": "NotifySuccess"
    },
    "NotifySuccess": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:region:account:success-topic",
        "Message": "Processing completed successfully"
      },
      "End": true
    },
    "NotifyError": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:region:account:error-topic",
        "Message": "Processing failed"
      },
      "End": true
    }
  }
}

S3 + DynamoDB


# Store S3 object metadata in DynamoDB
from datetime import datetime, timezone
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('S3ObjectMetadata')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys in S3 events are URL-encoded
        key = unquote_plus(record['s3']['object']['key'])
        size = record['s3']['object']['size']

        # Look up the object's metadata
        metadata = s3.head_object(Bucket=bucket, Key=key)

        # Save to DynamoDB
        table.put_item(
            Item={
                'object_id': f"{bucket}/{key}",
                'bucket': bucket,
                'key': key,
                'size': size,
                'content_type': metadata.get('ContentType', 'unknown'),
                'last_modified': metadata['LastModified'].isoformat(),
                'etag': metadata['ETag'],
                # head_object omits StorageClass for Standard objects
                'storage_class': metadata.get('StorageClass', 'STANDARD'),
                'indexed_at': datetime.now(timezone.utc).isoformat()
            }
        )

        print(f"Indexed: {bucket}/{key}")

💡

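Advanced S3 features are essential for building enterprise-grade applications

Lifecycle policies for automatic cost optimization

Event notifications for building automation pipelines

Batch Operations for efficient bulk work

S3 Select for better query performance

Storage Lens for monitoring usage

Prefix distribution and Multipart Upload for maximum performance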