[AWS] Machine Learning

Nov 26, 2025

Contents

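AWS AI/ML Services Overview
Amazon Rekognition - Image and Video Analysis
Amazon Transcribe - Speech to Text
Amazon Polly - Text to Speech
Amazon Translate - Translation
Amazon Lex + Connect - Chatbots and Contact Centers
Amazon Comprehend - Text Analysis
Amazon SageMaker - Machine Learning Platform
Amazon Forecast - Time-Series Forecasting
Amazon Kendra - Intelligent Search
Amazon Personalize - Personalized Recommendations
Amazon Textract - Document Text Extraction
Practical Integration Scenario
Cost Optimization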

AWS AI/ML Services Overview

Three layers


1. AI Services (easiest to use)
   └─ Rekognition, Transcribe, Polly, Translate, Comprehend, Lex, etc.

2. ML Services (intermediate)
   └─ SageMaker

3. ML Frameworks (for experts)
   └─ TensorFlow, PyTorch on EC2/EKS

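Amazon Rekognition - Image and Video Analysis

Features


Image analysis:
- Object and scene detection
- Face recognition and comparison
- Text detection (OCR)
- Celebrity recognition
- Inappropriate content detection

Video analysis:
- Object tracking
- Activity recognition
- Face tracking

Usage example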


import boto3

rekognition = boto3.client('rekognition')

# Detect labels in an image
def detect_labels(image_path):
    with open(image_path, 'rb') as image:
        response = rekognition.detect_labels(
            Image={'Bytes': image.read()},
            MaxLabels=10,
            MinConfidence=80
        )

    for label in response['Labels']:
        print(f"{label['Name']}: {label['Confidence']:.2f}%")

# Compare faces
def compare_faces(source_image, target_image):
    with open(source_image, 'rb') as source:
        source_bytes = source.read()

    with open(target_image, 'rb') as target:
        target_bytes = target.read()

    response = rekognition.compare_faces(
        SourceImage={'Bytes': source_bytes},
        TargetImage={'Bytes': target_bytes},
        SimilarityThreshold=80
    )

    for match in response['FaceMatches']:
        print(f"Similarity: {match['Similarity']:.2f}%")

# Detect text (OCR)
def detect_text(image_path):
    with open(image_path, 'rb') as image:
        response = rekognition.detect_text(
            Image={'Bytes': image.read()}
        )

    for text in response['TextDetections']:
        if text['Type'] == 'LINE':
            print(text['DetectedText'])

# Detect inappropriate content
def detect_moderation_labels(image_path):
    with open(image_path, 'rb') as image:
        response = rekognition.detect_moderation_labels(
            Image={'Bytes': image.read()},
            MinConfidence=60
        )

    for label in response['ModerationLabels']:
        print(f"{label['Name']}: {label['Confidence']:.2f}%")

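Practical use cases


Social media:
[User upload] → [Rekognition] → block inappropriate content

Security:
[CCTV] → [Rekognition Video] → detect suspicious activity → alarm

E-commerce:
[Product image] → [Rekognition] → automatic tagging and categorization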
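
The social media flow above ends in a block-or-allow decision on the detect_moderation_labels response. A minimal sketch of that decision, assuming a hypothetical helper find_blocked_categories and an illustrative list of blocked categories. The category names and the threshold are assumptions, not values from the original post, and should be checked against the label taxonomy Rekognition actually returns:

```python
# Hypothetical policy: top-level categories to block (illustrative names only)
BLOCKED_CATEGORIES = {"Explicit Nudity", "Violence"}


def find_blocked_categories(response, min_confidence=80.0, blocked=BLOCKED_CATEGORIES):
    """Return the sorted blocked categories found in a detect_moderation_labels response."""
    hits = set()
    for label in response.get("ModerationLabels", []):
        if label["Confidence"] < min_confidence:
            continue
        # A label matches if it, or its parent category, is on the blocked list
        for name in (label["Name"], label.get("ParentName", "")):
            if name in blocked:
                hits.add(name)
    return sorted(hits)


# Sample response shaped like the Rekognition output (values are made up)
sample = {
    "ModerationLabels": [
        {"Name": "Violence", "ParentName": "", "Confidence": 91.2},
        {"Name": "Graphic Violence", "ParentName": "Violence", "Confidence": 88.0},
        {"Name": "Alcohol", "ParentName": "", "Confidence": 95.0},
    ]
}
print(find_blocked_categories(sample))  # ['Violence']
```

An upload would be rejected whenever the returned list is non-empty.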

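Amazon Transcribe - Speech to Text

Features

Real-time and batch transcription

Multi-language support

Speaker identification

Automatic punctuation

Custom vocabularies

Usage example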


import boto3
import time

transcribe = boto3.client('transcribe')

# Start a batch transcription job
def transcribe_audio(audio_uri, job_name):
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': audio_uri},
        MediaFormat='mp3',
        LanguageCode='ko-KR',
        Settings={
            'ShowSpeakerLabels': True,
            'MaxSpeakerLabels': 2
        }
    )

    # Wait for completion (polls every 5 seconds)
    while True:
        status = transcribe.get_transcription_job(
            TranscriptionJobName=job_name
        )

        job_status = status['TranscriptionJob']['TranscriptionJobStatus']

        if job_status in ['COMPLETED', 'FAILED']:
            break

        time.sleep(5)

    if job_status == 'COMPLETED':
        # URI of the JSON transcript file
        return status['TranscriptionJob']['Transcript']['TranscriptFileUri']

    # Surface the failure instead of silently returning None
    raise RuntimeError(
        status['TranscriptionJob'].get('FailureReason', 'Transcription job failed')
    )

# Usage
result = transcribe_audio('s3://my-bucket/audio.mp3', 'my-job-001')
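
The polling loop above never gives up if the job stays in progress. A minimal sketch of a bounded wait, assuming a hypothetical helper wait_for_job that takes any zero-argument status function. The names, timeout, and fake status source are illustrative and not part of the original example:

```python
import time


def wait_for_job(get_status, timeout_s=600, poll_interval_s=5, sleep=time.sleep):
    """Poll get_status() until it returns COMPLETED or FAILED, or the timeout elapses."""
    waited = 0
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status} after {waited}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


# Demonstration with a fake status source instead of a real Transcribe call
fake_statuses = iter(["QUEUED", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)  # COMPLETED
```

With the real client, get_status could be a lambda that calls transcribe.get_transcription_job and returns the TranscriptionJobStatus field.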

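Practical use cases


Contact center:
[Customer call] → [Transcribe] → text → sentiment analysis

Meetings:
[Meeting recording] → [Transcribe] → automatic meeting minutes

Subtitles:
[Video] → [Transcribe] → subtitle generation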

Amazon Polly - Text to Speech

Features

Natural-sounding speech synthesis

Many languages and voices

SSML (Speech Synthesis Markup Language) support

Neural engine (more natural sounding)

Usage example


import boto3

polly = boto3.client('polly')

# Convert text to speech
def synthesize_speech(text, output_file):
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat='mp3',
        VoiceId='Seoyeon',  # Korean female voice
        Engine='neural'      # neural engine (more natural sounding)
    )

    # Save as an MP3 file
    with open(output_file, 'wb') as file:
        file.write(response['AudioStream'].read())

# Using SSML (emphasis, speaking rate)
def synthesize_with_ssml(output_file):
    # Korean SSML: "Hello. (pause) This is an important announcement. I am speaking slowly."
    ssml_text = """
    <speak>
        안녕하세요. <break time="500ms"/>
        <emphasis level="strong">중요한</emphasis> 공지사항입니다.
        <prosody rate="slow">천천히 말합니다.</prosody>
    </speak>
    """

    response = polly.synthesize_speech(
        Text=ssml_text,
        TextType='ssml',
        OutputFormat='mp3',
        VoiceId='Seoyeon',
        Engine='standard'  # the <emphasis> tag is not supported by neural voices
    )

    with open(output_file, 'wb') as file:
        file.write(response['AudioStream'].read())

# Usage (Korean input: "Introducing AWS machine learning services.")
synthesize_speech("AWS 머신러닝 서비스를 소개합니다.", "output.mp3")

Practical use cases


Content platform:
[Blog post] → [Polly] → audiobook generation

Notification system:
[Text notification] → [Polly] → voice notification

Education:
[Textbook text] → [Polly] → audio lecture
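
Turning a whole blog post into audio usually means several synthesize_speech calls, because a single request accepts only a limited amount of text (about 3,000 billed characters at the time of writing; treat the exact limit as an assumption to verify). A minimal sketch of sentence-aware chunking, with split_for_polly as a hypothetical helper name:

```python
import re


def split_for_polly(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking after sentence-ending punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is hard-split as a last resort
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Tiny limit so the behaviour is visible
print(split_for_polly("One. Two. Three.", max_chars=10))  # ['One. Two.', 'Three.']
```

Each chunk can then be passed to synthesize_speech in turn and the resulting audio streams appended to one output file.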

Amazon Translate - Translation

Features

Real-time translation

75+ languages

Batch translation

Custom terminology

Usage example


import boto3

translate = boto3.client('translate')

# Translate text
def translate_text(text, source_lang='ko', target_lang='en'):
    response = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang
    )

    return response['TranslatedText']

# Automatic source-language detection
def auto_translate(text, target_lang='en'):
    response = translate.translate_text(
        Text=text,
        SourceLanguageCode='auto',  # auto-detect the source language
        TargetLanguageCode=target_lang
    )

    return response['TranslatedText']

# Usage
korean_text = "AWS 머신러닝 서비스는 매우 강력합니다."
english_text = translate_text(korean_text, 'ko', 'en')
print(english_text)
# → e.g. "AWS machine learning services are very powerful." (exact wording may vary)

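Amazon Lex + Connect - Chatbots and Contact Centers

Lex (building chatbots)


[User input] → [Lex]
                 ├─ Intent detection
                 ├─ Entity extraction
                 └─ Dialog management
                 ↓
               [Lambda] ← business logic
                 ↓
               [Response]

Connect (contact center)


[Customer call] → [Connect]
                    ├─ IVR (voice menu)
                    ├─ Lex integration (chatbot)
                    └─ Agent handoff
                    ↓
                  [Call recording] → [Transcribe] → analysis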
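
The Lambda step in the diagram above receives the intent and extracted values from Lex and returns the reply. A minimal sketch of such a handler, assuming the Lex V2 event and response format. The intent name CheckOrderStatus, the slot OrderId, and the reply text are hypothetical placeholders for real business logic:

```python
def lambda_handler(event, context):
    """Fulfil a Lex V2 intent and close the conversation with a plain-text message."""
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}

    def slot_value(name):
        # Unfilled slots are None in the event
        slot = slots.get(name)
        if slot and slot.get("value"):
            return slot["value"].get("interpretedValue")
        return None

    if intent["name"] == "CheckOrderStatus":      # hypothetical intent
        order_id = slot_value("OrderId")           # hypothetical slot
        message = f"Order {order_id} is being prepared."  # placeholder business logic
    else:
        message = "Sorry, I can't help with that yet."

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }


# Local check with a hand-written event (not a real Lex invocation)
sample_event = {
    "sessionState": {
        "intent": {
            "name": "CheckOrderStatus",
            "slots": {"OrderId": {"value": {"interpretedValue": "A123"}}},
        }
    }
}
reply = lambda_handler(sample_event, None)
print(reply["messages"][0]["content"])  # Order A123 is being prepared.
```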

Amazon Comprehend - Text Analysis

Features

Sentiment analysis

Entity recognition

Key phrase extraction

Language detection

Topic modeling

Usage example


import boto3

comprehend = boto3.client('comprehend')

# Sentiment analysis
def detect_sentiment(text):
    response = comprehend.detect_sentiment(
        Text=text,
        LanguageCode='ko'
    )

    sentiment = response['Sentiment']  # POSITIVE, NEGATIVE, NEUTRAL, MIXED
    scores = response['SentimentScore']

    print(f"Sentiment: {sentiment}")
    print(f"Positive: {scores['Positive']:.2f}")
    print(f"Negative: {scores['Negative']:.2f}")

# Entity recognition
def detect_entities(text):
    response = comprehend.detect_entities(
        Text=text,
        LanguageCode='ko'
    )

    for entity in response['Entities']:
        print(f"{entity['Type']}: {entity['Text']} ({entity['Score']:.2f})")

# Key phrase extraction
def detect_key_phrases(text):
    response = comprehend.detect_key_phrases(
        Text=text,
        LanguageCode='ko'
    )

    for phrase in response['KeyPhrases']:
        print(phrase['Text'])

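# Usage (Korean input: "Samsung Electronics is headquartered in Seoul and is led by Chairman Lee Jae-yong.")
text = "삼성전자는 서울에 본사가 있으며, 이재용 회장이 이끌고 있습니다."
detect_entities(text)
# Example output (confidence scores omitted; actual results may vary):
# → ORGANIZATION: 삼성전자
# → LOCATION: 서울
# → PERSON: 이재용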

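Comprehend Medical

Specialized analysis for medical text. It is a separate service with its own boto3 client (comprehendmedical), and it analyzes English text only.


import boto3

# Comprehend Medical has its own client; detect_entities_v2 is not on the comprehend client
comprehend_medical = boto3.client('comprehendmedical')

# Medical entity recognition
def detect_medical_entities(text):
    response = comprehend_medical.detect_entities_v2(
        Text=text
    )

    for entity in response['Entities']:
        print(f"{entity['Category']}: {entity['Text']}")

# Usage (English input, because Korean is not supported)
medical_text = "The patient presents with headache and fever and is taking aspirin."
detect_medical_entities(medical_text)
# Expected categories (actual results may vary):
# → MEDICAL_CONDITION: headache
# → MEDICAL_CONDITION: fever
# → MEDICATION: aspirin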

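Amazon SageMaker - Machine Learning Platform

Concept

A fully managed machine learning service.


[Data prep]  → [Model training] → [Model deployment] → [Prediction]
     ↓               ↓                    ↓
Ground Truth    Training Jobs         Endpoints

Key features

SageMaker Studio - integrated development environment

Ground Truth - data labeling

Training Jobs - model training

Endpoints - model deployment

Autopilot - automated ML

A simple example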


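# Written against the SageMaker Python SDK v2 (Estimator API)
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.serializers import CSVSerializer

# SageMaker session
sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
role = 'arn:aws:iam::account-id:role/SageMakerRole'

# Prepare data (built-in XGBoost reads CSV with the label in the first column and no header row)
train_data = TrainingInput('s3://my-bucket/train.csv', content_type='text/csv')

# Use a built-in algorithm (XGBoost); the container version must be given explicitly
xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve('xgboost', region, version='1.7-1'),
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path='s3://my-bucket/output/',
    sagemaker_session=sagemaker_session
)

# Set hyperparameters
xgb.set_hyperparameters(
    max_depth=5,
    eta=0.2,
    objective='binary:logistic',
    num_round=100
)

# Train
xgb.fit({'train': train_data})

# Deploy (CSVSerializer sends request rows as text/csv)
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    serializer=CSVSerializer()
)

# Predict (feature rows without the label column; these values are placeholders)
test_data = [[0.5, 1.2, 3.4]]
result = predictor.predict(test_data)

# Delete the endpoint when finished to stop incurring charges
predictor.delete_endpoint()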
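
The built-in XGBoost algorithm expects CSV training data with the label in the first column and no header row. A small sketch of preparing such a file from records, assuming a hypothetical helper to_xgboost_csv and made-up column names:

```python
import csv
import io


def to_xgboost_csv(rows, label_key):
    """Serialize dict rows to CSV with the label first and no header row."""
    if not rows:
        return ""
    # Feature order is taken from the first row and reused for every row
    feature_keys = [key for key in rows[0] if key != label_key]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_key]] + [row[key] for key in feature_keys])
    return buffer.getvalue()


# Made-up records for illustration
records = [
    {"age": 34, "income": 52000, "churned": 1},
    {"age": 51, "income": 87000, "churned": 0},
]
csv_text = to_xgboost_csv(records, "churned")
print(csv_text)
# 1,34,52000
# 0,51,87000
```

The resulting text would be uploaded to the S3 path used as the training input.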

Amazon Forecast - Time-Series Forecasting

Use cases

Product demand forecasting

Inventory planning

Workforce planning

Financial planning

How to use it


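import boto3

forecast = boto3.client('forecast')
forecast_query = boto3.client('forecastquery')  # queries go through a separate client

# Placeholder: ARN of an IAM role that Forecast can assume to read the S3 data
role_arn = 'arn:aws:iam::account-id:role/ForecastRole'

# 1. Create the dataset (attribute order must match the CSV column order)
dataset_arn = forecast.create_dataset(
    DatasetName='sales_data',
    Domain='RETAIL',
    DatasetType='TARGET_TIME_SERIES',
    DataFrequency='D',
    Schema={
        'Attributes': [
            {'AttributeName': 'timestamp', 'AttributeType': 'timestamp'},
            {'AttributeName': 'item_id', 'AttributeType': 'string'},
            {'AttributeName': 'demand', 'AttributeType': 'float'}
        ]
    }
)['DatasetArn']

# A predictor is trained on a dataset group, so put the dataset in one
dataset_group_arn = forecast.create_dataset_group(
    DatasetGroupName='sales_group',
    Domain='RETAIL',
    DatasetArns=[dataset_arn]
)['DatasetGroupArn']

# 2. Import data (from S3)
forecast.create_dataset_import_job(
    DatasetImportJobName='sales_import',
    DatasetArn=dataset_arn,
    DataSource={
        'S3Config': {
            'Path': 's3://my-bucket/sales.csv',
            'RoleArn': role_arn
        }
    }
)

# Steps 2 to 4 are asynchronous: wait until each resource is ACTIVE
# (for example by polling describe_dataset_import_job) before the next step.

# 3. Create the predictor (AutoML)
predictor_arn = forecast.create_auto_predictor(
    PredictorName='sales_predictor',
    ForecastHorizon=30,  # forecast 30 days ahead
    ForecastFrequency='D',
    DataConfig={'DatasetGroupArn': dataset_group_arn}
)['PredictorArn']

# 4. Create the forecast
forecast_arn = forecast.create_forecast(
    ForecastName='sales_forecast',
    PredictorArn=predictor_arn
)['ForecastArn']

# 5. Query the forecast results
response = forecast_query.query_forecast(
    ForecastArn=forecast_arn,
    Filters={'item_id': 'item_001'}
)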
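
The query_forecast response groups predictions by quantile (for example p10, p50, p90), each holding a list of timestamp and value points. A small sketch that pivots this into one row per date, assuming a hypothetical helper predictions_by_date and a made-up sample response:

```python
def predictions_by_date(response):
    """Pivot a query_forecast response into one dict per timestamp with a key per quantile."""
    table = {}
    for quantile, points in response["Forecast"]["Predictions"].items():
        for point in points:
            table.setdefault(point["Timestamp"], {})[quantile] = point["Value"]
    return [dict(timestamp=ts, **table[ts]) for ts in sorted(table)]


# Sample shaped like the service response (values are made up)
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2025-01-01T00:00:00", "Value": 8.0},
                    {"Timestamp": "2025-01-02T00:00:00", "Value": 9.0}],
            "p50": [{"Timestamp": "2025-01-01T00:00:00", "Value": 10.0},
                    {"Timestamp": "2025-01-02T00:00:00", "Value": 11.5}],
            "p90": [{"Timestamp": "2025-01-01T00:00:00", "Value": 13.0},
                    {"Timestamp": "2025-01-02T00:00:00", "Value": 14.0}],
        }
    }
}
forecast_rows = predictions_by_date(sample_response)
for row in forecast_rows:
    print(row)
```

For inventory planning, the p90 column is a common choice when running out of stock costs more than overstocking.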

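Amazon Kendra - Intelligent Search

Concept

An ML-powered enterprise search service.


[Documents] → [Kendra] ← natural-language question
                 ↓
        extracts a precise answer

Features

Natural language understanding

Incremental learning

Support for many document formats

Permission-based access control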


import boto3

kendra = boto3.client('kendra')

# Search
def search(query):
    response = kendra.query(
        IndexId='index-id',
        QueryText=query
    )

    for result in response['ResultItems']:
        # DocumentTitle and DocumentExcerpt are objects; the string is under 'Text'
        print(f"Title: {result['DocumentTitle']['Text']}")
        print(f"Excerpt: {result['DocumentExcerpt']['Text']}")
        print(f"Confidence: {result['ScoreAttributes']['ScoreConfidence']}")
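
Kendra reports confidence as a label (VERY_HIGH, HIGH, MEDIUM, LOW, NOT_AVAILABLE) rather than a number, so filtering needs an explicit ordering. A minimal sketch, assuming a hypothetical helper confident_results and a made-up sample response:

```python
# Ordering of Kendra's ScoreConfidence labels, from most to least confident
CONFIDENCE_RANK = {"VERY_HIGH": 3, "HIGH": 2, "MEDIUM": 1, "LOW": 0, "NOT_AVAILABLE": -1}


def confident_results(response, minimum="HIGH"):
    """Keep only result items whose confidence label is at least `minimum`."""
    threshold = CONFIDENCE_RANK[minimum]
    kept = []
    for item in response.get("ResultItems", []):
        level = item.get("ScoreAttributes", {}).get("ScoreConfidence", "NOT_AVAILABLE")
        if CONFIDENCE_RANK.get(level, -1) >= threshold:
            kept.append({
                "title": item["DocumentTitle"]["Text"],
                "excerpt": item["DocumentExcerpt"]["Text"],
                "confidence": level,
            })
    return kept


# Sample shaped like a query response (content is made up)
sample_results = {
    "ResultItems": [
        {"DocumentTitle": {"Text": "Refund policy"},
         "DocumentExcerpt": {"Text": "Refunds are issued within 7 days."},
         "ScoreAttributes": {"ScoreConfidence": "VERY_HIGH"}},
        {"DocumentTitle": {"Text": "Old FAQ"},
         "DocumentExcerpt": {"Text": "See the archive."},
         "ScoreAttributes": {"ScoreConfidence": "LOW"}},
    ]
}
print([r["title"] for r in confident_results(sample_results)])  # ['Refund policy']
```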

Amazon Personalize - Personalized Recommendations

Use cases

Product recommendations

Content recommendations

Marketing personalization


import boto3
from datetime import datetime

# Event tracking
personalize_events = boto3.client('personalize-events')

personalize_events.put_events(
    trackingId='tracking-id',
    userId='user123',
    sessionId='session456',
    eventList=[{
        'eventType': 'click',
        'itemId': 'item789',
        'sentAt': datetime.now()
    }]
)

# Get recommendations
personalize_runtime = boto3.client('personalize-runtime')

response = personalize_runtime.get_recommendations(
    campaignArn='campaign-arn',
    userId='user123',
    numResults=10
)

for item in response['itemList']:
    print(f"Recommended item: {item['itemId']}")

Amazon Textract - Document Text Extraction

Features

OCR (optical character recognition)

Form data extraction

Table extraction

ID document and receipt analysis


import boto3

textract = boto3.client('textract')

# Extract text
def extract_text(image_path):
    with open(image_path, 'rb') as document:
        response = textract.detect_document_text(
            Document={'Bytes': document.read()}
        )

    for block in response['Blocks']:
        if block['BlockType'] == 'LINE':
            print(block['Text'])

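# Form analysis
def analyze_document(image_path):
    with open(image_path, 'rb') as document:
        response = textract.analyze_document(
            Document={'Bytes': document.read()},
            FeatureTypes=['FORMS', 'TABLES']
        )

    blocks = {block['Id']: block for block in response['Blocks']}

    # Print form keys. KEY_VALUE_SET blocks have no 'Text' field of their own,
    # so the key text is assembled from their child WORD blocks.
    for block in response['Blocks']:
        if block['BlockType'] == 'KEY_VALUE_SET' and 'KEY' in block.get('EntityTypes', []):
            words = [
                blocks[child_id]['Text']
                for rel in block.get('Relationships', [])
                if rel['Type'] == 'CHILD'
                for child_id in rel['Ids']
                if blocks[child_id]['BlockType'] == 'WORD'
            ]
            print(f"Key: {' '.join(words)}")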
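
To get full key-value pairs from a form, each KEY block has to be followed through its VALUE relationship to the matching value block, and both texts are read from child WORD blocks. A sketch of that traversal, assuming hypothetical helper names (form_fields, _text_of) and a hand-written sample that mimics the response shape:

```python
def _text_of(block, blocks_by_id):
    """Join the text of a block's child WORD blocks."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(response):
    """Map each form key to its value text from an analyze_document response."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[value_id], blocks_by_id) for value_id in rel["Ids"]
                )
        fields[_text_of(block, blocks_by_id)] = value_text
    return fields


# Hand-written sample with one key-value pair (IDs and text are made up)
sample_document = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    ]
}
print(form_fields(sample_document))  # {'Name:': 'Jane Doe'}
```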

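Practical Integration Scenario

Scenario: Smart Contact Center


1. Customer calls
   ↓
2. [Connect] - voice menu
   ↓
3. [Lex] - chatbot handling
   ├─ Simple inquiry → automated response
   └─ Complex inquiry → hand off to an agent
   ↓
4. [Transcribe] - call transcription
   ↓
5. [Comprehend] - sentiment analysis
   ├─ Negative → notify a manager
   └─ Positive → data for quality improvement
   ↓
6. [Kendra] - agent assistance
   └─ FAQ search and suggestions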
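
Step 5 of the scenario branches on the Comprehend result. A minimal sketch of that routing, assuming a hypothetical function route_call, illustrative action names, and an arbitrary 0.7 threshold on the negative score:

```python
def route_call(sentiment_response, negative_threshold=0.7):
    """Pick a follow-up action from a detect_sentiment response."""
    sentiment = sentiment_response["Sentiment"]
    negative_score = sentiment_response["SentimentScore"]["Negative"]
    if sentiment == "NEGATIVE" and negative_score >= negative_threshold:
        return "notify_manager"      # strongly negative call: alert a manager
    if sentiment == "POSITIVE":
        return "quality_dataset"     # positive call: keep as quality-improvement data
    return "no_action"


# Sample responses shaped like the Comprehend output (scores are made up)
angry = {"Sentiment": "NEGATIVE",
         "SentimentScore": {"Positive": 0.02, "Negative": 0.91, "Neutral": 0.05, "Mixed": 0.02}}
happy = {"Sentiment": "POSITIVE",
         "SentimentScore": {"Positive": 0.88, "Negative": 0.03, "Neutral": 0.08, "Mixed": 0.01}}
print(route_call(angry))  # notify_manager
print(route_call(happy))  # quality_dataset
```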

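Cost Optimization


Approximate list prices (they vary by region and change over time, so check the current AWS pricing pages):

Rekognition:
- First 1 million images/month: $1.00 per 1,000
- Beyond that: lower tiered rates (about $0.60 per 1,000 at higher volumes)

Transcribe:
- $0.024 per minute

Polly:
- First 5 million characters/month: free (standard voices, during the free tier period)
- After that: $4.00 per 1 million characters (standard voices)

Translate:
- $15.00 per 1 million characters

Comprehend:
- $0.0001 per unit (1 unit = 100 characters)

Optimization:
1. Batch processing (where possible)
2. Caching (for repeated requests)
3. Reusing results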
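
Caching repeated requests can be sketched with an in-memory cache around any per-character-billed call such as translation. This is an illustration only: make_cached_translator is a hypothetical helper, and the fake backend stands in for a real Translate call so the effect on the number of billable requests is visible.

```python
from functools import lru_cache


def make_cached_translator(translate_fn, maxsize=1024):
    """Wrap a translate function so identical requests hit the backend only once."""
    @lru_cache(maxsize=maxsize)
    def cached(text, source_lang, target_lang):
        return translate_fn(text, source_lang, target_lang)
    return cached


# Fake backend that records every billable call (stands in for the real service)
backend_calls = []


def fake_translate(text, source_lang, target_lang):
    backend_calls.append(text)
    return f"[{target_lang}] {text}"


translate_cached = make_cached_translator(fake_translate)
translate_cached("안녕하세요", "ko", "en")
translate_cached("안녕하세요", "ko", "en")  # served from the cache
print(len(backend_calls))  # 1
```

In a multi-instance deployment a shared store would be needed instead of a per-process cache.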

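💡

AWS AI/ML services let you add powerful AI features without machine learning expertise.

Image/video: Rekognition
Speech → text: Transcribe
Text → speech: Polly
Translation: Translate
Text analysis: Comprehend
Chatbots: Lex
Search: Kendra
Recommendations: Personalize
Forecasting: Forecast
Document analysis: Textract
Custom ML: SageMaker

Selection guide:

A pre-trained model is enough → AI Services

You need a custom model → SageMaker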