[Applied #3] Automation Project: YouTube Trending Video Data Collector

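One-line summary:
Use Python to automatically crawl YouTube's trending video list,
then organize each video's title, view count, channel name, and link and save them to a CSV file 📊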

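1. Goals

Open the YouTube trending page automatically
Extract the title, channel name, view count, and link of each video
Save the results to a CSV file
Automate scrolling with Selenium and handle dynamically loaded content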

2. Getting Started

✅ Packages to install

pip install selenium beautifulsoup4

We'll use Selenium to load the dynamic page
and BeautifulSoup to parse the resulting HTML.

1️⃣ Basic setup

from selenium import webdriver
from bs4 import BeautifulSoup
import time
import csv

# Launch the Chrome driver
driver = webdriver.Chrome()
driver.get("https://www.youtube.com/feed/trending")

# Give the page time to finish its initial load
time.sleep(3)

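💡 YouTube renders its content dynamically with JavaScript, so the video list is not in the initial HTML response.
Selenium drives a real browser, which lets the page finish rendering before we read its HTML.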
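A fixed `time.sleep(3)` can be too short on a slow connection and wasteful on a fast one. As a rough sketch, the helper below polls until at least one element matches a selector or a timeout expires. `wait_for_elements` is a hypothetical helper name, not part of Selenium. It only assumes the driver exposes Selenium 4's `find_elements(by, value)`, where the string "css selector" is the value behind `By.CSS_SELECTOR`. The `FakeDriver` class exists only so the sketch runs without a browser. Selenium also ships its own `WebDriverWait` for the same purpose.

```python
import time


def wait_for_elements(driver, css, timeout=10.0, poll=0.5):
    """Poll until `css` matches at least one element, or raise TimeoutError.

    Hypothetical helper: `driver` is any object with Selenium 4's
    find_elements(by, value) method.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string behind selenium's By.CSS_SELECTOR
        found = driver.find_elements("css selector", css)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css!r} within {timeout}s")
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a browser: the element 'appears' on the third poll."""

    def __init__(self):
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        return ["<a id='video-title'>"] if self.calls >= 3 else []


fake = FakeDriver()
elements = wait_for_elements(fake, "a#video-title", timeout=5, poll=0.01)
print(len(elements), "element(s) found after", fake.calls, "polls")
```

With a real browser, you would call `wait_for_elements(driver, "a#video-title")` right after `driver.get(...)` in place of the fixed sleep.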

2️⃣ Scrolling down automatically

The YouTube trending page loads more videos as you scroll down.

scroll_pause = 2
last_height = driver.execute_script("return document.documentElement.scrollHeight")

while True:
    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(scroll_pause)

    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

✅ The loop keeps scrolling until the page height stops growing, which means no more videos are being loaded.
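If a page keeps producing new content, a `while True` scroll loop never ends. A minimal sketch of a bounded version follows. `scroll_to_bottom` and `max_rounds` are names introduced here for illustration, and the function only assumes a driver with an `execute_script` method. The `FakeDriver` class simulates a growing page so the sketch runs without a browser.

```python
import time

HEIGHT_JS = "return document.documentElement.scrollHeight"
SCROLL_JS = "window.scrollTo(0, document.documentElement.scrollHeight);"


def scroll_to_bottom(driver, pause=2.0, max_rounds=30):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    for round_no in range(1, max_rounds + 1):
        driver.execute_script(SCROLL_JS)
        time.sleep(pause)  # give newly requested videos time to render
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            return round_no  # nothing new was loaded
        last_height = new_height
    return max_rounds  # gave up: the page was still growing


class FakeDriver:
    """Simulates a page that grows by 1000px per scroll, up to a limit."""

    def __init__(self, limit):
        self.height = 1000
        self.limit = limit

    def execute_script(self, script):
        if script == SCROLL_JS:
            self.height = min(self.height + 1000, self.limit)
            return None
        return self.height


rounds = scroll_to_bottom(FakeDriver(limit=3000), pause=0)
print("scrolls performed:", rounds)
```

The cap trades completeness for a guaranteed stop: if `max_rounds` is hit, some videos at the bottom may be missing from the result.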

3️⃣ Extracting video info by parsing the HTML

soup = BeautifulSoup(driver.page_source, "html.parser")

titles = soup.select("a#video-title")
channels = soup.select("a.yt-simple-endpoint.style-scope.yt-formatted-string")
views = soup.select("span.inline-metadata-item.style-scope.ytd-video-meta-block")

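print("Total videos:", len(titles))

You can look up each element's selector in the browser's developer tools (F12) → Copy selector. YouTube changes its markup from time to time, so treat the selectors above as a snapshot.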
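Selecting titles, channels, and view counts as three separate page-wide lists only lines up if every video contributes the same number of matches to each list. One sturdier approach, sketched here, is to find each video's container first and read the fields inside it. The HTML below is a simplified, made-up sample, and the tag names (`ytd-video-renderer`, `ytd-channel-name`) are assumptions about YouTube's markup that should be verified in the developer tools. `parse_cards` and `text_or_na` are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup: the second card has no view count
SAMPLE_HTML = """
<ytd-video-renderer>
  <a id="video-title" href="/watch?v=aaa">First video</a>
  <ytd-channel-name><a href="/@one">Channel One</a></ytd-channel-name>
  <span class="inline-metadata-item">230K views</span>
  <span class="inline-metadata-item">2 days ago</span>
</ytd-video-renderer>
<ytd-video-renderer>
  <a id="video-title" href="/watch?v=bbb">Second video</a>
  <ytd-channel-name><a href="/@two">Channel Two</a></ytd-channel-name>
</ytd-video-renderer>
"""


def text_or_na(element):
    """Return the element's stripped text, or "N/A" when it is missing."""
    return element.get_text(strip=True) if element else "N/A"


def parse_cards(html):
    """Build one row per video card: (title, channel, views, link)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("ytd-video-renderer"):
        title = card.select_one("a#video-title")
        if title is None:
            continue  # not a video card this sketch understands
        rows.append((
            text_or_na(title),
            text_or_na(card.select_one("ytd-channel-name a")),
            # the first metadata item is assumed to be the view count
            text_or_na(card.select_one("span.inline-metadata-item")),
            "https://www.youtube.com" + title.get("href", ""),
        ))
    return rows


rows = parse_cards(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because each field is looked up inside its own card, the missing view count in the second card becomes "N/A" for that row only and does not shift the rows that follow.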

4️⃣ Saving to a CSV file

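with open("youtube_trending.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["No.", "Title", "Channel", "Views", "Link"])

    for i, title in enumerate(titles, 1):
        title_text = title.get_text(strip=True)
        # Read channel and views from the same video card as the title, so a
        # missing field cannot shift later rows (tag names are assumptions)
        card = title.find_parent(["ytd-video-renderer", "ytd-rich-item-renderer"])
        channel_el = card.select_one("ytd-channel-name a") if card else None
        view_el = card.select_one("span.inline-metadata-item") if card else None
        channel = channel_el.get_text(strip=True) if channel_el else "N/A"
        view_text = view_el.get_text(strip=True) if view_el else "N/A"
        link = f"https://www.youtube.com{title.get('href', '')}"
        writer.writerow([i, title_text, channel, view_text, link])

print("✅ Saved to youtube_trending.csv!")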

💡 Encoding with utf-8-sig adds a byte order mark (BOM), so Excel displays Korean and other non-ASCII text correctly.
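To see what utf-8-sig actually changes, the short check below encodes the same text both ways and compares the raw bytes. It is only an illustration, and the sample text is arbitrary.

```python
import codecs

text = "제목,조회수"  # arbitrary sample header containing Korean text

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig prepends the 3-byte UTF-8 byte order mark (EF BB BF),
# which Excel uses to recognize the file as UTF-8
print(with_bom[:3].hex())
print(with_bom == codecs.BOM_UTF8 + plain)

# Decoding with utf-8-sig strips the mark again
print(with_bom.decode("utf-8-sig") == text)
```

The only difference between the two encodings is those three leading bytes, so a file written with utf-8-sig should be read back with utf-8-sig as well to avoid a stray character at the start of the first cell.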

5️⃣ Applying headless mode (run in the background without opening a window)

from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://www.youtube.com/feed/trending")

✅ Useful on servers and in scheduled, unattended scripts.

3. Full Code

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time
import csv

# Run Chrome in headless mode (no visible window)
options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(options=options)
driver.get("https://www.youtube.com/feed/trending")
time.sleep(3)  # wait for the initial page load

# Scroll until the page height stops growing
scroll_pause = 2
last_height = driver.execute_script("return document.documentElement.scrollHeight")

while True:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(scroll_pause)
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Parse the fully rendered page, then close the browser
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

titles = soup.select("a#video-title")

with open("youtube_trending.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["No.", "Title", "Channel", "Views", "Link"])
    for i, title in enumerate(titles, 1):
        title_text = title.get_text(strip=True)
        # Read channel and views from the same video card as the title, so a
        # missing field cannot shift later rows (tag names are assumptions)
        card = title.find_parent(["ytd-video-renderer", "ytd-rich-item-renderer"])
        channel_el = card.select_one("ytd-channel-name a") if card else None
        view_el = card.select_one("span.inline-metadata-item") if card else None
        channel = channel_el.get_text(strip=True) if channel_el else "N/A"
        view_text = view_el.get_text(strip=True) if view_el else "N/A"
        link = f"https://www.youtube.com{title.get('href', '')}"
        writer.writerow([i, title_text, channel, view_text, link])

print("✅ youtube_trending.csv created!")

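4. Sample Output

No.	Title	Channel	Views	Link
1	Getting Started with a Python Course	CodeBro	230K views	https://youtube.com/…
2	Surviving the AI Era with Python	코딩팩토리	120K views	https://youtube.com/…

💡 Open the file in Excel and the data is neatly organized and ready to use!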
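The view count column is stored as display text such as 23만회 or 230K views, which sorts alphabetically rather than numerically. A minimal sketch of converting it to an integer follows. `parse_view_count` is a hypothetical helper, and the unit table covers only the Korean and English abbreviations assumed here; other locales would need their own entries.

```python
import re
from decimal import Decimal

# Multipliers for the abbreviations this sketch assumes (Korean and English)
UNITS = {
    "천": 1_000, "만": 10_000, "억": 100_000_000,
    "K": 1_000, "M": 1_000_000, "B": 1_000_000_000,
}

# A number, optionally with decimals, directly followed by an optional unit
PATTERN = re.compile(r"(\d+(?:\.\d+)?)(천|만|억|K|M|B)?", re.IGNORECASE)


def parse_view_count(text):
    """Convert display text like '23만회' or '1.2M views' to an int.

    Returns None when the text contains no number.
    """
    match = PATTERN.search(text.replace(",", ""))
    if match is None:
        return None
    number, unit = match.groups()
    multiplier = UNITS[unit.upper()] if unit else 1
    # Decimal keeps values such as 1.2 * 1_000_000 exact
    return int(Decimal(number) * multiplier)


for sample in ["23만회", "조회수 12만회", "1.2M views", "1,234 views", "N/A"]:
    print(sample, "->", parse_view_count(sample))
```

Abbreviated counts are already rounded by YouTube, so the converted numbers are approximations suitable for sorting and rough comparison, not exact totals.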

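5. Cautions

Item	Description
Too many requests	YouTube may block you temporarily
Login pages	Pages that require login need cookie-based authentication or automated sign-in
Ads / recommended videos	Their CSS selector structure may differ or change
HTML changes	Re-check the current structure with `F12 → Copy selector`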

6. Final Checklist

Selenium and BeautifulSoup installed
Auto-scroll code confirmed working
Title, view count, and channel extracted successfully
CSV file confirmed saved
Headless mode tested

7. One-Line Recap

The YouTube trending video data collector is complete!
Scroll automatically with Selenium, parse with BeautifulSoup, and save the results to CSV ✅
