- API
- https://www.instagram.com/developer
- In 2020, the Legacy API was officially declared discontinued!!!
- https://developers.facebook.com/docs/instagram
- ...
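- Crawling? Scraping?
- Issue: from an AWS environment, data cannot be retrieved without logging in!
- Referred to github.com/arc298/instagram-scraper, the most-starred project on awesomeopensource.com/projects/instagram-scraper.
- It works by using the various URL queries that Instagram still (for now?) supports...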
- BASE_URL = 'https://www.instagram.com/'
- LOGIN_URL = BASE_URL + 'accounts/login/ajax/'
- LOGOUT_URL = BASE_URL + 'accounts/logout/'
- CHROME_WIN_UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
- STORIES_UA = 'Instagram 123.0.0.21.114 (iPhone; CPU iPhone OS 11_4 like Mac OS X; en_US; en-US; scale=2.00; 750x1334) AppleWebKit/605.1.15'
- USER_URL = BASE_URL + '{0}/?__a=1'
- TAGS_URL = BASE_URL + 'explore/tags/{0}/?__a=1'
- LOCATIONS_URL = BASE_URL + 'explore/locations/{0}/?__a=1'
- MEDIA_URL = BASE_URL + 'p/{0}/?__a=1'
- SEARCH_URL = BASE_URL + 'web/search/topsearch/?context=blended&query={0}'
- ...
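A minimal sketch of filling in the templates above. It assumes the key (username, tag, location ID, shortcode, or search term) should be percent-encoded before substitution; `URL_TEMPLATES` and `build_url` are hypothetical helper names, and whether Instagram still answers `?__a=1` without a login is not guaranteed.

```python
from urllib.parse import quote

BASE_URL = 'https://www.instagram.com/'

# Templates copied from the constants above; {0} is the lookup key.
URL_TEMPLATES = {
    'user': BASE_URL + '{0}/?__a=1',
    'tag': BASE_URL + 'explore/tags/{0}/?__a=1',
    'location': BASE_URL + 'explore/locations/{0}/?__a=1',
    'media': BASE_URL + 'p/{0}/?__a=1',
    'search': BASE_URL + 'web/search/topsearch/?context=blended&query={0}',
}


def build_url(kind: str, key) -> str:
    """Fill the template for `kind` with a percent-encoded `key` (hypothetical helper)."""
    template = URL_TEMPLATES[kind]  # raises KeyError for an unknown kind
    return template.format(quote(str(key), safe=''))


if __name__ == '__main__':
    print(build_url('tag', 'seoul'))
    print(build_url('search', 'new york'))
```

Encoding with `safe=''` keeps spaces, slashes, and non-ASCII tags (e.g. Korean hashtags) from breaking the path or query string.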
- Example: Python code based on the login cookie
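```python
import pickle

import requests

# Endpoint and User-Agent constants from the list above (instagram-scraper)
BASE_URL = 'https://www.instagram.com/'
LOGIN_URL = BASE_URL + 'accounts/login/ajax/'
MEDIA_URL = BASE_URL + 'p/{0}/?__a=1'
CHROME_WIN_UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
STORIES_UA = 'Instagram 123.0.0.21.114 (iPhone; CPU iPhone OS 11_4 like Mac OS X; en_US; en-US; scale=2.00; 750x1334) AppleWebKit/605.1.15'

session = requests.Session()
session.headers.update({'user-agent': CHROME_WIN_UA})
session.cookies.set('ig_pr', '1')

# Load the main page first to obtain the csrftoken cookie
session.headers.update({'Referer': BASE_URL, 'user-agent': STORIES_UA})
req = session.get(BASE_URL)
session.headers.update({'X-CSRFToken': req.cookies['csrftoken']})

# Log in (replace with your own ID and password)
login_data = {'username': 'YOUR_ID', 'password': 'YOUR_PASSWORD'}
login = session.post(LOGIN_URL, data=login_data, allow_redirects=True)
session.headers.update({'X-CSRFToken': login.cookies['csrftoken']})
cookie = login.cookies
session.headers.update({'user-agent': CHROME_WIN_UA})

# Serialize the login cookies (e.g. to reuse them later) and load them back into the session
cookie_byte = pickle.dumps(cookie)
session.cookies.update(pickle.loads(cookie_byte))

# Fetch a media item's JSON by its shortcode
media_url = MEDIA_URL.format('SHORTCODE')
response = session.get(media_url)
# response = session.get(media_url, timeout=CONNECT_TIMEOUT, cookies=cookie)
data = response.json()
```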
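To avoid logging in on every run (the AWS login issue above), the cookies can be saved to disk once and reloaded later. A minimal sketch, assuming a plain name-to-value mapping is enough; `save_cookies`, `load_cookies`, and the file name are hypothetical. It uses JSON instead of pickle, since unpickling a file runs arbitrary code if the file is ever tampered with.

```python
import json
from pathlib import Path


def save_cookies(cookies: dict, path: str) -> None:
    """Write a name -> value cookie mapping to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(cookies), encoding='utf-8')


def load_cookies(path: str) -> dict:
    """Read the mapping back; returns {} when no file has been saved yet."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding='utf-8'))
```

Usage with the session above: after a successful login call `save_cookies(session.cookies.get_dict(), 'ig_cookies.json')`; on a later run call `session.cookies.update(load_cookies('ig_cookies.json'))` and skip the login POST. The trade-off is that domain and path attributes are not kept, only names and values.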
- Modeling 'lat/lon + time' data in DynamoDB (DyDB) for range queries?
- Lat/lon: -90.0~90.0, -180.0~180.0, zoom 3~zoom 21
- e.g. 37.5004, 127.0274, Z, @19:19:19, weighted by Likes -> 29.005~30.005 & 119.005~120.005 & 19:00:00 -> query?
- Time and space partitioning?
- Time:
- Daily (TB_DAILY), weekly (TB_WEEKLY), monthly (TB_MONTHLY), yearly (TB_ANNUALLY (YEARLY))
- Index table (TB_LEVEL) = GPS_LV | ??? | ...
- Table partitioning = daily -> (TTL) -> (Lv0~3 / Likes) -> (threshold) -> weekly -> ...
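A minimal sketch of deriving the daily/weekly/monthly/yearly bucket keys from one Unix timestamp (e.g. `taken_at_timestamp`). The key formats and the `time_buckets` helper are assumptions for illustration. The weekly key uses the ISO year, so 2019-12-30 lands in `2020-W01` while its yearly key is still `2019`.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korea Standard Time, in case buckets should follow local days


def time_buckets(ts: int, tz: timezone = timezone.utc) -> dict:
    """Map a Unix timestamp to one bucket key per time table (hypothetical key formats)."""
    dt = datetime.fromtimestamp(ts, tz=tz)
    iso_year, iso_week, _ = dt.isocalendar()
    return {
        'TB_DAILY': dt.strftime('%Y-%m-%d'),
        'TB_WEEKLY': f'{iso_year}-W{iso_week:02d}',  # ISO week, ISO year
        'TB_MONTHLY': dt.strftime('%Y-%m'),
        'TB_ANNUALLY': dt.strftime('%Y'),
    }
```

The time zone matters at day boundaries: a post at 23:59 UTC is already the next day in KST, so it would go to a different TB_DAILY bucket.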
- Space:
- Lv0(37_127), Lv1(37.5_127.0), Lv2(37.50_127.02), Lv3(37.500_127.027)
- Column partitioning = base table (PK, SK), GSI (PK, SK)
- PK | CREATE_TS | LIKES | GPS_LV0 | GPS_LV1 | GPS_LV2 | GPS_LV3 | LAT | LON | TAKEN_TS | THUMBNAIL_URL | MEDIA_REF | _LIST? | ...
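The Lv0~Lv3 keys can be computed by cutting lat/lon down to 0~3 decimal places. A minimal sketch, assuming the cut is a floor rather than rounding (the Lv2 example 127.02 for 127.0274 implies truncation, not rounding to 127.03); `gps_levels` is a hypothetical helper. `Decimal` avoids binary float artifacts such as 37.49999999.

```python
from decimal import Decimal, ROUND_FLOOR


def gps_levels(lat: float, lon: float, max_level: int = 3) -> dict:
    """Return {'GPS_LV0': '37_127', ...} by flooring lat/lon to `level` decimals."""
    lat_d, lon_d = Decimal(str(lat)), Decimal(str(lon))
    keys = {}
    for level in range(max_level + 1):
        step = Decimal(1).scaleb(-level)  # 1, 0.1, 0.01, 0.001
        cell_lat = lat_d.quantize(step, rounding=ROUND_FLOOR)
        cell_lon = lon_d.quantize(step, rounding=ROUND_FLOOR)
        keys[f'GPS_LV{level}'] = f'{cell_lat}_{cell_lon}'
    return keys
```

Flooring (instead of truncating toward zero) keeps cells the same size on both sides of the equator and the prime meridian: -33.8688 goes to the -34 cell, not -33.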
- Time:
- ...
- TODO: Redis, or MongoDB's (mdb) 2dsphere index???
- ...
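One way the "lat/lon range + time" query could map onto this design: list every GPS cell at one level that overlaps the search box, then issue one DynamoDB Query per cell with the cell key as the partition key and `CREATE_TS BETWEEN` as the sort-key condition. A minimal sketch under those assumptions; the table name `TB_DAILY`, the GSI name `GSI_GPS_LV2`, a numeric `CREATE_TS`, and both helper names are hypothetical.

```python
from decimal import Decimal, ROUND_FLOOR


def cells_in_bbox(lat_min, lat_max, lon_min, lon_max, level):
    """List every GPS_LV{level} cell key ('lat_lon', floored) that overlaps the box."""
    step = Decimal(1).scaleb(-level)  # 1, 0.1, 0.01, 0.001 for Lv0..Lv3

    def floor(value):
        return Decimal(str(value)).quantize(step, rounding=ROUND_FLOOR)

    lat_end, lon_end = floor(lat_max), floor(lon_max)
    cells = []
    lat = floor(lat_min)
    while lat <= lat_end:
        lon = floor(lon_min)
        while lon <= lon_end:
            cells.append(f'{lat}_{lon}')
            lon += step
        lat += step
    return cells


def build_cell_query(table, index, level, cell, ts_from, ts_to, limit=100):
    """Low-level DynamoDB Query parameters: one GPS cell, CREATE_TS in [ts_from, ts_to]."""
    return {
        'TableName': table,
        'IndexName': index,
        'KeyConditionExpression': '#pk = :cell AND #sk BETWEEN :ts_from AND :ts_to',
        'ExpressionAttributeNames': {'#pk': f'GPS_LV{level}', '#sk': 'CREATE_TS'},
        'ExpressionAttributeValues': {
            ':cell': {'S': cell},
            ':ts_from': {'N': str(ts_from)},
            ':ts_to': {'N': str(ts_to)},
        },
        'Limit': limit,
    }
```

Each dict can be passed as `boto3.client('dynamodb').query(**params)`, following `LastEvaluatedKey` / `ExclusiveStartKey` for more pages. Weighting by Likes is not expressible in the key condition, so in this sketch it would happen client-side after merging the per-cell results. The zoom level decides which GPS_LV to use: a wide box at Lv3 produces many cells (many queries), so a coarser level fits lower zooms.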
- ?__a=1
- User
- Location
- Tag
- Media
- graphql.shortcode_media.id = MEDIA_SEQ
- graphql.shortcode_media.shortcode = MEDIA_ID
- graphql.shortcode_media.edge_media_to_tagged_user.edges[0].node.user
- .id = INSTA_SEQ
- .username = INSTA_ID
- .full_name = INSTA_NAME
- .profile_pic_url = INSTA_PROFILE_URL
- graphql.shortcode_media.edge_media_to_caption.edges[0].node.text = TXT
- graphql.shortcode_media.edge_media_to_parent_comment.count = ...
- graphql.shortcode_media.edge_media_to_parent_comment.edges[0].node
- .text = REPLY
- .created_at = REPLY_TS
- .owner
- .id = INSTA_SEQ
- .username = INSTA_ID
- .profile_pic_url = INSTA_PROFILE_URL
- .edge_threaded_comments.count = ...
- .edge_threaded_comments.edges[0].node
- .text = REPLY
- .created_at = REPLY_TS
- .owner
- .id = INSTA_SEQ
- .username = INSTA_ID
- .profile_pic_url = INSTA_PROFILE_URL
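A minimal sketch of flattening the comment tree above (top-level comments plus their threaded replies) into rows with the REPLY / REPLY_TS / INSTA_* names. It only reads the fields listed above; `flatten_comments`, `_comment_row`, and the `REPLY_DEPTH` column are hypothetical. Since `.count` is listed separately, the `edges` array may hold fewer comments than the total.

```python
def _comment_row(node: dict, depth: int) -> dict:
    """Map one comment node to the column names used above."""
    owner = node.get('owner') or {}
    return {
        'REPLY': node.get('text'),
        'REPLY_TS': node.get('created_at'),
        'INSTA_SEQ': owner.get('id'),
        'INSTA_ID': owner.get('username'),
        'INSTA_PROFILE_URL': owner.get('profile_pic_url'),
        'REPLY_DEPTH': depth,  # hypothetical column: 0 = top-level, 1 = threaded reply
    }


def flatten_comments(media: dict) -> list:
    """media is payload['graphql']['shortcode_media']; returns one row per comment."""
    rows = []
    parents = (media.get('edge_media_to_parent_comment') or {}).get('edges') or []
    for edge in parents:
        node = edge.get('node') or {}
        rows.append(_comment_row(node, depth=0))
        replies = (node.get('edge_threaded_comments') or {}).get('edges') or []
        for reply in replies:
            rows.append(_comment_row(reply.get('node') or {}, depth=1))
    return rows
```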
- graphql.shortcode_media.taken_at_timestamp = MEDIA_TS
- graphql.shortcode_media.location
- .id = LOCATION_SEQ
- .name = LOCATION_NAME
- .address_json = LOCATION_ADDRESS
- graphql.shortcode_media.owner
- .id = INSTA_SEQ
- .username = INSTA_ID
- .full_name = INSTA_NAME
- .profile_pic_url = INSTA_PROFILE_URL
- graphql.shortcode_media.edge_sidecar_to_children.edges[0].node
- .id = PIC_SEQ ???
- .shortcode = PIC_ID ???
- .display_url = PIC_URL
- .accessibility_caption = META
- edge_media_to_tagged_user.edges[0].node.user
- .id = INSTA_SEQ
- .username = INSTA_ID
- .full_name = INSTA_NAME
- .profile_pic_url = INSTA_PROFILE_URL
- video_url =
- .is_video =
- graphql.shortcode_media.video_url =
- graphql.shortcode_media.is_video =
- graphql.shortcode_media.accessibility_caption =
- graphql.shortcode_media.display_url =
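A minimal sketch of applying the media mapping above to a `?__a=1` response, tolerating missing or `null` fields (e.g. posts without a location). `get_path`, `flatten_media`, and the `IS_VIDEO` / `VIDEO_URL` / `PIC_URLS` column names are hypothetical (the notes leave the video fields unnamed). For a sidecar (multi-photo) post, it collects every child's `display_url`; otherwise, it falls back to the post's own `display_url`.

```python
def get_path(obj, *path, default=None):
    """Follow dict keys / list indexes; return `default` if any step is missing or null."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


def flatten_media(payload: dict) -> dict:
    """Map payload['graphql']['shortcode_media'] to the column names in the notes."""
    m = get_path(payload, 'graphql', 'shortcode_media') or {}
    children = get_path(m, 'edge_sidecar_to_children', 'edges') or []
    pic_urls = [get_path(e, 'node', 'display_url') for e in children]
    if not pic_urls and m.get('display_url'):
        pic_urls = [m['display_url']]
    return {
        'MEDIA_SEQ': m.get('id'),
        'MEDIA_ID': m.get('shortcode'),
        'MEDIA_TS': m.get('taken_at_timestamp'),
        'TXT': get_path(m, 'edge_media_to_caption', 'edges', 0, 'node', 'text'),
        'INSTA_SEQ': get_path(m, 'owner', 'id'),
        'INSTA_ID': get_path(m, 'owner', 'username'),
        'INSTA_NAME': get_path(m, 'owner', 'full_name'),
        'INSTA_PROFILE_URL': get_path(m, 'owner', 'profile_pic_url'),
        'LOCATION_SEQ': get_path(m, 'location', 'id'),
        'LOCATION_NAME': get_path(m, 'location', 'name'),
        'LOCATION_ADDRESS': get_path(m, 'location', 'address_json'),
        'IS_VIDEO': m.get('is_video'),    # hypothetical column name
        'VIDEO_URL': m.get('video_url'),  # hypothetical column name
        'PIC_URLS': pic_urls,             # hypothetical column name
    }
```

Usage with the login example: `row = flatten_media(response.json())`.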
-End-