• api
    • https://www.instagram.com/developer
      • The Legacy API was officially announced as discontinued in 2020!
    • https://developers.facebook.com/docs/instagram
      • ...
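  • Crawling? Scraping?
    • Issue: in an AWS environment, data cannot be queried without logging in!
    • Referred to github.com/arc298/instagram-scraper, the most-starred project listed at awesomeopensource.com/projects/instagram-scraper.
    • The approach is to use the various URL queries that Instagram continues to (?) or still (?) supports.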
      • BASE_URL = 'https://www.instagram.com/'
      • LOGIN_URL = BASE_URL + 'accounts/login/ajax/'
      • LOGOUT_URL = BASE_URL + 'accounts/logout/'
      • CHROME_WIN_UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
      • STORIES_UA = 'Instagram 123.0.0.21.114 (iPhone; CPU iPhone OS 11_4 like Mac OS X; en_US; en-US; scale=2.00; 750x1334) AppleWebKit/605.1.15'
      • USER_URL = BASE_URL + '{0}/?__a=1'
      • TAGS_URL = BASE_URL + 'explore/tags/{0}/?__a=1'
      • LOCATIONS_URL = BASE_URL + 'explore/locations/{0}/?__a=1'
      • MEDIA_URL = BASE_URL + 'p/{0}/?__a=1'
      • SEARCH_URL = BASE_URL + 'web/search/topsearch/?context=blended&query={0}'
      • ...
    • e.g. Python code based on the login cookie
      • import json
        import pickle
        import requests
        
        # BASE_URL, LOGIN_URL, MEDIA_URL, CHROME_WIN_UA and STORIES_UA are the constants listed above.
        session = requests.Session()
        session.headers = {'user-agent': CHROME_WIN_UA}
        session.cookies.set('ig_pr', '1')
        
        session.headers.update({'Referer': BASE_URL, 'user-agent': STORIES_UA})
        
        # Fetch the landing page first to obtain a CSRF token.
        req = session.get(BASE_URL)
        session.headers.update({'X-CSRFToken': req.cookies['csrftoken']})
        
        # Log in; the response carries the login cookies and a fresh CSRF token.
        login_data = {'username': 'your_username', 'password': 'your_password'}
        login = session.post(LOGIN_URL, data=login_data, allow_redirects=True)
        session.headers.update({'X-CSRFToken': login.cookies['csrftoken']})
        cookie = login.cookies
        session.headers.update({'user-agent': CHROME_WIN_UA})
        
        # Serialize the cookies with pickle (so they could be stored) and load them back into the session.
        cookie_byte = pickle.dumps(cookie)
        session.cookies.update(pickle.loads(cookie_byte))
        
        # Query a single post by its shortcode.
        media_url = MEDIA_URL.format('shortcode')
        response = session.get(media_url)
        #response = session.get(media_url, timeout=CONNECT_TIMEOUT, cookies=cookie)
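
The URL templates listed above take a single positional argument. As a minimal sketch of filling them in, assuming the value should be percent-encoded first (useful for non-ASCII tags or queries containing spaces); `build_url` is a hypothetical helper, not part of instagram-scraper, and no request is sent here:

```python
from urllib.parse import quote

BASE_URL = 'https://www.instagram.com/'
TAGS_URL = BASE_URL + 'explore/tags/{0}/?__a=1'
MEDIA_URL = BASE_URL + 'p/{0}/?__a=1'
SEARCH_URL = BASE_URL + 'web/search/topsearch/?context=blended&query={0}'


def build_url(template, value):
    """Fill a URL template with a percent-encoded value (hypothetical helper)."""
    return template.format(quote(str(value), safe=''))


tag_url = build_url(TAGS_URL, '서울')            # non-ASCII tag is UTF-8 percent-encoded
media_url = build_url(MEDIA_URL, 'SHORTCODE')    # placeholder shortcode
search_url = build_url(SEARCH_URL, 'seoul cafe') # the space becomes %20
```

Whether these endpoints still respond, and what they return, depends on Instagram; the sketch only covers building the URL strings.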
        
  • DynamoDB (DyDB) modeling of 'lat/lon + time' data for range queries?
    • Google Maps ("G도"): latitude -90.0~90.0, longitude -180.0~180.0, zoom3~zoom21
    • e.g. 37.5004, 127.0274, Z, @19:19:19, weighted by Likes -> 29.005~30.005 & 119.005~120.005 & 19:00:00 -> query?
    • Partition by time and space?
      • Time:
        • Daily (TB_DAILY), weekly (TB_WEEKLY), monthly (TB_MONTHLY), yearly (TB_ANNUALLY(YEARLY))
        • Index table (TB_LEVEL) = GPS_LV | ??? | ... 
        • Table split = daily -> (TTL) -> (Lv0~3 / Likes) -> (Threshold) -> weekly -> ...
      • Space:
        • Lv0(37_127), Lv1(37.5_127.0), Lv2(37.50_127.02), Lv3(37.500_127.027)
        • Column split = base table (PK, SK), GSI (PK, SK)
        • PK | CREATE_TS | LIKES | GPS_LV0 | GPS_LV1 | GPS_LV2 | GPS_LV3 | LAT | LON | TAKEN_TS | THUMBNAIL_URL | MEDIA_REF | _LIST? | ...
    • ...
    • TODO: Redis or MongoDB's (mdb) 2dsphere index???
    • ...
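
The Lv0 to Lv3 values above look like the coordinates cut off at 0 to 3 decimal places (127.0274 becomes 127.02 at Lv2, not 127.03). A minimal sketch of deriving those keys, assuming the cut is a floor (an assumption that matters only for negative coordinates) and that the 19:19:19 -> 19:00:00 step is an hourly bucket; all function names are hypothetical:

```python
from decimal import Decimal, ROUND_FLOOR


def gps_level_key(lat, lon, level):
    """Return a 'lat_lon' grid key with `level` decimal places."""
    step = Decimal(10) ** -level  # 1, 0.1, 0.01, 0.001
    # Decimal(str(...)) avoids binary float noise; ROUND_FLOOR is an assumed choice.
    lat_cell = Decimal(str(lat)).quantize(step, rounding=ROUND_FLOOR)
    lon_cell = Decimal(str(lon)).quantize(step, rounding=ROUND_FLOOR)
    return f'{lat_cell}_{lon_cell}'


def gps_level_keys(lat, lon):
    """Build the GPS_LV0..GPS_LV3 columns for one item."""
    return {f'GPS_LV{lv}': gps_level_key(lat, lon, lv) for lv in range(4)}


def hour_bucket(hms):
    """Reduce an 'HH:MM:SS' string to the start of its hour."""
    return hms[:2] + ':00:00'


keys = gps_level_keys(37.5004, 127.0274)
bucket = hour_bucket('19:19:19')
```

Each GPS_LV value could then serve as a partition key (base table or GSI) with a timestamp as the sort key, so a lookup for one grid cell and a time range becomes a single key-condition query; whether that fits the access pattern is still the open question in the notes.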
  • ?__a=1
    • User
    • Location
    • Tag
    • Media
    • graphql.shortcode_media.id = MEDIA_SEQ
    • graphql.shortcode_media.shortcode = MEDIA_ID
    • graphql.shortcode_media.edge_media_to_tagged_user.edges[0].node.user
      • .id = INSTA_SEQ
      • .username = INSTA_ID
      • .full_name = INSTA_NAME
      • .profile_pic_url = INSTA_PROFILE_URL
    • graphql.shortcode_media.edge_media_to_caption.edges[0].node.text = TXT
    • graphql.shortcode_media.edge_media_to_parent_comment.count = ...
    • graphql.shortcode_media.edge_media_to_parent_comment.edges[0].node
      • .text = REPLY
      • .created_at = REPLY_TS
      • .owner
        • .id = INSTA_SEQ
        • .username = INSTA_ID
        • .profile_pic_url = INSTA_PROFILE_URL
      • .edge_threaded_comments.count = ...
      • .edge_threaded_comments.edges[0].node
        • .text = REPLY
        • .created_at = REPLY_TS
        • .owner
          • .id = INSTA_SEQ
          • .username = INSTA_ID
          • .profile_pic_url = INSTA_PROFILE_URL
    • graphql.shortcode_media.taken_at_timestamp = MEDIA_TS
    • graphql.shortcode_media.location
      • .id = LOCATION_SEQ
      • .name = LOCATION_NAME
      • .address_json = LOCATION_ADDRESS
    • graphql.shortcode_media.owner
      • .id = INSTA_SEQ
      • .username = INSTA_ID
      • .full_name = INSTA_NAME
      • .profile_pic_url = INSTA_PROFILE_URL
    • graphql.shortcode_media.edge_sidecar_to_children.edges[0].node
      • .id = PIC_SEQ ???
      • .shortcode = PIC_ID ???
      • .display_url = PIC_URL
      • .accessibility_caption = META
      • .edge_media_to_tagged_user.edges[0].node.user
        • .id = INSTA_SEQ
        • .username = INSTA_ID
        • .full_name = INSTA_NAME
        • .profile_pic_url = INSTA_PROFILE_URL
      • .video_url = 
      • .is_video =
    • graphql.shortcode_media.video_url = 
    • graphql.shortcode_media.is_video = 
    • graphql.shortcode_media.accessibility_caption = 
    • graphql.shortcode_media.display_url = 
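
The field paths above can be read out of the parsed JSON with a small null-safe lookup. A minimal sketch, assuming the response has already been parsed into a dict; `dig` and `extract_media` are hypothetical helpers, and `sample` is a hand-made stand-in that only mimics the shape of the paths listed above, not a real response:

```python
def dig(data, *path, default=None):
    """Follow dict keys / list indexes; return `default` if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


def extract_media(payload):
    """Map a few of the listed paths to the column names used in these notes."""
    media = dig(payload, 'graphql', 'shortcode_media', default={})
    return {
        'MEDIA_SEQ': dig(media, 'id'),
        'MEDIA_ID': dig(media, 'shortcode'),
        'MEDIA_TS': dig(media, 'taken_at_timestamp'),
        'TXT': dig(media, 'edge_media_to_caption', 'edges', 0, 'node', 'text'),
        'INSTA_ID': dig(media, 'owner', 'username'),
        'LOCATION_NAME': dig(media, 'location', 'name'),
    }


# Hand-made sample, for illustration only.
sample = {
    'graphql': {
        'shortcode_media': {
            'id': '1',
            'shortcode': 'SHORTCODE',
            'taken_at_timestamp': 1600000000,
            'owner': {'id': '2', 'username': 'someone'},
            'location': None,  # posts without a location
            'edge_media_to_caption': {'edges': [{'node': {'text': 'hello'}}]},
        }
    }
}
row = extract_media(sample)
```

Returning `None` for missing steps keeps one malformed or location-less post from aborting a whole batch.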

-End-
