
📷 Instagram Pipelines [DRAFT] #74

Closed
wants to merge 7 commits into from

@arifluthfi16 (Contributor) commented Jan 19, 2022

This resolves #60.

Current progress:

Created the Scraping API, which can scrape:

  • Profile Data
  • Post Data
  • Hashtag Data
  • Location Data
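A minimal sketch of how a request URL could be routed to one of the four scrapers listed above. The hashtag (`/explore/tags/<name>/`) and post (`/p/<shortcode>/`) path shapes come from the example requests in this PR; the profile and location path shapes, and the `classify_url` helper itself, are assumptions for illustration, not the API's actual code.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Ordered (kind, pattern) pairs; more specific paths are checked before the
# catch-all profile pattern. Location and profile shapes are assumptions.
_PATTERNS = [
    ("hashtag", re.compile(r"^/explore/tags/([^/]+)/?$")),
    ("location", re.compile(r"^/explore/locations/(\d+)(?:/[^/]*)?/?$")),
    ("post", re.compile(r"^/p/([A-Za-z0-9_-]+)/?$")),
    ("profile", re.compile(r"^/([A-Za-z0-9_.]+)/?$")),
]


def classify_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, key) for an Instagram URL, or None if it is not recognised."""
    parsed = urlparse(url)
    if parsed.netloc not in ("instagram.com", "www.instagram.com"):
        return None
    for kind, pattern in _PATTERNS:
        match = pattern.match(parsed.path)
        if match:
            return kind, match.group(1)
    return None
```

For example, `classify_url("https://www.instagram.com/p/CY3rhPQqig6/")` yields `("post", "CY3rhPQqig6")`. A real router would also need to reject reserved top-level paths such as `/explore/`, which this sketch would misread as a profile.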

Todo:

  • Dockerize Scraping API ✅
  • Select the data that matters (I need feedback on this)
  • Add Proxy Support
  • Convert Location Data into H3 Indexes
  • Create Pipelines to Store Data
  • Dockerize Pipelines
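For the open "Add Proxy Support" item, one possible approach is round-robin rotation over a list of proxies. This sketch uses only the standard library's `urllib.request`; the proxy addresses are placeholders and `proxy_openers` is a hypothetical helper, since the PR does not say which HTTP client the scraper uses.

```python
import itertools
import urllib.request
from typing import Iterator, Sequence


def proxy_openers(proxies: Sequence[str]) -> Iterator[urllib.request.OpenerDirector]:
    """Cycle forever over openers, each routing HTTP(S) traffic through one proxy."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    openers = [
        # ProxyHandler maps URL scheme -> proxy URL for requests made by this opener;
        # passing it explicitly replaces build_opener's default (environment) proxy handler.
        urllib.request.build_opener(urllib.request.ProxyHandler({"http": p, "https": p}))
        for p in proxies
    ]
    return itertools.cycle(openers)
```

Each scrape would then call something like `next(rotation).open(url, timeout=30)`. A production version would also need to drop proxies that keep failing, which this sketch does not do.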

Examples:

Hashtag Scraping:

Request:

[
    "https://www.instagram.com/explore/tags/googlepixel3/",
    "https://www.instagram.com/explore/tags/google/"
]

Response:

[
    {
        "allow_following": false,
        "amount_of_posts": 178563,
        "id": "17852487268191726",
        "is_following": false,
        "is_top_media_only": false,
        "name": "googlepixel3",
        "profile_pic_url": "https://instagram.fbdo1-2.fna.fbcdn.net/v/t51.2885-15/e35/s150x150/240950065_1011889952946184_2416560484104956328_n.jpg?_nc_ht=instagram.fbdo1-2.fna.fbcdn.net&_nc_cat=104&_nc_ohc=pUqmR4ouF8AAX9MBxyA&edm=ABZsPhsBAAAA&ccb=7-4&oh=00_AT8BIF4IN-mzBBJxJnPEMR2fHL3e6praejBMXu6KXhbGVQ&oe=61EE4C5A&_nc_sid=4efc9f"
    },
    {
        "allow_following": false,
        "amount_of_posts": 11056646,
        "id": "17843843635029645",
        "is_following": false,
        "is_top_media_only": false,
        "name": "google",
        "profile_pic_url": "https://instagram.fbdo1-1.fna.fbcdn.net/v/t51.2885-15/e35/c0.180.1440.1440a/s150x150/271992644_353756903249047_2268745757588465676_n.webp.jpg?_nc_ht=instagram.fbdo1-1.fna.fbcdn.net&_nc_cat=107&_nc_ohc=V9JAOOMvFScAX_VbPZj&edm=ABZsPhsBAAAA&ccb=7-4&oh=00_AT-MQFEBk7kRv8pY8pn1ahglXB-4sKQI8kdBJw5Fb_F9Mg&oe=61EE44FB&_nc_sid=4efc9f"
    }
]
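Toward "Create Pipelines to Store Data", a hashtag response like the one above could be upserted into a table keyed on `id`. The schema, the `store_hashtags` name, and the choice of SQLite are assumptions for illustration only. The ids arrive as strings, and values such as `17852487268191726` exceed 2^53, so a JavaScript client parsing them as numbers would lose precision; storing them as `TEXT` avoids that.

```python
import json
import sqlite3
from typing import Iterable

# Hypothetical schema; the PR does not specify a storage backend.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hashtag (
    id TEXT PRIMARY KEY,            -- kept as text: ids exceed 2**53
    name TEXT NOT NULL,
    amount_of_posts INTEGER NOT NULL,
    raw_json TEXT NOT NULL          -- full record, until the fields that matter are settled
)
"""


def store_hashtags(conn: sqlite3.Connection, records: Iterable[dict]) -> None:
    """Insert hashtag records, replacing any existing row with the same id."""
    conn.execute(SCHEMA)
    rows = [
        (rec["id"], rec["name"], rec["amount_of_posts"], json.dumps(rec, sort_keys=True))
        for rec in records
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO hashtag (id, name, amount_of_posts, raw_json) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
```

Keeping the full record in `raw_json` means the "select data that matters" decision can be revisited later without re-scraping.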

Post Scraping:

Request:

[
    "https://www.instagram.com/p/CY3rhPQqig6/",
    "https://www.instagram.com/p/CY0lOS2vp9Y/"
]

Response:

[
    {
        "accessibility_caption": "Photo by Donatekart | India on January 18, 2022. May be an image of big cat and text that says 'DONATEKART A tribute to the Queen of Pench \"Collarwali\" Who passed away yesterday due to old age. She was 16 and one of the most important and legendary tigers in India who gave birth to 29 cubs.'.",
        "caption": "R.I.P \"Collarwali\" 🥺🙌🏻🙏🌱\n\nShe played a key role in maintaining the population of tiger reserve in India🐅\n.\n.\n#collarwali #tributepost #tigeress #cubs #tiger #forest",
        "caption_is_edited": true,
        "commenting_disabled_for_viewer": false,
        "comments": 72,
        "comments_disabled": false,
        "display_url": "https://instagram.fbdo1-1.fna.fbcdn.net/v/t51.2885-15/e35/271986206_140104905091452_6023192320528863036_n.jpg?_nc_ht=instagram.fbdo1-1.fna.fbcdn.net&_nc_cat=100&_nc_ohc=Svb-p5OLusAAX94Gh1N&edm=AABBvjUBAAAA&ccb=7-4&oh=00_AT9rrCMr8_CaTrHI_wLy400ol25ZaY38mOwHaneBan6ZcA&oe=61EF0E8B&_nc_sid=83d603",
        "fact_check_information": null,
        "fact_check_overall_rating": null,
        "full_name": "Donatekart | India",
        "gating_info": null,
        "has_audio": NaN,
        "has_ranked_comments": false,
        "hashtags": [
            "collarwali",
            "tributepost",
            "tigeress",
            "cubs",
            "tiger",
            "forest"
        ],
        "height": 1350,
        "id": "2753861097288771642",
        "is_video": false,
        "likes": 4555,
        "location": NaN,
        "media_overlay_info": null,
        "media_preview": null,
        "sensitivity_friction_info": null,
        "shortcode": "CY3rhPQqig6",
        "tagged_users": [],
        "timestamp": 1642505846,
        "tracking_token": "eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiZjAxOTMzODJhNjJkNDgyOGE4MGUzNDRkYzdiMWZjZTkyNzUzODYxMDk3Mjg4NzcxNjQyIn0sInNpZ25hdHVyZSI6IiJ9",
        "upload_date": "Tue, 18 Jan 2022 11:37:26 GMT",
        "username": "donatekart",
        "video_url": NaN,
        "video_view_count": NaN,
        "viewer_can_reshare": true,
        "viewer_has_liked": false,
        "viewer_has_saved": false,
        "viewer_has_saved_to_collection": false,
        "viewer_in_photo_of_you": false,
        "width": 1080
    },
    ...
]
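Several fields in the post response above are emitted as a bare `NaN` (`has_audio`, `location`, `video_url`, `video_view_count`). Python's `json` module accepts that, but it is not valid JSON, and strict parsers such as JavaScript's `JSON.parse` reject it. A minimal sketch, assuming the API builds each post as a Python dict (NaN often appears when values pass through a pandas/NumPy float column, but that is a guess), that maps NaN to `null` before serialising. It also checks that `upload_date` follows from `timestamp` and that `hashtags` matches the `#` words in `caption`; whether the API actually derives them this way is an assumption, and all helper names are hypothetical.

```python
import json
import math
import re
from email.utils import formatdate
from typing import Any, List


def nan_to_none(value: Any) -> Any:
    """Recursively replace float NaN values with None (serialised as null)."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [nan_to_none(item) for item in value]
    return value


def to_strict_json(post: dict) -> str:
    """Serialise a post as standards-compliant JSON."""
    # allow_nan=False makes json.dumps raise ValueError if any NaN slipped through.
    return json.dumps(nan_to_none(post), allow_nan=False, ensure_ascii=False)


def upload_date_from_timestamp(timestamp: int) -> str:
    """Format a Unix timestamp in the RFC 2822 style used by `upload_date`."""
    return formatdate(timestamp, usegmt=True)


def hashtags_from_caption(caption: str) -> List[str]:
    """Return the words following '#' in a caption, in order of appearance."""
    return re.findall(r"#(\w+)", caption)
```

For the example post, `upload_date_from_timestamp(1642505846)` gives `"Tue, 18 Jan 2022 11:37:26 GMT"`, matching its `upload_date` field.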

@arifluthfi16 changed the title from "Draft: Instagram Pipelines" to "📷 Instagram Pipelines [DRAFT]" on Jan 21, 2022
@arifluthfi16 deleted the insta-scrapper branch on January 22, 2022, 16:59
@arifluthfi16 (Contributor, Author) commented:

Moved to #76

1 participant