A Python script that fetches Instagram posts from a specified account using a GraphQL endpoint. It supports batching, pagination, and resumable downloads, saving your results into JSON format. It also handles encoding issues for robust JSON loading.
- Batch Fetching: Pull a specified number of Instagram posts in batches.
- Pagination: Uses
after_cursor
to fetch subsequent pages automatically. - Resumable: Keeps track of the last fetched cursor in a state file (
resume_file
) so you can resume fetching later. - Flexible Encoding: Includes error-handling for different text encodings (
utf-8
,latin-1
, etc.).
-
Python 3.7 or later
-
Requests library
pip install requests
-
A valid Instagram GraphQL
doc_id
or endpoint (the example usesdoc_id="7898261xxxxxxxxxxx"
, but this can change).
-
Clone the repository (or download the script files):
git clone https://github.com/hitthecodelabs/InstagramPostsFetcher.git
-
Install Dependencies:
cd InstagramPostsFetcher pip install -r requirements.txt
-
If you don't have a requirements file, just ensure
requests
is installed:pip install requests
-
Update the Code:
- Replace
YOUR_ACCESS_TOKEN
with your token if needed (or remove the Authorization header if you’re not using it). - Make sure your
doc_id
is correct or aligned with the currently valid Instagram GraphQL endpoints.
- Replace
-
Run the Script:
python ig_posts_fecther.py
By default, it will:
- Fetch posts from the specified username (
username
). - Save them to a JSON file (
output_file
, e.g.,instagram_posts_<username>.json
). - Keep track of pagination state in a resume file (
resume_file
, e.g.,resume_state_<username>.json
).
- Fetch posts from the specified username (
-
Check the Output:
- After running, open or parse the JSON file to verify the fetched posts.
- If you stop the script or if it stops due to an error, just run it again to resume where it left off.
-
fetch_instagram_posts(username, after_cursor, post_count)
Makes a GET request to the Instagram GraphQL endpoint to fetch a specific batch of posts. -
load_resume_state(resume_file)
Loads the last pagination state from a JSON file so you can resume from the last cursor. -
save_resume_state(resume_file, after_cursor, post_count)
Saves the pagination state after each successful fetch. -
load_existing_posts(output_file)
Loads previously saved posts so you can append newly fetched data. -
iterate_and_save_posts(username, output_file, resume_file, batch_size)
Orchestrates fetching in batches until there are no more posts or an error occurs. -
load_json_file(file_path)
Demonstrates safe reading of JSON files with fallback encodings.
- If Invalid Token errors occur, check your token or authorization method.
- If the
doc_id
is invalid or outdated, you may see errors from Instagram's GraphQL endpoint. Make sure to update to a valid endpoint ordoc_id
.
- Fork the project.
- Create your feature branch (
git checkout -b feature/NewFeature
). - Commit your changes (
git commit -m 'Add a new feature'
). - Push to the branch (
git push origin feature/NewFeature
). - Open a Pull Request.
This project is licensed under the MIT License. You are free to modify and use the code. Refer to the license file for details.