Open
81 commits
2b60e6d
Merge commit 'ffa58876c945108ce440c78ab4b19901c2ec7ef1' into download…
lenaschimmel Nov 22, 2022
2f109b5
Merge commit 'c8b9f45345a339206a0d90fc330b6c0418c2fc24' into download…
lenaschimmel Nov 22, 2022
23dba29
WIP: save known tweets to file and re-use them, to avoid re-downloads…
lenaschimmel Nov 22, 2022
d1fb57c
Tweet merging and a lot of error handling. Reduce number of useless r…
lenaschimmel Nov 23, 2022
59d653f
Merge commit 'fb0a0b65188488a64de648a446ffd26a7f6b51d0' into download…
lenaschimmel Nov 23, 2022
c7de375
replace image URLs in DMs with links to local files
flauschzelle Nov 23, 2022
f4e628b
url format for original size version of DM images, and comments expla…
flauschzelle Nov 24, 2022
38f232a
parse group DMs and output them as markdown
flauschzelle Nov 24, 2022
0e2254f
Merge branch 'upstream/main' into improve-dm-output
flauschzelle Nov 24, 2022
594c070
Merge branch 'upstream/main' into parse-group-dms
flauschzelle Nov 24, 2022
f52dfa3
Reorganize file output paths.
lenaschimmel Nov 24, 2022
2da269c
Merge commit '2acb75634c68fe33419743dde40ca2252984d663' into outputfo…
lenaschimmel Nov 24, 2022
3efe1b7
user handles in group dms output filename sorted by activity (for rep…
flauschzelle Nov 24, 2022
a79458f
Merge commit '2acb75634c68fe33419743dde40ca2252984d663' into download…
lenaschimmel Nov 24, 2022
9155e36
Bugfix: merge lists inside merged dicts, prevents None in known_tweets.
lenaschimmel Nov 24, 2022
ba62af0
use pathconfig for group dm output filenames
flauschzelle Nov 25, 2022
75fc713
separate collection of user ids from content parsing and output gener…
flauschzelle Nov 25, 2022
feef679
Make a minimal commit to test if GitHub is still broken...
lenaschimmel Nov 25, 2022
04e3a75
copy image files from group dms to output media dir and embed images …
flauschzelle Nov 25, 2022
fc6abe6
refactored: collect user ids from dms (separately) and do lookup befo…
flauschzelle Nov 25, 2022
4a76d17
refactored: collect user ids from followers (separately) and do looku…
flauschzelle Nov 25, 2022
d1bfca8
refactored: collect user ids from followings (separately) and do look…
flauschzelle Nov 25, 2022
1609d98
bundle the lookup of user handles (from followings, followers and dir…
flauschzelle Nov 25, 2022
c61911e
add empty lines to the output for better readability
flauschzelle Nov 25, 2022
373198c
escape md control chars in md output of tweet text body
flauschzelle Nov 25, 2022
a8d0e05
check if user is in the correct folder before init of the other paths
flauschzelle Nov 25, 2022
c17fe53
Merge pull request #125 from flauschzelle/fix-path-check-order
flauschzelle Nov 26, 2022
8a3737c
Merge pull request #116 from flauschzelle/improve-dm-output
flauschzelle Nov 26, 2022
676d557
Merge branch 'upstream/main' into parse-group-dms
flauschzelle Nov 26, 2022
3bb150c
Merge pull request #118 from flauschzelle/parse-group-dms
flauschzelle Nov 26, 2022
e890115
Merge branch 'main' into extract-collecting-userids
flauschzelle Nov 26, 2022
0810231
include users from group DMs in bulk handle lookup, extra prompt if t…
flauschzelle Nov 26, 2022
7bb5bfe
some more improved code formatting
flauschzelle Nov 26, 2022
b9990d9
Merge pull request #123 from flauschzelle/escape-markdown
flauschzelle Nov 26, 2022
74c01c5
Merge commit 'b9990d9fac9b9c7c3fb82899582e3291f6a244b9' into download…
lenaschimmel Nov 26, 2022
99eb174
Merge commit 'b9990d9fac9b9c7c3fb82899582e3291f6a244b9' into outputfo…
lenaschimmel Nov 26, 2022
dc1917a
escape md control chars in DMs and format them as quotes instead of c…
flauschzelle Nov 26, 2022
d41186a
Move and/or remove output in the archive root, which was left there b…
lenaschimmel Nov 26, 2022
5482d6c
Merge branch 'main' into extract-collecting-userids
flauschzelle Nov 26, 2022
e3207b8
Add method `get_consent` which ensures that yes/no questions are disp…
lenaschimmel Nov 26, 2022
94b1d50
simplified check and clarified text for prompting the optional exclus…
flauschzelle Nov 26, 2022
67baadd
Minor cleanup.
lenaschimmel Nov 26, 2022
7cc17a0
catch ValueError from urlparse when looking for links in old tweets
flauschzelle Nov 26, 2022
c4c1ac7
Merge pull request #121 from flauschzelle/extract-collecting-userids
flauschzelle Nov 27, 2022
16a716c
Merge branch 'main' into escape-markdown
flauschzelle Nov 27, 2022
819a271
Merge pull request #132 from flauschzelle/escape-markdown
flauschzelle Nov 27, 2022
9f7a7e3
Merge branch 'main' into urlparse-error-handling
flauschzelle Nov 27, 2022
4c315aa
Merge pull request #134 from flauschzelle/urlparse-error-handling
flauschzelle Nov 27, 2022
67dabf7
Merge branch 'main' into downloadtweets, maybe media downloading is b…
lenaschimmel Nov 27, 2022
aa52c93
output of % done and estimated remaining time while trying to downloa…
flauschzelle Nov 27, 2022
3daec96
Update README.md to represent the current state of DM parsing
flauschzelle Nov 27, 2022
638f33b
Update README.md with more explanation of how to use a command prompt.
flauschzelle Nov 27, 2022
d616c1f
Update README.md with more info about current DMs parsing functionality.
flauschzelle Nov 27, 2022
8c657c8
Merge pull request #137 from flauschzelle/time-remaining
flauschzelle Nov 27, 2022
f40258e
Merge pull request #138 from flauschzelle/update-readme
flauschzelle Nov 27, 2022
199ca9f
Fix multiple bugs which prevented media downloading and/or resulted i…
lenaschimmel Nov 27, 2022
9d75a99
index on (no branch): 199ca9f Fix multiple bugs which prevented media…
lenaschimmel Nov 27, 2022
28e5454
Merge branch 'downloadtweets-bugfix' into downloadtweets
lenaschimmel Nov 27, 2022
aadfbd4
Merge remote-tracking branch 'upstream/main' into robust-consent
lenaschimmel Nov 27, 2022
911c46c
Use get_consent for user handle download.
lenaschimmel Nov 27, 2022
3f8fe52
Merge pull request #133 from lenaschimmel/robust-consent
lenaschimmel Nov 27, 2022
76b8fe7
Merge remote-tracking branch 'upstream/main' into outputfolders
lenaschimmel Nov 27, 2022
4e907d5
Use create_path_for_file_output_dms for group DMs, remove unused file…
lenaschimmel Nov 27, 2022
3dbf43c
Merge pull request #120 from lenaschimmel/outputfolders
lenaschimmel Nov 27, 2022
6be8b78
Merge commit '3dbf43ce62fc5a670b5b10f3fce26e86fc18712e' into download…
lenaschimmel Nov 27, 2022
b65d66c
Use get_consent in migrate_old_output, skip question about downloadin…
lenaschimmel Nov 27, 2022
fdfe909
Extract format_duration and use it for additional download time estim…
lenaschimmel Nov 27, 2022
179d011
Fix bug which re-downloaded the same tweets over and over, instead of…
lenaschimmel Nov 27, 2022
7d470f7
Merge branch 'upstream/downloadtweets' into downloadtweets
lenaschimmel Nov 27, 2022
11846a5
Remove verbose tweet download logging and instead tell the user somet…
lenaschimmel Nov 27, 2022
3c59d12
Remove listing of moved media files, it's much to verbose.
lenaschimmel Nov 27, 2022
d39adb1
also include users with 0 messages in filename generation for group D…
flauschzelle Nov 27, 2022
af5fb36
Merge pull request #140 from lenaschimmel/outputfolders
lenaschimmel Nov 27, 2022
862f7f4
Merge pull request #141 from flauschzelle/group-dms-patch1
lenaschimmel Nov 27, 2022
c584bbd
Bugfix: make sure that a UserData object can't have an empty handle
flauschzelle Nov 27, 2022
cb5897d
added a few more 'not None' checks
flauschzelle Nov 27, 2022
3626068
Merge pull request #142 from flauschzelle/group-dms-patch1
flauschzelle Nov 28, 2022
5f499d7
Added TechCrunch article
timhutton Nov 28, 2022
ff7a7b6
Merge remote-tracking branch 'upstream/main' into downloadtweets
lenaschimmel Nov 28, 2022
2e0a125
Reduce redundant tweet downloads.
lenaschimmel Nov 28, 2022
b68eefd
fix bug in download_larger_media
flauschzelle Nov 28, 2022
10 changes: 8 additions & 2 deletions README.md
@@ -2,7 +2,12 @@
1. [Download your Twitter archive](https://twitter.com/settings/download_your_data) (Settings > Your account > Download an archive of your data).
2. Unzip to a folder.
3. Right-click this link --> [parser.py](https://raw.githubusercontent.com/timhutton/twitter-archive-parser/main/parser.py) <-- and select "Save Link as", and save into the folder where you extracted the archive. (Or use wget or curl on that link. Or clone the git repo.)
4. Run parser.py with [Python 3](https://realpython.com/installing-python/). e.g. `python parser.py` from a command prompt opened in that folder.
4. Open a command prompt and change directory into the unzipped folder where you just saved parser.py.
(**Here's how to do that on Windows:** Hold shift while right-clicking in the folder. Click on `Open PowerShell`.)
5. Run parser.py with [Python 3](https://realpython.com/installing-python/). e.g. `python parser.py`.
(**On Windows:** When the command window opens, paste or enter `python parser.py` at the command prompt.)



If you are having problems please check the [issues list](https://github.com/timhutton/twitter-archive-parser/issues?q=is%3Aissue) to see if it has happened before, and open a new issue otherwise.

@@ -21,7 +26,7 @@ Our script does the following:
- Replaces t.co URLs with their original versions (the ones that can be found in the archive).
- Copies used images to an output folder, to allow them to be moved to a new home.
- Will query Twitter for the missing user handles (checks with you first).
- Converts DMs to markdown, including the handles that we retrieved. Basic functionality for now (no embedded images), pending improvements.
- Converts DMs (including group DMs) to markdown with embedded media and links, including the handles that we retrieved.
- Outputs lists of followers and following.
- Downloads the original size images (checks with you first).
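
As an illustration of the t.co replacement step listed above, here is a minimal sketch, not the code from `parser.py`. It assumes each tweet in the archive carries an `entities.urls` list whose entries have `url` (the t.co link) and `expanded_url` (the original link); the function name `expand_short_urls` and the sample tweet are hypothetical.

```python
def expand_short_urls(text, url_entities):
    """Replace each shortened URL in `text` with its expanded form.

    `url_entities` is assumed to be a list of dicts with the keys
    `url` (shortened) and `expanded_url` (original).
    """
    for entity in url_entities:
        short = entity.get("url")
        expanded = entity.get("expanded_url")
        # Skip incomplete entries rather than guessing the target.
        if short and expanded:
            text = text.replace(short, expanded)
    return text


# Hypothetical tweet data, shaped like the assumption described above.
tweet = {
    "full_text": "Reading https://t.co/abc123 today",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://example.org/article",
            }
        ]
    },
}

expanded_text = expand_short_urls(tweet["full_text"], tweet["entities"]["urls"])
print(expanded_text)
```

Only links that have an entry in the archive can be expanded this way, which matches the note that the replaced URLs are "the ones that can be found in the archive".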

@@ -30,6 +35,7 @@ Our script does the following:
Some of the functionality requires the `requests` and `imagesize` modules. `parser.py` will offer to install these for you using pip. To avoid that you can install them before running the script.
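
To check ahead of time whether the optional modules are present, something like the following sketch could be used. It is an assumption about how such a check might look, not how `parser.py` does it; the helper name `find_missing_modules` is hypothetical.

```python
import importlib.util


def find_missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(["requests", "imagesize"])
if missing:
    # Print the pip command instead of running it, so nothing is installed silently.
    print("Install before running: python -m pip install " + " ".join(missing))
else:
    print("All optional modules are available.")
```

Running this in the same Python environment you use for `parser.py` shows which of the two modules, if any, still need to be installed.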

## Articles about handling your Twitter archive:
- https://techcrunch.com/2022/11/21/quit-twitter-better-with-these-free-tools-that-make-archiving-a-breeze/
- https://www.bitsgalore.org/2022/11/20/how-to-preserve-your-personal-twitter-archive
- https://matthiasott.com/notes/converting-your-twitter-archive-to-markdown
