Telegram Groups Parser

Languages: English | Русский

A tool for searching and filtering Telegram groups/channels by keywords and participant count.

English

Features

Search Telegram groups/channels by keywords
Generate city+word combinations for comprehensive regional search
Automatic deduplication of all text files (queries, cities, words)
Filter results by minimum participant count
Resume interrupted operations (progress tracking)
Rate limiting and flood protection
Separate search and filtering processes for better performance

Setup

Install dependencies:
```
npm install telegram
```
Create Telegram API credentials:
- Go to https://my.telegram.org/apps
- Create a new application
- Get your API_ID and API_HASH

Configure the environment by creating a .env file:

API_ID=your_api_id
API_HASH=your_api_hash
PHONE_NUMBER=+1234567890
TG_2FA=your_2fa_password  # Optional, only if you have 2FA enabled
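
The variables above can be checked before connecting to Telegram. The following is a minimal sketch, not the project's actual code: `readCredentials` and the field names it returns are hypothetical, and it assumes the .env file has already been loaded into an object such as `process.env`.

```javascript
// Hypothetical helper: validates the variables described above.
// `env` is any object with the .env keys, for example process.env.
function readCredentials(env) {
  const apiId = Number(env.API_ID);
  if (!Number.isInteger(apiId) || apiId <= 0) {
    throw new Error("API_ID must be a positive integer");
  }
  if (!env.API_HASH) {
    throw new Error("API_HASH is missing");
  }
  if (!/^\+\d+$/.test(env.PHONE_NUMBER || "")) {
    throw new Error("PHONE_NUMBER must look like +1234567890");
  }
  return {
    apiId,
    apiHash: env.API_HASH,
    phoneNumber: env.PHONE_NUMBER,
    password: env.TG_2FA || undefined, // optional, only needed with 2FA
  };
}

// Usage (throws a descriptive error if something is missing):
// const credentials = readCredentials(process.env);
```

Failing early with a clear message is easier to debug than an authentication error raised later by the Telegram client.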

Prepare search queries by creating a queries.txt file with one search term per line:
```
crypto trading
bitcoin
ethereum
programming
```

Usage

Command Line

Step 1: Search for groups

Option A: Regular search

node parse.js

Option B: Cities combinations search

node parse.js --cities

Other options:

node parse.js --reset-progress    # Reset search progress
node parse.js --help             # Show help

This will:

Automatically remove duplicates from queries.txt, cities.txt, and words.txt
Search for groups/channels using keywords from queries.txt (regular mode) or generated combinations (cities mode)
Save results with participant counts to groups.json
Track progress in processed_queries.json
Resume from where it left off if interrupted
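
The deduplication and resume steps listed above can be sketched as two small pure functions. This is an illustration under assumptions, not the code in parse.js: `dedupeLines` and `pendingQueries` are hypothetical names, the comparison is case-sensitive, and processed_queries.json is assumed to hold a plain array of finished queries.

```javascript
// Hypothetical helper: trims lines, drops blanks, and keeps the first
// occurrence of each entry (case-sensitive comparison in this sketch).
function dedupeLines(text) {
  const seen = new Set();
  const result = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || seen.has(line)) continue;
    seen.add(line);
    result.push(line);
  }
  return result;
}

// Hypothetical helper: returns the queries that still need to run,
// assuming the progress file is an array of already processed queries.
function pendingQueries(allQueries, processedQueries) {
  const done = new Set(processedQueries);
  return allQueries.filter((query) => !done.has(query));
}

const queries = dedupeLines("bitcoin\nethereum\n\nbitcoin\n  programming  \n");
console.log(queries); // [ 'bitcoin', 'ethereum', 'programming' ]
console.log(pendingQueries(queries, ["bitcoin"])); // [ 'ethereum', 'programming' ]
```

Keeping the original order matters here because a resumed run then continues through the file in the same sequence as the interrupted one.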

Cities mode (--cities):

Generates all combinations from cities.txt and words.txt
Creates queries_cities.txt with combinations like "Moscow work", "Moscow freelance", etc.
Uses these combinations for search instead of queries.txt
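
The combination step above amounts to a Cartesian product of the two lists. A minimal sketch, assuming each query is the city followed by a space and the word (as in "Moscow work"); `buildCityQueries` is a hypothetical name, not a function from parse.js:

```javascript
// Hypothetical helper: builds every "city word" pair.
// Cities form the outer loop, so all words for one city come first.
function buildCityQueries(cities, words) {
  const queries = [];
  for (const city of cities) {
    for (const word of words) {
      queries.push(`${city} ${word}`);
    }
  }
  return queries;
}

const combos = buildCityQueries(["Moscow", "Kazan"], ["work", "freelance"]);
console.log(combos.join("\n"));
```

The number of generated queries is the number of cities multiplied by the number of words, so both lists are worth keeping short during a first run.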

Step 2: Extract group IDs (optional)

node extract_ids.js                    # Extract IDs with participant filtering
node extract_ids.js --with-usernames   # Extract IDs with usernames
node extract_ids.js --no-filter        # Extract all IDs without filtering
node extract_ids.js --help             # Show help

This will:

Read groups from groups.json
Filter by minimum participant count (from config)
Extract only group IDs (one per line)
Save to group_ids.txt
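
The extraction steps above can be illustrated with a short function. This is a sketch under assumptions rather than the code in extract_ids.js: `extractIds` is a hypothetical name, a group without `participants_count` is treated as having 0 participants, and the username is appended after a single space.

```javascript
// Hypothetical helper mirroring the "extract" options from config.json.
function extractIds(groups, options = {}) {
  const {
    minParticipants = 0,
    filterByParticipants = true,
    includeUsernames = false,
  } = options;

  return groups
    // Keep everything when filtering is off, otherwise apply the threshold.
    .filter((g) => !filterByParticipants || (g.participants_count ?? 0) >= minParticipants)
    // One output line per group: the ID, optionally followed by @username.
    .map((g) => (includeUsernames && g.username ? `${g.id} @${g.username}` : String(g.id)));
}

const sample = [
  { id: "1", username: "big_group", participants_count: 1500 },
  { id: "2", username: "small_group", participants_count: 200 },
];
console.log(extractIds(sample, { minParticipants: 1000 }).join("\n"));
```

Passing `filterByParticipants: false` corresponds to the `--no-filter` flag, and `includeUsernames: true` to `--with-usernames`.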

Configuration

Edit config.json to customize settings:

{
  "search": {
    "queriesFile": "queries.txt",
    "limitPerQuery": 20,
    "saveFile": "groups.json",
    "processedQueriesFile": "processed_queries.json",
    "twoLevelParsing": {
      "enabled": true,
      "firstLevel": {
        "limitPerQuery": 100,
        "maxWords": 30
      },
      "secondLevel": {
        "limitPerQuery": 20,
        "useAllWords": true
      }
    }
  },
  "extract": {
    "inputFile": "groups.json",
    "outputFile": "group_ids.txt",
    "includeUsernames": false,
    "minParticipants": 1000,
    "filterByParticipants": true
  },
  "throttle": {
    "betweenQueriesMs": 3000,
    "betweenRequestsMs": 1200,
    "maxRetries": 3,
    "retryBackoffMultiplier": 2,
    "floodWaitCapSec": 900
  }
}

Parameter Descriptions:

Search (search):

queriesFile - file with keywords for search
limitPerQuery - maximum number of results per query
saveFile - file to save all found groups
processedQueriesFile - file to track processed queries
twoLevelParsing - two-level parsing settings:
- enabled - enable two-level parsing
- firstLevel.limitPerQuery - results limit per query at the first level (high)
- firstLevel.maxWords - how many words from the start of the word list are used at the first level
- secondLevel.limitPerQuery - results limit per query at the second level (regular)
- secondLevel.useAllWords - use all words at the second level

ID Extraction (extract):

inputFile - input file with groups (default groups.json)
outputFile - output file with IDs (default group_ids.txt)
includeUsernames - whether to add @username after ID
minParticipants - minimum participant count for filtering
filterByParticipants - enable filtering by participants

Throttling (throttle):

betweenQueriesMs - delay between search queries (ms)
betweenRequestsMs - delay between API requests (ms)
maxRetries - maximum retry attempts on error
retryBackoffMultiplier - delay multiplier for retries
floodWaitCapSec - maximum wait time for FLOOD_WAIT (sec)
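
To make the throttle parameters concrete, the sketch below shows one way they could combine. It is an assumption, not the project's actual logic: `retryDelayMs` and `floodWaitMs` are hypothetical names, the retry delay is assumed to start from `betweenRequestsMs`, and a FLOOD_WAIT longer than the cap is assumed to be shortened to the cap.

```javascript
// Values copied from the "throttle" section of config.json above.
const throttle = {
  betweenQueriesMs: 3000,
  betweenRequestsMs: 1200,
  maxRetries: 3,
  retryBackoffMultiplier: 2,
  floodWaitCapSec: 900,
};

// Delay before retry number `attempt` (1-based):
// betweenRequestsMs * retryBackoffMultiplier^(attempt - 1).
function retryDelayMs(attempt, cfg) {
  return cfg.betweenRequestsMs * Math.pow(cfg.retryBackoffMultiplier, attempt - 1);
}

// Wait time for a FLOOD_WAIT of `requestedSec` seconds, limited by the cap.
function floodWaitMs(requestedSec, cfg) {
  return Math.min(requestedSec, cfg.floodWaitCapSec) * 1000;
}

for (let attempt = 1; attempt <= throttle.maxRetries; attempt++) {
  console.log(`retry ${attempt}: wait ${retryDelayMs(attempt, throttle)} ms`);
}
console.log(`FLOOD_WAIT 3600 s: wait ${floodWaitMs(3600, throttle)} ms`);
```

With the default values the three retries wait 1200, 2400, and 4800 ms, and any FLOOD_WAIT above 900 seconds is limited to 900000 ms.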

Web Interface

A web interface is available for easier management:

Installation:

npm install                 # Install dependencies
npm run server             # Start API server (port 3001)
npm run dev                # Start React app (port 3000)

Access: Open http://localhost:3000 in your browser

Features:

🔍 Parsing Control - Start/stop parsing with real-time logs
📋 ID Extraction - Extract group IDs with filtering options
📁 File Management - Edit queries, cities, words files
⚙️ Configuration - Manage all settings including two-level parsing
📊 Statistics - Real-time stats with progress tracking

Production:

npm run build              # Build for production
npm run preview            # Preview production build

API Endpoints:

POST /api/parse - Start regular parsing
POST /api/parse-cities - Start cities parsing
POST /api/extract - Extract IDs with options
GET /api/files/:filename - Get file content
PUT /api/files/:filename - Save file content
GET /api/config - Get configuration
PUT /api/config - Save configuration
GET /api/stats - Get statistics
ws://localhost:3001/ws - Real-time logs via WebSocket
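
The endpoints above can be called from any HTTP client. The sketch below uses the `fetch` function built into recent Node.js versions and browsers. It is an illustration under assumptions: the helper names are hypothetical, responses are assumed to be JSON, and no request body is sent because the expected body format is not documented here.

```javascript
const API_BASE = "http://localhost:3001";

// Builds the URL for GET/PUT /api/files/:filename; the name is URL-encoded.
function fileUrl(filename) {
  return `${API_BASE}/api/files/${encodeURIComponent(filename)}`;
}

// Calls GET /api/stats. Assumes the server answers with a JSON document.
async function getStats(fetchImpl = fetch) {
  const response = await fetchImpl(`${API_BASE}/api/stats`);
  if (!response.ok) {
    throw new Error(`Stats request failed with status ${response.status}`);
  }
  return response.json();
}

// Calls POST /api/parse to start regular parsing (no body in this sketch).
async function startParsing(fetchImpl = fetch) {
  const response = await fetchImpl(`${API_BASE}/api/parse`, { method: "POST" });
  if (!response.ok) {
    throw new Error(`Parse request failed with status ${response.status}`);
  }
  return response;
}

console.log(fileUrl("queries.txt"));
```

The `fetchImpl` parameter exists only so the helpers can be exercised with a stub instead of a running server.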

🔄 Two-Level Parsing

New feature for more efficient data collection in cities mode:

How it works:

First level: uses a high limit (100) for the first N words (30 in the default configuration)
Second level: uses the regular limit (20) for all words

Benefits:

More results for popular queries
Time savings on less popular queries
Flexible configuration

Example:

28 cities × 30 first words = 840 queries with limit 100
28 cities × 129 words (all) = 3612 queries with limit 20
Total: 840 + 3612 = 4452 queries, compared with 3612 when two-level parsing is disabled
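
The arithmetic in this example can be reproduced with a short planning function. This is a sketch, not the project's implementation: `planTwoLevel` is a hypothetical name, and the behaviour when `secondLevel.useAllWords` is false (only the remaining words are used) is an assumption made for illustration.

```javascript
// Hypothetical helper: lists every query with the limit it would use.
function planTwoLevel(cities, words, twoLevel) {
  const plan = [];

  // First level: the first maxWords words, queried with the high limit.
  const firstWords = words.slice(0, twoLevel.firstLevel.maxWords);
  for (const city of cities) {
    for (const word of firstWords) {
      plan.push({ query: `${city} ${word}`, limit: twoLevel.firstLevel.limitPerQuery });
    }
  }

  // Second level: all words (assumption: otherwise only the remaining words).
  const secondWords = twoLevel.secondLevel.useAllWords
    ? words
    : words.slice(twoLevel.firstLevel.maxWords);
  for (const city of cities) {
    for (const word of secondWords) {
      plan.push({ query: `${city} ${word}`, limit: twoLevel.secondLevel.limitPerQuery });
    }
  }
  return plan;
}

// Placeholder data with the sizes from the example: 28 cities, 129 words.
const cities = Array.from({ length: 28 }, (_, i) => `City${i + 1}`);
const words = Array.from({ length: 129 }, (_, i) => `word${i + 1}`);
const plan = planTwoLevel(cities, words, {
  firstLevel: { limitPerQuery: 100, maxWords: 30 },
  secondLevel: { limitPerQuery: 20, useAllWords: true },
});
console.log(plan.filter((p) => p.limit === 100).length); // 840
console.log(plan.filter((p) => p.limit === 20).length); // 3612
console.log(plan.length); // 4452
```

Note that with `useAllWords` enabled the first 30 words are queried at both levels, which is why the total exceeds the single-level count.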

Configuration parameters:

twoLevelParsing.enabled - enable two-level parsing
firstLevel.limitPerQuery - results limit per query at the first level (high)
firstLevel.maxWords - how many words from the start of the word list are used at the first level
secondLevel.limitPerQuery - results limit per query at the second level (regular)
secondLevel.useAllWords - use all words at the second level

Testing:

node test_two_level.js  # Check two-level parsing settings

NPM Scripts

npm run dev      # Start React development server
npm run build    # Build for production
npm run preview  # Preview production build
npm run server   # Start API server
npm run parse    # Run parsing via CLI
npm run extract  # Run ID extraction via CLI

Reset Progress

node parse.js --reset-progress        # Reset search progress

Output Format

Groups are saved in JSON format:

[
  {
    "id": "123456789",
    "title": "Group Name",
    "username": "group_username",
    "type": "supergroup",
    "access_hash": "hash_value",
    "participants_count": 1500
  }
]

Group Types:

group - regular group
supergroup - supergroup
channel - channel
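
As an illustration of working with this format, the sketch below counts entries per type. `countByType` is a hypothetical helper, and the inline JSON stands in for the contents of groups.json; entries with an unrecognized type are ignored.

```javascript
// Hypothetical helper: counts groups for each documented type.
function countByType(groups) {
  const counts = { group: 0, supergroup: 0, channel: 0 };
  for (const g of groups) {
    if (Object.prototype.hasOwnProperty.call(counts, g.type)) {
      counts[g.type] += 1;
    }
  }
  return counts;
}

// Inline sample in the documented format; a real run would parse groups.json.
const parsed = JSON.parse(`[
  { "id": "123456789", "title": "Group Name", "username": "group_username",
    "type": "supergroup", "access_hash": "hash_value", "participants_count": 1500 },
  { "id": "987654321", "title": "News", "username": "news_channel",
    "type": "channel", "access_hash": "hash_value", "participants_count": 8000 }
]`);
console.log(countByType(parsed)); // { group: 0, supergroup: 1, channel: 1 }
```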

Files

Core Scripts:

parse.js - Main search script
extract_ids.js - ID extraction script
test_two_level.js - Two-level parsing test utility

Configuration:

config.json - Main configuration
.env - Environment variables (API keys)

Data Files:

queries.txt - Search keywords (regular mode)
cities.txt - List of cities (for --cities mode)
words.txt - List of words (for --cities mode)
queries_cities.txt - Generated city+word combinations

Output Files:

groups.json - All found groups with participant counts
group_ids.txt - Extracted group IDs
processed_queries.json - Search progress
session.json - Telegram session

Web Interface:

server.js - Express API server
src/ - React application source
index.html - HTML template
vite.config.js - Vite configuration
package.json - Dependencies and scripts

Troubleshooting

Common Issues

"Password is empty" error
- Add your 2FA password to .env file
- Or enter it when prompted

FLOOD_WAIT errors
- The script automatically handles these
- Increase delays in config if needed

Session expired
- Delete session.json and re-authenticate

No groups found
- Check your search terms in queries.txt
- Try more general keywords

Two-level parsing not working
- Check configuration in config.json
- Run node test_two_level.js to verify settings
- Ensure cities.txt and words.txt exist

Web interface not loading
- Ensure API server is running on port 3001
- Check if React dev server is running on port 3000
- Verify WebSocket connection in browser console

Real-time logs not updating
- Check WebSocket connection status
- Restart both API server and React app
- Clear browser cache and reload

Recommendations

For better search results:
- Use diverse keywords
- Include synonyms and variations
- Add both English and Russian terms
- Duplicates in queries.txt are removed automatically, so there is no need to clean the file by hand

For stable operation:
- Don't run multiple instances simultaneously
- Regularly check logs for errors
- Make backups of results

For optimization:
- Run the first search with a small number of queries
- Configure filtering parameters for your needs
- Use different minimum participant values for different purposes

Usage Examples

Searching Crypto Groups

# queries.txt
cryptocurrency
bitcoin
ethereum
trading
blockchain
DeFi
NFT

# config.json - increase minimum participants
"minParticipants": 5000

Searching Job Chats

Regular mode:

# queries.txt
jobs moscow
vacancies
freelance
remote work
programmer
designer

# config.json - decrease minimum participants
"minParticipants": 500

Cities mode (recommended):

# cities.txt
Moscow
Saint Petersburg
Kazan
Novosibirsk

# words.txt
jobs
vacancies
freelance
programmer
designer

# Run
node parse.js --cities

Searching Educational Channels

# queries.txt
programming
python
javascript
courses
learning
IT

# config.json
"minParticipants": 1000

chatman-media/telegram-groups-parser
