This microservice provides a public RESTish API for fetching team and game data from a Curling Club Manager admin area. Since CCM does not expose a public API, this service scrapes the necessary data using Playwright and implements a robust caching layer for performance and resilience.
The majority of this project was vibe-coded using Gemini 2.5 Flash.
- Features
- Prerequisites
- Installation
- Configuration
- Running the Application
- API Endpoints
- Project Structure
- Caching Strategy
- Playwright Scraping Details
- Error Handling
- Contributing
- License
- RESTish API: Exposes endpoints for teams, single team lookup, all games, and upcoming games by sheet.
- Data Scraping: Uses Playwright to extract data from the Curling Club Manager (CCM) web interface.
- Persistent Caching: Stores scraped data to local JSON files (`cache/`) to persist across application restarts.
- In-Memory Caching: Maintains a quick-access in-memory cache to reduce file I/O for frequent requests.
- Scheduled Refresh: Periodically scrapes for new data (once per day) to keep the cache fresh.
- On-Demand Refresh: Supports manual cache refresh via an API query parameter.
- Concurrent Scraping: Utilizes Playwright's `BrowserContext` to manage authenticated sessions and limit concurrent page loads for efficient scraping.
- Robust Data Processing: Handles common scraping challenges like parsing dates, cleaning team names, and linking related entities (Games to Teams).
Before you begin, ensure you have the following installed:
- Node.js: (LTS version recommended, e.g., 18.x or 20.x)
- npm: (Comes with Node.js)
- Playwright Browsers: Playwright automatically downloads necessary browser binaries, but you can manually install them if needed:
npx playwright install
- Clone the repository:

      git clone https://github.com/your-username/curling-club-microservice.git
      cd curling-club-microservice

- Install dependencies:

      npm install

This project uses environment variables for sensitive data (like CCM login credentials) and configurable settings (like the server port).
- Create a `.env` file: In the root directory of your project, create a file named `.env`.
- Add environment variables: Populate the `.env` file with the following:

      # Server Port
      PORT=3000
      # Curling Club Manager (CCM) Login Credentials
      # These are required for the scraper to authenticate with the /administrator panel.
      CCM_USER=your_ccm_username
      CCM_PASS=your_ccm_password
      # Maximum number of browser tabs to use for scraping
      MAX_CONCURRENCY=5
      BASE_URL=https://yourcurlingclub.com
      SHEET_NAMES=A,B,C,D

- Git Ignore: The `.env` file is already added to `.gitignore` to prevent it from being committed to version control.
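As an illustration of how these variables might be read and validated at startup, here is a minimal sketch. The `AppConfig` shape and the `loadConfig`, `requireVar`, and `positiveInt` helpers are hypothetical names and not taken from the project; only the variable names and the defaults of 3000 and 5 come from this README.

```typescript
// Hypothetical config shape; only the environment variable names come from the README.
interface AppConfig {
  port: number;
  ccmUser: string;
  ccmPass: string;
  maxConcurrency: number;
  baseUrl: string;
  sheetNames: string[];
}

type Env = Record<string, string | undefined>;

// Returns the trimmed value, or throws if the variable is missing or blank.
function requireVar(env: Env, key: string): string {
  const value = (env[key] ?? "").trim();
  if (value === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Parses a positive integer, falling back to a default when the variable is unset.
function positiveInt(raw: string | undefined, fallback: number, key: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): AppConfig {
  return {
    port: positiveInt(env.PORT, 3000, "PORT"),
    ccmUser: requireVar(env, "CCM_USER"),
    ccmPass: requireVar(env, "CCM_PASS"),
    maxConcurrency: positiveInt(env.MAX_CONCURRENCY, 5, "MAX_CONCURRENCY"),
    // Strip trailing slashes so paths can be appended consistently.
    baseUrl: requireVar(env, "BASE_URL").replace(/\/+$/, ""),
    sheetNames: requireVar(env, "SHEET_NAMES")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` once the `.env` file has been loaded. Failing fast on a missing credential avoids starting a scrape that cannot log in.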
For development with live reloading:
npm run dev
This will:
- Start the server using `nodemon` and `tsx`.
- Watch for changes in the `src/` directory.
- Automatically restart the server when TypeScript files are modified.
- Trigger an initial cache scrape upon startup (if no `cache/` files exist).
- Schedule periodic background refreshes.
You will see console logs related to Playwright operations (login, scraping progress) and cache status.
To build and run the compiled JavaScript:
npm run build
npm start
- `npm run build`: Compiles all TypeScript files from `src/` into JavaScript in the `dist/` directory.
- `npm start`: Runs the compiled application from `dist/index.js`.
The API is exposed on http://localhost:<PORT>/api/. Replace <PORT> with the port configured in your .env file (default 3000).
Many GET endpoints can accept a refreshCache=true query parameter. If present, it will trigger an immediate full cache refresh before serving the request. This is useful for development or when you suspect the cached data might be stale.
Example: http://localhost:3000/api/teams?refreshCache=true
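For client code, a small helper can assemble these URLs instead of concatenating strings by hand. This is only a sketch: `buildApiUrl` is a hypothetical name, and it assumes the service is reachable on `localhost`.

```typescript
// Hypothetical client-side helper; not part of the service itself.
function buildApiUrl(
  path: string,
  params: Record<string, string | number | boolean> = {},
  port: number = 3000,
): string {
  const url = new URL(`/api/${path}`, `http://localhost:${port}`);
  for (const [key, value] of Object.entries(params)) {
    // searchParams handles the percent-encoding of values.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Builds "http://localhost:3000/api/teams?refreshCache=true"
const refreshUrl = buildApiUrl("teams", { refreshCache: true });
```

The resulting string can be passed to `fetch` or any other HTTP client.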
Returns an array of all known teams from the cached data.
Response:
[
{
"teamId": 101,
"name": "Team Alpha",
"league": "Monday Night",
"skip": { "first": "John", "last": "Doe" },
"vice": { "first": "Jane", "last": "Smith" },
"second": { "first": "Peter", "last": "Jones" },
"lead": { "first": "Mary", "last": "Brown" }
},
// ... more teams
]
Retrieves a single team by either its name and league, or by its unique ID.
Query Parameters:
- By Name & League:
  - `teamName` (string, required if `teamId` is not provided)
  - `leagueName` (string, required if `teamName` is provided)
- By ID:
  - `teamId` (number, alternative to `teamName` and `leagueName`)
Response:
A Team object (200 OK) if found; otherwise null (404 Not Found).
Example (by name): http://localhost:3000/api/team?teamName=Team%20Alpha&leagueName=Monday%20Night
Example (by ID): http://localhost:3000/api/team?teamId=101
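The lookup rules above could be implemented roughly as follows. This is a sketch, not the project's actual route handler: `findTeam`, `TeamQuery`, and `LookupResult` are hypothetical names, and the case-insensitive name comparison is an assumption.

```typescript
interface Name { first: string; last: string; }
interface Team {
  teamId: number;
  name: string;
  league: string;
  skip: Name;
  vice: Name;
  second: Name;
  lead: Name;
}

// Raw query-string values arrive as strings (hypothetical shape).
interface TeamQuery { teamId?: string; teamName?: string; leagueName?: string; }
interface LookupResult { status: 200 | 400 | 404; body: Team | null; }

function findTeam(teams: Team[], query: TeamQuery): LookupResult {
  // Lookup by ID takes precedence when teamId is supplied.
  if (query.teamId !== undefined) {
    if (!/^\d+$/.test(query.teamId.trim())) return { status: 400, body: null };
    const id = Number(query.teamId);
    const byId = teams.find((t) => t.teamId === id);
    return byId ? { status: 200, body: byId } : { status: 404, body: null };
  }
  // Otherwise both teamName and leagueName are required.
  if (query.teamName && query.leagueName) {
    const name = query.teamName.trim().toLowerCase();
    const league = query.leagueName.trim().toLowerCase();
    // Assumption: names are compared case-insensitively.
    const byName = teams.find(
      (t) => t.name.toLowerCase() === name && t.league.toLowerCase() === league,
    );
    return byName ? { status: 200, body: byName } : { status: 404, body: null };
  }
  return { status: 400, body: null };
}
```

Requiring the league together with the name matters because team names are only unique within a league.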
Returns an array of all known games from the cached data.
Response:
[
{
"date": "2025-07-22T20:30:00.000Z",
"league": "Wednesday League",
"sheet": "Sheet 2",
"team1": {
"teamId": 103,
"name": "Team Gamma",
"league": "Wednesday League",
"skip": { "first": "Eve", "last": "Blue" },
"vice": { "first": "Frank", "last": "Yellow" },
"second": { "first": "Grace", "last": "Orange" },
"lead": { "first": "Harry", "last": "Purple" }
},
"team2": {
"teamId": 104,
"name": "Team Delta",
"league": "Wednesday League",
"skip": { "first": "Ivy", "last": "Black" },
"vice": { "first": "Jack", "last": "White" },
"second": { "first": "Karen", "last": "Brown" },
"lead": { "first": "Liam", "last": "Green" }
}
},
// ... more games
]
Returns the next N upcoming games for each sheet, optionally starting from a specified date. Sheets with no upcoming games will have an empty array.
Query Parameters:
- `count` (integer, optional): The maximum number of upcoming games to return per sheet. Defaults to `3`.
- `fromDate` (string, optional, date-time format): The date string (e.g., `YYYY-MM-DDTHH:mm:ssZ`) from which to start searching for upcoming games. Defaults to now minus 45 minutes (to catch recently started or ongoing games).
Response:
{
"Sheet 1": [
{ /* game 1 on Sheet 1 */ },
{ /* game 2 on Sheet 1 */ }
],
"Sheet 2": [
{ /* game 1 on Sheet 2 */ }
],
"Sheet 3": [], // Empty array if no upcoming games on this sheet
"Sheet 4": [
{ /* game 1 on Sheet 4 */ },
{ /* game 2 on Sheet 4 */ },
{ /* game 3 on Sheet 4 */ }
]
}
Example (default): http://localhost:3000/api/nextGames
Example (next 5 games from specific date): http://localhost:3000/api/nextGames?count=5&fromDate=2025-07-20T12:00:00Z
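The grouping behind this endpoint can be sketched as below. `nextGamesBySheet` and `DEFAULT_LOOKBACK_MS` are hypothetical names, and ignoring games on sheets that are not in the configured list is an assumption, not confirmed behavior.

```typescript
// 45 minutes, matching the documented default for fromDate.
const DEFAULT_LOOKBACK_MS = 45 * 60 * 1000;

function nextGamesBySheet<G extends { date: string; sheet: string }>(
  games: G[],
  sheets: string[],
  count: number = 3,
  fromDate: Date = new Date(Date.now() - DEFAULT_LOOKBACK_MS),
): Record<string, G[]> {
  // Start every known sheet with an empty array so idle sheets still appear.
  const result: Record<string, G[]> = {};
  for (const sheet of sheets) result[sheet] = [];

  // Keep games at or after fromDate, earliest first.
  const upcoming = games
    .filter((g) => new Date(g.date).getTime() >= fromDate.getTime())
    .sort((a, b) => new Date(a.date).getTime() - new Date(b.date).getTime());

  for (const game of upcoming) {
    const bucket = result[game.sheet];
    // Assumption: games on sheets outside the configured list are skipped.
    if (bucket !== undefined && bucket.length < count) bucket.push(game);
  }
  return result;
}
```

Sorting once and then filling each sheet's bucket up to `count` keeps the work to a single pass over the upcoming games.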
The project follows a modular structure to keep concerns separated:
.
├── src/                  # Source code directory
│   ├── api-routes.ts     # Defines Express API endpoints and their logic
│   ├── ccm-adapter.ts    # Handles caching, background refresh, and interfaces with scraper
│   ├── ccm-scraper.ts    # Contains Playwright logic for scraping CCM website
│   ├── index.ts          # Application entry point, initializes cache and starts server
│   ├── server.ts         # Express server setup and listener
│   └── types.d.ts        # TypeScript interface definitions (Team, Game, Name)
├── .env                  # Environment variables (IGNORED by Git)
├── .gitignore            # Specifies files/directories to ignore in Git
├── package.json          # Project metadata and dependencies
├── tsconfig.json         # TypeScript compiler configuration
├── dist/                 # Compiled JavaScript output (generated by `npm run build`)
└── cache/                # Persistent cache files (teams.json, games.json) (IGNORED by Git)
The microservice employs a multi-level caching strategy to optimize performance and ensure data freshness:
- File System Cache (`cache/teams.json`, `cache/games.json`):
  - Stores the last scraped data to ensure persistence across application restarts.
  - Loaded into memory upon `initializeCache()` when the application starts.
- In-Memory Cache:
  - Provides rapid access for API requests, minimizing disk I/O.
  - Data is considered "stale" after 1 hour (`CACHE_STALE_THRESHOLD_MS`). If an API request finds the in-memory cache stale, it attempts to reload from the persistent file cache.
- Background Refresh:
  - The `refreshCache()` function is executed periodically (once per day, configurable via `REFRESH_INTERVAL_MS`) by a scheduled timer (`startBackgroundRefreshTimer`).
  - It scrapes the latest data from the CCM website, updates the in-memory cache, and then saves the fresh data to the file system.
- On-Demand Refresh (`refreshCache=true`):
  - Clients can explicitly request an immediate full cache refresh by adding a `refreshCache=true` query parameter to any API `GET` endpoint. This is useful for development or when an immediate data update is required.
This layered strategy ensures that data is fresh enough for users while minimizing the load on the external CCM website.
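The staleness check on the in-memory layer can be sketched as follows. The `MemoryCache` class and its injected `loadFromFile` and `now` functions are hypothetical; only the one-hour threshold and the `CACHE_STALE_THRESHOLD_MS` name come from the description above.

```typescript
// 1 hour, as described above.
const CACHE_STALE_THRESHOLD_MS = 60 * 60 * 1000;

// Hypothetical in-memory layer that falls back to a file loader when stale.
class MemoryCache<T> {
  private value: T | undefined = undefined;
  private loadedAt = 0;
  private readonly loadFromFile: () => T;
  private readonly staleAfterMs: number;
  private readonly now: () => number;

  constructor(
    loadFromFile: () => T,
    staleAfterMs: number = CACHE_STALE_THRESHOLD_MS,
    now: () => number = () => Date.now(),
  ) {
    this.loadFromFile = loadFromFile;
    this.staleAfterMs = staleAfterMs;
    this.now = now;
  }

  // Stale when nothing is loaded yet or the data is older than the threshold.
  isStale(): boolean {
    return this.value === undefined || this.now() - this.loadedAt > this.staleAfterMs;
  }

  // Called after a fresh scrape, or after a reload from disk.
  set(value: T): void {
    this.value = value;
    this.loadedAt = this.now();
  }

  get(): T {
    if (this.isStale()) this.set(this.loadFromFile());
    return this.value as T;
  }
}
```

In practice `loadFromFile` might wrap `JSON.parse(fs.readFileSync("cache/teams.json", "utf8"))`. Injecting the loader and the clock keeps the staleness logic easy to test without touching the disk.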
The ccm-scraper.ts module orchestrates the web scraping using Playwright:
- Login: The scraper first authenticates with `yourcurlingclub.com/administrator` using credentials provided via `CCM_USER` and `CCM_PASS` environment variables.
- Shared Context: A single Playwright `BrowserContext` is established after successful login. All subsequent parallel scraping operations create new pages (tabs) within this same context. This is crucial for maintaining authentication cookies and ensuring the session persists across multiple concurrent requests to the website.
- Team Scraping (`getAllTeams`):
  - Navigates to the main teams list page.
  - Identifies all available league IDs from a `<select>` dropdown.
  - For each discovered league, it navigates to that specific league's teams list page.
  - Extracts individual `teamId`s from the displayed table.
  - For each `teamId`, a new page is opened within the shared browser context to visit the team's dedicated edit/detail page (`editTeam`).
  - Concurrently processes up to `MAX_CONCURRENT_PAGES` (default 5) team detail pages at once, significantly speeding up data collection.
  - Extracts team name, league association, and individual player names (Skip, Vice, Second, Lead).
  - Player names (e.g., "Last, First") are parsed into structured `{ first: string, last: string }` objects.
- Game Scraping (`getAllGames`):
  - Re-uses the initial league ID discovery step.
  - For each league, a new page is opened within the shared browser context to navigate to that league's games list page (`games`).
  - The process concurrently scrapes game data for up to `MAX_CONCURRENT_PAGES` (default 5) league game pages.
  - For each game row found in the HTML table, it extracts the raw date string, sheet name, and a "Team1 VS. Team2" string.
  - Team Name Cleaning: Before linking, team names are cleaned to remove any parenthetical chores or descriptions (e.g., "Sebastian (Snack Duty)" becomes "Sebastian").
  - Team Linking: Crucially, `Game` objects are linked to the canonical `Team` objects (retrieved during the `getAllTeams` phase) by constructing a composite lookup key (e.g., "monday night::team alpha") using both the team's cleaned name and its league. This ensures accurate association even if team names are not globally unique.
  - Game dates are parsed from the "MM-DD-YYYY H:MM XM" format into `Date` objects.
- Graceful Shutdown: `process.once` event listeners are implemented to ensure that the Playwright browser instance is reliably closed and resources are released when the Node.js process exits (e.g., via `Ctrl+C`, `SIGINT`, or `SIGTERM` signals).
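The text-processing steps in the list above (player names, team name cleaning, composite keys, matchup strings, and dates) can be sketched as pure functions. All function names here are hypothetical, the regular expressions are one possible reading of the formats described, and interpreting game times in the server's local time zone is an assumption.

```typescript
interface Name { first: string; last: string; }

// "Last, First" -> { first, last }; a name without a comma is treated as a last name.
function parsePlayerName(raw: string): Name {
  const [last = "", first = ""] = raw.split(",").map((part) => part.trim());
  return { first, last };
}

// Removes parenthetical notes, e.g. "Sebastian (Snack Duty)" -> "Sebastian".
function cleanTeamName(raw: string): string {
  return raw.replace(/\s*\([^)]*\)/g, "").replace(/\s+/g, " ").trim();
}

// Composite lookup key, e.g. "monday night::team alpha".
function teamKey(league: string, name: string): string {
  return `${league.trim().toLowerCase()}::${cleanTeamName(name).toLowerCase()}`;
}

// Splits a "Team1 VS. Team2" string into two cleaned team names.
function splitMatchup(raw: string): [string, string] | null {
  const parts = raw.split(/\s+vs\.?\s+/i);
  if (parts.length !== 2) return null;
  return [cleanTeamName(parts[0]), cleanTeamName(parts[1])];
}

// Parses "MM-DD-YYYY H:MM XM" into a Date (assumption: server-local time).
function parseGameDate(raw: string): Date | null {
  const match = /^(\d{1,2})-(\d{1,2})-(\d{4})\s+(\d{1,2}):(\d{2})\s*([AP])M$/i.exec(raw.trim());
  if (!match) return null;
  const [, month, day, year, hour, minute, meridiem] = match;
  let hours = Number(hour) % 12; // 12 AM -> 0, 12 PM -> 0 + 12 below
  if (meridiem.toUpperCase() === "P") hours += 12;
  return new Date(Number(year), Number(month) - 1, Number(day), hours, Number(minute));
}
```

Linking a game to its teams then becomes a `Map` lookup on `teamKey(game.league, teamName)`, which stays unambiguous even when two leagues contain teams with the same name.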
The microservice incorporates several layers of error handling:
- Playwright Failures: Scraping functions (`getAllTeams`, `getAllGames`) use `try...catch` blocks to gracefully handle potential issues like network errors, unresponsive pages, or changes in HTML selectors. Errors are logged to the console, and the scraping process attempts to continue, returning partial data or empty arrays/`null` for individual failures, preventing the entire refresh process from crashing.
- Cache Initialization Errors: If the `initializeCache()` process encounters a critical error during application startup (e.g., unable to log in to CCM), the application will log the error and terminate (`process.exit(1)`) to prevent serving stale or incomplete data.
- API Input Validation: API endpoints validate incoming query parameters (`count`, `fromDate`, `teamId`). Invalid input triggers an HTTP 400 Bad Request response with an empty response body.
- Cache Refresh Failures: If a manual (`refreshCache=true`) or scheduled cache refresh fails, appropriate HTTP 500 Internal Server Error responses are returned, and detailed errors are logged.
- Data Invariant Checks: During game data processing in `ccm-adapter.ts`, checks are performed to warn if team leagues referenced by a game do not match the game's league, or if a team cannot be found for a game, helping to identify potential data inconsistencies.
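The combination of bounded concurrency and per-item error tolerance described for the scraper can be sketched with a small worker-pool helper. `mapWithConcurrency` is a hypothetical name and not necessarily how the project limits page loads; in the real scraper the `worker` callback would open a page in the shared context and scrape it.

```typescript
// Runs worker over items with at most `limit` calls in flight.
// A failed item is logged and yields null instead of aborting the whole batch.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<(R | null)[]> {
  const results: (R | null)[] = new Array(items.length).fill(null);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = await worker(items[index]);
      } catch (err) {
        console.error(`Item ${index} failed:`, err);
        // The slot stays null so callers can filter out failures.
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

Callers can drop the `null` entries afterwards to work with whatever partial data was collected.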
Feel free to open issues, suggest improvements, or submit pull requests.
MIT