@honzajavorek
Collaborator

Previously

  • Python course had an images folder with images
  • New JS course (unlisted) had an images folder as well, but it was just a symlink to the Python one
  • Each of the courses had its own _exercises.mdx

After this change

  • There will be three top level folders: scraping_basics with shared stuff, scraping_basics_javascript2 for the JS course, scraping_basics_python for the Python course
  • The scraping_basics contains shared images and partials (currently just _exercises.mdx, but there is a potential to share more in the future if we want)
  • The JS and Python courses link to the shared folder for images and partials
  • Only one shared _exercises.mdx, no duplication
  • No symlinks or other hacks

Note

This change doesn't touch the original scraping_basics_javascript course at all.

@honzajavorek honzajavorek requested a review from TC-MO September 4, 2025 13:16
@honzajavorek honzajavorek added the t-academy Issues related to Web Scraping and Apify academies. label Sep 4, 2025
@apify-service-account

Preview for this PR was built for commit 8f903c5 and is ready at https://pr-1889.preview.docs.apify.com!

@honzajavorek honzajavorek mentioned this pull request Sep 8, 2025
5 tasks
@honzajavorek honzajavorek force-pushed the honzajavorek/restructure branch from 8f903c5 to 7d2a9ba Compare October 14, 2025 09:30
@apify-service-account

Preview for this PR was built for commit 7d2a9ba and is ready at https://pr-1889.preview.docs.apify.com!

@honzajavorek
Collaborator Author

I updated the changes so they're against current master. I fixed a few Vale errors.

@apify-service-account

Preview for this PR was built for commit 17c47b4 and is ready at https://pr-1889.preview.docs.apify.com!

Contributor

@TC-MO TC-MO left a comment

LGTM, hopefully Vale won't break... again

@honzajavorek honzajavorek merged commit 930242a into master Oct 15, 2025
9 checks passed
@honzajavorek honzajavorek deleted the honzajavorek/restructure branch October 15, 2025 12:26
honzajavorek added a commit that referenced this pull request Nov 21, 2025
The aim of this PR is to publish the new JS course as described in the
PR description of #1584, and to
unlist the old JS course. The old one should be still accessible for a
grace period.

_Replacing the old JS course with a new one, which is identical to the
Python course, has been previously sanctioned by both Ondra and Michał._

### The Plan
- [x] The `scraping_basics_javascript` root leads to the new JS course.
- [x] The pages of the old JS course move to
`legacy/web-scraping-for-beginners`. It's gonna be a read-only archive.
Must be `noindex` to avoid cannibalization issues.
- [x] The `web-scraping-for-beginners`, i.e. the root of the old JS
course URLs, leads to redirects which take people to corresponding pages
in the new JS course. This lets us use the SEO juice from the old URLs.
- [x] The redirects add `#old-js-course` to the URL. The new JS course
pages contain a component which, if `#old-js-course` is present in the
URL, displays a _commemorative plaque_ about the change and links to the
old JS course. This improves UX: "Hey, you have until 1.1.2026 to go
through this course. After that please refer to the newly updated JS
course <link>."
- [ ] At some point in the future, we'll nuke the archive of the old JS
course and link to the Internet Archive instead in the _commemorative plaque_.

_The Plan is a result of a [long discussion between Michał, Aleš, and
me](https://pyvec.slack.com/archives/C03BHBQNNG3/p1756992893312119),
which takes into account both the UX of existing users of the JS course
and SEO._
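
A rough sketch of the redirect-plus-marker idea from The Plan, in case it helps reviewers picture the flow. This is not the code from this PR: the function names and the `pageMap` lookup are hypothetical, and the real redirects live in Nginx rewrites rather than in TypeScript. Only the `#old-js-course` marker and the two root paths are taken from this description.

```typescript
// Marker appended by the redirects, as described in The Plan.
const LEGACY_MARKER = "#old-js-course";

// Root paths as they appear in this description.
const OLD_ROOT = "/academy/web-scraping-for-beginners";
const NEW_ROOT = "/academy/scraping-basics-javascript";

// Hypothetical helper: computes where an old course URL should lead.
// `pageMap` is an assumed lookup from old page slugs to new page slugs;
// unknown pages fall back to the root of the new course.
function redirectTarget(
  oldPath: string,
  pageMap: ReadonlyMap<string, string>,
): string | null {
  const isOldCourse = oldPath === OLD_ROOT || oldPath.startsWith(OLD_ROOT + "/");
  if (!isOldCourse) return null; // not part of the old JS course

  // Strip the old root and any leading or trailing slashes.
  const rest = oldPath.slice(OLD_ROOT.length).replace(/^\/+|\/+$/g, "");
  const newPage = pageMap.get(rest);
  const base = newPage ? `${NEW_ROOT}/${newPage}` : NEW_ROOT;
  return base + LEGACY_MARKER;
}

// Hypothetical helper: decides whether a page should show the plaque.
// The base URL is a placeholder so that relative paths can be parsed.
function shouldShowLegacyNotice(url: string): boolean {
  return new URL(url, "https://example.invalid").hash === LEGACY_MARKER;
}
```

A component on the new course pages could call something like `shouldShowLegacyNotice(window.location.href)` and render the plaque only when it returns `true`.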

### Related Work
- Depends on #1889
- Closes #1584
- Closes #1579
- Fixes #947
- Discovered #1900
- Closes #2009 (PoC)
- Contains #2023
- Closes #1550

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Publishes the new JS course, archives the old one with redirects and
an on-page notice, and updates links, content, and Nginx rewrites across
the docs.
> 
> - **Academy: New JS course rollout**
> - Publishes `academy/webscraping/scraping_basics_javascript/*` (new
slugs, content, and index) and updates internal references to it.
> - Archives the old JS course under
`academy/webscraping/scraping_basics_legacy/*` with `noindex` and a
legacy notice.
> - Adds `src/components/LegacyJsCourseAdmonition.jsx` and integrates it
into new course pages to show a notice when `?legacy-js-course=` is
present.
> - Updates course metadata (titles/sidebar labels) in
Expert/Anti‑scraping lessons and adds caution notes where content
depends on the legacy course.
> - Updates homepage card and other references to point to
`'/academy/scraping-basics-javascript'`.
> - **Routing/Redirects (Nginx)**
> - Redirects old JS course paths
`^/academy/web-scraping-for-beginners...` to
`'/academy/scraping-basics-javascript'` with `?legacy-js-course=...`.
> - Adds other redirects (e.g., output-schema → dataset-schema, academy
php path, advanced web scraping path fix).
> - **Content/link maintenance**
> - Repoints numerous lessons to new paths (e.g., tutorials,
Puppeteer/Playwright, advanced courses) and updates sample URLs in
integrations (Make) to the new JS course.
> - Minor copy/heading tweaks (e.g., RPA title), and consistent
slug/slug changes across documents.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
2840ebd. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Michał Olender <[email protected]>