@@ -5,7 +5,9 @@ description: Lesson about building a Python application for watching prices. Usi
slug: /scraping-basics-python/downloading-html
---

+import CodeBlock from '@theme/CodeBlock';
import Exercises from '../scraping_basics/_exercises.mdx';
+import LegoExercise from '!!raw-loader!roa-loader!./exercises/scrape_lego.py';

**In this lesson we'll start building a Python application for watching prices. As a first step, we'll use the HTTPX library to download HTML code of a product listing page.**

@@ -139,26 +141,17 @@ Letting our program visibly crash on error is enough for our purposes. Now, let'

<Exercises />

-### Scrape AliExpress
+### Scrape LEGO

-Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with AliExpress search results:
+Download HTML of a product listing page, but this time from a real world e-commerce website. For example this page with LEGO search results:

```text
-https://www.aliexpress.com/w/wholesale-darth-vader.html
+https://www.lego.com/themes/star-wars
```

<details>
<summary>Solution</summary>

-```py
-import httpx
-
-url = "https://www.aliexpress.com/w/wholesale-darth-vader.html"
-response = httpx.get(url)
-response.raise_for_status()
-print(response.text)
-```

+<CodeBlock language="py">{LegoExercise.code}</CodeBlock>
</details>

### Save downloaded HTML as a file
@@ -0,0 +1,6 @@
import httpx

url = "https://www.lego.com/themes/star-wars"
response = httpx.get(url)
response.raise_for_status()
print(response.text)
@@ -0,0 +1,8 @@
setup() {
  DIR=sources/academy/webscraping/scraping_basics_python/exercises
}

@test "outputs the HTML with Star Wars products" {
  run uv run --with httpx python "$DIR/scrape_lego.py"
  [[ "$output" == *"Millennium Falcon"* ]]
}
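
For readers less familiar with Bats, the check above could be approximated in Python roughly as follows. This is only a sketch and not part of the PR: the helper names `run_script` and `mentions_product` are hypothetical, and it assumes `httpx` is already installed for the current interpreter, rather than provided through `uv run --with httpx` as in the Bats test.

```python
import subprocess
import sys


def run_script(path: str) -> str:
    """Run a Python script in a child process and return its standard output.

    Hypothetical helper. Unlike the Bats `run` command, `check=True` raises
    `subprocess.CalledProcessError` if the script exits with a non-zero status.
    """
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def mentions_product(output: str, product_name: str = "Millennium Falcon") -> bool:
    """Return True if the product name appears anywhere in the output.

    Mirrors the Bats pattern match `*"Millennium Falcon"*`.
    """
    return product_name in output
```

With these helpers, the test body would amount to `assert mentions_product(run_script(path))`, where `path` points at `scrape_lego.py` in the exercises directory. Like the Bats version, it depends on the live page still listing a product with that name.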