
Web Scraping with Python, Part 2: Scraping AccuWeather with BeautifulSoup and Selenium


Now that we have looked at some of the well-known web scraping libraries, it is time to put them into practice. In this article I will focus on using Python to scrape a weather website, with the AccuWeather portal as the example.

Finding the information we need

We will extract city data from AccuWeather using "london" as the search keyword, then use the links on the search results page to extract the details of each city. The URL for this search, which is also the page we will scrape, is: https://www.accuweather.com/en/search-locations?query=london.
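The search URL above can also be assembled programmatically, so that keywords containing spaces or special characters are encoded correctly. The following is a minimal sketch; the helper name build_search_url is hypothetical and not part of the original script.

```python
from urllib.parse import urlencode

# Search endpoint taken from the URL shown above.
SEARCH_URL = "https://www.accuweather.com/en/search-locations"


def build_search_url(keyword: str) -> str:
    """Return the search-results URL for a keyword, with the query encoded."""
    return f"{SEARCH_URL}?{urlencode({'query': keyword})}"


print(build_search_url("london"))
print(build_search_url("new york"))
```

For a single-word keyword such as "london" the result is identical to the URL given above; a space in the keyword is encoded as "+".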

Required tools

Before moving on, let's make sure that all the required tools are installed and configured.

  • PyCharm
  • Python 3.10
  • chromedriver

Getting started

First, create a new project in PyCharm.

Then install the libraries the code depends on by running pip install in the terminal:

$ pip install beautifulsoup4 requests selenium fake_headers webdriver-manager lxml
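To confirm that the installation worked before running any scraping code, a quick check along these lines may help. It is only a sketch: the module list is an assumption based on the libraries this article names, and the helper missing_modules is hypothetical. Note that the import names differ from some package names (beautifulsoup4 is imported as bs4, webdriver-manager as webdriver_manager).

```python
from importlib.util import find_spec

# Import names (not PyPI package names) of the libraries this article relies on.
REQUIRED_MODULES = [
    "bs4",                # beautifulsoup4
    "requests",
    "selenium",
    "fake_headers",
    "webdriver_manager",  # webdriver-manager
    "lxml",
]


def missing_modules(modules):
    """Return the names from `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```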

Research before writing code

Before we start writing code, we need to understand the structure and content of the website. The simplest way to do that is to open the target page in a browser and inspect it. Other browsers offer similar features, but we will use Chrome's developer tools.

Result of inspecting the element
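What you see in the Elements panel maps directly onto a BeautifulSoup query: the tag names and class names found there become the selector. The sketch below illustrates the idea on a small inline snippet. The markup, including the class name locations-list, is hypothetical and only stands in for what the developer tools might show; the real AccuWeather page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for what DevTools might show;
# the real AccuWeather page may use different tags and class names.
SAMPLE_HTML = """
<div class="locations-list">
  <a href="/web-api/three-day-redirect?key=328328&amp;target=">London, London, GB</a>
  <a href="/web-api/three-day-redirect?key=55489&amp;target=">London, Ontario, CA</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A CSS selector mirrors the path seen in the Elements panel:
# every <a> inside the <div> with class "locations-list".
links = [
    (a.get_text(strip=True), a["href"])
    for a in soup.select("div.locations-list a")
]

for name, href in links:
    print(name, "=>", href)
```

Testing a selector on a saved snippet like this is often quicker than repeatedly requesting the live page.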

Scraping AccuWeather

Let's create a new file, get_city_list_by_keyword.py, and enter the following code:

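import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE_URL = "https://www.accuweather.com"
# Assumption: a browser-like User-Agent, since many sites reject the default one sent by requests.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_city_list_by_keyword(keyword):
    """Return (name, link) tuples from the AccuWeather search results for a keyword."""
    url = f"{BASE_URL}/en/search-locations"
    response = requests.get(url, params={"query": keyword}, headers=HEADERS, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors such as 403
    soup = BeautifulSoup(response.content, "html.parser")
    # Assumes each result sits in a div with class "search-results";
    # adjust the selector if the page markup differs.
    results = soup.find_all("div", class_="search-results")

    city_list = []
    for result in results:
        name_tag = result.find("h3")
        link_tag = result.find("a", href=True)
        if name_tag is None or link_tag is None:
            continue  # skip blocks that lack a name or a link
        city_name = name_tag.text.strip()
        city_link = urljoin(BASE_URL, link_tag["href"])  # make relative links absolute
        city_list.append((city_name, city_link))

    return city_list


if __name__ == "__main__":
    city_list = get_city_list_by_keyword("London")
    for i, (city_name, city_link) in enumerate(city_list, start=1):
        print(f"{i}. {city_name} => {city_link}")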

Running the code above produces the following output:

1. London, London, GB => https://www.accuweather.com/web-api/three-day-redirect?key=328328&target=
2. London, Ontario, CA => https://www.accuweather.com/web-api/three-day-redirect?key=55489&target=
3. Siguel, South Cotabato, PH => https://www.accuweather.com/web-api/three-day-redirect?key=3422743&target=
4. London, Kentucky, US => https://www.accuweather.com/web-api/three-day-redirect?key=333298&target=
5. London, Ohio, US(43140) => https://www.accuweather.com/web-api/three-day-redirect?key=335012&target=
6. London, Line Islands, KI => https://www.accuweather.com/web-api/three-day-redirect?key=1123888&target=
7. London, Arkansas, US => https://www.accuweather.com/web-api/three-day-redirect?key=2123173&target=
8. London, California, US => https://www.accuweather.com/web-api/three-day-redirect?key=2154402&target=
9. London, Texas, US(75684) => https://www.accuweather.com/web-api/three-day-redirect?key=2-2103712_1_al&target=
10. London, Altay, RU => https://www.accuweather.com/web-api/three-day-redirect?key=1-2451587_1_al&target=
11. London, Svalbard, SJ => https://www.accuweather.com/web-api/three-day-redirect?key=2280056&target=
12. London / Chapeskie Field, Ontario, CA => https://www.accuweather.com/web-api/three-day-redirect?key=3-147408_1_poi_al&target=
13. London, Minnesota, US(56036) => https://www.accuweather.com/web-api/three-day-redirect?key=1-2247764_1_al&target=
14. London, Michigan, US => https://www.accuweather.com/web-api/three-day-redirect?key=1-2211307_1_al&target=
15. London Township, Kansas, US => https://www.accuweather.com/web-api/three-day-redirect?key=2639759&target=
16. London, Texas, US(76854) => https://www.accuweather.com/web-api/three-day-redirect?key=340897&target=
17. London, Missouri, US => https://www.accuweather.com/web-api/three-day-redirect?key=2127488&target=
18. London, Ohio, US(45647) => https://www.accuweather.com/web-api/three-day-redirect?key=5-2214979_1_al&target=
19. London, Texas, US(75684) => https://www.accuweather.com/web-api/three-day-redirect?key=1-2185106_1_al&target=
20. London, Alabama, US(36064) => https://www.accuweather.com/web-api/three-day-redirect?key=2255111&target=
21. London, Alabama, US(36432) => https://www.accuweather.com/web-api/three-day-redirect?key=2231416&target=
22. London, Indiana, US => https://www.accuweather.com/web-api/three-day-redirect?key=2122624&target=
23. London, Texas, US(75684) => https://www.accuweather.com/web-api/three-day-redirect?key=2634173&target=
24. London, Minnesota, US(55616) => https://www.accuweather.com/web-api/three-day-redirect?key=2248424&target=
25. London, West Virginia, US => https://www.accuweather.com/web-api/three-day-redirect?key=2148311&target=
26. London, Wisconsin, US => https://www.accuweather.com/web-api/three-day-redirect?key=2248760&target=
27. London, Tennessee, US => https://www.accuweather.com/web-api/three-day-redirect?key=2203011&target=
28. London, Pennsylvania, US => https://www.accuweather.com/web-api/three-day-redirect?key=2242809&target=
29. London, Oregon, US => https://www.accuweather.com/web-api/three-day-redirect?key=2187826&target=
30. London, Ohio, US(44875) => https://www.accuweather.com/web-api/three-day-redirect?key=2214978&target=
31. London, Delta, NG => https://www.accuweather.com/web-api/three-day-redirect?key=930167&target=
32. London, Sistan and Baluchestan, IR => https://www.accuweather.com/web-api/three-day-redirect?key=2082737&target=
33. London, Orange Walk, BZ => https://www.accuweather.com/web-api/three-day-redirect?key=1162514&target=
34. London, Miranda, VE => https://www.accuweather.com/web-api/three-day-redirect?key=1796640&target=
35. London, Limpopo, ZA => https://www.accuweather.com/web-api/three-day-redirect?key=1145972&target=
36. London, Mpumalanga, ZA => https://www.accuweather.com/web-api/three-day-redirect?key=1145971&target=
37. London, Kachin, MM => https://www.accuweather.com/web-api/three-day-redirect?key=1-1437856_1_al&target=
38. London, Litoral, GQ => https://www.accuweather.com/web-api/three-day-redirect?key=904379&target=
39. Londonderry, Derry City and Strabane, GB => https://www.accuweather.com/web-api/three-day-redirect?key=329139&target=
40. Londonderry, New Hampshire, US => https://www.accuweather.com/web-api/three-day-redirect?key=2174076&target=
41. London Colney, Hertfordshire, GB => https://www.accuweather.com/web-api/three-day-redirect?key=711822&target=
42. London Apprentice, Cornwall, GB => https://www.accuweather.com/web-api/three-day-redirect?key=2523034&target=
43. London Aquatics Centre, Newham, GB => https://www.accuweather.com/web-api/three-day-redirect?key=53667_poi&target=
44. London Bridge Golf Course, Arizona, US => https://www.accuweather.com/web-api/three-day-redirect?key=189716_poi&target=
45. London Country Club, Kentucky, US => https://www.accuweather.com/web-api/three-day-redirect?key=192615_poi&target=
46. London Country Club, Ohio, US => https://www.accuweather.com/web-api/three-day-redirect?key=182065_poi&target=
47. London Downs Golf Club, Virginia, US => https://www.accuweather.com/web-api/three-day-redirect?key=185686_poi&target=
48. London Velopark, Newham, GB => https://www.accuweather.com/web-api/three-day-redirect?key=53669_poi&target=
49. Londonderry Country Club, New Hampshire, US => https://www.accuweather.com/web-api/three-day-redirect?key=180715_poi&target=
50. London Road Stadium, Peterborough, GB => https://www.accuweather.com/web-api/three-day-redirect?key=196433_poi&target=
51. London Airport, Ontario, CA => https://www.accuweather.com/web-api/three-day-redirect?key=1641_poi&target=
52. London Gatwick Airport, West Sussex, GB => https://www.accuweather.com/web-api/three-day-redirect?key=5440_poi&target=
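Each link in this output differs only in its key query parameter, and several entries share a display name (for example, "London, Texas, US(75684)" appears three times with different keys). If you want that identifier on its own, for instance to tell such entries apart, it can be pulled out with the standard library. This is a sketch; the helper name location_key is hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://www.accuweather.com"


def location_key(link: str) -> str:
    """Return the value of the `key` query parameter from a result link."""
    absolute = urljoin(BASE_URL, link)  # relative hrefs are resolved against the site root
    params = parse_qs(urlparse(absolute).query)
    return params["key"][0]


print(location_key("https://www.accuweather.com/web-api/three-day-redirect?key=328328&target="))
print(location_key("/web-api/three-day-redirect?key=53667_poi&target="))
```

The function raises KeyError if a link carries no key parameter, which makes unexpected link formats visible instead of silently skipping them.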

Scraping detailed information from AccuWeather

Once we have the list of cities, we can get detailed information about each one by following its link from the search results.

AccuWeather Detail Page

Use the following code to scrape the details:

At this point, running the code gives the following result:

░░░ London, London, GB >>> Weather = {'current_weather': '22°C, Sunny', 'current_date': 'June, 22', 'current_time': '10:18 AM', 'air_quality': '21°', 'wind': 'Good', 'wind_gusts': 'NE 13 km/h', 'tomorrow_weather': '24°/ 15°, Passing afternoon shower'} ░░░
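The values in this dictionary are raw strings. If numeric temperatures are needed for further processing, they can be split out with a regular expression. The sketch below assumes the two string shapes visible in the output above ("22°C, <condition>" and "24°/ 15°, <condition>"); the function names are hypothetical and not part of the original script.

```python
import re


def parse_current(text):
    """Split a string such as '22°C, Sunny' into (temperature, condition)."""
    match = re.match(r"\s*(-?\d+)°C?\s*,\s*(.+)", text)
    if match is None:
        raise ValueError(f"Unrecognised weather string: {text!r}")
    return int(match.group(1)), match.group(2).strip()


def parse_forecast(text):
    """Split a string such as '24°/ 15°, Rain' into (high, low, condition)."""
    match = re.match(r"\s*(-?\d+)°\s*/\s*(-?\d+)°\s*,\s*(.+)", text)
    if match is None:
        raise ValueError(f"Unrecognised forecast string: {text!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3).strip()


print(parse_current("22°C, Sunny"))
print(parse_forecast("24°/ 15°, Rain"))
```

Raising ValueError on an unexpected shape is a deliberate choice here: if the site changes its text format, the failure shows up immediately rather than as wrong numbers.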

Please see the code below and my GitHub page for the full code.

That is everything I wanted to cover in this article about scraping AccuWeather. If you enjoyed it, please share it with your friends and followers and check out my other posts. Thank you! :)