python requests 403 error
I have the Bludit CMS, and it has an API that lets you create new pages with a POST request. But it returns a 403 error. The Python code:

requests.post("http://***.ru/api/pages", json={"token": "api_token", "authentication": "user_token", "title": "test t", "content": "test d"})
responseerr
29.11.23 07:20:14 MSK
It means access to this URL is forbidden. Why is something to sort out with your CMS. Python and requests have nothing to do with it.
eternal_sorrow ★★★★★
( 29.11.23 07:26:08 MSK )
In reply to: comment by eternal_sorrow 29.11.23 07:26:08 MSK
But if I send the POST to the site root, that is, not http://***.ru/api/pages but just http://***.ru, the error is exactly the same, even though the site opens fine in a browser.
responseerr
( 29.11.23 07:30:12 MSK ) topic starter
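Since even a plain request to the site root returns 403 from requests while a browser gets through, the server is most likely filtering on request headers rather than on the URL. A minimal sketch of what requests would actually transmit, prepared without sending; the URL and token values below are placeholders, not the real site:

```python
import requests

# Build, but do not send, the same POST the question makes. Preparing
# the request through a Session shows the headers requests would send.
session = requests.Session()
req = requests.Request(
    "POST",
    "http://example.com/api/pages",
    json={"token": "api_token", "authentication": "user_token",
          "title": "test t", "content": "test d"},
)
prepared = session.prepare_request(req)

# By default the library announces itself, which many servers and
# firewalls reject with 403 regardless of the payload:
print(prepared.headers["User-Agent"])  # python-requests/<version>

# Sending a browser-like User-Agent is the usual first thing to try:
session.headers["User-Agent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
prepared = session.prepare_request(req)
print(prepared.headers["User-Agent"])
```

If a browser-like User-Agent does not change the 403, printing `r.status_code` and `r.text` on the real request usually reveals whether the block comes from the CMS itself or from a front-end firewall.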
Python 3.x. HTTP Error 403: Forbidden (even with a "User-Agent")
I want to write a program that downloads music albums. To start, I decided to simply read the page. With other sites there is no such problem, but with the one I need I always get Response 403. Even adding a "User-Agent", as all the guides advise, didn't help. What should I do? The code is below:
import urllib.request

user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0'
url = "https://2mz.me/"
request = urllib.request.Request(url)
request.add_header('User-Agent', user_agent)
response = urllib.request.urlopen(request)
print(request)
asked 13 Apr 2021 at 10:36 by Егор Приставка
2 Answers
Besides the user-agent, there are other headers the site may check. Add them all so you don't have to guess:
import requests

url = 'https://2mz.me/'
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'uk,en-US;q=0.9,en;q=0.8,ru;q=0.7',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
}
r = requests.get(url, headers=headers)

>>> r.status_code
200
answered 13 Apr 2021 at 10:51
You forgot to close the quotes for this key: 'accept': 'text/html, 🙂 It works, well done 🙂
13 Apr 2021 at 10:56
Strange, I don't see the sec- fields among the request headers. Are they added by Google Chrome in your case?
13 Apr 2021 at 10:58
@gil9red fixed the quotes, thanks! Yes, the sec- headers are added (Version 89.0.4389.90 (Official Build) (64-bit))
13 Apr 2021 at 10:59
Ah, I see. I tried your request against them (got 200), then started removing fields until I hit a 403, then started putting the fields back, and even with all of them restored it was still 403. I wondered what on earth was going on, opened the site in a browser, and it complained about suspicious activity from my IP and asked for a captcha. The site was built by paranoiacs 🙂
13 Apr 2021 at 11:02
Thanks! Now the 403 really has stopped appearing. It still doesn't return what I need, though. But I'll keep this workaround in mind.
13 Apr 2021 at 12:44
Something is off with that site; I didn't dig into it, so I did it through selenium instead.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.implicitly_wait(10)
try:
    driver.get('https://2mz.me/')
    print(f'Title: "{driver.title}"')
    for item in driver.find_elements_by_css_selector('#tracks .item'):
        title = item.find_element_by_css_selector('.item-title').text
        author = item.find_element_by_css_selector('.item-author').text
        print(title, author)
finally:
    driver.quit()
Title: "2mz.me - слушать музыку онлайн или скачать бесплатно в mp3"
Minor Miyagi & Andy Panda ТЫ ГОРИШЬ КАК ОГОНЬ SLAVA MARLOW . неболей Zivert, Баста Поболело и прошло HENSY
Why does the parser return 403 even after setting Cookie and User-Agent?
I tried to write a parser to download pictures from artstation.com. I took a random profile; almost all of the content there is loaded via JSON. I found the GET request, and in the browser it opens fine, but through requests.get it returns 403. Everywhere on Google people advise setting the User-Agent and Cookie headers. I used requests.sessions and set a User-Agent, but the picture is still the same. What am I doing wrong?
import requests

url = 'https://www.artstation.com/users/kuvshinov_ilya'
json_url = 'https://www.artstation.com/users/kuvshinov_ilya/projects.json?page=1'
header = {'User-Agent': '...'}

session = requests.Session()
r = session.get(url, headers=header)
json_r = session.get(json_url, headers=header)
print(json_r)

# <Response [403]>
- Asked more than three years ago
- 10163 views
2 Solutions
Retard Soft Inc.
The 403 is caused by Cloudflare.
cfscrape helped me get around it:
def get_session():
    session = requests.Session()
    session.headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'ru,en-US;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache',
    }
    return cfscrape.create_scraper(sess=session)

session = get_session()
# From here on, work with it like a regular requests.Session
A bit of code for extracting direct links to the high-res pictures:
import requests
import cfscrape

def get_session():
    session = requests.Session()
    session.headers = {
        'Host': 'www.artstation.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'ru,en-US;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache',
    }
    return cfscrape.create_scraper(sess=session)

def artstation():
    url = 'https://www.artstation.com/kyuyongeom'
    page_url = 'https://www.artstation.com/users/kyuyongeom/projects.json'
    post_pattern = 'https://www.artstation.com/projects/{}.json'
    session = get_session()
    absolute_links = []

    response = session.get(page_url, params={'page': 1}).json()
    pages, modulo = divmod(response['total_count'], 50)
    if modulo:
        pages += 1

    for page in range(1, pages + 1):
        if page != 1:
            response = session.get(page_url, params={'page': page}).json()
        for post in response['data']:
            shortcode = post['permalink'].split('/')[-1]
            inner_resp = session.get(post_pattern.format(shortcode)).json()
            for img in inner_resp['assets']:
                if img['asset_type'] == 'image':
                    absolute_links.append(img['image_url'])

    with open('links.txt', 'w') as file:
        file.write('\n'.join(absolute_links))

if __name__ == '__main__':
    artstation()
Answered more than three years ago
How To Solve 403 Forbidden Errors When Web Scraping
Getting an HTTP 403 Forbidden error when web scraping or crawling is one of the most common HTTP errors you will get.
Often there are only two possible causes:
- The URL you are trying to scrape is forbidden, and you need to be authorised to access it.
- The website detects that you are a scraper and returns a 403 Forbidden HTTP status code as a ban page.
Most of the time it is the second cause, i.e. the website is blocking your requests because it thinks you are a scraper.
403 Forbidden Errors are common when you are trying to scrape websites protected by Cloudflare, as Cloudflare returns a 403 status code.
In this guide we will walk you through how to debug 403 Forbidden Errors and provide solutions that you can implement.
- Easy Way To Solve 403 Forbidden Errors When Web Scraping
- Use Fake User Agents
- Optimize Request Headers
- Use Rotating Proxies
Easy Way To Solve 403 Forbidden Errors When Web Scraping
If the URL you are trying to scrape is normally accessible, but you are getting 403 Forbidden Errors then it is likely that the website is flagging your spider as a scraper and blocking your requests.
To avoid getting detected we need to optimise our spiders to bypass anti-bot countermeasures by:
- Using Fake User Agents
- Optimizing Request Headers
- Using Proxies
We will discuss these below, however, the easiest way to fix this problem is to use a smart proxy solution like the ScrapeOps Proxy Aggregator.

With the ScrapeOps Proxy Aggregator you simply need to send your requests to the ScrapeOps proxy endpoint and our Proxy Aggregator will optimise your request with the best user-agent, header and proxy configuration to ensure you don’t get 403 errors from your target website.
Simply get your free API key by signing up for a free account here and edit your scraper as follows:
import requests
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scrapeops_url(url):
    payload = {'api_key': API_KEY, 'url': url}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

r = requests.get(get_scrapeops_url('http://quotes.toscrape.com/page/1/'))
print(r.text)
If you are getting blocked by Cloudflare, then you can simply activate ScrapeOps’ Cloudflare Bypass by adding bypass=cloudflare to the request:
import requests
from urllib.parse import urlencode

API_KEY = 'YOUR_API_KEY'

def get_scrapeops_url(url):
    payload = {'api_key': API_KEY, 'url': url, 'bypass': 'cloudflare'}
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    return proxy_url

r = requests.get(get_scrapeops_url('http://example.com/'))
print(r.text)
Or if you would prefer to try to optimize your user-agent, headers and proxy configuration yourself then read on and we will explain how to do it.
Use Fake User Agents
The most common reason for a website to block a web scraper and return a 403 error is that you are telling the website you are a scraper in the user-agent you send with your requests.
By default, most HTTP libraries (Python Requests, Scrapy, NodeJs Axios, etc.) either don't attach real browser headers to your requests or include headers that identify the library being used, both of which immediately tell the website you are trying to scrape that you are a scraper, not a real user.
For example, let’s send a request to http://httpbin.org/headers with the Python Requests library using the default setting:
import requests

r = requests.get('http://httpbin.org/headers')
print(r.text)
You will get a response like this that shows what headers we sent to the website:
"headers": "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.26.0", > >
Here we can see that our request using the Python Requests library attaches very few headers to the request, and even identifies itself as the python-requests library in the User-Agent header.
"User-Agent": "python-requests/2.26.0",
This tells the website that your requests are coming from a scraper, so it is very easy for them to block your requests and return a 403 status code.
Solution
The solution to this problem is to configure your scraper to send a fake user-agent with every request. This way it is harder for the website to tell if your requests are coming from a scraper or a real user.
Here is how you would send a fake user agent when making a request with Python Requests.
import requests

HEADERS = {'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'}

r = requests.get('http://quotes.toscrape.com/page/1/', headers=HEADERS)
print(r.text)
Here we are making our request look like it is coming from an iPad, which will increase the chances of the request getting through.
This will only work for relatively small scrapes; if you use the same user-agent on every single request, a website with a more sophisticated anti-bot solution can still easily detect your scraper.
To solve this when scraping at scale, we need to maintain a large list of user-agents and pick a different one for each request.
import requests
import random

user_agents_list = [
    'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
]

r = requests.get('http://quotes.toscrape.com/page/1/', headers={'User-Agent': random.choice(user_agents_list)})
print(r.text)
Now, each time we make a request, we will pick a random user-agent from the list.
Optimize Request Headers
In a lot of cases, just adding fake user-agents to your requests will solve the 403 Forbidden Error; however, if the website has a more sophisticated anti-bot detection system in place, you will also need to optimize the request headers.
By default, most HTTP clients will only send basic request headers along with your requests such as Accept , Accept-Language , and User-Agent .
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
Accept-Language: 'en'
User-Agent: 'python-requests/2.26.0'
In contrast, here are the request headers a Chrome browser running on a MacOS machine would send:
Connection: 'keep-alive'
Cache-Control: 'max-age=0'
sec-ch-ua: '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
sec-ch-ua-mobile: '?0'
sec-ch-ua-platform: "macOS"
Upgrade-Insecure-Requests: 1
User-Agent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36'
Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9'
Sec-Fetch-Site: 'none'
Sec-Fetch-Mode: 'navigate'
Sec-Fetch-User: '?1'
Sec-Fetch-Dest: 'document'
Accept-Encoding: 'gzip, deflate, br'
Accept-Language: 'en-GB,en-US;q=0.9,en;q=0.8'
If the website is really trying to prevent web scrapers from accessing their content, then they will be analysing the request headers to make sure that the other headers match the user-agent you set, and that the request includes other common headers a real browser would send.
Solution
To solve this, we need to make sure we optimize the request headers, including making sure the fake user-agent is consistent with the other headers.
This is a big topic, so if you would like to learn more about header optimization then check out our guide to header optimization.
However, to summarize, we don’t just want to send a fake user-agent when making a request but the full set of headers web browsers normally send when visiting websites.
Here is a quick example of adding optimized headers to our requests:
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0",
}

r = requests.get('http://quotes.toscrape.com/page/1/', headers=HEADERS)
print(r.text)
Here we are adding the same optimized header with a fake user-agent to every request. However, when scraping at scale you will need a list of these optimized headers and rotate through them.
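That rotation can be sketched the same way as user-agent rotation, just over whole header sets. This is a hedged example: the two sets below are illustrative samples, not a curated fingerprint list, and each request picks one complete set so the headers stay mutually consistent.

```python
import random

import requests

# Two illustrative header sets; each is internally consistent (the client
# hints match the user-agent: Firefox sends no sec-ch-ua-platform, Chrome
# on macOS does). These are samples, not a curated list.
HEADER_SETS = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Upgrade-Insecure-Requests': '1',
    },
    {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
        'sec-ch-ua-platform': '"macOS"',
        'Upgrade-Insecure-Requests': '1',
    },
]

def get_with_random_headers(url):
    # Pick one complete set per request so all headers stay consistent
    # with each other (never mix values from different sets).
    return requests.get(url, headers=random.choice(HEADER_SETS))

print(random.choice(HEADER_SETS)['User-Agent'])
```

Picking a whole set at once is the important design point: mixing a Firefox user-agent with Chrome-only client-hint headers is exactly the inconsistency anti-bot systems look for.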
Use Rotating Proxies
If the above solutions don’t work then it is highly likely that the server has flagged your IP address as being used by a scraper and is either throttling your requests or completely blocking them.
This is especially likely if you are scraping at larger volumes, as it is easy for websites to detect scrapers if they are getting an unnaturally large amount of requests from the same IP address.
Solution
You will need to send your requests through a rotating proxy pool.
Here is how you could do it in Python Requests:
import requests
from itertools import cycle

list_proxy = [
    'http://Username:Password@IP1:20000',
    'http://Username:Password@IP2:20000',
    'http://Username:Password@IP3:20000',
    'http://Username:Password@IP4:20000',
]

proxy_cycle = cycle(list_proxy)

for i in range(1, 10):
    proxy = next(proxy_cycle)
    print(proxy)
    proxies = {
        "http": proxy,
        "https": proxy,
    }
    r = requests.get(url='http://quotes.toscrape.com/page/1/', proxies=proxies)
    print(r.text)
Now, your request will be routed through a different proxy with each request.
You will also need to incorporate the rotating user-agents we showed previously, as otherwise even when using a proxy we will still be telling the website that our requests are from a scraper, not a real user.
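Combining the two rotations can be sketched as follows; the proxy URLs are the same placeholders as in the example above, and the user-agents are illustrative:

```python
import random
from itertools import cycle

import requests

# Placeholder proxy endpoints, as in the example above.
proxies_list = [
    'http://Username:Password@IP1:20000',
    'http://Username:Password@IP2:20000',
]

# Illustrative browser user-agents.
user_agents = [
    'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
]

proxy_cycle = cycle(proxies_list)

def fetch(url):
    # Each call takes the next proxy from the pool and a random
    # user-agent, so both rotate together per request.
    proxy = next(proxy_cycle)
    return requests.get(
        url,
        proxies={'http': proxy, 'https': proxy},
        headers={'User-Agent': random.choice(user_agents)},
    )

# Demonstrate the proxy rotation without hitting the network:
print(next(proxy_cycle))
print(next(proxy_cycle))
```

With real proxy credentials you would call `fetch('http://quotes.toscrape.com/page/1/')` in a loop; every request then leaves from a different IP with a different browser identity.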
If you need help finding the best & cheapest proxies for your particular use case then check out our proxy comparison tool here.
Alternatively, you could just use the ScrapeOps Proxy Aggregator as we discussed previously.
More Web Scraping Tutorials
So that’s how you can solve 403 Forbidden Errors when you get them.
If you would like to know more about bypassing the most common anti-bots then check out our bypass guides here:
- How To Bypass Cloudflare
- How To Bypass PerimeterX
Or if you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.
Or check out one of our more in-depth guides:
- How to Scrape The Web Without Getting Blocked Guide
- The State of Web Scraping 2020
- The Ethics of Web Scraping