urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Open the URL url, which can be either a string or a Request object.
data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed. data may also be an iterable object, in which case a Content-Length value must be specified in the headers. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided.
data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns an ASCII text string in this format. It should be encoded to bytes before being used as the data parameter.
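As a minimal sketch of the encoding step described above, the following builds a POST body from a mapping. The form fields and the example.com endpoint are hypothetical and chosen only for illustration; the request is constructed but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields and endpoint, used only for illustration.
fields = {"name": "Ada Lovelace", "lang": "en"}
url = "http://example.com/submit"

# urlencode() returns an ASCII str; encode it to bytes before using it as data.
body = urllib.parse.urlencode(fields).encode("ascii")

# Supplying data turns the request into a POST instead of a GET.
request = urllib.request.Request(url, data=body)
print(request.get_method())

# Sending it requires network access:
# with urllib.request.urlopen(request) as response:
#     print(response.getcode())
```

Passing the same bytes directly as urlopen(url, data=body) should behave the same way; building a Request first simply makes the method easy to inspect.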
The urllib.request module uses HTTP/1.1 and includes a Connection:close header in its HTTP requests.
The optional timeout parameter specifies a timeout in seconds for blocking operations such as the connection attempt (if not specified, the global default timeout setting will be used). The timeout only takes effect for HTTP, HTTPS and FTP connections.
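One way to apply a timeout is sketched below. The helper name fetch and the five-second default are assumptions made for this example, not part of the module.

```python
import urllib.request

def fetch(url, seconds=5.0):
    """Return the response body, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=seconds) as response:
            return response.read()
    except OSError as err:
        # URLError and socket timeouts are both OSError subclasses,
        # so a single handler covers connection failures and timeouts.
        print("request failed:", err)
        return None

# Example call (requires network access; the URL is illustrative):
# page = fetch("http://example.com/", seconds=2.0)
```

Catching OSError is a deliberately broad choice for a sketch; code that needs to distinguish a timeout from other failures would catch the more specific exceptions separately.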
If context is specified, it must be a ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.
The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().
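The sketch below shows one way to supply certificate settings through the context parameter. It assumes the system's default CA store is acceptable; the helper name open_verified and its default timeout are hypothetical. Later Python releases deprecated cafile and capath in favor of context, so this form is likely the more portable of the two.

```python
import ssl
import urllib.request

# Build a context that verifies server certificates against the default CA store.
# To trust a specific bundle instead, pass cafile="/path/to/bundle.pem"
# (a hypothetical path) to ssl.create_default_context().
context = ssl.create_default_context()

def open_verified(url, timeout=10):
    """Open an HTTPS URL using the context above (requires network access)."""
    return urllib.request.urlopen(url, timeout=timeout, context=context)

# Example call (the URL is illustrative):
# with open_verified("https://example.com/") as response:
#     print(response.getcode())
```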
The cadefault parameter is ignored.
This function always returns an object which can work as a context manager and has methods such as:
- geturl(): return the URL of the resource retrieved, commonly used to determine if a redirect was followed
- info(): return the meta-information of the page, such as headers, in the form of an email.message_from_string() instance (see Quick Reference to HTTP Headers)
- getcode(): return the HTTP status code of the response.
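The three methods listed above can be exercised together, as in this minimal sketch. The helper name describe is an assumption for the example; it works on any object returned by urlopen().

```python
import urllib.request

def describe(response):
    """Collect the results of the three accessor methods of a urlopen() response."""
    return {
        "url": response.geturl(),                  # final URL, after any redirects
        "headers": dict(response.info().items()),  # repeated header names collapse to one entry
        "status": response.getcode(),              # HTTP status code; None for non-HTTP URLs
    }

# Example call (requires network access; the URL is illustrative):
# with urllib.request.urlopen("http://example.com/") as response:
#     print(describe(response))
```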
For HTTP and HTTPS URLs, this function returns a slightly modified http.client.HTTPResponse object. In addition to the three methods above, the msg attribute contains the same information as the reason attribute (the reason phrase returned by the server) instead of the response headers, as specified in the documentation for HTTPResponse.
For FTP, file, and data URLs and requests explicitly handled by the legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object.
Raises URLError on protocol errors.
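A minimal sketch of handling these errors follows. The helper name try_open is hypothetical. HTTPError is a subclass of URLError, so it is caught first when the status code matters.

```python
import urllib.error
import urllib.request

def try_open(url, timeout=10):
    """Return the response body, or None if opening the URL fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        print("server returned status", err.code)
    except urllib.error.URLError as err:
        # The request could not be completed, e.g. unknown scheme or refused connection.
        print("failed to open URL:", err.reason)
    return None

# Example call (requires network access; the URL is illustrative):
# data = try_open("http://example.com/missing")
```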
Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).
In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is installed by default and makes sure the requests are handled through the proxy.
The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen. Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects.
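The following sketch shows how a ProxyHandler object might stand in for the old dictionary parameter. The proxy address is hypothetical and nothing is sent over the network.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128"})
opener = urllib.request.build_opener(proxy)

# An empty mapping disables proxies, including any found in the environment.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# Either use an opener directly:
#     opener.open("http://example.com/")
# or install it so that later urlopen() calls go through it:
#     urllib.request.install_opener(opener)
```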
Changed in version 3.2: cafile and capath were added.
Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if ssl.HAS_SNI is true).
New in version 3.2: data can be an iterable object.
Changed in version 3.3: cadefault was added.
Changed in version 3.4.3: context was added.