furl

furl is a small Python library that makes parsing and
manipulating URLs easy.

Python’s standard urllib and urlparse modules (merged into urllib.parse in
Python 3) provide a number of URL-related functions, but using these functions
to perform common URL operations proves tedious. Furl makes parsing and
manipulating URLs easy.
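To make the tedium concrete, here is a minimal standard-library sketch of a typical edit: append a path segment, drop one query argument, and add another. The helper name edit_url and the sample URL are illustrative assumptions, not part of furl.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def edit_url(url):
    """Append '/path', drop the 'one' argument, and add three=3 (stdlib only)."""
    parts = urlsplit(url)
    # Decode the query into ordered (key, value) pairs, skipping 'one'.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != 'one']
    params.append(('three', '3'))
    # Avoid a double slash when the existing path is just '/'.
    path = parts.path.rstrip('/') + '/path'
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(params), parts.fragment))


result = edit_url('http://www.google.com/?one=1&two=2')
print(result)
```

Every component has to be split, edited, re-encoded, and reassembled by hand, which is the bookkeeping furl is meant to hide.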

Furl is well tested, Unlicensed in the public
domain, and supports Python 3 and PyPy3.

👥 Furl is looking for a lead contributor and maintainer. Would you love
to lead furl and make working with URLs a joy for everyone in Python?
Please reach out and let me know! 🙌

Code time: Paths and query arguments are easy. Really easy.

>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f /= 'path'
>>> del f.args['one']
>>> f.args['three'] = '3'
>>> f.url
'http://www.google.com/path?two=2&three=3'

Or use furl’s inline modification methods.

>>> furl('http://www.google.com/?one=1').add({'two':'2'}).url
'http://www.google.com/?one=1&two=2'

>>> furl('http://www.google.com/?one=1&two=2').set({'three':'3'}).url
'http://www.google.com/?three=3'

>>> furl('http://www.google.com/?one=1&two=2').remove(['one']).url
'http://www.google.com/?two=2'

Encoding is handled for you. Unicode, too.

>>> f = furl('http://www.google.com/')
>>> f.path = 'some encoding here'
>>> f.args['and some encoding'] = 'here, too'
>>> f.url
'http://www.google.com/some%20encoding%20here?and+some+encoding=here,+too'
>>> f.set(host=u'ドメイン.テスト', path=u'джк', query=u'☃=☺')
>>> f.url
'http://xn--eckwd4c7c.xn--zckzah/%D0%B4%D0%B6%D0%BA?%E2%98%83=%E2%98%BA'
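For readers curious what that encoding involves, the sketch below reproduces the same URL with the standard library alone: the host goes through Python’s built-in idna codec and the path and query go through UTF-8 percent-encoding. This is an illustration of the encodings, not a description of furl’s internals.

```python
from urllib.parse import quote

# Internationalized host names are converted to ASCII (Punycode) per label.
host = 'ドメイン.テスト'.encode('idna').decode('ascii')

# Non-ASCII path and query text is UTF-8 encoded, then percent-encoded.
path = quote('джк')
query = quote('☃') + '=' + quote('☺')

url = f'http://{host}/{path}?{query}'
print(url)
```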

Fragments also have a path and a query.

>>> f = furl('http://www.google.com/')
>>> f.fragment.path.segments = ['two', 'directories']
>>> f.fragment.args = {'one': 'argument'}
>>> f.url
'http://www.google.com/#two/directories?one=argument'

Installation

Installing furl with pip is easy.

$ pip install furl

API

Basics

furl objects let you access and modify the various components of a URL.

scheme://username:password@host:port/path?query#fragment
  • scheme is the scheme string (all lowercase) or None. None means no
    scheme. An empty string means a protocol relative URL, like
    //www.google.com.
  • username is the username string for authentication.
  • password is the password string for authentication with username.
  • host is the domain name, IPv4, or IPv6 address as a string. Domain names
    are all lowercase.
  • port is an integer or None. A value of None means no port specified and
    the default port for the given scheme should be inferred, if possible
    (e.g. port 80 for the scheme http).
  • path is a Path object comprised of path segments.
  • query is a Query object comprised of key:value query arguments.
  • fragment is a Fragment object comprised of a Path object and Query object
    separated by an optional ? separator.

Scheme, Username, Password, Host, Port, Network Location, and Origin

scheme, username, password, and host are strings or
None. port is an integer or None.

>>> f = furl('http://user:pass@www.google.com:99/')
>>> f.scheme, f.username, f.password, f.host, f.port
('http', 'user', 'pass', 'www.google.com', 99)

furl infers the default port for common schemes.

>>> f = furl('https://secure.google.com/')
>>> f.port
443

>>> f = furl('unknown://www.google.com/')
>>> print(f.port)
None
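One way to picture default-port inference is a scheme-to-port lookup that is consulted only when the URL carries no explicit port. The sketch below uses the standard library; DEFAULT_PORTS and effective_port are hypothetical names, and the table is a small illustrative subset rather than the list of schemes furl knows about.

```python
from urllib.parse import urlsplit

# Hypothetical, illustrative subset of well-known default ports.
DEFAULT_PORTS = {'http': 80, 'https': 443, 'ftp': 21}


def effective_port(url):
    """Return the explicit port, else the scheme's default, else None."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port('https://secure.google.com/'))
print(effective_port('unknown://www.google.com/'))
```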

netloc is the string combination of username, password, host,
and port, not including port if it’s None or the default port for the
provided scheme.

>>> furl('http://www.google.com/').netloc
'www.google.com'

>>> furl('http://www.google.com:99/').netloc
'www.google.com:99'

>>> furl('http://user:pass@www.google.com:99/').netloc
'user:pass@www.google.com:99'

origin is the string combination of scheme, host, and port, not
including port if it’s None or the default port for the provided scheme.

>>> furl('http://www.google.com/').origin
'http://www.google.com'

>>> furl('http://www.google.com:99/').origin
'http://www.google.com:99'
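The rule for dropping the port can be sketched in a few lines of standard-library Python. The names origin_of and DEFAULT_PORTS are assumptions made for this illustration, and the sketch ignores IPv6 hosts, which would need brackets around the address.

```python
from urllib.parse import urlsplit

# Hypothetical, illustrative subset of default ports.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def origin_of(url):
    """Combine scheme, host, and port, omitting a missing or default port."""
    parts = urlsplit(url)
    host = parts.hostname or ''
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return f'{parts.scheme}://{host}'
    return f'{parts.scheme}://{host}:{port}'


print(origin_of('http://www.google.com:99/'))
```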

Path

URL paths in furl are Path objects that have segments, a list of zero or
more path segments that can be manipulated directly. Path segments in
segments are percent-decoded and all interaction with segments should
take place with percent-decoded strings.

>>> f = furl('http://www.google.com/a/large%20ish/path')
>>> f.path
Path('/a/large ish/path')
>>> f.path.segments
['a', 'large ish', 'path']
>>> str(f.path)
'/a/large%20ish/path'

Manipulation

>>> f.path.segments = ['a', 'new', 'path', '']
>>> str(f.path)
'/a/new/path/'

>>> f.path = 'o/hi/there/with%20some%20encoding/'
>>> f.path.segments
['o', 'hi', 'there', 'with some encoding', '']
>>> str(f.path)
'/o/hi/there/with%20some%20encoding/'

>>> f.url
'http://www.google.com/o/hi/there/with%20some%20encoding/'

>>> f.path.segments = ['segments', 'are', 'maintained', 'decoded', '^`<>[]"#/?']
>>> str(f.path)
'/segments/are/maintained/decoded/%5E%60%3C%3E%5B%5D%22%23%2F%3F'
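The relationship between decoded segments and the encoded path string can be approximated with urllib.parse. This is a rough sketch under the assumption that every reserved character inside a segment gets percent-encoded; decode_segments and encode_segments are hypothetical helpers, not furl functions.

```python
from urllib.parse import quote, unquote


def decode_segments(encoded_path):
    """Split an encoded path string into percent-decoded segments."""
    trimmed = encoded_path[1:] if encoded_path.startswith('/') else encoded_path
    return [unquote(segment) for segment in trimmed.split('/')]


def encode_segments(segments):
    """Join decoded segments into an absolute, percent-encoded path string."""
    # safe='' makes quote() encode '/' too, so a slash inside a segment
    # cannot be mistaken for a segment boundary.
    return '/' + '/'.join(quote(segment, safe='') for segment in segments)


print(decode_segments('/a/large%20ish/path'))
print(encode_segments(['a', 'new', 'path', '']))
```

Keeping segments decoded means a segment may itself contain / or ? without ambiguity; the encoding step is what keeps the resulting string unambiguous.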

A path that starts with / is considered absolute, and a Path can be absolute
or not as specified (or set) by the boolean attribute isabsolute. URL Paths
have a special restriction: they must be absolute if a netloc (username,
password, host, and/or port) is present. This restriction exists because a URL
path must start with / to separate itself from the netloc, if
present. Fragment Paths have no such limitation, and their isabsolute
attribute can be True or False without restriction.

Here’s a URL Path example that illustrates how isabsolute becomes True and
read-only in the presence of a netloc.

>>> f = furl('/url/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
>>> f.url
'url/path'
>>> f.host = 'blaps.ru'
>>> f.url
'blaps.ru/url/path'
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
Traceback (most recent call last):
  ...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). URL paths must be absolute if a netloc exists.
>>> f.url
'blaps.ru/url/path'

Conversely, the isabsolute attribute of Fragment Paths isn’t bound by the
same read-only restriction. URL fragments are always prefixed by a # character
and don’t need to be separated from the netloc.

>>> f = furl('http://www.google.com/#/absolute/fragment/path/')
>>> f.fragment.path.isabsolute
True
>>> f.fragment.path.isabsolute = False
>>> f.url
'http://www.google.com/#absolute/fragment/path/'
>>> f.fragment.path.isabsolute = True
>>> f.url
'http://www.google.com/#/absolute/fragment/path/'

A path that ends with / is considered a directory, and otherwise considered a
file. The Path attribute isdir returns True if the path is a directory,
False otherwise. Conversely, the attribute isfile returns True if the path
is a file, False otherwise.

>>> f = furl('http://www.google.com/a/directory/')
>>> f.path.isdir
True
>>> f.path.isfile
False

>>> f = furl('http://www.google.com/a/file')
>>> f.path.isdir
False
>>> f.path.isfile
True

A path can be normalized with normalize(), and normalize() returns the
Path object for method chaining.

>>> f = furl('http://www.google.com////a/./b/lolsup/../c/')
>>> f.path.normalize()
>>> f.url
'http://www.google.com/a/b/c/'
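A comparable normalization can be sketched with posixpath.normpath from the standard library. The helper normalize_path is hypothetical and only approximates the behavior shown above: it collapses repeated slashes, resolves . and .. segments, and restores the trailing slash that normpath strips.

```python
import posixpath


def normalize_path(path):
    """Collapse '//', '.', and '..' while keeping a trailing slash."""
    normalized = posixpath.normpath(path)
    # normpath preserves exactly two leading slashes (a POSIX rule);
    # collapse them for URL paths.
    if normalized.startswith('//'):
        normalized = '/' + normalized.lstrip('/')
    # normpath drops the trailing slash, which marks a directory in a URL.
    if path.endswith('/') and not normalized.endswith('/'):
        normalized += '/'
    return normalized


print(normalize_path('////a/./b/lolsup/../c/'))
```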

Path segments can also be appended with the slash operator, like with
pathlib.Path.

>>> from __future__ import division  # For Python 2.x.
>>>
>>> f = furl('path')
>>> f.path /= 'with'
>>> f.path = f.path / 'more' / 'path segments/'
>>> f.url
'/path/with/more/path%20segments/'

For a dictionary representation of a path, use asdict().

>>> f = furl('http://www.google.com/some/enc%20oding')
>>> f.path.asdict()
{ 'encoded': '/some/enc%20oding',
  'isabsolute': True,
  'isdir': False,
  'isfile': True,
  'segments': ['some', 'enc oding'] }

Query

URL queries in furl are Query objects that have params, a one dimensional
ordered multivalue dictionary of query keys and values. Query keys and values
in params are percent-decoded and all interaction with params should take
place with percent-decoded strings.

>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query
Query('one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> str(f.query)
'one=1&two=2'

furl objects and Fragment objects (covered below) contain a Query object, and
args is provided as a shortcut on these objects to access query.params.

>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> f.args
omdict1D([('one', '1'), ('two', '2')])
>>> f.args is f.query.params
True

Manipulation

params is a one dimensional ordered multivalue dictionary that maintains
method parity with Python’s standard dictionary.

>>> f.query = 'silicon=14&iron=26&inexorable%20progress=vae%20victus'
>>> f.query.params
omdict1D([('silicon', '14'), ('iron', '26'), ('inexorable progress', 'vae victus')])
>>> del f.args['inexorable progress']
>>> f.args['magnesium'] = '12'
>>> f.args
omdict1D([('silicon', '14'), ('iron', '26'), ('magnesium', '12')])

params can also store multiple values for the same key because it’s a
multivalue dictionary.

>>> f = furl('http://www.google.com/?space=jams&space=slams')
>>> f.args['space']
'jams'
>>> f.args.getlist('space')
['jams', 'slams']
>>> f.args.addlist('repeated', ['1', '2', '3'])
>>> str(f.query)
'space=jams&space=slams&repeated=1&repeated=2&repeated=3'
>>> f.args.popvalue('space')
'slams'
>>> f.args.popvalue('repeated', '2')
'2'
>>> str(f.query)
'space=jams&repeated=1&repeated=3'

params is one dimensional. If a list of values is provided as a query value,
that list is interpreted as multiple values.

>>> f = furl()
>>> f.args['repeated'] = ['1', '2', '3']
>>> f.add(args={'space':['jams', 'slams']})
>>> str(f.query)
'repeated=1&repeated=2&repeated=3&space=jams&space=slams'

This makes sense: URL queries are inherently one dimensional – query values
can’t have native subvalues.

See the orderedmultidict documentation for more information on interacting
with the ordered multivalue dictionary params.

Parameters

To produce an empty query argument, like http://sprop.su/?param=, set the
argument’s value to the empty string.

>>> f = furl('http://sprop.su')
>>> f.args['param'] = ''
>>> f.url
'http://sprop.su/?param='

To produce an empty query argument without a trailing =, use None as the
parameter value.

>>> f = furl('http://sprop.su')
>>> f.args['param'] = None
>>> f.url
'http://sprop.su/?param'
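The distinction between an empty string and None can be sketched as a small encoder over (key, value) pairs. encode_params is a hypothetical helper written for this illustration; it is not furl’s encoder and it uses the standard library’s quote_plus.

```python
from urllib.parse import quote_plus


def encode_params(pairs):
    """Encode (key, value) pairs; a None value emits the bare key."""
    encoded = []
    for key, value in pairs:
        if value is None:
            encoded.append(quote_plus(key))  # 'param'
        else:
            encoded.append(f'{quote_plus(key)}={quote_plus(value)}')  # 'param='
    return '&'.join(encoded)


print(encode_params([('param', '')]))
print(encode_params([('param', None)]))
```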

encode(delimiter='&', quote_plus=True, dont_quote='') can be used to encode
query strings with delimiters like ;, encode spaces as + instead of %20
(i.e. application/x-www-form-urlencoded encoding), or avoid percent-encoding
valid query characters entirely (valid query characters are
/?:@-._~!$&'()*+,;=).

>>> f.query = 'space=jams&woofs=squeeze+dog'
>>> f.query.encode()
'space=jams&woofs=squeeze+dog'
>>> f.query.encode(';')
'space=jams;woofs=squeeze+dog'
>>> f.query.encode(quote_plus=False)
'space=jams&woofs=squeeze%20dog'

dont_quote accepts True, False, or a string of valid query characters to
not percent-encode. If True, all valid query characters /?:@-._~!$&'()*+,;=
aren’t percent-encoded.

>>> f.query = 'one,two/three'
>>> f.query.encode()
'one%2Ctwo%2Fthree'
>>> f.query.encode(dont_quote=True)
'one,two/three'
>>> f.query.encode(dont_quote=',')
'one,two%2Fthree'
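The quote_plus and dont_quote options have close analogues in urllib.parse: quote_plus() versus quote() controls how spaces are written, and the safe argument lists characters to leave unencoded. The wrapper encode_value below is a hypothetical helper for comparing the two; it is a sketch of the idea and not how Query.encode() is implemented.

```python
from urllib.parse import quote, quote_plus


def encode_value(value, plus=True, safe=''):
    """Percent-encode one query value, leaving characters in safe alone."""
    if plus:
        return quote_plus(value, safe=safe)  # spaces become '+'
    return quote(value, safe=safe)           # spaces become '%20'


print(encode_value('squeeze dog'))
print(encode_value('squeeze dog', plus=False))
print(encode_value('one,two/three', safe=','))
```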

For a dictionary representation of a query, use asdict().

>>> f = furl('http://www.google.com/?space=ja+ms&space=slams')
>>> f.query.asdict()
{ 'encoded': 'space=ja+ms&space=slams',
  'params': [('space', 'ja ms'),
             ('space', 'slams')] }

Fragment

URL fragments in furl are Fragment objects that have a Path path and Query
query separated by an optional ? separator.

>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> f.fragment
Fragment('/fragment/path?with=params')
>>> f.fragment.path
Path('/fragment/path')
>>> f.fragment.query
Query('with=params')
>>> f.fragment.separator
True

Manipulation of Fragments is done via the Fragment’s Path and Query instances,
path and query.

>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> str(f.fragment)
'/fragment/path?with=params'
>>> f.fragment.path.segments.append('file.ext')
>>> str(f.fragment)
'/fragment/path/file.ext?with=params'

>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> str(f.fragment)
'/fragment/path?with=params'
>>> f.fragment.args['new'] = 'yep'
>>> str(f.fragment)
'/fragment/path?with=params&new=yep'

Creating hash-bang fragments with furl illustrates the use of Fragment’s boolean
attribute separator. When separator is False, the ? that separates
path and query isn’t included.

>>> f = furl('http://www.google.com/')
>>> f.fragment.path = '!'
>>> f.fragment.args = {'a':'dict', 'of':'args'}
>>> f.fragment.separator
True
>>> str(f.fragment)
'!?a=dict&of=args'

>>> f.fragment.separator = False
>>> str(f.fragment)
'!a=dict&of=args'
>>> f.url
'http://www.google.com/#!a=dict&of=args'
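The effect of separator on the serialized fragment can be sketched as a tiny string builder. build_fragment is a hypothetical helper that covers only the two cases shown above (a non-empty path and a non-empty query); how furl treats empty paths or queries is not assumed here.

```python
def build_fragment(path, query, separator=True):
    """Join an encoded fragment path and query, with or without '?'."""
    if separator:
        return f'{path}?{query}'
    return f'{path}{query}'


print(build_fragment('!', 'a=dict&of=args'))
print(build_fragment('!', 'a=dict&of=args', separator=False))
```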

For a dictionary representation of a fragment, use asdict().

>>> f = furl('http://www.google.com/#path?args=args')
>>> f.fragment.asdict()
{ 'encoded': 'path?args=args',
  'separator': True,
  'path': { 'encoded': 'path',
            'isabsolute': False,
            'isdir': False,
            'isfile': True,
            'segments': ['path']},
  'query': { 'encoded': 'args=args',
             'params': [('args', 'args')]} }

Encoding

Furl handles encoding for you, and furl’s philosophy on encoding is simple: raw
URL strings should always be percent-encoded.

>>> f = furl()
>>> f.netloc = '%40user:%3Apass@google.com'
>>> f.username, f.password
('@user', ':pass')

>>> f = furl()
>>> f.path = 'supply%20percent%20encoded/path%20strings'
>>> f.path.segments
['supply percent encoded', 'path strings']

>>> f.set(query='supply+percent+encoded=query+strings,+too')
>>> f.query.params
omdict1D([('supply percent encoded', 'query strings, too')])

>>> f.set(fragment='percent%20encoded%20path?and+percent+encoded=query+too')
>>> f.fragment.path.segments
['percent encoded path']
>>> f.fragment.args
omdict1D([('and percent encoded', 'query too')])

Raw, non-URL strings should never be percent-encoded.

>>> f = furl('http://google.com')
>>> f.set(username='@prap', password=':porps')
>>> f.url
'http://%40prap:%3Aporps@google.com'

>>> f = furl()
>>> f.set(path=['path segments are', 'decoded', '<>[]"#'])
>>> str(f.path)
'/path%20segments%20are/decoded/%3C%3E%5B%5D%22%23'

>>> f.set(args={'query parameters':'and values', 'are':'decoded, too'})
>>> str(f.query)
'query+parameters=and+values&are=decoded,+too'

>>> f.fragment.path.segments = ['decoded', 'path segments']
>>> f.fragment.args = {'and decoded':'query parameters and values'}
>>> str(f.fragment)
'decoded/path%20segments?and+decoded=query+parameters+and+values'

Python’s urllib.parse.quote() and urllib.parse.unquote() can be used to
percent-encode and percent-decode path strings. Similarly,
urllib.parse.quote_plus() and urllib.parse.unquote_plus() can be used to
percent-encode and percent-decode query strings. (In Python 2, these functions
live directly in the urllib module.)
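A short round trip with these four functions, using their Python 3 location in urllib.parse, shows the path and query conventions side by side. The sample strings are taken from or modeled on the examples above.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Path convention: a space is written as %20.
encoded_path = quote('some encoding here')
decoded_path = unquote('supply%20percent%20encoded')

# Query convention: a space is written as +.
encoded_query = quote_plus('here, too')
decoded_query = unquote_plus('query+strings,+too')

print(encoded_path, decoded_path)
print(encoded_query, decoded_query)
```

Note that quote_plus() also encodes the comma by default, producing here%2C+too, whereas the furl output shown earlier leaves it as here,+too; passing safe=',' to quote_plus() gives the latter form.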

Inline manipulation

For quick, single-line URL manipulation, the add(), set(), and
remove() methods of furl objects manipulate various URL components and
return the furl object for method chaining.

>>> url = 'http://www.google.com/#fragment' 
>>> furl(url).add(args={'example':'arg'}).set(port=99).remove(fragment=True).url
'http://www.google.com:99/?example=arg'

add() adds items to a furl object with the optional arguments

  • args: Shortcut for query_params.
  • path: A list of path segments to add to the existing path segments, or a
    path string to join with the existing path string.
  • query_params: A dictionary of query keys and values to add to the query.
  • fragment_path: A list of path segments to add to the existing fragment
    path segments, or a path string to join with the existing fragment path
    string.
  • fragment_args: A dictionary of query keys and values to add to the
    fragment’s query.

>>> f = furl('http://www.google.com/').add(
...   path='/search', fragment_path='frag/path', fragment_args={'frag':'arg'})
>>> f.url
'http://www.google.com/search#frag/path?frag=arg'

set() sets items of a furl object with the optional arguments

  • args: Shortcut for query_params.
  • path: List of path segments or a path string to adopt.
  • scheme: Scheme string to adopt.
  • netloc: Network location string to adopt.
  • origin: Origin string to adopt.
  • query: Query string to adopt.
  • query_params: A dictionary of query keys and values to adopt.
  • fragment: Fragment string to adopt.
  • fragment_path: A list of path segments to adopt for the fragment’s path
    or a path string to adopt as the fragment’s path.
  • fragment_args: A dictionary of query keys and values for the fragment’s
    query to adopt.
  • fragment_separator: Boolean whether or not there should be a ?
    separator between the fragment path and the fragment query.
  • host: Host string to adopt.
  • port: Port number to adopt.
  • username: Username string to adopt.
  • password: Password string to adopt.

>>> f = furl().set(
...   scheme='https', host='secure.google.com', port=99, path='index.html',
...   args={'some':'args'}, fragment='great job')
>>> f.url
'https://secure.google.com:99/index.html?some=args#great%20job'

remove() removes items from a furl object with the optional arguments

  • args: Shortcut for query_params.
  • path: A list of path segments to remove from the end of the existing path
    segments list, or a path string to remove from the end of the existing
    path string, or True to remove the entire path portion of the URL.
  • query: A list of query keys to remove from the query, if they exist, or
    True to remove the entire query portion of the URL.
  • query_params: A list of query keys to remove from the query, if they
    exist.
  • fragment: If True, remove the entire fragment portion of the URL.
  • fragment_path: A list of path segments to remove from the end of the
    fragment’s path segments, or a path string to remove from the end of the
    fragment’s path string, or True to remove the entire fragment path.
  • fragment_args: A list of query keys to remove from the fragment’s query,
    if they exist.
  • username: If True, remove the username, if it exists.
  • password: If True, remove the password, if it exists.
  • port: If True, remove the port, if it exists.

>>> url = 'https://secure.google.com:99/a/path/?some=args#great job'
>>> furl(url).remove(args=['some'], path='path/', fragment=True, port=True).url
'https://secure.google.com/a/'

Miscellaneous

As with pathlib.Path, path segments can be appended to a furl object’s Path
with the slash operator.

>>> from __future__ import division  # For Python 2.x.
>>> f = furl('http://www.google.com/path?example=arg#frag')
>>> f /= 'add'
>>> f = f / 'seg ments/'
>>> f.url
'http://www.google.com/path/add/seg%20ments/?example=arg#frag'

tostr(query_delimiter='&', query_quote_plus=True, query_dont_quote='')
creates and returns a URL string. query_delimiter, query_quote_plus, and
query_dont_quote are passed unmodified to Query.encode() as delimiter,
quote_plus, and dont_quote respectively.

>>> f = furl('http://spep.ru/?a+b=c+d&two%20tap=cat%20nap%24')
>>> f.tostr()
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
>>> f.tostr(query_delimiter=';', query_quote_plus=False)
'http://spep.ru/?a%20b=c%20d;two%20tap=cat%20nap$'
>>> f.tostr(query_dont_quote='$')
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'

furl.url is a shortcut for furl.tostr().

>>> f.url
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
>>> f.url == f.tostr() == str(f)
True

copy() creates and returns a new furl object with an identical URL.

>>> f = furl('http://www.google.com')
>>> f.copy().set(path='/new/path').url
'http://www.google.com/new/path'
>>> f.url
'http://www.google.com'

join() joins the furl object’s URL with the provided relative or absolute
URL and returns the furl object for method chaining. join()'s action is the
same as navigating to the provided URL from the current URL in a web browser.

>>> f = furl('http://www.google.com')
>>> f.join('new/path').url
'http://www.google.com/new/path'
>>> f.join('replaced').url
'http://www.google.com/new/replaced'
>>> f.join('../parent').url
'http://www.google.com/parent'
>>> f.join('path?query=yes#fragment').url
'http://www.google.com/path?query=yes#fragment'
>>> f.join('unknown://www.yahoo.com/new/url/').url
'unknown://www.yahoo.com/new/url/'
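The browser-style resolution described above is the same reference resolution the standard library exposes as urllib.parse.urljoin. The sketch below replays the sequence of joins with it; follow is a hypothetical helper that applies each link to the result of the previous one.

```python
from urllib.parse import urljoin


def follow(base, *links):
    """Resolve each link against the URL produced by the previous step."""
    url = base
    for link in links:
        url = urljoin(url, link)
    return url


print(follow('http://www.google.com', 'new/path', 'replaced', '../parent'))
```

Each relative link replaces the last path segment of the current URL, while a link with its own scheme replaces the URL outright.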

For a dictionary representation of a URL, use asdict().

>>> f = furl('https://xn--eckwd4c7c.xn--zckzah/path?args=args#frag')
>>> f.asdict()
{ 'url': 'https://xn--eckwd4c7c.xn--zckzah/path?args=args#frag',
  'scheme': 'https',
  'username': None,
  'password': None,
  'host': 'ドメイン.テスト',
  'host_encoded': 'xn--eckwd4c7c.xn--zckzah',
  'port': 443,
  'netloc': 'xn--eckwd4c7c.xn--zckzah',
  'origin': 'https://xn--eckwd4c7c.xn--zckzah',
  'path': { 'encoded': '/path',
            'isabsolute': True,
            'isdir': False,
            'isfile': True,
            'segments': ['path']},
  'query': { 'encoded': 'args=args',
             'params': [('args', 'args')]},
  'fragment': { 'encoded': 'frag',
                'path': { 'encoded': 'frag',
                          'isabsolute': False,
                          'isdir': False,
                          'isfile': True,
                          'segments': ['frag']},
                'query': { 'encoded': '',
                           'params': []},
                'separator': True} }