We parse a URL into domain, port, path and query arguments.
We're guided by the grammar as reported by Jon Crowcroft of Cambridge.
url = << ( local | httpaddr ) >>
local = << '/'? '/'? path query? >> &double-quote
ip = << [0-9]+ '.' [0-9]+ '.' [0-9]+ '.' [0-9]+ >>
protocol = <<< 'http' 's'? >>>
port = << [0-9]+ >>
query = << '?' (!double-quote .)+ >>
httpaddr = << protocol '://' ( ip | domain-name ) (':' port)? '/' path query? >>
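To see what the httpaddr rule accepts, we can approximate it with a regular expression. This is a rough sketch, not the parser used here: the names `HTTPADDR` and `parse_httpaddr` are hypothetical, the domain and path fragments follow the other-domain-name and other-file rules given further down, and the familiar-* alternatives are not modeled. Regex backtracking also makes it slightly more permissive than a PEG would be.

```python
import re

# Regex fragments mirroring the grammar rules (assumed translations).
IP = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
DOMAIN = r"[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*"  # other-domain-name only
FILE = r"[-_a-zA-Z0-9.]+"                      # other-file only
PATH = rf"{FILE}(?:/{FILE})*"                  # file ('/' file)*
QUERY = r'\?[^"]+'                             # '?' (!double-quote .)+

# httpaddr = protocol '://' ( ip | domain-name ) (':' port)? '/' path query?
HTTPADDR = re.compile(
    rf"(?P<protocol>https?)://"
    rf"(?P<host>{IP}|{DOMAIN})"
    rf"(?::(?P<port>[0-9]+))?"
    rf"/(?P<path>{PATH})"
    rf"(?P<query>{QUERY})?"
)

def parse_httpaddr(text):
    """Return a dict of the captured parts, or None if text does not match."""
    m = HTTPADDR.fullmatch(text)
    return m.groupdict() if m else None

parts = parse_httpaddr("https://127.0.0.1:8080/index.html?a=1&b=2")
print(parts)
```

Note that the rule requires a '/' and a non-empty path after the host, so a bare `http://example.com` does not match under this reading.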
We get specific about selected parts of the URL.
domain-name = << ( familiar-domain-name | other-domain-name ) >>
other-domain-name = << domain ('.' domain)* >>
domain = << [a-zA-Z0-9-]+ >>
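The domain-name rule tries the familiar names before falling back to the general form. A minimal sketch of that ordering follows, assuming a hypothetical `FAMILIAR_DOMAINS` list (the page names only google.com as an example) and assuming that a subdomain of a familiar name also counts as familiar. Since familiar names are a subset of syntactically valid names, the sketch checks the syntax first.

```python
import re

# Hypothetical stand-in for the familiar-domain-name rule.
FAMILIAR_DOMAINS = ("google.com",)

# other-domain-name = domain ('.' domain)*, domain = [a-zA-Z0-9-]+
OTHER_DOMAIN = re.compile(r"[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*")

def classify_domain(name):
    """Return 'familiar', 'other', or None for a name the grammar rejects."""
    if not OTHER_DOMAIN.fullmatch(name):
        return None
    # Familiar alternative is tried first, as in the ordered choice.
    for familiar in FAMILIAR_DOMAINS:
        # Assumption: subdomains of a familiar name are familiar too.
        if name == familiar or name.endswith("." + familiar):
            return "familiar"
    return "other"

for name in ("ajax.google.com", "example.org", "bad_domain"):
    print(name, classify_domain(name))
```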
Refinement
Continue matching familiar-domain-names.
Familiar Script Repositories such as google.com
path = << file ('/' file)* >>
file = << ( familiar-file | other-file ) >>
other-file = << ('-' | '_' | [a-zA-Z0-9.])+ >>
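The path rule can also be followed by hand, one component at a time, which shows where matching stops. The sketch below is an illustration only: `match_file` and `match_path` are hypothetical names, and only the other-file alternative is handled, since the familiar-file names are not listed here.

```python
import string

# Characters allowed by other-file: '-' | '_' | [a-zA-Z0-9.]
FILE_CHARS = set(string.ascii_letters + string.digits + "-_.")

def match_file(text, pos):
    """Match one or more file characters; return the end index or None."""
    end = pos
    while end < len(text) and text[end] in FILE_CHARS:
        end += 1
    return end if end > pos else None

def match_path(text, pos=0):
    """path = file ('/' file)*; return (components, end index) or None."""
    end = match_file(text, pos)
    if end is None:
        return None
    files = [text[pos:end]]
    while end < len(text) and text[end] == "/":
        nxt = match_file(text, end + 1)
        if nxt is None:
            break  # a trailing '/' is left unconsumed
        files.append(text[end + 1:nxt])
        end = nxt
    return files, end

files, end = match_path("ajax/libs/jquery.min.js?v=1")
print(files, "rest:", "ajax/libs/jquery.min.js?v=1"[end:])
```

Matching stops at the first character outside the file set, so the query string is left for the query rule.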
Continue matching familiar-file path components.
Familiar Script Path Elements for common script libraries.