URL Components

We parse a URL into domain, port, path and query arguments.

We're guided by the grammar as reported by Jon Crowcroft of Cambridge.

url = << ( local | httpaddr ) >>
local = << '/'? '/'? path query? >> &double-quote
ip = << [0-9]+ '.' [0-9]+ '.' [0-9]+ '.' [0-9]+ >>
protocol = <<< 'http' 's'? >>>
port = << [0-9]+ >>
query = << '?' (!double-quote .)+ >>
httpaddr = << protocol '://' ( ip | domain-name ) (':' port)? '/' path query? >>
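As a rough illustration of the leaf rules, here is a minimal Python sketch that restates protocol, ip, port and query as regular expressions. It is not the parser this page uses. The constant names and the `matches` helper are hypothetical, and the sketch assumes that the double-quote rule, which is not defined in this section, matches a literal `"` character.

```python
import re

# Regex restatements of the leaf rules; names mirror the grammar.
PROTOCOL = r"https?"                      # 'http' 's'?
IP = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"    # four digit runs, no 0-255 range check
PORT = r"[0-9]+"
# Assumption: double-quote means the literal '"' character.
QUERY = r'\?[^"]+'                        # '?' then one or more non-quote chars


def matches(rule: str, text: str) -> bool:
    """Return True when the whole of text matches the given rule."""
    return re.fullmatch(rule, text) is not None
```

Like the grammar, the ip pattern accepts any run of digits in each position, so something like 999.1.1.1 would pass. The sketch matches whole strings, whereas the grammar rules match at a position inside a larger text.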

We get more specific about selected parts of the URL.

domain-name = << ( familiar-domain-name | other-domain-name ) >>
other-domain-name = << domain ('.' domain)* >>
domain = << [a-zA-Z0-9-]+ >>
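The domain-name rule is an ordered choice: familiar names are tried first and anything else falls through to the generic rule. A minimal Python sketch of that ordering follows. `FAMILIAR_DOMAINS` is a hypothetical stand-in for the familiar-domain-name rule, which is defined elsewhere, and `classify_domain` is an illustrative name only.

```python
import re
from typing import Optional

# Hypothetical stand-in for the familiar-domain-name rule defined elsewhere.
FAMILIAR_DOMAINS = {"google.com"}

DOMAIN = r"[a-zA-Z0-9-]+"
# other-domain-name: domain ('.' domain)*
OTHER_DOMAIN_NAME = re.compile(rf"{DOMAIN}(?:\.{DOMAIN})*")


def classify_domain(name: str) -> Optional[str]:
    """Report which alternative of domain-name accepts name, trying familiar first."""
    if name in FAMILIAR_DOMAINS:
        return "familiar-domain-name"
    if OTHER_DOMAIN_NAME.fullmatch(name):
        return "other-domain-name"
    return None
```

Trying the familiar alternative first matters because every familiar name would also satisfy other-domain-name, so the reverse order would never report a familiar match.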

Refinement

Continue matching familiar-domain-names.

Familiar Script Repositories such as google.com

path = << file ('/' file)* >>
file = << ( familiar-file | other-file ) >>
other-file = << ('-' | '_' | [a-zA-Z0-9.])+ >>

Continue matching familiar-file path components.

Familiar Script Path Elements for common script libraries.
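To pull the pieces together, the httpaddr rule can be approximated with a single regular expression that yields the domain, port, path and query. This is a sketch under stated assumptions rather than the parser used here: `parse_httpaddr` and the group names are hypothetical, the familiar-domain-name and familiar-file alternatives are left out, and double-quote is assumed to be the literal `"` character.

```python
import re
from typing import Optional

DOMAIN = r"[a-zA-Z0-9-]+"
FILE = r"[-_a-zA-Z0-9.]+"  # other-file: '-', '_' or [a-zA-Z0-9.], one or more

# httpaddr = protocol '://' ( ip | domain-name ) (':' port)? '/' path query?
HTTPADDR = re.compile(
    rf"(?P<protocol>https?)://"
    rf"(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|{DOMAIN}(?:\.{DOMAIN})*)"
    rf"(?::(?P<port>[0-9]+))?"
    rf"/(?P<path>{FILE}(?:/{FILE})*)"
    r'(?P<query>\?[^"]+)?'
)


def parse_httpaddr(url: str) -> Optional[dict]:
    """Split a full http(s) address into its components, or return None."""
    m = HTTPADDR.fullmatch(url)
    if m is None:
        return None
    return {
        "domain": m["host"],
        "port": int(m["port"]) if m["port"] else None,
        "path": m["path"].split("/"),
        "query": m["query"],
    }
```

Two details follow from the grammar as written. The path is mandatory, so an address that ends right after the host's slash is rejected. Also, a regular expression backtracks across the ip and domain-name alternatives, while an ordered-choice grammar commits to the first alternative that succeeds, so the two can disagree on hosts that begin like an IP address.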