referer-parser is a multi-language library for extracting marketing attribution data (such as search terms) from referer URLs, inspired by the ua-parser project (an equivalent library for user agent parsing).
referer-parser is available in the following languages, each in a sub-folder of this repository:
- [Ruby implementation] ruby-impl
- [Java and Scala implementation] java-scala-impl
- [Python implementation] python-impl
referer-parser is a core component of Snowplow, the open-source web-scale analytics platform powered by Hadoop, Hive and Redshift.
Note that we always use the original HTTP misspelling of 'referer' (and thus 'referal') in this project - never 'referrer'.
The Java version of this library uses the updated API, and identifies search, social, webmail, internal and unknown referers:
import com.snowplowanalytics.refererparser.Parser;
...
String refererUrl = "http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari";
String pageUrl = "http://www.psychicbazaar.com/shop"; // Our current URL
Parser refererParser = new Parser();
Referer r = refererParser.parse(refererUrl, pageUrl);
System.out.println(r.medium); // => "search"
System.out.println(r.source); // => "Google"
System.out.println(r.term); // => "gateway oracle cards denise linn"
For more information, please see the Java/Scala [README] java-scala-readme.
The Scala version of this library uses the updated API, and identifies search, social, webmail, internal and unknown referers:
val refererUrl = "http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari"
val pageUrl = "http://www.psychicbazaar.com/shop" // Our current URL
import com.snowplowanalytics.refererparser.scala.Parser
for (r <- Parser.parse(refererUrl, pageUrl)) {
println(r.medium) // => "search"
for (s <- r.source) {
println(s) // => "Google"
}
for (t <- r.term) {
println(t) // => "gateway oracle cards denise linn"
}
}
For more information, please see the Java/Scala [README] java-scala-readme.
The Ruby version of this library still uses the old API, and identifies search referers only:
require 'referer-parser'
referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'
r = RefererParser::Referer.new(referer_url)
puts r.known? # => true
puts r.referer # => 'Google'
puts r.search_parameter # => 'q'
puts r.search_term # => 'gateway oracle cards denise linn'
puts r.uri.host # => 'www.google.com'
For more information, please see the Ruby [README] ruby-readme.
The Python version of this library still uses the old API, and identifies search referers only:
from referer_parser import Referer
referer_url = 'http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari'
r = Referer(referer_url)
print(r.known) # True
print(r.referer) # 'Google'
print(r.search_parameter) # 'q'
print(r.search_term) # 'gateway oracle cards denise linn'
print(r.uri) # ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='q=gateway+oracle+cards+denise+linn&hl=en&client=safari', fragment='')
For more information, please see the Python [README] python-readme.
referer-parser identifies whether a URL is a known referer or not by checking it against the [referers.yml] referers-yml file; the intention is that this YAML file is reusable as-is by every language-specific implementation of referer-parser.
The file is broken out into sections for the different mediums that we support:
- unknown - for when we know the source, but not the medium
- email - for webmail providers
- social - for social media services
- search - for search engines
Then within each section, we list each known provider (aka source) by name, and then which domains each provider uses. For search engines, we also list the parameters used in the search engine URL to identify the search term. For example:
Google: # Name of search engine referer
  parameters:
    - 'q' # First parameter used by Google
    - 'p' # Alternative parameter used by Google
  domains:
    - google.co.uk # One domain used by Google
    - google.com # Another domain used by Google
    - ...
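To make the lookup concrete, here is a minimal Python sketch of how a structure like this could be matched against a referer URL. It is not the code used by any of the implementations above: the `REFERERS` dict, the `build_domain_index` and `lookup` helpers, the "Example Social" entry and the strategy of stripping leading host labels are all assumptions made for illustration. The dict is written inline so the example needs only the standard library; a real implementation would load referers.yml instead.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory copy of a referers.yml-style structure:
# medium -> source name -> {"domains": [...], "parameters": [...]}
REFERERS = {
    "search": {
        "Google": {
            "parameters": ["q", "p"],
            "domains": ["google.co.uk", "google.com"],
        },
    },
    "social": {
        # Made-up entry, for illustration only
        "Example Social": {"domains": ["social.example"]},
    },
}


def build_domain_index(referers):
    """Flatten the nested structure into domain -> (medium, source, parameters)."""
    index = {}
    for medium, sources in referers.items():
        for source, config in sources.items():
            for domain in config.get("domains", []):
                index[domain] = (medium, source, config.get("parameters", []))
    return index


def lookup(referer_url, index):
    """Return (medium, source, term), or None when the host is not listed."""
    parsed = urlparse(referer_url)
    labels = (parsed.hostname or "").split(".")
    # Try the full host first, then drop leading labels one at a time,
    # so that www.google.com can match an entry for google.com.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in index:
            medium, source, parameters = index[candidate]
            query = parse_qs(parsed.query)  # also decodes '+' to spaces
            # The first listed parameter present in the URL gives the term
            term = next((query[p][0] for p in parameters if p in query), None)
            return medium, source, term
    return None


INDEX = build_domain_index(REFERERS)
url = "http://www.google.com/search?q=gateway+oracle+cards+denise+linn&hl=en&client=safari"
print(lookup(url, INDEX))
# ('search', 'Google', 'gateway oracle cards denise linn')
```

Building the flat domain index once keeps each lookup to a handful of dictionary probes, whatever the size of the file.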
The number of referers and the domains they use are constantly growing - we need to keep referers.yml up-to-date, and hope that the community will help!
We welcome contributions to referer-parser:
- New search engines and other referers - if you notice a search engine, social network or other site missing from referers.yml, please fork the repo, add the missing entry and submit a pull request
- Ports of referer-parser to other languages - we welcome ports of referer-parser to new programming languages (e.g. JavaScript, PHP, C#, Haskell)
- Bug fixes, feature requests etc - much appreciated!
General support for referer-parser is handled by the team at Snowplow Analytics Ltd.
You can contact the Snowplow Analytics team through any of the [channels listed on their wiki] talk-to-us.
referers.yml is based on [Piwik's] piwik [SearchEngines.php] piwik-search-engines and [Socials.php] piwik-socials, copyright 2012 Matthieu Aubry and available under the [GNU General Public License v3] gpl-license.
The original Ruby code is copyright 2012-2013 Snowplow Analytics Ltd and is available under the [Apache License, Version 2.0] apache-license.
The Java/Scala port is copyright 2012-2013 Snowplow Analytics Ltd and is available under the [Apache License, Version 2.0] apache-license.
The Python port is copyright 2012-2013 [Don Spaulding] donspaulding and is available under the [Apache License, Version 2.0] apache-license.