Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome :star:
This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, in a single JSON file.
Download the crawler-user-agents.json file from this repository directly.
crawler-user-agents is deployed on npmjs.com: https://www.npmjs.com/package/crawler-user-agents
To install it with npm or yarn:
npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents
In Node.js, you can require
the package to get an array of crawler user agents.
const crawlers = require('crawler-user-agents');
console.log(crawlers);
Each pattern is a regular expression. It should work out-of-the-box with your favorite regex library:
JavaScript:
if (RegExp(entry.pattern).test(req.headers['user-agent'])) { ... }
PHP:
if (preg_match('/'.$entry['pattern'].'/', $_SERVER['HTTP_USER_AGENT'])): ...
Python:
if re.search(entry['pattern'], ua): ...
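For repeated checks, the individual patterns can also be combined into a single alternation so each user agent is tested with one regular expression. A minimal Node.js sketch, using a small inline sample of entries for illustration (in practice you would load the full list, e.g. via require('crawler-user-agents'); the `isCrawler` helper name is an assumption, not part of the package):

```javascript
// A small inline sample mimicking entries from crawler-user-agents.json;
// each entry's `pattern` field is itself a regular expression.
const crawlers = [
  { pattern: 'Googlebot' },
  { pattern: 'bingbot' },
  { pattern: 'rogerbot' },
];

// Join all patterns into one alternation so a user agent is
// tested with a single combined regular expression.
const combined = new RegExp(crawlers.map(e => e.pattern).join('|'));

// Illustrative helper: returns true if the user agent matches any pattern.
function isCrawler(userAgent) {
  return combined.test(userAgent);
}

console.log(isCrawler('Mozilla/5.0 (compatible; Googlebot/2.1)')); // true
console.log(isCrawler('Mozilla/5.0 (Windows NT 10.0; Win64; x64)')); // false
```

Building the combined expression once up front avoids recompiling hundreds of patterns on every request.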
I welcome additions contributed as pull requests.
The pull requests should:
Example:
{
"pattern": "rogerbot",
"addition_date": "2014/02/28",
"url": "http://moz.com/help/pro/what-is-rogerbot-",
"instances": ["rogerbot/2.3 example UA"]
}
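Before submitting an entry, it can be sanity-checked locally by confirming that the pattern compiles and matches every listed instance. A minimal sketch, where `validateEntry` is an illustrative name and the entry is the example above:

```javascript
// The example entry from above.
const entry = {
  pattern: 'rogerbot',
  addition_date: '2014/02/28',
  url: 'http://moz.com/help/pro/what-is-rogerbot-',
  instances: ['rogerbot/2.3 example UA'],
};

// Illustrative check: the pattern must be a valid regular expression
// and must match each real-world user-agent string in `instances`.
function validateEntry(entry) {
  const re = new RegExp(entry.pattern);
  return entry.instances.every(ua => re.test(ua));
}

console.log(validateEntry(entry)); // true
```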
The list is under an MIT License. Versions prior to Nov 7, 2016 were under a CC-SA license.
There are a few wrapper libraries that use this data to detect bots:
Other systems for spotting robots, crawlers, and spiders that you may want to consider are: