api-server is a JSON RESTful API server implemented in Go.
It serves data collected and processed by various DevMine projects (crawld, featscomp, repotool…).
All data is sent and received as JSON.
Timestamps use the ISO 8601 format:
YYYY-MM-DDThh:mm:ssTZD (e.g. 2015-01-09T14:19:47+01:00)
Blank fields are included as null.
Only GET requests are answered.
Sending invalid JSON will result in a 400 Bad Request response:
GET /search/{"foo":1
Response
{
"message": "invalid JSON input"
}
A 400 Bad Request response may also be returned when sending valid JSON that the server does not know how to process. Example:
GET /search/{"foo":1}
Response
{
"message": "non existing feature: foo"
}
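A client may want to surface the message carried by such error bodies. The sketch below assumes only the {"message": ...} shape shown above; the apiError type and decodeError helper are hypothetical names, not part of api-server.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the error body shown above (assumed shape).
type apiError struct {
	Message string `json:"message"`
}

// decodeError is a hypothetical helper that extracts the message from the
// body of a 400 Bad Request response.
func decodeError(body []byte) (string, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return "", err
	}
	return e.Message, nil
}

func main() {
	msg, err := decodeError([]byte(`{"message": "non existing feature: foo"}`))
	if err != nil {
		fmt.Println("cannot decode error body:", err)
		return
	}
	fmt.Println(msg)
}
```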
Parameters not specified as a segment in the path can be passed as an HTTP query string parameter.
Requests that return multiple results are limited to 30 items by default.
However, up to 100 items can be returned by specifying the ?per_page parameter. Example:
GET /users?per_page=42
Since not all resources are shown on a page, further items may be queried by specifying the ?since parameter, which corresponds to an item ID:
GET /users?per_page=42&since=3747
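The pagination parameters can be assembled with the net/url package. This is a sketch only: the pageURL helper and the base address http://localhost:8080 are assumptions, and capping per_page on the client side simply reflects the documented maximum of 100.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// pageURL is a hypothetical helper that builds a paginated request URL.
// A perPage or since value of 0 leaves the parameter out, so the server
// defaults apply.
func pageURL(base, resource string, perPage, since int) string {
	q := url.Values{}
	if perPage > 100 {
		perPage = 100 // documented maximum
	}
	if perPage > 0 {
		q.Set("per_page", strconv.Itoa(perPage))
	}
	if since > 0 {
		q.Set("since", strconv.Itoa(since))
	}
	u := base + resource
	if enc := q.Encode(); enc != "" {
		u += "?" + enc
	}
	return u
}

func main() {
	// The base address is an assumption; use your own server address.
	fmt.Println(pageURL("http://localhost:8080", "/users", 42, 3747))
}
```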
All requests receive version 1 of the API. You can verify which version of the API the server is serving by visiting the route /:
GET /
Response
{
"version": 1,
"doc_url": "http://devmine.ch/doc/api-server"
}
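A client could check the advertised version before issuing other requests. The sketch below decodes the response shown above; rootInfo and checkVersion are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rootInfo mirrors the response of the / route shown above.
type rootInfo struct {
	Version int    `json:"version"`
	DocURL  string `json:"doc_url"`
}

// checkVersion is a hypothetical helper: it returns an error unless the
// body advertises the wanted API version.
func checkVersion(body []byte, want int) error {
	var info rootInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return err
	}
	if info.Version != want {
		return fmt.Errorf("unsupported API version %d, want %d", info.Version, want)
	}
	return nil
}

func main() {
	body := []byte(`{"version": 1, "doc_url": "http://devmine.ch/doc/api-server"}`)
	if err := checkVersion(body, 1); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("API version is supported")
}
```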
User-related resources are served under the /users routes.
The /users route provides a dump of all the users, sorted by user ID.
As the number of results is limited, you can use the ?since parameter to specify the user ID from which to start listing users.
GET /users
You can get a single user by querying the /users/:username route.
GET /users/Rolinh
Response
{
"id": 38769,
"username": "Rolinh",
"name": "Robin Hahling",
"email": "robin.hahling@gw-computing.net",
"gh_user": {
"id": 38769,
"github_id": 1324157,
"login": "Rolinh",
"bio": null,
"blog": "http://projects.gw-computing.net",
"company": "HGdev",
"email": "robin.hahling@gw-computing.net",
"hireable": false,
"location": "Switzerland",
"avatar_url": "https://avatars.githubusercontent.com/u/1324157?v=3",
"html_url": "https://github.com/Rolinh",
"followers_count": 8,
"following_count": 19,
"collaborators_count": null,
"created_at": "2012-01-12T09:37:19+01:00",
"updated_at": "2015-01-09T18:36:56+01:00",
"gh_organizations": [
{
"id": 2522,
"github_id": 6969061,
"login": "DevMine",
"avatar_url": "https://avatars.githubusercontent.com/u/6969061?v=3",
"html_url": "https://github.com/DevMine",
"name": "DevMine",
"company": null,
"blog": "http://devmine.ch/",
"location": "Around the world",
"email": null,
"collaborators_count": null,
"created_at": "2014-03-16T22:07:05+01:00",
"updated_at": "2015-01-09T21:51:06+01:00"
}
]
}
}
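Because blank fields arrive as null, a Go client may prefer pointer types for fields that can be empty. The sketch below decodes a subset of the user object shown above; the type names, the decodeUser helper and the choice of fields are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ghUser holds a subset of the gh_user object. Pointer fields stay nil
// when the server sends null.
type ghUser struct {
	Login          string  `json:"login"`
	Bio            *string `json:"bio"`
	FollowersCount int     `json:"followers_count"`
}

// user holds a subset of the fields returned by /users/:username.
type user struct {
	ID       int    `json:"id"`
	Username string `json:"username"`
	GhUser   ghUser `json:"gh_user"`
}

// decodeUser is a hypothetical helper that decodes a user response body.
func decodeUser(body []byte) (user, error) {
	var u user
	err := json.Unmarshal(body, &u)
	return u, err
}

func main() {
	body := []byte(`{"id": 38769, "username": "Rolinh",
		"gh_user": {"login": "Rolinh", "bio": null, "followers_count": 8}}`)
	u, err := decodeUser(body)
	if err != nil {
		fmt.Println("cannot decode user:", err)
		return
	}
	fmt.Println(u.Username, "has a bio:", u.GhUser.Bio != nil)
}
```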
You can get all the commits of a user by querying the /users/:username/commits route. As there are potentially a lot of commits, results are paginated.
GET /users/Rolinh/commits
Response
[
{
"id": 1375919,
"repository": {
"id": 93271,
"name": "crawld",
"primary_language": "Go",
"clone_url": "https://github.com/DevMine/crawld.git",
"clone_path": "go/devmine/crawld",
"vcs": "git"
},
"message": "crawld: Make sure we finish writing logs before exiting.\n\nLog output is buffered and written periodically using flush (around\nevery 10 seconds).\n",
"author": {
"id": 46138,
"username": "Rolinh",
"name": "Robin Hahling",
"email": "robin.hahling@gw-computing.net"
},
"committer": {
"id": 46138,
"username": "Rolinh",
"name": "Robin Hahling",
"email": "robin.hahling@gw-computing.net"
},
"author_date": "2015-01-08T00:31:22+01:00",
"commit_date": "2015-01-08T00:31:22+01:00",
"file_changed_count": 1,
"insertions_count": 3,
"deletions_count": 0
},
...
]
You can get the repositories associated with a user by querying the /users/:username/repositories route.
GET /users/Rolinh/repositories
Response
[
{
"id": 76947,
"name": "crawld",
"primary_language": "Go",
"clone_url": "https://github.com/DevMine/crawld.git",
"clone_path": "go/devmine/crawld",
"vcs": "git",
"gh_repository": {
"id": 76941,
"github_id": 28636035,
"full_name": "DevMine/crawld",
"description": "A data crawler and repository fetcher",
"homepage": "http://devmine.ch/doc/crawld/",
"fork": false,
"default_branch": "master",
"master_branch": null,
"html_url": "https://github.com/DevMine/crawld",
"forks_count": 0,
"open_issues_count": 1,
"stargazers_count": 0,
"subscribers_count": 3,
"watchers_count": 0,
"size_in_kb": 260,
"created_at": "2014-12-30T16:44:02+01:00",
"updated_at": "2015-01-09T18:37:41+01:00",
"pushed_at": "2015-01-09T16:57:28+01:00"
}
},
...
]
You can get the feature scores of a user by querying the /users/:username/scores route.
GET /users/Rolinh/scores
Response
{
"contributions_count": 0.48484848484848486,
"followers_count": 0.02478026651545222,
"forks_avg": 0.27997405412506565,
"hireable": 1,
"stars_avg": 0.31201177610713027
}
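Since the scores response is a flat object of feature names and numbers, it decodes naturally into a map. The sketch below picks the feature with the highest score; topFeature is a hypothetical helper, and breaking ties alphabetically is an arbitrary choice made for determinism.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// topFeature is a hypothetical helper: it decodes a scores response and
// returns the feature with the highest score. Ties go to the
// alphabetically first feature name.
func topFeature(body []byte) (string, float64, error) {
	var scores map[string]float64
	if err := json.Unmarshal(body, &scores); err != nil {
		return "", 0, err
	}
	names := make([]string, 0, len(scores))
	for n := range scores {
		names = append(names, n)
	}
	sort.Strings(names)
	best, bestScore := "", 0.0
	for i, n := range names {
		if i == 0 || scores[n] > bestScore {
			best, bestScore = n, scores[n]
		}
	}
	return best, bestScore, nil
}

func main() {
	body := []byte(`{"followers_count": 0.02478026651545222, "hireable": 1}`)
	name, score, err := topFeature(body)
	if err != nil {
		fmt.Println("cannot decode scores:", err)
		return
	}
	fmt.Println(name, score)
}
```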
Repository-related resources are served under the /repositories routes.
The /repositories route provides a dump of all the repositories, sorted by repository ID.
As the number of results is limited, you can use the ?since parameter to specify the repository ID from which to start listing repositories.
GET /repositories
You can get repositories by name using the /repositories/:name route.
Note that several repositories may have the same name, so a list of repositories is returned.
GET /repositories/crawld
Response
[
{
"id": 76947,
"name": "crawld",
"primary_language": "Go",
"clone_url": "https://github.com/DevMine/crawld.git",
"clone_path": "go/devmine/crawld",
"vcs": "git",
"gh_repository": {
"id": 76941,
"github_id": 28636035,
"full_name": "DevMine/crawld",
"description": "A data crawler and repository fetcher",
"homepage": "http://devmine.ch/doc/crawld/",
"fork": false,
"default_branch": "master",
"master_branch": null,
"html_url": "https://github.com/DevMine/crawld",
"forks_count": 0,
"open_issues_count": 1,
"stargazers_count": 0,
"subscribers_count": 3,
"watchers_count": 0,
"size_in_kb": 260,
"created_at": "2014-12-30T16:44:02+01:00",
"updated_at": "2015-01-09T18:37:41+01:00",
"pushed_at": "2015-01-09T16:57:28+01:00"
}
},
...
]
Feature-related resources are served under the /features routes.
The /features route provides a dump of all the features, sorted by feature ID.
As the number of results is limited, you can use the ?since parameter to specify the feature ID from which to start listing features.
GET /features
Features are classified into categories. You can get a dump of all features from a category using the /features/by_category/:name route.
GET /features/by_category/other
Response
[
{
"id": 13,
"name": "followers_count",
"category": "other",
"default_weight": 1
},
{
"id": 14,
"name": "hireable",
"category": "other",
"default_weight": 1
},
...
]
The /features/:name/scores route provides a list of users and scores for the given feature name.
As the number of results is limited, you can use the ?since parameter to specify the user ID from which to start listing results.
GET /features/followers_count/scores
Response
[
{
"id": 234,
"username": "austinheap",
"score": 0.00022682166146867
},
{
"id": 235,
"username": "javierprovecho",
"score": 0.00056705415367168
},
...
]
Search queries are made through the /search/:query route, where query is a JSON object mapping feature names to their weights.
Example query:
GET /search/{"followers_count":4}
The result is a list of users with their ranks, sorted from highest to lowest rank according to the query. Search results are limited to the top 1000 ranked users.
Response
[
{
"id": 2290,
"username": "defunkt",
"name": "Chris Wanstrath",
"email": "chris@github.com",
"rank": 4.292859984017694
},
{
"id": 2374,
"username": "mojombo",
"name": "Tom Preston-Werner",
"email": null,
"rank": 4.095068589338459
},
{
"id": 22682,
"username": "paulirish",
"name": "Paul Irish",
"email": "",
"rank": 3.211172019935161
},
...
]
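The search path can be built from a map of weights. The sketch below is an illustration under assumptions: searchPath is a hypothetical helper, and percent-encoding the JSON with url.PathEscape is a client-side choice. The source does not state whether the server expects the raw or the percent-encoded form.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// searchPath is a hypothetical helper: it serializes the feature weights
// to JSON and percent-encodes the result so it fits in one path segment.
func searchPath(weights map[string]float64) (string, error) {
	q, err := json.Marshal(weights)
	if err != nil {
		return "", err
	}
	return "/search/" + url.PathEscape(string(q)), nil
}

func main() {
	p, err := searchPath(map[string]float64{"followers_count": 4})
	if err != nil {
		fmt.Println("cannot build search path:", err)
		return
	}
	fmt.Println(p)
}
```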
Querying the /stats route provides some statistics about the items in the database.
GET /stats
Response
{
"users_count": 59171,
"repositories_count": 121898,
"commits_count": 10331903,
"commit_deltas_count": 94117385,
"features_count": 6,
"gh_users_count": 59170,
"gh_organizations_count": 3985,
"gh_repositories_count": 121896
}
To install the API server, run this command in a terminal, assuming Go is installed:
go get github.com/DevMine/api-server
Or you can download a binary for your platform from the DevMine project’s downloads page.
You also need to set up a PostgreSQL database, add some data to it, and compute the features (see crawld, features and other DevMine projects for this).
The server performs matrix computations using the BLAS library, so BLAS must be installed on the server as well.
Copy devmine.conf.sample to devmine.conf and edit it according to your needs. The configuration file has two sections.
Once the configuration file has been adjusted, you are ready to run the API server (devmine).
You need to specify the path to the configuration file with the -c option. Example:
devmine -c devmine.conf
Some command line options are also available, mainly for logging.
The composition function computes the final ranking of developers according to a given user query. For each feature, it retrieves the corresponding pre-computed developer scores from the database, normalizes them by dividing each score by the maximum score for that feature, and builds a large sparse matrix. To decrease response time, this matrix is pre-computed when the API server starts. The default weight of each feature is also fetched from the database, and the weights are increased or decreased based on the user query. A column vector is then built from these weights. At this point, it is essential that the columns of the sparse matrix match the rows of the weights vector: they must have the same size and the same order. Finally, to compute the developer ranks, the composition function uses a weighted sum model: it computes the product of the sparse scores matrix and the weights vector.
The scores matrix is loaded by functions from the cache package, and the actual product between the scores matrix and the adjusted feature weights vector is computed in the score package.
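The weighted sum model described above can be sketched in a few lines of Go. This is not the code of the cache or score packages: it uses dense slices instead of a sparse matrix and BLAS, the function names normalize and rank are assumptions, and the weights are taken as already adjusted for the user query. All rows are assumed to have the same length as the weights vector.

```go
package main

import "fmt"

// normalize divides each feature column by that column's maximum score.
// scores[i][j] is the score of developer i for feature j.
func normalize(scores [][]float64) [][]float64 {
	if len(scores) == 0 {
		return nil
	}
	nFeatures := len(scores[0])
	maxs := make([]float64, nFeatures)
	for _, row := range scores {
		for j, s := range row {
			if s > maxs[j] {
				maxs[j] = s
			}
		}
	}
	out := make([][]float64, len(scores))
	for i, row := range scores {
		out[i] = make([]float64, nFeatures)
		for j, s := range row {
			if maxs[j] > 0 { // leave all-zero columns at zero
				out[i][j] = s / maxs[j]
			}
		}
	}
	return out
}

// rank computes the matrix-vector product: one weighted sum per developer.
// Column j of norm must correspond to weights[j].
func rank(norm [][]float64, weights []float64) []float64 {
	ranks := make([]float64, len(norm))
	for i, row := range norm {
		for j, s := range row {
			ranks[i] += s * weights[j]
		}
	}
	return ranks
}

func main() {
	// Two developers, two features (illustrative values only).
	scores := [][]float64{{10, 2}, {5, 4}}
	weights := []float64{4, 1}
	fmt.Println(rank(normalize(scores), weights))
}
```

Keeping the column order of the matrix and the order of the weights vector identical is what makes each sum meaningful, which is why the text stresses that both must have the same size and order.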
The figure below illustrates the process.