Ingest Assembla
Summary:
Assembla is a SaaS platform that supports the Perforce, Apache Subversion (SVN), and Git VCSes, as well as other aspects of development such as CI. Repositories are organised into groups called "spaces", and each space can hold multiple repositories of different types.
The service does not offer a way to enumerate public spaces, but search engine results can be scraped for an initial list of spaces and user profiles, and those pages can in turn be scraped to find more spaces and the individual public repos in each space. Some user profiles link to the relevant spaces from their bio, while others link to them from a dedicated panel on the right. There are two ways to get a list of users from a space: the members of the space's team, or the users who recently committed to each VCS repo. Here are some examples of how to achieve the scraping:
This command downloads a user profile and extracts the list of public spaces from it:
curl -s https://app.assembla.com/profile/shestkov.vasily |
pup 'a attr{href}' |
sed -n 's_^/spaces/_https://app.assembla.com/spaces/_p'
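The command above emits every matching link as found, so the same space can appear more than once. The following is a minimal sketch of a normalising step, assuming (this is not confirmed by the examples here) that some links point below the space root, for instance at a wiki or repository page. The function name `space_urls` is made up for illustration.

```shell
# space_urls: read href values on stdin, one per line, and print the unique
# space root URLs. Anything after the space name is dropped (assumption:
# such sub-paths can occur in profile pages).
space_urls() {
    sed -n -E 's_^/spaces/([^/?#]+).*_https://app.assembla.com/spaces/\1_p' |
    sort -u
}
```

It would slot in after the `pup` stage in place of the plain `sed` call.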
This command downloads a space and extracts the list of VCS repositories from it:
curl -H 'Accept: text/html' -sL https://app.assembla.com/spaces/harneetsingh-assembla-oss-fork |
pup 'script#new-sidebar-manager text{}' |
jq -r '
.space.repos[].urls[] |
select(.type=="repository" and .name == "Source") |
.url
'
This command extracts the profiles of people who are on the space's team:
curl -s https://app.assembla.com/spaces/harneetsingh-assembla-oss-fork/team |
pup 'a attr{href}' |
sed -n 's@^/user/popup_profile/@https://app.assembla.com&@p'
This command extracts the list of profiles of people who have committed to a repository recently:
curl -s -H 'Accept: application/json' -H 'X-Requested-With: XMLHttpRequest' 'https://app.assembla.com/spaces/harneetsingh-assembla-oss-fork/git/source/sub_nodes/master/?directory=true' |
jq -r '.[].author.url' |
sort -u |
sed 's_^/_https://app.assembla.com/_'
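The committer command hardcodes one space and the `master` branch. Below is a sketch of how it might be parameterised, assuming the same `git/source/sub_nodes/<branch>/` path layout holds for other Git repositories, which has only been observed for the one repository shown. The function names `committers_url` and `list_committers` are hypothetical, and branch names that need URL encoding are not handled.

```shell
# committers_url SPACE [BRANCH]
# Print the sub_nodes endpoint for a space; BRANCH defaults to master.
committers_url() {
    printf 'https://app.assembla.com/spaces/%s/git/source/sub_nodes/%s/?directory=true\n' \
        "$1" "${2:-master}"
}

# list_committers SPACE [BRANCH]
# Fetch the endpoint and print the unique committer profile URLs.
list_committers() {
    curl -s -H 'Accept: application/json' -H 'X-Requested-With: XMLHttpRequest' \
        "$(committers_url "$@")" |
    jq -r '.[].author.url' |
    sort -u |
    sed 's_^/_https://app.assembla.com/_'
}
```

For example, `list_committers harneetsingh-assembla-oss-fork` would run the same request as the command above.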
This command downloads the JSON containing the VCS repository clone URLs for each source repo:
curl -H 'Accept: application/json' -H 'X-Requested-With: XMLHttpRequest' https://app.assembla.com/spaces/SimpleMesh/git/source
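Each command above performs a single hop between profiles and spaces. The discovery described in the summary amounts to repeating those hops from the seed URLs until no new URLs turn up. The following is a rough sketch of that loop, not a proposed lister implementation. The function name `crawl` and its expander argument are made up for illustration, and the expander (a command that takes one URL and prints the URLs it links to) is left to the caller.

```shell
# crawl EXPANDER SEED_URL...
# Breadth-first traversal: print every URL reached from the seeds, each once.
# EXPANDER is any command or function that takes one URL as its argument and
# prints the URLs found on that page, one per line.
# The body runs in a subshell so its variables do not leak into the caller.
crawl() (
    expander=$1; shift
    seen=$(mktemp) queue=$(mktemp) next=$(mktemp)
    printf '%s\n' "$@" | sort -u > "$queue"
    while [ -s "$queue" ]; do
        : > "$next"
        while IFS= read -r url; do
            [ -n "$url" ] || continue
            # Skip URLs that were already visited in an earlier round.
            grep -Fxq -- "$url" "$seen" && continue
            # Report the URL and remember it.
            printf '%s\n' "$url" | tee -a "$seen"
            # Collect its outgoing links for the next round; a failed
            # fetch is ignored so one bad page does not stop the crawl.
            "$expander" "$url" < /dev/null >> "$next" || true
        done < "$queue"
        sort -u "$next" > "$queue"
    done
    rm -f "$seen" "$queue" "$next"
)
```

An expander for this task would dispatch on the URL: run the profile command for `/profile/` URLs, and the team and committer commands for `/spaces/` URLs. The loop makes no attempt to throttle requests.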
There is also a developer API, but it requires an account and authorisation.
A list of app.assembla.com seed URLs from Bing.
A validated list of repos on git.assembla.com found on web pages indexed by Google and Bing.
Plan:
- Implement lister
- Implement loader
- docker: Run lister
- docker: Run loader
- Document lister
- Document loader
- Deploy on staging
- Call for public review
- Deploy on production