Ingest HGKeeper
Summary:
A lister is needed for HGKeeper, a web frontend for Mercurial/hg repositories.
HGKeeper sites use "HGKeeper" as the default page title, but the theme config can change it, so the title alone will not identify every HGKeeper site.
HGKeeper sites can be listed by downloading and parsing the front page, then downloading/parsing group pages, then downloading/parsing repo pages, for example using this shell snippet:
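# Usage: pass one or more site base URLs as arguments, each ending in "/".
# Requires curl and pup (a command-line HTML parser).
for site; do
    # Front page: links to the group pages.
    for group in $(
        curl -sL "$site" | pup 'h4.mr-auto a attr{href}' | sed "s=^/=$site="
    ); do
        # Group page: links to the repo pages.
        for repo in $(
            curl -sL "$group" | pup 'h4.mr-auto a attr{href}' | sed "s=^/=$site="
        ); do
            # Repo page: the clone command, reduced to the repository URL.
            curl -sL "$repo" | pup 'input.hgk-repository-clone-url attr{value}' | sed 's/hg clone //'
        done
    done
done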
The front page has the group URLs in the href attribute of the elements matching h4.mr-auto a. The group pages have the repo page URLs in the href attribute of the elements matching the same selector. The repo pages have the hg repository URLs in the value attribute of input.hgk-repository-clone-url, prefixed with the hg clone command.
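The two text fixups involved (making root-relative href values absolute, and dropping the hg clone prefix) can be isolated as small shell helpers. This is only a sketch: the function names absolutize and strip_clone_prefix are made up for illustration, and the sample clone-url value is an assumed example of what such a value might look like, not output captured from a real site.

```shell
#!/bin/sh
# Sketch only: helper names and sample inputs are illustrative assumptions.

# Turn root-relative hrefs (one per line on stdin) into absolute URLs.
# $1 is the site base URL, with or without a trailing slash.
absolutize() {
    base=${1%/}
    sed "s=^/=$base/="
}

# Drop the leading "hg clone " command from clone-url input values.
strip_clone_prefix() {
    sed 's/^hg clone //'
}

# Example with inputs shaped like the attributes described above.
printf '%s\n' '/grim/' | absolutize 'https://keep.imfreedom.org'
printf '%s\n' 'hg clone https://keep.imfreedom.org/grim/hgkeeper' | strip_clone_prefix
```

Unlike a bare substitution of the site URL, absolutize strips any trailing slash from the base first, so it should behave the same whether or not the site argument ends in "/".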
According to Google and Bing, there is only one site using HGKeeper:
https://keep.imfreedom.org/grim/hgkeeper/
Plan:
- Implement lister
- Implement loader
- docker: Run lister
- docker: Run loader
- Document lister
- Document loader
- Deploy on staging
- Call for public review
- Deploy on production