launchpad: Allow bzr origins listing
Related to swh-loader-bzr#3945 (closed)
Test Plan
tox
and docker is happy too:
Lister log (with a pprint on the collection it lists)
swh-lister_1 | [2022-02-16 16:33:50,268: INFO/MainProcess] Task swh.lister.launchpad.tasks.FullLaunchpadLister[f3e3f3aa-8f4a-4e2c-8821-facd4952e53e] received
swh-lister_1 | ('git', <lazr.restfulclient.resource.Collection object at 0x7ff52ebef290>)
swh-lister_1 | ('bzr', <lazr.restfulclient.resource.Collection object at 0x7ff52d1a9450>)
...
- [1] In the docker scheduler db, new bzr origins are being listed (there were no bzr origins prior to the run):
17:58:57 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
|              now              | count |
+-------------------------------+-------+
| 2022-02-16 16:59:07.267496+00 | 21000 |
+-------------------------------+-------+
(1 row)
Time: 5.236 ms
17:59:07 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
|              now              | count |
+-------------------------------+-------+
| 2022-02-16 16:59:46.291344+00 | 22000 |
+-------------------------------+-------+
(1 row)
Time: 5.584 ms
18:00:23 swh-scheduler@localhost:5433=# select now(), * from listed_origins where visit_type='bzr' order by last_update desc limit 1;
+-[ RECORD 1 ]-----------+-------------------------------------------------------------------------------+
| now                    | 2022-02-16 17:00:34.779024+00                                                 |
| lister_id              | 9290a3f8-6896-47ea-81b3-e3adc9df21be                                          |
| url                    | https://code.launchpad.net/~ubuntu-branches/ubuntu/karmic/libvncserver/karmic |
| visit_type             | bzr                                                                           |
| extra_loader_arguments | {}                                                                            |
| enabled                | t                                                                             |
| first_seen             | 2022-02-16 16:59:53.309055+00                                                 |
| last_seen              | 2022-02-16 16:59:53.309055+00                                                 |
| last_update            | 2009-06-27 00:56:06.928908+00                                                 |
+------------------------+-------------------------------------------------------------------------------+
Time: 11.362 ms
After an incremental run:
19:59:26 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+--------+
|              now              | count  |
+-------------------------------+--------+
| 2022-02-17 08:18:45.201575+00 | 168000 |
+-------------------------------+--------+
(1 row)
Time: 20.536 ms
09:18:45 swh-scheduler@localhost:5433=# select * from listers where name='launchpad';
+-[ RECORD 1 ]--+-----------------------------------------------------------------------------------------------------------------------+
| id            | 9290a3f8-6896-47ea-81b3-e3adc9df21be                                                                                  |
| name          | launchpad                                                                                                             |
| instance_name | launchpad                                                                                                             |
| created       | 2022-02-16 16:24:45.466527+00                                                                                         |
| current_state | {"bzr_date_last_modified": "2009-09-10T10:21:25+00:00", "git_date_last_modified": "2022-02-16T19:07:16.970183+00:00"} |
| updated       | 2022-02-16 21:25:33.628123+00                                                                                         |
+---------------+-----------------------------------------------------------------------------------------------------------------------+
Time: 0.414 ms
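The `current_state` above is the serialized form of the per-VCS lister state. As a rough, self-contained sketch of that round trip (not the actual swh-lister code): `ListerState` is a hypothetical stand-in for `LaunchpadListerState`, `datetime.fromisoformat` is swapped in for `iso8601.parse_date` so the snippet needs only the standard library, and writing `None` for an unset date is an assumption, since the diff does not show that part.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional

VCS_TYPES = ("git", "bzr")


@dataclass
class ListerState:
    """Hypothetical stand-in for LaunchpadListerState (one date per VCS type)."""

    git_date_last_modified: Optional[datetime] = None
    bzr_date_last_modified: Optional[datetime] = None


def state_from_dict(d: Dict[str, Any]) -> ListerState:
    """Parse the ISO 8601 strings stored in the scheduler db into datetimes."""
    parsed = dict(d)  # work on a copy so the caller's dict is left untouched
    for vcs_type in VCS_TYPES:
        key = f"{vcs_type}_date_last_modified"
        value = parsed.get(key)
        if value is not None:
            # Stand-in for iso8601.parse_date, which the real lister uses.
            parsed[key] = datetime.fromisoformat(value)
    return ListerState(**parsed)


def state_to_dict(state: ListerState) -> Dict[str, Optional[str]]:
    """Serialize the state back to JSON-compatible values."""
    d: Dict[str, Optional[str]] = {}
    for vcs_type in VCS_TYPES:
        key = f"{vcs_type}_date_last_modified"
        value = getattr(state, key)
        # Assumption: an unset date is stored as None rather than omitted.
        d[key] = value.isoformat() if value is not None else None
    return d
```

Feeding the `current_state` JSON shown above through `state_from_dict` and then `state_to_dict` should give back the same two keys and timestamps.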
Migrated from D7193 (view on Phabricator)
Some references in the commit message have been migrated:
- T3945 is now swh-loader-bzr#3945 (closed)
Build is green
Patch application report for D7193 (id=26070)
Rebasing onto 31b4429c...
Current branch diff-target is up to date.
Changes applied before test
commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing

    Related to swh/devel/swh-loader-bzr#3945
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/454/ for more details.

 logger = logging.getLogger(__name__)

-LaunchpadPageType = Iterator[Collection]
+VcsType = str
+LaunchpadPageType = Tuple[VcsType, Collection]


 @dataclass
 class LaunchpadListerState:
     """State of Launchpad lister"""

-    date_last_modified: Optional[datetime] = None
-    """modification date of last updated repository since last listing"""
+    git_date_last_modified: Optional[datetime] = None
+    """modification date of last updated git repository since last listing"""

I think altering the JSON data in the scheduler db should be a good move as we already listed plenty of git repos.
21:54 $ psql service=swh-scheduler
psql (12.10 (Debian 12.10-1.pgdg110+1), server 12.9 (Debian 12.9-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
softwareheritage-scheduler=> select current_state from listers where name = 'launchpad';
                       current_state
------------------------------------------------------------
 {"date_last_modified": "2022-02-16T19:32:09.400561+00:00"}
(1 row)
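To make the suggested state alteration concrete, here is a minimal sketch of the key rename, assuming the legacy `date_last_modified` value only ever tracked git repositories (as the comment above implies). `migrate_legacy_state` is a hypothetical helper for illustration, not part of swh-lister or swh-scheduler, and the real change may well be applied directly in the scheduler db instead.

```python
from typing import Any, Dict


def migrate_legacy_state(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a stored lister state using the per-VCS keys.

    Hypothetical helper: moves the legacy "date_last_modified" value to
    "git_date_last_modified" and initializes the bzr key to None.
    """
    migrated = dict(state)  # do not mutate the caller's dict
    if "date_last_modified" in migrated:
        legacy = migrated.pop("date_last_modified")
        # Keep an already-migrated git date if one is present.
        migrated.setdefault("git_date_last_modified", legacy)
    # No bzr listing happened before this change, so start from scratch.
    migrated.setdefault("bzr_date_last_modified", None)
    return migrated
```

The helper is idempotent: running it on an already-migrated state returns the same keys and values, so it would be safe to apply more than once.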
mentioned in merge request !266 (closed)
mentioned in merge request !412 (closed)
             credentials=credentials,
         )
         self.incremental = incremental
-        self.date_last_modified = None
+        self.date_last_modified: Dict[str, Optional[datetime]] = {
+            "git": None,
+            "bzr": None,
+        }

     def state_from_dict(self, d: Dict[str, Any]) -> LaunchpadListerState:
-        date_last_modified = d.get("date_last_modified")
-        if date_last_modified is not None:
-            d["date_last_modified"] = iso8601.parse_date(date_last_modified)
+        for vcs_type in ["git", "bzr"]:
+            key = f"{vcs_type}_date_last_modified"
+            date_last_modified = d.get(key)
+            if date_last_modified is not None:
+                d[key] = iso8601.parse_date(date_last_modified)
+
         return LaunchpadListerState(**d)

     def state_to_dict(self, state: LaunchpadListerState) -> Dict[str, Any]:
-        d: Dict[str, Optional[str]] = {"date_last_modified": None}
-        date_last_modified = state.date_last_modified
-        if date_last_modified is not None:
-            d["date_last_modified"] = date_last_modified.isoformat()
+        d: Dict[str, Optional[str]] = {}
+        for vcs_type in ["git", "bzr"]:
...
         """
         assert self.lister_obj.id is not None

-        prev_origin_url = None
+        prev_origin_url: Dict[str, Optional[str]] = {"git": None, "bzr": None}

We can now remove the previous origin check as @vsellier fixes the duplicated origin insertion in the scheduler db in rDSCH0a6aac583adff2c55069c9da676ad95670e35708.
I've amended !266 (closed) with another commit which drops this as well.