Investigate scheduler journal client discrepancies
swh/infra/sysadm-environment#2993 (closed) deployed and first run completed.
But, we face discrepancies:
-
have 98M [1] origins referenced in the cache table but we should have around 131M [2] origins. That difference ~33M is currently unexplained and high.
-
Checking for example on "linux" origin, the cache data computed is off as well [3]
-
[1]
softwareheritage-scheduler=> select now(), count(*) from origin_visit_stats;
now | count
-------------------------------+----------
2021-01-28 08:34:40.152554+00 | 98231002
- [2]
softwareheritage=> select count(distinct origin) from origin_visit_status where status in ('full', 'partial');
- [3]
softwareheritage-scheduler=> select * from origin_visit_stats where url='https://github.com/torvalds/linux';
url | visit_type | last_eventful | last_uneventful | last_failed | last_notfound | last_snapshot | last_scheduled
-----------------------------------+------------+-------------------------------+-----------------+-------------------------------+---------------+--------------------------------------------+----------------
https://github.com/torvalds/linux | git | 2017-09-07 18:43:13.021746+00 | | 2018-08-23 11:53:06.553328+00 | | \x3e3045be901bacc7594176e79ba13fe030f601e2 |
(1 row)
softwareheritage=> select * from origin_visit_status where origin=2 order by date desc limit 10;
origin | visit | date | status | metadata | snapshot | type
--------+-------+-------------------------------+---------+----------+--------------------------------------------+------
2 | 67 | 2020-09-21 21:55:01.586191+00 | full | | \xc7beb2432b7e93c4cf6ab09cd194c7c1998df2f9 |
2 | 67 | 2020-09-21 19:15:24.238712+00 | created | | |
2 | 66 | 2020-09-21 17:12:11.930011+00 | partial | | |
2 | 66 | 2020-09-21 17:07:41.94459+00 | created | | |
2 | 65 | 2020-08-24 11:51:54.472736+00 | full | | \xb16664848afbd3e867e8fce516ef15c1772679b2 |
2 | 65 | 2020-08-24 09:22:41.181224+00 | created | | |
2 | 64 | 2020-03-19 23:29:59.614232+00 | full | | \x89eed60d46be8b8963a1a2268762aee5bbb41038 |
2 | 63 | 2020-01-20 19:50:46.750039+00 | full | | \xcabcc7d7bf639bbe1cc3b41989e1806618dd5764 |
2 | 62 | 2019-12-16 13:44:56.685885+00 | ongoing | | |
2 | 61 | 2019-08-25 14:04:07.603463+00 | full | | \xeb8087624d47f6e8ee89692df041b2f568fb0e5f |
Related to swh/infra/sysadm-environment#2993 (closed)
Migrated from T3000 (view on Phabricator)
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Antoine R. Dumont mentioned in merge request !215 (closed)
mentioned in merge request !215 (closed)
- Antoine R. Dumont mentioned in merge request !219 (closed)
mentioned in merge request !219 (closed)
- Antoine R. Dumont mentioned in merge request !222 (closed)
mentioned in merge request !222 (closed)
- Antoine R. Dumont mentioned in merge request swh-journal!200 (closed)
mentioned in merge request swh-journal!200 (closed)
- Antoine R. Dumont mentioned in issue #2345
mentioned in issue #2345
- Antoine R. Dumont added Scheduling utilities priority:Normal labels
added Scheduling utilities priority:Normal labels
- Antoine R. Dumont changed the description
changed the description
- Maintainer
Using a semi-trivial journal client, I've dumped the data from swh-journal regarding the linux origin visit statuses:
{'origin': 'https://github.com/torvalds/linux', 'visit': 57, 'date': datetime.datetime(2018, 12, 27, 7, 47, 0, 154971, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'|\xc1\x9bJ{Y\xf0\xbb\xa2\x06\xb0,z\xacM\xf8b\x9f\xa6R', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 25, 'date': datetime.datetime(2017, 10, 8, 11, 54, 25, 582463, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x1b2xW\xf7.\x8c\xde\xb4\xd1wHrt\xaf\x18\xb2i\x03\x12', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 28, 'date': datetime.datetime(2017, 11, 21, 19, 37, 42, 517795, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x86=F\x9d&1<Cau\xb4\xdb\x04)6u\x97+\xf9l', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 51, 'date': datetime.datetime(2018, 9, 30, 15, 54, 18, 68701, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xdf\x1dX\xdb\xafoJ!!=\xced\x1aLg\xabN>.\x97', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 18, 'date': datetime.datetime(2017, 9, 11, 5, 49, 58, 300518, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b' \xec\x8f!\xe9\xb7@\xb5\x9b\xbd\x04\xe68C\x08\x95{\x98\x97\xd5', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 67, 'date': datetime.datetime(2020, 9, 21, 19, 15, 24, 238712, tzinfo=datetime.timezone.utc), 'status': 'created', 'snapshot': None, 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 35, 'date': datetime.datetime(2018, 7, 5, 17, 18, 5, 610308, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 47, 'date': datetime.datetime(2018, 9, 18, 6, 38, 28, 791590, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 64, 'date': datetime.datetime(2020, 3, 19, 23, 29, 59, 614232, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x89\xee\xd6\rF\xbe\x8b\x89c\xa1\xa2&\x87b\xae\xe5\xbb\xb4\x108', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 21, 'date': datetime.datetime(2017, 9, 12, 18, 12, 35, 344511, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xe5\xb1\xbd8\xd5\xb0l\xf9\x9f1L\xf89\xd5Mi`\xe1f\xa1', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 42, 'date': datetime.datetime(2018, 9, 7, 20, 8, 54, 581862, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 39, 'date': datetime.datetime(2018, 8, 31, 6, 32, 17, 742578, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 56, 'date': datetime.datetime(2018, 10, 21, 5, 22, 50, 678295, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'-A\x80Y7\xba1\xd3\xbbm\xb3|\xfa\x8a\x9a#j\xa9\x81\x0f', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 27, 'date': datetime.datetime(2017, 11, 5, 22, 45, 54, 59797, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'l\xd9\x08\x9f\xaa\x91x\x9f\xb4A\xee\xbb\x90$;\x0f\xea\xd6 s', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 29, 'date': datetime.datetime(2018, 3, 8, 0, 0, 53, 641972, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 26, 'date': datetime.datetime(2017, 10, 18, 6, 25, 9, 481005, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x83&\xea\xba\xfa\x96\xfd\x12\xc5\xfb\xd9sU\xd3r\xcdz\xb8>#', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 17, 'date': datetime.datetime(2017, 9, 10, 17, 35, 20, 158515, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xf3\xbb\xa4\n\x96G{\xc4\xe16\xca\xa3\xbf\x95\xfeL\xa0\xe0\x12\xa2', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 66, 'date': datetime.datetime(2020, 9, 21, 17, 12, 11, 930011, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 53, 'date': datetime.datetime(2018, 10, 4, 18, 58, 58, 156154, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'w\t\x86-\xaa\xe70\xe3\xa0C\xb5F\x88$K}\xf2\x13@]', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 7, 'date': datetime.datetime(2016, 8, 29, 18, 55, 54, 153721, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'k\xd6i\x93\x83\x9d\xc8\x97\xaa\x15\xa4C\xc4\xe3\xb9\x16O\x81\x14\x99', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 32, 'date': datetime.datetime(2018, 7, 4, 9, 1, 25, 333541, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 37, 'date': datetime.datetime(2018, 8, 23, 11, 53, 6, 553328, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 60, 'date': datetime.datetime(2019, 8, 5, 2, 16, 12, 840378, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x8e\x9cP;]\xe5H\x8d\x17\xcfX\xb8\xce\x1c|_\x82x\xd7\x07', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 65, 'date': datetime.datetime(2020, 8, 24, 9, 22, 41, 181224, tzinfo=datetime.timezone.utc), 'status': 'created', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 13, 'date': datetime.datetime(2017, 9, 7, 18, 43, 13, 21746, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'>0E\xbe\x90\x1b\xac\xc7YAv\xe7\x9b\xa1?\xe00\xf6\x01\xe2', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 15, 'date': datetime.datetime(2017, 9, 9, 17, 18, 54, 307789, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xad\xd0\xa0#b\xf3\xe5\x93\x14\xe8\x0f\xa6\x01\xed\xd6\t*d\xbal', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 55, 'date': datetime.datetime(2018, 10, 13, 19, 9, 33, 773125, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b"L\xc3,\xd1V'4\xe1=\xb5\x96)\xc8z\x1e:\xe6\x1bv\xa8", 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 43, 'date': datetime.datetime(2018, 9, 8, 21, 36, 3, 892238, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 24, 'date': datetime.datetime(2017, 10, 6, 4, 27, 51, 146287, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b"u\x9a\xef\x82/\x07\xefy\ne\x05\xbfk'[W\x088\xde\x0e", 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 6, 'date': datetime.datetime(2016, 8, 17, 9, 16, 22, 52065, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'i\x03\xf8h\xdfm\x94\xa4D\x81\x8bP\xbe\xcdH5\xb2\x9b\xe2t', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 46, 'date': datetime.datetime(2018, 9, 18, 2, 22, 45, 500261, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 4, 'date': datetime.datetime(2016, 6, 18, 1, 22, 24, 808485, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xce!\xf3\x17\xd9\xfdt\xbbJ\xf3\x1b\x06 r@\x03\x1fK%\x16', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 50, 'date': datetime.datetime(2018, 9, 24, 6, 23, 8, 81958, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'I(\xe6k=Z7\x07]\x15\xda\xae\xcd\x9a>!\xab\xf2I\xcf', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 40, 'date': datetime.datetime(2018, 8, 31, 18, 53, 23, 407469, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 5, 'date': datetime.datetime(2016, 8, 14, 12, 10, 0, 536702, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xfe\x0e\xac\x19\x14\x1f\xdc\xdf\x03\x9e\x8fZ\xce^A\xb9\xa29\x8aI', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 41, 'date': datetime.datetime(2018, 9, 6, 23, 54, 1, 808678, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 38, 'date': datetime.datetime(2018, 8, 23, 22, 43, 57, 639257, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 16, 'date': datetime.datetime(2017, 9, 10, 5, 29, 1, 462971, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x88{\x850\x81\xdd\xff.\xa2\x1b5\xaeYF\x1b}(\xe794', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 36, 'date': datetime.datetime(2018, 8, 1, 7, 26, 22, 603055, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 67, 'date': datetime.datetime(2020, 9, 21, 21, 55, 1, 586191, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xc7\xbe\xb2C+~\x93\xc4\xcfj\xb0\x9c\xd1\x94\xc7\xc1\x99\x8d\xf2\xf9', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 65, 'date': datetime.datetime(2020, 8, 24, 11, 51, 54, 472736, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xb1fd\x84\x8a\xfb\xd3\xe8g\xe8\xfc\xe5\x16\xef\x15\xc1w&y\xb2', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 33, 'date': datetime.datetime(2018, 7, 4, 12, 3, 2, 684804, tzinfo=datetime.timezone.utc), 'status': 'ongoing', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 9, 'date': datetime.datetime(2016, 9, 14, 10, 36, 21, 505296, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'@\xa58\x1e+l\x0c\x04w\\[~{7(L:\xff\xc1)', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 2, 'date': datetime.datetime(2016, 2, 23, 18, 5, 23, 312045, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'&\xbe\xfd\xbfK9=\x1e\x03\xaa\x80\xf2\xa9U\xbc8\xb2A\xa8\xac', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 54, 'date': datetime.datetime(2018, 10, 13, 5, 8, 9, 273129, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b"L\xc3,\xd1V'4\xe1=\xb5\x96)\xc8z\x1e:\xe6\x1bv\xa8", 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 58, 'date': datetime.datetime(2019, 3, 30, 10, 44, 8, 484859, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'J\xaeZ\x00Dj\xa0i\xd90gbIG\xb5\x0e\xb2\xffv\xdb', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 34, 'date': datetime.datetime(2018, 7, 4, 12, 3, 59, 959837, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x91\xec\xbck=\x9e$;+\x1f\x9f\xc9aN\xd6\x85\xfe\xc4sr', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 19, 'date': datetime.datetime(2017, 9, 11, 18, 0, 15, 37345, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b' \xec\x8f!\xe9\xb7@\xb5\x9b\xbd\x04\xe68C\x08\x95{\x98\x97\xd5', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 48, 'date': datetime.datetime(2018, 9, 22, 13, 51, 35, 886745, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'v\x8ai\xf2\x05\xd4\xfa4i\x0c>\x9a4\x12\x08A\x19O?\xa8', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 10, 'date': datetime.datetime(2016, 9, 23, 10, 14, 2, 169862, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'"R\xb4\xd4\x9b\x9exn\xb7w\xa0\tzB\xe5\x1cq\x93\xbb\x9c', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 20, 'date': datetime.datetime(2017, 9, 12, 6, 6, 34, 703343, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b' \xec\x8f!\xe9\xb7@\xb5\x9b\xbd\x04\xe68C\x08\x95{\x98\x97\xd5', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 30, 'date': datetime.datetime(2018, 6, 1, 12, 48, 0, 902827, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 63, 'date': datetime.datetime(2020, 1, 20, 19, 50, 46, 750039, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xca\xbc\xc7\xd7\xbfc\x9b\xbe\x1c\xc3\xb4\x19\x89\xe1\x80f\x18\xddWd', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 8, 'date': datetime.datetime(2016, 9, 7, 8, 44, 47, 861875, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xc0j\x96_\x85_Ms\xc8O\xbe\xfd\x85\x9f}\xf5\x07\x18}\x9c', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 59, 'date': datetime.datetime(2019, 6, 21, 19, 28, 48, 774654, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\xcb\x0eVD\x8f\xea<i\xa2y%9\x89e\xf8\x07\xdc\xba\x8f$', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 66, 'date': datetime.datetime(2020, 9, 21, 17, 7, 41, 944590, tzinfo=datetime.timezone.utc), 'status': 'created', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 44, 'date': datetime.datetime(2018, 9, 9, 12, 40, 48, 143883, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 11, 'date': datetime.datetime(2017, 2, 16, 7, 53, 39, 467657, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': b'\x1a\x88\x93\xe6\xa8oDN\x8b\xe8\xe7\xbd\xa6\xcb4\xfb\x175\xa0\x0e', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 62, 'date': datetime.datetime(2019, 12, 16, 13, 44, 56, 685885, tzinfo=datetime.timezone.utc), 'status': 'ongoing', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 22, 'date': datetime.datetime(2017, 9, 13, 6, 26, 36, 580675, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x82\xd9\x11q\xabGJP\xb4e=I\xaf\x9f\xab\x1b\x9f\x1a\xf9\xc1', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 49, 'date': datetime.datetime(2018, 9, 23, 1, 14, 36, 694248, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'v\x8ai\xf2\x05\xd4\xfa4i\x0c>\x9a4\x12\x08A\x19O?\xa8', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 31, 'date': datetime.datetime(2018, 7, 1, 3, 51, 21, 433801, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 14, 'date': datetime.datetime(2017, 9, 9, 5, 14, 33, 466107, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\x1e\xf6\xb3q\xc8WF\xe2`\xaf<\x84\x13\xf8\x02\xfd\xd7,\xec\x9d', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 12, 'date': datetime.datetime(2017, 5, 4, 19, 40, 9, 336451, tzinfo=datetime.timezone.utc), 'status': 'full', 'snapshot': b'\\\x9bEM\xae\x06\x82\x81\xd4\x84E\xc7\x9a$\x08\xa2\xc2C\xbc\xa2', 'type': 'git', 'metadata': None} {'origin': 'https://github.com/torvalds/linux', 'visit': 45, 'date': datetime.datetime(2018, 9, 17, 6, 15, 20, 340219, tzinfo=datetime.timezone.utc), 'status': 'partial', 'snapshot': None, 'type': 'git', 'metadata': None}
A few observations:
- some visit statuses don't have a type (?);
- the data doesn't look complete
- the contents of the origin visit stats table still doesn't match what the journal apparently contains
The first two issues make me think there's a (hidden) issue with the default settings of the journal client.
I've restarted the journal client with:
- debug logging
- a bumped up max.message.size (matching what was set in the backfill, see swh/meta$931)
- a bumped up batch_size (10000 instead of the default of 200)
For now, this seems to
- return more messages
- return all the more recent messages
This is all quite promising.
I've also added a test with "actual production" origin visit status messages (!215 (closed)) which doesn't seem to reproduce the prod issue, which is some more food for the "something is missing on the journal client side".
- Maintainer
The dump using a bumped up max.message.size completed without issue. Thanks to the bumped up batch_size, it only took 2 hours.
In that dump, all expected messages are present. Looks like the messages without
type
haven't been compacted away yet.I then reset the offsets of my client, and restarted the dump with default settings for max.message.size and batch_size. Looks like I didn't reproduce any of the previous issues with missing messages, until the client ran out of disk space for logs :(
- Maintainer
I'm completely stumped.
I've added the following to the swh-scheduler-journal-client config on saatchi:
message.max.bytes: 100000000 batch_size: 5000
I've also patched the journal client to ignore messages without a type:
interesting_messages = [ msg for msg in messages[msg_type] if "type" in msg and msg["status"] not in ("created", "ongoing") ] if not interesting_messages: return
and restarted the client in debug mode.
- Phabricator Migration user mentioned in commit cf32e376
mentioned in commit cf32e376
- Maintainer
After landing rDSCHecab745a5f20, we got partway there, but we still had only 98m origins in the visit stats table.
After checking the diff between origin lists in the swh-scheduler db and in swh-storage, I thought there was only origins with relatively high origin ids missing.
This led me to believe that the hardcoded limit of 100m origins in swh.storage.backfill would have been the culprit, but some origins higher than 100m got backfilled properly.
After (finally!) checking the backfill logs (on getty:/srv/softwareheritage/backfill-2021-01), looks like a bunch of backfill processes have been interrupted with the following traceback:
2021-01-26T15:09:27 INFO swh.storage.backfill Processing origin_visit_status range 91132000 to 91133000 Traceback (most recent call last): File "/usr/bin/swh", line 11, in <module> load_entry_point('swh.core==0.11.0', 'console_scripts', 'swh')() File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 185, in main return swh(auto_envvar_prefix="SWH") File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__ return self.main(*args, **kwargs) File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main rv = self.invoke(ctx) File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke return ctx.invoke(self.callback, **ctx.params) File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke return callback(*args, **kwargs) File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func return f(get_current_context(), *args, **kwargs) File "/usr/lib/python3/dist-packages/swh/storage/cli.py", line 145, in backfill dry_run=dry_run, File "/usr/lib/python3/dist-packages/swh/storage/backfill.py", line 552, in run writer.write_additions(object_type, objects) File "/usr/lib/python3/dist-packages/swh/storage/writer.py", line 66, in write_additions self.journal.write_additions(object_type, values) File "/usr/lib/python3/dist-packages/swh/journal/writer/kafka.py", line 247, in write_additions self._write_addition(object_type, object_) File "/usr/lib/python3/dist-packages/swh/journal/writer/kafka.py", line 235, in _write_addition self.send(topic, key=key, value=dict_) File "/usr/lib/python3/dist-packages/swh/journal/writer/kafka.py", line 175, in send topic=topic, key=kafka_key, value=value_to_kafka(value), BufferError: Local: Queue full
$ zgrep --files-with-matches BufferError logs/origin_visit_status/* logs/origin_visit_status/100000000.log.gz logs/origin_visit_status/110000000.log.gz logs/origin_visit_status/120000000.log.gz logs/origin_visit_status/60000000.log.gz logs/origin_visit_status/80000000.log.gz logs/origin_visit_status/90000000.log.gz
Which means that the visit statuses for around 50-60m origins haven't been backfilled with a
type
key, explaining the discrepancy...I'll re-run the backfill on these broken shards (ouch!), and we'll see how that finally goes...
- Maintainer
I had an immediate resurgence of the issue when restarting the backfill of the missing chunks, so I hot-patched a retry on
BufferError
s inswh.journal.writer.kafka
. Glancing at sentry, we never had this issue in prod. The hot patch seems to work; I'll write tests on that and submit it as a diff tomorrow. - Maintainer
The backfill is now complete, and the scheduler journal client has processed its backlog of messages.
Time to compare origin lists again...
- Maintainer
After the full backfill:
softwareheritage-scheduler=> select count(*) from origin_visit_stats; count ----------- 138073057 (1 ligne)
$ psql service=swh -c "\copy (select url from origin where exists (select 1 from origin_visit_status where origin=id and status in ('full', 'partial')) order by url) to stdout csv" > archive_origins $ wc -l archive_origins 138041707 archive_origins
The amount of origins seems to be consistent in both lists.
I'm currently dumping the list of origins from the scheduler, to do an actual comparison, but this is very promising!
- Maintainer
That's all been working consistently for months now, closing!
- Nicolas Dandrimont assigned to @olasd
assigned to @olasd
- Nicolas Dandrimont closed
closed