Now that we have fork detection, a single successful load of https://github.com/chromium/chromium might unstick all its forks. Unfortunately, we cannot load that repository either, not even from its previous full snapshot.
I think we should try loading https://github.com/chromium/chromium manually as a one-time thing to get it going again (future loads of this repository should then succeed too, assuming we visit it often enough).
I've triggered a run on worker1.staging [1] and worker17 [2], as is for now.
We'll look at the pack file size limit after that run fails (if it does).
[1]
swhworker@worker1:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url | tee chromium-20220601-01.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
[2]
swhworker@worker17:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url | tee chromium-20220601-01.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
OK, as expected, it does not work as is [1] ;)
Second run, then, with twice the current pack file limit [2].
[2]
swhworker@worker1:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url pack_size_bytes=8589934592 | tee chromium-20220601-02.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
...
[1]
swhworker@worker1:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url | tee chromium-20220601-01.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 18243310, done.
Counting objects: 100% (5895/5895), done.
Compressing objects: 100% (3180/3180), done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 374, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 318, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 240, in fetch_pack_from_origin
    progress=do_activity,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 2087, in fetch_pack
    progress,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 915, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 674, in _read_side_band64k_data
    cb(pkt)
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 228, in do_pack
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/chromium/chromium, limit is 4294967296 bytes, current size is 4294959115, would write 8192
{'status': 'failed'} for origin 'https://github.com/chromium/chromium'
    Command being timed: "swh loader run git https://github.com/chromium/chromium"
    User time (seconds): 563.53
    System time (seconds): 343.31
    Percent of CPU this job got: 43%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 34:25.20
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 22185788
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 1707154
    Minor (reclaiming a frame) page faults: 22250583
    Voluntary context switches: 1920190
    Involuntary context switches: 94371
    Swaps: 0
    File system inputs: 68880160
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
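For readers trying to make sense of the error above: the message suggests the loader checks the running pack size against the limit before writing each chunk. Here is a minimal sketch of such a check. It is not the actual swh.loader.git code; `make_pack_writer` and the example URL are made up for illustration, and only the error message format is taken from the log.

```python
import io


def make_pack_writer(buffer, limit_bytes, origin_url):
    """Return a callback that appends chunks to `buffer` but refuses to
    grow it beyond `limit_bytes` (hypothetical helper, for illustration)."""

    def do_pack(data: bytes) -> None:
        # Assumes the buffer is only ever appended to, so tell() is its size.
        current = buffer.tell()
        if current + len(data) > limit_bytes:
            raise OSError(
                f"Pack file too big for repository {origin_url}, "
                f"limit is {limit_bytes} bytes, current size is {current}, "
                f"would write {len(data)}"
            )
        buffer.write(data)

    return do_pack


# Tiny demonstration with a 10-byte limit instead of 4 GiB.
buf = io.BytesIO()
write = make_pack_writer(buf, limit_bytes=10, origin_url="https://example.org/repo")
write(b"12345678")  # 8 bytes, accepted
try:
    write(b"abc")  # 8 + 3 > 10, rejected
except OSError as exc:
    print(exc)
```

With the numbers from the failing run, 4294959115 + 8192 = 4294967307, which is just over the 4294967296 limit, so the last 8192-byte chunk is the one that gets refused.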
worker17 is complaining as well, but somehow differently [1].
Same version on both, though [2].
Anyway, no point in waiting for the same issue, so I'm triggering the same run as on staging (it might take a while to finish, so...).
[2]
ii python3-swh.loader.git 1.9.0-1~swh1~bpo10+1 all Software Heritage Git loader
[1]
swhworker@worker17:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url | tee chromium-20220601-01.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 374, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 318, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 240, in fetch_pack_from_origin
    progress=do_activity,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 2076, in fetch_pack
    "git-upload-pack", url, data=req_data.getvalue()
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 1952, in _smart_request
    resp, read = self._http_request(url, headers, data)
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 2181, in _http_request
    "POST", url, headers=req_headers, body=data
  File "/usr/lib/python3/dist-packages/urllib3/request.py", line 72, in request
    **urlopen_kw)
  File "/usr/lib/python3/dist-packages/urllib3/request.py", line 150, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/poolmanager.py", line 323, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 616, in urlopen
    **response_kw)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 525, in from_httplib
    **response_kw)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 209, in __init__
    self._body = self.read(decode_content=decode_content)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 438, in read
    data = self._fp.read()
  File "/usr/lib/python3.7/http/client.py", line 468, in read
    return self._readall_chunked()
  File "/usr/lib/python3.7/http/client.py", line 580, in _readall_chunked
    return b''.join(value)
MemoryError
{'status': 'failed'} for origin 'https://github.com/chromium/chromium'
    Command being timed: "swh loader run git https://github.com/chromium/chromium"
    User time (seconds): 448.15
    System time (seconds): 319.74
    Percent of CPU this job got: 52%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 24:26.73
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 29481236
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 7362239
    Voluntary context switches: 161768
    Involuntary context switches: 39326
    Swaps: 0
    File system inputs: 8
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
8g (pack size limit) was not enough either; it broke on both workers ¯\_(ツ)_/¯.
We have no clue what the size limit should be, so I'm clearly taking shots in the dark.
I've started a 32g experiment in worker1.staging and 64g in worker17.
We will see.
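For reference, the "Ng" limits used in these runs are binary gigabytes passed as a raw byte count in `pack_size_bytes`. A small sketch of the conversion, assuming 1g = 1024**3 bytes (which matches the 4294967296 default seen in the error message); `gib_to_bytes` is just a helper name made up here.

```python
GIB = 1024 ** 3  # bytes in one binary gigabyte


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to the byte count expected by pack_size_bytes."""
    return gib * GIB


for gib in (4, 8, 32, 64):
    print(f"{gib}g -> pack_size_bytes={gib_to_bytes(gib)}")
```

This gives 4294967296, 8589934592, 34359738368 and 68719476736 respectively, i.e. the values that appear on the command lines in this thread.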
64g was a bit too much for worker17 [1]: it ran out of memory (the write to the pack buffer failed with "No space left on device"), so it failed!
The staging worker seems to be taking a nicer path (still up and running), so
I've now started that same ingestion (32g pack size limit) on worker17.
[1]
swhworker@worker17:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url pack_size_bytes=68719476736 | tee chromium-20220601-04-pack-size-limit-64g.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 18243673, done.
Counting objects: 100% (1536/1536), done.
Compressing objects: 100% (939/939), done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 374, in load
    more_data_to_fetch = self.fetch_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 318, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 240, in fetch_pack_from_origin
    progress=do_activity,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 2087, in fetch_pack
    progress,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 915, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/usr/lib/python3/dist-packages/dulwich/client.py", line 674, in _read_side_band64k_data
    cb(pkt)
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 233, in do_pack
    pack_buffer.write(data)
  File "/usr/lib/python3.7/tempfile.py", line 903, in write
    rv = file.write(s)
OSError: [Errno 28] No space left on device
{'status': 'failed'} for origin 'https://github.com/chromium/chromium'
    Command being timed: "swh loader run git https://github.com/chromium/chromium pack_size_bytes=68719476736"
    User time (seconds): 409.20
    System time (seconds): 398.17
    Percent of CPU this job got: 48%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 27:35.21
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 58774716
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 155
    Minor (reclaiming a frame) page faults: 10859992
    Voluntary context switches: 178535
    Involuntary context switches: 18268
    Swaps: 0
    File system inputs: 30112
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
Status update: both worker1.staging and worker17 are now past the pack file limit step where they usually crash \o/ [1].
So the current chromium ingestion retrieves a pack file of ~18G (if I read the log correctly).
And their memory use is now far more reasonable than it was during the startup of the ingestion [2] (virt/rss: ~2g/2g now vs ~56g/21g at initialization).
[1]
swhworker@worker1:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url pack_size_bytes=34359738368 | tee chromium-20220601-pack-size-32g.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 18243617, done.
Counting objects: 100% (1476/1476), done.
Compressing objects: 100% (893/893), done.
Total 18243617 (delta 622), reused 705 (delta 572), pack-reused 18242141
INFO:swh.loader.git.loader:Listed 28831 refs for repo https://github.com/chromium/chromium
...
[2] from htop:
4091208 swhworker 20 0 2225M 2185M 5240 S 0.0 9.1 52:32.93 │ └─ /usr/bin/python3 /usr/bin/swh loader run git https://github.com/chromium/chromium pack_size_bytes=34359738368
Success for production worker [1]. Staging worker is still working on it.
[1] worker17
swhworker@worker17:~$ url=https://github.com/chromium/chromium; /usr/bin/time -v swh loader run git $url pack_size_bytes=34359738368 | tee chromium-20220601-03-pack-size-limit-32g.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 18243862, done.
Counting objects: 100% (1723/1723), done.
Compressing objects: 100% (1094/1094), done.
Total 18243862 (delta 717), reused 890 (delta 607), pack-reused 18242139
INFO:swh.loader.git.loader:Listed 28832 refs for repo https://github.com/chromium/chromium
INFO:swh.loader.git.loader.GitLoader:Fetched 18243863 objects; 6260568 are new
{'status': 'eventful'} for origin 'https://github.com/chromium/chromium'
    Command being timed: "swh loader run git https://github.com/chromium/chromium pack_size_bytes=34359738368"
    User time (seconds): 102415.56
    System time (seconds): 5484.61
    Percent of CPU this job got: 22%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 134:40:21
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 58810708
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 35
    Minor (reclaiming a frame) page faults: 23570033
    Voluntary context switches: 535303
    Involuntary context switches: 307983
    Swaps: 0
    File system inputs: 33808
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
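Since memory is the recurring problem in these runs, it may help to pull the peak memory figure out of the saved `/usr/bin/time -v` reports. A rough sketch, assuming the "kbytes" reported by GNU time are units of 1024 bytes; `max_rss_gib` is a helper name invented here.

```python
import re


def max_rss_gib(time_output: str) -> float:
    """Extract 'Maximum resident set size' from `/usr/bin/time -v` output
    and convert it from kbytes (assumed 1024 bytes each) to GiB."""
    match = re.search(r"Maximum resident set size \(kbytes\): (\d+)", time_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1)) / 1024 ** 2


# Figures copied from the reports in this thread.
first_failed_run = "Maximum resident set size (kbytes): 22185788"
successful_run = "Maximum resident set size (kbytes): 58810708"
print(f"first failed run: {max_rss_gib(first_failed_run):.1f} GiB")
print(f"successful 32g run: {max_rss_gib(successful_run):.1f} GiB")
```

So the successful load peaked at roughly 56 GiB of resident memory, against roughly 21 GiB for the first run that hit the 4g pack limit.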
Loader crashed with memory issues. Probably too many loads running in parallel.
I'm currently stopping the worker's other processes to let this one finish (I'll restart it).
swhworker@worker17:~$ url=https://github.com/thebigbrain/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-04-pack-size-limit-32g-fork.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/thebigbrain/chromium' with type 'git'
Enumerating objects: 10930922, done.
Counting objects: 100% (191/191), done.
Compressing objects: 100% (56/56), done.
Total 10930922 (delta 140), reused 135 (delta 135), pack-reused 10930731
INFO:swh.loader.git.loader:Listed 15020 refs for repo https://github.com/thebigbrain/chromium
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 377, in load
    self.store_data()
  File "/usr/lib/python3/dist-packages/swh/loader/git/base.py", line 80, in store_data
    for obj in self.get_contents():
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 414, in get_contents
    for raw_obj in self.iter_objects(b"blob"):
  File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 404, in iter_objects
    PackData.from_file(self.pack_buffer, self.pack_size)
  File "/usr/lib/python3/dist-packages/dulwich/pack.py", line 1386, in _walk_all_chains
    for result in self._follow_chain(offset, type_num, None):
  File "/usr/lib/python3/dist-packages/dulwich/pack.py", line 1444, in _follow_chain
    unpacked = self._resolve_object(offset, obj_type_num, base_chunks)
  File "/usr/lib/python3/dist-packages/dulwich/pack.py", line 1435, in _resolve_object
    unpacked.obj_chunks = apply_delta(base_chunks, unpacked.decomp_chunks)
MemoryError
{'status': 'failed'} for origin 'https://github.com/thebigbrain/chromium'
    Command being timed: "swh loader run git https://github.com/thebigbrain/chromium lister_name=github lister_instance_name=github pack_size_bytes=34359738368"
    User time (seconds): 6907.23
    System time (seconds): 619.13
    Percent of CPU this job got: 62%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 3:19:27
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 21273060
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 75
    Minor (reclaiming a frame) page faults: 6586107
    Voluntary context switches: 14848
    Involuntary context switches: 90856
    Swaps: 0
    File system inputs: 10352
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
Looks like either the loader didn't detect it is a fork, or github sent a large packfile anyway.
In swh/loader/git/loader.py at the end of the prepare function, could you print self.statsd.constant_tags and self.parent_origins, to see which it is?
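Something along these lines could be dropped in temporarily for that check. This is only a sketch: the two attribute names come from the request above, while `log_fork_detection_state`, the stand-in loader object and its placeholder values are made up so that the snippet runs on its own, outside of swh.loader.git.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def log_fork_detection_state(loader):
    """Log the two attributes that tell whether fork detection kicked in.

    getattr() with a default is used so the helper does not crash if an
    attribute is missing on the object it is given.
    """
    statsd = getattr(loader, "statsd", None)
    constant_tags = getattr(statsd, "constant_tags", None)
    parent_origins = getattr(loader, "parent_origins", None)
    logger.info(
        "statsd.constant_tags=%r parent_origins=%r", constant_tags, parent_origins
    )
    return constant_tags, parent_origins


# Stand-in for a loader instance; the values are placeholders, not real data.
fake_loader = SimpleNamespace(
    statsd=SimpleNamespace(constant_tags={"placeholder_tag": "placeholder_value"}),
    parent_origins=["https://example.org/parent-repo"],
)
logging.basicConfig(level=logging.INFO)
log_fork_detection_state(fake_loader)
```

In the real loader, the equivalent would presumably be a call at the end of `prepare` with `self` as the argument, or just two plain `print` calls as suggested.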
I've already restarted the process. The packfile sent was a large one (~10G [1]) but much smaller than for the initial load (~18G [2]), if I read those logs correctly.
[1]
Total 10930922 (delta 140), reused 135 (delta 135), pack-reused 10930731
[2]
Total 18243617 (delta 622), reused 705 (delta 572), pack-reused 18242141
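To compare the two "Total" lines without eyeballing them, a small parsing sketch follows. Note that the figures on these lines, as printed by git, are object counts rather than byte sizes, so the comparison below is about the number of objects in each pack. `parse_total` and the key names are invented for this example.

```python
import re

TOTAL_RE = re.compile(
    r"Total (\d+) \(delta (\d+)\), reused (\d+) \(delta (\d+)\), pack-reused (\d+)"
)


def parse_total(line: str) -> dict:
    """Parse git's 'Total N (delta N), reused N (delta N), pack-reused N' line."""
    match = TOTAL_RE.search(line)
    if match is None:
        raise ValueError(f"not a 'Total' line: {line!r}")
    keys = ("total", "delta", "reused", "reused_delta", "pack_reused")
    return dict(zip(keys, map(int, match.groups())))


fork = parse_total(
    "Total 10930922 (delta 140), reused 135 (delta 135), pack-reused 10930731"
)
upstream = parse_total(
    "Total 18243617 (delta 622), reused 705 (delta 572), pack-reused 18242141"
)
ratio = fork["total"] / upstream["total"]
print(f"fork pack has {ratio:.0%} of the objects of the upstream pack")
```

In both cases nearly all objects are "pack-reused", and the fork's pack carries a bit under two thirds as many objects as the upstream one.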
initial load of a different repository, which has 338k more commits
Which one has that many more commits, the initial one? If so, I would expect the fork to load much faster, since they should share history at some point in the past.
My point was mostly to answer "no, not immediately" to your question [1]: not immediately, since the process had already been restarted for some time (before the notification in the sysadm channel).
I'll do it if that fails again.
And I thought the packfile logs were interesting. If they are not, please explain a bit, because I don't quite see why.
[1] > In swh/loader/git/loader.py at the end of the prepare function, could you print self.statsd.constant_tags and self.parent_origins, to see which it is?
> Which one has that many more commits, the initial one?
Yes
> If so, I would expect the fork to load much faster, since they should share history at some point in the past.
I would have expected it not to run out of memory (which was the point of the manual load), and it already failed that test
yes, ok so we are aligned then.
Note that the first repo run took 134:40:21 (after multiple iterations, so maybe more than that actually), so even if the fork ingestion takes something like ~10h, that would already be much quicker ¯\_(ツ)_/¯ (it has been running for ~52min now).
Well, it finished and took ~20h [1], still a win compared to the initial ingestion's 134h...
[1]
swhworker@worker17:~$ url=https://github.com/thebigbrain/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-04-pack-size-limit-32g-fork.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/thebigbrain/chromium' with type 'git'
WARNING:swh.storage.proxies.retry:Retrying RPC call
WARNING:swh.storage.proxies.retry:Retrying RPC call
Enumerating objects: 10930922, done.
Counting objects: 100% (191/191), done.
Compressing objects: 100% (56/56), done.
Total 10930922 (delta 140), reused 135 (delta 135), pack-reused 10930731
INFO:swh.loader.git.loader:Listed 15020 refs for repo https://github.com/thebigbrain/chromium
INFO:swh.loader.git.loader.GitLoader:Fetched 10930923 objects; 3 are new
{'status': 'eventful'} for origin 'https://github.com/thebigbrain/chromium'
    Command being timed: "swh loader run git https://github.com/thebigbrain/chromium lister_name=github lister_instance_name=github pack_size_bytes=34359738368"
    User time (seconds): 53568.28
    System time (seconds): 2469.61
    Percent of CPU this job got: 75%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 20:43:24
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 21274320
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 17
    Minor (reclaiming a frame) page faults: 6065975
    Voluntary context switches: 200471
    Involuntary context switches: 213563
    Swaps: 0
    File system inputs: 21200
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
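To put a number on the "~20h vs 134h" comparison, here is a small sketch that converts the elapsed times reported by `/usr/bin/time -v` into seconds. It assumes the two formats named in the report header (h:mm:ss or m:ss); `elapsed_to_seconds` is a name made up for this example.

```python
def elapsed_to_seconds(elapsed: str) -> float:
    """Convert an 'h:mm:ss' or 'm:ss' elapsed time string to seconds."""
    seconds = 0.0
    # Each field is worth 60 times the one to its right.
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


initial = elapsed_to_seconds("134:40:21")  # first successful chromium load
fork = elapsed_to_seconds("20:43:24")      # thebigbrain/chromium fork load
print(f"initial load: {initial / 3600:.1f}h, fork load: {fork / 3600:.1f}h")
print(f"the fork load was about {initial / fork:.1f}x faster")
```

That works out to the fork load being roughly 6.5 times faster than the initial ingestion, even though only 3 of its objects were new.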