Skip to content
Snippets Groups Projects
  1. Jun 02, 2021
  2. May 12, 2021
  3. May 10, 2021
  4. May 08, 2021
  5. May 07, 2021
  6. May 05, 2021
  7. May 04, 2021
  8. Apr 27, 2021
  9. Apr 23, 2021
  10. Apr 16, 2021
    • Antoine Lambert's avatar
      cli: Fix sphinx warning · 04a1656c
      Antoine Lambert authored
      This fixes the following warning:
      
      <generated>:1: WARNING: Inline emphasis start-string without end-string.
      
      Related to T2265
      04a1656c
  11. Apr 15, 2021
  12. Apr 09, 2021
    • Antoine Pietri's avatar
    • Antoine Pietri's avatar
      NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID · 53bbd5c6
      Antoine Pietri authored
      Right now we are generating two different binary mappings on disk for
      the translation between SWHID <-> webgraph node ID:
      
      1. The node2swh.bin reverse map, which contains a list of binary SWHID.
      It allows O(1) access to the SWHID of a given node n by seek()ing to the
      position (n * record size).
      
      2. The swhid2node.bin map, which contains a list of <SWHID, node ID>
         ordered by SWHID. The node ID of a given SWHID can be found in
         O(log N) by doing a binary search on this list.
      
      Because the swhid -> node binary search requires multiple seek() on the
      disk, it is very slow and cannot be used for performance sensitive code.
      
      The alternative route is to compute the node from the minimal perfect
      hash function and then lookup the equivalent node in the permuted graph
      using the .order file containing the permutation, which can be done in
      O(1) and is extremely fast. However, this has two caveats:
      
      - The .order file is ~150 GB, which would be too big to load in memory.
      
      - MPH cannot check whether their input is valid. They could do so
        probabilistically if we signed them, but when replying to a query in
        the graph service, we want to be absolutely certain that a node is or
        is not present in the graph.
      
      This code mitigates these problems in two ways. First, it memory-maps
      the permutation file to avoid loading it in main memory. Second, it uses
      a roundtrip check to detect invalid SWHIDs: we hash + permute the
      SWHID, then use the reverse map to check that the obtained node ID is
      associated to the original SWHID.
      
      This is a big performance gain (basic benchmarks show a ~x3 speedup).
      To go even faster, we offer a boolean option to skip the roundtrip
      check, to use when we know that the input is always valid.
      
      This will also allow us in the future to remove the swhid2node map
      completely, however it is currently still in use by the Python frontend
      to encode the SWHIDs. This will be done directly in the Java side in the
      future.
      53bbd5c6
  13. Apr 06, 2021
  14. Apr 02, 2021
  15. Mar 31, 2021
  16. Mar 23, 2021
  17. Mar 15, 2021
    • Stefano Zacchiroli's avatar
      FindEarliestRevision: bug fix: do not follow rev:rev edges · 58b46f78
      Stefano Zacchiroli authored
      Rationale: we do not need to do so at all, because we are only interested in
      commits that directly contain the content or dir (at some depth), nor in any of
      their successors.
      
      Note: this bug was not responsible for wrong answers in most cases (because
      successors will tend to have higher timestamps), but incurred significant extra
      time (4x in early benchmarks) due to exploring significant commit histories for
      any file/dir appearing in large projects.
      58b46f78
  18. Mar 11, 2021
  19. Feb 26, 2021
  20. Feb 25, 2021
Loading