plot: Add support for pandas >= 2 (!85)

Antoine Lambert requested to merge anlambert/swh-scanner:fix-pandas-2-support into master on May 11, 2023
pandas.DataFrame.append was removed in pandas 2.0; pandas.concat must be used instead.
We missed the issue because our CI still uses Python 3.7, and the latest pandas version available on PyPI for that Python version is 1.3.5.
$ pytest -svv swh/scanner/tests/test_plot.py::test_build_hierarchical_df
================================================================================================================================== test session starts ==================================================================================================================================
platform linux -- Python 3.9.2, pytest-7.3.0, pluggy-1.0.0 -- /home/anlambert/.virtualenvs/swh/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/anlambert/swh/swh-environment/swh-scanner/.hypothesis/examples')
rootdir: /home/anlambert/swh/swh-environment/swh-scanner
configfile: pytest.ini
plugins: case-1.5.3, mock-3.10.0, postgresql-3.1.3, requests-mock-1.10.0, subprocess-1.5.0, anyio-3.6.2, forked-1.6.0, dash-2.9.2, redis-3.0.1, flask-1.2.0, asyncio-0.21.0, django-test-migrations-1.2.0, cov-4.0.0, httpserver-1.0.6, Faker-18.5.1, docker-compose-3.2.1, xdist-3.2.1, hypothesis-6.71.0, testinfra-7.0.0, swh.journal-1.3.1, swh.core-2.22.0
asyncio: mode=auto
collected 1 item

swh/scanner/tests/test_plot.py::test_build_hierarchical_df FAILED

======================================================================================================================================= FAILURES ========================================================================================================================================
______________________________________________________________________________________________________________________________ test_build_hierarchical_df _______________________________________________________________________________________________________________________________

source_tree = Directory(id=0a7b61ef5780b03aa274d11069564980246445ce, entries=[b'some-binary', b'link-to-another-quote', b'toexclude', b'bar', b'link-to-foo', b'foo'])
source_tree_dirs = [PosixPath('toexclude'), PosixPath('bar'), PosixPath('bar/barfoo'), PosixPath('bar/barfoo2'), PosixPath('foo')]
nodes_data = {CoreSWHID.from_string('swh:1:dir:0a7b61ef5780b03aa274d11069564980246445ce'): {'known': True}, CoreSWHID.from_string('...34c9a'): {'known': True}, CoreSWHID.from_string('swh:1:cnt:acac326ddd63b0bc70840659d4ac43619484e69f'): {'known': True}}

    def test_build_hierarchical_df(source_tree, source_tree_dirs, nodes_data):
        root = Path(source_tree.data["path"].decode())
        dirs = [Path(dir_path) for dir_path in source_tree_dirs]
        dirs_data = get_directory_data(root, source_tree, nodes_data)
        max_depth = compute_max_depth(dirs)
        metrics_columns = ["contents", "known"]
        levels_columns = ["lev" + str(i) for i in range(max_depth)]
        df_columns = levels_columns + metrics_columns
    
        actual_df = generate_df_from_dirs(dirs_data, df_columns, max_depth)
    
>       actual_result = build_hierarchical_df(
            actual_df, levels_columns, metrics_columns, root
        )

swh/scanner/tests/test_plot.py:60: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
swh/scanner/plot.py:147: in build_hierarchical_df
    complete_df = complete_df.append(df_tree_list, ignore_index=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Empty DataFrame
Columns: [id, parent, contents, known]
Index: [], name = 'append'

    def __getattr__(self, name: str):
        """
        After regular attribute access, try looking up the name
        This allows simpler access to columns for interactive use.
        """
        # Note: obj.x will always call obj.__getattribute__('x') prior to
        # calling obj.__getattr__('x').
        if (
            name not in self._internal_names_set
            and name not in self._metadata
            and name not in self._accessors
            and self._info_axis._can_hold_identifiers_and_holds_name(name)
        ):
            return self[name]
>       return object.__getattribute__(self, name)
E       AttributeError: 'DataFrame' object has no attribute 'append'

../../../.virtualenvs/swh/lib/python3.9/site-packages/pandas/core/generic.py:5989: AttributeError
================================================================================================================================ short test summary info ================================================================================================================================
FAILED swh/scanner/tests/test_plot.py::test_build_hierarchical_df - AttributeError: 'DataFrame' object has no attribute 'append'
=================================================================================================================================== 1 failed in 0.75s ===================================================================================================================================
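For reference, the replacement described above can be sketched as follows. This is a minimal illustration, not the actual patch: the helper name `append_frames` and the sample rows are hypothetical, and only the column names and the `ignore_index=True` argument are taken from the traceback. Empty frames are filtered out before concatenation, which is an extra precaution assumed here and not something the merge request states.

```python
from typing import List

import pandas as pd


def append_frames(base: pd.DataFrame, frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Stand-in for the removed ``base.append(frames, ignore_index=True)``.

    Hypothetical helper for illustration; works with pandas 1.x and 2.x.
    """
    # Skip empty frames so the initial empty accumulator does not influence
    # the resulting dtypes (assumed precaution, not part of the original fix).
    non_empty = [frame for frame in [base, *frames] if not frame.empty]
    if not non_empty:
        return base.copy()
    return pd.concat(non_empty, ignore_index=True)


# Same columns as the empty DataFrame shown in the traceback.
complete_df = pd.DataFrame(columns=["id", "parent", "contents", "known"])

# Made-up rows standing in for df_tree_list.
df_tree_list = [
    pd.DataFrame({"id": ["bar"], "parent": ["root"], "contents": [2], "known": [1]}),
    pd.DataFrame({"id": ["foo"], "parent": ["root"], "contents": [3], "known": [3]}),
]

complete_df = append_frames(complete_df, df_tree_list)
print(complete_df)
```

Unlike `DataFrame.append`, `pandas.concat` takes the whole list of frames in one call, so the accumulator is simply placed at the head of the list.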