As discussed with @anlambert at the SWH community workshop back in February, source code of the 4k+ TeX Live packages in Guix cannot currently be recovered due to its peculiar arrangement: the source code of those packages is obtained by checking out individual directories and then combining them, as in this example.
IIRC, @anlambert suggested extending sources.json to include the list of sub-directories to be checked out; the loader would then compute a nar-sha256 ExtID for the combined directories.
@civodul In your excerpt, what does the integrity field refer to?
Is it the NAR hash of the svn-exported directory of the repository at revision 66594, keeping only the subdirectories mentioned in the svn_subdirectories key?
The only hash Guix keeps is the nar hash of the combined checkouts, and this is what the integrity field contains (so it's currently incorrect).
What I'm proposing is to have svn_subdirectories specify which directories are being combined, with the understanding that integrity is the nar hash of that combination.
The only hash Guix keeps is the nar hash of the combined checkouts, and this is what the integrity field contains (so it's currently incorrect).
yes.
What I'm proposing is to have svn_subdirectories specify which directories are being combined, with the understanding that integrity is the nar hash of that combination. Does that make sense?
Yes, I think so. I'm under the impression that we agree and are just phrasing it differently.
Implementation-wise, using the JSON manifest, that gives roughly the following:
Do the svn export of the repository at the svn revision mentioned.
Keep only the subdirectories listed in the 'svn_subdirectories' key for that repository 'checkout'.
Compute the NAR hash of the main repository folder.
Do the svn export of the repository at the svn revision mentioned.
Keep only the subdirectories listed in the 'svn_subdirectories' key for that repository 'checkout'.
We should rather export each sub-directory independently, as Subversion allows it, and combine them using the expected folder hierarchy: the TeX Live source tree is huge and a full export would consume a lot of bandwidth.
It should be easy to modify the SvnExportLoader to implement this processing.
I tested it with the Guix package texlive-algpseudocodex to check if we end up with the correct nar hash.
Package definition is the following:
(define-public texlive-algpseudocodex
  (package
    (name "texlive-algpseudocodex")
    (version (number->string %texlive-revision))
    (source (texlive-origin
             name version
             (list "doc/latex/algpseudocodex/"
                   "tex/latex/algpseudocodex/")
             (base32
              "1gjcdmzijiagzxwjwygqpbjjapzk9dfljv5d94iabzr8032l9rsh")))
    (outputs '("out" "doc"))
    (build-system texlive-build-system)
    (home-page "https://ctan.org/pkg/algpseudocodex")
    (synopsis "Package for typesetting pseudocode")
    (description
     "This package allows typesetting pseudocode in LaTeX. It is based on
@code{algpseudocode} from the @code{algorithmicx} package and uses the same
syntax, but adds several new features and improvements. Notable features
include customizable indent guide lines and the ability to draw boxes around
parts of the code for highlighting differences. This package also has better
support for long code lines spanning several lines and improved comments.")
    (license license:lppl1.3c)))
The two subfolders of the TeX Live repository to export are doc/latex/algpseudocodex and tex/latex/algpseudocodex, from base URL svn://www.tug.org/texlive/tags/texlive-2023.0/Master/texmf-dist at revision 66594. The NAR hash to obtain is 1gjcdmzijiagzxwjwygqpbjjapzk9dfljv5d94iabzr8032l9rsh (in nix-base32 format, the default for guix hash).
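Guix prints that hash in nix-base32, whereas, as far as I know, the integrity field of the manifest uses the SRI form (sha256- followed by base64); that is an assumption on my part. A small converter helps when comparing the two. This is only a sketch and the function names are mine.

```python
import base64

# Alphabet of the Nix/Guix base32 variant (no e, o, t, u).
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_to_bytes(text: str, size: int = 32) -> bytes:
    """Decode a nix-base32 string into `size` raw digest bytes."""
    expected_length = (size * 8 + 4) // 5  # 52 characters for SHA-256
    if len(text) != expected_length:
        raise ValueError(f"expected {expected_length} characters, got {len(text)}")
    value = 0
    for char in text:
        # str.index raises ValueError on characters outside the alphabet.
        value = value * 32 + NIX_BASE32.index(char)
    # Read left to right, the characters are the base-32 digits of the
    # digest interpreted as a little-endian integer.
    return value.to_bytes(size, "little")


def nix_base32_to_sri(text: str) -> str:
    """Assumed manifest format: 'sha256-' followed by the base64 digest."""
    digest = nix_base32_to_bytes(text)
    return "sha256-" + base64.b64encode(digest).decode("ascii")
```

Calling `nix_base32_to_sri("1gjcdmzijiagzxwjwygqpbjjapzk9dfljv5d94iabzr8032l9rsh")` would give the value to compare against the integrity field, if the SRI assumption holds.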
Let's try to reproduce the directory to archive locally.
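For anyone who wants to script the reproduction, here is a minimal Python sketch that exports each listed path separately and lays the results out under one root. The helper names (`export_commands`, `export_all`) and the `dest` argument are hypothetical; this is not the SvnExportLoader code, and it assumes the `svn` command-line client is installed.

```python
import os
import subprocess
from typing import List


def export_commands(base_url: str, revision: int,
                    paths: List[str], dest: str) -> List[List[str]]:
    """Build one `svn export` command per listed path (nothing is run)."""
    commands = []
    for path in paths:
        relative = path.strip("/")  # trailing slashes are not significant here
        commands.append([
            "svn", "export", "--force",
            "-r", str(revision),
            f"{base_url.rstrip('/')}/{relative}",
            os.path.join(dest, relative),
        ])
    return commands


def export_all(base_url: str, revision: int,
               paths: List[str], dest: str) -> None:
    """Run the exports; needs network access to the repository."""
    for command in export_commands(base_url, revision, paths, dest):
        # Create the parent folders first so the hierarchy is preserved.
        os.makedirs(os.path.dirname(command[-1]), exist_ok=True)
        subprocess.run(command, check=True)
```

With the values above, this would be called as `export_all("svn://www.tug.org/texlive/tags/texlive-2023.0/Master/texmf-dist", 66594, ["doc/latex/algpseudocodex/", "tex/latex/algpseudocodex/"], "texlive-algpseudocodex")`, the last argument being an arbitrary output folder.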
So adding a svn_subdirectories field to the JSON manifest, with value ["doc/latex/algpseudocodex/", "tex/latex/algpseudocodex/"], is sufficient on our side to reconstruct the same directory to hash.
Another thought on how to properly archive these TeX Live origins: currently, the manifest only contains the base SVN URL for TeX Live, see sample below:
Adding the list of directories to check out and combine will let us recompute the same hash. However, every archived directory will then be a new snapshot of the same origin URL, svn://www.tug.org/texlive/tags/texlive-2023.0/Master/texmf-dist/. That is not an issue on the Guix side, since directories will be requested by ExtID, but it is not really consistent on the SWH side.
Adding the Guix package names in the JSON manifest would be of interest to us so we could use origin URLs in the form https://packages.guix.gnu.org/packages/<package_name> to disambiguate what we archived.
Adding the Guix package names in the JSON manifest would be of interest to us so we could use origin URLs in the form https://packages.guix.gnu.org/packages/<package_name> to disambiguate what we archived.
In addition to svn_subdirectories, which field would be required in the file sources.json?
Adding a new packageName field would be of interest to us. For instance, in the texlive-algpseudocodex example above, its entry in the manifest would be:
@ardumont, as an alternate way to disambiguate the archived directory, instead of changing the origin URL, we could create a branch in the produced snapshot with the package name.
@civodul added svn_files (instead of svn_subdirectories) with the rationale that it is not always directories but can also be just files. Is it fine?
The addition of the package name on our side needs some adaptations because the sources are extracted for each package. In other words, a package has one name, but it can refer to several origins (the main one, patches, some inputs for the tests, etc.). In the end, each origin is processed and all the fields are filled; i.e., there is no longer any packageName field at that point.
Therefore, we need to special-case svn, and maybe TeX Live packages, in order to keep the packageName.
Or if you prefer, we could generate a list of URLs, something like:
@civodul added svn_files (instead of svn_subdirectories) with the rationale that it is not always directories but can also be just files. Is it fine?
All good !
The addition of the package name on our side needs some adaptations because the sources are extracted for each package. In other words, a package has one name, but it can refer to several origins (the main one, patches, some inputs for the tests, etc.). In the end, each origin is processed and all the fields are filled; i.e., there is no longer any packageName field at that point.
Ack, I thought adding the package name was straightforward on your side, but apparently not. Do not bother with that then: we will add special processing in our Guix lister for the TeX Live origins to derive the TeX package name from the paths listed in svn_files, and use it as the branch name in the snapshot to produce.
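One possible heuristic for that derivation, sketched in Python: take the last directory component of each listed path and keep the most frequent one. The function name `guess_package_name` and the heuristic itself are assumptions for illustration; the actual lister may well do something different, and packages whose paths do not share a directory name would need extra care.

```python
import posixpath
from collections import Counter
from typing import List, Optional


def guess_package_name(svn_files: List[str]) -> Optional[str]:
    """Guess a TeX package name from svn_files paths (hypothetical heuristic)."""
    candidates = Counter()
    for path in svn_files:
        if path.endswith("/"):
            # Directory entry: use its own name.
            name = posixpath.basename(path.rstrip("/"))
        else:
            # File entry: use the name of the containing directory.
            name = posixpath.basename(posixpath.dirname(path))
        if name:
            candidates[name] += 1
    if not candidates:
        return None
    # Most frequent candidate; ties go to the first one encountered.
    return candidates.most_common(1)[0][0]
```

For the texlive-algpseudocodex paths quoted earlier, both entries end in algpseudocodex, so that is the name the heuristic picks.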
Or if you prefer, we could generate a list of URLs, something like:
where the multi-svn type signals that the entry carries a list of URLs, here svn_urls. Alternatively, the type could stay svn and you would check the type of svn_url.
I tested locally the loading of the 4152 TeX Live packages referenced in the sources.json file, and everything went fine except for texlive-hyphen-complete, for which a NAR hash mismatch is obtained.
I noticed that the sub-directories doc/generic/elhyphen and doc/generic/huhyphen referenced in the Guix package definition are missing in the NAR archive, hence the hash mismatch.
As for preventing similar issues, I am not sure what could be done on our side.
Roughly, the ending slash / suffix is used to detect if svn export must happen in path/to/directory/ or in the directory name of path/to/file (i.e., path/to/). For each location listed (e.g., "doc/generic/elhyphen/"), the code reads:
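As a Python paraphrase of that rule (this is not the Guix code, just the behaviour described above; the function name `export_target` is made up), the destination of each export would be chosen like this:

```python
import posixpath


def export_target(location: str) -> str:
    """Directory, relative to the output root, that receives the export.

    Paraphrase of the described rule: a trailing slash means "this is a
    directory, export into it"; otherwise the location is treated as a
    file and exported into its parent directory.
    """
    if location.endswith("/"):
        return location
    return posixpath.dirname(location)
```

With this rule, "doc/generic/elhyphen/" is exported into doc/generic/elhyphen/, whereas "doc/generic/elhyphen" (slash forgotten) targets doc/generic, so a directory listed without its slash does not end up under its own name.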
Therefore, I am not sure we could do better without something overcomplicated on our side. In other words, this kind of error might happen again if a slash / is forgotten, as was the case here. Hum?
Oh, I did not even notice the missing trailing slashes.
On the SWH side, we do not have such processing: trailing slashes are stripped and the svn export command is called to export either a directory or a file, see related code. Sub-folders to export are created a priori, and we use the --force option of the export command to ensure Subversion will not raise an error if a directory already exists due to a previous export operation.
Therefore, I am not sure we could do better without something overcomplicated on our side. In other words, this kind of error might happen again if a slash / is forgotten, as was the case here. Hum?
Not sure what you can do on the Guix side to avoid that kind of mistake. Maybe you could add some checks based on the svn info command, as it can give you the type of svn node targeted by a URL:
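Such a check could look roughly like the Python sketch below. It assumes Subversion 1.9 or later, where `svn info --show-item kind` prints `dir` or `file`; the function names and the warning messages are hypothetical, and an equivalent check on the Guix side would of course be written in Scheme.

```python
import subprocess
from typing import Optional


def svn_node_kind(url: str, revision: int) -> str:
    """Ask the server for the node kind ("dir" or "file"); needs network."""
    result = subprocess.run(
        ["svn", "info", "--show-item", "kind", "-r", str(revision), url],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def slash_problem(location: str, kind: str) -> Optional[str]:
    """Return a message if the trailing slash disagrees with the node kind."""
    has_slash = location.endswith("/")
    if kind == "dir" and not has_slash:
        return f"{location!r} is a directory: add a trailing slash"
    if kind == "file" and has_slash:
        return f"{location!r} is a file: drop the trailing slash"
    return None
```

Running `slash_problem(location, svn_node_kind(base_url + "/" + location, revision))` over each listed location would flag an entry such as "doc/generic/elhyphen" before the hash is ever computed, at the cost of one extra request per location.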