I guess this is about deploying the gitlab.com lister, right? Or do you plan to also deploy listers for other gitlab instances? (Of course it's great that we can support any instances we want, I just wanted to clarify what the specific purpose of this task is.)
This is a more far-reaching question, but I wonder about the long-term sustainability of having a different DB for each lister. Is that scalable? I wonder if it wouldn't be better to have a single DB for all listers. I don't want to hold up the deployment of the gitlab lister just for this, so go ahead as you feel better. But maybe take the chance to discuss this with other sysadms. If appropriate, we can consolidate things later.
i guess this is about deploying the gitlab.com lister, right?
Bluntly, it's a subtask of the ingest gitlab.com lister ;) So gitlab.com it is, yes. [1]
My previous comment [2] about other instances might have misled you (as I kind of got derailed in my thoughts when I wrote that ;).
or do you plan to also deploy listers for other gitlab instances? (of course it's great that we can support any instances we want, just wanted to clarify what's the specific purpose of this task)
I could open another task for the other instances indeed [2].
this is a more far-reaching question, but I wonder about the long-term sustainability of having a different DB for each lister.
Oh yeah, i see your point.
I just went along with the current way of doing things, as I was pretty new to the lister stack.
Just to clarify, for the gitlab lister, it is indeed one db for all gitlab instances (that was not clear to me at first). [3]
Is that scalable? I wonder if it wouldn't be better to have a single DB for all listers.
That's a good point.
For now, I guess i can only say it depends on the nature of the lister though.
The Github, Bitbucket and Gitlab listers each have a simple db schema (1 table, _repo), so they could be aggregated together.
The debian one's model is not that simple though (multiple tables). And as far as you explained it to me, there is a chance the pypi one will not be that simple either.
But i don't know yet if that will share anything with the debian lister.
So yeah, discussion would be good.
I don't want to hold onto the deployment of the gitlab lister just for this, so go ahead as you feel better.
Yes, thanks, finishing what i start makes me feel better :)
But maybe take the chance to discuss this with other sysadms.
Right.
If appropriate, we can consolidate things later.
Indeed.
[1] Also note that the gitlab lister is about listing 'gitlab' instances. Gitlab.com is just the biggest known (so far).
My point being, I will have to deploy the gitlab lister once. Then I'll have to parametrize the scheduler to create the recurring listing tasks for gitlab.com (2, I think).
That's the only instance-specific part (here, gitlab.com).
That's what was implied in the not so detailed message 'some data inserted in db' in the description.
I'll try to clarify the task's intent.
[3] All listed origins from any instance will be there. Each origin has an instance column (gitlab(.com), inria, debian, freedesktop, gnome, etc.) in its model to distinguish between those.
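A minimal sketch of that layout, assuming a single table with an instance column. The table name, column names and URLs below are hypothetical (not the lister's actual model), and SQLite stands in for PostgreSQL only to keep the example self-contained:

```python
import sqlite3


def make_db():
    """Create an in-memory database with one table shared by all instances."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the 'instance' column tells origins apart.
    conn.execute(
        """
        CREATE TABLE gitlab_repo (
            instance   TEXT NOT NULL,
            full_name  TEXT NOT NULL,
            origin_url TEXT NOT NULL,
            PRIMARY KEY (instance, full_name)
        )
        """
    )
    return conn


def add_origin(conn, instance, full_name, origin_url):
    """Record one listed origin for the given instance."""
    conn.execute(
        "INSERT INTO gitlab_repo (instance, full_name, origin_url) VALUES (?, ?, ?)",
        (instance, full_name, origin_url),
    )


def origins_for(conn, instance):
    """Return the origin URLs listed for a single instance, sorted by name."""
    rows = conn.execute(
        "SELECT origin_url FROM gitlab_repo WHERE instance = ? ORDER BY full_name",
        (instance,),
    )
    return [row[0] for row in rows]
```

The same repository name can then exist on two instances without clashing, since the instance is part of the key.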
Cheers,
Antoine R. Dumont
changed title from Deploy gitlab lister to infra to Deploy gitlab instance lister to infra + start listing gitlab.com
this is a more far-reaching question, but I wonder about the long-term sustainability of having a different DB for each lister. Is that scalable? I wonder if it wouldn't be better to have a single DB for all listers. I don't want to hold up the deployment of the gitlab lister just for this, so go ahead as you feel better. But maybe take the chance to discuss this with other sysadms. If appropriate, we can consolidate things later.
I think it's more convenient to keep using one database per lister: it lets us have listers that use separate tables (e.g. Debian) without them stomping on one another; it also makes it easier to start over from scratch by just erasing the database.
I don't foresee any scalability issues with this: the only net cost of a new database is a few files in the postgresql cluster. We can reconsider when we have a hundred of them ;) From a deployment standpoint, it might be sensible to have the listers share credentials for access to the database, which would avoid having an ever-increasing list of passwords in our manifests... but again, I don't foresee that being an issue for a while.
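To illustrate the shared-credentials idea, here is a rough sketch that derives one connection URI per lister database from a single user/password pair. The database naming scheme, the user name, the host and the helper names are all assumptions made for the example, not what our manifests actually contain:

```python
from urllib.parse import quote


def lister_dsn(lister, user, password, host="localhost", port=5432):
    """Build a PostgreSQL connection URI for one lister's own database.

    Assumption: each lister gets a database named 'lister-<name>'.
    """
    dbname = f"lister-{lister}"
    # Percent-encode the credentials so special characters stay valid in a URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


def all_lister_dsns(listers, user, password):
    """One database per lister, but a single credential pair for all of them."""
    return {name: lister_dsn(name, user, password) for name in listers}
```

With something like this, adding a lister means adding a database name rather than another password.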