IRC channel logs

<damo22>youpi: i think we need a filesystem that has rich metadata capability that is free software, suited to HPC environments where you might want to attach arbitrary tags or metadata to your files

<youpi>is xattr not enough?

<damo22>xattr, what can you put in there?

<youpi>whatever you want

<damo22>can you put a blob of json?

<youpi>man xattr says an attribute value is limited to 64 kB

<youpi>(but even that might be outdated, and actually larger)

<youpi>now if you want more, perhaps better use file formats such as hdf5

<Pellescours>damo22: (do you have news for your patch in netbsd?)

<damo22>not really

<youpi>ah, ext4 stores the xattr in a block, so it cannot be larger than one block (from 1024 to 4096 depending on your device size)
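
To make the xattr suggestion concrete, here is a minimal sketch of storing a small JSON blob in one extended attribute. It assumes Linux (Python's `os.setxattr` and `os.getxattr` are only available there) and a filesystem that permits `user.*` attributes. The attribute name `user.meta` and the 1024-byte budget are illustrative assumptions, the latter taken from the smallest block size mentioned above; the real limit depends on the filesystem.

```python
import json
import os

# Assumed budget: the smallest ext4 block size mentioned in the log.
# The real limit depends on the filesystem and how it is configured.
MAX_VALUE_BYTES = 1024


def encode_meta(meta, limit=MAX_VALUE_BYTES):
    """Serialize metadata to compact JSON bytes, refusing oversized values."""
    data = json.dumps(meta, separators=(",", ":"), sort_keys=True).encode("utf-8")
    if len(data) > limit:
        raise ValueError(f"metadata is {len(data)} bytes, limit is {limit}")
    return data


def tag_file(path, meta, name="user.meta"):
    """Attach metadata to `path` as a single extended attribute (Linux only)."""
    os.setxattr(path, name, encode_meta(meta))


def read_tags(path, name="user.meta"):
    """Read the metadata back as a Python object."""
    return json.loads(os.getxattr(path, name))
```

For example, `tag_file("sample.bam", {"patient": "p1", "test": "t7"})` followed by `read_tags("sample.bam")` would round-trip the dictionary where user xattrs are supported. Checking the size before the call keeps the failure mode explicit instead of relying on the filesystem's error.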

<damo22>i mean, 1024 bytes isn't bad, but supposing you had a bunch of metadata you wanted to store that could be used to search the files efficiently like a db

<damo22>eg, in a health care setting, the concept of "patient" and "test"

<damo22>and any other relevant batching

<damo22>you could make the storage of the files more intelligent

<damo22>so it could do things like, if you access one patient from a batch, it shuffles all the patient files from a batch to SSD

<damo22>and not at the application layer

<damo22>and making some files immutable once created

<damo22>what kind of thing am i talking about, is it a filesystem?

<youpi>filesystems and databases are not so far apart actually

<youpi>sometimes you'd like a mixture of the two

<damo22>is there anything like this in free software?

<youpi>well, sqlite, hdf5, etc.

<youpi>the thing is: the more you ask at the POSIX file interface, the less probable you'll get it

<youpi>so it's indeed simpler to just use a file and implement a database on top of it

<youpi>be it hdf5, or hadoop etc.

<damo22>why would i want a file with a database in it?

<youpi>because then it's easy to move it around, put it on a different filesystem, etc.

<youpi>whatever the OS or filesystem
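
A rough sketch of the "database in a file" idea, using Python's standard `sqlite3` module: one portable file holds the tags for many data files and can be queried by tag. The `tags` table layout and the helper names `open_index`, `tag`, and `find` are hypothetical and chosen only for illustration; the "patient" and "test" keys follow the health care example earlier in the log.

```python
import sqlite3


def open_index(db_path):
    """Open (or create) a single-file tag index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        " path TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL,"
        " PRIMARY KEY (path, key))"
    )
    return conn


def tag(conn, path, **tags):
    """Attach arbitrary key/value tags to a file path."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO tags (path, key, value) VALUES (?, ?, ?)",
            [(path, key, str(value)) for key, value in tags.items()],
        )


def find(conn, key, value):
    """Return the paths carrying the given tag, sorted by name."""
    rows = conn.execute(
        "SELECT path FROM tags WHERE key = ? AND value = ? ORDER BY path",
        (key, value),
    )
    return [row[0] for row in rows]
```

With `conn = open_index("index.sqlite")`, calling `tag(conn, "a.bam", patient="p1", test="t1")` records the tags, and `find(conn, "patient", "p1")` lists every file for that patient. The index file can then be copied to any OS or filesystem, which is the portability argument made above.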

<damo22>i guess so

<damo22>we have that already with SAM format

<damo22>and block gzipped into BAM

<damo22>but then we need a separate database to manage all the database files

<damo22>i think that would be better done by a filesystem with metadata

<damo22>hmm i could write a hurd translator for SAM files

<damo22>what would be the benefit of a hurd translator for something like a sqlite file?

<damo22>what would it do?

<youpi>you could for instance connect() to it, and talk mysql

<youpi>-my

<damo22>heh yeah

<damo22>so i could use an existing library for a "database"-like file format and link it to a hurd translator that exposed those ops

<damo22>but a "database"-like file format is a container for storing data, so might that map onto a netfs?

<damo22>what about doing a SQL query that returns a netfs

<damo22>we have a file format that contains blobs of data but you cannot view the whole file because it's too big

<damo22>youpi: can a translator be configured after it is serving a netfs, to change what it serves?

<mbanck>fsysopts IIRC

<mbanck>whether that is supported for a particular translator might be implementation specific

<damo22>hmm what about a "live translator" that drops the user to a sub-prompt that allows them to reconfigure the translator on the fly to present a different netfs based on the current SQL statement?

<damo22>it would be like mounting a disk that changes its contents to a different view based on what you want to query

<damo22>and the contents are read out of the database file

<mbanck>sounds like Linux namespaces?

<damo22>settrans -a ./db /hurd/sqlitething --input=mydb.sqlite

<damo22>> select name,datablob from mytable;

<damo22>db/John db/Peter db/Paul

<damo22>> select test, datablob from mytable2;

<damo22>db/test1 db/test2 db/test3

<damo22>I guess this does not map 1:1 onto files unless you put each row result into a file
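
The "one file per result row" mapping can be sketched in plain Python with `sqlite3`, without any Hurd code: the first column of the query becomes the file name and the second its content. This is only an illustration of the mapping, not a translator. The function name `query_as_directory` and the `.N` suffix for duplicate names are assumptions made for this sketch.

```python
import sqlite3


def query_as_directory(conn, sql):
    """Run a two-column query and return {file name: file content as bytes}."""
    cur = conn.execute(sql)
    if len(cur.description) != 2:
        raise ValueError("query must return exactly two columns: name, content")
    entries = {}
    for name, content in cur:
        # Disambiguate repeated names the way a directory would need to.
        key = str(name)
        n = 1
        while key in entries:
            n += 1
            key = f"{name}.{n}"
        # Normalize every value to bytes, since file contents are bytes.
        if content is None:
            content = b""
        elif not isinstance(content, bytes):
            content = str(content).encode("utf-8")
        entries[key] = content
    return entries
```

Given a table like `mytable` above, `query_as_directory(conn, "select name, datablob from mytable")` would yield entries named John, Peter and Paul. A real translator would serve these entries through netfs callbacks instead of returning a dictionary.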

<Gooberpatrol66>damo22: a tag-based filesystem would be awesome

<Gooberpatrol66>there is some pre-existing stuff like this https://github.com/oniony/TMSU

<youpi>damo22: prompting for data is not something that a translator can do. But an fsysopts call will do, yes

<youpi>the alternative is to use the tag-based approach

<youpi>i.e. the user uses cd "SELECT * from ..."
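
The query-as-directory-name approach could look roughly like this: the part of the path before the last slash is treated as the SQL statement, and the final component selects one row by its first column. The function name `resolve`, the SELECT-only guard, and the choice to split at the last slash are assumptions for this sketch; a real translator would perform this inside its lookup callback.

```python
import sqlite3


def resolve(conn, path):
    """Resolve 'SELECT .../entry' to a row's content, or list names if no entry."""
    query, sep, entry = path.rpartition("/")
    if not sep:
        # No slash: the whole path is the query, so list the "directory".
        query, entry = path, None
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are accepted as directory names")
    rows = conn.execute(query).fetchall()
    if entry is None:
        return [str(row[0]) for row in rows]
    for row in rows:
        if str(row[0]) == entry:
            return row[1]  # assumes the query returns at least two columns
    raise FileNotFoundError(entry)
```

Here `resolve(conn, "select name, datablob from mytable")` plays the role of `ls`, and appending `/John` plays the role of reading one file. Splitting at the last slash means entry names cannot contain a slash, while the query itself may.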

2021-10-29.log