IRC channel logs

2021-10-29.log


<damo22>youpi: i think we need a filesystem that has rich metadata capability that is free software, suited to HPC environments where you might want to attach arbitrary tags or metadata to your files
<youpi>is xattr not enough?
<damo22>xattr, what can you put in there?
<youpi>whatever you want
<damo22>can you put a blob of json?
<youpi>man xattr says an attribute value is limited to 64 kB
<youpi>(but even that might be outdated, and actually larger)
<youpi>now if you want more, perhaps better use file formats such as hdf5
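The xattr exchange above can be sketched in Python. This is only an illustration under stated assumptions: the attribute name `user.metadata` and the helper names are hypothetical, the 64 kB limit is the figure quoted from the man page (actual limits depend on the filesystem), and `os.setxattr`/`os.getxattr` are only available on Linux.

```python
import json
import os
import tempfile

# The 64 kB figure quoted above; the real limit varies by filesystem.
XATTR_VALUE_LIMIT = 64 * 1024


def encode_metadata(metadata, limit=XATTR_VALUE_LIMIT):
    """Serialise metadata to compact JSON bytes, rejecting oversized values."""
    blob = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    if len(blob) > limit:
        raise ValueError(f"metadata is {len(blob)} bytes, limit is {limit}")
    return blob


def write_metadata(path, metadata, name="user.metadata"):
    """Store the JSON blob in an extended attribute (hypothetical attribute name)."""
    os.setxattr(path, name, encode_metadata(metadata))


def read_metadata(path, name="user.metadata"):
    """Read the JSON blob back from the extended attribute."""
    return json.loads(os.getxattr(path, name).decode("utf-8"))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        try:
            write_metadata(f.name, {"patient": "p1", "test": "t1"})
            print(read_metadata(f.name))
        except (OSError, AttributeError) as exc:
            # Not every platform or filesystem supports user xattrs.
            print("xattrs unavailable here:", exc)
```

Passing a smaller `limit` (for example 1024) models a filesystem that keeps xattrs in a single small block.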
<Pellescours>damo22: (do you have news for your patch in netbsd?)
<damo22>not really
<youpi>ah, ext4 stores the xattr in a block, so it cannot be larger than one block (from 1024 to 4096 bytes depending on your device size)
<damo22>i mean, 1024 bytes isn't bad, but supposing you had a bunch of metadata you wanted to store that could be used to search the file efficiently like a db
<damo22>eg, in a health care setting, the concept of "patient" and "test"
<damo22>and any other relevant batching
<damo22>you could make the storage of the files more intelligent
<damo22>so it could do things like, if you access one patient from a batch, it shuffles all the patient files from a batch to SSD
<damo22>and not at the application layer
<damo22>and making some files immutable once created
<damo22>what kind of thing am i talking about, is it a filesystem?
<youpi>filesystems and databases are not so far apart actually
<youpi>sometimes you'd like a mixture of the two
<damo22>is there anything like this in free software?
<youpi>well, sqlite, hdf5, etc.
<youpi>the thing is: the more you ask of the POSIX file interface, the less likely you are to get it
<youpi>so it's indeed simpler to just use a file and implement a database on top of it
<youpi>be it hdf5, or hadoop etc.
<damo22>why would i want a file with a database in it?
<youpi>because then it's easy to move it around, put it on a different filesystem, etc.
<youpi>whatever the OS or filesystem
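As a rough sketch of the "database in a file" idea, the patient/test/batch metadata mentioned earlier could live in a single SQLite file next to the data. The table layout, column names, and function names below are assumptions made for illustration, not anything proposed in the discussion.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a metadata index; pass a file path to make it portable."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " patient TEXT NOT NULL,"
        " test TEXT NOT NULL,"
        " batch TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS files_by_batch ON files (batch)")
    return conn


def add_file(conn, path, patient, test, batch):
    """Record the metadata for one data file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
            (path, patient, test, batch),
        )


def batch_mates(conn, patient):
    """Return paths of every file sharing a batch with any file of this patient."""
    rows = conn.execute(
        "SELECT path FROM files"
        " WHERE batch IN (SELECT batch FROM files WHERE patient = ?)"
        " ORDER BY path",
        (patient,),
    )
    return [row[0] for row in rows]
```

A storage layer could use something like `batch_mates` to decide which files to move to SSD when one patient in a batch is accessed; the index itself is an ordinary file that can be copied to any filesystem.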
<damo22>i guess so
<damo22>we have that already with SAM format
<damo22>and block gzipped into BAM
<damo22>but then we need a separate database to manage all the database files
<damo22>i think that would be better done by a filesystem with metadata
<damo22>hmm i could write a hurd translator for SAM files
<damo22>what would be the benefit of a hurd translator for something like a sqlite file?
<damo22>what would it do?
<youpi>you could for instance connect() to it, and talk mysql
<youpi>-my
<damo22>heh yeah
<damo22>so i could use an existing library for a "database"-like file format and link it to a hurd translator that exposed those ops
<damo22>but a "database"-like file format is a container for storing data, so might that map onto a netfs?
<damo22>what about doing a SQL query that returns a netfs
<damo22>we have a file format that contains blobs of data but you cannot view the whole file because it's too big
<damo22>youpi: can a translator be configured after it is serving a netfs, to change what it serves?
<mbanck>fsysopts IIRC
<mbanck>whether that is supported for a particular translator might be implementation specific
<damo22>hmm what about a "live translator" that drops the user to a sub-prompt that allows them to reconfigure the translator on the fly to present a different netfs based on the current SQL statement?
<damo22>it would be like mounting a disk that changes its contents to a different view based on what you want to query
<damo22>and the contents are read out of the database file
<mbanck>sounds like Linux namespaces?
<damo22>settrans -a ./db /hurd/sqlitething --input=mydb.sqlite
<damo22>> select name,datablob from mytable;
<damo22>db/John db/Peter db/Paul
<damo22>> select test, datablob from mytable2;
<damo22>db/test1 db/test2 db/test3
<damo22>I guess this does not map 1:1 onto files unless you put each row result into a file
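The "one file per row" mapping can be modelled without any translator at all. The sketch below is not Hurd code: it only shows, in Python, how a two-column query result could become a directory listing of name to contents. The function name `rows_as_files` and the renaming of duplicate names are assumptions for illustration.

```python
import sqlite3


def rows_as_files(conn, query):
    """Map each (name, blob) row of a two-column query to a directory entry.

    Returns a dict of file name -> file contents as bytes. A query that does
    not return exactly two columns raises ValueError when a row is unpacked.
    """
    entries = {}
    for name, blob in conn.execute(query):
        # A file name cannot contain a path separator.
        name = str(name).replace("/", "_")
        if blob is None:
            data = b""
        elif isinstance(blob, bytes):
            data = blob
        else:
            data = str(blob).encode("utf-8")
        # Rows are not guaranteed to have unique names, so disambiguate.
        candidate, n = name, 1
        while candidate in entries:
            candidate = f"{name}.{n}"
            n += 1
        entries[candidate] = data
    return entries
```

With a table like the `mytable` in the example session, `rows_as_files(conn, "select name, datablob from mytable")` would yield one entry per row, which is the listing a netfs-style translator would then have to serve.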
<Gooberpatrol66>damo22: a tag-based filesystem would be awesome
<Gooberpatrol66>there is some pre-existing stuff like this https://github.com/oniony/TMSU
<youpi>damo22: prompting for data is not something that a translator can do. But an fsysopts call will do, yes
<youpi>the alternative is to use the tag-based approach
<youpi>i.e. the user uses cd "SELECT * from ..."