A schema for collections? #40
Comments
How about a FileManager object that inherits from a dict or list, depending on whether key-based or integer-based indexing makes sense? (I typically use, and prefer, key-based indexing, so you're robust to shuffling / partitioning.) It would contain FileCollection objects, each consisting of fields which point to any number of file paths. As an added bonus, we / the user could register different load / open methods with filetypes for transparent (lazy) loading, e.g. "npz" -> np.load, "jams" -> jams.load, etc.
Additionally, if everything inherits from JObject, then this database-style object can be saved / loaded just as easily. Thoughts? |
I'd argue that int-based indexing never makes sense, unless the int is actually treated as a key (eg in gtzan). It may also be worth looking at something like asdf for inspiration, since they have many of the same problems we do.
I like this idea, but transparent loading seems a little tricky to get exactly right. Ideally, I'd want to be able to clobber load arguments (such as audio sampling rate). This could be supported pretty easily by setting defaults on kwargs, but the resulting api may be kind of a mess. Maybe we should ponder on that a bit. |
Circling back on this after a bit of pondering.

    fmgr = FileManager()
    fmgr['my_song'] = FileCollection(
        audio='/path/to/my/song.wav',
        annotation='/a/different/file.jams',
        features='/data/features/my_song.npz')

This looks exactly like a dataframe to me.

    # Assuming 'npz' -> np.load by default
    data = fmgr['my_song'].features.load()

How about something a little less objecty? I like your idea of having a dispatch object that can map a key (eg

This way, we don't have to worry about schematizing the whole thing, and it becomes much easier to import data sets on the fly. (We can also tag along non-loadable fields at the same level, such as an artist id for split filtering.) |
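One possible reading of the "less objecty" approach, sketched under assumptions: records are plain dicts, a dispatch table maps field names to loaders, and fields without a loader (such as an artist id) are returned as-is. The names `DISPATCH`, `load_field`, and `load_json`, and the choice to key the dispatch on field name, are hypothetical; the original comment does not spell out what the key maps to.

```python
import json
import os
import tempfile


def load_json(path):
    """Stand-in loader; a real setup might map this field to jams.load."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical dispatch table: field name -> loader function.
DISPATCH = {"annotation": load_json}


def load_field(record, field, dispatch=DISPATCH, **kwargs):
    """Load a field if a loader is registered for it, else return the raw value."""
    value = record[field]
    loader = dispatch.get(field)
    return value if loader is None else loader(value, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    annotation_path = os.path.join(tmp, "my_song.json")
    with open(annotation_path, "w", encoding="utf-8") as handle:
        json.dump({"title": "my song"}, handle)

    # A collection is just a dict of dicts; no schema is imposed on the fields.
    collection = {
        "my_song": {"annotation": annotation_path, "artist_id": "artist_007"},
    }

    annotation = load_field(collection["my_song"], "annotation")  # loaded lazily
    artist = load_field(collection["my_song"], "artist_id")       # passed through

# Non-loadable fields can drive split filtering without touching any file.
same_artist = [key for key, rec in collection.items()
               if rec["artist_id"] == "artist_007"]
```

Because the records are ordinary dicts, the same structure drops straight into a dataframe or a JSON file if that turns out to be more convenient.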
Punting this to #98 |
Having thought on this for years at this point, I think the reasonable course of action here is as follows:
|
Of course, it couldn't be that simple. MongoDB does not support $ref in json schema (?!). |
Going back to this comment, we punted on the idea of managing extrinsic data (eg, file paths) explicitly from within a JAMS object. Now that the dust has settled a bit on the JAMS schema, I'm wondering if we can come up with a better solution than sandboxing this stuff.
I bring this up because maintaining links between audio content and annotations is still kind of a pain, and I'd prefer to not solve it over and over again.
How do people feel about introducing an interface/schema for managing collections of jamses? At the most basic level, this would provide a simple index of audio content, jams content, and collection-level information. (It might also be useful to index which annotation namespaces are present in each jams file.) This kind of thing can spiral out of control easily, so if we do it, we should keep it tightly scoped.
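As a starting point for discussion, a tightly scoped index could be as small as the sketch below: collection-level information plus one entry per track with an audio path, a jams path, and an optional list of annotation namespaces. The field names (`collection`, `entries`, `audio`, `jams`, `namespaces`), the helper functions, and the file paths are all assumptions made for illustration, not an agreed schema.

```python
import json

# Hypothetical minimum that every entry must provide.
REQUIRED_ENTRY_FIELDS = ("audio", "jams")


def make_index(name, entries, **collection_info):
    """Build a plain-dict index after checking each entry for required fields."""
    for key, entry in entries.items():
        missing = [f for f in REQUIRED_ENTRY_FIELDS if f not in entry]
        if missing:
            raise ValueError("entry %r is missing %s" % (key, missing))
    return {"collection": dict(collection_info, name=name), "entries": entries}


def entries_with_namespace(index, namespace):
    """Return the sorted keys of entries whose jams file lists `namespace`."""
    return sorted(key for key, entry in index["entries"].items()
                  if namespace in entry.get("namespaces", ()))


index = make_index(
    "demo_collection",
    {
        "track_a": {"audio": "audio/track_a.wav", "jams": "jams/track_a.jams",
                    "namespaces": ["beat", "chord"]},
        "track_b": {"audio": "audio/track_b.wav", "jams": "jams/track_b.jams",
                    "namespaces": ["beat"]},
    },
    version="0.1",
)

# The index is plain JSON, so saving and loading needs no custom machinery.
restored = json.loads(json.dumps(index))
chord_tracks = entries_with_namespace(restored, "chord")
```

Keeping the index to string keys, string paths, and lists of strings is what keeps it round-trippable through JSON and easy to validate later with a formal schema.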