Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006

$Id: das2-teleconf-2006-02-07.txt,v 1.1 2006/02/08 00:37:41 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda:
  * Vote on constructing URLs/URIs to query segments, types, features
  * Status report from people
  * Ontologies
  * Feat property changes

Topic: Constructing URLs/URIs to query segments, types, features
----------------------------------------------------------------
  1.) specified by query_id
  2.) hardwired to ~/segments, ~/types, ~/features
  3.) ?

ad: lots of people have left here so the vote won't include all.
    see email why a query url is useful
    agree w/ gregg: short names could be a nice to have.
    shouldn't have to worry about how you organize your urls
gh: yes it does: this/types this/segments etc.
ad: can take it out if there's confusion
gh: recommended structure is good.
ee/gh: people will look at the examples and do it that way. they won't
    look at .rnc file
gh: make it clearer in the spec that these are merely suggestions of
    the hierarchy, you don't have to do it this way.
ad: roy's view: likes the query id url for doing search for all
    features, or all types.
    query id is the url used to do search against features.
    uri could be relative or absolute.
gh: category element defines a query id for a subset of das. it's the
    attribute query id in the category
ad: I also want to rename category back to capability.
    how do we arrange urls in a versioned source.
    construction off of strings or via attributes in a url
gh: votes for hardwired, but feels less strongly about it today.
ad: majority vote is for query id, spec czar goes with that.

[A] query id
[A] andrew will update spec to have less mention of hierarchical structure
[A] allen will update server to do it the recommended way

gh: in addition to having an arbitrary query id to get segments, types,
    features, there's a recommended way to do it via the hierarchy.
    server should do it the recommended way (hierarchy)
ee: we should be flexible about it.
gh/ad: ok take out recommendation.

Topic: Status reports
---------------------

ad: see his emails.
gh: we need examples in spec document and scratch to be better
    synchronized.
ad: should be, i've been trying to keep these in sync.
gh: plan to push into html, incorporate scratch into doc?
ad: yes, eventually. will also add andreas' work to scratch too.

td: java xml binding libraries, how to put it into a workable server

ap: das registry, sources command, attribute handling, people can
    connect to a toy server that is publicly available.
gh: registry will respond?
ap: yes. toy server, toy data like das1, returning sources command.
gh: can you add allen's codesprint server? consider it registered.
ap: is it fully working?
gh: can allen send a command to it to register it?
ap: no.
gh: would like to tell my client to do discovery rather than hard
    wiring.

gh: commits to igb das/2 client to handle seq, segment, types. not
    features query yet. given decision about url construction, can do
    this fast so we can test on codesprint server seq, seg, types to
    bring up something meaningful in gui. not features by today.
    affy das/2 server is running behind. will sync up today as well.

nh: apollo working out sequence, segment, types request. now does
    versioned sources. integrating those into query gui as well.

aday: changes early this am. server running under /codesprint is now a
    static doc pointing back to the old server. adding segment
    command, merging region and seq command. has made everything
    except capabilities writeback stuff.
ad: there's another request recently, see my email.
aday: have gotten 40 emails from you in the last day!
aday: brian o'connor is working on bundling dependencies for an rpm
    based release.
gh: I also did significant refactoring/moving assay/ontology stuff
    into subclasses on client side. haven't seen brian's code, but
    should run fine.

Topic: Integrating Sequence Ontology with DAS/2
-----------------------------------------------

suzi: national center for biomedical ontology, one of 7 natl centers
    for biomedical computing. focus on needs regarding developing and
    using ontologies.
gh: hoping to have a typing system in das/2 via types queries that
    references SO but doesn't require client to fully understand
    ontologies. too much of a burden. that's the challenge.
    this translates into referring to ontology terms as opaque uris
suzi: 'understands' means they're ignoring any relationships between
    types.
gh: yes. currently type has attrib for id, attrib for ontology.
ad: uri or arbitrary string
suzi: can use uri or string, preprocessed
ad: one or the other
gh: prefers uri
suzi: from uri you can get the string
gh: not clear how to construct uri for particular terms in an ontology
    doc
suzi: this will happen in next few months. talking with daniel rubin
    about this.
gh: this is where allen comes in. ontology das.
aday: next step is getting it hosted on NCBO server. currently
    communicating with chris mungall. said they're planning on
    implementing something similar soon, not sure if they'd accept
    allen's solution. unclear.
    working with gavin sherlock on ontology support for microarray
    samples, tissue type, phenotype.
    was hoping people could pick this up and use it.
suzi: gavin and I could help push this.
gh: chris m posted concerns that the obo xml in allen's scheme isn't
    the same as what he's using. re: how you make absolute uris.
aday: there's not much documentation on obo xml format. did the best I
    could.
suzi: should be able to sort it out. just an inertia problem of
    getting it installed. not a competition issue. fine with me.
    not difficult?
aday: by end of week we'll have an rpm.
suzi: let's keep pushing on this to make it happen. I'll talk to gavin
    tomorrow. can we install on sf site, or do we need to set it up
    elsewhere?
aday: could conceivably set up a cgi on sf. uses custom apache handler
    tho.
gh: more ontology q's can wait till tomorrow w/ lincoln.
    concern: how do we deal w/ types that represent more than one
    ontology term. defer discussion till tomorrow.

Topic: Feature Properties
-------------------------

See andrew's post today.

ad: this ties into ontologies. two ontology related issues:
    two different ways to query.
    ontology of a feature, and two diff ways to search a db for that
    property: exactly equal, or a subtype.
    this is a property with two diff searches you may want to do on
    it.
    properties like note, alias, phase have ability to search key/val
    properties, e.g., att:alias=something.
    score is a floating point number you may want to support > or <
    on it. regular exp searches, identical, etc.
    td says use xml query language, but worried about complexity of
    this. 99% of time this is way more than you need.
    scenario: given 4 different notes in a feature, is order
    important?
    extensions: curation point gives curator's name and time stamp.
    e.g., search for all features modified by andrew in 2004.
    discussion: pull this into a note element, perhaps phase and
    alias too.
    property table only supports a substring search. give me an
    author name, e.g.
    not saying getting rid of tag values.
    server supporting new data types, extensions, feat search w/
    sanger curation elements for query.
    or thomas xml search. this is why I want to move categories back
    to capabilities.
gh: more appropriate as capabilities than header.
ad: someone can get a document. andreas can combine many servers into
    one, say: which one supports which.
    to summarize:
    - properties are simple strings
    - only substring searches
    - change att: to prop:
    - note and alias and phase are elements
    - advertise that a server has extension to das query lang
gh: what about phase? lincoln needs it.
ad: if it's something that people will be editing, make it an element.
gh: phase is inappropriate for certain types. would like a formal way
    to say when it should be there or not.
ad: this is formalizing a way for server to tell client that there are
    more types of searches available. can't see how to do it
    automatically: eg for a given score, knowing what is considered
    significant (low or high, e.g.).
td: if he needs a phase he re-infers it. doesn't work for partial CDS
    tho.
gh: how much spec churn will this generate?
ad: [various things, half a dozen or so, some simplifying]
gh: does a colon in a query string need to be escaped? if so, this
    makes it hard to read.
ad: could use prop_ rather than prop:
    thomas and I had a long discussion about this.

[A] andrew will incorporate these changes into feature properties

Topic: Maintainer information
-----------------------------

ad: modified examples under scratch
gh: maintainer at source or version level
ad: one for all sources level
ap: at sanger we have one central server with lots of sources. notes
    who's responsible for which server.
gh: ownership cascades down to sub elements?
ad: yes

Topic: XML Base
---------------

gh: can be in any element. as well as xml:lang, don't really
    understand.
ad: it's what the atom spec does, so we copied. maybe for
    bidirectional languages.
gh: flexible uri resolution scheme w/ xml base. implementation in java
    tools is spotty for xml:base. curious about java obj binding of
    xml, what support they have for resolving xml base.
    at this point will have to roll it myself. want to ask thomas
    about this.
ap: he's using StAX parser, gets global namespace.
gh: bigger concern for when we have to use sax, need to do xml:base
    resolution, eg. when we need to retrieve lots of features.
ad: it can be done with sax.
gh: not hard, but it is a multistep process.
ad: multiple levels of xml:base

ad: tomorrow's agenda: go through roy's otter stuff, convert into new
    das format. to get a feel for how data will look. see roy's
    email. to use experience gathered from otter to make sure we're
    sufficiently covering features.
gh: talking about writeback?
ad: premature. let's talk style sheets wed, and writeback thursday.
    plus anything else that's come up about the spec.
    want to know how style sheets will look. lincoln should be able
    to help out there.
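
Illustration for the XML Base topic above: a minimal sketch of the
"multistep" resolution with sax that gh and ad describe, where nested
xml:base values are kept on a stack and each one is resolved against
its parent. It is written in Python rather than Java for brevity. The
element names (FEATURES, GROUP, FEATURE), the uri attribute, and the
example.org URLs are hypothetical and are not taken from the DAS/2
spec; this is one possible approach, not what any of the servers or
clients discussed actually do.

```python
import xml.sax
from urllib.parse import urljoin


class BaseTrackingHandler(xml.sax.ContentHandler):
    """Collect 'uri' attributes, resolved against the xml:base in scope."""

    def __init__(self, document_url):
        super().__init__()
        # One effective base per open element; the document URL is the root.
        self.base_stack = [document_url]
        self.resolved = []

    def startElement(self, name, attrs):
        base = self.base_stack[-1]
        # An xml:base may itself be relative, so resolve it against the
        # base inherited from the parent element.
        if "xml:base" in attrs:
            base = urljoin(base, attrs["xml:base"])
        self.base_stack.append(base)
        # Hypothetical attribute name: resolve it against the effective base.
        if "uri" in attrs:
            self.resolved.append(urljoin(base, attrs["uri"]))

    def endElement(self, name):
        # Leaving the element ends the scope of its xml:base.
        self.base_stack.pop()


def resolve_uris(xml_text, document_url):
    """Return every 'uri' attribute in document order as an absolute URL."""
    handler = BaseTrackingHandler(document_url)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved


# Hypothetical document with two levels of xml:base.
SAMPLE = """<FEATURES xml:base="http://example.org/das2/genome/">
  <GROUP xml:base="volvox/1/">
    <FEATURE uri="feature/cds-1"/>
  </GROUP>
  <FEATURE uri="feature/top-1"/>
</FEATURES>"""

if __name__ == "__main__":
    for url in resolve_uris(SAMPLE, "http://example.org/"):
        print(url)
```

The stack is pushed on every start tag, including elements without an
xml:base, so that the pop on every end tag stays balanced. This is the
bookkeeping a streaming parser needs when retrieving lots of features.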