Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006

$Id: das2-teleconf-2006-02-08.txt,v 1.2 2006/02/08 21:55:14 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris
  UCLA: Allen Day, Brian O'connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda:
  * progress report for grant renewal
  * ontologies
  * ids and urls
  * style sheets
  * status reports

Topic: Progress report for grant
--------------------------------
gh: needs to be in the mail by 5pm tomorrow, to be included as a hard
    copy addendum to grant. will improve chances of funding for next
    cycle. review will be done by end of Feb.
nh: no later than 4pm PST today. state what you've accomplished
    between Nov 1 and now, in particular this week. one or two
    paragraphs.
gh: 1. highlight significant enhancements
    2. involvement of sanger, ebi
    3. registry work from andreas, http spec for that registry
    4. writeback
ad: andreas worked on registry server, will send a write-up soon
    after the teleconference.

[A] Everyone write up 1-2 paragraphs of progress and send to Nomi ASAP

Topic: Ontologies
-----------------
gh: concerned about the ontology attribute in the types doc: do we
    want it to be possible for a type to be an instantiation of
    multiple terms in the ontology?
ls: will make it hard to validate. one type = many ontol terms.
    don't like it. types will be specializations of SO terms and will
    not have multiple parents.
gh: thinking about people doing curation. if a type is anchored to one
    term in the ontology, and a feat can have only one type, a feat
    won't be able to refer to >1 term in SO.
ls: any use case for this?
gh: still exploring this. e.g., both a computed feature and an exon?
ls: no. separate category for predicted genes.
gh: is there something for 'computed exon' or 'computed cds'?
ls: think so.
sc: multiple branches like GO?
ls: multiple relationship types do exist. something can be is_a or
    part_of. I wanted das/2 to be limited to what you can say in SO,
    with the notion that you can extend it. e.g., three predicted
    exons: one from genefinder, one from exonerate, etc.
ad: given a string 'exon', how does that get used to query the server?
ls: find exon SO term, download list of types from das server, find
    everything that inherits from exon ontology term. clients need to
    know how to search the SO list. they will have a local copy of SO
    that they'll refresh from time to time.
gh: client isn't required to know the full structure, except maybe to
    search higher-level terms. but the term in the ontology attribute
    is sufficient.
ls: could just search types and desc to find exons, but that relies on
    the implementer describing their types correctly.
gh: if a client wants to understand an ontology, the best way to go is
    via what allen's proposing: searching via ontology das, preferably
    via the NCBO server.
ad: what is the actual string we're searching on?
aday: name or definition, or id.
ls: client should have a copy of the SO. unambiguous in this opinion.
    client has SO, looks through types XML to find which local types
    the server supports that match what it's looking for in the SO.
    here's a flowchart:
- client downloads SO, caches.
- client downloads seq types list, caches.
- user searches to find exon
- client looks to find matches against 'exon', maybe 5 hits.
- prompts user to select which one he's looking for
- client looks through cached types xml to find server types of the SO
  term that the user selected
- client does feature query.
ad: what is the string that the user is looking for, a URL or a plain
    string?
ls: in the types xml, how do we indicate the term?
gh: we've been discussing this the past few days
ls: why not replace the term with the SO accession number? then we
    don't have to figure out the correct representation of ontology in
    an xml. can finish this by friday. chris mungall has weighed in,
    and the xml version of the SO ontology is not completely stable.
gh: perfectly ok for a client to know nothing about SO and treat these
    as unique strings.
ls: right. names will eventually be things like 'exon'.
aday: chris's main complaint is that the doc didn't validate. I didn't
    have a dtd. got it and now it validates. I thought this was a done
    deal. there is a document written that describes how to do what
    we're talking about.
ls: the only thing to be resolved: in the types xml document, how do
    we refer to SO terms?
aday: an attribute there that allows you to put in a uri. it's a
    relative url that points to the ontology das server to get the obo
    xml for that term.
ad: how do I go from the string 'exon' to find out what that is?
aday:
ls: let's say the administrator of a das server has a local type
    called foobar, associated with the url for the SO 'exon' term.
    andrew's question is: the user wants to search for exons; how to
    go from 'exon' to the correct url in SO to find what types
    correspond to that? wants to go from 'exon' to foobar.
aday: search SO for exon, local types. there's a filter on ontology
    that lets you search all terms and definitions
gh: there's a requirement now that the server must understand
    parent-child relationships in the ontology.
aday: server could do an xpath query to pull out the terms you're
    interested in w/o understanding the ontology
ls: user types 'exon', returns all feats in the genome that are exons.
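
A minimal sketch of the lookup ls describes above: search a local copy
of the SO for 'exon', let the user pick a term, then find the server
types attached to that term or to anything that inherits from it.
Everything here is an assumption for illustration. The accessions, the
is_a links, the type names other than 'foobar', and the function names
are made up, and a real client would read the SO and the types XML
rather than hard-coded dicts.

```python
# Hypothetical stand-in for a cached copy of the SO (not real accessions).
SO_TERMS = {
    "SO:A": {"name": "region", "is_a": []},
    "SO:B": {"name": "exon", "is_a": ["SO:A"]},
    "SO:C": {"name": "coding_exon", "is_a": ["SO:B"]},
    "SO:D": {"name": "intron", "is_a": ["SO:A"]},
}

# Hypothetical stand-in for the server's cached types list:
# local type id -> accession of the SO term it is anchored to.
SERVER_TYPES = {
    "foobar": "SO:B",
    "genefinder_exon": "SO:C",
    "my_introns": "SO:D",
}


def search_terms(text, terms):
    """Return accessions of terms whose name contains the search text."""
    text = text.lower()
    return sorted(acc for acc, term in terms.items()
                  if text in term["name"].lower())


def descends_from(acc, ancestor, terms):
    """True if acc is the ancestor itself or reaches it via is_a links."""
    stack, seen = [acc], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen or current not in terms:
            continue
        seen.add(current)
        stack.extend(terms[current]["is_a"])
    return False


def matching_types(chosen, server_types, terms):
    """Local type ids anchored to the chosen term or one of its children."""
    return sorted(type_id for type_id, acc in server_types.items()
                  if descends_from(acc, chosen, terms))


hits = search_terms("exon", SO_TERMS)   # candidate terms shown to the user
chosen = hits[0]                        # pretend the user picked the first
print(matching_types(chosen, SERVER_TYPES, SO_TERMS))
```

The resulting type ids are what the client would then put into its
feature query. Note that only the client walks the ontology here; the
server just reports which term each type is anchored to.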
aday: two servers, feat and ontol. get all types from the feat server;
    each has a url to an ontology das server, maybe multiple ontology
    das servers. each must have its ontology searched, and returns
    supported or not. client assembles all search results from static
    obo xml documents.
gh: for most clients this will be irrelevant. user will get a list of
    types (genscan, blat alignment) for things they may be interested
    in. they don't need to understand the ontology, nor does the
    client. there may be a url to look up info about the term. this is
    the typical case. more sophisticated use cases can be put off till
    later.
ls: in the types xml can we have two attributes, url and accession?
    one is so_accession="SO:12414", the other will be the url for the
    obo xml.

[A] types will have separate attributes for URI and SO accession number

Topic: IDs and URLs
-------------------
ad: discussion about searching for exon. use case: client goes to
    server to get list of all types, wants all features of a given
    type in a given range. may filter based on contains or inside,
    das-type=xxxxx. talking about that being a URL to get the full
    name for it. what is the thing you send to the server to ask for
    the types?
gh: url
ad: make this an id so it's not a long complex url. just an id
    specific to that server, such that you go to the feat query url
    and get it.
ls: can just choose the last component of the url, the type id.
ad: why have the ability to get a feature type individually?
ls: will have to be uniquified, by adding url to types query.
ad: feat query =
ls: isn't this the way it was?
gh: every feat has a unique uri.
ad: talking about filtering and querying.
ls: just give it the id, not the whole url.
ad: now it is the url
ls: should be the id. does it make sense for it to be something that
    another server has defined? probably not. just a local type.

[lots of back and forth here, didn't catch it all...]

ad: do we need the ability to refer to a feature or type by url?
gh: yes. for making rdf statements about das2 features.
ad: who will do this?
gh: I will if no one else does. web technology is moving in this
    direction.
ls: we want every object a das server serves to be referenceable as a
    url/uri. as for the filtering mechanism, for the type filter we
    can just use the id of the type, a short string.
ad: agree. as of this morning the url and id are the same thing.
ls: a relative uri. by definition the server should implicitly attach
    the versioned data source url to it.
ad: xml processors
ls: define the way the filter query mechanism works, hard code
    implicit paths into it.
ls: featuresquery?type=something
    if 'something' has no slashes, server implicitly adds
    http://myserver/das/types/...
ad: don't like pasting urls and strings together to get things. don't
    like queries with implicit logic like that.
ls: perfectly happy saying you can use urls in the query strings. I'd
    go with short ids
ad: proposing we have both, id and href. here's the case: people
    uploading to a server want to provide a das track, can provide two
    documents. works well for < 1000 features
gh: we have to have a uri for features.
ad: why?
gh: I will send you the page from the first grant.
ls: main reason is to avoid namespace clashes when integrating data
    sets.
td: what do you mean by integrate?
ls: view of features from 4 diff annotation groups. want to search for
    a particular feature by its id, need to indicate which data source
    it's coming from.
td: won't you be keeping track of which data source anyway? you never
    get a track that's a mixture of diff sources.
gh: dangerous to do this.
td: there must be something keeping track of where each track is from.
gh: my assumption is that this is done with the uri
td: there's nothing that constrains a server to only use uris from
    itself.
gh: we sacrificed this when we went with capabilities.
ls: a server can emit a set of features, some using relative uris and
    some absolute ones. if my server starts emitting features with
    affymetrix uris, the assumption is these originate from affymetrix.
    uris indicate that they originate from diff places even though you
    may physically get them from a das server at a different location.
gh: thomas is right. given a feature uri you have no way to tell which
    das server it came from. clients must keep track of this
    themselves.
ls: we wanted to divorce the origin of the feat from the server that
    serves it. should be possible to serve features that come from
    somewhere else.
gh: making the feature uri opaque was deliberate.
ad: when you do a feat query it could return the whole db. so the
    server must know how to return a feature document that contains
    all features. that server must know all the data.
gh: don't see the problem
ad: all features and types have id and url. different. url is optional
gh: no, required. also, not url, but uri.
ad: ok. why should all records have a uri?
gh: compatibility with semantic web/rdf, lsid, future proofing.
ad: if they want to they can, if not they shouldn't be required. no
    one is doing rdf now.
ls: what issue are you concerned about with respect to uri?
ad: like ontology search. give me all features of this das type, you
    then have to give the url. this is different than id.
ls: completely happy treating id as the last component of the uri and
    doing a paste. why don't you like the paste?
ad: you can get features from two diff places, each ending with the
    same last word.
ls: what query is it that allows you to filter by feature id? we have
    positional, type filtering and getting a single feature from
    server of origin.
gh: there shouldn't be an id filter. just resolving the uri for that
    feature.
ls: we can't search for a feature by regex match on its id.
ad: i'm not saying that. I'm suggesting that the url be optional.
ls: I don't understand the point.
gh: why can't uri be required?
ad: see use case in email today, subject="ids and urls". involves
    uploading das tracks to a server.
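
A sketch of the type-filter rule ls proposes above (a value with no
slashes is treated as a short id and pasted onto the server's own
types url), together with the clash ad is worried about when two
sources use the same last word. TYPES_BASE, the example hosts, and
both function names are hypothetical, and the notes do not settle on
this behavior.

```python
# Hypothetical types url of one versioned data source on "my" server.
TYPES_BASE = "http://example.org/das/source/1/types/"


def resolve_type(value, types_base=TYPES_BASE):
    """Turn a type filter value into a full uri.

    Values containing a slash are assumed to be uris already and are
    returned unchanged; anything else is treated as a local short id
    and pasted onto the types url of this server.
    """
    if "/" in value:
        return value
    return types_base + value


def short_id(uri):
    """Last path component of a uri, i.e. the 'paste' run in reverse."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


local = resolve_type("exon")
remote = resolve_type("http://other.example/das/types/exon")

print(local)    # short id expanded against TYPES_BASE
print(remote)   # full uri passed through untouched
# Two different uris can share one short id, so a short id is only
# meaningful relative to the server it was sent to.
print(short_id(local) == short_id(remote), local == remote)
```

Under these assumptions a short id is unambiguous only because the
server expands it against its own base; a client merging results from
several servers would still need the full uri to tell the two 'exon'
types apart.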
[some trouble: not everyone has seen it]
ls: I say we have a policy that if there is a big discussion, the
    email should come more than 30 minutes before the conf call.
gh: I've read most of it and am still confused.
ls: I still don't understand it after reading. you'll have to rephrase
    it.
ad: all types and features have id and url.
ls: no, explain in a follow-up email.
ad: ok

[A] Andrew will send follow up email to elaborate on his "ids and urls" use case
[A] Everyone will try to absorb andrew's ids and urls use case

Topic: Style Sheets
-------------------
ad: how do you refer to elements in style sheets, by id or url?
gh: no opinion
ad: if everything is referred to by id, that makes style sheets easier
    to write.
gh: has anyone gotten to implementation of style sheets for das/2?
ad: my proposal was a straw man.

Topic: Status reports
---------------------
gh: reading lots of specs. after yesterday's rant about xml:base,
    implemented a stack last night. works fine for our current server.
    we shouldn't throw out xml:base because of a few edge cases. we
    might want to specify which subset of xml:base we use. checked in
    code for igb client, does capabilities, specify feat, types,
    segments. trouble when modeling sequences.
ee: working on das/2 client. building new widget as gregg asked for.
ad: working with andreas on write-up for registry.
td: understanding the spec. xml parsing.
gh: you are using stax, have experience with it?
td: yes, less painful. streaming api for xml.
gh: tried xom. picky about namespaces. difficult to use with a spec
    that's not stable.
td: some trouble with dom
gh: for sources, types, segments I use dom (small documents). for
    features use sax
nh: progress with apollo. list of versioned sources, show segments,
    user picks, gets features. something that the parser doesn't like.
    not sure where the problem comes from.
sc: working on setting up internal das server on 64bit machine here.
    refining the pipeline for generating files for loading the affy
    das server with updated data for various public and affy data
    sources. also writing up and posting meeting notes.
aday: message from gavin about ontology responses. caching issue
    caused trouble with model/controller. chris's obo dtd.
    dependencies for server rpm were finished. now building the rpm.
td: parsing xml from codesprint server. a few things are matching the
    spec from a few weeks back: prop, loc elements. will these be
    changed?
aday: feature xml?
td: yes. I'm still absorbing the changes, dozens of mails about feat
    properties.
gh: more important is the loc element, splitting into id and range.
    used to be one thing, now is two. one is id, other is
    start,end,strand.
aday: will look into it today.
nh: I'm also taking charge of getting the grant progress report done.
    especially need allen re: server, andreas re: registry.
gh: any reports for write back?
brian: some work on that. not ready for prime time.
gh: roy?
ad: some talk about this: puts and deletes on the urls.
gh: let's talk about it tomorrow.