Notes from the biweekly DAS/2 teleconference, 8 Jan 2007

$Id: das2-teleconf-2007-01-08.txt,v 1.1 2007/01/08 20:05:30 sac Exp $

Teleconference Info:
  * Schedule: Biweekly on Monday
  * Time of Day: 9:30 AM PST, 17:30 GMT
  * Dialin (US): 800-531-3250
  * Dialin (Intl): 303-928-2693
  * Toll-free UK: 08 00 40 49 467
  * Toll-free France: 08 00 907 839
  * Conference ID: 2879055
  * Passcode: 1365

Attendees:
  Affy: Steve Chervitz, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at das/das2/notes/. Instructions on how to access this repository are at http://biodas.org

DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.

Agenda
-------
  * Status reports
  * URIs for global seq IDs

gh: On the das2 get html spec, I took out all xxx comments. Just committed today. Not all are fixed, but I adjusted the language as needed. The remaining xxx comments are still present as html comments. We can keep editing, looking for hidden comments, adding info as we can. I also refined some text discussing coordinate elements and where to find the uri for those. Put in a pointer to the global seq ids page on biowiki, and said it will soon be incorporated in the sanger registry. Todo: alignments, and address global seq id fuzziness. Should be able to clean up the seq id stuff based on today's teleconf, so really just alignment examples.

[A] Gregg will add alignment examples to das2 get spec html version
[A] Announce das2 get spec (html and xml) when alignment examples are complete
[A] Discuss with andreas how to incorporate global seq ids into registry

gh: now we have global seq ids in biowiki, added by lincoln, but editable by anyone (handy). I want to integrate them into the sanger das2 registry, and generate from that (or vice versa) a das2 server that serves those things up. makes sense to have those uris served up as versioned sources in a das2 document. right now, coord uris are only available in the biowiki page.
ad: they only need to be abstract strings if there's no need to resolve them.
gh: want these to be uris for a das2 versioned source.
ad: get ncbi to do it?
gh: not necessary. just make them accessible through the das2 registry. doesn't matter where it is.
ad: how would it be used?
gh: someone can then go to the das2 registry and see all the ones that have a global coordinate system.
ad: can be done now.
gh: want to avoid screen scraping the html page.
ls: need to have a set of urls for parsable documents with coord systems?
gh: we have xml for versioned sources in the segments xml document. there's a direct mapping from v-source id to coord uri, and a direct mapping from segment reference uri to segment id. we need a v-source document where each has a uri for a global coord system. there would be just one capability -- segments, retrievable in the segments xml.
ls: right. who's maintaining it? there are 1000s of genomes.
ad: the coordinate element has other slots, attributes. what else is needed?
gh: there is no central way to look at all coords that are available?
ad: why is that needed? can't you get it from the sources document without looking something else up?
ls: registry could compile a unique list of that and produce a report which has a list of all coordinates followed by all data sources that use that coord system. would be useful. would show who is using various systems.
Spot bugs like two servers using different coord systems for the same taxon. Server could show coord systems on a per-taxon basis, no additional query needed.
gh: another example: part of the problem for me is the relationship between coord uris and segment reference uris.
ls: why? no consistency enforced in spec.
gh: no guarantee that if you use the same coord uri you'll use the same reference uri. another problem: looking at biowiki doesn't tell you how to construct a coord element. coord elem has attribs for taxonomy, authority, etc.
ls: the coordinates url should point to the biowiki page with the correct anchor. need a line for each coord system. I didn't give a url to the coord system.
gh: attribs aren't on the wiki page.
gh: people need to know that's what they should use in the das2 documents. we should show on the page: "name for das2 full coordinates element is 'such and such'". people can then cut and paste.
ls: ok.

[A] lincoln will post additional attributes on global seq id biowiki page

gh: then screen scraping by the registry to grab all coordinates to make them available.
ad: why do screen scraping?
gh: ...
ls: when people add info in the registry, then that goes in.
gh: if using the wrong authority?
ls: it goes in wrong.
gh: would like the server to catch it.
ad: when registering a server, it presents a drop down field.
gh: where does it get that list?
ad: best to ask andreas.
gh: should be within biodas.org
sc: wikification of biodas.org is in progress - can migrate global seq ids page there when it's ready.
gh: might be best to let Andreas decide, to manage it on his server.
sc: he's been active in the initial wikification work, so maybe he'll be ok maintaining/migrating it there.
ls: some confusion over where to add this info (biodas.org, wiki of that, or open-bio.org wiki). only want to do this once.

[A] steve/andreas will complete biodas.org wiki migration and notify all

gh: status continued - alignment examples to spec are still to do.
igb release in december: pays attention to coord uris, so it should be able to match up biopackages and affymetrix das2 server v-sources on the same genome and overlay them rather than making a new genome.
in the next month: working on getting transcriptome data in a das2 server. lincoln had mentioned NCI interest in this. most of the code is in place to serve up affy transcriptome data as graphs, e.g., a datum every 20 bp, with efficient slicing from whole chromosomes to get what you need for a das range query, and bring them into igb as slices. working on serving up in alternative formats (e.g., UCSC's wiggle) rather than just affy binary format.
Return options - graph the size of a whole chromosome (now), or more per region, with each score put in a das score element, which would be a very large document.
ls: be prepared to return in a das2 xml document, each score in an element. not sensible over a whole chromosome, but ok for a limited region. A good form of compatibility.
gh: size issues - could give a 'request too large' error.
ls: could use http compression. NCI will likely never support the specialized format, so if you don't give das2 xml format, it will not be available to that client.
ls: brian gilman - can't pay him to do more work on that contract.
gh: might be able to pay him via affy. need to get this going within the next month, a couple of publications need it.

ls: status:
- took the xml parser for the perl das2 client, cleaned it up, put it on the perl cpan website. it underlies parsing and processing of das2 streams. pure perl, no requirements for c libraries, non-validating sax client (so it's faster than the c libraries). it does namespace handling, multi-threaded. offered as a standard reference. missing handling of features (types, sources, segments) -- big hole.
NCI java client library went thru its approval process, still doing various tests and qualifications before folding into the main NCI source code repository.
Hapmap data source still in progress.
Just the xml parser is on cpan, full server not complete.
gh: using biopackages server?
ls: yes, an instance at CSHL (or will be).

ad: holidays, and was sick for the last month. working on proxy code, rewriting, fixing.
sc: regarding your das2 commitment?
ad: making up for sick time last month. plan is to get the proxy stuff done, then that's it for das.

gh: (more status) also working on getting the das2 feature query support fleshed out. handle any combination of filters.
coming soon: moving to new affy server hardware. on steve's plate.

sc: worked on biodas.org wikification. some server configuration issues. have made a good start with Andreas' help, but more to do. should be in place later this month.
some html get spec edits, fixes.
planning to help gregg set up new hardware for the affy das server. can then support more arrays, genome versions, organisms.
gh: hardware - hoping it would be here. approved by purchasing on 1/5/07. PO likely went out to vendor, so should be in within a week. we have requests in to support more versions, probe set locations for exons on older genome versions.
sc: transcript annotations?
gh: background - affy chp data has no genome location, just probe set id and score. IGB takes that data and merges it with genome info to build heat maps to look at data. Been tricky to determine the most efficient way to do that. Need to have both probe set level and transcript level data. in progress.

[A] steve talk with UCSC about meeting focused on das in feb/march
[A] Next das2 teleconf: 22 Jan 2007
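
Illustration of the registry report Lincoln described in the global seq ID discussion: a unique list of coordinate systems, each followed by the data sources that use it, plus a per-taxon view that flags taxa served under more than one coordinate system. This is only a minimal Python sketch, assuming the registry can already hand back (server, taxon, coordinate URI) records. The record layout, the server names, and the example.org URIs are made up for illustration; none of them come from the DAS/2 spec, the biowiki page, or the Sanger registry.

```python
from collections import defaultdict

# Hypothetical registry records: (server name, taxon, coordinate-system URI).
# All values are invented for this sketch.
SOURCES = [
    ("server-a", "Homo sapiens", "http://example.org/coords/human/build-1"),
    ("server-b", "Homo sapiens", "http://example.org/coords/human/build-1"),
    ("server-c", "Homo sapiens", "http://example.org/coords/human/build-2"),
    ("server-d", "Mus musculus", "http://example.org/coords/mouse/build-1"),
]


def coords_report(sources):
    """Map each coordinate URI to the sorted list of servers that use it."""
    report = defaultdict(set)
    for server, _taxon, coord_uri in sources:
        report[coord_uri].add(server)
    return {uri: sorted(servers) for uri, servers in report.items()}


def coords_by_taxon(sources):
    """Map each taxon to the sorted list of coordinate URIs seen for it."""
    by_taxon = defaultdict(set)
    for _server, taxon, coord_uri in sources:
        by_taxon[taxon].add(coord_uri)
    return {taxon: sorted(uris) for taxon, uris in by_taxon.items()}


def taxa_with_multiple_coords(sources):
    """Return only the taxa that appear under more than one coordinate URI."""
    return {
        taxon: uris
        for taxon, uris in coords_by_taxon(sources).items()
        if len(uris) > 1
    }


if __name__ == "__main__":
    for uri, servers in sorted(coords_report(SOURCES).items()):
        print(uri)
        for server in servers:
            print("    " + server)
    for taxon, uris in sorted(taxa_with_multiple_coords(SOURCES).items()):
        print("check %s: %d coordinate systems in use" % (taxon, len(uris)))
```

A taxon flagged by taxa_with_multiple_coords is a candidate for review rather than a confirmed bug, since two servers may legitimately serve different builds of the same genome.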