Notes from the weekly DAS/2 teleconference, 23 Oct 2006

$Id: das2-teleconf-2006-10-23.txt,v 1.1 2006/10/24 01:15:21 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Gregg Helt, Ed Erwin
  UCLA: Allen Day
  Dalke Scientific: Andrew Dalke

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda
-------
  * Status reports
  * Spec discussion

Status Reports
---------------
[Note: lots of digressions within status reports]

ad: Have been looking at how Tim Hubbard's group is using das/1.
gh: you are acting as our proxy to the uk group.
gh: andreas has been working on das registry.
ad: yes, in use for both das/1 and 2 servers.
gh: am interested in his work to ping servers to test for live-ness.
gh: see my response on das discussion list to Brian Gilman's message.
    where to find das/2 servers to hit on.
    biopackages was not giving correct answers for sources query.
ee: was true two weeks ago.
aday: just a bug.
gh: we need to get both servers fixed.
    need an automated way to figure out when servers are down, such as
    what andreas is doing with das/1.

[A] Andrew will ask Andreas about live-ness test for das/2 as well.

gh: andrew's validator could be scripted to do this, too.
gh: your validator is not running, btw.
ad: server rebooted, not set up to restart automatically.

[A] andrew will see that his validator server is up (done).

gh: affy server is serving up incorrect xml base now.
    code is set up to allow choosing which xml base to use.

[A] steve will fix xml base on affy server

gh: need to use four arg version: port, data dir, email for
    maintainer, xml:base
    without xml:base, everything goes screwy
gh: Andrew's validator should catch this since xml:base resolution of
    capabilities would resolve to local host which would throw an
    error.
ad: yes.

gh: Andrew: you are focusing on das now?
ad: this week at EBI, then next month focusing on DAS work.

Status (continued)
-------------------

gh: this week - distracted by igb issues, also on 1/2 time this month,
    so no new das work to report.

ee: gff3 parser, got feedback from lincoln.
    adding support for track lines in several of our parsers.
    there is a diff between the way igb puts things into tracks and
    the way the ucsc browser puts things into tracks.
    in igb: we put things into tracks based on source field, so one
      file can lead to multiple tiers.
    in ucsc: everything below track line goes into one track.
    Soln: if there are track lines, do it the way UCSC does it.
      Otherwise, do it the igb way.
    Also worked on coloring by score (affects gff, ed, and one other).
    Makes it similar to ucsc. Assumption is white background.
    It is rigged to be based on normal foreground and background
    colors. white = ucsc
    Also participated in the java "ask the experts" thing: asked about
    swing, but they didn't answer.
gh: das2 style sheets?
ee: yes, how free am I to change that spec?
ad: go for it.
ee: don't want spec to say you need to use certain shaped glyphs --
    hard to support. just simple things - colors, labels.
ad: asked uk folks about style sheets, they haven't done anything.
gh: gbrowse (lincoln) uses style sheets for das/1.
ee: the stuff in das/2 comes from das/1?
ad: yes, with some changes.
ee: also need to do documentation.

sc: worked on adding data for currently unsupported arrays on the Affy
    DAS/1 server to the quickload directory.
    Got some requests for mouse assembly aug 2005, RG-U34 rat arrays.
    Didn't update the annots.txt yet, so IGB users won't know they are
    available.

[A] steve will update affy quickload annots.txt

sc: ideally, this should be automated.
gh/ee: could possibly have IGB detect these without needing to update
    an extra file. But there was no standard way to read directory
    contents.
gh: chp files have no genomic location for probe sets, so igb needs to
    look this up, likely via das/2 server. primary way for people to
    look at results in igb.
sc: did some work on loading exon array annotations into das/2 server
    using gregg's new bp2 format (reported last time).
    Didn't see any justification for the "probeset with zero probes"
    error it threw.

[A] gregg and steve will look into bp2 format parsing issues
[A] gregg will put in order for new hardware for affy das server

aday: porting gff3 into writeback server as an alt format for loading
    data in. Email thread with Ed - ambiguities in the gff3
    specification

[A] Allen will forward email to list.

aday: some communication with lincoln's group, re: validator.
    I need to create some sample gff3 docs to make sure validator can
    parse them all. will be adding support to parser in bioperl
    (likely).
    Re: alignments: target and source have to be stranded, length of
    one has to be equal to or less than the one it's aligned to, etc.
    No work on server uml. hold off until spec is finalized before
    committing to uml model. E.g., fasta response not mentioned,
    broken hyperlinks, no response from Andrew.
gh: fasta?
aday: referred to but not described. properties response mentioned but
    not described. fasta has been replaced by segments, properties
    gone. See email on list.
sc: sequence retrieval command used to return fasta format, hence the
    fasta request. this has been replaced with segments, but spec not
    updated.
gh: property capability?
aday: yes. not sure how to proceed yet.

[A] Andrew will fix/respond to issues raised by Allen.
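
Sketch of the track rule from ee's status report (ucsc-style grouping
when the file has track lines, igb-style grouping by source field
otherwise). This is only an illustration under stated assumptions, not
igb code: the function name assign_tracks, the "default" and "trackN"
fallback names, and the name= parsing are all hypothetical, and feature
lines are assumed to be tab-separated with the source in column 2.

```python
import re
from collections import defaultdict

# Matches name="quoted value" or name=bare_value on a track line.
_TRACK_NAME = re.compile(r'\bname=(?:"([^"]*)"|(\S+))')


def _is_track_line(row):
    # A track line starts with the literal word "track".
    return row.split(None, 1)[0] == "track"


def assign_tracks(lines):
    """Group GFF-style feature lines into tracks (hypothetical helper).

    With track lines present: every feature goes into the track opened
    by the most recent track line (the ucsc way).
    Without track lines: features are grouped by the source field, so
    one file can produce several tiers (the igb way).
    """
    rows = [ln.rstrip("\n") for ln in lines
            if ln.strip() and not ln.startswith("#")]
    ucsc_mode = any(_is_track_line(r) for r in rows)

    tracks = defaultdict(list)
    current = "default"      # features seen before any track line
    n_track_lines = 0
    for row in rows:
        if _is_track_line(row):
            n_track_lines += 1
            m = _TRACK_NAME.search(row)
            if m:
                current = m.group(1) if m.group(1) is not None else m.group(2)
            else:
                current = f"track{n_track_lines}"
            continue
        key = current if ucsc_mode else row.split("\t")[1]
        tracks[key].append(row)
    return dict(tracks)
```

With no track line, a file containing genscan and refseq features ends
up as two tiers; prepending a single track line puts both features into
that one track.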
gh: another spec issue: last code sprint I didn't like semantics of
    range feature filters, I eventually caved to majority.
    caveat: I wanted an optional attrib in types doc to say: "here's a
    type but you can or cannot use it in search filter." I.e.,
    optionally restrict which types you can use in those filters.
    If false, it indicates to client it shouldn't use it as a
    searchable thing.
ad: if it does anyway?
gh: server could throw an error
ad: or not return any results of that type?
gh: ok
ad: reason for this? is there a better word than 'searchable'? w/r/t
    the problem domain.
gh: the reason: I want people to search for 'genscan transcripts' not
    'genscan exon' because of how we decided to do range queries.
ad: not sure why someone would want to do this.
gh: it was agreed on at last code sprint...

[A] gregg will write up the use case underlying his range feature filter request

ad: Regarding parent and child bidirectional feature pointers:
    I'm willing to say that there's no need to assemble features
    dynamically in a streaming approach. so we can get rid of parent
    or child relationship. make it more like gff3 to have parent link
    only.
gh: worried about not having full closure. could get parents that
    don't know about child. if you have child, do you then have to
    have every parent in the response?
ad: I thought we required it? if there is a feature then all features
    in that group must be returned.
ee: never a fan of specifying both parents and children. can lead to
    mistakes - not compatible. andrew says parsing is more
    difficult...
ad: when processing input you know when done with a feature group.
    this is useful. if no one impls it why have the overhead?
ee: impl doesn't seem difficult
gh: my impl doesn't catch cycles. still have to do cycle check
    regardless of whether it is bi-directional.
ad: can't find a simple algorithm for doing it.
gh: keep children around. check if tree is complete.
    bidirectionality allows me to crawl tree.
ad: you don't check for cycles or multiply rooted trees.
ee: just assume there are not such problems.
ad: I don't like bogus data.
ee: my gff3 parsing, I wait until end to assemble things.
ad: as mine does, too. worried that extra fields mean more
    possibilities of breaking things. bad data.
ee: should be able to detect bad data.
ad: duplicate links means you can't assemble from one but not other.
    most people will not check both.
gh: main justification was to get complete feats before end of doc.
    lincoln was the one who wanted this ability.
ad: several ways to do it. e.g. contained feature elements with all
    children, spanning tree, etc.
ee: catching loops is hard, need to wait till end.
gh: let's wait till lincoln comes in.

[A] Everyone will revisit bidirectional parent-child pointers with Lincoln

Other issues:
-------------
ad: Regarding Brian's question from email, the xml document he sent.
gh: my reply: document was otherwise correct but xml:base was wrong.
ad: also: lowercase close types element at end.
ad: know anything about brian's deadline mentioned by lincoln?
gh: no.

[A] Someone will send Brian pointer to Andrew's validator.

ee: das/2 impl is not usable by igb now. need to fix top-level
    document.
gh: we really need an automated way to know when server is having
    problems.
gh: conf call with Andreas and others in UK? can set up a conf call to
    talk about registry.
    Also coordinate mapping - when one system is the same as the
    other. ties into registry stuff.

[A] Gregg/Andrew maybe will have conf call with Andreas while Andrew is in UK
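
Sketch for the parent/child pointer discussion: one possible way to
check a feature group for missing parents, multiple roots, and cycles
when only parent links are kept (gff3-style) and assembly waits until
the whole group has been read. This is an assumption-laden illustration,
not anyone's actual implementation: check_feature_group, the
dict-of-lists input, and the feature ids are hypothetical and not part
of the DAS/2 spec. It uses a Kahn-style topological walk, which is one
standard technique among several.

```python
from collections import defaultdict, deque


def check_feature_group(parents):
    """Return a list of problems found in one feature group.

    `parents` maps each feature id in the group to a list of its parent
    ids (an empty list marks a root). An empty result means: every
    referenced parent is present, there is exactly one root, and the
    parent links contain no cycle.
    """
    problems = []
    children = defaultdict(list)
    for child, plist in parents.items():
        for p in plist:
            if p not in parents:
                problems.append(f"{child}: parent {p} missing from group")
            else:
                children[p].append(child)

    roots = [f for f, plist in parents.items() if not plist]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")

    # Kahn-style walk: a feature is released once all of its parents that
    # are present in the group have been visited. Any feature that is
    # never released lies on a cycle or hangs below one.
    pending = {f: sum(1 for p in plist if p in parents)
               for f, plist in parents.items()}
    queue = deque(f for f, n in pending.items() if n == 0)
    visited = 0
    while queue:
        f = queue.popleft()
        visited += 1
        for c in children[f]:
            pending[c] -= 1
            if pending[c] == 0:
                queue.append(c)
    if visited != len(parents):
        stuck = sorted(f for f, n in pending.items() if n > 0)
        problems.append("on or below a cycle: " + ", ".join(stuck))
    return problems
```

The walk is linear in the number of features plus links, but it does
need the complete group in hand, which matches ee's point that loops
can only be caught by waiting till the end.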