Notes from the biweekly DAS/2 teleconference, 5 Feb 2007

$Id: das2-teleconf-2007-02-05.txt,v 1.1 2007/02/05 19:08:10 sac Exp $

Teleconference Info:
  * Schedule: Biweekly on Monday
  * Time of Day: 9:30 AM PST, 17:30 GMT
  * Dialin (US): 800-531-3250
  * Dialin (Intl): 303-928-2693
  * Toll-free UK: 08 00 40 49 467
  * Toll-free France: 08 00 907 839
  * Conference ID: 2879055
  * Passcode: 1365

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Andreas Prlic
  UAB: Ann Loraine
  UCLA: Allen Day, Brian O'Connor

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/. Instructions on how to access this repository are
at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda
-------
  * Open discussion
  * Status reports
  * Overall plans, timelines from now until end of May (end of grant
    term)
  * Global genome assembly and sequence IDs, how they fit with the
    registry

Open discussion
----------------

ap: biosapiens meeting this month, announcement sent to das/2 list.
gh: Ed and I will be attending.

[A] Gregg/Ed meet w/ Andreas prior to biosapiens mtg, Sat during day.

al: anyone interested in setting up das for plants, starting with
    Arabidopsis, then rice and poplar?
ls: we have a das/1 server for rice in Gramene.
al: didn't see it on the public website, sent feedback, no response. I
    want to set something up to viz data in IGB. better to use das/2.
    rice: annotations from Robin Buell's group and an Arizona group,
    plus a new genome from moss in a few months.
    Two tiling array data sets for Arabidopsis, one from Joe Ecker and
    another based on Nimblegen. Everyone has Arabidopsis v6, but some
    cDNA-to-genome alignments are scattered in different places. Good
    to capture in das. Hard to distribute as a giant file (as I've
    done).
ls: Happy to send all data from Gramene if you want to set it up in
    das/2: rice annotations, cross-species alignment data. Poplar is
    being added. Can give it in Ensembl format (schema database files).
al: Interested.
bo: which Linux distribution: fc2, fc5, or gentoo? you can install via
    rpms (prepared via biopackages.net). I can help.
gh: for tiling array data, we're working on support for that in the
    affy server; might be worth looking into.
al: timeline? something working in the next two months. Announce on
    the Arabidopsis mailing list etc. access data programmatically via
    das and viz via IGB, GBrowse, etc. Want to publish it as well, so
    authorship is available. Can compensate anyone who wants to help
    out.

Registry
-----------

al: would be great if more people would use it. E.g., can Gramene
    register?
ls: fair enough.

[A] Lincoln will register Gramene das servers in the Sanger das registry

gh: regarding the das meeting that Andreas is setting up...
ap: biosapiens is a European-funded project. goal: to annotate the
    human genome. committed to using das, mainly das/1, mainly protein
    sequences. the meeting focuses on people doing das client-side dev;
    want to invite technical folks, network, share ideas, find
    synergies between groups. So far 24 have registered.
ee: how are reservations? enough space for accommodations?
ap: yes, rooms in the Hinxton conference center; working on writing
    confirmations. after the client meeting, there will be a meeting
    on annotation type ontology work, a project to standardize the
    annotations being provided. I am synchronizing with this person
    (Henning). Everyone who wants to can give a short presentation
    (15 min): what features are needed, etc.
gh: would love a session about das/2 addressing those needs: how well
    does it address the needs of people doing protein das?
ap: summarize new features, etc.

[A] Biosapiens mtg talks: Ed will talk about IGB, Gregg on das/2 spec

gh: what's useful is seeing whether das/2 will meet the needs of the
    protein das world, since there hasn't been much focus on proteins
    so far. Esp. multiple sequence alignment.
ap: we'll have people from Pfam, Jalview.
al: Jalview is a very nice program, impressive work.
gh: some integration with Apollo (can use Jalview to view MSAs), I
    believe.
gh: would it be useful for us to be at the biosapiens-specific part of
    the meeting?
ap: I'll send you the program.
gh/ee: interested in what the biosapiens project is up to in general.

[A] Andreas will send Gregg info about biosapiens meeting, program,
    contacts

Topic: Global seq ids
---------------------

gh: in the das wiki pages we have a global seq ids page, summarizing
    the standard URIs for coord systems and sequences. wondering how
    to sync this with registry coordinates.
ap: if the wiki is the point of reference and the registry sucks this
    in automatically, then someone who breaks the wiki with bad text
    also breaks the registry.
gh: yes, but we want it to be editable so that users can add new
    organisms. not sure of the mechanism.
sc: Allen Day's wiki module converts to a DOM. maybe?
aday: not a DOM, but a LaTeX format. there are some other modules out
    there.
ap: URIs are not linked to the actual project doing the sequencing.
gh: we talked about this a couple of weeks ago. now it's just text:
    "march 2006", then "uri". proposal: add "Here's the coordinates
    fragment you should have in your das/2 xml request". Then the
    registry could parse this document and just look for coordinates
    elements to see what's defined here.
ap: yes. for the das/2 registry, I initially wrote code that is
    compatible with das/1 and 2. Now it's hard to write code that can
    work with both; need to re-write das/2-specific code. Thought there
    could be a generic interface, but there are too many small
    differences.
gh: yes.
    lots of things are more tightly defined in das/2. can see why it
    would be difficult. goal now: someone who's setting up a das/2
    server can see what their coord URI should look like. later goal:
    screen scraping, syncing with registry.
ee: is the registry for das/2 auto-generated?
ap: yes, coming from my code that works with both versions. but it
    won't scale with all the new features.
gh: Lincoln said he'd put those snippets on the biodas seq ids page.
    what's the status?
ls: still willing to do this when ready.
sc: still planning to migrate the global seq ids from the open-bio.org
    wiki into the new biodas.org wiki. simple matter of cutting and
    pasting html and setting up a pointer from the old page. Will focus
    on finishing this week.

[A] Steve migrate global seq ids page from open-bio wiki to biodas wiki.

Goals and timelines
--------------------

[A] All: send goals and timelines thru end of May to the grant list:
    das2grant@lists.open-bio.org
    This will help us figure out where we can get to by the end of the
    grant period.

Status:
--------

ee: at end of March, I'm moving to a different project, won't have
    lots of time post-March for das/igb. my timeframe is therefore
    shortened, unfortunate given my long list. will focus on making it
    easy for others to add plugins to extend igb, interacting with igb
    via http protocol, ensuring igb uses std file formats for easier
    sharing with other apps, including stylesheets. Some other little
    things. Need a bug fix release for igb soon. Would also like to
    make interaction with the das/2 registry better. Problem now: it
    doesn't realize when two genome versions are the same. Have been
    talking to Andreas about this. will be available sporadically,
    perhaps. unfortunate, since I've noticed that use of igb has gone
    up rapidly in the past couple of months.
sc: if it keeps going up they'll have to put you back on igb!
ee: purely a budgetary issue, not because igb is considered
    unimportant, but just a need to shift resources to the new project
    without hiring new devs.
gh: my das percentage will go up in March; focus on getting a good
    paper out on the das/2 spec, submit to open access journals: PLoS,
    BioMed Central, etc. Have been sick for the last week, so not much
    progress. sent out schedule for my goals on the das2grant list.
    Review:
      * additions to retrieval spec doc (html), diff kinds of features
      * affy chp file viz in igb, leveraging calls to das/2 to get
        genome locations for experimental results. slowly but steadily.
      * merging quickload functionality into igb, completely via das/2
        but hides UI
      * bug fix release for igb
      * genometry and das/2 server: efficient retrieval of slices of
        data. that server is incomplete re: feature filters; bringing
        it up to spec re: arbitrary combinations of feature filters.
      * biosapiens meeting
      * writeback impl focus in March
      * major igb release toward end of March
      * das/2 paper in April, submit in May.
sc: will the paper focus on retrieval only?
gh: want to get writeback going in March, to include in the paper.
    it's a major part of the spec. stable, but untested.
sc: configuring the affy das/2 public server, adding support for more
    arrays and genome versions. Want to focus on streamlining the
    pipeline that keeps annotations up-to-date on the affy das/2
    server. Also, will help as needed on supporting exon array features
    (and related gene-level support). Working on wikifying biodas.org.
    Plan for this to be completed this month. Andreas has helped here.
aday: was working on a manuscript, so not as much das work done, but
    did get env set up to start writing UML, stubbing out files. Am
    optimistic now: lots of off-the-shelf components that make it easy
    to build a das/2 server package, e.g. from biopackages, gbrowse,
    blat, blast. Shipping with a yeastgenome.org data file and a
    Bio::DB::GFF memory adaptor to query and serve; need to work on
    writing out das2xml.
gh: this will be a second backend.
aday: a first backend for the current project.
aday: adding binaries for seq searches; want to try dynamic features,
    e.g., primer design, or submit a query and get back hits. all part
    of the same code base: couple this with the existing chado backend
    and writeback code, unifying into a common code base. working on a
    flowchart diagram. still working on that; doable by end of grant
    period.
gh: can you break it down month by month?
aday: want to put two weeks solid on it, barring derailment by other
    priorities. also want to participate in the publication. need to
    consult with my advisor on that, maybe part of my dissertation.
    shooting for graduating this summer.
gh: great. planning to submit by then. there will be enough to say
    about the spec w/out getting into impl. could point at reference
    impls.
aday: in lieu of a das/2 publication, am referencing the biodas.org
    site for a manuscript we're submitting soon: a server with 60,000
    array result files, 20K of which are HG-U133.
gh: planning for all to contribute to the das/2 ms. We can work out
    detailed contribs later.
bo: working on graduating now (March 21st defense). full-time work on
    the grant after that: whatever is left on the refactor, packaging,
    documentation from the biopackages perspective. working with Mark
    Carlson on a das/2 igb client in another project. for the time
    being, full-time focus on graduation. can do maybe up to a month of
    full-time work. will still attend conf calls to keep tabs.
ap: several things: editing on the biodas.org wiki page; needs more
    work. protein structure das applications: CASP protein structure
    prediction, SCOP, collection of protein structure alignments,
    making web pages via das; wrote a paper on this. for Ensembl, it's
    making a large amount of data available: set up ~17 das sources,
    working on a registration server for that, allowing people to
    upload data as well.
gh: anyone actively working on das/2 dev at Sanger/EBI besides the
    registry?
ap: don't know, I'm on a different grant.
gh: yes, the ball is in our court re: das/2, but haven't heard much
    from the UK folks yet. Andrew had an idea on das1-das2 (proxy
    server).
ap: yes, will be very useful.
gh: then the registry will be able to list das2 as well as das1
    servers.
ap: his status on that?
gh: he hasn't called in in a while, but that was his focus for the
    remainder of his contribution to the grant.
sc: he was re-engineering to deal with some speed issues. haven't heard
    latest status.

[A] Gregg ask Andrew about das1-das2 proxy status, mention at
    biosapiens mtg

Wrapup:
---------

[A] Everyone get their goals and milestones to Gregg ASAP (via the
    das2grant list)
[A] Next teleconf: 19 Feb