Posts

Showing posts from February, 2009

code4lib 2009: Day 1+2

Day 1: LibX2. LibX Edition builder. Build custom version of LibX. Xtensible Catalog. Drupal, LMS, NCIP, LMS integration: Blackboard. webcast. scriblio: Social Library System, WordPress-based plugin. enjoysthin.gs. Mark Matienzo. anarchivist. Rich contextual bookmarklet. Emily Lynema. NCSU Libraries. E-Matrix: Open Source ERM. Eric Lease Morgan. Alex4. Erik Hatcher. Lucid Imagination. Lucene/Solr. Index of Lucene apache site. Mike Taylor and Mike. Index Data. Translucent record store == "Torus". pazpar2. Registry of searchable targets? Hard to do. IRSpy: Z39.50. Mike Beccaria from Paul Smith's College. Microsoft DeepZoom. "Like microfiche" - audience. Photosynth of library stacks?? Dan Chudnov, LOC. BagIt File Package Format.
Random things heard and seen: citation style language, Open Vocab, UCSD Libraries Digital Asset Management System, SWORDS. Distributed version control: monotone, mercurial, bzr.
Day 2: A new frontier - the Open Library Environment (OLE) -- T

code4lib: Sebastian Hammer quote

"If you have something to say, you should release it as code..." - Sebastian Hammer, Index Data

code4lib update: LuSql talk done; Lucene, Solr links

Gave my LuSql talk today at code4lib2009 and didn't get cut down by any Solr/Lucene dudes! Met Erik Hatcher of Lucene/Solr fame (and now of Lucid Imagination fame), and hopefully we can collaborate on some Lucene/Solr indexing stuff in the future. I also spoke with Tom Burton-West of UMich about Lucene indexing and search performance for their 1M+ Google Books index (they use Solr). These are documents that are a lot longer than the STM articles I work with. They have 220GB indexes and, because they have to keep stop words to support phrase searching in their Humanities texts, suffer from poor query performance (despite 32GB RAM). I pointed to some of my previous work on high performance indexing and searching [1, 2, 3]. I'd like to get at their data to examine some performance issues in Lucene, both on the indexing and searching side. I was wondering if Solr is configurable for the initial/max number of IndexSearchers. I couldn't find this in the Solr wiki, but did see in
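The stop-word trade-off mentioned above can be illustrated with a toy positional index. This is only a minimal sketch in Python, not Lucene or Solr code: the stop-word list, the function names and the two sample documents are all made up for illustration. It shows that dropping stop words makes a phrase built from them unfindable, while keeping them means the index has to store position lists for very common words.

```python
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"to", "be", "or", "not", "the", "of", "a"}


def build_positional_index(docs, drop_stop_words=False):
    """Map each token to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            if drop_stop_words and token in STOP_WORDS:
                continue  # the token is never indexed
            index[token][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Return the ids of documents containing the exact phrase."""
    tokens = phrase.lower().split()
    if not tokens or any(t not in index for t in tokens):
        return set()
    hits = set()
    for doc_id, starts in index[tokens[0]].items():
        for start in starts:
            # Every following token must sit at the next position.
            if all(start + i in index[t].get(doc_id, ())
                   for i, t in enumerate(tokens)):
                hits.add(doc_id)
    return hits


docs = [
    "to be or not to be that is the question",
    "the state of the art",
]
full = build_positional_index(docs)
stripped = build_positional_index(docs, drop_stop_words=True)

print(phrase_match(full, "to be or not to be"))
print(phrase_match(stripped, "to be or not to be"))
```

With the stop words kept, the phrase is found in the first document; with them dropped, none of the phrase's words are in the index at all. On a real collection the position lists for words like "the" grow with every occurrence in every document, which is one plausible reason phrase queries over them get expensive.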

"Elvis impersonators as XML documents"

As heard at code4lib2009: "Let's say all of these Elvis impersonators are XML documents..." - Mark A. Matienzo, New York Public Library

code4lib pre-conference: Linked Data et al...

I am at the exciting and arcane code4lib 2009 conference here in Providence, Rhode Island. Right now I am at the pre-conference on Linked Data. I had forgotten that Rhode Island, and more specifically Providence, are the old stomping grounds (and location for many short stories and novels) of H.P. Lovecraft. And - this morning - I was talking to Ross Singer about this, and realised how this all made sense: when I first met Ross at an Access conference a number of years ago, the first thing I thought on meeting him was, "Cthulhu"! He of course denied being one of the Elder Things and then levitated across the room from me. But I think this explains a lot of things... ;-) We will have to see what other Links I make at this conference. :-) Oh, BTW I will be giving a presentation tomorrow morning on LuSql. Feel free to drop in. :-)

Obama in Ottawa and the Obama - Portuguese Water Dog Effect

Today, of course, U.S. president Barack Obama is visiting Ottawa. Much of the city is shut down for the visit, at least from the perspective of getting around the city. Many are excited about his first visit outside of the U.S. as president. In related but much more minor news, my sister and mom breed and show Portuguese water dogs (see MacDuff Kennels Portuguese Water Dogs), a breed that is a candidate for the Obamas' next family dog. While their web site is rather anemic due to my own general neglect of the site, I do have Google Analytics turned on, and we noticed a real spike on the site around the time of the U.S. presidential inauguration on Jan 20. Basically the site traffic doubled from its background level. Here is the graph of the time around the inauguration: By the way, if you are looking for a PWD puppy, my sister has 2 new litters born Dec 26 (8 puppies, most sold) and Jan 3 (10 puppies, some still available).

Java, MySql increased performance with Huge Pages

[Resources updates: 2010.07.07, 2009.11.03, 2009.05.27] Long-running, large-memory, high-performance applications often have special needs with respect to their memory management. On Linux, Solaris and other modern OSes, the translation lookaside buffer (TLB), working with the default page size of 4k on many CPUs/OSes, becomes a scalability issue under these extreme conditions. To get around TLB scalability issues, huge page sizes are used to reduce the impact on performance. This can be of use to installations with large scale Java, MySql and other large memory applications. 300% improvement: " Well, in my case, I was able to achieve an over 3x improvement in my EJB 3 application, of which fully 60 to 70% of that was due to using large page memory with a 3.5GB heap. Now, a 3.5GB heap without the large memory pages didn't provide any benefit over smaller heaps without large pages. Besides the throughput improvements, I also noticed that GC frequency was cut down by two-third
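To get a feel for the scale involved, here is a rough back-of-the-envelope sketch in Python of how many pages the 3.5GB heap from the quote above spans at two page sizes. The 4k figure comes from the post; the 2 MB huge page size is an assumption (a common default on x86-64 Linux, but it varies by CPU and OS), and the function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(heap_bytes, page_bytes):
    """Number of pages required to map heap_bytes (ceiling division)."""
    return -(-heap_bytes // page_bytes)


heap = int(3.5 * GIB)            # the 3.5GB heap from the quote
small_page = 4 * KIB             # default page size mentioned in the post
huge_page = 2 * MIB              # ASSUMPTION: typical x86-64 huge page size

small_count = pages_needed(heap, small_page)
huge_count = pages_needed(heap, huge_page)

print(f"4 KiB pages: {small_count}")                  # 917504
print(f"2 MiB pages: {huge_count}")                   # 1792
print(f"reduction:   {small_count // huge_count}x")   # 512x
```

Since a TLB only holds a limited number of entries, going from roughly 900 thousand pages to under two thousand means far fewer distinct translations compete for those entries. How much of that turns into real throughput depends on the workload and hardware.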

ICSTI2009 "Managing Data for Science" Conference in Ottawa

From the site: ICSTI's 2009 Public conference will take place on June 9 and 10, 2009 at Library and Archives Canada, 395 Wellington Street, Ottawa, Ontario, Canada. Speakers from Canada, the United States and Europe will address:
How eScience affects the way libraries, publishers and scientists relate to each other.
How the era of "big data" will enable enhanced experimentation and collaboration in science.
The program includes presentations from:
Francine Berman, San Diego Super-Computing Center, California
Richard Boulderstone, Director of e-Strategy and Programmes, British Library, UK
Jan Brase, German National Library of Science and Technology, EU
Lee Dirks, Microsoft, State of Washington
Peter Fox, Rensselaer Polytechnic University, State of New York
Paula Hurtubise, Project Manager, Carleton University, Ottawa
Liz Lyon, UKOLN, University of Bath, UK
James Mullins, Purdue

Open Access for Quebec FRSQ funded health research publications

The Quebec health research fund (Fonds de la recherche en santé du Québec (FRSQ)) has mandated that all fully or partially funded projects publish their research outputs ("peer reviewed publications") to an open access site no later than 6 months after publication. This policy went into effect January 1, 2009. While this is excellent news, it would have been even better news if the policy took a broader view of research outputs and included research data in its open access policy, mandating research data management and release, similar to the policies of CIHR, Ontario Institute for Cancer Research (OICR), NIH, Genome Canada, UK Research Councils, the Australian government, the European Research Council (ERC) and others. The benefits of sharing research data - especially health data - are well documented.

NSF-sponsored workshop on Cyberinfrastructure Software Sustainability

This workshop - to be held at Indiana University March 26-27, 2009 - examines the question: "given millions of dollars invested in initiating software development, how is software that will be important to the US research and engineering communities identified, maintained, and supported over years to decades?" This is, of course, also a question of interest for other countries and their cyberinfrastructure initiatives.
Workshop Goals: The goals of the Cyberinfrastructure Software Sustainability and Reusability workshop are as follows:
Examine software evaluation and adoption models by individual research labs and virtual organizations
Examine models for long-term software sustainability – the ability to obtain the software one wants with assurance, obtain the information required to use the software, obtain the software and hardware environments required to run the software, and use the software.
Discuss mechanisms for supporting sustainability, including direct government support, unive