Posts

Showing posts from 2008

Web (marketing) controlled experiments == No informed consent?

Kohavi et al. [1] is an extremely useful survey and guide to controlled experiments on and using the web, told primarily from a marketing perspective. It introduces and describes various experimental methods, examines the technical and organizational challenges of running controlled experiments, and delves into various issues of experimental design. It is, for the most part, an excellent resource for anyone wanting to do these kinds of web-based controlled experiments. While I know this article is marketing-oriented, it is clear that some of the results from these experiments will be, or have been, published in peer-reviewed journals. Yet the authors make no mention of informed consent, even as an aside, in the entire article (and no mention of privacy or privacy issues either). Some of the experiments described or cited are not too different from those that might be done in social sciences or IT user interface research, where researchers are usually required to go through an ethics review

Open Standards and standards organizations

This report, from January 2008, examines 10 "open" standards organizations and evaluates how "open" they are. It uses a methodology that maps directly into Krechmer's open standards requirements. The organizations reviewed are: CEN (European Committee for Standardization); Ecma (European association for standardizing information and communication systems); ETSI (European Telecommunications Standards Institute); IETF (Internet Engineering Task Force); ISO (International Organization for Standardization); ITU (International Telecommunication Union); NIST (National Institute of Standards and Technology); OASIS (Organization for the Advancement of Structured Information Standards); OMG (Object Management Group); W3C (World Wide Web Consortium). Evaluation of Ten Standard Setting Organizations with Regard to Open Standards. Abstract: On 2 June 2006, the Danish parliament (the Folketing) unanimously adopted Parliamentary Resolution B103 on the use of open standard

Article: Canadian Federal Support for University Research Commercialization

Rasmussen [1] does a thorough examination of Canadian federal government programs and organizations supporting the commercialization of university research. This work is based on background research and interviews in January 2006 with 28 "...policy makers, program managers, policy researchers, university administrators, and program users..."; in addition, "A case description was written based on the collected material and later verified by several key people at Canadian agencies". For those following federal university commercialization activities, this work is an excellent review of the recent state of these programs, activities and organizations. It should be noted that this research is part of a larger and broader research effort [2] benchmarking commercialization of research in Canada, Finland, Ireland, the Netherlands, Scotland, and Sweden. The list of the Canadian interviewees can be found in this larger work (p.52). Programs: " Compared to most co

Uncertainty Reasoning for the Semantic Web I

Uncertainty Reasoning for the Semantic Web I, ISWC International Workshops, URSW 2005-2007, Revised Selected and Invited Papers. DOI http://dx.doi.org/10.1007/978-3-540-89765-1, Lecture Notes in Computer Science. Of note: Towards Machine Learning on the Semantic Web. http://dx.doi.org/10.1007/978-3-540-89765-1_17 Author copy: http://www.cs.ubc.ca/spider/poole/papers/SemSciChapter2008.pdf Semantic Science: Ontologies, Data and Probabilistic Theories. http://dx.doi.org/10.1007/978-3-540-89765-1_2 Analogical Reasoning in Description Logics. http://dx.doi.org/10.1007/978-3-540-89765-1_19 Table of Contents: Fernando Bobillo, Miguel Delgado, Juan Gómez-Romero. 2008. A Crisp Representation for Fuzzy SHOIN with Fuzzy Nominals and General Concept Inclusions. http://dx.doi.org/10.1007/978-3-540-89765-1_11 Mauro Mazzieri, Aldo Franco Dragoni. 2008. A Fuzzy Semantics for the Resource Description Framework. http://dx.doi.org/10.1007/978-3-540-89765-1_15 Matthias Nickles, Ruth Cobos. 2008. An Approa

The (near) Future of Research Articles

Rod Page's demo for his Elsevier Grand Challenge submission ("Towards realising Darwin’s dream: setting the trees free") shows the type of enrichment of biological, if not all research, articles that is quickly becoming possible. Taking a published article ("Mitochondrial paraphyly in a polymorphic poison frog species (Dendrobatidae; D. pumilio)"), various additional biological, geographical and other metadata are extracted and added to a web page for the article. These include: a map showing all localities mentioned in the paper, with their enclosing polygon; a list of other studies which have samples in the area enclosed by the study polygon. Each of the following is linked through to its underlying database (such as NIH accession number and NCBI nucleotide viewer, or the ubio taxonomic name viewer record): list of sequence features (such as genes) in the article; list of taxa sequenced in the article; list of gene sequences cited by the article. An image c

Lucene 2.3.1 vs 2.4 benchmarks using LuSql

I have been doing some indexing performance tests with LuSql, and have some numbers comparing Lucene 2.3.1 with 2.4. Despite some discussion about 2.4 having poorer indexing performance, my tests with LuSql 0.9 suggest otherwise:
Lucene 2.3.1
Number of records added= 2000000
Optimizing index
Closing index
Optimizing index time: 311 seconds
Closing JDBC: result set
Closing JDBC: statement
Closing JDBC: connection
***********
Elapsed time: 854 seconds 15m 18s
Lucene 2.4
Number of records added= 2000000
Optimizing index
Closing index
Optimizing index time: 322 seconds
Closing JDBC: result set
Closing JDBC: statement
Closing JDBC: connection
***********
Elapsed time: 759 seconds 12m 39s
Index size: 3.7GB. It is interesting that the overall indexing time is significantly less, but the optimizing time is slightly higher. Data, hardware and system configuration: as per my previous Lucene benchmarking. Note that this is a simple benchmark, so YMWV. This benchmark was done with the LuSql de
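To put the two runs above side by side, here is a small sketch that works out the relative change in elapsed and optimizing time, using only the second counts reported in the log output. The helper name `percent_change` and the dictionary layout are my own for illustration; nothing here is part of LuSql or Lucene.

```python
# Figures in seconds, copied from the log output quoted in this post.
RECORDS = 2_000_000
runs = {
    "Lucene 2.3.1": {"elapsed": 854, "optimize": 311},
    "Lucene 2.4": {"elapsed": 759, "optimize": 322},
}


def percent_change(old, new):
    """Relative change from old to new in percent (negative means less time)."""
    return (new - old) / old * 100.0


old, new = runs["Lucene 2.3.1"], runs["Lucene 2.4"]
elapsed_change = percent_change(old["elapsed"], new["elapsed"])
optimize_change = percent_change(old["optimize"], new["optimize"])

for name, run in runs.items():
    # Overall throughput, including the optimize step.
    print(f"{name}: {RECORDS / run['elapsed']:.0f} records/s overall")

print(f"Elapsed time change:  {elapsed_change:+.1f}%")
print(f"Optimize time change: {optimize_change:+.1f}%")
```

With these numbers the elapsed time drops by roughly 11% while the optimize step is about 3.5% slower, which matches the "significantly less" and "slightly higher" reading above. As this is a single run per version, the same caveat applies to these percentages.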

Asian Digital Libraries 2008 Proceedings

Proceedings of the 11th International Conference on Asian Digital Libraries, ICADL 2008, Bali, Indonesia, December 2-5, 2008 are now available: DL2Go: Editable Digital Libraries in the Pocket. Hyunyoung Kil, Wonhong Nam, Dongwon Lee. Hierarchical Classification of Web Pages Using Support Vector Machine. Yi Wang, Zhiguo Gong. The Prevalence and Use of Web 2.0 in Libraries. Alton Yeow Kuan Chua, Dion Hoe-Lian Goh, Chei Sian Lee. Usability of Digital Repository Software: A Study of DSpace Installation and Configuration. Nils Körber, Hussein Suleman. Developing a Traditional Mongolian Script Digital Library. Garmaabazar Khaltarkhuu, Akira Maeda. Weighing the Usefulness of Social Tags for Content Discovery. Khasfariyati Razikin, Dion Hoe-Lian Goh, Chei Sian Lee, Alton Yeow Kuan Chua. A User Reputation Model for DLDE Learning 2.0 Community. Fusheng Jin, Zhendong Niu, Quanxin Zhang, Haiyang Lang, Kai Qin. Query Relaxation Based on Users Unconfidences on Query Terms and Web K

Software Announcement: LuSql: Database to Lucene indexing

LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application for the construction of a Lucene index from an arbitrary SQL query of a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQL query to use, individual indexing/storage/term-vector nature of fields, analyzer, stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple cores. LuSql can handle complex queries, allows for additional per record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver. LuSql has been extensively tested, including a large 6+ million full-text & article metadata document collection, producing an 86GB Lucene index. I am the author of the LuSql software. LuSql at CISTI Lab LuSql at freshmeat Upda

New Book: Semantic Digital Libraries

I am looking forward to getting a hold of this just-announced book, Semantic Digital Libraries. Editors: Sebastian Ryszard Kruk, DERI NUI, Galway; Bill McDaniel, DERI NUI, Galway. Springer-Verlag, Heidelberg (DE) 2009, XVI, 246 p. 1 illus., Hardcover ISBN: 978-3-540-85433-3. The site for the book includes the Tutorial on Semantic Digital Libraries, a tutorial presented at JCDL2008, as well as a faceted searchable interface to the (extensive and useful) links described in the book. Contents Introduction Part I - Introduction to Digital Libraries and Semantic Web Digital Libraries and Knowledge Organization Semantic Web and Ontologies Social Semantic Information Spaces Part II - A Vision of Semantic Digital Libraries Goals of Semantic Digital Libraries Architecture of Semantic Digital Libraries Long-time Preservation Part III - Ontologies for Semantic Digital Libraries Bibliographic Ontology Community-aware Ontologies Part IV - Prototypes of Semantic Digital Libraries JeromeDL

Opportunistic Software Systems Development

In the 25th anniversary issue (November/December 2008, vol. 25 no. 6) of IEEE Software, my NRC colleague Anatol Kark is part of the editorial team for the special issue on "Opportunistic Software Systems Development". These are all great articles, and I particularly like the Jansen et al. article ("Pragmatic and Opportunistic Reuse in Innovative Start-up Companies") and feel that almost everyone who is trying to bring their organizational IT into the 21st century should be forced to read the Gamble et al. article ("Monoliths to Mashups: Increasing Opportunistic Assets"). Cornelius Ncube, Patricia Oberndorf, Anatol W. Kark, "Opportunistic Software Systems Development: Making Systems from What's Available," IEEE Software, vol. 25, no. 6, pp. 38-41, Nov/Dec, 2008. Slinger Jansen, Sjaak Brinkkemper, Ivo Hunink, Cetin Demir, "Pragmatic and Opportunistic Reuse in Innovative Start-up Companies," IEEE Software, vol. 25, no. 6, p

"The Thistle Amongst the Lilies"

I have to break from the usual content of this blog to point out to all the bagpipers who read this blog (there is at least one) my Montreal / Black Watch / 78th Fraser Highlanders friend Jeff McCarthy's new book of pipe music called "The Thistle Amongst the Lilies: A Collection of Original Compositions by Montreal Pipers For The Great Highland Bagpipes". I'm going to order one. You should too! :-)

Fantastic Viral Campaign

The Pomegranate Phone has a great campaign: make sure you look at all of the features before checking out the release date. And yes, I do want one!

Springer to acquire BioMed Central Group

I just read that this happened earlier this month (more at the BioMed Central Blog), via Peter Suber's Open Access News. I must admit I am rather surprised by this turn of events.

Ukraine law mandating open access to publicly funded research

I have just discovered that Ukraine passed a law [1] in January 2007 mandating Open Access to publicly funded research. This was done after extensive consultation and lobbying [2, 3]: "Since January 2007 Ukraine has a law mandating open access to publicly funded researches. It was widely supported by most of the Parliament members. And it is already the second parliamentary inquiry mandating the Cabinet of Ministers to take actions on creating favorable conditions for developing open access repositories in archives, libraries, museums, scientific and research institutions with open access condition to state funded researches." [4] [1] Law of Ukraine On the principles of developing information society in Ukraine (in Ukrainian). [2] Kuchma, I. 2007. Developing National Open Access Policies: Ukrainian Case Study. Proceedings ELPUB2007 Conference on Electronic Publishing. Vienna, Austria, June 2007. [3] Kuchma, I. 2008. Open Access, Equity, and Strong Economy in

Second Life Beagle trip

Wow! This is so very cool: "To commemorate the 150th anniversary of Darwin’s On the Origin of Species by Means of Natural Selection, the University of Cincinnati has recreated the Galapagos Islands, where Darwin conducted some of his famous research, in Second Life. The project is part of the university’s 2009 Darwin Sesquicentennial Celebration. By January 2009, all avatars will be able to retrace Darwin’s steps — from his 1832 journey to South America aboard the Beagle to his tours of the islands — with the help of a wind-surfing tour guide. Archived audio and video clips, as well as live events, will be available in the Darwin Celebration Theater and Gallery." From: Darwin's Famous Journey Is Recreated in Second Life , The Wired Campus, October 16, 2008

Science (and life??) through (augmented reality) Semantic Goggles

The FP6 CINeSPACE project, Experiencing Urban Film and Cultural Heritage (article), creates an augmented reality by combining GIS information with semantic technology. The result is a location- and orientation-aware binocular-like device which overlays multimedia information based on, among other things, what the user is looking at. This is a great concept and prototype, but could we take it a little further and generalize it? Say, to semantically enhanced reality goggles that allow you to select a particular semantic view of the world, including scientific and social views? Put them on and toggle the "Biological taxonomy" semantic view while you are walking through the rainforest and you have species names overlaid on your enhanced reality; identified poisonous plants and animals are marked with a bright red "Do not eat" and "Avoid", respectively; identified endangered species are marked with an " Endangered: Do not step-on / tou

2006 vs. 2008: Doubling of Gold Open Access articles!

As reported in Heather Morrison's blog, The Imaginary Journal of Poetic Economics.

An author and a scientific publisher

I am following with a sense of detached fascination UBC researcher Dr. Rosie Redfield's saga of her quest to get her CIHR-funded article published in a scientific journal and also made available Open Access. While her blog is not one dedicated to Open Access but instead to her research ("Thinking about our research into the mechanism, function and evolution of DNA uptake by Haemophilus influenzae and other bacteria"), it is clear that she is spending more time and accruing more frustration dealing with this particular issue than she would want to be, or should be... Update, 2008 Nov 12: It seems that Dr. Redfield has given up on Elsevier, and has decided to stop publishing articles with this publisher: "... I won't be submitting to any Elsevier journals in the future."

October 14 2008: Open Access Day & PLoS 5th Birthday

This Oct 14 is Open Access Day and the 5th publishing anniversary of the first PLoS journal, so PLoS and others are celebrating both with a number of events, T-shirts, buttons, a blog competition, flyer, downloadable posters, bookmark, etc.: Spread the word - downloadables, creatables, educational resources; T-shirts, buttons and posters - freebies (no T-shirts left :-( ); participants from around the world organizing large and small local events. PS. I wonder if they will be releasing the T-shirt designs with a Creative Commons license, so anyone can print a T-shirt?

Science Blogging Challenge: Get a Senior Scientist Blogging

As reported on the Science Blogging 2008: London forum, this clever challenge was announced at the London Science Blogging Conference, August 30 2008. Related: a good summary of Timo Hannay's closing sum-up of the conference, and the same blog's (Humans in Science) entry on the conference proper.

Great T-shirt: "I Survived the Large Hadron Collider"

I Survived the Large Hadron Collider ;-)

Job ad: Scientific Data Management Specialist

The following excerpt from an ad for a Scientific Data Management Specialist bodes well for the prospects of this (relatively) nascent profession: Processing, soliciting, and providing assistance with data submissions for scientific data from genome sequencing and genotyping experiments into existing databases, analysis pipelines and associated data flows. Developing and improving the infrastructure supporting these systems. Required Skills: Formal Education: PhD; Scripting experience in perl or related language; Experience with SQL; Experience with LINUX/UNIX; Ability to use Microsoft Excel and related applications; Proven record solving related problems. Desired Skills: Knowledge of genetics, especially human genetics; Experience with large data sets; XML/XSLT and related web based tools; Experience with array data, especially expression or genotyping data; C/C++; Experience with grid computing (LSF, SunGrid, etc.); QA filtering of genotype data (HWE, non-Mendelian segregation)

Katta released: Lucene-on-the-Grid!

I am excited at the release of Katta, a technology built on Lucene, Zookeeper and Hadoop that allows Lucene indexes to be distributed across a number of nodes for distributed and fault-tolerant search. Note that it does not create the indexes; it simply deploys existing indexes onto this infrastructure.
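To give a feel for the general idea of distributed, fault-tolerant search over pre-built index shards, here is a toy sketch. It is not Katta's API or implementation: the `Node` class, the `distributed_search` function, the in-memory "shards" and the term-count scoring are all hypothetical stand-ins that I made up for illustration, and real Lucene indexes and Zookeeper coordination are not involved.

```python
import heapq


class Node:
    """Hypothetical search node holding copies of one or more index shards."""

    def __init__(self, name, shards, alive=True):
        self.name = name
        self.shards = shards  # shard_id -> {doc_id: text}
        self.alive = alive

    def search(self, shard_id, term):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        hits = []
        for doc_id, text in self.shards[shard_id].items():
            # Toy relevance score: how often the term occurs in the document.
            score = text.lower().split().count(term.lower())
            if score:
                hits.append((score, doc_id))
        return hits


def distributed_search(nodes, shard_ids, term, top_k=3):
    """Query every shard once, falling back to another replica on failure."""
    merged = []
    for shard_id in shard_ids:
        for node in nodes:
            if shard_id not in node.shards:
                continue
            try:
                merged.extend(node.search(shard_id, term))
                break  # this shard has been answered
            except ConnectionError:
                continue  # try the next node that holds a replica
        else:
            raise RuntimeError(f"no live replica for shard {shard_id}")
    # Merge the per-shard hits into one global ranking.
    return heapq.nlargest(top_k, merged)


shard0 = {"a": "lucene search with lucene", "b": "hadoop jobs"}
shard1 = {"c": "lucene on the grid"}
nodes = [
    Node("node1", {"s0": shard0, "s1": shard1}, alive=False),  # failed node
    Node("node2", {"s0": shard0}),
    Node("node3", {"s1": shard1}),
]
results = distributed_search(nodes, ["s0", "s1"], "lucene")
print(results)
```

Even though `node1` is down, each shard is still answered by a surviving replica, and the per-shard hits are merged into a single ranked list. The indexes themselves are assumed to exist already, in keeping with the note above that deployment, not index creation, is the point.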

ECDL (European Conference on Digital Libraries) 2008 Proceedings Available

Research and Advanced Technology for Digital Libraries 12th European Conference, ECDL 2008, Aarhus, Denmark, September 14-19, 2008. Lecture Notes in Computer Science. Volume 5173: Research and Advanced Technology for Digital Libraries. Ed. Birte Christensen-Dalsgaard, Donatella Castelli, Bolette Ammitzbøll Jurik, Joan Lippincott. Table of Contents: George Buchanan, Jennifer Pearson. Improving Placeholders in Digital Documents Ying Jiang, Hui Dong. Towards Ontology-Based Chinese E-Government Digital Archives Knowledge Management Christoph Becker, Miguel Ferreira, Michael Kraxner, Andreas Rauber, Ana Alice Baptista, José Carlos Ramalho. Distributed Preservation Services: Integrating Planning and Actions Eld Zierau, Anders Sewerin Johansen. Archive Design Based on Planets Inspired Logical Object Model Manfred Thaller, Volker Heydegger, Jan Schnasse, Sebastian Beyl, Elona Chudobkaite. Significant Characteristics to Abstract Content: Long Term Preservation of Information Khasfariyati Razi

Canadian Minister of Industry Accepts S&T Strategy's Sub-Priorities Recommended by the Science, Technology and Innovation Council

The Industry Canada minister accepted (Sept 2 2008) recommendations from the Science, Technology and Innovation Council on the sub-priorities of the four priority areas identified in the 2007 Science and Technology (S&T) Strategy (The State of Science & Technology in Canada, 2006, Council of Canadian Academies). The recommended sub-areas are: S&T priority: Environmental science and technologies. Sub-priorities: Water (health, energy, security); cleaner methods of extracting, processing and using hydrocarbon fuels, including reduced consumption of these fuels. S&T priority: Natural resources and energy. Sub-priorities: Energy production in the oil sands; Arctic (resource production, climate change adaptation, monitoring); biofuels, fuel cells and nuclear energy. S&T priority: Health and related life sciences and technologies. Sub-priorities: Regenerative medicine; neuroscience; health in an aging population; biomedical engineering and medical technologies. S&T p

Australian government innovation report Part II: "Innovation in Government"

The previously reported "Review of the National Innovation System Report - Venturous Australia" -- interestingly and surprisingly -- includes a whole section entitled "Innovation in Government". Its recommendations are: Recommendation 10.1: Consideration should be given to extending the platform created to enforce payments and administer income contingent loans through the tax system; for instance, by extending income contingent loans for tertiary education outside universities and for sole trader entrepreneurs seeking to fund innovative projects. Recommendation 10.2: An advisory committee of web 2.0 practitioners should be established to propose and help steer governments as they experiment with web 2.0 technologies and ideas. Recommendation 10.3: An Advocate for Government Innovation should be established to promote innovation in the public sector. Recommendation 10.4: A rigorous policy of evaluating all Australian Government innovation programs and other re

Australian innovation report recommends Open Access to research outputs, Creative Commons for government documents, open standards for publishing

The Australian government has just released a report, "Review of the National Innovation System Report - Venturous Australia". Given the similarities in size and nature of our economies, innovation, higher education and R&D environments, this report should be examined by Canadians interested in our own national innovation system. The Australian minister for Innovation, Industry, Science and Research (just having a ministry so named is a Good Thing!), Kim Carr, spoke about this report in a speech released yesterday and talks about, among other interesting things for those interested in national innovation and R&D strategy, Creative Commons and Open Access to research outputs: It is embodied in a series of recommendations aimed at unlocking public information and content, including the results of publicly funded research. The review panel recommends making this material available under a creative commons licence through: machine searchable repositories, especially

"Big Data" Nature special issue

The latest Nature -- Vol 455(7209), 4 September 2008 -- is a special issue on " Big Data ". Articles (& editorial) include: Editorial (2008). Community cleverness required Nature, 455 (7209), 1-1 DOI: 10.1038/455001a David Goldston (2008). Big data: Data wrangling Nature, 455 (7209), 15-15 DOI: 10.1038/455015a Cory Doctorow (2008). Big data: Welcome to the petacentre Nature, 455 (7209), 16-21 DOI: 10.1038/455016a Mitch Waldrop (2008). Big data: Wikiomics Nature, 455 (7209), 22-25 DOI: 10.1038/455022a Clifford Lynch (2008). Big data: How do your data grow? Nature, 455 (7209), 28-29 DOI: 10.1038/455028a Sue Nelson (2008). Big data: The Harvard computers Nature, 455 (7209), 36-37 DOI: 10.1038/455036a Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, Winston Hide, David P. Hill, Renate Kania, Mary Schaeffer, Susan St Pierre, Simon Twigger, Owen White, Seung Yon Rhee (2008). Big data: The future of biocuration Nature, 455 (7209), 47-50 DOI: 10.103