An example of the semantic web, and some thoughts

It took me a long time to understand the principles and the huge potential of the semantic web, which is going through a kind of crisis and has not yet reached the point of being usable for everyone. In this little demo video of "Freebase Parallax", you can see an excellent practical example of what it could one day become for us normal people, in this case for looking up information. You no longer have one website and one click for each piece of information belonging to a concept that implies many other concepts, each with many pieces of information of their own. Instead, you deal directly with the concepts themselves and work with the associated sets of information. But enough of the blabla, just look at this:

Freebase Parallax: A new way to browse and explore data from David Huynh on Vimeo.


It's actually already working to some degree, but not as well as in the example shown here for every search query you throw at it. So far, the data on the world wide web is not yet represented consistently and completely in the semantic web, and that integration will take many years; there are some technical challenges left, but mostly, I guess, the resources are lacking.
If the whole semantic web thing were perfected for daily usage, it could have HUGE consequences. The technology itself is actually already widely used, invisibly, by many big fish like Amazon or last.fm, often wherever you see websites making "smart" suggestions. It could be a HUGE paradigm change. It would allow computers to help us integrate an overload of information that a single person cannot oversee anymore. The computer will become much more useful in the future; it will get increasingly smart at interpreting any given information, anticipating actions that can be taken, and making intelligent suggestions.
In the life sciences, this is already applied more widely. It allows, for example, the computer to draw conclusions by itself (using a so-called "reasoner"). But even there, although much more care is taken to integrate existing data with semantics (so-called "ontologies"), the application side is still in its infancy.
But it would not just change the way we deal with information, or rather the way information is provided to us. If you think it through to the end, this could be nothing less than a revolution with wide psychological, social and political consequences as well. It would allow clustering not only information, but also PEOPLE, on the basis of their shared interests, activities and opinions. The basis for this already exists in one particular semantic sub-web called FOAF (friend of a friend). For example, it could boost the ridiculously inefficient resistance against corporate interests that work against people and the planet as a whole (see "empathic civilisation"). It would help coordinate concerted actions of public disobedience, which until now absorbs itself in repeating things to one another, losing time and energy for concrete action in a lot of blabla. It could reduce the "overhead" of organization among a large number of heterogeneous individuals and groups. If you just let your healthy paranoia do its thing for a second, this is probably the reason why society doesn't give it the resources to develop to the end: it could be "dangerous" to them.
What's interesting about the Freebase approach is that linking things and describing relationships is, just like in Wikipedia, open to contributions by everyone.

video stabilization and rolling shutter compensation with the open source combo VirtualDub + Deshaker plugin

Sometimes you couldn't bring a tripod, did your best to keep the camcorder steady, but you had to zoom in and the built-in stabilization left some movement over.

Shaky movies can be stabilized afterwards on the PC, but it's a frowned-upon option for purists: you can either accept cut-off blocks appearing at the edges and crop them later, which leaves the remaining pixels untouched (the most purist option), or lose picture quality by zooming in to hide the cut edges (the standard but dirtiest option).
So it's a pretty dirty thing to do! But if the shaking is mild and you don't mind some quality loss, or if the things you care about are in the middle and you don't mind cropping, it's OK even for a purist.

I finally found an open source combination that gave good results and was not too complicated:
VirtualDub + Deshaker

The Deshaker website is quite self-explanatory, and there are many good tutorials on the web (I only found German ones). I just want to point it out.

Great bonus:
more recent versions of Deshaker can also compensate for the ugly rolling shutter artifact, a design flaw of all CMOS-based consumer cameras, as long as the movement originates from the camera, not from the object. For more details on the percentage value, check the Deshaker homepage (for my cam I had to pick 38% instead of the default 88%).

My current material is interlaced. To minimize encoding/decoding passes (each an inherent quality loss), it's possible to deinterlace and deshake in one go by simply putting Deshaker after yadif in the filter chain. The Deshaker first and second pass then both run together with the yadif deinterlacer (which has to do the same work twice). If you have plenty of disk space, you can instead deinterlace into a lossless codec (such as HuffYUV) first and run Deshaker on that file, speeding things up without quality loss.

convert structured text into mind maps in seconds

(This became a somewhat long tutorial. The basic idea: make mind maps out of text.
See below for pictures and the interactive embedded mind map made from a textbook.)

Very often we are confronted with an abundance of information. Losing the overview quickly results in frustration. Later on you realize that it was not so bad after all; it was just the way you were confronted with it that overwhelmed you.

Visualization can help a lot, but with text, things are more difficult.
Mind maps are a good way to create some structure, but in some cases, entering data into mind maps is too time-consuming and fiddly.
Often you already have a logically structured document, but you would need to enter it by hand, and in the end you just won't do that at all. But there is a way to do it without re-typing.

If your source has a well-made index that is structured and detailed enough, it can be a useful starting point for creating a mind map. This mind map can be useful in many ways later.

First you have to make the computer understand the logical structure of your text. The easiest and most practical form is tab indentation. Neither FreeMind nor XMind list text files in their import dialogs, but it's possible anyway: just copy and paste from a text editor into an existing map!
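To see why this works, here is a minimal Python sketch of how a mind mapper can interpret tab depth as hierarchy (this is not the actual XMind import code, just the idea): each line becomes a child of the most recent line with one tab less.

```python
# Sketch: how a mind mapper might interpret tab-indented text.
# Depth = number of leading tabs; a line becomes a child of the
# most recent line that is one level shallower.

def parse_tabbed(text):
    root = {"title": "root", "children": []}
    stack = [root]  # stack[d] = last node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip("\t"))
        depth = min(depth, len(stack) - 1)  # tolerate depth jumps
        node = {"title": line.strip(), "children": []}
        del stack[depth + 1:]               # drop deeper levels
        stack[depth]["children"].append(node)
        stack.append(node)
    return root

outline = "Part I\n\tChapter 1\n\t\tThe Universal Features of Cells\n\tChapter 2"
tree = parse_tabbed(outline)
print(tree["children"][0]["title"])                 # Part I
print(tree["children"][0]["children"][1]["title"])  # Chapter 2
```

The whole trick of the copy & paste import rests on this one convention, which is why documents indented with spaces have to be converted first.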

As a quick example, I took a random PowerPoint outline from my current subject



and made a mind map in about 10 seconds just by copying and pasting it into the freeware XMind:


(click to enlarge)

In this example, it didn't attribute the headings A and B as parent nodes, because they were not indented with tabs. But you can easily fix that yourself, either in the text or in the mind map. As long as the whole document is more or less consistent, it can be used (if it doesn't use tabs, as in this case, it has to be modified slightly in a text editor first).

This trick of copying and pasting tab-structured text works in many other freeware/open source mind mappers, like the very prominent FreeMind (which I don't like too much) or its improved spin-off Freeplane.

If you are not familiar with mind maps, you might ask yourself: well, that's nice, now what?
First of all, it's simply a way to store information. You can collapse and expand nodes, which helps you maintain an overview of the main structure and context, and reduces clutter.
Besides that, there are lots of things you can do with mind maps. You can of course include pictures and links, create links between nodes of different parents, mark topics with symbols, use it as a ToDo/GTD manager, and much more.


Here's a concrete example of both a practical application and how to deal with the most frequent problem: text documents that are structured but can't be imported because they don't use tabs.
We will first copy the index of an online book, transform its structure (which is based on spaces) into tabs, copy & paste it into the mind mapper, and then use this visualized index as a scaffold to fill with information.

1. On the NCBI Bookshelf, you can access many important biomedical books for free online and search across the whole bookshelf, which also contains quite up-to-date sources such as the Madame Curie Bioscience Database.
For our example, there is an excellent book I am currently using, the Alberts. It's the old edition, but still sufficient for the "rough" index.
2. On that page, click "expand all". Then select all (Ctrl-A) and copy to the clipboard.
3. Open a new document in the simplest text editor you can find on your system (on Windows this would be Notepad, not WordPad or Office) and paste what you just copied.
4. Delete everything that's not essential for your index, hence everything above "Part I" and everything below "Glossary".
5. Save as .txt, not as .rtf or any format that allows formatting; this simplifies the process.
6. Open that stripped-down text file in your advanced word processor. You now get an apparently perfect index with three different indents, depending on whether a line is a main heading or a sub-heading. But you'll have to do some search & replace to turn the spaces into tabs, so that the mind mapping program can understand the structure.
7. Start with the most indented sub-sub-headings ("The Universal Features of Cells on Earth"). Copy what's between the start of the line and the first letter (" + ") and paste it into the search field of the search & replace dialog.
8. Replace with tabs: in OpenOffice, click "More Options", activate "Regular expressions" and enter "\t" as many times as this heading level is deep. In this case we have only 3 levels of headings: Part I, then Chapter 1, then the sub-sub-headings. The first level doesn't need a tab, so here we enter "\t\t". Press "Replace All" (it should make 136 replacements).
9. Repeat for the next heading level up (replacing " o " with "\t", 25 times).
10. For the top-level headings (Part I), remove what's in front of the heading (replacing " *" with nothing!).
11. Now select all and copy, open XMind and paste. At first, you should get this slightly overwhelming mind map:


I currently only care about the fourth section, so I collapse the rest:


Alternatively, even better: right-click on the node you care about and click "next layer" in the context menu. This gives you this:

You can return to the entire map by clicking on the little green arrow in the center node.


12. Back on the Alberts online website: copy the title of a section that interests you, let's say "The Molecular Mechanisms of Membrane Transport and the Maintenance of Compartmental Diversity", and paste it into the search field at the top of the page. This brings up the full text, and on the right there is a frame with the subsections of that chapter. Select and copy them. Then, in XMind, click on the node of the parent chapter and paste directly. Works!

13. Now, back on the Alberts page, click on an illustration, right-click on the picture and save it. In the mind map, click on the corresponding node, right-click, and choose insert: picture from file (one node, one pic; for more than one pic you need to make child nodes). You can also attach the information about a pic, simply as the title of the node containing the picture or as a "note" (you see the info when you hover over the yellow icon). Well, just see it here:


14. Optionally, you can then import the map into a website like MindMeister, which gives you an editable mind map for graphical online collaboration (free version available). Here you see the test mind map embedded into the blog post. It's a working interactive mind map! You can move around, zoom out, click on +/- to expand/collapse, or even directly change it here and now without any login process (you may do that, I have backups of course). Try it!

semantic net 2: ontology driven visualisation of cellular biology

This is another example of the semantic net, much more applied and concrete than the other ones. It's about molecular biology. Since I am currently preparing for a graduate exam in cell biology, I was doing some research and came to understand some basics about bioinformatics and the workings of annotation databases (excellent introduction videos), and I am starting to see what the possibilities are beyond sequence alignment and protein structure prediction. I go into some more detail here because I think this could be helpful for anyone studying some cell biology, not only students. It's just a matter of how deep you go.
I wish I had found this thing years ago!!

Don't be too alarmed; it could be really interesting even for someone without basics in molecular biology, since the program has a text field that gives a short but comprehensible description of the selected element.

This thing gives you nothing less than a graphical way of exploring the clockwork of life, the logic of the cell.

1. Download this free open source tool, OBO-Edit.
2. Once installed, you need to feed it with a so-called ontology, which is roughly another term for a semantic database. You can download specialized ontologies for all kinds of fields in biology (BerkeleyBOP / OBO Foundry, NCBO, BRENDA, the Protégé link list), but let's keep it simple for a "standard cell" and just right-click & "save as" this ontology.
3. Open it in OBO-Edit: "Load Ontology", then locate the file you just downloaded by clicking "Browse" in this dialog, then OK:


4. Wait a second, and you should get this main window:


5. Now you can explore the cell along three main categories:
biological process
cellular component
molecular function
either
a) hierarchically, by clicking on the plus sign of "Classes" in the "Ontology Tree Browser", or
b) graphically. The graphical view has some great advantages over the hierarchical one (just as in reality), such as the fact that you can see relationships that span from one hierarchy branch to another. But one step at a time; here is how to browse graphically: with the mouse pointer, go over the "biological process" node without clicking. Three tiny buttons will appear on it (two blue and one red x); when you hover over one with the mouse, it shows a little info bubble:


The principle is quite easy: the right side expands the contents of a node, the left side collapses it.
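If you like to think of it in code, this browsing model boils down to a graph plus a set of currently visible nodes. A toy Python sketch; "biological process" and "developmental process" appear in the real ontology, but the child terms here are picked purely for illustration:

```python
# Toy model of the expand/collapse browsing described above: the
# ontology is a parent -> children mapping, and the visible graph
# is just a set of node names. Child names are invented examples.

ontology = {
    "biological process": ["developmental process", "cellular process"],
    "developmental process": ["cell differentiation", "growth"],
    "cellular process": ["cell division"],
}

def expand(visible, node):
    """Right-side button: show the children of a node."""
    visible.update(ontology.get(node, []))

def collapse(visible, node):
    """Left-side button: hide the node's children and their descendants."""
    for child in ontology.get(node, []):
        collapse(visible, child)
        visible.discard(child)

view = {"biological process"}
expand(view, "biological process")
expand(view, "developmental process")
collapse(view, "biological process")
print(sorted(view))  # ['biological process']
```

Since the ontology is a graph rather than a strict tree (a node can have several parents), the real viewer has to be cleverer than this, which is exactly what makes the graphical view interesting.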

6. By clicking on the right blue triangle ("expand all 35 children"), you will see those 35 child nodes, each of them being a main category of cellular function:


You can navigate by clicking and dragging on the white background.

7. When you start expanding nodes, the graph very quickly grows laterally way beyond the screen. To see more, you can minimize other panels (the Search panel and the Ontology Tree Editor) by clicking the "minus" symbol in their title bars.
The Text Editor panel is worth keeping, since it gives you a brief definition of the currently selected node. Since it's the vertical space that runs out fast, drag the title bar of the "Text Editor" panel and dock it at the middle top or middle bottom:



Note that in this last screenshot it's still only one of the three main nodes (biological process), of which only a single child node is expanded (developmental process); I hid most of the other sister nodes. Still, in the little preview bubble at the lower right you can see that the graph extends much further to the right (the green rectangle represents the visible part).
As you can see, lots of nodes are not only children of "developmental process" but also children of its sister nodes. There are of course many interdependencies between processes in a cell; whatever hierarchy you use to represent the cell, you can't avoid this. These, let's say, passively interlinked nodes are not actually expanded. If you want to be impressed, you can grow this thing very large very quickly by opening up just a few of those. I don't really understand why they created a "cellular process" node here... ?? Well, they'll have their reasons, I guess.

Some usability tips:
- note that the little blue preview is resizable
- you can change the zoom factor with the little slider at the upper left
- just as you can expand children (lower right blue triangle), you can collapse them with the lower left blue triangle
- strangely, you can also collapse the parent with the upper left blue triangle (increases visibility)
- to go back, you can expand the parents again with the upper right blue triangle
- in short: right expands, left collapses; lower triangle: children, upper triangle: parent
- if you want to concentrate on one particular node, click once on it to select it, then in the context menu (right-click) select "hide", then "everything but selection". Then you can expand in both directions, parents and children. The advantage is that the "other sisters" of the selected node, and also the sisters of the children or parents you expand, stay hidden. This means you are not overwhelmed all the time and can concentrate more easily on one particular function, and see what it belongs to and what its details are.
- don't forget that this is only one of three categories, and maybe the most understandable one for non-biologists. To explore the other two root nodes, you might need to unhide them first by browsing "upwards", clicking on the respective parent's upper right blue triangle. Or simply choose Edit/Reroot in the main program menu.
- sometimes I was unable to get it to work as expected anymore. In that case it helps to go to the menu "Config", "Configuration Manager", and in the tab "User Settings" click "Reset all configuration files", confirm, and restart OBO-Edit.

I'm not finished yet, but I am a bit busy and I haven't figured out how to use the search and filtering functions, which would not only help limit the clutter but also make more sense of it for studying purposes.
The big question is: how to get the connections between function and structure?

The problem: if I enter a search term, for example "clathrin", I get a nice list. When I click on an entry, it displays only the parents of the node (mostly 1-5). It's not possible to explore around; the expand/collapse functionality is off.
However, at least it's possible to click on the visible node and get some info.

(to be continued...)

UPDATE:
I read in the help that you need to activate the "reasoner", meaning the computer needs to calculate the connections first before it can display them. I started it (menu "Reasoner", Reasoner Manager), but it was extremely slow (I've got a Core 2 Duo), and after a minute or so I got an alert saying my RAM was running out (I've got 2 GB installed and manually assigned 1 GB to Java).
Hmm! Seems my PC is not fit for this... Maybe I'll find out more on how to limit the "reasoning" parameters.

first steps in the semantic net

I'm currently exploring the "semantic net". There is lots of blabla about it, yet it is basically very simple: give meaning to words.
This works by linking attributes to words that are, per se, meaningless to the computer. Once that is done, you can connect to other items sharing the same attributes. (quick and dirty basic YouTube introduction video)
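In its simplest form, this attribute linking is just subject-predicate-object statements, so-called triples. A tiny Python illustration with invented names (real systems use RDF stores, not Python lists, but the principle is the same):

```python
# "Giving meaning to words", stripped to its core: every statement is
# a (subject, predicate, object) triple, and connections fall out of
# shared attributes. All names here are made-up examples.

triples = [
    ("Alice", "interest", "semantic web"),
    ("Bob", "interest", "semantic web"),
    ("Bob", "interest", "deinterlacing"),
    ("Alice", "knows", "Bob"),
]

def with_attribute(predicate, obj):
    """Find all subjects linked to the same attribute."""
    return {s for s, p, o in triples if p == predicate and o == obj}

print(with_attribute("interest", "semantic web"))  # Alice and Bob
```

That one query is already the FOAF idea from the first post in miniature: clustering people by shared attributes.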

Having given meaning to words, computers can be much more useful, since they can understand the context.

Here is a nice little graphical ontology viewer with a limited but good data set.


(click on the nodes!)
(in the address bar, you can directly enter a person's name. The n=5 value sets how many child nodes branch off from the search term)

AFAIK, the main technical problems are:
1) extracting metadata automatically from unstructured or semi-structured content
2) integrating content with metadata, and linking different semantic net standards
3) real-life applications

However, thanks to automatic metadata extraction, the semantic net is making advances.
This clickable map of semantic networks (each node being a big semantic database of a different subject, with their interdependencies) shows the enormous potential:



What a pity that apparently we cannot really use it for now.

opensource music recognition mp3 auto-tagger

Many have them, all hate them: MP3 files without song, artist and album information. For some reason, I have a lot of those, and it just sucks when you're listening and ask yourself: what song is this?!? Well, I think the problem is clear.
MusicBrainz Picard is a very smart open source tool which creates a "fingerprint" of your song and looks up all the information. You can then save just the tags, or also rename or even move the files. It also handles cover pictures somehow, but I haven't checked that out yet.
It doesn't work with each and every song, depending on how you got your MP3s, but there are ways to give the app a hand.

thanks! I love it.




By the way, it's an example of the semantic web applied to music. I will write about that once I've got a better idea of it.

desktop search - open source & paranoia-free

Spotlight in Apple Mac OS X, Google Desktop Search and Windows Search offer searching within files' contents and media attributes for a wide range of file types, including PDF, MS Office, OpenOffice etc. This is extremely useful when working productively with large collections of PDFs, especially if you didn't take enough care to maintain much order in your files.
Unfortunately, one may not feel comfortable letting Google, Microsoft or even Apple view the very heart of one's files, especially as they also offer a network index sharing function. It might really be wise to be a little bit paranoid these days. Not that I have much to hide, just for a good feeling.

It appears to me there are not so many high-quality open source replacements, or at least easily usable ones. I found DocFetcher to be quite good at doing the basic job, although it certainly won't win any design award. It's text-only search (no MP3/image/video attribute search), and it has a primitive but effective preview function that also works with PDFs, highlighting the search query as you can see here (click to enlarge):


yadif: excellent open source bobbing deinterlacer

I made some really nice videos on my last holidays with my Canon HV20. Unfortunately, the cam somehow reset some settings unnoticed, and I recorded everything in interlaced mode instead of progressive mode. (Interlaced: made for old TVs, not PC screens; a dirty old trick giving double the temporal resolution but only half the spatial resolution, by alternating half-images aka "fields". Progressive: half the frame rate, but full "true" pictures aka "frames".) That's bad, because now I need to deinterlace all the material I might want to use (that's the process of converting from interlaced to progressive, and in principle it's a dirty thing).
The good thing is that you can get double the progressive frame rate out of it. Normally this variant of deinterlacing, called "bobbing" or "blending" depending on the scene, involves a big quality loss, but there are now really smart algorithms that produce quite good quality (meaning few deinterlacing artifacts).
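To make the field/frame relationship concrete, here is a toy Python sketch of naive line-doubling bob. Yadif interpolates far more cleverly than this; the sketch only shows where the doubled frame rate comes from:

```python
# Toy illustration of fields vs. frames: an interlaced "frame" is a
# list of rows where even rows belong to one moment in time (field A)
# and odd rows to the next (field B). Naive bobbing reconstructs two
# full frames from one interlaced frame by duplicating each field's
# lines, which is why the frame rate doubles.

def bob(interlaced_frame):
    even = interlaced_frame[0::2]   # field A
    odd = interlaced_frame[1::2]    # field B
    double = lambda field: [row for row in field for _ in (0, 1)]
    return double(even), double(odd)  # two frames out of one

frame = ["a0", "b0", "a1", "b1"]      # rows a* from field A, b* from B
f1, f2 = bob(frame)
print(f1)  # ['a0', 'a0', 'a1', 'a1']
print(f2)  # ['b0', 'b0', 'b1', 'b1']
```

Simple line doubling like this throws away half the vertical detail per output frame; the smart algorithms mentioned above interpolate the missing lines instead, adaptively per pixel.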
Although common NLE video editors like Final Cut Express do offer deinterlacing, and some of them also bobbing, their built-in algorithms are often not particularly good (the Vegas 8 "interpolate" deinterlacer allows quite good bobbing, though).

There are several good free open source deinterlacers. The hero of the hour is yadif. It offers all the important variants of deinterlacing, including bobbing. The results are really good, among the best out there; even better, considering the quality, the speed is excellent (adaptive motion compensation can be a veeery slow task). Yadif is included in players such as VLC and (K)MPlayer. For encoding/transcoding, you can get it along with VirtualDub (Windows only; MPEG-2/AC3/WMV plugins) and Avidemux (Mac, Linux, Windows).
Be careful: the output quality depends largely on the codec choice and its parameters. That's what the next post will be about.

freeware for lossless operations on jpeg images

Various standard operations on JPEG photos involve a loss in image quality.
In most programs, rotation, flipping, cropping and downscaling involve re-compressing the image, which always deteriorates image quality, even with the quality setting at 100%.
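The reason lossless operations are possible at all, and also their one catch, is that JPEG data is stored in fixed-size MCU blocks (8 or 16 pixels square, depending on chroma subsampling), so for example a lossless crop can only start on a block boundary; tools therefore adjust the requested crop region to the block grid instead of re-encoding. A small Python sketch of that kind of snapping (the function name is invented for illustration):

```python
# JPEG pixels are stored in MCU blocks (8x8, or 16x16 with 4:2:0
# chroma subsampling). A crop that starts mid-block would require
# re-encoding, so lossless croppers align the origin to the grid.

def snap_crop_origin(x, y, mcu=16):
    """Round a requested crop origin down to the nearest MCU boundary."""
    return (x - x % mcu, y - y % mcu)

print(snap_crop_origin(37, 50))         # (32, 48)
print(snap_crop_origin(12, 12))         # (0, 0)
print(snap_crop_origin(12, 12, mcu=8))  # (8, 8)
```

So a "lossless crop" may end up a few pixels larger or shifted compared to what you asked for; that is the price of keeping the original compressed data untouched.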

If you like it clean, this free program allows these operations without any re-encoding process, hence without quality loss: http://jpegclub.org/ (look for jpegcrop further down the page).
thanks jpegclub!

nice freeware for recording audio & video from flash based websites

I just found a very powerful freeware browser toolbar (working not as a separate application but inside Firefox). It allows you to record audio and video from Flash-based content. It's quite loaded with good and rare features; check it out yourself:

One thing is particularly interesting for me, especially for audio recording:
instead of going through dirty analog-digital conversions in your cheap sound card, leading to ugly background noise and strange system sounds, it apparently uses some form of direct digital sound capture in good quality (calling itself "Applian Audio Driver Technology").
It also auto-starts and auto-splits the recordings. Well done!

comprehensive link collection on Web 2.0 collaborative tools

http://www.mindmeister.com/de/12213323/best-online-collaboration-tools-2009-robin-good-s-collaborative-map

my favorites at the moment:

www.mindmeister.com:
allows collaborative brainstorming using mind maps. Excellent implementation, really helpful for making up your mind once you get used to it (free version available).

simultaneous real-time editing of texts. Start immediately without registering; you get a unique, directly accessible URL (free version available).

www.mixedink.com
another collaborative text editor, but this one has a special feature: the site takes a "democratic" decision based on voting by contributors within a certain deadline.
Also interesting: you are supposed to pick text parts from other users' versions, and the system tracks which text part originally came from which contributor, to grant partial authorship in the final version.

maintaining, testing and upgrading old and current PCs

Here are a few tips and links on how to maintain PCs (in general) and keep them
in shape. This way you can usually get an old machine fit again for little or
no money, and avoid buying a new one:

1. Maintenance:
- If the hard disk is full (especially important when RAM is scarce): make space. This tool shows where, and by what, your disk space is used: http://www.jgoodies.com/freeware/jdiskreport/index.html
- Defragment and optimize: --> http://jkdefrag.8qm.de
- Clean out old caches, fix registry errors, remove superfluous autostart entries: --> www.ccleaner.com/
- Update all essential drivers. For this you need to identify the components: http://www.cpuid.com/pcwizard.php

After maintenance, CPU load at idle should not be above 20%. Check in the Task Manager: right-click on the taskbar, then the "Performance" tab. In the "Processes" tab you can sort the list so that the processes currently using the most resources are shown first, and possibly end them. To do this, click once or twice on the "CPU" column header so that, usefully, the larger values appear at the top.

2. Is too little RAM installed? You can check this objectively: in the Task Manager, "Performance" tab, the value of Physical Memory: "Available" should be considerably larger than "Commit Charge: Total". Install more RAM if you haven't already.
You can also test whether the RAM is defective (causing sporadic crashes):
--> http://www.heise.de/software/download/ct_ramtest/3666 or
www.memtest86.com


3. A current hard disk can, apart from the additional storage, make for a big speed increase. Current hard disks have transfer rates of >80 MB/sec; my 2-year-old average laptop only manages about 25 MB/sec.
You can test whether it's necessary:
http://crystalmark.info/software/CrystalDiskMark/index-e.html
and also how reliable the hard disk is, i.e. whether it will die
soon: http://crystalmark.info/software/CrystalDiskInfo/index-e.html

Depending on how old the PC is (i.e. which interface it has), "new old" hard disks are (still) cheap to get (around 50-70 € for Ultra-DMA, though the selection is already quite limited)

4. Ubuntu/Linux instead of Windows, adapted to the old machine. For this, the desktop environment XFCE is recommended instead of Gnome or KDE. This combination really is considerably faster and lighter on resources than Windows (e.g. "xubuntu", or ubuntu with XFCE installed afterwards).
It can also be done with very little effort, i.e. without repartitioning/reinstalling/backup: wubi-installer.org

5. Possibly buy a used graphics card on eBay (depending on the card, 2-40 €)