Millions of Free Images to be Legally Available

(IPRinfo 2/2011)

Herkko Hietanen
Lawyer, economist and researcher, Helsinki Institute for Information Technology

Kumaripaba Athukorala
Computer scientist and researcher, Helsinki Institute for Information Technology

The YouTube generation is creating their own music videos whether rights owners like it or not. Why not offer them tools to channel their creativity and make money while doing it?

In the AudioImager project, the Helsinki Institute for Information Technology (HIIT) is examining how Open Content could generate value for the content business. The project is funded by the national technology agency Tekes and industry partners.

The mind-set of the creative industry has changed in the past couple of years. Rights owners are now willing to try new licensing models and to recognise consumers’ role as co-creators. Part of the testing is done together with the Finnish composers’ collecting society Teosto, the online music store platform provider SecuryCast and the record company EMI Music.

It is the night before your big presentation. You are thinking that those bullet points could use some graphics. We have all been there, scrambling through Google image search to find images for our PowerPoint presentations. PowerPoint might be the most widely used tool for committing copyright infringement. HIIT is looking to change that by making millions of open content images easily available right where they are used.

Open content licenses are increasing
While many of the images found online are protected by copyright and can be used by the rights owner only, there is a growing pool of images that are either in the public domain or licensed under permissive open content licenses.

There are nearly 200 million Creative Commons licensed photos on Flickr, the online photo management and sharing application. Wikimedia Commons has another collection of 25 million images.

However, finding the right images and using them legally in media products can be difficult. Unlike commercial stock photo services, open content repositories are rarely integrated into creative software platforms. Finding a suitable image among millions of images from several sources can be slow. Image users also have to work out how to handle the required attribution of the original author.

Many of the websites that store open images, like Flickr, offer open Application Programming Interfaces (APIs) through which third-party programs and services can access their collections. However, even computer programs have a tough time accessing the different repositories.

Millions of open content images are scattered across several repositories behind slow APIs, searches often return data in multiple formats, and the results vary in quality. The biggest problem, however, is that searches do not learn and evolve. There is no reliable way to change irrelevant tags, to write new tags or to change the order of search results.

To overcome the difficulties of decentralised image storage, we decided to build our own database. Our goals are to create a database of the best open content and public domain images, to refine the metadata, to create new linkages and context data for the images, and to offer an API for easy and fast access to the images and their metadata.
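To illustrate what bringing several repositories into one database involves, here is a minimal sketch in Python. The field names of the incoming records (`owner`, `pageid`, `license_short` and so on) are hypothetical examples chosen for illustration. They are not the actual response formats of Flickr or Wikimedia Commons, and the code is not the project's own implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Common record for an image, whatever repository it came from."""
    source: str
    image_id: str
    title: str
    author: str
    license: str
    tags: list = field(default_factory=list)


def from_flickr_like(raw: dict) -> ImageRecord:
    # Hypothetical input shape: tags arrive as one space-separated string.
    return ImageRecord(
        source="flickr",
        image_id=str(raw["id"]),
        title=raw.get("title", ""),
        author=raw.get("owner", "unknown"),
        license=raw.get("license", "unknown"),
        tags=raw.get("tags", "").lower().split(),
    )


def from_commons_like(raw: dict) -> ImageRecord:
    # Hypothetical input shape: categories arrive as a list of strings.
    return ImageRecord(
        source="commons",
        image_id=str(raw["pageid"]),
        title=raw.get("name", ""),
        author=raw.get("artist", "unknown"),
        license=raw.get("license_short", "unknown"),
        tags=[c.lower() for c in raw.get("categories", [])],
    )


def search(records, keyword):
    """Return records whose tags contain the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [r for r in records if keyword in r.tags]
```

Once every source is mapped to the same record, a single search function can serve all of them, and tags can be corrected in one place.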

From images to videos
The Google Summer of Code program provides students with a stipend to work on an open source project together with a mentor organization. With the help of the Google funding we managed to develop our first tool, “AudioImager”, which automates the process of creating videos from open content images.

AudioImager helps users to create videos by combining audio and Open Content images. Users enter keywords that describe the audio track for which they want to create a video. The system then retrieves Creative Commons licensed images from Flickr that match those keywords.

It also provides a graphical interface in which the user can adjust the duration of each image, preview the video and search for different images if not satisfied with the proposed ones.

Finally, AudioImager renders the video, which can then be published online. The software also creates end credits in which the photos’ rights owners are given credit. It therefore helps to reduce the copyright violations commonly committed by amateur video authors.
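The timing and attribution steps described above can be sketched in a few lines of Python. This is an illustration only, not AudioImager's actual code: the function names, the dictionary keys and the credit wording are all assumptions, and the even split is just one plausible starting point before the user adjusts durations by hand.

```python
def even_durations(audio_seconds, n_images):
    """Split the audio length evenly across the images as a starting point."""
    if n_images <= 0:
        return []
    return [audio_seconds / n_images] * n_images


def build_end_credits(images):
    """Return credit lines, one per distinct image, in order of appearance."""
    seen = set()
    lines = []
    for img in images:
        key = (img["author"], img["title"])
        if key in seen:
            continue  # an image used twice is credited only once
        seen.add(key)
        lines.append(f'"{img["title"]}" by {img["author"]} ({img["license"]})')
    return lines
```

Because the author and license travel with each image record, the credits can be generated without any extra work from the user.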

Video creation is generally perceived as a challenging task. Our goal was to present this task in a new manner, in which image discovery and retrieval are embedded into the storytelling and video creation process. We also wanted to encourage the use of legally shared material and to automate the cumbersome process of attributing images.

Our next step in the project is to replace Flickr with our own image database. Though Flickr’s photos are great for systems like AudioImager, the service falls short of some of our requirements, such as the speed of image and metadata access. The tags on Flickr images also leave room for improvement.

We can improve the tags of the images in our database using a range of techniques. Another important target is to make our database aware of context. It could be tuned with every image search so that it returns more accurate results that suit the audio best. With our own database we could therefore provide more satisfactory results than we could with Flickr. We also used WordNet ontologies to feed our database, which lets us offer more meaningful images to the user.
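One way to picture a search that is tuned with every query is sketched below in Python. Everything here is an assumption made for illustration: the small `SYNONYMS` table is a hand-written stand-in for a real WordNet lookup, and ranking by how often users picked an image is only one possible tuning signal, not necessarily the one used in the project.

```python
from collections import Counter

# Hand-written stand-in for a WordNet lookup (hypothetical data).
SYNONYMS = {"sea": {"ocean", "water"}, "car": {"automobile"}}


def expand(keyword):
    """Return the keyword together with its known synonyms."""
    keyword = keyword.lower()
    return {keyword} | SYNONYMS.get(keyword, set())


class TunedSearch:
    def __init__(self, images):
        self.images = images      # {image_id: set of tags}
        self.picks = Counter()    # (keyword, image_id) -> times chosen

    def search(self, keyword):
        keyword = keyword.lower()
        terms = expand(keyword)
        hits = [i for i, tags in self.images.items() if terms & tags]
        # Images chosen most often for this keyword come first; ties by id.
        return sorted(hits, key=lambda i: (-self.picks[(keyword, i)], i))

    def record_pick(self, keyword, image_id):
        """Remember that a user accepted this image for this keyword."""
        self.picks[(keyword.lower(), image_id)] += 1
```

Each accepted image nudges the ordering for later searches with the same keyword, which is the kind of learning that a third-party API alone does not offer.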

Would you surrender your slides to your audience?
Today, many of the best presenters are using only images to support their message. What if the audience controlled what is displayed on a presentation screen? We developed a lecture game application to encourage the usage of our database. It’s a simple game which helps to make lectures more illustrative and interesting.

The audience of a presentation can enter keywords describing their feelings and the content of the lecture. If two keywords match, the system displays a matching image from our database. The image is then projected in the classroom. When the event is over, the software creates a timeline of the images. The audio recording of the lecture can be connected to the timeline, which makes it possible to navigate to the point in time when a keyword was logged.
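The matching and timeline logic can be sketched as follows in Python. The sketch assumes that a match means two different participants entering the same keyword, which is one reading of the description above. The class and method names are hypothetical, and the image index is a plain dictionary standing in for the database.

```python
class LectureGame:
    def __init__(self, image_index):
        self.image_index = image_index  # {keyword: image_id}
        self.pending = {}               # keyword -> first participant who entered it
        self.timeline = []              # (seconds, keyword, image_id)

    def submit(self, participant, keyword, seconds):
        """Log a keyword; return an image id when it completes a match."""
        keyword = keyword.strip().lower()
        first = self.pending.get(keyword)
        if first is None:
            self.pending[keyword] = participant
            return None
        if first == participant:
            return None  # the same person repeating a keyword is not a match
        del self.pending[keyword]
        image_id = self.image_index.get(keyword)
        if image_id is not None:
            self.timeline.append((seconds, keyword, image_id))
        return image_id

    def seek(self, keyword):
        """Return the recording offsets (in seconds) where the keyword matched."""
        keyword = keyword.strip().lower()
        return [t for t, k, _ in self.timeline if k == keyword]
```

The offsets returned by `seek` are what would let a listener jump to the corresponding moment in the audio recording.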

The game was tested in a real classroom, and the results were quite interesting. The lecturer controlled one projector, which displayed the course’s regular slides. A second projector was controlled by the student participants. During the 45-minute lecture the students managed to match 96 images. About 80 percent of the images were class related. About 20 percent were related to the lunch break, and only a few showed the audience’s dissatisfaction with the lecture.

The lecturer found the system useful. It enabled him to find out when students were feeling bored and not listening to the lecture. However, as we had predicted, most of the students thought that typing keywords in the middle of a lecture was a distraction. Tagging lectures with keywords is not a natural way of participating. Students do, however, write lecture notes and diaries, and this is something we aim to support in the next iteration of the software. We predict that the educational sector will be a major beneficiary of free and easy availability of images.

Need free images? We’ve got them
The outcome of the research project will be an image repository and an interface through which applications that need images can connect to it.

We still have a lot to do to improve the quality of the metadata in the database. We hope to develop affordable methods for this together with our research partner Microtask. Because of the sheer amount of data we have, crowdsourcing is likely to play a major role.

Similarly, we aim to develop several other games to improve our database. So far we have developed a Facebook-integrated tagging game in which users enter tags for images and earn points for correct tags.

We are hoping to support a new kind of creativity that can connect professional “all rights reserved” material with Open Content. So if you have applications that could benefit from the database, let us know. And stay tuned, this is just the beginning…
