Feature extraction for content-based recommendation – Mining the long tail
Paula Vaz Lobo
The number of items available for consumption surpasses our processing capabilities. New content (books, news, music, video, etc.) is published every day, far exceeding our capacity to make informed choices. Items that we do not know about become potentially useless, because we are not aware of their existence and cannot specifically search for them.
Current recommendation systems try to predict what we want to consume. Nevertheless, they often tend to recommend popular items, because they are mostly based on ratings. This phenomenon shapes the consumption curve as a Pareto distribution, placing popular, rated items in the “head” (the first 20% of the total items) and unpopular, unrated items in the “long tail” (the remaining 80%). Items in the long tail are of recognized interest to smaller groups of people. However, current recommendation systems fail to surface these unpopular items because ratings are scarce. There is a need to help people find interesting unrated items in the long tail.
In this thesis we explore textual features of documents in the long tail. We use document content to find similar documents with a top-N recommendation algorithm. We use semantic similarity (documents about the same subjects) as well as stylometric similarity (documents with a similar writing style) to find documents that are closer to user preferences. Document similarity is measured using the documents' semantic and stylometric features. Combining these two feature types can improve recommendation novelty and help people find interesting documents in the long tail.
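As a rough illustration of the idea, the sketch below ranks documents against a query document by a weighted mix of a semantic score and a stylometric score, and returns the top N. It is not the method used in the thesis: the bag-of-words representation, the three style features, the cosine measure, the weight `alpha` and all function names are assumptions chosen only to keep the example small.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def semantic_vector(text):
    """Bag-of-words term counts (an assumed stand-in for semantic features)."""
    return Counter(tokenize(text))


def stylometric_vector(text):
    """Three simple style features, chosen for illustration only."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {"avg_word_len": 0.0, "avg_sent_len": 0.0, "type_token": 0.0}
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "avg_sent_len": len(words) / len(sentences),
        "type_token": len(set(words)) / len(words),
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query, corpus, n=2, alpha=0.5):
    """Return the ids of the n documents most similar to the query.

    alpha weights the semantic score; (1 - alpha) weights the style score.
    """
    q_sem, q_sty = semantic_vector(query), stylometric_vector(query)
    scored = []
    for doc_id, text in corpus.items():
        sem = cosine(q_sem, semantic_vector(text))
        sty = cosine(q_sty, stylometric_vector(text))
        scored.append((alpha * sem + (1 - alpha) * sty, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:n]]
```

Calling `top_n(query_text, {"doc1": "...", "doc2": "..."}, n=1)` returns a list with the id of the closest document. Because the style features here are unscaled and all positive, their cosine scores sit close to 1 for most document pairs; a realistic system would normalize the features and use a richer feature set.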
Date: 2011-Mar-09 Time: 14:30:00 Room: 336
Workshop “Metabolism and mathematical models: Two for a tango” – 2nd Edition
Dates: October 25-26, 2022
Location: The workshop will be held online
The topic of this workshop is metabolism in general, with a special, though not exclusive, focus on parasitology. Besides exploring the biological, biochemical and biomedical aspects, the workshop will also present some of the mathematical modelling, algorithmic theory and software development that have become crucial to studying them.
This workshop is being organised in the context of two projects, both with the Inria European Team Erable. One project, the Inria Associated Team Capoeira, is a partnership with the University of São Paulo (USP) in São Paulo, Brazil, more specifically the Institute of Mathematics and Statistics (IME) and the Institute of Biomedical Sciences. The other, the H2020 Twinning Project Olissipo, involves Inesc-ID/IST in Portugal, ETH in Zürich and EMBL in Heidelberg.
The workshop is open to all members of these two projects but also, importantly, to the community in general.
The program and more details are available here.