
Web mining

The employment market has fully embraced the digital age: nowadays almost all recruitment-related data is available on the Internet, whether on corporate websites, on social network sites or as Open Data. Multiposting aims to exploit and cross-check this data in order to gain a comprehensive overview of the recruitment market.

To achieve this, Multiposting has developed robots that systematically browse millions of web pages and collect employment-related information. Crawling, the extraction of this information, requires the automatic detection of relevant content. This process (which is principally used by search engines) enables us to collect tens of millions of job postings and anonymous candidate profiles. The data is imported continuously so that it stays up to date with market trends.
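
As a rough illustration of what "automatic detection of relevant content" can mean, the sketch below strips a page down to its visible text and flags it as a probable job posting when enough employment-related words appear. The keyword list, the threshold and the function names are all assumptions made for this example; the text above does not say how Multiposting's robots actually decide relevance, and a production crawler would more likely rely on a trained classifier.

```python
from html.parser import HTMLParser

# Hypothetical keyword list, chosen only for this illustration.
JOB_KEYWORDS = {"job", "vacancy", "salary", "apply", "recruit", "hiring", "position"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the human-readable text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_like_job_posting(html, min_hits=2):
    """Flag a page when at least `min_hits` distinct keywords are present."""
    words = {w.strip(".,:;!?()").lower() for w in visible_text(html).split()}
    return len(words & JOB_KEYWORDS) >= min_hits
```

A crawler would call `looks_like_job_posting` on each downloaded page and keep only the pages that pass, before handing them to the processing steps described in the next section.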

Big Data

Multiposting uses a Big Data architecture to manage this large quantity of data. Each web page retrieved online must be segmented, cleaned and standardized before being stored in a structured database. The Multiposting technical team includes several experts in Hadoop, Spark and HBase, which means that we can run these processing steps in parallel on distributed databases, i.e. across multiple computers. The mathematical models and statistics produced by Multiposting require complex calculations, which must be executed as reliably and as quickly as possible.

Figure: 1 Raw data, 2 Parallel data processing, 3 Distributed database, 4 Client Application.
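
The segment, clean and standardize steps can be pictured as three small functions applied to every page, with the pages spread over several workers. The sketch below is only a single-machine stand-in: the "field: value" page layout, the contract aliases and all function names are assumptions invented for the example, and a thread pool takes the place of a cluster. On Hadoop or Spark the same per-page function would be mapped over a distributed dataset instead.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical alias table mapping raw contract labels to a standard value.
CONTRACT_ALIASES = {
    "cdi": "permanent",
    "permanent contract": "permanent",
    "cdd": "fixed-term",
}


def segment(raw_page):
    """Split a raw page into fields, assuming one 'field: value' pair per line."""
    record = {}
    for line in raw_page.splitlines():
        if ":" in line:
            field, value = line.split(":", 1)
            record[field.strip().lower()] = value
    return record


def clean(record):
    """Remove leftover markup and collapse whitespace in every value."""
    cleaned = {}
    for field, value in record.items():
        value = re.sub(r"<[^>]+>", " ", value)
        cleaned[field] = re.sub(r"\s+", " ", value).strip()
    return cleaned


def standardize(record):
    """Map the contract field, when present, onto the standard vocabulary."""
    out = dict(record)
    if "contract" in out:
        contract = out["contract"].lower()
        out["contract"] = CONTRACT_ALIASES.get(contract, contract)
    return out


def process_page(raw_page):
    """Run the three steps in order on a single page."""
    return standardize(clean(segment(raw_page)))


def process_all(raw_pages, workers=4):
    """Process pages in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_page, raw_pages))
```

Because `process_page` has no shared state, each page can be handled independently, which is what makes this kind of work easy to distribute across multiple computers.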

Machine Learning

The millions of profiles and job postings collected by Multiposting not only enable us to compute statistics, but also to build predictive models. This process is known as Machine Learning or statistical learning, and it automatically replicates human reasoning using a dataset. The innovative models created by Multiposting have been specifically developed to suit the structure of our data and are often the subject of scientific publications. We have particularly focused our efforts on learning models that use unaltered datasets and do not require additional manual supervision. In this way we hope to replicate the implicit hiring mechanisms used by recruiters.

Figure: 1 Human reasoning, 2 Annotated data-set, 3 Learning.

The first use of machine learning at Multiposting is the standardization of text. The aggregated data must be structured and machine-readable in order to calculate exhaustive and relevant statistics. We need to be able to answer questions such as: what is the candidate's education level? Who would suit this type of job?

The standardization process is essential to the Smartsearch application because, in order to predict employment market trends, we must first define the structure of the market.
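
To make the idea of learned text standardization concrete, the sketch below trains a tiny multinomial Naive Bayes classifier that maps a free-text education field onto a standard level. Naive Bayes is used here purely as a simple stand-in: the text above does not name the models Multiposting uses, and the class name, the labels and the example strings are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().replace("'", " ").split()


class NaiveBayesStandardizer:
    """Multinomial Naive Bayes with add-one smoothing, mapping text to a label."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples, invented for this illustration.
TRAIN_TEXTS = [
    "master of science in computer science",
    "msc statistics",
    "bachelor of arts",
    "bsc in biology",
    "phd in machine learning",
    "doctorate in physics",
]
TRAIN_LABELS = ["master", "master", "bachelor", "bachelor", "doctorate", "doctorate"]

model = NaiveBayesStandardizer().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

With this toy training set, `model.predict("MSc in applied statistics")` returns `"master"`. Once every profile carries a standard label of this kind, a question such as "what is the candidate's education level?" becomes a simple lookup.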

Semantic Analysis

Most of the time, the text in job postings or candidate profiles is written in natural language. Semantic analysis aims to understand the meaning of these phrases in order to extract information from them. The models developed by Multiposting facilitate the extraction of the focus, tasks and necessary skills of a job advert. We first use this information to standardize our data, and then to summarize the main points of a profile or a job posting in a few words.
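
A very reduced version of skill extraction and summarization is sketched below. It matches phrases from a small alias table rather than modelling meaning, so it should be read as a placeholder for the semantic models mentioned above, not as a description of them. The alias table and the function names are assumptions made for the example.

```python
import re

# Hypothetical skill vocabulary: each alias maps to one canonical skill name.
SKILL_ALIASES = {
    "python": "Python",
    "hadoop": "Hadoop",
    "apache spark": "Spark",
    "spark": "Spark",
    "machine learning": "Machine Learning",
    "statistical learning": "Machine Learning",
}


def extract_skills(text):
    """Return canonical skills in order of first appearance, without duplicates."""
    lowered = text.lower()
    hits = []
    for alias, canonical in SKILL_ALIASES.items():
        match = re.search(r"\b" + re.escape(alias) + r"\b", lowered)
        if match:
            hits.append((match.start(), canonical))
    seen, ordered = set(), []
    for _, canonical in sorted(hits):
        if canonical not in seen:
            seen.add(canonical)
            ordered.append(canonical)
    return ordered


def summarize(title, text, max_skills=3):
    """Condense a posting into its title plus its first few skills."""
    skills = extract_skills(text)[:max_skills]
    return f"{title}: {', '.join(skills)}" if skills else title
```

Mapping several surface forms ("apache spark", "spark") to one canonical name is the standardization half of the task; `summarize` then covers the "few words" summary.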

Deep Learning

In order to benefit from the latest and most promising advances in Machine Learning, we have developed Deep Learning algorithms. A Deep Learning algorithm is represented as a multi-layer neural network; it is able to match jobs to CVs by finding concepts shared by both documents that are not necessarily apparent at first glance. These algorithms (often developed and used by Google and Facebook) are the best performers across multiple fields, including image recognition and predictive email response. The Multiposting algorithms have been particularly successful in job/CV matching (automated recruitment).

Figure: 1 Problem, 2 Extracted features, 3 Neuron layer, 4 Neuron layer, 5 Prediction.
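
The chain from extracted features through two neuron layers to a prediction can be sketched as a plain forward pass. Everything specific below is an assumption made for illustration: the hashed bag-of-words features, the layer sizes, the tanh and sigmoid activations and the `MatchNet` name are not taken from Multiposting's models. The weights are random and untrained, so the score it returns carries no meaning; the sketch only shows how data flows through a multi-layer network.

```python
import math
import random
import zlib

DIM = 16  # size of each hashed bag-of-words vector (hypothetical)


def features(text, dim=DIM):
    """Hash each word into one of `dim` buckets and L2-normalise the counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(rng, n_in, n_out):
    """Random weights scaled by the input size, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MatchNet:
    """Untrained two-hidden-layer network scoring a (job, CV) pair."""

    def __init__(self, seed=0, hidden=8):
        rng = random.Random(seed)
        self.l1 = init_layer(rng, 2 * DIM, hidden)
        self.l2 = init_layer(rng, hidden, hidden)
        self.out = init_layer(rng, hidden, 1)

    def predict(self, job_text, cv_text):
        x = features(job_text) + features(cv_text)  # extracted features
        h1 = dense(x, *self.l1, math.tanh)          # first neuron layer
        h2 = dense(h1, *self.l2, math.tanh)         # second neuron layer
        return dense(h2, *self.out, sigmoid)[0]     # prediction in (0, 1)
```

In a real system the weights would be learned from examples of good and bad job/CV pairs, and it is during that training that the hidden layers come to encode the shared concepts that are "not necessarily apparent at first glance".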


Multiposting’s work has led to the publication of various scientific articles which have been presented at major international conferences on data science.

International conferences

Bringing order to the job market: Efficient job categorization in e-recruitment

E. Malherbe, M. Cataldi, and A. Ballatore, SIGIR ’15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.

A case-based approach for easing schema semantic mapping

E. Malherbe, T. Iwaszko, and M.-A. Aufaure, in Case-Based Reasoning Research and Development. Springer, 2015.

From a ranking system to a confidence aware semi-automatic classifier

E. Malherbe, Y. Vanrompay, and M.-A. Aufaure, Procedia Computer Science, vol. 60, 2015.

Field selection for job categorization and recommendation to social network users

E. Malherbe, M. Diaby, M. Cataldi, E. Viennet, and M.-A. Aufaure, in 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2014.

A semi-supervised hybrid system to enhance the recommendation of channels in terms of campaign ROI

J. Séguéla and G. Saporta, in CIKM'2011: 20th ACM Conference on Information and Knowledge Management, 2011.

A comparison between latent semantic analysis and correspondence analysis

J. Séguéla and G. Saporta, in CARME'2011: International Conference on Correspondence Analysis and Related Methods, 2011.

A semi-supervised recommender system to predict online job performance

J. Séguéla and G. Saporta, in SDA'2011: Theory and Application of High-dimensional Complex and Symbolic Data Analysis in Economics and Management Science, 2011.

Automatic categorization of job postings

J. Séguéla and G. Saporta, in COMPSTAT'2010: 19th International Conference on Computational Statistics, 2010.

National conferences

Automatic categorisation of job adverts into job categories

J. Séguéla, in EGC'2011: 11e Conférence Internationale Francophone sur l'Extraction et la Gestion des Connaissances, 2011.

e-Recruitment: searching for relevant key words in job advert titles

J. Séguéla, G. Saporta and S. Le Viet, in JADT'2010: 10th International Session on Statistical Analysis of Textual Data, June 2010.

Counting models used to determine candidate job application decisions online

J. Séguéla and G. Saporta, in JDS'2010: 42nd Statistical Session, 2010.