The STAG Initiative

Arakne is launching the STAG (Semantic TAGging) initiative to develop an unsupervised text and multimedia classifier based on advanced AI algorithms. The STAG product will build on the keyword assigner service that Arakne has already designed and deployed in production for the European portal KEEP.
For the KEEP portal, our solution tags cross-territorial project information against a predefined set of keywords. With STAG we plan to extend this solution to fully unsupervised tagging that can also classify multimedia information.


STAG has the following features:

  • uses a fully unsupervised algorithm,
  • is language agnostic, i.e. it can be implemented for any language and can be coupled to a translation engine,
  • is horizontally scalable, because each text comparison is independent and parallelizable,
  • can keep up with linguistic trends, because it can be updated.
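
To illustrate why independent text comparisons scale horizontally, here is a minimal Python sketch. It is not STAG's actual algorithm: the `jaccard` token-overlap score and the `compare_all` helper are hypothetical stand-ins chosen for illustration, and a local thread pool stands in for a cluster of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts (illustrative stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def compare_all(texts: list[str], workers: int = 4) -> dict[tuple[int, int], float]:
    """Score every pair of texts; no pair depends on any other pair."""
    pairs = list(combinations(range(len(texts)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each comparison is a self-contained task, so it can go to any worker.
        scores = pool.map(lambda p: jaccard(texts[p[0]], texts[p[1]]), pairs)
        return dict(zip(pairs, scores))


texts = [
    "cross border transport project",
    "cross border energy project",
    "marine research",
]
scores = compare_all(texts)
```

Because no comparison reads the result of another, the same pairs could be split across machines instead of threads without changing the logic.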

From a technical point of view, STAG closely follows a Big Data Lambda architecture and makes use of state-of-the-art components of the Hadoop suite.

Arakne has been selected by the EDI incubator as one of the brightest “Big Data” startups in Europe, to help the STAG initiative grow into a fully functional product. Read more in this blog post.



What is STAG4Covid-19?

We were asked to contribute to the fight against Covid-19, and we are trying to do so in our own way.

We are an IT company, we know how to do semantic analysis, and we have developed STAG. STAG4Covid-19 is the natural result of these premises. Our solution aims to offer a web-based tool that any scientist can use to navigate the overwhelming proliferation of health-related entities (e.g. findings, symptoms, diseases, diagnoses, and medications) that medical research papers mention in a form that is neither normalized nor semantically interoperable.
These entities often denote the same concepts, which STAG4Covid-19 can semantically analyze and represent in a controlled vocabulary.
We strongly believe that such a semantic map will make the circulation of information within the scientific community more efficient; it is our contribution to finding a solution to this pandemic.
But that's not all: a standardized form can be found not only for scientific language but also for everyday language, the language we use in our messages on social media to describe how we feel and what symptoms we have, the so-called User Generated Content (UGC).
Bringing this vast amount of information back to a standardized, measurable and comparable form will be very useful for those responsible for pharmacovigilance, who will have to promptly grasp the effects that the large-scale release of a partially tested vaccine or a new drug may have on the population.
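
As a rough illustration of what mapping free-text mentions onto a controlled vocabulary means, here is a minimal Python sketch. It is a toy dictionary lookup, not the semantic analysis STAG4Covid-19 performs: the `VOCABULARY` entries, the concept labels, and the `normalize` function are all hypothetical and chosen only for the example.

```python
import re
from collections import Counter

# Toy controlled vocabulary (hypothetical): surface form -> canonical concept.
VOCABULARY = {
    "fever": "FEVER",
    "high temperature": "FEVER",
    "pyrexia": "FEVER",
    "shortness of breath": "DYSPNEA",
    "dyspnea": "DYSPNEA",
    "breathlessness": "DYSPNEA",
}


def normalize(text: str) -> Counter:
    """Count canonical concepts mentioned in a text, whatever wording is used."""
    lowered = text.lower()
    counts = Counter()
    for surface, concept in VOCABULARY.items():
        # Whole-word match so that a term is not found inside a longer word.
        pattern = r"\b" + re.escape(surface) + r"\b"
        counts[concept] += len(re.findall(pattern, lowered))
    return +counts  # unary plus drops concepts with a zero count


paper = normalize("Patients reported pyrexia and breathlessness.")
post = normalize("I have a high temperature and a fever, with shortness of breath")
```

Once the scientific sentence and the social media post are reduced to the same concept labels, their counts can be measured and compared directly.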

To process UGC at scale, STAG4Covid-19 aims to offer a set of APIs that expose the same services as the web interface programmatically.

Who is STAG4Covid-19 for?

Clinicians, hospitals, practitioners, the scientific community in general, and public health monitoring institutions.