Instructions

Read the manual of the PHARM Internet Interface

 

Introduction

The PHARM Internet Interface is a web interface for storing and querying multi-source news and social media content in a semi-structured, integrated format. It consists of a database and a front-end web interface that exposes data and functionality to users. The main supported functions are Search, Analyze, Scrape, Annotate and Submit.

Specifically, the Search and Analyze functions are open to all visitors of the site, whereas authentication is required for the Scrape, Annotate and Submit functions. If you want to join the PHARM project and contribute additional data, please contact our team through the contact page to request such a role.

Users

The PHARM project specifies two types of users: the visitor and the contributor. A visitor can access the Search and Analyze modules, while a contributor has full access to all available processing functions. The basic workflows for a contributor (authenticated user) and a visitor (unauthenticated user) are:

Contributor

Visitor

Functions

As mentioned above, the interface supports five main actions: Search, Analyze, Scrape, Annotate and Submit. A more detailed description of each function follows.

Search

One of the main functions of the interface is navigating through the hate-speech text records of the database. The user can view all the results or apply a variety of filters (e.g. source, language, date). In detail, the available filters are:

Analyze

When a record is selected, a view presenting detailed information appears. The location is marked on a map and the results of various text analysis algorithms are presented with graphics (icons, bars, etc.). The results concern hate-speech detection (for both unsupervised and supervised classification methods), sentiment analysis (for both unsupervised and supervised classification methods), frequent word detection, and hate-speech entity collection.

Scrape

This module enables the mass collection of text data from two popular platforms: Twitter and YouTube. To collect hate-speech data from Twitter, the user chooses the desired language (Greek, English, Italian, or Spanish) and starts the process by clicking the "Twitter Stream" button. Tweets are then collected for a predefined time interval, based on language-specific lexicons that have been developed in the context of the PHARM project. A link is then provided for downloading a JSON file that contains the data. These data may be used for any Natural Language Processing (NLP) task. The user can repeat the process multiple times.

In the case of YouTube, a search query must be set instead of a language. The query can include individual search terms or a combination of them, separated by commas. The process is instantaneous in this case, and a JSON file with the corresponding data becomes ready for download.

For more information about the Twitter-filtering lexicons and the JSON-based data format, you can visit the PHARM GitHub repository.
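Once downloaded, the JSON file can be loaded for further NLP work. The following is a minimal sketch, assuming the file holds either a single JSON document (an array or one object) or one JSON object per line; the manual does not state which layout is used, and the function name load_records is hypothetical.

```python
import json


def load_records(raw: str) -> list:
    """Parse the text of a downloaded file into a list of records.

    Assumption: the file is either one JSON document (array or object)
    or JSON Lines (one object per line). The manual confirms neither.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-empty line separately.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    # Wrap a single object so callers always receive a list.
    return data if isinstance(data, list) else [data]


if __name__ == "__main__":
    sample = '{"id": "1", "text": "first"}\n{"id": "2", "text": "second"}\n'
    for record in load_records(sample):
        print(record["id"], record["text"])
```

Trying the whole-document parse first and falling back to line-by-line parsing keeps the helper usable with either layout, at the cost of parsing the text twice in the JSON Lines case.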

Annotate

The annotation process is supported by the doccano tool, an annotation management system for text data. It can be used for developing datasets for classification, sentiment analysis, entity tagging, or translation tasks. In the context of this project, it is used for text classification, and each record should be labeled with specific tags. In short, the annotator should assign to each entry one of the labels “Hate” and “No Hate” (depending on whether the text contains content against refugees or migrants), and mark it as “Positive”, “Neutral” or “Negative” (according to the sentiment of the text). First, the user may take a look at the online demo for sentiment analysis (the same procedure can be followed for classification as well). Second, they can refer to the detailed manual for the doccano tool.
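The labeling rule above (exactly one hate label plus exactly one sentiment label per entry) can be checked programmatically. The sketch below is illustrative only: the lowercase label strings are taken from the sample records in the Data format section, and the function name check_annotations is hypothetical rather than part of doccano or the PHARM interface.

```python
# Label identifiers as they appear in the sample records of this manual.
HATE_LABELS = {"hate", "no_hate"}
SENTIMENT_LABELS = {"positive", "neutral", "negative"}


def check_annotations(labels):
    """Return True when labels hold exactly one hate and one sentiment tag."""
    hate = [label for label in labels if label in HATE_LABELS]
    sentiment = [label for label in labels if label in SENTIMENT_LABELS]
    # Any label outside the two known sets makes the entry invalid.
    known = HATE_LABELS | SENTIMENT_LABELS
    unknown = [label for label in labels if label not in known]
    return len(hate) == 1 and len(sentiment) == 1 and not unknown


if __name__ == "__main__":
    print(check_annotations(["hate", "negative"]))
    print(check_annotations(["no_hate"]))
```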

Submit

To add a new record, the user should select “Submit” from the (top) main menu. Data can be entered either manually (one record at a time) or in bulk. For manual entry, the user sets all data (text) and metadata (source, language, date, hate, sentiment, etc.) via the corresponding input forms (i.e. text fields, radio buttons, checkboxes, etc.). If the data are already formatted appropriately (see sample data), they can be imported as a JSON file. See also the PHARM GitHub repository.
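For bulk entry, records can be prepared in code before import. The sketch below builds one record shaped like the website comment sample in the Data format section and serializes it as a single JSON line. The helper names make_record and to_json_line, the placeholder user id 0, and the example URL are assumptions; whether the importer accepts exactly this subset of fields is not confirmed by the manual.

```python
import json


def make_record(record_id, text, source, lang, date, hate_label, sentiment_label):
    """Build one record shaped like the website samples in this manual."""
    return {
        "id": str(record_id),
        "annotations": [
            # User id 0 is a placeholder, not a real annotator account.
            {"label": hate_label, "user": 0},
            {"label": sentiment_label, "user": 0},
        ],
        "meta": {"type": "comment", "source": source, "lang": lang, "date": date},
        "text": text,
    }


def to_json_line(record):
    """Serialize a record as one JSON line."""
    # ensure_ascii=False keeps Greek, Italian, or Spanish text readable.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    record = make_record(
        6, "Example text", "http://example.org/post", "en",
        "2020-09-09 00:00:00", "no_hate", "neutral",
    )
    print(to_json_line(record))
```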

Data format

To illustrate the format, sample records from different data sources are presented below.

Facebook

{"id": "1", "annotations": [{"label": "hate", "user": 2}, {"label": "negative", "user": 2}], "meta": {"id": "80056833", "type": "facebook_comment", "source": "https://www.facebook.com/provinciale.rivista", "plang": "it", "pdate": "1571-10-15 00:00:00", "phate": "(nero cinesi)", "pterms": "", "ploc": "Italy, United States of America"}, "text": "andrebbe anche fatta con Venezia (principale flotta a Lepanto). 1571: le mie potentissime galeazze spaccheranno il culo all infedele ottomano. 2020: per favore basta quarantena, devo vendere lasagne surgelate in nero ai cinesi a 50 euro"}

Twitter

{"id": "2", "annotations": [{"label": "hate", "user": 2}, {"label": "neutral", "user": 2}], "meta": {"type": "twitter_comment", "date": "10/13/2020", "tweet_id": 1315981643111432192, "is_retweet": true, "is_quote": false, "user_id": 1025701121749340160, "username": "Christina Dim", "scr_name": "ChristinaDim31", "location": "Αθήνα ", "followers": 3178, "friends": 4756, "quoted_text": "", "pid": 27454420, "plang": "el", "pdate": "", "phate": "(Τουρκία μετανάστες)", "pterms": "", "ploc": "Turkey"}, "text": "RT @kanekos69: Εν τω μεταξύ αν γίνει στραβή με Τουρκία βλέπω μετανάστες εθελοντές στο μέτωπο και δεξιούς προσφυγες στα Παρίσια"}

YouTube

{"id": "3", "annotations": [{"label": "no_hate", "user": 2}, {"label": "neutral", "user": 2}], "meta": {"type": "youtube_comment", "comment_id": "Ugy-SPKz3HGo4OohJnfR4AaABAg", "reply_count": 3, "like_count": 30, "video_id": "Gaz6UvRW0G8", "channel": "Μηδέν Ένα Μηδέν 010", "video_title": "0 1 0 ~ Πρόσφυγες", "video_desc": "Lyrics/Raps - 0 1 0 Beat by Apo (Αισθήσεις) Recorded @ Blackspot Studio Mix/Master by Sativa Cover by SpyOne (Baseline Co.) 0 1 0 IG: ...", "author_id": {"value": "UCPP-OugMmE8pNbWRbFv4YCA"}, "author_name": "GATE21QNZ", "rating": "none", "date": "2020-09-03T21:16:31Z", "plang": "el", "pdate": "2020-10-25 00:00:00", "phate": "(Αφγανιστάν ισλαμιστές)", "pterms": "", "ploc": "Afghanistan"}, "text": "Είμαι 25 ετών, ονομάζομαι Μήτσος ο μαλάκας, Μαζεύω ισλαμιστές και λιποτάκτες από μια τούρκο βάρκα, Το παίζω ανοιχτόκαρδος με τις τσέπες του μπαμπάκαα, Και όταν ο Μήτσος πήγε να κάνει φίλους στο Αφγανιστάν, Τον σφάξαν σαν τους προγόνους του, για χάρη του Iσλαμ."}

Website Article

{"id": "4", "annotations": [{"label": "no_hate", "user": 2}, {"label": "positive", "user": 2}], "meta": {"id": "92860318", "type": "article", "source": "http://defencereview.gr/gnorizontas-ta-gallika-ploia-oi-fremm-kai-o/", "meta": "Άμυνα Ελλάδα 9 Σεπτεμβρίου 2020 18:22 ", "title": "Γνωρίζοντας τα γαλλικά πλοία: Οι FREMM και οι [email protected] (Video)", "lang": "el", "date": "2020-09-09 00:00:00", "hate": "", "terms": "", "loc": ""}, "text": "Συχνά πυκνά αναφερόμαστε στις γαλλικές ναυπηγικές σχεδιάσεις. Τα παρακάτω βίντεο που βρήκαμε είναι αντιπροσωπευτικά για τις δυνατότητες των πλοίων με πολύ καλά σκηνοθετημένα βίντεο και ενδιαφέροντα πλάνα. Αξίζει να τα δείτε: "}

Website Comment

{"id": "5", "annotations": [{"label": "no_hate", "user": 2}, {"label": "neutral", "user": 2}], "meta": {"id": "92860318", "type": "comment", "source": "http://defencereview.gr/gnorizontas-ta-gallika-ploia-oi-fremm-kai-o/", "lang": "el", "date": "", "hate": "", "terms": "", "loc": ""}, "text": "To μπαραζ επεκτεινεται. Ελπιζω αυριο τετοια ωρα πανω κατω να μην κλαιμε"}
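Across the samples above, the social media records store language, date, hate terms and location under "plang", "pdate", "phate" and "ploc", while the website records use "lang", "date", "hate" and "loc". The sketch below shows one way to read such fields uniformly. The helper names meta_field and labels_of are hypothetical, and preferring the "p"-prefixed key is an assumption about which value is the processed one. Note that the prefixed value may be empty, as with "pdate" in the Twitter sample.

```python
def meta_field(record, name):
    """Look up a metadata field under its "p"-prefixed or plain key.

    Assumption: when both keys exist, the "p"-prefixed one is preferred.
    Returns None when neither key is present.
    """
    meta = record.get("meta", {})
    for key in ("p" + name, name):
        if key in meta:
            return meta[key]
    return None


def labels_of(record):
    """Collect the annotation labels of a record in their stored order."""
    return [annotation["label"] for annotation in record.get("annotations", [])]


if __name__ == "__main__":
    facebook = {
        "annotations": [{"label": "hate", "user": 2}, {"label": "negative", "user": 2}],
        "meta": {"type": "facebook_comment", "plang": "it"},
    }
    article = {"annotations": [], "meta": {"type": "article", "lang": "el"}}
    print(meta_field(facebook, "lang"), labels_of(facebook))
    print(meta_field(article, "lang"), labels_of(article))
```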

Offline Manual

For additional information, you can refer to the full manual of the Interface or contact our team for support. The guide is also available in Greek, Italian, and Spanish.