
Fake news detection


Academic year: 2021


University of Pisa

Master’s Degree in Humanistic Computer Science (Corso di Laurea Magistrale in Informatica Umanistica)

Fake news detection

Candidate:

Valeriya Slovikovskaya

Thesis advisor:

Prof. Giuseppe Attardi

Research supervisor:


Contents

1 Introduction. Fake news: their impact on modern society

1.1 Crucified boy

1.2 Pizzagate

1.3 Brexit's fake claim: The UK sends £350m a week to the EU

1.4 Russiagate. Russian trolls' U.S. election hack

1.4.1 "Active measures"

1.4.2 Advertisements

1.4.3 Political rally promotion

1.4.4 Audience

1.4.5 Takeaway

2 Democracy mass media: their functions

2.1 Inform

2.2 Monitor

2.3 Communicate

2.4 Entertain?

3 Parliamentary monitoring platforms

3.1 GovTrack.us

3.2 Vote Smart

4 Fact checking organizations

4.1 Snopes

4.2 Full Fact

4.3 PolitiFact

4.4 Partnership with Facebook

4.5 Fact-checkers collaboration

4.5.1 Global Fact

4.5.2 The Duke Tech & Check Cooperative Project

4.5.3 ClaimReview

5 Fact-checking application: ClaimBuster

5.1 Claim spotting

5.1.1 Data collection

5.1.2 Feature extraction

5.1.3 Classification

5.2 Claim matching

5.3 Claim checking

6 Fake-news detection challenges

6.1 Fake News Detection Challenge 2017

6.1.1 FNC-1 task

6.1.2 Dataset

6.1.3 Evaluation metric

6.1.4 Model architecture

6.1.6 Semantic features

6.2 PAN @ SemEval 2019 Task 4. Partisanship detection

6.2.1 BuzzFeed Fake News Collection

6.2.2 PAN @ SemEval 2019 Task 4 Corpus

6.2.3 A stylometric inquiry into hyper-partisan news

6.2.4 PAN @ SemEval 2019 Task 4 submissions

6.3 WSDM - Fake News Classification Cup

7 Troll farms detection

7.1 Twitter trolls datasets

7.1.1 Kaggle Russian Troll Tweets (NBC)

7.1.2 Kaggle Russian Troll Tweets (IRA)

7.2 Scholarship around the Russian troll accounts

7.2.1 Students' projects on tweets

7.2.2 Students' projects on Reddit dataset

7.2.3 High-profile journalists mentions case study

8 Towards capturing semantic meanings. Events and entities representation learning

8.1 Multiple choice narrative cloze (MCNC) task

8.1.1 Dataset

8.1.2 Models

8.1.3 Results

8.2 Scaled graph neural network for event prediction

8.2.1 Dataset

8.2.2 Model

8.2.3 Results

8.3 PyTorch-BigGraph for learning entities representations from knowledge graphs

8.3.1 Model

8.3.2 Performance

9 Defending against neural fake news

9.1 Deepfakes

9.2 Generative Pretrained Transformer 2

9.3 Rise of large language models

9.4 Machine fake detection

9.4.1 OpenAI and Facebook Research AI efforts

9.4.2 Grover: Adversary vs Verifier experiment

10 Transfer learning from pre-trained models to FNC-1 task

10.1 Feature-based transfer learning

10.1.1 InferSent model

10.1.2 Transfer learning from InferSent

10.1.3 BERT architecture

10.1.4 Transfer learning from BERT

10.2 Transformers fine-tuned on FNC-1 task

10.2.2 RoBERTa

10.2.3 FNC-1 improved results

11 Conclusion


1

Introduction. Fake news: their impact on modern society

1.1 Crucified boy

During the War in Donbass, on July 12, 2014, Russia's state-owned Channel One broadcast a now infamous report that Ukrainian nationalists had crucified a Russian three-year-old child on the central square in Slovyansk after the Ukrainian Army expelled Russian-backed separatists from the town. The report was officially titled "A refugee from Sloviansk recalls how a little son and a wife of a militiaman were executed in front of her".

Investigative journalists from the Russian outlets Novaya Gazeta and Dozhd, who visited Sloviansk and interviewed local residents, did not find any supporting evidence to back up the allegations, nor any audio or video footage of the incident, which was itself unusual, since actions of the Ukrainian army in the city were well documented at the time.

In Russian mass culture the episode became synonymous with journalistic fakes. It was an effective piece of propaganda because it was produced for prime-time television, it was emotional, and it was totally groundless. It was a gross breach of professional ethics by Russia's leading television channel, and an attempt to rally naïve people behind the idea of a war against Ukraine.

The report enraged Russian speaking separatists already taking up arms against Kiev, throwing gasoline on the fire of the uprising in the east.

Other recurring false themes are that the Ukrainian government planned to put Hitler’s face on its paper money and that western, Ukrainian-speaking regions of the country have declared independence (Kramer 2017).

1.2 Pizzagate

Since early November 2016, dozens of individuals connected to Comet Ping Pong, as well as the owners and staff of nearby businesses, have received countless death threats and a barrage of online harassment, all inspired by a ludicrous and entirely unfounded conspiracy theory known as Pizzagate. Originating on 4chan message boards frequented by far-right extremist trolls and white supremacists, the conspiracy theory claims that Hillary Clinton and her former campaign chair, John Podesta, run a child trafficking ring based at the restaurant. (In a related delusion, the same message boards claimed that Clinton and Podesta participated in Satanic rituals.)

In the span of a few weeks, a false rumor that Hillary Clinton and her top aides were involved in various crimes snowballed into a wild conspiracy theory that they were running a child-trafficking ring out of a Washington pizza parlor.

WikiLeaks began releasing emails hacked from the account of John Podesta, Hillary Clinton’s campaign chairman, a month before the election. Within the emails were discussions that include the word pizza, including dinner plans between Mr. Podesta and his lobbyist brother, Tony Podesta.


Figure 1: Pizzagate graph. NYT, 10/12/2016

A participant on 4chan connected the phrase "cheese pizza" to pedophiles, who on chat boards use the initials "c.p." to denote child pornography.

Following the use of "pizza", theorists focused on the Washington pizza restaurant Comet Ping Pong. The WikiLeaks emails revealed that John Podesta corresponded with Comet’s owner, James Alefantis, who had connections to Democratic operatives.

The theory started snowballing, taking on the meme #PizzaGate. Fake news articles emerged and were spread on Twitter and Facebook.

On December 4, a gunman entered Comet Ping Pong with an assault rifle and fired several shots. He had recently read online that the restaurant was harboring young children as sex slaves as part of a child-abuse ring led by Hillary Clinton. Fortunately, no one was hurt in the shooting. He told police he had driven from North Carolina to DC to "self-investigate" the conspiracy, and surrendered after finding no evidence to support the claims of child slaves being held there.

The articles were soon exposed as false by publications including The New York Times, The Washington Post and the fact-checking website Snopes (Aisch, Huang, and Kang 2016; Lacapria 2016; Kang and Goldman 2016).

But the shooting did not put the theory to rest.

A poll of voters conducted on December 17–20, 2016 by The Economist/YouGov asked voters if they believed that "Leaked e-mails from the Clinton campaign talked about pedophilia and human trafficking - ’Pizzagate’." The results showed that 17% of Clinton voters responded "true" while 82% responded "not true"; and 46% of Trump voters responded "true" while 53% responded "not true".1


1.3 Brexit’s fake claim: The UK sends £350m a week to the EU

On 15 April 2016, ten weeks before the British referendum, the first Vote Leave2 billboard was unveiled in Manchester.

The advert by Vote Leave - designated the official pro-exit campaign - suggested that should Britain vote itself out of the EU, the country's purported £350 million-a-week payment could be better spent on the National Health Service.3

The Leave camp's controversial claim was also plastered over buses: "We send the EU £350 million a week".

The figure of £350 million was widely disputed by other sources, as it did not take account of the UK's agreed rebate. In fact, the net amount sent was £250 million a week (£13 billion a year). Another £77 million a week (£4 billion a year) later comes back as EU expenditure on UK-based projects, "mainly to farmers and for poorer areas of the country". Furthermore, extra money comes back as the EU spends on the UK private sector, estimated by Full Fact to be £1.5 billion a year (£29 million a week) (Team 2019; Worrall 2016).
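The weekly and annual figures quoted above can be cross-checked with a trivial conversion; the following sketch simply rounds a weekly figure in £ millions to an annual figure in £ billions (52 weeks per year is an approximation):

```python
# Rough consistency check of the contribution figures quoted above.
WEEKS_PER_YEAR = 52

def weekly_to_annual(weekly_millions: float) -> float:
    """Convert a weekly figure in £ millions to an annual figure in £ billions."""
    return weekly_millions * WEEKS_PER_YEAR / 1000

net_sent = weekly_to_annual(250)       # net payment: ~13 bn/year
eu_projects = weekly_to_annual(77)     # EU spending on UK projects: ~4 bn/year
private_sector = weekly_to_annual(29)  # EU spending in the UK private sector: ~1.5 bn/year

print(round(net_sent, 1), round(eu_projects, 1), round(private_sector, 1))
# prints: 13.0 4.0 1.5
```

The rounded results match the bracketed annual figures in the paragraph above, which is why the two sets of numbers are consistent rather than contradictory.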

Figure 2: Vote Leave billboard

Earlier, on 8 March, The Sun ran a front-page story with the headline "Queen Backs Brexit" based purely on anonymous sources. After Buckingham Palace lodged a complaint, Britain's press watchdog IPSO4 judged the headline was "significantly misleading" and not backed up by the text.

On 15 June, the Daily Mail published a front-page story showing migrants getting out of a lorry in Britain with the headline "We’re from Europe, let us in". However, police footage clearly showed the migrants saying they were from Iraq and Kuwait.

In both cases the newspapers published small corrections on inside pages. But by then the false stories had become ingrained in the collective consciousness of readers.

2A faction in British politics that favoured leaving the European Union in the 2016 United Kingdom European Union membership referendum, made up of Conservative, Labour and UKIP MPs and donors.

3https://www.itv.com/news/update/2016-04-15/vote-leave-unveils-first-billboard-of-referendum-campaign/

4The Independent Press Standards Organisation (IPSO) was established on Monday 8 September 2014 following the windup of the Press Complaints Commission (PCC), which had been the main industry regulator of the press in the United Kingdom since 1990.

Another Leave campaign's whopper was a billboard screaming "Turkey (population 76 million) is joining the EU" despite negotiations barely crawling along and no expert, whether in Turkey or the EU, expecting membership in the foreseeable future (Harding 2017).

On 23 June 2016 British voters opted to leave the EU by a slim majority after a referendum campaign that will be best remembered for the lies told by leading campaigners.

The referendum was notable for the proliferation of fact-checking sites analysing claims made by politicians. The BBC devoted a whole section of its site to a Reality Check aimed at getting to the "facts behind the claims in the EU referendum campaign and beyond". Despite its close links to the Remain campaign, the pro-EU InFacts website also did valuable work in puncturing the myths propagated by both sides.

Nevertheless, as a survey by The Policy Institute at King's College London shows, there are still significant misconceptions about some key facts around Brexit5.

For example, only 29% of the public correctly think that immigrants from European countries pay £4.7bn more in taxes than they receive in welfare benefits and services.

The public hugely underestimate how much investment into the UK comes from EU countries: the actual figure for 2016 was 63%, but the average guess was only 36%.

Different groups also have very different levels of belief in the claim that the UK sends £350m a week to the EU: Conservative and Labour Leave supporters are most likely to believe it (64% and 65% respectively), and Labour Remain supporters least likely (20%).

1.4 Russiagate. Russian trolls' U.S. election hack

The U.S. Department of Justice investigation6 conducted by Special Counsel Robert S. Mueller7 established that Russia interfered in the 2016 presidential election through an "active measures" social media campaign carried out by the Russian Internet Research Agency (IRA), an organization funded by Yevgeniy Prigozhin, a Russian businessman with ties to Russian president Vladimir Putin, known as "Putin's chef" (Mueller 2019a; Mueller 2019b; Parfitt 2015; Benedictus 2016).

The IRA, also known as a Kremlin-linked troll8 farm, deployed a social media campaign, targeted at large U.S. audiences, that favored presidential candidate Donald J. Trump and disparaged presidential candidate Hillary Clinton.

5The Policy Institute at King’s College London, in partnership with Ipsos MORI and UK in a Changing Europe, has run a major survey of over 2,200 people aged 18–75 in Great Britain on misperceptions of immigration and Brexit realities (B. Duffy 2018)

6 https://en.wikipedia.org/wiki/Special_Counsel_investigation_(2017-2019) 7https://en.wikipedia.org/wiki/Robert_Mueller

8The term "troll" refers to internet users - in this context, paid operatives - who post inflammatory or otherwise disruptive content on social media or other websites.


1.4.1 "Active measures"

Initially, the IRA created social media accounts that pretended to be the personal accounts of U.S. persons. By early 2015, the IRA began to create larger social media groups or public social media pages that claimed (falsely) to be affiliated with U.S. political and grassroots organizations. In certain cases, the IRA created accounts that mimicked real U.S. organizations. For example, one IRA-controlled Twitter account, @TEN_GOP, pretended to be connected to the Tennessee Republican Party.

More commonly, the IRA created accounts in the names of fictitious U.S. organizations and grassroots groups and used these accounts to pose as anti-immigration groups, Tea Party activists, Black Lives Matter protesters, and other U.S. social and political activists.

IRA Facebook groups active during the 2016 campaign covered a range of political issues and included purported conservative groups (with names such as "Being Patriotic," "Stop All Immigrants," "Secured Borders," and "Tea Party News"), purported Black social justice groups ("Black Matters," "Blacktivist," and "Don’t Shoot Us"), LGBTQ groups ("LGBT United"), and religious groups ("United Muslims of America").

1.4.2 Advertisements

To reach larger U.S. audiences, the IRA purchased advertisements from Facebook that promoted the IRA groups on the news feeds of U.S. audience members. According to Facebook, the IRA purchased over 3,500 advertisements, and the expenditures totaled approximately $100,000.

Many IRA-purchased advertisements explicitly supported or opposed a presidential candidate or promoted U.S. rallies organized by the IRA. For example, on March 18, 2016, the IRA purchased an advertisement depicting candidate Clinton and a caption that read:

"If one day God lets this liar enter the White House as a president - that day would be a real national tragedy."

Similarly, on April 6, 2016, the IRA purchased advertisements for its account "Black Matters" calling for a "flash mob" of U.S. persons to take a photo with #nohillary2016 or #HillaryClintonForPrison2016.

The first known IRA advertisement explicitly endorsing the Trump Campaign was purchased on April 19, 2016. The IRA bought an advertisement for its Instagram account "Tea Party News" asking U.S. persons to help them "make a patriotic team of young Trump supporters" by uploading photos with the hashtag "#KIDS4TRUMP."

In subsequent months, the IRA purchased dozens of advertisements supporting the Trump Campaign, predominantly through the Facebook groups "Being Patriotic," "Stop All Invaders," and "Secured Borders".


1.4.3 Political rally promotion

The IRA organized and promoted political rallies inside the United States while posing as U.S. grassroots activists.

First, the IRA used one of its preexisting social media personas (Facebook groups and Twitter accounts) to announce and promote the event. The IRA sent a large number of direct messages to followers of its social media account asking them to attend the event. It then further promoted the event by contacting U.S. media about the event and directing them to speak with the coordinator. After the event, the IRA posted videos and photographs of the event to the IRA’s social media accounts.

For example, on May 31, 2016, the operational account "Matt Skiber" began to privately message dozens of pro-Trump Facebook groups asking them to help plan a "pro-Trump rally near Trump Tower". In May 2016, the IRA created the Twitter account @march_for_trump, which promoted IRA-organized rallies in support of the Trump Campaign.

There were dozens of U.S. rallies organized by the IRA. The earliest evidence of a rally was a "confederate rally" in November 2015. The IRA continued to organize rallies even after the 2016 U.S. presidential election. The attendance at rallies varied. Some rallies appear to have drawn few (if any) participants while others drew hundreds. The reach and success of these rallies was closely monitored.

1.4.4 Audience

Facebook. In November 2017, a Facebook representative testified that Facebook had identified 470 IRA-controlled Facebook accounts that collectively made 80,000 posts between January 2015 and August 2017. Facebook estimated the IRA reached as many as 126 million persons through its Facebook accounts.

For example, at the time they were deactivated by Facebook in mid-2017,

− the IRA's "United Muslims of America" Facebook group had over 300,000 followers,
− the "Don't Shoot Us" Facebook group had over 250,000 followers,
− the "Being Patriotic" Facebook group had over 200,000 followers,
− and the "Secured Borders" Facebook group had over 130,000 followers.

Twitter. The IRA’s Twitter operations involved two strategies. First, IRA trolls operated certain Twitter accounts to create individual U.S. personas. Separately, the IRA operated a network of automated Twitter accounts (bot network) that enabled the IRA to amplify existing content on Twitter.

Individualized accounts used to influence the U.S. presidential election included:

− @jenn_abrams (claiming to be a Virginian Trump supporter with 70,000 followers);
− @Pamela_Moore13 (claiming to be a Texan Trump supporter with 70,000 followers);
− and @America_1st_ (an anti-immigration persona with 24,000 followers).

Multiple IRA-posted tweets gained popularity. For example, one IRA account tweeted, "To those people, who hate the Confederate flag. Did you know that the flag and the war wasn’t about slavery, it was all about money." The tweet received over 40,000 responses.

The U.S. media outlets also quoted tweets from IRA-controlled accounts and attributed them to the reactions of real U.S. persons. Similarly, numerous high-profile U.S. persons, including former Ambassador Michael McFaul, Roger Stone, Sean Hannity, and Michael Flynn Jr., retweeted or responded to tweets posted to these IRA controlled accounts. Multiple individuals affiliated with the Trump Campaign also promoted IRA tweets.

Posts from the IRA-controlled Twitter account @TEN_GOP were cited or retweeted by multiple Trump Campaign officials in the United States, including Donald J. Trump Jr., Eric Trump, Kellyanne Conway, Brad Parscale, and Michael T. Flynn. These posts included allegations of voter fraud, as well as allegations that Secretary Clinton had mishandled classified information.

On September 19, 2017, President Trump's personal account @realDonaldTrump responded to a tweet from the IRA-controlled account @10_gop9. The tweet read: "We love you, Mr. President!"

In January 2018, Twitter publicly identified 3,814 Twitter accounts associated with the IRA. According to Twitter, in the ten weeks before the 2016 U.S. presidential election, these accounts posted approximately 175,993 tweets, "approximately 8.4%" of which were election-related. Twitter also announced that it had notified approximately 1.4 million people who it believed may have been in contact with an IRA-controlled account.

On Facebook, roughly 29 million people were served content in their News Feeds directly from the IRA's posts over the two years. Posts from these Pages were also shared, liked, and followed by people on Facebook, and, as a result, three times more people may have been exposed to a story that originated from the Russian operation.

1.4.5 Takeaway

In modern life, information about socially valuable events is produced and consumed in the form of news. In practice, news production is inevitably engaged in political as well as commercial competition.

A state regime suppressing human rights and freedoms might be interested in massive propaganda, masking crimes and only mimicking the traits of a healthy society.

Big media holdings might be interested in producing sensational messages able to attract wide audiences.

Social media users might be more concerned about viral spread of information than about its quality.


What is important is the constant need and public demand for transparency, for free access to valuable information, for freedom of speech and thought, and for the possibility of conscious and responsible participation in social life.

This demand can be satisfied by watchdog and investigative journalistic activity, based on constant monitoring of states, political organizations, and businesses, and incorporating in-depth fact checking. Only in this way can mass fake news production and spreading be balanced in civil society.

As for the Russian trolls, their activity, besides the US presidential election, has touched several other major Western elections:

− Brexit (Wintour 2018; Duh, Rupnik, and Korošak 2017; Gorodnichenko, Pham, and Talavera 2018)
− Catalonia Independence Referendum (White 2017)
− French Presidential Election (Daniels 2017)10
− German Federal Election (Shuster 2017).

For instance, (Gorodnichenko, Pham, and Talavera 2018) showed that a high volume of Russian tweets was generated in the few days before the voting day of the 2016 E.U. Referendum (Brexit Referendum), and then dropped afterwards.

And the trolls continue to work to this day.

2

Democracy mass media: their functions

News is a "product vested with public interest" rather than a "commodity product" (McQuail 2005). Western legislation views traditional mass media as being integral to democracy: the European Convention on Human Rights (ECHR) has long assigned many democracy-producing functions to traditional mass media. They can be found in the Charter of Fundamental Rights of the European Union, which is intertwined with national legislation, as well as in relevant judgments of the ECHR. As one of the most crucial functions of mass media, the public service function expects journalism to generate public attention for politically relevant issues, enable democratic processes, monitor the government, and provide public value.

Freedom of expression and freedom of information are fundamental for that. These pillars are codified in law, for example in Article 10 of the European Convention on Human Rights, which states that freedom of expression shall include the freedom to hold opinions and to receive and impart information and ideas without interference by public authority and regardless of frontiers in a democratic society.

10It is suspected that Russia is behind the MacronLeaks campaign that occurred during the 2017 French presidential elections period (Ferrara 2017), as well as the Catalonian referendum (Stella, Ferrara, and De Domenico 2018).


Citizens have the right to freely make political decisions and participate in democratic elections, and thus depend on journalism’s role to enable them to make these political decisions based on relevant information (Lohmann and Riedl 2019).11

2.1 Inform

As elections are the core of all representative democracies, citizens need to be supplied with all of the information they need in the polling booth. The mass media bears responsibility for that and should provide high quality political information and help citizens to come to well informed political opinions.

Common professional values of journalists are closely related to the watchdog ideal. At the level of news performance (actual news content), the watchdog model has three classical elements: objectivity, factuality, and critical coverage (Jebril 2013). In order to fulfill their role as watchdogs, journalists need to keep a certain distance from the powers and challenge them, as opposed to "propagandist" journalists, who are loyal to the ruling powers and elites. The detached watchdog is objective, neutral, and impartial. Still, because of his watchdog function, he articulates his "skeptical and critical attitude towards the government and business elites" (Lohmann and Riedl 2019).12

To be a detached observer, to report things as they are, and to provide analysis of current affairs means to provide the information people need to make political decisions.

Watchdog journalism informs the public about goings-on in institutions and society, especially in circumstances where a significant portion of the public would demand changes in response.

2.2 Monitor

Journalists have a monitoring function: they have a responsibility to monitor and criticize government, politics, and the economy in the name of the citizen. Moreover, this monitoring responsibility, or public watchdog role, strengthens the democratic process by creating an additional layer of protection against potential detriments or injustices against citizens. The idea of the media's monitoring function basically reflects the legal concept of the division of powers that shapes modern constitutional states.

While the information function mainly addresses citizens as voters and thus aims to establish democratic representation, the monitoring function aims to maintain this representation during times of governing to ensure the representatives feel obliged to uphold the pledges made before being elected.

The watchdog journalism model involves interviewing public figures and challenging them with problems or concerns, beat reporting to gather information from meetings that members of the public might not otherwise attend, and observing "on the ground" in broader society to alert the public when a problem is detected.

11See The Worlds of Journalism Study (WJS) https://worldsofjournalism.org/

12See also https://en.wikipedia.org/wiki/Watchdog_journalism

2.3 Communicate

The public forum established via the media’s communication function may also function as an initial point of social change.

Journalists advocate for social change, motivate people to participate in political activity, let people express their views, promote tolerance and cultural diversity.

Through the advancement of the internet, reciprocal communication can be established and facilitated beyond the limitations associated with traditional media such as newspaper, TV and radio. Thus, it amplifies the possibility of participation, supplementing the democratic process by creating and maintaining various possibilities of interactions and feedback systems.

2.4 Entertain?

Monitoring, informing, communicating. Besides that, the media need to provide commercially valuable entertainment and relaxation, and to provide the kind of news that attracts the largest audience. Sometimes this need is in conflict with their socially engaged activities.

It has been argued that journalism is more entertaining than it is informative by focusing on scandals, violence and political personalities rather than ideologies.

The media industry is described as becoming more commercialized than ever before, and the transformation is sometimes discussed in terms of a change from public driven to market driven conditions.

The 24-hour news cycle also means that news channels repeatedly draw on the same public figures, which benefits PR-savvy politicians and means that presentation and personality can have a larger impact on the audience than facts, while the process of claim and counter-claim can provide material for days of news coverage at the expense of deeper analysis of the case.

The digital culture allows anybody with a computer and access to the internet to post their opinions online and mark them as fact, which may become legitimized through echo chambers and other users validating one another. Content may be judged based on how many views a post gets, creating an atmosphere that appeals to emotion, audience biases, or headline appeal instead of researched fact.

Some also argue that the abundance of facts available at any time on the internet leads to an attitude focused on knowing basic claims to information instead of grasping an underlying truth or formulating carefully thought-out opinions. The internet allows people to choose where they get their information, allowing them to reinforce their own opinions.


− TV channels and newspapers funded by the state13 and hence inclined to pro-government propaganda,

− major news outlets driven by the imperatives of profit-obsessed markets, touched by market-enforced sensationalism and infotainment,

− wide-scoped social networks14 spreading unverified information

are the targets of constant criticism and the subject of considerable skepticism for their potential harm to the public. This criticism promotes the rise of monitoring and fact-checking institutions incorporated into media companies'15 structures, as well as independent, publicly financed charities.

Indeed, some of the most sophisticated watchdogging is being done by independent, nonprofit entities devoted to investigative reporting. They fill a gap in media systems where market, ownership, or political pressures make investigative reporting by commercial or state-owned media difficult if not impossible. These centers are involved in training and reporting and serve as models of excellence that are helping raise the standards of local journalism (Coronel 2010).

Although many nonprofit investigative reporting centers have recently flourished in new democracies, in this report we restrict ourselves to some prominent examples of non-governmental, non-political, and financially independent organizations charged with the monitoring - informing - communicating duty, tightly related to the use of information technologies.

3

Parliamentary monitoring platforms

Parliamentary monitoring tools were some of the first tech solutions developed for governmental transparency efforts. They allow the public to check the details on every Member of Parliament (MP) - their background, affiliations, voting record. They present how bills are processed: from an idea, through commissions, consultations and hearings, all the way to the votes in different chambers and signing into law. They allow searches of speeches in both plenary sessions and committees. Some initiatives extend the scope of statement monitoring to collecting data from social and traditional media.

Such tools and their features can be also used to monitor local city hall councils, district councils, political parties, tenants associations or any other organizations where democratic processes of debate and voting are used.

Democratic organizations can only be accountable to their members (citizens) if they are transparent. That is the main drive of organizations recording council sessions, opening up datasets and/or creating the above-mentioned tools.

13Like CCTV News and RT, and Voice of America in the USA

14Facebook, Instagram and Twitter and others

15Examples: BBC Reality Check, a dedicated team established in 2016 to debunk fake news spread in social net https://www.bbc.com/news/reality_check (Jackson 2017), Channel 4 News' FactCheck blog (https://www.channel4.com/news/factcheck), Les Décodeurs at Le Monde, France (https://www.lemonde.fr/les-decodeurs/), The Washington

3.1 GovTrack.us

GovTrack.us is a comprehensive, nonpartisan source for legislative information that has been used by millions of people. It serves a critical purpose: providing congressional information to citizens and civil society with an eye toward transparency and accountability. It offers free data and APIs that civil society groups and citizens can freely use.

The website collects data on members of Congress, allowing users to check members' voting records and attendance relative to their peers. It enables its users to track the bills and members of the United States Congress. Users can add trackers to certain bills, thereby narrowing the scope of the information they receive. It promotes increased transparency in government and better communication between the general public and the government.

It has been developed since 2004 by Josh Tauberer, and early in 2016 another big US-based portal, Sunlight's OpenCongress, joined forces with GovTrack to be the go-to place for American citizens looking for information about Congress (K. Duffy 2016).16

3.2 Vote Smart

Vote Smart, formerly called Project Vote Smart, is a non-profit, non-partisan research organization that collects and distributes information on candidates for public office in the United States. It covers candidates and elected officials in six basic areas: background information, issue positions, voting records, campaign finances, interest group ratings, and speeches and public statements. This information is distributed via their web site, a toll-free phone number, and print publications.

Vote Smart was launched as a non-profit organization in 1992 by Presidents Ford and Carter along with 38 other national leaders of both parties; in 1995 it introduced its website, which provides an encyclopedic log of candidates' speeches, voting records, special interest group ratings, campaign financing sources, and an assessment of candidates' positions on a wide variety of issues. It also provides records of public statements, contact information for state and local election offices, polling place and absentee ballot information, ballot measure descriptions for each state (where applicable), links to federal and state government agencies, and links to political parties and issue organizations.

4 Fact checking organizations

President Trump has made 12,019 false or misleading claims over 928 days . . . , adding about 20 a day in the past two months. The Washington Post Fact Checker, 12 Aug 2019


4.1 Snopes

Snopes is an extremely popular website, and it has been around for more than 20 years. Its mission is to debunk internet rumors, hoaxes, urban legends, and other nonsense.

Back in 1994 its creator David Mikkelson was just playing around with this shiny new thing called the Internet. At the time he was working as a software engineer and participating in early Usenet newsgroups. The first two he joined were alt.folklore.urban and rec.arts.disney.

On rec.arts.disney, Mikkelson started out writing about the biggest urban legends surrounding all things Disney: Was Walt Disney cryogenically frozen after death? (No, he wasn't.) Next came the rumors about salacious things supposedly hidden in Disney movies.

Figure 3: From Webby exclusive How the truth set Snopes free (Walker 2016)

Alt.folklore.urban was a place for people who enjoyed collecting, sorting, and organizing facts. As his handle on alt.folklore.urban, Mikkelson used "Snopes" — the surname of a family of characters in a series of William Faulkner novels.

Via that Usenet board he got acquainted with Barbara, his future wife. And in 1995, the couple launched Snopes.com, basically porting the alt.folklore.urban model onto the Web: a collection of carefully researched articles, listing verifiable sources, mostly debunking popular contemporary legends. This was all before social media, YouTube or even search engines. But word got around and people started seeking out Mikkelson's site.

Then, on September 11, 2001, everything changed. The planes flew into the Twin Towers and crashed at the Pentagon and in Pennsylvania, and America turned, panicked, to the internet to try to explain those events to itself. All possible conspiracy theories and rumors sprang up.

There was the rumor that the 16th century astrologer Nostradamus had predicted the attacks. Another claimed that 4,000 Israelis who worked in the World Trade Center had stayed home that fateful day. The site was flooded with requests to verify or debunk this or that theory about the terror attacks, the attackers, the possibility that the government was in on it, and other rumors coursing through an accelerating news cycle.

Traffic spiked. Suddenly the press, which had treated Snopes mostly as a curiosity, took real interest. Snopes was the only site that was tracking and writing about all the 9/11 rumors and


conspiracy theories that were on the Internet; the traditional news media weren't doing that.

Figure 4: From Webby exclusive How the truth set Snopes free (Walker 2016)

Mikkelson remained a full-time software engineer until late 2002. By then, advertising was generating enough money to live on, and the site had begun to cover not just classic urban legends, but basically whatever was coming up on the Internet. In 2002 Snopes stopped being just Mikkelson's hobby.

Snopes doesn't really have an office. All 16 editorial employees work remotely, geographically far-flung across the country but constantly connected on Slack.

The things that Snopes works on are determined by its readers. Snopes receives about 1,500 emails a day. They also get story ideas from a private Facebook group which has 80,000 members, as well as from what people search for when they log onto the site.

Advertising is Snopes' main source of revenue. Snopes is now 50% owned by an ad agency (Proper Media) and they make money by generating millions of views on the third-party advertisements on their website. A small proportion of revenue derives from Snopes' fact-checking partnership with Facebook and public donations.17

Snopes has no political leanings, doesn't accept political advertising and does not advocate for issues on the left or on the right. But many conservatives consider it a left-leaning site (Criss 2017). Readers can check Snopes' "non-bias" on the MBFC site: https://mediabiasfactcheck.com/snopes/.18

17In 2017, Snopes received $100,000 from Facebook for participating in the partnership. (For more about the Facebook and Snopes collaboration, see (Levin 2018; Funke 2019b).)

18Media Bias/Fact Check (MBFC) is an independent online media outlet founded in 2015 by Dave Van Zandt, dedicated to educating the public on media bias and deceptive news practices. Funding comes from donations and third-party advertising. Van Zandt studied Communications in college and has over the years focused on personal research into media bias and the role of media in politics (https://mediabiasfactcheck.com/about/). To date, MBFC has reviewed about 2500 sources. The first 500 or so were picked by the reviewers; the last 2000+ were submitted by users, see https://mediabiasfactcheck.com/frequently-asked-questions/. If you were wondering who is there to fact-check the fact-checkers, that would be MBFC (https://www.makeuseof.com/tag/true-5-factchecking-websites/). MBFC follows its own methodology (https://mediabiasfactcheck.com/methodology/); factors considered include sourcing, biased wording, story choices, and political affiliation.

(19)

The first decade of the 2000s, the mainstreaming of the Web, and all the misinformation that spread through it, fueled a massive desire for rapid fact-checking. This decade saw the debut of FactCheck.org (a nonprofit connected to the Annenberg Foundation)19, PolitiFact (launched by The Tampa Bay Times) and The Fact Checker (Glenn Kessler's column and blog for The Washington Post). All were primarily focused on the statements of political figures, and positioned themselves as bipartisan. (NewsBusters.org has a more openly conservative fact-check agenda, and Media Matters a more liberal one.)

Traffic grew. Events like Hurricane Katrina (2005) produced even more newsy conjecture and real-time mythology. Then came the 2008 election with wild rumors about Obama erupting "within minutes of his candidacy" announcement. By 2009 Snopes pulled in over 6 million unique visitors per month. (Now it averages 22 million views per month.)

In 2014 it was the Malaysian Boeing that produced a swiftly running stream of bogus claims: Malaysia Airlines Flight MH370, a Boeing 777-200ER, disappeared over the South China Sea on 8 March 2014 with 239 people aboard, 227 passengers and 12 crew members (McNamara 2014).

Prior to and after the election of 2016, fake right-wing websites exploded again, many of them originating in Macedonia.20

By that time, the Duke Reporters' Lab already counted 96 active fact-check sites "keeping tabs on politicians" in 37 countries. Thanks to election controversies large and small, Snopes traffic exploded to about 20 million visitors a month during the primaries and political conventions. The day after the election was the busiest in Snopes' history: 2.5 million people visited the site that day. (Before that it had been averaging 750,000 visitors a day.)

And in the age of Trump, it hasn’t slowed down.

President Trump's victory in the 2016 election prompted a shift in the types of stories being vetted. Political items have become a much higher percentage of the topic mix since the election, probably up from 60-70% to about 90%. Vetting political news was certainly not the original Snopes mission, but it gradually became core to the site as truth-seeking readers searched for answers. For at least some months in 2016, the records show, Snopes was pulling in more than $200,000 a month in advertising sales.

A lot of people trust the website Snopes.com and use it to fact-check things they hear on the internet. The verb "to snope" appeared in readers' lexicon next to the verb "to google".

"I haven’t done a paper in the past 10 years that I haven’t also checked to see what Snopes had to say about it first," says Patricia Turner, professor of folklore at UCLA. "Anything that raises hairs on the back of my neck, I go to Snopes." (Dean 2017; Dawn 2018)

19FactCheck.org is a non-partisan fact-checking website focusing primarily on US politics. It is a non-profit project run by the Annenberg Public Policy Center of the University of Pennsylvania. (The APPC was established by publisher and philanthropist Walter Annenberg to create a community of scholars within the University of Pennsylvania that would address public policy issues at the local, state and federal levels.) Not only does the site regularly debunk politicians' claims and viral fake news, but it also lets users submit their own questions. FactCheck.org has an established history of journalistic rigor and is a Webby Award winner.


According to Alexa, a web analytics firm, more people in the US visit Snopes than either of its two main fact-checking rivals, PolitiFact and FactCheck.org. Massive readership keeps it in the top 500 of Alexa's ranking of US sites (Walker 2016).

4.2 Full Fact

Full Fact is the UK’s independent fact-checking charity.

Full Fact was founded in 2009 by Michael Samuel (Chair), a Conservative Party donor and Anna Freud Centre chairman, and Will Moy. Full Fact applied to the Charity Commission for charitable status when it was being founded in 2009, but this was refused. An appeal heard by the commission's tribunal in 2011 was also rejected, on the grounds that the stated objective of "civic engagement" was too political in nature. The wording was changed to "the advancement of public education" and charitable status was then granted in 2014.21

In 2016, Full Fact covered the United Kingdom European Union membership referendum to check claims made during the campaign. For example, the most controversial claim was that "We send the EU £350M a week" and Full Fact demonstrated that this was false.

In 2017, Full Fact collaborated with another similar organisation, First Draft22, to staff a fact-checking team covering the UK general election.

In 2016 Full Fact was given a €50,000 (£42,966) grant from Google. The money came from its Digital News Initiative (DNI), a scheme intended to help groups create new tools that could make it easier for journalists to do their jobs. The tech company is giving €150m (£128m) over three years to "help stimulate innovation in digital journalism" (Burgess 2016).

In June 2017 Full Fact received $500,000 from the Omidyar Network and Open Society Foundations to develop automated fact-checking tools (Full Fact team 2017).

A team of four works on fact-checking automation.

The goal is to fully automate monitoring of Twitter and Facebook accounts, websites, TV subtitles, online adverts, as well as radio and TV programs via speech recognition.

The other ambition is to fully automate claim recognition. Claim recognition is the task of detecting verifiable claims, such as:

− "Britain sends over 350 million pounds to the European Union each week"

− "Immigration from inside the EU has increased"

− "The deficit has been cut in half as a share of GDP"

− "Crime is starting to rise"

21https://en.wikipedia.org/wiki/Full_Fact 22https://firstdraftnews.org


and similar. The task is not easy and requires work on paraphrase handling. The group is working on building models that can detect claims automatically, match similar claims and cluster groups of claims together.
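The claim-matching step can be sketched in a few lines. This is a toy example only (Full Fact's production system is not public) that pairs paraphrased claims by cosine similarity over character n-gram TF-IDF vectors, which tolerate small wording changes between paraphrases:

```python
# Toy sketch of claim matching (not Full Fact's actual system):
# pair paraphrased claims by cosine similarity over character
# n-gram TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

claims = [
    "Britain sends over 350 million pounds to the European Union each week",
    "The UK sends the EU 350m pounds a week",  # paraphrase of claim 0
    "Crime is starting to rise",
]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
vectors = vectorizer.fit_transform(claims)
sims = cosine_similarity(vectors)

def best_match(i):
    """Index of the most similar other claim."""
    row = sims[i].copy()
    row[i] = -1.0  # exclude the claim itself
    return int(row.argmax())

print(best_match(1))  # the paraphrase pairs with claim 0
```

Clustering groups of claims can then be built on top of the same similarity matrix, e.g. by thresholding it and taking connected components.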

At present the process is partially automated. The team uses

− MAVIS (Microsoft Speech recognition API) for automatic transcription of TV and radio speech,

− Poplus SayIt to monitor legislative debates,23

− Twitter API to monitor tweets.

Full Fact developed two automation tools named "Live" and "Trends". "Trends" records every repetition of a claim known to be wrong, as well as where it comes from, and keeps track of who or what is persistently pushing misleading claims out into the world.24

"Live" has two functions. It spots already-checked claims in TV subtitles and automatically pulls up Full Fact's most recent articles in response. It also spots claims that haven't been fact-checked before but for which reliable data exists, and creates fact checks on the spot using that data.25

4.3 PolitiFact

PolitiFact is a non-partisan fact-checking website that focuses on claims made in the political sphere in the US. This includes statements by politicians, political topics such as immigration, and general news. A global edition of the site tackles stories from other parts of the world. PolitiFact was created by the Tampa Bay Times, a Florida newspaper, in 2007.

PolitiFact won a Pulitzer Prize for its coverage of the 2008 presidential election. In 2018, PolitiFact was acquired by the Poynter Institute, a nonprofit school for journalists. The company has a full-time staff of 10 split between Washington, D.C., and St. Petersburg, Florida. PolitiFact has affiliates in 18 states.

While PolitiFact relies on administrative support from the Poynter Institute, it is otherwise financially self-sustaining. It receives funding from online advertisements placed on the website. PolitiFact also receives compensation for selling its content to media publishers and companies. Organizations that contributed more than 5 percent of total PolitiFact revenues in the previous calendar year are The E.W. Scripps Company and Facebook.

PolitiFact also accepts grants. In 2017, PolitiFact launched a membership campaign called the Truth Squad to allow individual donations. Donors have no say in the ratings PolitiFact issues. PolitiFact does not give donors the right to review or edit content.

23https://sayit.mysociety.org/about/questions

24Full Fact adopts of ClaimReview markup https://schema.org/ClaimReview 25See https://fullfact.org/automated


As part of PolitiFact's mission to remain transparent and independent, PolitiFact discloses on its website any individual donation in excess of $1,000. PolitiFact does not accept donations from anonymous sources, political parties, elected officials or candidates seeking public office, or any other source with a conflict of interest as determined by PolitiFact's executive director (Sharockman 2011).

Every day, reporters and researchers from PolitiFact and its partner news organizations examine statements made by political candidates. Then they rate their accuracy on the Truth-O-Meter: "True", "Mostly True", "Half True", "Mostly False", and "False". The most ridiculous falsehoods get PolitiFact's lowest rating, "Pants on Fire".

For example, on August 3, 2019, following the recent mass shootings in Gilroy, California and El Paso, Texas, and just hours before a separate mass shooting in Dayton, Ohio, California Democratic Senator Dianne Feinstein, a strong advocate for gun control, tweeted about the number of guns and gun deaths in America. She stated: "There are more guns in this country than people, and more per capita than any other country in the world. And there are more gun deaths by far".

PolitiFact checked: the claim is false. The USA is in second place after Brazil, with 40,229 and 48,493 gun deaths in 2017, respectively (Nichols 2019).

For other exemplary claims – Sheila Bynum-Coleman's (Democrat from Virginia), made on Monday, August 5, 2019: "African-Americans are three to four times more likely to be arrested for drug crimes, and nine times more likely to be imprisoned.", as well as Donald Trump's (on Tuesday, August 6, 2019): "Last year, we prosecuted a record number of firearms offenses." – PolitiFact's verdict was "Mostly True".

In addition to its everyday fact-checking activity, the company has experience tracking campaign promises. In 2008 it created the "Obameter" to track the record of President Barack Obama. Over eight years and two campaigns, PolitiFact tracked more than 530 of Obama's campaign promises to find out what became of them26 (X. Zhang and Ghorbani 2019).

4.4 Partnership with Facebook

On 25 April 2019 Facebook announced its partnership with 52 fact-checking sites in 33 countries worldwide. All sites are signatories of the International Fact-Checking Network's code of principles, as this was a necessary condition for joining the project. The code of principles requires non-partisanship and fairness, transparency of sources, transparency of funding and organization, transparency of methodology, and an open and honest corrections policy.27

The project first was launched in December 2016 with only five American fact-checkers:

26To date, it was found that Obama has kept 45 percent of his promises, compared to 22 percent that he broke. The remainder either resulted in a compromise or are still in the works.

27IFCN has 65 signatories out of 188 fact-checking organizations worldwide. For more details see https: //ifcncodeofprinciples.poynter.org/, https://www.ifcncodeofprinciples.poynter.org/signatories


(Poynter-owned28) PolitiFact, Factcheck.org, Snopes, ABC News and the Associated Press. Since then, it has grown at a steady clip. It has also expanded to limit the reach of false photos and videos in addition to articles.

Facebook also is trying to amplify fact-checkers’ work using machine learning to identify duplicate claims and decrease the reach of pages that repeatedly share misinformation.

The premise was promising: independent fact-checkers would be given access to a dashboard on Facebook, where they could see what posts users were flagging as potentially false.

Under the partnership, once a fact-checker rates a post as false, its future reach in the News Feed is decreased and a fact check is appended. If users try to share it, they will receive a notification that it has been debunked. Additionally, pages that repeatedly share false posts have their monetization capabilities restricted.

The partnership was described in several web publications as "transparent fact-checkers meet not-that-transparent Facebook" (Levin 2018; Funke 2019b). Fact-checkers were concerned by the lack of transparency from Facebook about how their work has affected the spread of misinformation on the platform: while the company has shared more details about the partnership, detailed data on its results have not yet materialized.

To get a better sense of the results of Facebook's fight against misinformation, Poynter surveyed this collaborative effort, interviewing 19 Facebook partners about their experience. Responses indicate that fact-checkers have flagged tens of thousands of links to false or misleading content, that they are moderately satisfied with the relationship as a whole but don't think it has been a game-changer, and that there is broad consensus among them that Facebook should do more when it comes to sharing information with the public.29

In addition, FactCheck.org disclosed receiving a palindromic $188,881 from Facebook in fiscal year 2018, and, as already mentioned, Snopes received $100,000 from Facebook.

28Poynter is a journalism training and research center in St. Petersburg, Florida.

29 "When asked why they joined the partnership, most fact-checkers offered a variety of reasons. For many, this was an opportunity to reach audiences where they were and reduce the reach of misinformation in a way that aligned with their mission. The financial incentive is also attractive.

Judged by their own objectives, fact-checkers appear moderately satisfied with the partnership, rating it on average a 3.5 out of 5. If this were a Yelp review, the restaurant wouldn’t be a must-eat but also not somewhere you would risk food poisoning.

Fact-checkers are less convinced that the partnership has helped their organizations find claims that they otherwise wouldn’t have surfaced as rapidly (3 out of 5). They are uncertain about whether it has helped them reduce the reach of viral hoaxes (2.9 out of 5), which is a central plank of the social network’s communication about what the partnership should achieve.

The most critical question for partners remains that they believe the company is not telling the public enough about how the partnership works. On average, agreement with the statement "Facebook provides enough information about this partnership with the public" was a measly 2.2 out of 5." (Funke and Mantzarlis 2018).

Poynter reported that "in the past two-and-a-half years of Facebook’s fact-checking program, the only public numbers regarding its efficacy are contained in a leaked email obtained by BuzzFeed News in fall 2017. According to the email, the future reach of posts flagged as false decreases by up to 80% after an average of three days" (Funke 2019a; Funke 2018).


4.5 Fact-checkers collaboration

4.5.1 Global Fact

According to the latest tally by the Duke Reporters' Lab, the number of fact-checking outlets surged to 188 in more than 60 countries by June 2019. A bit more than half of fact-checkers are part of a media company: 106 of 188, or 56%; see Table 1 (Stencel 2019).

Region          February 2018   June 2019

Africa                      4           9
Asia                       22          35
Australia                   3           5
Europe                     52          61
North America              53          60
South America              15          18

Table 1: The increase in number of fact-checking organizations worldwide from February 2018 to June 2019 (Stencel 2019)

The Duke Reporters' Lab also released an interactive map of world fact-checking: https://reporterslab.org/fact-checking/

In June 2019 the Sixth Global Fact-Checking Summit30 took place in Cape Town, South Africa. About 250 participants from nearly 60 countries attended, which is yet another measure of fact-checking's continued growth: five times the number from the first GlobalFact in London in 2014. Among the attendees were people from fact-checking organizations such as PolitiFact, Snopes.com, Full Fact, Chequeado and Pagella Politica, as well as representatives of big social networks such as Facebook, Google, Twitter, and YouTube.

The Global Fact-Checking Summit (Global Fact) is the premier conference dedicated to fact-checking worldwide. It is organized by Poynter's International Fact-Checking Network31 and has been held every year since 2014. The first two conferences were held in London, and subsequent ones in Buenos Aires, Madrid, and Rome.

4.5.2 The Duke Tech & Check Cooperative Project

In September 2017 the Reporters' Lab, a center for journalism research in the Sanford School of Public Policy at Duke University, launched The Duke Tech & Check Cooperative32, a $1.2 million project to automate fact-checking. Tech & Check is funded by the John S. and James L. Knight Foundation, the Facebook Journalism Project and the Craig Newmark Foundation. Google and Facebook are among the funders of the Reporters' Lab.

30See https://gfworkshops.org/

31The International Fact-Checking Network is a unit of the Poynter Institute dedicated to bringing together fact-checkers worldwide. See https://www.poynter.org/ifcn/


The two-year endeavor brings together journalists, developers and academics to build apps to disseminate fact-checking to new audiences and create tools that help fact-checkers do their work.

Tech & Check Alerts The project has two focus areas. The first is developing the "tracker family" of instruments that help fact-checkers identify claims to check. Claim detection is one of the most time-consuming aspects of fact-checking. Scientists at the University of Texas at Arlington, participating in the project, had already developed and presented ClaimBuster, a tool that uses an algorithm to identify factual claims. It is the core of the Tech & Check Alerts tracking app, developed by Duke students on the Tech & Check team. Tech & Check Alerts is an automated service that helps fact-checkers find claims to check. Using the ClaimBuster algorithm, which scores sentences based on how checkable they are, the service combs through official cable news transcripts and social media posts, then sends participating fact-checkers a daily email with statements they might be interested in scrutinizing.
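The comb-then-digest workflow can be sketched roughly as follows. The `check_worthiness` function is a hypothetical stand-in for the ClaimBuster scoring model; only the overall pipeline shape (score every transcript sentence, keep the top candidates for a digest) is taken from the description above:

```python
# Rough sketch of an alerts pipeline. check_worthiness is a toy
# stand-in for the ClaimBuster scoring model; the comb-and-digest
# logic mirrors the service described in the text.
def check_worthiness(sentence: str) -> float:
    """Toy scorer: rewards numbers and comparative wording, which
    often signal checkable factual claims."""
    tokens = sentence.lower().split()
    score = 0.0
    if any(t.strip("%.,").replace(".", "").replace(",", "").isdigit()
           for t in tokens):
        score += 0.5
    if any(t in tokens for t in ("more", "less", "increased", "decreased")):
        score += 0.3
    return min(score, 1.0)

def daily_digest(sentences, threshold=0.4):
    """Keep sentences scored above the threshold, highest first."""
    scored = [(check_worthiness(s), s) for s in sentences]
    return [s for sc, s in sorted(scored, reverse=True) if sc >= threshold]

transcript = [
    "Good evening and welcome to the debate.",
    "Unemployment fell to 3.7 percent last year.",
    "I believe in the American people.",
]
print(daily_digest(transcript))
```

In the real service the digest would be formatted as the daily email sent to participating fact-checkers.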

Similar tools are in development progress at the state government level, in partnership with Digital Democracy, an initiative of the Institute for Advanced Technology and Public Policy at Cal Poly-San Luis Obispo.

Post-hoc analysis of the claims checked by professional fact-checkers at CNN, PolitiFact.com, and FactCheck.org reveals a highly positive correlation between ClaimBuster and journalism organizations in deciding which claims to check.

FactStream The other focus is developing the "pop-up family" of tools that help disseminate fact-checking to large audiences. FactStream is a second-screen app that provides live fact-checking during political events. Currently available for the iPhone and iPad, FactStream provides users with pop-ups that include previously published fact-checks or real-time analyses of politicians' factual claims. Developers plan to extend the service to all smartphones, tablets and TV platforms, including Chromecast and Apple TV, as well as to voice-activated assistants such as Google Home and the Amazon Echo.

Share the Facts There is also Share the Facts, developed in conjunction with Jigsaw, which offers fact-checkers and consumers a way to share fact-checks and spread them across the internet. Fact-checkers can use the Share the Facts widget to help search engines more easily identify and highlight their content, and readers can easily find, share and embed fact-checks on the web.

Tech & Check also serves as a hub for other researchers building automated fact-checking projects. It hosts an email listserv and regular video conferences that bring together technologists, journalists and academics working on similar projects.


4.5.3 ClaimReview

The Duke Reporters' Lab worked with Google and Schema.org to develop ClaimReview. ClaimReview is an open-source tagging system that fact-checkers can use to mark their articles for search engines and social media platforms such as Google Search, Google News, Bing, Facebook and YouTube. The platforms then use the tags to promote and highlight fact-check articles. Any publisher can claim-review fact-check articles, both reviewing a statement and assessing its accuracy. The easiest way to produce ClaimReview markup is to use the Fact Check Markup Tool, an application that instantly alerts search engines and other platforms to new fact-checks by submitting the data to DataCommons.org and making them available to all.33
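A ClaimReview item follows the published schema.org vocabulary; the sketch below builds one as JSON-LD in Python. The field names come from the schema, while the claim, organization and URL are invented for illustration:

```python
# Sketch of ClaimReview markup (https://schema.org/ClaimReview) as
# JSON-LD. Field names follow the published schema; the claim, the
# organization and the URL are invented for illustration.
import json

claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2019-08-03",
    "url": "https://example.org/fact-checks/guns-claim",  # hypothetical
    "claimReviewed": "There are more guns in this country than people.",
    "author": {"@type": "Organization", "name": "Example Fact Checker"},
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 2,
        "bestRating": 5,
        "worstRating": 1,
        "alternateName": "Mostly False",
    },
    "itemReviewed": {
        "@type": "Claim",
        "author": {"@type": "Person", "name": "A Politician"},
    },
}

# Publishers embed this object in a <script type="application/ld+json">
# tag so platforms can pick up the fact-check.
print(json.dumps(claim_review, indent=2))
```

The `reviewRating.alternateName` is where a fact-checker's textual verdict (e.g. PolitiFact's Truth-O-Meter labels) goes.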

5 Fact-checking application: ClaimBuster

In the previous section, Full Fact's and the Reporters' Lab's efforts to automate fact-checking were described. Both organisations followed the logic of a journalist's fact-checking activity and tried to provide the fact-checker with a filtered and narrowed information stream.

In this workflow, a tool for verifiable-claim detection comes first: it polls news sources and outputs the phrases to check.

At this point the user needs trusted information against which to confront the claims obtained. Another instrument helps here: it transforms the claim at hand into a search query, performs the search in an integrated search engine, and provides the user with context information related to the claim under verification.

ClaimBuster, a fully-fledged fact-checking application, was created at the University of Texas at Arlington in collaboration with the Reporters' Lab. It was deployed in 2015 and then tested with live coverage of all primary and general election debates of the 2016 U.S. presidential election. Its developers published several papers describing the claim scoring algorithm and the high-level architecture of the system (Hassan, C. Li, and Tremayne 2015; Hassan, G. Zhang, et al. 2017; Hassan, Arslan, et al. 2017).34

Closed captions of the debates on live CNN’s TV broadcasts, captured by a decoding device, are fed to ClaimBuster, which immediately scores each sentence spoken by the candidates and posts top-scored claims to the project’s website https://idir.uta.edu/claimbuster/ and Twitter account @ClaimBusterTM.

ClaimBuster is also continuously monitoring a list of 2220 Twitter accounts – U.S. politicians, news and media organizations – using the Twitter streaming API and retweeting the check-worthy

33To start using the tool, publishers must first get authorization through Google Search Console, see https://www.claimreviewproject.com/the-facts-about-claimreivew, https://toolbox.google.com/factcheck/markuptool

34A ClaimBuster demo is released and available for testing on the University of Texas at Arlington site: https://idir.uta.edu/claimbuster/.


factual claims it finds.35 It filters out non-politics-related tweets using an SVM classifier.
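ClaimBuster's exact classifier and training data are not public, but the SVM filtering step can be sketched with scikit-learn: a linear SVM over TF-IDF features, trained here on a toy hand-labeled set of tweets invented for illustration:

```python
# Sketch of a tweet topic filter in the spirit of the one described:
# a linear SVM over TF-IDF features separating politics-related
# tweets from the rest. The labeled tweets are a toy stand-in;
# ClaimBuster's real training data and features are not public.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "The senator voted against the new immigration bill",
    "Congress passed the budget after a long debate",
    "The president announced a new tariff on imports",
    "Our candidate leads the polls in three states",
    "Just tried the best pizza in town, amazing crust",
    "Can't wait for the new superhero movie this weekend",
    "Morning run done, 10k in under an hour",
    "This kitten video made my whole day",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = politics-related

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(tweets, labels)

# In production such a filter would run on the Twitter stream and
# drop tweets predicted as 0 before claim scoring.
print(clf.predict(["The governor signed the education bill today"]))
```

In the real system, only tweets that pass this filter are scored for check-worthiness.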

Recently it started to monitor Hansard, the transcripts of proceedings of the Australian parliament: https://idir.uta.edu/claimbuster/hansard.

5.1 Claim spotting

Given a sentence, ClaimBuster gives it a score between 0.0 and 1.0. The higher the score, the more likely the sentence contains check-worthy factual claims. The lower the score, the more non-factual, subjective and opinionated the sentence is. ClaimBuster’s score is based on a classification and scoring model.

Researchers categorize sentences in presidential debates into three categories:

− NFS - Non-Factual Sentence: subjective sentences such as opinions, beliefs and declarations. These sentences do not contain any factual claim. Examples: "But I think it’s time to talk about the future.", "You remember the last time you said that?".

− UFS - Unimportant Factual Sentence: factual claims that are not check-worthy, because the general public will not be interested in knowing whether they are true or false. Examples: "Next Tuesday is Election Day", "Two days ago we ate lunch at a restaurant".

− CFS - Check-worthy Factual Sentence: sentences containing factual claims whose truth or falsehood the general public will be interested in knowing. Examples: "He voted against the first Gulf War", "Over a million and a quarter Americans are HIV-positive".

The task is to automatically detect CFSs. The authors see it as a supervised learning problem, specifically, a multi-class classification problem where the classes are NFS, UFS and CFS.

5.1.1 Data collection

The dataset for model training was created leveraging the presidential debate transcripts labeled by human coders.36

The first general election presidential debate was held in 1960. From then until 2012 there were 14 elections; in 1964, 1968 and 1972 no presidential debate was held. Each of the remaining 11 elections featured 2 to 4 debate episodes, for a total of 30 episodes spanning 1960 – 2012. There are a total of 123 speakers, including 18 presidential candidates, moderators and guests. The whole dataset consists of 28029 sentences; the authors were interested only in the sentences spoken by the presidential candidates, 23075 in all.

35See https://twitter.com/ClaimBusterTM
36The ClaimBuster dataset was not publicly released.


To label the sentences, a data collection website was developed, and journalists, professors and university students were invited to participate in the survey. There was a reward system to encourage high quality answers. A participant was given one sentence at a time and was asked to label it with one of the three possible options.37

5.1.2 Feature extraction

To build the classification model, multiple categories of features were extracted from the corpus sentences (sentiment, length, words, POS tags, entity types), 6201 features in total. To avoid overfitting and to attain a simpler model, the authors performed feature selection: they trained a random forest classifier and used the Gini index to measure the importance of each feature in constructing each decision tree. The overall importance of a feature is its average importance over all the trees.
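This Gini-based importance measure is exactly what scikit-learn's random forest exposes as feature_importances_. A sketch on synthetic data (the feature names below are illustrative stand-ins, not ClaimBuster's actual 6201 features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for sentence features
length = rng.integers(3, 40, n).astype(float)   # sentence length
sentiment = rng.uniform(-1.0, 1.0, n)           # sentiment score
noise = rng.uniform(0.0, 1.0, n)                # irrelevant feature

X = np.column_stack([length, sentiment, noise])
# The label depends only on length and sentiment, never on noise
y = ((length > 20) & (sentiment > 0.0)).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in Gini impurity, averaged over all trees in the forest
for name, importance in zip(["length", "sentiment", "noise"], forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

On such data the irrelevant feature receives the lowest importance, which is the signal used to prune the feature set.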

5.1.3 Classification

For classification, the authors report having performed 4-fold cross-validation with several supervised learning methods, including a Multinomial Naive Bayes Classifier (NBC), a Support Vector Classifier (SVM) and a Random Forest Classifier (RFC). The top-scoring classifiers achieve an accuracy of 0.85; for comparison, the top-quality human participants faced screening sentences 1395 times and made an incorrect judgment 208 times (14.9%). The recall and precision in detecting check-worthy factual claims are 74% and 79%, respectively.
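The evaluation protocol can be sketched with scikit-learn; the sentences below are toy stand-ins for the labeled debate corpus, so the scores are not meaningful, only the procedure is:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy three-class data: 0 = NFS, 1 = UFS, 2 = CFS (four examples per class)
sentences = [
    "I think it is time to talk about the future",
    "You remember the last time you said that",
    "I believe we can do better than this",
    "That is just my personal opinion",
    "Next Tuesday is Election Day",
    "Two days ago we ate lunch at a restaurant",
    "The debate started at nine in the evening",
    "Yesterday it rained in Ohio",
    "He voted against the first Gulf War",
    "Over a million Americans are HIV positive",
    "Unemployment fell to five percent last year",
    "The deficit doubled under his administration",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

results = {}
for name, model in [("NBC", MultinomialNB()),
                    ("SVM", LinearSVC()),
                    ("RFC", RandomForestClassifier(random_state=0))]:
    pipeline = make_pipeline(CountVectorizer(), model)
    # 4-fold cross-validation, as reported by the authors
    results[name] = cross_val_score(pipeline, sentences, labels, cv=4).mean()

for name, accuracy in sorted(results.items()):
    print(f"{name}: mean accuracy {accuracy:.2f}")
```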

5.2 Claim matching

Given an important factual claim identified by the claim spotter, the ClaimBuster claim matcher searches a fact-check repository and returns those fact-checks matching the claim.

The system has two approaches to measuring the similarity between a claim and a fact-check: one based on the similarity of tokens, the other on semantic similarity. An Elasticsearch server is deployed for searching the repository by token similarity, while a semantic similarity search toolkit, Semilar (Rus et al. 2013), is applied for the semantic search.
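The token-based side of the matcher can be illustrated with a plain Jaccard overlap over lowercased tokens; this is only a stand-in for Elasticsearch's actual relevance scoring, and the repository entries below are invented:

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical fact-check repository entries
repository = [
    "Fact-check: claim that unemployment is at a 50-year low",
    "Fact-check: claim that crime rates doubled in the last decade",
    "Fact-check: claim that the candidate voted against the Gulf War",
]

claim = "He voted against the first Gulf War"
# Return the repository fact-check with the highest token overlap
best = max(repository, key=lambda fc: jaccard(claim, fc))
print(best)
```

In the real system this keyword-style retrieval is complemented by the semantic search, which can match a claim to a fact-check even when they share few surface tokens.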

5.3 Claim checking

Given a claim, the claim checker collects supporting or debunking evidence from knowledge bases and the Web.

With regard to knowledge bases, it uses a question generation tool (Heilman and Smith 2009)38

37 See the data collection demo https://idir.uta.edu/classifyfact_survey/.


to generate many questions based on the claim and to select the "good" ones, which are then sent via an API to the question answering engine Wolfram Alpha39; the answers from Wolfram Alpha are then extracted.

Simultaneously, the claim checker sends the aforementioned questions to Google via HTTP requests and extracts the answers from Google’s answer boxes in the HTML responses. If any clear discrepancies between the returned answers and the claim exist, then a verdict may be derived and presented to the user. Meanwhile, the factual claim itself is sent to Google as a general search query.

The claim checker then parses the search results and downloads the web page for each top result. Within each such page, it finds sentences matching the claim. The matching sentences and a few of their surrounding sentences are then grouped together into a context for the answers returned from Wolfram Alpha.
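A sketch of this context-grouping step; the token-overlap matching criterion and the one-sentence window are assumptions, since the papers do not spell out ClaimBuster's exact heuristics:

```python
def extract_contexts(page_sentences, claim, window=1, min_overlap=3):
    """Return context windows around sentences that share tokens with the claim."""
    claim_tokens = set(claim.lower().split())
    contexts = []
    for i, sent in enumerate(page_sentences):
        overlap = claim_tokens & set(sent.lower().split())
        if len(overlap) >= min_overlap:
            # Group the match with `window` sentences on each side
            lo, hi = max(0, i - window), min(len(page_sentences), i + window + 1)
            contexts.append(" ".join(page_sentences[lo:hi]))
    return contexts

# Invented example page, split into sentences
page = [
    "The 1991 conflict split the Senate.",
    "Senator Smith voted against the first Gulf War.",
    "His vote drew criticism at the time.",
]
claim = "He voted against the first Gulf War"
print(extract_contexts(page, claim))
```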

The evidence is reported to the user. The system automatically sends a daily email to participating fact-checkers with statements they might be interested in scrutinizing.40

6 Fake-news detection challenges

To date, the most reliable antidote against disinformation seems to be crowdsourcing. It is crowdsourcing that keeps Wikipedia's quality surprisingly high despite many attempts to manipulate its content. While Wikipedia does not explicitly vet each article posted around the internet, it does assemble information in real time and faces the problem of excluding fabricated content.

Fact-checking for fake news detection is a complex process that may require a high level of expertise in different spheres of human knowledge. It poses an interdisciplinary challenge: technology is required to extract factual statements from text, to match facts against a knowledge base, to dynamically retrieve and maintain knowledge bases from the web, to reliably assess the overall veracity of an entire article rather than individual statements, to do so in real time as news events unfold, to monitor the spread of fake news within and across social media, to measure the reputation of information sources, and to raise awareness in readers. These are only the most salient things that need to be done to tackle the problem.

That is why large-scale fact-checking is still far out of reach.

Moreover, a sharp, binary distinction between fake and real news has turned out to be infeasible, since hardly any piece of fake news is entirely false, and pieces of real news may not be flawless.

39Wolfram Alpha is a computational knowledge engine launched in 2009 that accesses a vast repository of information from trusted sources around the world: the CIA’s World Factbook, the United States Geological Survey, a Cornell University Library publication called All About Birds, Chambers Biographical Dictionary, Dow Jones, the Catalogue of Life and others. At the heart of Alpha lies Mathematica, a piece of software that is wildly popular with engineers and scientists. Wolfram Alpha was designed by the British physicist Stephen Wolfram. See also https://en.wikipedia.org/wiki/Wolfram_Alpha
