Mining Technical Knowledge

Academic year: 2021

University of Pisa Department of Energy, Systems, Territory and Construction Engineering

Ph.D. Dissertation

Mining Technical Knowledge

Natural Language Processing Techniques and Engineering Management Methods

Filippo Chiarello

January 2019

Supervisors: Prof. Andrea Bonaccorsi, Prof. Gualtiero Fantoni

Mining Technical Knowledge

Filippo Chiarello, 2019.

Supervisors:

Prof. Andrea Bonaccorsi, University of Pisa, Dept. of Energy, Systems, Territory, and Construction Engineering

Prof. Gualtiero Fantoni, University of Pisa, Dept. of Industrial Engineering

Revised by:

Dr. Alessio Ferrari, National Council of Research, Institute of Information Science and Technology

Prof. Riccardo Fini, University of Bologna, Dept. of Management

Department of Energy, Systems, Territory, and Construction Engineering University of Pisa

Largo Lucio Lazzarino, 1 IT-56122 Pisa, Italy

Typeset in Markdown Pisa, Italy 2019

“The larger grows the area of knowledge, so too grows our perimeter of ignorance. It may be that for all we know we could be stepped in the center of infinite ignorance. Which then provides job security for ever for scientists.”

UNIVERSITY OF PISA

Abstract

Faculty of Engineering Management School of Engineering

Doctor of Philosophy

Mining Technical Knowledge

Natural Language Processing Techniques and Engineering Management Methods

by Filippo CHIARELLO

The information environment in which companies operate has changed dramatically in recent years, bringing a new challenge for management engineers. This discipline applies engineering methodologies to inherent systems but, nowadays, the activities with the greatest added value for companies are hard to standardize and non-repetitive. The enormous amount of information that is changing the environment of companies has a decisive impact on Research and Development, Design, Marketing, and Human Resources Management: all functions with high strategic content, and therefore high knowledge content. Since documents written in natural language contain knowledge by design, management engineers now have a great opportunity to exploit the technical knowledge hidden in these unstructured sources to generate value.

The aim of this thesis is to design methods and processes for the analysis of technical documents in order to extract valuable knowledge for companies. The methods are ensembles of Natural Language Processing and Management Engineering techniques. Their goal is to provide a correct exchange of knowledge between humans and machines: incorporating the knowledge of experts into machine-learning systems and, in turn, enabling experts to use the knowledge that machines generate inductively in their decision-making processes.

Acknowledgements

I want to thank my parents for being my greatest teachers in these years of study. I hope to give you back at least a fraction of the things you have given me.

I want to thank my supervisors, Prof. Gualtiero Fantoni and Prof. Andrea Bonaccorsi, for believing in me before I did and for giving me the opportunity to start this career that I love more every day. The best is yet to come.

I want to thank my best friends and band, Alessandro, Ranieri and Paolo, for reminding me that my work is not the only thing I love to do in my life.

I want to thank my colleagues Leonello, Simona, Silvia and Elena for showing me new ways in which things can be done.

I want to thank Prof. Antonella Martini for her patience and passion in showing me how academia actually works.

Finally, I want to thank Arianna for giving me the inspiration and the energy that I needed to conclude my PhD the way I wanted to.

[Figure: information volume in zettabytes (ZB) by year, 2005 to 2020.]

[Figure: workflow diagram with eight numbered steps; the legend distinguishes documents from phases. Elements: Patent set Selection, Patent set, Patent text preprocessing, Morphosyntactically Analyzed Patent set, List of user generation, List of Users, Automatic patent set annotation 1, Automatically Annotated Patent Set 1, Relevant sentences extraction, Sentences Training Set, Automatic patent set annotation 2, Automatically Annotated Patent Set 2, Difference computation, List of new Users, Manual review.]

[Figure: four-step process. 1. Advantages & Drawbacks Clues Collection; 2. Domain Clues Extraction; 3. Domain Clues Validation; 4. Advantages & Drawbacks Extraction.]

[Figure: Advantages & Drawbacks Pointer Collection; the legend distinguishes documents from phases. Elements: Patent set Selection, Patent set, Patent text preprocessing, Morphosyntactically Analyzed Patent set, List of Advantages & Drawbacks clues generation, List of Advantages & Drawbacks Clues, Automatic patent set annotation 1, Automatically Annotated Patent Set 1.]

[Figure: Domain Clues Extraction Phase; the legend distinguishes documents from phases. Elements: Automatically Annotated Patent Set 1, Relevant sentences extraction, Sentences Training Set, Automatic patent set annotation 2, Automatically Annotated Patent Set 2, Difference computation, List of domain Clues.]

[Figure: Clues Validation Phase; the legend distinguishes documents from phases. Elements: List of domain Clues, Tweets Collection, Tweets Corpus, Clues Sentiment Polarization, List Cleaning, Validated Domain Clues, Sentiment Analysis, W2V Features.]

[Figure: Advantages & Drawbacks Extraction; the legend distinguishes documents from phases. Elements: Non-domain Clues, Validated Domain Clues, Merge, List of Clues, Morphosyntactically Analyzed Patent set, Regular Expression Rules, Advantages & Drawbacks Sentences Extraction, Advantages & Drawbacks.]

[Figure: topic-number selection metrics (Griffiths2004, CaoJuan2009, Arun2010) plotted against the number of topics (2 to 20) on a 0.00 to 1.00 scale, in two panels: minimize and maximize.]

[Figure: terms grouped by topic (topics 1 to 12). Terms shown: cloud manufacturing, computer aided manufacturing, information management, internet of things, cutting fluids, machining, surface roughness, turning, green manufacturing, resource efficiency, computer simulation, finite element method, machine tools, production engineering, cutting tools, electric welding, energy efficient, environmental management, gas metal arc welding, optimization, welding, welds, 3d printing, additive manufacturing, product development, product life cycle management, remanufacturing, industrial applications, global warming, greenhouse gases, recycling, waste management, aluminum, titanium, titanium alloys, wear of materials, manufacturing industries, integer programming, production control, scheduling, renewable energies, renewable energy, renewable energy resources, wind power, eco innovation, innovation, lean manufacturing, product service systems, chains, competition, maintenance, supply chain management.]

[Figure: number of words and number of sentences.]

[Figure: polarity value by year, 2013 to 2017.]

[Figure: topic-number selection metrics (Griffiths2004, CaoJuan2009, Arun2010, Deveaud2014) plotted against the number of topics (2 to 30) on a 0.00 to 1.00 scale, in minimize and maximize panels.]
