Journal of Informetrics
journal homepage: www.elsevier.com/locate/joi
Regular article
bibliometrix: An R-tool for comprehensive science mapping analysis
Massimo Aria a,∗, Corrado Cuccurullo b
a Department of Economics and Statistics, Università degli Studi di Napoli Federico II, Via Cintia, C.sso M.te S. Angelo, 80126 Naples, Italy
b Department of Economics and Management, Università della Campania Luigi Vanvitelli, Corso Gran Priorato di Malta, Capua, CE, Italy
Article info
Article history:
Received 14 February 2017
Received in revised form 27 August 2017
Accepted 27 August 2017
Available online 12 September 2017
Keywords: Bibliometrics; Science mapping; Workflow; Co-citation; Bibliographic coupling; R package
Abstract
The use of bibliometrics is gradually extending to all disciplines. It is particularly suitable for science mapping at a time when the emphasis on empirical contributions is producing voluminous, fragmented, and controversial research streams. Science mapping is complex and unwieldy because it is multi-step and frequently requires numerous and diverse software tools, which are not all necessarily freeware. Although automated workflows that integrate these software tools into an organized data flow are emerging, in this paper we propose a unique open-source tool, designed by the authors, called bibliometrix, for performing comprehensive science mapping analysis. bibliometrix supports a recommended workflow to perform bibliometric analyses. As it is programmed in R, the proposed tool is flexible and can be rapidly upgraded and integrated with other statistical R-packages. It is therefore useful in a constantly changing science such as bibliometrics.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
The number of academic publications is increasing at a rapid pace and it is becoming increasingly unfeasible to remain current with everything that is being published. Moreover, the emphasis on empirical contributions has resulted in voluminous and fragmented research streams (Briner & Denyer, 2012). This hampers the ability to accumulate knowledge and actively collect evidence through a set of previous research papers. Therefore, literature reviews are increasingly assuming a crucial role in synthesizing past research findings to effectively use the existing knowledge base, advance a line of research, and provide evidence-based insight into the practice of exercising and sustaining professional judgment and expertise (Rousseau, 2012).
Scholars use different qualitative and quantitative literature reviewing approaches to understand and organize earlier findings. Among these, bibliometrics has the potential to introduce a systematic, transparent, and reproducible review process based on the statistical measurement of science, scientists, or scientific activity (Broadus, 1987; Diodato, 1994; Pritchard, 1969). Compared with other techniques, bibliometrics provides more objective and reliable analyses. The overwhelming volume of new information, conceptual developments, and data is the milieu in which bibliometrics becomes useful: it provides a structured analysis of a large body of information to infer trends over time and the themes researched, to identify shifts in the boundaries of the disciplines, to detect the most prolific scholars and institutions, and to present the “big picture” of extant research (Crane, 1972).
∗ Corresponding author.
E-mail addresses: aria@unina.it, massimo.aria@unina.it (M. Aria), corrado.cuccurullo@unicampania.it (C. Cuccurullo).
http://dx.doi.org/10.1016/j.joi.2017.08.007
Although, over time, the use of bibliometrics has been extended to all disciplines, bibliometric analysis is complex because it entails several steps that employ numerous and diverse analyses and mapping software tools, which are frequently available only under commercial licenses (Guler, Waaijer, Mohammed, & Palmblad, 2016). These difficulties are compounded by the reality that few researchers and practitioners are trained in how to review literature and to identify evidence-based practices (Briner & Denyer, 2012). The cumbersome nature of the process reduces the possibilities and the potential of bibliometrics, especially for scholars who have no general programming skills.
Recently, automated workflows to assemble specialized software into a comprehensive and organized data flow have begun to emerge for bibliometrics. They are particularly well suited to multi-step analyses using different types of software tools (Guler, Waaijer, Mohammed, & Palmblad, 2016). In this paper, we propose a unique tool, developed in the R language, which follows a classic logical bibliometric workflow that we reconstruct. We have designed and produced an R-tool for comprehensive bibliometric analyses. R is a language and environment for statistical computing and graphics (R Core Team, 2016). It provides a wide variety of statistical and graphical techniques and is highly extensible (Matloff, 2011). In addition to enabling statistical operations, it is an object-oriented and functional programming language; hence, users can automate their analyses and create new functions. It has an open-software nature, which means it is well supported by the user community and new functions are regularly contributed by users, many of whom are prominent statisticians. As it is programmed in R, the proposed tool is flexible, can be rapidly upgraded, and can be integrated with other statistical R-packages. It is therefore useful in a constantly changing field such as bibliometrics.
The aim of this paper is twofold. First, we present the proposed open-source bibliometrix R-package for performing comprehensive bibliometric analyses, comparing it to other important software tools. Second, we discuss how the proposed tool supports a recommended workflow for performing bibliometric studies. We illustrate the main bibliometrix functions in this workflow, using all the articles written in English on bibliometrics in the management, business, and public administration domains over a span of 30 years.
2. Recommended workflow for science mapping
The general science mapping workflow was described by Börner, Chen, and Boyack (2003). Cobo, López-Herrera, Herrera-Viedma, and Herrera (2011a) compared science mapping software tools using a similar workflow. A standard workflow consists of five stages (Zupic & Čater, 2015):
1. Study design;
2. Data collection;
3. Data analysis;
4. Data visualization;
5. Interpretation.
In study design, scholars define the research question(s) and choose the appropriate bibliometric methods that can answer the question(s). Three general types of research questions can be answered using bibliometrics for science mapping: (i) identifying the knowledge base of a topic or research field and its intellectual structure; (ii) examining the research front (or conceptual structure) of a topic or research field; and (iii) producing a social network structure of a particular scientific community. In study design, one of the most significant choices for scholars is the timespan, including the decision whether to divide the timespan into time slices. A bibliometric analysis can be performed at a specific point in time to represent a static picture of the field at that moment, or the timespan can be divided into multiple time periods to capture the development of the field through time.
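The time-slicing choice described above can be sketched in base R. This is only an illustration under stated assumptions: the publication years, the slice boundaries, and the labels are hypothetical and are not taken from the paper's data set or from the bibliometrix package.

```r
# Hypothetical publication years for five documents (illustrative data only).
years <- c(1985, 1992, 2001, 2010, 2015)

# Assumed slice boundaries. cut() uses right-closed intervals by default,
# so the slices are (1984, 1995], (1995, 2005], and (2005, 2015].
breaks <- c(1984, 1995, 2005, 2015)
labels <- c("1985-1995", "1996-2005", "2006-2015")

# Assign each document to a time slice.
slices <- cut(years, breaks = breaks, labels = labels)

# Number of documents falling into each time slice.
slice_counts <- table(slices)
print(slice_counts)
```

Each slice could then be analysed separately to follow how a field develops, whereas a single slice covering the whole timespan gives the static picture.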
In data collection, scholars select the database that contains the bibliometric data, filter the core document set, and export the data from the selected database. This step can involve constructing one’s own database (Waltman, 2016).
For data analysis, one or more bibliometric or statistical software tools are employed. Alternatively, scholars can write their own computer code to meet their requirements.
The fourth stage is data visualization. Scholars must decide what visualization method is to be used on the results of the third step and then employ the appropriate mapping software.
The last stage is interpretation, where scholars interpret and describe their findings. Although bibliometric methods will frequently reveal the structure of a field differently to the classification of traditional literature reviews, they are not a substitute for extensive reading in the field. Scholars with in-depth knowledge of the field have a clear, distinctive advantage.
The second to fourth stages are typically software-assisted and include different sub-stages.
2.1. Data collection
Data collection is divided into three sub-stages. The first is data retrieval. Many online bibliographic databases, where metadata regarding scientific works are stored, can be sources of bibliographic information, such as Clarivate Analytics Web of Science (WoS at http://www.webofknowledge.com), Scopus (http://www.scopus.com), Google Scholar (http://scholar.google.com), and ScienceDirect (http://www.sciencedirect.com/) (Cobo et al., 2011a). They do not cover the scientific fields and journals in the same manner and hence the choice is not neutral (Waltman, 2016; Zupic & Čater,
Table 1
Most common bibliometric techniques per unit of analysis (adapted from Cobo et al., 2011).
Bibliometric technique taxonomy | Unit of analysis used | Kind of relation
Bibliographic coupling | Author | Common references in authors’ oeuvres
Bibliographic coupling | Document | Common references in documents
Bibliographic coupling | Journal | Common references in journals’ oeuvres
Co-citation | Author | Co-cited authors
Co-citation | Reference | Co-cited documents
Co-citation | Journal | Co-cited journals
Co-author | Author | Co-occurrence of authors in the author list of a document
Co-author | Country from affiliation | Co-occurrence of countries in the address list of a document
Co-author | Institution from affiliation | Co-occurrence of institutions in the address list of a document
Co-word | Keyword, or term extracted from title, abstract or document’s body | Co-occurrence of terms in a document
2015). Other similar databases exist for specific disciplines (e.g., Medline, Astrophysics Data System), patent data, and digital materials (e.g., arXiv, DBLP, CiteSeerX Patent).
The second sub-stage is data loading and converting, where scholars must convert data into a suitable format for the employed bibliometric tools.
The final sub-stage is data cleaning. The quality of the result depends on the quality of the data. Several preprocessing methods can be applied, for example, to detect duplicate and misspelled elements. Although the majority of bibliometric data are reliable, cited references can contain multiple versions of the same publication and different spellings of an author’s name. Moreover, because authors are typically abbreviated by their surname and initials, a problem can arise with common names. Cited journals can also appear in slightly different forms. Books have different editions, which can appear as different citations.
2.2. Data analysis
Data analysis entails descriptive analysis and network extraction. Different approaches have been developed to extract networks using different units of analysis (Table 1). For example, co-word analysis (Callon, Courtial, Turner, & Bauin, 1983) uses the most important words or keywords of documents to study the conceptual structure of a research field. It is the only method that uses the actual content of the documents to construct a similarity measure; the others connect documents indirectly through citations. Co-word analysis produces semantic maps of a field that facilitate the understanding of its cognitive structure. It can be applied to document keywords, abstracts, or full texts. The unit of analysis is usually a concept or keyword, not a document, author, or journal. Another common bibliometric analysis is co-author analysis, which examines the authors and their affiliations to study the social structure and collaboration networks (Glänzel, 2001; Peters & Van Raan, 1991). The most common analysis in bibliometrics is citation analysis. It employs citation counts as a measure of similarity between documents, authors, and journals. Citation analysis can be decomposed into bibliographic coupling and co-citation analysis. Examples are author coupling (Zhao & Strotmann, 2008), author co-citation (White & McCain, 1998; White & Griffith, 1981), journal coupling (Gao & Guan, 2009; Small & Koenig, 1977; Yan & Ding, 2012), and journal co-citation (McCain, 1991).
A bibliographic coupling connection is established by the authors of the articles in question, whereas a co-citation connection is established by the authors who are citing the documents analysed. That is, bibliographic coupling (Kessler, 1963) analyses the citing documents, whereas co-citation analysis (Small, 1973) studies the cited documents. Although bibliographic coupling is helpful in detecting the connections of research groups (Yang, Han, Wolfram, & Zhao, 2016), co-citation analysis, when examined over time, is also helpful in detecting a shift in paradigms and schools of thought. The choice of the technique to employ depends on the goals of the analysis. Usually, co-citation analysis is performed for mapping older papers (prospective analysis: it is dynamic and is best performed within different time slices), whereas bibliographic coupling is used to map a current research front (retrospective analysis: it does not change over time). Recently, Klavans and Boyack (2017) suggested that direct citations are more accurate in representing a research front than bibliographic coupling and co-citation.
Once the network has been built, a normalization process is commonly performed over the relations (edges) between its nodes (vertices) using similarity measures such as Salton’s cosine, Jaccard’s coefficient, and Pearson’s correlation.
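As a minimal sketch of this normalization step, the base R code below applies the commonly used co-occurrence forms of Salton's cosine and Jaccard's coefficient to a small keyword co-occurrence matrix. The document-keyword matrix is hypothetical, the code does not use the bibliometrix package, and Pearson's correlation is not shown.

```r
# Hypothetical binary Document x Keyword matrix (3 documents, 3 keywords).
A <- matrix(c(1, 1, 0,
              1, 1, 1,
              1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"), c("k1", "k2", "k3")))

# Keyword co-occurrence matrix: C[i, j] counts documents containing both
# keywords; the diagonal holds the number of documents containing each keyword.
C <- crossprod(A)
occ <- diag(C)

# Salton's cosine: c_ij / sqrt(c_ii * c_jj).
salton <- C / sqrt(outer(occ, occ))

# Jaccard's coefficient: c_ij / (c_ii + c_jj - c_ij).
jaccard <- C / (outer(occ, occ, "+") - C)

print(round(salton, 3))
print(round(jaccard, 3))
```

Both measures rescale raw co-occurrence counts by how often each item occurs overall, so that frequent items do not dominate the network simply because of their frequency.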
Finally, data reduction is helpful in identifying subfields. With the normalized data, different techniques can be used to build the map. Various dimensionality reduction techniques can be applied, such as principal component analysis/factor analysis, multidimensional scaling (MDS), multiple correspondence analysis (MCA), and clustering algorithms.
2.3. Data visualization
Analysis methods allow useful knowledge to be extracted from data and represented through intuitive visualizations or maps such as bi-dimensional maps, dendrograms, and social networks. Network analysis allows us to perform a statistical analysis of the generated maps, yielding measures of the entire network as well as measures of the relationship or overlap between the different clusters detected.
Visualization techniques are used to represent a science map and the result of the different analyses. For example, networks can be represented using heliocentric maps (de Moya-Anegon et al., 2005), geometrical models (Skupin, 2009), thematic networks (Bailón-Moreno, Jurado-Alameda, & Ruiz-Baños, 2006; Cobo, López-Herrera, Herrera-Viedma, & Herrera, 2011b), or maps where the proximity between items represents their similarity (van Eck & Waltman, 2010). Alternatively, temporal analysis aims to indicate the conceptual, intellectual, or social evolution of the research field by discovering patterns, trends, seasonality, and outliers. Burst detection, a form of temporal analysis, aims to identify features that have high intensity over finite time periods. To demonstrate the evolution in different time periods, cluster strings (Small, 2006; Small & Upham, 2009; Upham & Small, 2010) and thematic areas (Cobo et al., 2011b) can be used. Finally, geospatial analysis aims to discover where an event occurs and its impact on the neighbouring areas.
3. Related bibliometric software tools
3.1. Software tools for science mapping
Numerous software tools support bibliometric analysis; however, many of these do not assist scholars in a complete recommended workflow. The most relevant tools are CitNetExplorer (van Eck & Waltman, 2014), VOSviewer (van Eck & Waltman, 2010), SciMAT (Cobo, López-Herrera, Herrera-Viedma, & Herrera, 2012), BibExcel (Persson, Danell, & Schneider, 2009), Science of Science (Sci2) Tool (Sci2 Team, 2009), CiteSpace (Chen, 2006), and VantagePoint (www.thevantagepoint.com).
CitNetExplorer and VOSviewer are two free Java applications, designed by van Eck and Waltman, for analysing and visualizing citation networks of scientific collections. CitNetExplorer allows the user to (i) analyse the development of a research field over time, (ii) identify the core literature on a research topic, and (iii) explore the publication oeuvre of a researcher and its influence on the publications of other researchers. VOSviewer addresses the graphical representation of bibliometric maps and is especially useful for displaying large bibliometric maps in an easy-to-interpret manner.
SciMAT is an open source software tool developed to perform a science mapping analysis under a longitudinal framework. SciMAT provides three different modules: (i) management of a knowledge base and its entities; (ii) science mapping analysis; and (iii) visualization of the generated results.
BibExcel is designed to assist a scholar in analysing bibliographic data, or any data of a textual nature formatted in a similar manner. It generates data files that can be imported into Excel, or any program that accepts tabbed data records, for further processing. However, BibExcel does not include any module to visualize and map the results.
The Science of Science (Sci2) Tool is free software that supports the temporal, geospatial, topical, and network analysis and visualization of bibliographic collections.
CiteSpace is a free Java application for visualizing and analysing trends and patterns in scientific literature. It focuses on identifying critical points in the development of a field or a domain, especially intellectual turning points and pivotal points.
VantagePoint is commercial software for science mapping analysis. Its major strength is the ability to read virtually any structured text content. It supports more than 190 different import filters. Moreover, VantagePoint includes a tool for visualizing the main bibliometric maps.
3.2. R-packages for bibliometric analysis
In the R environment, other packages addressing bibliometrics have been published recently on the official repository (CRAN, The Comprehensive R Archive Network, https://cran.r-project.org/). Each of them provides specific analysis functions; however, none addresses the entire workflow. For example, the primary aim of CITAN (Gagolewski, 2011), the CITation ANalysis package for the R statistical computing environment, is to support scholars with a tool (i) for preprocessing and cleaning bibliographic data retrieved from Scopus and (ii) for calculating the most popular indices of scientific impact. Moreover, CITAN provides metrics such as h-index, g-index, and L-index. Unlike bibliometrix, CITAN (i) can use only data from Scopus and (ii) has no functions for co-citation, bibliographic coupling, scientific collaboration, co-word analysis, or text extraction from titles and abstracts.
ScientoText (Uddin, 2016) is another recent package that is perhaps the most comparable to the bibliometrix R-package. Nevertheless, although ScientoText states that it uses data from the WoS and Scopus databases, it currently has no functions for importing and converting data.
H-index Calculator (Alavifard, 2015) uses only data from the Clarivate Analytics WoS for calculating the h-index.
Finally, Scholar (Keirstead, 2015) offers functionality similar to that of the well-known software tool Publish or Perish (Harzing, 2007). It enables data to be extracted from Google Scholar for one or more researchers for analysing citations and calculating certain impact metrics. As with CITAN, Scholar does not include any function for co-citation, bibliographic coupling, scientific
Fig. 1. bibliometrix and the recommended science mapping workflow.
collaboration, co-word analysis, or text extraction from titles and abstracts. Moreover, it can only use data from Google Scholar, with all the limitations with respect to the WoS and Scopus databases (Bar-Ilan, 2007; Yang & Meho, 2006).
4. bibliometrix and the recommended science mapping workflow
The bibliometrix R-package (http://www.bibliometrix.org) provides a set of tools for quantitative research in bibliometrics and scientometrics. It is written in the R language, which is an open-source environment and ecosystem. The existence of substantial, effective statistical algorithms, access to high-quality numerical routines, and integrated data visualization tools are perhaps the strongest reasons to prefer R to other languages for scientific computation.
Fig. 1 illustrates the bibliometrix workflow supporting the second through fourth stages of the recommended science mapping workflow presented in Section 2.
1. Data collection. bibliometrix supports the following sub-stage:
   a. Data loading and conversion to R data frame (Section 4.1).
2. Data analysis, articulated in three sub-stages:
   a. Descriptive analysis of a bibliographic data frame (Section 4.2);
   b. Network creation for bibliographic coupling, co-citation, collaboration, and co-occurrence analyses (Section 4.3);
   c. Normalization (Section 4.4).
3. Data visualization:
   a. Conceptual structure mapping (Section 4.3e);
   b. Network mapping (Section 4.5).
To describe the main functions of bibliometrix (Table 2), we analysed articles on bibliometrics in the management and business fields between 1985 and 2015. The data is available on the bibliometrix website (http://www.bibliometrix.org/datasets/bibliometricmanagementbusinesspa.txt).
4.1. Data loading and converting to R data frame
Data collection is a task composed of different subtasks as follows.
• Data retrieval. bibliometrix works with data extracted from the two main bibliographic databases, namely Clarivate Analytics WoS and Scopus. The bibliometrix tutorial (http://www.bibliometrix.org/index.html#header3-16) assists scholars with querying these databases. Moreover, bibliometrix connects with the Scopus API to automatically collect metadata regarding the complete scientific production of a list of scholars. In the example that follows, we used the Clarivate Analytics Web of Science core collection, specifically the Social Sciences Citation Index (SSCI) and Science Citation Index Expanded (SCI-Expanded), and chose (1) the generic keyword “bibliometric*” as the topic, (2) only articles written in English for the document type, (3) “management”, “business”, and “public administration” as subject categories, and (4) the timespan 1985–2015.
• Data loading and converting (hereinafter, square brackets denote the R syntax for commands). The export files are
Table 2
Main bibliometrix functions.
Software assisted workflow steps | bibliometrix function | Description | Output
Data loading and converting | readFiles() | Loads a sequence of Scopus and Clarivate Analytics WoS export files into R | Bibliographic data frame
 | convert2df() | Creates a bibliographic data frame |
 | retrievalByAuthorID() | Uses Scopus API search to obtain information regarding documents on a set of authors using Scopus ID |
Descriptive bibliometric analysis | biblioAnalysis() | Returns an object of class bibliometrix | Tables of results
 | summary() and plot() | Summarize the main results of the bibliometric analysis |
 | citations() | Identifies the most cited references or authors |
 | localCitations() | Identifies the most cited local authors |
 | dominance() | Calculates the authors’ dominance ranking |
 | Hindex() | Measures productivity and citation impact of a scholar |
 | lotka() | Estimates Lotka’s law coefficients for scientific productivity |
 | keywordGrowth() | Calculates yearly cumulative occurrences of top keywords/terms |
 | keywordAssociation() | Associates authors’ keywords to keywords plus |
Document x Attribute matrix creation | metaTagExtraction() | Extracts other field tags, different from the standard WoS/Scopus codify | Document x Attribute matrix
 | termExtraction() | Extracts and stems terms from textual fields (abstract, title, author’s keywords, and others) of a bibliographic data frame |
 | cocMatrix() | Computes a Document x Attribute matrix |
Normalization | normalizeSimilarity() | Calculates association strength, inclusion index, Jaccard’s coefficient, and Salton’s similarity coefficient among objects of a bibliographic network | Similarity matrix
Data reduction | conceptualStructure() | Creates conceptual structure map of a scientific field using MCA and clustering | Word occurrence matrix, MCA, and clustering results
Network matrix creation | biblioNetwork() | Calculates the most frequently used bibliographic coupling, co-citation, collaboration, and co-occurrence networks | Network matrix and historical network matrix
 | histNetwork() | Creates a historical co-citation network from a bibliographic data frame |
Mapping | networkPlot() | Plots a bibliographic network using internal R library or VOSviewer software | Network graph, network Pajek format for VOSviewer, historiograph, and semantic map
 | histPlot() | Plots a historical direct citation network |
 | conceptualStructure() | Plots conceptual structure map of a scientific field using MCA and clustering |
businesspa.txt)] that creates a large character object called D. The function supports plain text (for the Clarivate Analytics database) and BibTeX (for both the Clarivate Analytics and Scopus databases) formats and allows multiple export files to be imported simultaneously. These can be converted into a data frame using the convert2df function [M <- convert2df(D, dbsource = "isi", format = "plaintext")]. convert2df creates a bibliographic data frame with cases corresponding to documents and variables to field tags in the original export file. Each document contains several elements such as authors’ names, title, keywords and other information. These elements constitute the bibliographic attributes of a document, also called the metadata. We have chosen to use standard column names for the bibliographic data frame, adopting the field tags proposed by Clarivate Analytics for Scopus collections as well. This facilitates merging different sources and applying R
Table 3
bibliometrix data frame structure.
Field Tag | Class | Description
UT | CHARACTER | Unique Article Identifier
AU | CHARACTER | Authors
TI | CHARACTER | Document Title
SO | CHARACTER | Publication Name (or Source)
JI | CHARACTER | ISO Source Abbreviation
DT | CHARACTER | Document Type
DE | CHARACTER | Authors’ Keywords
ID | CHARACTER | Keywords associated by WoS or Scopus database
AB | LARGE CHARACTER | Abstract
C1 | CHARACTER | Author Address
RP | CHARACTER | Reprint Address
CR | LARGE CHARACTER | Cited References
TC | NUMERIC | Times Cited
PY | NUMERIC | Year
SC | CHARACTER | Subject Category
DB | CHARACTER | Bibliographic Database
Table 4
Element list of a bibliometrix object.
List element | Description
Articles | Total number of documents
Authors | Authors’ frequency distribution
AuthorsFrac | Authors’ frequency distribution (fractionalized)
FirstAuthors | First author of each document
nAUperPaper | Number of authors per document
Appearances | Number of author appearances
nAuthors | Total number of authors
AuMultiAuthoredArt | Number of authors of multi-authored articles
Years | Publication year of each document
FirstAffiliation | Affiliation of the first author for each document
Affiliations | Frequency distribution of affiliations (of all co-authors for each document)
Afffrac | Fractionalized frequency distribution of affiliations (of all co-authors for each paper)
CO | Affiliation country of first author
Countries | Affiliation countries’ frequency distribution
TotalCitation | Number of times each document has been cited
TCperYear | Yearly average number of times each document has been cited
Sources | Frequency distribution of the sources (journals, books, others)
DE | Frequency distribution of the authors’ keywords
ID | Frequency distribution of keywords associated to the document by Clarivate Analytics Web of Science and Scopus databases
routines. Table 3 contains the structure of the bibliometrix data frame considering the main field tags. The column “class” reports the data type of each data frame column.
• Data cleaning. bibliometrix does not have specific routines dedicated to data cleaning. It does include in its main functions (e.g., loading and converting, citation analysis) a set of cleaning rules such as: (i) transform full text into uppercase, (ii) remove non-alphanumeric characters, (iii) remove punctuation symbols and extra spaces, and (iv) truncate author’s first and middle names to the initials.
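The four cleaning rules listed above can be sketched in base R as follows. This is an illustrative approximation, not the package's internal code: the helper names clean_text and abbreviate_author are hypothetical, and the input strings are made-up examples.

```r
# Hypothetical helper: rules (i)-(iii), i.e. uppercase, drop non-alphanumeric
# characters (including punctuation), and collapse extra spaces.
clean_text <- function(x) {
  x <- toupper(x)
  # Anything that is not A-Z, 0-9, or a space becomes a space; note that this
  # simple pattern also removes accented letters.
  x <- gsub("[^A-Z0-9 ]", " ", x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

# Hypothetical helper: rule (iv), i.e. keep the surname and reduce first and
# middle names to their initials.
abbreviate_author <- function(surname, given) {
  parts <- strsplit(trimws(given), "[ .-]+")[[1]]
  initials <- paste(substr(parts, 1, 1), collapse = "")
  toupper(paste(surname, initials))
}

print(clean_text("  Science-mapping:  an R tool! "))
print(abbreviate_author("Kostoff", "Ronald N."))
```

Applying the same rules to every record is what allows variant spellings of a title or author name to collapse into one form before frequencies are counted.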
4.2. Descriptive analysis of a bibliographic data frame
The descriptive analysis of the bibliographic data frame uses several functions.
• The biblioAnalysis function calculates the main bibliometric measures using simple syntax [results <- biblioAnalysis(M, sep = ";")]. The biblioAnalysis function returns an object of class “bibliometrix”, which is a list containing the elements reported in Table 4.
• The functions summary and plot summarize the main results of the bibliometric analysis. They display the principal information regarding the bibliographic data frame and six tables. summary accepts two additional arguments: k is a formatting value that indicates the number of rows for each table; pause is a logical value (TRUE or FALSE) used to permit (or not) a pause in screen scrolling. For example, choosing k = 10 requests the first ten authors or the first ten sources. The results are displayed in Tables 5–10 and in Fig. 2.
• The citations function generates the frequency table of the most cited references or the most cited first authors (of references). For each document, cited references are in a single string stored in the “CR” column of the data frame. For a correct
Table 5
Descriptive analysis: Main information regarding the collection.
Description
Articles | 304
Period | 1985–2015
Annual Percentage Growth Rate | 12.40
Average citations per article | 26.56
Authors | 617
Author Appearances | 801
Authors of single authored articles | 32
Authors of multi authored articles | 585
Articles per Author | 0.493
Authors per Article | 2.03
Co-Authors per Articles | 2.63
Collaboration Index | 2.49
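Three of the ratios in Table 5 can be reproduced from the counts in the same table. The short base R sketch below assumes, as the values suggest, that "Articles per Author" is articles divided by authors, "Authors per Article" is authors divided by articles, and "Co-Authors per Articles" is author appearances divided by articles; these definitions are inferred from the numbers, not quoted from the package.

```r
# Counts reported in Table 5.
articles <- 304
authors <- 617
author_appearances <- 801

# Ratios under the assumed definitions stated above.
articles_per_author <- articles / authors
authors_per_article <- authors / articles
coauthors_per_article <- author_appearances / articles

print(round(articles_per_author, 3))
print(round(authors_per_article, 2))
print(round(coauthors_per_article, 2))
```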
Table 6
Descriptive analysis: Top 10 – Most productive authors.
Author | No. of Articles | Author | No. of Articles Fractionalized
Kostoff RN | 16 | Kostoff RN | 7.77
Kajikawa Y | 9 | Vogel R | 3.50
Porter AL | 9 | Porter AL | 3.46
Abramo G | 5 | Kajikawa Y | 3.00
D’Angelo CA | 5 | Shilbury D | 3.00
Moed HF | 5 | Talukdar D | 2.33
Bowles CA | 4 | Hicks D | 2.08
Hicks D | 4 | Eom SB | 2.00
Lee PC | 4 | Mcmillan GS | 2.00
Sakata I | 4 | Saetren H | 2.00

Table 7
Descriptive analysis: Top 10 – Most cited papers.
Paper | Total Citations (TC) | TC per Year
Chen HC, Chiang RHL, Storey VC, (2012), Mis Q. | 386 | 77.20
Daim TU, Rueda G, Martin H, Gerdsri P, (2006), Technol. Forecast. Soc. Chang. | 240 | 21.82
Moed HF, Burger WJM, Frankfort JG, Vanraan AFJ, (1985), Res. Policy | 232 | 7.25
Kostoff RN, Scaller RR, (2001), IEEE Trans. Eng. Manage. | 220 | 13.75
Loh L, Venkatraman N, (1992), Inf. Syst. Res. | 210 | 8.40
Volberda HW, Foss NJ, Lyles MA, (2010), Organ Sci. | 202 | 28.86
Ramos-Rodriguez AR, Ruiz-Navarro J, (2004), Strateg. Manage. J. | 190 | 14.62
Murray F, (2002), Res. Policy | 187 | 12.47
Melin G, (2000), Res. Policy | 187 | 11.00
Gambardella A, (1992), Res. Policy | 139 | 5.56
Table 8
Descriptive analysis: Top 10 – Most productive countries (based on first author’s affiliation).
Country | No. of Articles | % of Articles
USA | 89 | 29.5
Netherlands | 25 | 8.3
England | 21 | 6.9
Germany | 20 | 6.6
Italy | 20 | 6.6
Japan | 15 | 5.0
Spain | 15 | 5.0
Sweden | 10 | 3.3
Taiwan | 10 | 3.3
Australia | 8 | 2.7
extraction, we must identify the separator field among different references used by the selected database. Typically, the WoS default separator is “;”. The bibliometrix tutorial also describes other separators.
Cited references frequently have numerous inconsistencies in the data format. For example, some databases, such as Scopus, do not have a standardized format. The citations function also implements a set of cleaning rules as described in Section 4.1.
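The idea behind a frequency table of cited references can be sketched in base R: split each document's reference string on the separator, trim the pieces, and count them. The reference strings below are hypothetical and abbreviated, and the code is an illustration of the principle, not the implementation of the citations function.

```r
# Hypothetical "CR"-style strings, one per document, with ";" as the separator.
cr <- c("SMALL H, 1973; KESSLER MM, 1963",
        "SMALL H, 1973; WHITE HD, 1998",
        "KESSLER MM, 1963; SMALL H, 1973")

# Split on the separator and trim surrounding whitespace.
refs <- trimws(unlist(strsplit(cr, ";", fixed = TRUE)))

# Frequency table of cited references, most cited first.
ref_counts <- sort(table(refs), decreasing = TRUE)
print(ref_counts)
```

Choosing the wrong separator would leave several references fused into a single string, which is why the separator must match the export format of the source database.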
Table 11 contains the most frequently cited documents [CR <- citations(M, field = "article", sep = ";")]. The localCitations function [CR <- localCitations(M, results, sep = ";")] generates the frequency table of the most frequently cited local authors. Local
Table 9
Descriptive analysis: Top 10 – Most frequent journals.
Sources | No. of Articles | % of Articles
Research Policy | 58 | 19.1
Technological Forecasting and Social Change | 56 | 18.4
Technology Analysis & Strategic Management | 17 | 5.6
Technovation | 15 | 4.9
International Journal of Technology Management | 6 | 2.0
R & D Management | 6 | 2.0
Science and Public Policy | 6 | 2.0
African Journal of Business Management | 5 | 1.6
Journal of Technology Transfer | 5 | 1.6
Journal of Business Ethics | 4 | 1.3

Table 10
Descriptive analysis: Top 10 – Most frequent keywords.
Author Keywords (DE) | No. of Articles | Keywords-Plus (ID) | No. of Articles
Bibliometrics | 87 | Science | 82
Bibliometric Analysis | 30 | Innovation | 39
Citation Analysis | 26 | Performance | 33
Nanotechnology | 17 | Journals | 30
Research | 17 | Technology | 30
Innovation | 16 | Impact | 28
Analysis | 13 | Knowledge | 26
Scientometrics | 13 | Management | 26
Text Mining | 10 | Bibliometrics | 25
Patent Analysis | 9 | Citation Analysis | 25
Fig. 2. Publications per year 1985–2015.

Table 11
Citation analysis: Top 10 – Most cited references.
Cited Reference | Citations
Ramos-Rodriguez AR, 2004, Strategic Manage J, V25, P981, Doi 101002/Smj397 | 32
Small H, 1973, J Am Soc Inform Sci, V24, P265, Doi 101002/Asi4630240406 | 29
Mccain KW, 1990, J Am Soc Inform Sci, V41, P433 | 26
Nelson RR, 1982, Evolutionary Theory | 22
White HR, 1981, J Am Soc Inform Sci, V32, P163, Doi 101002/Asi4630320302 | 21
Price DJ, 1963, Little Sci Big Sci | 19
Cohen WM, 1990, Admin Sci Quart, V35, P128, Doi 102307/2393553 | 18
Daim TU, 2006, Technol Forecast Soc, V73, P981, Doi 101016/Jtechfore200604004 | 18
Hoffman DL, 1993, J Consum Res, V19, P505, Doi 101086/209319 | 18
Nerur SP, 2008, Strateg Manage J, V29, P319, Doi 101002/SMJ659 | 18
Small H, 1974, Sci Stud, V4, P17, Doi 101177/030631277400400102 | 18
White HD, 1998, J Am Soc Inform Sci, V49, P327 | 18
citations measure how many times an author included in this collection has been cited by other authors also in the collection. Table 12 reports the most frequently cited local authors.
• The authors’ h-index is an author-level metric that attempts to measure both the productivity and citation impact of the publications of a scientist or scholar (Hirsch, 2005). The index is based on the set of the scientist’s most cited papers and
Table 12
Local citation analysis: Top 10 – Most cited authors.
Local Cited Author | Citations
Kostoff RN | 317
Narin F | 103
Porter AL | 96
Leydesdorff L | 84
Moed HF | 71
Pilkington A | 53
Hicks D | 48
Meyer M | 47
Martin B | 45
Pavitt K | 43
the number of citations that they have received in other publications. The Hindex function calculates the authors’ h-index and its variants (g-index and m-index) in a bibliographic collection (van Eck & Waltman, 2008). The function arguments are the following: M, a bibliographic data frame, and authors, a character vector containing the names of the authors for whom the h-index is to be calculated. For example, to calculate the h-index, in this collection, for author Ronald Kostoff, one would use [Hindex(M, authors = "KOSTOFF R", sep = ";")]
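The definitions of the h-index and g-index can be illustrated with a short base R sketch that works directly on a vector of citation counts. The function names h_index and g_index and the citation counts are hypothetical; this is not the code of the Hindex function, and the m-index is not shown.

```r
# h-index: the largest h such that h papers have at least h citations each.
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  # With counts in decreasing order, sorted[i] >= i holds for a prefix only,
  # so counting the TRUE values gives h.
  sum(sorted >= seq_along(sorted))
}

# g-index: the largest g such that the top g papers together have at least
# g^2 citations (capped here at the number of papers).
g_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  ok <- which(cumsum(sorted) >= seq_along(sorted)^2)
  if (length(ok) == 0) 0L else max(ok)
}

# Hypothetical citation counts for one author's five papers.
cites <- c(10, 8, 5, 4, 3)
print(h_index(cites))
print(g_index(cites))
```

For these counts, four papers have at least four citations each, so the h-index is 4; the five papers together have 30 citations, which exceeds 25, so the g-index reaches the cap of 5.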
4.3. Networkcreationforbibliographiccoupling,co-citation,collaboration,andco-occurrenceanalyses
Adocument’sattributesareconnectedtoeachotherthroughthedocumentitself(e.g.,author(s)tojournal,keywordsto publicationdate).TheseconnectionsofdifferentattributescanberepresentedthroughamatrixDocument×Attribute.
cocMatrixisthegeneralfunctiontocreatearectangularmatrixDocument×AttributethatwecallA.Anattributeisan itemofinformationassociatedtothedocumentandstoredinafieldtagwithinthebibliometricdataframe(e.g.,authors, publicationsource,keywords,citedreferences,affiliations).
In some cases, this matrix can be interpreted as a bipartite or two-mode network (e.g., where the attribute is author, keyword, or cited reference). For example, to create a Document × Cited reference matrix, we must use the field tag "CR" [A <- cocMatrix(M, Field = "CR", sep = ";")]. In this case, A is a rectangular binary matrix (and also a bipartite network) where each row is a document and each column is a cited reference of the collection. The generic element $a_{ij}$ is 1 if document i has cited reference j, and 0 otherwise. The j-th column sum $a_{+j}$ is the number of documents citing reference j. The i-th row sum $a_{i+}$ is the number of references cited by document i.
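To make the structure of A concrete, here is a minimal base R sketch that builds a binary Document × Cited reference matrix from a toy ";"-separated field. The document and reference labels are invented for illustration, and the code is not the cocMatrix implementation, which operates on the bibliographic data frame.

```r
# Toy "CR" field: one string per document, references separated by ";"
cr <- c(doc1 = "REF_A;REF_B",
        doc2 = "REF_B;REF_C",
        doc3 = "REF_A;REF_B;REF_C")

items <- strsplit(cr, ";", fixed = TRUE)   # list of references per document
refs  <- sort(unique(unlist(items)))       # column labels of A

# a_ij = 1 if document i cites reference j, 0 otherwise
A <- t(vapply(items,
              function(x) as.numeric(refs %in% x),
              numeric(length(refs))))
dimnames(A) <- list(names(cr), refs)

A
colSums(A)   # a_{+j}: number of documents citing each reference
rowSums(A)   # a_{i+}: number of references cited by each document
```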
Using the cocMatrix function, several matrices can be computed, such as:
• Document × Author [A <- cocMatrix(M, Field = "AU", sep = ";")];
• Document × Country. Authors’ countries is not a standard attribute of the bibliographic data frame. We must extract this information from the affiliation attribute using the metaTagExtraction function [M <- metaTagExtraction(M, Field = "AU_CO", sep = ";"); A <- cocMatrix(M, Field = "AU_CO", sep = ";")]. metaTagExtraction allows the following additional field tags to be extracted: authors’ countries (Field = "AU_CO"), first author of each cited reference (Field = "CR_AU"), publication source of each cited reference (Field = "CR_SO"), and affiliation of each co-author (Field = "AU_UN");
• Document × Authors’ keyword [A <- cocMatrix(M, Field = "DE", sep = ";")] or Document × Keyword Plus [A <- cocMatrix(M, Field = "ID", sep = ";")].
a) Bibliographic coupling
Two articles are said to be bibliographically coupled if at least one cited source appears in the bibliographies or reference lists of both articles (Kessler, 1963). A bibliographic coupling network can be obtained using the general formula:
$$B_{coup} = A \times A'$$
where A is a Document × Cited reference matrix and A' is its transpose. Element $b_{ij}$ indicates how many bibliographic couplings exist between documents i and j. $B_{coup}$ is a non-negative and symmetric matrix, i.e., $B_{coup} = B_{coup}'$.
The strength of the bibliographic coupling of two articles, i and j, is defined simply by the number of references that the articles have in common, as given by the element $b_{ij}$ of matrix $B_{coup}$.
The biblioNetwork function calculates, starting from a bibliographic data frame, the most frequently used bibliographic coupling networks, such as those of documents, authors, sources, keywords, and countries. To use biblioNetwork, two arguments must be set. First, the analysis argument sets the type of analysis, which in this case is "coupling" (the alternatives are "co-citation", "collaboration", and "co-occurrences"). Then, the network argument sets the unit of analysis, which can be "authors", "references", "sources", "countries", "keywords", "author_keywords", "titles", or "abstracts".
The following code calculates a classical document bibliographic coupling network [NetMatrix <- biblioNetwork(M, analysis = "coupling", network = "references", sep = ";")]. Note that articles with only a small number of references tend to be more weakly bibliographically coupled when coupling strength is simply measured as the number of references that the articles have in common.
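The coupling formula can be checked by hand on a small example. The sketch below assumes a toy 3 × 3 binary Document × Cited reference matrix with invented labels; it only illustrates the matrix product and is not how biblioNetwork is implemented internally.

```r
# Toy Document x Cited reference matrix (rows: documents, columns: references)
A <- matrix(c(1, 1, 0,
              0, 1, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("REF_A", "REF_B", "REF_C")))

# Bibliographic coupling: B_coup = A %*% t(A) (equivalently tcrossprod(A))
B_coup <- A %*% t(A)

B_coup                    # off-diagonal b_ij: references shared by documents i and j
diag(B_coup)              # b_ii: length of each document's reference list
isSymmetric(B_coup)       # TRUE
```

Here doc1 and doc2 share only REF_B, while doc3 shares two references with each of the others, so documents with longer reference lists naturally reach higher raw coupling strengths.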
b) Co-citation analysis
Co-citation of two articles occurs when both are cited in a third article. Thus, co-citation is the counterpart of bibliographic coupling. A co-citation network can be obtained using the general formula:
$$B_{cocit} = A' \times A$$
where A is a Document × Cited reference matrix.
Similar to matrix $B_{coup}$, matrix $B_{cocit}$ is also symmetric. Element $b_{ij}$ indicates how many co-citations exist between cited documents i and j. The main diagonal of $B_{cocit}$ contains the number of documents in our data frame in which a reference is cited. That is, the diagonal element $b_{ii}$ is the number of local citations of reference i. The biblioNetwork function provides a classical reference co-citation network [NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "references", sep = ";")].
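For comparison with bibliographic coupling, the same kind of toy matrix can be multiplied in the other order to obtain co-citation counts. The labels are invented, and the sketch illustrates only the formula, not the internals of biblioNetwork.

```r
# Toy Document x Cited reference matrix (rows: documents, columns: references)
A <- matrix(c(1, 1, 0,
              0, 1, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("REF_A", "REF_B", "REF_C")))

# Co-citation: B_cocit = t(A) %*% A (equivalently crossprod(A))
B_cocit <- crossprod(A)

B_cocit          # off-diagonal b_ij: documents citing both reference i and reference j
diag(B_cocit)    # b_ii: local citations received by each reference
```

The result is a Reference × Reference matrix, whereas the coupling product yields a Document × Document matrix from the same A.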
c) Collaboration analysis
A scientific collaboration network is a network where nodes are authors and links are co-authorships. Co-authorship is one of the most well-documented forms of scientific collaboration (Glänzel & Schubert, 2004). An author collaboration network can be obtained using the general formula:
$$B_{coll} = A' \times A$$
where A is a Document × Author matrix. Element $b_{ij}$ indicates how many collaborations exist between authors i and j. The diagonal element $b_{ii}$ is the number of documents authored or co-authored by researcher i. The biblioNetwork function calculates an authors’ collaboration network [NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "authors", sep = ";")] or a country collaboration network [NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")].
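A collaboration matrix is often easier to inspect as a weighted edge list. The following base R sketch, with invented author labels, derives the collaboration matrix from a toy Document × Author matrix and then lists each co-author pair once; it is an illustration under these assumptions rather than the package's own code.

```r
# Toy Document x Author matrix (rows: documents, columns: authors)
A <- matrix(c(1, 1, 0,
              1, 0, 0,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("AUTHOR_X", "AUTHOR_Y", "AUTHOR_Z")))

# Collaboration matrix: b_ij = number of documents co-authored by i and j
B_coll <- crossprod(A)

# Keep each unordered pair once (upper triangle) and drop pairs with no joint papers
idx <- which(upper.tri(B_coll) & B_coll > 0, arr.ind = TRUE)
edges <- data.frame(from   = rownames(B_coll)[idx[, "row"]],
                    to     = colnames(B_coll)[idx[, "col"]],
                    weight = B_coll[idx],
                    stringsAsFactors = FALSE)

diag(B_coll)   # documents (co-)authored by each researcher
edges          # weighted co-authorship links
```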
d) Co-word analysis
The aim of co-word analysis is to draw the conceptual structure of a framework using a word co-occurrence network to map and cluster terms extracted from keywords, titles, or abstracts in a bibliographic collection [NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")].
A co-word network can be obtained using the general formula:
$$B_{coc} = A' \times A$$
where A is a Document × Word matrix, and Word is, alternatively, authors’ keywords, Keywords Plus, or terms extracted from titles or abstracts. Element $b_{ij}$ indicates how many co-occurrences exist between words i and j. The diagonal element $b_{ii}$ is the number of documents containing word i.
The termExtraction function extracts terms from a textual field (e.g., abstract, title, authors’ keywords), deletes stopwords, and applies Porter’s stemming algorithm (Porter, 1980). Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form, typically a written word form. Hence, this function normalizes terms before performing the co-occurrence analysis [M <- termExtraction(M, Field = "TI", stemming = TRUE, language = "english", verbose = TRUE)].
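The path from raw text to a co-word matrix can be sketched in base R as follows. This is a simplified illustration under stated assumptions: the titles and the tiny stopword list are invented, and stemming is deliberately omitted, so it does not reproduce what termExtraction does.

```r
# Made-up titles standing in for the "TI" field
titles <- c("Mapping science with bibliometrics",
            "Science mapping tools for bibliometrics",
            "A review of citation analysis")

stopwords <- c("a", "of", "for", "with", "the")   # tiny illustrative list

# Lower-case, split on non-letters, drop empty strings and stopwords
tokens <- lapply(strsplit(tolower(titles), "[^a-z]+"), function(w) {
  setdiff(w[nzchar(w)], stopwords)   # setdiff also removes duplicates
})

vocab <- sort(unique(unlist(tokens)))

# Binary Document x Word matrix
A <- t(vapply(tokens,
              function(w) as.numeric(vocab %in% w),
              numeric(length(vocab))))
colnames(A) <- vocab

# Word co-occurrence matrix: B_coc = t(A) %*% A
B_coc <- crossprod(A)
B_coc["mapping", "science"]   # documents containing both words
```

Without stemming, inflected forms such as "map" and "mapping" would be counted as different words, which is the situation the normalization step in termExtraction is meant to avoid.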
The bibliometrix R-package provides the conceptualStructure function, which performs multiple correspondence analysis (MCA) to draw a conceptual structure of the field and K-means clustering to identify clusters of documents that express common concepts.
MCA is an exploratory multivariate technique for the graphical and numerical analysis of multivariate categorical data (Benzécri, 1982; Greenacre & Blasius, 2006; Lebart, Morineau, & Warwick, 1984). MCA performs a homogeneity analysis of an indicator matrix to obtain a low-dimensional Euclidean representation of the original data (Gifi, 1990). In co-word analysis, MCA is applied to a Document × Word matrix A. The words are plotted on a two-dimensional map [CS <- conceptualStructure(M, field = "ID", minDegree = 5, k.max = 5, stemming = FALSE, labelsize = 5)]. The results are interpreted based on the relative positions of the points and their distribution along the dimensions; the more similar words are in distribution, the closer they are represented in the map (Fig. 3) (Cuccurullo, Aria, & Sarto, 2016).
4.4. Normalization
The normalizeSimilarity function allows the user to normalize bibliographic coupling, co-citation, and co-occurrence data by calculating a similarity measure. This function computes the following measures (van Eck & Waltman, 2009): the association strength (also called proximity index), the inclusion index (also called Simpson’s coefficient), Jaccard’s coefficient, and Salton’s cosine.
Fig. 3. Conceptual map and keyword clusters.
The association strength is the ratio between the observed and expected strength under the assumption of probabilistic independence:
$$SA_{ij} = \frac{b_{ij}}{b_{ii}\, b_{jj}}$$
[S <- normalizeSimilarity(NetMatrix, type = "association")].
The inclusion index is an overlap metric that measures how much a set is included in another:
$$SI_{ij} = \frac{b_{ij}}{\min(b_{ii}, b_{jj})}$$
[S <- normalizeSimilarity(NetMatrix, type = "inclusion")].
Jaccard’s index (or Jaccard’s similarity coefficient) is a relative measure of the intersection of two sets. It is calculated as the ratio between the intersection and the union of the two objects:
$$SJ_{ij} = \frac{b_{ij}}{b_{ii} + b_{jj} - b_{ij}}$$
[S <- normalizeSimilarity(NetMatrix, type = "jaccard")].
Salton’s index relates the intersection of the two objects to the geometric mean of the size of both:
$$SS_{ij} = \frac{b_{ij}}{\sqrt{b_{ii}\, b_{jj}}}$$
[S <- normalizeSimilarity(NetMatrix, type = "salton")].
The square of Salton’s index is also called the equivalence index [S <- normalizeSimilarity(NetMatrix, type = "equivalence")].
Fig. 4. Keyword Plus co-occurrence network (Kamada & Kawai layout).
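The similarity measures of this section can be reproduced on a small co-occurrence matrix with a few lines of base R. The function similarity_sketch and the toy matrix are hypothetical and serve only to show the arithmetic; they are not the normalizeSimilarity implementation.

```r
# Hypothetical helper: normalise a square co-occurrence matrix B whose
# diagonal b_ii holds the number of occurrences of item i.
similarity_sketch <- function(B, type = c("association", "inclusion",
                                          "jaccard", "salton", "equivalence")) {
  type <- match.arg(type)
  d <- diag(B)
  switch(type,
         association = B / outer(d, d),              # b_ij / (b_ii * b_jj)
         inclusion   = B / outer(d, d, pmin),        # b_ij / min(b_ii, b_jj)
         jaccard     = B / (outer(d, d, "+") - B),   # b_ij / (b_ii + b_jj - b_ij)
         salton      = B / sqrt(outer(d, d)),        # b_ij / sqrt(b_ii * b_jj)
         equivalence = B^2 / outer(d, d))            # square of Salton's index
}

# Toy symmetric co-occurrence matrix for three words
w <- c("w1", "w2", "w3")
B <- matrix(c(2, 2, 1,
              2, 3, 2,
              1, 2, 2),
            nrow = 3, byrow = TRUE, dimnames = list(w, w))

similarity_sketch(B, "association")["w1", "w3"]   # 1 / (2 * 2) = 0.25
similarity_sketch(B, "jaccard")["w1", "w3"]       # 1 / (2 + 2 - 1)
similarity_sketch(B, "salton")["w1", "w3"]        # 1 / sqrt(2 * 2) = 0.5
```

Except for the association strength, all of these measures equal 1 on the diagonal, which makes them convenient when items of very different frequency are compared.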
4.5. Network mapping
All bibliometric networks can be graphically visualized or modelled. The networkPlot function plots a network created by biblioNetwork using R routines or using the VOSviewer software by Nees Jan van Eck and Ludo Waltman (van Eck & Waltman, 2010; van Eck, Waltman, & Noyons, 2010; Waltman, van Eck, & Noyons, 2010).
Fig. 4 displays a Keyword Plus co-occurrence network using the Kamada-Kawai layout (Kamada & Kawai, 1989). The network is drawn by selecting the 40 vertices with the highest degree [COC <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";"); networkPlot(COC, n = 40, size = TRUE, remove.multiple = TRUE, Title = "Term co-occurrences", type = "kamada", labelsize = 0.5)].
Fig. 5. Country collaboration network (Sphere layout).
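Keeping only the n vertices with the highest degree can be sketched in base R as follows. This is an illustration under assumptions: the helper top_n_by_degree is hypothetical, degree is taken here as the number of distinct neighbours (ignoring the diagonal), and networkPlot may define degree or break ties differently.

```r
# Hypothetical helper: restrict a square network matrix with dimnames
# to the n vertices that have the most distinct neighbours.
top_n_by_degree <- function(B, n) {
  adj <- B
  diag(adj) <- 0                         # ignore self-links
  deg <- rowSums(adj > 0)                # number of neighbours per vertex
  ranked <- names(deg)[order(deg, decreasing = TRUE)]
  keep <- ranked[seq_len(min(n, length(ranked)))]
  B[keep, keep, drop = FALSE]
}

# Toy symmetric network on four vertices
v <- c("a", "b", "c", "d")
B <- matrix(0, 4, 4, dimnames = list(v, v))
B["a", "b"] <- B["b", "a"] <- 1
B["a", "c"] <- B["c", "a"] <- 2
B["a", "d"] <- B["d", "a"] <- 1
B["b", "c"] <- B["c", "b"] <- 1

sub <- top_n_by_degree(B, 3)   # vertex "d" has a single neighbour and is dropped
rownames(sub)
```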
Fig. 5 displays another example of a bibliographic network, considering collaboration links between countries. In this case, we used the sphere layout [M <- metaTagExtraction(M, Field = "AU_CO"); CC <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";"); networkPlot(CC, n = 44, size = TRUE, remove.multiple = FALSE, Title = "Country Collaboration", type = "sphere")].
bibliometrix also performs historiographic analysis, as proposed by Garfield (2004) [histResults <- histNetwork(M, n = 20, sep = ";")]. The histPlot function plots a chronological citation network (called a historiograph; see Fig. 6 and Table 13) that represents a chronological map of the most relevant citations resulting from a bibliographic collection [histPlot(histResults, size = FALSE)].
5. Conclusions
Science mapping is becoming an essential activity for scholars of all scientific disciplines. As the number of publications continues to expand at increasing rates and publications develop in a fragmentary way, the task of accumulating knowledge becomes more complicated. Determining the intellectual structure and the research front of scientific domains is important not only for research but also for policy-making and practice.
Fig. 6. Historiograph.
Specialized software tools commonly perform only certain steps of science mapping analysis. Only a small number of these allow scholars to follow the complete workflow. bibliometrix is an open-source tool for executing a comprehensive science mapping analysis of scientific literature. It was programmed in R to be flexible and to facilitate integration with other statistical and graphical packages. Indeed, bibliometrics is a constantly changing science and bibliometrix has the flexibility to be quickly upgraded and integrated. Its development can draw on a large and active community of developers formed by prominent researchers. Some advantages are direct: the sources are published on GitHub, permitting shared development. Other advantages are indirect: in an environment composed of thousands of packages, bibliometrix can be a step in a larger workflow, exploiting other R solutions.
We are already working on new developments. They concern (i) the extension of compatibility with other bibliographic databases such as PubMed, (ii) the improvement of reference disambiguation by string metric-based algorithms, (iii) the introduction of direct citation (Klavans & Boyack, 2017) and tri-citation analysis (Marion, 2002; McCain, 2009), and (iv) the use of hybrid methods that combine bibliometric and semantic approaches (Glänzel & Thijs, 2012; Thijs, Schiebel, & Glänzel, 2013). The last-mentioned development includes term-burst detection through expectile smoothing (Schnabel & Eilers, 2009), thematic mapping and evolution (Cobo et al., 2011b), and latent semantic analysis (Dumais, 2004).
Table 13
Historiograph legend.

ID       Reference                                     DOI                                 Local citations  Total citations
1985-1   MOED HF, 1985, RES POLICY                     10.1016/0048-7333(85)90012-5        14               232
1988-2   NARIN F, 1988, RES POLICY                     10.1016/0048-7333(88)90039-X        9                44
1993-3   NEDERHOF AJ, 1993, RES POLICY                 10.1016/0048-7333(93)90005-3        6                61
1993-4   HOFFMAN DL, 1993, J CONSUM RES                10.1086/209319                      18               67
1995-5   PORTER AL, 1995, TECHNOL FORECAST SOC         10.1016/0040-1625(95)00022-3        8                89
1995-6   USDIKEN B, 1995, ORGAN STUD                   10.1177/017084069501600306          11               80
1997-7   WATTS RJ, 1997, TECHNOL FORECAST SOC          10.1016/S0040-1625(97)00050-4       14               112
1998-8   RINIA EJ, 1998, RES POLICY                    10.1016/S0048-7333(98)00026-2       10               113
2001-9   KOSTOFF RN, 2001, IEEE T ENG MANAGE           10.1109/17.922473                   13               220
2004-10  RAMOS-RODRIGUEZ AR, 2004, STRATEGIC MANAGE J  10.1002/SMJ.397                     32               190
2004-11  KOSTOFF RN, 2004, TECHNOL FORECAST SOC        10.1016/S0040-1625(03)00048-9       6                100
2005-12  KOSTOFF RN, 2005, TECHNOL FORECAST SOC        10.1016/J.TECHFORE.2005.02.001      6                14
2006-13  DAIM TU, 2006, TECHNOL FORECAST SOC           10.1016/J.TECHFORE.2006.04.004      18               240
2006-14  SCHILDT HA, 2006, ENTREP THEORY PRACT         10.1111/J.1540-6520.2006.00126.X    8                59
2006-15  PILKINGTON A, 2006, TECHNOVATION              10.1016/J.TECHNOVATION.2005.01.009  6                38
2007-16  KOSTOFF RN, 2007, TECHNOL FORECAST SOC        10.1016/J.TECHFORE.2007.02.007      6                26
2007-17  KOSTOFF RN, 2007, TECHNOL FORECAST SOC        10.1016/J.TECHFORE.2007.04.004      7                47
2007-18  BIEMANS W, 2007, J PROD INNOVAT MANAG         10.1111/J.1540-5885.2007.00245.X    6                20
2008-19  KAJIKAWA Y, 2008, TECHNOL FORECAST SOC        10.1016/J.TECHFORE.2008.04.007      7                44
2008-20  SHIBATA N, 2008, TECHNOVATION                 10.1016/J.TECHNOVATION.2008.03.009  7                83
2008-21  KAJIKAWA Y, 2008, TECHNOL FORECAST SOC        10.1016/J.TECHFORE.2007.05.005      14               76
2008-22  NERUR SP, 2008, STRATEG MANAGE J              10.1002/SMJ.659                     18               96
2008-23  CHARVET FF, 2008, J BUS LOGIST                <NA>                                9                34
2009-24  KAJIKAWA Y, 2009, TECHNOL FORECAST SOC        10.1016/J.TECHFORE.2009.04.004      7                34
2009-25  ABRAMO G, 2009, RES POLICY                    10.1016/J.RESPOL.2008.11.001        6                50
Author contributions
Massimo Aria, Corrado Cuccurullo: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Performed the analysis; Wrote the paper.
Acknowledgements
The authors would like to thank the editor and referees for their helpful comments. These have allowed us to significantly improve the quality of this paper.
References
Alavifard, S. (2015). hindexcalculator: H-index calculator using data from a web of science (WoS) citation report. R package version 1.0.0. https://CRAN.R-project.org/package=hindexcalculator
Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37(1), 179–255.
Bailón-Moreno, R., Jurado-Alameda, E., & Ruiz-Baños, R. (2006). The scientific network of surfactants: Structural analysis. Journal of the American Society for Information Science and Technology, 57(7), 949–960.
Bar-Ilan, J. (2007). Which h-index? A comparison of WoS, Scopus and Google Scholar. Scientometrics, 74(2), 257–271.
Benzécri, J. P. (1982). L’Analyse des Données. II. L’analyse des correspondances. Paris: Dunod.
Briner, R. B., & Denyer, D. (2012). Systematic review and evidence synthesis as a practice and scholarship tool. In Handbook of evidence-based management: Companies, classrooms and research. pp. 112–129.
Broadus, R. (1987). Toward a definition of bibliometrics. Scientometrics, 12(5–6), 373–379.
Callon, M., Courtial, J.-P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Social Science Information, 22(2), 191–235. http://dx.doi.org/10.1177/053901883022002003
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the Association for Information Science and Technology, 57(3), 359–377.
Cobo, M. J., Lopez-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). Science Mapping Software Tools: Review, analysis, and cooperative study among tools. Journal of the American Society for Information Science and Technology.
Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: a practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146–166.
Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology, 63(8), 1609–1630.
Crane, D. (1972). Invisible colleges: Diffusion of knowledge in scientific communities. Chicago: University of Chicago Press.
Cuccurullo, C., Aria, M., & Sarto, F. (2016). Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains. Scientometrics, 108(2), 595–611.
Diodato, V. (1994). Dictionary of bibliometrics. Binghamton, NY: Haworth Press.
Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and Technology, 38, 189–230.
Gagolewski, M. (2011). Bibliometric impact assessment with R and the CITAN package. Journal of Informetrics, 5(4), 678–692.
Gao, X., & Guan, J. (2009). Networks of scientific journals: An exploration of Chinese patent data. Scientometrics, 80(1), 283–302.
Garfield, E. (2004). Historiographic mapping of knowledge domains literature. Journal of Information Science, 30(2), 119–145.
Gifi, A. (1990). Nonlinear multivariate analysis. John Wiley & Sons Incorporated.
Glänzel, W., & Schubert, A. (2004). Analysing scientific networks through co-authorship. pp. 257–279. Handbook of quantitative science and technology research (11).
Glänzel, W., & Thijs, B. (2012). Using core documents for detecting and labelling new emerging topics. Scientometrics, 91(2), 399–416. http://dx.doi.org/10.1007/s11192-011-0591-7
Glänzel, W. (2001). National characteristics in international scientific co-authorship relations. Scientometrics, 51(1), 69–115.
Greenacre, M., & Blasius, J. (Eds.). (2006). Multiple correspondence analysis and related methods. CRC Press.
Guler, A. T., Waaijer, C. J., Mohammed, Y., & Palmblad, M. (2016). Automating bibliometric analyses using Taverna scientific workflows: A tutorial on integrating Web Services. Journal of Informetrics, 10(3), 830–841.
Guler, A. T., Waaijer, C. J., & Palmblad, M. (2016). Scientific workflows for bibliometrics. Scientometrics, 107(2), 385–398.
Harzing, A. W. (2007). Publish or Perish. Available from http://www.harzing.com/pop.htm
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 16569–16572.
Kamada, T., & Kawai, S. (1989). An algorithm for drawing general undirected graphs. Information Processing Letters, 31(1), 7–15 [Elsevier].
Keirstead, J. (2015). scholar: analyse citation data from Google Scholar. R package.
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. Journal of the Association for Information Science and Technology, 14(1), 10–25.
Klavans, R., & Boyack, K. W. (2017). Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge? Journal of the Association for Information Science and Technology, 68(4), 984–998.
Lebart, L., Morineau, A., & Warwick, K. M. (1984). Multivariate descriptive statistical analysis (correspondence analysis and related techniques for large matrices). Chichester: Wiley.
Marion, L. (2002). A tri-citation analysis exploring the citation image of Kurt Lewin. Proceedings of the American Society for Information Science and Technology, 39(1), 3–13.
Matloff, N. (2011). The art of R programming: A tour of statistical software design. No Starch Press.
McCain, K. W. (1991). Mapping economics through the journal literature: An experiment in journal cocitation analysis. Journal of the American Society for Information Science, 42(4), 290.
McCain, K. W. (2009). Using tricitation to dissect the citation image: Conrad Hal Waddington and the rise of evolutionary developmental biology. Journal of the American Society for Information Science and Technology, 60(7), 1301–1319. http://dx.doi.org/10.1002/asi
Persson, O., Danell, R., & Schneider, J. W. (2009). How to use Bibexcel for various types of bibliometric analysis. Celebrating scholarly communication studies: A Festschrift for Olle Persson at his 60th Birthday (5). pp. 9–24.
Peters, H., & Van Raan, A. (1991). Structuring scientific activities by co-author analysis: An exercise on a university faculty level. Scientometrics, 20(1), 235–255.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Pritchard, A. (1969). Statistical bibliography or bibliometrics. Journal of Documentation, 25, 348.
R Core Team. (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org
Rousseau, D. M. (Ed.). (2012). The Oxford handbook of evidence-based management. Oxford University Press.
Schnabel, S. K., & Eilers, P. H. (2009). Optimal expectile smoothing. Computational Statistics & Data Analysis, 53(12), 4168–4177.
Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. https://sci2.cns.iu.edu
Skupin, A. (2009). Discrete and continuous conceptualizations of science: Implications for knowledge domain visualization. Journal of Informetrics, 3(3), 233–245.
Small, H. G., & Koenig, M. E. (1977). Journal clustering using a bibliographic coupling method. Information Processing & Management, 13(5), 277–288.
Small, H., & Upham, P. (2009). Citation structure of an emerging research area on the verge of application. Scientometrics, 79(2), 365–375.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the Association for Information Science and Technology, 24(4), 265–269.
Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595–610.
Thijs, B., Schiebel, E., & Glänzel, W. (2013). Do second-order similarities provide added-value in a hybrid approach? Scientometrics, 96(3), 667–677. http://dx.doi.org/10.1007/s11192-012-0896-1
Uddin, A. (2016). scientoText: Text & Scientometric Analytics. R package version 0.1. [https://CRAN.R-project.org/package=scientoText version 0.1.4, http://github.com/jkeirstead/scholar].
Upham, S. P., & Small, H. (2010). Emerging research fronts in science and technology: patterns of new knowledge development. Scientometrics, 83(1), 15–38.
Waltman, L., Van Eck, N. J., & Noyons, E. C. M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629–635.
Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391.
White, H. D., & Griffith, B. C. (1981). Author cocitation: A literature measure of intellectual structure. Journal of the American Society for Information Science, 32(3), 163–171. http://dx.doi.org/10.1002/asi.4630320302
White, D., & McCain, K. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972–1995. Journal of the American Society for Information Science, 49(4), 327–355.
Yan, E., & Ding, Y. (2012). Scholarly network similarities: How bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks relate to each other. Journal of the American Society for Information Science and Technology, 63(7), 1313–1326.
Yang, K., & Meho, L. I. (2006). Citation analysis: a comparison of Google Scholar, Scopus, and Web of Science. Proceedings of the American Society for Information Science and Technology, 43(1), 1–15.
Yang, S., Han, R., Wolfram, D., & Zhao, Y. (2016). Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis. Journal of Informetrics, 10(1), 132–150.
Zhao, D., & Strotmann, A. (2008). Evolution of research activities and intellectual influences in information science 1996–2005: Introducing author bibliographic-coupling analysis. Journal of the American Society for Information Science, 59(1998), 2070–2086. http://dx.doi.org/10.1002/asi
Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18(3), 429–472.
de Moya-Anegon, F., Vargas-Quesada, B., Chinchilla-Rodriguez, Z., Corera-Alvarez, E., Herrero-Solana, V., & Munoz-Fernández, F. J. (2005). Domain analysis and information retrieval through the construction of heliocentric maps based on ISI-JCR category cocitation. Information Processing & Management, 41(6), 1520–1533.
van Eck, N. J., & Waltman, L. (2008). Generalizing the h- and g-indices. Journal of Informetrics, 2(4), 263–271.
van Eck, N. J., & Waltman, L. (2009). How to normalize cooccurrence data? An analysis of some well-known similarity measures. Journal of the Association for Information Science and Technology, 60(8), 1635–1651.
van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.
van Eck, N. J., & Waltman, L. (2014). CitNetExplorer: A new software tool for analyzing and visualizing citation networks. Journal of Informetrics, 8(4), 802–823.
van Eck, N. J., Waltman, L., & Noyons, C. M. (2010). A unified approach to mapping and clustering of bibliometric networks 14. Eleventh International Conference on Science and Technology Indicators [p. 284].