Metodo 2 job recommendation

Il clustering gerarchico di position è stato utilizzato nella job recommendation in due maniere: come base per la valutazione della recall del primo metodo di job recommendation e per lo sviluppo di un secondo metodo di job recommendation. Job recommendation gerarchica

Nella sezione 6.3 è stato proposto un nuovo metodo di job recommendation. Nel listato 7.9 il codice relativo.

Listato 7.9: Metodo 2 di job recommendation

1 l i b r a r y ( Matrix ) 2 l i b r a r y ( pbapply ) 3 4 #h i e r a r c h i c a l c l u s t e r i n g o f t r a i n i n g s e t cutted at height 12 5 6 hc_complete <− h c l u s t ( d i s t ) 7 8 cutted_hc_9 <− c u t r e e ( hc_complete , h=12) 9 10 s k i l l_c o o c c u r r e n c e s <− t ( s_t r a i n i n g ) %∗% s_t r a i n i n g 11

12 occ_1 <− matrix ( s k i l l_occurrences , ncol=ncol ( s k i l l_c o o c c u r r e n c e s )

, nrow=nrow ( s k i l l_c o o c c u r r e n c e s ) , byrow=TRUE)

14 occ_2 <− matrix ( s k i l l_occurrences , ncol=ncol ( s k i l l_c o o c c u r r e n c e s )

, nrow=nrow ( s k i l l_c o o c c u r r e n c e s ) )

16 s k i l l_j a c c a r d <− s k i l l_c o o c c u r r e n c e s / ( occ_1 + occ_2 − s k i l l_

c o o c c u r r e n c e s )

18 #f u n c t i o n that r e t u r n s subtree o f a input node 19

20 get_subtree <− f u n c t i o n ( x ) {

21 cl u_to_merge <− hc_complete $merge [ x , ] 22

23 i f ( clu_to_merge [ 1 ] < 0 && clu_to_merge [ 2 ] < 0) { 24 return ( cl u_to_merge )

25 } e l s e i f ( cl u_to_merge [ 1 ] > 0 && cl u_to_merge [ 2 ] < 0) { 26 return ( c ( clu_to_merge [ 2 ] , get_subtree ( clu_to_merge [ 1 ] ) ) ) 27 } e l s e i f ( cl u_to_merge [ 1 ] < 0 && cl u_to_merge [ 2 ] > 0) { 28 return ( c ( clu_to_merge [ 1 ] , get_subtree ( clu_to_merge [ 2 ] ) ) ) 29 } e l s e {

30 return ( c ( get_subtree ( clu_to_merge [ 2 ] ) , get_subtree ( clu_to_

merge [ 1 ] ) ) )

31 } 32 } 33

34 # method 2 of job recommendation

36 h i e r a r c h i c a l_c l a s s i f i c a t i o n <− f u n c t i o n (x , t r e e_index ) { 37

38 cl u <− hc_complete $merge [ t r e e_index , ] 39

40 i f ( clu [ 1 ] < 0 && clu [ 2 ] < 0) { 41 p o s i t i o n_1 <− −cl u [ 1 ] 42 p o s i t i o n_2 <− −cl u [ 2 ] 43 } e l s e i f ( cl u [ 1 ] > 0 && clu [ 2 ] < 0) { 44 p o s i t i o n_1 <− −get_subtree ( clu [ 1 ] ) 45 p o s i t i o n_2 <− −cl u [ 2 ] 46 } e l s e i f ( cl u [ 1 ] < 0 && clu [ 2 ] > 0) { 47 p o s i t i o n_1 <− −cl u [ 1 ] 48 p o s i t i o n_2 <− −get_subtree ( clu [ 2 ] ) 49 } e l s e { 50 p o s i t i o n_1 <− −get_subtree ( clu [ 1 ] ) 51 p o s i t i o n_2 <− −get_subtree ( clu [ 2 ] ) 52 }

53 54 index <− which ( s_t e s t [ x , ] == 1) 55 person_v e c t o r s <− s k i l l_j a c c a r d [ index , ] 56 i f ( length ( index ) == 1) { 57 person_vector <− person_v e c t o r s 58 } e l s e {

59 person_vector <− colMeans ( person_v e c t o r s ) 60 } 61 62 i f ( length ( p o s i t i o n_1) > 1) { 63 c e n t r o i d_1 <− colMeans ( p o s i t i o n_s k i l l_j a c c a r d [ p o s i t i o n_1 , ] ) 64 } e l s e { 65 c e n t r o i d_1 <− p o s i t i o n_s k i l l_j a c c a r d [ p o s i t i o n_1 , ] 66 } 67 68 i f ( length ( p o s i t i o n_2) > 1) { 69 c e n t r o i d_2 <− colMeans ( p o s i t i o n_s k i l l_j a c c a r d [ p o s i t i o n_2 , ] ) 70 } e l s e { 71 c e n t r o i d_2 <− p o s i t i o n_s k i l l_j a c c a r d [ p o s i t i o n_2 , ] 72 } 73 74 d i s t_1 <− d i s t ( rBind ( person_vector , c e n t r o i d_1) ) 75 76 d i s t_2 <− d i s t ( rBind ( person_vector , c e n t r o i d_2) ) 77 78 true_pos <− which (p_t e s t [ x , ] == 1) 79 80 i f ( d i s t_1 <= d i s t_2) { 81 i f ( clu [ 1 ] > 0) {

82 return ( c ( true_pos %in% p o s i t i o n_1 , h i e r a r c h i c a l_

c l a s s i f i c a t i o n (x , clu [ 1 ] ) ) )

83 } e l s e {

84 return ( true_pos %in% p o s i t i o n_1) 85 }

86 } e l s e {

87 i f ( clu [ 2 ] > 0) {

88 return ( c ( true_pos %in% p o s i t i o n_2 , h i e r a r c h i c a l_

c l a s s i f i c a t i o n (x , clu [ 2 ] ) ) )

90 return ( true_pos %in% p o s i t i o n_2) 91 }

92 } 93 } 94

95 recommendation_h i e r a r c h i c a l <− pbsapply ( c ( 1 : nrow (p_t e s t ) ) ,

f u n c t i o n ( x ) h i e r a r c h i c a l_c l a s s i f i c a t i o n (x , 2 4 0 9 ) ) Recall del primo metodo basandosi sul clustering gerarchico

Nella sezione 6.3.2 è stato descritto come è stata ricalcolata la recall del primo metodo sfruttando la gerarchia di position. Il listato 7.10 mostra il codice corrispondente.

Listato 7.10: Calcolo recall primo metodo di job recommendation

1 l i b r a r y ( Matrix ) 2 l i b r a r y ( pbapply ) 3 4 f i r s t_method_h i e r a r c h i c a l_c l a s s i f i c a t i o n <− f u n c t i o n ( true_pos , t r e e_index , rec_pos ) { 5

6 cl u <− hc_complete $merge [ t r e e_index , ] 7 8 i f ( clu [ 1 ] < 0 && cl u [ 2 ] < 0) { 9 p o s i t i o n_1 <− −cl u [ 1 ] 10 p o s i t i o n_2 <− −cl u [ 2 ] 11 } e l s e i f ( cl u [ 1 ] > 0 && clu [ 2 ] < 0) { 12 p o s i t i o n_1 <− −get_subtree ( clu [ 1 ] ) 13 p o s i t i o n_2 <− −cl u [ 2 ] 14 } e l s e i f ( cl u [ 1 ] < 0 && clu [ 2 ] > 0) { 15 p o s i t i o n_1 <− −cl u [ 1 ] 16 p o s i t i o n_2 <− −get_subtree ( clu [ 2 ] ) 17 } e l s e { 18 p o s i t i o n_1 <− −get_subtree ( clu [ 1 ] ) 19 p o s i t i o n_2 <− −get_subtree ( clu [ 2 ] ) 20 } 21 22 i f ( rec_pos %in% p o s i t i o n_1) { 23 i f ( true_pos %in% p o s i t i o n_1) { 24 i f ( length ( p o s i t i o n_1) == 1) {

25 return (1) 26 } e l s e {

27 return ( c(1+ f i r s t_method_h i e r a r c h i c a l_c l a s s i f i c a t i o n ( true

_pos , clu [ 1 ] , r ec_pos ) ) )

28 } 29 } e l s e { 30 return (0 ) 31 } 32 } e l s e { 33 i f ( true_pos %in% p o s i t i o n_2) { 34 i f ( length ( p o s i t i o n_2) == 1) { 35 return (1) 36 } e l s e {

37 return ( c(1+ f i r s t_method_h i e r a r c h i c a l_c l a s s i f i c a t i o n ( true

_pos , clu [ 2 ] , r ec_pos ) ) )

38 } 39 } e l s e { 40 return (0) 41 } 42 } 43 } 44

45 f i r s t_method_h i e r a r c h i c a l <− pbsapply ( c ( 1 : nrow (p_t e s t ) ) , f u n c t i o n

( x ) sapply ( c ( 1 : 5 0 ) , f u n c t i o n ( y ) f i r s t_method_h i e r a r c h i c a l_ c l a s s i f i c a t i o n ( which (p_t e s t [ x , ] == 1) ,2409 , recommended_ p o s i t i o n [ y , x ] ) ) )

Conclusioni

Il lavoro presentato nella tesi consiste nella realizzazione di un sistema di recommendation applicato alla tematica della ricerca ed oerta di lavoro. Non avendo dati su cui lavorare la prima fase è stata quella dell'ottenimento dei dati dal social media LinkedIn, particolarmente adatto al tema di interesse perché anch'esso orientato al lavoro. I proli pubblici degli utenti LinkedIn contengono diversi tipi di informazioni, l'attenzione è stata posta in particolar modo sugli skill, le capacità di una persona, e sulle position, le posizioni lavorative. Sono stati sviluppati diversi metodi di recommendation:

• Skill Recommendation che suggerisce agli utenti i possibili skill da aggiungere al prolo sulla base delle altre informazioni di prolo inserite dall'utente. Se- guendo il lavoro di un paper di ricercatori LinkedIn [6], sono state fatte diverse prove utilizzando vari feature set; la recall@50 (suggerendo 50 skill) massima ottenuta è di 0.516

• Job Recommendation 1. Questo metodo utilizza la similarità di Jaccard tra skill e position per inferire le position più ani ad un set di skill, dove il set di skill può rappresentare una persona che cerca lavoro o un particolare prolo ricercato da una azienda. Misurando la recall binaria, ovvero 1 se la position suggerita è quella vera, 0 altrimenti, si raggiunge un valore @50 di 0.644. Misurando la recall sfruttando la gerarchia di position realizzata, ovvero calcolando il grado di correttezza della position suggerita, si ottiene un valore @50 di 0.862.

• Job Recommendation 2. Questo metodo sfrutta la gerarchia di position realizzata applicando l'LSA alla matrice di co-occorrenze skill-position. La recall è misurata come numero di scelte corrette all'interno dell'albero diviso per il numero di step per arrivare alla position vera. Il valore ottenuto è 0.248, mol-

to buono considerando che a dierenza dell'altro metodo che suggerisce più position (da 1 a 50), questo ne suggerisce una sola.

La linea guida di tutto il lavoro è stata quella di cercare di sviluppare un recommender system generale, che ora diverse opportunità in base all'utente nale del sistema. Le possibili applicazioni infatti sono molteplici, per esempio se il fruitore del sistema è una persona che cerca lavoro esso potrebbe supportarlo nella scelta dell'oerta di lavoro più ane alle sue caratteristiche, se invece è un'azienda il sistema potrebbe aiutarla nella scelta di candidati per una precisa posizione oppure per una posizione caratterizzata da un insieme specico di skill.

Bibliograa

[1] Gediminas Adomavicius e Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE Transactions on, 17(6):734749, 2005. [2] Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, A Inkeri Verkamo, e altri. Fast discovery of association rules. Advances in knowledge discovery and data mining, 12(1):307328, 1996.

[3] Scott W Ambler. Mapping objects to relational databases: What you need to know and why. Ronin International, 2000.

[4] Sitaram Asur e Bernardo A Huberman. Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, volume 1, pp. 492499. IEEE, 2010.

[5] Marko Balabanovi¢ e Yoav Shoham. Fab: content-based, collaborative recommendation. Communications of the ACM, 40(3):6672, 1997.

[6] Mathieu Bastian, Matthew Hayes, William Vaughan, Sam Shah, Peter Skomo- roch, Hyungjin Kim, Sal Uryasev, e Christopher Lloyd. Linkedin skills: large- scale topic extraction and inference. In Proceedings of the 8th ACM Conference on Recommender systems, pp. 18. ACM, 2014.

[7] Chumki Basu, Haym Hirsh, William Cohen, e altri. Recommendation as classication: Using social and content-based information in recommendation. In AAAI/IAAI, pp. 714720, 1998.

[8] Christian Bauer e Gavin King. Java Persistence with Hibernate. Manning Publications, 2006.

[9] John S Breese, David Heckerman, e Carl Kadie. Empirical analysis of predictive algorithms for collaborative ltering. In Proceedings of the Fourteenth conference on Uncertainty in articial intelligence, pp. 4352. Morgan Kaufmann Publishers Inc., 1998.

[10] Zhixiong Chen e Delia Marx. Experiences with eclipse ide in programming courses. Journal of Computing Sciences in Colleges, 21(2):104112, 2005. [11] Chen-Fu Chien e Li-Fei Chen. Data mining to improve personnel selection

and enhance human capital: A case study in high-technology industry. Expert Systems with applications, 34(1):280290, 2008.

[12] Courtney Corley, Armin R Mikler, Karan P Singh, e Diane J Cook. Monitoring inuenza trends through mining social media. In BIOCOMP, pp. 340346, 2009.

[13] Pedro Domingos. Mining social networks for viral marketing. IEEE Intelligent Systems, 20(1):8082, 2005.

[14] Korry Douglas e Susan Douglas. PostgreSQL: a comprehensive guide to building, programming, and administering PostgreSQL databases. SAMS publishing, 2003.

[15] Susan T Dumais, George W Furnas, Thomas K Landauer, Scott Deerwester, e Richard Harshman. Using latent semantic analysis to improve access to textual information. In Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 281285. ACM, 1988.

[16] Sudheep Elayidom, Sumam Mary Idikkula, Joseph Alexander, e Anurag Ojha. Applying data mining techniques for placement chance prediction. In Advan- ces in Computing, Control, & Telecommunication Technologies, 2009. ACT'09. International Conference on, pp. 669671. IEEE, 2009.

[17] James Elliott, Timothy M O'Brien, e Ryan Fowler. Harnessing hibernate. O'Reilly Media, Inc., 2008.

[18] Usama Fayyad, Gregory Piatetsky-Shapiro, e Padhraic Smyth. From data mining to knowledge discovery in databases. AI magazine, 17(3):37, 1996. [19] The R foundation. What is r? http://www.r-project.org/about.html.

[20] David Goldberg, David Nichols, Brian M Oki, e Douglas Terry. Using collaborative ltering to weave an information tapestry. Communications of the ACM, 35(12):6170, 1992.

[21] Adriana Mata Greenwood. Updating the International Standard Classication of Occupations, ISCO-08. UN, 2004.

[22] Pritam Gundecha e Huan Liu. Mining social media: A brief introduction. Tutorials in Operations Research, 1(4), 2012.

[23] Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, e Erel Uziel. So- cial media recommendation based on people and tags. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp. 194201. ACM, 2010.

[24] Will Hill, Larry Stead, Mark Rosenstein, e George Furnas. Recommending and evaluating choices in a virtual community of use. In Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 194201. ACM Press/Addison-Wesley Publishing Co., 1995.

[25] Ross Ihaka e Robert Gentleman. R: a language for data analysis and graphics. Journal of computational and graphical statistics, 5(3):299314, 1996.

[26] ILO. Isco international standard classication of occupations. http://www. ilo.org/public/english/bureau/stat/isco/, 2015.

[27] Christopher Ireland, David Bowers, Michael Newton, e Kevin Waugh. A classication of object-relational impedance mismatch. In Advances in Databa- ses, Knowledge, and Data Applications, 2009. DBKDA'09. First International Conference on, pp. 3643. IEEE, 2009.

[28] Andreas M Kaplan e Michael Haenlein. Users of the world, unite! the challenges and opportunities of social media. Business horizons, 53(1):5968, 2010. [29] Ioannis Konstas, Vassilios Stathopoulos, e Joemon M Jose. On social networks

and collaborative recommendation. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp. 195202. ACM, 2009.

[30] George Lekakos e Petros Caravelas. A hybrid approach for movie recommendation. Multimedia tools and applications, 36(1-2):5570, 2008.

[31] Kristina Lerman. Social networks and social information ltering on digg. arXiv preprint cs/0612046, 2006.

[32] Greg Linden, Brent Smith, e Jeremy York. Amazon. com recommendations: Item-to-item collaborative ltering. Internet Computing, IEEE, 7(1):7680, 2003.

[33] Marek Lipczak. Tag recommendation for folksonomies oriented towards individual users. ECML PKDD discovery challenge, 84, 2008.

[34] Duen-Ren Liu e Ya-Yueh Shih. Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences. Journal of Systems and Software, 77(2):181191, 2005.

[35] Pasquale Lops, Marco De Gemmis, e Giovanni Semeraro. Content-based recommender systems: State of the art and trends. In Recommender systems handbook, pp. 73105. Springer, 2011.

[36] Bradley N Miller, Istvan Albert, Shyong K Lam, Joseph A Konstan, e John Riedl. Movielens unplugged: experiences with an occasionally connected recommender system. In Proceedings of the 8th international conference on Intelligent user interfaces, pp. 263266. ACM, 2003.

[37] Hokey Min e Ahmed Emam. Developing the proles of truck drivers for their successful recruitment and retention: a data mining approach. International Journal of Physical Distribution & Logistics Management, 33(2):149162, 2003. [38] Deep Nishar. The next three billion [infographic]. http://blog.linkedin.

com/2014/04/18/the-next-three-billion/, 2014.

[39] Elizabeth J O'Neil. Object/relational mapping 2008: hibernate and the entity data model (edm). In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 13511356. ACM, 2008.

[40] Neelamadhab Padhy, Dr Mishra, Rasmita Panigrahi, e altri. The survey of data mining applications and feature scope. arXiv preprint arXiv:1211.5723, 2012. [41] Zizi Papacharissi. The virtual geographies of social networks: a comparative

analysis of facebook, linkedin and asmallworld. New media & society, 11(1- 2):199220, 2009.

[42] Michael J Pazzani. A framework for collaborative, content-based and demographic ltering. Articial Intelligence Review, 13(5-6):393408, 1999. [43] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, e John

Riedl. Grouplens: an open architecture for collaborative ltering of netnews. In Proceedings of the 1994 ACM conference on Computer supported cooperative work, pp. 175186. ACM, 1994.

[44] EDWARD C RILEY. International standard classication of occupations. Journal of Occupational and Environmental Medicine, 1(11):615, 1959.

[45] Matthew A Russell. Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media, Inc., 2013.

[46] Börkur Sigurbjörnsson e Roelof Van Zwol. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th international conference on World Wide Web, pp. 327336. ACM, 2008.

[47] Meredith M Skeels e Jonathan Grudin. When social networks cross boundaries: a case study of workplace use of facebook and linkedin. In Proceedings of the ACM 2009 international conference on Supporting group work, pp. 95104. ACM, 2009.

[48] Ian Soboro e Charles Nicholas. Combining content and collaboration in text ltering. In Proceedings of the IJCAI, volume 99, pp. 8691, 1999.

[49] R Core Team. R language denition, 2000.

[50] D Michael Titterington, Adrian FM Smith, Udi E Makov, e altri. Statistical analysis of nite mixture distributions, volume 7. Wiley New York, 1985. [51] Sholom M Weiss e Casimir A Kulikowski. Computer systems that learn: Clas-

sication and prediction methods from statistics, neural nets, machine learning and exp. 1990.

[52] Wikipedia. Hibernate wikipedia, l'enciclopedia libera. http://it. wikipedia.org/w/index.php?title=Hibernate&oldid=66336951, 2014. [53] Wikipedia. Linkedin wikipedia, l'enciclopedia libera. http://it.

Nel documento Mining linkedin social media: Tecniche di data e text mining applicate a recommender system per la ricerca ed offerta di lavoro (pagine 106-117)

Conclusioni

Bibliograa

Bibliograa