• Non ci sono risultati.

Email:[email protected] http://www.statistica.unimib.it/utenti/pennoni/ FulviaPennoniDepartmentofStatisticsandQuantitativeMethodsUniversityofMilano-Bicocca Dynamicsequentialanalysisofcareers

N/A
N/A
Protected

Academic year: 2021

Condividi "Email:[email protected] http://www.statistica.unimib.it/utenti/pennoni/ FulviaPennoniDepartmentofStatisticsandQuantitativeMethodsUniversityofMilano-Bicocca Dynamicsequentialanalysisofcareers"

Copied!
27
0
0

Testo completo

(1)

Dynamic sequential analysis of careers

Fulvia Pennoni

Department of Statistics and Quantitative Methods University of Milano-Bicocca

http://www.statistica.unimib.it/utenti/pennoni/ Email: [email protected]

(2)

Outline

I Introduction: thesurvey

I Target: transition from school to workusing sequencesrelated to activities experienced by individuals between the ages of 16 and 22

I Baselinecovariates (measured at the age of 16) describing individuals’ and their family’s characteristics

I Inferential analysis of the relation between the entire careers and the

baseline covariates based on theLatent Markov (LM) model

(3)

Introduction

I The interest is to study life coursesrepresented as sequences of states describing the activities experienced by young individuals

I The goal is to identify factors exposing young people torisk patterns i.e. careers dominated by unemployment, as well as factors "which encourage young people to remain infull-time educationafter the compulsory stage" (Amstrong, 1999)

I Thesurvey dataare about a representative sample of 712 young

people living in Northern Irelandwho became eligible to leave school for the first time in July 1993

I Young people in the education system in Northern Ireland were

(4)

Introduction

I The survey is made of two waves: the firstmade in 1995, and the

secondin 1999

I The data at the second waves are related to the transitions from school to work to identify those sequences if anyrisk patterns dominated by unemployment

I Themonthly activitieshave been collectedfrom 1993 to 1999when

they were first eligible to leave compulsory education (at the age of 16, July 1993) and an additional state, higher education, was added, possibly experienced after the age of 18

I The sample has beenstratifiedaccording to the choice made by

(5)

Sequential histories

I The individual careers were built considering the monthly activities experienced by individuals during the T = 72 months:

 Unemployment(U)

 still being atSchool(S)

 attending aFurther Educationcollege (FE)

 Higher Education(HE)

 being inTraining(T)

 Employed(E)

I According to the descriptive statistics the maintransitionsare those from school to higher education; from further education to

(6)

Sequential histories

I Theobserved patternsare reported in the figure, where individuals are arranged on the horizontal axis using a seriation procedure illustrated in Piccarreta and Lior (2010)

I The colours are related to theactivity experiencedin each month

(7)

Individual covariates

I The following set of (binary)background variableswas collected at the age of 16 (in July 1993, first month of observation)

Covariate % Male 47.31 Catholic 43.99 Region Belfast 11.25 South 27.49 SouthEast 10.61 West 14.96 Grammar school 16.47 Father unemployed 14.96

High grades at school 33.25

Father manager 22.38

Cohabiting with both parents 57.67

(8)

Individual covariates

I On the other hand, within our approach we suppose that the

response variables indirectly measure a latent traitwhich may represent for example attitudes, aspirationwhich may evolve over time also according to experiences

I As argued by Amstrong (1999) attitudes and aspiration "are likely to

have an important influenceon staying on over and above the

observed covariates"

I Therefore, we propose a LM model where the available covariates

(9)

LM model with covariates

I The LM model with covariates affecting the latent Markov model

(Bartolucci, Farcomeni, Pennoni 2013) has the followingnotation:  n: samplesize

 T : number of timeoccasions

 Yitcategoricalresponse variable for subject i at occasion t where i = 1, . . . , n, t = 1, . . . , T ,

 Yt: vector of response variables for each occasion

I For every i , the response variables in Yi are conditionally

independent given thelatent processV = (V1, . . . , VT): local

independence assumption

I The latent process V follows afirst-order homogeneous Markov

(10)

Measurement model parameters

I The parameters of the measurement model(conditional distribution

of each Yt given V ) are:

φy |v = fYt|Vt(y |v ),

for t = 1, . . . , T , v = 1, . . . , k

(11)

Inclusion of the covariates in the latent model

I The parameters of the model for the latent process(distribution of Vt) are:

 initial probabilities

πv |x= fVt|X(v |¯v ), v = 1, . . . , k

 transition probabilities(may depend on time):

(12)

Inclusion of the covariates in the latent model

I Inclusion of covariates is allowed by the adoption of suitablelink functionsfor the initial and the transition probabilities

I We use the formulationwhich assumes the multinomial logit model

for the initialprobabilities logp(V1= v |X = x ) p(V1= 1|X = x ) = logπv |x π1|x = β0v+ x 0β 1v,

I and the following for thetransitionprobabilities logp(Vt= v |V(t−1)= ¯v , X = x )

p(Vt = 1|V(t−1)= ¯v , X = x )

= logπv |¯vx πv |¯vx

= γ0¯v v + x0γ1¯v v,

(13)

Maximum likelihood estimation

I Estimationis performed by maximizing theobservedlog-likelihood

`(θ) =X i log  fY | ˜˜ X(˜yi|˜xi) 

 x˜i and ˜yi vectorsobserved datai = 1, . . . , n

 θ: vector of all modelparameters

I `(θ) is efficiently computed by using backwardrecursions also employed in the hidden Markov literature

I Likelihood maximization is performed by an EM algorithm (Baum et

al., 1970, Dempster et al., 1977) based on thecomplete data log-likelihood(`∗(θ)), i.e. the log-likelihood that we could compute

(14)

EM algorithm

I The EM algorithm performs two steps until convergence in `(θ):

E: compute theconditional expected valueof the complete data log-likelihood given the current θ and the observed data

M: maximizethis expected value with respect to θ

I Computing the conditional expected value of `∗(θ) at theE-stepis equivalent to computing the posterior probabilitiesof each latent state at every occasion

I The parameters in θ are updated at theM-stepby simple iterative

algorithms (explicit formulae are occasionally available)

I The model is estimated by using the R library namedLMest1

(15)

Path prediction

I By using an adapted version of the Viterbi(1967) algorithm a

forward recursion computes thejoint conditional probabilityof the latent states given the responses and the covariates ˆfV | ˜X , ˜Y(v |˜x , ˜y ) according to the ML estimates ˆθ

I Theoptimal global stateis found by means of computations and

maximisations of the joint posterior probabilities

I the functionest_lm_cov_latent of the package LMest is employed

(16)

LM likelihood parameter estimates

I We applied the functionsearch.model.LMtoselect the value of k

on the basis of the observed data

I When the model selection is performed according to the Bayesian

Information Criterionk = 5 latent statesare selected

I We show the maximum log-likelihood at convergence, the number of

parameters, and AIC and BIC indices for models from M1to M6

Model `ˆ #par BIC AIC

(17)

Results

I According to theestimated conditional probabilitiesthelatent states may be characterized as

“School → VS” “Employment → VE” “Further Education → VFE”

“Jobless or Training → VJ&T”

“Higher Education → VHE”

I VU&T refers to an alternation between unemployment and firm

training, consistently with the observed data structure

I The estimates of the β parameters allow evaluating the impact of

(18)

Results

(19)

Results

(20)

Results

I By averagingestimated initial probabilities conditioned to the covariates’s levels we can make the following comparisons

I The initial probabilities broken down byfather occupation(manager vs not manager)

Father manager: ˆπ1 Father not manager: ˆπ1 VS VE VFE VU&T VHE VS VE VFE VU&T VHE 0.21 0.23 0.14 0.42 0.00 0.18 0.25 0.14 0.43 0.00

(21)

Results

I By averaging the estimatedestimated transition probabilities conditioned to the covariates’s levels we can make many comparisons

- For example if we broken down byfather occupation(manager vs not manager) (from row ¯v to column v ):

Father manager: ˆπv |¯v Father not manager: ˆπv |¯v VS VE VFE VU&T VHE VS VE VFE VU&T VHE VS 0.95 0.01 0.01 0.01 0.02 0.94 0.02 0.01 0.01 0.01 VE 0.00 0.97 0.01 0.01 0.01 0.00 0.98 0.01 0.01 0.00 VFE 0.00 0.02 0.96 0.01 0.01 0.00 0.03 0.95 0.02 0.00 VU&T 0.01 0.05 0.03 0.91 0.00 0.00 0.04 0.02 0.94 0.00 VHE 0.00 0.02 0.07 0.00 0.98 0.00 0.01 0.08 0.00 0.89

- There is quite a lot ofpersistencyin the same state during the period of observation

(22)

Results

I The initial probabilities broken down bycatholic religion(catholic vs not catholic)

Catholic: ˆπ1 Non catholic: ˆπ1 VS VE VFE VU&T VHE VS VE VFE VU&T VHE 0.20 0.23 0.15 0.42 0.00 0.17 0.25 0.12 0.44 0.00

- Catholic individuals show ahigher probabilityto proceed withschool and further educationafter the end of the compulsory school compared to non catholic individuals

(23)

Results

I By averaging the estimatedestimated transition probabilities conditioned to the catholic religion

Catholic: ˆπv |¯v Non catholic: ˆπv |¯v VS VE VFE VU&T VHE VS VE VFE VU&T VHE VS 0.94 0.02 0.01 0.02 0.01 0.94 0.02 0.02 0.01 0.01 VE 0.00 0.98 0.01 0.01 0.00 0.00 0.98 0.01 0.01 0.00 VFE 0.00 0.03 0.95 0.01 0.01 0.00 0.03 0.95 0.01 0.01 VU&T 0.01 0.03 0.01 0.94 0.00 0.01 0.05 0.02 0.92 0.00 VHE 0.00 0.01 0.07 0.00 0.92 0.00 0.01 0.08 0.00 0.91

I The catholic have a higher probability ofremaining inU&T (unemployment and training) compared to their counterpart I They also show a lower probability to transit towardsE

(24)

Predicted sequence of latent states

100 200 300 400 500 600 700 10 20 30 40 50 60 70

Optimally decoded sequences

Individuals Months VS VFE VHE VE VU&T

I To each individual a (vertical) bar is associated, with colours depending on the predicted activityin each month (the ordering of individuals on the horizontal axes is the same used for the original sequences)

(25)

Results

I On the basis of the estimates of the covariate effects on the initial probabilities and on the transition probabilities we conclude that under the selected model

 To bemaleaffect positively the probability of begin in the class with moderate orhigh employmentwith respect to the class of low work employment

 Young people withhigh gradeshave a higher probability to change from low to high employment than subjects with low grades

 Young peopleleaving in the Belfasthave a higher probability to be employed than subjects leaving on the other areas

I The model is able to account for the longitudinal phenomena as well

(26)

Main References

I Amstrong D. M. (1999). School performance and staying on: a micro analysis for Northen Irland. The Manchester School,76, 203-230.

I Bartolucci, F. and Farcomeni, A. and Pennoni, F. (2010). An overview of latent Markov models for longitudinal categorical data, Technical report available at http://arxiv.org/abs/1003.2804.

I Piccarreta, R., Lior, O. (2010). Exploring sequences: A graphical tool based upon Multidimensional Scaling, JRSS Series A,ă173,165-184

I Piccarreta, R. and Billari, F. (2007). Clustering work and family trajectories using a divisive algorithm. Journal of the Royal Statistical Society, Series A,170, 1061–1078.

I Dempster A. P., Laird, N. M. & Rubin, D. B. (1977), Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion), JRSS-B,39, 1-38.

I Mcvicar, D., Anyadike-Danes, M. (2002). Predicting successful and unsuccessful transitions from school to work by using sequence methods. JRSSA,165, 317–334.

I MacDonald, I. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. London: Chapman & Hall.

(27)

Main References

I Pennoni, F. (2014). Issues on the estimation of latent variable and latent class models. Scholars’ Press, Saarbucken.

I Bartolucci, F., Farcomeni A., Pennoni, F. (2014). Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates,with discussion, Test,23, 433-465.

I Bartolucci, F., Farcomeni A., Pennoni F. (2014). Rejoinder on: Latent Markov Models: a review of a general framework for the analysis of longitudinal data with covariates, Test,23, 484-496.

I Bartolucci, F. and Farcomeni, A. and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data, Chapman and Hall/CRC press, Boca Raton.

I Baum, L.E., Petrie, T., Soules, G. and Weiss, N. (1970), A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Annals of Mathematical Statistics,41, 164-171.

I Bartolucci, F.; Pennoni, F. and Vittadini, G. (2011). Assessment of school performance through a multilevel latent Markov Rasch model, Journal of Educational and Behavioural Statistics ,36, 491-522

Riferimenti

Documenti correlati

In the sections 2, 3 and 4, we prove our main existence results (Theorem 1.1) and the existence of an optimal domain for a fixed vector field (Theorem 1.5), as well as the existence

Un unico ingresso, dalla profonda strombatura, si apriva in facciata, in corrispondenza della navata centrale; alla fine dell’Ottocento, delle finestre originali rimaneva aperta

The present study aims to investigate inhibition in individuals with Down Syndrome compared to typically developing children with different inhibitory tasks tapping response

Fulvia Pennoni - University of Milano-Bicocca - A dynamic causal LM model, Catania July 4, 2017.. , y iT |z i ) the manifest probability;. I The complete data log-likelihood ` ∗ (θ)

Fulvia Pennoni - University of Milano-Bicocca - Optimal model-based clustering..., IFCS 2017, Tokai University

Fixed time search: The islanders perform a fixed number of searches in each cycle, accumulating all the hardtack they find (this is the ‘‘basic’’ model described before)..

[105] E' impossibile in questa sede dar conto del percorso teorico di approfondimento che ha interessato, nella seconda metà del novecento, i temi del coordinamento e

(Pubblicazioni del Centro di studi interdisciplinari sulle province romane dell'Università degli studi di