3.5 Data and Top-firms set construction
3.5.1 Finding top-firms in the data
As discussed in Section 3.1, estimating the optimal allocation rule $f$ requires being able to single out the firms that use it. If this were not possible, our estimates would not capture a unique underlying rule for assigning the workforce. This calls for a careful selection of top-firms.
Based on the relationships in (3.4) and (3.5), one possible approach is to select firms with higher levels of productivity, since we know that using the allocation $f$ maximizes productivity. However, on its own this is a naive and likely poor solution. It is well known (e.g. Syverson, 2011) that productivity levels vary strongly across heterogeneous firms (e.g. across industries or firm sizes).
Moreover, this productivity dispersion has several different drivers. If these are not taken into account, and if their impact on productivity is much larger than that of workforce allocation, our split will capture these drivers rather than the allocation rule $f$. Thus, a better approach is to first isolate the residual productivity that can reasonably be attributed to workforce allocation, and only then select the firms showing higher levels of it.
Controlling for other factors (the O variables in (3.4)) allows us to split firms into (more or less) homogeneous bins and makes the selection of top-firms within bins more reliable. Nonetheless, there is a trade-off. On the one hand, adding more controls helps to better isolate the residual productivity that depends on the allocation rule. On the other hand, adding too many of them will eliminate all the residual productivity dispersion, including that arising from the allocation rule. For example, in the extreme case of one firm per bin, every firm would be a top-firm and every allocation rule would be an optimal allocation rule.
For this reason, the choice of controls should be supported by economic theory. For example, it is known that different industries may behave very differently from one another in terms of productivity (Bernard and Jones, 1995, or Syverson, 2011), so it may be reasonable to consider industries when splitting firms. Controlling for firm age, by contrast, is a more debatable choice: there seems to be conflicting evidence on the relationship between age and productivity, at least for firms older than 10 years (see Kok, Brouwer, and Fris, 2005). Adding this control could simply add noise to the selection of top-firms.
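The trade-off in the binning granularity can be illustrated with a small sketch. It assumes a within-bin top-decile rule (as used later in this section), a linear-interpolation quantile convention, and entirely hypothetical productivity values; the function names are ours and not part of the original procedure.

```python
from collections import defaultdict
from math import ceil, floor


def quantile(values, q):
    """Linear-interpolation quantile (one common convention, assumed here)."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo, hi = floor(pos), ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def share_flagged_top(productivity, bin_ids, q=0.9):
    """Share of firms whose productivity is at or above their own bin's q-quantile."""
    bins = defaultdict(list)
    for s, b in zip(productivity, bin_ids):
        bins[b].append(s)
    flagged = 0
    for members in bins.values():
        threshold = quantile(members, q)
        flagged += sum(1 for s in members if s >= threshold)
    return flagged / len(productivity)


# 20 hypothetical firms with productivity 1, 2, ..., 20.
productivity = [float(i) for i in range(1, 21)]

# No controls: a single bin containing every firm.
coarse = share_flagged_top(productivity, [0] * 20)

# Extreme case: one firm per bin, so each firm is its own threshold.
fine = share_flagged_top(productivity, list(range(20)))
```

With a single bin only the two most productive firms are flagged, whereas with one firm per bin every firm is flagged, so the selection no longer carries any information.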
We now describe the procedure we use to split the data into top-firms and non-top-firms. We also refer to the first set as the learning set, because it is on this set that we estimate $\hat{f}$. The second set is also referred to as the control set; it is the set of firms on which we evaluate $\hat{f}$ to compute the JAQ measures (see Remark 3.2.3). These measures are used to assess the extent to which these firms deviate from the optimal rule $f$.
The learning set is constructed according to Algorithm 5, which can be briefly described as follows.
According to some criteria, we split the full data into bins, and from each bin we extract a set of top-firms. Further refinement criteria are then applied, and the learning and control sets are defined.
The criteria used to split the data in different bins are defined by control variables.
Algorithm 5: Create top-firms set
Input: $X$ full firms data; $B_1, \dots, B_M$ binning criteria; $t$ thresholding criterion; $s$ splitting variable; $R$ further refinement criterion.
Output: learning set, control set.
for $m \in \{1, \dots, M\}$ do
    $X_m \leftarrow \{x \in X : x \text{ satisfies } B_m\}$
    $\mathrm{top}_m \leftarrow \{x \in X_m : s(x) \geq t(X_m)\}$. I.e. define as top the elements of the subset for which the variable $s$ is above a threshold, computed on the subset itself.
$\mathrm{top} \leftarrow \{(\mathrm{top}_m)_m\}$. I.e. join the top elements.
learning set $\leftarrow \{x \in \mathrm{top} : x \text{ satisfies } R\}$
control set $\leftarrow X \setminus$ learning set
We now give an example of how the data are split according to Algorithm 5. Consider the following splitting criteria:
1. Size intervals: [50, 100]; [101, 250]; [251, 1000]; [1001, 6000].
2. Year: 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010.
3. Industry: 58 categories of industries (see Table 3.6).
A bin $B_m$ is defined as one of the possible combinations of the above criteria (e.g. observations in year 2005 of firms with size in [50, 100] and within industry 6). We split on value added per employee, which defines the variable $s$. For each bin, we compute the top productivity decile; this defines the thresholding criterion $t$. Then, for each bin, we collect the firms with $s$ greater than or equal to the (within-bin) top productivity decile. Finally, we refine the set by keeping as top-firms only those classified as top-firms in this way for at least 6 years. This defines the refinement criterion $R$. Note that the top-firms set is composed of firms, not of firms in particular years. Thus, if a firm is in this set, it is in it for the whole time window of the data, even if in some of the years considered its productivity level was not above the threshold. The idea is that the allocation rule does not change over the years, while productivity may undergo shocks.
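The example above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results: the record layout (dictionaries with keys `firm`, `year`, `industry`, `size`, `s`), the function names, and the linear-interpolation convention for the 90th percentile are all hypothetical choices of ours. Observations whose size falls outside every interval are assigned to no bin and therefore can never be flagged as top.

```python
from collections import defaultdict
from math import ceil, floor

# Size intervals from the example (closed on both ends).
SIZE_BINS = [(50, 100), (101, 250), (251, 1000), (1001, 6000)]


def size_bin(size):
    """Return the size interval containing `size`, or None if there is none."""
    for lo, hi in SIZE_BINS:
        if lo <= size <= hi:
            return (lo, hi)
    return None


def quantile(values, q):
    """Linear-interpolation quantile (an assumed convention)."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo, hi = floor(pos), ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def create_top_firms_set(rows, q=0.9, min_years=6):
    """Sketch of Algorithm 5 for a list of firm-year records.

    Returns (learning_set, control_set) as sets of firm identifiers.
    """
    # Binning criteria B_m: size interval x year x industry.
    bins = defaultdict(list)
    for r in rows:
        sb = size_bin(r["size"])
        if sb is not None:
            bins[(sb, r["year"], r["industry"])].append(r)

    # Within each bin, flag observations with s >= t(X_m).
    years_on_top = defaultdict(set)
    for members in bins.values():
        threshold = quantile([r["s"] for r in members], q)
        for r in members:
            if r["s"] >= threshold:
                years_on_top[r["firm"]].add(r["year"])

    # Refinement R: flagged as top in at least `min_years` distinct years.
    learning = {f for f, yrs in years_on_top.items() if len(yrs) >= min_years}
    # Control set: every other firm in the data.
    control = {r["firm"] for r in rows} - learning
    return learning, control
```

With the criteria listed above there are at most 4 × 10 × 58 = 2320 bins. Because the refinement step works on firms rather than firm-years, a firm that is above the threshold in, say, 7 of the 10 years enters the learning set with all of its observations.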
Table 3.5: Summaries on Employees Characteristics. Total Observations: 28710464
(a) Observations on Employees per Year
year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
obs. 2816199 2857449 2818214 2834160 2828429 2900556 2964534 2953657 2836605 2900661
(b) Age
min. 16
25% 33
50% 43
75% 53
max. 99
mean. 42.6
std. dev. 12.47
(c) Labor Market Experience
min. 0
25% 8
50% 19
75% 29
max. 83
mean. 19.42
std. dev. 13.07
(d) Tenure
min. 0
25% 1
50% 4
75% 10
max. 20
mean. 5.80
std. dev. 5.55
(e) Days in unemployment
min. 0
25% 0
50% 0
75% 228
max. 5181
mean. 192.07
std. dev. 365.28
(f) Education level (Obs.)
Basic 3466087
High school 13679968
Vocational 3811652
University 7668471
N/A 84286
(g) Female/Male composition
Female 15592450
Male 13118014
(h) Immigrant composition
Immigrant 4054687
Not immigrant 24655777
(i) Lives in birth country
Yes 17321074
No 11389390
(j) Graduated during recession
Yes 5311539
No 23398925
(k) Narrow Education level
N. different Categories 347
Obs. Smallest Category 4
Median Obs. in Categories 22670
Obs. Biggest Category 3663270
Mean. Obs. in Categories 82739.09
Std. dev. Obs. in Categories 245780.28
(l) Municipality (Obs.)
N. different Categories 291
Obs. Smallest Category 5868
Median Obs. in Categories 43725
Obs. Biggest Category 2569869
Mean. Obs. in Categories 98661.39
Std. dev. Obs. in Categories 198399.91
(m) N. industries worked in (since 1990)
0 29312
1 20627635
2 6807573
3 1086231
4 142798
5 15612
6 1228
7 69
8 6
(n) N. firms worked at (since 1990)
1 7938172
2 7507355
3 5655882
4 3648224
5 2056739
6 1048291
7 495679
8 215264
9 89409
10 35025
11 13133
12 4735
13 1703
14 584
15 178
16 63
17 22
18 5
19 1
Table 3.6: Summaries on Firms Characteristics. Total Observations: 71432
(a) Observations on Firms per Year
year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
obs. 6884 6936 6788 6891 6928 7152 7476 7600 7249 7528
(b) Productivity: value added per employee (MSEK 2017)
min. -17.28
25% 0.46
50% 0.60
75% 0.81
max. 245.10
mean. 0.76
std. dev. 1.70
(c) Firm Size (in Employees)
min. 50
25% 65
50% 99
75% 210
max. 52151
mean. 401.93
std. dev. 1592.15
(d) Sales (MSEK 2017)
min. 0
25% 77.84
50% 166.42
75% 397.12
max. 120641.4
mean. 664.92
std. dev. 2954.57
(e) Total assets (MSEK 2017)
min. 0
25% 12.19
50% 61.74
75% 203.03
max. 362723.5
mean. 789.71
std. dev. 6959.61
(f) Firm age from 1990
min. 0
25% 9
50% 13
75% 16
max. 20
mean. 12.01
std. dev. 5.29
(g) Firm industry
N. different Categories 58
Obs. Smallest Category 7
Median Obs. in Categories 688
Obs. Biggest Category 7322
Mean. Obs. in Categories 1231.59
Std. dev. Obs. in Categories 1511.39
(h) Family Firm
Yes 10870
No 60562
(i) State or municipality owned
Yes 9866
No 61566
(j) Listed Firm
Yes 870
No 70562
(k) Broad industry (Obs.)
Not harmonized or other 466
Agriculture, hunting, forestry, and fishing 429
Mining 141
Manufacturing 18846
Utilities 889
Construction 3753
Wholesale and retail 11109
Hotels and restaurants 2070
Transport, storage, and communications 4761
Financial intermediation 1886
Real estate, renting, and business 12486
Public administration and defence 1496
Education 3285
Health and social work 5577
Other service activities 4238
Table 3.7: Occupation Variable (SSYK). Right columns of panels (b), (c), (d) report values in number of employees observed in the data.
(a) Occupation Categories. Major groups, number of subdivisions and skill level.
Code; Major groups; N. sub-major groups; N. minor groups; N. unit groups; Skill level (ISCO)
1 Legislators, senior officials and managers 3 6 29 N/A
2 Professionals 4 21 67 4th
3 Technicians and associate professionals 4 19 72 3rd
4 Clerks 2 8 17 2nd
5 Service workers and shop sales workers 2 7 27 2nd
6 Skilled agricultural and fishery workers 1 5 11 2nd
7 Craft and related trades workers 4 16 58 2nd
8 Plant and machine operators and assemblers 3 20 59 2nd
9 Elementary occupations 3 10 14 1st
0 Armed forces 1 1 1 N/A
Totals 9 27 113 355
(b) Composition Major groups (Obs.).
Not available 1317155
Managers 1279702
Professionals 5760078
Technicians and associate professionals 5580636
Clerks 2474938
Service, shop, and market sales workers 6105558
Skilled agricultural and fishery workers 110989
Craft and trades related workers 1787794
Plant and machine operators and assemblers 2597792
Elementary occupations 1695822
Total 28710464
(c) Minor groups composition
N. different Categories 113
Obs. Smallest Category 114
Median Obs. in Categories 115047
Obs. Biggest Category 4549131
Mean. Obs. in Categories 243451.19
Std. dev. Obs. in Categories 469362.06
N/A 1200480
Total 28710464
(d) Unit groups composition.
N. different Categories 355
Obs. Smallest Category 7
Median Obs. in Categories 19863
Obs. Biggest Category 1526392
Mean. Obs. in Categories 67435.17
Std. dev. Obs. in Categories 152118.08
N/A 4770978
Total 28710464