**Development and validation of university students’ progression in learning Quantum Mechanics through exploratory factor analysis and Rasch analysis**

We report an empirical study on the development and validation of a Learning Progression (LP) in Quantum Mechanics (QM) at the university level. Drawing from a systematic review of published research results about students’ reasoning in QM, we designed a hypothetical LP (HLP) consisting of three initial Big Ideas: Measurement, Atoms and Electrons, and Wave function. We then developed a measurement tool based on ten Ordered-Multiple-Choice (OMC) items to assess the construct validity and the hierarchy of levels of the HLP. We administered the questionnaire to 244 students attending the Bachelor in Physics, divided into three groups under different instruction conditions: no course, introductory course, introductory and upper-level course. An additional group of 43 non-physics students, who attended an introductory QM course, was also involved to inspect the role of background knowledge in physics. We used exploratory factor analysis and Rasch analysis to analyse the collected data. The results provided evidence for the revision of the HLP around only two Big Ideas – Atomic description and measurement; Wave function and its properties in the measurement process – which roughly match the topics covered by the introductory and upper-level courses, respectively. However, the hierarchy of hypothesized levels was substantially confirmed. We discuss the implications of our findings in the light of existing physics education research on the teaching of QM. Finally, we identify some steps to further improve the revised learning progression and the measurement tool.

Keywords: Quantum Mechanics; Learning Progressions; Rasch Analysis

**Introduction**

The breakthroughs of Quantum Mechanics (QM) (e.g., quantum computing, sub-nuclear particles and exotic nuclei, …) and its technological applications in many fields (metrology, sensing, imaging, telecommunications, electronics, materials science, information security, …) are strong stimuli for students who access a career in Physics (Oon & Subramaniam, 2013). For this reason, freshmen in Physics are motivated by, or at least interested in, QM (Levrini et al., 2016). Nevertheless, QM remains a troublesome topic, because its leading concepts fall out of the realm of everyday experience and are hence perceived, by both students and instructors, as abstract and demanding for the high level of conceptualization required (Johnston, Crawford and Fletcher, 1998). Typical difficulties concern, e.g., the (puzzling for students) “dual nature of electron alternatively as a wave or a particle” (Ayene, Kriek, & Damtie, 2011; Levy-Leblond, 1988; Manilla, Koponen & Niskanen, 2002) and the understanding that atomic orbitals are wave functions (Tsaparlis & Papaphotis, 2002; 2009).

Reasons for the difficulty of teaching and learning QM were investigated in many previous works (Krijtenburg-Lewerissa, Pol, Brinkman & van Joolingen, 2017; Singh & Marshman, 2015). First, the empirical basis of QM mainly consists of complex experiments that are seldom the object of practical activities and more often of simulated experiments (Sayer, Maries & Singh, 2017). Under these circumstances, it is easily understood why students perceive QM as highly theoretical and have difficulties in reconnecting theory to experiments (Ke, Monk & Duschl, 2005). Second, QM systematically obliges students to reconsider the key concepts of classical physics. Terms such as “state of a system”, “observable quantity”, “measurement”,

“probability”, ..., all have a different meaning in QM. The students are hence called to refer to “the objects of quantum theory by using the vocabulary of classical physics” (Lautesse et al., 2015) and to build a bridge between classical and quantum physics, which is only viable if the classical concepts are clear and sound (Marshman & Singh, 2015). Third, the teaching of QM requires a harmonization and integration of Physics and Chemistry concepts (Mulder, 2011), a process that is rarely achieved, especially at the university level. Finally, the exposure to inaccurate sources of information

(including social media) may lead to misconceptions about advanced concepts in QM, such as entanglement.

To address such issues, several research-based instructional strategies have been developed at introductory and advanced levels (Wittmann, Morgan & Bao, 2005; Zhu & Singh, 2011; 2012a; 2012b; Zollman, Rebello & Hogg, 2002). However, such efforts have not yet produced a coherent framework to describe the learning pathways followed by students since the first year of university, when they are involved in QM courses. This work addresses this issue by investigating, through the learning progression (LP) approach, how university students develop their understanding of QM. The reason for adopting a LP approach is that QM provides a comprehensive description of natural phenomena at the microscopic level and plays a central role in modern science and up-to-date technological achievements. Hence, QM can be considered a core concept in science for which it is of interest to develop a LP.

As better described later, the development and validation of a LP is based on an iterative approach. It starts from a first hypothetical LP (HLP), which is refined after the analysis of the test results. In this paper, we adopted an initial students’ progression designed from research findings and curriculum materials. Then, we validated it with empirical data obtained through a ten-question instrument. The specific research questions that guided this study are the following:

RQ2) How do students progress in their understanding of QM when exposed to different teaching conditions, from introductory to upper-level university course?

To this aim, we involved a sample of students attending a Bachelor course in Physics and a control group of students attending a Bachelor course in Mathematics at a university in Southern Italy. We present the exploratory factor analysis and a one-dimensional Rasch model of collected data. Finally, we discuss how the results led to HLP refinements to better describe students’ achievement in learning QM.

**Theoretical Framework**

**Learning Progressions**

For the purpose of this study, we assumed the following definition: “[LPs] are descriptions of the successively more sophisticated ways of thinking about a topic that can follow one another as children learn about and investigate a topic over a broad span of time” (Duncan & Hmelo-Silver, 2009; Smith, Wiser, Anderson & Krajcik, 2006; Wilson & Bertenthal, 2006). Rooted in a developmental view of learning, the LP framework assumes that students learn a given science content over an extended time period, starting from their intuitive ideas and progressing through subsequent cognitive stages of more sophisticated understanding of the topic (Carey, 1985; Driver, 1994; Vosniadou, 2002). Intrinsically, LPs allow an iterative description of the interactions between science contents, instructional methodologies and assessment strategies used to investigate students’ achievements (Furtak, Morrison, & Kroog, 2014). Therefore, LPs should be informed by empirical evidence about students’ conceptual understanding and, as hypothetical models, validated (Duncan & Rivet, 2013; Plummer, 2012; Rivet & Kastens, 2012). An aspect of any LP validation process, which is

particularly relevant for the present study, is that collected data should be interpreted considering the specific teaching/learning models that informed the instructional practices through which the students have learnt the content targeted in the LP (Corcoran, Mosher & Rogat, 2009; Shavelson & Kurpius, 2012). When such a careful analysis is carried out, LP research can inform instructional activities that are more responsive to students’ reasoning strategies about a specific topic (Alonzo, Robinson, Christensen & Lee, 2017).

Concerning the development of LPs, most scholars agree on an iterative approach (Sevian & Stains, 2013), which can be implemented as follows. First, an initial LP is hypothesized on the basis of the literature and reference syllabus, then data are collected through a suitable instrument to investigate the alignment with the actual students’ achievements. If the alignment is not acceptable, the measurement instrument and the initial LP need to be revised. The cycle ends when alignment between actual and hypothesized progression becomes sufficiently satisfactory (Neumann, Viering, Boone & Fischer, 2013).

However, despite efforts in the science education community, a consensus on how to implement the above cycle to develop LPs has not yet been reached. Several models to guide the design and validation of HLPs have been proposed so far. For the sake of brevity, we briefly review here only those that are relevant for the present study.

At early stages of the research about LPs, when even the term LP was not yet adopted, Briggs, Alonzo, Schwab, & Wilson (2006) first proposed the BEAR

assessment system (BAS) to address the issue of the weak connection between results of the research about students’ misconceptions and what they called “developmental progression of student understanding” (p. 37). The paper by Briggs et al. brings forth two central ideas: the ordered-multiple choice (OMC) item format and the construct

map. The two ideas are closely linked: construct maps describe how students progress from lower to higher levels of understanding in a specific disciplinary domain, while answer choices in OMC items correspond to specific levels of the construct map. The BAS approach was later refined and generalized in subsequent papers by Wilson and colleagues. In the most recent one, Morell, Collier, Black & Wilson (2017) developed a LP about the structure of matter building on four phases: (i) development of the construct map; (ii) systematic design of tasks that operationalize the designed construct map and are aimed to measure the level at which a student is located; (iii) definition of an outcome space, which maps the students’ responses onto the levels of the construct map through specific scoring rubrics; (iv) use of the measurement model, e.g., Rasch

analysis, where the outcome space is mapped back to the original construct map with the aim of revising and improving it. The authors’ main claim is that the adopted approach allowed them to develop a unified LP by empirically relating levels of students’ understanding across the identified constructs in the structure of matter domain. A similar, multi-dimensional approach was adopted in a different content domain by Plummer and colleagues. For instance, Plummer & Maynard (2014) developed a construct map about the change of seasons, which is included in a larger LP about celestial motion. Plummer et al. (2015) proposed a LP about the solar system based on four construct maps: physical properties, dynamical properties, formation, and gravity.

Another approach is described in the paper by Hadenfeldt, Neumann, Bernholt, Liu, & Parchmann (2016), which also addresses the structure of matter domain. Their model of LP features five components (p. 684-685): (i) “big ideas”; (ii) levels of understanding; (iii) assessment; (iv) instruction; and (v) boundaries, rationale and connections. According to their model, LPs are developed for core concepts in science (e.g., force and motion, energy, matter) and

organized around a few big ideas. The latter are cross-cutting concepts that help students connect different phenomena, empirical laws and explanatory models related to the core concept (Duschl, Maeng, & Sezen, 2011; Krajcik, Sutherland, Drago & Merritt, 2012). Big ideas can be compared to the core constructs in the Morell et al. (2017) approach. As such, big ideas are also structured in levels and progress indicators, which define what the students know at a given level. The first level is called the lower anchor (LA), which may also feature students’ naïve or incorrect ideas about the topic. The intermediate levels (L1, L2, …, Lj) feature increasingly complex reasoning or incomplete explanations. The final level is called the upper anchor (UA) and corresponds to the accepted scientific explanation of the targeted concept according to the chosen instructional level. Hadenfeldt et al. (2016) used four big ideas to explore the

progression of students in understanding the matter concept, drawing on responses to an OMC-based questionnaire. A similar research design was adopted in Johnson (2013), where students’ responses to three multiple-choice instruments were used to develop a one-dimensional LP about the concept of substance.

The two approaches share three basic aspects: (i) in a given content domain, relatively few relevant concepts (core constructs or big ideas) are necessary to organize the students’ progression in understanding that specific content; (ii) such progression can be subdivided into increasingly sophisticated levels of understanding; (iii) some kind of quantitative assessment, such as the OMC technique, can be used to map empirically measured students’ understanding onto such levels. Both approaches could have been chosen for the present study, which aims at developing and validating a LP about QM. However, since to our knowledge this is the first attempt in this direction, we chose the Hadenfeldt et al. (2016) model. Indeed, at this initial exploratory stage, our primary interest is capturing the actual students’ progression in understanding a few relevant

concepts as a result of the current undergraduate teaching of QM, rather than establishing comparisons in students’ understanding across the chosen concepts or advising university instructors on how they might use our findings to design and change their QM courses. Such aims will be addressed in future steps of our ongoing research.

**Hypothesized LP**

To develop the HLP, we first had to choose the big ideas that are indispensable to develop a sound understanding of QM. A recent work (Krijtenburg-Lewerissa et al., 2017) suggests that QM features a few specific big ideas requiring high-level, abstract reasoning to describe the underlying physics: 1) Wave-particle duality; 2) Wave function; 3) Atoms; 4) Complex quantum behavior. The first big idea includes the dual behavior of photons and electrons, the double slit experiment, the uncertainty principle and the photoelectric effect. The second includes the knowledge about wave function and potentials, probability and tunneling. The third includes quantization and energy levels, atomic models, the Pauli principle and spin. The fourth includes the time-dependent Schrödinger equation, quantum states, superposition and measurement. However, a review of students’ difficulties at the undergraduate level (Singh & Marshman, 2015) suggests that big ideas 2 and 4 can be collapsed, given the central role of the wave function (or state vector) for a minimal interpretation of the QM formal apparatus and experimental results. The first big idea, as presented in Krijtenburg-Lewerissa et al. (2017), concerns essentially experimental contexts; it can hence be related to the interpretation of challenging experimental observations and generalized to the bigger issue of measurement in QM. As a result, we tentatively developed our HLP around three big ideas, namely Atoms and Electrons (A&E), Measurement (ME) and Wave Function (WF). The progress indicators are reported in Table 1. For the lower levels (roughly LA-L1), where students show more misconceptions and incorrect

reasoning, we built on past work on misconceptions in QM (Ireson, 2000; Johnston, Crawford & Fletcher, 1998; Olsen, 2002; Petri & Niedderer, 1998; Styer, 1996) and structure of matter (Stevens, Delgado & Kraijcik, 2010). For the upper levels (roughly L2-UA), where students hold increasingly systemic views, we built on up-to-date QM textbooks (e.g., Pade, 2014). A Not Defined (ND) level was introduced to account for students’ inability to answer a given question when they have never been taught about any topic related to the three big ideas.

Table 1 around here

**Methods**

**Instrument development**

To answer our research questions, we first designed a draft questionnaire with 25 open items starting from existing instruments (Sadaghiani & Pollock, 2015; Singh, 2001, 2008; Wuttiprom, Sharma, Johnston, Chitaree, & Soankwan, 2009). We piloted this version with about 30 third-year physics undergraduate students and 50 experienced physics teachers, including university professors and teaching assistants. The aim of this pilot phase was to establish content validity and optimize the wording of the questions, as well as to collect a spectrum of plausible answers representing misconceptions or alternative ideas in QM. A sub-sample of three students and five teachers was interviewed after the questionnaire administration. Then, following the Ordered Multiple-Choice (OMC) technique (Briggs, Alonzo, Schwab, & Wilson, 2006), we selected ten items distributed across the big ideas that define the HLP and designed for each of them four answer choices corresponding to the underlying HLP levels. Most of the answer choices corresponding to lower levels of the HLP were drawn from data collected during the pilot phase, while answer choices corresponding to the upper

anchor were mostly drawn from the literature. The distribution of answer choices across the HLP levels is shown in Table 2. Blank answers were mostly related to the ND level and the LA, except for item 7 (L2 of the WF big idea). The choice of selecting only ten items was mainly related to time constraints for the questionnaire administration during university classes (about 40 minutes).

Then, for analysis purposes, the designed answer choices were grouped into three categories. A “limited or incorrect answer” corresponds to an inadequate understanding of a concept or to a misconception. A “partial or questionable answer” corresponds to ideas that are characterized by a combination of classical physics and QM. A “best answer” corresponds to a possible synthetic expression of scientifically sound ideas. To each alternative, a raw score was hence assigned: 1 for a “limited or incorrect” answer choice, 2 for a “partial or questionable” answer choice, 3 for a “best” answer choice. Neither credit nor penalty was given for blank answers.
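The scoring rule above can be sketched in a few lines of code (a minimal illustration; the category labels and the sample responses below are hypothetical, not taken from the study's data):

```python
# Minimal sketch of the 1/2/3 scoring rubric described above.
# Category score: 1 = "limited or incorrect", 2 = "partial or questionable",
# 3 = "best"; blank answers (None) receive neither credit nor penalty.
CATEGORY_SCORE = {"limited": 1, "partial": 2, "best": 3}

def score_response(category):
    """Return the raw score for a single answer-choice category."""
    if category is None:  # blank answer: no credit, no penalty
        return 0
    return CATEGORY_SCORE[category]

def raw_total(responses):
    """Sum raw scores over a student's responses to the ten items."""
    return sum(score_response(c) for c in responses)

# A hypothetical student answering the ten OMC items (None = blank)
responses = ["best", "partial", "limited", None, "best",
             "partial", "partial", "limited", "best", None]
print(raw_total(responses))  # prints 17
```

The resulting raw scores are the input to the partial credit Rasch model described in the Data analysis section.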

The administration procedure was as follows: before distributing the questionnaire, the students were made aware of the ongoing research study and

reminded they had 40 minutes to complete the questionnaire; then, once all students had received the sheet with the questions, they were asked to select only one of the four answer choices, the one corresponding, in their opinion, to the right answer; finally, they were invited to skip a question if they did not know the answer.

The complete version of the questionnaire is presented in the Supplemental material A, while the correspondence between each answer choice and the HLP levels, and the adopted scoring rubric are reported in the Supplemental material B.

**Teaching context **

This study was carried out mainly within the context of the three-year Bachelor Degree in Physics of a large university in Southern Italy. QM is taught in an introductory course (Introduction to quantum theory - QM1 from now on) and in an upper-level course (Fundamentals of Theoretical Physics - QM2 from now on). The two courses have the same lecture-based approach, with the students encouraged to ask questions mainly during the sessions devoted to solving exercises. However, the two courses strongly differ in their focus on the mathematical formalism used during instruction. The main aim of QM1 is to familiarize students with phenomena related to QM and to help them develop a qualitative explanation of such phenomena. QM2, instead, aims at providing students with the mathematical skills necessary to calculate, for instance, the eigenvalues of an operator or its expectation value. To improve the generalizability of our results, a QM course (Elements of Modern Physics – MP from now on) for Mathematics and Materials Science undergraduates at the same university was also included in the study. The teaching approach of MP is lecture-based, the same as QM1 and QM2. However, the focus of the MP course is very similar to that of QM1, in that it concentrates on a qualitative description of phenomena related to QM and introduces students to the basic formalism of QM. We stress that, at the time of our study, the instructors of the three courses were all physics full professors with more than fifteen years of experience in teaching QM. Concerning pre-requisites, two calculus-based introductory physics courses (covering topics in mechanics, thermodynamics,

electromagnetism and optics) were required to attend QM1, QM2 and MP courses. The three courses were selected after checking that their syllabi were similar to those of corresponding courses held in other Italian universities. For all courses, the instructors used their own notes. A complete description of the courses is presented in the

Supplemental material C.

**Sample**

We involved 287 university students. The vast majority of the sample (N = 244) attended the Bachelor Degree in Physics. These students were divided into three groups, G1, G2, and G3, according to their instruction condition (Table 3). G1 was formed by first-year undergraduates who had attended neither QM1 nor QM2, nor received any sort of formal instruction in QM at the university level. G2 was formed by second-year undergraduates, who had just taken QM1 but not yet attended QM2. G3 included third-year undergraduates who had attended both QM1 and QM2. The remaining part of the sample constituted the control group (N = 43, G-math). These students were enrolled in the third year of the Bachelor Degree in Mathematics and had previously attended the MP course after having passed the exams of the required calculus-based general physics courses.

Table 3 about here

**Data analysis**

Analysis was carried out through exploratory factor analysis (EFA) and a one-parameter logistic model (Rasch model). EFA allows the identification of underlying latent traits in a sample. In our case, we looked for emerging sub-categories across the ten

questionnaire’s items. Since items and their answer choices were related to the HLP levels, we tentatively interpreted the emerging factors as the minimal big ideas that empirically define students’ knowledge about QM.

We used principal axis factoring as the extraction method for EFA and extracted the factors using the Kaiser-Guttman rule, selecting only those with eigenvalues greater than 1 and resorting to a scree-plot to check the solution. We also looked for possible correlations

between factors, to study the extent to which the knowledge of an area of QM affected that of different ones. Factors were rotated using a Promax rotation.
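The Kaiser-Guttman extraction step can be illustrated with a short sketch. The data here are synthetic (two latent traits driving ten items, with sample sizes mirroring the setup above), not the study's actual responses:

```python
# Sketch of the Kaiser-Guttman rule: retain factors whose eigenvalues of
# the item correlation matrix exceed 1. Synthetic data for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_items = 244, 10
traits = rng.normal(size=(n_students, 2))   # two latent abilities
loadings = np.zeros((2, n_items))
loadings[0, :5] = 0.8                       # items 1-5 driven by trait 1
loadings[1, 5:] = 0.8                       # items 6-10 driven by trait 2
scores = traits @ loadings + rng.normal(scale=0.6, size=(n_students, n_items))

# Eigenvalues of the 10x10 inter-item correlation matrix
eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
n_factors = int(np.sum(eigvals > 1.0))      # Kaiser-Guttman criterion
print(n_factors)                            # prints 2
```

A scree-plot of the same eigenvalues, as mentioned above, provides a visual check on the number of factors retained.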

To confirm the obtained patterns and assess the construct validity of the
questionnaire, we decided to couple the EFA with Rasch analysis (Bond & Fox, 2007;
Wallace & Bailey, 2010; Wilson, 2005). Given the design of the research instrument,
we used the Rasch analysis also to validate the ranking of the levels within the HLP
according to their difficulty. Since we used OMC technique, we adopted a partial credit
model (Masters, 1982). We used the Rasch partial credit model instead of a
two-parameter model since the software used in this study, Winsteps® (Linacre, 2012),

already provides a discrimination value for each item. To check the one-dimensionality of the questionnaire, we performed a Rasch Principal Component Analysis (PCA) of residuals. PCA tries to identify patterns in the data after accounting for the variance explained by the Rasch measures. As opposed to EFA, the aim of PCA is to identify to what extent our measurements were affected by random noise. The common patterns are identified by finding the principal components that account for the unexplained variance in the data. In PCA, a principal component is called a “contrast” and the amount of the

corresponding unexplained variance is measured in eigenvalue units. In PCA, differently from EFA, the eigenvalue of a contrast can be interpreted as the number of items that share a common trait. If two (or more) items share such a common trait, they likely concur to determine a possible “secondary dimension” (Linacre, 2012). Therefore, a contrast needs to have an eigenvalue of at least two to be above the noise level. Acceptable values for the variance explained by Rasch measures should be around 50%, while the unexplained variance in a contrast, measured in items’ strength, should be less than two (Oon & Fan, 2017).

To investigate the instrument functioning, we explored items’ misfits through mean square (MNSQ) outfit and infit statistics and summary statistics. Infit and outfit statistics indicate for each item whether the students’ responses showed more randomness than expected. As a rule of thumb, acceptable values are between 0.7 and 1.3. For instance, an item with an infit MNSQ of 1.4 has a variability that is 40% greater than expected. We also calculated for each item its point-measure correlation, which indicates whether students’ scores correlate with the levels of difficulty of the investigated construct, namely whether more able students are more likely to answer a difficult item correctly. The correlation for each item should be positive.
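The 0.7-1.3 screening rule can be sketched as a simple filter (the MNSQ values below are hypothetical and only illustrate the rule, not the study's actual fit statistics):

```python
# Sketch of the infit/outfit screening rule: flag items whose mean-square
# (MNSQ) statistics fall outside the 0.7-1.3 rule of thumb.

def flag_misfits(mnsq_by_item, low=0.7, high=1.3):
    """Return labels of items whose infit or outfit MNSQ is out of range."""
    return [item for item, (infit, outfit) in mnsq_by_item.items()
            if not (low <= infit <= high and low <= outfit <= high)]

# Hypothetical MNSQ values: {item: (infit, outfit)}
mnsq = {
    "Q1": (0.95, 1.02),
    "Q2": (1.40, 1.10),   # infit 40% noisier than the model expects
    "Q3": (0.85, 0.60),   # outfit below range: responses too predictable
}
print(flag_misfits(mnsq))  # prints ['Q2', 'Q3']
```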

Four general Rasch indices were also reviewed: person and item reliability, and person and item separation (Boone, Staver & Yale, 2015). Person

reliability can be interpreted similarly to classical Cronbach’s alpha (Linacre, 2012) and
hence is a measure of internal consistency of the instrument. Values should be above
0.5. Item reliability is another index of internal consistency of the instrument, in
particular, of how consistently the model differentiates the items on the basis of their
estimated difficulty. Suggested value should be around 1. Separation indices are
conceptually equivalent to a t-test between two groups (Duncan et al., 2003). The

separation indices are calculated as the ratio between the variance in the person (item) measures and the average error in estimating these measures. Clearly, the larger the index, the more distinct levels of persons (items) can be identified for the specific data set. More precisely, person separation indicates if the sample can be divided into distinct levels of increasing ability. Item separation indicates whether the items can be reliably located according to their difficulty on the latent trait that is being explored.

Suggested values for both indices (Boone, Staver & Yale, 2015) are: above 1.50 (acceptable), above 2.00 (good), and 3.00 or above (excellent).
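The definition of the separation index given above, and its relation to the reliability index, can be sketched numerically (a sketch under the standard Rasch definitions; the person measures and standard errors below are hypothetical):

```python
# Sketch of the person (or item) separation index: the ratio between the
# "true" spread of the measures and the average error of measurement.
# Reliability then follows as G^2 / (1 + G^2).
import math

def separation(measures, std_errors):
    """Separation G = SD_true / RMSE, with SD_true^2 = SD_obs^2 - RMSE^2."""
    n = len(measures)
    mean = sum(measures) / n
    var_obs = sum((m - mean) ** 2 for m in measures) / n
    rmse = math.sqrt(sum(se ** 2 for se in std_errors) / n)
    var_true = max(var_obs - rmse ** 2, 0.0)
    return math.sqrt(var_true) / rmse

def reliability(G):
    """Separation reliability, analogous in role to Cronbach's alpha."""
    return G ** 2 / (1 + G ** 2)

abilities = [-1.2, -0.5, 0.0, 0.4, 1.1]   # hypothetical logit measures
errors = [0.35, 0.30, 0.30, 0.30, 0.35]   # hypothetical standard errors
G = separation(abilities, errors)
print(round(G, 2), round(reliability(G), 2))  # prints 2.22 0.83
```

Note how a separation of about 2 maps to a reliability above 0.5, consistent with the thresholds quoted above.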

To further check the questionnaire’s reliability, we also calculated classical test theory indices such as Cronbach’s alpha, item difficulty, and the point-biserial coefficient as a measure of item discrimination. To explore the students’ ability distribution across the questionnaire’s items and to inspect differences across groups, we

investigated the Wright map of our data and performed an analysis of variance (one-way ANOVA) of students’ abilities. Finally, we performed a scan of the items for potential differential item functioning (DIF, Linacre, 2012) across the four groups. DIF is a technique to analyse whether items’ responses are biased with respect to a trait of the sample. In our case, differences could have been due to a higher degree of

familiarity of one or more groups with the topics targeted in the questionnaire.
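The across-groups comparison of ability estimates can be sketched as follows (the group values in logits are hypothetical, chosen only to illustrate the one-way ANOVA step, not the study's data):

```python
# Sketch of a one-way ANOVA on students' Rasch ability estimates (logits)
# across instruction conditions. Group values are hypothetical.
from scipy.stats import f_oneway

g1 = [-1.0, -0.8, -1.2, -0.9, -1.1]   # e.g., no QM instruction
g2 = [0.0, 0.2, -0.1, 0.1, 0.0]       # e.g., after the introductory course
g3 = [1.0, 1.2, 0.9, 1.1, 0.8]        # e.g., after the upper-level course

F, p = f_oneway(g1, g2, g3)
# Clearly separated group means give a large F and a small p-value
print(F > 1, p < 0.05)  # prints True True
```

A significant F would then motivate post-hoc pairwise comparisons between the instruction conditions.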

**Results**

**Exploratory Factor Analysis**

Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett sphericity test were calculated to investigate whether EFA was appropriate in our case. Obtained values were 0.866 and χ² = 676.281 (df = 45, p < 10⁻⁴), respectively, which suggest that coherent factors can be identified. Two factors, which account for about 40% of the total

variance, were extracted. Table 4 shows how questionnaire’s items load onto these two factors.

Table 4 about here

The ten items are almost evenly divided into the two factors. However, only seven of them load strongly onto a single factor. In particular, items 5 and 8-10 load more strongly onto Factor 1, while 2-4 and 7 load more strongly onto Factor 2. Looking

in more detail at the items’ content, we can interpret Factors 1 and 2 as two minimal big ideas in QM: 1) atomic description and measurement, including Heisenberg’s principle (A&E – H); 2) wave function and its properties. Topics related to big idea 1 (2) are roughly covered by QM1 (QM2), respectively (see Supplemental Material C). The smaller loadings of items 1, 6 and, to some degree, also of item 10 onto both factors suggest an overlap between the two big ideas. Correspondingly, it also suggests that the topics targeted by these items are addressed in both courses. The above interpretation is graphically represented in Figure 1.

Figure 1 about here

**Rasch analysis – Principal component analysis of residuals**

The variance explained by Rasch measures was 47.1%, and the eigenvalues of the unexplained variance for the first three contrasts were 1.7, 1.3 and 1.2, respectively. All the values are smaller than the recommended value of 2 (Linacre, 2012). This result suggests that the data are one-dimensional and hence that the data “noise” did not distort the measurement of the latent trait. The PCA of residuals allows for a deeper insight into the data with respect to the EFA. We identified which items contributed to the noise by examining the residual loadings for our data and their plot (Table 5).

Table 5 about here

Loadings of items 1-4 and 7 (big idea 2) are all positive, while loadings of items 5 and 8-10 (big idea 1) are all negative. These two groups of contrasting items roughly cluster to each other more than they do to the other questionnaire items. They hence naturally correspond to the topics targeted in the QM2 and QM1 courses, respectively.

Moreover, item 6 clusters better with items 1-4 and 7, thus suggesting that it could be better related to big idea 2. A second noticeable pattern is that items 4 and 7 cluster to each other more than they do with items 1-3 and 6. Since items 4 and 7

correspond to an advanced knowledge of the wave function, our data support the interpretation that such part of big idea 2 could be best addressed in a more advanced course (which could be named QM3) after QM2. Such trends are summarized in Figure 2.

Figure 2 about here

The disattenuated correlation for each pair of clusters is 1.

**Rasch analysis – Items’ difficulties**

Table 6 reports the Rasch model parameters and classical test theory indices. The latter can be considered acceptable for all items. Only item 5 shows a low point-biserial value, suggesting a low discriminating power of the item. Cronbach’s alpha is 0.82, which can also be considered a good value. Looking at the Rasch analysis values, we found no misfitting items, which means that the Rasch model describes our data well. Only item 5 shows a low value of point-measure correlation. We will further investigate this item’s functioning with the DIF analysis. Item reliability is 0.98, while item separation is 7.39, which ensures that the sample was large enough to consider the item difficulty hierarchy valid. Person separation is 1.70, while person reliability is 0.74, which are rather

satisfactory values. In particular, they confirm that the sample can be divided into 2 or 3 sub-groups (Boone, Staver & Yale, 2015, p. 230).
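The link between the separation and reliability figures quoted above can be reproduced directly; the formulas below are the standard ones from the Rasch literature (e.g., Boone, Staver & Yale, 2015), applied to the person separation reported in the text:

```python
def reliability_from_separation(g: float) -> float:
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def strata(g: float) -> float:
    """Number of statistically distinct strata: (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

person_separation = 1.70  # value reported in the text
print(round(reliability_from_separation(person_separation), 2))  # 0.74
print(round(strata(person_separation), 1))  # 2.6 -> two to three sub-groups
```

A separation of 1.70 thus reproduces both the reported person reliability of 0.74 and the claim that the sample supports 2-3 distinguishable ability sub-groups.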

Table 6 about here

When looking at the Rasch measurement of item difficulties (third column of Table 6), we note that the three most difficult items (7, 4 and 3; difficulty > 0.30 logit) all load onto big idea 2. Two of them (7 and 4), moreover, cluster with each other more than with any other item in the PCA of residuals. This finding supports the idea that the UA of big idea 2 corresponds to the most difficult level of our LP. On the other hand, four of the easiest items (5, 6, 8 and 9) load onto big idea 1. Hence, the UA of big idea 1 corresponds to the lowest difficulty level of our LP. In other words, the advanced properties of the wave function require more advanced QM teaching, while basic knowledge about the atomic description may be achieved with an introductory QM course. This evidence is summarized in Table 7.

Table 7 about here

Item analysis of the Rasch PCA of residuals provides further details on the difference between big ideas 1 and 2 (Table 8). We note that the differences in item difficulty between contrast 1 and contrasts 2-3 are statistically significant (t = 2.723, df = 7, p = 0.030; t = 2.651, df = 7, p = 0.033, respectively). Therefore, the most significant differences between big ideas 1 and 2 concern the advanced properties of the wave function.

Table 8 about here

**Rasch analysis – HLP levels’ difficulties**

Table 9 reports the hierarchy of the HLP levels according to the difficulty of the corresponding answer choices for the three initial big ideas. Differences in difficulty across all the levels are statistically significant (F = 9.789, df = 19, p < 10⁻⁴). Eta squared is good (η² = 0.86), thus supporting the association between the HLP and the instrument used. To this concern, we note that the hypothesized progression in average difficulty from the lower to the upper anchor seems to be confirmed (see the last column of Table 9). Looking within the levels of the three initial big ideas, the hypothesized progression of levels seems to be confirmed for ME and A&E, taken separately. On the contrary, we note some discrepancies between expected and observed difficulties for the WF big idea. The difficulty of the intermediate levels (L1 and L2, and L3 and L4) does not increase as expected, thus suggesting the need for a revision of the hypothesized levels.
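The effect-size statistic used above, eta squared, is the between-level share of the total variance of the answer-choice difficulties. A minimal sketch follows; the difficulty values grouped by level are hypothetical placeholders, not the values behind Table 9:

```python
import numpy as np

def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_total = ((all_values - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical answer-choice difficulties (logits) grouped by LP level.
levels = [np.array([-1.2, -0.9, -1.0]),   # lower anchor
          np.array([-0.1, 0.2, 0.0]),     # intermediate
          np.array([0.9, 1.1, 1.3])]      # upper anchor
print(round(eta_squared(levels), 2))
```

Values close to 1, such as the 0.86 reported above, indicate that level membership accounts for nearly all of the variability in the difficulty estimates.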

Table 9 about here

**Rasch analysis – Revised LP levels’ difficulties**

The aforementioned results led us to reduce the number of initial big ideas and to revise the levels of the HLP. The ME big idea was suppressed, and its two main themes, Heisenberg’s principle and the measurement process, were split and assigned to the levels of the new emerging big ideas according to their estimated difficulty. Hence, we assigned Heisenberg’s principle to the A&E big idea, which was renamed Atomic description and measurement on particles (A&E – H).

Similarly, we included the measurement process in the WF big idea, which was renamed *Wave Function and its properties in the Measurement process (WF – M)*.

The description of the levels of the revised LP is reported in Tables 10 and 11. The correspondence between the answer choices of the administered questionnaire and the levels of the revised LP is reported in the Supplemental material B.

Table 10 about here

Table 11 about here

Finally, we ran the Rasch item analysis again to check the consistency of the revised LP levels. The average measures for each big idea are reported in Table 12. The trend of the average measures for the entire LP is graphically represented in Figure 3. The average difficulty across the six levels is significantly different (F = 43.464, df = 5, p < 10⁻⁴). The association is also satisfactory (η² = 0.87). Differences between two pairs of consecutive levels are not statistically significant: between L2 and L3, and between L3 and the upper anchor. Tukey’s Honestly Significant Difference post-hoc test confirms that the homogeneous groups of levels are ND (p = 1.000), LA/L1 (p = 0.064), L2/L3 (p = 0.918) and L3/UA (p = 0.288).

Table 12 about here

Figure 3 about here

**Rasch analysis – Students’ abilities**

The average ability of the four involved groups is reported in Figure 4. The differences between the groups are statistically significant (F = 40.256, p < 10⁻⁴). G3 students (third-year physics) performed significantly better than G1 (first-year physics) and G-math students (p < 10⁻⁴), while differences with G2 students (second-year physics) are not statistically significant (p = 0.081). Similarly, differences between G1 and G-math students are not statistically significant (p = 0.230). The Tukey Honestly Significant Difference post-hoc test confirms that G1/G-math (p = 0.656) and G2/G3 (p = 0.106) are homogeneous groups.

Figure 4 about here

In Figure 5, we plot the Wright map for our data set. Students are well

distributed across the ability scale, suggesting that the items had a suitable difficulty for the sample as a whole. In the Wright map we also report the average abilities of the four groups of students and the average difficulties of the revised LP levels. We note that the ability of G1 students is equal to the difficulty of the LA of the LP. This means that G1 students have a 50% chance of reaching that level. G-math students’ ability is slightly below the difficulty of L1, which means that they have approximately a 50% chance of achieving that level. Levels from L2 up to the UA are more difficult for these students to achieve. Since G-math students had attended the MP course, our result suggests that such courses help students achieve only a low level in our LP. We also note that G2 students’ ability is equal to the difficulty of L3. Hence, the QM1 course for physics majors likely helps students achieve at least a partial knowledge of the contents targeted by the big ideas of our LP. Correspondingly, since G3 students’ average ability is slightly greater than the UA difficulty, the QM2 course likely helped students achieve a sound understanding of the two big ideas in our LP. The above evidence confirms the findings from the EFA and the Rasch PCA of residuals. Specifically, QM1 plays a significant role in helping students progress from the lower levels towards the upper levels of the LP. QM2, with respect to QM1, does provide students with conceptual tools that enable them to reason in a sound way about very basic aspects of the wave function. However, the overall percentage of students who have more than a 50% probability of reasoning correctly about advanced properties of the wave function (e.g., superposition) is about 30%, and only 20% correctly answered the item about the wave function collapse. These results further support the argument that such topics could be better addressed in a more advanced course (the aforementioned QM3).
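The “50% chance” readings of the Wright map follow directly from the Rasch model, in which the probability of success depends only on the gap between a person’s ability and an item’s (or level’s) difficulty. A minimal illustration follows, treating the dichotomous model as an approximation for our OMC items; the L1 difficulty used in the second call is a hypothetical placeholder, while -0.01 logit is the G-math ability quoted in the text:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(success) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When ability equals difficulty, the success probability is exactly 50%.
print(rasch_probability(0.5, 0.5))  # 0.5

# Hypothetical illustration: a G-math student (-0.01 logit) facing a level
# whose difficulty (0.10 logit, placeholder) is slightly above their ability.
print(round(rasch_probability(-0.01, 0.10), 2))  # just under 50%
```

This is why equal ability and difficulty on the Wright map translate into a 50% chance of reaching a level, and why levels above a group’s average ability become progressively less likely to be achieved.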

Figure 5 about here

**Rasch analysis - Differential Item Functioning (DIF)**

We found only one item (5) that exhibited potential DIF as a function of group, with a possibly meaningful effect size (Table 13). Item 5 seemed to have been interpreted differently by G2 and G-math with respect to G1 and G3.

Table 13 about here

As the simplest remedy, we could have excluded item 5 from the analysis. However, given the small number of items in the questionnaire, we decided to retain this item and regard it as defining the measurement big idea differently across the groups. To this aim, we followed the procedure described in Boone, Staver & Yale (2015). We first ran a Rasch analysis with the items that did not exhibit DIF. The results did not alter the ranking of the item difficulties reported in Table 6. Then, we ran separate analyses for each group, after anchoring all the item difficulties except that of item 5. In Table 14 we report the different values of the difficulty of item 5 for the four groups. The results show that item 5 was considerably easier for G1 and G3 students, and more difficult for G2 and G-math students.
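The group comparison in Table 14 amounts to contrasting the anchored difficulty estimates of item 5 across groups and flagging contrasts whose size is meaningful. A sketch of this effect-size check follows; the difficulty values are hypothetical placeholders (not those of Table 14), while 0.64 logits is Linacre’s (2012) threshold cited later in the text:

```python
from itertools import combinations

# Hypothetical anchored difficulties of item 5 (logits), one per group.
item5_difficulty = {"G1": -0.40, "G2": 0.45, "G3": -0.35, "G-math": 0.50}

DIF_THRESHOLD = 0.64  # Linacre's (2012) cut-off for a meaningful DIF contrast

for (g_a, b_a), (g_b, b_b) in combinations(item5_difficulty.items(), 2):
    contrast = b_a - b_b
    flag = "DIF" if abs(contrast) >= DIF_THRESHOLD else "ok"
    print(f"{g_a} vs {g_b}: contrast = {contrast:+.2f} logits [{flag}]")
```

With placeholder values chosen to mimic the reported pattern (item 5 easier for G1 and G3, harder for G2 and G-math), only the contrasts between the two pairs of dissimilar groups exceed the threshold.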

Table 14 about here

A possible explanation lies in how the Heisenberg principle is taught in the QM1 and MP courses with respect to the QM2 course. In QM1 and MP, the experimental interpretation of the principle is emphasized, while in QM2 the emphasis is on its mathematical formulation. The wording of the answer choices may have favoured those students who had treated the principle in a more abstract way. However, the values of item difficulty obtained for G1 suggested that we inspect DIF further. Hence, we repeated the DIF analysis for the two homogeneous groups, G1/G-math and G2/G3, as resulting from the Tukey Honestly Significant Difference post-hoc test. Table 15 reports the new DIF statistics. The results show that the effect size of DIF for item 5 is now well below the 0.64 threshold suggested by Linacre (2012). This evidence further supports the choice to retain item 5 in the analysis.

Table 15 about here

**Discussion of the results**

Scientific research in fundamental and applied QM progresses very rapidly and enriches the subject with new challenging experiments that test theoretical models and interpretations, with a potentially strong impact on future technologies and even on everyday life (Karakostas & Hadzidaki, 2005). This richness of

applications has suggested spreading and anticipating the study of QM in several national contexts (England GCE A-level, France, Norway, US NGSS), including the one of this study. However, the basics of QM are taught very briefly at the end of the last year of secondary school, with the consequence that the majority of students fail to grasp even the essential ideas of QM (Stevens, Delgado & Krajcik, 2010). Moreover, teaching is often limited to semi-classical models (mainly, the Bohr model of the hydrogen atom and the De Broglie relations) that may hinder the comprehension of purely quantum concepts (Fischler & Lichtfeldt, 1992). Finally, even the choice of the topics deemed essential in the teaching of QM at secondary school is a matter of controversy (Michelini, Ragazzon, Santi & Stefanel, 2004). These considerations suggest that secondary school teaching of QM remains problematic as far as the conceptual understanding of basic QM ideas is concerned.

Even if university teaching of QM takes advantage of an extended period of time to develop scientifically correct ideas, the physics education literature has extensively shown that upper-level physics students also have several difficulties in understanding basic concepts of QM (Singh, 2001). It then seems highly desirable to investigate how to adjust the teaching focus and the learning pathways towards QM knowledge at the university level. Nevertheless, it is not trivial to envisage which topics should be considered “easier” and which “harder” for an average class, and hence which logical development, i.e., didactic sequence, may be more effective in this situation. The LP framework may hence be suitable to address the above issues. This study aimed to provide details on the empirical validation and subsequent revision of a tentative LP in QM for university students.

The present study hence adds to a field where several papers have so far addressed students’ difficulties in QM, but have mainly focused on very specific elements of the undergraduate syllabus. Therefore, previous studies were not organically developed to cover the basic quantum ideas. Moreover, this subject still raises great interest and deserves further investigation, since the teaching of QM follows routes that significantly differ from those of classical physics or of any other scientific discipline (Lautesse et al., 2015). In view of the perceived weirdness of QM, it is of the utmost importance to optimize the path from the LA to the UA of the specific big ideas that define this core concept in science. To contribute to the development and validation of this first, tentative LP in QM, we investigated three different groups of students attending a Bachelor Degree in Physics, thus covering each step of their training, plus a control group made up of students attending the second year of a Bachelor Degree in Mathematics, who had previously been taught QM in the context of a course for non-physics majors.

Every general program of LP validation is intrinsically very ambitious (Plummer, 2012). To proceed, we hence carefully delimited the scope of our exploratory work. First, we stress that our LP represents how students develop their understanding of QM along a few content dimensions, not an ideal developmental path (Foster & Wiser, 2012). Hence, we designed a very simple linear model of LP, initially based on just three basic big ideas that are nevertheless critical for understanding quantum theory: measurement (ME), atoms and electrons (A&E), and the wave function (WF). Moreover, we designed a light evaluation tool, consisting of a multiple-choice test of only 10 items. On the one hand, this allowed easy and widespread administration during class periods. On the other, it was sharply cut to represent a meaningful collection of core questions in QM. Our expectation was, then, that a careful analysis of the data would at least indicate the feasibility of the HLP in a didactic context as complex as QM teaching at the university level. In the following, we discuss to what extent this goal has been accomplished.

**RQ1**

As said above, the first aim of the study was to investigate whether the HLP,

developed around three big ideas – A&E, ME and WF – adequately described students’ understanding of QM. We applied EFA and Rasch analysis to test our initial hypothesis. Our data supported some amendments to the HLP, leading us to develop a revised LP. The main difference between the HLP and the revised LP was the suppression of the ME big idea and the consequent organization of the revised LP around two distinct yet partly overlapping big ideas (see Figure 1). The first corresponds to the atomic description and measurement on particles (A&E – H), the second to the wave function and its properties in the measurement process (WF – M). In particular, according to the difficulty of the corresponding item, we assigned the Heisenberg principle to the first big idea. A plausible justification is that the Heisenberg principle, in its most basic formulation, involves observables that pertain to particles, namely electrons, and hence it is better related to the atomic description framework. We can justify the revision of the second big idea in a similar way, since the process of measurement in QM is intrinsically related to the nature of the wave function. The average difficulty of the levels of the resulting LP across the two new big ideas increases significantly when moving within the initial levels (ND, LA, L1) and towards the intermediate/high levels (L2, L3, UA) of the LP (see Figure 3). Such structure is confirmed by the PCA of residuals: while, empirically, the clusters of items measure the same background core concept (QM), two groups of contrasting items can be found, and these two groups correspond exactly to the two big ideas. Moreover, when looking at the Rasch measurement of item difficulties, we can further discriminate within the new big idea 2, identifying the

wave function superposition and collapse as the most difficult concepts in our LP (see Figure 2). In other words, students first acquire a basic knowledge sufficient to describe atoms, electrons and the dual wave-particle nature of matter. In doing so, they develop a more complex model of the microscopic world, able to explain atomic stability. Then, they re-interpret the measurement process in QM through the wave function formalism. As such, our LP suggests that, from a conceptual perspective, the understanding of QM requires links across the two big ideas. Such links are suggested by the loadings of items 1, 6 and 10 on both factors of the EFA and are confirmed by the Rasch analysis of item difficulties (Table 6). For instance, atomic stability (item 10), as taught in an introductory course, may be difficult for students to explain, likely because stationary waves can be fully understood only with a suitable knowledge of the wave function and of the Schrödinger equation. Conversely, university introductory teaching can suitably support a qualitative understanding of the differences between the measurement process in classical physics and in QM (item 6) and of the wave behaviour of matter (item 1), if typical experiments, such as electron diffraction (where both issues emerge in the same context), are discussed in detail. As a consequence, as also suggested by the EFA, since the correlation between the two new big ideas is high (0.64), the understanding of introductory concepts necessarily affects the knowledge of advanced QM, while not being sufficient to ensure it.

Our results may also shed light on how students develop their understanding within each big idea.

Concerning the first one, A&E – H, our findings confirm that students develop their knowledge from classical electromagnetism and electromagnetic waves towards atomic stability through five hierarchically ordered levels (Table 11). In particular, students first learn quasi-classical models of the atom before developing more sophisticated atomic models involving probability aspects (atomic orbitals). In doing so, students learn the basic aspects of QM measurement, including the Heisenberg principle.

For the wave function big idea, we found only slight discrepancies between the expected sequence of levels and the students’ actual reasoning progression. These slight differences led us to revise the intermediate levels of the five-level HLP to arrive at a proposal in which we assumed that students first reinforce their basic knowledge of the wave nature of matter (Baily & Finkelstein, 2010) and then develop a more coherent view of some nuanced conceptual aspects of the wave function big idea. In doing so, students also progress towards more complex aspects of the

measurement process in QM and of its interpretation when addressing the collapse of the wave function. Looking in more detail at the Wright Map in Figure 5, we note that students’ understanding of the wave-particle duality and of the wave function

progresses initially in parallel with the themes of the first big idea (in particular, the Heisenberg principle and atomic stability). Then, the students’ progression accelerates when dealing with more advanced concepts, such as the time evolution of the wave function, the

superposition of stationary states, and the interpretation of QM measurement in terms of the wave function collapse (Greca & Freire, 2003; Zhu & Singh, 2012c).

**RQ2**

According to previous LP studies, a systematic and careful analysis of an LP implementation needs to take into account the learning conditions under which students develop their understanding (Shepard, 2017). The results of our study show clear evidence of this dependence in the differences between the responses of students exposed to different teaching conditions about QM. The second aim of the study was indeed to describe how students progress in their understanding of QM when exposed to different teaching conditions. To this purpose, we analysed the students’ knowledge of QM at the end of: (i) high school; (ii) an introductory course for non-physics majors (MP); (iii) an introductory course for physics majors (QM1); (iv) an advanced course for physics majors (QM2). Overall, drawing on the analysis of both item difficulties and students’ abilities (see the Wright Map in Figure 5), we found that the lower and intermediate LP levels, where students’ understanding progresses in parallel along the two big ideas, are reached through high school teaching and introductory courses, such as QM1 or MP. The higher difficulty levels require advanced courses, such as QM2. Therefore, ideally, high school teaching + QM1 + QM2 (or high school teaching + MP + QM2, for non-physics majors) form a didactic path that supports our revised LP. The analysis shows, in particular, that high school students starting the Bachelor in Physics hold naïve ideas about atomic structure, in agreement with previous research (Olsen, 2002; Ireson, 2000). Amongst items 8-10, the last one (atomic stability) was very difficult for all students: more than 40% did not answer it. Such findings confirm that high school teaching does not provide sufficient connections between the new aspects of QM theory and the atomic models (McKagan, Perkins, & Wieman, 2008).
A second possible interpretation is that high school teaching may lead to well-known misconceptions as, e.g., those about atomic orbitals, where, for instance, the concept of trajectory is often used in combination with probability considerations (the orbits become “blurred”). Such “intermediate” conceptions mix probabilistic notions with features of the Bohr model, or with the idea that electrons move in a ‘wave-shaped’ trajectory (Stefani & Tsaparlis, 2009). To address this issue, the EFA and the PCA of residuals support the view that atomic stability constitutes a necessary step to bridge the knowledge of atomic models and the knowledge about the wave function. This result further supports the claim by McKagan, Perkins, & Wieman (2008), for whom a key stage in familiarizing students with the quantum description of the microscopic world is to compare how the Bohr, De Broglie and Schrödinger models explain and interpret atomic stability. This perspective is probably much more effective than the so-called historical approach, which simply reviews the quasi-classical atomic models, as well as the phenomenology of atomic spectra, blackbody radiation and the photoelectric effect (Emigh, Passante, & Shaffer, 2013; Gil-Perez & Solbes, 1993; McKagan, Handley, Perkins, & Wieman, 2009). Our data confirm previously reported students’ difficulties (Jones, 1991; Kragh, 1992), suggesting that such an approach, while still largely adopted in introductory QM university courses, remains problematic.

Afterwards, the progress of students enrolled in the Bachelor in Physics is discontinuous, with a significant improvement in their performance after attending the QM1 course and a more gradual refinement after attending also the QM2 course. For instance, we found, as expected, that on average only physics students who had attended both the QM1 and QM2 courses correctly answered the advanced items about the wave function. This is consistent with the focus of undergraduate upper-level physics courses. However, some conceptual issues about the meaning of the wave function emerged from our data for students who had attended QM2. One reason is that even in the advanced course the phenomenological bases of QM are not clearly explained, and laboratory classes fail to bridge this gap. A second reason, which emerged from interviews conducted after the questionnaire administration, is that students were not confident enough in their own interpretation of some parts of the instrument text. This is an indirect clue of the critical role of language in QM teaching. However, this fact also points to a weakness of advanced courses, where students are heavily loaded with the mathematical aspects of the theory (Zhu & Singh, 2012a; 2012b; 2013) and may miss some experimental or practical implications, or, at least, the thorough development of a mature scientific language describing the subject. In other words, as found by Singh (2008), the highly abstract mathematical formalism of QM often overshadows the meaning of the physical quantities involved in calculations. A third reason, related to the previous one, could be the nature of the questions used in this study, which only addressed the conceptualization and interpretation of the theory, and focused neither on technical and formal aspects nor on problem-solving skills in QM contexts. The latter represent exactly the main focus of the advanced QM2 course. For this reason, it is not surprising that G3 scores better than G2 (1.06 vs. 0.73 logits), due to a longer exposure to key concepts of QM, but does not significantly outperform it. The implication of this finding is that the gap between conceptual and formal learning may persist along the path from introductory to advanced QM courses, as also previously observed (Singh & Marshman, 2015). As an example, in the QM2 course the Schrödinger equation is solved with a rigorous mathematical approach for the hydrogen atom, the harmonic oscillator and potential wells, but the interpretation of the atom’s stability in terms of wave behaviour remains somehow hidden.

Two further explanations can be given to account for the variable achievements of the involved groups. The first one concerns the difference between physics and non-physics majors. G1 (first-year physics majors) showed almost the same ability as the control group G-math (third-year mathematics majors; -0.17 vs. -0.01 logit). This result was partially unexpected, since the latter had attended MP, which is a typical introductory QM course at the university level, albeit for non-physics majors. However, as pointed out in the introduction, we are aware that students attending the Bachelor in Physics are highly motivated, and presumably score higher marks in physics already at high school, or even go beyond school syllabi on their own before enrolling in undergraduate courses (Ireson, 2000). The gap becomes more significant during their university studies, since G2 and G-math performed significantly differently (0.73 vs. -0.01 logit). To the latter concern, the different achievements of G2 and G-math suggest that introductory courses such as QM1 or MP have a very different impact on students, perhaps depending both on their initial background knowledge and on their motivation. However, it is worth mentioning that students attending a Bachelor in Mathematics may spontaneously be more attracted by formal aspects than by the phenomenological interpretation of the theory, thus missing the significance of the more advanced concepts related to the wave function.

The second explanation is related to the different teaching attitudes of the instructors, who might have emphasized the course contents in different ways, and to the diverse degree of student involvement in the sessions devoted to solving exercises (see Supplemental material C). We will deepen this issue in the next section.

**Implications and conclusions**

In this study we aimed to obtain evidence about students’ understanding of basic concepts in QM to support a first tentative effort in developing a LP in this essential area of physics content knowledge. More specifically, drawing on the available

literature, we initially identified three big ideas around which the knowledge of QM as a core concept in science is built at the undergraduate level. The collected data suggested revising the HLP and adopting a two-dimensional LP. We emphasize that both the hypothesized and the revised LP combine aspects of the chosen big ideas to explain different but related phenomena (e.g., the electron diffraction pattern and atomic stability) that are usually taught in undergraduate physics courses. The students’ progression across the levels was found to be effectively described by a hierarchical organization of levels through which all students progress towards a more sophisticated knowledge and a correct use of explanatory models (e.g., probability waves, operators, orbitals) in QM. This suggests that the two big ideas that emerged may frame future efforts in the field to develop more stable LPs in QM. Further research, however, is needed to establish more reliably whether the two big ideas are sufficient to describe students’ learning in QM and how students progress from one level to the next.

A second implication concerns instructional practices in QM at the university level. Despite the agreement with the revised LP, the ideal teaching paths envisaged by undergraduate teaching of QM (for instance, the QM1+QM2 path for physics majors) do not necessarily represent the most effective way to teach QM. For instance, QM1 relies on prior knowledge that students may actually lack, or requires students to adopt contrasting models and perspectives within the same course (see Supplemental material C for details). An emblematic example concerns Heisenberg’s uncertainty principle. As argued above, the DIF shown by the item that addressed this concept suggests very different ways in which it is taught in different courses (see Table 14). In this case, both at high school level and in the QM1 (MP) course, the principle is presented in its simplest formulation, which involves well-known classical observables (position and velocity) in a compact and evocative formula. Several textbooks (e.g., Amaldi, 2012) relate the principle to the historic but misleading idea of experimental limitations in precision and sensitivity and to an “observer” effect, which unavoidably affects the quantum system under investigation (Hadzidaki, 2008). In other words, the concept of “indeterminacy” is described in terms of classical physics and associated with a measurement sensitivity limit intrinsic to instrumentation rather than inherent to QM theory itself (Singh, 2008). Differently, university instructors of the QM2 course are used to teaching the uncertainty principle with a high degree of complexity. A similar argument can be put forth for the measurement process. While item 6 does not show DIF as item 5 does, the way in which the measurement process is taught in undergraduate courses greatly differs. For instance, in QM1 (MP), the

emphasis is on technical equipment, whereas in QM2 the emphasis is on its

mathematical interpretation in terms of Hermitian operators. As we pointed out above, such lack of coherence may be related to the different teaching attitudes of the university instructors who teach QM. A recent paper (Siddiqui & Singh, 2017) surveyed the attitudes and approaches to the teaching of QM of twelve undergraduate physics instructors. The results show rather fragmented opinions about: (i) the sequence of the contents to be taught, (ii) the focus on conceptual aspects of QM, (iii) the teaching of simplified models and interpretations of QM. However, interestingly enough, the instructors shared common views about the goals of QM courses and about the lecture-based approach, the latter being considered the only one suitable for the teaching of QM. Hence, it is reasonable to assume that, also in our study, the instructors of the three courses, while sharing the same lecture-based approach, may hold slightly different views about which contents, conceptual aspects and models must be taught in undergraduate courses about QM. While it was beyond the scope of the present study to deepen the relationship

between students’ achievements and instructors’ attitudes and views about QM, our results suggest that more research is needed to identify which teaching contexts best support the significant transition from lower towards upper levels, and the extent to which the lecture-based approach of undergraduate QM courses can effectively foster meaningful student learning.

As a further implication, our findings suggest that the teaching of QM at high school and at university level should be better coordinated and sequenced to help students harmonize interconnected conceptual aspects. To this concern, we are currently investigating in greater detail the teaching and learning of QM in Italian high schools. In particular, we are reviewing textbooks and typical exercises through content analysis with experienced high school teachers involved in professional development courses. Preliminary analysis shows that the teaching of QM builds on some chemistry notions already taught during high school (e.g., orbitals, energy levels, ...) and is limited to a qualitative description of QM phenomena. Note that physics and chemistry are taught, in Italy, by teachers with very different academic backgrounds: physics is taught by mathematics or physics majors, while chemistry is taught by non-physics majors, such as biologists or chemists. Insights from this survey will likely reinforce the interpretation of the results of the present study. More research is also needed to

investigate whether innovative approaches, which emphasize the physical interpretation of the mathematical formalism used in QM (e.g., using polarizers as in Michelini et al., 2004) or blend conceptual discussions and interactive simulations (McKagan et al., 2008; Muller & Wiesner, 2002; Kohnle et al., 2014), can fruitfully help bridge the gap between conceptual and formal learning of QM.

Finally, concerning the research tool introduced here, classical test theory and Rasch analysis suggest that the questionnaire is an efficient and reliable instrument to investigate students' ideas about quantum mechanics. The questionnaire may be used at the end of secondary school and at university level, in addition to existing tools. Overall, no misfitting items were found, and most of the answer choices correspond to the predicted levels of the initial LP. Moreover, the questions discriminate well between low and high achievers, with persons' abilities well distributed across the questionnaire items' difficulties. Lastly, its light structure avoids time-consuming administrations. However, further efforts are needed to improve the generalisability of our results. Hence, we plan to administer an updated version of the questionnaire to a new sample, followed by in-depth interviews to validate our predictions about how students may progress through the levels of the revised LP. We will increase the number of items, adding more items for each of the LP's big ideas, with three aims: (i) to improve the alignment between the instrument and the LP levels; (ii) to obtain a more uniform distribution of the items across the difficulty scale; and (iii) to also investigate quantitative reasoning in QM. The updated version of the questionnaire will also include a revision of item 5, for which we detected potential differential item functioning (DIF).

We hope that such efforts will further contribute to clarifying how secondary school instruction and university courses actually support students in developing informed and coherent ideas about this cornerstone of scientific knowledge.

**Limitations of the study**

From what has been discussed above, we stress a few potential limitations of the study. First, we chose not to interview the instructors of the MP-QM1-QM2 courses with a formal protocol about their didactic goals and adopted approach. Such additional information about the teaching context might have provided further insight into the different achievements of the involved students. A second limitation concerns the low number of items in the questionnaire, which we will revise and improve in a forthcoming paper. Finally, the study is limited by the choice of a local sample, which does not allow generalising the results to a national context. We plan to administer the revised questionnaire to physics students of other universities who attended similar courses in quantum mechanics.

**References**

Alonzo, A. C., Robinson, A., Christensen, J., & Lee, M. (2017). Developing learning progressions for momentum and mechanical energy: Insights for instruction. Paper presented at the annual conference of NARST, San Antonio, TX.

Amaldi, U. (2012). L'Amaldi per i licei scientifici. Bologna: Zanichelli.

Ayene, M., Kriek, J., & Damtie, B. (2011). Wave-particle duality and uncertainty principle: Phenomenographic categories of description of tertiary physics students' depictions. Physical Review Special Topics - Physics Education Research, 7, 020113.

Baily, C., & Finkelstein, N. (2010). Teaching and understanding of quantum interpretations in modern physics courses. Physical Review Special Topics - Physics Education Research, 6, 010101.

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). New York: Psychology Press.

Boone, W. J., Staver, J. R., & Yale, M. S. (2015). Rasch analysis in the human sciences. Dordrecht: Springer.

Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33–63.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Corcoran, T. B., Mosher, F. A., & Rogat, A. D. (2009). Learning progressions in science: An evidence-based approach to reform. Philadelphia: Consortium for Policy Research in Education.

Driver, R. (1994). Making sense of secondary science: Research into children's ideas. New York: Routledge.

Duncan, P. W., Bode, R., Lai, S. M., & Perera, S. (2003). Rasch analysis of a new stroke-specific outcome scale: The Stroke Impact Scale. Archives of Physical Medicine and Rehabilitation, 84(7), 950-963.

Duncan, R. G., & Rivet, A. E. (2013). Science learning progressions. Science, 339, 396-397.

Duncan, R. G., & Hmelo-Silver, C. E. (2009). Learning progressions: Aligning curriculum, instruction, and assessment. Journal of Research in Science Teaching, 46(6), 606-609.

Duschl, R., Maeng, S., & Sezen, A. (2011). Learning progressions and teaching sequences: A review and analysis. Studies in Science Education, 47, 123-182.

Emigh, P. J., Passante, G., & Shaffer, P. S. (2013). Student understanding of blackbody radiation and its application to everyday objects. Paper presented at the Physics Education Research Conference, Portland, OR, July 17-18.

Fischler, H., & Lichtfeldt, M. (1992). Modern physics and students' conceptions. International Journal of Science Education, 14, 181-190.

Foster, J., & Wiser, M. (2012). The potential of learning progression research to inform the design of state science standards. In A. C. Alonzo & A. W. Gotwals (Eds.), Learning progressions in science (pp. 435-460). Boston: Sense Publishers.

Furtak, E. M., Morrison, D., & Kroog, H. (2014). Investigating the link between learning progressions and classroom assessment. Science Education, 98, 640-673.

Gil-Perez, D., & Solbes, J. (1993). The introduction of modern physics: Overcoming a deformed vision of science. International Journal of Science Education, 15(3), 255-260.

Greca, I. M., & Freire, O. (2003). Does an emphasis on the concept of quantum states enhance students' understanding of quantum mechanics? Science & Education, 12, 541-557.

Hadenfeldt, J. C., Neumann, K., Bernholt, S., Liu, X., & Parchmann, I. (2016). Students' progression in understanding the matter concept. Journal of Research in Science Teaching, 53(5), 683-708.

Hadzidaki, P. (2008). The Heisenberg microscope: A powerful instructional tool for promoting meta-cognitive and meta-scientific thinking on quantum mechanics and the nature of science. Science & Education, 17, 613-639.

Ireson, G. (2000). The quantum understanding of pre-university physics students,