• Non ci sono risultati.

Reflections on Trusting Trust

N/A
N/A
Protected

Academic year: 2021

Condividi "Reflections on Trusting Trust"

Copied!
3
0
0

Testo completo

(1)

TURING AWARD LECTURE

Reflections on Trusting Trust

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

KEN THOMPSON

INTRODUCTION

I thank the ACM for this award. I can't help but feel that I am receiving this honor for timing and serendip- ity as m u c h as technical merit. UNIX 1 swept into popu- larity with an i n d u s t r y - w i d e change from central main- frames to autonomous minis. I suspect that Daniel Bob- row [1] would be here instead of me if he could not afford a PDP-10 and had had to "settle" for a PDP-11.

Moreover, the c u r r e n t state of UNIX is the result of the labors of a large n u m b e r of people.

There is an old adage, "Dance with the one that brought you," which means that I should talk about UNIX. I have not w o r k e d on m a i n s t r e a m UNIX in m a n y years, yet I continue to get u n d e s e r v e d credit for the work of others. Therefore, I am not going to talk about UNIX, but I w a n t to thank everyone who has contrib- uted.

That brings me to Dennis Ritchie. O u r collaboration has been a thing of beauty. In the ten years that we have w o r k e d together, I can recall only one case of miscoordination of work. On that occasion, I discovered that we both had w r i t t e n the same 20-line assembly language program. I c o m p a r e d the sources and was as- t o u n d e d to find that t h e y m a t c h e d character-for-char- acter. The result of our w o r k together has b e e n far greater than the work that we each contributed.

I am a programmer. On m y 1040 form, that is w h a t I put down as m y occupation. As a programmer, I write

1 UNIX is a trademark of AT&T Bell Laboratories.

© 1 9 8 4 0001-0782/84/0800-0761 75¢

programs. I w o u l d like to present to you the cutest program I ever wrote. I will do this in three stages and try to bring it together at the end.

STAGE I

In college, before video games, we w o u l d amuse our- selves by posing programming exercises. One of the favorites was to write the shortest self-reproducing pro- gram. Since this is an exercise divorced from reality, the usual vehicle was FORTRAN. Actually, FORTRAN was the language of choice for the same reason that three-legged races are popular.

More precisely stated, the p r o b l e m is to write a source program that, w h e n compiled and executed, will produce as output an exact copy of its source. If you have never done this, I urge you to try it on y o u r own.

The discovery of how to do it is a revelation that far surpasses any benefit obtained by being told how to do it. The part about "shortest" was just an incentive to demonstrate skill and d e t e r m i n e a winner.

Figure 1 shows a self-reproducing program in the C 3 programming language. (The purist will note that the program is not precisely a self-reproducing program, but will produce a self-reproducing program.) This en- try is m u c h too large to win a prize, but it demonstrates the t e c h n i q u e and has two i m p o r t a n t properties that I need to complete m y story: 1) This program can be easily w r i t t e n by another program. 2) This program can contain an a r b i t r a r y a m o u n t of excess baggage that will be reproduced along with the m a i n algorithm. In the example, even the c o m m e n t is reproduced.

August 1984 Volume 27 Number 8 Communications of the ACM 781

(2)

Turing Award Lecture

char s[ ] = I t011

i ; i

" ~ l l l I ' V , ] ' ~

I'~t'] l p

(213 lines deleted) 0

1;

/,

• The string s is a

• representation of the body

• of this program from ' 0 '

• to the end.

,/

main( ) {

int i;

printf("char\ts[ ] = {kn");

for(i=0; s[i]; i + + )

printf("~t%d, \n", s[i]);

printf("%s", s);

I

Here are some simple transliterations to allow a non-C programmer to read this code.

= assignment

= = equal to .EQ.

!= not equal to .NE.

+ + increment

' x ' single character constant

"xxx" multiple character string

% d format to convert to decimal

%s format to convert to string kt tab character

kn newline character

F I G U R E 1.

STAGE II

The C compiler is w r i t t e n in C. What I am about to describe is one of m a n y "chicken and egg" problems that arise w h e n compilers are w r i t t e n in their own lan- guage. In this case, I will use a specific e x a m p l e from the C compiler.

C allows a string construct to specify an initialized character array. The i n d i v i d u a l characters in the string can be escaped to represent u n p r i n t a b l e characters• For example,

"Hello w o r l d \ n "

represents a string with the c h a r a c t e r "\n," representing the new line character.

Figure 2.1 is an idealization of the code in the C compiler that interprets the c h a r a c t e r escape sequence.

This is an amazing piece of code. It "knows" in a com- pletely portable w a y w h a t c h a r a c t e r code is compiled for a new line in any c h a r a c t e r set. The act of knowing

then allows it to recompile itself, thus perpetuating the knowledge.

Suppose we wish to alter the C compiler to include the sequence "\v" to represent the vertical tab charac- ter. The extension to Figure 2.1 is obvious and is pre- sented in Figure 2.2. We then recompile the C com- piler, but we get a diagnostic. Obviously, since the bi- nary version of the compiler does not know about "\v,"

the source is not legal C. We must "train" the compiler.

After it "knows" w h a t "\v" means, then our new change will become legal C. We look up on an ASCII chart that a vertical tab is decimal 11. We alter our source to look like Figure 2.3. Now the old compiler accepts the new source. We install the resulting b i n a r y as the new official C compiler and now we can write the portable version the w a y we had it in Figure 2.2.

This is a deep concept. It is as close to a "learning"

program as I have seen. You simply tell it once, t h e n you can use this self-referencing definition.

STAGE III

Again, in the C compiler, Figure 3.1 represents the high level control of the C compiler w h e r e the routine "com-

c = next( );

if(c != ' \ V ) return(c);

c = next( );

if(c = = ' \ V ) return('\\');

if(c = = ' n ' ) return('kn ');

F I G U R E 2 . 2 .

c = next( );

if(c ~= ' \ v ) return(c);

c = next( );

if(c = = ' \ V ) return('kV);

if(c = = 'n') retum('kn');

if(c = = ' v ' ) return('\v');

F I G U R E 2 . 1 .

c = next( );

if(c != ' \ V ) return(c);

c = next( );

if(c == '\v) r e t u r n ( ' \ \ ' ) ; if(c = = ' n ' )

return('\ n');

if(c = = ' v ' ) return(11 );

F I G U R E 2 . 3 .

762 Communications of the ACM August 1984 Volume 27 Number 8

(3)

Turing Award Lecture

pile" is called to compile the next line of source. Figure 3.2 shows a simple modification to the compiler that will deliberately miscompile source w h e n e v e r a partic- ular p a t t e r n is matched. If this were not deliberate, it would be called a compiler "bug." Since it is deliberate, it should be called a "Trojan horse."

The actual bug I planted in the compiler w o u l d m a t c h code in the UNIX "login" command. The re- p l a c e m e n t code w o u l d miscompile the login c o m m a n d so that it would accept either the i n t e n d e d e n c r y p t e d password or a particular k n o w n password. Thus if this code were installed in b i n a r y and the b i n a r y were used to compile the login command, I could log into that system as any user.

Such blatant code w o u l d not go u n d e t e c t e d for long.

Even the most casual perusal of the source of the C compiler w o u l d raise suspicions.

The final step is r e p r e s e n t e d in Figure 3.3. This sim- ply adds a second Trojan horse to the one that a l r e a d y exists. The second pattern is a i m e d at the C compiler.

The r e p l a c e m e n t code is a Stage I self-reproducing pro- gram that inserts both Trojan horses into the compiler.

This requires a learning phase as in the Stage II e x a m - ple. First we compile the modified source with the nor- mal C compiler to produce a bugged binary. We install this b i n a r y as the official C. We can now remove the bugs from the source of the compiler and the new bi- nary will reinsert the bugs w h e n e v e r it is compiled. Of course, the login c o m m a n d will r e m a i n bugged w i t h no trace in source anywhere.

compile(s) char ,s;

I

FIGURE 3.1.

compile(s) char ,s;

I if(match(s, "pattern")) { compUe("bug");

return;

J

FIGURE 3.2.

compile(s) char ,s;

if(match(s, "pattern1 ")) { compile ('bug1 ");

return;

I

if(match(s, =pattern 2")) I compile ('bug 2");

return;

J

FIGURE 3.3.

MORAL

The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from com- panies that employ people like me.) No a m o u n t of source-level verification or scrutiny will protect you from using u n t r u s t e d code. In d e m o n s t r a t i n g the possi- bility of this k i n d of attack, I p i c k e d on the C compiler.

I could have picked on any p r o g r a m - h a n d l i n g program such as an assembler, a loader, or even h a r d w a r e mi- crocode. As the level of program gets lower, these bugs will be h a r d e r and h a r d e r to detect. A well-installed microcode bug will be almost impossible to detect.

After trying to convince you that I cannot be trusted, I wish to moralize. I would like to criticize the press in its handling of the "hackers," the 414 gang, the Dalton gang, etc. The acts performed by these kids are vandal- ism at best and probably trespass and theft at worst. It is only the i n a d e q u a c y of the c r i m i n a l code that saves the hackers from very serious prosecution. The compa- nies that are v u l n e r a b l e to this activity, (and most large companies are very vulnerable) are pressing h a r d to update the criminal code. U n a u t h o r i z e d access to com- puter systems is a l r e a d y a serious crime in a few states and is c u r r e n t l y being addressed in m a n y more state legislatures as well as Congress.

There is an explosive situation brewing. On the one hand, the press, television, and movies m a k e heros of vandals by calling t h e m whiz kids. On the other hand, the acts performed by these kids will soon be punisha- ble by years in prison.

I have w a t c h e d kids testifying before Congress. It is clear that they are c o m p l e t e l y u n a w a r e of the serious- ness of theft acts. There is obviously a cultural gap. The act of breaking into a c o m p u t e r system has to have t h e same social stigma as breaking into a neighbor's house.

It should not m a t t e r that the neighbor's door is un- locked. The press must learn that misguided use of a computer is no more amazing than d r u n k driving of an automobile.

Acknowledgment. I first read of the possibility of such a Trojan horse in an Air Force critique [4] of the secu- rity of an early i m p l e m e n t a t i o n of Multics. I cannot find a more specific reference to this document. I w o u l d appreciate it if a n y o n e who can supply this reference would let me know.

REFERENCES

1, Bobrow, D.G., Burchfiel, J.D., Murphy, D.L., and Tomlinson, R.S.

TENEX, a paged time-sharing system for the PDP-10. Commun. ACM 15, 3 {Mar. 1972}, 135-143.

2. Kernighan, B.W., and Ritchie, D.M. The C Programming Language.

Prentice-Hall, Englewood Cliffs, N.J., 1978.

3. Ritchie, D.M., and Thompson, K. The UNIX time-sharing system.

Commun. ACM 17, 0uly 1974), 365-375.

4. Unknown Air Force Document.

Author's Present Address: Ken Thompson, AT&T Bell Laboratories, Room 2C-519, 600 Mountain Ave., Murray Hill, NJ 07974.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commer- cial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

August 1984 Volume 27 Number 8 Communications of the ACM 763

Riferimenti

Documenti correlati

By correlating local lattice tilts with the obtained crystal A symmetric reflection is typically used for the determinashape in three-dimensional slices, the distribution of

Reproduced with permission of the copyright owner.. Further reproduction prohibited

The main basic data needed for the preparation of the GHG inventory are energy statistics published by the Ministry of Economic Development Activities (MSE) in the National

The main national fire protection industries (Gielle and Gastec Vesta), which were involved also for the Survey about HFCs alternative substances with low GWP,

The Regions with the highest number of Industry and services local units located in high and very high landslide hazard zones are Campania, Toscana, Emilia-Romagna and Lazio.

The 2003 IPCC Good Practice Guidance for LULUCF has been entirely applied for all the categories of this sector as detailed data were available from national statistics and

5) giving greater importance to EMAS in the public funding mechanisms compared to ISO 14001(value 4,51). As the diffusion of EMAS has been supported by various measures over time,

Specific QA/QC procedures and different verification activities implemented thoroughly in the current inventory compilation are figured out in the annual QA/QC plans (ISPRA, 2016