Computers/Programming Language

ch6-2. Data Type

emzei 2012. 3. 19. 14:16

▷ Array types : an homogenous aggregate of data elements in which the individual element is identified 

  by its position in the aggregate , relative to the first

♠ Design issue

  - subscript의 자료형?

  - subscript의 범위? 

(c같은 경우는 0부터 시작... 1부터 시작하는 언어도 있음 )

  - array allocation은 언제 해야할지?

  - 얼마나 많은 subscript가 허용되야 할까?

(a[10][5] - 배열의 배열, a[10,5] - 2차원 배열 in pascal, ada)

  - array 초기화?

  - 어떤 종류의 slice를 허용할지?


♠ Array 와 indexes

  - array_name(index_list) → element

  - 두개의 타입

> the element type

> type of subscripts

    * subrange of integer

    * Pascal, Ada : boolean, character, enumeration 허용


♠ Subscript bindings and Array categories

♤ Static Array 아예고정된크기

  > subscript의 범위 : statically bound

  > storage allocation 또한 static

  > ex. FORTRAN77

- all subscript : integer

- subscript range are statically bound

- all storage is statically allocated

  > advantage : efficiency : no dynamic allocation/deallocation


♤ Fixed stack-dynamic array C와 같은 경우

  > subscript의 범위 : statically bound

  > storage allocation : done at declaration elaboration time during execution

  > ex. Pascal Procedure / C functions

  > advantage : space efficiency


---------------------------------------------------------------------------------------

dynamic array에서는 메모리가 어떻게 잡히는지에 대해 ★

♤ Stack-dynamic array ~ 생성하고 없애고 ~ 호출할때마다 개별적인 기능으로 

  > subscript range : dynamically bound

  > storage allocation : dynamic but remain fixed during the lifetime of variable 

( 동적으로 할당되지만 변수의 생명주기동안은 고정 )

  > ex. Ada - flexibility

GET (LIST-LEN)

declare

LIST: array(1..LIST_LEN) of INTEGER;

begin

...

End 

  > advantage : array가 사용되기 전까지는 크기를 알 필요가 없다


♤ Heap-dynamic array

  subscript range : dynamically bound

  storage allocation : dynamic and can change any number of times during the array's lifetime 

( 동적으로 할당되고 생명주기내에서는 몇번이고 변경가능)

  > ex. FORTRAN 90

INTEGER, ALLOCATABLE, ARRAY(:,:) ::MAT 

ALLOCATE(MAT(10,NUMBER_OF_COLS)

DEALLOCATE(MAT)

  > ex. C, C++

     * array is the pointer to some collection of memory cells

     * for C : malloc, free

     * for C++ : new, delete


♠ number of subscripts

  - FORTRAN Ⅰ : 3

  - FORTRAN Ⅳ : up to 7

  - OTHERS : NO LIMITS (n차원)

cf. C : only one subscript, but arrays can have array elements

=> orthogonality

ex. int mat[5][4]


♠ Array initialization

  - Pascal, Modula-2 : not allowed

  - FORTRAN - DATA statement

INTEGER LIST(3)

DATA LIST/0,5,5/

  - C, C++

int list[] = {4,5,7,83} // initialization

  - Ada

     * by listing in the order

LIST:array(1..5) of INTEGER := (1,0,4,0,0)

     * by directly assigning values to an index using =>

LIST:array(1..5) of INTEGER := (1=>3,3=>4,others=>0)


♠ Array Operation

  - assignment (배열 자체를 copy하는 것은 불가하다. loop 이용해야함. 반면에 struct는 copy 가능)

  - catenation - single dimension array

  - relational operator - equality

  - +

  - matrix multiplication, transpose, vector dot product

  - reverse, column, row ...


♠ Slices

  - slice of an array : some substructure of that array. Not a new data type but only a mechanism 

for referencing part of an array.

    Design issue

     - the syntax of specifying a reference to a particular slice

ex. A(I , *) = B(I, *)


♠ Implementation of Array Types

  - array element로 접근할 수 있게 해주는 코드는 반드시 complie time에 생성

  - runtime 때는, 이 코드는 element를 주소로 가질 수 있게 반드시 실행

  - 구현의 key point : access to array element, why?


♠ Access function for Single-dimensional array

  - No way to precompute the address to be accessed by a reference such as 

     list[k]

  - A single dimensional array is a list of adjacent memory cells

ex. address(list[k]) = address(list[1])+(k-1)*element_size

  - The generalization of this access function for arbitrary lower bound

* list[LB..UB], E = element size

* address(list[K]) = (address(list[LB])) + ((k-LB)*E) //시작주소+증가량


  - Compile-time descriptor

  * array, element type, index type, index lower bound, index upper bound address 

(등의 정보 - 사용자가 올바로 썼는지 check하기 위해 descriptor로 check. assembly 언어 번역에 필요한 

(충분한) 정보임)


  - Run-time descriptor

( continuous 하게 붙어있거나, pointer 이용할 수도 있 고)

* case 1 : no runtime checking of index range and the attributes are all static 

(C언어는 runtime descriptor 없음)

- only the access funciton is required during execution

* case 2 : runtime checking of index range is don

- those index ranges may need to be stored in a run-time descriptor

* case 3 : any of the descriptor entries are dynamically bound

 - those part of the descriptor must be maintained at run-time


♠ Multidimensional Array

  - 2 이상의 차원을 갖는 데이터 타입의 객체는 반드시 1차원 메모리로 맵핑되야 함

* row major order (대부분의 언어)

* column major order (FORTRAN, why?)

  - 2 차원 배열의 접근 함수

  - Descriptor : complie-time / run-time descriptor

(컴파일러가 갖고 있음. 차원이 많을수록 더 복잡해짐)






▷ Record and Union types :

 Record Type : possibly heterogeneous aggregate of data elements in which the individual elements 

are identifies by name


  ♤ Design issue

     - What is the syntatic form of reference to fields? ( . / of / in )

     - Are elliptical reference allowed?


  ♤ Definitions of Record

     - Record name & fields : fields are not referenced by index

     - COBOL : level number

     - Pascal, Ada : nesting record declaration

     - C, C++ : struct

     - FORTRAN : none


   ♤ References to Record Field

     - OF notation in COBOL

> field_name OF record_name 1 OF ... OF record_name n

> 예 . MIDDLE OF EMPLOY_NAME

     - Dot(.) notation : must others

> EMPLOY_NAME.MIDDLE

     - full qualified reference(생략ㄴㄴ), elliptical reference(생략ㅇㅋ)

> 생략ㄴㄴ 예. x.c.s != x.s

> 생략ㅇㅋ 예. x.c.s  = x.s

     - with clause in Pascal


  ♤ Operations on Records

     - Assignment

: common record operation

: the types of the two sides must be identical

     - MOVE CORRESPONDING in COBOL

     - Equality or inequality in Ada (같은지 여부 검사 가능)


  ♤ Implementation

     - the fields are stored in adjacent memory location

     - offset, relative to the beginning of this record, is associated with each field

     - compile time descriptor



♠ Union Type : may store different type values at different times during program execution

  ♤ Design issue

     : should type checking be required? (type check는 runtime때만)

     : should unions be embedded in record?

  ♤ C, C++

     - free union types 포함

     - a separate category of types and are not part of the record type

     - no tags of any sort and and no type checking of references is done

  ♤ Examples

     - FORTRAN union type

INTEGER X

REAL Y

EQUIVALENCE (X,Y)

-> X and Y are to cohabit the same storage location

     - Pascal union type

- record variant

- discriminated unions within a record structure

- the variable figure consists of the tag and sufficient memory for its largest variant

- problems with pascal union (문제점)

  > the user program can change the tag without making a corresponding change in the variant

- inconsistency ( tag - variant )

- all type error can not be detected

- the programmer can simply omit the tag from the variant record ; free union

- neither the user nor the system had any way to determine the current variant type

     - Ada union type

- constrained variant variable

- perfectly safe

- type can change by assignment of a whole record, including the discriminant




▷ Set types :

: can store unordered collections of distinct values from some ordinal type  

base type )

 Design issue

  - what should be the maximum number of elements in a set base type?


♠ Operation

  - membership test, inclusion test

  - equality test

  - union, intersection, difference (합집합/교집합/차집합)


♠ Examples : sets in Pascal

type colors = (red, green, blue, black, white, yellow);

colorset = set of colors;

var set1, set2 : colorset;

begin

set1 := [red, blue, yellow, white];

set2 := [black, blue];


♠ Implementation of Set Types

  - A bit string of memory

  - set operation

> union : bit-wise OR

> intersection : bit-wise AND

> membership : bit-wise AND


♠ set operations in C

typedef enum {red, green, blue, black, ...} COLORSET;

set1 = (red|green|blue);

set2 = set1|(green|black);

set1 = set1 & set2;





▷ Pointer and Reference types :

: the variable of pointer type has a range of values that consists of memory address and a special value, nil

 (가리키고 있는 곳의 값을 표현. 

   고급언어 - 가리키는곳의 타입을 명시 (ex. int *p)

  또는 가리키는 곳의 타입 명시 않음 (ex. pointer p) - low level에 가까운언어 )

♠ Usage

  - indirect addressing

  - dynamic storage management

heap

heap-dynamic variable

- variable without name is anonymous variable

♠ Design Issues

  - pointer 변수의 범위(scope)와 생명주기(lifetime)

  - lifetime of heap-dynamic ? user-control..allocation()/free()

  - pointer가 가리킬 수 있는 객체의 타입이 제한적인가?

  - 포인터는 dynamic storage management에? indirect addressing에? 아니면 둘다?

  - 언어는 pointer 형을 지원? reference 형을 지원? 둘다 지원?


♠ Pointer Operation

  ♤ Assignment

    - sets a pointer variable to the address of some object

  ♤ Dereferencing

    - reference to the value in the memory cell whose address is in the memory cell to witch the variable

 is bound


♠ Pointer Problem

   ♤ Dangling pointers

      : 포인터가 이미 deallocated된 heap-dynamic 변수의 주소를 갖고 있는 경우

      : type checking 문제

      : dangling pointers problem

-> an explicit deallocation for dynamic variable

   ♤ Lost Heap-dynamic variables

      : 할당된 dynamic 개체가 더이상 사용자 프로그램에 접근할 수 없지만 여전히 유용한 데이터를 갖고 있을 때. 

   ---> garbage

      : In case the creating a lost heap-dynamic variable

p1 = malloc(...);

- - -

p1 = malloc(...);


♠ Solutions to the Dangling Pointers

  ♤ Tombstone

     - 모든 dynamic 변수들은 tombstone이라는 특별한 공간을 포함

     - dynamic 변수가 deallocate되면 tombstone은 여전히 남아있지만 그 값은 nill이 됨

     -  문제점)

: tombstone의 공간은 절대 deallocate될 수 없음

: indirect addressing


  ♤ Locks-and-keys

     - UW-Pascal의 구현

     - Representation

* pointer value  ordered pair (key, address)

* Heap-dynamic variable : storage for the variable + a header cell for integer lock value

     - Operation

* when a dynamic variable is allocated, a lock value is created and places in the lock cell of dynamic 

     and the key cell of the pointer :: new

* every access to the dereference pointers compare key value and lock value

* if match, the access is legal. Otherwise a runtime error occurs.

* when a dynamic variable is deallocated, its lock value is cleared to an illegal lock value : dispose


♠ Pointers in Various Languages

   ♤ Pascal : new and function dispose

   ♤ Ada : access type

   ♤ C, C++ : *, &

   ♤ FORTRAN90 : INTEGER, POINTER::INT_PTR


♠ Heap Management

  - very complex run-time process

  - heap storage is allocated/deallocated

* single size

* variable size


   ♤ single size heap management

* allocation

    - available cells are linked using the pointer

* deallocation

   - reference counter ( eager approach )

: is done when inaccessible cells are created, incrementally

   - garbage collection ( lazy approach )

: when the list of available space becomes empty


   ♤ variable size heap management

* Single size management

* Garbage collection problems

    - initialization

    - marking processing

    - maintaining the list of available 

 




▷ Type checking 

: the activity of ensuring that the operands of an operator are of compatible types

◎ compatible type : either legal for the operator or  allowed under language rules to be implicitly converted 

to a legal type

◎ Coercion 

예) 정수 + 실수 => 실수가 정수형으로 되어 연산

◎ Type error

◎ static type checking / dynamic type checking

     (컴파일 시)                   (실행 중)







 Strong Typing 

: a programming language is strongly typed if type errors are always detected.


◎ this requires that the types of all operands can be determined, either at compile time or at run time

◎ FORTRAN, Modula2, C, C++ : NO

◎ Pascal, Ada : nearly

◎ ML, Java : yes

◎ why?





▷ Type Equivalence (Compatibility) 

struct { int a, int b} x

struct { int a, int b} y

    struct { int c, int d } z


◎ Name equivalence

ex. x = y, x != z (이름이 같아야 함)

◎ Structural equivalence

ex. x = y = x (이름이 달라도 알맹이가 같음)