The Algebra Layer implements the main relational algebra constructs such as select, project, etc. The Schema Layer implements routines that define and manipulate the database schema. Utility routines for manipulating relations are also included in this layer. The Physical Layer implements routines which directly manipulate files and records through the operating system interface. The meta-data information is stored in catalogs.
You will implement MINIREL bottom-up, that is, starting with the Physical Layer, then writing the Schema Layer, and finally, the Algebra Layer. We will provide a front-end to translate user dbms commands into calls to MINIREL. This front-end will parse the dbms commands and call routines of your Algebra Layer and Schema Layer to process each command. During program development, you should use the front-end for writing queries to test the correctness and robustness of your programs.
Physical Layer: | complete by September 28 |
Schema Layer: | complete by October 12 |
Algebra Layer: | complete by October 26 |
Index Routines: | complete by November 16 |
Testing: | complete by November 23 |
You are expected to hand in a fully commented listing for each layer of your program. A documentation template and a sample documentation is available in the course account. Similarly, sample dbms command files are available for testing the correctness and robustness of your programs. After project completion, each team will have to provide an online demo.
The front-end verifies the syntactic correctness of user requests. However, the front-end has no knowledge about what relations exist, what attributes the relations contain, or the types of their attributes. Therefore, your routines must detect and deal with the possibility that incoming requests may make incorrect references to relations and attributes. It is up to your routines to do all of the semantic error detection and handling. Ideally, you should detect all semantic errors before you make any changes to the database, as undoing such changes will be harder than preventing them in the first place. Also, you should enforce meta-data security, that is, not allow users to directly alter the catalogs. Finally, your routines should return an error code and print out meaningful error messages when errors do occur.
RELATION_NAME | - | string |
ATTR_NAME | - | string |
DBNAME | - | string |
FORMAT | - | string |
FILENAME | - | string |
VALUE | - | quoted string or number |
OP | - | comparison operators like ![]() ![]() |
where
OP | CONSTANT |
![]() |
501 |
![]() |
502 |
![]() |
503 |
![]() |
504 |
![]() |
505 |
![]() |
506 |
argv[0] = ``select''
argv[1] = result relation
argv[2] = source relation
argv[3] = attribute name
argv[4] = operator
argv[5] = value
argv[argc] = NIL
This routine implements the relational selection command. It first
creates a result relation with the same set of attributes as the source
relation. It then applies the selection criterion to the records in
the source relation, and places the selected records in the result
relation. In MINIREL, only one selection criterion is handled at a
time. The selection criterion itself is specified via the name of an
attribute, a comparison operator, and a value. The supported operators
are ,
,
,
,
,
and
; as described in Section 2, the operators are passed
to MINIREL by the front-end as integers.
argv[0] = ``project''
argv[1] = result relation
argv[2] = source relation
argv[3] = attribute name 1
argv[4] = attribute name 2
. . . . . .
argv[argc-1] = attribute name N
argv[argc] = NIL
This routine implements the relational projection command. It first creates the result relation with the attributes given in argv, then performs the projection. Attributes are matched up by name, not by position, and the order of attributes in the result relation is the order specified in argv. Since eliminating attributes may introduce duplicate tuples, your implementation must ensure that the duplicates are removed from the output relation.
argv[0] = ``join''
argv[1] = result relation
argv[2] = source relation 1
argv[3] = attribute name 1
argv[4] = source relation 2
argv[5] = attribute name 2
argv[argc] = NIL
This routine implements the relational join command. For simplicity, MINIREL only implements natural join, not the full join command. This routine first creates the specified result relation, then performs a join on the two source relations based on either the nested loops technique or the index join technique.
Since natural join is implemented, the join attribute appears only once in the result relation. The join attribute from the first source relation is used as the result's join attribute. The result relation's attribute order is: source 1 attributes (including the join attribute) followed by source 2 attributes. While joins on unlike field types (e.g., an integer field and a string field) are errors, strings of different lengths are accepted as being of the same type for the purpose of performing joins. Finally, you must deal with the possibility that the two source relations may have one or more attributes with the same name -- devise an attribute renaming scheme to resolve such conflicts. Note, however, that the join attribute should not be renamed.
argv[0] = ``insert''
argv[1] = relation name
argv[2] = attribute name 1
argv[3] = attribute value 1
. . . . . .
argv[argc-2] = attribute name N
argv[argc-1] = attribute value N
argv[argc] = NIL
This routine implements the relational insert command, adding the tuple given in argv to the named relation. Since the input values are in ASCII, the appropriate conversion is performed before entering into the database. Attributes are matched up by name, not by their order. The length of incoming string-type data is regulated by padding with NULLs if too short, or truncating if too long. Make sure that adding the tuple does not introduce a duplicate. If the attribute set is incorrectly specified, an error message should be generated.
argv[0] = ``delete''
argv[1] = source relation
argv[2] = attribute name
argv[3] = operator
argv[4] = value
argv[argc] = NIL
This routine implements the relational delete command. It is similar to Select, but it deletes the specified records from the source relation instead of placing them in a result relation. You should be able to implement this routine quickly by copying your Select code and then making a few changes to it.
Note: If appropriate indexes exist, MINIREL should use them to process the above operations efficiently.
In the Schema Layer, MINIREL implements routines that form part of the logical layer and deal with schema definition and manipulation. Utility routines for printing the contents of relations, loading relations from binary data files, and building indexes on relations, are also included in this layer. This section describes each of the routines in the Schema Layer. Each routine is called by the front-end or by Algebra Layer routines with (argc, argv) parameters (as in the Algebra Layer).
argv[0] = ``createdb''
argv[1] = database name
argv[argc] = NIL
This routine creates a new database. It creates the relation and attribute catalogs and loads them with the appropriate initial information. You should check that the given database directory does not already exist.
argv[0] = ``destroydb''
argv[1] = database name
argv[argc] = NIL
This routine first checks that the database has been closed. If it is not closed, it closes it and then destroys the database.
Note: Only the files and directories related to the specified database should be destroyed, without affecting other databases.
argv[0] = ``opendb''
argv[1] = database name
argv[argc] = NIL
This routine opens a database for subsequent use. It changes the working directory to be the database directory, opens the catalogs and initializes the various global data structures that are used by other MINIREL routines.
argv[0] = ``closedb''
argv[argc] = NIL
This routine closes the currently opened database and changes the working directory to the original directory from which MINIREL was invoked.
argv[0] = ``quit''
argv[argc] = NIL
This routine first checks that the database has been closed. If it is not closed, it closes it and then terminates the programs.
Note: For CreateDB, DestroyDB, and OpenDB, assume that the full path name is given in the database name parameter. At the end of the query interaction, your program should return to the directory from which you invoked MINIREL.
argv[0] = ``create''
argv[1] = relation name
argv[2] = first attribute name
argv[3] = first attribute format
. . . . . .
argv[argc-2] = Nth attribute name
argv[argc-1] = Nth attribute format
argv[argc] = NIL
This routine creates a new relation with the specified name and attributes. Relation names and attribute names should not be more than 20 characters long (including the NULL that terminates them), and you must enforce this. The format for each attribute is one of three -- ``i'' for integer, ``f'' for floating point, or ``sN'' for character string of maximum length N (where N is a string of digits). The upper bound on N is 50. It is an error to create a relation that already exists.
The routine creates a Unix file for the relation and makes appropriate entries in the catalogs. It adds a record to the relation catalog describing the new relation, and adds records to the attribute catalog for all the attributes of the new relation.
argv[0] = ``destroy''
argv[1] = relation name
argv[argc] = NIL
This routine removes the specified relation from the database. Information about the relation is removed from the catalogs as well, and the associated Unix file is deleted.
argv[0] = ``load''
argv[1] = relation name
argv[2] = data file name
argv[argc] = NIL
This routine loads the specified relation, which must already have been created, with data from the specified file (the full path name of the file will be provided). Data files are in binary format as a sequence of data records -- the difference between a data file and a relation is that the data file has nothing in it except the data itself, whereas a relation has additional page structure (for example, the slot maps). The sizes and types of the fields of the incoming records are ascertained by consulting the catalog information for the relation being loaded. It is an error to attempt to load data into a relation that is not empty. You may assume that the input file will not contain duplicate tuples.
argv[0] = ``print''
argv[1] = relation name
argv[argc] = NIL
This routine prints out the contents of the specified relation in the form of a table. The attribute values must be printed in a nice formatted way.
argv[0] = ``buildindex''
argv[1] = relation name
argv[2] = attribute name
argv[argc] = NIL
This routine creates a B+ tree index on the given attribute for the specified relation. To simplify the implementation, it is required that the desired indexes on a relation be created before any tuples are loaded. That is, the BuildIndex command should appear before the corresponding Load command.
argv[0] = ``dropindex''
argv[1] = relation name
argv[2] = attribute name
argv[argc] = NIL
This routine destoys the index on the given attribute for the specified relation. If no attribute name is specified, the routine destroys all the indexes existing on the relation.
relName | -- | Name of the relation described by this catalog record |
recLength | -- | Length of the relation's records, in bytes |
recsPerPg | -- | Number of records per page for the relation |
numAttrs | -- | Number of attributes for the relation |
numRecs | -- | Number of records currently in the relation |
numPgs | -- | Number of pages in the relation |
Since the relation catalog is itself a relation, it should contain, as its first record, a description of itself! The relation catalog is called relcat. The second record in the relation catalog is for the attribute catalog, which is called attrcat.
relcatRid | -- | RID for record holding this relation catalog record |
relFile | -- | File descriptor for the open relation |
dirty | -- | Set to true if the corresponding catalog record on disk becomes outdated |
attrList | -- | Linked list of attribute descriptors |
When a relation is opened, its relation catalog record is copied into the cache for easy access. The relcatRid for the entry is saved to make it easy to update (if necessary) the relation catalog record on disk when the relation is subsequently closed. The dirty field is set to true if the cached version of the entry changes, and it is false until then. If this value is true when the relation is closed, the relation catalog record on disk is updated before the cache entry is reused. The relFile field is used to store the Unix file descriptor while the file is open. The role of the last field, attrList, is explained in Section 5.2.4.
After a relation is opened, MINIREL routines typically use the array index into the cache for the relation, called its relation number, for quick access of information about the relation. You are free to add, to the cache record structure, any additional information that will result in performance enhancements.
offset | -- | offset of attribute within record |
length | -- | length of attribute |
type | -- | attribute type: ``i'', ``f'', or ``s'' |
attrName | -- | name of attribute |
relName | -- | name of relation |
You are free to use a few additional buffers if needed in the implementation. However, these should be limited in number - in particular, you cannot assume that entire relations or entire indexes can be brought into memory.
The structure of the index files and their manipulation routines are to be designed by you. The detailed description given in this document of the design methodology for the other parts of MINIREL will help you in this process.
All error messages are printed by one central error routine, the ErrorMsgs routine. This routine is implemented as part of the Physical Layer and has the following minimal interface:
ErrorMsgs(errorNum, printFlag)
int errorNum - Error message number.
int printFlag - Print out the error message if non-zero.
All errors should have numbers, and these numbers should be declared as constants in a definition module that all of your modules import. The ErrorMsgs routine is called when leaving routines in which an error has occurred -- the call is of the form ``return(ErrorMsgs(...))''. If printFlag is true, the routine prints an error message and then returns the value errorNum; otherwise it just returns errorNum.
The error messages must be helpful, that is, they should convey to the user not only what the problem is, but also suggest possible solutions for fixing the problem.
For program documentation, write up a one page summary which describes the project, the input/output interface, the optimizations that have been incorporated, etc. Also list what you have NOT implemented (for example, indexing).
The following files are available, with their locations as shown:
The frontend (parser) is MINIREL/frontend/FES.o . In order to use the parser, you need a main program. This is MINIREL/run/main.c . To create your final minirel executable, a sample makefile is available in MINIREL/run. If you invoke make minirel with this makefile, it links together the object files from each layer with the parser and the main program, and creates an executable called minirel. Your queries will be run on this executable. Note that MINIREL is a single Unix process.
Sample data files are in MINIREL/data/binary and sample query files are in MINIREL/query. To enable you to check that your Load routine is correctly reading in the binary data files, the ascii equivalents of these data files are provided in MINIREL/data/ascii.