Issues in Large Scale Porting
This page contains slides from lectures given on porting at
various locations including the 1991 Data Processing Institute.
Introduction
The presentation is composed of:
- Introduction
- Presentation's Objectives
- The Porting Problem
- The Universe of Discourse
- Specific Problems to be Resolved
- Porting Goals
- Strategy and Mechanisms
- Specific Problem Solving
- Dr. Tuna's Porting Nostrums
- Conclusions
Presentation's Objectives
Present my approach to the porting problem w.r.t.:
- philosophy
- strategy
- specific solutions
- long term software hygiene
The principal objective is to instil one adage:
Think before you port!
Caveats
Applying my solution to your problem may
be difficult!
My solution is extensive
and geared to large software systems.
Through familiarity, I have bred the expected feelings
towards the suppliers,
but I will not air these feelings publicly,
as the suppliers do seem to strive for a uniform level of quality.
The Porting Problem
Historically:
-
In the beginning (1975): one system, one machine,
one compiler, well defined tools, limited expectations,
everyone had source, small community
(Duke did distribution using rk05 dump tape).
-
At first split PWB vs. V7 (1978): virtually same API, minor
variations in Cc, still same hardware.
-
At diaspora: 4.1bsd, System III, 68k systems, xenix (1981):
large G.C.D., but frayed at edges (tty controller, header file
migration and style).
Real change in emphasis (accounting vs. tools).
-
Current: many unqualified suppliers, dubious Q.A., political
pushes, competing standards, commercial pollution,
closed systems, huge range of hardware, software, and quality.
Why Port?
One ports:
-
to meet customer demands and increase market
-
to facilitate use of appropriate (i.e., cost or performance effective)
kit for task
-
to make use of available kit
-
to improve the quality (porting to and
testing on multiple platforms finds bugs)
-
...
The Universe of Discourse
- Not concerned with net.sources --
We are dealing with large scale (greater than 100k lines),
costly (real people paid to create and maintain it) software.
- Dealing with all possible reasonable target platforms --
Assume
4.[234]bsd, OSF, Unix5.[01234], and various dichotomous hybrids.
Do not unconsciously exclude 16/24 bit words or word address machines.
- Do not assume nor demand conformance to any standard --
They are so often sub-standard.
One can look to the standards for hints, but don't
trust suppliers' claims.
The more loudly a supplier announces adherence to a standard,
the more likely it is that its conformance is suspect.
Specific Problems to be Resolved
Problems with porting may be partitioned into
the following classifications.
Variations in:
- system architecture
-- byte order, word size, pointer size, word alignment, and so on
- the tool interfaces and/or semantics
- libraries
- headers
- environment
Variations in performance, cost, robustness, reliability,
availability, support, and the honesty of supplier
should be considered, but are not relevant to this
discussion.
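Architectural variations such as byte order and word size can be probed rather than assumed. The following is a minimal sketch, not taken from the lectures; the `probe_*` function names are hypothetical, and it assumes a compiler that provides `<limits.h>`.

```c
#include <limits.h>

/* Hypothetical probe: returns 1 if the least significant byte of an
 * unsigned int is stored first (little-endian layout), 0 otherwise. */
int probe_little_endian(void)
{
    unsigned int word = 1u;
    return *(unsigned char *)&word == 1;
}

/* Hypothetical probe: number of bits in an int on this target. */
int probe_int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Hypothetical probe: 1 if a data pointer is no wider than a long.
 * Code that stores pointers in integers depends on this. */
int probe_pointer_fits_long(void)
{
    return sizeof(void *) <= sizeof(long);
}
```

Results of probes like these would normally be recorded once per platform rather than re-derived in each source file.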
Variations in Tools
-
may or may not exist (e.g., ranlib, cpp, troff, lint)
-
may have differing names (rsh vs. /usr/ucb/rsh vs.
rcmd vs. on)
-
may have differing flags (cc -gx vs. -g)
-
may or may not be required
(ranlib vs. tsort/lorder vs. NULL)
-
may or may not be accurately documented if at all
(far too many examples of this)
-
may have variations in the exit status interpretation
-
may have differing input syntax and semantics
(cc, SysV vs. 7th edition make)
-
may have widely varying internal formats and supported bug set
(ar is the mother of all examples)
-
may have truly rebarbative behaviours (leading `#' in sh scripts)
-
differing permissions (can one chown?)
Variations in Libraries
-
variations in semantics (e.g., fopen(...,"a+"))
-
differing names (index vs. strchr)
-
differing argument types and semantics (wait)
-
differing return types (sprintf, signal)
-
routines not provided (rename, gethostname,
wait3, dup2)
-
all of the above (getwd vs. getcwd)
-
non-working routines (rename)
-
name of library (libtermcap vs. libcurses)
-
use of ranlib
-
compiled under different universes
-
...
Variations in Header Files
-
name of appropriate header
(fcntl.h vs. file.h, ioctl.h vs.
termio.h, ...)
-
ordering of header (time.h vs. sys/time.h)
-
differing structs (st_blocks in stat.h,
direct.h vs. dir.h)
-
missing or declared types (ulong vs. u_long,
uid_t, gid_t)
-
same type defined differently (stdio.h's FILE)
-
some header files uncompilable
-
...
Environment
-
Maximum number of open file descriptors
-
Number of normally open file descriptors (4 on eighth edition)
-
Number of groups
-
Name of logged in user ($USER vs. $LOGNAME vs.
getlogin())
-
Maximum leaf name length (not as you thought)
-
crypt & password file interface access and performance
-
Length of tty name prefix stripped before inserted into utmp file
Cc issues
-
Full ANSI? Partial ANSI? __STDC__ defined?
and if so, is it to be trusted?
signed and/or
const supported? ANSI token pasting?
Are prototypes supported? Trustworthy? Working? Required?
-
Proper enum support?
-
Type of sizeof?
Type of difference between two pointers?
-
Is char signed or unsigned by default?
-
void* meaningful? Working?
-
Any required defines (e.g., -DPORTAR, -DXENIX )?
-
Any special flags (e.g., -X28 for m88k)?
-
What are standard include and library search paths?
-
Any limitations (e.g., maximum number of -I flags)?
-
Any known bugs?
-
Usable with provided libraries and header files?
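Some of the compiler questions above can be answered by small probes compiled on the target. This is a minimal sketch under the assumption that `<stddef.h>` and `size_t` exist on the target; the function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical probe: 1 if plain char is signed by default.
 * (char)-1 stays negative only when char is a signed type. */
int probe_char_is_signed(void)
{
    char c = (char)-1;
    return c < 0;
}

/* Hypothetical probe: 1 if the type of sizeof is wider than
 * unsigned int, which breaks code that passes sizeof results
 * to unprototyped routines expecting an int. */
int probe_sizeof_wider_than_int(void)
{
    return sizeof(size_t) > sizeof(unsigned int);
}
```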
Two Splendid Examples
Seven variations of rename(2):
-
Existence proof that it can be done
-
Not provided
-
Provided but is no-op, but documentation
is Posix compliant, but the example contains a bug.
-
rename(from,to) fails when from file is busy
-
Times out far too frequently
-
Parallel renames crash kernel
-
Parallel renames corrupt file system
So you want to use struct tm and/or struct timeval!
You might have to:
-
include <sys/time.h> but not <time.h> --
latter is included in former but is not idempotent.
-
include <time.h> but not <sys/time.h> --
latter is included in former but is not idempotent.
-
include <sys/time.h> & <time.h> --
both required and neither includes other.
-
suppress use of struct timeval
because it's not supported.
-
precede either inclusion by include
of <sys/types.h>, but be careful,
because it's not always idempotent.
-
add your own typedef of time_t because it may not be provided
on some systems but it must be used.
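One way to contain these variations is a single idempotent wrapper header that every source file uses in place of the suppliers' time headers. The sketch below is an assumption about how such a wrapper might look, not the author's actual file; the `P_*` parameters and `envir_seconds` are hypothetical names, and the defaults shown assume a modern POSIX system where all three headers exist and may be included together.

```c
#ifndef ENVIR_TIME_H
#define ENVIR_TIME_H

/* Hypothetical per-platform parameters; the defaults below assume
 * a modern POSIX target and would be overridden elsewhere. */
#ifndef P_NEED_SYS_TYPES
# define P_NEED_SYS_TYPES 1   /* <sys/types.h> must come first */
#endif
#ifndef P_USE_SYS_TIME
# define P_USE_SYS_TIME 1     /* include <sys/time.h> */
#endif
#ifndef P_USE_TIME
# define P_USE_TIME 1         /* include <time.h> */
#endif

#if P_NEED_SYS_TYPES
# include <sys/types.h>
#endif
#if P_USE_SYS_TIME
# include <sys/time.h>
#endif
#if P_USE_TIME
# include <time.h>
#endif

#endif /* ENVIR_TIME_H */

/* Example client code: it sees struct timeval without knowing
 * which supplier header declared it. */
long envir_seconds(const struct timeval *tv)
{
    return (long)tv->tv_sec;
}
```

On a platform where one header already includes the other non-idempotently, only the corresponding parameter changes; client code is untouched.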
Porting Goals
Repeat after me ...
``There's no such thing
as portable code!
There's only code that's
been ported!''
I had code that had been successfully ``ported'' to
thirty platforms, yet failed on the thirty-first.
The only achievable goal is:
Adaptable Code
that is,
code that can be quickly and easily adapted to work
on a new target platform.
So ...
How does one make code adaptable?
One:
-
builds the system on a new target
-- which should reveal the discrepancies
between the previous and new targets;
-
evolves UNIVERSAL solutions and strategies for dealing with
such discrepancies as and before they arise;
-
incorporates those solutions into the code; and
-
iterates!
What one is trying to do
-- through experimentation, experience, and folklore --
is to:
-
position the code such that its adaptation
to the next target is just the simple application of previously
incorporated mechanisms to adapt to the discrepancies manifested
by that target.
Nota Bene
Direct your efforts towards:
- all targets:
-
not just the one you are currently doing!
- all sources:
-
not just that one file that's currently presenting problems!
In other words, once you have discovered a problem:
Solve it once,
and only once,
for all time.
but you must ensure that the solution is
conveyed to any developers who might invoke
the discrepancy in the future.
Caution
When changing code to adapt to a new target ...
Do not break any previous
adaptation!!!
Strategy and Mechanisms
The following sections describe
the major strategies and mechanisms I use.
Adopt them as far as is feasible.
Single Sourcing
All products, for all platforms, for all configurations,
for all concerned parties (e.g., developers, Q.A., release
engineers) are built from a:
Single
Universal
Shared
Source
File
System
such as [Korn 89], [Tilbrook 90a], [Glew 89], build(1).
Comprehensive Incremental Construction
The software construction system should provide a
comprehensive approach to incremental construction
that ensures that any modification causes
the appropriate constructions to be applied.
This should incorporate full dynamic transitive closure
dependency tracking (e.g., mkdepends is a half-hearted attempt
at this, but a good start).
If you are forced to use make, know ye well
its many limitations.
A Compatibility Library
Add your own compatibility library as the penultimate library
(i.e., immediately before libc) for every program.
This library should contain any subroutine mappings that
are required to compensate for libc deficiencies.
In some instances, it must come between two libcs
(one from each universe).
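As an illustration of a subroutine mapping, the following is a minimal sketch of a stand-in for a missing rename, built from link and unlink. It is an assumption about what such a routine could look like, not code from the lectures: `compat_rename` is a hypothetical name, the routine handles regular files only, and unlike a real rename(2) it is not atomic.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical compatibility routine for targets without rename().
 * Returns 0 on success, -1 on failure.
 * Limitations: regular files only, and not atomic; a crash between
 * the calls can leave the file reachable under both names or leave
 * the old target removed. */
int compat_rename(const char *from, const char *to)
{
    struct stat src, dst;

    if (stat(from, &src) != 0)
        return -1;                       /* nothing to rename */
    if (stat(to, &dst) == 0) {
        if (src.st_dev == dst.st_dev && src.st_ino == dst.st_ino)
            return 0;                    /* same file: leave it alone */
        if (unlink(to) != 0)
            return -1;                   /* cannot displace old target */
    }
    if (link(from, to) != 0)
        return -1;
    return unlink(from);
}
```

The platform parameterization would map rename onto this routine only on targets where the supplied one is missing or broken.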
An Environment Header File
Insert at the beginning of every C file
(be it source or generated) an include of one of your own
header files.
All our sources contain as the first C statement:
#include <envir/system.h>
This header file contains, in part:
-
defines used to suppress or select code based on
platform or operating system types (e.g., SY_U53,
SY_B43, MIPS_ENV);
-
commonly used types (e.g., Bool_t,
Schar_t for signed character);
-
defines for Prototype declarations;
-
defines for TRUE and FALSE;
-
Boolean manifests for type of token pasting and other
compiler settings.
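A minimal sketch of the kind of content listed above follows. `Bool_t`, `Schar_t`, `TRUE` and `FALSE` come from the list; the `P_*` parameters and the `Paste` macro are hypothetical names, and the values shown assume an ANSI compiler. This is an illustration, not the author's actual header.

```c
#ifndef ENVIR_SYSTEM_H
#define ENVIR_SYSTEM_H

/* Hypothetical Boolean manifests, normally generated per platform.
 * The values here assume an ANSI C compiler. */
#define P_HAS_SIGNED  1   /* compiler accepts the signed keyword */
#define P_ANSI_PASTE  1   /* compiler supports ## token pasting */

/* Commonly used types. */
typedef int Bool_t;
#if P_HAS_SIGNED
typedef signed char Schar_t;
#else
typedef char Schar_t;     /* only valid where plain char is signed */
#endif

#ifndef TRUE
# define TRUE 1
#endif
#ifndef FALSE
# define FALSE 0
#endif

/* Token pasting, selected by manifest rather than by __STDC__. */
#if P_ANSI_PASTE
# define Paste(a, b) a##b
#else
# define Paste(a, b) a/**/b
#endif

#endif /* ENVIR_SYSTEM_H */
```

Client code then writes `Schar_t` or `Paste(x, y)` and never tests the compiler itself.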
Foreign Header File Wrappers
Never include a supplier's header file directly.
Always wrap in one of your own idempotent header files,
as in:
#ifndef ENVIR_STDIO_H
# define ENVIR_STDIO_H
# undef NULL
# include <stdio.h>
# undef NULL
# define NULL (0)
/* missing prototypes */
#endif /* ENVIR_STDIO_H */
This allows one to correct their mistakes and omissions,
and deal with discrepancies (e.g., type of sprintf)
in a single location.
In some cases, provide a capability-based name for the header
to deal with discrepancies.
For example, create <envir/open.h> to include
the header that contains the open(2) arguments, or to
define them if they are not provided.
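A capability-named header for open(2) might look like the sketch below. The parameter `P_OPEN_IN_FCNTL` is a hypothetical name, and its default assumes a target that provides `<fcntl.h>`; the fallback values 0, 1 and 2 are the numeric modes taken by the Seventh Edition open call.

```c
#ifndef ENVIR_OPEN_H
#define ENVIR_OPEN_H

/* Hypothetical parameter: 1 if the open(2) manifests live in
 * <fcntl.h>, 0 if they live in <sys/file.h>. */
#ifndef P_OPEN_IN_FCNTL
# define P_OPEN_IN_FCNTL 1
#endif

#if P_OPEN_IN_FCNTL
# include <fcntl.h>
#else
# include <sys/file.h>
#endif

/* Supply the manifests on targets that provide none. */
#ifndef O_RDONLY
# define O_RDONLY 0
#endif
#ifndef O_WRONLY
# define O_WRONLY 1
#endif
#ifndef O_RDWR
# define O_RDWR 2
#endif

#endif /* ENVIR_OPEN_H */
```

Sources include only this header, so the question of which supplier header holds the manifests is answered in one place.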
A Single Parameterization File
Build a mechanism to construct and use
a single platform parameterization file,
that is a file that provides all platform
specific settings or options for the software
(e.g., appropriate type for a signed character,
the include file that contains the open(2)'s second argument
manifests).
- F.Y.I.:
-
My parameterization file contains 112 settings
which provide all required sub-routine and header file mappings,
all site specific information (e.g., address, telephone),
a variety of system specific constants and booleans
(e.g., ANSI type token pasting, supports Prototypes).
A procedure (e.g., strfix at our site)
is used to insert parameter values
in specific configured files
(e.g., <envir/system.h> and <envir/stdio.h>)
which are then installed if the resulting file differs
from the currently installed file.
A Single Configuration File
A single file is used to specify all construction
specific information (i.e., destination directory, options,
cc flags).
The construction system ensures that these values are
applied universally and any change in their settings
will result in the reapplication of any tool that
uses them (again difficult to do with make).
Project/Software Hygiene
Apply Stenning's principles of Project Hygiene.
In particular:
-
Know what you are trying to accomplish!
-
Focus on the process as a whole,
rather than on the final product.
See [Stenning 90] and [Tilbrook 90b].
Specific Problem Solving
The following are some well known common
discrepancies plus a short description of
a possible solution.
- Readdir
-
Create your own readdir.h that includes the
appropriate header (if there is one) and defines a common
struct to be used on all systems.
Provide macros to deal with the missing name length in the
Posix definition. Create simulations of opendir,
readdir, etc. for all seventh edition file systems.
- string.h
-
Lose theirs. Create your own superset of all the versions
you can find, incorporating appropriate macros
or mappings to your routines for memcmp, bzero,
strchr vs. index, etc.
- Termcap
-
Create your own generalized capability-based interface library
that hides differences between terminfo and termcap.
- Getcwd
-
If you have getwd, use it.
If you have getcwd() create getwd interface
that calls getcwd to do the interesting stuff.
Otherwise create getwd() that invokes pwd(1)
and reads in its output.
- Termio
-
Create your own header file with capability-based
macros to provide basic functionality (e.g., stty,
gtty, set or reset mode).
Is tricky but can be done.
Interaction with signals is a challenge.
- Ar files
-
Create single routine that retrieves generalized structure
describing archive members.
Unfortunately has to be tailored for nearly every system
individually, but you only do it once.
- Linting
-
This is another paper.
Too bad. It's an important porting aid.
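The Getcwd item above can be sketched as follows for a target that has getcwd but no getwd. `compat_getwd` and `COMPAT_PATH_MAX` are hypothetical names, and the caller is assumed to supply a buffer of at least `COMPAT_PATH_MAX` bytes; the pwd(1) fallback is not shown.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical buffer size the caller must provide; a real port
 * would take this from the platform parameterization. */
#define COMPAT_PATH_MAX 1024

/* Hypothetical getwd-style interface layered on getcwd.
 * Returns buf on success, or a null pointer on failure. */
char *compat_getwd(char *buf)
{
    return getcwd(buf, (size_t)COMPAT_PATH_MAX);
}
```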
Prototypes
ANSI C type prototype declarations should exist
for every routine that you use or provide.
We have a boolean parameter that specifies whether
or not prototypes should be used
(__STDC__ not to be trusted).
<envir/system.h> contains (in effect):
#if PROTOTYPES_SUPPORTED
# define Prototype(x) x
#else
# define Prototype(x) ()
#endif
Procedures are then declared using something similar to:
int func Prototype((char *nm,int cnt));
It works and is extremely useful both as documentation
and to validate routine usage.
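A complete, compilable instance of the idiom is sketched below. `count_char` is a hypothetical routine used only for illustration, and `PROTOTYPES_SUPPORTED` is set to 1 here on the assumption of an ANSI compiler. Note that the macro covers declarations only; the definition is written in ANSI style and would need an old-style parameter list on a compiler without prototype support.

```c
/* Assumed setting; normally supplied by the platform parameters. */
#ifndef PROTOTYPES_SUPPORTED
# define PROTOTYPES_SUPPORTED 1
#endif

#if PROTOTYPES_SUPPORTED
# define Prototype(x) x
#else
# define Prototype(x) ()
#endif

/* The double parentheses pass the whole parameter list as one
 * macro argument. */
int count_char Prototype((const char *nm, int ch));

/* Hypothetical routine: counts occurrences of ch in nm. */
int count_char(const char *nm, int ch)
{
    int n = 0;

    while (*nm != '\0') {
        if (*nm == ch)
            n++;
        nm++;
    }
    return n;
}
```

With prototypes enabled, a call such as `count_char(3, "x")` is rejected at compile time instead of failing at run time.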
Prototype Warnings
Some warnings regarding use of prototypes:
Dr. Tuna's Porting Nostrums
A list of unexplained rules:
-
Have a testing strategy to check your port,
before you start porting
-
Ensure system to be ported actually works on
its current host as built with the source
you plan to port!
That is, don't try to port and debug a system simultaneously.
-
Avoid ifdefs if possible, and if not,
limit their use to header files.
-
Do not suppress code compilation unless necessary to link
properly.
Conditionally excluded code is frequently uncompilable.
-
At convenient closures, rebuild system on all machines,
not just the object of the current porting exercise.
-
When possible, test required changes on already supported
platform first.
-
Avoid simultaneous introductions of dramatic changes.
When starting major surgery, start with a working system.
-
Always build the full system (time permitting).
You never really know the entire scope of your changes.
-
At convenient closures, remove all remnants of the system
and totally rebuild, particularly when you think you're
finished -- you aren't.
-
Use lint and like it.
-
-D to be considered dangerous.
-
Avoid compiler or loader dependent tricks.
-
Resist the urge to fix the suppliers' bugs.
Your target should be as close to your clients' as
possible.
-
Create machine independent varargs interface
and convert all appropriate routines to use it.
-
Encouragement to use a version system should be unnecessary.
Conclusions
Actually, not so much a conclusion as credentials.
Can you believe me, or am I just another consultant
who is regurgitating other people's opinions
without truly understanding them?
The strategy described is in use at Sietec O.S.D.
It has been applied to our three major products,
which consist of about 390 directories containing 4,500 source files,
which themselves contain approximately
eight hundred thousand lines of code
and about 21 Megabytes.
The product directories contain about twenty-five hundred files.
By Feldman's metrics, this is a large system.
Yet, there is one source file system and we maintain
up to date product trees on nine different targets
simultaneously.
The only difference between two different configurations
will be the platform parameterization file
and the configuration control file.
If adaptation to a new platform takes more than a day,
(and it takes upwards of eight hours to do the compiles)
it's usually due to bugs in the target system's environment.
Bibliography
- [Stenning 90]
-
Vic Stenning, ``Project Hygiene'', EurOpen Proceedings,
Nice (Oct. '90).
- [Tilbrook 90a]
-
David Tilbrook, ``Quod Erat Faciendum'',
EUUG & SUUG Conference Distributions, Nice & Moscow (Oct. '90).
- [Tilbrook 90b]
-
David Tilbrook, ``Washing Behind Your Ears: The Principles
of Software Hygiene'', EurOpen Proceedings,
Nice (Oct. '90).
- [Glew 89]
-
Andy Glew, ``Boxes, Links, and Parallel Trees:
Elements of a Configuration Management System'',
Software Management Workshop Proceedings,
New Orleans (Apr. '89).
- [Feldman 90]
-
Stuart Feldman, ``Large Scale Software Development Under Unix'',
UKUUG Proceedings, London (June '90).
- [Korn 89]
-
David Korn, ``The 3D File System'',
Usenix Proceedings, Baltimore (June '89).
porting.qh - 1.14 - 03/10/24