Description of Database
The GCP database is a rectangular, "flat" file composed of 57,092 rows
(cases or records) and 87 columns (variables). The rows are mainly individuals
living in families, households or "group quarters" (convents, jails,
etc.). Several hundred cases, however, are vacant houses, included because even
the absence of residents is an aspect of urban life worth capturing. The variables
are of two types: alphanumeric or "string" variables, and numeric or coded
variables. String variables contain data which were entered into the database
as written on the census manuscript, such as surnames and occupations. Numeric
variables contain data or "values" which are coded for the ease of
data entry and analysis. Examples are marital status and "calidad" (ethnicity
or race).
Further, the string and numeric variables are divided into two types of variables: "literal" and "constructed."
Literal Variables. The "Literal" variables comprise
four types:
• First are those that contain information which was written on the manuscript
page by the census taker or his scribe. Examples of these are cuartel number,
residence type, name, age, occupation, etc.
• Second are literal variables which contain data created by the GCP staff
for ease of searching, sorting or otherwise manipulating the statistical information.
Examples of those variables are Master Index, Index, Manuscript Page, Household
Number and Person Sequence Number (sequence of all individuals in the household).
• Third are those string variables which have been "paired" with
a numeric coded variable to facilitate statistical analysis. The numeric variable
of the pair is usually identified with a "2" at the end of the name.
Examples of those types are Patria2 and Job2.
• Fourth are the consolidated variables created by GCP staff for the convenience
of our users. Examples of those variables are Age2, Estado2 (marital status),
Calidad2 (race/ethnicity), Razaclas (race and class) and Birthplace (born "elsewhere" or
born in Guadalajara).
Constructed Variables. The "Constructed" variables are those
variables created by the GCP staff from the literal data on the manuscript
pages, but which were not specifically provided by the census takers.
These are of several types:
• First are those variables which can be directly inferred with a high
degree of confidence from the data actually provided. Examples are sex (rarely
actually stated but usually obvious from names, position in the household or
the endings of other data such as occupation or marital status), race of spouse
and the location of one's spouse, mother or father within the household.
• A further example of this type of constructed variable would be the
so-called "count" variables, which provide the number of servants,
number of employed persons, number of males and females in the family and household,
etc. These are household and/or family variables in which each member of the
household/family receives the same numeric data.
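The way a "count" variable repeats one household total on every member's row can be sketched as follows. This is only an illustration in Python, not the procedure the GCP staff used: the sample rows are invented, the lowercase field names merely mirror CUARTEL, HHNUMBER, SEX, MALES and FEMALES, and identifying a household by cuartel plus household number is an assumption.

```python
from collections import Counter

# Invented sample rows; keys mirror codebook names but the data are hypothetical.
records = [
    {"cuartel": 1, "hhnumber": 1, "sex": 1},
    {"cuartel": 1, "hhnumber": 1, "sex": 2},
    {"cuartel": 1, "hhnumber": 1, "sex": 2},
    {"cuartel": 1, "hhnumber": 2, "sex": 1},
]

def add_sex_counts(rows):
    """Attach household-level male/female counts to every member's row."""
    males = Counter()
    females = Counter()
    for row in rows:
        key = (row["cuartel"], row["hhnumber"])  # assumed household key
        if row["sex"] == 1:      # 1 = male in the codebook's Sex coding
            males[key] += 1
        elif row["sex"] == 2:    # 2 = female
            females[key] += 1
    out = []
    for row in rows:
        key = (row["cuartel"], row["hhnumber"])
        # Every member of the household receives the same two totals.
        out.append({**row, "males": males[key], "females": females[key]})
    return out

counted = add_sex_counts(records)
for row in counted:
    print(row)
```

Because the totals are stored on each row, a user can filter individuals by a household characteristic without first aggregating the file.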
• More problematic are those variables which are based on interpretations
of the data or specific assumptions used by GCP staff in the process of coding
and/or entering the data into the database. The GCP believes that the importance
of the information justifies our interpretations and assumptions. Examples of
such constructed variables are household and family structure, position in the
household/family and a variety of migration variables. Such variables were created
based, as far as possible, on specific, documented assumptions or procedures.
• First, we endeavored to select the more conservative of the various assumptions
available.
• Second, we adhered to a set of rigidly maintained and documented assumptions
governing our interpretations.
• Or, third, we provided data quality variables called "flag" variables
to provide the user with a usable level of confidence.
Flag Variables. The "flag" variables provide a specific measure
of data quality which enables users to select the level of quality
with which they prefer to work. The flag values either record
the specific assumptions on which the coders based their interpretation
or give some other indication of a greater or lesser level of
data quality. They are in essence the level of confidence one might
expect from the data of each case. Statistical software packages generally
provide an easy means to recode variables so that the user can work
only with those values with which they are comfortable. An example
is the flag for migrant marital status (FMIGMAR). A "0" means
that we are "absolutely" certain of our data because the
years in residence were given by the census takers. A "1" means
we are "very certain" as the migrating couple had children
born elsewhere and children born in Guadalajara. A "2" means
we are "somewhat certain" as the couple had children born
elsewhere (but not in Guadalajara). A "3" means we are "uncertain" as
the couple had only children born in Guadalajara. A detailed explanation
of this and other flag variables is provided in this codebook, below.
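Selecting cases by flag level might look like the sketch below. It is written in Python rather than any particular statistical package. The FMIGMAR labels come from the paragraph above, but the sample cases and their MIGMARST values are invented placeholders, and the helper name select_by_flag is hypothetical. The sketch also assumes that lower flag values mean greater certainty, which holds for FMIGMAR as described here but should be checked for each flag.

```python
# Labels for FMIGMAR as described in the text above.
FMIGMAR_LABELS = {
    0: "absolutely certain",
    1: "very certain",
    2: "somewhat certain",
    3: "uncertain",
}

# Invented sample cases; the migmarst values are placeholders, not real codes.
cases = [
    {"index": 1, "migmarst": 1, "fmigmar": 0},
    {"index": 2, "migmarst": 2, "fmigmar": 3},
    {"index": 3, "migmarst": 1, "fmigmar": 1},
    {"index": 4, "migmarst": 2, "fmigmar": 2},
]

def select_by_flag(rows, flag_name, max_level):
    """Keep rows whose flag is at or below max_level (lower = more certain)."""
    return [
        r for r in rows
        if r[flag_name] is not None and r[flag_name] <= max_level
    ]

# Work only with cases coded "absolutely certain" or "very certain".
confident = select_by_flag(cases, "fmigmar", max_level=1)
for r in confident:
    print(r["index"], FMIGMAR_LABELS[r["fmigmar"]])
```

Raising max_level trades certainty for a larger number of cases, which is the choice the flag variables are meant to leave to the user.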
Other Assumptions and Interpretations. There are other assumptions and
interpretations for which "flags" are not available but which
were based on specific, consistent data coding and data entry procedures.
The most common were paleographic interpretations of illegible handwriting.
If some letters were clear and others not, the latter were marked by
dots. If the data entry operator (nearly always either fluent in, or
a native speaker of, Spanish) believed they knew the intended word
or name, they placed their assumption in brackets after the partially
illegible word. Such brackets were maintained in the Archive File,
but eliminated in the consolidated file. Interpretations of names were
supported by reference to standard works on Hispanic names. Finally,
all names and terms were reviewed by native speakers of Spanish in
the process of data verification, and any anomalies checked against
the original manuscript. [1]
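A user of the Archive File may want to separate a partially legible reading from the operator's bracketed interpretation. The Python sketch below shows one way to do so. The entry "Mar..nez [Martinez]" is an invented example, and the exact layout (reading first, then one bracketed guess) is an assumption based on the description above, not a documented file format.

```python
import re

# Assumed layout: reading as written (dots for unclear letters),
# followed by the operator's interpretation in square brackets.
BRACKETED = re.compile(r"^(?P<reading>.*?)\s*\[(?P<guess>[^\]]+)\]\s*$")

def split_reading(entry):
    """Return (reading_as_written, bracketed_interpretation_or_None)."""
    match = BRACKETED.match(entry)
    if match:
        return match.group("reading"), match.group("guess")
    return entry.strip(), None

def has_illegible_letters(reading):
    """True if the reading contains dots marking unreadable letters.

    Caution: a dot may also end an abbreviation, so this is only a rough test.
    """
    return "." in reading

print(split_reading("Mar..nez [Martinez]"))
print(split_reading("Gomez"))
```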
In all cases, staff assumptions and procedures are outlined in the
Codebook which follows. More extensive discussion can also be found
in separate appendices to this codebook or online at the GCP web site. In sum, the
GCP objective in coding and entering our data is to provide the user
with as much useful information as possible. The principle followed in
the inevitable issue of interpretation is, so far as practical, to either
provide an accompanying "flag" variable, or to follow consistent,
and generally conservative, documented procedures.
Numeric and String Variables.
• Numeric Variables. Variables that are either numeric in nature (e.g. age)
or are codes created by the GCP for convenience of data entry and/or statistical
analysis (e.g. household number and sex). In the Sex variable, the data values
are presented as a code (1 = male and 2 = female). Numeric variables are those
whose possible values can be predicted, and therefore coded, in advance.
• String (alphanumeric) Variables. Variables that are entered into the
database in their original written form. The most obvious examples are names,
occupation and birthplace, all variables which cannot be coded in advance. To
facilitate data analysis, we also have created additional "paired" numeric
variables for those string variables which can be analyzed statistically. Such
variables are identified with a "2" following the original variable
name (e.g. Job2).
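The relationship between a coded variable, its value labels, and a "paired" numeric partner of a string variable can be sketched in Python as follows. The Sex coding is taken from the text above. The occupation code table, the sample people, and the helper name pair_string_variable are hypothetical and stand in for the real codes defined elsewhere in the codebook.

```python
SEX_LABELS = {1: "male", 2: "female"}  # coding given in the text

# Hypothetical code table; the real Job2 codes are defined in the codebook.
JOB_CODES = {"zapatero": 1, "sastre": 2, "panadero": 3}

def pair_string_variable(rows, string_name, code_table):
    """Add a numeric '<name>2' partner for a string variable.

    Values with no entry in the code table (including blanks) become None.
    """
    paired_name = string_name + "2"
    result = []
    for row in rows:
        text = row[string_name].strip().lower()
        result.append({**row, paired_name: code_table.get(text)})
    return result

# Invented sample rows.
people = [
    {"sex": 1, "job": "Zapatero"},
    {"sex": 2, "job": ""},
    {"sex": 1, "job": "Sastre"},
]

paired = pair_string_variable(people, "job", JOB_CODES)
for row in paired:
    print(SEX_LABELS[row["sex"]], repr(row["job"]), row["job2"])
```

The original string is kept beside the code, so the written form from the manuscript remains available after coding.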
Missing Data. If an individual's (case) data for any particular variable
is not present and cannot be inferred logically, it is either recorded
as unknown or as missing.
• A "missing" Value. A missing value is one which will never
be known in any particular cuartel, such as ethnicity (calidad) for those cuartels
which did not provide that data. Numeric and string variables handle missing
data in different ways.
• Numeric Values. For numeric values missing for all individuals in a given
cuartel, the cell for that variable contains a dot (.), called a "systems
missing" code. Any statistical analysis will include the systems missing
cases, but will not include them in the "valid percent" column.
• String Values. Missing data in string variables is represented by a blank.
A blank is a legitimate value in a string variable and will be included under
the "valid percent" column. However, in the numeric version of that
variable (e.g. Job2), the blank usually will be converted to a systems missing
dot.
• An "unknown" Value. An unknown value is one which cannot be
determined from the information provided in the census manuscript for that individual,
even though generally that information was provided for other individuals in
the cuartel, a situation common, for example, in marital status.
• Numeric Values. Usually unknown numeric values will be coded a "0" with
a value label of "unknown," "unable to determine" or "unclassifiable," depending
on the circumstances. Occasionally, a "0" will cover both situations: data absent
in cuartels where that information is normally provided, and data absent in cuartels
where it is never provided. Usually this is in the numeric version of a string
variable (e.g. "Restype2").
• String Values. Unknown values in string variables are usually written
as "unknown."
• An "assumed" Value. If a value is missing but can be inferred
from the data then it gets an "assumed" code. For example, children
of parents listed as Spanish will be given an assumed Spanish code. See "Marital
Status" below for our rules for providing "assumed" values in
that variable.
• Illegible Values. Finally, data which were provided but which could not
be read because of fading, illegible handwriting, etc. are usually represented
by a 98 or a 998 for numeric variables and "illegible" in string variables.
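The distinction between these categories, and the effect of systems missing values on a "valid percent" column, can be illustrated with a short Python sketch. None stands in for the systems missing dot. The sample values are invented, and treating 0, 98 and 998 as special in every variable is an assumption made for illustration only; the codebook entry for each variable governs which codes apply (98, for instance, could be a genuine age).

```python
from collections import Counter

SYSTEM_MISSING = None      # stands in for the "." systems missing code
UNKNOWN = 0                # usual code for "unknown"
ILLEGIBLE = {98, 998}      # usual codes for illegible entries

def classify(value):
    """Sort a numeric value into the categories described above."""
    if value is SYSTEM_MISSING:
        return "missing"
    if value == UNKNOWN:
        return "unknown"
    if value in ILLEGIBLE:
        return "illegible"
    return "valid"

def frequency_table(values):
    """Return {value: (count, percent, valid_percent)}.

    Systems missing cases count toward percent but get no valid percent.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = total - counts.get(SYSTEM_MISSING, 0)
    table = {}
    for value, count in counts.items():
        percent = 100.0 * count / total
        if value is SYSTEM_MISSING or valid_total == 0:
            valid_percent = None
        else:
            valid_percent = 100.0 * count / valid_total
        table[value] = (count, percent, valid_percent)
    return table

# Invented sample of a coded variable.
sample = [1, 2, 0, None, None, 1, 98, 2]
table = frequency_table(sample)
for value, row in table.items():
    print(value, classify(value), row)
```

Note that a "0" coded as unknown still counts toward the valid percent here, since only the systems missing code is excluded; a user who wants to drop unknown or illegible values must recode them first.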
Notes:
1. The issue of data quality is an ongoing discussion among researchers.
For example, see Sean Townsend, Cressida Chappell and Oscar Struijvé,
Digitising History, online at http://hds.essex.ac.uk/g2gp/digitising_history.
We have, as far as possible and practical, followed procedures established
or refined by the Integrated Public Use Microdata Series (IPUMS) at the
University of Minnesota Historical Census Project, Steven Ruggles director.
See Matthew Sobek and Steven Ruggles, "The IPUMS Project. An Update," Historical
Methods (Summer 1999), vol. 32:3, pp. 102-09, and earlier publications
in the same source.
Variables List for the Censuses of 1821 and 1822 [1]
MASINDEX Master index #
INDEX Unique identifying number for each record in each cuartel.
PAGE Manuscript page #.
YEAR Year the census (padrón) was taken, 1821 or 1822.
CUARTEL Cuartel #.
RESTYPE Residence type (casa, asesoria, etc.). String.
RESTYPE2 Residence type string consolidated into numeric.
BLOCK Block ("manzana") number given in the census manuscript.
HHNUMBER Household # (in Spanish, Unidad Doméstica #).
FHHNUMBR Flag for household number.
PERSEQHH Sequence number of person within each household.
STREET Street name.
STREETC Directions given by the census taker on the census manuscript.
TITLE Individual Title.
DONDO-A Social status/don or doña.
DONDO-A2 Social status consolidated.
FIRSTNAM First or given name(s). String.
SURNAME Surname(s)/apellido(s). String.
AGE Age.
AGE2 Age consolidated.
SEX Sex.
ESTADO Estado/marital status.
ESTADO2 Estado consolidated.
JOB Job/oficio. String.
JOBESP Job/oficio string consolidated into numeric. In Spanish.
JOBENG Job/oficio string consolidated into numeric. In English.
SIMJOBHD Person has same or similar occupation as head of household
CALIDAD Calidad/ethnicity.
CALIDAD2 Calidad/ethnicity consolidated.
RAZACLAS Ethnicity and class.
SPOURACE Spouse's ethnicity
RACESPOU Spouse's ethnicity combinations.
RACEHH Ethnicity combinations in household (not including servants).
PATRIA Patria/birthplace.
PATRIA2 Patria/birthplace string consolidated into numeric.
GISTOWN Geographic Information Systems (GIS) modern municipalities in Jalisco.
GISSTATE GIS modern Mexican states.
GISBLOCK GIS assigned block #.
GISCUADR GIS assigned side or "facing" block #.
BIRTHPLC Birthplace: Guadalajara or elsewhere.
MIGREGIN Migration region.
PARTIDOS Migration partidos/towns.
GQNAME Name of group quarters.
GQNAME2 Name of group quarters string consolidated into numeric.
GQNUMBER Sequence number of group quarters in each cuartel
GQSEQ Sequence number of each individual living in each group quarters.
HEAD1822 Cuartel number of the heads of households (listed for 1822 when a full census was not available).
HHSTRUC Household structure.
FHHSTRUC Flag: Household structure.
POSINHH Position in the household.
FPOSINHH Flag: Position in household.
FAMSEQHH Family sequence in the household.
FAMSTRUC Family structure.
FFAMSTRU Flag: Family structure.
FAM3GEN Three generations (or more) are present in household.
FAMTYPE Family type (servant family, boarder family, etc.)
POSINFAM Position in the family.
FPOSINFA Flag: Position in family.
MRFUHSTR Structure of multiple related families, unrelated to the head of household.
POSINMF Position of each person in a MRFUH.
FPOSINMF Flag: Position in MRFUH.
NUMINHH Number in household.
SERVANTS Number of servants in household.
WORKERS Number of employed persons in the household.
MIGRANTS Number of migrants in the household.
MALES Number of males in the household.
FEMALES Number of females in the household.
BOARDERS Number of boarders in the household.
KIN Number of kin in the household.
NUMINFAM Number in the family.
ADULTS Number of adults in family (age 18/plus).
CHILDREN Number of minor children in family (17/under).
FAMKIN Number of kin in family.
MOMLOC Mother's location in household.
FMOMLOC Flag: Mother's location in household.
POPLOC Father's location in household.
FPOPLOC Flag: Father's location.
SPLOC Spouse's location in household.
MIGMARST Migrant marital status.
FMIGMAR Flag: Migrant marital status.
YEARSRES Years in residence.
AGEMIG Age at Migration. Cuartel 18 & 20, 1821 only.
AGEMIG1 Age at Migration "One." [Not provided in the database.]
AGEMIG2 Age at Migration "Two." [Not provided in the database.]
AGEMIG3 Age at Migration "Three." [Not provided in the database.]
STEPMIG Step Migration.
MIGKIN Migrant lived with relatives.
MIGNOKIN Migrant lived with non-kin migrants.
COMMENT Comments by the census takers or by GCP staff.
Notes:
1. In the order in which they appear in the database.