Florida State University
Database Description



Description of Database

The Guadalajara Censuses Project (GCP) database is a rectangular, "flat" file composed of 57,092 rows (cases or records) and 87 columns (variables). Most rows are individuals living in families, households or "group quarters" (convents, jails, etc.). Several hundred cases, however, are vacant houses, included because even the absence of residents is an aspect of urban life worth capturing. The variables are of two types: alphanumeric or "string" variables, and numeric or coded variables. String variables contain data entered into the database exactly as written on the census manuscript, such as surnames and occupations. Numeric variables contain data or "values" that are coded for ease of data entry and analysis; examples are marital status and "calidad" (ethnicity or race).

Both string and numeric variables are further divided into two types: "literal" and "constructed."


Literal Variables. The "literal" variables consist of four types:


• First are those that contain information which was written on the manuscript page by the census taker or his scribe. Examples of these are cuartel number, residence type, name, age, occupation, etc.


• Second are literal variables containing data created by GCP staff for ease of searching, sorting or otherwise manipulating the statistical information. Examples are Master Index, Index, Manuscript Page, Household Number and Person Sequence Number (the sequence of all individuals in the household).


• Third are string variables that have been "paired" with a coded numeric variable to facilitate statistical analysis. The numeric variable of the pair is usually identified by a "2" at the end of its name. Examples are Patria2 and Job2.


• Fourth are the consolidated variables created by GCP staff for the convenience of our users. Examples are Age2, Estado2 (marital status), Calidad2 (race/ethnicity), Razaclas (race and class) and Birthplace (born "elsewhere" or born in Guadalajara).





Constructed Variables. The "constructed" variables are those created by GCP staff from the literal data on the manuscript pages but not specifically provided by the census takers. There are several types:


• First are variables that can be inferred directly, with a high degree of confidence, from the data actually provided. Examples are sex (rarely stated explicitly but usually obvious from names, position in the household or the gendered endings of other data such as occupation or marital status), race of spouse and the location of one's spouse, mother or father within the household.


• A further example of this type of constructed variable is the so-called "count" variables, which give the number of servants, the number of employed persons, the number of males and females in the family and household, etc. These are household and/or family variables in which every member of the household or family receives the same value.


• More problematic are variables based on interpretations of the data or on specific assumptions made by GCP staff while coding and/or entering the data. The GCP believes that the importance of the information justifies these interpretations and assumptions. Examples of such constructed variables are household and family structure, position in the household or family, and a variety of migration variables. As far as possible, these variables were created on the basis of specific, documented assumptions or procedures:


• First, we endeavored to select the most conservative of the available assumptions.


• Second, we adhered to a rigidly maintained and documented set of assumptions governing our interpretations.


• Or, third, we provided data quality variables, called "flag" variables, that give the user a usable level of confidence in the data.
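The "count" variables described above give every member of a household the same total. The following is a minimal sketch of how such a variable could be derived. It is not the GCP's own procedure. It assumes that each record is a Python dict keyed by codebook names, that a household is identified by the combination (YEAR, CUARTEL, HHNUMBER), and that SEX uses the codes 1 = male and 2 = female given in this codebook. The sample records are hypothetical.

```python
from collections import defaultdict


def add_household_counts(records):
    """Attach NUMINHH, MALES and FEMALES to every record in its household.

    Assumptions (not taken from the GCP's own procedures):
      * each record is a dict keyed by codebook variable names;
      * a household is identified by (YEAR, CUARTEL, HHNUMBER);
      * SEX is coded 1 = male, 2 = female; other codes count toward
        NUMINHH only.
    """
    totals = defaultdict(lambda: {"NUMINHH": 0, "MALES": 0, "FEMALES": 0})

    # First pass: tally members and sexes for each household.
    for rec in records:
        key = (rec["YEAR"], rec["CUARTEL"], rec["HHNUMBER"])
        counts = totals[key]
        counts["NUMINHH"] += 1
        if rec["SEX"] == 1:
            counts["MALES"] += 1
        elif rec["SEX"] == 2:
            counts["FEMALES"] += 1

    # Second pass: every member receives the same household-level values.
    for rec in records:
        key = (rec["YEAR"], rec["CUARTEL"], rec["HHNUMBER"])
        rec.update(totals[key])
    return records


# Hypothetical example: two households in cuartel 18, 1821.
people = [
    {"YEAR": 1821, "CUARTEL": 18, "HHNUMBER": 1, "SEX": 1},
    {"YEAR": 1821, "CUARTEL": 18, "HHNUMBER": 1, "SEX": 2},
    {"YEAR": 1821, "CUARTEL": 18, "HHNUMBER": 1, "SEX": 2},
    {"YEAR": 1821, "CUARTEL": 18, "HHNUMBER": 2, "SEX": 1},
]
add_household_counts(people)
print(people[0]["NUMINHH"], people[0]["MALES"], people[0]["FEMALES"])  # 3 1 2
```

Vacant-house rows would have to be filtered out before counting. Otherwise each one would be tallied as a household member.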





Flag Variables. The "flag" variables provide a specific measure of data quality that enables users to select the level of quality they prefer to work with. Each flag value essentially records the specific assumption on which the coders based their interpretation, or some other indication of greater or lesser data quality. In effect, it expresses the level of confidence one might place in the data for each case. Statistical software packages generally provide an easy means to recode variables, so that users can work only with the values they are comfortable with. An example is the flag for migrant marital status (FMIGMAR). A "0" means that we are "absolutely" certain of our data because the years in residence were given by the census takers. A "1" means we are "very certain," as the migrating couple had children born elsewhere and children born in Guadalajara. A "2" means we are "somewhat certain," as the couple had children born elsewhere (but not in Guadalajara). A "3" means we are "uncertain," as the couple had only children born in Guadalajara. A detailed explanation of this and the other flag variables is provided in this codebook, below.
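As a rough illustration of choosing a confidence threshold, the sketch below keeps only the cases whose FMIGMAR value is at or below a level the user picks. It assumes that records are dicts keyed by codebook names and that None stands in for a systems missing flag. The helper name `select_by_flag` and the sample records are hypothetical. The level descriptions paraphrase the text above.

```python
# FMIGMAR levels as described in the codebook text above.
FMIGMAR_LEVELS = {
    0: "absolutely certain: years in residence given",
    1: "very certain: children born elsewhere and in Guadalajara",
    2: "somewhat certain: children born elsewhere only",
    3: "uncertain: children born in Guadalajara only",
}


def select_by_flag(records, flag="FMIGMAR", max_level=1):
    """Return records whose flag is present and at or below max_level.

    A flag of None (standing in for a systems missing value) is excluded.
    """
    selected = []
    for rec in records:
        level = rec.get(flag)
        if level is not None and level <= max_level:
            selected.append(rec)
    return selected


# Hypothetical records: keep only "absolutely" or "very" certain cases.
migrants = [
    {"INDEX": 1, "FMIGMAR": 0},
    {"INDEX": 2, "FMIGMAR": 2},
    {"INDEX": 3, "FMIGMAR": 1},
    {"INDEX": 4, "FMIGMAR": None},
]
kept = select_by_flag(migrants)
print([rec["INDEX"] for rec in kept])  # [1, 3]
for rec in kept:
    print(rec["INDEX"], FMIGMAR_LEVELS[rec["FMIGMAR"]])
```

Raising `max_level` to 3 admits every flagged case. Lowering it to 0 keeps only cases whose years in residence were recorded.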


Other Assumptions and Interpretations. There are other assumptions and interpretations for which "flags" are not available but which were based on specific, consistent data coding and data entry procedures. The most common were paleographic interpretations of illegible handwriting. If some letters of a word were clear and others not, the unclear letters were marked by dots. If the data entry operator (nearly always either fluent in, or a native speaker of, Spanish) believed they knew the intended word or name, they placed that reading in brackets after the partially illegible word. Such brackets were retained in the Archive File but eliminated in the consolidated file. Interpretations of names were supported by reference to standard works on Hispanic names. Finally, all names and terms were reviewed by native speakers of Spanish during data verification, and any anomalies were checked against the original manuscript.[1]
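Users of the Archive File may want to separate a dotted transcription from the operator's bracketed reading. The following is a minimal sketch that assumes a layout of a transcription followed by a single trailing reading in square brackets, such as "Gonz..ez [González]". That exact layout is an assumption, so check it against the Archive File itself. The function name `split_reading` and the sample values are hypothetical.

```python
import re

# Assumed Archive File layout: the transcription (unclear letters shown as
# dots) followed by one trailing reading in square brackets.
BRACKETED = re.compile(r"^(?P<transcribed>[^\[]*?)\s*\[(?P<reading>[^\]]*)\]\s*$")


def split_reading(value):
    """Split an Archive File entry into (transcription, bracketed reading).

    The reading is None when the entry has no trailing bracketed reading.
    """
    match = BRACKETED.match(value)
    if match is None:
        return value.strip(), None
    return match.group("transcribed").strip(), match.group("reading").strip()


print(split_reading("Gonz..ez [González]"))  # ('Gonz..ez', 'González')
print(split_reading("Hernández"))            # ('Hernández', None)
```

The sketch does not try to detect unclear letters from the dots alone. Manuscript abbreviations such as "Ma." also contain periods.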

In all cases, staff assumptions and procedures are outlined in the Codebook that follows. More extensive discussion can be found in separate appendices to this codebook or online at the GCP web site. In sum, the GCP objective in coding and entering our data is to provide the user with as much useful information as possible. Where interpretation is unavoidable, the principle followed is, so far as practical, either to provide an accompanying "flag" variable or to follow consistent, documented and generally conservative procedures.


Numeric and String Variables.


• Numeric Variables. Variables that are either numeric in nature (e.g., age) or are codes created by the GCP for convenience of data entry and/or statistical analysis (e.g., household number and sex). In the Sex variable, for example, the values are codes (1 = male, 2 = female). Numeric variables are those whose possible values can be anticipated, and therefore coded, in advance.


• String (alphanumeric) Variables. Variables that are entered into the database in their original written form. The most obvious examples are names, occupation and birthplace, none of which can be coded in advance. To facilitate data analysis, we have also created additional "paired" numeric variables for those string variables that can be analyzed statistically. Such variables are identified by a "2" following the original variable name (e.g., Job2).
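To show how a paired numeric variable relates to its string partner, the sketch below maps a string value to a numeric code. A blank string becomes None, which stands in for the systems missing dot. The code list `JOB_CODES` is entirely hypothetical; the real codes belong to the codebook entries for the paired variables. Coding unlisted strings as 0 ("unclassifiable") is likewise an assumption made only for this sketch.

```python
# Hypothetical code list for illustration only; the actual numeric codes
# are defined in the GCP codebook, not here.
JOB_CODES = {
    "unknown": 0,
    "sastre": 101,    # tailor
    "zapatero": 102,  # shoemaker
}


def pair_numeric(value, codes=JOB_CODES):
    """Derive the numeric partner (e.g. Job2) of a string variable value.

    * A blank string becomes None, standing in for a systems missing dot.
    * A string not in the code list is coded 0 ("unclassifiable"), which is
      an assumption made for this sketch.
    """
    text = value.strip().lower()
    if text == "":
        return None
    return codes.get(text, 0)


for job in ["Sastre", "  ", "unknown", "tejedor"]:
    print(repr(job), "->", pair_numeric(job))
```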


Missing Data. If an individual's (case's) data for a particular variable are not present and cannot be inferred logically, the value is recorded as either unknown or missing.



• A "missing" Value. A missing value is one that will never be known for a particular cuartel. An example is ethnicity (calidad) in those cuartels whose census did not record it. Numeric and string variables handle missing data in different ways.


• Numeric Values. When a numeric value is missing for all individuals in a given cuartel, the cell for that variable contains a dot (.), called a "systems missing" code. Statistical output will report the systems missing cases but will not include them in the "valid percent" column.


• String Values. Missing data in string variables are represented by a blank. A blank is a legitimate value in a string variable and will be included in the "valid percent" column. In the numeric version of that variable (e.g., Job2), however, the blank will usually be converted to a systems missing dot.


• An "unknown" Value. An unknown value is one that cannot be determined from the information in the census manuscript for that individual, even though that information was generally provided for other individuals in the cuartel. This situation is common, for example, with marital status.


• Numeric Values. Unknown numeric values will usually be coded as "0," with a value label of "unknown," "unable to determine" or "unclassifiable," depending on the circumstances. Occasionally, "0" represents two cases: unknown values, in cuartels where that information is normally provided, and missing values, in cuartels where it is not provided at all. This usually occurs in the numeric version of a string variable (e.g., "Restype2").


• String Values. Unknown values in string variables are usually written as "unknown."


• An "assumed" Value. If a value is missing but can be inferred from the data, it is given an "assumed" code. For example, children of parents listed as Spanish will be given an assumed Spanish code. See "Marital Status" below for our rules for assigning "assumed" values in that variable.


• Illegible Values. Finally, data that were recorded but cannot be read (because the ink is too faded, the handwriting is illegible, etc.) are usually represented by 98 or 998 in numeric variables and by "illegible" in string variables.
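The conventions above can be summarized in a small sketch. It labels a numeric value as missing, unknown, illegible or valid, and it computes a "valid percent" that leaves only systems missing cases out of the denominator. None stands in for the systems missing dot. The unknown and illegible codes are parameters because the text says they apply "usually." For some variables (AGE, for instance), 0 or 98 might be genuine values. The sample codes are hypothetical.

```python
from collections import Counter


def classify(value, unknown_codes=(0,), illegible_codes=(98, 998)):
    """Label a numeric value using the conventions described above.

    None stands in for the systems missing dot. The unknown and illegible
    codes are parameters because, for some variables (AGE, for example),
    0 or 98 might be genuine values; check the codebook entry first.
    """
    if value is None:
        return "missing"
    if value in unknown_codes:
        return "unknown"
    if value in illegible_codes:
        return "illegible"
    return "valid"


def valid_percent(values):
    """Percent of each value among non-missing cases.

    Mirrors the "valid percent" column: systems missing cases (None) are
    left out of the denominator, but codes such as 0 ("unknown") stay in.
    """
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return {v: 100 * n / len(valid) for v, n in counts.items()}


# Hypothetical numeric codes for a variable such as ESTADO2.
estado = [1, 2, 2, 0, None]
print([classify(v) for v in estado])
# ['valid', 'valid', 'valid', 'unknown', 'missing']
print(valid_percent(estado))  # {1: 25.0, 2: 50.0, 0: 25.0}
```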


Notes:

1. The issue of data quality is an ongoing discussion among researchers. See, for example, Sean Townsend, Cressida Chappell and Oscar Struijvé, Digitising History (online at http://hds.essex.ac.uk/g2gp/digitising_history). We have, as far as possible and practical, followed procedures established or refined by the Integrated Public Use Microdata Series (IPUMS) at the University of Minnesota Historical Census Project, Steven Ruggles, director. See Matthew Sobek and Steven Ruggles, "The IPUMS Project: An Update," Historical Methods (Summer 1999), vol. 32:3, pp. 102-09, and earlier publications in the same source.


Variables List for the Censuses of 1821 and 1822[1]

MASINDEX Master index #

INDEX Unique identifying number for each record in each cuartel.

PAGE Manuscript page #.

YEAR Year in which the census (padrón) was taken, 1821 or 1822.

CUARTEL Cuartel #.

RESTYPE Residence type (casa, asesoria, etc.). String.

RESTYPE2 Residence type string consolidated into numeric.

BLOCK Block ("manzana") number given in the census manuscript.

HHNUMBER Household # (Spanish: Unidad Doméstica #).

FHHNUMBR Flag for household number.

PERSEQHH Sequence number of person within each household.

STREET Street name.

STREETC Directions given by the census taker on the census manuscript.

TITLE Individual Title.

DONDO-A Social status/don or doña.

DONDO-A2 Social status consolidated.

FIRSTNAM First or given name(s). String.

SURNAME Surname(s)/apellido(s). String.

AGE Age.

AGE2 Age consolidated.

SEX Sex.

ESTADO Estado/marital status.

ESTADO2 Estado consolidated.

JOB Job/oficio. String.

JOBESP Job/oficio string consolidated into numeric. In Spanish.

JOBENG Job/oficio string consolidated into numeric. In English.

SIMJOBHD Person has the same or a similar occupation as the head of household.

CALIDAD Calidad/ethnicity.

CALIDAD2 Calidad/ethnicity consolidated.

RAZACLAS Ethnicity and class.

SPOURACE Spouse's ethnicity.

RACESPOU Spouse's ethnicity combinations.

RACEHH Ethnicity combinations in household (not including servants).

PATRIA Patria/birthplace.

PATRIA2 Patria/birthplace string consolidated into numeric.

GISTOWN Geographic Information Systems (GIS) modern municipalities in Jalisco.

GISSTATE GIS modern Mexican states.

GISBLOCK GIS assigned block #.

GISCUADR GIS assigned side or "facing" block #.

BIRTHPLC Birthplace: Guadalajara or elsewhere.

MIGREGIN Migration region.

PARTIDOS Migration partidos/towns.

GQNAME Name of group quarters.

GQNAME2 Name of group quarters string consolidated into numeric.

GQNUMBER Sequence number of group quarters in each cuartel.

GQSEQ Sequence number of each individual living in each group quarters.

HEAD1822 Cuartel number of the heads of households (listed for 1822 when a full census was not available).

HHSTRUC Household structure.

FHHSTRUC Flag: Household structure.

POSINHH Position in the household.

FPOSINHH Flag: Position in household.

FAMSEQHH Family sequence in the household.

FAMSTRUC Family structure.

FFAMSTRU Flag: Family structure.

FAM3GEN Three generations (or more) are present in household.

FAMTYPE Family type (servant family, boarder family, etc.)

POSINFAM Position in the family.

FPOSINFA Flag: Position in family.

MRFUHSTR Structure of multiple related families, unrelated to the head of household.

POSINMF Position of each person in a MRFUH.

FPOSINMF Flag: Position in MRFUH.

NUMINHH Number in household.

SERVANTS Number of servants in household.

WORKERS Number of employed persons in the household.

MIGRANTS Number of migrants in the household.

MALES Number of males in the household.

FEMALES Number of females in the household.

BOARDERS Number of boarders in the household.

KIN Number of kin in the household.

NUMINFAM Number in the family.

ADULTS Number of adults in family (age 18/plus).

CHILDREN Number of minor children in family (17/under).

FAMKIN Number of kin in family.

MOMLOC Mother's location in household.

FMOMLOC Flag: Mother's location in household.

POPLOC Father's location in household.

FPOPLOC Flag: Father's location.

SPLOC Spouse's location in household.

MIGMARST Migrant marital status.

FMIGMAR Flag: Migrant marital status.

YEARSRES Years in residence.

AGEMIG Age at migration. Cuartels 18 and 20, 1821 only.

AGEMIG1 Age at migration "One." [Not provided in the database.]

AGEMIG2 Age at migration "Two." [Not provided in the database.]

AGEMIG3 Age at migration "Three." [Not provided in the database.]

STEPMIG Step migration.

MIGKIN Migrant lived with relatives.

MIGNOKIN Migrant lived with non-kin migrants.

COMMENT Comments by the census takers or by GCP staff.


Notes:

1. In the order in which they appear in the database.

© 2003 Florida State University, historyweb@fsu.edu