History of the Guadalajara Censuses Project

The roots of the GCP go back to 1978. While researching in the Guadalajara Municipal Archives, the project director, Rod Anderson, became aware of the existence of a considerable number of city population censuses (called padrones), including a nearly complete one for 1821 and a less complete but rare follow-up census for 1822.1 Although various scholars knew of the existence of the data, only Berkeley historical demographers S. F. Cook and Woodrow Borah had published from the material, using partial, hand-calculated returns.2 In 1983 Anderson constructed a machine-readable database from a systematic sample of every tenth household taken from selected wards (cuarteles) of the two censuses, from which he published a monograph and a number of articles.3

In 1990 the project director and several of his graduate students began preliminary work on the current project. They were Elaine Carey, Gerry Geis, Burton Kirkwood, David Sicko and James (Jay) Tapp. In order to provide an institutional framework for this and other projects, in 1992 Anderson and his graduate students created the Urban History Workshop (UHW). The Workshop focuses on providing students practical experience in research design, methodology, scholarly writing and publication. In keeping with its emphasis on hands-on experience, the UHW founded the Urban History Workshop Review (UHWR) in the fall of 1993. When the editor of the UHWR moved on to the History Department of the University of Evansville, the latter co-published the journal along with Florida State.

After the completion of the preliminary study, in the fall of 1993 and the following spring the UHW organized a series of three three-day workshops to critique the proposed project, bringing in four outside scholars from different fields and areas of expertise: Margo Anderson, Department of History, U. of Wisconsin-Milwaukee; Michael Hawthorne, Department of Political Science, Pembroke State U., NC; Robert McCaa, Department of History, U. of Minnesota; and Michael Scardaville, Department of History, U. of South Carolina. Margo Anderson is a leading scholar of the U.S. historical censuses. Michael Hawthorne, a political scientist, is an expert in research design. Robert McCaa, a Latin Americanist, is one of the field’s leading historical demographers and brought a wealth of practical experience and theoretical knowledge to our project. Michael Scardaville is a Mexicanist with long experience in urban history whose experience with the data analysis software SAS complemented Hawthorne and McCaa’s familiarity with SPSS, the latter probably the leading data analysis system used by historians. Also participating were: Charles Nam, nationally known demographer from FSU’s Center for Population Studies; C. Peter Ripley, Department of History and Director of the NEH-funded Black Abolitionist Papers Project; Morton Winsberg, Department of Geography and expert in spatial aspects of contemporary census data; and Douglas Charity, technical advisor to the project from the beginning. Although disagreeing on various details, the workshop participants made a number of specific suggestions.5

The long-term goal of the Guadalajara Censuses Project is to create a database from the population censuses of Guadalajara, from the military census of 1791 to the modern city census of 1930. For the purpose of teaching, the GCP emphasizes graduate student involvement in all phases of the project: research design, methodology, paleography, data collection, coding, data entry, data verification, the writing of user guidelines, Website maintenance and the development of the project CD-ROM. UHW staff began actively publicizing the Guadalajara Censuses Project in 1993, both through its annual bulletin, the Urban History Workshop Review, and through the Conference on Latin American History.

In the fall of 1993 and again in the summer of 1997, the Project Director and the UHWR editor, J. Burton Kirkwood, spent several weeks in Guadalajara working on various aspects of the project. These activities and the GCP in general have been strongly supported by financial and other aid from the Department of History and the College of Arts and Sciences of Florida State University, and from the F.S.U. Council on Research and Creativity. History chairs Neil Betten, Richard Greaves and Neil Jumonville have been particularly supportive and encouraging, as have Deans Larry Abel and Donald Foss of the College of Arts and Sciences.

The current phase of the project began in April of 1999. The then Director of the Division of Preservation and Access of the National Endowment for the Humanities (NEH), Helen Agüera, notified Rod Anderson that the GCP application for funding had been approved, requiring only minor modifications. The NEH-funded portion amounted to $114,959 out of a total budget of $207,435, the remainder being provided as cost-sharing by the Department of History, the College of Arts and Sciences and F.S.U.’s Office of the Vice President for Research. The project consisted of creating a database from the padrones of 1821 and 1822, and ran from June 1999 to April 2001.

By the end of the project, the GCP had created a database of 89 variables for 57,091 cases, mainly individuals (the exceptions are several hundred vacant houses, which we felt useful enough to report), including many constructed variables created for the convenience of the users. These are the variables which constitute the database for the CD-ROM. We actually entered more data than we promised because in the course of the project it was decided to include data for the heads of households for six districts in 1822 for which no complete manuscript census had been found.
We also entered separate lists of clerics living in religious houses, of families living in the parish charity house, and a list of all incarcerated persons. The database contains just over three million pieces of social data.

Coding and data entry efficiency during both the Literal and the Constructed phases exceeded expectations, although the project required more time than we had originally planned. In part, the project took longer than expected because we added about twenty constructed variables to our original list of variables, several of which were relatively complex and all of which were labor intensive. We reasoned simply that, given the great time and expense already invested, we might as well make the most of it. Also, a good deal more time than anticipated was spent in data verification and error detection.

Data Verification and Consolidation

For a detailed discussion of the GCP procedures for data verification, please see Error Detection and Verification Procedures. The search for data entry error was carried out at regular intervals during data entry, and at the end of the entry of each cuartel. During the literal phase a print-out of the entered data was compared to an enlarged, 11" x 17" copy of the original manuscript page. Numeric values were compared orally by two staff members. Literal data written in script was sight reviewed by one staff member. When all the data for each cuartel had been entered, the data were “cleaned.” This process consisted of creating frequencies for each variable, which were then reviewed for anomalies in values. Tables were created for special variables prone to error. (In our first project, for example, this process revealed fifty-three male doncellas, or “young ladies.”) All given names and surnames were examined by a native speaker of Spanish to verify their likely authenticity.

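As a rough illustration of the “cleaning” step described above, the following Python sketch builds a frequency count for one variable and a two-way table for a pair of variables, then flags an implausible combination of the kind mentioned (a male doncella). The records, field names and value codes are hypothetical and chosen only for the example; they are not the GCP’s actual variable names or coding scheme, and the project itself worked with statistical packages rather than this code.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative only,
# not the GCP's actual variables or codes.
records = [
    {"id": 1, "sex": "F", "status": "doncella"},
    {"id": 2, "sex": "M", "status": "soltero"},
    {"id": 3, "sex": "M", "status": "doncella"},  # implausible combination
    {"id": 4, "sex": "F", "status": "casada"},
]


def frequencies(rows, variable):
    """Count how often each value of a single variable occurs."""
    return Counter(row[variable] for row in rows)


def cross_table(rows, var_a, var_b):
    """Count each (value_a, value_b) pair, i.e. a simple two-way table."""
    return Counter((row[var_a], row[var_b]) for row in rows)


def flag(rows, var_a, value_a, var_b, value_b):
    """Return the ids of rows in which the given pair of values co-occurs."""
    return [
        row["id"]
        for row in rows
        if row[var_a] == value_a and row[var_b] == value_b
    ]


sex_counts = frequencies(records, "sex")
table = cross_table(records, "sex", "status")
suspects = flag(records, "sex", "M", "status", "doncella")

print(sex_counts)
print(table)
print("Cases to re-check against the manuscript:", suspects)
```

A reviewer would scan the single-variable counts for values that should not exist at all, and the two-way table for combinations that are individually valid but unlikely together; flagged cases would then be checked against the manuscript rather than corrected automatically.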
Once Literal data entry was completed for each cuartel separately, the files were merged into one large file and underwent a further “cleaning” process in a final search for error. In order to quantify the rate of error in the finished database and to identify those areas most vulnerable to error, we generated a random sample of 3,412 individuals from our 21 literal entry variables (6% of 57,091). The sample data were then compared to the original manuscript, case by case. Errors were noted (and corrected). To provide some comparison, we compared our rates of error to the error rates for the IPUMS U.S. national historical census project, the leading census project in the field.6 In the eight comparison variables for two censuses our error rates were lower than those of IPUMS, and in seven they were higher. On average, our error rate was 0.53 per 100 cases (mean) and 0.46 (median), or approximately half an error per 100 cases (a case being the data of one variable for one individual).

After the completion of the constructed variable stage of our project, we took a sample of 3,400 cases for each of three sets of variables (household and family; “numbers” and kin; and “locs” and migration), comparing the data entered to the coding sheets. Our average error rate per 100 cases for the eleven household and family variables in this category was 0.57 (mean) and 0.53 (median), with a low of 0.44 and a high of 0.82. Despite our efforts, errors will remain. We hope that all users will notify us of any errors they might find, so that they can be corrected in future versions of our database. (We would like to remind our users that the Archive File will contain errors since caught and changed in the Consolidated File.)

Nonetheless, data from the earlier period in the history of Mexico is rare and worth analyzing even in partial form. Bracketing the five critical decades surrounding Mexico’s independence from Spain, this data will enhance the 1821-22 censuses by providing breadth of coverage, documenting urban life from the height of Bourbon rule to the chaotic years of the regime of Santa Anna. The 1821-22 censuses will serve to anchor the new data, as a base for comparison. And although the new censuses are incomplete, every one contains the manuscript census for Cuartel 8, the city’s most populous district and the heart of working-class Guadalajara (as it has remained down to the present). The sample for 1930 provides the rare opportunity to enable the database to touch three centuries. When fully integrated, the completed Guadalajara Censuses Project will comprise more than 130,000 cases, making it one of the largest easily accessible historical urban census databases anywhere.

The final product is also planned to include a documentation file with the following six guides:
(1) Guide to the GCP database, written by P.I. Rod Anderson.
(2) Guide to SPSS and the GCP Database, by Margo Anderson.
(3) Guide to Excel and other formats in the GCP database, by Douglas Charity (project technical advisor).
(4) Guide to Genealogical Explorations of the GCP Database, by Rod Anderson.
(5) Guide to Census Data, by Margo Anderson.

The current project is funded through a renewal grant of $166,838 from the National Endowment for the Humanities, Division of Preservation and Access, with a cost sharing of $290,651 from the Department of History and the College of Arts and Sciences of Florida State University, and the Office of the Vice President for Research. The project is scheduled to run from June 2002 through December 2004.

Notes:
6. W. Block and D. L. Star, “Data Entry and Verification,” Historical Methods, vol. 28:1. The variables compared were: Household number, Household size, race, sex, age, marital status, occupation, birthplace. The variables added to those to obtain our averages were: First name, surname, title and block number. The actual error rate was lower because we corrected the errors in the sample.
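
The sampling check described under data verification (a random sample of 3,412 of the 57,091 individuals, with errors reported per 100 cases and summarized by mean and median) can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample size and database size come from the text, but the seed, the function names and the per-variable error counts are hypothetical, and nothing here reproduces the project’s actual procedure or results.

```python
import random
from statistics import mean, median

TOTAL_CASES = 57091  # individuals in the database (figure from the text)
SAMPLE_SIZE = 3412   # size of the verification sample (figure from the text)


def draw_sample(total, size, seed=0):
    """Pick `size` distinct case numbers from 1..total without replacement.

    The seed is an assumption, used only to make the example repeatable.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total + 1), size))


def error_rate_per_100(errors_found, cases_checked):
    """Errors per 100 cases, a case being one variable for one individual."""
    return 100.0 * errors_found / cases_checked


sample_ids = draw_sample(TOTAL_CASES, SAMPLE_SIZE)
sample_share = 100.0 * SAMPLE_SIZE / TOTAL_CASES  # roughly 6 percent

# Hypothetical error counts per variable, invented for the illustration.
hypothetical_errors = {"age": 20, "sex": 5, "occupation": 31}
rates = {
    variable: error_rate_per_100(count, SAMPLE_SIZE)
    for variable, count in hypothetical_errors.items()
}
mean_rate = mean(rates.values())
median_rate = median(rates.values())

print(f"Sample share of database: {sample_share:.2f}%")
for variable, rate in rates.items():
    print(f"{variable}: {rate:.2f} errors per 100 cases")
print(f"mean {mean_rate:.2f}, median {median_rate:.2f}")
```

Reporting both mean and median, as the text does, guards against a single error-prone variable pulling the average upward.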

© 2003 Florida State University, historyweb@fsu.edu