Social Science Data Workshop
Winter 2005, Dan Tsang
1. What is social science data?
“Social science data are the raw material out of which social and economic statistics are produced. Social science data originate from social research methodologies or administrative records, while statistics are produced from data. Data are the information collected and stored at the level at which the unit of analysis was observed. Summaries of these data are usually statistics. Data must be processed to be of practical use. This compilation is accomplished with statistical software, which reads the raw data from a computer file.” [Definition from: Glossary of Selected Social Science Computing Terms and Social Science Data Terms]
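The distinction between data and statistics can be illustrated with a small sketch. The records below are invented for illustration only (the variable names `age` and `employed` are assumptions, not taken from any real study); each record is one respondent, the unit of analysis, and the statistics are summaries computed from those records.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw data: one record per respondent (the unit of analysis).
respondents = [
    {"id": 1, "age": 34, "employed": "yes"},
    {"id": 2, "age": 51, "employed": "no"},
    {"id": 3, "age": 27, "employed": "yes"},
    {"id": 4, "age": 45, "employed": "yes"},
]

# Statistics: summaries produced from the raw records.
mean_age = mean(r["age"] for r in respondents)
employment_counts = Counter(r["employed"] for r in respondents)

print("Mean age:", mean_age)
print("Employment counts:", dict(employment_counts))
```

In practice this step is done with statistical software such as SPSS or SAS reading the data file; the sketch only shows the idea that statistics are derived from, and are not the same thing as, the underlying data.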
2. What is a data file?
“A data file is not the analyzed findings of a study or statistics, but the raw collected data from which these statistics might be extrapolated. It consists of rows and columns of alphanumeric characters. The majority of ICPSR's data files are ASCII fixed-format files. The storage formats of data files may be either logical record length format, card image, or delimited format. The physical structure of data files also varies and may be either rectangular, hierarchical, or relational. Some data collections may also include data available in other formats, such as SPSS portable files or SAS transport files.” [From: http://www.icpsr.umich.edu/help/faq-data.html#FORMATOFDATA1] A dataset can be composed of many data files stemming from one study.
3. Types of data formats
Data are distributed in many formats, including ASCII, OSIRIS, SPSS Portable, and SAS Transport [see: http://www.icpsr.umich.edu/help/dataformat.html]. Increasingly, data are also being made available as Excel files. Many of the files researchers use have data arranged rectangularly, usually in fixed format (i.e., each record has the same length and each variable occupies the same location in every record).
Stat Transfer is a program that can convert between formats, e.g. from SPSS portable to Excel. Contact dtsang@uci.edu to request conversion of a file from one format to another.
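As a rough illustration of the fixed-format layout described above, the sketch below slices each record at the same column positions. The layout (variable names and column ranges) and the sample records are hypothetical; for a real file the column locations come from the study's codebook.

```python
# Hypothetical layout: (variable name, start column, end column),
# with columns numbered from 1 and both ends inclusive, as in most codebooks.
LAYOUT = [
    ("caseid", 1, 4),
    ("age", 5, 6),
    ("sex", 7, 7),
]

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-format record into a dict of variable -> raw string."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}

# Invented sample records; every record has the same length (7 characters).
raw_lines = ["0001341", "0002512", "0003271"]

records = [parse_record(line) for line in raw_lines]
for record in records:
    print(record)
```

Values are kept as strings here, since codes in a data file often need the codebook to be interpreted before they are converted to numbers.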
4. What is a codebook?
“A codebook provides information on the structure, contents, and layout of a data file. Users are strongly encouraged to look at the codebook for a study before downloading the data files.” [From: http://www.icpsr.umich.edu/help/faq-data.html#FORMATOFDATA2]. The codebook is the metadata or documentation for a dataset.
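One way to picture a codebook is as a lookup table that gives each variable a descriptive label and each code a meaning. The sketch below is a simplified assumption of what such metadata might look like; the variables, labels, and codes are invented and do not come from any actual ICPSR codebook.

```python
# Hypothetical codebook: variable -> descriptive label and value labels.
CODEBOOK = {
    "sex": {
        "label": "Respondent's sex",
        "values": {"1": "Male", "2": "Female", "9": "No answer"},
    },
    "age": {
        "label": "Age in years",
        "values": {},  # no value labels: the code is the value itself
    },
}

def decode(record, codebook=CODEBOOK):
    """Replace variable names and codes with the codebook's labels."""
    decoded = {}
    for variable, code in record.items():
        entry = codebook.get(variable, {})
        value_labels = entry.get("values", {})
        # Fall back to the raw name or code when the codebook has no entry.
        decoded[entry.get("label", variable)] = value_labels.get(code, code)
    return decoded

decoded = decode({"sex": "2", "age": "51"})
print(decoded)
```

Without the codebook, a value such as "2" in the `sex` column cannot be interpreted, which is why it is worth reading the codebook before downloading the data files.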
5. Social Science Data Archives
Click on “Data Sources” on the left menu for sites with data or statistics.
· General Social Survey 1972-2002
http://csa.berkeley.edu:7502/cgi-bin12/hsda?harcsda+gss00
6. Selected UCI-Licensed Data sites
http://data.lib.uci.edu
click on UCI-Licensed Data
· ICPSR [Inter-university Consortium for Political and Social Research]
Largest archive of social science datasets. Content spans social science disciplines. See also: ICPSR Data on left menu of Social Science Data Archives:
· Social Science Electronic Data Library
Click on gold key.
Sociometrics site of government-collected health-related data enhanced for secondary analysis
· iPoll databank @ the
http://roperweb.ropercenter.uconn.edu/iPOLL/login/ipoll_login.html
Variable-level search of question text. Data files for many of the surveys are available. E-mail: dtsang@uci.edu.
This site has later coverage than the one linked from ICPSR.
Publications and data tables from OECD. Includes socio-economic data files.
http://data.lib.uci.edu/ssda/driweb.html
Economic time series (historical and forecast) data from private firms such as Dow Jones as well as IMF and OECD. Also known as Global Insight, DRI-WEFA. Software interface: DRI Webstract, which has to be downloaded to a workstation.
Public opinion poll results from around the world at question text level with responses.
Mark Baldassare’s Orange County Annual Surveys and (since 1991) Special Survey of Orange County. Public access.
Pacific Opinions’ surveys of
Click on: Data Sources on left menu
Click on C; and then scroll to California Polls. Also known as Field ( Polls. Field Institute polls of
Compiled by Daniel C. Tsang, Social Science Data Librarian, UC Irvine
380 Langson Library. E-mail: dtsang@uci.edu
Office hours: Tuesday