Social Science Data Workshop

Winter 2005 Dan Tsang

 

1. What is social science data?

 

            “Social science data are the raw material out of which social and economic statistics are produced. Social science data originate from social research methodologies or administrative records, while statistics are produced from data. Data are the information collected and stored at the level at which the unit of analysis was observed. Summaries of these data are usually statistics. Data must be processed to be of practical use. This compilation is accomplished with statistical software, which reads the raw data from a computer file.”  [Definition from: Glossary of Selected Social Science Computing Terms and Social Science Data Terms]
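
To make the distinction between data and statistics concrete, here is a minimal Python sketch. The records and variable names (age, party) are invented for illustration and do not come from any real study. Each record stands for one respondent (the unit of analysis), and the statistics are the summaries computed from those records.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw data: one record per respondent (the unit of analysis).
respondents = [
    {"id": 1, "age": 34, "party": "Democrat"},
    {"id": 2, "age": 27, "party": "Republican"},
    {"id": 3, "age": 58, "party": "Democrat"},
    {"id": 4, "age": 41, "party": "Independent"},
]

# Statistics are summaries computed from the raw records.
mean_age = mean(r["age"] for r in respondents)
party_counts = Counter(r["party"] for r in respondents)

print(f"Mean age: {mean_age}")
print(f"Party frequencies: {dict(party_counts)}")
```

In practice a statistical package such as SPSS or SAS plays the role of this script, reading the raw data file and producing the summaries.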

 

2.  What is a data file?

 

            “A data file is not the analyzed findings of a study or statistics, but the raw collected data from which these statistics might be extrapolated. It consists of rows and columns of alphanumeric characters. The majority of ICPSR's data files are ASCII fixed-format files. The storage formats of data files may be either logical record length format, card image, or delimited format. The physical structure of data files also varies and may be either rectangular, hierarchical, or relational. Some data collections may also include data available in other formats, such as SPSS portable files or SAS transport files.” [From: http://www.icpsr.umich.edu/help/faq-data.html#FORMATOFDATA1]   A dataset can be composed of many data files stemming from one study.

 

3. Types of data formats

            Data are distributed in many formats, including ASCII, OSIRIS, SPSS Portable, and SAS Transport.  [See: http://www.icpsr.umich.edu/help/dataformat.html]  Increasingly, data are also being made available as Excel files.  Many of the files researchers use have data arranged rectangularly, usually in fixed format (i.e., each record has the same length and each variable occupies the same column positions in every record).
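
As an illustration of what fixed format means in practice, the following Python sketch slices made-up records by column position. The layout, variable names, and records are all hypothetical; for a real file, the column locations would come from the study's documentation.

```python
# Hypothetical layout: (variable name, start column, end column).
# Columns are 1-based and inclusive.
LAYOUT = [
    ("caseid", 1, 4),
    ("age", 5, 6),
    ("sex", 7, 7),
    ("income", 8, 13),
]

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-format record into a dict of variable name -> text value."""
    # Convert 1-based inclusive columns to Python's 0-based slices.
    return {name: line[start - 1:end].strip() for name, start, end in layout}

# Three made-up records; every record is exactly 13 characters long.
RAW = [
    "0001341 52000",
    "0002272 38500",
    "0003581104250",
]

records = [parse_record(line) for line in RAW]
for record in records:
    print(record)
```

Because every variable sits in the same columns of every record, no delimiters are needed; the trade-off is that the file cannot be read correctly without knowing the layout.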

 

            Stat/Transfer is a program that can convert between formats, e.g. from SPSS portable to Excel.  Contact dtsang@uci.edu to request conversion of a file from one format to another.

 

4. What is a codebook?

 

            “A codebook provides information on the structure, contents, and layout of a data file. Users are strongly encouraged to look at the codebook for a study before downloading the data files.” [From: http://www.icpsr.umich.edu/help/faq-data.html#FORMATOFDATA2]  The codebook is the metadata, or documentation, for a dataset.
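
One way to picture a codebook is as a lookup table that turns raw codes into meaning. The Python sketch below uses invented codebook entries (variable names, column locations, and value labels are all assumptions for illustration, not taken from any ICPSR study) to show how a raw code is interpreted.

```python
# Hypothetical codebook entries: variable -> description, column location, value labels.
CODEBOOK = {
    "sex": {
        "label": "Respondent's sex",
        "columns": (7, 7),
        "values": {"1": "Male", "2": "Female", "9": "No answer"},
    },
    "age": {
        "label": "Age in years",
        "columns": (5, 6),
        "values": {"99": "No answer"},
    },
}

def describe(variable, raw_code, codebook=CODEBOOK):
    """Return a readable description of one raw code, using the codebook."""
    entry = codebook[variable]
    # Codes without a value label (e.g. an actual age) are kept as-is.
    meaning = entry["values"].get(raw_code, raw_code)
    return f"{entry['label']}: {meaning}"

print(describe("sex", "2"))
print(describe("age", "34"))
```

Without the codebook, a value such as "2" or "99" in the data file is just a character string; the documentation is what says which variable it belongs to and what it stands for.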

 

5. Social Science Data Archives 

http://data.lib.uci.edu. 

Click on “Data Sources” on left menu for sites with data or statistics. 

·        General Social Survey 1972-2002

http://csa.berkeley.edu:7502/cgi-bin12/hsda?harcsda+gss00

 

 

 

 

6.  Selected UCI-Licensed Data sites

   

  http://data.lib.uci.edu; click on “UCI-Licensed Data”.

 

 

Use the UCI Proxy Server for access from off campus.

·        ICPSR [Inter-university Consortium for Political and Social Research]

http://www.icpsr.umich.edu

            Largest archive of social science datasets.  Content spans social science disciplines.  See also: ICPSR Data on left menu of Social Science Data Archives:

http://data.lib.uci.edu

 

·        Social Science Electronic Data Library

http://www.socio.com/edl.htm

            Click on gold key.

Sociometrics site of government-collected, health-related data enhanced for secondary analysis.

 

·        iPoll databank @ the Roper Center for Public Opinion Research

http://roperweb.ropercenter.uconn.edu/iPOLL/login/ipoll_login.html

 

Variable-level search of question text.  Data files for many of the surveys are available.  E-mail: dtsang@uci.edu.

 

 

This site has more recent coverage than the one linked from ICPSR.

 

  • SourceOECD

http://new.sourceoecd.org/

Publications and data tables from OECD.  Includes socio-economic data files.

 

  • Global Economics via the Internet

http://data.lib.uci.edu/ssda/driweb.html

           Economic time series (historical and forecast) data from private firms such as Dow Jones, as well as from the IMF and OECD.  Also known as Global Insight, DRI-WEFA.  Software interface: DRI Webstract, which must be downloaded to a workstation.

     

  • Polling the Nations

http://poll.orspub.com/

Public opinion poll results from around the world, at the question-text level with responses.

 

 

  • Orange County Surveys

http://ocsurveys.lib.uci.edu

Mark Baldassare’s Orange County Annual Surveys and (since 1991) Special Survey of Orange County.  Public access.

 

  • The Pacific Poll

http://pacpoll.lib.uci.edu/

Pacific Opinions’ surveys of Orange County and San Diego residents on local and transborder politics.  Public access.

 

  • California Polls

      http://data.lib.uci.edu

      Click on: Data Sources on left menu

      Click on C, then scroll to California Polls.  Also known as Field (California) Polls.  Field Institute polls of California public opinion.

 

Compiled by Daniel C. Tsang, Social Science Data Librarian, UC Irvine

380 Langson Library. E-mail: dtsang@uci.edu

Office hours: Tuesday 3-4 pm, Thursday 1-2 pm.