Data Best Practices¶
Following is a series of best practice for data collection and file creation and suggestions about best practices of formatting data within your VIP XML and CSV file.
Naming convention¶
While many of the Voting Information Project’s data processes are managed by software, the quality of the entire system relies on human intervention, especially for error reporting and quality control. For this reason, VIP files should follow a naming convention that describes the contents of each individual feed file in an accessible way.
The file containing the VIP feed should be named using the following convention:
vipfeed-${FIPS}-${ELECTION_DATE}-${STATE}[-${LOCAL}].{xml|zip}
An explanation of each of the segments of the file naming convention above are as follows:
${FIPS}
- The FIPS code for the jurisdiction.${ELECTION_DATE}
- The date of the election in ISO 8601 format.${STATE}
- The full state name (e.g. Alaska, Arkansas, etc…) and not the abbreviation. If there are spaces in the state name, they should be substituted with underscores (e.g. New York -> New_York).${LOCAL}
(optional) - This additional identifier should be used if the file contains data from a specific jurisdiction. As with${STATE}
above, all spaces should be substituted with underscores. For example, if the data contained in the file only covers Maricopa County, AZ for the November 6, 2012 election, the file name would bevipfeed-04013-2012-11-06-Arizona-Maricopa_County.xml
.{xml|zip}
- If the file is an uncompressed XML document, the extension should be.xml.
If the file is zipped, the file extension should end with.zip
.
For a final example, vipfeed-19-2012-11-06-Iowa.zip
denotes Iowa’s (NB: the FIPS code
for IA is 19) feed for the Nov 6, 2012 election that has been compressed.
Element Identifiers¶
Most elements within the VIP feed require unique identifiers, xs:ID data types. Conformance to xs:ID
requires
the identifying record to:
- begin with a letter or underscore
- only contain letters, digits, hyphens and periods
- be unique across the VIP data set
In order to maintain uniqueness and provide context for the identifiers, the best practice is to use Hungarian-Style notation for identifiers.
ID values should follow Hungarian-Style notation, were the identifier prefix implicitly names the data element. Below
is a list of preferred prefixes by element (e.g. par00001 for a Party
id
):
Element | Prefix |
---|---|
BallotMeasureContest | bmc |
BallotMeasureSelection | bms |
BallotStyle | bs |
Candidate | can |
CandidateContest | cc |
CandidateSelection | cs |
ContactInformation | ci |
Election | ele |
ElectionAdministration | ea |
ElectoralDistrict | ed |
HoursOpen | hours |
Locality | loc |
Office | off |
OrderedContest | oc |
Party | par |
PartyContest | pc |
PartySelection | ps |
Person | per |
PollingLocation | pl |
RetentionContest | rc |
Source | src |
State | st |
StreetSegment | ss |
File Structure¶
All XML and CSV files should be encoded UFT-8 and line breaks should be LF (\n
) as opposed to CR LF (\r\n
).
For consistency across files and to aid human readability all indentation of elements should be an indent of two spaces and tabs should not be used. Each child node of an element should also be indented an additional two spaces.
General Data Structure¶
All data that are presented to end users of the data (i.e. contest names, referendum text, polling location names, street names, proper names), where possible, should be converted to Title Case to aid readability.
All data should be trimmed to remove leading and trailing white space.
Optional elements without values should be omitted from XML feed.
Specific Data Types¶
Elements with a data type of xs:integer
must contain a valid whole number greater than zero.
Elements with a data type of xs:anyURI
should be entered as a fully qualified domain name
(e.g. https://www.votinginfoproject.org/)
Elements with a data type of xs:dateTime
should be entered in ISO-8601 format.
Elements with a data type of xs:boolean
should either have a value of true
or false
Elements with a data type of xs:language
should contain a two character, lower-case, value corresponding to the
ISO 639 standard.
Elements that have enumerations which include an other
should have a corresponding value assigned to OtherType
within
the containing element. For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | <BallotMeasureContest id="bm390616670907">
<BallotSelectionId>bms390616670907</BallotSelectionId>
<ElectoralDistrictId>ed3906177703103</ElectoralDistrictId>
<Name>Proposed Tax Levy School District</Name>
<SequenceOrder>34</SequenceOrder>
<FullText>
<Text language="en">An additional tax for the benefit of the Lockland Local School District, County of Hamilton,
Ohio, for the purpose of CURRENT EXPENSES at a rate not exceeding eleven and two-tenths (11.2) mills for each
one dollar of valuation, which amounts to one dollar and twelve cents ($1.12) for each one hundred dollars of
valuation, for a continuing period of time, commencing in 2015, first due in calendar year 2016.</Text>
</FullText>
<SummaryText>
<Text language="en">4 Proposed Tax Levy</Text>
</SummaryText>
<Type>other</Type>
<OtherType>bond</OtherType>
</BallotMeasureContest>
|
Specific Data Elements¶
Street Segments: Valid street segment records should not contain leading zeros in xs:integer
fields and should have
a Zip
value of 00000
if a value is unknown.
External Identifiers: External identifiers with an enumeration of fips
should contain valid FIPS code values as
defined by the U.S. Census Bureau. External identifiers with an enumeration of ocd-id
should contain a valid
Open Civic Data Division Identifier.
For long text fields (e.g. FullText
in BallotMeasureContest
) the XML line break (

) should be used to
enforce line break styling.
In all fields the characters <
, >
, and &
should be encoded <
, >
, and &
respectively.